Clever, Crafty, Content Profiling of Objects (c3po) is a software tool that uses metadata extracted from the files of a digital collection as input to generate a profile of the content set. It is designed so that metadata formats originating from different tools can be easily integrated. Currently it supports FITS metadata and Apache Tika metadata.
The tool follows a three-part profiling process and provides facilities for data export and further analysis of the content, such as visualisations of the metadata characteristics and partitioning of the collection into homogeneous sets based on any known characteristic. For each chosen partition of the content, a machine-readable profile can be generated that contains aggregations and distributions for many of the properties. The profile can optionally include a set of sample objects that are representative of the partition.
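To make the idea of partitioning and profiling concrete, here is a minimal Python sketch. It is not c3po's API or its actual profile format: the record layout, the property names (`format`, `mimetype`, `size`) and the helper functions `partition` and `profile` are all assumptions chosen for illustration. It splits a small collection on one property and computes a value distribution for nominal properties and simple aggregations for numeric ones.

```python
from collections import Counter, defaultdict

# Hypothetical metadata records, as they might look after extraction.
# The property names are illustrative only, not c3po's vocabulary.
records = [
    {"format": "PDF", "mimetype": "application/pdf", "size": 1200},
    {"format": "PDF", "mimetype": "application/pdf", "size": 800},
    {"format": "JPEG", "mimetype": "image/jpeg", "size": 300},
]


def partition(records, prop):
    """Group records into homogeneous sets by the value of one property."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(prop, "unknown")].append(record)
    return dict(groups)


def profile(records, nominal=("format", "mimetype"), numeric=("size",)):
    """Build a small profile: distributions for nominal properties,
    min/max/sum/avg aggregations for numeric ones."""
    result = {"count": len(records), "distributions": {}, "aggregations": {}}
    for prop in nominal:
        values = (record.get(prop, "unknown") for record in records)
        result["distributions"][prop] = dict(Counter(values))
    for prop in numeric:
        values = [record[prop] for record in records if prop in record]
        if values:
            result["aggregations"][prop] = {
                "min": min(values),
                "max": max(values),
                "sum": sum(values),
                "avg": sum(values) / len(values),
            }
    return result


# One profile per partition, keyed by the partitioning value.
profiles = {key: profile(group) for key, group in partition(records, "format").items()}
```

Because each profile is a plain dictionary, it could be serialised with `json.dumps(profiles)` to obtain a machine-readable form. The real tool's output format may differ.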
Please refer to Bintray.
Please refer to the Usage Guide.
Please refer to the Dev Guide.
You can find more information in the following links:
- consolidate based on resource name
- bundle optional FITS execution in c3po (to make demos easier)
- create a consistent REST API
- refactor the web app to use the new REST API and the new core
- read data from memory instead of file system and allow adaptors to skip the memory read
- make use of a controlled vocabulary for properties. If nothing better exists, then use FITS as default.
- implement an HBase backend
- scale to half a billion objects