Clever, Crafty Content Profiling of Objects
c3po-api
c3po-cmd
c3po-core
c3po-webapi
format
scripts
.gitignore
.travis.yml
CHANGELOG.html
LICENSE
README.md
pom.xml

C3PO

Clever, Crafty, Content Profiling of Objects (c3po) is a software tool that takes metadata extracted from the files of a digital collection as input and generates a profile of the content set. It is designed so that metadata formats originating from different tools can be integrated easily. It currently supports FITS metadata and Apache Tika metadata.

The tool follows a three-part profiling process and provides facilities for data export and further analysis of the content, such as visualisations of the metadata characteristics and partitioning of the collection into homogeneous sets based on any known characteristic. For each chosen partition, a machine-readable profile can be generated that contains aggregations and distributions for many of the properties. The profile can optionally include a set of representative sample objects.
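
To illustrate the partition-and-aggregate idea described above, here is a minimal Java sketch. It is not c3po's implementation: the class `ProfileSketch`, the record `Element`, and the property names `mimetype` and `valid` are hypothetical and chosen only for the example. It assumes each object has already been reduced to a flat map of property names to values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative sketch only; these are not c3po's actual classes or API. */
public class ProfileSketch {

    /** One object of the collection, reduced to a flat map of property name to value. */
    public record Element(String name, Map<String, String> metadata) {}

    /** Partition: keep only the elements whose property has the given value. */
    public static List<Element> partition(List<Element> elements, String property, String value) {
        return elements.stream()
                .filter(e -> value.equals(e.metadata().get(property)))
                .collect(Collectors.toList());
    }

    /** Distribution: count how often each value of a property occurs.
     *  Elements that lack the property are counted under "unknown". */
    public static Map<String, Long> distribution(List<Element> elements, String property) {
        return elements.stream()
                .collect(Collectors.groupingBy(
                        e -> e.metadata().getOrDefault(property, "unknown"),
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Element> collection = List.of(
                new Element("a.pdf", Map.of("mimetype", "application/pdf", "valid", "true")),
                new Element("b.pdf", Map.of("mimetype", "application/pdf", "valid", "false")),
                new Element("c.png", Map.of("mimetype", "image/png")));

        // Distribution of one property over the whole collection.
        System.out.println(distribution(collection, "mimetype"));

        // Distribution of another property within a single partition.
        List<Element> pdfs = partition(collection, "mimetype", "application/pdf");
        System.out.println(distribution(pdfs, "valid"));
    }
}
```

The sketch keeps everything in memory for brevity; a real profile over a large collection would compute these aggregations in a database backend.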

Releases

Please refer to Bintray.

Setup

Please refer to the Usage Guide.

Development

Please refer to the Dev Guide.

Screenshot

Collection Overview

More Information

You can find more information in the following links:

Road Map

  • consolidate based on resource name
  • optionally bundle FITS execution in c3po (to make demos easier)
  • create a consistent REST API
  • refactor the web app to use the new REST API and the new core
  • read data from memory instead of the file system, and allow adaptors to skip the memory read
  • use a controlled vocabulary for properties; if nothing better exists, use FITS as the default
  • implement an HBase backend
  • ...
  • scale to half a billion objects