bstefanescu edited this page May 7, 2011 · 23 revisions

Now that we've seen how the ECR runtime works, how bundles are wired, and how components are declared, we can turn to the core and most important part: the content repository.

Motivation

Why do we need a content repository? Can't we just store our data in a database? Yes, but in almost all content-based applications you need more than storage for raw structured data: you also need semantics for access control, versioning and other features that databases do not implement. You also don't want to spend your time reinventing the wheel by re-implementing versioning, access control, database abstraction and optimization. Many programmers are tempted to start from zero and re-implement everything a content repository already provides, but this is wasted effort. Now that standards like CMIS are around, use them!

The content repository allows you to store, version, protect and search your data. The data you store is structured: even if you only want to store a binary file, you must specify some common properties such as the name (or title), an optional description, the content type of the file, an optional ACL for protecting your data, etc. Thus, in ECR you can store anything, from binary files to structured data containing simple or complex properties, text, or anything else you need.

Note that ECR provides a CMIS bridge to access ECR repositories using CMIS semantics.

ECR Documents

Data is stored in ECR in a unit called a document. A document always has a type (the document type) and a set of properties. Properties can be scalar (strings, dates, numbers) or complex (maps, lists). You can also attach binary files, which are stored in the document as special properties called blob properties. Access to a document can be protected by adding an ACL.
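
To make the shape of a document concrete, here is a minimal sketch in plain Java. It is not the ECR API: `SimpleDocument`, `BlobValue` and the property names are hypothetical and only model the idea of a typed document that carries scalar, complex and blob properties.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; these are not ECR classes.

// A blob property value: binary content plus its content type.
class BlobValue {
    final String contentType;
    final byte[] data;

    BlobValue(String contentType, byte[] data) {
        this.contentType = contentType;
        this.data = data.clone(); // defensive copy of the binary content
    }
}

// A document: a type name plus a set of named properties.
class SimpleDocument {
    final String type;
    private final Map<String, Object> properties = new LinkedHashMap<>();

    SimpleDocument(String type) {
        this.type = type;
    }

    void set(String name, Object value) {
        properties.put(name, value);
    }

    Object get(String name) {
        return properties.get(name);
    }
}

class DocumentSketch {
    public static void main(String[] args) {
        SimpleDocument doc = new SimpleDocument("Photo");
        doc.set("title", "Sunset");                     // scalar property
        doc.set("keywords", List.of("sea", "evening")); // complex property (a list)
        doc.set("content", new BlobValue("image/jpeg", new byte[] {1, 2, 3})); // blob property
        System.out.println(doc.type + ": " + doc.get("title"));
    }
}
```

Access control is left out of this sketch on purpose; it only shows how the three kinds of properties sit side by side in one document.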

Document hierarchy

Documents are stored in the repository hierarchically, so every document has a parent document. The root document is the only document that doesn't have a parent. It is a special document that is created the first time the repository is initialized. You cannot remove it.

Each document also has a unique identifier and a name. The name is a kind of local ID used to identify a document inside its parent (like file names in a file system). So, the name is always unique inside the document's parent.

Note that documents cannot have multiple parents - but ECR provides a way to create document links so you can put a reference to a document in another parent.
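
The hierarchy rules above (a single parent, a parentless root, a unique ID, and a name that is unique inside its parent) can be sketched in plain Java. This is an illustrative model under assumptions, not the ECR API: the `DocNode` class and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model for illustration only; this is not an ECR class.
class DocNode {
    final String id = UUID.randomUUID().toString(); // unique identifier
    final String name;                              // local name, unique inside the parent
    final DocNode parent;                           // null only for the root document
    private final Map<String, DocNode> children = new LinkedHashMap<>();

    private DocNode(String name, DocNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // The root is the only document without a parent.
    static DocNode root() {
        return new DocNode("", null);
    }

    // Create a child document; reject a name already used under this parent.
    DocNode createChild(String childName) {
        if (children.containsKey(childName)) {
            throw new IllegalArgumentException("Name already used in this parent: " + childName);
        }
        DocNode child = new DocNode(childName, this);
        children.put(childName, child);
        return child;
    }

    // Build a file-system-like path by walking up to the root.
    String path() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.path();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }
}

class HierarchySketch {
    public static void main(String[] args) {
        DocNode root = DocNode.root();
        DocNode folder = root.createChild("photos");
        DocNode photo = folder.createChild("sunset");
        System.out.println(photo.path());
    }
}
```

Because each node holds exactly one `parent` reference, a document cannot appear under two parents in this model; a link would have to be a separate document that stores the target's ID.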

We will now discuss each feature of an ECR document.

Document type

A document type defines how a document is structured and what its capabilities are. Document types can be extended to create new types that inherit the parent type's structure and capabilities.

The structure of a document type is defined using document schemas. A document type may have multiple schemas. This approach lets you reuse schema definitions across document types. Instead of redefining the same properties each time they are needed for the same use case, you can group these properties into logical units (schemas) and then reuse them in your document types.

Example

I will take a simple example to illustrate how document schemas can be reused. Let's say you want to store two types of documents in the same repository: photos and books.

For the photo document type you want to provide the following information:

  1. a title
  2. a description
  3. the author
  4. the place where the photo was taken
  5. the format of the attached image
  6. the attached image itself
  7. and some other photo related properties.

For the book document type you want to have:

  1. a title
  2. a description
  3. the author
  4. the place where the book was written
  5. the format of the attached book file (PDF etc.)
  6. and some other book related properties

You can see that the first 4 properties are present in both the photo and the book type. So, to avoid wasting your time redefining these properties, you can simply create 4 different schemas: a common schema that groups the first 4 properties, a file schema that contains the property for the attached file, a schema for photo-specific properties, and another one for book-specific properties.
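
The schema split described in this example can be sketched in plain Java as follows. This is only an illustration, not how ECR declares schemas: `SchemaDef` and `DocTypeDef` are hypothetical classes, the property names `content`, `exposure` and `isbn` are invented, and placing `format` in the file schema is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; these are not ECR classes.

// A schema: a named, reusable group of property names.
class SchemaDef {
    final String name;
    final List<String> properties;

    SchemaDef(String name, String... properties) {
        this.name = name;
        this.properties = List.of(properties);
    }
}

// A document type: a name plus the schemas that define its structure.
class DocTypeDef {
    final String name;
    final List<SchemaDef> schemas;

    DocTypeDef(String name, SchemaDef... schemas) {
        this.name = name;
        this.schemas = List.of(schemas);
    }

    // All property names of this type, in schema order.
    Set<String> propertyNames() {
        Set<String> names = new LinkedHashSet<>();
        for (SchemaDef schema : schemas) {
            names.addAll(schema.properties);
        }
        return names;
    }
}

class SchemaReuseSketch {
    // The four schemas from the example; property names are assumed.
    static final SchemaDef COMMON = new SchemaDef("common", "title", "description", "author", "place");
    static final SchemaDef FILE = new SchemaDef("file", "content", "format");
    static final SchemaDef PHOTO = new SchemaDef("photo", "exposure"); // invented photo property
    static final SchemaDef BOOK = new SchemaDef("book", "isbn");       // invented book property

    // Both types reuse the same COMMON and FILE schema definitions.
    static final DocTypeDef PHOTO_TYPE = new DocTypeDef("Photo", COMMON, FILE, PHOTO);
    static final DocTypeDef BOOK_TYPE = new DocTypeDef("Book", COMMON, FILE, BOOK);

    public static void main(String[] args) {
        System.out.println(PHOTO_TYPE.propertyNames());
        System.out.println(BOOK_TYPE.propertyNames());
    }
}
```

The shared properties are defined once, in `COMMON` and `FILE`, and each type adds only the schema that is specific to it.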

Built-in schemas

Because many types of documents use the same properties (in our example: title, description, author, etc.), ECR already provides some common schemas that you can reuse when defining your document types.

Here is a list of some of these schemas:

  • dublincore schema - see http://dublincore.org
  • file schema - for attaching a blob property
  • files schema - for attaching a list of blobs

The dublincore schema is one of the most important schemas, since almost all document types may use it.

Note that the dublincore schema provided by ECR only contains a subset of the standard dublincore schema.

Document schemas

Document facets

Document properties

Scalar properties

Complex properties

Blob properties

Access control - protecting your documents

ACP - Access Control Policy

ACL - Access Control List

ACE - Access Control Entry

ACP Inheritance

Document versioning

Document search - NXQL Queries

Document links - Publishing
