Dublin Core feature example missing #128

Closed
martincastell opened this Issue · 7 comments

@martincastell

I'm trying to take advantage of the Dublin Core functionality, but there isn't an example for it in the wiki. The docs say:

All Dublin Core attributes are indexed and stored if detected in the attachment.

What does this mean? Should we add the attributes to the attachment in the database, like this:

{
   "_id": "0069be6d-98ac-4d91-818e-4f5d40e6a9ec",
   "_rev": "1-6f7371307ea1b3bcdd1b4a5205f4ab9f",
   "attachmentId": "3fbf75ca-9d45-4aa5-96eb-0bc56e1f650b",
   "name": "all_star.png",
   "type": "document",
   "_attachments": {
       "3fbf75ca-9d45-4aa5-96eb-0bc56e1f650b": {
           "content_type": "image/png",
           "revpos": 1,
           "length": 112157,
           "stub": true,
           "_dc.creator": "Converse"
       }
   }
}

Or in the index function, like this (assuming the creator attribute exists in the CouchDB document):

if (doc._attachments) {
    for (var i in doc._attachments) {
        ret.attachment('default', i);
        ret.add(doc.creator, { 'field': '_dc.creator' });
    }
}

And once they are indexed and stored, how can you query by them?

Thank you very much, and sorry if this doesn't belong here.

@rnewson
Owner

The document in couchdb is not altered by couchdb-lucene. The additional fields are in the Lucene index only. Please reopen if you find that is not the case. :)

@rnewson rnewson closed this
@martincastell

The question is basically: how do you get couchdb-lucene to put those additional fields in the index and, once they are in the index, how do you query by them?
For example, if you try this, you get an error:
http://localhost:5984/database/_fti/_design/foo/by_subject?q=_dc.creator:Converse

@rnewson
Owner

couchdb-lucene will add it if Tika detects metadata from your attachments.

What error do you get for that query?

@martincastell

I'm sorry for not making myself clear. I really appreciate you taking the time to answer my questions.

How should I format/create my attachment so that it contains the Dublin Core values? I tried this: http://dublincore.org/documents/dcq-html/ but it didn't work. Could you please provide a sample attachment?

The error goes like this:

{
    "reason": "Bad query syntax: Cannot parse '_dc.title:Expressing Dublin Core': Field '_dc.title' not recognized.",
    "code": 400
}

I tried using other fields that don't exist like:
http://localhost:5984/database/_fti/_design/foo/by_subject?q=foo:Converse
And I get an empty result set as usual.

@rnewson
Owner

I see where the confusion is. You don't specify this yourself at all. Apache Tika will detect this data when it indexes your attachments. If it finds a 'title' in a PDF, say, it will add _dc.title for you. It can then be searched as normal, unless there's a bug. This is probably the least used feature of couchdb-lucene, even though I think it's really cool.
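To illustrate the point above, here is a minimal sketch of an index function that only hands the attachments to couchdb-lucene and adds no `_dc.*` fields itself. The `Document` constructor below is a hypothetical stand-in that just records calls so the snippet runs on its own; inside couchdb-lucene the real `Document` object is supplied for you. The `indexAttachments` name and the sample document are also assumptions made for the example.

```javascript
// Stand-in for the Document object that couchdb-lucene provides to index
// functions (assumption: it only records calls, it does not index anything).
function Document() {
  this.attachments = [];
}
Document.prototype.attachment = function (field, name) {
  this.attachments.push({ field: field, name: name });
};

// Index function body: pass every attachment on for text extraction.
// No ret.add() call for _dc.* fields is made here; per the explanation
// above, those come from whatever metadata Tika detects in the attachment.
function indexAttachments(doc) {
  var ret = new Document();
  for (var name in doc._attachments) {
    ret.attachment("default", name);
  }
  return ret;
}

// Hypothetical sample document with one PDF attachment stub.
var sample = {
  _id: "example",
  _attachments: { "report.pdf": { content_type: "application/pdf" } }
};
var indexed = indexAttachments(sample);
console.log(indexed.attachments);
```

In a real design document only the loop and the `ret.attachment` call would appear; whether a `_dc.title` or `_dc.creator` field ends up in the index depends on the metadata present in the attachment itself.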

@rnewson
Owner

Indeed, _ and . were not allowed in field names, so the _dc.title wasn't searchable. Fixed now.
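With that fix in place, a query against one of these fields might be built as in the sketch below. It reuses the host, database, design document, and index name from the URLs earlier in this thread. The `buildSearchUrl` helper is hypothetical, and wrapping the value in double quotes is an assumption based on standard Lucene phrase syntax, so that a multi-word title is searched as one phrase within the named field.

```javascript
// Hypothetical helper: builds a couchdb-lucene _fti query URL for a phrase
// search on a single field. It does not escape quotes inside the phrase.
function buildSearchUrl(base, db, designDoc, indexName, field, phrase) {
  var query = field + ':"' + phrase + '"';
  return base + "/" + db + "/_fti/_design/" + designDoc + "/" + indexName +
    "?q=" + encodeURIComponent(query);
}

var url = buildSearchUrl(
  "http://localhost:5984", "database", "foo", "by_subject",
  "_dc.title", "Expressing Dublin Core"
);
console.log(url);
```

The query part is percent-encoded, so the colon, quotes, and spaces survive being passed in a URL.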

@martincastell

Thank you very much, regards.

@mmm444 mmm444 pushed a commit to mmm444/couchdb-lucene that referenced this issue
Robert Newson allow _ and . in field names. (closes #128) 9460025