Integration with GridFS features [DATAMONGO-6] #944

Closed
spring-projects-issues opened this issue Dec 1, 2010 · 6 comments
Labels: in: core (Issues in core support), type: enhancement (A general enhancement)

Comments

@spring-projects-issues

Mark Pollack opened DATAMONGO-6 and commented

Provide Spring Resource abstraction for GridFS.
Helper class around common use cases.

File upload servlet?


Referenced from: commits 572c4bb

8 votes, 9 watchers

@spring-projects-issues

Matthias Scudlik commented

Fast content search would be great.

@spring-projects-issues

Mark Pollack commented

Can you elaborate a bit? Something like this: http://nosql.mypopescu.com/post/383437318/integrating-mongodb-with-solr ?

I believe MongoDB itself will offer a text search feature in the future, but that might not cover GridFS.

@spring-projects-issues

Matthias Scudlik commented

Since GridFS deals with binary data, content search certainly does not make sense for all kinds of data.

I suggest that there should be standard functionality for text files (text, XML, HTML, ...)
and the possibility to add "adapters" for custom data.

This should probably be done by indexing the binary content. The "adapter" should be
able to add custom indexes. For example, if you have a PDF document, you could implement a
custom adapter that opens the PDF and adds its text to the index.

On the other hand, you could also have a zip archive that contains some content you are looking for.
For the standard functionality, the MIME type (GridFS is aware of that) should be enough (text, XML, ...) to determine how the index should be created.

For zip files the MIME type is not enough. Imagine you have different kinds of zip archives.
One may contain images, another may have Word or OpenOffice documents, or even a mix.
So my idea is that you can add such a custom adapter depending on the MIME type and a file name pattern.

images_001.zip -> don't index
documents_001.zip -> extract zip file, add index for PDF documents.

Something like that. I think this is a common use case and it would be really great to have that.
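
The adapter selection described above (MIME type plus a file name pattern) could be sketched roughly as follows. This is only an illustration under assumptions: `IndexAdapter`, `AdapterRegistry`, and the glob patterns are hypothetical names, not part of Spring Data MongoDB or the MongoDB driver, and a rule registered with a `null` adapter is assumed to mean "don't index".

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical contract: turns stored binary content into indexable text.
interface IndexAdapter {
    String extractText(byte[] content);
}

// Hypothetical registry: the first rule matching MIME type and file name wins.
final class AdapterRegistry {

    private static final class Rule {
        final String mimeType;
        final PathMatcher namePattern;
        final IndexAdapter adapter; // null is assumed to mean "don't index"

        Rule(String mimeType, PathMatcher namePattern, IndexAdapter adapter) {
            this.mimeType = mimeType;
            this.namePattern = namePattern;
            this.adapter = adapter;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Registers a rule; the glob is matched against the bare file name.
    AdapterRegistry register(String mimeType, String glob, IndexAdapter adapter) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        rules.add(new Rule(mimeType, matcher, adapter));
        return this;
    }

    // Returns the adapter of the first matching rule, or empty if the file
    // should not be indexed (no rule matched, or the rule has no adapter).
    Optional<IndexAdapter> find(String mimeType, String filename) {
        for (Rule rule : rules) {
            if (rule.mimeType.equals(mimeType) && rule.namePattern.matches(Paths.get(filename))) {
                return Optional.ofNullable(rule.adapter);
            }
        }
        return Optional.empty();
    }
}
```

With rules registered for `images_*.zip` (no adapter) and `documents_*.zip` (some extracting adapter), `find("application/zip", "images_001.zip")` comes back empty while `find("application/zip", "documents_001.zip")` yields the adapter. Rule order matters in this sketch, so more specific patterns would need to be registered first.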

Using Lucene, I would create that index in order to retrieve the id of the file/data and then get that file/data. But I didn't evaluate which technology I would use for that, and actually I haven't used
Lucene. As far as I know, Solr is a full-text search server, and I would not recommend that because in my opinion a server is not a must-have.
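
To make the "look up the id in an index, then fetch the file by id" idea concrete, here is a minimal in-memory sketch. It is a plain-Java stand-in, not Lucene and not anything shipped with Spring Data MongoDB; `ContentIndex` and the string ids are hypothetical, and the tokenization (lowercase, split on non-word characters) is an assumption made for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical inverted index: maps each token to the ids of files containing it.
final class ContentIndex {

    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexes the extracted text of one stored file under its id.
    void add(String fileId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, key -> new TreeSet<>()).add(fileId);
            }
        }
    }

    // Returns the ids of all files whose text contained the given term.
    Set<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(ids);
    }
}
```

The ids returned by `search` would then be used to load the actual files from GridFS. A real setup would more likely delegate tokenization and storage to a search library rather than keep the index in memory.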

I hope this was understandable.

@spring-projects-issues

Matthias Scudlik commented

I just had another idea that would be nice to have, but is probably hard: MIME type sniffing for the InputStreams that you store.
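
As a rough idea of what such sniffing could look like with the JDK alone, the sketch below uses `java.net.URLConnection.guessContentTypeFromStream`, which peeks at the first bytes of a stream that supports mark/reset. `MimeSniffer` and the `application/octet-stream` fallback are assumptions for illustration, and the JDK method only recognizes a small set of signatures, so this is a starting point rather than a complete solution.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical helper: guesses a MIME type from the leading bytes of a stream.
final class MimeSniffer {

    // Assumed fallback when the content is not recognized.
    static final String FALLBACK = "application/octet-stream";

    // The stream must support mark/reset (for example a BufferedInputStream),
    // so that the peeked bytes are still available when the content is stored.
    static String sniff(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            String guessed = URLConnection.guessContentTypeFromStream(in);
            return guessed != null ? guessed : FALLBACK;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would wrap an arbitrary `InputStream` in a `BufferedInputStream`, call `MimeSniffer.sniff(...)`, and then pass the same wrapped stream on for storage, since the stream is reset to its start after the peek.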

@spring-projects-issues

Oliver Drotbohm commented

Current draft is at this GitHub branch. Feedback is welcome!

@spring-projects-issues

Oliver Drotbohm commented

Just merged the initial draft into the master branch and deployed a snapshot build. There is no integration with any indexing support yet. We might want to create a separate ticket for that if there's demand.
