Kill the jcr type system #588
Instead of adding type information to the actual String value, what are your thoughts on storing that information as JCR.node properties?
Let me get this right: fcrepo4 needs to provide for a repository administrator who has no reliable search index (and no way to generate one), and needs to do a keyword search against everything (with very uncertain relevancy and recall) to find a single object?
Just for fun, I took the example above and ran it against our (somewhat tuned) search index and came up with 250 results (out of 500k items). I would not want to search through that set to find my item, and this is a best-case scenario for the repository (presumably, an fcr:search feature couldn't operate at object-level granularity, and would have to return nodes anywhere in the tree).
So far, I think I've heard two proponents of a baked-in repository search (AIC and UVa), and I don't think a keyword search satisfies either of them.
If the user is in the Hydra community (and, probably, Islandora as well?), those frameworks provide sufficiently reliable indexing strategies, simple ways to reindex the world, and discovery that actually has the potential to return relevant results at the correct granularity. If we're talking about supporting users without similar frameworks in place, I'd urge us to either enhance the built-in external indexing functionality or postpone the feature until after 4.0.
I've also heard a third use case, for a new repository user (working against fcrepo4 directly, for some reason) who puts some items in and wants to find them again. I'd argue, once that user gets beyond the realm of trolling the haystack, they're best served by being introduced to an external search index.
And maybe this reveals a pragmatic reason I'm antagonistic toward this feature: it seems to have an ill-defined user base, an ill-defined response ("whatever our storage layer feels like"?), and an implementation we want to actively discourage people from using. Maybe it belongs in an extension, maybe somewhere in core (post-launch), but I'd hate to develop a feature for 4.0 that we have to drag forward in the future.
I've been thinking about this, and I can't come up with a reasonable use case for internal search. I agree with @cbeer's assessment that a typical user query would be much better served by an external search service (be that Solr/Elastic or a triplestore) with a well-defined query syntax, API, etc. Most of the other discovery scenarios would be better handled by OAI-PMH, ResourceSync, or just walking through the entire repository.
So I'm left with a vague unease that boils down to feeling like internal search is something a repository should have, or feeling like ripping out features at the 11th hour will surprise and bother people. I heard a comment to that effect recently when the sitemap was removed (because the Hydra global-reindexing was using it). So on balance, I don't think any of this is a good reason to keep internal search. But I think going forward, a living, breathing customer who is willing to do acceptance testing should be a requirement for new features.