Kill the jcr type system #588

Closed
wants to merge 2 commits into from

4 participants
@cbeer
Member

commented Oct 24, 2014

No description provided.

@cbeer cbeer force-pushed the cbeer-rdf-types branch from 432159f to 666e7c3 Oct 24, 2014

@cbeer cbeer force-pushed the kill-the-jcr-type-system branch from 4d84078 to 492a390 Oct 24, 2014

@cbeer cbeer force-pushed the cbeer-rdf-types branch from 666e7c3 to adc2677 Oct 24, 2014

@cbeer cbeer force-pushed the kill-the-jcr-type-system branch from 492a390 to 69abd97 Oct 24, 2014

@awoods

Instead of adding type information to the actual String value, what are your thoughts on storing that information as JCR.node properties?

Contributor

replied Oct 26, 2014

The pattern to which we get pushed is for every literal to be stored as a node with properties {type, lang, lexical value}. This would be kind of a pain, but also pretty flexible and uniform.
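The node-per-literal pattern described above can be sketched roughly as follows. This is an illustrative sketch only, not code from this PR: `LiteralNode` and `to_ntriples_literal` are hypothetical names, a plain Python dataclass stands in for a JCR node, and only backslashes and double quotes are escaped.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class LiteralNode:
    """Hypothetical stand-in for a node that holds one RDF literal."""
    lexical_value: str
    datatype: str = XSD_STRING
    lang: Optional[str] = None


def to_ntriples_literal(node: LiteralNode) -> str:
    """Rebuild an N-Triples-style literal from the three stored properties."""
    # Simplified escaping: backslash and double quote only.
    escaped = node.lexical_value.replace("\\", "\\\\").replace('"', '\\"')
    if node.lang:
        return f'"{escaped}"@{node.lang}'
    if node.datatype == XSD_STRING:
        return f'"{escaped}"'
    return f'"{escaped}"^^<{node.datatype}>'
```

Because the lexical value is stored on its own, nothing has to be parsed back out of the string to recover the type or language tag; the cost is one extra node per literal.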

Member

replied Oct 26, 2014

Exactly. Also, not appending the actual value with type information would allow fcr:sparql to still function, no?

Contributor

replied Oct 26, 2014

No, it raises similar issues. Either queries that involve literal types wouldn't work or we would have to work hard to make them work. For example. It still might be the best option.

Member

replied Oct 26, 2014

Thanks for the clarification, @ajs6f

Member Author

replied Oct 26, 2014

You can see what is in the repository by just looking at what is in the repository, no search feature needed. With the hierarchy in Fedora 4, every resource is somehow linked to from the root object.
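As a rough illustration of finding everything by following links from the root, here is a minimal depth-first walk. It is a sketch under assumptions: the `children` mapping is a hypothetical in-memory stand-in for the repository hierarchy, and `walk` is not a Fedora API.

```python
from typing import Dict, Iterator, List


def walk(children: Dict[str, List[str]], root: str = "/") -> Iterator[str]:
    """Yield every resource path reachable from root, depth-first."""
    stack = [root]
    seen = set()
    while stack:
        path = stack.pop()
        if path in seen:
            continue  # guard against a resource being linked more than once
        seen.add(path)
        yield path
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(children.get(path, [])))
```

A walk like this visits every resource once, which is why it answers "what is in the repository" but does not scale as a way to locate one particular item.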

Member

replied Oct 26, 2014

Indeed, however, that is a bit different than "search". That is more like "troll the haystack".
I am trying to pare this down as much as I absolutely can. We need a minimal search capability.

Member Author

replied Oct 27, 2014

Let me get this right: fcrepo4 needs to provide for a repository administrator who has no reliable search index (and no way to generate one), and needs to do a keyword search against everything (with very uncertain relevancy and recall) to find a single object?

Just for fun, I took the example above and ran it against our (somewhat tuned) search index and came up with 250 results (out of 500k items). I would not want to search through that set to find my item, and this is a best-case scenario for the repository (presumably, an fcr:search feature couldn't operate at object-level granularity, and would have to return nodes anywhere in the tree).

So far, I think I've heard two proponents of a baked-in repository search (AIC and UVa), and I don't think a keyword search satisfies either of them.

If the user is in the Hydra community (and, probably, Islandora as well?), those frameworks provide sufficiently reliable indexing strategies, simple ways to reindex the world, and discovery that actually has the potential to return relevant results at the correct granularity. If we're talking about supporting users without similar frameworks in place, I'd urge us to either enhance the built-in external indexing functionality or postpone the feature until after 4.0.

I've also heard a third use case, for a new repository user (working against fcrepo4 directly, for some reason) who puts some items in and wants to find them again. I'd argue, once that user gets beyond the realm of trolling the haystack, they're best served by being introduced to an external search index.

And, maybe this reveals a pragmatic reason I'm antagonistic toward this feature: it seems to have an ill-defined user base, an ill-defined response ("whatever our storage layer feels like"?), and an implementation we want to actively discourage people from using. Maybe it belongs in an extension, maybe somewhere in core (post-launch), but I'd hate to develop a feature for 4.0 that we have to drag forward in the future.

Contributor

replied Oct 27, 2014

Let me just add a note of detail: I was unaware until recently that UVa had put in that request and I haven't had a chance since then to consult with @mikedurbin. We may want to reopen the discussion.

Contributor

replied Oct 27, 2014

I've been thinking about this, and I can't come up with a reasonable use case for internal search. I agree with @cbeer's assessment that a typical user query would be much better served by an external search service (be that Solr/Elastic or a triplestore) with a well-defined query syntax, API, etc. Most of the other discovery scenarios would be better handled by OAI-PMH, ResourceSync, or just walking through the entire repository.

So I'm left with a vague unease that boils down to feeling like internal search is something a repository should have, or feeling like ripping out features at the 11th hour will surprise and bother people. I heard a comment to that effect recently when the sitemap was removed (because the Hydra global-reindexing was using it). So on balance, I don't think any of this is a good reason to keep internal search. But I think going forward, a living, breathing customer who is willing to do acceptance testing should be a requirement for new features.

@cbeer cbeer force-pushed the cbeer-rdf-types branch 3 times, most recently from ad4480a to 839b3cd Oct 27, 2014

@cbeer

Member Author

commented Oct 27, 2014

Needs to be redone on top of #587

@cbeer cbeer closed this Oct 27, 2014

@osmandin osmandin deleted the kill-the-jcr-type-system branch Aug 28, 2015
