
Eliminate duplicates in expansion #129

Closed
lanthaler opened this issue May 24, 2012 · 8 comments


@lanthaler
Member

commented May 24, 2012

In a recent update to the test suite Dave changed the behavior of expansion to remove duplicates in sets. Is this what we want to do?

So, e.g., "prop": [ 1, 2, 2, 2, 2, 3 ] would now be expanded to "prop": [ 1, 2, 3 ] (as @value objects, of course). Or is this something we should do as part of framing or subject map generation instead?
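For illustration, here is a sketch of what that deduplicated array looks like in expanded form (keeping "prop" as written above rather than a fully expanded IRI):

```json
{
  "prop": [
    { "@value": 1 },
    { "@value": 2 },
    { "@value": 3 }
  ]
}
```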

My concern is that we introduce a lot of overhead in expansion with very little advantage. An application will have to eliminate duplicates again, as sets aren't merged at that phase of the processing pipeline yet. A subject could be represented several times in the expanded output - each of which could hold a subset of "the set".

In contrast, in the subject map generation algorithm we collect all data that belongs to a subject and so it makes sense to eliminate duplicates there.
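To illustrate the point, here is a minimal sketch (hypothetical helper name, not the spec's subject map algorithm) of how duplicate elimination falls out naturally when collecting all of a subject's data in one place - a value is only added if an equal value isn't already present:

```python
import json


def add_value(subject, prop, value):
    """Add value to subject[prop] unless an equal value is already there.

    Equality is checked via canonical JSON serialization, so duplicate
    @value objects such as {"@value": 2} are stored only once.
    """
    values = subject.setdefault(prop, [])
    key = json.dumps(value, sort_keys=True)
    if all(json.dumps(v, sort_keys=True) != key for v in values):
        values.append(value)


# Collecting everything for a single subject; duplicates collapse:
subject = {}
for v in [{"@value": 1}, {"@value": 2}, {"@value": 2}, {"@value": 3}]:
    add_value(subject, "prop", v)
# subject["prop"] is now [{"@value": 1}, {"@value": 2}, {"@value": 3}]
```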

@lanthaler

Member Author

commented May 24, 2012

The more I think about this, the less it makes sense to remove duplicates. It's just too complex. For example, consider this (property being coerced to @id):

"property": [
   "http://example.com/me",
   { "@id": "http://example.com/me", "name": "Markus" },
   { "@id": "http://example.com/me", "lastname": "Lanthaler" },
   { "@id": "http://example.com/me", "name": "Markus" }
]

What should the result be? Should the three objects be merged? That would be the only thing that would make sense, but it would make expansion extremely complex; the properties of such objects could contain quite a deep tree.

@dlongley

Member

commented Jun 12, 2012

I think I've come around to possibly accepting the idea that we ignore duplicates during expansion.

I think having some sort of merging feature in the JSON-LD API is important, but we do have it -- it's in framing. Furthermore, if data for the same subject appears in different properties it wouldn't be merged during expansion. There are some unfortunate side-effects of deciding not to do any set merging, however, including:

  1. Possibly having to use different helper functions/flags to add objects to subjects in applications vs. in the internal algorithms (as you would probably want your standard helper functions to do the merging).
  2. Either complicating toRDF() or ignoring that it can emit duplicate triples (which isn't necessarily an easy thing to avoid in the iterative/streaming interface).
  3. Adding an additional pass to the normalization algorithm to remove duplicate triples after/during sorting.
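Point 3 is cheap in practice: once triples are sorted, duplicates are adjacent and can be dropped in a single pass. An illustrative sketch (hypothetical function name, not the normalization spec):

```python
def dedupe_sorted_triples(triples):
    """Drop duplicate triples from an already-sorted list in one pass."""
    out = []
    for t in triples:
        # Because the input is sorted, any duplicate of t is its neighbor.
        if not out or t != out[-1]:
            out.append(t)
    return out


triples = sorted([
    ("ex:me", "ex:name", '"Markus"'),
    ("ex:me", "ex:name", '"Markus"'),
    ("ex:me", "ex:lastname", '"Lanthaler"'),
])
result = dedupe_sorted_triples(triples)
# → [("ex:me", "ex:lastname", '"Lanthaler"'), ("ex:me", "ex:name", '"Markus"')]
```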
@gkellogg

Member

commented Jun 12, 2012

For toRDF(), removing duplicates should not be a problem, as this is the responsibility of the Triple Store (at least in my implementations). Turtle has the same issue, and can also be used to generate duplicate triples.

However, I can see that doing this in flattening would be useful.

@lanthaler

Member Author

commented Jun 13, 2012

We would still eliminate duplicates when generating the subject map - which is also used for flattening. Since the normalization algorithm is not specific to JSON-LD anymore, it has to deal with duplicates anyway. Turtle, RDF/XML, RDFa... none of them prevent duplicate triples AFAIK.

Perhaps toRDF() could be simplified by taking the subject map as input instead of the expanded document!? Gregg's probably the best one to tell.

@gkellogg

Member

commented Jun 13, 2012

Given the graph name context that's required for doing the RDF conversion, I'm not sure how this would work. However, generating RDF from a single subject is pretty much the same as generating it from a set of subjects.

However, I don't want to introduce new API dependencies at this time; right now we just depend on expansion. Framing and flattening, which is what I think you're suggesting, would introduce dependencies on a spec which we specifically separated out because we didn't think it was stable enough.

@lanthaler

Member Author

commented Jun 19, 2012

RESOLVED: remove text relating to removing duplicates when expanding JSON-LD documents

@gkellogg

Member

commented Jun 19, 2012

RESOLVED: remove text relating to removing duplicates when expanding JSON-LD documents

lanthaler added a commit that referenced this issue Jun 19, 2012
@lanthaler

Member Author

commented Jun 19, 2012

Closing the issue. The spec doesn't say anything about removing duplicates during expansion (it says "merge", which should be fine).
