RDF to JSON should use @annotation, where possible #11

Closed
gkellogg opened this issue Jan 30, 2021 · 5 comments · Fixed by #13

Comments

@gkellogg
Member

The existing algorithm only uses embedded nodes.

@pchampin
Collaborator

pchampin commented Feb 3, 2021

I see two things that can make this tricky:

  • the annotating triple may be encountered before the asserted annotated triple, so this would need to be done in a second phase
  • annotations themselves can be nested.

Here's an idea:

  • first generate the node maps normally (using embedded nodes)
  • for each node map (corresponding to a named graph or the default graph):
    • collect all subjects that are embedded nodes into annotation_candidates, sorted by decreasing "depth" of nesting
    • for each embedded node en in annotation_candidates:
      • search the node map for the corresponding asserted triple
      • if found, move the en entry out of the node map and attach it to that triple's object as an @annotation (without its @id)

Example: we start with:

{
    "<< << ex:a ex:b ex:c >> ex:d ex:e >>": {
        "@id": { "@id": { "@id": "ex:a", "ex:b": {"@id": "ex:c"} }, "ex:d": {"@id": "ex:e"} },
        "ex:f": [ { "@id": "ex:g" } ]
    },
    "<< ex:a ex:b ex:c >>": {
        "@id": { "@id": "ex:a", "ex:b": {"@id": "ex:c"} },
        "ex:d": [ {"@id": "ex:e"} ]
    },
    "ex:a": {
        "ex:b": {"@id": "ex:c"}
    }
}

We find that the first entry has a match in the second entry, so we move it there as an annotation:

{
    "<< ex:a ex:b ex:c >>": {
        "@id": { "@id": "ex:a", "ex:b": {"@id": "ex:c"} },
        "ex:d": [ {"@id": "ex:e",
            "@annotation": { "ex:f": [ { "@id": "ex:g" } ] }
        } ]
    },
    "ex:a": {
        "ex:b": {"@id": "ex:c"}
    }
}

and then we find that the (originally) second entry has a match in the (originally) third entry, so we move it there as an annotation:

{
    "ex:a": {
        "ex:b": {"@id": "ex:c",
            "@annotation": {
                "ex:d": [ {"@id": "ex:e",
                    "@annotation": { "ex:f": [ { "@id": "ex:g" } ] }
                } ]
            }
        }
    }
}
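
The two moves above can be sketched in Python. This is only a minimal sketch under some assumptions: node objects are plain dicts shaped like the ones in the example, objects of embedded triples are always node references (no literals), and a value that already carries an @annotation is left alone rather than merged. The helper names (split_triple, key_of, depth, fold_annotations) are made up for illustration and do not come from any spec or implementation.

```python
import json


def split_triple(triple):
    """Return (subject, predicate, object) of an embedded-triple dict."""
    subject = triple["@id"]
    predicate, obj = next((p, o) for p, o in triple.items() if p != "@id")
    return subject, predicate, obj


def key_of(term):
    """Node-map key for an IRI string or an embedded-triple dict."""
    if isinstance(term, str):
        return term
    subject, predicate, obj = split_triple(term)
    return f"<< {key_of(subject)} {predicate} {key_of(obj['@id'])} >>"


def depth(term):
    """Nesting depth: 0 for a plain IRI, 1 for a simple embedded triple, etc."""
    if not isinstance(term, dict):
        return 0
    subject, _, obj = split_triple(term)
    return 1 + max(depth(subject), depth(obj.get("@id")))


def fold_annotations(node_map):
    """Second phase: move embedded-node entries into @annotation where possible."""
    candidates = [k for k, node in node_map.items()
                  if isinstance(node.get("@id"), dict)]
    # Deepest first, so an inner annotation is attached before its host moves.
    candidates.sort(key=lambda k: depth(node_map[k]["@id"]), reverse=True)
    for key in candidates:
        node = node_map[key]
        subject, predicate, obj = split_triple(node["@id"])
        asserted = node_map.get(key_of(subject), {}).get(predicate)
        if asserted is None:
            continue  # the triple is only embedded, never asserted
        values = asserted if isinstance(asserted, list) else [asserted]
        for value in values:
            # A value that already has an @annotation no longer equals obj
            # and is skipped (merging annotations is out of scope here).
            if value == obj:
                value["@annotation"] = {p: o for p, o in node.items() if p != "@id"}
                del node_map[key]
                break
    return node_map


node_map = {
    "<< << ex:a ex:b ex:c >> ex:d ex:e >>": {
        "@id": {"@id": {"@id": "ex:a", "ex:b": {"@id": "ex:c"}}, "ex:d": {"@id": "ex:e"}},
        "ex:f": [{"@id": "ex:g"}],
    },
    "<< ex:a ex:b ex:c >>": {
        "@id": {"@id": "ex:a", "ex:b": {"@id": "ex:c"}},
        "ex:d": [{"@id": "ex:e"}],
    },
    "ex:a": {"ex:b": {"@id": "ex:c"}},
}
print(json.dumps(fold_annotations(node_map), indent=4))
```

Run on the starting node map, this should leave only the "ex:a" entry, with the two annotations nested as in the last JSON document above. An entry whose triple is never asserted stays in the node map as an embedded node.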

@gkellogg
Member Author

gkellogg commented Feb 3, 2021

That's essentially the same method I had been thinking about, although I think we can simply sort by decreasing length to get the same effect. Of course, there may be some tricky corner cases where annotations are used at the beginning and end of a chain, but not in the middle.

Thanks for thinking this through!

@pchampin
Collaborator

pchampin commented Feb 4, 2021

If by "length" you mean the string length of the key, I find this too brittle. A simple triple with a very long term may be longer than a deeply nested triple with short terms.
If by "length" you mean the number of atomic terms, then yes, it might be a good proxy for depth, but is that much easier to check?

Idea: could we encode the depth of a triple in their key in the node map? That way, the sorting would be much easier ;-)
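
To make the three candidate measures concrete, here is a small sketch that computes them directly from a key string. It assumes keys are written as in the example above, with "<<", ">>" and terms separated by single spaces and no spaces inside terms; the function names and the long IRIs are invented for illustration.

```python
def nesting_depth(key):
    """Maximum nesting level of '<<' ... '>>' in a node-map key."""
    level = deepest = 0
    for token in key.split():
        if token == "<<":
            level += 1
            deepest = max(deepest, level)
        elif token == ">>":
            level -= 1
    return deepest


def term_count(key):
    """Number of atomic terms in a key (everything except the delimiters)."""
    return sum(1 for token in key.split() if token not in ("<<", ">>"))


# A simple triple with long terms vs. a nested triple with short terms.
simple = ("<< ex:aVeryLongSubjectIdentifier ex:aVeryLongPredicateIdentifier "
          "ex:aVeryLongObjectIdentifier >>")
nested = "<< << ex:a ex:b ex:c >> ex:d ex:e >>"

for key in (simple, nested):
    print(len(key), term_count(key), nesting_depth(key), key)
```

Here the simple triple has the longer string but the smaller depth and term count, which is the mismatch described above.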

@gkellogg
Member Author

gkellogg commented Feb 4, 2021

My reasoning is that the goal is to look for more deeply embedded triples before more shallowly, and that terms that might be chosen before deeper embedded terms would not interfere. It's possible that subject- vs. object-embedding could fool this, though. Did you have some examples of where this would go wrong?

> Idea: could we encode the depth of a triple in their key in the node map? That way, the sorting would be much easier ;-)

Yes, we could come up with an algorithm for creating the key, which would get back a measure of the depth of embedding in addition to the canonicalized form, and that would probably work okay.
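
Such a key-creation step might look like the following sketch, which returns the canonical form and the depth in one recursive pass. It assumes embedded triples are dicts of the shape used in the example earlier in this thread and that their objects are node references; make_key is a hypothetical name, not an existing API.

```python
def make_key(term):
    """Return (canonical_key, depth) for an IRI string or an embedded-triple dict."""
    if isinstance(term, str):
        return term, 0
    subject = term["@id"]
    predicate, obj = next((p, o) for p, o in term.items() if p != "@id")
    s_key, s_depth = make_key(subject)
    o_key, o_depth = make_key(obj["@id"])
    # One level deeper than the deepest of subject and object.
    return f"<< {s_key} {predicate} {o_key} >>", 1 + max(s_depth, o_depth)


inner = {"@id": "ex:a", "ex:b": {"@id": "ex:c"}}
subject_embedded = {"@id": inner, "ex:d": {"@id": "ex:e"}}
object_embedded = {"@id": "ex:x", "ex:y": {"@id": inner}}

# Sorting embedded nodes deepest-first then only needs the second element.
keyed = [make_key(t) for t in (inner, subject_embedded, object_embedded)]
keyed.sort(key=lambda pair: pair[1], reverse=True)
for key, level in keyed:
    print(level, key)
```

Subject- and object-embedding are treated the same way here, so both nested triples come out at depth 2.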

@pchampin
Collaborator

pchampin commented Feb 5, 2021

> My reasoning is that the goal is to look for more deeply embedded triples before more shallowly,

yes, we agree on that

> and that terms that might be chosen before deeper embedded terms would not interfere

Oh, I see it now! Of course, length is not a good proxy for depth in general, but that is not a problem. We are actually only interested in the partial order "contains", for which length is a good proxy!

So yes, key length is much simpler and does the job perfectly. I stand corrected 👏
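
The "contains" argument can be checked with a short sketch: a key that contains another key as a substring is necessarily longer, so sorting by decreasing string length always puts a containing key before the keys it contains, even when an unrelated long key ends up first. Substring containment stands in for "embeds this triple" here, which assumes canonical keys as in the example above; the function names and the long IRIs are invented for illustration.

```python
def order_by_key_length(keys):
    """Sort node-map keys by decreasing string length."""
    return sorted(keys, key=len, reverse=True)


def respects_containment(ordered):
    """True if no key is listed after a different key that it contains."""
    for i, outer in enumerate(ordered):
        for earlier in ordered[:i]:
            if earlier != outer and earlier in outer:
                return False  # a contained key came before its container
    return True


keys = [
    "ex:a",
    "<< ex:a ex:b ex:c >>",
    "<< << ex:a ex:b ex:c >> ex:d ex:e >>",
    # Unrelated simple triple with long terms: sorted first, but harmless.
    "<< ex:someVeryLongSubject ex:someVeryLongPredicate ex:someVeryLongObject >>",
]

ordered = order_by_key_length(keys)
for key in ordered:
    print(len(key), key)
print(respects_containment(ordered))
```

The long simple triple is processed first even though it is shallower than the nested one, but since neither contains the other, their relative order does not matter.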
