Embed behavior makes .frame's results hard to work with #119

Closed
jmandel opened this issue May 10, 2012 · 10 comments

@jmandel
Contributor

jmandel commented May 10, 2012

Executive summary

The framing algorithm's approach to "multiple embeds" makes it hard for developers to work with framed results.

Background

Developers want to frame JSON-LD payloads in ways that make them simple to work with. For example:

  • discover subjects of interest
  • loop over these subjects
  • resolve nested data with consistent paths

But the current framing algorithm's machinery for avoiding circularity and verbose output introduces complexity for developers. This is best understood with an example.

Example

I'll illustrate with MedicationLists that have Medications that have DrugCodes with titles and identifiers:
Framing Problem: example in Playground

How developers want framing to work:

jsonld.frame(raw_data, medframe, function(err, response){
    response['@graph'].forEach(function(medlist){
        medlist.hasMedications.forEach(function(med){
            console.log("Drug: " + med.drugCode.title + "::" + med.drugCode.identifier);
        });
    });
});

... but in the example above, when we hit ['@graph'][0].hasMedications[2].drugCode we find a reference, not an embed! It takes severely defensive programming to work around this.

How developers need to work around the current framing behavior:

Since framed results don't reliably re-embed resources, developers need to check at each step whether an object is a reference or an embed. This means first creating a hash of known embeds, and then looking up values in this hash at every step through the framed result.

jsonld.frame(raw_data, medframe, function(err, response) {

    // collect one embed per subject so references can be resolved
    var subjects = {};
    findSubjects(subjects, response['@graph']);

    response['@graph'].forEach(function(medlist){
        medlist.hasMedications.forEach(function(med){

            // need to ensure drugCode is an embed, not a reference
            var drugCode = subjects[med.drugCode['@id']];

            console.log("Drug code: " + drugCode.title + "::" + drugCode.identifier);
        });
    });
});

// pseudocode for finding subject embeds in framed results
// (_isArray, _isEmbed and _isObject are intentionally left undefined)
function findSubjects(subjects, subtree) {
    if (_isArray(subtree)) {
        subtree.forEach(function(elt){
            findSubjects(subjects, elt);
        });

        return;
    }

    if (_isEmbed(subtree)) {
        subjects[subtree['@id']] = subtree;
    }

    if (_isObject(subtree)) {
        for (var k in subtree) {
            findSubjects(subjects, subtree[k]);
        }
    }
}

And the workaround isn't complete

This workaround presents limitations. For instance:

  • How to deal with subjects that are supposed to be framed in different ways?
  • How to properly implement _isEmbed?

Proposal: aggressive re-embedding

I'd recommend re-embedding resources aggressively -- right up to (but not crossing) the point of creating circular references. There are some risks here, including an explosion in the framing output size for graphs rich in bidirectional links. Does anyone have ideas for mitigating this explosion?

(One alternative approach is to allow a mode of operation that doesn't produce a serializable framing output, but instead produces an in-memory structure with potential circularity. For many applications, this in-memory, potentially circular structure is a very natural fit for developers' goals. This could be separate from framing, if there were a simple, consistent way to take a serialized framed result and convert to an appropriate in-memory structure.)
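To make the in-memory alternative concrete, here is a minimal sketch of converting a framed result into a linked structure after the fact. It is not part of jsonld.js: `linkFramed` and `isReference` are hypothetical names, and the sketch assumes that a reference is an object whose only key is '@id' and that the first full embed seen for a subject is the one to share.

```javascript
// Hypothetical helpers, not part of any JSON-LD library.

// Assumption: a reference is an object whose only key is '@id'.
function isReference(node) {
  return node !== null && typeof node === 'object' && !Array.isArray(node) &&
    Object.keys(node).length === 1 && '@id' in node;
}

// Mutates `graph` in place so that every reference points at a shared embed.
// The result may contain cycles and is then no longer JSON-serializable.
function linkFramed(graph) {
  const subjects = new Map();

  // Pass 1: remember the first full embed seen for each '@id'.
  (function collect(node) {
    if (Array.isArray(node)) {
      node.forEach(collect);
    } else if (node !== null && typeof node === 'object') {
      if ('@id' in node && !isReference(node) && !subjects.has(node['@id'])) {
        subjects.set(node['@id'], node);
      }
      Object.keys(node).forEach(function (k) { collect(node[k]); });
    }
  })(graph);

  // Pass 2: swap each reference for the shared embed, if one is known.
  const seen = new Set(); // guards against looping once cycles exist
  (function link(node) {
    if (node === null || typeof node !== 'object' || seen.has(node)) return;
    seen.add(node);
    const keys = Array.isArray(node) ? Array.from(node.keys()) : Object.keys(node);
    keys.forEach(function (k) {
      const child = node[k];
      if (isReference(child) && subjects.has(child['@id'])) {
        node[k] = subjects.get(child['@id']);
      }
      link(node[k]);
    });
  })(graph);

  return graph;
}
```

After `linkFramed(response['@graph'])`, the loop from the "how developers want framing to work" example can read `med.drugCode.title` directly, at the cost of a structure that may no longer be serializable.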

@gkellogg
Member

Agreed, I was tripped up by this. I think when using @embed, it should always embed. My example was a bit simpler: http://tinyurl.com/7jzaqj3

Basically, given an object with two properties (doap:developer and dc:creator) I want them both to expand, not just one of them.

Input:

 {
  "@context": {
    "doap:developer": {
      "@type": "@id",
      "@container": "@set"
    },
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc:creator": {
      "@type": "@id",
      "@container": "@set"
    },
    "doap": "http://usefulinc.com/ns/doap#",
    "dc": "http://purl.org/dc/terms/",
    "@language": "en"
  },
  "@graph": [
  {
    "@id": "http://rubygems.org/gems/json-ld",
    "@type": "doap:Project",
    "dc:creator": ["http://greggkellogg.net/foaf#me"],
    "doap:developer": [
    {
      "@id": "http://greggkellogg.net/foaf#me",
      "@type": "foaf:Person",
      "foaf:homepage": "http://greggkellogg.net/",
      "foaf:name": "Gregg Kellogg"
    }],
    "doap:name": "JSON::LD"
  }]
}

Frame:

{
  "@context": {
    "@language": "en",
    "doap": "http://usefulinc.com/ns/doap#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "dc:creator": {"@type": "@id","@container": "@set"},
    "doap:homepage": {"@type": "@id"},
    "doap:implements": {"@type": "@id","@container": "@set"},
    "doap:developer": {"@type": "@id","@container": "@set"},
    "doap:helper": {"@type": "@id","@container": "@set"},
    "doap:created": {"@type": "xsd:date"},
    "foaf:homepage": {"@type": "@id"}
  },
  "@explicit": true,
  "@type": "doap:Project",
  "dc:creator": {
    "@explicit": true,
    "@embed": true,
    "@type": "foaf:Person",
    "foaf:name": {},
    "foaf:homepage": {}
  },
  "doap:developer": {
    "@explicit": true,
    "@embed": true,
    "@type": "foaf:Person",
    "foaf:name": {},
    "foaf:homepage": {}
  },
  "doap:name": {}
}

@lanthaler
Member

I think the easiest way to fix this would be to keep a list of subjects that have already been embedded (from the root to the current property). If a subject hasn't been embedded yet, embed it by default. If @embed is set, embed it even if it was already embedded on the path from the root to the current property, which requires breaking circular references. Maybe just holding references in the last embed would already solve this.

Thoughts?

@dlongley
Member

We already keep a list of what has been embedded.

When Josh first brought this up in #json-ld, I told him that I had been thinking of changing the framing behavior to do this anyway, as it would help solve a couple of issues: the unusual behavior of removing existing embeds, and that there is a bug in that algorithm involving traversing the path to the root through arrays.

In any case, I'd support re-embedding information and avoiding cycles. I think re-embedding whenever possible will be preferred behavior, and we might want to have a "strict" flag to throw an exception when a cycle would occur and a re-embed was avoided.

@gkellogg
Member

+1

Cycles that would lead to recursive embedded declarations should probably just turn into subject references.
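The behavior discussed so far (re-embed whenever possible, fall back to a subject reference when a cycle would occur, optionally throw in a "strict" mode) can be sketched roughly as follows. This is an illustration, not the jsonld.js framing algorithm: `embed`, `embedValue` and the `strict` option are hypothetical names, and `subjects` is assumed to be a flat map from '@id' to nodes whose links are `{'@id': ...}` references.

```javascript
// Hypothetical sketch of path-based re-embedding; not the real framing algorithm.
// `subjects` maps '@id' -> flat node whose links are {'@id': ...} references.
function embed(subjects, id, options, path) {
  options = options || {};
  path = path || [];

  if (path.indexOf(id) !== -1) {
    // The subject is already on the path from the root: embedding it again
    // would recurse forever, so emit a subject reference (or throw if strict).
    if (options.strict) {
      throw new Error('Cycle detected at ' + id);
    }
    return { '@id': id };
  }

  if (!Object.prototype.hasOwnProperty.call(subjects, id)) {
    return { '@id': id }; // unknown subject: keep the reference
  }

  const node = subjects[id];
  const nextPath = path.concat(id);
  const out = {};
  Object.keys(node).forEach(function (key) {
    out[key] = embedValue(subjects, node[key], options, nextPath);
  });
  return out;
}

function embedValue(subjects, value, options, path) {
  if (Array.isArray(value)) {
    return value.map(function (v) {
      return embedValue(subjects, v, options, path);
    });
  }
  if (value !== null && typeof value === 'object' && '@id' in value) {
    return embed(subjects, value['@id'], options, path);
  }
  return value; // literals, including the node's own '@id' string
}
```

Because only the current root-to-node path is checked, a subject reached along two different branches is embedded in full both times; only a true cycle degrades to a reference.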

@dlongley
Member

We might be able to alter the current algorithm to just keep track of the root of the path currently being processed, and of the root for each embed (instead of its immediate parent), and compare those when making embed decisions, avoiding cycles by using subject references or throwing exceptions in strict mode. We'll probably need to make a couple of other changes, but hopefully nothing too drastic.

I'm not sure how we want to handle conflicts between auto-embeds and frame-specific (@embed: true) embeds. The older frame algorithm used to replace the auto-embeds with subject references -- which we could now do only if a cycle would be created. However, if we replace those auto-embeds then we'll have to keep the existing embed replacement code (and dangling embed clean up code) which I'd prefer not to. If we can work around that and still produce something that matches what we think people will expect from framing that would be best.

@lanthaler
Member

I think this issue (conflicts between auto-embeds and frame-specific (@embed: true) embeds) would be solved by not automatically including the whole "subtree" as proposed in #110. As a frame will never have infinite depth, I think it would be OK to embed something several times in one path if the frame author wants that.

+1 to get rid of the existing embed replacement code

@dlongley
Member

Actually, we could just view the default behavior as @embed: true (which is really what happens now anyway) and then there's no conflict either. If you add @embed: true to your frame when that's the default option, it's just like repeating yourself, so there's no issue. Either way, whether we decide that @embed: false or @embed: true is the default, we can remove the embed replacement code and just check for cycles.

@lanthaler
Member

I just uploaded the latest version of my processor, which supports aggressive re-embedding. To automatically include the whole sub-tree (which is not the default behavior), add "@embedChildren": true to the frame.

I uploaded a modified version of the playground so that you can try it without needing to download/install anything.

lanthaler added a commit that referenced this issue Aug 29, 2012
@lanthaler
Member

RESOLVED: Do not support .frame() in JSON-LD 1.0 API.

@gkellogg
Copy link
Member

This is handled in 5a3e506 to allow more values for @embed:

  • @always,
  • @last,
  • @link, -- Not really testable, as it's in-memory only
  • @never

In addition to true and false, which map to @always and @never.
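As a small illustration of that mapping, a processor could normalize the boolean forms before acting on @embed. The helper below is a hypothetical sketch, not code from the referenced commit; `normalizeEmbed` and `EMBED_KEYWORDS` are made-up names, and the keyword list is simply the one given above.

```javascript
// Hypothetical helper; the keyword set is the one listed in this thread.
const EMBED_KEYWORDS = ['@always', '@last', '@link', '@never'];

// Map the boolean forms onto keywords and reject anything else.
function normalizeEmbed(value) {
  if (value === true) return '@always';
  if (value === false) return '@never';
  if (EMBED_KEYWORDS.indexOf(value) !== -1) return value;
  throw new Error('Invalid @embed value: ' + String(value));
}

// Example frame fragment asking for both properties to be embedded in full.
const exampleFrame = {
  '@type': 'doap:Project',
  'dc:creator': { '@embed': '@always' },
  'doap:developer': { '@embed': true } // boolean form, normalized below
};

const developerEmbed = normalizeEmbed(exampleFrame['doap:developer']['@embed']);
```

With this, the boolean and keyword spellings in the example frame resolve to the same behavior for both properties.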
