New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for parallel relationships #1361

Open
gabesullice opened this Issue Jan 2, 2019 · 9 comments

Comments

Projects
None yet
3 participants
@gabesullice
Copy link
Contributor

gabesullice commented Jan 2, 2019

The spec is silent about parallel relationships (that is, more than one resource identifier object with the same type and id combo) in all but one case:

If a client makes a POST request to a URL from a relationship link, the server MUST add the specified members to the relationship unless they are already present. If a given type and id is already in the relationship, the server MUST NOT add it again.

I think it would be useful for this spec to provide explicit support for parallel relationships. This would mean relaxing the above statement and then clarifying their treatment and serialization.

Let's think about a real-world example:

  • I have a resource object with the type: organization
  • I have a relationship field named members.
  • I have a resource object with the type: user
  • I have a user who is both a "founder" of the organization and the "treasurer" & these roles have start and end dates.

A request to /organizations/1/relationships/members should return:

HTTP/1.1 200 Okay

{
  "data": [{
    "type": "user",
    "id": 42,
    "meta": {
      "role": "founder",
      "since": "2012-09-01T15:53:00+06:00",
      "until": "heat death of the universe",
    }
  }, {
    "type": "user",
    "id": 42,
    "meta": {
      "role": "treasurer",
      "since": "2017-01-01T00:00:00+06:00",
      "until": "2021-01-01T00:00:00+06:00"
    }
  }]
}

Obviously, one could model these memberships as standalone resource objects, however this can pose a significant technical challenge on systems which do not natively model relationships this way (my scenario). Moreover, in a traditional relational database, modeling relationships in that way can lead to application performance issues due to an increase in the average number of JOINs required for its typical queries.

We currently work around this shortcoming by adding a new member to the meta object of resource objects that we call arity:

{
    "type": "user",
    "id": 42,
    "meta": {
      "arity": 0
    }
  }, {
    "type": "user",
    "id": 42,
    "meta": {
      "arity": 1
    }
  }

We use this member to help a client disambiguate parallel relationships, to identify the relationship to delete on relationship DELETE requests and to provide a stable mechanism to add a parallel relationship via POST (this is the part that's in conflict with the spec).

I found this issue while attempting to author a profile for this mechanism, but realized it was in conflict with the spec. This made me think that it might actually be more valuable to add support for parallel relationships in the base spec and not in a profile since it's so closely related to core functionality.

So, what's my proposal?

Add an arity member to a resource identifier object with the following language:

A “resource identifier object” MUST contain an arity member when it is in an array that includes another resource identifier object with the same type and id values. The arity member MUST NOT be present if the resource identifier object is unique within the array.

  • The value of the arity member MUST be a non-negative integer.
  • The value of the first occurrence of an arity member MUST be zero for each set of resource identifier objects in the array with shared type and id values. For each subsequent resource identifier object within that set, the value MUST be incremented by one.
  • The arity member MUST NOT be used to indicate an order for resource identifier objects in an array.
  • When an array of resource identifier objects is reordered, the value of each arity member MUST be reindexed.

For example, it is possible for two resource identifier objects to have the same arity as long as they do not share the same type and id value; unrepeated resource identifier objects will not have an arity member; and arity does not correspond to array order:

{
  "data": [
    {"type": "color", "id": "red", "arity": 0},
    {"type": "color", "id": "blue", "arity": 0},
    {"type": "color", "id": "green"},
    {"type": "color", "id": "red", "arity": 1},
    {"type": "color", "id": "red", "arity": 2},
    {"type": "color", "id": "blue", "arity": 1}
  ]
}

This is proposal is exactly what we have been doing successfully in our own implementation (except we put arity under the meta member). In doing so, we've worked all the gritty details out for complete CRUD support too. More information here: https://www.drupal.org/node/2992674

@wimleers

This comment has been minimized.

Copy link
Contributor

wimleers commented Jan 3, 2019

Great write-up, thanks @gabesullice!

I think the third and fourth bullet (both talking about ordering) could be clarified: the third says arity does not indicate ordering, the fourth then does talk about ordering.

One thing that hasn't been mentioned yet: @FlorentTorregrosa opened #1156 about this ~1.5 year ago. We implemented the recommendation that @beauby made at #1156 (comment).

@gabesullice

This comment has been minimized.

Copy link
Contributor Author

gabesullice commented Jan 3, 2019

I think the third and fourth bullet (both talking about ordering) could be clarified: the third says arity does not indicate ordering, the fourth then does talk about ordering.

I'm struggling to think of a way to clarify if, maybe you can help?

In the third bullet, what I'm trying to say is that arity is not to be confused with the concept of "weight". IOW, that you can't use arity to reorder items in an array.

In the fourth bullet, I'm trying to indicate that arity isn't a mutable or storable value, it's deterministic and computed according to the order of the resource identifiers in the array. IOW, arity is just supposed to be used to disambiguate repeated resource identifiers, I thought "reindexing" would force clients and servers to implement it that way.

@gabesullice

This comment has been minimized.

Copy link
Contributor Author

gabesullice commented Jan 3, 2019

I updated the example, perhaps that's enough?

@wimleers

This comment has been minimized.

Copy link
Contributor

wimleers commented Jan 11, 2019

Just stumbled upon #1332, which is definitely related. See
#1332 (comment).

@wimleers

This comment has been minimized.

Copy link
Contributor

wimleers commented Jan 11, 2019

I think the fourth bullet conflicts with
https://jsonapi.org/format/#document-resource-object-linkage, which says:

Note: The spec does not impart meaning to order of resource identifier objects in linkage arrays of to-many relationships, although implementations may do that. Arrays of resource identifier objects may represent ordered or unordered relationships, and both types can be mixed in one response object.

That explicitly says that the order may be meaningful to the server, but it's up to the server implementation.

Third bullet: clarify the order aspect in relation to elsewhere in the spec

*The arity member MUST NOT be used indicate an order for resource identifier objects in an array, since the spec does not impart meaning to order in linkage arrays, although specific implementations may do that. arity is independent of order.

Fourth bullet: do we really need it?

Why does the arity, if present, need to be monotonically increasing with parsing order of resource identifier objects? Does this truly make a meaningful difference to the client? Why?

In other words: Why even have the fourth bullet?

@gabesullice

This comment has been minimized.

Copy link
Contributor Author

gabesullice commented Jan 11, 2019

@wimleers, I think you're misinterpreting bullets 3 and 4. Did you see my comment about them?(#1361 (comment)). It's also possible I don't understand what you're trying to point out.

@gabesullice

This comment has been minimized.

Copy link
Contributor Author

gabesullice commented Jan 11, 2019

Just stumbled upon #1332, which is definitely related. See
#1332 (comment).

I do not see the connection.

@ethanresnick

This comment has been minimized.

Copy link
Member

ethanresnick commented Jan 24, 2019

Something similar has been considered a few times.

The tricky part with proposals like this is always: how do you update the attributes associated with each resource identifier object (i.e., the data that you put in meta in the OP)?

This question inevitably comes up because the whole point of allowing the same resource identifier object to appear multiple times is to associate different metadata with each occurrence; if you only cared about representing the number of times that a given resource is included in a relationship, you would actually just include the target resource's type/id pair once, with meta.arity holding the total count for the number of times that that resource is in the relationship. (This was actually @beauby's original proposal, iiuc.)

So, again, any proposal like this needs some way to update the attributes associated with each resource identifier object. And, to do that, clients need to have some way to identify the identifier object they want to target for an update.

In your proposal, iiuc, the client does that by sending the arity value with a PATCH/DELETE request. And that works because type/id/arity together form a unique identifier for the object. The problem with this is that arity is not a stable value; if Client A deletes an occurrence of the resource from the relationship, then many of the arities will shift, and Client B may end up updating the wrong identifier object's attributes by mistake.

Sure, you could use some form of optimistic concurrency control to prevent this, but that probably shouldn't be a requirement; instead, a good design would probably allow the server to assign a stable id to each identifier object.

And then another question comes up: what if you want one of the pieces of data associated with your identifier object to actually be a pointer to another JSON:API resource? E.g., in your team members example, what if there are role resources (with, e.g., descriptions about the role, eligibility criteria, etc.), and you want role to be a pointer to one of those?

At this point, I personally start to think "Gee, these identifier objects are becoming a lot like standard resource objects".

I'm gonna write a longer comment about this soon, but a key thing I've been thinking about is "What should have identity in JSON:API?"

Key point from that link:

By [an] identity I mean a stable logical entity associated with a series of different values over time... Note that by identities I don’t mean names (I call my mother Mom, but you wouldn’t).

So, for this discussion, an identity is an entity that has a state, which is its value at a point in time. And a value is something that doesn’t change. 42 doesn’t change. June 29th 2008 doesn’t change... Even aggregates are values. The set of my favorite foods doesn’t change, i.e. if I prefer different foods in the future, that will be a different set. [At that point, though, an identity representing my favorite foods would take on that new set as its value.]

Once we start talking about having mutable attributes on resource identifier objects, at a fundamental level, we're talking about turning resource identifier objects from values into identities.

Personally, I think JSON:API already has way to many distinct concepts -- look at the mess around links vs. relationships -- so I'm loathe to add another one when a similar one already seems to exist. In this case, I think that's resource objects, which already are JSON:API's key identities (they're named by their type/id, with changing fields over time).

~~

You say that modeling these as standard resource objects is challenging technically or raises performance issues, but I don't really understand that part. Can you explain it to me?

If you already have a facility to store and update the metadata, it would seem that whether these show up as identifier objects or as distinct resources is more a question of serialization logic than data storage or anything else. I.e., I'm imagining you have a table like:

team_members
------------------------------------------------------------
| id  | team_id | member_id | role | start_date | end_date |
| ...   ....      .......      ...    .........   .......  |
------------------------------------------------------------

Can't that just as easily become:

{
   "type": "team-members"
   "id": team_members.id,
   "attributes": {
     "role": team_members.role,
     "from": team_members.start_date,
     "until": team_members.end_date
   },
   "relationships": {
     "team": { 
        data: { "type": "user", "id": team_members.team_id }
      }, 
      "member": {
        "data": { "type": "user", "id": team_members.member_id } 
      }
   }
}

Maybe the tricky part here is now for the client, which needs to add an extra ?include to see the user id? But honestly, that doesn't seem to bad to me.

What I do think would be very useful is a profile that standardizes this pattern of "lifting up" identifier objects into full resource objects to attach attributes to them. So, for example, the profile might say that these resource objects should always use the same type value (or have a specific key in meta) and should always have at least two relationships — e.g. owner and target — that point to the two sides of the relationship, or something like that.

@gabesullice

This comment has been minimized.

Copy link
Contributor Author

gabesullice commented Jan 24, 2019

Thanks for the detailed reply! I'm only able to respond to it in part ATM.

In your proposal, iiuc, the client does that by sending the arity value with a PATCH/DELETE request.

Partially correct. For DELETE, yes. For PATCH there's no reason to care particularly about arity except to disambiguate any parallel relationships in the request document. There's no real need to identify relationship values on the server since all values in the relationship are replaced in a relationship PATCH request.

The problem with this is that arity is not a stable value; if Client A deletes an occurrence of the resource from the relationship, then many of the arities will shift, and Client B may end up updating the wrong identifier object's attributes by mistake.

This is true, but partially mitigated by the fact that in this proposal each set of parallel relationships have their own series of arities. IOW, if Client A deletes a resource identifier that is not parallel to the relationship that Client B is updating, there would be no cross-effect. What you say is true if the relationships are parallel though and I admit this falls down without either a stable identifier or Etag + If-Match.


It seems concurrency control is a recurring impediment (I've seen it mentioned in a couple issues now). The obvious answer seems to be to recommend the use of the Etag + If-Match headers, but I guess there must be some reason this has not been considered suitable in the past?

UPDATE: Looks like there was a lot of discussion about Etag and If-Match in #600 and #824, I'll read those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment