RFC: Excluding fields with Sparse Fieldsets #1632

Open
ThorstenSuckow opened this issue Jun 18, 2022 · 18 comments
Labels
extension Related to existing and proposed extensions as well as extensions in general


@ThorstenSuckow

JSON:API RFC: Excluding fields with Sparse Fieldsets

Introduction

This RFC proposes an addition to the syntax used with Sparse Fieldsets, namely to add the asterisk character (U+002A ASTERISK, "*") to the fields[TYPE] parameter, allowing for

  • specifying a wildcard as a representative for ALL available fields of a resource object
    and
  • specifying a list of fields to exclude in a resource object

GET /documents/1?fields[document]=*,title HTTP/1.1
Accept: application/vnd.api+json

The above example requests all fields to be included with the resource object, but not the field `title`.

Proposal

The following is defined in the JSON:API specification (v1.0, and the upcoming v1.1) at the time of writing this RFC:

  • The value of the fields parameter MUST be a comma-separated (U+002C COMMA, “,”) list that refers to the name(s) of the fields to be returned.

and

  • An empty value indicates that no fields should be returned.

furthermore:

  • If a client does not specify the set of fields for a given resource type, the server MAY send all fields, a subset of fields, or no fields for that resource type.

While the specification gives the client a clear way to define the fields included with the resource object in the response, it is not possible to fall back to a set of "default fields" and exclude a specific set of fields at the same time.
The only option to fall back to a predefined list of fields is by omitting the fields[TYPE] parameter. The implementing API will then take care of computing the fields to return; this may also lead to no fields being returned at all, as per specification.

Drawbacks of the current specification

The current specification makes it hard to maintain large sets of fields that are part of larger resource objects, especially DTOs/value objects that aggregate data without providing the option to query related resources for data.
Take for example a fictional document represented by a resource object that has a list of 20 fields, field_1 .. field_20. Computing the values for field_18 .. field_20 is costly and requires complex operations that put more load on the server. The client uses these fields only on some occasions, e.g. in master/detail views: a lean representation of the resource object in a grid vs. the resource object loaded in its entirety for the detail view. Instead of designing the API around specifics of the represented entities and providing different resource locations for the resource object (Object A containing field_1 .. field_17, Object B containing field_1 .. field_20), sparse fieldsets are used.

The current specification requires the following syntax to make sure field_18, field_19 and field_20 are NOT included with the resource object in the response:

GET /documents/1?fields[document]=field_1,field_2,field_3,field_4,field_5,field_6,field_7,field_8,field_9,field_10,field_11,field_12,field_13,field_14,field_15,field_16,field_17 HTTP/1.1
Accept: application/vnd.api+json

The same could be achieved with the asterisk character (U+002A ASTERISK, "*") by inverting the list of included fields and providing a syntax for excluding fields:

GET /documents/1?fields[document]=*,field_18,field_19,field_20 HTTP/1.1
Accept: application/vnd.api+json

Changes that affect the current specification

This changes the specification of the JSON:API in its current version to the following:

BEGIN
The value of the fields parameter MUST be a comma-separated (U+002C COMMA, ",") list that refers to the name(s) of the fields to be returned. An empty value indicates that no fields should be returned.
If the value of the fields parameter is not a comma-separated list of fields, and it is not empty, it MUST be an asterisk (U+002A ASTERISK, "*"), or a comma-separated list of fields starting with an asterisk (U+002A ASTERISK, "*"). The asterisk MUST be treated as if the client had specified ALL fields of the requested resource object, optionally followed by the list of fields that MUST NOT be included with the resource object in its response.
END

The following lets the server decide which fields to return with the resource object as per current specification:

GET /documents/1 HTTP/1.1
Accept: application/vnd.api+json

The following extends the current specification by requesting ALL available fields of the resource object by using a wildcard with fields[TYPE]:

GET /documents/1?fields[document]=* HTTP/1.1
Accept: application/vnd.api+json

The following extends the current specification to return the resource object with all available fields, excluding the title field, which MUST NOT be included in the resource object in its response:

GET /documents/1?fields[document]=*,title HTTP/1.1
Accept: application/vnd.api+json
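
The rules in the BEGIN/END block above could be read as the following resolution logic. This is only a sketch of one possible server-side interpretation: the function name `resolve_fields` and the `all_fields` / `default_fields` arguments are hypothetical and are not part of the proposal or of any JSON:API library.

```python
from typing import Optional


def resolve_fields(
    value: Optional[str],
    all_fields: list[str],
    default_fields: list[str],
) -> list[str]:
    """Resolve a fields[TYPE] value under the wildcard rules proposed above."""
    if value is None:
        # Parameter omitted: the server decides (current specification).
        return list(default_fields)
    if value == "":
        # Empty value: no fields are returned (current specification).
        return []
    names = value.split(",")
    if names[0] == "*":
        # Proposed: a leading asterisk means ALL fields, and any names
        # that follow it MUST NOT be included.
        excluded = set(names[1:])
        return [f for f in all_fields if f not in excluded]
    # Plain comma-separated list: return exactly the named fields.
    return [f for f in all_fields if f in names]
```

With `all_fields = ["title", "body", "author"]`, the value `*,title` resolves to `["body", "author"]`, matching the last request shown above.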

Additions

According to https://www.rfc-editor.org/rfc/rfc3986#appendix-A, it should be safe to use the asterisk character unencoded with the query string.
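
As a small illustration of the encoding question, the following sketch uses Python's standard `urllib.parse` module to build the query string both ways. Many client libraries percent-encode the asterisk by default, and both forms decode to the same parameter value, so a server would likely see the same `*` either way. The parameter value is taken from the examples above; nothing here is specific to JSON:API tooling.

```python
from urllib.parse import parse_qs, urlencode

params = {"fields[document]": "*,title"}

# Default behaviour: brackets, asterisk and comma are percent-encoded.
encoded = urlencode(params)

# Keep "[", "]", "*" and "," literal, as written in the examples of this RFC.
literal = urlencode(params, safe="[]*,")

print(encoded)  # fields%5Bdocument%5D=%2A%2Ctitle
print(literal)  # fields[document]=*,title

# Both forms decode to the same parameter name and value.
decoded_encoded = parse_qs(encoded)
decoded_literal = parse_qs(literal)
```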

@auvipy

auvipy commented Jun 19, 2022

@jelhan

@jelhan jelhan closed this as completed Jun 19, 2022
@jelhan
Contributor

jelhan commented Jun 19, 2022

Thanks a lot for this proposal and the clear structure in which it is provided.

Not being able to request all fields of a resource without knowing them upfront has often been raised as a limitation of the JSON:API specification, mainly motivated by discovery. It often comes up in the context of questions about whether a server may respond with a sparse fieldset by default if the fields query parameter is not present. Adding * as a keyword to request all fields addresses this limitation. Sounds reasonable to me.

Requesting specific fields using an exclude list seems reasonable as well. I'm not aware of any requests for such a feature though. We should be careful not to specify something without an actual use case. Otherwise it may just not be implemented by most libraries.

I find the syntax fields=*,text confusing. It's surprising that text is excluded if the list starts with * but included if not. It also limits flexibility, as a client would not be able to request the default fields except a few. It would be more flexible if the syntax to request all fields and the syntax to exclude a field were independent from each other. Maybe by prefixing a field with - a client could request to not include this field.

Some examples to illustrate that idea:

Request all fields of a resource post:

GET ?fields[post]=*

Request all fields except text:

GET ?fields[post]=*,-text

Request that server excludes text field but leave it up to server implementation which other fields to include:

GET ?fields[post]=-text
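
The three requests above might be resolved along these lines. This is a sketch under assumptions: the function name and the `all_fields` / `default_fields` arguments are hypothetical, and the handling of a plain name next to a `-` prefixed one is a guess, since the comment does not define that case.

```python
def resolve_with_exclusions(
    value: str,
    all_fields: list[str],
    default_fields: list[str],
) -> list[str]:
    """Resolve a fields[TYPE] value where "-name" excludes a field."""
    if value == "":
        # Empty value: no fields, as in the current specification.
        return []
    tokens = value.split(",")
    excluded = {t[1:] for t in tokens if t.startswith("-")}
    included = [t for t in tokens if not t.startswith("-")]

    if "*" in included:
        # "*" requests every available field.
        base = all_fields
    elif included:
        # Explicit names: assumed to behave like today's sparse fieldsets.
        base = [f for f in all_fields if f in included]
    else:
        # Only exclusions: the server picks the remaining fields.
        base = default_fields
    return [f for f in base if f not in excluded]
```

Note that the last branch depends on whatever the server considers its default at that moment, which is exactly the part the current specification leaves open.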

If adding to base specification, it must be optional. Otherwise it would be a breaking change for any server-side implementation. That means a server may support sparse fieldsets but may not support * to include all fields or excluding specific fields.

An alternative may be adding the feature using an extension. Haven't fully thought through that one yet.

@jelhan jelhan reopened this Jun 19, 2022
@Juanmcuello

I have doubts about the use of the * to retrieve all fields. An alternative could be for the * to behave as if we were specifying the default fields and leave the server to decide which are these defaults. This way we can exclude the computed and expensive fields from the defaults. If we want these fields in the response, we can append them to the *. This is what PostgREST does with computed columns.

So basically, maybe we can do something like this:

Request all default fields (same as not sending the fields param):

GET ?fields[post]=*

Request all default fields and also field_18, field_19 and field_20:

GET ?fields[post]=*,field_18,field_19,field_20

Regarding the exclusion of certain fields, I'm not sure if we should provide a way to do it, as maybe the use of the * for default fields covers most of the use cases. You can always be explicit and send the exact fields you want, avoiding the fields you don't.

With this approach we still don't have a way to discover all available fields, but I'm not sure if the fields param should be the way to do it.

@jugaadi

jugaadi commented Jun 20, 2022

We too need this feature. However, I find the * as the first value confusing.

Suggestion:

The current specification defines Exclude All Except Given fields. Similarly, we can have an Include All Except Given by using a - in the type.
Example: fields[-post]=field_18,field_19,field_20
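
A server could tell the two parameter forms apart while parsing the query string, roughly as follows. This is a sketch only: `parse_fieldsets` and the `"only"` / `"except"` labels are hypothetical names chosen for illustration.

```python
import re
from urllib.parse import parse_qsl

# Matches fields[TYPE] and the suggested inverted form fields[-TYPE].
FIELDS_KEY = re.compile(r"fields\[(-?)([^\]]+)\]")


def parse_fieldsets(query: str) -> dict[str, tuple[str, list[str]]]:
    """Map each resource type to ("only", names) or ("except", names)."""
    result: dict[str, tuple[str, list[str]]] = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        match = FIELDS_KEY.fullmatch(key)
        if match is None:
            continue  # not a fields parameter
        inverted, resource_type = match.groups()
        names = value.split(",") if value else []
        result[resource_type] = ("except" if inverted else "only", names)
    return result
```

For `fields[-post]=field_18,field_19,field_20` this yields `{"post": ("except", ["field_18", "field_19", "field_20"])}`. The sketch silently lets a later parameter for the same type overwrite an earlier one; a real implementation would have to decide how to treat that.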

@ThorstenSuckow
Author

Thank you all for your feedback so far. Let me add a few details to explain the intent behind the proposal.

Given the current spec, omitting the fields[TYPE] parameter leaves the decision of the fields being returned with the resource object to the server:

If a client does not specify the set of fields for a given resource type, the server MAY 
send all fields, a subset of fields, or no fields for that resource type.

as @jelhan pointed out, I also think that

Not being able to request all fields of a resource without knowing them upfront 
has been raised as a limitation of the JSON:API specification often.

I guess we can all agree on the fact that the stability (or idempotent result) of the structure of a requested resource object must be guaranteed by a well maintained, documented and versioned API: The list of fields returned with a resource object should only change across major versions, if the fields[TYPE] parameter is omitted in the query path.

However, there are use cases where the client needs the server to explicitly agree on the list of fields that should be returned with the resource object.
A contract like this can already be provided by utilizing fields[TYPE] as follows:

fields[TYPE]= 

is a contract for including no fields in the resource object, i.e. send the resource object with an empty attributes-field, or no attributes-field at all

fields[TYPE]=field_1,field_2

include the specified fields field_1, field_2 with the resource object.

Syntax

Now, by allowing the client to exclusively use a wildcard instead of a definition of the list of fields, the server MUST agree on the following:

Given the current version of the server API, include ALL fields in the resource object that could also be requested individually by the client.

Example: If the resource object exposes the fields field_1, field_2, field_3 in version A, and these fields are queryable by the client, the fields[TYPE] parameter with a value set to *

fields[TYPE]=*

MUST return the fields field_1, field_2, field_3 with the resource object for at least version A.

The ASTERISK is used here as a representative for the ALL-semantics, since this character is already a known and accepted character for this purpose in informatics, so developers are given known tools at hand. For example,

SELECT * FROM table

requests a result set containing all the fields from a table in SQL.

If this syntax exists for including ALL fields of a resource object, we could expand on this by allowing to submit a list of fields which denotes the list of fields that should be EXCLUDEd from the resource object:

fields[TYPE] = fieldQuery

fieldQuery = all_query / exclude_query / field_query

exclude_query = all_query "," field_query

field_query = WORD *("," WORD)

all_query = "*"

(I do not provide a formal definition for WORD, but I'm sure you get the idea.)

In short:

  • a leading wildcard character (* in this case) represents the "ALL-fields" query
  • a subsequent list of fields will be considered as the list of fields to EXCLUDE

Given the above example, where the resource object exposes the fields field_1, field_2, field_3

fields[TYPE]=*,field_1

reads as {field_1, field_2, field_3} \ {field_1} and yields {field_2, field_3}:

include ALL fields FROM type EXCEPT (field_1)
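
To make the grammar above concrete, here is a sketch of a validator built on a regular expression. The definition of `WORD` is an assumption (letters, digits and underscores), since the comment leaves it open, and the function name `classify` is hypothetical.

```python
import re

# Assumption: WORD is not defined in the thread; this pattern is a placeholder.
WORD = r"[A-Za-z0-9_]+"
FIELD_QUERY = rf"{WORD}(?:,{WORD})*"
# all_query / exclude_query / field_query
FIELDS_VALUE = re.compile(rf"\*(?:,{FIELD_QUERY})?|{FIELD_QUERY}")


def classify(value: str) -> str:
    """Return which grammar rule a non-empty fields[TYPE] value matches."""
    if FIELDS_VALUE.fullmatch(value) is None:
        raise ValueError(f"malformed fields value: {value!r}")
    if value == "*":
        return "all_query"
    if value.startswith("*,"):
        return "exclude_query"
    return "field_query"


# The worked example from the text, as a set difference.
remaining = {"field_1", "field_2", "field_3"} - {"field_1"}
```

Because the wildcard is only accepted in the leading position, a value such as `field_1,*,field_2` fails the match and can be rejected before any field lookup happens.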

Marking fields as excluded

I tinkered with marking fields as excluded, but I thought that the syntax becomes too ambiguous.

@jelhan thought about introducing a hyphen - as a prefix for fields that should be excluded

fields[TYPE]=*,-field_1

whereas @jugaadi proposed to introduce the "inversion" of the fields[TYPE] parameter, like so:

fields[-TYPE]=field_1

(note the hyphen - as prefix for TYPE now)

Both examples provide the same meaning, just with a different syntax: EXCLUDE field_1 from the resource object that is requested (provided TYPE in @jugaadi's example is returned with the same set of default fields as @jelhan requests with the * character).

Both examples make constructs like this possible - and, given my own experience - more likely to happen:

fields[TYPE]=field_1,field_2,-field_1

and:

fields[-TYPE]=field_1,field_2&fields[TYPE]=field_2,field_3

The question here is: what's given precedence, if any? Of course, edge cases like these can be clearly stated as invalid and require the server to respond with a 400 Bad Request (since the client is obviously confused about what it really wants). Admittedly, with the wildcard, invalid queries like

fields[TYPE]=field_1,*,field_2

are also possible and should be answered with a 400 Bad Request. However, the validity checks delegated to the server (if the client fails to assemble correct query strings) should be easier when a wildcard character is only allowed as the leading character of the fields[TYPE] parameter: the more clearly the syntax is defined upfront, the less confusion about its usage ends up in the specification, and the fewer developers end up battling the implementation of the specification.

Drawbacks of the Wildcard-character

Across versions, a contract between the client and the API regarding ALL fields given a wildcard character is not stable. Given the fact that fieldsets may change across API versions anyway, I think this is negligible.

For discussion

The following points could also be up for discussion:

  • providing endpoints for discovering the fields exposed by a TYPE (resource object)
  • and/or: bundle the proposal (along with querying available fields) into a QUERY-extension, if applicable. This could prevent trigger-happy clients from over-requesting endpoints when too much functionality is added to the base specification.

@dgeb
Member

dgeb commented Jun 20, 2022

Thanks for the proposal @ThorstenSuckow and for the discussion everyone!

I can appreciate the benefit of allowing fields to be specified that are relative to a default fieldset in addition to allowing only a fixed set of fields. It seems important that relative fields not be mixed with fixed fields to simplify processing rules and clarify intention.

I've considered a syntax to handle this over the years. The cleanest approach that I can arrive at is to require + or - to prefix every field that is relative to the default fieldset for a type. And it must be invalid to mix any relative fields with any fixed fields for a given type.

For example, if type1 has default fields a and b and optional fields c and d, the following would be valid ways to request the fields a and c:

// VALID - fixed fieldset
fields[type1]=a,c

// VALID - relative fields
fields[type1]=-b,+c

However, mixing of relative and fixed fields would be invalid:

// NOT VALID - mixing fixed and relative fields
fields[type1]=a,-b,c
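
The fixed/relative distinction above could be implemented roughly like this. It is a sketch, not a reference implementation: `apply_fieldset` and its arguments are hypothetical, and the value is assumed to be already URL-decoded. One practical caveat from general URL handling rather than from this thread: form-style decoders such as Python's `urllib.parse.parse_qs` turn a literal `+` into a space, so a client would probably have to send it as `%2B`.

```python
def apply_fieldset(
    value: str,
    default_fields: list[str],
    all_fields: list[str],
) -> list[str]:
    """Apply a fixed or relative fieldset; mixing both is rejected."""
    tokens = value.split(",") if value else []
    relative = [t for t in tokens if t.startswith(("+", "-"))]

    if relative and len(relative) != len(tokens):
        # e.g. "a,-b,c": a server would answer this with 400 Bad Request.
        raise ValueError("cannot mix fixed and relative fields")

    if not relative:
        # Fixed fieldset: exactly the named fields (none for an empty value).
        return [f for f in all_fields if f in tokens]

    # Relative fieldset: start from the defaults, then add and remove.
    added = {t[1:] for t in tokens if t.startswith("+")}
    removed = {t[1:] for t in tokens if t.startswith("-")}
    selected = (set(default_fields) | added) - removed
    return [f for f in all_fields if f in selected]
```

With defaults `a, b` and optional fields `c, d`, both `a,c` and `-b,+c` resolve to `["a", "c"]`, while `a,-b,c` raises.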

Does anyone have concerns with this slight modification to the proposal?

@ThorstenSuckow
Author

ThorstenSuckow commented Jun 20, 2022

Hey @dgeb,

thanks for the response.

I think some kind of indicator to request ALL (i.e. a disjunction of default & optional) fields of a resource object would remove some of the opaqueness of the current specification.

Other than that I think your modification fits the use case and requirement very well as it provides a clear definition of how a client could add more granularity to its requests:

A sparse fieldset MUST either contain only fixed fields, or only relative fields.

FIXED FIELDSETS denote the list of fields that should exclusively be returned with the response object and are already specified with Sparse Fieldsets.

RELATIVE FIELDSETS denote the list of fields which are optionally included with or excluded from the resource object, based on a default fieldset defined by the implementing API, and as such available with its documentation (see addendum to this comment).

Fields which should be optionally included MUST be prefixed with + (U+002B PLUS SIGN, “+”), fields which should be excluded from the resource object MUST be prefixed with - (U+002D HYPHEN-MINUS, “-”). (Section 5.8.2, Reserved Characters, must be updated accordingly, then.)

If a relative fieldset is specified and the server cannot include a requested field, the server MUST respond with a 400 Bad Request (similar to 6.3. Inclusion of Related Resources, when a related resource that should be included cannot be identified by the server).

If a relative fieldset is specified and the server fails to identify an excluded field, the server MAY respond with a 400 Bad Request.

Addendum

I think this would even more justify an endpoint for discovering the default fields of a resource object, e.g.

"links": {
    "self": "http://example.com/articles",
    "discover": "http://example.com/articles/discover"
  }

Notes

Does the current specification regarding an empty fields[TYPE] parameter (which means NO fields are included with the resource object for TYPE) not lead to more confusion when we're speaking of specifying a fieldset relative to a default fieldset? Is it too ambiguous?

No fields returned:

articles/1?fields[article]=

Fields returned relative to the default fieldset:

articles/1?fields[article]=+date,-title

The intention is clear. Is the language for transposing this concept clear enough?

@dgeb
Member

dgeb commented Jun 20, 2022

I think some kind of indicator to request ALL (i.e. a disjunction of default & optional) fields of a resource object would remove some of the opaqueness of the current specification.

One of my concerns with this is that, for consistency, such a wildcard would also be allowed to be specified on its own to request default fields, which would make explicit what was formerly implicit. Clients would then be tempted to specify fields[article]=* instead of just leaving off this param, and that would be incompatible with 1.0 servers.

Also, I'm not sure that * properly connotes "default". It reads more like "all" to me.

For these reasons, I'd just as soon leave this additional complexity out of the proposal.

If a relative fieldset is specified and the server cannot include a requested field, the server MUST respond with a 400 Bad Request (similar to 6.3. Inclusion of Related Resources, when a related resource that should be included cannot be identified by the server).

I am uncomfortable adding these requirements for relative fieldsets that are not imposed upon fixed fieldsets. There are times when a set of fields may be requested but are not available to a particular user, particularly due to authorization concerns. The current approach allows servers to omit some requested fields without forcing the client to have full knowledge of what fields are authorized prior to making a request.

I think this would even more justify an endpoint for discovering the default fields of a resource object

That seems like a reasonable use of the description document referenced by describedBy.

Does the current specification regarding an empty fields[TYPE] parameter (which means NO fields are included with the resource object for TYPE) not lead to more confusion when we're speaking of specifying a fieldset relative to a default fieldset? Is it too ambiguous?

There is certainly some nuance here. I am trying to be accepting of the general proposal, since it's come up many times, without breaking backwards compat in any way.

@auvipy

auvipy commented Jun 21, 2022

One of my concerns with this is that, for consistency, such a wildcard would also be allowed to be specified on its own to request default fields, which would make explicit what was formerly implicit. Clients would then be tempted to specify fields[article]=* instead of just leaving off this param, and that would be incompatible with 1.0 servers.

Also, I'm not sure that * properly connotes "default". It reads more like "all" to me.

For these reasons, I'd just as soon leave this additional complexity out of the proposal.

If this turns out to be a breaking issue, can we defer this from 1.1 now and consider it in the future?

@jelhan
Contributor

jelhan commented Jun 21, 2022

I think there are different understandings of what * should represent, which leads to confusion:

  • @ThorstenSuckow proposes that it represents all fields which could be requested for a resource.
  • @dgeb seems to interpret it as a shortcut for all fields which a server returns for a resource if the client does not request a specific sparse fieldset.

The default fieldset could be a subset of all fields a resource has. So both could have very different meanings.

This discussion seems to assume that the default fieldset is a fixed one. That is not guaranteed by the spec. If a client does not request specific fields explicitly, a server may respond with any fieldset. It could even pick the fields to be included randomly. Any specification of a relative fieldset can only specify that specific fields must be included (+text) or must not be included (-text). It would be up to the server implementation which other fields are included. And the server may decide to change that on a per-request basis.

Looking at the many different proposals each having its own trade-offs, I think it would be best to experiment in user-land using an extension first before adding anything to the base specification.

@dgeb
Member

dgeb commented Jun 21, 2022

Looking at the many different proposals each having its own trade-offs, I think it would be best to experiment in user-land using an extension first before adding anything to the base specification.

I think this is a reasonable solution. And with v1.1 so close, it is probably the only practical one for now.

@jelhan jelhan added the extension Related to existing and proposed extensions as well as extensions in general label Jun 22, 2022
@ThorstenSuckow
Author

One of my concerns with this is that, for consistency, such a wildcard would also be allowed to be specified on its own to request default fields, which would make explicit what was formerly implicit. Clients would then be tempted to specify fields[article]=* instead of just leaving off this param, and that would be incompatible with 1.0 servers.

The idea behind the wildcard * is to include all default and all optional fields. I get your point, but I also think that, if fields[TYPE] is omitted, the opaqueness introduced by letting the server decide what to return can be a cause for errors, e.g. if the server decides to omit/add fields of/to a resource object in between requests. As far as I understand, that would definitely be possible, and the only agreement on a guaranteed set of fields can be established by explicitly specifying them with the fields[TYPE] parameter. I thought of * as a replacement, also to keep requests more readable and reduce the visual complexity of some.

Also, I'm not sure that * properly connotes "default". It reads more like "all" to me.

Yes, default and optional :)

I am uncomfortable adding these requirements for relative fieldsets that are not imposed upon fixed fieldsets. There are times when a set of fields may be requested but are not available to a particular user, particularly due to authorization concerns. The current approach allows servers to omit some requested fields without forcing the client to have full knowledge of what fields are authorized prior to making a request.

That is a valid point and I haven't thought about this.

That seems like a reasonable use of the description document referenced by describedBy.

Thanks for pointing this one out!

@ThorstenSuckow
Author

Looking at the many different proposals each having its own trade-offs, I think it would be best to experiment in user-land using an extension first before adding anything to the base specification.

I think this is a reasonable solution. And with v1.1 so close, it is probably the only practical one for now.

If you already have an idea for the syntax to be used, I'm eager to adopt it, since I'm currently in the process of rewriting an API that needs fields to be excluded from requests.

If there's anything I can help with or contribute to the project/docs, let me know!

@jelhan
Contributor

jelhan commented Jun 24, 2022

If there's anything I can help with or contribute to the project/docs, let me know!

Extensions, which will be included in the upcoming v1.1 version of the spec, allow extending the base spec in user-land. You do not need to wait for any editor of the base spec to write an extension. You could write and publish an extension yourself. The same applies to profiles. Actually that's the main idea behind it: unblock consumers to extend (extensions) and further specify (profiles) JSON:API without editors of the base spec being a bottleneck.

I'm happy to pair on public API and review a draft for such an extension. Feel free to send me a direct message on Twitter if you like.

@ThorstenSuckow
Author

Hey @jelhan, I have collected the feedback from this discussion and created a draft for this feature over at https://github.com/ThorstenSuckow/relfield:

  • prefixes for fields are considered with this draft (+, -)
  • wildcards are optional

Let me know what you think!

@ThorstenSuckow
Author

ThorstenSuckow commented Jul 11, 2022

Hey all!

I have updated the specifications with the relfield-namespace as suggested by @jelhan

Providing full compatibility with the fieldset's base specification, the extension should now cover most of the use cases if excluding/adding fields is required by a client. If prefixes are omitted with the relfield:fields[TYPE]-parameter, this extension behaves in accordance with the base specification of fieldsets.

The current draft can be found here: https://github.com/ThorstenSuckow/relfield

Up for discussion: The draft allows for mixing relfield:fields- with fields-query parameters, if, and only if the parameters refer to different resource object types. It keeps the query-strings flexible, though, since relfield: would only be required for fieldsets where the client needs the exclusion/addition of fields given the new syntax.

This means, given the current draft:

?relfield:fields[article]=-date&fields[article]=author

must yield a 400 Bad Request, whereas

?relfield:fields[article]=-date&fields[comment]=title

would be allowed.
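
The mixing rule described above could be checked before any fieldset is evaluated. The following is a sketch based only on the two examples in this comment, not on the text of the relfield draft itself; `check_mixing` and the returned status codes as plain integers are hypothetical choices.

```python
import re
from urllib.parse import parse_qsl

# Matches fields[TYPE] and relfield:fields[TYPE].
PARAM = re.compile(r"(relfield:)?fields\[([^\]]+)\]")


def check_mixing(query: str) -> int:
    """Return 400 if one type uses both parameter families, else 200."""
    families_by_type: dict[str, set[str]] = {}
    for key, _value in parse_qsl(query, keep_blank_values=True):
        match = PARAM.fullmatch(key)
        if match is None:
            continue  # unrelated query parameter
        family = "relfield" if match.group(1) else "base"
        families_by_type.setdefault(match.group(2), set()).add(family)
    mixed = any(len(families) == 2 for families in families_by_type.values())
    return 400 if mixed else 200
```

Applied to the two query strings above, the first yields 400 (both families target `article`) and the second yields 200 (`article` and `comment` each use one family).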

@ThorstenSuckow
Author

If this turns out to be a breaking issue, can we defer this from 1.1 now and consider it in the future?

@auvipy
The wildcard * is a SHOULD right now in the draft for the relfield-extension. This is to stay closer to the base specification and only add the + / - (mandatory) for marking fields as excluded/additional.
If the client requests a fieldset with a wildcard and the server does not support wildcards, the server must give an appropriate (error) response.
Unfortunately, the expected behavior of requesting fields with wildcards is somewhat unclear due to possible state interferences given user credentials and access to specific fields, so it would require another set of specifications for how the server would respond if the wildcard is submitted in a fieldset but the client may not read all fields.
So, should the server respond with the allowed subset of fields, respond with a 403 Forbidden, or soften the specification and simply add a "MAY respond with" to the extension...

@ThorstenSuckow
Author

ThorstenSuckow commented Nov 4, 2022

Hey there, given some feedback I have updated the draft for this extension. The changes include:

  • Remove the wording that relfield extends the sparse fieldsets specs
  • prefixes are mandatory, if the client decides to omit them, it should rather fall back to base specs of sparse fieldsets
  • decided to make wildcard not optional
  • extracted examples to end of document and updated some wording to make the doc more normative
  • add reference to similar implementations (bitbucket API)

Thanks @jelhan for your feedback and contributions so far !
