Add new list field type for typed collections, lexically delimiter-based #38

Merged
merged 4 commits into from
Mar 11, 2024

@roll roll commented Feb 21, 2024


Rationale

Since the last major Data Package release, two feature requests have come up very often (at both the specs and implementation levels):

  • ability to have strongly-typed arrays for SQL-interop
  • ability to serialize arrays with delimiter instead of verbose JSON serialization

The initial attempt was to implement it as a part of the array type (#31, #34) but during the discussions we figured out that it might be better to separate two groups of data types:

  • based on JSON data model (array and object), especially now when we're adding constraints.jsonSchema for them
  • based on SQL-array data model (list)

I think we can still merge array and list in this PR, but I'm curious what you think, as it seems that array and list in this version of the PR are quite different both lexically and logically.
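
To make the lexical difference concrete, here is a minimal sketch comparing the two serializations of the same three values. The cell contents and variable names are made up for illustration and do not come from the PR.

```python
import json

# The same three items, serialized two ways in a CSV cell (illustrative values).
json_cell = '["a", "b", "c"]'  # array: lexical form is a JSON document
list_cell = "a,b,c"            # list: lexical form is delimiter-separated

# An array cell is parsed with a JSON parser.
from_array = json.loads(json_cell)

# A list cell is split on the delimiter (comma assumed here).
from_list = list_cell.split(",")
```

Both calls produce the same Python list of strings; only the lexical representation in the data file differs.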

@nichtich nichtich left a comment

Looks good, just two minor comments!

- datetime
- date
- time
constraints:

Allowed constraints for list should also be mentioned in the text, no?

Member Author

I think it's something to improve in the next maintenance-level iteration. Currently, constraint applicability is listed in the constraints section for all the types - https://datapackage.org/specifications/table-schema/#constraints - but I really think it needs to be improved.

content/docs/specifications/table-schema.md (outdated)

cloudflare-pages bot commented Feb 21, 2024

Deploying with Cloudflare Pages

Latest commit: 3c2a9b4
Status: ✅  Deploy successful!
Preview URL: https://d9ed3024.datapackage.pages.dev
Branch Preview URL: https://736-list-field-type.datapackage.pages.dev


The list field can be customised with these additional properties:

- **delimiter**: specifies the character sequence which separates lexically represented list items. If not present, the default is `,` (comma).
- **itemType**: specifies the list item type in terms of existent Table Schema types. If present, it `MUST` be one of `string`, `integer`, `boolean`, `number`, `datetme`, `date`, and `time`. If not present, the default is `string`. A data consumer `MUST` process list items as it were individual values of the corresponding data type. Note, that on lexical level only default formats are supported, for example, for a list with `itemType` set to `date`, items have to be in default form for dates i.e. `yyyy-mm-dd`.
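
The two properties quoted above can be sketched as a small parser. This is only an illustration under stated assumptions, not the reference implementation: `parse_list`, `ITEM_PARSERS`, and `_parse_boolean` are hypothetical names, the accepted boolean literals are assumed to be the Table Schema defaults, and Python's ISO parsers stand in for the default date and time formats.

```python
from datetime import date, datetime, time

# Assumed default boolean literals; adjust if the spec text differs.
_TRUE = {"true", "True", "TRUE", "1"}
_FALSE = {"false", "False", "FALSE", "0"}


def _parse_boolean(text):
    """Parse one boolean item using the assumed default literals."""
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    raise ValueError(f"not a boolean: {text!r}")


# One parser per allowed itemType; only default lexical formats are handled.
ITEM_PARSERS = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": _parse_boolean,
    "datetime": datetime.fromisoformat,
    "date": date.fromisoformat,  # default form yyyy-mm-dd
    "time": time.fromisoformat,
}


def parse_list(cell, field):
    """Split a lexical list cell and parse each item as its itemType."""
    delimiter = field.get("delimiter", ",")       # default is a comma
    item_type = field.get("itemType", "string")   # default is string
    if item_type not in ITEM_PARSERS:
        raise ValueError(f"unsupported itemType: {item_type!r}")
    parser = ITEM_PARSERS[item_type]
    return [parser(item) for item in cell.split(delimiter)]
```

For example, `parse_list("1,2,3", {"type": "list", "itemType": "integer"})` yields a list of three integers, and a field with `"delimiter": ";"` splits on semicolons instead. How empty cells should be treated is not covered by the quoted text and is left out of the sketch.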

Suggested change
- **itemType**: specifies the list item type in terms of existent Table Schema types. If present, it `MUST` be one of `string`, `integer`, `boolean`, `number`, `datetme`, `date`, and `time`. If not present, the default is `string`. A data consumer `MUST` process list items as it were individual values of the corresponding data type. Note, that on lexical level only default formats are supported, for example, for a list with `itemType` set to `date`, items have to be in default form for dates i.e. `yyyy-mm-dd`.
- **itemType**: specifies the list item type in terms of existent Table Schema types. If present, it `MUST` be one of `string`, `integer`, `boolean`, `number`, `datetime`, `date`, and `time`. If not present, the default is `string`. A data consumer `SHOULD` process list items as it were individual values of the corresponding data type. Note, that on lexical level only default formats are supported, for example, for a list with `itemType` set to `date`, items have to be in default form for dates i.e. `yyyy-mm-dd`.

Typo correction.

Just as with delimiter I think data consumers should have the option not to process this field. Therefore SHOULD.

@peterdesmet

I think I prefer a separate data type for this; it will make consumption/validation easier. Intuitively, I would call it array, but that is already reserved for JSON arrays.

I have updated MUST to SHOULD, to allow consumers to opt to not parse the delimiter and values.


roll commented Feb 21, 2024

@peterdesmet
I think MUST is better there because on a logical (Table Schema) level, it can't be anything other than a sequence of typed items. How an implementation represents this logical entity depends on the implementation. As with any other feature, MUST applies to implementations that support the feature, but an implementation can simply mark array and list as not supported at all (raise an error, document it, etc.).

PS.
As previously discussed, for example, fields.missingValues MUST ... but if frictionless-r just doesn't support this feature this MUST is not applicable until the feature is supported.
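
A rough sketch of the "not supported at all" option described above: an implementation that has not implemented `list` rejects the field type up front instead of parsing it loosely. `SUPPORTED_TYPES` and `check_field_supported` are hypothetical names, and the set of supported types is invented for the example.

```python
# Hypothetical set of types this imaginary implementation can handle.
SUPPORTED_TYPES = {"string", "integer", "number", "boolean"}


def check_field_supported(field):
    """Return the field type, or raise if this implementation lacks support."""
    field_type = field["type"]
    if field_type not in SUPPORTED_TYPES:
        # The MUST rules for the type only apply once the type is supported.
        raise NotImplementedError(f"field type {field_type!r} is not supported")
    return field_type
```

Once `list` is added to the supported set, the implementation would be expected to follow the MUST requirements for it.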


roll commented Mar 11, 2024

ACCEPTED by WG (6/9)

3 participants