Future possibility for delimiter-separated list for arrays (instead of JSON array)? #736

Closed
thbar opened this issue May 10, 2021 · 8 comments · Fixed by frictionlessdata/datapackage#38

thbar commented May 10, 2021

I'm a colleague of @geoffreyaldebert, working on a French national CSV schema for "bikes counting" at the moment.

My understanding is that #712 frictionlessdata/frictionless-py#627 introduced a way to restrict allowed values in an array, which is neat.

In our case (based on input from future data producers and reusers), we would like to avoid JSON arrays for those values and use delimiter-separated values instead, which less technical users can write and decode more easily.

The rationale is that we are creating a CSV schema precisely to avoid JSON, which some users find confusing at their current level of technical skill, and thereby to drive adoption.

Our current solution (WIP, the schema is not published yet) is to use a regex pattern:

https://github.com/etalab/schema-comptage-velo/blob/15096e6145b4926530a6fc5126db8cd25e35c803/schema.json#L175-L184

This trick has commonly been used for such cases (e.g. https://schema.data.gouv.fr/etalab/schema-inclusion-numerique/latest/documentation.html#propriété-public_cible).
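
For readers unfamiliar with the regex approach, here is a minimal sketch of how such a pattern can be built. The allowed values and the helper name build_list_pattern are hypothetical and are not taken from the linked schema; the resulting pattern string is the kind of thing one might put in a field's pattern constraint.

```python
import re

# Hypothetical allowed values; the real schema defines its own list.
ALLOWED = ["bike", "scooter", "pedestrian"]


def build_list_pattern(allowed, separator=","):
    """Build a regex matching one or more allowed values joined by `separator`."""
    # Escape each value so characters such as "." or "+" are matched literally.
    item = "(?:" + "|".join(re.escape(value) for value in allowed) + ")"
    sep = re.escape(separator)
    return re.compile(f"^{item}(?:{sep}{item})*$")


pattern = build_list_pattern(ALLOWED)

print(bool(pattern.match("bike,scooter")))  # True
print(bool(pattern.match("bike,car")))      # False: "car" is not allowed
print(bool(pattern.match("bike,")))         # False: trailing separator
```

A pattern like this restricts each item to the allowed set, but it cannot express constraints such as "no duplicates", which is one reason a dedicated array format would be cleaner.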

So my question is: is there room to consider future evolutions to add a "CSV-array" column type, with restrictions on actual values to be in an allowed range?

Thanks!

Member

roll commented May 10, 2021

We discussed it on Discord, and I think type: array; format: separator, allowing something like:

id,array
1,"A,B,C"

might make sense for the specs
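
To make the idea concrete, here is a rough sketch of how a reader could decode such a cell using only the Python standard library. The function parse_delimited_array is a hypothetical helper for illustration, not part of frictionless or of the specs.

```python
import csv
import io


def parse_delimited_array(cell, separator=","):
    """Split a cell into a list of stripped items; an empty cell yields []."""
    if cell == "":
        return []
    return [item.strip() for item in cell.split(separator)]


# The sample data from the comment above.
source = io.StringIO('id,array\n1,"A,B,C"\n')

rows = [
    {"id": int(row["id"]), "array": parse_delimited_array(row["array"])}
    for row in csv.DictReader(source)
]
print(rows)  # [{'id': 1, 'array': ['A', 'B', 'C']}]
```

The CSV reader removes the outer quotes first, so the array separator only has to be handled inside the already-decoded cell value.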

Author

thbar commented May 27, 2021

FWIW, I have had some feedback from users who would appreciate having a non-comma separator (e.g. |), which is a "lower tech" way to achieve this and requires less escaping. I am not sure I want to encourage that, though. Ideally, just using , as the separator would be quite coherent with the regular case.
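
The escaping difference can be seen with a small sketch using Python's csv module with its default (minimal) quoting. The helper to_csv_line is hypothetical and only exists for this illustration.

```python
import csv
import io


def to_csv_line(row):
    """Serialise one row with the csv module's default (minimal) quoting."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue().rstrip("\r\n")


print(to_csv_line([1, "A,B,C"]))  # 1,"A,B,C"
print(to_csv_line([1, "A|B|C"]))  # 1,A|B|C
```

With a comma as the array separator, the cell has to be quoted because it collides with the CSV field delimiter; with | it can be written as-is.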

Thanks for considering this, it would be great to have and would let us clean a few schemas!

AyrtonB commented Aug 12, 2021

I'm currently working on a PR to integrate this. I've made the relevant changes in array.py but now need to integrate it elsewhere and add tests.

Currently I'm getting the error below, which goes away if I remove the format entry for the field:

FrictionlessException: [field-error] Field is not valid: "{'name': 'sett_bmu_id', 'type': 'array', 'format': ', ', 'array_item': {'type': 'string'}, 'description': 'The Balancing Mechanism Unit identifier used for settlement purposes by Elexon', 'title': 'Settlement BMU ID'} is not valid under any of the given schemas" at "" in metadata and at "anyOf" in profile

Where should I be looking to add this to the schema?

Member

roll commented Aug 12, 2021

@AyrtonB
It must be a JSON Schema rule in frictionless/assets/profiles/schema/general.json. We need to update the format definition there for array types.

AyrtonB commented Aug 12, 2021

That makes sense, I'll do that

AyrtonB commented Aug 12, 2021

I'll continue discussion around specifics of this implementation in the PR linked above

Contributor

jze commented Feb 20, 2023

Is it already possible to specify arrays without the square brackets? I would say that is the normal case for CSV files. You have a value like 594866,594868,608288, and each number references a primary key in another CSV file.
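
As an illustration of this use case, here is a minimal sketch of checking such a cell against the primary keys of the referenced file. The function find_missing_references and the key set are hypothetical; this is plain Python, not a frictionless feature.

```python
def find_missing_references(cell, primary_keys, separator=","):
    """Return the items in a delimited cell that are not known primary keys."""
    items = [item.strip() for item in cell.split(separator) if item.strip()]
    return [item for item in items if item not in primary_keys]


# Hypothetical primary keys, as if read from the referenced CSV file.
primary_keys = {"594866", "594868", "608288"}

print(find_missing_references("594866,594868,608288", primary_keys))  # []
print(find_missing_references("594866,123", primary_keys))            # ['123']
```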

Member

roll commented Feb 20, 2023

Hi, I've created a feature request for the framework to pilot the feature:
