Add decimal place constraint to number fields #641

Open
ezwelty opened this issue Sep 6, 2019 · 5 comments

Comments

@ezwelty
Contributor

ezwelty commented Sep 6, 2019

I am rewriting and publishing an existing dataset as a Data Package (https://gitlab.com/ezwelty/glathida), and it includes decimal place limits on several numeric fields. For now, I have enforced this using a pattern constraint:

"pattern": "^\\-?[0-9]*(\\.[0-9]{0,7})?$"

Unfortunately, this practice violates the schema, which currently insists that pattern only apply to post-cast values of string fields (#428). I understand the complexity that is avoided by the decision, but also regret the huge potential for specificity that is lost. I wish pattern was applied to the field values as stored in the text file (csv, json, or otherwise).

Otherwise, I see no other option than adding a specific decimal place constraint for number fields.
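To make the behaviour of the pattern above concrete, here is a minimal Python sketch that applies it to raw text values. The helper name `has_at_most_7_decimals` is hypothetical, and Python's `re` module is only a stand-in for whatever regex engine an implementation would use.

```python
import re

# The pattern from the schema after JSON unescaping:
# "\\." in the JSON string is "\." in the regular expression.
DECIMAL_PLACES = re.compile(r"^\-?[0-9]*(\.[0-9]{0,7})?$")


def has_at_most_7_decimals(raw: str) -> bool:
    """Return True if the raw text value passes the pattern (hypothetical helper)."""
    return DECIMAL_PLACES.fullmatch(raw) is not None


print(has_at_most_7_decimals("46.1234567"))   # True
print(has_at_most_7_decimals("46.12345678"))  # False
print(has_at_most_7_decimals("-12"))          # True
```

Note that the pattern only checks the shape of the text: it also accepts `""` and `"-"`, which would have to be rejected by the cast to number or by `missingValues` handling.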

@rufuspollock
Contributor

@ezwelty thanks for reporting. If I understand correctly, are you trying to constrain the raw data as stored, or the resulting number?

@ezwelty
Contributor Author

ezwelty commented Sep 9, 2019

@rufuspollock Fundamentally, the number as it is stored in the text file. Once it is read in, the concept of decimal places may be lost (for example, "1.10" could become 1.1 with no knowledge that it was parsed from "1.10"). The original dataset designers wanted to ensure that even if data (e.g. GPS coordinates) were submitted by contributors with absurd numbers of decimal places, they were published rounded to a reasonable number of decimal places.
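The loss of decimal places on parsing can be illustrated with a short Python sketch. Which type an implementation casts to is an assumption here: `float` drops the trailing zero, while `decimal.Decimal` happens to retain it.

```python
from decimal import Decimal

raw = "1.10"

# Casting to a binary float discards the trailing zero.
as_float = float(raw)
print(as_float)  # 1.1

# Decimal keeps the exponent, so the number of decimal places is still recoverable.
as_decimal = Decimal(raw)
print(as_decimal)                       # 1.10
print(-as_decimal.as_tuple().exponent)  # 2
```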

@rufuspollock
Contributor

OK, clear: you want constraints applied at parse time, before casting. Hmm, that seems like it would need something new, right? Do you have a suggestion for this that is generic?

@ezwelty
Contributor Author

ezwelty commented Sep 10, 2019

The simplest I can think of is to allow the pattern constraint on at least all field types with string representations (so everything except JSON data stored as JSON rather than as a string parsable as JSON). The field is read as a string and all values but those in missingValues are tested against pattern before casting to the target type.
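The order of operations described above (read as string, skip missing values, test the pattern, then cast) could be sketched roughly as follows. The function `check_then_cast` and its parameters are hypothetical, and casting to `Decimal` is an assumption made for illustration, not part of any existing implementation.

```python
import re
from decimal import Decimal
from typing import Optional, Sequence


def check_then_cast(
    raw: str,
    pattern: str,
    missing_values: Sequence[str] = ("",),
) -> Optional[Decimal]:
    """Hypothetical helper: test the raw string against pattern, then cast."""
    if raw in missing_values:
        # Missing values are never tested against the pattern.
        return None
    if re.fullmatch(pattern, raw) is None:
        raise ValueError(f"{raw!r} does not match pattern {pattern!r}")
    # Only values that passed the pattern reach the cast.
    return Decimal(raw)


pattern = r"^\-?[0-9]*(\.[0-9]{0,7})?$"
print(check_then_cast("1.10", pattern))  # 1.10
print(check_then_cast("", pattern))      # None
```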

So for example, percentages with up to one decimal place (e.g. "95.2%" and "95%"):

{
  "type": "number",
  "decimalChar": ".",
  "bareNumber": false,
  "constraints": {
    "minimum": 0
  }
}

could be more specifically constrained by:

{
  "type": "number",
  "bareNumber": false,
  "constraints": {
    "pattern": "^[0-9]+(\\.[0-9]{1})?%$"
  }
}
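As a quick check of what this percentage pattern accepts and rejects, here is a small Python sketch (the helper name `matches_percent` is hypothetical):

```python
import re

# The percentage pattern after JSON unescaping.
PERCENT = re.compile(r"^[0-9]+(\.[0-9]{1})?%$")


def matches_percent(raw: str) -> bool:
    """Return True if the raw text value passes the percentage pattern."""
    return PERCENT.fullmatch(raw) is not None


for raw in ["95.2%", "95%", "95.25%", "-5%", "95"]:
    print(raw, matches_percent(raw))
```

Only the first two values pass: two decimal places, a leading minus sign, and a missing `%` are all rejected before any cast to number takes place.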

and integer geopoints in the eastern hemisphere stored as a string parsable as a JSON array (e.g. "[90, 45]"):

{
  "type": "geopoint",
  "format": "array"
}

could be more specifically constrained by:

{
  "type": "geopoint",
  "format": "array",
  "constraints": {
    "pattern": "^\\[[0-9]{1,3}, \\-?[0-9]{1,2}\\]$"
  }
}
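The array pattern can be exercised the same way. This is a minimal Python sketch with a hypothetical helper name, `matches_geopoint_array`:

```python
import re

# The geopoint array pattern after JSON unescaping.
GEOPOINT_ARRAY = re.compile(r"^\[[0-9]{1,3}, \-?[0-9]{1,2}\]$")


def matches_geopoint_array(raw: str) -> bool:
    """Return True if the raw text value passes the geopoint array pattern."""
    return GEOPOINT_ARRAY.fullmatch(raw) is not None


for raw in ["[90, 45]", "[90, -45]", "[-90, 45]", "[90.5, 45]", "[90,45]"]:
    print(raw, matches_geopoint_array(raw))
```

The first two values pass and the rest fail, including `"[90,45]"`, because the pattern requires exactly one space after the comma. The pattern constrains only the textual shape: `"[999, 99]"` would also pass, so range checks would still have to happen after casting.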

The one caveat I can think of – the one brought up by @pwalsh in #428 – is how to deal with JSON data stored as JSON. I presume that JSON values, arrays, and objects could be either read as raw strings or converted back to strings for pattern testing? JSON objects quickly get unwieldy for pattern testing, but it can still be done. For example, integer geopoints in the eastern hemisphere stored as a JSON object:

{
  "type": "geopoint",
  "format": "object",
  "constraints": {
    "pattern": "^\\{\"lon\":\\s*[0-9]{1,3},\\s*\"lat\":\\s*\\-?[0-9]{1,2}\\}$"
  }
}
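To show why pattern testing on JSON objects gets unwieldy, the following Python sketch converts objects back to strings with the standard `json` module and tests them. It assumes the longitude part of the pattern is meant to be `[0-9]{1,3}`, and that serialization with `json.dumps` is an acceptable stand-in for "converted back to strings". The helper name `matches_geopoint_object` is hypothetical.

```python
import json
import re

# The geopoint object pattern after JSON unescaping
# (assuming the longitude part is [0-9]{1,3}).
GEOPOINT_OBJECT = re.compile(r'^\{"lon":\s*[0-9]{1,3},\s*"lat":\s*\-?[0-9]{1,2}\}$')


def matches_geopoint_object(raw: str) -> bool:
    """Return True if the serialized object passes the pattern."""
    return GEOPOINT_OBJECT.fullmatch(raw) is not None


default = json.dumps({"lon": 90, "lat": 45})                         # '{"lon": 90, "lat": 45}'
compact = json.dumps({"lon": 90, "lat": 45}, separators=(",", ":"))  # '{"lon":90,"lat":45}'
reordered = json.dumps({"lat": 45, "lon": 90})                       # '{"lat": 45, "lon": 90}'

print(matches_geopoint_object(default))    # True
print(matches_geopoint_object(compact))    # True
print(matches_geopoint_object(reordered))  # False
```

The same geopoint fails once the keys are serialized in a different order, so the result depends on how the object is turned into a string, not only on its content.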

@dafeder
Contributor

dafeder commented Apr 16, 2024

As discussed in #879 I think this is important but using regex on numbers seems very wrong.
