Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API should validate input for number columns #1145

Closed
mathemancer opened this issue Mar 7, 2022 · 27 comments · Fixed by #1270
Closed

API should validate input for number columns #1145

mathemancer opened this issue Mar 7, 2022 · 27 comments · Fixed by #1270
Assignees
Labels
affects: architecture Improvements or additions to architecture good first issue Everything in "Help wanted", PLUS being relatively easy and straightforward to implement. help wanted Community contributors can implement this type: bug Something isn't working work: backend Related to Python, Django, and simple SQL

Comments

@mathemancer
Copy link
Contributor

mathemancer commented Mar 7, 2022

Description

Currently, the API accepts strings for values input to number-typed columns. In some cases, these strings carry locale-sensitive information, i.e., using specific decimal points and negation styles. This is a problem since confusion will arise whenever the client, service, and database have different locale settings (it's likely the client and DB will have different locale settings by default). Even worse, the locale settings in the database (assuming PostgreSQL) may be applied differently in different contexts.

Expected behavior

Columns which use a number type for storage at the DB layer should only accept numbers in one of two formats:

  • an actual JSON number, or
  • A string conforming to the JSON number spec, except wrapped in double-quotes.

The validation of this should be locale-independent, and should happen in the Mathesar web service rather than the database.

To Reproduce

  • Create a table with a number-typed column containing a decimal point (e.g., FLOAT).
  • Send an API request with input for that column as a string, with a comma for a decimal point.
    • You can do this easily from the browseable API, see /api/db/v0/tables/<table_ID>/records/<record_ID>/
  • Observe the database-layer error.
@mathemancer mathemancer added type: bug Something isn't working status: triage affects: architecture Improvements or additions to architecture good first issue Everything in "Help wanted", PLUS being relatively easy and straightforward to implement. help wanted Community contributors can implement this ready Ready for implementation work: backend Related to Python, Django, and simple SQL and removed status: triage labels Mar 7, 2022
@mathemancer mathemancer added this to the [07] Initial Data Types milestone Mar 7, 2022
@pavish
Copy link
Member

pavish commented Mar 7, 2022

@mathemancer I think the api should accept string for number-types, atleast for types like 'BIGINT'.

The frontend can only handle Bigint if it's a string. Javascript can only represent it using a dedicated class and there's no way to represent it as a number in JSON.

@pavish pavish added status: draft and removed ready Ready for implementation labels Mar 7, 2022
@kgodey
Copy link
Contributor

kgodey commented Mar 7, 2022

I think that we should enforce that the number doesn't have any non-digit characters, but the actual input can be string or numeric. @mathemancer, I'm assigning this to you to resolve and removing the help wanted labels until this is no longer a draft.

@kgodey kgodey removed good first issue Everything in "Help wanted", PLUS being relatively easy and straightforward to implement. help wanted Community contributors can implement this labels Mar 7, 2022
@mathemancer
Copy link
Contributor Author

@pavish It won't just be BIGINT. NUMERIC is arbitrary-precision, so we'd need to be able to handle large numbers with decimal points. The problem with that is we can't know the locale settings and so on for the front end. Thus, in order to accept strings for numbers, we should probably establish a "canonical style" for numbers, and require the front end to submit any numbers in that style (if they're submitted as strings). This style would specify how negation and decimal points work at a minimum. We wouldn't allow grouping characters. I suggest the canonical style being a string version of the description of numbers in the JSON spec. (Quite irritating, by the way, that the spec appears to be ignored by pretty much every implementation, at least w.r.t. arbitrary precision numbers).

The reason to be so fussy with the style is that mistakes would potentially change input values under various mismatched locale conditions.

If we want to be more forgiving, I suppose we could get away with simply disallowing grouping characters and assuming only one negative. Then any non-negative meaning character would be assumed to be the decimal point, and the presence of either - or both ( and ) would imply negative. Note that in some locales / styles, (-1234) == 1234, but in others, (-1234) == -1234. We would always assume the latter. I'm not exactly comfortable with that, but I raise it as an option for completeness.

@pavish What do you think? Is it acceptable to ensure any number submitted as a string conforms to the JSON spec number style, except being wrapped in double-quotes?

@pavish
Copy link
Member

pavish commented Mar 8, 2022

Is it acceptable to ensure any number submitted as a string conforms to the JSON spec number style, except being wrapped in double-quotes?

@mathemancer Yes, I think we can ensure that from the frontend.

@mathemancer mathemancer changed the title API should validate input type for number columns API should validate input for number columns Mar 8, 2022
@mathemancer mathemancer added ready Ready for implementation good first issue Everything in "Help wanted", PLUS being relatively easy and straightforward to implement. help wanted Community contributors can implement this and removed status: draft labels Mar 8, 2022
@mathemancer
Copy link
Contributor Author

Okay, it should be ready for work now; @kgodey Re-adding the good first issue and help wanted labels, since it they should still apply and the issue is no longer a draft.

@kgodey
Copy link
Contributor

kgodey commented Mar 8, 2022

Sounds good, thanks @mathemancer.

@seancolsen
Copy link
Contributor

establish a "canonical style" for numbers, and require the front end to submit any numbers in that style (if they're submitted as strings)... I suggest the canonical style being a string version of the description of numbers in the JSON spec.

Agreed. Better to be strict here, not forgiving.

@ManishShah120
Copy link
Contributor

Hi, @kgodey @mathemancer I am interested in working on this issue, Can I please be assigned to it. This would be my first issue with this Org and I am interested in adding some useful patches in future.

--Thank You 😄

@pavish
Copy link
Member

pavish commented Mar 9, 2022

@ManishShah120 I've assigned this issue to you. Please let us know if you need any help regarding it.

Glad to have you helping build Mathesar. :)

@kgodey kgodey added status: started and removed ready Ready for implementation labels Mar 9, 2022
@ManishShah120
Copy link
Contributor

Hi @pavish Thank you, I am trying to understand the issue and tested by entering string in the column with number type, but it's
handling the error nicely and we are not able to input string to it. Can I get a more brief about this issue, with example inputs.

--Thank you

@pavish
Copy link
Member

pavish commented Mar 9, 2022

@ManishShah120 This issue focuses only on the backend work. You can see the reproduction steps in the above issue description, which might give you a better idea of the issue.

To Reproduce

Create a table with a number-typed column containing a decimal point (e.g., FLOAT).
Set database locale to be one that uses commas for decimal points
Send a request with input for that column as a string, with a comma for a decimal point.
Observe it being put into the database.

The string should strictly confirm to the JSON number spec.
You would have to change the db locale directly on the database.

@mathemancer looping you in, if you have additional comments.

@kgodey
Copy link
Contributor

kgodey commented Mar 9, 2022

@ManishShah120 I updated the "To Reproduce" step in the issue description with details on how to send ad-hoc API requests, that might help.

@ManishShah120
Copy link
Contributor

@ManishShah120 I updated the "To Reproduce" step in the issue description with details on how to send ad-hoc API requests, that might help.

Thank you @kgodey this helped me alot in debuggin and understanding the flow.

I tested with this values via the browsable API:-

{
    "45": 14,
    "46": 13,
    "47": 12,
    "48": 12,
    "49": 14.0,
    "50": "-13.103434",
    "51": "123e-5",
    "52": "-20.00123"
} 200 response

and with this values:-

{
    "45": 14,
    "46": 13,
    "47": 12,
    "48": 12,
    "49": 14.0,
    "50": "-13,103434",
    "51": 123e-5,
    "52": "-20.00123"
} 500 response

the query looks something like this:-

UPDATE "Test_DB"."Table 3" SET id=14, "Integer4bytes"=13, "Integer8bytes"=12, "Integer2bytes"=12, "DecimalPlaces3"=14.0, "DecimalDigits4"='-13,103434', "Float6digitreal"=0.00123, "Float15digitsdouble"='-20.00123' WHERE "Test_DB"."Table 3".id = '14'

with this as an ERROR response:-

db_1       | 2022-03-10 05:54:28.272 UTC [44] ERROR:  invalid input syntax for type numeric: "-13,103434" at character 139

But is this not what we would expect?? If i can understand it correctly, I am required to add a validation in the API level that it should only accept input in number format and not in string enclosed form,

@silentninja
Copy link
Contributor

silentninja commented Mar 10, 2022

The expected behaviour states that

Columns which use a number type for storage at the DB layer should only accept numbers in one of two formats:

an actual JSON number, or
A string conforming to the JSON number spec, except wrapped in double quotes.

So the API can accept input as a string, but the format should be a JSON spec number. The reason is to avoid conflicts based on locale. For example, let's say I want to set the value as for 99900(Ninety Nine thousand and Nine hundred)

If the frontend is using a US locale, it would be sending 99,900, if the backend happens to be using a locale that uses a comma separator for decimal it would be stored as 99,9 which would mean (Ninety Nine as a whole number and Nine as a decimal)

I am required to add a validation in the API level that it should only accept input in number format and not in string enclosed form

It should validate that the string enclosed format does not contain separators that don't conform to the JSON Spec

@mathemancer
Copy link
Contributor Author

mathemancer commented Mar 10, 2022

@ManishShah120 What does the API error show in that case? We need to move the error handling up to the serializer layer to avoid depending on any service or DB locale settings.

@mathemancer
Copy link
Contributor Author

@ManishShah120 I've updated the issue description with some more details and clarifications. The goal is to avoid letting the database even see a number which doesn't conform to the JSON number spec (as a string or number). It's important that we do the checking ourselves in a locale-independent way.

@ManishShah120
Copy link
Contributor

ManishShah120 commented Mar 10, 2022

@ManishShah120 What does the API error show in that case? We need to move the error handling up to the serializer layer to avoid depending on any service or DB locale settings.

@mathemancer
here's the complete stacktrace:-

{
    "45": 14,
    "46": 13,
    "47": 12,
    "48": "99,9900",
    "49": "14.0",
    "50": 100.00321,
    "51": "99,900",
    "52": "-98,7654"
} 500 response on PATCH request

Response of the API endpoint(with CURL cmd):-

HTTP/1.1 500 Internal Server Error
Allow: GET, PATCH, DELETE, HEAD, OPTIONS
.......
X-Content-Type-Options: nosniff
X-Frame-Options: DENY

[
    {
        "code": 4999,
        "detail": null,
        "field": null,
        "message": "(psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type smallint: \"99,9900\"\nLINE 1: ...er4bytes\"=13, \"Integer8bytes\"=12, \"Integer2bytes\"='99,9900',...\n^\n\n[SQL: UPDATE \"Test_DB\".\"Table 3\" SET id=%(id)s, \"Integer4bytes\"=%(Integer4bytes)s, \"Integer8bytes\"=%(Integer8bytes)s, \"Integer2bytes\"=%(Integer2bytes)s, \"DecimalPlaces3\"=%(DecimalPlaces3)s, \"DecimalDigits4\"=%(DecimalDigits4)s, \"Float6digitreal\"=%(Float6digitreal)s, \"Float15digitsdouble\"=%(Float15digitsdouble)s WHERE \"Test_DB\".\"Table 3\".id = %(id_1)s]\n[parameters: {'id': 14, 'Integer4bytes': 13, 'Integer8bytes': 12, 'Integer2bytes': '99,9900', 'DecimalPlaces3': '14.0', 'DecimalDigits4': 100.00321, 'Float6digitreal': '99,900', 'Float15digitsdouble': '-98,7654', 'id_1': '14'}]\n(Background on this error at: http://sqlalche.me/e/14/9h9h)"
    }
]

Response (from the docker instance)

db_1       | 2022-03-10 10:51:54.525 UTC [45] ERROR:  invalid input syntax for type smallint: "99,9900" at character 95
db_1       | 2022-03-10 10:51:54.525 UTC [45] STATEMENT:  UPDATE "Test_DB"."Table 3" SET id=14, "Integer4bytes"=13, "Integer8bytes"=12, "Integer2bytes"='99,9900', "DecimalPlaces3"='14.0', "DecimalDigits4"=100.00321, "Float6digitreal"='99,900', "Float15digitsdouble"='-98,7654' WHERE "Test_DB"."Table 3".id = '14'
service_1  | Internal Server Error: /api/db/v0/tables/11/records/14/
service_1  | [10/Mar/2022 10:51:54] "PATCH /api/db/v0/tables/11/records/14/ HTTP/1.1" 500 15007

@ManishShah120
Copy link
Contributor

It should validate that the string enclosed format does not contain separators that don't conform to the JSON Spec

Okay got it.

@kgodey
Copy link
Contributor

kgodey commented Mar 18, 2022

@ManishShah120 Are you still working on this?

@ManishShah120
Copy link
Contributor

ManishShah120 commented Mar 19, 2022

Yes, @kgodey am working, I was required to work on certain other Issues I was assigned to so a bit late on this. Will be opening a PR asap.

@ManishShah120
Copy link
Contributor

Hi, I need little help/hint with this, what I am thinking is to write the logic to check if the validated data number is a valid number or not (integer, float) iff the column it is updating is having the column type as NUMERIC(), in this file https://github.com/centerofci/mathesar/blob/master/mathesar/api/serializers/records.py

Can I please get any hint or help with this?

@silentninja
Copy link
Contributor

@ManishShah120 This issue is a bit more involved compared to other good first issues.

You need to check if the data for a NUMERIC column conforms to the JSON spec and not just a numeric. For starters, in the validate method of the serializers I would do something like

  • Get the list of column types
  • For any numeric column, check if the data conforms to JSON schema format.

@kgodey
Copy link
Contributor

kgodey commented Mar 29, 2022

@ManishShah120 Do you need any more help here?

@ManishShah120
Copy link
Contributor

Yes, I am trying to achieve it, we had a discussion on this in the office hour call. it's a bit more challenging.

@ManishShah120 ManishShah120 removed their assignment Apr 1, 2022
@kgodey
Copy link
Contributor

kgodey commented Apr 3, 2022

@ManishShah120 I assume you're no longer working on this since you unassigned yourself?

@kgodey kgodey added ready Ready for implementation and removed status: started labels Apr 3, 2022
@kid-116
Copy link
Contributor

kid-116 commented Apr 5, 2022

Hey, @kgodey @pavish @mathemancer ! I would like to take up this issue. I've setup the codebase and I think I'm ready to work on a problem. Could I be assigned to it?

@seancolsen
Copy link
Contributor

Thanks @kid-116 I've assigned it to you.

@kgodey kgodey added status: started and removed ready Ready for implementation labels Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects: architecture Improvements or additions to architecture good first issue Everything in "Help wanted", PLUS being relatively easy and straightforward to implement. help wanted Community contributors can implement this type: bug Something isn't working work: backend Related to Python, Django, and simple SQL
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

7 participants