[IMP] web: more lenient number parser #115227

Draft: caburj wants to merge 1 commit into master from master-web-float-input-parsing-jcb

caburj
Contributor

@caburj caburj commented Mar 14, 2023

PURPOSE

Avoid faulty inputs caused by the parser confusing the thousands separator
with the decimal separator. We must let go of the (wrong) assumption that people will
paste numbers in the format defined by their locale. Number parsing should depend
on the locale as little as possible.

The locale format is still used for formatting, so this setting remains
useful, but parsing should be much more locale-agnostic.

HOW

The purpose is achieved by doing the following parsing steps:

  1. Parse the input with the original, stricter parser, which relies on the locale.
  2. If the first step fails, parse the input again with a more lenient heuristic.
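
The two-step fallback could be sketched as follows. This is a minimal illustration, not the code from this PR: `parseStrict`, `parseNumber`, the `locale` object shape, and the placeholder lenient parser are all hypothetical names chosen for the example.

```javascript
// Hypothetical names: parseStrict, parseNumber and the locale object shape
// are illustrative assumptions, not Odoo's actual API.

function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strict step: accept only input written in the locale's own format.
function parseStrict(input, locale) {
    const t = escapeRegExp(locale.thousandsSep);
    const d = escapeRegExp(locale.decimalPoint);
    // Plain digits, or digits grouped by the thousands separator,
    // optionally followed by the decimal point and more digits.
    const pattern = new RegExp(`^[+-]?(\\d+|\\d{1,3}(${t}\\d{3})+)(${d}\\d+)?$`);
    const trimmed = input.trim();
    if (!pattern.test(trimmed)) {
        throw new Error(`"${input}" does not match the locale format`);
    }
    const withoutGroups = trimmed.split(locale.thousandsSep).join("");
    return Number(withoutGroups.replace(locale.decimalPoint, "."));
}

// Try the strict parser first; fall back to the lenient one only on failure.
function parseNumber(input, locale, lenientParser) {
    try {
        return parseStrict(input, locale);
    } catch (_error) {
        return lenientParser(input, locale);
    }
}

// Usage with a German-like locale and a deliberately trivial placeholder
// for the lenient step (it only strips commas; it is NOT the heuristic).
const germanLike = { thousandsSep: ".", decimalPoint: "," };
const placeholderLenient = (input) => Number(input.replace(/,/g, ""));

console.log(parseNumber("1.234,56", germanLike, placeholderLenient)); // 1234.56 (strict step)
console.log(parseNumber("1,234.56", germanLike, placeholderLenient)); // 1234.56 (fallback step)
```

Passing the lenient parser as an argument keeps the sketch self-contained; in a real implementation the two steps would more likely live in the same module.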

The lenient parsing heuristic follows these steps:

  • Remove all whitespace.
  • Treating "dot" and "comma" as the only possible separators, collect them from the input in order of appearance.
  • If the number of separators is one
    • Check if it's a thousands separator from the locale
      • If so, remove it from the input
      • Otherwise, replace it with "dot" (as decimal point).
  • If the number of separators is two
    • Check if they're the same.
      • If so, then remove them because they're thousands separators.
      • Otherwise, the first is a thousands separator and the second is the decimal point.
        • Remove the thousands separator and replace the decimal point with "dot".
  • If the number of separators is more than two
    • All separators but the last should be thousands separators, while the last one is the decimal point.
    • Check if the leading separators (all but the last) are all the same
      • If not, throw an error: the input can't be a number.
    • Check if the leading separators are the same as the last
      • If so, then remove them all from the input (they're just thousands separators)
      • Otherwise, remove the thousands separators and replace the decimal point with "dot".
  • Convert the resulting input to a number using Number.
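
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `parseLenient` and its `thousandsSep` parameter are hypothetical names, and the final rejection of empty or `NaN` results is an added guard that the description does not specify. The "two separators" and "more than two" cases are folded into one branch, since with two separators the "all leading separators are the same" check holds trivially.

```javascript
// Hypothetical sketch; parseLenient and thousandsSep are illustrative
// names, not Odoo's actual API.
function parseLenient(input, thousandsSep) {
    // Remove all whitespace.
    const text = input.replace(/\s+/g, "");
    // Collect dots and commas in order of appearance.
    const seps = text.match(/[.,]/g) || [];
    let normalized;
    if (seps.length === 0) {
        normalized = text;
    } else if (seps.length === 1) {
        // A lone separator is dropped if it is the locale's thousands
        // separator; otherwise it is treated as the decimal point.
        normalized =
            seps[0] === thousandsSep
                ? text.replace(seps[0], "")
                : text.replace(seps[0], ".");
    } else {
        const last = seps[seps.length - 1];
        const leading = seps.slice(0, -1);
        // All separators before the last one must be identical.
        if (leading.some((sep) => sep !== leading[0])) {
            throw new Error(`"${input}" is not a number`);
        }
        if (leading[0] === last) {
            // Only one kind of separator: all are thousands separators.
            normalized = text.replace(/[.,]/g, "");
        } else {
            // Drop the thousands separators, use "." as the decimal point.
            normalized = text.split(leading[0]).join("").replace(last, ".");
        }
    }
    const value = Number(normalized);
    // Assumption (not in the description): reject empty or non-numeric results.
    if (normalized === "" || Number.isNaN(value)) {
        throw new Error(`"${input}" is not a number`);
    }
    return value;
}

console.log(parseLenient("1 234,56", "."));     // 1234.56
console.log(parseLenient("1,234", ","));        // 1234
console.log(parseLenient("1,234.56", "."));     // 1234.56
console.log(parseLenient("1.234.567", ","));    // 1234567
console.log(parseLenient("1.234.567,89", ".")); // 1234567.89
```

An input such as `"1.234,567.89"` mixes two kinds of leading separators and throws. The locale only matters in the single-separator case, which is where the heuristic cannot otherwise tell a thousands separator from a decimal point.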

TASK-ID: 3092583


I confirm I have signed the CLA and read the PR guidelines at www.odoo.com/submit-pr

@robodoo
Contributor

robodoo commented Mar 14, 2023

@caburj caburj force-pushed the master-web-float-input-parsing-jcb branch from 9b3e875 to 52e0fc0 Compare March 14, 2023 16:13
@C3POdoo C3POdoo added the RD research & development, internal work label Mar 14, 2023
@caburj caburj force-pushed the master-web-float-input-parsing-jcb branch 3 times, most recently from 0adddea to a8b5476 Compare March 15, 2023 13:41
@caburj caburj changed the title WIP: working [IMP] web: more lenient number parser Mar 15, 2023
@caburj caburj force-pushed the master-web-float-input-parsing-jcb branch 3 times, most recently from 41cb0d3 to 56885fb Compare March 15, 2023 14:10
@caburj caburj force-pushed the master-web-float-input-parsing-jcb branch from 56885fb to dfd3a4c Compare March 16, 2023 12:43