Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle human readable number casting in general #2797

Conversation

Aditramesh
Copy link

@Aditramesh Aditramesh commented Apr 10, 2023

Fixes #1309

Adds functionalities to handle casting of text to integer and decimal types which was implemented for numeric types in #1355

Technical details
Uses similar logic for the integer types and added functions for decimal types(real ,double precision ) to work with the logic for numeric

Screenshots

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@rajatvijay rajatvijay added the pr-status: review A PR awaiting review label Apr 10, 2023
@rajatvijay rajatvijay added this to the Backlog milestone Apr 10, 2023
@dmos62
Copy link
Contributor

dmos62 commented Apr 11, 2023

Continuation of #2465

Copy link
Contributor

@mathemancer mathemancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an issue with the integer-finding regex that I found.

Also, please remember to write some tests. Parsing regexes by eyeball is not a robust way for me to check that these have the correct behavior, and we won't be merging this PR without a decent test suite associated with it.

Thank you for your effort so far.

Comment on lines 1104 to 1105
single_digit = r"^[0-9]$"
no_separator = r"[0-9]{2,}(?:([,])[0-9]{1,2}|[0-9]{4,})?"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need separate expressions for single digit and multi digit non-separated integers. An integer with no separator is just a sequence of digits. No other complexity needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure what the no_separator part should match at this point.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought numbers like 12,345 also have to be matched,that's why i was writing those two regex's ,later i realised 12,345 should not be matched so i think these the regex's should look like this now
no_separator = r"^-?\d+$"
comma_separator = r"[0-9]{1,3}(?:(,)[0-9]{3}){2,}"
period_separator = r"[0-9]{1,3}(?:(.)[0-9]{3}){2,}"
comma_separator_lakh_system = r"[0-9]{1,2}(?:(,)[0-9]{2})+,[0-9]{3}?"
single_quote_separator = r"[0-9]{1,3}(?:('')[0-9]{3})+"
space_separator = r"[0-9]{1,3}(?:( )[0-9]{3})+(?:([,])[0-9]+)?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the beginning or end string markers (or the [-] in the individual expression. Those are handled by the wrapping expression.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made those changes and added tests,thanks for the help throughout @mathemancer .

@rajatvijay rajatvijay assigned Aditramesh and unassigned mathemancer Apr 12, 2023
@rajatvijay rajatvijay added pr-status: revision A PR awaiting follow-up work from its author after review and removed pr-status: review A PR awaiting review labels Apr 12, 2023
@kgodey kgodey added pr-status: review A PR awaiting review and removed pr-status: revision A PR awaiting follow-up work from its author after review labels Apr 18, 2023
@mathemancer mathemancer self-requested a review April 26, 2023 13:28
Copy link
Contributor

@mathemancer mathemancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing some test cases. It really helps me see what's going on at a glance. However, there's been a regression in the logic since my last review. Please see the specific comment.

Thanks for your effort on this so far!

Comment on lines 1105 to 1109
no_separator = r"\d+"
comma_or_period_separator = r"([0-9]|[0-9]{3})(([,.])[0-9]{3})+"
comma_separator_lakh_system = r"([0-9]{1,2}(?:(,)[0-9]{2})+),[0-9]{3}"
space_separator = r"[0-9]{1,3}(?:( )[0-9]{3})+"
single_quote_separator = r"([0-9]{1,3})(?:(\'')[0-9]{3})+(?:(\'')?[0-9]*)?"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the comma and period separator ones, we need at least 2 groups of three to be sure that it's an integer. This is crucial in our inference system at the moment. I.e., it's not possible to determine the value of 1,234 without knowing the locale.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sorry my bad ,changing the regex to
comma_or_period_separator = r"([0-9]|[0-9]{3})(([,.])[0-9]{3}){2,}" should fix the issue you are referring to,correct ?

@rajatvijay rajatvijay assigned Aditramesh and unassigned mathemancer Apr 28, 2023
@rajatvijay rajatvijay added pr-status: revision A PR awaiting follow-up work from its author after review and removed pr-status: review A PR awaiting review labels Apr 28, 2023
@github-actions
Copy link

This pull request has not been updated in 45 days and is being marked as stale. It will automatically be closed in 30 days if not updated by then.

@github-actions github-actions bot added the stale label Jun 14, 2023
@rajatvijay
Copy link
Contributor

Closing this due to inactivity.

@rajatvijay rajatvijay closed this Jun 16, 2023
@Aditramesh
Copy link
Author

@rajatvijay the requested changed were made the pr has to be checked once and merged

@kgodey kgodey reopened this Jun 26, 2023
@kgodey kgodey assigned mathemancer and unassigned Aditramesh Jun 26, 2023
@kgodey kgodey added pr-status: review A PR awaiting review and removed stale pr-status: revision A PR awaiting follow-up work from its author after review labels Jun 26, 2023
@github-actions
Copy link

This pull request has not been updated in 45 days and is being marked as stale. It will automatically be closed in 30 days if not updated by then.

@github-actions
Copy link

This pull request has not been updated in 45 days and is being marked as stale. It will automatically be closed in 30 days if not updated by then.

@github-actions github-actions bot added the stale label Sep 30, 2023
@rajatvijay rajatvijay removed the stale label Oct 2, 2023
@mathemancer
Copy link
Contributor

I'm sorry, @Aditramesh . This still isn't working correctly, and I've run out of time for going back-and-forth on this PR. I'm closing it.

I should have done this awhile ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-status: review A PR awaiting review
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Handle human readable number casting in general
5 participants