Handle human readable number casting in general #2797
Continuation of #2465
I found an issue with the integer-finding regex.
Also, please remember to write some tests. Parsing regexes by eyeball is not a robust way for me to check that these have the correct behavior, and we won't be merging this PR without a decent test suite associated with it.
Thank you for your effort so far.
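A table-driven check makes regex behavior visible at a glance, which is what the request for tests is after. The sketch below uses only the standard library; `find_mismatches` and `COMMA_CASES` are hypothetical names, the expected values are illustrative, and the pattern under test is the `comma_separator` expression proposed later in this thread.

```python
import re

def find_mismatches(pattern, cases):
    """Return (text, expected, actual) for each case the pattern gets wrong.

    re.fullmatch requires the whole string to match, which mirrors an
    expression that is wrapped in ^...$ anchors.
    """
    compiled = re.compile(pattern)
    mismatches = []
    for text, expected in cases:
        actual = compiled.fullmatch(text) is not None
        if actual != expected:
            mismatches.append((text, expected, actual))
    return mismatches

# Hypothetical expectations for comma-grouped integers.
COMMA_CASES = [
    ("1,234,567", True),
    ("12,345,678", True),
    ("1,234", False),         # ambiguous without knowing the locale
    ("1234,567,890", False),  # leading group too long
]

# comma_separator as proposed later in this thread.
candidate = r"[0-9]{1,3}(?:(,)[0-9]{3}){2,}"
mismatches = find_mismatches(candidate, COMMA_CASES)
if mismatches:
    for text, expected, actual in mismatches:
        print(f"{text!r}: expected {expected}, got {actual}")
else:
    print("all cases pass")
```

If the project's suite runs on pytest, the same table maps naturally onto `pytest.mark.parametrize`, so every case shows up as its own pass or fail.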
db/types/operations/cast.py
single_digit = r"^[0-9]$"
no_separator = r"[0-9]{2,}(?:([,])[0-9]{1,2}|[0-9]{4,})?"
I don't think you need separate expressions for single-digit and multi-digit non-separated integers. An integer with no separator is just a sequence of digits; no other complexity is needed.
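To illustrate the point, a one-or-more-digits pattern already covers everything the single-digit pattern accepts. This sketch assumes whole-string matching via `re.fullmatch`, so the `^...$` anchors from the reviewed snippet are dropped; `SINGLE_DIGIT` and `DIGITS_ONLY` are illustrative names.

```python
import re

# Patterns from the reviewed snippet, minus the ^...$ anchors
# (re.fullmatch anchors both ends itself).
SINGLE_DIGIT = r"[0-9]"
DIGITS_ONLY = r"[0-9]+"  # one or more digits: the whole "no separator" case

samples = ["7", "42", "1234567", "", "12a"]
for s in samples:
    single = re.fullmatch(SINGLE_DIGIT, s) is not None
    digits = re.fullmatch(DIGITS_ONLY, s) is not None
    # Anything the single-digit pattern accepts, the digits-only one accepts too.
    assert not single or digits
    print(f"{s!r:12} single={single} digits={digits}")
```

One design note: in Python 3, `\d` in a `str` pattern matches any Unicode decimal digit (not just `0-9`) unless `re.ASCII` is set, so `[0-9]+` and `\d+` are not always interchangeable.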
I'm also not sure what the no_separator part should match at this point.
I thought numbers like 12,345 also had to be matched, which is why I wrote those two regexes. I later realised 12,345 should not be matched, so I think the regexes should now look like this:
no_separator = r"^-?\d+$"
comma_separator = r"[0-9]{1,3}(?:(,)[0-9]{3}){2,}"
period_separator = r"[0-9]{1,3}(?:(.)[0-9]{3}){2,}"
comma_separator_lakh_system = r"[0-9]{1,2}(?:(,)[0-9]{2})+,[0-9]{3}?"
single_quote_separator = r"[0-9]{1,3}(?:('')[0-9]{3})+"
space_separator = r"[0-9]{1,3}(?:( )[0-9]{3})+(?:([,])[0-9]+)?"
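A quick check of two of the expressions above with Python's `re` module, assuming they are applied to the whole string (as `re.fullmatch` does): the unescaped `.` in `period_separator` matches any character, and `('')` in `single_quote_separator` requires two apostrophes. The `period_fixed` and `quote_fixed` variants are one possible repair, offered as a sketch rather than the PR's final code.

```python
import re

# Two of the expressions proposed above, copied verbatim.
period_separator = r"[0-9]{1,3}(?:(.)[0-9]{3}){2,}"
single_quote_separator = r"[0-9]{1,3}(?:('')[0-9]{3})+"

# Unescaped "." matches ANY character, so other separators slip through.
assert re.fullmatch(period_separator, "1.234.567")
assert re.fullmatch(period_separator, "1x234x567")   # unintended match
assert re.fullmatch(period_separator, "1,234,567")   # comma also accepted

# "''" is two apostrophes, so single-apostrophe grouping is rejected.
assert re.fullmatch(single_quote_separator, "1'234'567") is None
assert re.fullmatch(single_quote_separator, "1''234''567")

# Possible fixes (sketch): escape the period, use a single apostrophe.
period_fixed = r"[0-9]{1,3}(?:\.[0-9]{3}){2,}"
quote_fixed = r"[0-9]{1,3}(?:'[0-9]{3})+"
assert re.fullmatch(period_fixed, "1x234x567") is None
assert re.fullmatch(period_fixed, "1.234.567")
assert re.fullmatch(quote_fixed, "1'234'567")
```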
You don't need the beginning- or end-of-string markers (or the [-]) in the individual expressions. Those are handled by the wrapping expression.
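The wrapping idea can be sketched as follows: each separator style stays a bare fragment, and the optional sign plus anchoring are applied once around the alternation. `SEPARATOR_PATTERNS`, `INTEGER_RE`, and `looks_like_integer` are hypothetical names, and the fragments are illustrative, not the wrapper actually used in `cast.py`.

```python
import re

# Individual separator styles: no anchors and no sign, as suggested above.
# These fragments are illustrative, not Mathesar's final expressions.
SEPARATOR_PATTERNS = [
    r"[0-9]+",                       # no separator
    r"[0-9]{1,3}(?:,[0-9]{3}){2,}",  # 1,234,567
    r"[0-9]{1,3}(?:\.[0-9]{3}){2,}", # 1.234.567
    r"[0-9]{1,3}(?: [0-9]{3})+",     # 1 234 567
]

# Hypothetical wrapper: the sign is added exactly once, and fullmatch
# supplies the ^...$ anchoring (without "$" accepting a trailing newline).
INTEGER_RE = re.compile(r"-?(?:" + "|".join(SEPARATOR_PATTERNS) + r")")

def looks_like_integer(text: str) -> bool:
    return INTEGER_RE.fullmatch(text) is not None
```

For example, `looks_like_integer("-1,234,567")` is true while `looks_like_integer("1,234")` is false, because none of the fragments accepts a single comma group.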
I have made those changes and added tests. Thanks for the help throughout, @mathemancer.
Thank you for writing some test cases. It really helps me see what's going on at a glance. However, there's been a regression in the logic since my last review. Please see the specific comment.
Thanks for your effort on this so far!
db/types/operations/cast.py
no_separator = r"\d+"
comma_or_period_separator = r"([0-9]|[0-9]{3})(([,.])[0-9]{3})+"
comma_separator_lakh_system = r"([0-9]{1,2}(?:(,)[0-9]{2})+),[0-9]{3}"
space_separator = r"[0-9]{1,3}(?:( )[0-9]{3})+"
single_quote_separator = r"([0-9]{1,3})(?:(\'')[0-9]{3})+(?:(\'')?[0-9]*)?"
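As a spot check of the `comma_separator_lakh_system` expression in this revision, assuming whole-string matching: it accepts Indian (lakh/crore) grouping such as 1,00,000 and rejects both a lone comma group and Western three-digit grouping.

```python
import re

# comma_separator_lakh_system from the revision above, copied verbatim.
lakh = r"([0-9]{1,2}(?:(,)[0-9]{2})+),[0-9]{3}"

for text in ["1,00,000", "12,34,56,789", "1,000", "1,000,000"]:
    print(text, re.fullmatch(lakh, text) is not None)
# 1,00,000 True
# 12,34,56,789 True
# 1,000 False
# 1,000,000 False
```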
For the comma and period separator expressions, we need at least two groups of three digits to be sure that the value is an integer. This is crucial for our inference system at the moment. For example, it's not possible to determine the value of 1,234 without knowing the locale.
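The reasoning behind "at least two groups" can be made concrete: a string with one comma could be a grouped integer (en-US) or a decimal (de-DE, where the comma is the decimal mark), but a string with two commas cannot be a decimal in either locale. The two patterns below are illustrative, not the exact ones from the PR.

```python
import re

# Illustrative comma-grouping patterns (not the exact ones from the PR).
at_least_one_group = r"[0-9]{1,3}(?:,[0-9]{3})+"
at_least_two_groups = r"[0-9]{1,3}(?:,[0-9]{3}){2,}"

def accepts(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

# "1,234" is 1234 in en-US but 1.234 in de-DE, so only the stricter
# pattern safely refuses to call it an integer.
assert accepts(at_least_one_group, "1,234")
assert not accepts(at_least_two_groups, "1,234")

# Two commas cannot both be decimal marks, so this is unambiguous.
assert accepts(at_least_two_groups, "1,234,567")
```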
Oh, sorry, my bad. Changing the regex to
comma_or_period_separator = r"([0-9]|[0-9]{3})(([,.])[0-9]{3}){2,}"
should fix the issue you are referring to, correct?
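Testing the revised expression with whole-string matching shows two remaining gaps: the leading group allows one or three digits but not two, and because `[,.]` is re-evaluated for each group, mixed separators pass. The `consistent` pattern is a sketch of one alternative; it uses a numbered backreference (`\1`) so every separator must repeat the first, a syntax Python shares with PostgreSQL's regex flavor.

```python
import re

# The revision proposed above.
proposed = r"([0-9]|[0-9]{3})(([,.])[0-9]{3}){2,}"

# Sketch of an alternative: 1-3 leading digits, then the first separator
# captured in group 1 and repeated via \1 for every later group.
consistent = r"[0-9]{1,3}([,.])[0-9]{3}(?:\1[0-9]{3})+"

def accepts(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

assert not accepts(proposed, "12,345,678")   # two leading digits rejected
assert accepts(proposed, "1,234.567")        # mixed separators accepted
assert accepts(consistent, "12,345,678")
assert not accepts(consistent, "1,234.567")
assert not accepts(consistent, "1,234")      # still needs two groups
```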
This pull request has not been updated in 45 days and is being marked as stale. It will automatically be closed in 30 days if not updated by then.
Closing this due to inactivity.
@rajatvijay The requested changes were made; the PR just needs one more check before it can be merged.
I'm sorry, @Aditramesh. This still isn't working correctly, and I've run out of time for going back and forth on this PR. I'm closing it. I should have done this a while ago.
Fixes #1309
Adds functionality to handle casting text to the integer and decimal types, as was already implemented for the numeric type in #1355.
Technical details
Uses similar logic for the integer types, and adds functions for the decimal types (real, double precision) so that they work with the logic used for numeric.
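The general idea of casting a human-readable integer (recognize an unambiguous grouping, strip the separator, convert) can be sketched in plain Python. This is only an illustration under stated assumptions: `GROUPED_INT` and `parse_grouped_integer` are hypothetical names, the separator set is illustrative, and the real casting logic lives in db/types/operations/cast.py and is not reproduced here.

```python
import re

# Hypothetical pattern: optional sign, 1-3 leading digits, then at least two
# groups of three that all reuse the first separator (comma, period, space,
# or apostrophe).
GROUPED_INT = re.compile(r"-?[0-9]{1,3}([,. '])[0-9]{3}(?:\1[0-9]{3})+")

def parse_grouped_integer(text: str) -> int:
    """Convert e.g. '1,234,567' or '-1 234 567' to an int (sketch only)."""
    text = text.strip()
    if re.fullmatch(r"-?[0-9]+", text):
        return int(text)
    match = GROUPED_INT.fullmatch(text)
    if match is None:
        raise ValueError(f"not an unambiguous integer: {text!r}")
    # Remove every occurrence of the detected separator, then convert.
    return int(text.replace(match.group(1), ""))
```

Rejecting "1,234" with a `ValueError` mirrors the review's point that a single group is locale-dependent; extending the same approach to real and double precision would need separate handling of the decimal mark.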
Screenshots
Checklist
Developer Certificate of Origin