"." incorrectly interpreted as a thousands separator #46
Comments
Some websites use "." as a thousands separator, so when exactly 3 digits follow the ".", it is interpreted as a thousands separator. If you know the decimal separator (e.g. from the locale or language), you can pass it explicitly; see https://github.com/scrapinghub/price-parser#decimal-separator
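To make the ambiguity concrete, here is a minimal sketch of the kind of heuristic described above. It is a simplified illustration, not price-parser's actual code: the function name `parse_amount` and its internals are hypothetical, and only the idea of an optional `decimal_separator` argument comes from the README link.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_amount(text: str, decimal_separator: Optional[str] = None) -> Decimal:
    """Hypothetical, simplified amount parser (NOT price-parser's real code)."""
    # Keep only digits and the two candidate separators.
    digits = re.sub(r"[^\d.,]", "", text)

    if decimal_separator is None:
        # Heuristic: a trailing separator followed by exactly 3 digits is
        # assumed to be a thousands separator; any other count means decimal.
        match = re.search(r"([.,])(\d+)$", digits)
        if match and len(match.group(2)) != 3:
            decimal_separator = match.group(1)

    if decimal_separator is None or decimal_separator not in digits:
        # No decimal part detected: every separator is a thousands separator.
        return Decimal(re.sub(r"[.,]", "", digits))

    # Split on the last decimal separator and drop separators on the left.
    integer, _, fraction = digits.rpartition(decimal_separator)
    integer = re.sub(r"[.,]", "", integer)
    return Decimal(f"{integer or '0'}.{fraction or '0'}")


print(parse_amount("1.234"))                         # guessed as thousands
print(parse_amount("1.234", decimal_separator="."))  # caller knows better
```

With the guess alone, "1.234" becomes one thousand two hundred thirty-four; passing the separator explicitly is the only way for the caller to say it is a decimal value.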
@AntonGsv btw, if you have some examples of prices in the wild that use "." as a decimal separator with 3 digits after it, that would be useful for our future improvements.
Thanks @AntonGsv, great examples. I'll re-open the ticket; we'd like to be able to handle such cases better. Screenshots attached below.
I think we can modify the amount_float function, something like
Sorry @Manish-210, I'm not sure I understand your proposal - currently
Sorry for the unclear explanation; here is what I mean. The current Example:
It is true that this will add a little work for the user, but guessing it from the string the user provides raises the possibility of error. Let me know your thoughts.
Thanks for the feedback, @Manish-210, I see what you mean. Yes, if someone knows that dots are unlikely to be a thousands separator, then it makes sense to be able to communicate that to the library via some option 👍
Should I try implementing it?
I would first try to agree on an interface. I think this should be an argument to
+1 to making it an argument to
Another approach would be to come at this from the locale standpoint: different locales might prefer different formats. But we'd first need to check whether this makes sense, i.e. whether the way a dot is interpreted also depends on the locale.
We could change the default behavior based on the locale, but I think allowing the flexibility to override that behavior would be best.
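A rough sketch of the "locale default plus explicit override" idea discussed here. Everything in it is hypothetical: the `LOCALE_DECIMAL_SEPARATOR` table, the `resolve_decimal_separator` helper, and its parameter names are made up for illustration and are not part of price-parser.

```python
from typing import Optional

# Illustrative subset only; a real table would need to cover many more locales.
LOCALE_DECIMAL_SEPARATOR = {
    "en_US": ".",
    "en_GB": ".",
    "de_DE": ",",
    "fr_FR": ",",
    "es_ES": ",",
}


def resolve_decimal_separator(
    locale: Optional[str] = None,
    override: Optional[str] = None,
) -> Optional[str]:
    """Pick the decimal separator to assume (hypothetical helper).

    An explicit override always wins; otherwise fall back to the locale's
    conventional separator; otherwise return None so the caller falls back
    to guessing from the string itself.
    """
    if override is not None:
        return override
    if locale is not None:
        return LOCALE_DECIMAL_SEPARATOR.get(locale)
    return None


print(resolve_decimal_separator(locale="de_DE"))                # locale default
print(resolve_decimal_separator(locale="de_DE", override="."))  # override wins
```

Returning None for an unknown locale keeps today's guessing behavior as the fallback, so the option would stay backward compatible.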
Additional examples of thousands separators being recognized incorrectly.
One idea is to check how many digits the currency has after the decimal separator, based on https://en.wikipedia.org/wiki/ISO_4217. This could help avoid some issues with common currencies when exactly 3 digits follow the separator (the only case price_parser treats as a thousands separator: https://github.com/scrapinghub/price-parser/blob/master/price_parser/parser.py#L230C34-L230C80 )
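A minimal sketch of how the ISO 4217 idea might look, assuming the currency code is already known. The `MINOR_UNITS` table is a small hand-picked subset of the ISO 4217 minor-unit exponents, and `dot_can_be_decimal` is a hypothetical helper, not something price-parser provides.

```python
# Number of digits after the decimal separator per ISO 4217 (small subset).
MINOR_UNITS = {
    "USD": 2,
    "EUR": 2,
    "JPY": 0,
    "KWD": 3,
    "BHD": 3,
    "OMR": 3,
    "TND": 3,
}


def dot_can_be_decimal(amount: str, currency: str) -> bool:
    """Return True if the last '.' in `amount` may be a decimal separator.

    The digits after the last dot must fit within the currency's minor-unit
    exponent. Unknown currencies are not ruled out.
    """
    _, sep, fraction = amount.rpartition(".")
    if not sep or not fraction.isdigit():
        return False  # no dot, or nothing numeric after it
    exponent = MINOR_UNITS.get(currency)
    if exponent is None:
        return True  # no information: cannot exclude a decimal reading
    return len(fraction) <= exponent


print(dot_can_be_decimal("1.234", "USD"))  # 3 digits do not fit 2 minor units
print(dot_can_be_decimal("1.234", "KWD"))  # 3 digits fit 3 minor units
```

Note that this check only narrows the options: for a 3-digit currency such as KWD, "1.234" remains ambiguous, so an explicit option would still be needed for those cases.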
@PyExplorer doing it based on currency makes sense, if different currencies have different rules. In addition to that,
Hey guys. I just created a PR for this issue. I'd appreciate any feedback.
Results: