Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buggy behavior when dealing with decimals? #29

Closed
ivanlen opened this issue Oct 8, 2020 · 5 comments · Fixed by #30
Closed

buggy behavior when dealing with decimals? #29

ivanlen opened this issue Oct 8, 2020 · 5 comments · Fixed by #30

Comments

@ivanlen
Copy link

ivanlen commented Oct 8, 2020

Hey, I am dealing with this behavior and I don't know if its a bug or I am doing something wrong:

import pandas as pd
import numpy as np
import gspread
import gspread_dataframe
from gspread_dataframe import set_with_dataframe
gc = gspread.service_account()
sh = gc.open('pandas_spreadsheet')
worksheet = sh.get_worksheet(0)

test_df = pd.DataFrame.from_records([{'a': i/500 + np.random.randn(), 'b': i * 2} for i in np.arange(0,1000,100)])
test_df
set_with_dataframe(worksheet, test_df)

Expected:
Screen Shot 2020-10-08 at 15 10 39

Obtained:
image

And if i go to the cell A1 and copy the cell value is -9622382525353560
I thought it was something with the formatting, but this number is not a float.

If I export the df using
set_with_dataframe(worksheet, test_df.round(3))
I obtain the following:
image

And this is a the same test in a new blank worksheet (to avoid any kind of previous formatting problems)

image

Please let me know if there is something that I can do to help you with a PR.

@robin900
Copy link
Owner

robin900 commented Oct 9, 2020

@ivanlen I suspect that a Google spreadsheet's locale settings determine not just the default display formatting of numbers and dates, but also the parsing format rules when interpreting cell value inputs that are in mode USER_ENTERED.

Would you be willing to try the following:

  1. Manually change the locale for this Google spreadsheet to "United States". With the sheet open in a browser, visit File->Spreadsheet settings. Then run your Python script again and see if the float values are parsed correctly as floats.

  2. If everything works fine for your script once spreadsheet locale is "United States", then post here what your original locale was, and we will proceed from there.

(I suspect the solution to the problem will be either:

A) Allow overrides to the valueInputOption value included in Sheets API cell updates; currently the package uses USER_ENTERED to take advantage of the powerful value parsing logic in Sheets, but it's apparently that some users need to bypass USER_ENTERED.

B) Add locale awareness to set_with_dataframe in some fashion, so that formatting inputs using values from the dataframe take the expected locale of the spreadsheet into consideration.)

@ivanlen
Copy link
Author

ivanlen commented Oct 9, 2020

Hey @robin900 , thanks for your reply.
If I change the locale settings of the spreadsheet to US everything works as expected.

The location that causes the bug was Spain. I am actually from Argentina and if a set settings to Argentina the problem persists as well. I suspect there is something with the Spanish language and formatting. Probably any Country that speaks Spanish will have the same behavior.

I am currently in the middle of some projects, but If I have some free time I will inspect the source code of your project and see if I can implement any of your solutions.
I the meantime, I set the locale to US.

robin900 added a commit that referenced this issue Oct 13, 2020
Esnures that numeric values in DataFrames are sent to Sheets API as JSON numeric
values, so that locale-specific parsing of decimal separators cannot misinterpret
the stringified decimal values. Test coverage added.
@robin900
Copy link
Owner

@ivanlen He subido un PR #30 que arregla el problema con locales no de EEUU, sin causar regresiones en el paquete de pruebas. Este PR soluciona el problema para valores numéricos pero no para valores de fecha/datetime. Creo que una solución para valores de fecha/datetime no debe ser incluido en el PR porque 1) no hace falta en este momento porque str(alguna_fecha) ahora produce un string que se puede leer bajo casi todos los locales comunes; y 2) una solución para valores de fecha/datetime tendrá que ser acompañada de pruebas adicionales y completas.

Si tienes un momento y pudieras revisar y probar este PR, te lo agradecería!

robin900 added a commit that referenced this issue Oct 14, 2020
…ale-specific misinterpretation (#30)

* Fixes #29. Ensures that numeric values in DataFrames are sent to Sheets API as JSON numeric
values, so that locale-specific parsing of decimal separators cannot misinterpret
the stringified decimal values. Test coverage included.
@robin900
Copy link
Owner

@ivanlen Release 3.1.1 with this fix ya está en PyPI. Gracias por su ayuda!

@ivanlen
Copy link
Author

ivanlen commented Oct 14, 2020

Hey @robin900 , sorry that I wasn't able to test it before! But Im glad that my Issue helps to improve the overall quality of the package. I will update my version to 3.1.1

Thank you for maintaining this repo,
Cheers from Argentina ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants