Inferring dtypes in get_as_dataframe #1
Comments
Very nice description of the problem. I also am unsure if the solution belongs in gspread-dataframe. What do you think of the arguments to
Does it belong here? (IMHO: Yes) Your work thus far has removed (all?) the real pain points like batching, offsets, and performance. The remaining problems are minor, and as of now, I can list only two:
If we can make an approximate solution to both problems by just trying out the three different APII see three options, each having its pros and cons:
Here are some suggestions for how we could approach each of the options: Consistency with
I wonder: Since I'm looking especially at https://github.com/pandas-dev/pandas/blob/master/pandas/io/parsers.py#L1852
Actually I think we may have to go all the way to https://github.com/pandas-dev/pandas/blob/master/pandas/io/parsers.py#L1101
@NTAWolf Turns out to be very easy to hook up https://github.com/robin900/gspread-dataframe/tree/pandas-parser Note: None of the
pandas-parser branch now supports
Great job! It works like a charm for me :-)
OK, I will be adding some tests to exercise the different keyword arguments for In the meantime, a quick recipe with the current release is below. (It will always
@NTAWolf I've opened #2 to represent the switch to
…ell values in a DataFrame. Deal with regression where float precision is mangled during round-trip testing, by using repr() on float values and str() on other values. Fixes #1.
This is an enhancement proposal.
For my use case, it could be nice if gspread-dataframe was able to try to infer column dtypes when fetching data from a sheet. While individual cells are converted through `numericise`, their column dtype remains `object`, and the returned dataframe fails equality checks with the original dataframe.
Motivating example
Suggested solution
I am unsure what is the best way to deal with this, and whether it is a general enough use-case to warrant an addition to `gspread-dataframe`. At any rate, the following code is my initial stab at how dtype inference could be implemented:
It intentionally places timedelta before datetime, as '00:03:00' can be interpreted as either one by pandas. In my use-case, datetimes always include a date, so '00:03:00' would definitely be a timedelta.
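As a rough illustration of that ordering (this is not the code originally posted in the issue), here is a minimal sketch that tries a fixed sequence of pandas converters on each object column and keeps the first one that parses every value. The function name `infer_column_dtypes` and the choice to try numeric conversion first are assumptions made for this example.

```python
import pandas as pd


def infer_column_dtypes(df):
    """Return a copy of df with object columns converted where possible.

    Hypothetical helper for illustration; not part of gspread-dataframe.
    """
    # Order matters: timedelta is tried before datetime because a string
    # such as '00:03:00' can be parsed as either one.
    converters = (pd.to_numeric, pd.to_timedelta, pd.to_datetime)
    out = df.copy()
    for name in out.columns:
        column = out[name]
        if column.dtype != object:
            continue  # already has a specific dtype
        for convert in converters:
            try:
                out[name] = convert(column)
                break  # keep the first converter that accepts every value
            except (ValueError, TypeError):
                continue  # fall through to the next converter
    return out
```

A column that none of the converters accept, such as free text, is left as object. If your datetimes can appear without a date part, this ordering will misclassify them as timedeltas.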
Take it for a spin!