include_tailing_empty=False causes crash in get_as_df when last column is empty #423

aaroncooper · 2020-05-06T01:55:23Z

Describe the bug
When using worksheet.get_as_df and setting include_tailing_empty=False, a completely empty final named column leads to a crash. It'd be nicer to gracefully return a df with

To Reproduce
Steps to reproduce the behavior: make a Google sheet whose rightmost named column has only empty cells below the header, then call worksheet.get_as_df with include_tailing_empty=False.

code here ...

  File "/app/orf_optimize.py", line 286, in main
    df = wks.get_as_df(start='A1', empty_value=np.nan, include_tailing_empty=False)
  File "/usr/local/lib/python3.7/site-packages/pygsheets/worksheet.py", line 1412, in get_as_df
    df = pd.DataFrame(values, columns=keys)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 474, in __init__
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 461, in to_arrays
    return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 500, in _list_to_arrays
    raise ValueError(e) from e
ValueError: 6 columns passed, passed data had 5 columns
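
For reference, the last frame of the traceback can be reproduced with pandas alone, without a Google Sheet. This is a minimal sketch, assuming the sheet yields six header names while each data row has been trimmed to five cells; the column names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: six header names, but every row is one cell short,
# as happens when the trailing empty cells are dropped from each row.
keys = ["a", "b", "c", "d", "e", "f"]
values = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

try:
    pd.DataFrame(values, columns=keys)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a frame when the row width and the number
    # of column names disagree.
    error_message = str(exc)

print(error_message)
```

The mismatch between len(keys) and the row width is what triggers the exception, so any fix has to make those two numbers agree before the DataFrame is built.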

System Information

OS: OSX
pygsheets version : 2.0.3.1
pygsheets installed from (github or pypi): pypi

nithinmurali · 2020-05-09T06:56:46Z

Hi, it shouldn't throw an error, but it does make sense to forcefully enable include_tailing_empty=True when has_header=True, because the values should match the number of columns.

May I ask what behavior you are expecting when calling with has_header=True and include_tailing_empty=True?

aaroncooper · 2020-05-09T15:13:05Z

With has_header=True, include_tailing_empty=True, it makes sense to return all the columns on the Google sheet, which is the behavior I see now. I was thinking that with `has_header=True, include_tailing_empty=False`, it would be better to return as many columns as there are header entries. That would avoid the above error, wouldn't it?

aaroncooper · 2020-05-09T15:13:58Z

Right now, I'm getting around this with this block of code, which seems a little convoluted:

# Fetch every column, then keep only the columns whose header is non-empty.
df = wks.get_as_df(include_tailing_empty=True)
non_empty_cols = [col for col in df if col != '']
df = df[non_empty_cols]
df = df.fillna('')

I realize that non_empty_cols would be more properly named has_header_cols in the context of our discussion.

nithinmurali · 2020-05-14T12:48:55Z

The error occurs because when include_tailing_empty=False and some rows are empty, there is a mismatch. The solution is either to auto-fill the values according to the length of the header, or to always set include_tailing_empty=True when has_header is True.

I think it's better to go with the 1st option.

aaroncooper · 2020-05-14T16:08:33Z

I agree with you: if the cells are empty in the Google sheet, it makes sense to auto-fill with empty_value.
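
The auto-fill idea discussed above can be sketched in plain Python. This is only an illustration of option 1, not pygsheets code: pad_rows is a hypothetical helper, and the header and rows are made-up sample data.

```python
def pad_rows(rows, width, empty_value=""):
    """Right-pad each row with empty_value so it has at least `width` cells.

    Hypothetical helper for illustration; rows that are already wide
    enough are returned unchanged.
    """
    return [list(row) + [empty_value] * (width - len(row)) for row in rows]


header = ["id", "name", "notes"]    # the "notes" column is empty in the sheet
rows = [[1, "alice"], [2, "bob"]]   # trailing empty cells were dropped

padded = pad_rows(rows, len(header))
print(padded)  # [[1, 'alice', ''], [2, 'bob', '']]
```

After padding, every row has as many cells as the header, so passing padded and header to pd.DataFrame would no longer hit the column-count mismatch.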

shenker · 2020-10-03T15:45:13Z

Just wanted to chime in to say it'd be great to get this fixed!

RussianImperialScott · 2021-02-01T02:23:44Z

> The error occurs because when include_tailing_empty=False and some rows are empty, there is a mismatch. The solution is either to auto-fill the values according to the length of the header, or to always set include_tailing_empty=True when has_header is True.
>
> I think it's better to go with the 1st option.

Looking at the commit that closed out this issue (and the behavior of .get_as_df() that caused me to look into this), it looks like the 2nd option was chosen instead. This results in extra columns with empty strings as names being returned, as previously described. Is there a particular reason why?

If I were to submit a pull request to change this behavior to option 1 instead, would it be better for the .get_values() call to return a perfect rectangle, or for .get_as_df() to fill in the missing values for the empty columns that have header entries?

Just a side note that returning extra columns with empty strings as names is very hard to debug. Interactively, pandas printouts do not make it obvious that those columns exist. And by specifying include_tailing_empty=False, the user doesn't even expect those phantom columns to exist. I would very much like for this to not be the default behavior. Thanks!

When include_tailing_empty=False, now returns a DataFrame just as wide as necessary to accommodate the header/data. Issues a warning if has_header=True and >=1 column name is an empty string.
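
The behavior described in that summary could look roughly like the sketch below. It is not the code from the pull request: fit_table is a hypothetical name, and the logic is an assumption based only on the one-line description (make the table just wide enough for both header and data, and warn when a column name is an empty string).

```python
import warnings


def fit_table(header, rows, empty_value=""):
    """Pad header and rows to a common width and warn about blank names.

    Hypothetical helper for illustration only.
    """
    # The table must be wide enough for the header and for the longest row.
    width = max([len(header)] + [len(row) for row in rows])
    header = list(header) + [""] * (width - len(header))
    rows = [list(row) + [empty_value] * (width - len(row)) for row in rows]
    if any(name == "" for name in header):
        warnings.warn("at least one column name is an empty string")
    return header, rows


# Data wider than the header: the header gains a blank name and a warning fires.
header, rows = fit_table(["a", "b"], [[1, 2, 3], [4]])
print(header)  # ['a', 'b', '']
print(rows)    # [[1, 2, 3], [4, '', '']]
```

The warning addresses the debugging concern raised above: columns with empty-string names still appear when the data is wider than the header, but they are no longer silent.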

nithinmurali · 2021-02-02T14:21:23Z

Merged @RussianImperialScott's PR, so the default behavior is now option 1.

nithinmurali added the enhancement label May 15, 2020

nithinmurali closed this as completed in ed09f76 Oct 4, 2020
