.astype(SparseDtype(float)) on empty dataframe leads to "ValueError: No objects to concatenate" #33113

tgy · 2020-03-29T08:45:14Z

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.DataFrame().astype(pd.SparseDtype(float))

Problem description

Converting an empty dataframe to a sparse representation leads to a ValueError.

ValueError: No objects to concatenate

Expected Output

I expected the conversion to succeed and return an empty DataFrame, even though there is nothing to convert.

Output of `pd.show_versions()`

>>> pandas.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.0.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.3
numpy            : 1.18.2
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 19.3.1
setuptools       : 45.0.0
Cython           : None
pytest           : 5.2.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.4.2
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.11.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.4.2
matplotlib       : 3.1.2
numexpr          : 2.7.0
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.2.2
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : 3.6.0
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None
numba            : 0.46.0


simonjayhawkins · 2020-03-29T15:48:09Z

Thanks @tgy for the report. The same exception is raised for Int64Dtype, StringDtype and BooleanDtype and maybe more.
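A quick way to see which of these dtypes are affected on a given install is to try each conversion on an empty DataFrame and record the outcome. This is only a sketch: check_empty_astype is a hypothetical helper, and the dtype list simply mirrors the ones named in this thread. Going by the reports here, on pandas 1.0.3 every entry should show the ValueError. On a release that includes the fix, each should report ok.

```python
import pandas as pd

# Dtypes named in the original report and in the follow-up comment.
dtypes = {
    "Sparse[float64]": pd.SparseDtype(float),
    "Int64": pd.Int64Dtype(),
    "string": pd.StringDtype(),
    "boolean": pd.BooleanDtype(),
}


def check_empty_astype(dtypes):
    """Hypothetical helper: convert an empty DataFrame to each dtype.

    Returns a dict mapping each name to "ok" or to the error text.
    """
    results = {}
    for name, dtype in dtypes.items():
        try:
            out = pd.DataFrame().astype(dtype)
            results[name] = "ok" if out.empty else "non-empty result"
        except ValueError as exc:
            results[name] = f"ValueError: {exc}"
    return results


for name, outcome in check_empty_astype(dtypes).items():
    print(f"{name:16} {outcome}")
```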

tgy · 2020-03-29T15:50:19Z

@simonjayhawkins Yep, I used float as an example, but I think it fails for all of these types, since the code takes the same path for each of them. See #33118.
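A simplified model shows why the dtype does not matter. If the frame is converted column by column and the pieces are reassembled with pd.concat, an empty frame yields an empty list, and pd.concat rejects an empty list with exactly this message. naive_columnwise_astype is hypothetical and is not pandas' actual implementation; it only illustrates the failure mode.

```python
import pandas as pd


def naive_columnwise_astype(df, dtype):
    """Hypothetical, simplified model of a column-by-column astype.

    Each column is converted on its own and the pieces are glued back
    together with pd.concat. This is NOT pandas' real implementation.
    """
    converted = [col.astype(dtype) for _, col in df.items()]
    # With zero columns, `converted` is an empty list, and pd.concat raises
    # ValueError("No objects to concatenate") regardless of the dtype.
    return pd.concat(converted, axis=1)


for dtype in (pd.SparseDtype(float), pd.Int64Dtype(),
              pd.StringDtype(), pd.BooleanDtype()):
    try:
        naive_columnwise_astype(pd.DataFrame(), dtype)
    except ValueError as exc:
        print(f"{dtype}: {exc}")
```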

KardoPaska · 2020-06-19T21:24:19Z

My motivation is to make a single line of code robust to "blank" inputs, and I ran into the same error with a slightly different input...

import pandas as pd

summarys = [{'testid': 'abc', 'avg': 1.2, 'count': 5, 'ok': True},  # Pass
            {'testid': 'xyz', 'count': 0, 'ok': False},              # Pass
            dict()]                                                  # Fail (empty)
summary_dtypes = {'testid': str, 'avg': float, 'count': pd.Int64Dtype(), 'ok': bool}
# Columns that occur in at least one record
actual_cols_set = {kk for x in summarys for kk in x.keys()}
# Keep only the dtypes for columns that are actually present
actual_dtypes = {k: v for k, v in summary_dtypes.items() if k in actual_cols_set}

This is all good (aside from "missing" bool being True)...

>>> pd.DataFrame(summarys).astype(dtype=actual_dtypes)

  testid  avg  count     ok
0    abc  1.2      5   True
1    xyz  NaN      0  False
2    NaN  NaN   <NA>   True
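The aside about the missing ok value becoming True comes from how plain bool conversion treats NaN. A small sketch, assuming one wants missing entries preserved, contrasts it with pandas' nullable "boolean" dtype (the same family as the Int64Dtype already used for count):

```python
import pandas as pd

# Column "ok" with a missing third value, as in the example above.
frame = pd.DataFrame([{"ok": True}, {"ok": False}, {}])

# Plain bool: the gap is stored as NaN in an object column, and
# bool(NaN) is True, so the third row silently becomes True.
naive = frame["ok"].astype(bool)

# Nullable "boolean" extension dtype keeps the gap as <NA>.
nullable = frame["ok"].astype("boolean")

print(naive.tolist())            # [True, False, True]
print(nullable.isna().tolist())  # [False, False, True]
```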

this is also fine...

>>> pd.DataFrame([dict(), dict()])

Empty DataFrame
Columns: []
Index: [0, 1]

...but for a list of empty dicts, my actual_dtypes is also an empty dict()...

>>> pd.DataFrame([dict(), dict()]).astype(dtype=dict())

ValueError: No objects to concatenate

So perhaps astype could short-circuit for empty DataFrames instead of raising an error?
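A user-side version of that short circuit, for pandas versions that still raise, might look like the sketch below. astype_allow_empty is a hypothetical helper, not part of pandas. It assumes that a frame with no columns has nothing to convert and returns a copy so the row index survives.

```python
import pandas as pd


def astype_allow_empty(df, dtype):
    """Hypothetical helper: skip astype when there are no columns.

    Intended as a workaround for pandas versions that raise
    "No objects to concatenate" on frames without columns.
    """
    if df.shape[1] == 0:
        # Nothing to convert: return a copy so the index (e.g. [0, 1]) is kept.
        return df.copy()
    return df.astype(dtype)


empty_rows = pd.DataFrame([dict(), dict()])
print(astype_allow_empty(empty_rows, dict()))
```

One side effect of this design: for a dict dtype, DataFrame.astype normally raises a KeyError when the mapping names a column that is not in the frame, and the guard skips that check whenever the frame has no columns.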

tgy · 2020-06-21T15:46:04Z

@KardoPaska My small change on #33118 seems to fix your issue as well:

>>> import pandas as pd
>>> pd.DataFrame([dict(), dict()]).astype(dtype=dict())
Empty DataFrame
Columns: []
Index: [0, 1]

Waiting for @jreback to approve the change on the PR
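On a pandas release that includes #33118 (the issue was milestoned for 1.1), the reproducer from the previous comment can be checked directly. This sketch assumes such a release is installed; on 1.0.3 the astype call raises instead.

```python
import pandas as pd

# Rows but no columns, converted with an empty per-column dtype mapping.
rows_no_cols = pd.DataFrame([dict(), dict()]).astype(dtype=dict())

print(pd.__version__)
print(rows_no_cols.shape)        # (2, 0)
print(list(rows_no_cols.index))  # [0, 1]
```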

tgy · 2020-06-21T20:21:37Z

Shouldn't the code be pd.DataFrame([dict(), dict()]).astype(dtype=dict) instead (i.e. dict rather than dict() as the dtype)? That works on master.


tgy mentioned this issue Mar 29, 2020

BUG: conversion of empty DataFrame to SparseDtype (#33113) #33118

Merged


jreback added the Reshaping and Sparse labels Mar 29, 2020

simonjayhawkins added the ExtensionArray label and removed the Sparse label Mar 29, 2020

simonjayhawkins added this to the 1.1 milestone Jun 22, 2020

jreback added the Sparse label Jun 25, 2020

jreback closed this as completed in #33118 Jun 25, 2020

jreback pushed a commit that referenced this issue Jun 25, 2020

BUG: conversion of empty DataFrame to SparseDtype (#33113) (#33118)

e37ff6e

fangchenli pushed a commit to fangchenli/pandas that referenced this issue Jun 27, 2020

BUG: conversion of empty DataFrame to SparseDtype (pandas-dev#33113) (pandas-dev#33118)

c93a3b3

simonjayhawkins mentioned this issue Jul 29, 2020

BUG: Empty dataframe with .astype raises Exception #35457

Closed
