BUG: pd.concat raises if called on mixture of empty and non-empty dataframes #18178

Closed
JosephWagner opened this Issue Nov 8, 2017 · 4 comments


JosephWagner commented Nov 8, 2017

I noticed a change in how pd.concat works between 0.20.3 and 0.21.0:

import pandas as pd

df1 = pd.DataFrame({'foo': [1]})
df2 = pd.DataFrame({'foo': []})

res = pd.concat([df1, df2])

This example does not raise an exception in 0.20.3. In 0.21.0, it raises the following error:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    res = pd.concat([df1, df2])
  File "/homes/joewag/miniconda3/envs/py3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 213, in concat
    return op.get_result()
  File "/homes/joewag/miniconda3/envs/py3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 408, in get_result
    copy=self.copy)
  File "/homes/joewag/miniconda3/envs/py3/lib/python3.6/site-packages/pandas/core/internals.py", line 5202, in concatenate_block_managers
    return BlockManager(blocks, axes)
  File "/homes/joewag/miniconda3/envs/py3/lib/python3.6/site-packages/pandas/core/internals.py", line 3028, in __init__
    self._verify_integrity()
  File "/homes/joewag/miniconda3/envs/py3/lib/python3.6/site-packages/pandas/core/internals.py", line 3239, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "/homes/joewag/miniconda3/envs/py3/lib/python3.6/site-packages/pandas/core/internals.py", line 4603, in construction_error
    passed, implied))
ValueError: Shape of passed values is (1, 1), indices imply (1, 0)

Expected Output

I would expect no error, and all(res==df1) to be true

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.1.1.el6.centos.plus.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.2.2.post20170724
Cython: 0.26
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.4
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.0
numexpr: 2.6.2
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.2
lxml: None
bs4: 4.6.0
html5lib: None
sqlalchemy: 1.1.14
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback commented Nov 8, 2017

This check: https://github.com/pandas-dev/pandas/blob/master/pandas/core/reshape/concat.py#L293 is a little bogus. If it used np.prod(obj.shape) instead, it would filter properly.

Want to give this a shot?
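To make the suggestion concrete, here is a minimal sketch of how a sum-based and a product-based emptiness check differ on the shapes from the report. It uses plain tuples rather than pandas objects, and the helper names `kept_by_sum` and `kept_by_prod` are hypothetical. The exact form of the real check (a comparison against zero) is an assumption, not a quote of the linked line.

```python
from math import prod

# Shapes are (rows, columns). The labels are illustrative only.
shapes = {
    "df1: one row, one column": (1, 1),
    "df2: no rows, one column": (0, 1),
    "no rows, no columns": (0, 0),
}


def kept_by_sum(shape):
    """Assumed form of the existing check: keep if rows + columns > 0."""
    return sum(shape) > 0


def kept_by_prod(shape):
    """Suggested alternative: keep only if rows * columns > 0."""
    return prod(shape) > 0


for label, shape in shapes.items():
    print(f"{label}: sum keeps={kept_by_sum(shape)}, prod keeps={kept_by_prod(shape)}")
```

Under these assumptions, the sum-based check treats a zero-row frame that still has a column as non-empty, while the product-based check would filter it out along with its column labels.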

@jreback jreback added this to the Next Major Release milestone Nov 8, 2017

@jreback jreback changed the title from pd.concat raises if called on mixture of empty and non-empty dataframes to BUG: pd.concat raises if called on mixture of empty and non-empty dataframes Nov 8, 2017

SmokinCaterpillar commented Nov 9, 2017

While trying to fix this, I stumbled upon #18187. There is also unexpected behavior when an empty and a non-empty Series are concatenated: the call does not fail, but simply returns an empty Series.

@jorisvandenbossche jorisvandenbossche added Regression and removed Bug labels Nov 9, 2017

@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.21.1 Nov 9, 2017

SmokinCaterpillar commented Nov 9, 2017

@jreback the straightforward fix via np.prod(obj.shape) does not seem to work. Simply replacing sum(obj.shape) with it makes test_append_length0_frame fail at https://github.com/pandas-dev/pandas/blob/master/pandas/tests/reshape/test_concat.py#L760

Edit: I'm not so sure if "Effort Low" is the correct tag here :-D.
Probably someone has to solve this and #18187 simultaneously by fixing some changes from 0.20.x to 0.21.0. I'll try to look into it, but I don't know if I am able to find a solution in a reasonable amount of time.

jreback commented Nov 9, 2017

> I'm not so sure if "Effort Low" is the correct tag here :-D.

Sure it is. It's not zero effort, more like an hour or two.

SmokinCaterpillar added a commit to flix-tech/pandas that referenced this issue Nov 9, 2017

Fix for pandas-dev#18178 and pandas-dev#18187 by changing the concat of empty RangeIndex

The `_concat_rangeindex_same_dtype` now keeps track of the last non-empty RangeIndex to extract the new stop value.

This fixes two issues with concatenating non-empty and empty DataFrames and Series.

Two regression tests were added as well.
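
The idea in the commit message can be illustrated with Python's built-in `range` objects standing in for `RangeIndex`. This is only a sketch under simplifying assumptions, not the pandas implementation: the function name `concat_ranges` is hypothetical, and ranges that are not contiguous or that differ in step simply yield `None` here.

```python
def concat_ranges(ranges):
    """Combine contiguous ranges into one, ignoring empty ones.

    Returns None when the non-empty ranges cannot be expressed
    as a single range (a simplification for this sketch).
    """
    non_empty = [r for r in ranges if len(r) > 0]
    if not non_empty:
        return range(0)

    first = non_empty[0]
    step = first.step
    expected_start = first.start
    for r in non_empty:
        if r.step != step or r.start != expected_start:
            return None  # gap, overlap, or different step
        expected_start = r[-1] + step

    # The stop value comes from the last NON-EMPTY range, so a
    # trailing empty range cannot shrink the result to length 0.
    last = non_empty[-1]
    return range(first.start, last[-1] + step, step)


# A one-row index followed by an empty one keeps its single label.
print(concat_ranges([range(0, 1), range(0, 0)]))
```

In this sketch, taking the stop value from the trailing empty range instead would produce a zero-length result for one row of data, which is the kind of mismatch the reported `ValueError` describes.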

jreback added a commit that referenced this issue Nov 10, 2017

No-Stream added a commit to No-Stream/pandas that referenced this issue Nov 28, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017

TomAugspurger added a commit that referenced this issue Dec 11, 2017
