Treat Series as 1-column dataframe in concat with dataframe #15047

Closed
jcrist opened this issue Jan 3, 2017 · 6 comments · Fixed by #56365
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

Contributor

jcrist commented Jan 3, 2017

When concatenating a dataframe and a series along axis 1, the docs indicate that the series will be treated as a dataframe with a single column named after the series. Based on this, I'd expect similar behavior (or an error) when concatenating a named series with a dataframe along axis 0. Instead, the named series ends up in a new column labeled 0 rather than being converted to a dataframe that keeps its name:

In [10]: pd.__version__
Out[10]: '0.19.1'

In [11]: df = pd.DataFrame({'x': range(5), 'y': range(5), 'z': range(5)})

In [12]: pd.concat([df, df.x])
Out[12]:
     x    y    z    0
0  0.0  0.0  0.0  NaN
1  1.0  1.0  1.0  NaN
2  2.0  2.0  2.0  NaN
3  3.0  3.0  3.0  NaN
4  4.0  4.0  4.0  NaN
0  NaN  NaN  NaN  0.0
1  NaN  NaN  NaN  1.0
2  NaN  NaN  NaN  2.0
3  NaN  NaN  NaN  3.0
4  NaN  NaN  NaN  4.0

In [13]: pd.concat([df, df.x.to_frame()])
Out[13]:
   x    y    z
0  0  0.0  0.0
1  1  1.0  1.0
2  2  2.0  2.0
3  3  3.0  3.0
4  4  4.0  4.0
0  0  NaN  NaN
1  1  NaN  NaN
2  2  NaN  NaN
3  3  NaN  NaN
4  4  NaN  NaN

I'd expect either an error or the behavior in Out[13].
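
Until the behavior changes, the Out[13] result can be obtained by converting each Series explicitly before concatenating. The following is a minimal sketch of that workaround; the helper name `concat_rows_as_frames` is hypothetical and not part of pandas, and it relies only on `Series.to_frame()` and `pd.concat`:

```python
import pandas as pd


def concat_rows_as_frames(objs):
    """Row-wise concat that first turns every Series into a one-column frame.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Series.to_frame() uses the Series name as the column label,
    # so a Series named 'x' lines up with an existing 'x' column.
    frames = [o.to_frame() if isinstance(o, pd.Series) else o for o in objs]
    return pd.concat(frames)


df = pd.DataFrame({"x": range(5), "y": range(5), "z": range(5)})
result = concat_rows_as_frames([df, df["x"]])

print(list(result.columns))  # no extra column labeled 0 is created
print(len(result))
```

The appended rows carry values only in the `x` column; `y` and `z` are filled with NaN for them, matching Out[13].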

Contributor

jreback commented Jan 3, 2017

https://github.com/pandas-dev/pandas/blob/master/pandas/tools/merge.py#L1580 is the reason this ends up 0, but I don't remember why that is.

Will mark this as a bug.

jreback added the Bug, Difficulty Intermediate, and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels on Jan 3, 2017
jreback added this to the Next Major Release milestone on Jan 3, 2017
Member

sinhrks commented Jan 4, 2017

+1 for resulting in Out[13] rather than an error.

Contributor Author

jcrist commented Jan 4, 2017

Looks like the line got bumped by a recent commit. Here's a commit-specific link.

Looking at the current logic (and trying it out), this is definitely a bug rather than a difference in expectations: since the series is always interpreted as a frame with a single column named 0, you get really unexpected behavior if the dataframe also has a column named 0:

In [23]: df = pd.DataFrame({'x': range(5), 'y': range(5), 0: range(5)})

In [24]: pd.concat([df, df.x])
/Users/jcrist/anaconda/envs/dask/lib/python3.5/site-packages/pandas/indexes/api.py:71: RuntimeWarning: unorderable types: str() > int(), sort order is undefined for incomparable objects
  result = result.union(other)
Out[24]:
     x  0    y
0  0.0  0  0.0
1  1.0  1  1.0
2  2.0  2  2.0
3  3.0  3  3.0
4  4.0  4  4.0
0  NaN  0  NaN
1  NaN  1  NaN
2  NaN  2  NaN
3  NaN  3  NaN
4  NaN  4  NaN

Member

sinhrks commented Jan 21, 2017

Looked into this a little and found that the current behavior is to replace an empty Series name with a number when calling concat(axis=1). Will send a PR that fixes this while keeping the current axis=1 behavior.

Contributor

rob-sil commented Dec 6, 2023

I'm guessing that concat needs to set the Series name to 0 for the specific case where ignore_index=True and axis=0 (equivalently axis="index"). When all the input series are renamed 0, they line up nicely for concatenation. However, the code that sets names to 0 still runs when ignore_index=False, causing this bug.

That specific code happens right after the code for ignore_index=True and axis=1, so perhaps it just belongs as part of the earlier if statement. Looks like it would run correctly with an extra indent.

if ignore_index or name is None:
    name = current_column
    current_column += 1
# doing a row-wise concatenation so need everything
# to line up
if self._is_frame and axis == 1:
    name = 0
(Note: axis has been flipped at this point for data frame outputs.)
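
To make the effect of the extra indent concrete, here is a small standalone model of the naming logic. It is a sketch only: the two function names are hypothetical, `is_frame` stands in for `self._is_frame`, and the fix that actually landed in #56365 may look different. As in the snippet above, `axis == 1` is the already-flipped value, so it corresponds to a user-level row-wise concat.

```python
def label_as_reported(name, current_column, ignore_index, is_frame, axis):
    """Model of the snippet above: the reset to 0 runs unconditionally."""
    if ignore_index or name is None:
        name = current_column
        current_column += 1
    # Runs even when the Series has a usable name, which discards it.
    if is_frame and axis == 1:
        name = 0
    return name, current_column


def label_with_extra_indent(name, current_column, ignore_index, is_frame, axis):
    """Model of the suggested change: the reset is nested in the first branch."""
    if ignore_index or name is None:
        name = current_column
        current_column += 1
        # Only unnamed (or ignore_index) Series are forced to line up at 0.
        if is_frame and axis == 1:
            name = 0
    return name, current_column


# A Series named 'x' in a row-wise concat with ignore_index=False:
print(label_as_reported("x", 0, False, True, 1))        # (0, 0): name lost
print(label_with_extra_indent("x", 0, False, True, 1))  # ('x', 0): name kept
```

In this model, an unnamed Series still gets the label 0 in both variants, so the case the reset was written for is unchanged.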

Contributor

rob-sil commented Dec 6, 2023

take

6 participants