BUG: can't concatenate DataFrame with Series with duplicate keys #33654

MarcoGorelli · 2020-04-19T14:58:58Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1,2,3], 'b': [1,2,3]})
>>> s1 = pd.Series([1,2,3], name='a')
>>> s2 = pd.Series([1,2,3], name='a')
>>> pd.concat([df, s1, s2], axis=1, keys=['a', 'b', 'b'])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'slice'

full traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-f6a5f4790f76> in <module>
      3 s1 = pd.Series([1,2,3], name='a')
      4 s2 = pd.Series([1,2,3], name='a')
----> 5 pd.concat([df, s1, s2], axis=1, keys=['a', 'b', 'b'])

~/pandas-dev/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    269     ValueError: Indexes have overlapping values: ['a']
    270     """
--> 271     op = _Concatenator(
    272         objs,
    273         axis=axis,

~/pandas-dev/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    449         self.copy = copy
    450 
--> 451         self.new_axes = self._get_new_axes()
    452 
    453     def get_result(self):

~/pandas-dev/pandas/core/reshape/concat.py in _get_new_axes(self)
    512     def _get_new_axes(self) -> List[Index]:
    513         ndim = self._get_result_dim()
--> 514         return [
    515             self._get_concat_axis() if i == self.bm_axis else self._get_comb_axis(i)
    516             for i in range(ndim)

~/pandas-dev/pandas/core/reshape/concat.py in <listcomp>(.0)
    513         ndim = self._get_result_dim()
    514         return [
--> 515             self._get_concat_axis() if i == self.bm_axis else self._get_comb_axis(i)
    516             for i in range(ndim)
    517         ]

~/pandas-dev/pandas/core/reshape/concat.py in _get_concat_axis(self)
    569             concat_axis = _concat_indexes(indexes)
    570         else:
--> 571             concat_axis = _make_concat_multiindex(
    572                 indexes, self.keys, self.levels, self.names
    573             )

~/pandas-dev/pandas/core/reshape/concat.py in _make_concat_multiindex(indexes, keys, levels, names)
    651             names = names + get_consensus_names(indexes)
    652 
--> 653         return MultiIndex(
    654             levels=levels, codes=codes_list, names=names, verify_integrity=False
    655         )

~/pandas-dev/pandas/core/indexes/multi.py in __new__(cls, levels, codes, sortorder, names, dtype, copy, name, verify_integrity, _set_identity)
    281         # we've already validated levels and codes, so shortcut here
    282         result._set_levels(levels, copy=copy, validate=False)
--> 283         result._set_codes(codes, copy=copy, validate=False)
    284 
    285         result._names = [None] * len(levels)

~/pandas-dev/pandas/core/indexes/multi.py in _set_codes(self, codes, level, copy, validate, verify_integrity)
    880 
    881         if level is None:
--> 882             new_codes = FrozenList(
    883                 _coerce_indexer_frozen(level_codes, lev, copy=copy).view()
    884                 for lev, level_codes in zip(self._levels, codes)

~/pandas-dev/pandas/core/indexes/multi.py in <genexpr>(.0)
    881         if level is None:
    882             new_codes = FrozenList(
--> 883                 _coerce_indexer_frozen(level_codes, lev, copy=copy).view()
    884                 for lev, level_codes in zip(self._levels, codes)
    885             )

~/pandas-dev/pandas/core/indexes/multi.py in _coerce_indexer_frozen(array_like, categories, copy)
   3681         Non-writeable.
   3682     """
-> 3683     array_like = coerce_indexer_dtype(array_like, categories)
   3684     if copy:
   3685         array_like = array_like.copy()

~/pandas-dev/pandas/core/dtypes/cast.py in coerce_indexer_dtype(indexer, categories)
    866     length = len(categories)
    867     if length < _int8_max:
--> 868         return ensure_int8(indexer)
    869     elif length < _int16_max:
    870         return ensure_int16(indexer)

~/pandas-dev/pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.ensure_int8()
     59             return arr
     60         else:
---> 61             return arr.astype(np.int8, copy=copy)
     62     else:
     63         return np.array(arr, dtype=np.int8)

TypeError: int() argument must be a string, a bytes-like object or a number, not 'slice'
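One possible reading of the traceback, not confirmed in this report: a lookup of a duplicated key in the `keys` level returns a `slice` rather than an integer position, and that `slice` later reaches the `int8` cast. The sketch below only demonstrates that general behaviour with `Index.get_loc` on a sorted index containing duplicates. The variable names are illustrative, and the assumption that `_make_concat_multiindex` builds its codes this way is mine, not something the traceback shows directly.

```python
import numpy as np
import pandas as pd

# Illustrative level with a duplicated key, mirroring keys=['a', 'b', 'b'].
# dtype=object is set explicitly so the behaviour does not depend on
# version-specific string dtypes.
level = pd.Index(["a", "b", "b"], dtype=object)

loc_unique = level.get_loc("a")  # a single integer position
loc_dup = level.get_loc("b")     # a slice covering both 'b' positions

# Repeating the slice yields an object array of slices, which cannot be
# cast to int8. This mimics (as an assumption) what ends up in ensure_int8.
error_type = None
try:
    np.repeat(loc_dup, 3).astype(np.int8)
except TypeError as err:
    error_type = type(err).__name__

print(loc_unique, loc_dup, error_type)
```

If this reading is right, any fix would need to map each key to integer codes without relying on a lookup that can return a `slice`.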

Problem description

I noticed this while working on #30858. I think this needs to be fixed first if we want to solve the ohlc case.

Expected Output

   a     b  b
   a  b  a  a
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
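For comparison, the expected frame above can be built by hand. This is a minimal sketch: the outer column level comes from `keys` and the inner level from the original column and Series names. The helper `concat_matches_expected` is a hypothetical name introduced here for illustration. It returns `None` when `pd.concat` raises the `TypeError` reported above, so the snippet runs whether or not the bug is present.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
s1 = pd.Series([1, 2, 3], name="a")
s2 = pd.Series([1, 2, 3], name="a")

# Expected result built directly: outer level from `keys`,
# inner level from the column / Series names.
expected_columns = pd.MultiIndex.from_tuples(
    [("a", "a"), ("a", "b"), ("b", "a"), ("b", "a")]
)
expected = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=expected_columns,
)


def concat_matches_expected():
    """Return whether concat reproduces `expected`, or None if it raises TypeError."""
    try:
        result = pd.concat([df, s1, s2], axis=1, keys=["a", "b", "b"])
    except TypeError:
        return None
    return result.equals(expected)


print(expected)
print(concat_matches_expected())
```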

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : e878fdc
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-46-generic
Version : #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.1.0.dev0+1302.ge878fdc41
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200325
Cython : 0.29.16
pytest : 5.4.1
hypothesis : 5.8.0
sphinx : 3.0.0
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0


MarcoGorelli added the Bug and Needs Triage labels Apr 19, 2020

This was referenced Apr 19, 2020

BUG: aggregations were getting overwritten if they had the same name #30858

Merged

BUG: can't concatenate DataFrame with Series with duplicate keys #33805

Merged

jreback added this to the 1.1 milestone May 1, 2020

jreback added the Reshaping label and removed the Needs Triage label May 1, 2020

WillAyd closed this as completed in #33805 May 1, 2020

MarcoGorelli mentioned this issue May 28, 2020

concatenating frame and series with identical keys returns " int() argument must be a string" #33114

Closed
