BUG: can't concatenate DataFrame with Series with duplicate keys #33654

Closed
3 tasks done
MarcoGorelli opened this issue Apr 19, 2020 · 0 comments · Fixed by #33805
Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

MarcoGorelli commented Apr 19, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1,2,3], 'b': [1,2,3]})
>>> s1 = pd.Series([1,2,3], name='a')
>>> s2 = pd.Series([1,2,3], name='a')
>>> pd.concat([df, s1, s2], axis=1, keys=['a', 'b', 'b'])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'slice'
Full traceback
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-f6a5f4790f76> in <module>
      3 s1 = pd.Series([1,2,3], name='a')
      4 s2 = pd.Series([1,2,3], name='a')
----> 5 pd.concat([df, s1, s2], axis=1, keys=['a', 'b', 'b'])

~/pandas-dev/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    269     ValueError: Indexes have overlapping values: ['a']
    270     """
--> 271     op = _Concatenator(
    272         objs,
    273         axis=axis,

~/pandas-dev/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    449         self.copy = copy
    450 
--> 451         self.new_axes = self._get_new_axes()
    452 
    453     def get_result(self):

~/pandas-dev/pandas/core/reshape/concat.py in _get_new_axes(self)
    512     def _get_new_axes(self) -> List[Index]:
    513         ndim = self._get_result_dim()
--> 514         return [
    515             self._get_concat_axis() if i == self.bm_axis else self._get_comb_axis(i)
    516             for i in range(ndim)

~/pandas-dev/pandas/core/reshape/concat.py in <listcomp>(.0)
    513         ndim = self._get_result_dim()
    514         return [
--> 515             self._get_concat_axis() if i == self.bm_axis else self._get_comb_axis(i)
    516             for i in range(ndim)
    517         ]

~/pandas-dev/pandas/core/reshape/concat.py in _get_concat_axis(self)
    569             concat_axis = _concat_indexes(indexes)
    570         else:
--> 571             concat_axis = _make_concat_multiindex(
    572                 indexes, self.keys, self.levels, self.names
    573             )

~/pandas-dev/pandas/core/reshape/concat.py in _make_concat_multiindex(indexes, keys, levels, names)
    651             names = names + get_consensus_names(indexes)
    652 
--> 653         return MultiIndex(
    654             levels=levels, codes=codes_list, names=names, verify_integrity=False
    655         )

~/pandas-dev/pandas/core/indexes/multi.py in __new__(cls, levels, codes, sortorder, names, dtype, copy, name, verify_integrity, _set_identity)
    281         # we've already validated levels and codes, so shortcut here
    282         result._set_levels(levels, copy=copy, validate=False)
--> 283         result._set_codes(codes, copy=copy, validate=False)
    284 
    285         result._names = [None] * len(levels)

~/pandas-dev/pandas/core/indexes/multi.py in _set_codes(self, codes, level, copy, validate, verify_integrity)
    880 
    881         if level is None:
--> 882             new_codes = FrozenList(
    883                 _coerce_indexer_frozen(level_codes, lev, copy=copy).view()
    884                 for lev, level_codes in zip(self._levels, codes)

~/pandas-dev/pandas/core/indexes/multi.py in <genexpr>(.0)
    881         if level is None:
    882             new_codes = FrozenList(
--> 883                 _coerce_indexer_frozen(level_codes, lev, copy=copy).view()
    884                 for lev, level_codes in zip(self._levels, codes)
    885             )

~/pandas-dev/pandas/core/indexes/multi.py in _coerce_indexer_frozen(array_like, categories, copy)
   3681         Non-writeable.
   3682     """
-> 3683     array_like = coerce_indexer_dtype(array_like, categories)
   3684     if copy:
   3685         array_like = array_like.copy()

~/pandas-dev/pandas/core/dtypes/cast.py in coerce_indexer_dtype(indexer, categories)
    866     length = len(categories)
    867     if length < _int8_max:
--> 868         return ensure_int8(indexer)
    869     elif length < _int16_max:
    870         return ensure_int16(indexer)

~/pandas-dev/pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.ensure_int8()
     59             return arr
     60         else:
---> 61             return arr.astype(np.int8, copy=copy)
     62     else:
     63         return np.array(arr, dtype=np.int8)

TypeError: int() argument must be a string, a bytes-like object or a number, not 'slice'

Problem description

I noticed this while working on #30858. I think this one needs to be solved first if we want to solve the ohlc case.
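
The 'slice' in the error message is consistent with how Index.get_loc behaves when a label is duplicated: it returns a slice rather than a single integer position. The sketch below only illustrates that behaviour on an index built from the same keys. That concat looks up each key this way internally is an assumption here, not something this report confirms.

```python
import numpy as np
import pandas as pd

# Same keys as in the report: 'b' appears twice.
level = pd.Index(['a', 'b', 'b'])

loc_unique = level.get_loc('a')      # unique label: a single integer position
loc_duplicate = level.get_loc('b')   # duplicated label in a sorted index: a slice

print(type(loc_unique).__name__, type(loc_duplicate).__name__)

# Repeating a slice yields an object array, which is not a valid array of
# integer codes (assumption: this mirrors what happens before ensure_int8).
codes = np.repeat(loc_duplicate, 2)
print(codes.dtype)
```

If that is the cause, any code path that expects an integer position per key would need to handle the non-unique case explicitly.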

Expected Output

   a     b  b
   a  b  a  a
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
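
Until the bug is fixed, one possible workaround is to concatenate without keys and then attach the two-level column index by hand. This is a sketch rather than the fix proposed for this issue, and the outer labels are written out manually on the assumption that each key should be repeated once per column of the object it labels.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
s1 = pd.Series([1, 2, 3], name='a')
s2 = pd.Series([1, 2, 3], name='a')

# Concatenate without keys; the columns become ['a', 'b', 'a', 'a'].
result = pd.concat([df, s1, s2], axis=1)

# Outer level: 'a' for both columns of df, then 'b' for each Series.
outer = ['a', 'a', 'b', 'b']
inner = list(result.columns)
result.columns = pd.MultiIndex.from_arrays([outer, inner])

print(result)
```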

Output of pd.show_versions()

INSTALLED VERSIONS

commit : e878fdc
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-46-generic
Version : #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.1.0.dev0+1302.ge878fdc41
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200325
Cython : 0.29.16
pytest : 5.4.1
hypothesis : 5.8.0
sphinx : 3.0.0
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

MarcoGorelli added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels on Apr 19, 2020
jreback added this to the 1.1 milestone on May 1, 2020
jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label and removed the Needs Triage label on May 1, 2020