BUG/API: Index/Series concat inconsistencies #13626

Closed
sinhrks opened this issue Jul 12, 2016 · 4 comments · Fixed by #13660
Labels
API Design Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@sinhrks
Member

sinhrks commented Jul 12, 2016

xref #7795 #13221. Series/Index concat-like operations that trigger object coercion are not well tested. The following are needed:

  • Refactor the concat internals to make them consistent and stable.
  • Add comprehensive tests to cover both Index and Series object-coercion cases.

Code Sample, a copy-pastable example if possible

Found some problems below:

import pandas as pd

# NG: although the output looks OK, the objects are Python built-ins (they must be Timestamp and Timedelta)
s = pd.Series([pd.Timestamp('2011-01-01')]).append(pd.Series([pd.Timedelta('1 days')]))
s
#0    2011-01-01 00:00:00
#0         1 day, 0:00:00
# dtype: object

type(s.iloc[0]), type(s.iloc[-1])
# (datetime.datetime, datetime.timedelta)

# NG: the result must be object dtype containing Timestamp and Timedelta
idx = pd.DatetimeIndex(['2011-01-01']).append(pd.TimedeltaIndex(['1 days']))
idx
# Index([2011-01-01 00:00:00, 86400000000000], dtype='object')

type(idx[0]), type(idx[-1])
# (pandas.tslib.Timestamp, int)

Expected Output

  • result must be object dtype containing Timestamp and Timedelta (not datetime and timedelta)
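
The expected behavior above could be pinned down with a small check along these lines. This is only a sketch: it uses `pd.concat` for the Series case instead of `Series.append` (my assumption, chosen because `pd.concat` is available across pandas versions), and `is_pandas_scalars` is a hypothetical helper name.

```python
import pandas as pd

ts = pd.Series([pd.Timestamp('2011-01-01')])
td = pd.Series([pd.Timedelta('1 days')])

# datetime64 and timedelta64 share no common dtype, so the result should be object.
s = pd.concat([ts, td])
first, last = s.iloc[0], s.iloc[-1]

# Same combination for Index objects.
idx = pd.DatetimeIndex(['2011-01-01']).append(pd.TimedeltaIndex(['1 days']))


def is_pandas_scalars(a, b):
    """Hypothetical helper: True if a is a Timestamp and b is a Timedelta."""
    return isinstance(a, pd.Timestamp) and isinstance(b, pd.Timedelta)
```

Note that `pd.Timestamp` subclasses `datetime.datetime`, so checking against the pandas types (rather than the built-in ones) is what distinguishes the fixed behavior from the buggy one.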

output of pd.show_versions()

0.18.1

@sinhrks sinhrks added this to the 0.19.0 milestone Jul 12, 2016
@jreback
Contributor

jreback commented Jul 12, 2016

I think this mostly has to do with pre-concat coercion in .append. Instead I would fix this logic.

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Jul 12, 2016
@sinhrks
Member Author

sinhrks commented Jul 14, 2016

@jreback maybe my wording was misleading. Prepared a draft PR #13660 to show my intention. You're correct that the changes are mostly on .append (much simplified I think).

BTW, I found 2 more issues, fixing in the same PR.

1. CategoricalIndex.append may treat a list as values

# raises ValueError, OK (though the error message could be more appropriate)
pd.Index([1, 2, 3]).append([1, 2, 3])
# ValueError: all the input arrays must have same number of dimensions

# NG, concatenated as values if categories are the same 
pd.CategoricalIndex([1, 2, 3]).append([1, 2, 3])
# CategoricalIndex([1, 2, 3, 1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category')
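
One way to make both cases behave the same is to validate the inputs before concatenating. The sketch below is not the change made in the PR; `append_indexes` is a hypothetical wrapper, and the error message is my own wording.

```python
import pandas as pd


def append_indexes(left, others):
    """Hypothetical wrapper: append only when every input is an Index."""
    if isinstance(others, pd.Index):
        others = [others]
    for obj in others:
        if not isinstance(obj, pd.Index):
            # A plain list of scalars ends up here instead of being used as values.
            raise TypeError(
                'all inputs must be Index, got {}'.format(type(obj).__name__)
            )
    return left.append(list(others))
```

With this guard, `append_indexes(pd.CategoricalIndex([1, 2, 3]), [1, 2, 3])` raises `TypeError` regardless of the Index subclass, because each element of the list is an `int` rather than an Index.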

2. Series.append may raise AmbiguousTimeError

# DatetimeIndex, OK
dti = pd.date_range('2017-11-05', freq='H', periods=3, tz='US/Eastern')
dti.append(dti) 
# DatetimeIndex(['2017-11-05 00:00:00-04:00', '2017-11-05 01:00:00-04:00',
#                '2017-11-05 01:00:00-05:00', '2017-11-05 00:00:00-04:00',
#                '2017-11-05 01:00:00-04:00', '2017-11-05 01:00:00-05:00'],
#               dtype='datetime64[ns, US/Eastern]', freq=None)

# Series, NG
pd.Series(dti).append(pd.Series(dti))
# AmbiguousTimeError: Cannot infer dst time from Timestamp('2017-11-05 01:00:00'), try using the 'ambiguous' argument
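
A possible regression check for this case is sketched below. It builds the same three wall-clock hours from UTC (my choice, to avoid depending on frequency aliases) and uses `pd.concat` in place of `Series.append`; both are assumptions rather than what the PR tests.

```python
import pandas as pd

# 04:00, 05:00 and 06:00 UTC map to 00:00 EDT, 01:00 EDT and 01:00 EST,
# i.e. the ambiguous hour around the end of DST on 2017-11-05.
utc = pd.DatetimeIndex(
    ['2017-11-05 04:00', '2017-11-05 05:00', '2017-11-05 06:00'], tz='UTC'
)
dti = utc.tz_convert('US/Eastern')

# Concatenating tz-aware Series should keep the tz-aware dtype and the
# original UTC offsets, without re-inferring DST from wall-clock times.
result = pd.concat([pd.Series(dti), pd.Series(dti)], ignore_index=True)
offsets = [ts.utcoffset() for ts in result]
```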

@jreback
Contributor

jreback commented Jul 15, 2016

@sinhrks actually I was thinking about the appending logic in DataFrame (which coerces Series to the proper shape), but yes, Index appending seems not well tested / working. In fact this generated lots of weird cases in indexing.

@sinhrks
Member Author

sinhrks commented Jul 19, 2016

#13660 is almost finished; I found the following 2 more bugs:

3. name is not properly reset when the names mismatch

being fixed.

# Index NG, name must be None
pd.Index([1, 2], name='x').append(pd.Index([3, 4]))
# Int64Index([1, 2, 3, 4], dtype='int64', name=u'x')

# Series, OK (name is None)
pd.Series([1, 2], name='x').append(pd.Series([3, 4]))
# 0    1
# 1    2
# 0    3
# 1    4
# dtype: int64
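
The rule both cases should follow can be sketched as: keep the name only if every input agrees on it. `common_name` below is a hypothetical helper written for illustration, not a pandas function.

```python
import pandas as pd


def common_name(objs):
    """Hypothetical helper: the shared name if all inputs agree, else None."""
    names = {obj.name for obj in objs}
    return names.pop() if len(names) == 1 else None


a = pd.Index([1, 2], name='x')
b = pd.Index([3, 4])            # unnamed, so it mismatches 'x'
c = pd.Index([5, 6], name='x')  # same name as a
```

Here `common_name([a, b])` gives `None`, while `common_name([a, c])` gives `'x'`, which is the behavior the Series example already shows.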

4. Categorical concat inconsistency

Related to #13524. I've marked the inconsistent points with "??" (including the differences in the errors raised).

Series

# OK, Categorical + Categorical which has the same category = Categorical
s = pd.Series([1, 2], dtype='category')
s.append(s)
# 0    1
# 1    2
# 0    1
# 1    2
# dtype: category
# Categories (2, int64): [1, 2]

# ??, Categorical + Categorical which has different categories = ValueError
s.append(pd.Series([3, 4], dtype='category'))
# ValueError: incompatible categories in categorical concat

# ??, Categorical + other dtype = object
s.append(pd.Series([1, 2]))
# 0    1
# 1    2
# 0    1
# 1    2
# dtype: object

Index

# OK, Categorical + Categorical which has the same category = Categorical
i = pd.Index([1, 2], dtype='category')
i.append(i)
# CategoricalIndex([1, 2, 1, 2], categories=[1, 2], ordered=False, dtype='category')

# ??, Categorical + Categorical which has different categories = TypeError
i.append(pd.CategoricalIndex([3, 4]))
# TypeError: categories must match existing categories when appending

# ??, Categorical + other dtype which is in the category = Categorical
i.append(pd.Index([1, 2]))
# CategoricalIndex([1, 2, 1, 2], categories=[1, 2], ordered=False, dtype='category')

# ??, Categorical + other dtype which is not in the category = TypeError
i.append(pd.Index([3, 4]))
# TypeError: cannot append a non-category item to a CategoricalIndex
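
Whatever the final rule is, Series and Index would need to agree on when the categorical dtype is kept. A minimal sketch of one such predicate, assuming the dtype is kept only when both sides share an equal categorical dtype; `same_categories` is a hypothetical name and this is not necessarily the rule adopted in the PR.

```python
import pandas as pd


def same_categories(left, right):
    """Hypothetical check: True if both inputs share an equal categorical dtype."""
    return (
        isinstance(left.dtype, pd.CategoricalDtype)
        and isinstance(right.dtype, pd.CategoricalDtype)
        and left.dtype == right.dtype
    )


s = pd.Series([1, 2], dtype='category')
other_categories = pd.Series([3, 4], dtype='category')
plain_ints = pd.Series([1, 2])

# Same categories on both sides: the categorical dtype is kept.
same = pd.concat([s, s])
```

The predicate only looks at `.dtype`, so it applies equally to a Series and to a CategoricalIndex.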
