BUG/API: Index.append with mixed object/Categorical indices #14545
jorisvandenbossche added the Reshaping and Categorical labels Oct 31, 2016
jorisvandenbossche added this to the 0.19.1 milestone Oct 31, 2016
In the current PR, I just removed the check for category, which is the more intrusive change, as the result will now always be object dtype unless self and all objects to append are CategoricalIndex.
-        typs = _concat.get_dtype_kinds(to_concat)
-
-        if 'category' in typs:
+        if self.is_categorical():
             # if any of the to_concat is category
jreback (Contributor) commented Nov 1, 2016
Then you should change this comment, as it is no longer true (the Index must itself be a CI to return a CI).
codecov-io commented Nov 1, 2016
Current coverage is 85.27% (diff: 100%)
@@             master   #14545   diff @@
==========================================
  Files           140      140
  Lines         50693    50693
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
  Hits          43229    43229
  Misses         7464     7464
  Partials          0        0
OK, so I changed the PR to only fix the issue when the calling Index is not of Categorical dtype (which caused the bug), and left the behaviour of That is probably the easiest for now, to have this fix in 0.19.1, but we should then discuss the behaviour of
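+1 on the behavior here (coercing Categorical to Index). I'd feel slightly better if the monkey patching of sys.stdout were done in a context manager, something like

import sys
from contextlib import contextmanager
from io import StringIO

@contextmanager
def stdout():
    # swap in an in-memory buffer; restore the real stdout even on error
    sys.stdout = StringIO()
    try:
        yield
    finally:
        sys.stdout = sys.__stdout__

but not a huge deal.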
I just copied the approach from other tests :-), so it would be a nice clean-up in general!
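For readers on Python 3.4 or later, the standard library already ships a context manager that performs this kind of temporary stdout swap, `contextlib.redirect_stdout`. The sketch below is only an illustration of that idea, not code from this PR; the helper name `capture_stdout` is hypothetical, and pandas at the time still supported Python 2, where `redirect_stdout` is not available.

```python
import io
from contextlib import redirect_stdout


def capture_stdout(func, *args, **kwargs):
    """Call func and return whatever it printed to stdout as a string.

    Hypothetical helper for illustration; not part of pandas.
    """
    buf = io.StringIO()
    # redirect_stdout restores the previous sys.stdout on exit,
    # including when func raises.
    with redirect_stdout(buf):
        func(*args, **kwargs)
    return buf.getvalue()


captured = capture_stdout(print, "hello")
```

Because the restore happens in the context manager's exit step, a failing test cannot leave `sys.stdout` pointing at the buffer.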
Don't re-invent the wheel, the pattern is:
+        df = pd.DataFrame(np.zeros((2, 2)), index=idx, columns=idx)
+
+        import sys
+        sys.stdout = StringIO()
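One way to exercise `info()` on such a frame without touching `sys.stdout` at all is to pass a buffer through the `buf` argument of `DataFrame.info`. This is a minimal sketch under assumptions: it is not necessarily the pattern the reviewer had in mind, the names `idx`, `buf` and `info_text` are chosen for illustration, and it presumes a pandas version that includes this fix.

```python
import io

import numpy as np
import pandas as pd

# A frame whose index and columns are both CategoricalIndex,
# mirroring the construction in the test diff.
idx = pd.CategoricalIndex(["a", "b"])
df = pd.DataFrame(np.zeros((2, 2)), index=idx, columns=idx)

# info() writes to the given buffer instead of stdout.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
```

The test then only needs to check that the call does not raise, or inspect `info_text` if the summary itself matters.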
jorisvandenbossche merged commit 252526c into pandas-dev:master Nov 3, 2016
jorisvandenbossche added a commit that referenced this pull request Nov 3, 2016 (dbc19da)
jorisvandenbossche referenced this pull request Nov 4, 2016: API: Index.append behaviour with categoricals #14586 (Open)
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Nov 18, 2016 (dd3759d)
amolkahat added a commit to amolkahat/pandas that referenced this pull request Nov 26, 2016 (jorisvandenbossche + amolkahat, 5319729)
jorisvandenbossche commented Oct 31, 2016
Closes #14298, related to #13660 and #13767

This closes the info() issue with CategoricalIndex, but the underlying issue is actually related to Index.append, when appending mixed dtype index objects:

This error occurs when the calling index is not a Categorical (because of https://github.com/pandas-dev/pandas/blob/v0.19.0/pandas/indexes/base.py#L1439 the Categorical _is_dtype_compat method is used anyway).

But the question is, what should the result of the above expression be?

CategoricalIndex? -> IMO not: as the calling index is not a CategoricalIndex, the result should not be one either (regardless of what is appended) -> so the intention of the code is IMO also not correct

But the more important discussion is what to do when the calling index is a CategoricalIndex. In 0.18 (and in 0.19.0) this either returned a CategoricalIndex or raised an error:

But for the concat case we decided to return an object dtyped Index in those cases instead of coercing or raising (see discussion in #13767 and the summary table with rules in #13767 (comment)). So the question is: do we think that append should follow the same rules as concat? Or do we want to be more flexible for append, and let it depend on the type of the calling Index?

cc @sinhrks
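The two situations discussed in this description can be sketched as follows. This is an illustration only, not the snippet from the original report: the variable names are made up, and the behaviour noted in the comments assumes a pandas version that includes this fix.

```python
import pandas as pd

obj_idx = pd.Index(["a", "b"])             # calling index is not categorical
cat_idx = pd.CategoricalIndex(["c", "d"])  # categories: c, d

# Non-categorical caller with a CategoricalIndex appended: with the fix,
# the categorical values are coerced and the result is a plain Index.
result = obj_idx.append(cat_idx)

# Categorical caller, appending a CategoricalIndex with the same categories:
# the result stays a CategoricalIndex.
same_cats = cat_idx.append(pd.CategoricalIndex(["d", "c"]))

print(type(result).__name__, list(result))
print(type(same_cats).__name__, list(same_cats))
```

What a categorical caller should return when the appended values do not fit its categories is the open question raised here and followed up in #14586, so the sketch leaves that case out.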