Dropped categories when constructing Series or DF with categorical dtype, scalar data, and list index #19565
Code Sample, a copy-pastable example if possible
import pandas as pd  # You need pandas >= 0.21 for this to work.
from pandas.api.types import CategoricalDtype

# Create a categorical type with three unordered categories for future use
cats = ['a', 'b', 'c']
catType = CategoricalDtype(categories=cats, ordered=False)

# Use the categorical type to create a series from a list
s1 = pd.Series(['a', 'a'], dtype=catType)

# Use the categorical type to create a series from a scalar with a two-element
# index (the constructor does the broadcasting)
s2 = pd.Series('a', index=s1.index, dtype=catType)

# I expect s1 and s2 to be identical. They are not.
print(s1)
print(s2)  # notice only one category is shown for the dtype

# I can assign any member of the original categories to s1.
s1.loc[0] = 'c'

# However, this call will fail
try:
    s2.loc[0] = 'c'
except (TypeError, ValueError):  # the exception type depends on the pandas version
    print(" ")
    print("Code in try block fails because of dropped category members")
    print(" ")

# Workaround: explicitly call add_categories to replace info lost by the constructor
s2 = s2.cat.add_categories(cats[1:])
s2.loc[0] = 'c'
The constructors of both Series and DataFrame accept the combination of a scalar value and an n-element index. Given this calling syntax, they mimic numpy broadcasting/scalar expansion and repeat the scalar value n times to produce the object.
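For reference, a minimal sketch of that scalar expansion without any categorical dtype involved. The index labels and column names here are made up for illustration and are not part of the original report.

```python
import pandas as pd

# Hypothetical three-element index used only for this illustration
index = ["x", "y", "z"]

# A scalar plus an index: the scalar is repeated once per index label
s = pd.Series(5, index=index)

# For a DataFrame, the scalar is repeated across both index and columns
df = pd.DataFrame(0.0, index=index, columns=["col1", "col2"])

print(s)
print(df)
```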
This behavior is broken by specifying a pre-defined categorical dtype. The dtype of the resulting pandas object will only have one category corresponding to the original scalar. All other categories associated with the dtype are lost.
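One way to check for the loss is to compare the categories of the resulting object against those of the dtype that was requested. The helper below is a hypothetical sketch (`missing_categories` is not a pandas function); what it prints for the scalar case depends on the installed pandas version.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def missing_categories(series, dtype):
    """Return the categories declared in `dtype` that are absent from the
    categorical dtype of `series` (hypothetical helper, not part of pandas)."""
    present = set(series.cat.categories)
    return [c for c in dtype.categories if c not in present]


cat_type = CategoricalDtype(categories=["a", "b", "c"], ordered=False)

# Scalar data with a two-element index, as in the report
s = pd.Series("a", index=[0, 1], dtype=cat_type)

# An empty list means no categories were dropped by the constructor
print(missing_categories(s, cat_type))
```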
If, instead, the constructor is called with n-element data and an n-element index, all categories are retained in the dtype, even if they were not all included in the n-element data.
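This suggests a second possible workaround: expand the scalar into a list by hand before calling the constructor. The sketch below assumes a two-element `RangeIndex`; the variable names are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=["a", "b", "c"], ordered=False)
index = pd.RangeIndex(2)  # assumed two-element index for illustration

# Repeat the scalar manually so the constructor receives n-element data
s = pd.Series(["a"] * len(index), index=index, dtype=cat_type)

# All three categories are present even though only 'a' appears in the data
print(s.cat.categories.tolist())

# Assigning another declared category is therefore allowed
s.loc[0] = "c"
print(s)
```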
In the above code, I expect s1 and s2 to be identical. Further, I expect to be able to assign other category members to s2.
For s1 I get this expected output:
but for s2 I get this output showing only 1 category member: