Categorical.from_codes shouldn't coerce to int64 #18501
Categorical.from_codes coerces its input to an array of np.int64 unconditionally even though the Categorical constructor immediately coerces the input to some other dtype using coerce_indexer_dtype. This coercion might cause a memory usage spike when codes is large. ISTM that we can just avoid the conversion in from_codes entirely and let coerce_indexer_dtype take care of any error case.
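To put a rough number on the spike, here is a small illustration. It does not call pandas internals; it only mimics the unconditional int64 cast with NumPy, and the array size and category names are picked arbitrarily for the example.

```python
import numpy as np
import pandas as pd

# One million codes that already fit in a single byte each.
codes = np.zeros(1_000_000, dtype=np.int8)

# Mimic the unconditional cast: this allocates a temporary array
# that is eight times the size of the input.
as_int64 = np.asarray(codes, dtype=np.int64)
print(codes.nbytes, as_int64.nbytes)

# With only two categories, the resulting Categorical stores small codes anyway.
cat = pd.Categorical.from_codes(codes, categories=["foo", "bar"])
print(cat.codes.dtype)
```

So the int64 intermediate is never what ends up stored; it only exists for the duration of the call.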
Should be able to wrap this:
if not is_integer_dtype(codes):
    # do the try / except
And see what breaks. @dcolascione could you submit a PR for that, along with tests and a release note?
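A minimal sketch of what that guard could look like, written as a standalone function rather than as the actual pandas code. `prepare_codes` and its error message are hypothetical names made up for this example; only `is_integer_dtype` and the NumPy calls are real APIs.

```python
import numpy as np
from pandas.api.types import is_integer_dtype


def prepare_codes(codes):
    """Hypothetical helper: cast to int64 only when codes is not already integer."""
    arr = np.asarray(codes)  # no copy if codes is already an ndarray
    if not is_integer_dtype(arr):
        # Only non-integer input goes through the conversion (the try / except).
        try:
            arr = np.asarray(arr, dtype=np.int64)
        except (ValueError, TypeError) as err:
            raise ValueError(
                "codes need to be convertible to an array of integers"
            ) from err
    return arr


small = np.asarray([0, 1], dtype=np.int16)
result = prepare_codes(small)
print(result.dtype, result is small)
```

Integer input of any width passes through untouched, so any narrowing is left to whatever runs next (coerce_indexer_dtype in the constructor, per the issue description).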
Note that even if we avoid the cast to
In : pd.Categorical.from_codes(codes=np.asarray([0, 1], np.int16), categories=["foo", "bar"]).codes.dtype
Out: dtype('int8')
Avoiding all copies may be more difficult, but possible.
ok this looks fine to do