Appending categorical data should be more flexible #12699
Comments
xref #10409. where on a for other dtypes we are a bit friendlier in I think that if the concat/merge dtypes match then we should preserve, otherwise cast to The current rationale is that So I think would be ok to keep dtypes as appropriate and upcast if needed (only small change is needed in
Not the most elegant solution, but it works for now. Pandas has already annoyed me this evening...
I'd love to see this handled by pandas. Another thing about categorical data is calling
No, please no: this is only correct for your use of a categorical, and is wrong for things like objects, integers and so on. It is also wrong when the categorical is ordered (NA is "outside the order", but Individual categoricals are alike to See #8640 for a
My vote would go to "fail" because "explicit is better than implicit": I've seen too many (dict) lookups fail because of int/float conversions when an NA is included in an int column, and in this case it even has the memory penalty. On the other hand, it would be inconsistent with the rest of pandas, which does upcasting...
For those who need to concatenate now, here's a quick example of how it could be done (in Python 3): https://gist.github.com/tdhopper/91f03250892c12c6e0d35ca6d2ade1ca
@jreback Should've explained that my complete dataframe is so big that I can't convert to objects without running out of RAM.
Then you can do it iteratively, e.g. chunk it, concat, convert back to categoricals. Kind of like an in-memory with an on-disk merged.
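For the memory concern raised above, a minimal sketch of a different route than the chunked object conversion just described: `pandas.api.types.union_categoricals` combines categoricals with differing categories and returns a `Categorical`, so the data does not need to be cast to object first. The series `a` and `b` and their values are made up for illustration, and this assumes a pandas version that ships `union_categoricals`.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two hypothetical categorical columns whose category sets differ.
a = pd.Series(["low", "high", "low"], dtype="category")
b = pd.Series(["mid", "low"], dtype="category")

# union_categoricals returns a Categorical covering the categories of
# both inputs; wrapping it in a Series gives a categorical column again.
combined = pd.Series(union_categoricals([a, b]))

print(combined.dtype)                  # category
print(list(combined.cat.categories))
```

For a whole dataframe, the same call would have to be applied column by column to each categorical column.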
Please, Jeff, could you elaborate a bit more on your last comment? I would need to concat on disk without losing categories. Thanks.
Actually, this is closed by #13361.
I ran into this issue today, and it seems like it should be a fairly common situation. I have imported two dataframes (using pandas.read_stata) of categorical data that I want to concatenate. One of them might not have an instance of every category that the other one has, so pandas won't concatenate. It seems like it would be more flexible if it could add all missing categories.
I know that this inflexibility is in the documentation, but I wonder why it exists. Is there a good reason why pandas shouldn't automatically append new categories as they are encountered?
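The "add all missing categories" step asked for here can be done by hand. This is a sketch under assumptions, not how pandas handles it internally: the series `left` and `right` are hypothetical stand-ins for one categorical column from each imported dataframe. Both are given the union of their categories before concatenating, so the inputs share a single categorical dtype.

```python
import pandas as pd

# Hypothetical stand-ins for one categorical column from each dataframe.
left = pd.Series(["a", "b"], dtype="category")
right = pd.Series(["b", "c"], dtype="category")

# Build the union of both category sets (an Index of unique labels).
all_categories = left.cat.categories.union(right.cat.categories)

# Give both series the same categories; values absent from one side
# simply become unused categories there.
left_aligned = left.cat.set_categories(all_categories)
right_aligned = right.cat.set_categories(all_categories)

# With identical categories on both sides, the result stays categorical.
result = pd.concat([left_aligned, right_aligned], ignore_index=True)
print(result.dtype)  # category
```

Note that this treats the categoricals as unordered; for ordered categoricals the order of the merged categories would need to be chosen explicitly.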
In
Expected Output
Actual Output
Output of pd.show_versions()