Keeping original categories when subsetting an AnnData object #997
Comments
Also asking @giovp, since plotting categorical values in Squidpy may have required dealing with this behavior. |
Please notice that when subsetting a |
Created a simple function to fix this:

import pandas as pd
from anndata import AnnData


def _inplace_fix_subset_categorical_obs(subset_adata: AnnData, original_adata: AnnData) -> None:
    """
    Fix categorical obs columns of subset_adata to match the categories of original_adata.

    Parameters
    ----------
    subset_adata
        The subset AnnData object
    original_adata
        The original AnnData object

    Notes
    -----
    See discussion here: https://github.com/scverse/anndata/issues/997
    """
    obs = subset_adata.obs
    for column in obs.columns:
        # is_categorical_dtype is deprecated in recent pandas; check the dtype directly.
        is_categorical = isinstance(obs[column].dtype, pd.CategoricalDtype)
        if is_categorical:
            # Re-expand the categories to the full set from the original object.
            c = obs[column].cat.set_categories(original_adata.obs[column].cat.categories)
            obs[column] = c
|
What's the purpose of this? Because if it is for the colormap saved in |
I use it in the I also had a related problem in the past. When initializing colors with |
This behavior is intentional, as anndata explicitly calls a method named anndata/anndata/_core/anndata.py Lines 343 to 346 in e067515
|
@flying-sheep I believe we have this behavior since we were early adopters of categorical values, and pandas threw a ton of errors if you passed categorical arrays where there was no instance of a particular category. However, pandas also did not remove categories when subsetting, nor did it make doing that easy. I think the situation is much improved, and we should figure out how to deprecate this behavior. |
The colors saved in the anndata should be updated so they don't change when unused categoricals are removed. However, we should probably just save the mapping of categories to colors as a dict, and not need to bother keeping these things synced. |
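The dict-based idea above could look roughly like the sketch below. It uses plain pandas only; the category names, the hex colors, and the variable names are hypothetical, and it assumes colors are currently stored as a positional list aligned with the categories.

```python
import pandas as pd

# Hypothetical full set of categories and a positional color list aligned with it.
categories = pd.Index(["B", "NK", "T"])
colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]

# Store an explicit category -> color mapping instead of relying on position.
color_map = dict(zip(categories, colors))

# A subset in which the "NK" category no longer exists.
subset = pd.Series(["B", "T", "T"], dtype="category")

# Lookup by name stays correct no matter which categories were dropped.
subset_colors = [color_map[c] for c in subset.cat.categories]

# Positional lookup would hand "T" the color that belonged to "NK".
positional_colors = colors[: len(subset.cat.categories)]
```

With a mapping, nothing has to be kept in sync when unused categories are removed, which is the point made in the comment above.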
This has previously been requested and tracked in #890, so I'm going to close this as a duplicate. |
If one subsets an AnnData object that has a categorical obs column, the new categories of the column will be only the ones that appear in the subset column, and not all the original ones.
Is this a bug or the expected behavior?
If it's not a bug, is there a helper function/standard way to keep all the original categories?
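A minimal pandas-only sketch of the behavior in question. The column values are hypothetical, and `remove_unused_categories` is used here as a stand-in for what happens when an AnnData object is subset; that equivalence is an assumption rather than a statement about anndata's internals.

```python
import pandas as pd

# Hypothetical obs-like column with three categories.
cell_type = pd.Series(["B", "T", "NK", "T"], dtype="category", name="cell_type")
original_categories = cell_type.cat.categories

# Plain pandas subsetting keeps every category, including unused ones.
subset = cell_type[cell_type != "NK"]
kept_by_pandas = list(subset.cat.categories)

# Assumed stand-in for the AnnData subsetting behavior: unused categories are dropped.
trimmed = subset.cat.remove_unused_categories()
after_trim = list(trimmed.cat.categories)

# Restore the full category set from the original column.
restored = trimmed.cat.set_categories(original_categories)
after_restore = list(restored.cat.categories)
```

The values themselves are unchanged by either step; only the list of allowed categories shrinks and is then re-expanded.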