obs_names and var_names as character/str or not #73

Open
rcannood opened this issue May 6, 2023 · 3 comments

rcannood (Collaborator) commented May 6, 2023

During the hackathon, I think it was mentioned that obs_names and var_names might not always be an array of strings, and that other dtypes would also be supported. This would cause an interoperability issue in other languages -- at least in R, since standard data frames do not support non-string row names.

This has resulted in a somewhat clunky approach to storing obs_names and var_names in an anndataR::AnnData object: we can't assume that we can simply attach obs_names and var_names to any of the slots (X, var, obs, ...), because doing so would throw a conversion warning and thus lose information.
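
A minimal sketch of the information loss described above, written in plain Python rather than R. The integer `obs_names` values are made up for illustration, and the string conversion only stands in for what attaching names to a string-only slot would amount to; it is not anndataR code.

```python
# Hypothetical integer obs_names, as a non-string index might look.
obs_names = [10, 2, 33]

# Stand-in for attaching the names to a slot that only accepts strings.
as_row_names = [str(name) for name in obs_names]

# The printed values look the same, but the original type is gone.
types_before = {type(name).__name__ for name in obs_names}
types_after = {type(name).__name__ for name in as_row_names}

# Behaviour that depends on the type, such as ordering, changes too.
sorted_before = sorted(obs_names)
sorted_after = sorted(as_row_names)

print(types_before, types_after)
print(sorted_before, sorted_after)
```

Converting the strings back would require knowing the original dtype, which is exactly the information that is no longer stored anywhere.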

We should figure out what the planned roadmap for this functionality is (probably related to scverse/anndata#777?), and whether there is a different way of resolving this in R, because not being able to add any dimnames to X and rownames to obs and var is quite cumbersome.

mtmorgan (Collaborator) commented May 6, 2023

A strategy following https://anndata.readthedocs.io/en/latest/fileformat-prose.html#dataframe-specification-v0-2-0 might define obs_attrs(), with obs_attrs()[["_name"]] holding the column name of the index. obs would then return a data.frame that includes the named column. Is it actually a problem not to have row names? This is the norm in the tidyverse world.
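
A rough sketch of that strategy, in Python for illustration only: the index is stored as an ordinary column and a separate attributes mapping records which column it is. The helper names `make_obs` and `obs_names`, the dict layout, and the example columns are all hypothetical; the `_name` key simply follows the wording of this comment.

```python
def make_obs(columns, index_column):
    """Bundle a column table with attrs naming the index column (hypothetical layout)."""
    if index_column not in columns:
        raise KeyError(index_column)
    return {"columns": columns, "attrs": {"_name": index_column}}


def obs_names(obs):
    """Look up the index values through the attrs, keeping their original type."""
    return obs["columns"][obs["attrs"]["_name"]]


# The index lives in a regular column, so no row names are needed.
obs = make_obs(
    {"cell_id": [10, 2, 33], "n_genes": [500, 320, 410]},
    index_column="cell_id",
)
names = obs_names(obs)
print(names)
```

Because the index is just another column, integer names stay integers and nothing has to be coerced to character.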

Perhaps we would implement (on AbstractAnnData) a single-square-bracket subset method ad[cidx, ridx] that created a subset / view based on the corresponding row / column index.
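
As a sketch of what such a subset method could look like, here is a toy in-memory container in Python. `TinyAnnData` is hypothetical and is not the anndataR or anndata API. It assumes positional integer indices, assumes the first index selects observations and the second selects variables, and returns a copy rather than a view.

```python
class TinyAnnData:
    """Hypothetical in-memory container; names are kept apart from X."""

    def __init__(self, X, obs_names, var_names):
        self.X = X
        self.obs_names = list(obs_names)
        self.var_names = list(var_names)

    def __getitem__(self, key):
        # key is a pair of positional index lists: (observations, variables).
        obs_idx, var_idx = key
        return TinyAnnData(
            [[self.X[i][j] for j in var_idx] for i in obs_idx],
            [self.obs_names[i] for i in obs_idx],
            [self.var_names[j] for j in var_idx],
        )


ad = TinyAnnData(
    [[1, 2, 3], [4, 5, 6]],
    obs_names=[10, 2],  # integer names survive because they are never row names
    var_names=["a", "b", "c"],
)
sub = ad[[1], [0, 2]]
print(sub.X, sub.obs_names, sub.var_names)
```

Subsetting the names alongside X keeps the two in sync without ever attaching the names to the matrix itself.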

lazappi (Collaborator) commented May 8, 2023

I think it should be possible to support most things by storing names separately. Indexing is maybe only an issue for the in-memory backend anyway; I don't think we want to try to implement indexing on the file-backed backends. Conversion to R objects might be more difficult, because as soon as we put things into colnames/rownames they will be coerced to characters.

Maybe @ivirshup can give us an idea of how soon this might happen (and how much we need to worry about it now)?

ivirshup (Member) commented May 8, 2023

It's not imminent. At soonest, my guess would be late this year. But I can ping maintainers of this library ahead of that.
