anndata - deleting specific row #73

Closed
ltosti opened this issue Oct 19, 2018 · 5 comments



ltosti commented Oct 19, 2018

Hi there,

Sorry for the simple question, but I just started using anndata within Scanpy and was wondering: is there a way to remove a specific row? Something like

if adata.var_names == "foo":
    remove the row

(It is to remove some mitochondrial genes)

Thank you!

@falexwolf
Member

Check out the basic tutorial.

What you want specifically is

non_mito_genes_list = [name for name in adata.var_names if not name.startswith('MT-')]
adata_no_mito_genes = adata[:, non_mito_genes_list]

Or you can also do

mito_gene_list = sc.queries.mitochondrial_genes()
mito_gene_indicator = np.in1d(adata.var_names, mito_gene_list)
adata_no_mito_genes = adata[:, ~mito_gene_indicator]

where ~ inverts the boolean indicator array, so the resulting mask selects the genes that are not in mito_gene_list.

Depending on the scenario, one or the other will be more convenient.

PS: AnnData stores observations as rows and variables as columns, as is common in machine learning, statistics and Python, but opposite to the genomics convention. Genes are therefore columns, which is why the examples above index the second axis.


JPV95 commented Apr 30, 2019

I ran the above solution, only to get a memory error. I am running 64-bit Python. Is there a way to delete a row based on an observation value?

Essentially:
if anndata['value'] == "threshold":
    delete()

@falexwolf
Member

No, sorry, there is no way to delete a row as storage is in contiguous arrays.
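To make the point about contiguous storage a bit more concrete, here is a minimal NumPy-only sketch. It is not a description of AnnData's internals, just an illustration that "removing" a row from a contiguous array means building a new, smaller array while the original stays untouched.

```python
import numpy as np

# A small contiguous array: 4 rows, 3 columns.
X = np.arange(12).reshape(4, 3)

# Keep every row except the second one, using a boolean mask.
keep = np.array([True, False, True, True])
X_masked = X[keep]

# np.delete expresses the same thing and also returns a new array.
X_deleted = np.delete(X, 1, axis=0)

print(X.shape, X_masked.shape)
print(np.shares_memory(X, X_masked))
```

The original X still has four rows afterwards, and the masked result does not share memory with it.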

@falexwolf
Member

How come you are getting a memory error from the above? Are you dealing with a huge matrix on a machine with very little memory?

@wubaosheng

Quoting @falexwolf: "No, sorry, there is no way to delete a row as storage is in contiguous arrays."

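You can do it like this:
# Copy of the obs table without the unwanted observation (dropped by its index label)
t = adata.obs.drop(index=["C_DeepM_r_2_FRAS210239463_1r"])
# Keep only observations whose "sample_index" value is still present in t
adata = adata[adata.obs["sample_index"].isin(t["sample_index"].to_list())]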
