
Issues with ingest #1128

Open
andreatangherloni opened this issue Mar 25, 2020 · 4 comments
Labels: good first issue (easy first issue to get started in OSS community contribution!)

Comments

@andreatangherloni

Hi all,

I am trying to use ingest to integrate different datasets.
I found a couple of issues.

  • ingest requires the var_names to be the same in the reference and the new object. I can select the intersection of the two datasets; however, the check if not ref_var_names.equals(new_var_names) also requires the genes to be in the same order. I think this check could be relaxed by comparing sets (e.g., len(set(ref_var_names).difference(set(new_var_names))) == 0). I tried sorting the .var dataframe, but .X stays the same, so the expression values no longer correspond to the correct genes. I can build a dataframe and recreate .X, but it would be very nice if .X were reordered whenever .var or .obs is modified (e.g., sorted).

  • Although it is possible to set embedding_method='umap', ingest still requires the PCA components. I used an autoencoder instead of PCA, and I cannot run ingest using only the UMAP. Can you fix it?
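For readers following along, the difference between the order-sensitive check and the set-based check proposed in the first point can be seen with plain pandas indexes. This is only a sketch: the gene names are made up, and it says nothing about what ingest does internally beyond the check quoted above.

```python
import pandas as pd

# Hypothetical gene names: same genes, different order.
ref_var_names = pd.Index(["GeneA", "GeneB", "GeneC"])
new_var_names = pd.Index(["GeneC", "GeneA", "GeneB"])

# Order-sensitive comparison, as in the check quoted above.
same_order = ref_var_names.equals(new_var_names)

# Set-based comparison, as proposed above. This one-sided difference only
# verifies that every reference gene is present in the new object.
same_set = len(set(ref_var_names).difference(set(new_var_names))) == 0

# Positions in new_var_names that put its genes into reference order.
order = new_var_names.get_indexer(ref_var_names)
reordered = new_var_names[order]

print(same_order, same_set, list(reordered))
```

A set-based check alone would accept these two indexes, but the columns of the data matrix would still have to be permuted with something like `order` before the two objects line up gene by gene.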

Thank you in advance.
Best,
Andrea

@Koncopd
Member

Koncopd commented Mar 25, 2020

Hi, @andrea-tango
About the second issue: you don't need PCA.
You can do something like:

import scanpy as sc

# project reference adata to latent dimensions with your autoencoder
adata_ref.obsm['X_latent'] = autoencoder.to_latent(adata_ref.X)
# use your latent variables to calculate neighbors
sc.pp.neighbors(adata_ref, use_rep='X_latent')
sc.tl.umap(adata_ref)
# project your new adata to latent dimensions with your autoencoder,
# storing them under the same obsm key as in the reference
adata_new.obsm['X_latent'] = autoencoder.to_latent(adata_new.X)
sc.tl.ingest(adata_new, adata_ref, embedding_method='umap')

About the first issue: yes, ingest needs the vars in the same order. The ordering behaviour you describe is definitely not an issue with ingest itself.

@andreatangherloni
Author

Hi @Koncopd,

Thank you for the suggestions.
Regarding the vars, I wrote the attached function.

def filteringGenesCells(adata, genes=None, cells=None, sortGenes=False, sortCells=False):

    df = pd.DataFrame(index   = adata.obs.index.tolist(),
                      columns = adata.var.index.tolist(),
                      data    = adata.X)

    if genes is not None:
        df = df[genes]

    if sortGenes:
        df1 = df.T
        df1.sort_index(inplace=True)
        df = df1.T

    if cells is not None:
        df1 = df.T
        df1 = df1[cells]
        df  = df1.T

    if sortCells:
        df.sort_index(inplace=True)

    adata = adata[:, adata.var.index.isin(df.columns)]
    adata = adata[adata.obs.index.isin(df.index)]

    if sortGenes:
        adata.var.sort_index(inplace=True)

    if sortCells:
        adata.obs.sort_index(inplace=True)

    adata.X = df.values

    return adata

Best,
Andrea

@ivirshup
Member

Andrea, I think you'll want something more like:

shared_genes = adata1.var_names.intersection(adata2.var_names)

adata1 = adata1[:, shared_genes].copy()
adata2 = adata2[:, shared_genes].copy()

Your code only sorts the index of the var dataframe, but doesn't actually reorder the anndata object. It's definitely a little confusing that this is possible.
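To make the distinction concrete, here is a small sketch that uses a NumPy array and a pandas Index as a stand-in for an AnnData object (the values and gene names are invented). It contrasts sorting only the labels with reordering labels and data together.

```python
import numpy as np
import pandas as pd

# Toy stand-in for an AnnData object: X holds expression values and
# var_names labels its columns (hypothetical data).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
var_names = pd.Index(["GeneC", "GeneA", "GeneB"])

# Sorting only the labels: column 0 would now be called GeneA,
# but it still holds the values that belong to GeneC.
labels_only = var_names.sort_values()

# Reordering labels and data with the same positional index keeps them in sync.
order = np.argsort(var_names.to_numpy())
X_sorted = X[:, order]
var_sorted = var_names[order]

print(list(labels_only))
print(list(var_sorted), X_sorted.tolist())
```

Indexing the AnnData object itself, as in adata1[:, shared_genes] above, reorders the data and the var annotations together, so no manual bookkeeping of this kind is needed.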

@giovp added the good first issue label on Oct 9, 2020
@potulabe

potulabe commented Jan 2, 2023

But really, sometimes this check for equality of var_names between reference and query is unnecessary. For example, when embeddings have been built in some way on both reference and query, and the neighbors have been found based on these embeddings, you really don't need the var_names to be equal. Is it possible to keep the check only when it is really essential?
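One way to picture the request is a check that is skipped when the projection does not depend on gene columns. The sketch below is purely illustrative: the function name `check_var_names` and the `needs_genes` flag are hypothetical and are not part of scanpy.

```python
def check_var_names(ref_var_names, new_var_names, needs_genes):
    """Hypothetical helper: validate var_names only when gene columns matter."""
    if not needs_genes:
        # Both objects already carry a shared embedding, so gene order is irrelevant.
        return
    if list(ref_var_names) != list(new_var_names):
        raise ValueError("Variables in the new object differ from the reference.")


# With a shared precomputed embedding the check would be skipped.
check_var_names(["GeneA", "GeneB"], ["GeneB", "GeneA"], needs_genes=False)
```

Whether and where such a flag could be derived (for example from the representation used to compute the neighbors) is a design question for the maintainers.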
