add scanpy score_genes and score_genes_cell_cycle components #703
src/feature_annotation/score_genes_cell_cycle_scanpy/config.vsh.yaml
)

# find matching index names for given genes
gene_list_index = input_adata.var.index[[gene in gene_list for gene in gene_names]]
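For readers following the thread, here is a minimal sketch of what the line under review does, next to a vectorised pandas equivalent. The `var` table, the `gene_symbol` column and the gene identifiers are hypothetical stand-ins for `input_adata.var`; they are not taken from the component.

```python
import pandas as pd

# Hypothetical stand-in for input_adata.var: the index plays the role of
# var_names, the "gene_symbol" column holds the names to match against.
var = pd.DataFrame(
    {"gene_symbol": ["TP53", "MKI67", "TOP2A", "MKI67"]},
    index=["ENSG1", "ENSG2", "ENSG3", "ENSG4"],
)
gene_names = var["gene_symbol"]
gene_list = ["MKI67", "TOP2A"]

# Approach from the diff: one Python-level membership test per gene.
mask_loop = [gene in gene_list for gene in gene_names]
index_loop = var.index[mask_loop]

# Vectorised alternative: let pandas build the boolean mask.
mask_isin = gene_names.isin(gene_list).to_numpy()
index_isin = var.index[mask_isin]

print(list(index_loop))
print(list(index_isin))
```

Both variants keep every row whose gene name is in the list, including rows with a repeated gene name, and return the matches in `var` order.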
I'm not sure, but I think this also works and uses the intended pandas methods. Could you test it?
Suggested change:
- gene_list_index = input_adata.var.index[[gene in gene_list for gene in gene_names]]
+ gene_pool_index = input_adata.var_names.intersection(gene_names.reindex(gene_list)[0])
My previous comment will not work when gene_names contains duplicate entries.
What I would do is change the code starting from line 38:
gene_names_index = input_adata.var[par["var_gene_names"]] if par["var_gene_names"] else input_adata.var_names
gene_names = pd.Series(input_adata.var_names, index=gene_names_index)
Make sure that read_gene_list works with the Series object, or just pass gene_names.index as the second argument.
And then use:
gene_list_index = gene_names.loc[gene_list].tolist()
But perhaps this is to be discussed with @rcannood.
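A minimal sketch of the Series-based lookup proposed above, on toy data with a duplicated gene name. The identifiers and the contents of `gene_list` are assumptions for illustration; in the component they would come from `input_adata` and `read_gene_list`.

```python
import pandas as pd

# Hypothetical stand-ins for input_adata.var_names and the gene-name column.
var_names = pd.Index(["ENSG1", "ENSG2", "ENSG3", "ENSG4"])
gene_names_index = pd.Index(["TP53", "MKI67", "TOP2A", "MKI67"])  # "MKI67" is duplicated

# Map gene name -> var_name; a Series index is allowed to hold duplicates.
gene_names = pd.Series(var_names, index=gene_names_index)

gene_list = ["MKI67", "TOP2A"]

# Label-based lookup returns every row matching each requested gene name.
gene_list_index = gene_names.loc[gene_list].tolist()
print(gene_list_index)
```

Note that `.loc` with a list of labels raises a `KeyError` if any requested gene is absent from the index, so this sketch assumes `gene_list` has already been filtered to genes present in the data.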
Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
LGTM! Thank you @dorien-er
Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
Changelog
add scanpy score_genes and score_genes_cell_cycle components
Issue ticket number and link
Closes #xxxx (Replace xxxx with the GitHub issue number)
Checklist before requesting a review
I have performed a self-review of my code
Conforms to the Contributor's guide
Check the correct box. Does this PR contain:
Proposed changes are described in the CHANGELOG.md
CI tests succeed!