@dorien-er dorien-er commented Feb 12, 2024

Changelog

add scanpy score_genes and score_genes_cell_cycle components

Issue ticket number and link

Closes #xxxx (Replace xxxx with the GitHub issue number)

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributor's guide

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI tests succeed!

)

# find matching index names for given genes
gene_list_index = input_adata.var.index[[gene in gene_list for gene in gene_names]]

I'm not sure, but I think this also works and uses the intended pandas methods. Could you test it?

Suggested change
- gene_list_index = input_adata.var.index[[gene in gene_list for gene in gene_names]]
+ gene_pool_index = input_adata.var_names.intersection(gene_names.reindex(gene_list)[0])


My previous suggestion will not work when gene_names contains duplicate entries.

What I would do is change starting from line 38:

gene_names_index = input_adata.var[par["var_gene_names"]] if par["var_gene_names"] else input_adata.var_names
gene_names = pd.Series(input_adata.var_names, index=gene_names_index)

Make sure that read_gene_list works with the Series object, or just pass gene_names.index as the second argument.

And then use:

gene_list_index = gene_names.loc[gene_list].tolist()

But perhaps this should be discussed with @rcannood.
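For illustration, here is a minimal sketch of the lookup proposed above. It uses a toy pandas DataFrame in place of `input_adata.var`; the IDs, the gene symbols, and the `gene_symbol` column name are made up for the example and are not taken from the component. It also shows why a `reindex`-based lookup fails once the gene names contain duplicates.

```python
import pandas as pd

# Hypothetical stand-in for input_adata.var: the index plays the role of
# var_names, and "gene_symbol" is the column named by par["var_gene_names"].
var = pd.DataFrame(
    {"gene_symbol": ["TP53", "MKI67", "TOP2A", "MKI67"]},  # note the duplicate
    index=["ENSG01", "ENSG02", "ENSG03", "ENSG04"],
)
par = {"var_gene_names": "gene_symbol"}
gene_list = ["MKI67", "TOP2A"]

# Map gene names (Series index) to var_names (Series values).
gene_names_index = var[par["var_gene_names"]] if par["var_gene_names"] else var.index
gene_names = pd.Series(var.index, index=gene_names_index)

# reindex refuses to work on an index with duplicate labels.
try:
    gene_names.reindex(gene_list)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# .loc accepts duplicate labels and returns every matching row.
gene_list_index = gene_names.loc[gene_list].tolist()
print(reindex_failed, gene_list_index)
```

Note that in recent pandas versions `.loc` with a list raises a `KeyError` if any requested label is absent from the index, so genes missing from the data may need to be filtered out or reported beforehand.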

@dorien-er dorien-er Feb 19, 2024


@rcannood I committed those changes in e87c9bd, so they can be merged or reverted.

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
@DriesSchaumont DriesSchaumont left a comment

LGTM! Thank you @dorien-er

@DriesSchaumont DriesSchaumont merged commit 1c5dcdc into main Feb 19, 2024
@DriesSchaumont DriesSchaumont deleted the score_genes branch February 20, 2024 09:20
VladimirShitov pushed a commit that referenced this pull request Mar 12, 2024
Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
3 participants