add scanpy score_genes and score_genes_cell_cycle components #703

dorien-er · 2024-02-12T17:24:01Z

Changelog

add scanpy score_genes and score_genes_cell_cycle components

Issue ticket number and link

Closes #xxxx (Replace xxxx with the GitHub issue number)

Checklist before requesting a review

src/feature_annotation/score_genes_cell_cycle_scanpy/config.vsh.yaml

src/feature_annotation/score_genes_cell_cycle_scanpy/script.py

src/feature_annotation/score_genes_cell_cycle_scanpy/test.py

src/feature_annotation/score_genes_scanpy/config.vsh.yaml

DriesSchaumont · 2024-02-16T08:08:47Z

src/feature_annotation/score_genes_scanpy/script.py

+)
+
+# find matching index names for given genes
+gene_list_index = input_adata.var.index[[gene in gene_list for gene in gene_names]]


I'm not sure, but I think this also works and uses the intended pandas methods. Could you test it?

Suggested change

-gene_list_index = input_adata.var.index[[gene in gene_list for gene in gene_names]]
+gene_pool_index = input_adata.var_names.intersection(gene_names.reindex(gene_list)[0])
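For context, here is a minimal sketch of what the original line computes. It uses a small hypothetical `var` table in place of `input_adata.var`; the `gene_symbol` column, the feature IDs, and the gene list are made up for illustration. It also shows `Series.isin`, one pandas-native way to build the same mask. That part is an assumption about what "intended pandas methods" could look like, not the suggestion from this review.

```python
import pandas as pd

# Hypothetical stand-in for input_adata.var: the index holds feature IDs,
# the "gene_symbol" column holds the names that appear in the gene list.
var = pd.DataFrame(
    {"gene_symbol": ["TP53", "MKI67", "TOP2A", "GAPDH"]},
    index=["ENSG01", "ENSG02", "ENSG03", "ENSG04"],
)
gene_names = var["gene_symbol"]
gene_list = ["TOP2A", "MKI67"]

# Original approach: a Python list comprehension builds the boolean mask.
mask_loop = [gene in gene_list for gene in gene_names]
gene_list_index = var.index[mask_loop]

# Same mask built with pandas' vectorised membership test.
mask_isin = gene_names.isin(gene_list).to_numpy()
gene_list_index_isin = var.index[mask_isin]

# Both results follow the order of var, not the order of gene_list.
print(list(gene_list_index))
print(list(gene_list_index_isin))
```

Both masks select the same entries, and because they are positional they keep every row whose name matches, including rows with repeated gene names.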

@rcannood I committed those changes in e87c9bd, so they can be merged or reverted.

DriesSchaumont · 2024-02-16T09:15:14Z

My previous comment will not work when gene_names contains duplicate entries

What I would do is change starting from line 38:

gene_names_index = input_adata.var[par["var_gene_names"]] if par["var_gene_names"] else input_adata.var_names
gene_names = pd.Series(input_adata.var_names, index=gene_names_index)

Make sure that read_gene_list works with the Series object or just pass gene_names.index as the second argument

And then use:

gene_list_index = gene_names.loc[gene_list].tolist()

But perhaps tbd with @rcannood
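A small sketch of the Series-based lookup proposed above, assuming a hypothetical `var` table that contains a duplicated gene name. The column name, feature IDs, and gene list are illustrative only, and `var_gene_names` stands in for `par["var_gene_names"]`.

```python
import pandas as pd

# Hypothetical stand-in for input_adata.var; "MKI67" appears twice on purpose.
var = pd.DataFrame(
    {"gene_symbol": ["TP53", "MKI67", "MKI67", "GAPDH"]},
    index=["ENSG01", "ENSG02", "ENSG03", "ENSG04"],
)
var_gene_names = "gene_symbol"  # assumption: plays the role of par["var_gene_names"]

# Use the gene-name column as lookup keys when given, else the index itself.
gene_names_index = var[var_gene_names] if var_gene_names else var.index

# Series mapping gene name (index) -> var index entry (value).
gene_names = pd.Series(var.index, index=gene_names_index)

gene_list = ["MKI67", "TP53"]

# Label-based lookup: every row whose gene name is in gene_list is returned.
gene_list_index = gene_names.loc[gene_list].tolist()
print(gene_list_index)
```

Because the lookup is by label, a duplicated gene name returns every matching index entry. In recent pandas versions `.loc` raises a `KeyError` when a requested label is absent, so the gene list may need to be filtered against `gene_names.index` first.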

DriesSchaumont

LGTM! Thank you @dorien-er

dorien-er added 10 commits February 12, 2024 14:49

initial push

ddf04d1

update input output handling

6df2776

update cell cycle scoring

955ed09

score_genes edit config

39f2231

add score_genes tests

9e97890

add score_genes_cell_cycle test

22674bb

ensure versioning

ee493ae

update descriptions

5517d52

update input/output descriptions

645eb5f

add changelog entry

4bc65f4

dorien-er requested review from DriesSchaumont and rcannood February 12, 2024 17:26

dorien-er added 4 commits February 12, 2024 18:30

remove whitespace, add newline

af64388

move score genes componenets to feature annotation

651c4e6

add namespace

51cd562

adjust name and namespace

545ab39

DriesSchaumont requested changes Feb 15, 2024

dorien-er and others added 8 commits February 15, 2024 11:56

bump python version

7a85ff6

add context managers

6c655a5

assert gene files are not empty

9b6da70

add h5mu output compression

e76cbd9

fix tests

d9d522c

add fixtures and tmp_path to tests

f2a8ef6

update mudata handling

b02cf0b

Merge branch 'main' into score_genes

e7338ba

rcannood approved these changes Feb 15, 2024

rcannood requested a review from DriesSchaumont February 15, 2024 14:59

remove f string

131ddd1

DriesSchaumont requested changes Feb 16, 2024

add authors

49c2934

dorien-er added 4 commits February 16, 2024 09:37

remove file

3a54f40

inline helper script

f371cf9

update helper and tests

acba13b

update helper

ce05d6d

DriesSchaumont requested changes Feb 16, 2024

dorien-er added 7 commits February 16, 2024 11:01

undo assert outside of pytest raise

e001f75

copy

1c9bfc8

add assert

eeb35d8

add comment

6d59139

assert outside of pytest raise

34d7923

assert outside of pytest raise

86e2ce9

update index matching

e87c9bd

DriesSchaumont requested changes Feb 16, 2024

src/feature_annotation/score_genes_cell_cycle_scanpy/script.py

Update src/feature_annotation/score_genes_cell_cycle_scanpy/script.py

809139c

Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>

DriesSchaumont requested changes Feb 19, 2024

src/feature_annotation/score_genes_scanpy/script.py

dorien-er and others added 2 commits February 19, 2024 09:43

update score genes script

7d62cd7

Merge remote-tracking branch 'origin/main' into score_genes

fc5928b

DriesSchaumont approved these changes Feb 19, 2024

DriesSchaumont merged commit 1c5dcdc into main Feb 19, 2024

DriesSchaumont deleted the score_genes branch February 20, 2024 09:20

VladimirShitov pushed a commit that referenced this pull request Mar 12, 2024

add scanpy score_genes and score_genes_cell_cycle components (#703)

0901f4b

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>
