Change `add_dna_sequences` to use `get_seq` #1867

martinkim0 · 2023-01-25T22:31:33Z

`add_dna_sequence` fails on some datasets, such as the Buenrostro 2018 dataset used in scBasset batch correction. See notebook: https://colab.research.google.com/drive/1cTkQqQGmbTAV7jtFiV5bK7sr7nD0g3rJ?usp=sharing. The new code outputs the same DNA sequences for the initial tutorial.

codecov · 2023-01-25T22:42:06Z

Codecov Report

Base: 90.42% // Head: 90.42% // No change to project coverage 👍

Coverage data is based on head (1cd2dc1) compared to base (508ed0b).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1867   +/-   ##
=======================================
  Coverage   90.42%   90.42%           
=======================================
  Files         141      141           
  Lines       11058    11058           
=======================================
  Hits         9999     9999           
  Misses       1059     1059

Impacted Files	Coverage Δ
scvi/data/_preprocessing.py	`76.50% <100.00%> (ø)`
scvi/external/scbasset/_module.py	`98.64% <100.00%> (ø)`


martinkim0 · 2023-01-25T22:46:38Z

scvi/data/_preprocessing.py

@@ -357,7 +357,6 @@ def _concat_anndata(multi_anndata, other):


 def _dna_to_code(nt: str) -> int:
-    nt = nt.upper()


Not necessary, since sequences are already uppercased in `add_dna_sequences`.
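A minimal sketch of why the per-nucleotide `upper()` call becomes redundant once the caller uppercases the whole sequence. The integer mapping and the helper names `dna_to_code` and `encode_sequence` are assumptions for illustration, not necessarily what `scvi/data/_preprocessing.py` does.

```python
# Hypothetical nucleotide-to-integer mapping; anything unknown falls back to 4.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def dna_to_code(nt: str) -> int:
    """Encode one nucleotide. Assumes the input is already uppercase."""
    return CODES.get(nt, 4)


def encode_sequence(seq: str) -> list[int]:
    """Uppercase the sequence once, then encode each nucleotide."""
    return [dna_to_code(nt) for nt in seq.upper()]


encoded = encode_sequence("acgtn")
```

Uppercasing once per sequence avoids a string call per base. The trade-off is that `dna_to_code` now relies on its caller: a lowercase base passed in directly would land in the fallback code.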

martinkim0 · 2023-01-25T22:52:26Z

scvi/data/_preprocessing.py

+        block_mid = (chrom_df[start_var_key] + chrom_df[end_var_key]) // 2
+        block_starts = block_mid - (seq_len // 2)
        block_ends = block_starts + seq_len


https://github.com/calico/scBasset/blob/main/scbasset/basenji_utils.py#L84
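For reference, the centering logic in the diff above can be sketched in plain Python. This is an illustration under assumptions: the function name `centered_window` and the example coordinates are hypothetical, and the actual code operates on pandas columns rather than single integers.

```python
def centered_window(start: int, end: int, seq_len: int) -> tuple[int, int]:
    """Return a (block_start, block_end) window of length seq_len centered on a peak."""
    block_mid = (start + end) // 2
    block_start = block_mid - (seq_len // 2)
    block_end = block_start + seq_len
    return block_start, block_end


# Peaks of different widths both map to windows of exactly seq_len bases.
peaks = [(100, 200), (250, 275)]
windows = [centered_window(s, e, seq_len=10) for s, e in peaks]
```

Every window has the same length regardless of the original peak width. A peak close to the start of a chromosome can produce a negative `block_start`; this sketch does not handle that case.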

Change add_dna

696d938

martinkim0 requested a review from adamgayoso January 25, 2023 22:46

Martin Kim added 2 commits January 25, 2023 14:48

Add assert

e96e628

Update release notes

5994487

adamgayoso approved these changes Jan 26, 2023

Merge branch 'main' into scbasset-setup

1cd2dc1

martinkim0 merged commit 928ce6c into main Jan 26, 2023

adamgayoso deleted the scbasset-setup branch January 26, 2023 16:13
