Change add_dna_sequences to use get_seq #1867
Codecov Report
Base: 90.42% // Head: 90.42% // No change to project coverage 👍

Coverage Diff
              main     #1867    +/-
Coverage    90.42%    90.42%
Files          141       141
Lines        11058     11058
Hits          9999      9999
Misses        1059      1059
@@ -357,7 +357,6 @@ def _concat_anndata(multi_anndata, other):


def _dna_to_code(nt: str) -> int:
    nt = nt.upper()
Not necessary, as sequences are already uppercased in add_dna_sequences.
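To illustrate the point of this comment, here is a minimal sketch of uppercasing a sequence once before encoding, so the per-nucleotide helper does not need its own `.upper()` call. The `DNA_CODES` mapping and the `dna_to_code` / `encode_sequence` names are hypothetical and chosen for illustration only; the project's actual codes and call sites may differ.

```python
# Hypothetical nucleotide-to-integer mapping (an assumption, not the project's table).
DNA_CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def dna_to_code(nt: str) -> int:
    """Encode one nucleotide that is assumed to be uppercase already.

    Any symbol not in the table is treated as N.
    """
    return DNA_CODES.get(nt, DNA_CODES["N"])


def encode_sequence(seq: str) -> list[int]:
    """Uppercase the whole sequence once, then encode each nucleotide."""
    seq = seq.upper()  # one call handles soft-masked (lowercase) bases
    return [dna_to_code(nt) for nt in seq]
```

With this layout, `dna_to_code` relies on its caller for casing, which is the trade-off the comment describes: one `str.upper()` per sequence instead of one per character.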
block_mid = (chrom_df[start_var_key] + chrom_df[end_var_key]) // 2
block_starts = block_mid - (seq_len // 2)
block_ends = block_starts + seq_len
add_dna_sequence fails on some datasets, such as the Buenrostro 2018 dataset used in scBasset batch correction. See notebook: https://colab.research.google.com/drive/1cTkQqQGmbTAV7jtFiV5bK7sr7nD0g3rJ?usp=sharing. The new code outputs the same DNA sequence for the initial tutorial.
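For readers following the window arithmetic in the hunk above, here is a small standalone sketch of the same three lines using pandas. The `centered_windows` function name, the output column names, and the example coordinates are assumptions made for illustration; only the midpoint, start, and end formulas come from the diff.

```python
import pandas as pd


def centered_windows(
    chrom_df: pd.DataFrame,
    seq_len: int,
    start_var_key: str = "start",
    end_var_key: str = "end",
) -> pd.DataFrame:
    """Return fixed-length windows centered on each region's midpoint."""
    # Same arithmetic as the diff: integer midpoint, then a window of seq_len around it.
    block_mid = (chrom_df[start_var_key] + chrom_df[end_var_key]) // 2
    block_starts = block_mid - (seq_len // 2)
    block_ends = block_starts + seq_len
    return pd.DataFrame({"block_start": block_starts, "block_end": block_ends})


# Made-up regions: one in the middle of a chromosome, one close to its start.
regions = pd.DataFrame({"start": [1000, 50], "end": [1200, 150]})
windows = centered_windows(regions, seq_len=1344)
```

Every window has length `seq_len` by construction. For the second made-up region the computed start is negative, so a caller fetching sequences would presumably need to decide how to handle windows that extend past a chromosome boundary; the thread does not say whether that is what triggers the failure.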