deeptools countReadsPerBin output is not sorted #14

Closed
ntanmayee opened this issue Mar 5, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@ntanmayee
Owner

The new preprocessing pipeline uses the deeptools countReadsPerBin class, which uses multiprocessing and is much faster than the previous implementation.

However, its output is not sorted. Two runs of crpb.run() can therefore return the bins in a different order, which makes the rest of the DecoDen pipeline wrong.
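A toy illustration of why the ordering matters. This is not deeptools code: the chunk coordinates and counts below are made up, and `concatenate` is a hypothetical helper. It only shows that joining per-chunk results in completion order yields the same counts in different positions.

```python
# Toy illustration only: these chunks and counts are invented, not deeptools output.
chunks = {
    ("chr1", 0, 200): [5, 3],
    ("chr1", 200, 400): [0, 7],
    ("chr2", 0, 200): [1, 1],
}


def concatenate(completion_order):
    """Join per-chunk counts in the order the workers happened to finish."""
    merged = []
    for region in completion_order:
        merged.extend(chunks[region])
    return merged


# Two hypothetical runs in which the workers finish in a different order.
run_a = concatenate([("chr1", 0, 200), ("chr1", 200, 400), ("chr2", 0, 200)])
run_b = concatenate([("chr2", 0, 200), ("chr1", 0, 200), ("chr1", 200, 400)])

print(run_a)  # [5, 3, 0, 7, 1, 1]
print(run_b)  # [1, 1, 5, 3, 0, 7]
print(sorted(run_a) == sorted(run_b))  # True: the same counts are present
print(run_a == run_b)                  # False: row i refers to a different bin
```

Any downstream step that assumes row i is the same genomic bin in every run would silently compare the wrong bins.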

@ntanmayee ntanmayee added the bug Something isn't working label Mar 5, 2024
@ntanmayee ntanmayee self-assigned this Mar 5, 2024
@ntanmayee ntanmayee changed the title deeptools countReadsPerBin output is not sorted deeptools countReadsPerBin output is not sorted Mar 5, 2024
@ntanmayee
Owner Author

Potential solution:

Re-implement countReadsPerBin.py and pass includeLabels=False to mapReduce. This should return the chromosome, start and end of each chunk, which would help in sorting the multiprocessing output.

@ntanmayee
Owner Author

The previous solution does not work. This is the new strategy:

  1. Read in the chrom_sizes.bed file to get chromosome names and lengths.
  2. Call count_reads_in_region instead of run. This still uses multiprocessing, but the results come back in order.
  3. Concatenate the resulting coverage arrays.
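The three steps can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the DecoDen implementation: `read_chrom_sizes`, `count_region`, `ordered_coverage` and `BIN_SIZE` are hypothetical names, the chromosome sizes file is assumed to have two columns (name, length), and `count_region` is a stand-in that returns fake counts rather than calling deeptools or reproducing the signature of count_reads_in_region. A thread pool is used here only to keep the sketch self-contained.

```python
import io
from concurrent.futures import ThreadPoolExecutor

BIN_SIZE = 200  # hypothetical bin width, in base pairs


def read_chrom_sizes(handle):
    """Parse 'name<whitespace>length' lines, keeping the file's order (assumed format)."""
    sizes = []
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            sizes.append((fields[0], int(fields[1])))
    return sizes


def count_region(region):
    """Stand-in for a per-region counter: returns one fake count per bin."""
    chrom, start, end = region
    n_bins = -(-(end - start) // BIN_SIZE)  # ceiling division
    return [len(chrom) + i for i in range(n_bins)]


def ordered_coverage(chrom_sizes, workers=4):
    """Count each chromosome in parallel and concatenate in input order."""
    regions = [(chrom, 0, length) for chrom, length in chrom_sizes]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of its inputs,
        # regardless of which worker finishes first.
        per_chrom = list(pool.map(count_region, regions))
    coverage = []
    for counts in per_chrom:
        coverage.extend(counts)
    return regions, coverage


# Made-up sizes standing in for the contents of the chromosome sizes file.
sizes = read_chrom_sizes(io.StringIO("chr1\t600\nchr10\t250\n"))
regions, coverage = ordered_coverage(sizes)
print(regions)   # [('chr1', 0, 600), ('chr10', 0, 250)]
print(coverage)  # [4, 5, 6, 5, 6]
```

Because the regions are built from the chromosome list and the results are collected in that same order, the concatenated vector has a fixed layout across runs. `ProcessPoolExecutor.map` gives the same ordering guarantee if separate processes are needed.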
