`deeptools` `countReadsPerBin` output is not sorted #14

ntanmayee · 2024-03-05T17:17:46Z

The new preprocessing pipeline uses the `countReadsPerBin` class from `deeptools`. It uses multiprocessing and is much faster than the previous implementation.

However, its output is not sorted. Two runs of `crpb.run()` on the same input can return the bins in a different order, which makes the results of the rest of the DecoDen pipeline wrong.
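The issue does not pin down where inside `deeptools` the order changes, so the sketch below only illustrates the general failure mode with Python's standard library: collecting results in completion order (`imap_unordered`) versus submission order (`map`). `count_chunk` is a hypothetical stand-in for per-chunk counting, and a thread pool is used instead of a process pool only to keep the example self-contained; `multiprocessing.Pool` offers the same two methods with the same ordering behaviour.

```python
import time
from multiprocessing.pool import ThreadPool
from typing import List, Tuple


def count_chunk(chunk_id: int) -> Tuple[int, List[int]]:
    """Hypothetical per-chunk counter: returns (chunk id, fake per-bin counts).

    Later chunks sleep less, so they tend to finish first.
    """
    time.sleep(0.05 * (4 - chunk_id))
    return chunk_id, [chunk_id] * 3


def collect(ordered: bool) -> List[int]:
    """Run four chunks in parallel and report the order results come back in."""
    with ThreadPool(4) as pool:
        if ordered:
            # map() returns results in submission order.
            results = pool.map(count_chunk, range(4))
        else:
            # imap_unordered() yields each result as soon as its task finishes.
            results = list(pool.imap_unordered(count_chunk, range(4)))
    return [chunk_id for chunk_id, _counts in results]


if __name__ == "__main__":
    print("map:           ", collect(ordered=True))   # always [0, 1, 2, 3]
    print("imap_unordered:", collect(ordered=False))  # depends on scheduling
```

Concatenating the per-chunk counts from the unordered variant gives a genome-wide vector whose bin order depends on scheduling, which is how two otherwise identical runs can disagree.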


ntanmayee · 2024-03-05T17:20:12Z

Potential solution:

Re-implement `countReadsPerBin.py` and pass `includeLabels=False` to `mapReduce`. This should return the chromosome, start and end of each chunk, which would make it possible to sort the multiprocessing output.
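If each parallel result carried its chromosome, start and end, the chunks could be put back in genome order before concatenation. A minimal sketch, assuming each labeled chunk can be unpacked as `(chrom, start, end, counts)`; this tuple layout and the helper name `sort_and_concatenate` are illustrative assumptions, not taken from the `deeptools` API.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical labeled chunk: (chrom, start, end, per-bin counts).
LabeledChunk = Tuple[str, int, int, List[float]]


def sort_and_concatenate(
    chunks: Sequence[LabeledChunk], chrom_order: Sequence[str]
) -> List[float]:
    """Put chunks back in genome order, then join their per-bin counts."""
    # Rank chromosomes by their position in chrom_order, not by name.
    rank: Dict[str, int] = {chrom: i for i, chrom in enumerate(chrom_order)}
    ordered = sorted(chunks, key=lambda c: (rank[c[0]], c[1], c[2]))
    coverage: List[float] = []
    for _chrom, _start, _end, counts in ordered:
        coverage.extend(counts)
    return coverage


if __name__ == "__main__":
    # Chunks arriving out of order from a parallel run.
    chunks = [
        ("chr2", 0, 20, [5, 6]),
        ("chr1", 20, 40, [3, 4]),
        ("chr1", 0, 20, [1, 2]),
    ]
    print(sort_and_concatenate(chunks, ["chr1", "chr2"]))  # [1, 2, 3, 4, 5, 6]
```

Ranking chromosomes by their position in an explicit list (for example, the order in a chromosome sizes file) avoids lexicographic sorting, which would place `chr10` before `chr2`.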

ntanmayee · 2024-03-06T13:58:23Z

The previous solution does not work. This is the new strategy:

1. Read in the `chrom_sizes.bed` file to get the chromosome names and lengths.
2. Call `count_reads_in_region` instead of `run`. This still runs with multiprocessing, but the results are ordered.
3. Concatenate the resulting coverage arrays.
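The new strategy can be sketched as follows, assuming `chrom_sizes.bed` has at least the BED columns `chrom`, `start` (0) and `end` (the chromosome length). `count_bins_stub` is a hypothetical placeholder for a per-chromosome call to `count_reads_in_region`, not its real signature or output; `genome_coverage` keeps the chromosome order from the file by using an order-preserving `mapper`.

```python
import io
from typing import Callable, List, Sequence, TextIO, Tuple

Region = Tuple[str, int]  # (chromosome name, chromosome length)


def read_chrom_sizes(handle: TextIO) -> List[Region]:
    """Read (chrom, length) pairs, assuming BED rows 'chrom<TAB>0<TAB>length'."""
    sizes: List[Region] = []
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        fields = line.split()
        sizes.append((fields[0], int(fields[2])))
    return sizes


def count_bins_stub(region: Region, bin_size: int = 10) -> List[int]:
    """Hypothetical placeholder for a per-chromosome count_reads_in_region call.

    Returns one (zero) count per bin, with a partial last bin rounded up.
    """
    _chrom, length = region
    n_bins = -(-length // bin_size)  # ceiling division
    return [0] * n_bins


def genome_coverage(
    sizes: Sequence[Region],
    count_fn: Callable[[Region], list],
    mapper: Callable = map,
) -> list:
    """Count each chromosome, then concatenate in chrom_sizes order.

    `mapper` must return results in input order (built-in map and
    multiprocessing.Pool.map both do).
    """
    coverage: list = []
    for counts in mapper(count_fn, sizes):
        coverage.extend(counts)
    return coverage


if __name__ == "__main__":
    bed = io.StringIO("chr1\t0\t25\nchr2\t0\t10\n")
    sizes = read_chrom_sizes(bed)
    print(sizes)  # [('chr1', 25), ('chr2', 10)]
    print(len(genome_coverage(sizes, count_bins_stub)))  # 3 + 1 = 4 bins
```

`multiprocessing.Pool.map` also returns results in the order of its input, so a pool's `map` could be passed as `mapper`; with a process pool, the counting function has to be defined at module level so it can be pickled.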

ntanmayee added the `bug` label Mar 5, 2024

ntanmayee self-assigned this Mar 5, 2024

ntanmayee changed the title ~~deeptools countReadsPerBin output is not sorted~~ `deeptools` `countReadsPerBin` output is not sorted Mar 5, 2024

ntanmayee closed this as completed in 256551c Mar 7, 2024
