Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with chr name format #1

Open
mgandal opened this issue Sep 4, 2017 · 4 comments
Open

Issue with chr name format #1

mgandal opened this issue Sep 4, 2017 · 4 comments

Comments

@mgandal
Copy link

mgandal commented Sep 4, 2017

I'm using sorted BAM files that have numeric chromosome names (no "chr" prefix). There seems to be an issue with quantification. If I remove "chr" from your degradation BED file chromosome names, I get 0's for quantification of regions on chromosomes 10-19.

If I sort the BED file so that numeric chromosomes are in numeric order (to match BAM files), I get the following error I think from wiggleTools:
"Position 10:714133 is before 9:140899137"
even though the BED file looks like

9	140049063	140050870	.	0	.
9	140791269	140791603	.	0	.
9	140833366	140833967	.	0	.
9	140899136	140899908	.	0	.
10	714132	714671	.	0	.
10	7830132	7830226	.	0	.
10	51565182	51565296	.	0	.
10	51580555	51580696	.	0	.
10	51632928	51633033	.	0	.
10	60028911	60029064	.	0	.
@nellore
Copy link
Owner

nellore commented Sep 4, 2017

Hey! Are you sure you're sorting the bed file so numeric chromosomes are in numeric order? If 10:714133 < 9:1408991937, that may not be true. Are you doing
sort -k1,1n -k2,2n -k3,3n regions.bed?

Not sure that would solve your problem though. For the case where you just remove chr from the degradation bed, can you try running wiggletools directly using

wiggletools write_bg - trim regions.bed indexed_alignments.bam

and checking whether the output gives 0 coverage for regions chromosome 10 and up? Let me know what happens.

@mgandal
Copy link
Author

mgandal commented Sep 5, 2017

Hi, Thanks. I think it is actually a problem with wiggletools.

Here is an example using wiggletools write_bg - trim regions.bed indexed_alignments.bam with a BAM file with numeric chromosomes. The BAM file is indexed and coordinate sorted:

@HD	VN:1.4	SO:coordinate
@SQ	SN:1	LN:249250621	UR:file:/RefGenome/genome.fa	M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ	SN:2	LN:243199373	UR:file:/RefGenome/genome.fa	M5:a0d9851da00400dec1098a9255ac712e
@SQ	SN:3	LN:198022430	UR:file:/RefGenome/genome.fa	M5:fdfd811849cc2fadebc929bb925902e5
@SQ	SN:4	LN:191154276	UR:file:/RefGenome/genome.fa	M5:23dccd106897542ad87d2765d28a19a1
@SQ	SN:5	LN:180915260	UR:file:/RefGenome/genome.fa	M5:0740173db9ffd264d728f32784845cd7
@SQ	SN:6	LN:171115067	UR:file:/RefGenome/genome.fa	M5:1d3a93a248d92a729ee764823acbbc6b
@SQ	SN:7	LN:159138663	UR:file:/RefGenome/genome.fa	M5:618366e953d6aaad97dbe4777c29375e
@SQ	SN:8	LN:146364022	UR:file:/RefGenome/genome.fa	M5:96f514a9929e410c6651697bded59aec
@SQ	SN:9	LN:141213431	UR:file:/RefGenome/genome.fa	M5:3e273117f15e0a400f01055d9f393768
@SQ	SN:10	LN:135534747	UR:file:/RefGenome/genome.fa	M5:988c28e000e84c26d552359af1ea2e1d
@SQ	SN:11	LN:135006516	UR:file:/RefGenome/genome.fa	M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ	SN:12	LN:133851895	UR:file:/RefGenome/genome.fa	M5:51851ac0e1a115847ad36449b0015864
@SQ	SN:13	LN:115169878	UR:file:/RefGenome/genome.fa	M5:283f8d7892baa81b510a015719ca7b0b
@SQ	SN:14	LN:107349540	UR:file:/RefGenome/genome.fa	M5:98f3cae32b2a2e9524bc19813927542e
@SQ	SN:15	LN:102531392	UR:file:/RefGenome/genome.fa	M5:e5645a794a8238215b2cd77acb95a078
@SQ	SN:16	LN:90354753	UR:file:/RefGenome/genome.fa	M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ	SN:17	LN:81195210	UR:file:/RefGenome/genome.fa	M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ	SN:18	LN:78077248	UR:file:/RefGenome/genome.fa	M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ	SN:19	LN:59128983	UR:file:/RefGenome/genome.fa	M5:1aacd71f30db8e561810913e0b72636d
@SQ	SN:20	LN:63025520	UR:file:/RefGenome/genome.fa	M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ	SN:21	LN:48129895	UR:file:/RefGenome/genome.fa	M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ	SN:22	LN:51304566	UR:file:/RefGenome/genome.fa	M5:a718acaa6135fdca8357d5bfe94211dd
@SQ	SN:X	LN:155270560	UR:file:/RefGenome/genome.fa	M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ	SN:Y	LN:59373566	UR:file:/RefGenome/genome.fa	M5:1fa3474750af0948bdf97d5a0ee52e51

For simplicity, I made a BED file with only one region per chromosome. If I sort the BED file as you suggest sort -k1,1n -k2,2n -k3,3n regions_noCh_v1.bed, and run the wiggletools command wiggletools write_bg - trim regions.bed indexed_alignments.bam I get the following error:

Position 10:714133 is before 9:32552799

even though the BED file is corrected sorted by chromosome in numeric order:

1	6257711	6257893	.	0	.
2	1872090	1872390	.	0	.
3	197682531	197683078	.	0	.
4	602915	603654	.	0	.
5	176279272	176279378	.	0	.
6	16430725	16431049	.	0	.
6	166265710	166267024	.	0	.
7	713588	713686	.	0	.
8	121457307	121457378	.	0	.
9	32552798	32552977	.	0	.
10	714132	714671	.	0	.
11	14520383	14520556	.	0	.
12	6601303	6601633	.	0	.
13	114561069	114562803	.	0	.
14	23495058	23495584	.	0	.
15	101812398	101813061	.	0	.
16	648960	649176	.	0	.
17	79857795	79858128	.	0	.
18	3253886	3254048	.	0	.
19	58906048	58906170	.	0	.
20	60963364	60963575	.	0	.
21	35286753	35286804	.	0	.

If instead I use the BED file sorted as it was previously (as if the chr prefix was still present) wiggletools runs but only returns outputs for chromosomes 1-9

1	6257711	6257893	.	0	.
10	714132	714671	.	0	.
11	14520383	14520556	.	0	.
12	6601303	6601633	.	0	.
13	114561069	114562803	.	0	.
14	23495058	23495584	.	0	.
15	101812398	101813061	.	0	.
16	648960	649176	.	0	.
17	79857795	79858128	.	0	.
18	3253886	3254048	.	0	.
19	58906048	58906170	.	0	.
2	1872090	1872390	.	0	.
20	60963364	60963575	.	0	.
21	35286753	35286804	.	0	.
3	197682531	197683078	.	0	.
4	602915	603654	.	0	.
5	176279272	176279378	.	0	.
6	16430725	16431049	.	0	.
6	166265710	166267024	.	0	.
7	713588	713686	.	0	.
8	121457307	121457378	.	0	.
9	32552798	32552977	.	0	.

@mgandal
Copy link
Author

mgandal commented Sep 6, 2017

Yes, looks like the wiggletools BAM parser is having issues
Ensembl/WiggleTools#19 (comment)

@nellore
Copy link
Owner

nellore commented Sep 7, 2017

Glad this has been figured out! Hope it's resolved soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants