Mistaken Chromosome Name Mismatch Error #3

Closed
DarioS opened this issue May 6, 2020 · 1 comment

@DarioS
DarioS commented May 6, 2020

I am running vcf2spec, but I get many warnings like "WARNING. chromosome (1) was not found in the FASTA file. Skipping." even though the FASTA and VCF files use the same style of chromosome names.

$ grep \> hs38DH.fasta | head 
>chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38
>chr2  AC:CM000664.2  gi:568336022  LN:242193529  rl:Chromosome  M5:f98db672eb0993dcfdabafe2a882905c  AS:GRCh38
>chr3  AC:CM000665.2  gi:568336021  LN:198295559  rl:Chromosome  M5:76635a41ea913a405ded820447d067b0  AS:GRCh38
>chr4  AC:CM000666.2  gi:568336020  LN:190214555  rl:Chromosome  M5:3210fecf1eb92d5489da4346b3fddc6e  AS:GRCh38
>chr5  AC:CM000667.2  gi:568336019  LN:181538259  rl:Chromosome  M5:a811b3dc9fe66af729dc0dddf7fa4f13  AS:GRCh38  hm:47309185-49591369
>chr6  AC:CM000668.2  gi:568336018  LN:170805979  rl:Chromosome  M5:5691468a67c7e7a7b5f2a3a683792c29  AS:GRCh38
>chr7  AC:CM000669.2  gi:568336017  LN:159345973  rl:Chromosome  M5:cc044cc2256a1141212660fb07b6171e  AS:GRCh38
>chr8  AC:CM000670.2  gi:568336016  LN:145138636  rl:Chromosome  M5:c67955b5f7815a9a1edfaa15893d3616  AS:GRCh38
>chr9  AC:CM000671.2  gi:568336015  LN:138394717  rl:Chromosome  M5:6c198acf68b5af7b9d676dfdd531b5de  AS:GRCh38
>chr10  AC:CM000672.2  gi:568336014  LN:133797422  rl:Chromosome  M5:c0eeee7acfdaf31b770a509bdaa6e51a  AS:GRCh38

The VCF file has the same style of chromosome IDs.

$ grep ^chr OSCC_1-P_pass.vcf | head
chr1    455086  .       C       A       .       PASS    CONTQ=93;DP=110;ECNT=1;GERMQ=93;MBQ=29,33;MFRL=347,303;MMQ=25,30;MPOS=56;NALOD=1.36;NLOD=6.57;POPAF=0.924;ROQ=93;SEQQ=93;STRANDQ=35;TLOD=17.87  GT:AD:AF:DP:F1R2:F2R1:SB  0/0:22,0:0.042:22:10,0:11,0:13,9,0,0    0/1:76,11:0.134:87:31,6:41,5:45,31,3,8
chr1    613111  .       C       T       .       PASS    CONTQ=22;DP=65;ECNT=1;GERMQ=49;MBQ=33,33;MFRL=394,309;MMQ=41,32;MPOS=3;NALOD=1.4;NLOD=6.92;POPAF=0.117;ROQ=27;SEQQ=13;STRANDQ=36;TLOD=6.48      GT:AD:AF:DP:F1R2:F2R1:SB  0/0:23,0:0.039:23:10,0:13,0:11,12,0,0   0/1:30,4:0.137:34:13,4:17,0:15,15,2,2
chr1    653704  .       G       A       .       PASS    CONTQ=4;DP=78;ECNT=1;GERMQ=93;MBQ=33,34;MFRL=385,258;MMQ=60,41;MPOS=39;NALOD=1;NLOD=5.09;POPAF=0.876;ROQ=34;SEQQ=5;STRANDQ=15;TLOD=5.54 GT:AD:AF:DP:F1R2:F2R1:SB       0/0:18,0:0.051:18:8,0:9,0:12,6,0,0       0/1:47,3:0.08:50:20,2:22,1:30,17,1,2
chr1    1140424 .       A       T       .       PASS    CONTQ=41;DP=37;ECNT=1;GERMQ=18;MBQ=33,32;MFRL=385,163;MMQ=47,47;MPOS=13;NALOD=0.996;NLOD=2.64;POPAF=3.49;ROQ=56;SEQQ=10;STRANDQ=9;TLOD=6.24     GT:AD:AF:DP:F1R2:F2R1:SB  0/0:9,0:0.092:9:6,0:2,0:8,1,0,0 0/1:22,3:0.147:25:14,2:7,1:21,1,3,0
chr1    1253674 .       C       T       .       PASS    CONTQ=93;DP=164;ECNT=1;GERMQ=93;MBQ=32,33;MFRL=382,397;MMQ=60,60;MPOS=43;NALOD=1.7;NLOD=14.39;POPAF=2.4;ROQ=66;SEQQ=93;STRANDQ=72;TLOD=26.99    GT:AD:AF:DP:F1R2:F2R1:SB  0/0:48,0:0.02:48:26,0:21,0:27,21,0,0    0/1:99,12:0.115:111:40,7:58,5:59,40,6,6
chr1    1678117 .       G       A       .       PASS    CONTQ=93;DP=170;ECNT=1;GERMQ=93;MBQ=30,33;MFRL=386,429;MMQ=60,60;MPOS=38;NALOD=1.72;NLOD=15.29;POPAF=6;ROQ=78;SEQQ=93;STRANDQ=66;TLOD=24.72     GT:AD:AF:DP:F1R2:F2R1:SB  0/0:51,0:0.019:51:31,0:19,0:18,33,0,0   0/1:102,11:0.104:113:54,6:47,5:57,45,6,5
chr1    1952695 .       G       A       .       PASS    CONTQ=93;DP=147;ECNT=2;GERMQ=93;MBQ=25,33;MFRL=376,367;MMQ=60,60;MPOS=39;NALOD=1.59;NLOD=11.29;POPAF=4.61;ROQ=69;SEQQ=93;STRANDQ=80;TLOD=43.4   GT:AD:AF:DP:F1R2:F2R1:SB  0/0:38,0:0.025:38:19,0:16,0:21,17,0,0   0/1:87,18:0.178:105:41,12:41,6:60,27,13,5
chr1    2032531 .       C       T       .       PASS    CONTQ=93;DP=175;ECNT=1;GERMQ=93;MBQ=26,33;MFRL=391,407;MMQ=60,60;MPOS=27;NALOD=1.72;NLOD=15.63;POPAF=6;ROQ=66;SEQQ=81;STRANDQ=46;TLOD=13.34     GT:AD:AF:DP:F1R2:F2R1:SB  0/0:52,0:0.019:52:25,0:27,0:27,25,0,0   0/1:112,8:0.073:120:57,4:49,4:59,53,4,4
chr1    2964332 .       G       A       .       PASS    CONTQ=93;DP=145;ECNT=1;GERMQ=93;MBQ=32,34;MFRL=380,355;MMQ=60,60;MPOS=54;NALOD=1.6;NLOD=11.73;POPAF=6;ROQ=57;SEQQ=93;STRANDQ=67;TLOD=26.28      GT:AD:AF:DP:F1R2:F2R1:SB  0/0:39,0:0.024:39:20,0:19,0:21,18,0,0   0/1:91,11:0.115:102:48,7:43,4:46,45,5,6
chr1    3081664 .       A       G       .       PASS    CONTQ=93;DP=155;ECNT=1;GERMQ=93;MBQ=33,32;MFRL=396,389;MMQ=60,60;MPOS=38;NALOD=1.69;NLOD=14.44;POPAF=6;ROQ=93;SEQQ=93;STRANDQ=93;TLOD=128.56    GT:AD:AF:DP:F1R2:F2R1:SB  0/0:48,0:0.02:48:25,0:23,0:27,21,0,0    0/1:61,44:0.42:105:32,19:29,25:30,31,24,20

The warning names chromosome 1 rather than chr1, so it looks like sigLASSO strips the chr prefix from the chromosome ID without asking the user, which is bad form. Surely the user should be allowed to use chromosome names such as chr1 and chr2.
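For anyone wanting to confirm the same diagnosis on their own files, the two `grep` checks above can be turned into an exact comparison. The following is a minimal Python sketch, not part of sigLASSO; the function names (`fasta_names`, `vcf_chroms`, `missing_from_fasta`) are hypothetical. It assumes the sequence name is the first whitespace-delimited token after `>` and that the chromosome is the first column of each non-header VCF line.

```python
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def vcf_chroms(lines: Iterable[str]) -> Set[str]:
    """Collect CHROM values (first column) from non-header, non-blank VCF lines."""
    chroms = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chroms.add(line.split(None, 1)[0])
    return chroms


def missing_from_fasta(fasta_lines: Iterable[str], vcf_lines: Iterable[str]) -> List[str]:
    """Return the VCF chromosome names that have no identically named FASTA sequence."""
    return sorted(vcf_chroms(vcf_lines) - fasta_names(fasta_lines))
```

Passing two open file handles, for example `missing_from_fasta(open("hs38DH.fasta"), open("OSCC_1-P_pass.vcf"))`, streams both files line by line. An empty result would mean the input files agree, which points to the renaming happening inside the preprocessing step rather than in the data.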

@ShantaoL
Member

Hi,

Thanks for your interest in our work and for the feedback. I developed sigLASSO primarily as a new statistical learning algorithm for assigning mutational signatures more accurately. The algorithm's inputs are the "mutational context" and the "signatures".

To help users get started, we supply a simple bash script (get_context.sh) that preprocesses the VCF/FASTA files, with an R wrapper on top of it. We encourage users to treat this script as a prototype and modify it as needed.

I apologize that you found it hard to use. I have added a quick patch for this.
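For readers who adapt the preprocessing themselves, one possible approach is to rewrite the VCF chromosome names to whatever style the FASTA uses before extracting contexts. The sketch below is an illustration only: it is not the patch mentioned above and not what get_context.sh does, and the names `match_name` and `harmonize_vcf` are hypothetical. It assumes a tab-delimited VCF and only handles the `chr` prefix, so differences such as `chrM` versus `MT` are not covered, and `##contig` header lines are left unchanged.

```python
from typing import Iterable, Iterator, Optional, Set


def match_name(chrom: str, fasta_names: Set[str]) -> Optional[str]:
    """Return the FASTA name that corresponds to chrom, or None if there is none.

    Tries the name as given, then with a "chr" prefix added, then with it removed.
    """
    candidates = [chrom, "chr" + chrom]
    if chrom.startswith("chr"):
        candidates.append(chrom[len("chr"):])
    for candidate in candidates:
        if candidate in fasta_names:
            return candidate
    return None


def harmonize_vcf(lines: Iterable[str], fasta_names: Set[str]) -> Iterator[str]:
    """Yield VCF lines with the CHROM column rewritten to the FASTA naming style.

    Header lines pass through unchanged; records with no matching name are dropped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        name = match_name(chrom, fasta_names)
        if name is not None:
            yield name + sep + rest
```

Dropping unmatched records silently mirrors the "Skipping" behaviour in the warning; a stricter variant could raise an error instead so that a naming mismatch is never overlooked.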
