Funseq2 output vcf parsing errors #1

lybird300 · 2018-04-09T02:52:58Z

Hi there, first of all thanks for sharing the code; I'm looking forward to your paper when it comes out. This is not an issue with your tool, but I would really appreciate your insight since you have been working with Funseq2 for so long (I guess). I was able to make funseq2.1.6 (downloaded from the official website) work on my data and generate the Output.vcf file, but reading the file with VariantAnnotation::scanVcf (invoked by VariantAnnotation::readVcf) gives me Error: scanVcf: invalid split pattern ',(?=(ID|Number|Type)=[[:alnum:]])|,(?=Description=".?")'. I googled but could not find any solution except for this thread. I'm wondering if you have ever encountered the same problem, or if there is any way to bypass the issue and still make your tool work. Thank you for your time!

Below are the first several lines of my Output.vcf file:

##fileformat=VCFv4.0
##INFO=<ID=OTHER,Number=.,Type=String, Description = "Other Information From Original File">
##INFO=<ID=SAMPLE,Number=.,Type=String,Description="Sample id">
##INFO=<ID=CDS,Number=.,Type=String,Description="Coding Variants or not">
##INFO=<ID=VA,Number=.,Type=String,Description="Coding Variant Annotation">
##INFO=<ID=HUB,Number=.,Type=String,Description="Network Hubs, PPI (protein protein interaction network), REG (regulatory network), PHOS (phosphorylation network)...">
##INFO=<ID=GNEG,Number=.,Type=String,Description="Gene Under Negative Selection">
##INFO=<ID=GERP,Number=.,Type=String,Description="Gerp Score">
##INFO=<ID=NCENC,Number=.,Type=String,Description="NonCoding ENCODE Annotation">
##INFO=<ID=HOT,Number=.,Type=String,Description="Highly Occupied Target Region">
##INFO=<ID=MOTIFBR,Number=.,Type=String,Description="Motif Breaking">
##INFO=<ID=MOTIFG,Number=.,Type=String,Description="Motif Gain">
##INFO=<ID=SEN,Number=.,Type=String,Description="In Sensitive Region">
##INFO=<ID=USEN,Number=.,Type=String,Description="In Ultra-Sensitive Region">
##INFO=<ID=UCONS,Number=.,Type=String,Description="In Ultra-Conserved Region">
##INFO=<ID=GENE,Number=.,Type=String,Description="Target Gene (For coding - directly affected genes ; For non-coding - promoter, enhancer regulatory module)">
##INFO=<ID=CANG,Number=.,Type=String,Description="Prior Gene Information, e.g.[cancer][TF_regulating_known_cancer_gene][up_regulated][actionable]...";
##INFO=<ID=CDSS,Number=.,Type=String,Description="Coding Score">
##INFO=<ID=NCDS,Number=.,Type=String,Description="NonCoding Score">
##INFO=<ID=RECUR,Number=.,Type=String,Description="Recurrent elements / variants">
##INFO=<ID=DBRECUR,Number=.,Type=String,Description="Recurrence database">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr13 100252870 . T A . . SAMPLE=T100;GERP=-4.14;CDS=No;HUB=CLYBL:PPI(0.361),TM9SF2:PPI(0.236)REG(0.409);NCENC=DHS(MCV-7|chr13:100252825-100252975),Enhancer(Roadmap_stringent|chr13:100252624-100252923),Enhancer(chmm/segway|chr13:100252783-100253000),TFM(CTCF_DHS|SP1_disc3|chr13:100252864-100252879),TFP(CTCF|chr13:100252207-100252907),TFP(CTCF|chr13:100252476-100252894),TFP(CTCF|chr13:100252487-100252873),TFP(CTCF|chr13:100252497-100252927),TFP(CTCF|chr13:100252503-100252891),TFP(CTCF|chr13:100252503-100252930),TFP(CTCF|chr13:100252505-100252888),TFP(CTCF|chr13:100252514-100253036),TFP(CTCF|chr13:100252515-100252912),TFP(CTCF|chr13:100252515-100252924),TFP(CTCF|chr13:100252516-100252910),TFP(CTCF|chr13:100252517-100252885),TFP(CTCF|chr13:100252519-100252971),TFP(CTCF|chr13:100252524-100252882),TFP(CTCF|chr13:100252527-100252884),TFP(CTCF|chr13:100252531-100252949),TFP(CTCF|chr13:100252532-100252884),TFP(CTCF|chr13:100252533-100252888),TFP(CTCF|chr13:100252535-100252892),TFP(CTCF|chr13:100252539-100252884),TFP(CTCF|chr13:100252541-100252876),TFP(CTCF|chr13:100252548-100252894),TFP(CTCF|chr13:100252564-100252892),TFP(CTCF|chr13:100252575-100252873),TFP(CTCF|chr13:100252587-100252877),TFP(CTCF|chr13:100252589-100252892),TFP(RAD21|chr13:100252520-100252938),TFP(RAD21|chr13:100252524-100252936),TFP(RAD21|chr13:100252529-100252933),TFP(RAD21|chr13:100252531-100252912),TFP(RAD21|chr13:100252537-100252892),TFP(RAD21|chr13:100252537-100252897);HOT=H1hesc;MOTIFBR=CTCF_DHS#SP1_disc3#100252864#100252879#-#10#0.131148#0.245902;MOTIFG=MZF1_3#100252864#100252870#-#1#6.892#5.961;SEN=Yes;GENE=CLYBL(Distal)TM9SF2(Distal);NCDS=3.854994663534:4.854994663534;RECUR=TFP(CTCF|chr13:100252207-100252907):T100&T236,TFP(CTCF|chr13:100252476-100252894):T100&T236
chr4 60720 . T C . . SAMPLE=T100;GERP=0.69;CDS=No;HUB=ZNF595:REG(0.987);NCENC=TFM(MAFF_MAFK|CEBPG_1|chr4:60714-60727),TFP(CHD2|chr4:50497-61009),TFP(EP300|chr4:57632-61054),TFP(FAM48A|chr4:58427-60880),TFP(FAM48A|chr4:59515-61055),TFP(FOS|chr4:55158-60924),TFP(IRF3|chr4:60169-61169),TFP(JUND|chr4:50500-62894),TFP(KAT2A|chr4:59928-60898),TFP(MAFF|chr4:60004-61056),TFP(MAFK|chr4:50489-63382),TFP(MAFK|chr4:59870-61092),TFP(MAFK|chr4:60116-61036),TFP(MXI1|chr4:50616-61014),TFP(SIN3A|chr4:52691-62808),TFP(STAT1|chr4:60005-61046),TFP(ZZZ3|chr4:59671-60953);HOT=K562;MOTIFBR=MAFF_MAFK#CEBPG_1#60714#60727#+#6#0.000000#0.714286;SEN=Yes;USEN=Yes;GENE=ZNF595(Intron);NCDS=3.7416745628426:4.7416745628426;RECUR=TFP(CHD2|chr4:50497-61009):T100&T104&T106&T107&T11&T115&T153&T157&T167&T252&T256&T258&T264&T267&T272&T275&T284&T294&T297&T300&T314&T318&T59&T70&T80&T92,TFP(EP300|chr4:57632-61054):T100&T104&T106&T107&T264&T272&T275&T333&T59&T80&T92,TFP(FAM48A|chr4:58427-60880):T100&T264&T272&T59&T80&T92,TFP(FAM48A|chr4:59515-61055):T100&T106&T264&T272&T333&T59&T80,TFP(FOS|chr4:55158-60924):T100&T104&T106&T107&T115&T256&T264&T272&T275&T294&T297&T318&T59&T80&T92,TFP(IRF3|chr4:60169-61169):T100&T106&T264&T272&T333&T80,TFP(JUND|chr4:50500-62894):T100&T104&T106&T107&T108&T11&T115&T153&T157&T167&T252&T256&T258&T264&T267&T272&T275&T284&T294&T297&T300&T314&T318&T333&T52&T59&T70&T80&T92,TFP(KAT2A|chr4:59928-60898):T100&T264&T272&T80,TFP(MAFF|chr4:60004-61056):T100&T106&T264&T272&T333&T80,TFP(MAFK|chr4:50489-63382):T100&T104&T106&T107&T108&T11&T115&T153&T157&T167&T252&T256&T258&T264&T267&T272&T275&T284&T294&T297&T300&T314&T318&T333&T52&T59&T70&T80&T92,TFP(MAFK|chr4:59870-61092):T100&T106&T264&T272&T333&T80,TFP(MAFK|chr4:60116-61036):T100&T106&T264&T272&T80,TFP(MXI1|chr4:50616-61014):T100&T104&T106&T107&T11&T115&T153&T157&T167&T252&T256&T258&T264&T267&T272&T275&T284&T294&T297&T300&T314&T318&T59&T70&T80&T92,TFP(SIN3A|chr4:52691-62808):T100&T104&T106&T107&T108&T11&T115&T153&T252&T256&T258&T264&T272&T275
&T284&T294&T297&T314&T318&T333&T59&T80&T92,TFP(STAT1|chr4:60005-61046):T100&T106&T264&T272&T333&T80,TFP(ZZZ3|chr4:59671-60953):T100&T264&T272&T59&T80;DBRECUR=TFP(CHD2|chr4:50497-61009):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Pancreas(Altered in 2/15(13.33%) samples.)|Prostate(Altered in 5/64(7.81%) samples.),TFP(EP300|chr4:57632-61054):Lung_Adeno(Altered in 10/24(41.67%) samples.),TFP(FAM48A|chr4:58427-60880):Lung_Adeno(Altered in 7/24(29.17%) samples.),TFP(FAM48A|chr4:59515-61055):Lung_Adeno(Altered in 6/24(25.00%) samples.),TFP(FOS|chr4:55158-60924):Lung_Adeno(Altered in 14/24(58.33%) samples.)|Prostate(Altered in 2/64(3.12%) samples.),TFP(JUND|chr4:50500-62894):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Pancreas(Altered in 2/15(13.33%) samples.)|Prostate(Altered in 5/64(7.81%) samples.),TFP(KAT2A|chr4:59928-60898):Lung_Adeno(Altered in 3/24(12.50%) samples.),TFP(MAFF|chr4:60004-61056):Lung_Adeno(Altered in 3/24(12.50%) samples.),TFP(MAFK|chr4:50489-63382):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Pancreas(Altered in 2/15(13.33%) samples.)|Prostate(Altered in 5/64(7.81%) samples.),TFP(MAFK|chr4:59870-61092):Lung_Adeno(Altered in 3/24(12.50%) samples.),TFP(MXI1|chr4:50616-61014):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Pancreas(Altered in 2/15(13.33%) samples.)|Prostate(Altered in 5/64(7.81%) samples.),TFP(SIN3A|chr4:52691-62808):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Prostate(Altered in 4/64(6.25%) samples.),TFP(STAT1|chr4:60005-61046):Lung_Adeno(Altered in 3/24(12.50%) samples.),TFP(ZZZ3|chr4:59671-60953):Lung_Adeno(Altered in 4/24(16.67%) samples.)

mil2041 · 2018-04-09T21:07:16Z

Hi,

It seems VariantAnnotation::scanVcf cannot parse the VCF output file from FunSeq2 correctly.
I have not encountered this error message from the VariantAnnotation package before, so I am not sure which line of your Output.vcf triggers it.

You will need to figure out which line in Output.vcf the VariantAnnotation parser cannot recognize. Failing that, writing a custom VCF file parser is another possible solution.
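As a rough illustration of the custom-parser route, here is a minimal Python sketch. It assumes the data lines are tab-separated with the eight sites-only columns shown in the header (#CHROM through INFO) and that INFO entries are separated by ";". The function names `parse_info` and `parse_record` are made up for this example and are not part of FunSeq2 or VariantAnnotation.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate a stray trailing ';'
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_record(line):
    """Parse one tab-separated, sites-only VCF data line (assumed 8 columns)."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected 8 columns, got {len(cols)}")
    chrom, pos, _id, ref, alt, _qual, _filter, info = cols[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "info": parse_info(info),  # values are kept as raw strings
    }
```

Values such as NCENC or RECUR stay as raw strings here; splitting them further would depend on how the downstream analysis uses them.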

There are a couple of things you can check. (1) Does every line in the file end with a plain "\n"? Files written by some Mac tools can contain carriage returns (displayed as "^M"), which may cause the parser to fail. (2) Randomly pick several lines from Output.vcf and run VariantAnnotation::scanVcf on just those. If the error message no longer appears, you can be more confident that only some lines in Output.vcf trigger it.
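Both checks can be roughly automated outside R. The Python sketch below flags lines that end in a carriage return, ##INFO header lines that do not match a strict ID/Number/Type/Description shape, and data lines with fewer than eight tab-separated columns. The strict header pattern and the names `check_vcf_lines` and `check_vcf_file` are assumptions made for this example; a flagged line is only a candidate for closer inspection, not necessarily the cause of the scanVcf error.

```python
import re

# Assumed shape of a well-formed ##INFO header line (no spaces around '=' or ',').
INFO_HEADER = re.compile(
    r'^##INFO=<ID=[^,]+,Number=[^,]+,Type=[^,]+,Description="[^"]*">$'
)


def check_vcf_lines(lines):
    """Return (line_number, problem) pairs for lines that look suspicious."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        if raw.endswith("\r\n") or raw.endswith("\r"):
            problems.append((number, "carriage return at end of line"))
        line = raw.rstrip("\r\n")
        if line.startswith("##INFO="):
            if not INFO_HEADER.match(line):
                problems.append((number, "INFO header does not match strict pattern"))
        elif not line.startswith("#") and len(line.split("\t")) < 8:
            # Also catches empty lines and records broken across two lines.
            problems.append((number, "fewer than 8 tab-separated columns"))
    return problems


def check_vcf_file(path):
    """Run the checks on a file, keeping the original line endings intact."""
    # newline="" stops Python from translating "\r\n" or "\r" into "\n".
    with open(path, newline="") as handle:
        return check_vcf_lines(handle)
```

For example, `check_vcf_file("Output.vcf")` returns a list that can be printed and compared against the file in a text editor.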

lybird300 · 2018-04-10T02:11:10Z

Hi Eric, thank you for your quick reply. I reinstalled all related R packages and the readVcf function works now! Still no clue what went wrong but I can live with that. Thanks again and good luck with everything!

lybird300 closed this as completed Apr 10, 2018
