Problem with protein/transcript identifiers #2

TDDB-limagrain · 2022-08-16T10:05:58Z

Hi Hesham,

This is not a real problem, but solving it would make life easier :-D
I successfully ran VCF2PROT v0.1.4, but I had to correct the transcript identifiers in the VCF file as well as in the reference FASTA file.

My bcftools-annotated VCF file has Solyc02g062560.3|Solyc02g062560.3.1 identifiers in the BCSQ fields, and the header in my protein file is Solyc02g062560.3.1.
It seems that the "." in the sequence name causes a problem: in this case, the output .fasta file was empty. After removing the end of the sequence name (changing it to Solyc02g062560.3|Solyc02g062560 in the VCF and to Solyc02g062560 in the reference FASTA), vcf2prot finally wrote the proper corrected sequences.
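The manual workaround above can be scripted. The sketch below uses hypothetical helpers that are not part of VCF2Prot. It assumes the transcript ID is the first word of each FASTA header, and that each BCSQ entry follows the bcftools csq layout `Consequence|gene|transcript|biotype|strand|amino_acid_change|dna_change`, so the transcript sits in field 2.

```python
def strip_version(identifier: str) -> str:
    # Keep only the part before the first '.', e.g. "Solyc02g062560.3.1" -> "Solyc02g062560".
    # Assumption: the base identifier itself never contains a '.'.
    return identifier.split(".", 1)[0]


def rewrite_fasta_header(header: str) -> str:
    # ">Solyc02g062560.3.1 description" -> ">Solyc02g062560 description" (trailing newline dropped).
    # Assumption: the transcript ID is the first whitespace-delimited token after '>'.
    parts = header[1:].rstrip("\n").split(maxsplit=1)
    if not parts:
        return header
    parts[0] = strip_version(parts[0])
    return ">" + " ".join(parts)


def rewrite_bcsq(value: str) -> str:
    # BCSQ holds comma-separated consequences; only the transcript field (index 2)
    # is rewritten, so the gene field keeps its version, as in the manual fix.
    # Entries starting with '@' point to another record and are left untouched.
    rewritten = []
    for entry in value.split(","):
        fields = entry.split("|")
        if not entry.startswith("@") and len(fields) > 2:
            fields[2] = strip_version(fields[2])
        rewritten.append("|".join(fields))
    return ",".join(rewritten)
```

With the identifiers above, `rewrite_fasta_header(">Solyc02g062560.3.1")` gives `>Solyc02g062560`. In the BCSQ value, the transcript Solyc02g062560.3.1 becomes Solyc02g062560 while the gene keeps its .3. Stripping versions can make distinct isoforms (for example .3.1 and .3.2) collide on the same name, so check for duplicate headers afterwards.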

Hope this helps in the future!

Best regards,

Thomas


HeshamElAbd · 2022-08-16T10:37:50Z

Hi Thomas,

Indeed, this is an important feature to have. The current version depends on an exact match between the transcript name in the VCF file and the one in the reference FASTA file. Nevertheless, I will do my best to add a 'smarter function' that parses the transcript names in the VCF and FASTA files and then matches them. This feature should be included in the next release of VCF2Prot (0.1.5), hopefully within 3-4 weeks.
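A version-tolerant matcher along these lines could look like the sketch below. This is a hypothetical illustration, not the VCF2Prot implementation. It keeps the current exact-match behaviour first and only falls back to comparing version-stripped IDs (everything before the first '.') when that comparison is unambiguous.

```python
from collections import defaultdict


def base_id(identifier: str) -> str:
    # Hypothetical normalisation: drop everything from the first '.' onward.
    return identifier.split(".", 1)[0]


def build_index(fasta_ids):
    # Exact IDs as a set, plus a map from each version-stripped ID to all full
    # FASTA IDs sharing it (several isoforms may collapse onto one base ID).
    exact = set(fasta_ids)
    by_base = defaultdict(set)
    for fid in fasta_ids:
        by_base[base_id(fid)].add(fid)
    return exact, by_base


def match_transcript(vcf_id, exact, by_base):
    # 1. An exact match wins, i.e. the current behaviour.
    if vcf_id in exact:
        return vcf_id
    # 2. Otherwise fall back to the version-stripped ID, but only if exactly one
    #    FASTA record shares it; .get() avoids inserting keys into the defaultdict.
    candidates = by_base.get(base_id(vcf_id), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # no match, or several isoforms: refuse to guess
```

Returning `None` when several FASTA records share a base ID avoids silently attaching a variant to the wrong isoform. The trade-off is that the fallback ignores version numbers entirely, so Solyc02g062560.2.1 in the VCF would still pair with Solyc02g062560.3.1 in the FASTA.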

Thanks a lot for your time and feedback,
All the best,
Hesham
