MSG: Got a sequence without letters. Could not guess alphabet #127

Closed
mushalallam opened this issue May 27, 2015 · 13 comments

@mushalallam

Hi,
I got this error when I tried to create a core alignment.
Thanks

@andrewjpage
Member

Hi,
Just to double-check, did you install muscle and RevTrans.py? Could you send me a directory listing (ls -alrt) so I can get an idea of what's gone wrong?
Thanks,
Andrew

@mushalallam
Author

Hi Andrew,
This is how I ran the command:
roary -e -i 70 --core_definition 90 --dont_delete_files *.gff
I have Muscle and revtrans.py installed in my path. Below is the output of ls -alrt:
ma11:v3 ma11$ ls -alrt
total 34480
drwxr-xr-x 20 ma11 staff 680 May 27 10:39 ..
-rwxr-xr-x 1 ma11 staff 2818200 May 27 10:39 NT45_03212015.gff
-rwxr-xr-x 1 ma11 staff 2684389 May 27 10:39 NT224_03212015.gff
-rwxr-xr-x 1 ma11 staff 2753976 May 27 10:39 NT12_03212015.gff
-rwxr-xr-x 1 ma11 staff 2763286 May 27 10:39 NT11_03212015.gff
-rw-r--r-- 1 ma11 staff 37095 May 27 13:10 database_masking.asnb
-rw-r--r-- 1 ma11 staff 224049 May 27 13:10 _combined_files.groups
-rw-r--r-- 1 ma11 staff 571707 May 27 13:10 _combined_files
-rw-r--r-- 1 ma11 staff 115891 May 27 13:10 _clustered.clstr
-rw-r--r-- 1 ma11 staff 381953 May 27 13:10 _clustered
-rw-r--r-- 1 ma11 staff 211 May 27 13:10 blast_identity_frequency.Rtab
-rw-r--r-- 1 ma11 staff 41872 May 27 13:10 _uninflated_mcl_groups
-rw-r--r-- 1 ma11 staff 73 May 27 13:10 _gff_files
-rw-r--r-- 1 ma11 staff 125 May 27 13:10 _fasta_files
-rw-r--r-- 1 ma11 staff 604198 May 27 13:10 _blast_results
-rw-r--r-- 1 ma11 staff 314397 May 27 13:10 _labeled_mcl_groups
-rw-r--r-- 1 ma11 staff 288108 May 27 13:10 _inflated_unsplit_mcl_groups
-rw-r--r-- 1 ma11 staff 288108 May 27 13:10 _inflated_mcl_groups
-rw-r--r-- 1 ma11 staff 170 May 27 13:10 number_of_unique_genes.Rtab
-rw-r--r-- 1 ma11 staff 153 May 27 13:10 number_of_new_genes.Rtab
-rw-r--r-- 1 ma11 staff 200 May 27 13:10 number_of_genes_in_pan_genome.Rtab
-rw-r--r-- 1 ma11 staff 200 May 27 13:10 number_of_conserved_genes.Rtab
-rw-r--r-- 1 ma11 staff 413887 May 27 13:10 gene_presence_absence.csv
-rw-r--r-- 1 ma11 staff 0 May 27 13:10 core_accessory.tab
-rw-r--r-- 1 ma11 staff 314397 May 27 13:10 clustered_proteins
-rw-r--r-- 1 ma11 staff 156 May 27 13:10 core_accessory.header.embl
-rw-r--r-- 1 ma11 staff 0 May 27 13:10 accessory.tab
-rw-r--r-- 1 ma11 staff 156 May 27 13:10 accessory.header.embl
drwxr-xr-x 4569 ma11 staff 155346 May 27 13:13 pan_genome_sequences
-rw-r--r-- 1 ma11 staff 662815 May 27 13:14 NT11_03212015.gff.proteome.faa
-rw-r--r-- 1 ma11 staff 661577 May 27 13:14 NT12_03212015.gff.proteome.faa
-rw-r--r-- 1 ma11 staff 646061 May 27 13:14 NT224_03212015.gff.proteome.faa
-rw-r--r-- 1 ma11 staff 282267 May 27 13:14 pan_genome_results
-rw-r--r-- 1 ma11 staff 677878 May 27 13:14 NT45_03212015.gff.proteome.faa
drwxr-xr-x 38 ma11 staff 1292 May 27 13:16 .
-rw-r--r--@ 1 ma11 staff 15364 May 27 13:18 .DS_Store
-rw-r--r-- 1 ma11 staff 65 May 27 13:41 output_alignment.aln
-rw-r--r-- 1 ma11 staff 65 May 27 13:50 core_gene_alignment.aln
thanks

@andrewjpage
Member

Thanks for that,
Could you email me the spreadsheet file called gene_presence_absence.csv?
It's path-help@sanger.ac.uk as usual.
Regards,
Andrew

@andrewjpage andrewjpage added the bug label Jun 1, 2015
@andrewjpage
Member

Hi Mushal,
I've just released a new version (2.3.0) which I 'hope' will resolve the issue you're having. Could you give it a whirl and let me know how you get along?
Andrew

@mushalallam
Author

Many thanks @andrewjpage, it's working well :)

@andrewjpage
Member

Thanks for letting me know.

@jsan4christ

Hi @andrewjpage

I'm using roary 3.13.0 and have this problem:
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

and an alignment file that looks like this:


PRES009

PRES012

PRES014

PRES019

PRES021

PRES024

PRES025

PRES026

PRES028


The command line I used is:
roary -e --mafft -p 8 -t 1 -f prokka/gffs/roary_output/ prokka/gffs/*.gff

Please advise,

@tseemann
Contributor

tseemann commented Feb 29, 2020

I think the current version of roary is 3.14.0.
That warning comes from BioPerl, and it usually means you have lots of - or N letters in your sequence.
What version of prokka did you use?
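A quick way to test that explanation is to count gap and N characters per record. The Python sketch below is not part of roary or BioPerl; it assumes FASTA-formatted input and treats `-`, `.` and `?` as gap-like characters, which is my assumption about what the warning ignores rather than something confirmed in this thread. The record names and sequences in the demo are made up.

```python
# Assumption: these characters are treated as gaps, not letters.
GAP_CHARS = set("-.?")


def composition(seq):
    """Return (length, gap_count, n_count, has_letters) for one sequence string."""
    seq = "".join(seq.split())  # drop whitespace and line breaks
    gaps = sum(1 for ch in seq if ch in GAP_CHARS)
    n_count = sum(1 for ch in seq if ch in "Nn")
    has_letters = any(ch.isalpha() for ch in seq)
    return len(seq), gaps, n_count, has_letters


def parse_fasta(text):
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Made-up demo input: one empty record, one gap-only record, one mixed record.
example = ">empty_rec\n\n>gap_only\n----\n>mixed\nATGNNNN--A\n"
for rec_id, seq in parse_fasta(example):
    print(rec_id, composition(seq))
```

Records where `has_letters` is False are the ones I would look at first; a record that is mostly N still counts as having letters here, so a high `n_count` is reported separately.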

@jsan4christ

jsan4christ commented Mar 1, 2020 via email


@thorellk

Hi! I would also like to revive this issue. I am running roary (3.12.0) on a dataset consisting of 2170 bacterial genomes. The command I ran was the following:

roary -p 16 -e -s -n -f roary_id85-s -i 85 *gff

The process seems to run fine and the correct output files are generated, but I get the following error message twice:

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
---------------------------------------------------

Also, the core_gene_alignment.aln file is very small (it seems to consist of only one gene or so), despite the summary statistics file stating that there should be 1141 core genes.

I previously QC'd my genomes by running sendsketch and validating the dubious ones with Kraken. I also made a mash tree to double-check and remove outliers, and removed assemblies with over 200 contigs. After this I used prokka v1.12 for annotation (I know it's an old version). Is this error message due to low quality or high divergence among the genomes, as suggested by some answers I have found, or to N's in the sequences, or what do you think? And most importantly, how can I mitigate it? I visualised the nwk and gene_presence_absence.csv files in Phandango, and I cannot see any genome behaving weirdly there (e.g. containing very few core genes or being very divergent from the others).

Thank you for your help!

@thorellk

thorellk commented Mar 1, 2021

To follow up on this, is there any way to identify the sequences that give rise to this error and modify or exclude them? Since the error message was repeated twice, I assume there are two. @andrewjpage @tseemann
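One possible way to hunt for such sequences, sketched in Python under several assumptions: that the intermediate files are still on disk (for example from a run with --dont_delete_files), that they are FASTA-formatted, and that the glob patterns below match their names. The function names and patterns are hypothetical and not part of roary. The sketch lists every record whose sequence is empty or contains no alphabetic characters.

```python
from pathlib import Path


def records_without_letters(fasta_path):
    """Yield IDs of records in one FASTA file whose sequence has no letters."""
    name, has_letters = None, False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record.
                if name is not None and not has_letters:
                    yield name
                name, has_letters = line[1:].strip(), False
            elif name is not None and any(ch.isalpha() for ch in line):
                has_letters = True
    if name is not None and not has_letters:
        yield name


def scan_directory(directory, patterns=("*.fa", "*.fasta", "*.aln")):
    """Return sorted (file name, record ID) pairs for every letterless record.

    The default patterns are guesses; adjust them to the files you have.
    """
    hits = []
    for pattern in patterns:
        for path in sorted(Path(directory).glob(pattern)):
            hits.extend((path.name, rec) for rec in records_without_letters(path))
    return sorted(hits)
```

Calling `scan_directory("pan_genome_sequences")` would return a list of (file, record) pairs to inspect. This only locates candidates; whether excluding them is appropriate depends on why they are empty.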

@xin-bang

xin-bang commented Jan 12, 2024

I had this problem too, but in my case it was caused by the roary version. When I first set up conda, I didn't add the extra channels to the conda config, so conda install bioconda::roary installed version 3.7.0 instead of the 3.13.0 version listed on anaconda.org. You can check your version with roary -w. Version 3.7.0 will run into this problem.
You can solve it with these commands:

conda config --add channels conda-forge
conda config --add channels r
conda config --add channels bioconda

and then use conda install bioconda::roary to install roary version 3.13.0.
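For anyone scripting this check, here is a small Python sketch. It assumes that `roary -w` prints a version string containing something of the form X.Y.Z on stdout or stderr; the exact output format is not shown in this thread, so the regular expression is a guess. The helper names are hypothetical.

```python
import re
import subprocess


def parse_version(text):
    """Extract the first X.Y.Z version number from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_roary_version():
    """Run `roary -w` and parse whatever version string it prints.

    Assumption: the version appears somewhere in stdout or stderr.
    """
    result = subprocess.run(["roary", "-w"], capture_output=True, text=True)
    return parse_version(result.stdout + result.stderr)


def is_older(version, reference):
    """Compare two version tuples numerically, e.g. (3, 7, 0) < (3, 13, 0)."""
    return version < reference


# Usage (requires roary on the PATH):
#   if is_older(installed_roary_version(), parse_version("3.13.0")):
#       print("roary is older than 3.13.0")
```

Versions are compared as integer tuples because a plain string comparison would rank "3.7.0" above "3.13.0". Note that others in this thread saw the warning with 3.12.0 and 3.13.0 too, so a newer version is not guaranteed to make it go away.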
