Error in align #68
Hi Marsel, this happens when a protein translation contains invalid letters, which makes the aligner raise an error. I ran into this a while ago and fixed it back then by extending the alphabet used by the aligner (345935a). Some of the protein sequences in your files must contain something other than the extended IUPAC codes (ACDEFGHIKLMNPQRSTVWYBXZJUO), so you'll have to check. The quick fix would be to delete or change anything in the sequence that is not a valid code from that list.
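To find the offending characters, something like the following could help. It is only a minimal sketch: the alphabet is the list of extended IUPAC codes quoted above, and the function names (`invalid_letters`, `report`) and the example sequences are made up for illustration, not part of clinker. It assumes you have already pulled the translation strings out of your GenBank files into a dict.

```python
# Extended IUPAC protein codes, as listed in the comment above.
EXTENDED_IUPAC = set("ACDEFGHIKLMNPQRSTVWYBXZJUO")


def invalid_letters(translation, alphabet=EXTENDED_IUPAC):
    """Return the sorted unique characters of `translation` not in `alphabet`.

    The comparison is case-sensitive, so lowercase letters are reported too.
    """
    return sorted(set(translation) - alphabet)


def report(translations):
    """Map each sequence name to its invalid letters, skipping clean sequences."""
    bad = {}
    for name, seq in translations.items():
        letters = invalid_letters(seq)
        if letters:
            bad[name] = letters
    return bad


if __name__ == "__main__":
    # Hypothetical example data, not taken from the files in this issue.
    example = {"geneA": "MKTAYIAKQR", "geneB": "MKT*AYU-K"}
    print(report(example))
```

Any gene that shows up in the report is a candidate for the "delete/change" quick fix; note that U on its own is not reported, since it is in the extended list.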
Hi Cameron, do you think gb files from GenBank can include invalid letters?
Oops, looks like the code for extending the matrix had a bug. I tried aligning your sequences and it was crashing on one that contained a U, which is in the extended codes. I just pushed a fix that should resolve this. Could you try updating to clinker 0.0.21?
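If you are not sure which version your environment actually picked up, a quick check along these lines may help. It is a rough sketch with two assumptions: that the installed distribution is registered under the name `clinker`, and that version strings are plain dotted numbers like `0.0.21`. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain dotted version such as '0.0.21' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed, required="0.0.21"):
    """True if `installed` is at least the release that contains the fix."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        # Assumption: the distribution name is 'clinker'.
        installed = version("clinker")
    except PackageNotFoundError:
        print("clinker is not installed in this environment")
    else:
        print(installed, "has fix:", has_fix(installed))
```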
Thanks a lot, now everything works!
Glad that fixed it :) Not sure, that should happen automatically. Looks like it's ticked up to 0.0.21 now (https://bioconda.github.io/recipes/clinker-py/README.html).
Hi Cameron!
I installed clinker v0.0.20 using conda.
When I ran it on two genomes downloaded from GenBank, clinker gave the error below.
Is there any solution to this problem?
Best wishes,
Marsel
[09:02:59] INFO - Starting clinker
[09:02:59] INFO - Parsing files:
[09:02:59] INFO - PB12_4term_CP048407.gbk
/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/Bio/Seq.py:2334: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
warnings.warn(
[09:03:03] INFO - T.marianensis_NC_014831.gbk
[09:03:06] INFO - Starting cluster alignments
[09:03:07] INFO - PB12_4term_CP048407 vs T.marianensis_NC_014831
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 377, in _align_clusters
aln = aligner.align(geneA.translation, geneB.translation)
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/Bio/Align/__init__.py", line 1592, in align
score, paths = _aligners.PairwiseAligner.align(self, seqA, seqB)
ValueError: sequence contains letters not in the alphabet
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/kabilov/anaconda3/envs/clinker/bin/clinker", line 10, in <module>
sys.exit(main())
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/main.py", line 283, in main
clinker(
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/main.py", line 135, in clinker
globaligner = align.align_clusters(*clusters, cutoff=identity, jobs=jobs)
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 57, in align_clusters
aligner.align_stored_clusters(cutoff, jobs=jobs)
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 404, in align_stored_clusters
alignments = pool.starmap(_align_clusters, pairs_to_align)
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 372, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: sequence contains letters not in the alphabet