Error in align #68

Closed
kabilov opened this issue May 27, 2021 · 5 comments

@kabilov

kabilov commented May 27, 2021

Hi Cameron!

I installed clinker v0.0.20 using conda.
When I ran it on two genomes downloaded from GenBank, clinker gave the error below.
Is there any solution to this problem?

Best wishes,
Marsel


[09:02:59] INFO - Starting clinker
[09:02:59] INFO - Parsing files:
[09:02:59] INFO - PB12_4term_CP048407.gbk
/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/Bio/Seq.py:2334: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
warnings.warn(
[09:03:03] INFO - T.marianensis_NC_014831.gbk
[09:03:06] INFO - Starting cluster alignments
[09:03:07] INFO - PB12_4term_CP048407 vs T.marianensis_NC_014831
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 377, in _align_clusters
aln = aligner.align(geneA.translation, geneB.translation)
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/Bio/Align/__init__.py", line 1592, in align
score, paths = _aligners.PairwiseAligner.align(self, seqA, seqB)
ValueError: sequence contains letters not in the alphabet
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/kabilov/anaconda3/envs/clinker/bin/clinker", line 10, in <module>
sys.exit(main())
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/main.py", line 283, in main
clinker(
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/main.py", line 135, in clinker
globaligner = align.align_clusters(*clusters, cutoff=identity, jobs=jobs)
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 57, in align_clusters
aligner.align_stored_clusters(cutoff, jobs=jobs)
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 404, in align_stored_clusters
alignments = pool.starmap(_align_clusters, pairs_to_align)
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 372, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: sequence contains letters not in the alphabet

@gamcil
Owner

gamcil commented May 28, 2021

Hi Marsel,

This happens when the protein translation sequence contains invalid letters, causing the aligner to error. I actually ran into this a while ago, and fixed the issue back then by extending the alphabet used by the aligner (345935a). Some of the protein sequences in your files must have something other than extended IUPAC codes in them (ACDEFGHIKLMNPQRSTVWYBXZJUO), so you'll have to check - the quick fix would be to delete/change anything in the sequence that is not a valid code from that list.
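
For anyone who wants to run the check described above, here is a minimal sketch using only the standard library. It assumes the translations are stored as `/translation="..."` qualifiers in the GenBank text; the function names `extract_translations` and `find_invalid_letters` are made up for this example and are not part of clinker or Biopython.

```python
import re

# Extended IUPAC protein codes quoted in the comment above.
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWYBXZJUO")

# Matches a /translation="..." qualifier; the value may wrap over several lines.
TRANSLATION_RE = re.compile(r'/translation="([^"]*)"')


def extract_translations(genbank_text):
    """Return every /translation value with line breaks and padding removed."""
    return ["".join(value.split()) for value in TRANSLATION_RE.findall(genbank_text)]


def find_invalid_letters(translations, valid=VALID_CODES):
    """Map the index of each offending translation to its unexpected letters."""
    report = {}
    for index, seq in enumerate(translations):
        bad = sorted(set(seq.upper()) - set(valid))
        if bad:
            report[index] = bad
    return report
```

Reading a file with `open(path).read()`, passing the text to `extract_translations`, and then passing the result to `find_invalid_letters` gives a dictionary that is empty when every translation sticks to the listed codes. The indices count translations in file order, so they need to be matched back to the CDS features by hand.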

@kabilov
Author

kabilov commented May 28, 2021

Hi Cameron,

Do you think gb files from Genbank include invalid letters?
Is there another possible reason?

@gamcil
Owner

gamcil commented May 28, 2021

Oops, looks like the code for extending the matrix had a bug. I tried aligning your sequences and it was crashing on one that contained a U, which is in the extended codes. I just pushed a fix that should resolve this - could you try updating to clinker 0.0.21?
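
To illustrate the general idea of extending a substitution matrix so that extra letters such as U are accepted, here is a toy sketch. It is not clinker's actual code and does not use Biopython's matrix classes: the matrix is assumed to be a plain dictionary keyed by letter pairs, `extend_matrix` is a hypothetical helper, and the default score of 0 for new letters is an arbitrary choice for the example.

```python
def extend_matrix(alphabet, scores, extra_letters, default_score=0):
    """Return (alphabet, scores) widened to cover extra_letters.

    scores maps (row_letter, column_letter) pairs to numbers. Existing pairs
    keep their values; any pair involving a new letter gets default_score.
    """
    letters = list(alphabet)
    for letter in extra_letters:
        if letter not in letters:  # skip letters the matrix already covers
            letters.append(letter)

    # Build the full square table so no pair of letters is left undefined.
    extended = {
        (a, b): scores.get((a, b), default_score)
        for a in letters
        for b in letters
    }
    return "".join(letters), extended
```

Building the full square table matters here: if even one row or column for a new letter is missing, an aligner that looks up scores by letter has nothing to return for that residue, which is the kind of gap that surfaces as an "alphabet" error.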

@kabilov
Author

kabilov commented May 29, 2021

Thanks a lot, now everything works!
When will the conda update happen?

@gamcil
Owner

gamcil commented May 30, 2021

Glad that fixed it :)

Not sure, that should happen automatically - looks like it's ticked up to 0.0.21 now (https://bioconda.github.io/recipes/clinker-py/README.html).
