Align Error: seq contains letters not in the alphabet #70

LeoBusse · 2021-06-02T22:07:26Z

Hello, thank you for developing a great tool!

I've been trying to figure out where the "ValueError: sequence contains letters not in the alphabet" error comes from when I run my .gbf/.gb files through Clinker. I read through issue #68 and reinstalled Clinker 0.0.21 through Conda, but to no avail. I also tried installing with pip, but that didn't fix the problem either. I double-checked the align.py script on my local machine and it has the extend_matrix_alphabet addition, so I'm not sure what to do next. You mentioned that a quick fix would be to go through the sequences and delete anything not in the extended IUPAC alphabet. Is there a particular way you recommend doing this? I have several sequences, so finding anything wrong by hand seems like it would take a long time (I would be looking for numbers, right?).
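One way to avoid checking sequences by hand is to scan the files for characters outside the allowed set. The sketch below is hypothetical (the function name is not part of clinker) and rests on two assumptions: that the sequences being aligned are the protein translations stored in each CDS's `/translation` qualifier, and that the allowed set matches Biopython's extended IUPAC protein letters (`ACDEFGHIKLMNPQRSTVWYBXZJUO`). It reports any character outside that set, so gaps (`-`) and digits both show up.

```python
import re
import sys
from pathlib import Path

# Assumption: allowed letters mirror Biopython's extended IUPAC protein alphabet.
EXTENDED_PROTEIN = set("ACDEFGHIKLMNPQRSTVWYBXZJUO")

# Matches a /translation="..." qualifier; [^"]* also spans GenBank line wraps.
TRANSLATION_RE = re.compile(r'/translation="([^"]*)"')


def find_bad_letters(genbank_text):
    """Return (translation index, sorted offending characters) for each
    /translation qualifier containing characters outside EXTENDED_PROTEIN.
    No case folding is done, so lowercase letters are reported as well."""
    problems = []
    for i, match in enumerate(TRANSLATION_RE.finditer(genbank_text)):
        # Remove the whitespace introduced by GenBank line wrapping.
        seq = "".join(match.group(1).split())
        bad = sorted(set(seq) - EXTENDED_PROTEIN)
        if bad:
            problems.append((i, bad))
    return problems


if __name__ == "__main__":
    # Usage: python find_bad_letters.py file1.gbk file2.gbf ...
    for path in sys.argv[1:]:
        for index, bad in find_bad_letters(Path(path).read_text()):
            print(f"{path}: translation #{index} contains {bad}")
```

Running it over a folder of GenBank files prints only the translations that need attention, which is usually just a handful.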

I attached an image with the traceback in case it's helpful.

Thank you so much!

gamcil · 2021-06-03T07:37:02Z

If your image is anything to go by, it looks like your sequences have gaps in them (B starts with -), so I'll have to add the gap character to the extended set. In the meantime, you could search and replace the gap characters with X, or just delete them (it might only be a few rogue sequences, which is usually the case). I'll try to get a fix for this soon.
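The search-and-replace can be scripted so that only the protein translations are touched, leaving hyphens in notes or other qualifiers alone. This is a minimal sketch under the same assumption that the offending sequences live in `/translation` qualifiers; the function name is hypothetical and not part of clinker.

```python
import re
import sys
from pathlib import Path

# Matches a /translation="..." qualifier; [^"]* also spans GenBank line wraps.
TRANSLATION_RE = re.compile(r'/translation="([^"]*)"')


def replace_gaps(genbank_text, replacement="X"):
    """Replace '-' gap characters inside /translation qualifiers only.

    Pass replacement="" to delete the gaps instead of substituting X.
    Everything outside the qualifiers is left unchanged.
    """
    def fix(match):
        return '/translation="' + match.group(1).replace("-", replacement) + '"'

    return TRANSLATION_RE.sub(fix, genbank_text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python replace_gaps.py input.gbk cleaned.gbk
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    dst.write_text(replace_gaps(src.read_text()))
```

Substituting X keeps the sequence length (and the line wrapping) unchanged; deleting the gaps shortens the affected lines, which GenBank parsers such as Biopython's generally tolerate because they join wrapped qualifier lines before use.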

Looks like I also forgot to remove some logging calls there, so thanks for reminding me haha.

LeoBusse · 2021-06-03T13:35:27Z

Thank you so much!

I looked for the gaps as you suggested and it works perfectly now! I really appreciate the advice.

gamcil closed this as completed Jun 20, 2021