I've been trying to figure out where the "ValueError: sequence contains letters not in the alphabet" error comes from when I run my .gbf/.gb files through Clinker. I went through issue #68 and reinstalled Clinker 0.0.21 through Conda, but to no avail. I also tried installing with pip, but that didn't fix the problem. I double-checked the align.py script on my local computer and it has the extend_matrix_alphabet addition, so I'm not sure what to do. You mentioned a quick fix would be to go through the sequence and delete anything that is not part of the extended IUPAC alphabet. Is there a particular way you recommend doing this? I have several sequences, so it seems like it would take a long time to identify anything wrong in them by hand (I would be looking for numbers, right?).
I attached an image with the traceback in case it's helpful.
Thank you so much!
If your image is anything to go by, it looks like your sequences contain gap characters (B starts with -), so I'll have to add them to the extended set. In the meantime, you could search and replace the gap characters with X, or just delete them (usually only a few rogue sequences are affected). I'll try to get a fix out soon.
Looks like I also forgot to remove some logging calls there so thanks for reminding me haha.
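For anyone who wants to script the search-and-replace workaround rather than edit files by hand, here is a minimal sketch in plain Python. It is not part of Clinker: the `ALLOWED` set and the helper names `find_invalid` and `sanitize` are assumptions made for illustration, and the alphabet Clinker actually accepts may differ from the extended IUPAC protein letters used here.

```python
# Assumption: the 20 standard amino acids plus the extended IUPAC
# protein codes B, X, Z, J, U and O. Clinker's real alphabet may differ.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBXZJUO")


def find_invalid(seq):
    """Return (position, character) pairs for every character outside ALLOWED."""
    return [(i, ch) for i, ch in enumerate(seq.upper()) if ch not in ALLOWED]


def sanitize(seq, replacement="X"):
    """Upper-case the sequence and swap every character outside ALLOWED
    for `replacement`. Pass replacement="" to delete them instead."""
    return "".join(ch if ch in ALLOWED else replacement for ch in seq.upper())


if __name__ == "__main__":
    translation = "-MKT1A*"  # hypothetical translation with a gap, a digit and a stop
    print(find_invalid(translation))  # which characters are the problem, and where
    print(sanitize(translation))      # gap-like characters replaced with X
    print(sanitize(translation, ""))  # or simply removed
```

Running `find_invalid` first shows whether the offending characters are gaps, digits or something else, which answers the "what should I be looking for" question before anything is changed. To apply this to GenBank files, the same functions could be run over each CDS translation (for example after parsing the file with a GenBank reader), writing the cleaned sequences to a copy rather than overwriting the originals.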