Error with alignment #37

Open
MarineBio-LKRod opened this issue Aug 24, 2023 · 10 comments

Comments

@MarineBio-LKRod

MarineBio-LKRod commented Aug 24, 2023

Hello!

I am getting the error "FileNotFoundError: [Errno 2] No such file or directory: 'ESV_007833.aln'" after running BLCA with this code (including a custom database): python 2.blca_main.py -i 2022.fasta -r TAXONOMY.txt -q CBBI_12S_renamed.fas.

I have tried incorporating "-p 1" into the command (the fix from #26), but that leads to the common "ValueError: max() arg is an empty sequence" issue. I then checked all of my files for incorrect formatting (tabs/spaces, proper placement of : and ;, etc.) and cannot find any problem in the source files, even though the test.fasta file runs properly with the same script. I've also reinstalled Clustalo, since I thought it might be the issue after getting this error at one point: "FATAL: Cannot change number of threads to 2. Clustal Omega was build without OpenMP support."

I'm not sure what else I can do at this point. The problem at its core seemingly stems from the creation of .aln files but I haven't seen anyone else post about this particular issue.

Thank you!

@qunfengdong
Owner

Hmm, could you please test whether your installation of clustalo works? For example, use the following example sequences (you can also get them from https://www.ebi.ac.uk/Tools/msa/clustalo/ by clicking "Use a [example sequence]" on that page):

sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR

@qunfengdong
Owner

Put the following test sequences in a file, and use it as input to test your clustalo program to see if it can successfully produce a multiple-sequence-alignment.

sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR

@qunfengdong
Owner

Make sure that ">" starts each header line (that is, make sure those sequences are in FASTA format).
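
A quick way to check this before running anything is a small script along the following lines. It is only a sketch: the function name `find_fasta_problems` and the set of characters treated as valid sequence symbols are assumptions made for illustration, not part of BLCA or clustalo.

```python
import string

# Assumption: sequence lines contain only letters plus gap/stop symbols.
ALLOWED_SEQUENCE_CHARS = set(string.ascii_letters + "-*.")


def find_fasta_problems(lines):
    """Return (line_number, message) tuples for lines that do not look like FASTA."""
    problems = []
    seen_header = False
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            seen_header = True
            continue
        if not set(line) <= ALLOWED_SEQUENCE_CHARS:
            # Characters such as '|', digits or spaces suggest a header without '>'.
            problems.append((number, "non-sequence characters; possibly a header missing '>'"))
        elif not seen_header:
            problems.append((number, "sequence data before the first '>' header"))
    return problems


if __name__ == "__main__":
    sample = [
        "sp|P69905|HBA_HUMAN Hemoglobin subunit alpha",  # header pasted without '>'
        "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMF",
    ]
    for number, message in find_fasta_problems(sample):
        print(f"line {number}: {message}")
```

To check a real file, pass its lines in, for example `find_fasta_problems(open("testtest.tex"))`. An empty list means every record starts with ">".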

@MarineBio-LKRod
Author

MarineBio-LKRod commented Aug 25, 2023

Hello again, thanks for the quick reply.

I ran the sequences that you've provided above (not with my custom database but with the default):
python 2.blca_main.py -i testtest.tex
And then received this error:
Warning: [blastn] Query_2 sp|P01942|HBA_MOU.. : Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options
I can see that in this case, the .blastn file was created but it's empty. The .blca.out file has each sequence listed as "Unclassified". This page indicates that the "ungapped Karlin-Altschul parameters" is not a fatal issue. Is the default database not appropriate for these sequences?

To test out Clustalo further, I went back to the source code example.
After running the original test.fasta file with the default database, I get the error I received earlier:

FATAL: Cannot change number of threads to 2. Clustal Omega was build without OpenMP support.
Traceback (most recent call last):
  File "2.blca_main.py", line 350, in <module>
    alndic = get_dic_from_aln(k1 + ".aln")
  File "2.blca_main.py", line 82, in get_dic_from_aln
    alignment = AlignIO.read(aln, "clustal")
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/AlignIO/__init__.py", line 383, in read
    alignment = next(iterator)
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/AlignIO/__init__.py", line 322, in parse
    with as_handle(handle) as fp:
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/File.py", line 72, in as_handle
    with open(handleish, mode, **kwargs) as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'seq1.aln'

However, everything then worked perfectly when running "python 2.blca_main.py -i test.fasta -p 1"! Upon seeing this, I retried my code with my fasta, taxonomy, and database file with "-p 1" at the end of the line. Again, I got:

>  > Start aligning reads...
FATAL: Cannot change number of threads to 2. Clustal Omega was build without OpenMP support.
Traceback (most recent call last):
  File "2.blca_main.py", line 350, in <module>
    alndic = get_dic_from_aln(k1 + ".aln")
  File "2.blca_main.py", line 82, in get_dic_from_aln
    alignment = AlignIO.read(aln, "clustal")
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/AlignIO/__init__.py", line 383, in read
    alignment = next(iterator)
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/AlignIO/__init__.py", line 322, in parse
    with as_handle(handle) as fp:
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/File.py", line 72, in as_handle
    with open(handleish, mode, **kwargs) as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'ESV_000419.aln'

The fact that the program is able to run with the default parameters (plus -p 1) but not with my own leads me to think that there is a problem in my files but I have checked things over many times and cannot find anything. Am I able to send you the files to double check? Or do you suspect something is truly erroneous with clustalo?

Thank you again for your help.

@MarineBio-LKRod
Author

I found a page describing how to force OpenMP to support multi-threading, in case that's the core issue.
The code they provide ran successfully, but I am still getting the error "Clustal Omega was build without OpenMP support".

@qunfengdong
Owner

qunfengdong commented Aug 27, 2023 via email

@qunfengdong
Owner

When I sent you the sequences for multiple sequence alignment, it was NOT for testing BLCA. Those test sequences were for testing your clustalo installation. That is, run your clustalo program directly on those sequences to see whether it can produce a multiple sequence alignment. If not, something is wrong with your clustalo installation.

@qunfengdong
Owner

Using the sequences below, can you successfully produce a multiple sequence alignment with the clustalo you installed?

sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR
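
For anyone who wants to script this check, here is a minimal sketch in Python. It assumes the sequences above have been saved in FASTA format (with ">" header lines) and that the `clustalo` executable is on the PATH. The function names are hypothetical. `--threads` is left out unless requested, because the logs in this thread show a FATAL error when a build without OpenMP support is asked for 2 threads.

```python
import shutil
import subprocess
from pathlib import Path


def build_clustalo_command(fasta_path, aln_path, threads=None):
    """Build a Clustal Omega command that writes a Clustal-format alignment."""
    command = ["clustalo", "-i", str(fasta_path), "-o", str(aln_path), "--outfmt=clu"]
    if threads is not None:
        command.append(f"--threads={threads}")
    command.append("--force")  # overwrite an existing output file
    return command


def clustalo_smoke_test(fasta_path, aln_path, threads=None):
    """Return (ok, message) saying whether clustalo produced an alignment file."""
    if shutil.which("clustalo") is None:
        return False, "clustalo was not found on PATH"
    result = subprocess.run(
        build_clustalo_command(fasta_path, aln_path, threads),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False, result.stderr.strip() or "clustalo exited with an error"
    output = Path(aln_path)
    if not output.is_file() or output.stat().st_size == 0:
        return False, "clustalo ran but produced no alignment file"
    return True, f"alignment written to {aln_path}"


if __name__ == "__main__":
    # File names here are placeholders for illustration.
    ok, message = clustalo_smoke_test("clustalo_test.fasta", "clustalo_test.aln")
    print("OK" if ok else "FAILED", "-", message)
```

If this reports a failure, the message from clustalo itself is usually the most useful clue about what is wrong with the installation.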

@MarineBio-LKRod
Author

Oh, my bad! I used the example sequences you provided 3 days ago with a basic "clustalo -i testtest.tex -o test.fa" and got the proper output.

sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR

@qunfengdong
Owner

Thanks for sending us your input files. One thing I noticed is that the IDs in the taxonomy file and the BLAST database file are different. According to our instructions at https://github.com/qunfengdong/BLCA, if you are using a custom database, each ID in the BLAST database file should be the same as in the taxonomy file. For example, your taxonomy file contains the ID "SERCFISH1257", so your BLAST database file should contain a record with the ID "SERCFISH1257". Instead, it has the ID "Acantharchus-pomotis_SERCFISH1257". Please reformat accordingly and try again.
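
A mismatch like this can be spotted with a short script such as the sketch below. It assumes that the taxonomy file has the sequence ID in the first tab-separated column and that the ID of each database record is the first whitespace-delimited token after ">". The function names and the taxonomy string in the example are made up for illustration.

```python
def fasta_ids(lines):
    """Collect the first token after '>' from every FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def taxonomy_ids(lines):
    """Collect the first tab-separated column of every non-empty line."""
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def compare_ids(fasta_lines, taxonomy_lines):
    """Return (IDs only in the taxonomy file, IDs only in the database file)."""
    database = fasta_ids(fasta_lines)
    taxonomy = taxonomy_ids(taxonomy_lines)
    return sorted(taxonomy - database), sorted(database - taxonomy)


if __name__ == "__main__":
    database_lines = [">Acantharchus-pomotis_SERCFISH1257", "ACGT"]
    taxonomy_lines = ["SERCFISH1257\tspecies:Acantharchus pomotis;"]  # illustrative line
    only_in_taxonomy, only_in_database = compare_ids(database_lines, taxonomy_lines)
    print("Only in taxonomy file:", only_in_taxonomy)
    print("Only in database file:", only_in_database)
```

Both lists should be empty when the two files agree. For real files, pass the open file objects (for example `open("TAXONOMY.txt")`) instead of the in-memory lists.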
