colabfold_search 1.5.5 names first sequence in .a3m file universally 101 #568

Open
wresch opened this issue Feb 7, 2024 · 2 comments

wresch commented Feb 7, 2024

Expected Behavior

In 1.5.2, the first sequence in the .a3m file had the same name as the sequence in the input FASTA.

Current Behavior

In 1.5.5, the first sequence is always named 101.

Steps to Reproduce (for bugs)

Please make sure to reproduce the issue after a "Factory Reset" in Colab.
If running locally, update your local installation of colabfold_batch to the newest version.
Please provide your input if you can share it.

$ cat <<__EOF__ > test.fa
>P78344
MESAIAEGGASRFSASSGGGGSRGAPQHYPKTAGNSEFLGKTPGQNAQKWIPARSTRRDD
NSAANNSANEKERHDAIFRKVRGILNKLTPEKFDKLCLELLNVGVESKLILKGVILLIVD
KALEEPKYSSLYAQLCLRLAEDAPNFDGPAAEGQPGQKQSTTFRRLLISKLQDEFENRTR
NVDVYDKRENPLLPEEEEQRAIAKIKMLGNIKFIGELGKLDLIHESILHKCIKTLLEKKK
RVQLKDMGEDLECLCQIMRTVGPRLDHERAKSLMDQYFARMCSLMLSKELPARIRFLLQD
TVELREHHWVPRKAFLDNGPKTINQIRQDAVKDLGVFIPAPMAQGMRSDFFLEGPFMPPR
MKMDRDPLGGLADMFGQMPGSGIGTGPGVIQDRFSPTMGRHRSNQLFNGHGGHIMPPTQS
QFGEMGGKFMKSQGLSQLYHNQSQGLLSQLQGQSKDMPPRFSKKGQLNADEISLRPAQSF
LMNKNQVPKLQPQITMIPPSAQPPRTQTPPLGQTPQLGLKTNPPLIQEKPAKTSKKPPPS
KEELLKLTETVVTEYLNSGNANEAVNGVREMRAPKHFLPEMLSKVIILSLDRSDEDKEKA
SSLISLLKQEGIATSDNFMQAFLNVLDQCPKLEVDIPLVKSYLAQFAARAIISELVSISE
LAQPLESGTHFPLFLLCLQQLAKLQDREWLTELFQQSKVNMQKMLPEIDQNKDRMLEILE
GKGLSFLFPLLKLEKELLKQIKLDPSPQTIYKWIKDNISPKLHVDKGFVNILMTSFLQYI
SSEVNPPSDETDSSSAPSKEQLEQEKQLLLSFKPVMQKFLHDHVDLQVSALYALQVHCYN
SNFPKGMLLRFFVHFYDMEIIEEEAFLAWKEDITQEFPGKGKALFQVNQWLTWLETAEEE
ESEEEAD
...full file attached...
__EOF__
$ colabfold_search \
    --threads 16 \
    test.fa $COLABFOLD_DB cf_out
$ head -1 cf_out/0.a3m
>101
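
The `head -1` check above can be repeated across many outputs with a small script. The following is a minimal Python sketch, not part of ColabFold: the helper names `first_header_id` and `headers_match` are hypothetical, and it assumes both files are plain text in which the first line starting with `>` is the query header.

```python
def first_header_id(path):
    """Return the ID of the first '>' header line in a FASTA/a3m file.

    The ID is the text after '>' up to the first whitespace.
    Lines that do not start with '>' (for example a leading '#' line)
    are skipped. Returns None if the file has no header line.
    """
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                return fields[0] if fields else ""
    return None


def headers_match(fasta_path, a3m_path):
    """True if the first header ID of the a3m equals that of the FASTA."""
    return first_header_id(fasta_path) == first_header_id(a3m_path)
```

For the files in this report, `headers_match("test.fa", "cf_out/0.a3m")` would compare `P78344` with `101` and return `False`.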

test.fa.gz

ColabFold Output (for bugs)

Please make sure to also post the complete ColabFold output. You can use gist.github.com for large output.

colabfold_search.log.gz

Context

Providing context helps us come up with a solution and improve our documentation for the future.

Your Environment

  • Run as a Slurm job on a different compute node
  • Installed as a Singularity container
  • Run with the same amount of memory (128G) as was used to build the mmseqs databases

YoshitakaMo (Collaborator) commented

> in 1.5.5 the first sequence is always named 101

This behavior is expected, but the resulting a3m file name (0.a3m) is not expected and is difficult to handle. I've pushed a commit that names each resulting file after its sequence header. Please update your ColabFold and try again.
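
For installations that cannot be updated yet, the numeric files could be renamed after the fact. The following is a rough Python sketch, not the commit mentioned above: it assumes, without confirmation from this thread, that `N.a3m` corresponds to the N-th record (counting from 0) of the input FASTA. All function names are hypothetical, and the mapping should be checked on a few files before renaming anything.

```python
from pathlib import Path


def fasta_ids(fasta_path):
    """Return the header IDs of a FASTA file in file order."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


def safe_name(seq_id):
    """Replace characters that are awkward in file names with '_'."""
    return "".join(c if c.isalnum() or c in "._-" else "_" for c in seq_id)


def plan_renames(fasta_path, out_dir):
    """Build (source, destination) pairs without touching any file.

    ASSUMPTION: '<index>.a3m' belongs to the record at that index
    in the input FASTA. Entries are skipped when the source is missing,
    the ID is empty, or the destination exists or is already planned.
    """
    out_dir = Path(out_dir)
    plan, planned = [], set()
    for index, seq_id in enumerate(fasta_ids(fasta_path)):
        name = safe_name(seq_id)
        if not name:
            continue
        src = out_dir / f"{index}.a3m"
        dst = out_dir / f"{name}.a3m"
        if not src.exists() or dst.exists() or dst in planned:
            continue
        planned.add(dst)
        plan.append((src, dst))
    return plan


def apply_renames(plan):
    """Perform the renames produced by plan_renames."""
    for src, dst in plan:
        src.rename(dst)
```

Keeping planning and renaming separate lets the plan be printed and inspected first, for example `plan_renames("test.fa", "cf_out")`, before calling `apply_renames` on it.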

Nieto-CaballeroVE commented Apr 2, 2024

Hello, related to this issue: I'm running ColabFold through Singularity using the latest Docker image (colabfold_1.5.5-cuda12.2.2.sif).
The output a3m files have numeric names (e.g. 0.a3m) instead of the header sequence name, I guess because the latest Docker image was released before you fixed this issue. Could you update it, please?
Thanks in advance.
