assess_specific_genes is taking days #15

Malabady opened this issue Mar 28, 2024 · 6 comments

@Malabady

Hello,

I am running the last workflow in the pipeline to assess four specific genes. My analysis includes 10 plant species. The problem is that this step has been running for days and seems to be stuck at the KAKS analysis; see the following:

```
executor > local (6)
[3d/1f0900] process > DOWNLOAD_GENE_AASEQ [100%] 1 of 1 ✔
[db/177126] process > IDENTIFY_GENES_FROM_ORTHOGROUPS [100%] 1 of 1 ✔
[53/7659f6] process > GENE_EXPANSION_CONTRACTION [100%] 1 of 1 ✔
[ad/25dfde] process > EXTRACT_GENE_CDS [100%] 1 of 1 ✔
[14/f4c9be] process > ALIGN_GENE_CDS [100%] 1 of 1 ✔
[91/6cddca] process > KAKS [ 0%] 0 of 1
```
Is this normal?

@jacquelinemattos

Hi,
I'm also having some issues in this step. Apparently it isn't able to download the FASTA file from NCBI.

@Malabady I see yours passed through this step.
Could you please share how you formatted your "genes.txt"?

Thanks!

@Malabady

Malabady commented Apr 5, 2024

Hi @jacquelinemattos
Here is my gene.txt file:

GSTU13,Arabidopsis_thaliana,https://www.uniprot.org/uniprotkb/Q9FUS6.fasta
Q3ECW8,Arabidopsis_thaliana,https://www.uniprot.org/uniprotkb/Q3ECW8.fasta
A0A1L7NZV0,Sarracenia_purpurea,https://www.uniprot.org/uniprotkb/A0A1L7NZV0.fasta
A0A1L7NZV4,Sarracenia_purpurea,https://www.uniprot.org/uniprotkb/A0A1L7NZV4.fasta
A0A1L7NZU7,Sarracenia_purpurea,https://www.uniprot.org/uniprotkb/A0A1L7NZU7.fasta

If you are using NCBI links, test them first with wget.
Good luck.
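
For anyone comparing their own file against the one above, here is a minimal sketch of a format check. It assumes the layout is `name,species,url` with exactly three comma-separated fields per line; that is inferred from this example only, not from the pipeline's documentation, and `parse_gene_line` and `check_genes_file` are hypothetical helper names.

```python
from urllib.parse import urlparse


def parse_gene_line(line):
    """Split one genes.txt line into (name, species, url).

    Assumed layout (inferred from the example in this thread, not from the
    pipeline docs): exactly three comma-separated fields per line.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 comma-separated fields, got {len(fields)}: {line!r}"
        )
    name, species, url = fields
    if not name or not species:
        raise ValueError(f"empty gene name or species in: {line!r}")
    if any(ch.isspace() for ch in url):
        raise ValueError(f"whitespace inside URL: {url!r}")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    return name, species, url


def check_genes_file(text):
    """Return a list of (line_number, error_message) for malformed lines."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            parse_gene_line(line)
        except ValueError as err:
            problems.append((number, str(err)))
    return problems
```

Calling `check_genes_file(open("genes.txt").read())` returns an empty list when every non-blank line has the assumed three fields. Two entries that ended up on the same line are reported as a field-count error.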

@jacquelinemattos

Hi @Malabady

Thanks for the reply and suggestion.
You're right: I tried wget and the link doesn't work, for some reason, although it opens normally in a web browser.

This is my genes.txt file:
AGL12,Arabidopsis_thaliana,https://www.ncbi.nlm.nih.gov/nuccore/U20193.1?report=fasta
ANR1,Citrus_trifoliata,https://www.ncbi.nlm.nih.gov/nuccore/OP009587.1?report=fasta

Do you have any suggestions?
Thanks!!

@Malabady

Malabady commented Apr 5, 2024

Hi @jacquelinemattos

Your NCBI links point to the HTML report page, not the plain FASTA file, so the workflow is not able to access the sequence.
I suggest using the UniProt database; here are your two genes:

https://www.uniprot.org/uniprotkb/Q38841.fasta
https://www.uniprot.org/uniprotkb/Q9SI38.fasta

(please download them with wget to ensure they are the right genes)

Also, I noticed that you're pointing to nucleotide sequences, not protein sequences. I think the tool is expecting protein sequences (please double check).

Cheers,
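
Both checks mentioned above (HTML page vs. plain FASTA, nucleotide vs. protein) can be roughly automated. The following is only a sketch: `classify_sequence_text` and `fetch_text` are hypothetical helpers, and the 90% threshold for calling a sequence nucleotide is an arbitrary assumption, not something the pipeline defines.

```python
import urllib.request


def classify_sequence_text(text):
    """Guess what a downloaded file contains.

    Returns one of: "empty", "html", "nucleotide", "protein", "unknown".
    """
    stripped = text.lstrip()
    if not stripped:
        return "empty"
    head = stripped[:200].lower()
    if head.startswith(("<!doctype", "<html", "<?xml")):
        return "html"  # a web page, not a sequence file
    if not stripped.startswith(">"):
        return "unknown"  # FASTA records begin with a ">" header line
    # Concatenate every non-header line into one residue string.
    residues = "".join(
        line.strip() for line in stripped.splitlines() if not line.startswith(">")
    ).upper()
    if not residues:
        return "unknown"
    # Heuristic (assumption): mostly A/C/G/T/U/N characters means nucleotide.
    nucleotide_like = sum(ch in "ACGTUN" for ch in residues)
    return "nucleotide" if nucleotide_like / len(residues) >= 0.9 else "protein"


def fetch_text(url, timeout=30):
    """Download a URL and decode it as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `classify_sequence_text(fetch_text(url))` on each URL in genes.txt flags links that return an HTML page or a nucleotide record before the workflow is started. Very short or low-complexity protein sequences could be misclassified by the heuristic, so treat the result as a hint.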

@Malabady

Malabady commented Apr 5, 2024

Sorry, the link to the citrus gene is: https://rest.uniprot.org/uniparc/UPI0003D74F46.fasta

@jacquelinemattos

Hi @Malabady

Thanks so much for your help. I'm going to try these UniProt links now. They worked with wget, so let's hope they also work within the pipeline. I'll check whether the input needs to be protein instead of nucleotide sequences.

Thank you very much!
Cheers
