problem running findtirs #42

Open
JennyHTLee opened this issue Aug 24, 2019 · 5 comments

Comments

@JennyHTLee

Hello,

I ran tephra all and got an error at the findtirs step:

INFO - Command - 'tephra findtirs' started at: 10-08-2019 04:52:59.
ERROR - 'gt tirvish' failed exit value: 1. Here is the output: warning: terminal_inverted_repeat_element (generated, line 0) is too short to be translated (0 nt), skipped domain search

When running findtirs alone:

'gt tirvish' failed exit value: 1. Here is the output: warning: terminal_inverted_repeat_element (generated, line 0) is too short to be translated (0 nt), skipped domain search
/root/.tephra/gt/bin/gt tirvish: error: query seqid 'chrlg1' could match more than one sequence description

The other steps look fine based on the output files. This is the log:

root@bp:~/tephra/db# grep "Output\|Command" tephra_full.log
Command - 'tephra findltrs' started at: 06-08-2019 12:38:21.
Command - 'tephra findltrs' started at: 06-08-2019 12:38:39.
Command - 'tephra findltrs' completed at: 08-08-2019 14:42:41.
Output files - /db/genome_v2_tephra_ltrs.gff3
Output files - /db/genome_v2_tephra_ltrs.fasta
Command - 'tephra maskref' for LTRs started at: 08-08-2019 14:42:41.
Command - 'tephra maskref' completed at: 08-08-2019 18:27:24. Final output file:
Output files - /db/genome_v2_tephra_masked.fasta
Command - 'tephra findtrims' started at: 08-08-2019 18:27:24.
Command - 'tephra findtrims' completed at: 09-08-2019 16:39:19.
Command - 'tephra classifyltrs' started at: 09-08-2019 16:39:24.
Command - 'tephra classifyltrs' completed at: 09-08-2019 18:19:09.
Output files - /db/genome_v2_tephra_ltrs_trims_classified.gff3
Output files - /db/genome_v2_tephra_ltrs_trims_classified.fasta
Command - 'tephra age' started at: 09-08-2019 18:19:09.
Command - 'tephra age' completed at: 09-08-2019 18:20:09.
Output files - /db/genome_v2_tephra_ltrages.tsv
Command - 'tephra maskref' for classified LTRs/TRIMs started at: 09-08-2019 18:20:09.
Command - 'tephra maskref' completed at: 09-08-2019 20:22:53. Final output file:
Output files - /db/genome_v2_tephra_masked2.fasta
Command - 'tephra sololtr' started at: 09-08-2019 20:22:53.
Command - 'tephra sololtr' completed at: 09-08-2019 21:38:58.
Output files - /db/genome_v2_tephra_sololtrs.gff3
Output files - /db/genome_v2_tephra_sololtrs_rep.tsv
Output files - /db/genome_v2_tephra_sololtrs_seqs.fasta
Command - 'tephra illrecomb' started at: 09-08-2019 21:38:58.
Command - 'tephra illrecomb' completed at: 10-08-2019 00:05:24.
Output files - /db/genome_v2_tephra_illrecomb.fasta
Output files - /db/genome_v2_tephra_illrecomb_rep.tsv
Output files - /db/genome_v2_tephra_illrecomb_stats.tsv
Command - 'tephra findhelitrons' started at: 10-08-2019 00:05:24.
Command - 'tephra findhelitrons' completed at: 10-08-2019 04:30:59.
Output files - /db/genome_v2_tephra_helitrons.gff3
Output files - /db/genome_v2_tephra_helitrons.fasta
Command - 'tephra maskref' for Helitrons started at: 10-08-2019 04:31:01.
Command - 'tephra maskref' completed at: 10-08-2019 04:52:59. Final output file:
Output files - /db/genome_v2_tephra_masked3.fasta
Command - 'tephra findtirs' started at: 10-08-2019 04:52:59.
Command - 'tephra findtirs' completed at: 10-08-2019 05:32:57.
Command - 'tephra classifytirs' started at: 10-08-2019 05:32:57.
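
A quick way to spot steps that started but never logged a completion is to count the "started at" and "completed at" lines per command. The following is only a sketch: the function name unfinished_steps is hypothetical, and it assumes the log lines keep the quoted 'tephra <step>' format shown above.

```shell
# Print every command that has more "started at" lines than "completed at"
# lines in a Tephra log. Assumes the command name is the text between the
# first pair of single quotes on the line (hypothetical helper, not part of Tephra).
unfinished_steps() {
    awk -F"'" '
        / started at:/   { started[$2]++ }
        / completed at:/ { finished[$2]++ }
        END {
            for (cmd in started)
                if (started[cmd] > finished[cmd] + 0) print cmd
        }
    ' "$1" | sort
}
```

Run as unfinished_steps tephra_full.log. On the log excerpt above it would list tephra classifytirs (started, never completed) and tephra findltrs (started twice, completed once).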

I used the Tephra Docker version:

tephra (Tephra) version 0.12.4 (/usr/local/bin/tephra)

Thanks for your help

Regards,
Jenny

@sestaton
Owner

Hi,

It looks like you may have some duplicate IDs in your genome file. We can check that with the commands below:

grep ">" genome.fas | sed 's/>//' | sort -u | wc -l

and

grep -c ">" genome.fas

If the outputs of those two commands differ, there may be duplicates or some other issue. Seeing the ID format would be helpful too:

grep ">" genome.fas | head
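
The two counts above compare whole header lines, so two records that share an ID but carry different descriptions would still look unique. As a sketch under that assumption (the ID is taken to be the first whitespace-delimited token after ">", and dup_fasta_ids is a hypothetical helper name), this lists the repeated IDs themselves:

```shell
# Print each FASTA ID that appears on more than one header line.
# The ID is assumed to be the first whitespace-delimited token after ">".
dup_fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | sort | uniq -d
}
```

Run as dup_fasta_ids genome.fas. Empty output means no ID is repeated under that definition.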

Thanks.

@JennyHTLee
Author

Thanks for your reply,

There seem to be no duplicated IDs. I am not sure the IDs are the real issue, because shortening/editing the IDs does not solve the problem. It is also probably not one particular sequence causing this, as I got the same error with both the full set and a subset.

What other possibilities are there? The run halted while the IDs were being written to the GFF: there are 809 IDs in total and it stopped at 631.

Best regards,
Jenny

@sestaton
Owner

Hi Jenny,

It appears there is something odd with the IDs or sequences, and based on the message I thought it might be caused by duplicate IDs. That is not the case, though, so it must be something else. It could also be something in the code, but it is hard to say.
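
One thing that can be checked without knowing the cause is whether any ID is a prefix of another, since the message says 'chrlg1' "could match more than one sequence description". This is only a guess at what "odd" might mean here; nothing in this thread confirms that prefix matching is involved. The helper name prefix_fasta_ids is hypothetical, and the ID is assumed to be the first token after ">".

```shell
# Print "short<TAB>long" for every pair of distinct FASTA IDs where the
# first is a prefix of the second. Quadratic in the number of IDs, which
# is fine for a few hundred sequences.
prefix_fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | LC_ALL=C sort -u |
    awk '{ ids[NR] = $1 }
         END {
             for (i = 1; i <= NR; i++)
                 for (j = 1; j <= NR; j++)
                     if (i != j && index(ids[j], ids[i]) == 1)
                         printf "%s\t%s\n", ids[i], ids[j]
         }'
}
```

Run as prefix_fasta_ids genome.fas. If it prints pairs involving chrlg1, that would be worth mentioning when reporting the problem; if it prints nothing, this guess can be ruled out.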

Can you share the file with me? I'd like to test it myself because that may be faster than trying to propose solutions from a distance.

Thanks,
Evan

@JennyHTLee
Author

Hi Evan,

Sure, I've shared the file "genome.fasta.gz" through fex, using your email address at evanstaton.com.

Thanks for your help!

Best regards,
Jenny

@sestaton
Owner

Just FYI, I can recreate the error. This is something in the GenomeTools library and not Tephra, so it's not immediately clear how to resolve it. I will likely have to narrow the error down to the problematic sequence and raise the issue with that group, but I will keep this issue updated as I find out more.
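
For anyone who wants to help isolate the sequence, a minimal sketch for pulling a single record out of the genome file is below. The function name fasta_record is hypothetical, and it assumes the ID is the first whitespace-delimited token after ">".

```shell
# Print the FASTA record (header plus sequence lines) whose ID equals $2.
# Matching is exact on the first token of the header, so "chrlg1" does not
# also select a record named "chrlg10".
fasta_record() {
    awk -v id="$2" '
        /^>/ { hdr = substr($1, 2); keep = (hdr == id) }
        keep { print }
    ' "$1"
}
```

For example, fasta_record genome.fasta chrlg1 > chrlg1.fasta writes that one record to its own file, and the failing step can then be rerun on the small file to see whether the error follows the sequence.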

Thanks,
Evan
