Linking rna_virus search to original contig #2

Closed
mihinduk opened this issue Jun 25, 2020 · 8 comments

Comments

@mihinduk

Hi,

Is it possible to have the original contig name in the output include the information after the space so that the name of the contig from the original contig dictionary would be captured?

KM_ct2761 contig_1689 (from non_viral_domains_contigs.fna) would be reported as KM_ct2761 contig_1689, not just KM_ct2761. This would save a lookup step.

Thank you,
Kathie Mihindukulasuriya

@mtisza1
Owner

mtisza1 commented Jul 1, 2020

Hi Kathie,

My apologies for the slow response. I've been on vacation for the past week.

I think I understand your issue, and Cenote-Taker2 should be reporting the contigs in 'non_viral_domains_contigs.fna' exactly as you are requesting it. One consideration is that it will only retain header information before the first whitespace character. Is it possible that you included a whitespace directly after the '>' character?
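To illustrate the truncation described above, here is a minimal sketch of what keeping only the text before the first whitespace does to a header. The `first_token` function is a hypothetical helper for illustration, not part of Cenote-Taker2; the header is the example from this thread.

```shell
#!/bin/bash

# Hypothetical helper (not part of Cenote-Taker2): keep only the text
# before the first whitespace character of a FASTA header line.
first_token() {
	printf '%s\n' "$1" | awk '{ print $1 }'
}

first_token ">KM_ct2761 contig_1689"   # prints >KM_ct2761
first_token "> contig_1689"            # prints > (whitespace right after '>')
```

The second call shows why a space directly after the '>' character would matter: under this rule nothing of the name survives.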

If possible, please create and attach a new .fasta file from "contig_1689", formatted like your original .fasta, and I'll see if I can reproduce the error.

Best,

Mike

P.S. Here's an example from the test contigs. Original .fasta input:
image

From non_viral_domains_contigs.fna:
image

@mihinduk
Author

mihinduk commented Jul 1, 2020 via email

@mtisza1
Owner

mtisza1 commented Jul 1, 2020

Kathie,

I'm not seeing the files you referred to. Could you try attaching them once more or send them to my email at michael.tisza@gmail.com?

Mike

@mtisza1
Owner

mtisza1 commented Jul 1, 2020

OK I got your files, and they look as I expected. I think I misunderstood your question.

I thought you just wanted the string 'contig_1689' to be present in the output files, which it is in the various .fna, .fsa, .gbf, and .tsv files.
Were you hoping that the files produced from the analysis of contig_1689 would contain the string 'contig_1689' in the file name? That would cause some issues on my end, so I'd rather not do it. That said, you could write a bash script that renames the files after the run is over, such as the script below. Let me know if I'm understanding you correctly and, if not, please be more explicit about what you'd like from the output.



#!/bin/bash

# Prefix each output file with the original contig name stored in its .fsa header.
for FSA in *.fsa ; do
	# Pull the original name out of the "note= <name> ;" field on the first line.
	ORIGINAL_TITLE=$( head -n1 "$FSA" | sed 's/.*note= \(.*\) ; .*/\1/' )
	BASE=${FSA%.fsa}
	echo "${BASE} to ${ORIGINAL_TITLE}"
	# Rename the .fsa and its sibling files, skipping any that do not exist.
	for EXT in fsa gbf cmt tbl val sqn ; do
		[ -e "${BASE}.${EXT}" ] && mv "${BASE}.${EXT}" "${ORIGINAL_TITLE}_${BASE}.${EXT}"
	done
done

@mihinduk
Author

mihinduk commented Jul 1, 2020 via email

@mtisza1
Owner

mtisza1 commented Jul 2, 2020

OK so I was clearly misunderstanding your question before. I'm sorry about that.

There are a variety of reasons that I'm not willing to change the pipeline to preserve the fasta header information beyond the first whitespace character, so, unfortunately, you'll have to parse the 'non_viral_domains_contigs.fna' file before feeding it to another Cenote-Taker2 run. A quick manipulation would be to remove the space character in (from your example) 'KM_ct2761 contig_1689'. For example, you could change it to an '@' character (KM_ct2761@contig_1689), so the information would all be preserved:

sed 's/ /@/g' non_viral_domains_contigs.fna > contigs_for_next_run.fna
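A slightly more conservative variant of the command above, as a sketch: it restricts the substitution to header lines (those starting with '>'), so sequence lines are never touched, and it shows how the '@' could be turned back into a space afterwards. The function names are hypothetical, and this assumes no header already contains an '@' character.

```shell
#!/bin/bash

# Hypothetical helpers for illustration; not part of Cenote-Taker2.

# Replace spaces with '@' on header lines only (lines starting with '>').
join_headers() {
	sed '/^>/ s/ /@/g' "$1"
}

# Undo the substitution: turn '@' back into a space on header lines only.
split_headers() {
	sed '/^>/ s/@/ /g' "$1"
}

# Usage, with the file names from this thread:
# join_headers non_viral_domains_contigs.fna > contigs_for_next_run.fna
```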

I don't think this was the answer you were hoping for, but I'm hoping it won't be too onerous either.

Best,

Mike

@mihinduk
Author

mihinduk commented Jul 2, 2020 via email

@mtisza1
Owner

mtisza1 commented Jul 2, 2020

OK, great. And, please make me aware of any additional issues you encounter.

I am now closing this issue.

@mtisza1 mtisza1 closed this as completed Jul 2, 2020