error with phylosift search #501
Comments
You can specify a custom directory in the phylosiftrc file; make sure you
remove the # at the beginning of the line that you are using.
By default, PS looks for its data in:
<$HOME>/share/phylosift
…On Thu, Jul 11, 2019 at 8:19 AM ucassee ***@***.***> wrote:
When I run phylosift search -help, it fails with the following error:
NCBI taxonomy data not found and unable to connect to update server,
please check your phylosift configuration and internet connection!
at
/home/zhouyl/software/phylosift_v1.0.1/bin/../lib/Phylosift/Phylosift.pm
line 154.
I downloaded markers.tgz and ncbi.tgz. Where should I put them?
Thanks in advance
|
@gjospin Thanks for your reply.
|
Your paths need to look like this:
/home/zhouyl/share/phylosift/ncbi
/home/zhouyl/share/phylosift/markers
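One way to get that layout, as a minimal sketch: unpack both downloaded archives into the default data directory. This assumes each archive unpacks to a single top-level directory (markers/ and ncbi/), which this thread does not confirm. install_ps_data is a hypothetical helper name, not part of PhyloSift.

```shell
# Hypothetical helper, not part of PhyloSift: unpack data archives into a
# target directory.
# Assumption: each .tgz contains one top-level directory (markers/ or ncbi/).
install_ps_data() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for archive in "$@"; do
        tar -xzf "$archive" -C "$dest" || return 1
    done
    # List what ended up in the data directory so the layout can be checked.
    ls "$dest"
}
```

For example, install_ps_data "$HOME/share/phylosift" markers.tgz ncbi.tgz should print markers and ncbi if the archives are laid out as assumed. If the listing shows something else, move the extracted directories so the paths match the two lines above.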
…On Thu, Jul 11, 2019 at 9:10 AM ucassee ***@***.***> wrote:
@gjospin <https://github.com/gjospin> Thanks for your reply.
PhyloSift is at /home/zhouyl/software/phylosift_v1.0.1/bin/ . I moved
ncbi.tgz to /home/zhouyl/software/phylosift_v1.0.1/ , but it doesn't
work. It still fails with the error
NCBI taxonomy data not found and unable to connect to update server,
please check your phylosift configuration and internet connection!
|
@gjospin Thanks for your patience~ |
Edit /home/zhouyl/software/phylosift_v1.0.1/phylosiftrc.
Find the line
# $marker_dir="";
and change it to
$marker_dir="/home/zhouyl/software/phylosift_v1.0.1/markers";
Then find the line
# $ncbi_dir
and change it to
$ncbi_dir="/home/zhouyl/software/phylosift_v1.0.1/ncbi";
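After editing, it can help to confirm that both settings are really uncommented. The sketch below prints only the active $marker_dir and $ncbi_dir lines of a phylosiftrc file. active_ps_dirs is a hypothetical helper name, and the pattern assumes the settings are written in the form shown above.

```shell
# Hypothetical helper: print the marker/NCBI directory settings that are
# active (not commented out with #) in the given phylosiftrc file.
active_ps_dirs() {
    grep -E '^[[:space:]]*\$(marker_dir|ncbi_dir)[[:space:]]*=' "$1"
}
```

Running active_ps_dirs /home/zhouyl/software/phylosift_v1.0.1/phylosiftrc should print two lines once both edits are in place. A setting that is missing from the output still starts with #.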
…On Thu, Jul 11, 2019 at 9:23 AM ucassee ***@***.***> wrote:
@gjospin <https://github.com/gjospin> Thanks for your patience~
If I want to change the location of these two databases, can I modify the
code in
/home/zhouyl/software/phylosift_v1.0.1/bin/../lib/Phylosift/Phylosift.pm
? At line 154 I only see Phylosift::Utilities::data_checks( self
=> $self )
|
When I use phylosift to search for conserved proteins, I find too many copies of the same protein in one of my genomes.
But when I BLAST the protein sequences in these files at NCBI, no significant similarity is found. I feel confused. Could you help me? |
When I see the
I think it is unlikely to have 35 |
I have another question. I use the sequences in the |
The search step is very permissive in what it matches. It is really meant to
reduce the complexity of the alignment step so that it doesn't take too long. I
would run the align step and see what sticks after that.
Also keep in mind that PS was developed with short reads in mind. You may
want to adjust the thresholds in the phylosiftrc file in the same way you
modified the paths yesterday. They might need to be adjusted
depending on the length of the marker you are looking at.
How big are the gaps? Could a frameshift be happening? The hits are
found in protein space.
I hope this helps.
…On Fri, Jul 12, 2019 at 6:48 AM ucassee ***@***.***> wrote:
I have another question. I BLASTed the sequences in the
*candidate.ffn* file against the original genome sequences. Compared
with the original genome sequences, the sequences in the
*candidate.ffn* file contain a few gaps. If the ffn was extracted from the
original genome, why do these gaps exist?
|
The gaps are not too big, just 3, 6, or 9 bp in my sequences. |
No, I was suggesting that you increase your threshold stringency to remove the
incorrect matches, which should have lower scores. So if a default score is 150
in the phylosiftrc file, you would want to increase that to filter out
false positives. You could also enforce a minimum number of bases getting
aligned.
We have seen clades that lack certain markers. It's possible marker 13
doesn't work well for your bug of interest. If the matches aren't good
enough, then the marker 13 region of the concatenated alignment would be all
gaps. I would ignore it if you aren't happy with what comes out of it.
There is a way to give PS a list of markers (--custom flag), one per line, so
that it only looks at matches for markers in the list. You could give it the
list of 37 markers minus DNGNGWU00013.
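A minimal sketch of building such a list, one marker per line, with one marker left out. It assumes each marker lives in its own subdirectory of the markers directory, named after the marker (DNGNGWU000NN). markers_except is a hypothetical helper name, not a PhyloSift command.

```shell
# Hypothetical helper: list the DNGNGWU marker names found under a markers
# directory, one per line, leaving out a single marker.
# Assumption: every marker has its own subdirectory named after the marker.
markers_except() {
    marker_dir=$1
    skip=$2
    for d in "$marker_dir"/DNGNGWU*/; do
        [ -d "$d" ] || continue
        basename "$d"
    done | grep -v -x -F "$skip"
}
```

For example, markers_except "$HOME/share/phylosift/markers" DNGNGWU00013 > marker_list.txt writes the list to a file that can then be handed to the --custom flag. The exact form of that option is not shown in this thread, so check the PhyloSift help output before relying on it.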
…On Fri, Jul 12, 2019 at 6:36 PM ucassee ***@***.***> wrote:
The gaps are not too big, just 3, 6, or 9 bp in my sequences.
Most of the predicted DNGNGWU proteins have homologous proteins when I
BLAST them at NCBI, but the DNGNGWU00013 I mentioned before does not. So are you
suggesting that if I adjust the threshold I can get more positive results?
|
Hi @gjospin, thanks for your help. In the phylosiftrc file, I find too many parameters. I am not sure which one I should modify to increase the threshold. Is it the following?
#$min_aligned_residues=50; |
I would target $min_aligned_residues and raise it closer to the length of the
gene(s) you are interested in.
You can find the marker lengths in the database's HMM files (grep 'LEN'
DNGNG*/*.hmm for example; our system is down right now, so I can't check
the exact syntax).
Also keep in mind that this is in AA space, so 50 represents 150 nucleotides.
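As a rough sketch of that lookup, the helper below prints each marker HMM's length in residues next to the same length in nucleotides (residues times 3). It assumes the .hmm files are HMMER-format text with a header line of the form LENG <n>. That file format is an assumption here, since the exact syntax could not be checked above, and marker_lengths is a hypothetical helper name.

```shell
# Hypothetical helper: for every marker HMM under a markers directory, print
# the file name, the model length in residues, and that length in nucleotides.
# Assumption: HMMER-format text files whose header has a "LENG <n>" line.
marker_lengths() {
    for f in "$1"/DNGNG*/*.hmm; do
        [ -f "$f" ] || continue
        awk -v name="$f" '$1 == "LENG" { print name, $2, $2 * 3; exit }' "$f"
    done
}
```

Running marker_lengths "$HOME/share/phylosift/markers" gives one line per HMM file. The second column is in the same units as $min_aligned_residues, and the third is the matching nucleotide length.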
…On Tue, Jul 16, 2019 at 6:13 AM ucassee ***@***.***> wrote:
***@***.***, thanks for your help. In the phylosiftrc file, I find too
many parameters. I am not sure which one I should modify to increase the
threshold. Is it the following?
MarkerAlign default parameters
#$min_aligned_residues=50;
#$rna_split_size = 500; #sequences longer than this value will undergo the long sequence pipeline
#$gap_character = "-";
|
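When I run
phylosift search -help
it fails with the following error:
NCBI taxonomy data not found and unable to connect to update server, please check your phylosift configuration and internet connection! at /home/zhouyl/software/phylosift_v1.0.1/bin/../lib/Phylosift/Phylosift.pm line 154.
I downloaded markers.tgz and ncbi.tgz. Where should I put them?
Thanks in advance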