error with phylosift search #501

ucassee · 2019-07-11T15:19:22Z

When I run phylosift search -help, it fails with the following error:

NCBI taxonomy data not found and unable to connect to update server, please check your phylosift configuration and internet connection!
at /home/zhouyl/software/phylosift_v1.0.1/bin/../lib/Phylosift/Phylosift.pm line 154.

I downloaded markers.tgz and ncbi.tgz. Where should I put them?

Thanks in advance


gjospin · 2019-07-11T15:23:33Z

You can specify a custom directory in the phylosiftrc file; make sure you remove the # at the beginning of the line that you are using. By default, PS looks for its data in <$HOME>/share/phylosift


ucassee · 2019-07-11T16:10:13Z

@gjospin Thanks for your reply.
PhyloSift is installed at /home/zhouyl/software/phylosift_v1.0.1/bin/ . I moved ncbi.tgz to /home/zhouyl/software/phylosift_v1.0.1/ , but it doesn't work. I still get the error

NCBI taxonomy data not found and unable to connect to update server, please check your phylosift configuration and internet connection!

gjospin · 2019-07-11T16:13:24Z

Your paths need to look like this:
/home/zhouyl/share/phylosift/ncbi
/home/zhouyl/share/phylosift/markers
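
As a minimal sketch of getting the two downloaded archives into that layout, the Python below unpacks a tarball into the default data directory. It assumes that ncbi.tgz and markers.tgz each contain a single top-level directory named ncbi and markers respectively, which this thread does not confirm. `unpack_database` is a hypothetical helper, not part of PhyloSift.

```python
import tarfile
from pathlib import Path


def unpack_database(archive, share_dir="~/share/phylosift"):
    """Extract a .tgz archive into share_dir and list what is there afterwards."""
    target = Path(share_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    # Only unpack archives that come from a source you trust.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    # Return the top-level entries so the resulting layout can be checked by eye.
    return sorted(p.name for p in target.iterdir())


# Example calls (the archive paths are assumptions about where the downloads live):
# unpack_database("ncbi.tgz")     # should leave ~/share/phylosift/ncbi
# unpack_database("markers.tgz")  # should leave ~/share/phylosift/markers
```

If the returned list does not contain ncbi and markers, the archives have a different internal layout and the extracted directories would need to be moved or renamed by hand.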


ucassee · 2019-07-11T16:23:33Z

@gjospin Thanks for your patience~
If I want to change the location of these two databases, can I modify the code in /home/zhouyl/software/phylosift_v1.0.1/bin/../lib/Phylosift/Phylosift.pm ? At line 154 I only see Phylosift::Utilities::data_checks( self => $self )

gjospin · 2019-07-11T16:35:22Z

Edit /home/zhouyl/software/phylosift_v1.0.1/phylosiftrc. Find the line
# $marker_dir="";
and change it to
$marker_dir="/home/zhouyl/software/phylosift_v1.0.1/markers";
Then change the # $ncbi_dir line to
$ncbi_dir="/home/zhouyl/software/phylosift_v1.0.1/ncbi";
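
The same kind of edit can be scripted. The sketch below is a hypothetical helper (`set_rc_value` is not part of PhyloSift) that uncomments a setting and assigns it a new value. It assumes each setting sits on its own line in the form `# $name=...;`; the sample text is modeled on that form and is not a copy of the real phylosiftrc.

```python
import re


def set_rc_value(rc_text, name, perl_literal):
    """Uncomment the line that sets $name and assign it perl_literal.

    perl_literal is written verbatim, so pass strings with their quotes,
    e.g. '"/some/path"', and numbers as plain digits, e.g. '100'.
    """
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*\$" + re.escape(name) + r"\b[^\n]*$",
        re.MULTILINE,
    )
    new_line = f"${name}={perl_literal};"
    # A function replacement avoids backslash handling in the new text.
    updated, count = pattern.subn(lambda _match: new_line, rc_text, count=1)
    if count == 0:
        raise KeyError(f"no line setting ${name} was found")
    return updated


# Sample text only; the real phylosiftrc contains many more settings.
sample = '# $marker_dir="";\n# $ncbi_dir="";\n'
base = "/home/zhouyl/software/phylosift_v1.0.1"
sample = set_rc_value(sample, "marker_dir", f'"{base}/markers"')
sample = set_rc_value(sample, "ncbi_dir", f'"{base}/ncbi"')
print(sample)
```

Note that the whole matched line is replaced, so a trailing comment on that line would be lost; keeping a backup copy of phylosiftrc before rewriting it is a sensible precaution.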


ucassee · 2019-07-12T12:30:34Z

When I use phylosift to search for conserved proteins, I find too many copies of the same protein in one of my genomes.

-rw-r--r-- 1 zhouyl microbial 1515 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.10
-rw-r--r-- 1 zhouyl microbial 1440 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.11
-rw-r--r-- 1 zhouyl microbial 1151 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.14
-rw-r--r-- 1 zhouyl microbial 377 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.15
-rw-r--r-- 1 zhouyl microbial 761 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.16
-rw-r--r-- 1 zhouyl microbial 750 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.19
-rw-r--r-- 1 zhouyl microbial 771 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.2
-rw-r--r-- 1 zhouyl microbial 389 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.20
-rw-r--r-- 1 zhouyl microbial 379 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.3
-rw-r--r-- 1 zhouyl microbial 1138 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.4
-rw-r--r-- 1 zhouyl microbial 1112 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.5
-rw-r--r-- 1 zhouyl microbial 373 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.6
-rw-r--r-- 1 zhouyl microbial 773 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.7
-rw-r--r-- 1 zhouyl microbial 413 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.8
-rw-r--r-- 1 zhouyl microbial 1159 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.9

But when I BLAST the protein sequences in these files at NCBI, no significant similarity is found. I feel confused. Could you help me?

ucassee · 2019-07-12T12:47:44Z

Here is what I see in marker_summary.txt:

DNGNGWU00010 1
DNGNGWU00011 1
DNGNGWU00012 1
DNGNGWU00013 35
DNGNGWU00014 1
DNGNGWU00015 1
DNGNGWU00016 1

I think it is unlikely that the genome has 35 DNGNGWU00013 proteins. Combined with the lack of hits at NCBI, I wonder whether something went wrong in the search step.
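
A check like this can be automated across genomes. The sketch below flags any marker whose count exceeds an expected copy number. It assumes marker_summary.txt consists of whitespace-separated marker/count pairs as shown above and that each of these markers is expected about once per genome; `flag_multicopy` is an illustrative name, not a PhyloSift function.

```python
def flag_multicopy(summary_text, max_copies=1):
    """Return {marker: count} for markers seen more than max_copies times."""
    flagged = {}
    for line in summary_text.splitlines():
        parts = line.split()
        # Skip anything that is not a "<marker> <count>" pair.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        marker, count = parts[0], int(parts[1])
        if count > max_copies:
            flagged[marker] = count
    return flagged


# The rows quoted in this comment.
sample = """\
DNGNGWU00010 1
DNGNGWU00011 1
DNGNGWU00012 1
DNGNGWU00013 35
DNGNGWU00014 1
DNGNGWU00015 1
DNGNGWU00016 1
"""
print(flag_multicopy(sample))  # {'DNGNGWU00013': 35}
```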

ucassee · 2019-07-12T13:48:36Z

I have another question. I used the sequences in the *candidate.ffn* file to BLAST against the original genome sequences. Compared with the original genome sequences, the sequences in the *candidate.ffn* file contain a few gaps. If the ffn was extracted from the original genome, why do these gaps exist?

gjospin · 2019-07-12T16:05:35Z

The search step is really permissive in the matches. It is really meant to reduce the complexity of the alignment step so it doesn't take too long. I would perform the align step and see what sticks after that. Also keep in mind that PS was developed with short reads in mind. You may want to adjust thresholds in the phylosiftrc file in the same way you modified the paths yesterday. The thresholds might need to be adjusted depending on the length of the marker you are looking at. How big are the gaps? Could it be some frameshift happening? The hits are done in protein space. I hope this helps.


ucassee · 2019-07-13T01:36:40Z

The gaps are not too big, just 3, 6, or 9 bp in my sequences.
Most of the predicted DNGNGWU proteins find homologous proteins when I BLAST them at NCBI, but the DNGNGWU00013 I mentioned before does not. So are you suggesting that if I raise the threshold I will get more positive results?
I can't find the threshold setting in Phylosift.pm. Could you help me?
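
One quick check on those gap sizes: a gap whose length is a multiple of three adds or removes whole codons and leaves the downstream reading frame intact, while any other length shifts the frame. A small sketch of that arithmetic (the function name is illustrative):

```python
def classify_gap(length_bp):
    """Label a nucleotide gap by its effect on the reading frame."""
    if length_bp <= 0:
        raise ValueError("gap length must be a positive number of bases")
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


for gap in (3, 6, 9):
    print(gap, classify_gap(gap))  # each of these prints "in-frame"
```

This says only that the reported gaps do not shift the frame; it does not explain where they come from.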

gjospin · 2019-07-15T17:19:36Z

No, I was suggesting that you increase the threshold stringency to remove the incorrect matches, which should have lower scores. So if a default score is 150 in the phylosiftrc file, you would want to increase that to filter out false positives. You could also enforce a minimum number of bases getting aligned. We have seen clades that lack certain markers, so it's possible marker 13 doesn't work well for your bug of interest. If the matches aren't good enough, then the marker 13 region of the concatenated alignment would be all gaps. I would ignore it if you aren't happy with what comes out of it. There is a way to give PS a list of markers (--custom flag), one per line, so that it only looks at matches for markers in the list. You could give it the list of 37 markers minus DNGNGWU00013.
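
A sketch of building such a list: the Python below writes every marker name except the excluded ones to a text file, one per line. It assumes the marker directory holds one subdirectory per marker, named after its ID (for example DNGNGWU00013). `write_marker_list` and the output file name are hypothetical, and how the file is passed to --custom should be checked against the PhyloSift help output.

```python
from pathlib import Path


def write_marker_list(marker_dir, out_path, exclude=(), prefix="DNGNGWU"):
    """Write marker directory names, minus the excluded ones, one per line."""
    names = sorted(
        entry.name
        for entry in Path(marker_dir).iterdir()
        if entry.is_dir()
        and entry.name.startswith(prefix)
        and entry.name not in exclude
    )
    Path(out_path).write_text("".join(name + "\n" for name in names))
    return names


# Example call (both paths are assumptions about the local setup):
# write_marker_list(
#     "/home/zhouyl/software/phylosift_v1.0.1/markers",
#     "custom_markers.txt",
#     exclude=("DNGNGWU00013",),
# )
```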


ucassee · 2019-07-16T13:13:54Z

Hi @gjospin, thanks for your help. In the phylosiftrc file I find too many parameters, and I am not sure which one I should modify to raise the threshold. Is it one of the following?

MarkerAlign default parameters

#$min_aligned_residues=50;
#$rna_split_size = 500; #sequences longer than this value will undergo the long sequence pipeline
#$gap_character = "-";

gjospin · 2019-07-16T17:33:45Z

I would target $min_aligned_residues and raise it closer to the length of the gene(s) you are interested in. You can find the marker lengths in the database's HMM files (grep 'LEN' DNGNG*/*.hmm, for example; our system is down right now, so I can't check the exact syntax). Also keep in mind that this is in AA space, so 50 residues corresponds to 150 nucleotides.
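
A sketch of that lookup in Python, assuming the marker HMMs are HMMER-format text files with a `LENG` header line and sit at DNGNG*/*.hmm under the marker directory, as the grep pattern suggests. `marker_lengths` and `suggest_min_residues` are hypothetical helpers, and the 0.5 fraction is an arbitrary illustration rather than a PhyloSift recommendation.

```python
import re
from pathlib import Path

# HMMER profile headers store the model length on a line such as "LENG  120".
LENG_LINE = re.compile(r"^LENG\s+(\d+)")


def marker_lengths(marker_dir):
    """Map 'marker/file.hmm' to the model length in amino acids."""
    lengths = {}
    for hmm in sorted(Path(marker_dir).glob("DNGNG*/*.hmm")):
        with open(hmm, errors="replace") as handle:
            for line in handle:
                match = LENG_LINE.match(line)
                if match:
                    lengths[f"{hmm.parent.name}/{hmm.name}"] = int(match.group(1))
                    break
    return lengths


def suggest_min_residues(length_aa, fraction=0.5):
    """Candidate $min_aligned_residues as a fraction of the marker length."""
    return max(1, int(length_aa * fraction))


# The setting is counted in amino acids; multiply by 3 for nucleotides.
print(50 * 3)  # 150
```

Because $min_aligned_residues is a single value, a setting derived from a long marker may be too strict for shorter ones, so the shortest marker of interest is probably the safer reference.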
