error with phylosift search #501

Open
ucassee opened this issue Jul 11, 2019 · 13 comments

@ucassee

ucassee commented Jul 11, 2019

When I run phylosift search -help, it fails with the following error:

NCBI taxonomy data not found and unable to connect to update server, please check your phylosift configuration and internet connection!
at /home/zhouyl/software/phylosift_v1.0.1/bin/../lib/Phylosift/Phylosift.pm line 154.

I downloaded markers.tgz and ncbi.tgz. Where should I put them?

Thanks in advance

@gjospin
Owner

gjospin commented Jul 11, 2019 via email

@ucassee
Author

ucassee commented Jul 11, 2019

@gjospin Thanks for your reply.
PhyloSift is installed at /home/zhouyl/software/phylosift_v1.0.1/bin/. I moved ncbi.tgz to /home/zhouyl/software/phylosift_v1.0.1/, but it doesn't work. It still fails with the error:

NCBI taxonomy data not found and unable to connect to update server, please check your phylosift configuration and internet connection!

@gjospin
Owner

gjospin commented Jul 11, 2019 via email

@ucassee
Author

ucassee commented Jul 11, 2019

@gjospin Thanks for your patience!
If I want to change the location of these two databases, can I modify the code in /home/zhouyl/software/phylosift_v1.0.1/bin/../lib/Phylosift/Phylosift.pm? At line 154 I only see Phylosift::Utilities::data_checks( self => $self ).

@gjospin
Owner

gjospin commented Jul 11, 2019 via email

@ucassee
Author

ucassee commented Jul 12, 2019

When I use PhyloSift to search for conserved proteins, I find too many copies of the same protein in one of my genomes:

-rw-r--r-- 1 zhouyl microbial 1515 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.10
-rw-r--r-- 1 zhouyl microbial 1440 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.11
-rw-r--r-- 1 zhouyl microbial 1151 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.14
-rw-r--r-- 1 zhouyl microbial 377 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.15
-rw-r--r-- 1 zhouyl microbial 761 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.16
-rw-r--r-- 1 zhouyl microbial 750 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.19
-rw-r--r-- 1 zhouyl microbial 771 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.2
-rw-r--r-- 1 zhouyl microbial 389 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.20
-rw-r--r-- 1 zhouyl microbial 379 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.3
-rw-r--r-- 1 zhouyl microbial 1138 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.4
-rw-r--r-- 1 zhouyl microbial 1112 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.5
-rw-r--r-- 1 zhouyl microbial 373 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.6
-rw-r--r-- 1 zhouyl microbial 773 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.7
-rw-r--r-- 1 zhouyl microbial 413 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.8
-rw-r--r-- 1 zhouyl microbial 1159 Jul 12 12:31 DNGNGWU00013.lastal.candidate.aa.1.9

But when I BLAST the protein sequences in these files against NCBI, no significant similarity is found. I am confused. Could you help me?
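To tally how many candidate files each marker produced without counting by eye, the listing above can be summarized with a short script. This is a minimal sketch, assuming every file name follows the MARKER.lastal.candidate.aa.N.M pattern seen in the listing; that pattern is inferred from this output rather than from PhyloSift documentation, and count_candidates is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed naming pattern, inferred from the listing above:
# "<marker>.lastal.candidate.aa" followed by one or more ".<number>" suffixes.
CANDIDATE_RE = re.compile(r"^(?P<marker>[^.]+)\.lastal\.candidate\.aa(?:\.\d+)+$")


def count_candidates(filenames):
    """Count candidate files per marker ID; names that don't match are ignored."""
    counts = Counter()
    for name in filenames:
        match = CANDIDATE_RE.match(name)
        if match:
            counts[match.group("marker")] += 1
    return counts
```

For example, count_candidates(os.listdir(path)), where path is the directory holding the candidate files, returns a Counter mapping each marker ID to its number of files, and .most_common() lists the largest groups first.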

@ucassee
Author

ucassee commented Jul 12, 2019

Here is what I see in marker_summary.txt:

DNGNGWU00010 1
DNGNGWU00011 1
DNGNGWU00012 1
DNGNGWU00013 35
DNGNGWU00014 1
DNGNGWU00015 1
DNGNGWU00016 1

I think it is unlikely that the genome contains 35 DNGNGWU00013 proteins. Combined with the lack of NCBI hits, I wonder whether this is a mistake in the search process.
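The check described in this comment, spotting a marker reported far more often than the others, can be automated. This is a minimal sketch, assuming marker_summary.txt holds one whitespace-separated "MARKER COUNT" pair per line as in the excerpt, and adopting the comment's own assumption that each marker should normally be found about once per genome. parse_marker_summary and flag_multicopy are hypothetical helpers, and max_expected is an illustrative parameter, not a PhyloSift setting.

```python
def parse_marker_summary(text):
    """Parse 'MARKER COUNT' lines into a dict.

    Lines that are blank, have a different number of fields, or have a
    non-numeric count (e.g. a header) are skipped; the format is assumed
    from the excerpt above.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        marker, count = fields
        counts[marker] = int(count)
    return counts


def flag_multicopy(counts, max_expected=1):
    """Return (marker, count) pairs above max_expected, highest count first."""
    flagged = [(marker, n) for marker, n in counts.items() if n > max_expected]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Run on the excerpt above, flag_multicopy(parse_marker_summary(text)) returns [("DNGNGWU00013", 35)], singling out the marker questioned here.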

@ucassee
Author

ucassee commented Jul 12, 2019

I have another question. I BLASTed the sequences in the *candidate.ffn* file against the original genome sequences and found a few gaps in the *candidate.ffn* sequences compared with the original genome. If the ffn was extracted from the original genome, why do these gaps exist?

@gjospin
Copy link
Owner

gjospin commented Jul 12, 2019 via email

@ucassee
Author

ucassee commented Jul 13, 2019

The gaps are not big, just 3, 6, or 9 bp in my sequences.
Most of the predicted DNGNGWU proteins have homologs when I BLAST them against NCBI, but DNGNGWU00013, which I mentioned before, does not. So are you suggesting that if I raise the threshold I will get more reliable results?
I can't find the line that sets the threshold in Phylosift.pm. Could you help me?
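The gap sizes reported in this comment (3, 6, and 9 bp) are all multiples of 3, i.e. whole codons. To check this systematically rather than by eye, the gapped query and subject strings from a BLAST pairwise alignment (where "-" marks a gap) can be scanned for runs of gaps. This is a minimal sketch; gap_runs and summarize_gaps are hypothetical helpers, and the sketch only measures the gaps, it does not explain why they occur.

```python
import re


def gap_runs(aligned_seq):
    """Return the length of each consecutive run of '-' in an aligned sequence."""
    return [len(match.group()) for match in re.finditer(r"-+", aligned_seq)]


def summarize_gaps(query_aln, subject_aln):
    """For each side of the alignment, list (gap_length, is_whole_codons) pairs."""
    report = {}
    for label, seq in (("query", query_aln), ("subject", subject_aln)):
        report[label] = [(n, n % 3 == 0) for n in gap_runs(seq)]
    return report
```

For example, summarize_gaps("ATG---AAA", "ATGCCCAAA") returns {"query": [(3, True)], "subject": []}: one 3 bp gap in the query, which spans exactly one codon.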

@gjospin
Owner

gjospin commented Jul 15, 2019 via email

@ucassee
Author

ucassee commented Jul 16, 2019

Hi @gjospin, thanks for your help. In the phylosiftrc file I find many parameters, and I am not sure which one I should modify to increase the threshold. Is it one of the following?

MarkerAlign default parameters

#$min_aligned_residues=50;
#$rna_split_size = 500; #sequences longer than this value will undergo the long sequence pipeline
#$gap_character = "-";
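Because phylosiftrc contains many settings, it can help to list every parameter name and default value in one place before deciding what to change. This is a minimal sketch, assuming parameters follow the Perl-style "#$name = value;" layout of the excerpt above (commented out or not); list_parameters and its contains filter are hypothetical, and the sketch does not say which parameter controls the search threshold.

```python
import re

# Assumed layout, taken from the excerpt above: an optional leading '#',
# then '$name = value;', optionally followed by a trailing comment.
PARAM_RE = re.compile(r"^\s*#?\s*\$(?P<name>\w+)\s*=\s*(?P<value>[^;]*);")


def list_parameters(rc_text, contains=None):
    """Return (name, value) pairs, optionally keeping names containing `contains`."""
    params = []
    for line in rc_text.splitlines():
        match = PARAM_RE.match(line)
        if not match:
            continue  # section titles, prose, and other lines are skipped
        name = match.group("name")
        if contains is not None and contains not in name:
            continue
        params.append((name, match.group("value").strip()))
    return params
```

Passing the text of the excerpt above returns [("min_aligned_residues", "50"), ("rna_split_size", "500"), ("gap_character", '"-"')]; a substring filter such as contains="min_" narrows the list.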

@gjospin
Owner

gjospin commented Jul 16, 2019 via email
