Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to select the neighbors genomes? #13

Open
wangzhichao1990 opened this issue Sep 12, 2023 · 8 comments
Open

How to select the neighbors genomes? #13

wangzhichao1990 opened this issue Sep 12, 2023 · 8 comments

Comments

@wangzhichao1990
Copy link

Hi,

Thanks for developing this software.

I would like to ask how to select the neighbors genomes. Sometimes there are no sequences in the results of fur.

Thanks.

@haubold
Copy link
Contributor

haubold commented Sep 12, 2023

We are developing the package neighbors for that. The Tutorial in its documentation might help you complete a given set of targets and to find the corresponding neighbors.

@wangzhichao1990
Copy link
Author

We are developing the package neighbors for that. The Tutorial in its documentation might help you complete a given set of targets and to find the corresponding neighbors.

Thank you for your reply. Is this package currently available? I basically choose other genomes of the same genus as the neighbors, maybe I need to change the - q parameter.

@haubold
Copy link
Contributor

haubold commented Sep 13, 2023

Yes, that package is available. A common problem with fur is mixture between target and neighbor genomes. To make sure this doesn't happen, one should reconstruct a phylogeny from the genomes. If you'd like to see how this is done, take a look at the tutorial in the neighbors package.

@wangzhichao1990
Copy link
Author

Yes, that package is available. A common problem with fur is mixture between target and neighbor genomes. To make sure this doesn't happen, one should reconstruct a phylogeny from the genomes. If you'd like to see how this is done, take a look at the tutorial in the neighbors package.

Is there a way to automatically correct the target and neighbor genomes?
manual inspection is a time-consuming step and I have many species to construct these genomes. Looking forward to your reply. Thank you.

@haubold
Copy link
Contributor

haubold commented Oct 12, 2023

There's a semi-automatic way, which scales well to hundreds of bacterial genomes. It is described in the tutorial of the neighbors package. It consists of the following steps:

  1. Download target and neighbor genomes as listed by neighbors
  2. Construct tree from genomes
  3. Label tree nodes with and inspect tree to pick the target clade
  4. List the leaves in the target clade and automatically create symbolic links for them to the targets directory
  5. Repeat for the neighbors
  6. Run makeFurDb followed by fur

If this sounds promising to you, take a look at the neighbors tutorial for the details.

@wangzhichao1990
Copy link
Author

Thank you for your reply.
I have read the documentation for this package. Actually, I want to automate step 3. At present, I don't have a good idea.

@haubold
Copy link
Contributor

haubold commented Oct 13, 2023

It is a good question, here is a sketch of an answer: Let's say the leaves in the tree are labelled with prefix 'n' for taxonomic neighbors and prefix 't' for taxonomic targets, as explained in the neighbors tutorial. We are looking for the node, v, that maximizes the number of targets in the subtree rooted on v and the number of neighbors elsewhwere.

For example, the tree file o157.txt contains 449 E. coli genomes, with the pathogen O157:H7 as the target. We iterate over the 448 internal nodes, which were labeled with lande, and for each node calculate the node score consisting of the sum of the targets found in that node and the neighbors found outside:

for a in $(seq 448); do
t=$(pickle $a o157.txt | grep '^t' | wc -l)
n=$(pickle -c $a o157.txt | grep '^n' | wc -l)
((s=$t+$n))
echo $a $s
done > score.txt

We sort the scores to find that the desired node is 300, which is also the one I'd pick from visual inspection.

sort -n -k 2 -r score.txt | head

Hope this helps.

@haubold
Copy link
Contributor

haubold commented Oct 18, 2023

Just in case this is still of interest, the Neighbors package now also contains the program fintac for finding the target clade according to the procedure sketched above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants