-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No output for some species #3
Comments
Thanks for pointing this out. Could you please also post the
accessions of the targets and neighbors you used in at least one of
your examples?
…On Fri, Aug 20, 2021 at 02:52:10AM -0700, Katie wrote:
I have been trying to use FUR to find unique genomic islands in a number of bacterial species, for example Strep Pyogenes. I am downloading the latest complete genomes from the relevant taxa id from NCBI for my target and similar for a close relative for my neighbourhood.
This produces output for some species- e.g. Rickettsia prowazekii, but only from the first step, i.e. using the -u flag. Even when I do get output it is usually very short sequences, and a very small amount of the genome.
According to the outputs of multiple alignments, and other techniques such as ANI analysis, there should be output for these species. For those species for which I do obtain some output, I would expect much more.
Is this an issue with the software or with the way I am running it? I.e. if I reduced the number of target genomes, would that help, and if so how should I select the ones to include? I really liked the paper and the algorithm so had high hopes for this software, but at present am not able to use it due to this issue.
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#3
--
Prof. Dr. Bernhard Haubold
Groupleader Bioinformatics
Max-Planck-Institute for Evolutionary Biology
Department of Evolutionary Genetics
Room 254
August-Thienemann-Strasse 2
24306 Ploen
Germany
***@***.***
Tel: +49 4522 763 276
Fax: +49 4522 763 281
http://guanine.evolbio.mpg.de/
|
i can go (hopefully) one better and provide the links i used to download them. these are for strep pyogenes ftpfilepaths_genbank.txt |
and this is the target for Rickettsia prowazekii ftpfilepaths_genbank_target_RP.txt my neighbourhood folder was empty because there were no close relative genomes fitting my criteria, i assume this doesn't affect the output of the first step, but if I can get fur running so that it outputs things from the last step, I'll obviously need a few neighbourhood files. let me know if you need anything else, i'd be really happy to get this running! |
Thanks, that's really helpful. I just noticed that your neighbor,
GCA_000221985.1_ASM22198v1_genomic.fna, is also among the targets. If
you remove it from there, you get 31.4 kb of marker candidates.
…--
Prof. Dr. Bernhard Haubold
Groupleader Bioinformatics
Max-Planck-Institute for Evolutionary Biology
Department of Evolutionary Genetics
Room 254
August-Thienemann-Strasse 2
24306 Ploen
Germany
***@***.***
Tel: +49 4522 763 276
Fax: +49 4522 763 281
http://guanine.evolbio.mpg.de/
|
thank you! i will check if this is for the case for my others.... it shouldn't be because i filtered by species/close relative tax ids. i'll let you know if the problem persists after i've checked for duplication, it did not occur to me this could have happened so i feel a bit silly now. any idea what the issue could be for rickettsia? |
I'm a bit surprised you get nothing for the Rickettsia. In my hands,
there is over half a megabase of "unique" sequence. - Of course,
without a neighbor the result needs to be slightly reinterpreted. In
that case we get the regions that are unique within the target
representative, by default the longest target, which also occur among
all other targets. I think that's pretty interesting, too.
…--
Prof. Dr. Bernhard Haubold
Groupleader Bioinformatics
Max-Planck-Institute for Evolutionary Biology
Department of Evolutionary Genetics
Room 254
August-Thienemann-Strasse 2
24306 Ploen
Germany
***@***.***
Tel: +49 4522 763 276
Fax: +49 4522 763 281
http://guanine.evolbio.mpg.de/
|
Let me do some checking- i need to find out what caused the duplication in my source/target dirs. i automated for every species, manually rebuilding the rickettsia db and running does indeed get half a megabase of sequence. if this is all down to bugs in my automation script then i sincerely apologise for wasting your time!! it will take me a while to check things manually but i hope to get back to you by the end of today (UK time). thank you for checking this for me. |
No worries, I'm glad it's working.
|
I have now revised makeFurDb so that it catches duplicates between
targets and neighbors.
|
brilliant- thank you! i have been going through everything manually, and in some cases i had duplicates between target and neighbourhood and am now getting results. in others i can't see any problems. I have added an example from Clostridium botulinum plus a screenshot showing how i make the db and run fur. as i've done this manually i unfortunately don't have the files with all the ftp links. I get similar for Chlamydia pneumoniae. |
ok, i follow- i am going to use phylonium to validate my other target/neighbourhood designations, it is clearly far more suited to the task at hand than just grepping through taxids in the NCBI genbank taxids. i will leave this open for the time being but hopefully once i have completed the above i will be able to close this and the other issue. i would like to thank you again for your patience and for taking the time to address my problems. i recognise these were due to a lack of knowledge on my behalf (i'm pretty new to taxonomy/phylogeny!!) and not due to shortcomings of the software. |
|
I will, i'm glad its been useful. I am getting on much better by validating my initial species lists using the phylonium command you mentioned above. Do you have a recommendation for species that are very close to their close relatives? For example phylonium gives the attached matrix for the first chromosome of brucella suis vs canis, and the distances between the two species are significantly smaller than for other species I've tested. The visualisation shows that they have been assigned to different branches, so they are separable. But very little genetic material is found to be unique in suis (like < 1 kb), do we just conclude that this is all that distinguishes between them? From alignments between sequences from the two species I would expect more than I get from fur, and can do if I reduce the window size significantly but am uneasy about this. I will be consulting my lab about this as well, but if you have any input from the perspective of fur I would be grateful to hear it. |
That's an nice example, Brucella suis in black, B. canis in red: And as you said, there is little output. B. suis as target gives 1433 bp with default values and B. canis 618 bp, which increases to 707 bp if megablast is used instead of standard blast ( |
Apologies for the delay in my reply- I have been going through with a fine toothed comb and it appears the alignment was a glitch in my viewer as when using a different one it is not present. I went through the entirety of the alignments to the second chromosome (where there are slightly more differences) and am beyond impressed that phylonium can separate canis from suis, because even by eye the differences are difficult to discriminate. I think in this case there is genuinely this little to distinguish between the two species and doing anything to increase it would be artificial rather than reflecting a true, biologically meaningful, difference. With that, I am happy to close this issue unless you would like to say anything else. I would like to thank you again for your help, as a result of your input I will be using fur and related software and recommending it to colleagues and collaborators. |
Thanks for going to this length comparing |
I have been trying to use FUR to find unique genomic islands in a number of bacterial species, for example Strep Pyogenes. I am downloading the latest complete genomes from the relevant taxa id from NCBI for my target and similar for a close relative for my neighbourhood.
This produces output for some species- e.g. Rickettsia prowazekii, but only from the first step, i.e. using the -u flag. Even when I do get output it is usually very short sequences, and a very small amount of the genome.
According to the outputs of multiple alignments, and other techniques such as ANI analysis, there should be output for these species. For those species for which I do obtain some output, I would expect much more.
Is this an issue with the software or with the way I am running it? I.e. if I reduced the number of target genomes, would that help, and if so how should I select the ones to include? I really liked the paper and the algorithm so had high hopes for this software, but at present am not able to use it due to this issue.
The text was updated successfully, but these errors were encountered: