
MTB phylogenetics tutorial #3220

Merged
merged 28 commits into from
Mar 16, 2022

cstritt
Contributor

@cstritt cstritt commented Mar 4, 2022

This is a second tutorial for the planned Galaxy workshop on WGS of M. tuberculosis (see request #3211). It covers the interpretation and inference of phylogenetic trees.

@hexylena
Member

hexylena commented Mar 7, 2022

@cstritt @pvanheus do you think this fits into any of the existing GTN topics (https://training.galaxyproject.org/)? We try to avoid creating new topics for just a single tutorial when possible. Maybe visualisation? Sequence analysis feels very NGS-y, but we're trying to expand it, so maybe there?

@hexylena hexylena mentioned this pull request Mar 7, 2022
@pvanheus
Collaborator

pvanheus commented Mar 8, 2022

> @cstritt @pvanheus do you think this fits ok to any of the existing GTN topics? https://training.galaxyproject.org/ we try and avoid creating new topics for just a single tutorial, when possible. Maybe visualisation? Sequence analysis feels very NGS-y, but we're trying to expand it, maybe there?

So I see two issues here:

  1. The work here forms part of a theme, which is quite an exciting development and not particularly well catered for in the GTN repo. For example, @cstritt et al. probably have a workshop website that pulls together at least 3 Galaxy tutorials with other background material into a coherent exploration of the topic (M. tuberculosis sequence analysis / bioinformatics). Does that mean there should be some tags to make this theme easier to follow?
  2. Perhaps we need a phylogeny category? You and I have discussed SARS-CoV-2 phylogeny, and now this is M. tuberculosis phylogeny, so maybe a new category won't be on its own for long? On the other hand, where does the transmission analysis tutorial fit? Is there perhaps a larger category of "relatedness analysis" or "evolution" that would be a better fit here?

@cstritt
Contributor Author

cstritt commented Mar 8, 2022

@hexylena, @pvanheus, many thanks for the helpful comments! I'll start working on them today.
Regarding the category for the tutorial, I like the idea of an 'evolution' topic, as suggested by @pvanheus (there already is 'ecology'). The current topics don't really fit; I'd be surprised to find phylogenetics there.

@hexylena
Member

hexylena commented Mar 8, 2022

> perhaps we need a phylogeny category? You and I have discussed SARS-CoV-2 phylogeny, now this is M. tuberculosis phylogeny - maybe a new category won't be on its own for long? On the other hand, where does the transmission analysis tutorial fit? Is there perhaps a larger category of "relatedness analysis" or "evolution" that is a better fit here?

That makes sense to me. What we try to avoid is topics with a single tutorial, but with the COVID phylogeny tutorial we discussed, this makes more sense. Evolution it is.

First round of revisions for the MTB phylogenetics tutorial
@pvanheus
Collaborator

pvanheus commented Mar 8, 2022

Just one more thought here: there really is not much of a workflow for this tutorial, because it follows on from previous work and is not standalone. I understand the desire not to make the "transmission" tutorial too long, but perhaps add a workflow that at least illustrates the process from VCF to phylogeny?
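As an illustration of the first step of that process, here is a minimal sketch that turns a multi-sample VCF into a SNP alignment in FASTA format, the kind of input a tree-inference tool takes. It is not the tutorial's workflow, and `vcf_to_snp_alignment` / `to_fasta` are hypothetical helpers. It assumes haploid GT calls (`0`, `1` or `.`), keeps only biallelic SNP records, and expects a `GT` entry in every record's FORMAT column.

```python
"""Sketch: build a SNP alignment (FASTA) from a multi-sample VCF.

Assumptions (hypothetical helper, not the tutorial's workflow): haploid GT
calls ("0", "1" or "."), biallelic SNP records only, and a GT field present
in every record's FORMAT column.
"""
from typing import Dict, List


def vcf_to_snp_alignment(vcf_text: str) -> Dict[str, str]:
    samples: List[str] = []
    columns: Dict[str, List[str]] = {}
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            columns = {name: [] for name in samples}
            continue
        ref, alt = fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # skip indels, multi-allelic and non-variant records
        gt_index = fields[8].split(":").index("GT")
        alleles = {"0": ref, "1": alt}
        for name, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            columns[name].append(alleles.get(gt, "N"))  # missing call -> N
    return {name: "".join(bases) for name, bases in columns.items()}


def to_fasta(alignment: Dict[str, str]) -> str:
    """Write one FASTA record per sample, in VCF sample order."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


if __name__ == "__main__":
    example = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\n"
        "ref\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0\t1\t1\n"
        "ref\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1\t0\t.\n"
    )
    print(to_fasta(vcf_to_snp_alignment(example)), end="")
```

Each column of the result is one variable site, so the alignment is far shorter than the genome. That is why the ascertainment-bias question raised further down in this thread matters when the resulting tree is interpreted.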

@cstritt
Copy link
Contributor Author

cstritt commented Mar 10, 2022

> BTW, thinking about your workflow again, I realised that you don't address ascertainment bias. Perhaps constant sites can be computed in the previous tutorial (snp_sites has a mode for computing constant sites; it's actually aimed at IQ-TREE's -fconst parameter, and I'm not sure if RAxML has a direct equivalent) and copied over to here? (As an example, here's a workflow that is similar to what is done in your set of tutorials but adds that constant-site calculation: https://galaxy.sanbi.ac.za/u/pvanheus/w/snippy-tb-sample-iqtree-015)

@pvanheus, this was indeed a weighty omission. I now address it in the alignment section, and I have added a section at the end about rescaling and dating the tree. I rescale branch lengths as (branch length * alignment length) / genome size, and the exercise asks what problems might arise from assuming that sites absent from the SNP alignment are invariant.
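The two corrections discussed in this exchange can be sketched in a few lines. This is an illustration under stated assumptions, not the tutorial's code or snp_sites' implementation, and both function names are hypothetical. `constant_site_counts` counts invariant columns per base in a whole-genome alignment and returns them in A,C,G,T order, which is the shape of the comma-separated integer list IQ-TREE's -fconst option takes; columns made only of `N` or gaps are ignored, and a tool such as snp_sites may handle ambiguous bases differently. `rescale_branch_length` applies the (branch length * alignment length) / genome size formula, assuming SNP-tree branch lengths are in substitutions per alignment site.

```python
"""Sketch of two ascertainment-bias helpers for a SNP-only tree.

Assumptions: `sequences` is a whole-genome alignment given as equal-length
strings, and SNP-tree branch lengths are substitutions per alignment site.
Function names are hypothetical.
"""
from typing import List, Tuple


def constant_site_counts(sequences: List[str]) -> Tuple[int, int, int, int]:
    """Count invariant columns per base, in A,C,G,T order."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for column in zip(*sequences):
        bases = {base.upper() for base in column}
        if len(bases) == 1:
            base = bases.pop()
            if base in counts:  # ignore all-N or all-gap columns
                counts[base] += 1
    return counts["A"], counts["C"], counts["G"], counts["T"]


def rescale_branch_length(branch_length: float, alignment_length: int,
                          genome_size: int) -> float:
    """Convert per-SNP-site branch lengths to per-genome-site lengths.

    Implements (branch length * alignment length) / genome size, which
    treats every site missing from the SNP alignment as invariant.
    """
    return branch_length * alignment_length / genome_size


if __name__ == "__main__":
    genome = ["AACGTT", "AACGTA", "AACCTA"]
    print(",".join(map(str, constant_site_counts(genome))))  # prints 2,1,0,1
    print(rescale_branch_length(0.5, 800, 4_000_000))
```

Both helpers share the assumption the exercise asks about: a column counted as constant here, or a site treated as invariant by the rescaling, may simply be a position where variants were filtered out or not called.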

@pvanheus
Copy link
Collaborator

On the linting errors:

  1. The link error is waiting for the TB transmission tutorial to be merged, so hopefully that can be merged soon.
  2. The tag issue: we said there needs to be an evolution category, right? What is involved in creating one, @hexylena?

@shiltemann
Member

@pvanheus, to create a new topic: https://training.galaxyproject.org/training-material/topics/contributing/tutorials/create-new-topic/tutorial.html

(and I realise I forgot to add instructions for the FAQ folder there, but I can help with that too)

@cstritt
Contributor Author

cstritt commented Mar 16, 2022

So the only thing that remains to be done on our side is to create the 'evolution' topic and move both tutorials there, right? As far as I can see, this would only involve renaming the existing folder ('phylogenetics') and modifying the corresponding metadata.yml. I'm not sure, though, how both tutorials can be moved there, given that they are both in open pull requests.

@shiltemann
Member

@cstritt yes, @hexylena and I will deal with the renaming and moving this morning. We will merge them as draft tutorials so that they are accessible for your course next week, and afterwards we can polish the remaining details.

(We have been thinking for a while about renaming the metagenomics topic to "microbial analysis", so the tutorials could fit there as well.)

@cstritt cstritt requested a review from a team as a code owner March 16, 2022 09:25
@shiltemann
Member

Users can install libraries as needed in RStudio in Galaxy. That said, if this takes too much time, we can look into changing the base image to include the library.

install.packages("ape") crashes with:

/bin/sh: 1: x86_64-conda-linux-gnu-cc: not found
make: *** [/opt/miniconda/lib/R/etc/Makeconf:170: BIONJ.o] Error 127
ERROR: compilation failed for package ‘ape’
* removing ‘/opt/miniconda/lib/R/library/ape’

@cstritt You might be able to install it via conda (using the terminal tab in RStudio). I'm testing it now and will add it to the instructions in the tutorial if it works 👍

@shiltemann
Member

OK @cstritt, it appears to work if you install via conda 👍 It does give a warning that the package was built with R 4.1.2, while RStudio runs R 4.1.0. This probably won't be a problem, but it would be good to test.

I will merge this now

@shiltemann shiltemann merged commit 81d59fe into galaxyproject:main Mar 16, 2022
@cstritt
Contributor Author

cstritt commented Mar 16, 2022

Excellent, thanks a lot for the great support!


4 participants