| layout | title | questions | objectives | time_estimation | key_points | contributors |
|---|---|---|---|---|---|---|
| tutorial_hands_on | Creating a new tutorial | | | 15m | | |
Introduction
{:.no_toc}
Galaxy is a great solution to train bioinformatics concepts:
- numerous bioinformatics tools are available (almost 6,000 in the ToolShed)
- it can be used by people without any computer science skills
- it trains people to use technology, outlining available resources and the efforts that have made them accessible to researchers
- it is scalable
In 2016, the Galaxy Training Network decided to set up a new infrastructure for easily delivering Galaxy-related training material. The idea was to develop something open, online, based on a community effort, and on top of the Galaxy platform.
We took inspiration from Software Carpentry and collected everything in a GitHub repository: https://github.com/galaxyproject/training-material. We decided on a structure focusing on tutorials with hands-on activities, suited both to online self-training and to workshops. Each tutorial follows the same structure and comes with a virtualised instance allowing you to run the training anywhere you have resources available.
Here you will learn how to create a new tutorial by developing a small tutorial that explains how to use BLAST.
Agenda
In this tutorial, we will cover:
- TOC {:toc}
{: .agenda}
{% icon comment %} Comment
This tutorial explains the different steps to create a tutorial for the Galaxy Training Material. It may require some knowledge that you do not have or do not have the time to learn. If this is the case, you can create a skeleton of a tutorial with whatever existing materials you have, using your preferred text editor, and then share it with us by opening an [issue on GitHub]({{ site.github_repository }}/issues/new), writing to us on [Gitter]({{ site.gitter_url }}), or sending us an [email](mailto:{{ site.email }}). {: .comment}
Define the topic
The first question we need to answer is in which topic to place our new tutorial. This can be tricky. When we structured the repository, we decided to use the categories that are used in the ToolShed as our initial list of topics. Since every tool uploaded to the ToolShed must be in at least one category, you can look at the main tools in your tutorial and see which categories they are placed in within the ToolShed. This can provide a guide for where you might put your new tutorial. For example, this tutorial will rely on the NCBI Blast+ tool:
{% icon hands_on %} Hands-on: Defining the topic for the tutorial
- Search for NCBI Blast+ on the ToolShed
- Check in which category it has been placed
{% icon solution %} Solution
There are a couple of steps to reach the answer:
- Search for `ncbi blast+`
- Press the Enter key to search
- Click on the result named `ncbi_blast_plus`
- At the bottom of this page there is a box labelled "Categories"
It is placed in two categories, "Next Gen Mappers" and "Sequence Analysis" {: .solution}
{: .hands_on}
{% icon comment %} Creating a new topic
Want to create a new topic? [Check out our tutorial to create a new topic]({% link topics/contributing/tutorials/create-new-topic/tutorial.md %}) {: .comment}
Keep track of the changes
The material is stored in a [GitHub repository]({{ site.github_repository }}), a code hosting platform for version control and collaboration. To develop training material, we follow the GitHub flow, which is based on forks, branches, and pull requests.
This can be done online via the GitHub interface or locally on your computer via command-line.
{% icon comment %} Learning how to contribute
Want to learn how to contribute? Check our tutorials:
- [Contributing with GitHub via its interface]({% link topics/contributing/tutorials/github-interface-contribution/tutorial.md %})
- [Contributing with GitHub via command-line]({% link topics/contributing/tutorials/github-command-line-contribution/tutorial.md %}) {: .comment}
Create the directory for the tutorial
Each training material is related to a topic. All training materials (slides, tutorials, ...) related to a topic are found in a dedicated directory (e.g. the `transcriptomics` directory contains the material related to transcriptome analysis). Each topic has the following structure:
├── README.md
├── metadata.yaml
├── images
├── docker
│ ├── Dockerfile
├── slides
│ ├── index.html
├── tutorials
│ ├── tutorial1
│ │ ├── tutorial.md
│ │ ├── slides.html
│ │ ├── data-library.yaml
│ │ ├── workflows
│ │ │ ├── workflow.ga
│ │ ├── tours
│ │ │ ├── tour.yaml
Once the topic has been chosen and you have set up your contribution environment, you can create the tutorial. An ideal tutorial in the Galaxy Training Network contains:
- a tutorial file `tutorial.md` written in Markdown with hands-on
- an optional slides file `slides.html` in Markdown with slides to support the tutorial
- a directory `tours` with Galaxy Interactive Tours to reproduce the tutorial
- a directory `workflows` with workflows extracted from the tutorial
- a YAML file `data-library.yaml` with the links to the input data needed for the tutorial
The most important file is `tutorial.md`, which holds the content of the tutorial. The other files are there to support the tutorial and make it robust and usable across many environments.
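Before going further, it can help to check which of these files a tutorial directory already contains. The following is a minimal sketch, not part of the training infrastructure: the helper names are hypothetical, the file names are the ones shown in the directory tree above, and the split between required, recommended and optional entries is an assumption for illustration.

```python
from pathlib import Path
import tempfile

# Entry names come from the directory tree above.
# The required/recommended/optional grouping is an assumption of this sketch.
REQUIRED = ["tutorial.md"]
RECOMMENDED = ["data-library.yaml", "workflows", "tours"]
OPTIONAL = ["slides.html"]


def check_tutorial_dir(tutorial_dir):
    """Map each expected entry to True (present) or False (absent)."""
    root = Path(tutorial_dir)
    expected = REQUIRED + RECOMMENDED + OPTIONAL
    return {name: (root / name).exists() for name in expected}


def missing_required(tutorial_dir):
    """List the required entries that are absent from the tutorial directory."""
    status = check_tutorial_dir(tutorial_dir)
    return [name for name in REQUIRED if not status[name]]


if __name__ == "__main__":
    # Demo on a throwaway directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        tutorial = Path(tmp) / "my-new-tutorial"
        (tutorial / "workflows").mkdir(parents=True)
        print("missing before:", missing_required(tutorial))
        (tutorial / "tutorial.md").write_text("# placeholder\n")
        print("missing after:", missing_required(tutorial))
```

Pointing `check_tutorial_dir` at a real tutorial directory gives a quick overview of what still needs to be added.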
{% icon hands_on %} Hands-on: Create all the required files and folder structures automatically
Run (by adapting the information between the quotes):

    $ planemo training_init \
        --topic_name "my-topic" \
        --tutorial_name "my-new-tutorial" \
        --tutorial_title "Title of the tutorial" \
        --hands_on

Check that a new directory (with your tutorial name) has been generated in the topic folder
Make sure that Jekyll is running
{% icon comment %} Jekyll
Want to learn how to start Jekyll? [Check out our tutorial to serve the website locally]({% link topics/contributing/tutorials/running-jekyll/tutorial.md %}) {: .comment}
Check if the tutorial has been correctly added at http://localhost:4000/training-material/ {: .hands_on}
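If you create several tutorials, the command from the hands-on above can be assembled in a script. The sketch below only builds the argument list and prints it; it uses only the flags shown above, and the function name is hypothetical.

```python
import shlex


def training_init_command(topic_name, tutorial_name, tutorial_title, hands_on=True):
    """Build the argument list for `planemo training_init` (does not run it)."""
    args = [
        "planemo", "training_init",
        "--topic_name", topic_name,
        "--tutorial_name", tutorial_name,
        "--tutorial_title", tutorial_title,
    ]
    if hands_on:
        # Same flag as in the hands-on command above
        args.append("--hands_on")
    return args


cmd = training_init_command("my-topic", "my-new-tutorial", "Title of the tutorial")
# shlex.join quotes arguments containing spaces, so the line can be pasted in a shell
print(shlex.join(cmd))
```

To actually run the command, the list could be passed to `subprocess.run(cmd, check=True)` from the root of the training material repository, assuming planemo is installed.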
A toy dataset
Our tutorials try to follow the "learn by doing" approach; they combine both theoretical and practical sections. The practical sections (or hands-on) are supposed to be done on Galaxy.
The first task is to select some data to use for the hands-on sections. The selected data must be informative enough to illustrate what a tool or technique does, but not so big that it requires long waiting times for processing during a workshop. Uploading files into Galaxy and downloading them is usually quick, but the time taken for a tool to run can be long. Tool run times of no more than 10-15 minutes are recommended. Typically, the selected data should be an informative subset of a full real-life dataset.
Below we describe two examples of how toy datasets were generated for tutorials:
- Example 1: creating a toy dataset from scratch
  - Take one 16S sequence (for example, one found in the test case of a Galaxy tool)
  - Generate a reference database:
    - Blast it on the NR database on NCBI Blast
    - Extract one similar sequence found with Blast
    - Search and extract 2 other sequences of the same species using the NCBI Nucleotide database
- Example 2: creating a toy dataset from an existing larger one
  - When the experiment takes a FASTQ as input and a few reads are sufficient:
    - Use seqtk_sample {% icon tool %} to randomly extract reads from your input FASTQ.
  - However, when it requires a lot of reads to be meaningful, you can use the following strategy (used for the ATAC-seq tutorial using this workflow):
    - Run the workflow until the mapping step on the full dataset (or one big enough to give good results).
    - Select the IDs of reads which map to the smallest chromosome (for example chr22 for human data).
    - To keep enough diversity in the toy dataset, you can also randomly take 1% of the read IDs.
    - Concatenate the two lists and remove the duplicated IDs.
    - Use seqtk_subseq {% icon tool %} to sample your original FASTQ with the list of IDs.
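The ID selection in the second strategy (reads on the smallest chromosome, plus a random 1% of all reads, without duplicates) can be sketched in a few lines. This is an illustration only: the function name and the `read_to_chrom` dictionary are hypothetical, and in practice the read-to-chromosome mapping would come from the output of the mapping step.

```python
import random


def select_read_ids(read_to_chrom, small_chrom="chr22", fraction=0.01, seed=0):
    """Pick IDs of reads on the smallest chromosome plus a random fraction of all IDs.

    read_to_chrom: dict mapping read ID -> chromosome the read mapped to
                   (hypothetical input, e.g. parsed from the mapping output).
    """
    all_ids = sorted(read_to_chrom)
    on_small = [rid for rid in all_ids if read_to_chrom[rid] == small_chrom]

    # Random subset of all read IDs, to keep some diversity in the toy dataset
    rng = random.Random(seed)
    n_random = min(len(all_ids), max(1, round(len(all_ids) * fraction)))
    randomly_picked = rng.sample(all_ids, n_random)

    # Concatenate both lists and drop duplicated IDs while keeping the order
    return list(dict.fromkeys(on_small + randomly_picked))


# Toy input: 1000 reads, one in every 50 mapped to chr22
mapping = {f"read{i}": ("chr22" if i % 50 == 0 else "chr1") for i in range(1000)}
ids = select_read_ids(mapping)
print(len(ids), "read IDs selected")
```

Writing `"\n".join(ids)` to a text file gives a list of IDs, one per line, that can then be used to subset the original FASTQ.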
We would then develop the tutorial and test it on this toy dataset. Once we were ready to share it, we would upload the datasets to Zenodo to store them long-term and obtain a dedicated DOI in the Galaxy training network community.
{% icon hands_on %} Hands-on: Upload the dataset to Zenodo
Go to Zenodo
Log in using your GitHub credentials
You may need to authorize Zenodo to access your GitHub account (only to read your information)
Click on Upload (top panel)
Start a new upload
Upload the files corresponding to your datasets
{% icon comment %} No possible changes in the files after publication
Adding, removing or modifying files is not allowed after you have published your upload. So make sure that all the files you need are ready before you start your upload.
The metadata can be changed after publication. {: .comment}
Search for and Select Galaxy training network in Communities
Select Dataset in Upload type
Use the title of your tutorial and mention also Galaxy Training Material
Add all the persons who contributed to the tutorial as authors
Add a short description of the tutorial and a link to the training material website
Keep Open Access as Access right and Creative Commons Attribution 4.0 as License
Fill out any remaining information
Click on Publish
Copy the DOI link in the new page
Paste the link in `zenodo_link` in the tutorial header {: .hands_on}
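To confirm that the link ended up in the tutorial header, the YAML header at the top of `tutorial.md` can be inspected. The sketch below is deliberately naive and avoids any YAML library: it only reads top-level `key: value` lines between the two `---` markers. The function names, the example header values and the placeholder URL are assumptions for illustration.

```python
def front_matter_keys(markdown_text):
    """Return {key: raw value} for top-level `key: value` lines in the header.

    Nested values (lists of questions, objectives, ...) are kept only as empty
    strings, which is enough to test that a key is present.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    header = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the header
        is_top_level = line and not line[0].isspace() and not line.startswith("-")
        if is_top_level and ":" in line:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip().strip('"')
    return header


def has_zenodo_link(markdown_text):
    """True when the header contains a non-empty zenodo_link."""
    return bool(front_matter_keys(markdown_text).get("zenodo_link"))


# Hypothetical header, with the link still empty
example = """---
layout: tutorial_hands_on
title: "Creating a new tutorial"
zenodo_link: ""
time_estimation: "15m"
questions:
  - "Placeholder question"
---
Introduction
"""

print(has_zenodo_link(example))
```

Calling `has_zenodo_link` on the text of a real `tutorial.md` (read with `Path("tutorial.md").read_text()`) would flag tutorials whose link is still missing.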
Write the tutorial
Now that you have the structure in place, you can fill in the tutorial itself.
{% icon hands_on %} Hands-on: Write the tutorial
- Open the `tutorial.md` file with your favorite text editor
- Fill out the tutorial by following the [dedicated tutorial]({% link topics/contributing/tutorials/create-new-tutorial-content/tutorial.md %})
- (Optional) Build the website locally and check that the tutorial is there by following the [Jekyll tutorial]({% link topics/contributing/tutorials/running-jekyll/tutorial.md %}) {: .hands_on}
Add some technical support (recommended)
To be able to run the tutorial, we need a Galaxy instance where the needed tools and the data are available. We then need to describe the required technical infrastructure. Tools are installed based on the workflows in the `workflows` directory.
This description will be used to automatically set up a Docker Galaxy flavour, to set up an existing Galaxy instance and also to test if a public Galaxy instance is able to run the tool.
The technical support consists of several files:
- workflow file(s) in the `workflows` directory
- the `data-library.yaml` file with the links to the input data needed for the tutorial
- interactive tour file in the `tours` directory
{% icon hands_on %} Hands-on: Add technical support for the tutorial
- Add some technical support for the tutorial following the [tutorial]({% link topics/contributing/tutorials/create-new-tutorial-technical/tutorial.md %})
- Add the workflow
- (Recommended) Generate the `data-library.yaml`
- (Optional) Create an interactive tour {: .hands_on}
Add slides (optional)
Sometimes, you may want to have slides to support a tutorial and introduce it during a workshop. Sometimes, a set of slides is better than a tutorial to cover a specific topic.
{% icon hands_on %} Hands-on: Add slides
- Create a slide deck in `slides.html` following the [Slide tutorial]({% link topics/contributing/tutorials/create-new-tutorial-slides/slides.html %}) {: .hands_on}
Conclusion
{:.no_toc}
To develop a new tutorial:
- Determine the topic
- Create the directory for the tutorial
- Add some metadata
- Find a good toy dataset and upload it on Zenodo
- Write the tutorial
- Add some technical support (recommended)
- Add slides (optional)
Next time, you can do all of this more quickly.
{% icon hands_on %} Hands-on: Generation of a tutorial
Determine the topic
Create your workflow on a running Galaxy instance
Add the topic name as Tag and the tutorial title as Annotation/Notes to the workflow using the workflow editor.
Create a Zenodo record with the input data
Generate the skeleton of your tutorial
option 1: from a workflow located on a Galaxy instance

    $ planemo training_init \
        --topic_name "my-topic" \
        --tutorial_name "my-new-tutorial" \
        --tutorial_title "Title of the tutorial" \
        --galaxy_url "URL to Galaxy instance in which you created the workflow" \
        --galaxy_api_key "Your API key on the Galaxy instance" \
        --workflow_id "ID of the workflow on the Galaxy instance" \
        --zenodo_link "URL to the Zenodo record"

option 2: from a local workflow file (`.ga`) (use only if your workflow is composed of tools from the main ToolShed)

    $ planemo training_init \
        --topic_name "my-topic" \
        --tutorial_name "my-new-tutorial" \
        --tutorial_title "Title of the tutorial" \
        --workflow "path/to/workflow" \
        --zenodo_link "URL to the Zenodo record"

You can use the example workflow file located in `topics/contributing/tutorials/create-new-tutorial/workflows/example-workflow.ga` if you do not have a workflow of your own. This is the workflow belonging to the Galaxy 101 introduction tutorial.

Fill the remaining metadata in the `tutorial.md`
Fill the content of the `tutorial.md`
Check it using Jekyll {: .hands_on}
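The hands-on above asks you to add the topic name as a tag and the tutorial title as an annotation on the workflow. For an exported `.ga` file, this can be checked with a short script. The sketch assumes that the exported workflow is a JSON document with top-level `tags` and `annotation` fields; the function name and the example content are hypothetical.

```python
import json


def workflow_is_labelled(ga_text, topic_name, tutorial_title):
    """Check that an exported workflow carries the topic tag and the title annotation.

    Assumes top-level "tags" (list) and "annotation" (string) keys in the JSON.
    """
    workflow = json.loads(ga_text)
    tags = workflow.get("tags") or []
    annotation = workflow.get("annotation") or ""
    return topic_name in tags and annotation.strip() == tutorial_title


# Minimal stand-in for the content of an exported workflow file
ga_text = json.dumps({
    "a_galaxy_workflow": "true",
    "name": "Example workflow",
    "annotation": "Title of the tutorial",
    "tags": ["my-topic"],
    "steps": {},
})

print(workflow_is_labelled(ga_text, "my-topic", "Title of the tutorial"))
```

For a real file, `ga_text` would be the content read from the `.ga` file in the `workflows` directory.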