Snakespeare is a simple and entertaining text mining workflow designed for first-time Snakemake users and developers.
This tutorial is ideal for beginners, including folks who have never used the terminal before.
By working through this tutorial, you will learn:
- How to access the terminal on your computer (Windows, Mac, or Linux)
- How to clone a repository from GitHub
- How to build and activate a conda environment
- How to run a Snakemake workflow and view the results
In addition, this repository is a simple demonstration of Snakemake for workflow developers.
The moving parts of Snakespeare are the same as those in the Snakemake pipelines I use every day for bioinformatics work:
- Snakefile: contains rules for all steps of the workflow
- config.yaml: parameters users can customize are listed here
- environment.yaml: lists software dependencies to be installed into the conda virtual environment
- scripts/: all Python and R scripts live in this directory
- data/: all input and output files live in this directory
This workflow calculates and plots how much characters speak in Shakespeare's tragedies Hamlet and Romeo & Juliet.
- Hamlet talks the most with over 1428 lines of iambic pentameter.
- Hamlet's uncle Claudius talks the second most with over 500 lines of iambic pentameter. It must run in the family.
- Besides the Chorus in Romeo and Juliet, the Ghost of King Hamlet is the most long-winded with an average speech length of 6.3 lines.
- Friar Lawrence is a close second with an average speech length of 6.2 lines.
- Romeo talks slightly more than Juliet (however, Juliet's lines are wittier).
To run Snakespeare, you will need two pieces of software: git and conda.
- git is a tool for downloading the code for Snakespeare from GitHub.
- conda is a tool for accessing all software dependencies (including R, Python, and Snakemake).
All software dependencies will be installed into a "virtual environment," so Snakespeare will not conflict with any Python or R software you have set up already.
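If you think git or conda might already be installed, a quick check like the one below can save you a reinstall. This is only a minimal sketch for bash-style terminals (Git Bash, the Mac Terminal, or Linux); it will not work in Anaconda Prompt. It is not part of the Snakespeare repository, and the helper name have_tool is made up for illustration.

```shell
# have_tool is a hypothetical helper: it succeeds if the named program is on your PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on each of the two tools Snakespeare needs.
for tool in git conda; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found (follow the install instructions for your system)"
    fi
done
```

If both tools are reported as found, you can likely skip ahead to STEP 2.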
Instructions for Windows
Run Snakespeare via Anaconda Prompt (easiest for beginning users)
Head over to the Anaconda website and download a Windows installer for Miniconda3. Run the installer and follow the instructions to complete the installation. This software bundle includes Miniconda3 as well as Anaconda Prompt, which is a terminal app that you can use to run Snakespeare.
Now click the Start menu and search for "Anaconda Prompt." This is a modified version of Windows Command Prompt.
In Anaconda Prompt, copy and paste the following to install git:
conda install -y git
That's it! Continue to STEP 2.
Run Snakespeare via Git Bash (good for beginning users)
Head over to the Anaconda website and download a Windows installer for Miniconda3. Run the installer and follow the instructions to complete your installation of Miniconda3.
Head to the git website and download an installer for Windows. Run the installer and follow the instructions to complete the installation. This software bundle includes git as well as Git Bash, which is a terminal app that you can use to run Snakespeare.
Important: While installing Git for Windows, be sure to check the box to add "Git Bash here" to the File Explorer context menu. You'll need it for the next step.
To enable Conda within Git Bash, you'll need to add the Conda startup script to your ~/.bashrc file. From the Start menu, search for "Miniconda3" and click "Open File Location." Within that folder, navigate to
Run the following command to add the Conda startup script to your ~/.bashrc file:
echo ". '${PWD}'/conda.sh" >> ~/.bashrc
After that, close the terminal window.
Finally, let's double-check that conda is working in your new Git Bash terminal. From the Start menu, open Git Bash. Type
Run Snakespeare via Windows Subsystem for Linux (advanced users)
If you are already using Windows Subsystem for Linux, follow the instructions below to install miniconda and git in your Ubuntu terminal.
Head to the git website and follow the installation instructions for Ubuntu.
Head to the Anaconda website for instructions to download and run a Miniconda installer for Linux.
After installing git and miniconda, close any terminal windows you have open and continue to STEP 2.
Instructions for Mac
On your Mac, open Terminal. Type
Done! Make sure you close any terminal windows that you have open, then continue to STEP 2.
Instructions for Linux
Linux desktop users
Head to the git website for instructions to install git with your distribution's package manager.
Head to the Anaconda website for instructions to download and run a Miniconda installer.
After installing git and miniconda, close any terminal windows you have open and continue to STEP 2.
Linux server users
If you would like to run Snakespeare on a work or lab server, check with your supervisor or sysadmin to see if git and conda are installed already. If so, continue to STEP 2. Otherwise, if you need to install software (and have permission to do so), follow the instructions below.
To install miniconda from the command line:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
The installer will ask you some questions to complete installation. Review and accept the license, accept or change home location, and answer yes to placing it in your path.
To finish configuring miniconda:
source $HOME/.bashrc
To install git:
conda install git
Open a new terminal window and navigate to where you want to download Snakespeare.
If you are not sure, I recommend you copy and paste these commands to make a new directory called GitHub_repos, then "change directory" into the folder:
mkdir GitHub_repos
cd GitHub_repos
Copy and paste these commands to clone this repository and then "change directory" into the folder.
git clone https://github.com/lisakmalins/Snakespeare.git
cd Snakespeare
When you build the conda environment, Conda obtains all the software listed in environment.yaml. You only need to do this step once.
conda env create -f environment.yaml
Finally, you will need to activate the environment. The environment is named "snakespeare," and the software will only be accessible while the environment is active.
conda activate snakespeare
Note: for older versions of Anaconda, you may need to use the command
source activate snakespeare
instead.
When you want to deactivate the environment later, you can do so with the command conda deactivate.
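Before running the workflow, you may want to confirm that the environment is actually active, since Snakemake is only available inside it. The sketch below is one possible check for bash-style terminals (Git Bash, the Mac Terminal, or Linux), not something shipped with Snakespeare. It relies on the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment; the function name check_snakespeare_env is made up for illustration.

```shell
# check_snakespeare_env is a hypothetical helper, not part of the repository.
# conda sets CONDA_DEFAULT_ENV to the name of the currently active environment.
check_snakespeare_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "snakespeare" ]; then
        echo "snakespeare environment is active"
    else
        echo "snakespeare environment is not active; run: conda activate snakespeare"
    fi
}

check_snakespeare_env
```

In most terminals you can also tell at a glance: the environment name appears in parentheses at the start of your prompt while it is active.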
Run the Snakemake workflow like this:
snakemake
That's it! The workflow should finish within a few seconds. The output plot showing all dialogue statistics will appear in the folder Snakespeare/data/plots/.