Snakespeare





Snakespeare is a simple and entertaining text mining workflow designed for first-time Snakemake users and developers.

Snakespeare learning goals

This tutorial is ideal for beginners, including folks who have never used the terminal before.

After working through this tutorial, you will know:

  • How to access the terminal on your computer (Windows, Mac, or Linux)
  • How to clone a repository from GitHub
  • How to build and activate a conda environment
  • How to run a Snakemake workflow and view the results

Diving deeper

In addition, this repository is a simple demonstration of Snakemake for workflow developers.

The moving parts of Snakespeare are identical to the Snakemake pipelines I use every day for bioinformatics work:

  • Snakefile – contains the rules for every step of the workflow
  • config.yaml – lists the parameters users can customize
  • environment.yaml – lists the software dependencies to be installed into the conda virtual environment
  • scripts/ – all Python and R scripts live in this directory
  • data/ – all input and output files live in this directory

Snakespeare results

This workflow calculates and plots how much characters speak in Shakespeare's tragedies Hamlet and Romeo & Juliet.

Example output from Snakespeare workflow.

Interesting statistics

  • Hamlet talks the most with over 1428 lines of iambic pentameter.

  • Hamlet's uncle Claudius talks the second most with over 500 lines of iambic pentameter. It must run in the family.

  • Besides the Chorus in Romeo and Juliet, the Ghost of King Hamlet is the most long-winded with an average speech length of 6.3 lines.

  • Friar Lawrence is a close second with an average speech length of 6.2 lines.

  • Romeo talks slightly more than Juliet (however, Juliet's lines are wittier).

Usage

STEP 1: Install miniconda and git

To run Snakespeare, you will need two pieces of software: git and conda.

  • git is a tool for downloading the code for Snakespeare from GitHub.
  • conda is a tool for accessing all software dependencies (including R, Python, and Snakemake).

All software dependencies will be installed into a "virtual environment," so Snakespeare will not conflict with any Python or R software you have set up already.

Instructions for Windows
Run Snakespeare via Anaconda prompt (easiest for beginning users)

Installing Miniconda3 + Anaconda Prompt for Windows

Head over to the Anaconda website and download a Windows installer for Miniconda3.

If you are not sure which to choose, pick the highest version of Python.

You can check whether your system is 64-bit or 32-bit under Settings > System > About > Device specifications > System type.

Run the installer and follow the instructions to complete the installation. This software bundle includes Miniconda3 as well as Anaconda Prompt, which is a terminal app that you can use to run Snakespeare.

Open Anaconda Prompt

Now click the Start menu and search for "Anaconda Prompt." This is a modified version of Windows Command Prompt (cmd.exe) that is pre-loaded with the conda executable.

Installing Git in Anaconda Prompt

In Anaconda Prompt, copy and paste the following to install git:

conda install -y git

That's it! Continue to STEP 2.

Run Snakespeare via Git Bash (good for beginning users)

Installing Miniconda3 for Windows

Head over to the Anaconda website and download a Windows installer for Miniconda3.

If you are not sure which to choose, pick the highest version of Python.

You can check whether your system is 64-bit or 32-bit under Settings > System > About > Device specifications > System type.

Run the installer and follow the instructions to complete your installation of Miniconda3.

Installing Git + Git Bash for Windows

Head to the git website and download an installer for Windows. Run the installer and follow the instructions to complete the installation. This software bundle includes git as well as Git Bash, which is a terminal app that you can use to run Snakespeare.

Important: While installing Git for Windows, be sure to check the box to add "Git Bash here" to the File Explorer context menu. You'll need it for the next step.

Enabling Conda in Git Bash

To enable Conda within Git Bash, you'll need to add the Conda startup script to your ~/.bashrc file, which executes every time you open Git Bash.

From the Start menu, search for "Miniconda3" and click "Open File Location." Within that folder, navigate to etc and then profile.d. You should see a file called conda.sh in this folder. Right-click inside the window and select "Git Bash here" to open a terminal window in this folder.

Run the following command to add the Conda startup script to your ~/.bashrc:

echo ". '${PWD}'/conda.sh" >> ~/.bashrc

After that, close the terminal window.
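Before closing the window, if you would like to see exactly what that command appends, a minimal sketch is to write the same line to a temporary file instead of ~/.bashrc and print it. The variable name demo_rc is hypothetical and used only for illustration; run this from the same profile.d folder.

```shell
# Write the same line to a throwaway file (not your real ~/.bashrc).
demo_rc="$(mktemp)"
echo ". '${PWD}'/conda.sh" >> "$demo_rc"

# Print the result: a "." (source) command pointing at conda.sh
# in the folder where this terminal is currently open.
cat "$demo_rc"
```

In your real setup, running grep conda.sh ~/.bashrc should show a matching line once the command from this step has been run.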

Finally, let's double-check that conda is working in your new Git Bash terminal. From the Start menu, open Git Bash. Type conda and press Enter. If a bunch of text appears (these are the usage instructions for conda), congratulations, you're all set up! Continue to STEP 2.

Run Snakespeare via Windows Subsystem for Linux (advanced users)

If you are already using Windows Subsystem for Linux, follow the instructions below to install miniconda and git in your Ubuntu terminal.

Installing Git in WSL

Head to the git website and follow the installation instructions for Ubuntu.

Installing Miniconda in WSL

Head to the Anaconda website for instructions to download and run a Miniconda installer for Linux.

After installing git and miniconda, close any terminal windows you have open and continue to STEP 2.

Instructions for Mac

Installing Git for Mac

On your Mac, open Terminal. Type git and press Enter.

  • If a bunch of text appears (these are the usage instructions for git), congratulations, you already have git installed! Skip to Installing Miniconda for Mac.
  • If you see git: command not found, then you will need to get git for Mac. The easiest method is to install Xcode, which is a suite of developer tools provided by Apple.
  • After installing Xcode, open a new terminal window and try typing git again. You should see the usage instructions now.

If you still see git: command not found, please let me know so I can help.

Installing Miniconda for Mac

  • To get Miniconda for Mac, download an installer from the Anaconda website.
  • If you are not sure which to choose, download the Python 3.9 Miniconda3 MacOSX 64-bit pkg.
  • Run the installer that just downloaded, and follow the instructions to complete your installation of Miniconda.

Done! Make sure you close any terminal windows that you have open, then continue to STEP 2.

Instructions for Linux
Linux desktop users

Installing Git for Linux

Head to the git website for instructions to install git with your distribution's package manager.

Installing Miniconda for Linux

Head to the Anaconda website for instructions to download and run a Miniconda installer.

After installing git and miniconda, close any terminal windows you have open and continue to STEP 2.
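Once you have reopened a terminal, a quick check along these lines can confirm that both tools are visible before you move on. This is only a sketch: the function name check_tools is hypothetical, and it reports only whether each command can be found, not whether it is correctly configured.

```shell
# Report whether each named tool can be found on your PATH.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: not found"
        fi
    done
}

check_tools git conda
```

If either tool is reported as not found, open a fresh terminal window and try again before reinstalling anything.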

Linux server users

If you would like to run Snakespeare on a work or lab server, check with your supervisor or sysadmin to see if git and conda are installed already. If so, continue to STEP 2.

Otherwise, if you need to install software (and have permission to do so), follow the instructions below.

Installing Miniconda on a Linux Server

To install miniconda from the command line:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

The installer will ask you some questions to complete installation. Review and accept the license, accept or change the install location, and answer yes when asked whether to add conda to your path.

To finish configuring miniconda:

source $HOME/.bashrc

Note: If your home folder is not writable on your server, conda will crash. If you experience this issue, run these commands to tell conda to store environments and downloaded packages in the current folder.

conda config --add envs_dirs ./.conda/envs
conda config --add pkgs_dirs ./.conda/pkgs
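To confirm that the two settings took effect, one option is to ask conda to print them back. This sketch assumes conda is already on your path; the variable conda_checked is hypothetical and only records which branch ran.

```shell
if command -v conda >/dev/null 2>&1; then
    # Print the directories conda will use for environments and packages.
    conda config --show envs_dirs pkgs_dirs || echo "could not read conda configuration"
    conda_checked="yes"
else
    echo "conda not found on PATH"
    conda_checked="no"
fi
```

The entries you added should appear at the top of each list.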

Installing Git on a Linux Server

To install git:

conda install git

STEP 2: Clone the repository

Open a new terminal window and navigate to where you want to download Snakespeare.

If you are not sure, I recommend you copy and paste these commands to make a new directory called GitHub_repos, then "change directory" into the folder:

mkdir GitHub_repos
cd GitHub_repos

Copy and paste these commands to clone this repository and then "change directory" into the folder.

git clone https://github.com/lisakmalins/Snakespeare.git
cd Snakespeare
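To confirm that the clone worked and that you are in the right folder, you can look for the files and directories described earlier in this README. The sketch below is one way to do that; the function name check_layout is hypothetical, and it assumes the scripts/ and data/ directories are present in a fresh clone.

```shell
# Report which of the items described in this README are present
# in a directory (defaults to the current directory).
# The return value is the number of missing items (0 means all found).
check_layout() {
    dir="${1:-.}"
    missing=0
    for item in Snakefile config.yaml environment.yaml scripts data; do
        if [ -e "$dir/$item" ]; then
            echo "found:   $item"
        else
            echo "missing: $item"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

check_layout . || echo "Some items are missing; are you inside the Snakespeare folder?"
```

Simply typing ls and comparing the listing by eye works just as well.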

STEP 3: Build and activate the conda environment

When you build the conda environment, Conda obtains all the software listed in environment.yaml. You only need to do this step once.

conda env create -f environment.yaml

Finally, you will need to activate the environment. The environment is named "snakespeare," and the software will only be accessible while the environment is active.

conda activate snakespeare

Note: for older versions of Anaconda, you may need to use the command source activate snakespeare instead.

When you want to deactivate the environment later, you can do so with the command conda deactivate.
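If you are ever unsure whether the environment is active, one way to check is to read the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment. This is a minimal sketch; the variable name active_env is hypothetical.

```shell
# CONDA_DEFAULT_ENV holds the name of the active conda environment;
# fall back to "none" if it is not set at all.
active_env="${CONDA_DEFAULT_ENV:-none}"

if [ "$active_env" = "snakespeare" ]; then
    echo "snakespeare environment is active"
else
    echo "active environment: $active_env (run: conda activate snakespeare)"
fi
```

Most terminals also show the active environment in parentheses at the start of the prompt.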

STEP 4: Run Snakespeare

Run the Snakemake workflow like this:

snakemake

That's it! The workflow should finish within a few seconds. The output plot showing all dialogue statistics will appear in the folder Snakespeare/data/plots/.
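If you would like to preview what Snakemake plans to do, or confirm afterwards that the plot was written, something like the following may help. It is a sketch rather than part of the workflow: -n (dry run) and --cores are standard Snakemake options, and --cores is included because some newer Snakemake releases refuse to run without it. Whether the version installed by environment.yaml needs it has not been checked here. Run this from inside the Snakespeare folder.

```shell
if command -v snakemake >/dev/null 2>&1; then
    # -n (--dry-run) prints the planned jobs without running them.
    snakemake -n --cores 1 || echo "dry run failed; are you inside the Snakespeare folder?"
else
    echo "snakemake not found; is the snakespeare environment active?"
fi

# List the plots folder named in this README, if it exists yet.
plots_dir="data/plots"
if [ -d "$plots_dir" ]; then
    ls "$plots_dir"
else
    echo "$plots_dir does not exist yet"
fi
```

After a successful run, a dry run should report that there is nothing left to do.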