Skip to content
Permalink
master
Switch branches/tags
Go to file
 
 
Cannot retrieve contributors at this time

Tutorial 1: Running batch analyses on the Campus Cluster

If you are in Champaign, you should be working on this assignment as a group between 3-6pm in 3401 Siebel Center. Although you may work together, each of you should write and commit your own script to Github. If you do not have a github account, please make one now and share your Github user name on slack in the channel #tutorials. Any questions/issues with the tutorial should also be posted to slack.

In order to complete this assignment, you will need to access the Campus Cluster remotely. If you are using a Windows system, you may need to download an application for this purpose. If you are using a Mac or Linux system, then you can remotely access the Campus Cluster; just open the terminal the Terminal application and type

ssh [YourNetId]@golub.campuscluster.illinois.edu

To see your present working directory, type

pwd

The output should be your home directory on the Campus Cluster: /home/[YourNetID].

In the assignment, we will be running a software package called FastME. To see the help message for FastME, type

fastme-2.1.5-linux64-omp -h

The output should include command not found. This means that you need to specify where the FastME software is installed on the Campus Cluster by updating the PATH variable in your bash profile. To edit the bash profile using vim, type

module load vim
vim ~/.bash_profile

Press the "i" key to insert text, and then copy the following lines at the end of your bash profile.

# Modules to load automatically
module load vim
module load git

# Paths to phylogenetic software packages
PATH="/projects/tallis/reu2019/software/fastme-2.1.5/binaries/:$PATH"

To save these updates, press the following keys: ":" and "w" and "q". The "w" key writes (i.e., saves) the file and the "q" key quits vim. At this point, you should have updated your bash profile and exited vim. When you log onto the Campus Cluster, your bash profile is automatically sourced. Because you just edited your bash profile, you need to manually source it; type

source ~/.bash_profile

To see the path to the FastME software (i.e., where the FastMe software is installed on the Campus Cluster), type

which fastme-2.1.5-linux64-omp

The output should match the PATH variable in your bash profile, i.e., /projects/tallis/reu2019/software/fastme-2.1.5/binaries/fastme-2.1.5-linux64-omp. I have installed several phylogenetic software packages for the REU program in /projects/tallis/reu2019/software/. If you need to use software that is not currently installed in this directory, please ask me to install it for you; please do not install any software on the Campus Cluster! Now make a second attempt to see the help message for FastME; type

fastme-2.1.5-linux64-omp -h

The first line of the output should be ``. You will be completing this assignment in the TALLIS project directory. To change directories, type

cd /projects/tallis

To make your own directory, type

mkdir [YourNetID]
cd [YourNetID]

To clone the tutorial directory, type

git clone https://github.com/[YourGithubUserName]/reu2019-tutorials.git

To see the files/directories that are in your tutorial directory, type

ls

The output should include the directory data, which contains 100-sequence datasets, specifically, 5 replicate datasets, labeled R0, R1, R2, R3, and R4 for each of the three model conditions, labeled 100M1, 100M2, and 100M3. To see some of the data files, type

ls data/100M1/R0

The output should include the true mulitple sequence alignment, labeled rose.aln.true.fasta, and the true tree topology, labeled rose.tt.

For this assignment, you will estimate trees by running FastME given the true multiple sequence alignments as input. Specifically, you should run FastME using two different methods for estimating distances between sequences, e.g.,

  • p-distances
  • K2P-corrected distances
  • log-det distances

and two different mehtods for estimating trees from distance matrix, e.g.,

  • Neighbor-Joining (NJ)
  • BioNJ
  • taxon addition by optimizing the Balanced Minimum Evolution (BME) criterion

This means that you will be running 4 different analyses on each of the 15 datasets (3 model conditions, each with 5 replicates).

For this assignment, you will be writing all files in your own directory. To create and enter your directory, type

cd 1-campus-cluster
mkdir [YourNetID]
cd [YourNetID]

and then create a new file

vim a_run_fastme.pbs

At the top of this file, copy the following text

#!/bin/bash
#PBS -N "tutorial-1-campus-cluster"
#PBS -W group_list=tallis
#PBS -q secondary
#PBS -l nodes=1:ppn=12
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -M [YourNetID]@illinois.edu
#PBS -m be

cd $PBS_O_WORKDIR

Note that this will request that your job be run on one node with at least 12 processors for a maximum wallclock time of 1 hour.

Below these lines, write a bash script to run FastME (in four different ways) on the 15 datasets. If you are not familar with bash scripting, then you may want to look at this tutorial. Remember that you should only be writing files, including the output of FastME, in YOUR directory (i.e., /projects/tallis/reu2019-tutorials/1-campus-cluster/[YourNetID])! When you finish writing your script, submit it as a job to the Campus Cluster queue; type

qsub a_run_fastme.pbs

To see that your job has been submitted, type

qstat -u [YourNetID]

To add your script to the repository, type

git add a_run_fastme.pbs
git commit -m "Add a message here"
git push

You will be asked to enter your Github user name and password.

When you are finished, go to the next page.