# 4.0 Introduction to shell scripts

### IMPORTANT: Please make sure you are using the bash kernel to run this notebook ###

Running commands one at a time on the terminal in bash isn't particularly efficient (or reproducible). Shell scripts enable you to place multiple commands in a single executable file, leading to greater ease of programming and research reproducibility. 
 
For example, the next cell builds a shell script called `myFirstShellScript.sh` one line at a time, and the cell after it displays the finished file:

In [None]:
# You can ignore this line: it checks whether the file already exists and removes it if so, so we don't end up appending the same lines to the file multiple times.
if [ -f myFirstShellScript.sh ] ; then rm myFirstShellScript.sh; fi

touch myFirstShellScript.sh
echo '#!/bin/sh' >> myFirstShellScript.sh
echo '#this line is a comment; it is ignored during execution' >> myFirstShellScript.sh
echo '#you can put any commands that you would normally type into the command line in here.' >> myFirstShellScript.sh
echo '#for example, this shell script just creates a file' >> myFirstShellScript.sh
echo 'touch thisFileCreatedFromShellScript.txt' >> myFirstShellScript.sh

In [None]:
cat myFirstShellScript.sh

The `#!/bin/sh` line at the beginning (called the *shebang*) tells the operating system which program should interpret the script. In this case, that is the program located at `/bin/sh`. Don't worry if this isn't fully clear yet; just make sure your scripts begin with it.
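Appending one `echo` per line works, but a *here-document* can write a whole multi-line script in a single command and sidesteps most quoting problems. The following is a minimal sketch that uses a hypothetical file name, `heredocExample.sh`. Quoting the delimiter (`'EOF'`) makes the shell write the lines exactly as typed, without expanding `$` variables:

```shell
# Write a multi-line script in one step with a here-document.
# heredocExample.sh is a hypothetical name used only for this illustration.
# The quoted delimiter ('EOF') means the lines are written verbatim,
# with no variable expansion.
cat > heredocExample.sh <<'EOF'
#!/bin/sh
# this script just creates a file
touch fileFromHeredocScript.txt
EOF

# show the result; the shebang is on the first line
cat heredocExample.sh
```

Because `>` overwrites the file instead of appending to it, there is no need to delete an old copy first.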

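Once you have created the script, you might try to run it right away. Because a newly created file is not executable by default, this fails with a `Permission denied` error:

In [None]:
./myFirstShellScript.sh

To fix this, make the script executable:
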
In [None]:
# a+x adds (+) execute (x) permission for all (a) users, which makes the script executable
chmod a+x myFirstShellScript.sh

Then, run it:

In [None]:
#this command runs the shell script
./myFirstShellScript.sh

In [None]:
#the ls command indicates that "thisFileCreatedFromShellScript.txt" was created 
ls
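You can check whether a file has execute permission in two ways. `ls -l` shows its permission string, where an `x` marks execute permission, and the test `[ -x FILE ]` succeeds only for executable files. A script can also run without execute permission if you pass it to the interpreter directly. The following self-contained sketch uses a hypothetical file, `permissionsDemo.sh`:

```shell
# hypothetical two-line script, used only for this illustration
printf '#!/bin/sh\necho "hello from permissionsDemo"\n' > permissionsDemo.sh
chmod a-x permissionsDemo.sh   # start without execute permission, even on a re-run

# [ -x FILE ] succeeds only if FILE is executable
if [ -x permissionsDemo.sh ]; then echo "executable"; else echo "not executable"; fi

# passing the file to the interpreter directly only needs read permission
sh permissionsDemo.sh

# make it executable, then check again
chmod a+x permissionsDemo.sh
ls -l permissionsDemo.sh        # the permission string now includes x (e.g. -rwxr-xr-x)
if [ -x permissionsDemo.sh ]; then echo "executable"; else echo "not executable"; fi
./permissionsDemo.sh
```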

Shell scripts can also accept arguments (values you type after the script's name when you run it), just like any Unix command. Inside the script, `$1`, `$2`, `$3`, ... refer to the first, second, third, etc. arguments. The next cell creates another shell script, `myFirstShellScriptWithArguments.sh`, which creates a file named by its first argument and a directory named by its second:

In [None]:
# You can ignore this line: it checks whether the file already exists and removes it if so, so we don't end up appending the same lines to the file multiple times.
if [ -f myFirstShellScriptWithArguments.sh ] ; then rm myFirstShellScriptWithArguments.sh; fi

touch myFirstShellScriptWithArguments.sh
# single quotes stop the shell from treating # as a comment and from expanding $1/$2 right now
echo '#!/bin/sh' >> myFirstShellScriptWithArguments.sh
echo 'touch "$1"' >> myFirstShellScriptWithArguments.sh
echo 'mkdir "$2"' >> myFirstShellScriptWithArguments.sh

Once again, make the shell script executable: 

In [None]:
chmod a+x myFirstShellScriptWithArguments.sh

Now run the following:

In [None]:
./myFirstShellScriptWithArguments.sh customFileName.txt customDirectoryName
ls
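In addition to `$1`, `$2`, ..., the shell sets `$#` to the number of arguments and `"$@"` to all of them. The following sketch is a more defensive version of the script above and uses hypothetical names (`argumentsDemo.sh`, `demoDirectory`, `file with spaces.txt`). It stops with a usage message when it does not get exactly two arguments. It also quotes `"$1"` and `"$2"` so that a name containing spaces stays a single argument:

```shell
cat > argumentsDemo.sh <<'EOF'
#!/bin/sh
# require exactly two arguments: a file name and a directory name
if [ "$#" -ne 2 ]; then
    echo "usage: $0 FILE_NAME DIRECTORY_NAME" >&2
    exit 1
fi
echo "received $# arguments:"
for arg in "$@"; do
    echo "  $arg"
done
touch "$1"
mkdir -p "$2"   # -p: no error if the directory already exists
EOF
chmod a+x argumentsDemo.sh

./argumentsDemo.sh "file with spaces.txt" demoDirectory
./argumentsDemo.sh onlyOneArgument || echo "failed with exit status $?"
```

The second call prints the usage message to standard error and exits with status 1 without creating anything.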

This was just an example, but hopefully you can see the potential power of using scripts like these to launch complicated bioinformatics processing jobs.

### The .bashrc

Wouldn't it be nice to have everything ready to run when you log into the cluster?
To avoid running `module load` commands every time you log in, you can add them to your `.bashrc` file, a shell script located in your home directory. The commands in `.bashrc` are executed every time you log into the server, so your environment is already set up when your session starts.

Note: Technically, the file executed on login is not `~/.bashrc` but `~/.bash_profile`, which usually sources `~/.bashrc` in turn. If your `.bash_profile` does not do this, add the line `source ~/.bashrc` to your `.bash_profile`. The difference between the two files is explained here: http://www.joshstaiger.org/archives/2005/07/bash_profile_vs.html
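A common way to do this is a guarded `source` line that runs only if `~/.bashrc` exists. The following is a sketch of a typical `~/.bash_profile` fragment, not necessarily what your cluster uses. Check your own file first, since many systems already include something similar:

```shell
# in ~/.bash_profile: load ~/.bashrc for login shells, if it exists
if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi
```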


Your `.bashrc` is a hidden file (indicated by the `.` at the beginning of its name), so it won't show up when you use `ls` to list your home directory unless you add the `-a` flag.

In [None]:
ls ~  # .bashrc is not listed

In [None]:
ls -a ~  # -a also lists hidden files, including .bashrc

Your `~/.bashrc` runs automatically each time you log in (for example, over ssh) to a cluster like Sherlock. This makes it a good place for commands you want to run by default every time you work on the cluster.
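Because `.bashrc` is read only when a new shell starts, edits you make to it don't affect a session that is already open. Running `source ~/.bashrc` re-reads the file in the current shell. The following sketch shows how sourcing a file differs from running it as a script, using a hypothetical file, `setGreeting.sh`. The code uses `.`, which is the portable spelling of bash's `source`:

```shell
# hypothetical file that only sets a variable
printf 'GREETING="hello from setGreeting.sh"\n' > setGreeting.sh
unset GREETING

# sh starts a child process; the variable it sets disappears when the child exits
sh setGreeting.sh
echo "after sh: ${GREETING:-<unset>}"          # prints: after sh: <unset>

# sourcing runs the commands in the current shell, so the variable persists
. ./setGreeting.sh
echo "after sourcing: ${GREETING:-<unset>}"    # prints: after sourcing: hello from setGreeting.sh
```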

As an example, your `.bashrc` is already set up in a way that is ideal for the training camp project:

In [3]:
cat ~/.bashrc

module load bedtools/2.26.0
#shortcuts_defined:
export SUNETID="$(whoami)"
export WORK_DIR="/scratch/${SUNETID}"
export DATA_DIR="${WORK_DIR}/data"
[[ ! -d ${WORK_DIR}/data ]] && mkdir "${WORK_DIR}/data"
export SRC_DIR="${WORK_DIR}/src"
[[ ! -d ${WORK_DIR}/src ]] && mkdir -p "${WORK_DIR}/src"
export METADATA_DIR="/metadata"
export AGGREGATE_DATA_DIR="/data"
export AGGREGATE_ANALYSIS_DIR="/outputs"
export YEAST_DIR="/saccer3"
export TMP="${WORK_DIR}/tmp"
export TEMP=$TMP
export TMPDIR=$TMP
[[ ! -d ${TMP} ]] && mkdir -p "${TMP}"


You can see that the script contains a command to load the bedtools software, which we use often throughout this project. It also contains `export` commands for frequently used variables and commands that create the directories we work in. Without a `.bashrc`, you would need to enter all of these commands on the terminal every time you log in to the cluster, and in Jupyter you would need to run them at the top of every notebook. Keeping these common commands in one (runnable) place with a `.bashrc` can make your life much easier!
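The `export` in front of each variable is what makes these settings useful. Exported variables are inherited by every program and script you launch from the shell, while plain assignments stay inside the current shell. The following minimal sketch uses two hypothetical variables and a hypothetical `/tmp` path. Its last lines mirror the `.bashrc` pattern of creating a directory only if it is missing, with the portable `[ ]` standing in for bash's `[[ ]]`:

```shell
LOCAL_ONLY="not exported"                      # plain assignment: current shell only
export SHARED_DIR="/tmp/exported_example"      # hypothetical path; passed to child processes

# sh -c starts a child shell; single quotes delay expansion until the child runs
sh -c 'echo "LOCAL_ONLY in child: ${LOCAL_ONLY:-<unset>}"'   # prints: LOCAL_ONLY in child: <unset>
sh -c 'echo "SHARED_DIR in child: ${SHARED_DIR:-<unset>}"'   # prints: SHARED_DIR in child: /tmp/exported_example

# create the directory only if it does not already exist
[ ! -d "${SHARED_DIR}" ] && mkdir -p "${SHARED_DIR}"
ls -d "${SHARED_DIR}"
```

Since `mkdir -p` already succeeds silently when the directory exists, the `-d` test mainly documents intent.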