# BFX Workshop Prerequisite Review

Welcome back to the Fall 2020 and Spring 2021 Bioinformatics (BFX) Workshop. 


## Unix command line familiarity

Whether you are on macOS or Linux, the Unix family of operating systems shares a similar look and feel. There are standard commands for navigating a filesystem and utilities for performing common tasks on files and directories. This workshop does not specifically cover Unix commands, but here are a few good resources:

LinkedIn Learning is free to all WUSTL faculty, students, and staff. Although [this](https://www.linkedin.com/learning/unix-for-mac-os-x-users) video tutorial is several years old and was designed for Mac OS X, not much has changed with Unix, and it should be easy to follow along on macOS.

For traditional users, the classic look and feel of [this](http://www.ee.surrey.ac.uk/Teaching/Unix/unixintro.html) tutorial will provide an introduction to UNIX commands, utilities, and navigation tools.

Numerous other Linux/UNIX tutorials exist. Feel free to post your favorite in the #bfx_workshop Slack channel.
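As a quick taste of the navigation commands those tutorials cover, here is a minimal sketch. It works inside a temporary directory, and the names `project`, `data`, and `notes.txt` are invented for the example.

```shell
# Work inside a throwaway directory so nothing in your home directory changes.
# The names "project", "data", and "notes.txt" are made up for this example.
workdir="$(mktemp -d)"
cd "$workdir"

pwd                       # print the current (working) directory
mkdir -p project/data     # create a directory and any missing parents
cd project                # move into the new directory
echo "hello" > notes.txt  # write one line of text into a new file
ls                        # list the contents: data and notes.txt
cat notes.txt             # print the file's contents
cp notes.txt data/        # copy the file into the data directory
ls data                   # data now contains notes.txt
cd ..                     # move back up one level
```

Each command does one small job, and most of everyday command-line work is a combination of these few.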


## Software you'll need to have installed:

NOTE: There are numerous ways to install Python, R, Notebooks, etc. In fact, you will see various ways of installing software and configuring your environment throughout the workshop. Each presenter and/or software toolkit may require or suggest a different approach. The following installs are the minimum steps needed to load, execute, and edit the Jupyter Notebooks used for sharing material and instructions throughout this workshop.

### Miniconda
Conda is a package manager that works on Windows, Linux, and macOS. We will use a lightweight version of the package manager called Miniconda to install packages and manage the system environment. This tool should help minimize dependencies and differences in system-level libraries or operating systems across the participants in the BFX Workshop.

Available installers for Miniconda can be found [here](https://docs.conda.io/en/latest/miniconda.html#miniconda) through the official Conda project documentation. 

For macOS, assumed to be the most common OS used in this course, the suggested installer is circled in red on the image of the available installers:
![macOS Miniconda Installers](https://raw.githubusercontent.com/genome/bfx_workshop/master/images/miniconda_macos_installers.png)

Once you've completed the install process, open a fresh terminal and check that conda is now installed by displaying its help menu:

```
conda -h
```

An example image of what a terminal might look like when running the above command:
![Example Conda Help](https://raw.githubusercontent.com/genome/bfx_workshop/master/images/conda_help.png)

### Jupyter
Now that conda is installed, installing Jupyter is straightforward. Other installation methods will work, but for the sake of simplicity, we will install using conda like so:

```
conda install jupyter
```

Check that a recent version of Python 3 is now used in your base conda environment:
```
python -V
```

As of September 9th, 2020, the base conda Python version is 3.8.3; see the example image below:
![Example Python Version](https://raw.githubusercontent.com/genome/bfx_workshop/master/images/python_version.png)


### Docker
We will not be walking through all of the steps involved with installing Docker. For complete and updated instructions for installing Docker for Mac, please see this [link](https://docs.docker.com/docker-for-mac/install/).

Once Docker is installed, confirm that it is available by checking its version:

```
docker --version
```

Now let's walk through the most basic Docker example:

```
docker run hello-world
```

What just happened?

### Java
Java is required to use the Integrative Genomics Viewer (IGV) locally on your workstation or laptop. Please see the operating system install instructions for Java located [here](https://www.java.com/en/download/help/download_options.xml).

### R

We will use conda to install the essential R libraries needed, at a minimum, to create an R Jupyter Notebook.

```
conda install r-essentials
```

### Git
We will not be installing git as part of this tutorial. 

However, on macOS you can install git as part of the Xcode Command Line Tools, which are Apple-distributed developer tools for compiling and developing software on macOS. Only the Command Line Tools are needed to use `git` commands. Check that git is available:

```
git --version
```

#### Git Exercise
Let's clone the class git repository from GitHub, see [link](https://github.com/genome/bfx_workshop).

![Example GitHub Clone/Download Code](https://raw.githubusercontent.com/genome/bfx_workshop/master/images/github_clone_repo.png)

NOTE: You have the option of using either HTTPS or SSH to clone the class git repo. Below is an example of using SSH.

![Example Git Clone Command](https://raw.githubusercontent.com/genome/bfx_workshop/master/images/github_clone_cmd.png)
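For reference, the two clone URL forms that GitHub offers for the class repository can be written out as below. This is a minimal sketch: `clone_url` is a hypothetical helper written for this example, not part of git. The actual `git clone` line is left commented out because it needs network access.

```shell
# Build the clone URL for a GitHub repository.
# clone_url is a hypothetical helper for this example, not a git command.
# Usage: clone_url <https|ssh> <owner> <repo>
clone_url() {
  case "$1" in
    https) printf 'https://github.com/%s/%s.git\n' "$2" "$3" ;;
    ssh)   printf 'git@github.com:%s/%s.git\n' "$2" "$3" ;;
    *)     echo "unknown protocol: $1" >&2; return 1 ;;
  esac
}

# Print both forms for the class repository.
clone_url https genome bfx_workshop
clone_url ssh genome bfx_workshop

# To actually clone, pass one of the URLs to git. The SSH form also
# requires an SSH key registered with your GitHub account.
# git clone "$(clone_url https genome bfx_workshop)"
```

HTTPS is usually the simpler choice if you have not set up SSH keys with GitHub.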


### Jupyter Notebook
Now that we have Miniconda, Jupyter, and Git installed and functional, we can begin using the Jupyter Notebook as an interactive shell and development environment.
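Before continuing, it can help to confirm that the tools from the sections above are on your `PATH`. The following is a minimal sketch: `check_tool` is a hypothetical helper written for this example, and it only tests whether a command can be found, not whether it runs correctly.

```shell
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper written for this example.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check each tool from the sections above; keep going if one is missing.
missing=0
for tool in conda python jupyter docker java R git; do
  check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

If a tool is reported as missing, revisit its install section above and open a fresh terminal before checking again.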

First, we should navigate on the filesystem, using `cd`, to the directory where we cloned the course repository.

A list, or `ls`, of the directory should show a `README.md` file and an `images` directory and perhaps several `*.ipynb` files.

NOTE: What does the `*` character represent above?
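To explore the question hands-on, here is a minimal sketch that uses a temporary directory. The file names are invented for the example.

```shell
# Create a scratch directory with a few empty files.
# The file names here are invented for the example.
scratch="$(mktemp -d)"
cd "$scratch"
touch README.md week_01.ipynb week_02.ipynb

# The shell expands *.ipynb to every name ending in .ipynb before ls runs,
# so ls receives the two notebook names as its arguments.
ls *.ipynb

# echo shows the expansion directly.
echo *.ipynb
```

Note that `README.md` is not listed, because it does not match the pattern.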

From the terminal command line in the git repo directory, start a Jupyter Notebook. The following command should launch a browser window showing the contents of the repo. From there, you can launch *THIS* tutorial in an interactive browser session.

```
jupyter notebook
```

NOTE: At this point, you can switch from the static GitHub view of this tutorial to the interactive notebook session.

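#### IPython and Shell Commands

Many common shell commands can be executed directly in a Notebook using IPython.
For more on IPython, see this [link](https://jakevdp.github.io/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html).

In the session below, `cd`, `ls`, and `mkdir` work as typed, while `echo` raises a `SyntaxError` until it is prefixed with `!`, which passes the line to the system shell.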


In [7]:
cd ~

/Users/jwalker


In [8]:
ls

94334a6a-b3c4-4a2b-ae31-5c4bdaef2021/
Applications/
Box Sync/
Creative Cloud Files/
Desktop/
Documents/
Downloads/
Library/
MFN2_Part1.png
MFN2_Part2.png
MFN2_Part3.png
Movies/
Music/
NCBI_SARS-CoV-2.fa
NCBI_SARS-CoV-2.fa.fai
OneDrive - Washington University in St. Louis/
Pictures/
Public/
bashrc.d/
bd0e9f28-8708-4288-9beb-79b59a91285b/
bfx-workshop/
d293fe48-db49-41f0-8d19-1a5caa12721d/
f2e01a48-b2a0-4af0-8aba-8ecdc8c1293e/
git/
google-cloud-sdk/
igv/
miniconda/
miniconda.sh
opt/


In [3]:
mkdir -p ~/bfx-workshop

In [4]:
cd ~/bfx-workshop

/Users/jwalker/bfx-workshop


In [5]:
echo "Hello World"

SyntaxError: invalid syntax (<ipython-input-5-1c643a414f93>, line 1)

In [6]:
!echo "Hello World"

Hello World


## Compute and Storage Access

The remainder of this tutorial includes resources to ensure all participants have access to a High Performance Compute (HPC) cluster.

Prerequisites:

- Compute1/Storage1 access:
  - https://docs.ris.wustl.edu/
- Basic understanding of the command line and how to submit jobs to the cluster:
  - https://confluence.ris.wustl.edu/display/ITKB/Workshops+and+Training


### Virtual Private Network (VPN)
This course will at times require access to the compute1/storage1 High Performance Computing & Storage solutions managed by the Research Infrastructure Services (RIS) team through WUSTL IT. To log in to this compute environment, you first need access to the WUSM VPN. For more information, please see the VPN section of this [link](https://it.wustl.edu/items/connect/).

### Scientific Compute/Storage Platform

This workshop will cover workflows and pipelines that use the compute1/storage1 environment. This compute and storage environment is not required for all workshops. However, there is a complete RIS seminar series from April 2020 available [here](https://www.youtube.com/playlist?list=PLc5dxOEco26RhSbhBaRLeZoUFOTn7GM-Y) if you need more instruction than listed below.

#### Getting Started
Please see the "Getting Started" section of this [link](https://ris.wustl.edu/services/compute/) if you do not have access to the compute1 platform.

#### Getting Connected
If you have access, but are unsure how to connect to compute1, please see this [link](https://confluence.ris.wustl.edu/display/ITKB/Compute+Quick+Start#ComputeQuickStart-1.GettingConnected). 

NOTE: The above link is only accessible from within WUSM's network.


### LSF and Docker information

Working through this [LSF and Docker tutorial](https://gist.github.com/chrisamiller/4b17a8dd310374f078da2bf12b3e2a49) might prove useful in conjunction with this week's homework.

## Homework Assignments

Please attempt the following exercises on your own and post questions to the mgibio #bfx_workshop Slack channel.

1. Ensure you can log in to compute1 and have access to a storage allocation.
2. Use the RIS Knowledge Base and Tutorials to launch a Docker image on a compute1 node.
3. Drop into the #bfx_workshop Slack channel and introduce yourself!
