# BFX Workshop Week 01

Welcome to the Bioinformatics (BFX) Workshop, a.k.a. Applied Genomics for Bioinformatics I. To get the most out of this course, you're going to need some things:

- a reasonably modern computer with some tools/programs installed
- basic Unix command line skills

This document describes setup instructions that have been tested on **Windows 11**, and we believe they will also work on up-to-date versions of Windows 10. If you're using macOS or a Linux distro, go to the other document. If you have a very old version of Windows, you may need to do some reading or update your OS.

## Software you'll need to have installed:


### WSL/Ubuntu Linux
The lingua franca of bioinformatics is Unix command line tools, and there's no way around that. If your computer runs Windows, there is good news: Microsoft has made it quite easy to run a Linux operating system inside your Windows install, using something they call "Windows Subsystem for Linux" (WSL).

* [Follow the instructions here to install WSL](https://learn.microsoft.com/en-us/windows/wsl/install)
(stop when you get to "Change the default Linux distribution installed")

* Once you've done that, an "Ubuntu" app will show up in your Start menu. Fire it up and you should land at a terminal.

* Start by typing `ls`.  This will list the contents of this directory, and should return nothing for now, since our folder is empty.  Try running `ls -al` to see "hidden" files which are prefixed with a period.

* Next, let's make a folder for course materials:

```mkdir workshop```

* Now run `ls` again to see the directory that you just made
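To see the difference between `ls` and `ls -a` without touching your home folder, the steps above can be tried in a throwaway directory. This is only a small sketch: the `sandbox` directory and the `.hidden_note` file are made-up names for illustration, not part of the course materials.

```shell
# Work in a temporary directory so nothing in your home folder is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir workshop        # create a directory
touch .hidden_note    # names starting with a period are "hidden"

plain_listing=$(ls)       # lists only: workshop
full_listing=$(ls -a)     # also lists ".", ".." and .hidden_note

echo "$plain_listing"
echo "$full_listing"
```

The entries `.` and `..` that show up with `-a` refer to the current directory and its parent.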

**IMPORTANT:** From now on, unless otherwise specified, most things that you do for this course will take place on your new Ubuntu Linux install. For certain things, like browsing to a webpage, you'll use your Windows browser, but any command line work, installations, etc., will happen in Linux unless otherwise specified!

### Miniconda
Conda is a package manager that works on Windows, Linux, and macOS. We will use a lightweight version of the package manager called Miniconda to install packages and manage the system environment.

Follow the [Linux instructions for installing Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/install#linux-2).

Once you've completed the install process, you'll need to add conda to your PATH variable, so that the programs in that directory can be executed without typing the full path. This involves editing a file, which may be a bit of a challenge if you're not familiar with Linux.

We're going to use the Linux program `vim`, directly from your terminal. To do this:

- open the file with vim:

```vim ~/.bashrc```

- hit `i` to enter editing mode.

- Use the arrows to scroll to the bottom of the file and add the following text:

```PATH=$PATH:/home/cmiller/miniconda3/bin```

- IMPORTANT: replace `cmiller` above with the username you set for your WSL installation

- hit return to add a newline, then hit ESC to exit editing mode

- type `:wq` to save and quit

- finally, back at your terminal prompt, type `source ~/.bashrc` to import that new info into your working environment. 
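If vim feels like too much right now, the same change can be made without opening an editor. The sketch below assumes Miniconda was installed to its default location in your home directory (`$HOME/miniconda3`); the helper name `add_conda_to_path` is made up for this example. Using `$HOME` means there is no username to replace, and `export` makes the updated PATH visible to programs started from the shell.

```shell
# Append the Miniconda bin directory to PATH in a startup file, only once.
# Assumption: Miniconda lives in the default location, $HOME/miniconda3.
add_conda_to_path() {
    rc_file="${1:-$HOME/.bashrc}"
    # Single quotes keep $PATH and $HOME unexpanded until the file is sourced.
    line='export PATH="$PATH:$HOME/miniconda3/bin"'
    touch "$rc_file"
    # Only add the line if an identical one is not already present.
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

Running `add_conda_to_path` with no arguments edits `~/.bashrc`; follow it with `source ~/.bashrc` as described above. Running it a second time does nothing, so it is safe to repeat.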

Some familiarity with a Unix text editor like vim or nano will be useful in this course for editing files within your Linux install.

Now let's check to make sure conda is installed.  At your terminal, list the conda help by typing:

```conda -h```

Here is an example image of what that looks like when running the above command:
![Example Conda Help](https://github.com/genome/bfx-workshop/raw/master/archive/v2020-2021/images/conda_help.png)

(With a lot more lines underneath, providing info on specific parameters and subcommands)
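If `conda -h` instead reports "command not found", the usual cause is that the PATH change has not taken effect. One way to check is to ask the shell where it finds a program. The helper below is a minimal sketch (the name `check_tools` is made up for this example) built on the standard `command -v` builtin.

```shell
# Report whether each named program can be found on your PATH.
# Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools conda` should print a path ending in `miniconda3/bin/conda` if the install and PATH edit worked. The same helper can be reused for any other program you install.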


### Jupyter
Jupyter is a web-based interactive computing platform that allows users to create and run "notebooks" that mix code and data. We'll be exploring some details below (and in future workshop sessions), but for now, just get it set up. Other installation methods will work, but for the sake of simplicity, we will install using conda like so:

```
conda install jupyter
```

Check to see that a recent version of Python3 is now used in your base conda environment:

```
python -V
```

As of Fall 2024, the base conda Python version is 3.12.4. See the example image below from 2020 (v3.8.3):

![Example Python Version](https://github.com/genome/bfx-workshop/raw/master/archive/v2020-2021/images/python_version.png)


### Optional step - browser integration

By default, running `jupyter notebook` in WSL will print a whole bunch of text that includes things like this:

```
[I 2024-08-21 21:14:45.038 ServerApp] Serving notebooks from local directory: /home/cmiller/workshop/bfx-workshop
[I 2024-08-21 21:14:45.038 ServerApp] Jupyter Server 2.14.1 is running at:
[I 2024-08-21 21:14:45.038 ServerApp] http://localhost:8888/tree?token=0cca5488d2f9e179afb661cd5b5e69c51f17dbde1e8afec4
[I 2024-08-21 21:14:45.038 ServerApp]     http://127.0.0.1:8888/tree?token=0cca5488d2f9e179afb661cd5b5e69c51f17dbde1e8afec4
[I 2024-08-21 21:14:45.038 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-08-21 21:14:45.283 ServerApp]

    To access the server, open this file in a browser:
        file:///home/cmiller/.local/share/jupyter/runtime/jpserver-18964-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/tree?token=0cca5488d2f9e179afb661cd5b5e69c51f17dbde1e8afec4
        http://127.0.0.1:8888/tree?token=0cca5488d2f9e179afb661cd5b5e69c51f17dbde1e8afec4
[I 2024-08-21 21:14:45.493 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, 
```

You'll then have to hunt through that text, find the URL, and paste it into your web browser. To make this easier, we can make Jupyter open your browser automatically by installing a small utility. Run the following commands, saying Y when prompted:
 
```
sudo add-apt-repository ppa:wslutilities/wslu
sudo apt update
sudo apt install wslu
echo "export BROWSER=wslview" >> ~/.bashrc
```

We'll test this below.
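If you skip this optional step, the URL can still be pulled out of the startup text with standard tools rather than by eye. The following is a rough sketch, assuming the text looks like the sample shown earlier in this section; the helper name `extract_jupyter_url` is made up for this example.

```shell
# Print the first localhost notebook URL found in Jupyter startup text
# read from standard input.
extract_jupyter_url() {
    grep -oE 'http://localhost:[0-9]+/[^[:space:]]+' | head -n 1
}
```

For example, if the startup text has been saved to a file (here the hypothetical `jupyter.log`), `extract_jupyter_url < jupyter.log` prints just the address to paste into your browser.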

### Docker

Docker is a system that allows you to launch "containers" on your computer that contain alternate operating systems and all the dependencies for specific tools. We'll discuss this in detail during the course. 

Follow [these instructions to install Docker on Windows](https://learn.microsoft.com/en-us/windows/wsl/tutorials/wsl-containers#install-docker-desktop) (AFTER installing WSL/Ubuntu above). Stop when you get to the next section ("Develop in remote containers using VS Code"), which we will not need.

Once it's installed, launch the docker app from your Windows start menu (if it isn't already running). Then move into your Ubuntu terminal and confirm that it's working using:

```
docker --version
```

And make sure it can download and run images by running: 

```
docker run hello-world
```


### Java
Java is required to use the Integrative Genomics Viewer (IGV) locally on your workstation or laptop. Use the Windows install instructions for Java located [here](https://www.java.com/en/download/help/download_options.xml).


### R

We'll be using RStudio in most sessions that involve creating and running R code. Please follow the Windows instructions for [installing R and RStudio here](https://posit.co/download/rstudio-desktop/).

### Git

Git is a version-control program for source code. A WSL install of Ubuntu should already contain Git. You can verify that it's installed by running:

```
git --version
```

If you want to become more familiar with Git and version control for code, you can check out the [LinkedIn Learning Git and GitHub module](https://www.linkedin.com/learning-login/share?account=57884865&forceAccount=false&redirect=https%3A%2F%2Fwww.linkedin.com%2Flearning%2Flearning-git-and-github-14213624%3Ftrk%3Dshare_ent_url%26shareId%3DBNNkL8hRQAqPOHmqiNOyZg%253D%253D), which should be freely available on campus.

In the meantime, let's start simple and clone the class Git repository from GitHub. Use `cd` to move into the `workshop` directory you created and then clone the [course repository](https://github.com/genome/bfx-workshop).

![Example GitHub Clone/Download Code](https://github.com/genome/bfx-workshop/raw/master/archive/v2020-2021/images/github_clone_repo.png)

Pull down the "Code" button to get the path of the class repository. Then use `git clone PATH` (replacing PATH) to grab your copy.
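Put together, the clone step might look like the sketch below. Both defaults are assumptions: the `workshop` directory made earlier in your home folder, and the HTTPS address that the "Code" button is expected to show for this repository. The helper name `clone_course_repo` is made up for this example.

```shell
# Clone the course repository into the workshop directory.
# Assumed defaults: ~/workshop and the repository's HTTPS address.
clone_course_repo() {
    workshop_dir="${1:-$HOME/workshop}"
    repo_url="${2:-https://github.com/genome/bfx-workshop.git}"
    mkdir -p "$workshop_dir" || return 1
    # Run in a subshell so your current directory is left unchanged.
    ( cd "$workshop_dir" && git clone "$repo_url" )
}
```

Calling `clone_course_repo` with no arguments should leave a copy of the repository at `~/workshop/bfx-workshop`. If the address shown by the "Code" button differs, pass it as the second argument.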


### Jupyter Notebook

Now that we have Miniconda, Jupyter, and Git installed and functional, we can begin using the Jupyter Notebook as an interactive shell and development environment.

First, we should navigate on the filesystem, using `cd`, to the directory where we cloned the course repository.

```cd ~/workshop/bfx-workshop```

Running `ls` in that directory should show files like `README.md` and directories like `lectures`.

From the terminal, start a Jupyter Notebook. The following command should launch a browser window showing the contents of the repo. 

```jupyter notebook```

From there, you can navigate to lectures/week_01, and click on the bfx_workshop_01_overview_windows.ipynb file to launch *THIS* tutorial in an interactive browser session.  


## Unix command line

You've set up Ubuntu Linux on your system now, but in order to complete this course, you'll need to get familiar with the command line: working at a shell and running standard commands for navigating a filesystem, performing tasks on files and directories, etc. This workshop will offer a *brief* introduction to the Unix shell in Week 2, but you will not be able to succeed in this course without developing some strong command line skills.

If you are not comfortable at the Unix command line right now, that's okay - there's still time!

**If you're new to this, start by working through the [Terminal Basics tutorial](https://sandbox.bio/tutorials?id=terminal-basics) at sandbox.bio this week**.  Next week, we'll dive further into the command line and present more resources that you can work through to build a solid foundation.

## Other Resources

### Compute and Storage Access

Almost all of the modules in this course can be run on your local computer, using small data sets. If you want to, you can also run them on the local high-performance cluster or in the cloud. These resources are not required right now, but they are worth becoming familiar with, especially as you begin to extend your knowledge to running tools on your own (probably much larger) data sets.

### Google Cloud

* [WUIT Google Cloud](https://it.wustl.edu/services/cloud-computing/google-cloud-platform/)
* [Google Cloud Console](https://console.cloud.google.com/)
* The [Division of Oncology has some guidelines and code](https://github.com/wustl-oncology) that may help with getting up and running 

### WashU Local Compute and Storage

* There is training available through Becker Library in September covering many basics of computing on the local research cluster. Topics include Open On Demand, Command Line, High Performance Computing, and the RIS Scientific Compute Platform.  [Details and Registration for these upcoming workshops are here](https://becker.wustl.edu/services/research-computing/)

* WUIT's Research Infrastructure Services (RIS) supports the [Scientific Compute Platform](https://ris.wustl.edu/services/compute/). Info on getting started with the compute cluster and storage is linked from their homepage, with [more details on their documentation page](https://docs.ris.wustl.edu/).

* RIS has run its own trainings, and you can [view their archived videos and resources](https://docs.ris.wustl.edu/doc/compute/compute-workshops.html).

* If you're looking to get started on submitting jobs to the cluster, working through this [LSF and Docker tutorial](https://gist.github.com/chrisamiller/4b17a8dd310374f078da2bf12b3e2a49) might prove useful.

* In order to connect to the cluster, you'll need to be either connected to WUSM-Secure (on campus) or logged into the VPN. [The VPN section of this page](https://it.wustl.edu/items/connect/) has details.

### Unix Skills

If you can't wait to get started with the command line (or think you might need extra time) you can jump ahead to the [Week 2 notes](https://github.com/genome/bfx-workshop/blob/master/lectures/week_02/) and get started working through the exercises, or peruse the other resources linked there. 
