# Introduction
The focus of this class is on using the Python programming language and specialized packages to analyze and explore biological datasets.
We will use the same tools in class that you will use when you, as a scientist, are working with your own data.

My hope is that this class will be discussion oriented.
I love to lecture, and if you don't stop me I will do so far too much. This would be bad. 
Come to class with questions. Ask them. If you don't understand something that comes up in class, ask. Others probably have the same question.


Our "standard workflow", to use bioinformatics jargon, will be:

0. You will learn material outside of class through tutorials and worksheets (like this one). You **must** do this in order to be prepared for class. The class will move quickly, so be prepared. 

1. You will do regular homework. Some of this will be in the form of worksheets like this that you will turn in. Other homework will be in the form of Python programs that you will write and turn in.

    a. If you are a graduate student, you will work on a project that you choose with my approval. This project should use what you are learning in the class on data from your lab. You will present these projects in the last few classes.
    
2. On Thursdays the classroom will be available after class until 17:00. You are also welcome to use the classroom any other time it is available.

3. Later in the semester, I hope to have more "open labs", where you can work on homework in class. 
 

>**NOTE:** you must be in class and you must be prepared or you will not learn the material. 

## To get this notebook

```
cmd-space (to open Spotlight)
enter "terminal" (to find the terminal command)
click on "terminal" (to start a terminal)
navigate to the appropriate directory (should be *~/CompSkills_F18/Worksheets*)
enter: "jupyter notebook 1_" and press tab-enter (because I'm lazy)
Watch what happens (ask me if you really want to know)
```

## Keyboard and screen tips

* In most cases, tab autocompletes, saving a lot of typing (you will love this)
* In bash (in the terminal) the up/down arrow keys scroll backward/forward through the commands you have entered. This can save a lot of re-typing.
* ctl-a usually goes to the beginning of a line
* ctl-e usually goes to the end of a line
* option-arrow jumps back (left arrow) or forward (right arrow) one word
* ctl-C usually makes things stop, especially if your system hangs
* cmd-Q quits the app you're in, the hard way
* cmd-W closes a window (as does clicking on the red x)
    
To quit the mac, click on the apple in the upper left corner and select *log out*.

To close a window, click on the red x in its upper left hand corner.

You can start apps by clicking on their icon.

The Dock is a bar with icons for apps on it. It's usually at the bottom of the screen and pops up if you move the cursor to the bottom.

To find things (like apps), enter cmd-space, then start typing the name of the thing you're looking for. This finds files, apps, and more. Just click on the item to open it.

On the mac, the *Finder* (a stupid two-blue-colored smiley face) shows your files. 

The default web browser on mac is *Safari* (the blue compass icon).

# Infrastructure for class

Homework 0 comprises activities that set you up for this class. Be sure to do it asap.

In particular, you will need some specific directories, and some git repos. See Homework 0.

You should make a copy of this notebook on your own machine, so you can change it. That way, you can try out the "Do It" sections. 
To do that:

```
cd ~/CompSkills_F18/*your sandbox*
cp ~/CompSkills_F18/Class_Resources/Worksheets/1_class_overview.ipynb .
```

Then to open the notebook (see below on jupyter notebooks to see what this does):

```
jupyter notebook 1_class_overview.ipynb
```

# BASH, the Bourne Again Shell

GUIs (all those pretty windows and icons) hide a lot of detail that you need to know to do serious bioinformatics or data processing. You need direct access to the operating system (OS). To get that, you open a "terminal", which connects you directly to the OS via a command line. On most Unix systems, the command to open a terminal is:

>terminal

When the terminal opens, it runs a *shell*. A shell is a wrapper around a computer operating system that lets you interact directly with the computer. The wrapper has commands for the type of things you want to tell the OS to do. We will use *bash*, the Bourne Again Shell.

The shell takes stuff from "Standard Input" (STDIN) and prints text to "Standard Output" (STDOUT).
In the simplest case, the command line is where you enter stuff into STDIN. Press enter to send that to the shell. The shell sends its response to your terminal.
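
As a rough illustration of STDIN and STDOUT (a sketch, not part of the class materials), the Python snippet below starts a small child program, writes a line to its STDIN, and reads back what it prints to STDOUT. The child program and the helper name `shout` are made up for this example.

```python
import subprocess
import sys

# A tiny child program (hypothetical, for illustration only):
# it reads everything from STDIN and writes it back in upper case to STDOUT.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def shout(text):
    """Send text to the child's STDIN and return what it wrote to STDOUT."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=text,            # goes to the child's STDIN
        capture_output=True,   # collect the child's STDOUT (and STDERR)
        text=True,             # work with str instead of bytes
        check=True,            # raise an error if the child fails
    )
    return result.stdout


if __name__ == "__main__":
    print(shout("hello from stdin\n"), end="")
```

When you type at the command line, your keyboard plays the role of `input=` and your terminal window plays the role of the captured output.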

## Starting the shell in a terminal

1. open a terminal. You CAN find the *terminal* program and click on it, but it's easier on a mac to:

    a. Enter command-space (this puts you in Spotlight, which searches for stuff) 
    
    b. Enter "terminal" (this finds the terminal program)
    
    c. Press enter (this runs the terminal program)
    
To close the shell, enter *exit*. To close the terminal (which also ends the shell), click on the red x in the upper left hand corner, or choose "quit" from the "file" menu, or type cmd-Q ("cmd" is the key with the strange hash-sign-with-loops to the left of the space bar).
I recommend not exiting the shell without quitting the terminal.
(but what happens if you do? Try it once!)

## Bash commands, part 1
Bash commands control your computer. Bash commands are usually just a few characters, and may be followed by "modifiers" (also known as "flags") that modify the behavior of the command. For example, enter *ls* to get a *l*i*s*t of the files in your current directory. Enter *ls -l* to get a *l*ong *l*i*s*t of your files. (Ask me why bash commands are so short! I love to tell stories.)
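
To see what a flag changes, here is a minimal Python sketch of roughly what *ls* and *ls -l* report. The function names `list_names` and `long_list` are invented for this example, and the real *ls -l* shows more (permissions, owner, and so on).

```python
import time
from pathlib import Path


def list_names(directory="."):
    """Roughly what plain `ls` shows: sorted names, with hidden dot-files skipped."""
    return sorted(p.name for p in Path(directory).iterdir()
                  if not p.name.startswith("."))


def long_list(directory="."):
    """Roughly what `ls -l` adds: size in bytes and last-modified time per entry."""
    rows = []
    for name in list_names(directory):
        info = (Path(directory) / name).lstat()
        modified = time.strftime("%Y-%m-%d %H:%M", time.localtime(info.st_mtime))
        rows.append((name, info.st_size, modified))
    return rows


if __name__ == "__main__":
    print(list_names())
    for name, size, modified in long_list():
        print(f"{size:>10}  {modified}  {name}")
```

Same command, same directory; the flag only changes how much detail you get back.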

To get information on any command, use the *man* command. For example, *man ls* will tell you about the *ls* command. More on *man* below.

Here we will present common commands to manipulate the file system (to create, delete, move, copy, and get information about files). In a windowed (GUI) system, you would do these things in a file manager, such as Finder (on macs).

Unix files and directories are arranged hierarchically, like folders with folders and files in them. Each level of the hierarchy is separated by */* (slash).
When you start the terminal, you are by default in your "home directory", also known as *~* (yes, the tilde character is a special file name). 
There is a single "root" directory for the entire computer, called */* (yes, a slash character is a name). 
Also, *.* and *..* are names for the current directory and the directory above the current one.
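
The special names above can be looked up from Python too. This is only a sketch using the standard `pathlib` module; the directory name `CompSkills_F18` is taken from this worksheet and does not need to exist for the example to run.

```python
from pathlib import Path

home = Path.home()            # what bash calls ~
root = Path(home.anchor)      # the root directory, "/" on Unix-like systems
here = Path.cwd()             # what bash calls .
parent = here.parent          # what bash calls ..

# "~" is shorthand; expanduser() turns it into the real path.
expanded = Path("~/CompSkills_F18").expanduser()

# The levels of the hierarchy, i.e. the pieces between the slashes.
levels = expanded.parts

if __name__ == "__main__":
    print("home:    ", home)
    print("root:    ", root)
    print("here:    ", here)
    print("parent:  ", parent)
    print("expanded:", expanded)
    print("levels:  ", levels)
```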

### Some useful bash commands:

Command | what it does           | examples/Notes
:------ |:---------------------- |:--------------------------------------
ls      | list files in the current directory | *ls b\** lists all files beginning with b
        |                        | *ls -alt* (or *ls -lta*) long list of files sorted by timestamp
        | | *ls ../..* list files in the grandparent directory
pwd     | show current directory | show full path from system "root"
cd dir | change directory to one named dir | *.* is the current directory, *..* is the next one up
 | | *cd* changes to your home directory, *~*; so does *cd ~*
 | | *cd -* changes to the last directory you were in
mv from to| moves file or directory "from" to a new one named "to" | *mv old.name new.name* renames a file
cp from to| copies file "from" to a new one named "to" (add *-r* to copy a directory) | *cp old.name backup.name* makes a copy of old.name
mkdir dir | creates directory named dir in current directory | *mkdir foo/bar* creates bar in foo
rmdir dir | removes directory dir (if it's empty) | *rm -rf dir* removes dir and all of its contents (**dangerous**)
rm file | deletes *file* | there is no recovery. there is no trash can. be careful.
touch file | creates an empty *file* if it does not already exist | if *file* exists, only its timestamp is updated

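For comparison, several of the commands in the table have close stand-ins in Python's standard library. The sketch below is only an illustration (the function name `file_command_tour` and the file names are made up); it takes a working directory, and the usage at the bottom gives it a scratch directory so nothing of yours is touched.

```python
import shutil
import tempfile
from pathlib import Path


def file_command_tour(workdir):
    """Run Python stand-ins for mkdir, touch, cp, mv, rm, and rmdir inside workdir."""
    foo = Path(workdir) / "foo"
    steps = []

    def snapshot(label):
        # Record what `ls foo` would show at this point.
        steps.append((label, sorted(p.name for p in foo.iterdir())))

    foo.mkdir()                                          # mkdir foo
    (foo / "notes.txt").touch()                          # touch foo/notes.txt
    snapshot("touch")
    shutil.copy(foo / "notes.txt", foo / "backup.txt")   # cp foo/notes.txt foo/backup.txt
    snapshot("cp")
    (foo / "backup.txt").rename(foo / "old.txt")         # mv foo/backup.txt foo/old.txt
    snapshot("mv")
    (foo / "notes.txt").unlink()                         # rm foo/notes.txt
    (foo / "old.txt").unlink()                           # rm foo/old.txt
    snapshot("rm")
    foo.rmdir()                                          # rmdir foo (works because foo is empty)
    return steps, foo.exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        steps, still_there = file_command_tour(scratch)
    for label, names in steps:
        print(f"after {label}: {names}")
    print("foo still exists:", still_there)
```

As with *rm*, `unlink()` does not use a trash can, which is why the usage above stays inside a temporary directory.
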
### Getting help within bash

#### The *man* command

*man* stands for "manual".

*man x* looks up "x" in "the manual", telling you everything about *x*.

The display is actually a text editor on an unchangeable file.
You scroll through the file by pressing space or f (to page forward) or b (to page backward).
Enter q to quit.
You will see this editor again (hint, this is the format from the *less* command).
(ask me why it doesn't just pop open a new window, if you don't see why.)

The output is a complete description of the command you look up.
It has lots of information, most of which you won't need. 
Familiarize yourself with the contents.

#### The *info* command

*info* is an alternative to man. 

*info x* looks up "x" in "the manual", telling you everything about *x*.

The display is actually a text editor on an unchangeable file.
This is a bit easier to use than man, because all the possible things you can do are listed at the bottom of the screen. 

The output is a complete description of the command you look up.
It has lots of information, most of which you won't need. 
Familiarize yourself with the contents.

I don't like *info* because it's too new-fangled. Too much like Windows before there were real windows. 
There are probably 21st century ways to get help (*google* anyone?), but I prefer to use tools like man and info that don't rely on having a connection and that are tailored for my specific environment.


### Do it

Poke around in the file system, using cd, pwd, ls, and such. 
What do you find? What happens if you look in someone else's home directory? What sort of things are at the system root?

Look up some of the commands in "the manual". 

# Jupyter notebooks

You are reading a jupyter notebook. 
Notebooks are tools for developing code and presenting text. 

The notebook has "cells" where you enter code or text. 
We will use *python code* in the code cells, and will format text with *markdown* (both discussed below).

To do what is in a cell (running code for "code" cells or formatting text for "markdown" cells), press shift-enter.

## Creating/editing notebooks
Run a jupyter notebook *server* from the command line:

0. open a terminal
1. move to the directory you want
2. enter ```jupyter notebook```
3. open or create a notebook by clicking on *File*

To enter a new cell, click the "+" button, or enter esc-b (for "**b**elow").

To edit a cell, double click on it.

To execute a cell (format text or run code), press shift-enter.

### do it

1. Create a notebook called "Learning_Jupyter" in *~/CompSkills_F18* somewhere. 

If you created a *~/CompSkills_F18/sandbox* or similar directory in HW0 (as I recommended), this would be a good location.

2. Add some markdown (next section) text. Be sure to create markdown cells, using "+" to create the cell and the dropdown box at the top to make it a markdown cell. Press shift-enter to get the formatting.
    
3. Create a "code" cell. Copy and paste the following into it (delete spaces if you need to, so that all the lines are flush left). Press shift-enter. You just ran your first Python program.


```
greeting = 'hello'
subject = 'world'
print(greeting, subject)
```

# Markdown
Markdown is a simple way to format text. It is like html or xml (which are used to format webpages, among other things), but is much simpler.

To use markdown:
1. create a cell in your notebook
2. From the dropdown at the top of the page, select "Markdown"
3. Enter your text with markup (*Help* at the top has a summary of markdown format)
4. Format by pressing shift-enter

### Do it
In your *Learning_Jupyter* notebook:

1. enter a text cell or cells (I recommend a different cell for each heading) with the following outline:

```
Introduction
    Practicing markup
    Adding code
```

2. In the *Introduction* add text to remind yourself what you're doing (learning Jupyter)
3. In *Practicing markup* add some text that uses highlighting, bulleted lists, and numbered lists. 
4. After "Adding code" enter a new code cell (select *code* from the dropdown box at the top).
5. In this cell, add the line

>%ls -lat

6. Format and save the notebook.
7. Log out from the server.
8. Close your terminal window

What did that code cell do?

# Git

Git is the most commonly used *version control system*. 
With git, you keep track of different versions of files, timestamp changes, roll back changes if necessary, and share versions of your files with others.
Git is designed for team sharing. 

We will use *git* to share course materials and to share and turn in homework.

A *git repository* (aka "repo") is a collection of files that the git program keeps track of, along with a history of all changes and associated statistics. 
One should have separate repositories for separate projects, such as research projects, classes, notes, manuscripts, or software. 
Many labs have one git repo for the lab, or one for each project in the lab. 
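
To make "a collection of files plus a history" concrete, here is a hedged Python sketch that drives the *git* command line to build a throwaway repo with one commit and then reads back its history. In class we will mostly use the GitHub website and desktop app instead; the function name `tiny_repo_history`, the file name, and the example identity are all made up for this illustration, and the sketch assumes the *git* command is installed (it returns `None` if it is not).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def tiny_repo_history(workdir):
    """Create a repo in workdir, commit one file, and return the one-line log.

    Returns None if the git command is not installed.
    """
    if shutil.which("git") is None:
        return None

    def git(*args):
        # Identity and signing are set per command (example values), so the
        # sketch neither depends on nor changes your own git configuration.
        cmd = ["git",
               "-c", "user.name=Example Student",
               "-c", "user.email=student@example.com",
               "-c", "commit.gpgsign=false",
               *args]
        done = subprocess.run(cmd, cwd=workdir, capture_output=True,
                              text=True, check=True)
        return done.stdout

    git("init")                                   # start tracking this directory
    Path(workdir, "notes.txt").write_text("first draft\n")
    git("add", "notes.txt")                       # tell git which file to record
    git("commit", "-m", "Add first draft of notes")
    return git("log", "--oneline")                # the history so far


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        log = tiny_repo_history(scratch)
    print(log if log is not None else "git is not installed")
```

Each later commit would add another line to that log, which is the history git uses to time-stamp and roll back changes.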

*GitHub* is an online service that hosts git repos (see https://github.com; check out https://github.com/ibest/ to see the software developed by IBEST, for example). 
Repos may also exist on dedicated servers or even on individual workstations. 

You access git repos with git client software or via the *git* command on the command line. 
Most online tutorials use the command line, since that has the most power and flexibility. 
We will use the GitHub website (https://github.com) and the desktop version of GitHub, *GitHub Desktop*.

For this course, you will use two git repos, one (a public one) for course materials, and one (private) for your homework and other assignments. See Homework 0. 