# How to work in a Jupyter notebook with the Bash language
The webpage you are looking at is called a Jupyter notebook. It is a webpage on which you can write text (like this text) and also code. The code can be executed, and you see its output right there in the same notebook! This may sound trivial, but it's really "cool", to put it in non-scientific terms. Code and text are entered in individual cells: code cells or text cells. The cell you are reading now is obviously a text cell. Now let's look at a code cell and execute it. First select the cell, either with your mouse pointer or by moving there with the up and down arrow keys on your keyboard. Then execute it in one of two ways: click the 'run' button in the toolbar above, or hit CTRL+RETURN.

## bash basics
Execute the code cell below:

In [None]:
%%bash
echo "hello world"

So what happened in the above code cell? First, I declared that I want to write code in a language called `bash` by typing:
> %%bash

In the bash language, the first word you type is always the command. So in this case that was:
> echo

This command 'echoes' (prints) whatever you give it to the output. After the command, you give it an argument. In this case that is:
> "hello world"

This basic 'command' 'arguments' structure comes back throughout this practical.

*Do* now try to change "hello world" to something else in the cell below:

In [None]:
%%bash
echo "hello world"

Often, an argument is a path to a file. To see which files we have, we use the `ls` command (short for list).

In [None]:
%%bash
ls

When you precede an `ls` command with `%%bash` or `!` (as you should), the output is all black. Oddly enough, for the `ls` command you can skip these and just type `ls`. Files are then shown in black, and folders in purple/blue with a trailing '/'. Now let's use the `ls` command to see what is in the `data` folder:

In [None]:
ls data/

We are learning fast. We now know what a `command` is and what an `argument` is. Finally, we also need to know what `options` are. Options are usually placed between the `command` and the `argument`. They look either like this:
> ls --size --human-readable

or in shortened versions like this
> ls -sh

Note that the above two commands are synonymous. Try them out in the cell below:

In [None]:
%%bash 
ls --size --human-readable data/
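If you want to check for yourself that the long and short spellings really are the same, here is a minimal sketch. It assumes GNU `ls` as found on Linux, because the macOS/BSD `ls` has no `--size` or `--human-readable` long options. The temporary folder and the file `example.txt` are hypothetical stand-ins created only for this check. In the notebook, put `%%bash` on the first line of a cell and paste the code below it.

```shell
#!/usr/bin/env bash
# Make a throw-away folder holding one small, hypothetical file
demo_dir=$(mktemp -d)
printf 'ACGT\n' > "$demo_dir/example.txt"

# The same command, with options written out in full and in short form
long=$(ls --size --human-readable "$demo_dir")
short=$(ls -sh "$demo_dir")

if [ "$long" = "$short" ]; then
  echo "identical output"
else
  echo "different output"
fi

# Clean up the throw-away folder
rm -r "$demo_dir"
```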

`Commands`, `options` and `arguments` are always separated by at least one space. Note that `options` can also have their own `arguments`. When they do, the manual or help page will tell you. We will get to the manual and help pages later.
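Spaces and quotes together decide what counts as one argument. The sketch below makes that visible with `printf`, a command not otherwise used in this practical. With the format `'[%s]\n'`, which is just a choice made for this example, it prints each argument it receives on its own line between square brackets.

```shell
#!/usr/bin/env bash
# Unquoted: the space splits the text into two separate arguments
printf '[%s]\n' hello world
# prints:
# [hello]
# [world]

# Quoted: the quotes keep the text together as one single argument
printf '[%s]\n' "hello world"
# prints:
# [hello world]

# Several spaces between arguments count the same as one
printf '[%s]\n' hello     world
# prints:
# [hello]
# [world]
```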


Finally, instead of writing `%%bash` at the beginning of a cell, you may also start a command with an exclamation mark, like so:

In [None]:
!ls -sh ./data/

For some commands, like `ls`, you don't actually need to specify either `%%bash` or `!`. For other commands, however, you do. Personally, I find this more confusing than convenient, so beware and work in an orderly way: it is best to specify either `%%bash` or `!` whenever you write `bash` code.

Whether you choose `%%bash` or `!`, both work. Realise, however, that a `!` only applies to the line it starts, whereas `%%bash` applies to the entire cell.
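A practical consequence of this difference: every `!` line is handed to a fresh shell of its own, while a `%%bash` cell runs all its lines in one shell. A variable set on one `!` line is therefore gone on the next. The sketch below imitates this in plain bash by starting a new `bash -c` process for each "line". It is an analogy for what Jupyter does, not Jupyter's actual code, and the variable name `drink` is made up.

```shell
#!/usr/bin/env bash
# Like two separate "!" lines: each bash -c starts a brand-new shell
bash -c 'drink=coffee'
bash -c 'echo "drink is: $drink"'    # prints "drink is: " (the variable is gone)

# Like one %%bash cell: both commands run in the same shell
bash -c 'drink=coffee
echo "drink is: $drink"'             # prints "drink is: coffee"
```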

More experienced users may wonder why we don't work in a Bash notebook, or in a bash script. Bash notebooks are not always properly installed on Jupyter notebook servers and have been unstable in the past. Hence we chose the more stable Python 2 notebook. The price we pay for this stability is the slight annoyance of having to type either `%%bash` or `!` in every code cell.

## auto-complete
Auto-complete is one of the best features of the `bash` language, and your greatest friend during this practical. Let's say we want to list (`ls`) the contents of the `data/` folder, but are too lazy to type the whole word 'data/'. Then we can type
> ls da

and then hit the TAB key on your keyboard. Bash should then either automatically complete the path to
> ls data/

or, if there are several possible completions, `bash` will give you a little menu with these options.

Using auto-complete not only makes your life a lot easier, it also prevents typos! If auto-complete doesn't work, odds are something in your command or argument is wrong. Best to check before you proceed!

Try out auto-complete below:


In [None]:
!ls da

## pipes
Bash can hand the output of one program to another. This is called piping. If you pipe the output of multiple programs into each other, you have made a 'pipeline'. Pipelines look somewhat like this:

> command1 | command2 | command3

One trick with pipes that we will use quite often is `| head`. It shows you only the first 10 lines of the output of the preceding command. `| head -n 1` changes this number to 1. See for yourself below:

In [None]:
!ls -1 data/reads/ 

In [None]:
!ls -1 data/reads/ | head -n 1
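Here is a self-contained sketch of a three-step pipeline. `printf` stands in for a command such as `ls -1` by writing a few made-up sample names, one per line. `sort` puts them in alphabetical order, and `head -n 2` keeps only the first two.

```shell
#!/usr/bin/env bash
# command1 | command2 | command3
printf '%s\n' P2 E1 P1 E2 | sort | head -n 2
# prints:
# E1
# E2
```

Each program only sees the output of the one before it. That is why `head` gets the already-sorted list.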

## loops
Loops are one of the most useful features of any programming language, and quite intuitive to use. A loop simply repeats a series of commands multiple times, often with a slightly different input each time. Let's make a loop together, but first we need to have two concepts clear:

* variable
* array

A **variable** is a name that stands for some value. That value may vary, hence the name. We can specify a variable like this:
> variable1=coffee

To refer to the content of a variable, we use a `$` sign, like so:
> echo $variable1

Now create a new cell below and try it yourself. You can choose almost any name for a variable, as long as it consists of letters, digits and underscores and does not start with a digit.
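Here is a short sketch of the two steps above, plus two details that often trip people up. First, there must be no spaces around `=`: with spaces, bash reads `variable1` as the name of a command to run. Second, curly braces mark where a variable name ends when it is glued to other text. The double quotes around `$variable1` are a common habit that keeps values containing spaces in one piece.

```shell
#!/usr/bin/env bash
variable1=coffee            # no spaces around "="
echo "$variable1"           # prints: coffee

# variable1 = coffee        # wrong: bash would try to run a command called "variable1"

# Braces separate the variable name from the text that follows it
echo "${variable1}cup"      # prints: coffeecup
# echo "$variable1cup"      # would look for a (non-existent) variable named "variable1cup"
```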

An array is a list of values; it's that simple. To make an array, we type something like this:
>samples=(E1 E2 E3)

To refer back to an array, we type this
>echo ${samples[@]}

This looks a bit more complicated. The `[@]` part just means 'all contents of the array'. If you type `echo ${samples[0]}` instead, you will only get the first element of the array, because bash starts counting at 0. Again, try it yourself below in a new cell.
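Here is a small sketch of indexing into the `samples` array from above. The last line uses `${#samples[@]}`, which gives the number of elements. It is an extra not mentioned in the text, but handy to know.

```shell
#!/usr/bin/env bash
samples=(E1 E2 E3)

echo "${samples[@]}"     # all elements:        E1 E2 E3
echo "${samples[0]}"     # first element:       E1 (counting starts at 0)
echo "${samples[2]}"     # third element:       E3
echo "${#samples[@]}"    # number of elements:  3
```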

Now we get to loops.
Let's keep it simple: I will define a loop for you, and you see how it works.

In [None]:
%%bash
break=(coffee tea cookies)
for i in "${break[@]}"
do
  echo "$i"
done

Do you get the loop? Make sure you do. You will make your own loops in the following parts of the practical.
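To connect loops and arrays with the sequencing files in `data/reads/`, the sketch below loops over a `samples` array and builds the name of each R1 read file. It only prints the names and does not read anything from disk. The `<sample>.R1.fastq.gz` naming pattern is taken from the `data/reads/` folder listing in this notebook. The loop variable name `sample` is arbitrary.

```shell
#!/usr/bin/env bash
samples=(E1 E2 E3)

for sample in "${samples[@]}"
do
  # Build the path of the R1 read file for this sample
  echo "data/reads/${sample}.R1.fastq.gz"
done
# prints:
# data/reads/E1.R1.fastq.gz
# data/reads/E2.R1.fastq.gz
# data/reads/E3.R1.fastq.gz
```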

## Jupyter

Working with cells in Jupyter is quite straightforward. You learn best by doing, so do all of the things listed below:

* You can select a cell with your mouse or the arrow keys on your keyboard. 
* You can edit a cell by hitting RETURN or by double-clicking it. 
* A new cell is a code cell by default. Turn it into a markdown (text) cell by hitting 'm' while you are not editing the cell (press ESC first to leave editing).
* A code cell can be executed by hitting CTRL+RETURN.
* A markdown cell can be rendered by hitting CTRL+RETURN.
* Add an additional cell by hitting the '+' button in the toolbar.
* Add an additional cell by clicking between two cells
* Add an additional cell by using the keyboard
 + add a cell below by hitting the 'b' key
 + and above by hitting the 'a' key
* Whenever your notebook turns out to be unresponsive, you may interrupt the underlying program that runs the code (the kernel) by
 + hitting the square stop button in the toolbar
 + clicking 'Restart' or 'Interrupt' in the 'Kernel' menu in your menu bar
 + clicking 'Close and Halt' in the 'File' menu
 
 
Try creating new cells below this cell. Make at least one text cell and one code cell.


Have you completed all exercises above? Then move on to this: 

## Bash basics extra

* wildcards
* base filenames
* paths
* manual/help pages


### wildcards 
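Wildcards can be used on the command line. The `*` wildcard matches any number of characters. For example, the command below lists every file and folder inside the `./data/` folder. For folders, `ls` also shows what is inside them.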

In [None]:
!ls ./data/*

Or list every file in `./data/reads` that ends in `.gz`:

In [2]:
!ls ./data/reads/*.gz

./data/reads/E1.R1.fastq.gz  ./data/reads/P1.R1.fastq.gz
./data/reads/E1.R2.fastq.gz  ./data/reads/P1.R2.fastq.gz
./data/reads/E2.R1.fastq.gz  ./data/reads/P2.R1.fastq.gz
./data/reads/E2.R2.fastq.gz  ./data/reads/P2.R2.fastq.gz
./data/reads/E3.R1.fastq.gz  ./data/reads/P3.R1.fastq.gz
./data/reads/E3.R2.fastq.gz  ./data/reads/P3.R2.fastq.gz
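The sketch below shows the two most common wildcards on empty placeholder files in a temporary folder. `*` matches any number of characters, and `?` matches exactly one character. The file names mimic a few of those listed above, but no real data is involved. Note that the script changes into the temporary folder with `cd`.

```shell
#!/usr/bin/env bash
# Throw-away folder with empty placeholder files (names mimic data/reads/)
reads_dir=$(mktemp -d)
cd "$reads_dir" || exit 1
touch E1.R1.fastq.gz E1.R2.fastq.gz E2.R1.fastq.gz E2.R2.fastq.gz \
      P1.R1.fastq.gz P1.R2.fastq.gz

ls E*              # "*" = any characters: the four E files
ls *.R1.fastq.gz   # only the R1 files: E1, E2 and P1
ls ?1.R2.fastq.gz  # "?" = exactly one character: E1.R2 and P1.R2
```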


### Base filenames
The base name of a file is the part before the extension (or extensions). You will need this later in the practical.
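Here are two common ways to get a base name in bash, as a sketch. The `basename` command strips the folders and, if you give it a second argument, that suffix as well. Bash's own `${variable%%.*}` syntax removes everything from the first dot onwards. Which result counts as "the base" depends on how many of the dotted parts you treat as extensions. The file name below is one of those in `data/reads/`; it is only handled as text here.

```shell
#!/usr/bin/env bash
file=data/reads/E1.R1.fastq.gz   # just a text string; the file need not exist

basename "$file"                 # strips the folders:           E1.R1.fastq.gz
basename "$file" .fastq.gz       # also strips the given suffix: E1.R1

name=${file##*/}                 # bash only: drop everything up to the last "/"
echo "${name%%.*}"               # drop everything from the first ".": E1
```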

### paths
As we have seen, folders in a path are separated by a `/`, and you can move from folder to folder. If you ever wonder which folder you are in, you can 'print working directory' with `pwd`.

In [4]:
!pwd

/home/laura/gitprojects/metagenomicspractical


The folder you are currently in is denoted by a dot: `.`. So if you type a path like `./data/reads`, you tell the computer explicitly to start in the current folder, then move to the `data` folder, and then to the `reads` folder.

If you type `ls /`, you ask the computer to list the root of the filesystem, the highest level on the hard drive. This is somewhat like `C:\` on Windows computers.
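The sketch below shows that `./data/reads`, `data/reads` and the full path starting at `/` all point to the same folder. So that it runs anywhere, it builds its own temporary `data/reads` folder. `mktemp`, `mkdir -p` and `cd` (change directory) are used only for that setup. `ls -d` lists a folder itself rather than its contents.

```shell
#!/usr/bin/env bash
# Build a throw-away project folder containing data/reads/
project=$(mktemp -d)
mkdir -p "$project/data/reads"
cd "$project" || exit 1

pwd                          # the current folder, as a full path from /
ls -d ./data/reads           # relative path, explicitly starting at "." (here)
ls -d data/reads             # the same folder: relative paths start here anyway
ls -d "$(pwd)/data/reads"    # the same folder again, as an absolute path from /
```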

Whenever you see a bit of code like this in a manual page or in a prewritten command:
> somecommand /path/to/file

it is implied that you substitute `/path/to/file` with your own path, to your own file.

### help and manual pages

Whenever you are asked to use some command or program and you don't know exactly how it works, you can ask the computer for help:
* type the command without any argument or options
* type the command with option `--help`
* type the command with option `-h`
* get the manual page `man some-command`

Not all of these work for every command, but usually at least one does: use trial and error.

In these notebooks, the `man` command doesn't work too well. Better to stick to the `--help` pages.

In [10]:
!head --help

Usage: head [OPTION]... [FILE]...
Print the first 10 lines of each FILE to standard output.
With more than one FILE, precede each with a header giving the file name.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -c, --bytes=[-]NUM       print the first NUM bytes of each file;
                             with the leading '-', print all but the last
                             NUM bytes of each file
  -n, --lines=[-]NUM       print the first NUM lines instead of the first 10;
                             with the leading '-', print all but the last
                             NUM lines of each file
  -q, --quiet, --silent    never print headers giving file names
  -v, --verbose            always print headers giving file names
  -z, --zero-terminated    line delimiter is NUL, not newline
      --help     display this help and exit
      --version  output version information and exit


Quite often, you'll find a 'usage' line at the top of the help page. It tells you how to use the command. For `head`, it tells you to first type `head`, then any options, and then any files. Entries in \[square brackets\] are optional, and `...` means the entry may be repeated. Entries without any brackets, or in \<angle brackets\>, are required.
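The usage line `head [OPTION]... [FILE]...` allows any number of options and any number of files. The sketch below tries this on two small, made-up files in a temporary folder. It assumes GNU `head`, whose help text is shown above. As that help text says, with more than one file `head` prints a header before each file's lines, and `-q` switches those headers off.

```shell
#!/usr/bin/env bash
# Two small, made-up files in a throw-away folder
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' line1 line2 line3 > a.txt
printf '%s\n' lineA lineB lineC > b.txt

head -n 1 a.txt            # one option, one file: line1
head -n 1 a.txt b.txt      # one option, two files: each part preceded by "==> name <=="
head -q -n 1 a.txt b.txt   # two options, two files: headers suppressed, line1 and lineA
```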

## That's it!
You are now ready to work with Bash in Jupyter notebooks! Congratulations. Whenever you get stuck in the subsequent notebooks, come back here for advice.