**Machine Learning** <br/>

**Toolchain Tutorial Part 1: Jupyter Notebook** <br/>

Resources:
- https://nbviewer.jupyter.org/github/ipython/ipython/blob/3.x/examples/Notebook/Notebook%20Basics.ipynb
- https://nbviewer.jupyter.org/github/ipython/ipython/blob/3.x/examples/Notebook/Running%20Code.ipynb

# Jupyter Notebook Basics

After installing the working environment as described in **`Setup_Working_Environment.pdf`** (ILIAS), you can start your own Jupyter Notebook server on your PC.

### But what actually is a Jupyter Notebook, and why do we need it?

<img src="Images/jupyter-header.jpg"  style="width: 800px;"/>

>"*The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.*"
>
> Resource: https://jupyter.org/

>"*A notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and other rich media. In other words: it's a single document where you can run code, display the output, and also add explanations, formulas, charts, and make your work more transparent, understandable, repeatable, and shareable.*
>
>*Using Notebooks is now a major part of the data science workflow at companies across the globe. If your goal is to work with data, using a Notebook will speed up your workflow and make it easier to communicate and share your results.*"
>
>Resource: https://www.dataquest.io/blog/jupyter-notebook-tutorial/

### What are the advantages of Jupyter Notebook?

<div class="alert alert-success">   
    
* Jupyter Notebook supports over 40 programming languages, including Python, R, Julia and Scala. In this course we will focus only on the Python programming language, as it is the most common choice in the context of machine learning and data science applications.
</div>

<div class="alert alert-success">   

* Jupyter Notebooks can be shared with others via email, Dropbox, GitHub and other file sharing platforms. This makes it easier for us to work together in this course.
</div>

<div class="alert alert-success">
    
* Code in a Jupyter Notebook can produce rich, interactive output, e.g. HTML, images, videos, LaTeX and more.
</div>

<div class="alert alert-success">
    
* The code is executed cell by cell in Jupyter Notebook. This makes it easy to understand complex programs. This also makes troubleshooting easier.
</div>
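
The "rich output" mentioned above can be illustrated with a small sketch. It assumes that the notebook picks up an object's `_repr_html_` method to obtain an HTML representation; the class `ColorSwatch` is a hypothetical helper invented for this example and is not part of Jupyter.

```python
class ColorSwatch:
    """Hypothetical helper: a colour that renders as a small HTML box."""

    def __init__(self, css_color):
        self.css_color = css_color

    def _repr_html_(self):
        # The notebook looks for this method to obtain an HTML representation.
        return (
            '<div style="width: 60px; height: 20px; '
            f'background: {self.css_color};"></div>'
        )


swatch = ColorSwatch("steelblue")
print(swatch._repr_html_())  # the raw HTML string
swatch  # as the last expression of a code cell, this is shown as a coloured box
```

Outside a notebook, only the raw HTML string from the `print` call is visible; the rendering itself is done by the notebook front end.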

### How does Jupyter Notebook work?

The answer to this question can be complex and time-consuming. Therefore, we will focus only on the essentials here.

**Notebook Document Format**

First of all, a Jupyter Notebook document is stored in an open document format based on JSON. The easiest way to recognize it is by the file extension: this file, for example, ends with `.ipynb`. The extension is short for "IPython Notebook", the original name of the project: `nb` stands for notebook, and `ipy` refers to IPython, whose kernel processes our code written in the Python programming language.
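
To make the JSON structure more tangible, here is a minimal sketch using only Python's standard `json` module. The dictionary below is a hand-written, simplified notebook document (format version 4) and has not been validated against the official schema; real notebooks usually contain more metadata.

```python
import json

# A hand-written, simplified notebook document (format version 4).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
            "source": ["print('hello')"],
        },
    ],
}

# This text is roughly what would be stored inside an .ipynb file.
text = json.dumps(notebook, indent=1)

# Reading it back gives ordinary Python dictionaries and lists.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain JSON, you can also open any `.ipynb` file in a text editor to inspect its cells and stored outputs.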

**Interactive Computing Protocol**

Jupyter Notebook communicates with the computational kernels using the Interactive Computing Protocol, an open network protocol based on JSON data over ZMQ and WebSockets. For this purpose, a Jupyter web server runs in the background of the application and, among other things, manages the individual kernels.
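
As a rough illustration of "JSON data" in this protocol, the following sketch builds a simplified request to execute code. The function `make_execute_request` is hypothetical, the field selection is reduced to a few entries, and the values (user name, version string) are placeholders; the authoritative message layout is defined in the Jupyter messaging documentation.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id):
    """Build a simplified 'execute_request' message as a plain dictionary."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,      # unique id of this message
            "session": session_id,           # id of the sending session
            "username": "student",           # placeholder value
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",                # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {"code": code, "silent": False},
    }


message = make_execute_request("print(a)", session_id=uuid.uuid4().hex)
print(json.dumps(message["content"]))
```

In practice you never build these messages yourself; the notebook front end and the server exchange them for you whenever a cell is run.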

**Kernels**

The Kernels which are used in Jupyter Notebook documents are processes that run interactive code in a particular programming language and return output to the user. We will use the IPython Kernel, which provides a rich toolkit to help us make the most of using the Python programming language interactively.

---
If you want to learn how the processes of Jupyter Notebook work under the hood, you can take a look into the official documentation: https://jupyter.org/documentation

However, the information listed above is sufficient for a basic understanding. We will use Jupyter Notebook as a tool in this course. For this reason, the more important question for us is how we use this tool correctly for our purposes! So let's start with the tutorial and get to know the essential functions of Jupyter Notebook.

---
## 1. Dashboard
---

When the Jupyter Notebook web server is first started, a browser will be opened to the notebook dashboard. The dashboard serves as a home page for the notebook. Its main purpose is to display the portion of the filesystem accessible by the user, and to provide an overview of the running kernels, terminals, and parallel clusters.

Jupyter Notebook can only access files that are accessible to the user, e.g. files on your main hard drive. To prevent problems, we recommend working on the main system partition, e.g. `C:\..` for Windows users.

During this course we won't use terminals or clusters, as they are not needed for our projects. We will mainly work with notebooks, so we will discuss the notebook in detail in the following sections.

But before we start exploring Jupyter Notebook, we first need to understand two of the most essential terms: *`cells`* and *`kernels`*:
- A kernel is a “computational engine” that executes the code contained in a notebook document. In our case it is an IPython Kernel which runs on Python version 3.8 and provides all necessary packages for this course.
- A cell is a container for text to be displayed in the notebook or code to be executed by the notebook’s kernel. We will cover all of the relevant content for cells in a moment.
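
To check which Python interpreter and version the kernel is actually using, a small sketch like the following can be run in a code cell. It relies only on the standard library; the comparison against version 3.8 reflects the version named above.

```python
import platform
import sys

# Path of the Python interpreter that runs this kernel.
print(sys.executable)

# Version of that interpreter as a string.
print(platform.python_version())

# True if the kernel runs at least the Python version expected in this course.
version_ok = sys.version_info >= (3, 8)
print(version_ok)
```

If the printed path does not point into the environment set up for this course, the notebook is probably running on a different kernel than intended.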

### 1.1. Files Tab

The files tab provides an interactive view of the portion of the filesystem which is accessible by the user. This is typically rooted at the directory in which the notebook server was started.

The top of the files list displays clickable breadcrumbs of the current directory. It is possible to navigate the filesystem by clicking on these breadcrumbs or on the directories displayed in the notebook list.

A new notebook can be created by clicking on the **`New`** dropdown button at the top of the list, and selecting the desired language kernel.

Notebooks can also be uploaded to the current directory by dragging a notebook file onto the list or by clicking the **`Upload`** button at the top of the list.

<img src="Images/files_tab.png" />
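
The same part of the filesystem can also be inspected from a code cell. The following sketch assumes that the kernel's working directory is the folder containing the current notebook, which is usually the case but depends on how the server was started.

```python
from pathlib import Path

# Directory in which the kernel is currently working.
current_dir = Path.cwd()
print(current_dir)

# Names of all notebook files in that directory (may be an empty list).
notebooks = sorted(path.name for path in current_dir.glob("*.ipynb"))
print(notebooks)
```

This is handy for checking where relative paths, such as paths to data files, will be resolved.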

### 1.2. Running Tab

The running tab displays the currently running notebooks which are known to the server. This view provides a convenient way to track notebooks that have been started during a long-running notebook server session.

Each running notebook will have an orange **`Shutdown`** button which can be used to shut down its associated kernel. Closing the notebook's page is not sufficient to shut down a kernel.

<img src="Images/running_tab.png" />

---
## 2. Notebook
---

When a notebook is opened, a new browser tab will be created which presents the notebook user interface (UI). This UI allows for interactively editing and running the notebook document.

A new notebook can be created from the dashboard by clicking on the **`Files`** tab, followed by the **`New`** dropdown button, and then selecting the language of choice for the notebook.

An interactive tour of the notebook UI can be started by selecting **`Help -> User Interface Tour`** from the notebook menu bar.

### 2.1. Header

At the top of the notebook document is a header which contains a menubar and a toolbar. This header remains fixed at the top of the screen, even as the body of the notebook is scrolled. The menubar and toolbar contain a variety of actions which control notebook navigation and document structure.

<img src="Images/header.png" />

### 2.2. Body

The body of a notebook is composed of cells. Each cell contains either markdown, code input, code output, or raw text. Cells can be included in any order and edited at will, allowing for a large amount of flexibility in constructing a narrative.

- **Markdown cells**: <br>
These are used to build a nicely formatted narrative around the code in the document. The majority of this lesson is composed of markdown cells. 
<br>

- **Code cells**: <br>These are used to define the computational code in the document. They come in two forms: the *input cell* where the user types the code to be executed, and the *output cell* which is the representation of the executed code. Depending on the code, this representation may be a simple scalar value, or something more complex like a plot or an interactive widget.
<br>

- **Raw cells**: <br> These are used when text needs to be included in raw form, without execution or transformation.
<br>

Example for different cell types: <br>
<img src="Images/body_type.jpg" />

#### Example from the picture above

I'm a **markdown** cell.

In [1]:
print("I'm a code cell")

I'm a code cell
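
Besides printed text, a code cell also shows the value of its last expression. The following sketch illustrates the difference; the variable names are arbitrary.

```python
values = [1, 2, 3]
total = sum(values)

# Explicit output: appears in the output area as plain text.
print("Sum:", total)

# Implicit output: in a notebook, the value of the last expression
# of a cell is displayed next to an Out[...] prompt.
total * 2
```

When the same code is run as a normal Python script, only the `print` call produces output; the value of the last line is discarded.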


#### Modality

The notebook user interface is *modal*. This means that the keyboard behaves differently depending upon the current mode of the notebook. A notebook has two modes: **edit** and **command**.

**Edit mode** is indicated by a green cell border and a prompt showing in the editor area. When a cell is in edit mode, you can type into the cell, like a normal text editor.

<img src="Images/edit_mode.png" style="width: 800px;">

**Command mode** is indicated by a grey cell border. When in command mode, the structure of the notebook can be modified as a whole, but the text in individual cells cannot be changed. Most importantly, the keyboard is mapped to a set of shortcuts for efficiently performing notebook and cell actions. For example, pressing **`c`** in command mode will copy the current cell; no modifier is needed.

<img src="Images/command_mode.png" style="width: 800px;">

<br>
<div class="alert alert-success">
Enter edit mode by pressing `Enter` or using the mouse to click on a cell's editor area.
</div>
<div class="alert alert-success">
Enter command mode by pressing `Esc` or using the mouse to click *outside* a cell's editor area.
</div>
<div class="alert alert-warning">
Do not attempt to type into a cell when in command mode; unexpected things will happen!
</div>

#### Mouse navigation

The first concept to understand in mouse-based navigation is that **cells can be selected by clicking on them.** The currently selected cell is indicated with a green or grey border, depending on whether the notebook is in edit or command mode. Clicking inside a cell's editor area will enter edit mode. Clicking on the prompt or the output area of a cell will enter command mode.

The second concept to understand in mouse-based navigation is that **cell actions usually apply to the currently selected cell**. For example, to run the code in a cell, select it and then click the <button class='btn btn-default btn-xs'><i class="fa fa-play icon-play"></i></button> button in the toolbar or the **`Cell -> Run`** menu item. Similarly, to copy a cell, select it and then click the <button class='btn btn-default btn-xs'><i class="fa fa-copy icon-copy"></i></button> button in the toolbar or the **`Edit -> Copy`** menu item. With this simple pattern, it should be possible to perform nearly every action with the mouse.

Markdown cells have one other state which can be modified with the mouse. These cells can either be rendered or unrendered. When they are rendered, a nicely formatted representation of the cell's contents will be presented. When they are unrendered, the raw text source of the cell will be presented. To render the selected cell with the mouse, click the <button class='btn btn-default btn-xs'><i class="fa fa-play icon-play"></i></button> button in the toolbar or the **`Cell -> Run`** menu item. To unrender the selected cell, double-click on the cell.

#### Keyboard Navigation

The modal user interface of Jupyter Notebook has been optimized for efficient keyboard usage. This is made possible by having two different sets of keyboard shortcuts: one set that is active in edit mode and another in command mode.

The most important keyboard shortcuts are **`Enter`**, which enters edit mode, and **`Esc`**, which enters command mode.

In edit mode, most of the keyboard is dedicated to typing into the cell's editor. Thus, in edit mode there are relatively few shortcuts. In command mode, the entire keyboard is available for shortcuts, so there are many more possibilities.

The following image gives an overview of the available keyboard shortcuts. These can be viewed in the notebook at any time via the **`Help -> Keyboard Shortcuts`** menu item.

<img src="Images/notebook_shortcuts_4_0.png">

The following shortcuts have been found to be the most useful in day-to-day tasks:

- Basic navigation: **`enter`**, **`shift-enter`**, **`up/k`**, **`down/j`**
- Saving the notebook: **`s`**
- Cell types: **`y`**, **`m`**, **`1-6`**, **`r`**
- Cell creation: **`a`**, **`b`**
- Cell editing: **`x`**, **`c`**, **`v`**, **`d`**, **`z`**

You will learn most of them as you work on your projects.

---
## 3. Running Code
---

First and foremost, Jupyter Notebook is an interactive environment for writing and running code. Through its kernels it is capable of running code in a wide range of languages. However, this notebook uses the IPython Kernel, which runs Python code.

### 3.1. Code cells allow you to enter and run Python code

Run a code cell using `Shift-Enter` or pressing the <button class='btn btn-default btn-xs'><i class="icon-play fa fa-play"></i></button> button in the toolbar above:


In [2]:
a = 10

In [3]:
print(a)

10


There are two other keyboard shortcuts for running code:

* `Alt-Enter` runs the current cell and inserts a new one below.
* `Ctrl-Enter` runs the current cell and enters command mode.

### 3.2. Managing the IPython Kernel

Code is run in a separate process called the IPython Kernel.  The Kernel can be interrupted or restarted.  
Try running the following cell and then hit the <button class='btn btn-default btn-xs'><i class='icon-stop fa fa-stop'></i></button> button in the toolbar above.

In [4]:
import time
time.sleep(10)  # sleep for 10 seconds
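
Interrupting the kernel typically surfaces inside the running code as a `KeyboardInterrupt` exception. The following sketch shows how long-running code could react to it; the function `slow_task` and its parameters are hypothetical names chosen for this example.

```python
import time


def slow_task(steps, pause=1.0):
    """Sleep `steps` times and stop early if the kernel is interrupted."""
    completed = 0
    try:
        for _ in range(steps):
            time.sleep(pause)
            completed += 1
    except KeyboardInterrupt:
        # Raised when the stop button is pressed while the cell is running.
        print(f"Interrupted after {completed} of {steps} steps")
    return completed


completed = slow_task(steps=3, pause=0.1)
print(completed)
```

To try the interruption yourself, call the function with a larger number of steps and press the stop button while the cell is running.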

If the Kernel dies you will be prompted to restart it. 

<img src="Images/dead_kernel.jpg" style="width: 600px;">

#### Restarting the kernel manually

The kernel maintains the state of a notebook's computations. You can reset this state by restarting the kernel. This is done by clicking on the <button class='btn btn-default btn-xs'><i class='fa fa-repeat icon-repeat'></i></button> button in the toolbar above.
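
What "state" means here can be sketched without actually restarting anything. In the example below, `del a` merely imitates the effect a restart has on a single variable: afterwards the name is unknown and using it raises a `NameError`. The variable `known_after_reset` exists only for this illustration.

```python
a = 10        # defined in one cell, the kernel remembers it
print(a)      # any later cell can still use it

del a         # imitates a restart: the variable is gone

try:
    a
    known_after_reset = True
except NameError:
    # After a real restart, every variable defined earlier behaves like this
    # until the cell that defines it has been run again.
    known_after_reset = False

print(known_after_reset)
```

After a real restart, the cells that define your variables and imports have to be run again before dependent cells will work.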

#### Output is asynchronous
All output is displayed asynchronously as it is generated in the Kernel. If you execute the next cell, you will see the output one piece at a time, not all at the end.

In [5]:
import time
for i in range(8):
    print(i)
    time.sleep(0.5)

0
1
2
3
4
5
6
7


#### Large outputs
To better handle large outputs, the output area can be collapsed. Run the following cell and then single- or double-click on the active area to the left of the output:

In [6]:
for i in range(50):
    print(i)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49


Beyond a certain point, output will scroll automatically:

In [7]:
for i in range(500):
    print(2**i)

1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
2147483648
4294967296
8589934592
17179869184
34359738368
68719476736
137438953472
274877906944
549755813888
1099511627776
2199023255552
4398046511104
8796093022208
17592186044416
35184372088832
70368744177664
140737488355328
281474976710656
562949953421312
1125899906842624
2251799813685248
4503599627370496
9007199254740992
18014398509481984
36028797018963968
72057594037927936
144115188075855872
288230376151711744
576460752303423488
1152921504606846976
2305843009213693952
4611686018427387904
9223372036854775808
18446744073709551616
36893488147419103232
73786976294838206464
147573952589676412928
295147905179352825856
590295810358705651712
1180591620717411303424
2361183241434822606848
4722366482869645213696
9444732965739290427392
18889465931478580854784
37778931862957161709568
75557863725914323419136
...

---
## 4. Recommendations
---

Make sure to turn on line numbers for the code cells. Line numbers help you orient yourself in the code and follow the Jupyter Notebook tutorials. You can toggle the line numbers by clicking on `View` and then on `Toggle Line Numbers` in the dropdown menu.

<img src="Images/toggle-line-numbers.png">

After you have toggled on the line numbers, your code cells will look like this:

<img src="Images/line-numbers-example.png">

---
## 5. Further Information
---

Further information about notebook functions can be found in the official documentation of IPython and Jupyter Notebook.

To open the documentation, select **`Help -> Notebook Help`** from the navigation bar.