# Introduction to Jupyter (IPython) Notebook
**CS2545 - Data Science for Big Data Analytics**

Francis Palma, UNB
(Acknowledgement: Alfred Essa)


Jupyter is a language-agnostic interactive development environment for scientific computing and data science.   

Jupyter's kernel and messaging architecture now supports different programming languages, including Python, R, Julia and many others.

The Jupyter Notebook has three components:


* **Web application**. A web application for writing and running code interactively and authoring notebooks.
* **Kernels**. Backend processes that run code in a specific language and return output back to the web application for display or further computation.
* **Notebook documents**. Documents consisting of code, text annotations, images, and video. Each document is stored as a JSON file.
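Because a notebook document is plain JSON, it can be inspected with nothing but the Python standard library. The sketch below builds a minimal notebook as a dictionary, writes it to a temporary `.ipynb` file, and reads it back. The field names are assumed to follow the version 4 notebook format; the file name `demo.ipynb` and the cell contents are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# A minimal notebook: one markdown cell and one code cell (illustrative content).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": "print('hello')",
        },
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.ipynb"  # hypothetical file name
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

# List the type of every cell found in the file.
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

The same approach works on any existing notebook file: open it, parse it with `json.loads`, and walk the `cells` list.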


## Architecture

The Jupyter (IPython) kernel is a separate process that runs user code and handles related tasks such as computing possible completions. Frontends, such as the notebook, communicate with the kernel using JSON messages sent over ZeroMQ sockets.
 
<img src="http://www.cs.unb.ca/~sray/teaching/datascience/architecture.png"/>
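To make the idea of "JSON messages" concrete, here is a simplified sketch of what a frontend might send when asking the kernel to run a cell. It only shows the general shape (header, parent header, metadata, content). The real protocol also signs each message and splits it into several ZeroMQ frames, which this sketch leaves out. The helper `make_execute_request`, the user name, and the version string are assumptions for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id):
    """Build a simplified execute_request message as a plain dictionary."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": "student",  # placeholder user name
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",  # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {"code": code, "silent": False},
    }


message = make_execute_request("print('hello')", session_id=uuid.uuid4().hex)
wire_text = json.dumps(message)  # what a frontend would serialize
decoded = json.loads(wire_text)  # what a kernel would parse on its side
print(decoded["header"]["msg_type"], "->", decoded["content"]["code"])
```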


Later sections cover four IPython features:

- Help
- Tab Completion
- Shell Commands
- Magic Commands

## Why should I learn Jupyter Notebook?

### Reason 1

It optimizes the workflow for data science. 


### Reason 2 
The Jupyter environment is at the intersection of innovation in both the scientific computing and data science communities. By staying abreast of advances in Jupyter, you will be able to adopt innovations in your own work far more quickly.

### Help

The main IPython help panel can be opened with a single **`?`**.

In [4]:
?

Adding a **`?`** after an object name will show rich details about that object, including: docstrings, function signatures, and constructor details.

In [5]:
import pandas as pd



In [7]:
pd.read_csv??

Adding a **`??`** after an object name will show the full source code of that object, if it is available.
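The `?` and `??` helpers only work inside IPython. In a plain Python script, roughly the same information can be pulled from the standard `inspect` module. The sketch below uses `textwrap.dedent` as the target only because it is a small pure-Python function that ships with Python; any function with available source would do.

```python
import inspect
import textwrap

target = textwrap.dedent  # any pure-Python function works here

signature = inspect.signature(target)  # roughly the signature shown by `?`
docstring = inspect.getdoc(target)     # roughly the docstring shown by `?`
source = inspect.getsource(target)     # roughly the source shown by `??`

print(f"{target.__name__}{signature}")
print(docstring.splitlines()[0])
print(source.splitlines()[0])
```

`inspect.getsource` raises an error for objects implemented in C, which matches the "if it is available" caveat for `??`.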

A quick reference card for IPython can also be opened at any time.

In [8]:
%quickref

### Tab Completion

Code cells support tab completion, which is a convenient way to explore the attribute structure of an object and the symbols available in the Python namespace. It can be triggered by partially typing an object name and then pressing **`<TAB>`**. It can also be used to complete file and directory names.

In [15]:
pd.  # press <TAB> here to list attributes; running the cell as-is raises the SyntaxError below

SyntaxError: invalid syntax (<ipython-input-15-bc888235687a>, line 1)
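Tab completion is an interactive feature, but the underlying idea of listing an object's attributes that match a prefix can be approximated in ordinary code with `dir()`. The helper name `complete` below is hypothetical, and IPython's real completer is considerably more capable; this is only a sketch of the concept, using the standard `math` module as the object to explore.

```python
import math


def complete(obj, prefix):
    """Return the public attribute names of obj that start with prefix."""
    return sorted(
        name
        for name in dir(obj)
        if name.startswith(prefix) and not name.startswith("_")
    )


matches = complete(math, "is")
print(matches)
```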

### Shell Commands

It's also possible to execute commands directly on the underlying operating system. Expressions which are prefixed with a **`!`** are passed along to the system shell. This is typically Bash on Unix/Linux systems, and CMD.exe on Windows.

In [9]:
!dir

 Volume in drive C has no label.
 Volume Serial Number is 643D-BFC4

 Directory of C:\_work\_Teaching\CS2545_DataScienceForBigDataAnalysis\Fall2021\Code\Handson\Handson1

09/17/2021  12:58 PM    <DIR>          .
09/17/2021  12:58 PM    <DIR>          ..
09/16/2021  07:30 PM    <DIR>          .ipynb_checkpoints
09/16/2021  04:55 PM    <DIR>          2018
09/16/2021  06:18 PM    <DIR>          2020
09/17/2021  12:04 PM             4,554 Handson1_F2021.ipynb
09/17/2021  12:58 PM            20,196 IntroductionToJupyterNotebook.ipynb
09/17/2021  09:25 AM    <DIR>          soln
09/17/2021  12:05 PM    <DIR>          tmp
               2 File(s)         24,750 bytes
               7 Dir(s)  54,453,563,392 bytes free


# hello world

In [18]:
i = "CS2545 class"
print("hello world", i)

hello world CS2545 class




In [10]:
!ls

'ls' is not recognized as an internal or external command,
operable program or batch file.


The output of the shell command can be stored as a Python variable. The command output is split on newlines and returned as a Python list of strings.

In [11]:
files = !dir
print("My current directory's files:")
print(files)

My current directory's files:
[' Volume in drive C has no label.', ' Volume Serial Number is 643D-BFC4', '', ' Directory of C:\\_work\\_Teaching\\CS2545_DataScienceForBigDataAnalysis\\Fall2019\\Code\\Handson\\Handson1', '', '09/13/2019  12:42 PM    <DIR>          .', '09/13/2019  12:42 PM    <DIR>          ..', '09/12/2019  03:10 PM    <DIR>          .ipynb_checkpoints', '09/16/2016  03:29 PM            30,974 architecture.png', '09/16/2016  03:19 PM           101,277 cat.png', '09/15/2017  11:37 AM            42,297 D2L.png', '08/25/2019  03:59 PM             1,206 flower.py', '09/13/2019  12:20 PM             4,111 Handson1_F2019.ipynb', '08/22/2019  05:59 PM    <DIR>          images', '09/13/2019  12:42 PM            12,164 IntroductionToJupyterNotebook.ipynb', '09/14/2017  06:51 PM            17,288 Notebook_tutorial.ipynb', '08/25/2019  03:59 PM             1,396 pie.py', '08/25/2019  03:59 PM             2,012 polygon.py', '08/25/2019  05:00 PM             2,915 polygon.pyc', '09
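The `!` syntax is specific to IPython. Outside a notebook, a similar result (command output as a list of lines) can be obtained with the standard `subprocess` module. To keep the sketch portable across Windows and Unix, the child command here is the Python interpreter itself rather than `dir` or `ls`; that choice is an assumption made for illustration.

```python
import subprocess
import sys

# Run a child process and capture its standard output as text.
# The child is the current Python interpreter, so this works on any OS.
result = subprocess.run(
    [sys.executable, "-c", "print('first'); print('second')"],
    capture_output=True,
    text=True,
    check=True,
)

# Split the captured text on newlines, much like `files = !dir` does.
lines = result.stdout.splitlines()
print(lines)
```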

### Magic Commands

The IPython 'magic' functions are a set of commands, invoked by prepending one or two **`%`** signs to their name.  

Magics invoked with a single `%` are "line magics" and act on the rest of the line.  Magics invoked with `%%` are "cell magics" and act on the rest of the cell.

In [10]:
%hist

print ("hello world")
i = "CS2545 class"
print ("hello world" + i)
i = "CS2545 class"
print ("hello world", i)
?
import pandas as pd
pd.read_csv??
pd.read_csv??
%quickref
!dir
%hist


Show documentation for the magic system.

In [11]:
%magic

Use the `timeit` magic for timing snippets of code.

In [12]:
%timeit range(10)

1000000 loops, best of 3: 453 ns per loop


In [13]:
%%timeit
range(10)
range(100)

1000000 loops, best of 3: 1.96 µs per loop
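The `%timeit` and `%%timeit` magics are conveniences built on the same idea as Python's standard `timeit` module, which can be used directly in scripts. The sketch below times the same statements as the cells above; the loop count of 100,000 is an arbitrary choice, and the numbers printed will differ from machine to machine.

```python
import timeit

# Time a single statement, similar in spirit to `%timeit range(10)`.
loops = 100_000
total_seconds = timeit.timeit("range(10)", number=loops)
per_loop_ns = total_seconds / loops * 1e9
print(f"{loops} loops: {per_loop_ns:.0f} ns per loop")

# Time several statements together, similar in spirit to a `%%timeit` cell.
# repeat() returns one total per run; the minimum is the least noisy estimate.
runs = timeit.repeat("range(10)\nrange(100)", number=loops, repeat=3)
best_ns = min(runs) / loops * 1e9
print(f"best of {len(runs)}: {best_ns:.0f} ns per loop")
```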


Show all magic commands available for the system.

In [14]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cd  %clear  %cls  %colors  %config  %connect_info  %copy  %ddir  %debug  %dhist  %dirs  %doctest_mode  %echo  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %macro  %magic  %matplotlib  %mkdir  %more  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %ren  %rep  %rerun  %reset  %reset_selective  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%cmd  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%rub

## Jupyter Notebook
---

A new notebook can be created from the dashboard by clicking on the Files tab, then the **`New`** dropdown button, and then selecting the language of choice for the notebook.


### Header

At the top of the notebook document is a header which contains the notebook title, a menubar, and toolbar. This header remains fixed at the top of the screen, even as the body of the notebook is scrolled. The title can be edited in-place (which renames the notebook file), and the menubar and toolbar contain a variety of actions which control notebook navigation and document structure.


### Body

The body of a notebook is composed of cells. Each cell contains either markdown, code input/output, or raw text. Cells can be included in any order and edited at will, which gives a great deal of flexibility for constructing a narrative.

- **Markdown cells** - These are used to build a nicely formatted narrative around the code in the document. The majority of this lesson is composed of markdown cells.

- **Code cells** - These are used to define the computational code in the document. They have two parts: the *input* editor where the user types the code to be executed, and the *output* region which is the representation of the executed code. Depending on the code, this representation may be a simple scalar value, or something more complex like a plot or an interactive widget.

- **Raw cells** - These are used when text needs to be included in raw form, without execution or transformation.
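The two parts of a code cell, input and output, are also visible in how the cell is stored. The sketch below shows a code cell as it might appear inside a notebook's JSON, assuming the version 4 notebook format, and two hypothetical helpers (`cell_input` and `cell_stdout`) that pull out each part.

```python
# A code cell as it might be stored in a notebook file (illustrative values).
code_cell = {
    "cell_type": "code",
    "execution_count": 14,
    "metadata": {},
    "source": ['print("hello")'],
    "outputs": [
        {"output_type": "stream", "name": "stdout", "text": ["hello\n"]},
    ],
}


def cell_input(cell):
    """Join the source lines into the text shown in the input editor."""
    return "".join(cell["source"])


def cell_stdout(cell):
    """Collect the text the cell printed to standard output."""
    return "".join(
        "".join(output["text"])
        for output in cell.get("outputs", [])
        if output["output_type"] == "stream" and output["name"] == "stdout"
    )


print(f"In [{code_cell['execution_count']}]: {cell_input(code_cell)}")
print(cell_stdout(code_cell), end="")
```

Richer outputs such as plots or widgets are stored as other output types, which this sketch ignores.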

### Running Code

Run a code cell using **`Shift-Enter`** or pressing the <button class='btn btn-default btn-xs'><i class="fa fa-step-forward"></i></button> button in the toolbar above. This executes the cell and puts the cursor in the next cell below, or makes a new cell if the cursor is at the end.  Alternatively:

- **`Alt-Enter`** - run the cell and always insert a new cell below it (useful when inserting new content in the middle of an existing notebook).
- **`Ctrl-Enter`** - run the cell and keep the cursor in the same cell (useful for quick experiments with snippets that don't need to be kept permanently).

### Markdown


Rich text can be added to notebook cells using Markdown, a popular plain-text formatting syntax. This allows for authoring rich narratives to support the code in a notebook document.

*italic text*


**bold text**


Ordered List:

1. First
2. Second
3. Third


Unordered List:
* One
  - table
  - chair
* Two
  - cat
* Three
  - dog
  
Embedded image: <img src="http://www.cs.unb.ca/~sray/teaching/datascience/cat.png" />

Embedded Code:

```python
def f(x):
    """a docstring"""
    return x**2
```

In [14]:
print("hello")

hello
