# Jupyter Notebook

1. Is an application to build interactive computational notebooks. 
2. Requires a server that runs on your computer. 
3. Running the server:
    * execute `jupyter notebook` in the folder where all your notebooks are, so you can access them more easily
    * create a link to start it in the Apps menu under the Anaconda folder

### Jupyter cells

Notebooks are composed of many cells, which can contain descriptions (like this one) or executable code. Types of cells are:
1. Markdown cells are used for **body text** and contain text in [Markdown format](https://www.markdownguide.org/cheat-sheet/). Within Markdown you can also write a *formula* in [LaTeX](https://www.latex-project.org/) notation, enclosed in dollar signs:
$f(x) = x^3$, or write code by opening a block with three backticks and the `python` keyword (see the documentation for more features):
```python
a = 3
b = 2 + 3
c = a * b
```

2. Code cells are the core cells of a notebook. They contain code in the language of the document's associated kernel (in our case Python 3) and an output area showing the result of that code.


### Notebook shortcuts

There are two modes in the Jupyter Notebook: **command mode** and **edit mode**.

- `Enter` enters edit mode
- `Esc` enters command mode
- `Shift + Enter` run the current cell, select below
- `Ctrl + Enter` run the current cell

In command mode:
- `A` insert cell above
- `B` insert cell below
- `X` cut selected cells
- `C` copy selected cells
- `V` paste cells below
- `D`, `D` (press `D` twice) delete selected cells
- `Z` undo cell deletion
- `S` save the notebook
- `Y` change the cell type to Code
- `M` change the cell type to Markdown
- `Shift + Up` extend selected cells above
- `Shift + Down` extend selected cells below

### Test whether Pandas is correctly installed

In [None]:
import math
import pandas as pd

# Introduction to Python

* Python is a programming language. 
* Python is a scripting language: the code is simple and interpreted.
* Python is completely free and open-source, with many extensions (libraries) for data science.

### Python as a calculator

Let us now try our new notebook and use it as a calculator. Experiment a bit with simple expressions:

In [None]:
1+1

In [None]:
5*2+32

In [None]:
593433545324535454543245454522 * 43432432543245324325465445434255432

Python is a calculator with lots of memory. It can remember values using variables:

In [None]:
x = 3

In [None]:
4 + x

In [None]:
x = x + 1

In [None]:
x

### Functions and libraries

In [None]:
abs(-3)

In [None]:
math.log(10)

In [None]:
help(math.log)

It is easy to find documentation online. See [math — Mathematical functions](https://docs.python.org/3/library/math.html) for more details about the `math` library.

### My own functions

* A function definition starts with the word `def`, followed by the name of the function.
* The arguments follow in parentheses, and the line ends with a colon.
* The functionality is implemented in an indented block.
* The reserved word `return` returns the result.


We will now implement a function that returns the future value of some amount. The arguments are: present value, interest rate, and number of periods. A possible implementation of the future value function is:

In [None]:
def future_value(pv, r, n):
    result = pv * (1 + r)**n
    return result

In [None]:
future_value(100, 0.1, 10)

In [None]:
future_value(1000, 0.05, 100)

In [None]:
future_value(10,0.1,5)
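
To see what the formula does, it can help to compare it with the same calculation written out by hand. The sketch below repeats `future_value` so that it runs on its own; the names `by_hand` and `by_formula` are introduced here only for illustration.

```python
import math

def future_value(pv, r, n):
    # The amount grows by a factor of (1 + r) in each of the n periods.
    return pv * (1 + r)**n

# Three periods at 10%: multiply by (1 + r) once per period.
by_hand = 100 * 1.1 * 1.1 * 1.1
by_formula = future_value(100, 0.1, 3)

print(by_hand, by_formula)
# Floats are compared with a tolerance, because of rounding.
print(math.isclose(by_hand, by_formula))
```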

### Strings

Each variable in Python has a type. You have already seen integers (`int`) and floats (`float`).

Strings contain text. Strings are written within double quotes or single quotes. Both are fine, but we suggest you choose one and use it consistently.

In [None]:
a = 1

In [None]:
type(a)

In [None]:
a = 'Python'

In [None]:
type(a)

In [None]:
a + 2

In [None]:
a + '2'

In [None]:
a[2]

In [None]:
a[2:4]
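
The `a + 2` cell above fails because `+` is not defined between a string and a number. A minimal sketch of the failure and one way around it, using the illustrative variable name `word`:

```python
word = 'Python'

try:
    word + 2
except TypeError as error:
    # Adding an int to a str is not defined.
    print('TypeError:', error)

print(word + str(2))   # convert the number to a string first
print(word[2])         # positions start at 0, so this is the third character
print(word[2:4])       # from position 2 up to, but not including, position 4
```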

### Lists and control statements

Up until now we have learned about three types of values: `int`, `float`, and `str`. A variable of one of these types holds a single value: an integer, a float, or a string.

A list can contain several values. It is a sequence of values of any type, where each value has its position.

Lists are used to group data together (not necessarily of the same type); for example, a sequence of data such as the cash flows of some project. Lists are initialized with square brackets:

In [None]:
cash_flows = [-100, 20, 40, 50]

In [None]:
cash_flows

In [None]:
cash_flows.append(5)

In [None]:
cash_flows

Lists can be changed in several ways, such as *appending*, *removing*, *finding elements*, etc.

In [None]:
40 in cash_flows

We can concatenate two lists:

In [None]:
cash_flows + [7,6]
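
The operations mentioned above can be sketched on a separate list, so that `cash_flows` itself stays untouched. The name `flows` is a hypothetical copy used only in this example.

```python
flows = [-100, 20, 40, 50]   # hypothetical copy of the example data

flows.append(5)              # add a value at the end
flows.remove(20)             # remove the first occurrence of 20
print(flows)
print(flows.index(40))       # position of the value 40
print(len(flows))            # number of elements

longer = flows + [7, 6]      # concatenation builds a new list
print(longer)
print(flows)                 # the original list is not changed by +
```

Note that `append` and `remove` change the list in place, while `+` returns a new list.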

### For loop

Looping is one of the main reasons why computers are so powerful. They can repeatedly perform the same task again and again without ever getting tired. 

One of the two loops that Python has is the `for` loop. For is actually *for each*, as it will iterate over elements in a list (or an iterator, which you do not need to know yet). For now we will simply assume that `for` iterates over a list of items.

Syntax:
```python
for v in list_var:
    sentence_1
    sentence_2
    ...
```

A `for` statement begins with the reserved word `for`, followed by a variable name that is assigned the next value from the list in every iteration, then the reserved word `in`, and finally the list name. The line ends with a colon, which declares the start of an indented block.

A `for` statement is followed by an indented block of lines. In Python, indentation is the primary means of defining code blocks. For a `for` loop, the indented lines are those that execute in every iteration.

To illustrate this principle, run the following two examples. Having one print clause will simply print every value in its own line:

In [None]:
for c in cash_flows:
    print(c)

Multiply each value by 100 and print it:

In [None]:
for c in cash_flows:
    c1 = c * 100
    print(c1)

This one will be a bit harder. How to sum up all values using a for loop?

In [None]:
s = 0
for c in cash_flows:
    s = s + c
print(s)

In [None]:
sum(cash_flows)
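
The same pattern, starting from an initial value and updating it in every iteration, also works for building a new list. A small sketch, where `scaled` is a name introduced only for this illustration:

```python
cash_flows = [-100, 20, 40, 50, 5]

scaled = []                  # start with an empty list
for c in cash_flows:
    scaled.append(c * 100)   # indented: runs once per element
print(scaled)                # not indented: runs once, after the loop
```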

### IF statement

`if` statements implement simple decision making in programming: branching between alternative paths depending on some condition. The syntax of an `if` statement is:
```python
if condition_1:
    statement_1
elif condition_2:
    statement_2
elif condition_3:
    ...
else:
    statement_n
```

Only the first condition is necessary. The number of `elif` is arbitrary, whereas there can be only one `else`, which is without condition. You can also just skip `else`.
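
As a concrete instance of this syntax, the sketch below labels a single cash flow. The function name `describe_flow` and the labels are hypothetical, chosen only for this example.

```python
def describe_flow(c):
    # Hypothetical helper: exactly one of the three branches runs.
    if c > 0:
        return 'inflow'
    elif c < 0:
        return 'outflow'
    else:
        return 'zero'

print(describe_flow(20))
print(describe_flow(-100))
print(describe_flow(0))
```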

Exercise: extend the `for` loop that calculates the sum so that it notes at each step whether the running sum is positive or negative.

In [None]:
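s = 0
for c in cash_flows:
    s = s + c
    # Report the sign of the running sum after each cash flow.
    if s > 0:
        print(s, 'positive')
    elif s < 0:
        print(s, 'negative')
    else:
        print(s, 'zero')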

# Introduction to Pandas

* Open-source Python library, often used for data science.
* Easy-to-use data structures (filtering, pivoting, data cleaning, handling missing data, etc.)
* High-performance (uses NumPy)

Two main structures for representing data are:
* `Series` for (time) series functionality, and
* `DataFrame` for tabular data.

### Series

A Series behaves much like a Python list, but each value has a label (index), which can represent:
* position
* string representation
* dates for time series
* etc.

Series where index is position:

In [None]:
cfs = pd.Series(cash_flows)

In [None]:
cfs

In [None]:
cfs[3]

But we can also set the year as the index (it would be better to use a `datetime` object, but more on that later):

In [None]:
cfs2 = pd.Series(cash_flows, index=[2013, 2014, 2015, 2016, 2017])

In [None]:
cfs2

### Accessing elements of a Series

Two methods:
* `.loc` for accessing by index label
* `.iloc` for accessing by position

In [None]:
cfs2[2015]

In [None]:
cfs2.loc[2015]

In [None]:
cfs2.iloc[3]
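
The difference between the two methods is easiest to see when labels and positions disagree. A minimal sketch that rebuilds the year-indexed series so it runs on its own:

```python
import pandas as pd

cfs2 = pd.Series([-100, 20, 40, 50, 5],
                 index=[2013, 2014, 2015, 2016, 2017])

by_label = cfs2.loc[2015]    # the value whose label is 2015
by_position = cfs2.iloc[3]   # the fourth value, whose label is 2016
last = cfs2.iloc[-1]         # negative positions count from the end

print(by_label, by_position, last)
```

With an integer index such as the years used here, plain brackets like `cfs2[2015]` look up the label, not the position, so writing `.loc` or `.iloc` explicitly avoids ambiguity.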

We can use various functions on Series, such as `sum` or `std`. See `help(pd.Series)` for more functions.

In [None]:
cfs2.sum()

In [None]:
cfs2.std()
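
One detail worth knowing is that `std` in pandas computes the sample standard deviation (dividing by n - 1) by default. A short check against Python's `statistics` module, with the names `values` and `series` chosen for illustration:

```python
import math
import statistics
import pandas as pd

values = [-100, 20, 40, 50, 5]
series = pd.Series(values)

print(series.sum())
print(series.std())               # sample standard deviation (ddof=1)
print(statistics.stdev(values))   # same definition in the standard library
print(series.std(ddof=0))         # population standard deviation
```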

We can also extend a series with another series.

In [None]:
cfs2 = pd.concat([cfs2, pd.Series([20, 10], index=[2018, 2019])])

In [None]:
cfs2

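Question: does Series `cfs2` contain the added years 2018 and 2019? Note that `pd.concat` returns a new Series instead of modifying the original, which is why the result is assigned back to `cfs2`.

The index does not have to be unique.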

In [None]:
cfs2 = pd.concat([cfs2, pd.Series([25, 15], index=[2018, 2019])])

In [None]:
cfs2

This also means that filtering by 2018 will give more than one result. 

In [None]:
cfs2.loc[2018]
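
A small self-contained sketch of the same behaviour, using the hypothetical names `s1`, `s2`, and `combined`:

```python
import pandas as pd

s1 = pd.Series([20, 10], index=[2018, 2019])
s2 = pd.Series([25, 15], index=[2018, 2019])
combined = pd.concat([s1, s2])

print(combined.index.is_unique)   # False: 2018 and 2019 each appear twice
print(combined.loc[2018])         # a Series holding both values for 2018
print(len(s1))                    # still 2: concat did not change s1
```

When a label is duplicated, `.loc` returns a Series rather than a single value, so code that expects a number needs to handle both cases.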

### DataFrame

* For representing tabular data.
* Most used data structure in pandas.
* Arithmetic operations on rows and columns.

Usually, data frames are loaded from files. For a quick demonstration, we will manually create a toy example.

In [None]:
data = [
    ['Mojca', 'Bread', 1, 1.12],
    ['Mojca', 'Milk', 2, 0.9],
    ['Mojca', 'Cereal', 1, 2.79],
    ['Maja', 'Chocolate', 3, 1.39],
    ['Maja', 'Juice', 2, 0.89],
    ['Maja', 'Lettuce', 1, 1.19],
    ['Miha', 'Jack', 1, 29.99],
    ['Miha', 'Coca-cola', 3, 1.24],
    ['Miha', 'Juice', 3, 0.89]
]

In [None]:
df = pd.DataFrame(data, columns=['Name', 'Item', 'Quantity', 'Price'])

In [None]:
df

We can change the index to something else:

In [None]:
df = df.set_index('Name')

In [None]:
df

A data frame is composed of series, both vertically and horizontally.

In [None]:
df['Item']

To select several columns, use double brackets:

In [None]:
df[['Item', 'Price']]
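
The two bracket styles return different types, which the sketch below makes visible on a shortened copy of the toy table (the shortened data is only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [['Mojca', 'Bread', 1, 1.12], ['Maja', 'Juice', 2, 0.89]],
    columns=['Name', 'Item', 'Quantity', 'Price'],
).set_index('Name')

one_column = df['Item']               # single brackets -> Series
two_columns = df[['Item', 'Price']]   # list of names -> DataFrame
still_a_table = df[['Item']]          # list with one name -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(type(still_a_table).__name__)
```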

How do we get all products bought by Maja?

In [None]:
df.loc['Maja']

Use double brackets in `.loc` to access several indices:

In [None]:
df.loc[['Maja','Miha']]

You can also use other columns to select rows, but you need to specify the selection as a condition. In the condition, we can use all standard comparison operators.

For example, to get rows with quantity greater than or equal to 3 write:
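
A minimal sketch of such a condition, run on a shortened copy of the toy table so that the block works on its own (the name `mask` is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [['Maja', 'Chocolate', 3, 1.39],
     ['Maja', 'Juice', 2, 0.89],
     ['Miha', 'Jack', 1, 29.99],
     ['Miha', 'Coca-cola', 3, 1.24]],
    columns=['Name', 'Item', 'Quantity', 'Price'],
).set_index('Name')

mask = df['Quantity'] >= 3   # boolean Series, one True/False per row
print(mask)
print(df.loc[mask])          # keeps only the rows where mask is True
```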

Or where price is higher than 2 EUR:

In [None]:
df['Price'] > 2

In [None]:
df.loc[df['Price'] > 2]

In [None]:
df.loc[df.Price > 2]

We will now add another column to the DataFrame. It will represent the actual value of the item bought:

In [None]:
df.head()

In [None]:
df['Value'] = df['Quantity'] * df['Price']

In [None]:
df.Value = df.Quantity * df.Price

In [None]:
df['Value2'] = df.Quantity * df.Price

In [None]:
df['Purchase Value'] = df.Quantity * df.Price

In [None]:
df
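
The cells above mix bracket and attribute notation. One caveat, sketched below on a tiny hypothetical table: attribute notation can update an existing column, but it cannot create a new one. The column name `Total` is an assumption made only for this example.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'Quantity': [1, 2], 'Price': [1.12, 0.9]})

df['Value'] = df['Quantity'] * df['Price']   # brackets create the column
df.Value = df.Value * 2                      # attribute updates an existing column

with warnings.catch_warnings():
    warnings.simplefilter('ignore')          # pandas warns about the next line
    df.Total = df.Quantity * df.Price        # sets an attribute, NOT a column

print(list(df.columns))                      # 'Total' is missing
```

Bracket notation is also the only option for names that are not valid Python identifiers, such as `'Purchase Value'`.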

The following three descriptive tools can sometimes help to understand the data better:
* `.shape` gives the dimensions (an attribute, so it is written without parentheses)
* `.describe()` computes some basic statistics
* `.info()` provides properties of the columns

When you need only the index or only column names, use `.index` and `.columns`.
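
A short sketch of these tools on a shortened copy of the toy table (the shortened data is only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [['Mojca', 'Bread', 1, 1.12],
     ['Maja', 'Juice', 2, 0.89],
     ['Miha', 'Jack', 1, 29.99]],
    columns=['Name', 'Item', 'Quantity', 'Price'],
).set_index('Name')

print(df.shape)           # (rows, columns); the index is not counted
print(list(df.index))     # row labels
print(list(df.columns))   # column names
print(df.describe())      # statistics for the numeric columns
df.info()                 # column types and non-null counts; prints directly
```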