# Jupyter Notebook

1. Is an application for building interactive computational notebooks.
2. Requires a server that runs on your computer.
3. Running the server:
    * execute `jupyter notebook` in the folder that contains your notebooks, so you can access them more easily, or
    * use the Jupyter Notebook shortcut in the Apps menu under the Anaconda folder

### Jupyter cells

Notebooks are composed of many cells, which can contain descriptions (like this one) or executable code. Types of cells are:
1. Markdown cells are used for **body text** and contain text in [Markdown format](https://www.markdownguide.org/cheat-sheet/). Within Markdown you can also write a *formula* in [LaTeX](https://www.latex-project.org/) notation between dollar signs, e.g.
$f(x) = x^3$, or include code by enclosing it in three backticks followed by the language name, e.g. `python` (see the documentation for more features):
```python
a = 3
b = 2 + 3
c = a * b
```

2. Code cells are the core cells of a notebook. They contain code in the language of the notebook's kernel (in our case Python 3) and an output area showing the result of that code.


### Notebook shortcuts

There are two modes in the Jupyter Notebook: **command mode** and **edit mode**.

- `Enter` enters edit mode
- `Esc` enters command mode
- `Shift + Enter` run the current cell and select the cell below
- `Ctrl + Enter` run the current cell and stay on it

In command mode:
- `A` insert cell above
- `B` insert cell below
- `X` cut selected cells
- `C` copy selected cells
- `V` paste cells below
- `DD` (press twice) delete selected cells
- `Z` undo cell deletion
- `S` save the notebook
- `Y` change the cell type to Code
- `M` change the cell type to Markdown
- `Shift + Up` extend selected cells above
- `Shift + Down` extend selected cells below

### Test whether Pandas is correctly installed

In [1]:
import math
import pandas as pd
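To double-check the installation, you can also print the installed pandas version. `pd.__version__` holds the version string. The exact value depends on your installation.

```python
import pandas as pd

# If the import above succeeded, pandas is installed; the version string
# shows which release you have.
print(pd.__version__)
```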

# Introduction to Python

* Python is a programming language.
* Python is a scripting language: the code is simple and interpreted
* Python is completely free and open-source, with many extensions (libraries) for data science

### Python as a calculator

Let us now try our new notebook and use it as a calculator. Experiment a bit with simple expressions:

In [2]:
1+1

2

In [3]:
5*2+32

42

In [4]:
593433545324535454543245454522 * 43432432543245324325465445434255432

25774262426206802856476660580347848176723532740223409825487463504

Python is a calculator with lots of memory. It can remember values using variables:

In [5]:
x = 3

In [6]:
4 + x

7

In [7]:
x = x + 1

In [8]:
x

4

### Functions and libraries

In [9]:
abs(-3)

3

In [10]:
math.log(10)

2.302585092994046

In [11]:
help(math.log)

Help on built-in function log in module math:

log(...)
    log(x, [base=math.e])
    Return the logarithm of x to the given base.

    If the base is not specified, returns the natural logarithm (base e) of x.



It is easy to find documentation online. See [math — Mathematical functions](https://docs.python.org/3/library/math.html) for more details about the `math` library.

In [12]:
math.log(10, 2)

3.3219280948873626

### My own functions

* A function starts with the keyword `def`, followed by the name of the function.
* The arguments follow in parentheses, and the line ends with a colon.
* The function body is written as an indented block.
* The `return` keyword returns the result.


We will now implement a function that returns the future value of some amount. Its arguments are the present value, the interest rate and the number of periods. A possible implementation is:
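The function implements the compound-interest formula below. Here $PV$ is the present value (`pv`), $r$ is the interest rate per period (`r`) and $n$ is the number of periods (`n`). The worked example corresponds to the call `future_value(100, 0.1, 10)`:

$$
FV = PV \cdot (1 + r)^{n}, \qquad \text{e.g.} \quad 100 \cdot (1 + 0.1)^{10} \approx 259.37
$$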

In [13]:
def future_value(pv, r, n):
    result = pv * (1 + r)**n
    return result

In [14]:
future_value(100, 0.1, 10)

259.3742460100002

In [15]:
future_value(1000, 0.05, 100)

131501.257846304

In [16]:
future_value(10,0.1,5)

16.105100000000007

### Strings

Every value in Python has a type. You have already seen integers (`int`) and floats (`float`).

Strings hold text. They are written between double quotes or single quotes. Both work, but we suggest choosing one and using it consistently.
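A short sketch showing that both quote styles produce the same string. Mixing them is convenient when the text itself contains a quote character:

```python
# Single and double quotes create exactly the same kind of string.
s1 = 'Python'
s2 = "Python"
print(s1 == s2)        # True
print(type(s2))        # <class 'str'>

# Double quotes let the text contain an apostrophe without escaping.
quote = "It's easy"
print(quote)
```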

In [17]:
a = 1

In [18]:
type(a)

int

In [19]:
a = 'Python'

In [20]:
type(a)

str

In [21]:
a + 2

TypeError: can only concatenate str (not "int") to str

In [22]:
a + '2'

'Python2'

In [23]:
a[2]

't'

In [24]:
a[0]

'P'

In [25]:
a[2:4]

'th'
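Indexing starts at 0. A slice `a[start:stop]` includes `start` but excludes `stop`, which is why `a[2:4]` returns two characters. A few more cases, as a sketch:

```python
a = 'Python'

print(a[:2])    # prints Py: an omitted start means "from the beginning"
print(a[2:])    # prints thon: an omitted stop means "to the end"
print(a[-1])    # prints n: negative indices count from the end
print(len(a))   # prints 6: the number of characters
```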

### Lists and control statements

So far we have seen three types of values: `int`, `float` and `str`. A variable of one of these types holds a single value: an integer, a float or a string.

A list can hold several values. It is a sequence of values of any type, where each value has its own position.

Lists group data together (not necessarily of the same type), for example a sequence of data such as the cash flows of some project. Lists are created with square brackets:

In [26]:
cash_flows = [-100, 20, 40, 50]

In [27]:
cash_flows

[-100, 20, 40, 50]

In [28]:
cash_flows.append(5)

In [29]:
cash_flows

[-100, 20, 40, 50, 5]

Lists support several operations, such as *appending*, *removing* and *finding elements*:

In [30]:
40 in cash_flows

True

In [31]:
3 in cash_flows

False

In [32]:
cash_flows.append(5)

In [33]:
cash_flows

[-100, 20, 40, 50, 5, 5]

In [34]:
cash_flows.remove(5)

In [35]:
cash_flows

[-100, 20, 40, 50, 5]
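`remove` deletes only the *first* occurrence of a value, which is why one `5` is still left above. Removing a value that is not in the list raises a `ValueError`, so checking membership with `in` first is a simple safeguard. The sketch below works on a separate list named `flows` (an illustrative name), so `cash_flows` is left untouched:

```python
flows = [-100, 20, 40, 50, 5, 5]

flows.remove(5)          # removes only the first 5
print(flows)             # [-100, 20, 40, 50, 5]

# Guard against ValueError when the value might be missing.
if 3 in flows:
    flows.remove(3)
print(flows)             # unchanged: [-100, 20, 40, 50, 5]
```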

We can concatenate two lists:

In [36]:
cash_flows + [7,6]

[-100, 20, 40, 50, 5, 7, 6]
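Note that `+` builds a *new* list and leaves the original unchanged. To grow a list in place, use `extend` (or `append` for a single element). A short sketch with a separate, illustrative list `flows`:

```python
flows = [-100, 20, 40, 50, 5]

combined = flows + [7, 6]   # new list; flows is not modified
print(flows)                # [-100, 20, 40, 50, 5]
print(combined)             # [-100, 20, 40, 50, 5, 7, 6]

flows.extend([7, 6])        # modifies flows in place
print(flows)                # [-100, 20, 40, 50, 5, 7, 6]
```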

### For loop

Looping is one of the main reasons why computers are so powerful. They can repeatedly perform the same task again and again without ever getting tired. 

Python has two kinds of loops. The first is the `for` loop. `for` really means *for each*: it iterates over the elements of a list (or of another iterable, which you do not need to know about yet). For now we will simply assume that `for` iterates over a list of items.

Syntax:
```python
for v in list_var:
    statement_1
    statement_2
    ...
```

A `for` statement begins with the keyword `for`, followed by a variable name. In each iteration, this variable is assigned the next value from the list. Next comes the keyword `in`, and finally the list. The line ends with a colon, which starts an indented block.

The `for` line is followed by an indented block of lines. In Python, indentation is the primary means of defining blocks of code. For a `for` loop, the indented lines are executed once in every iteration.

To illustrate this, run the following examples. A loop with a single `print` call prints each value on its own line:

In [37]:
for c in cash_flows:
    print(c)

-100
20
40
50
5


Multiply each value by 100 and print it:

In [38]:
for c in cash_flows:
    c1 = c * 100
    print(c1)

-10000
2000
4000
5000
500


This one is a bit harder: how do we sum all values using a `for` loop?

In [39]:
s = 0
for c in cash_flows:
    s = s + c
print(s)

15

The built-in `sum` function gives the same result:

In [40]:
sum(cash_flows)

15

### IF statement

An `if` statement implements simple decision making: the program branches between alternative paths depending on some condition. The syntax of the `if` statement is:
```python
if condition_1:
    statement_1
elif condition_2:
    statement_2
elif condition_3:
    ...
else:
    statement_n
```

Only the `if` branch is required. There can be any number of `elif` branches and at most one `else`, which takes no condition. The `else` branch can also be omitted.

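Exercise: extend the `for` loop that calculates the sum so that, at each step, it also reports whether the running sum is positive or negative. As a first step, the cell below uses `if` inside the loop to sum only the positive cash flows: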

In [41]:
s = 0
for c in cash_flows:
    if c > 0:
        s = s + c
print(s)

115
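One possible solution to the exercise, as a sketch: the running sum is updated in every iteration, and an `if`/`elif`/`else` chain reports its sign. The list is redefined with the same values as `cash_flows` above so that the block runs on its own:

```python
cash_flows = [-100, 20, 40, 50, 5]   # same values as above

s = 0
notes = []                           # collects the note for each step
for c in cash_flows:
    s = s + c
    if s > 0:
        note = 'positive'
    elif s < 0:
        note = 'negative'
    else:
        note = 'zero'
    notes.append(note)
    print(s, note)
```

With these cash flows the loop prints `-100 negative`, `-80 negative`, `-40 negative`, `10 positive` and `15 positive`.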


# Introduction to Pandas

* Open-source Python library, widely used in data science.
* Easy-to-use data structures and tools (filtering, pivoting, data cleaning, handling missing data, etc.)
* High performance (built on NumPy)

Two main structures for representing data are:
* `Series` for (time) series functionality, and
* `DataFrame` for tabular data.

### Series

A Series behaves much like a Python list, except that each value has a label (index), which can represent:
* a position
* a string label, such as a name
* dates, for time series
* etc.
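For example, here is a Series whose index consists of string labels. The item names and prices are borrowed from the shopping example later in this notebook:

```python
import pandas as pd

prices = pd.Series([1.12, 0.9, 2.79], index=['Bread', 'Milk', 'Cereal'])
print(prices)
print(prices['Milk'])   # access by label: 0.9
```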

A Series whose index is the position:

In [42]:
cfs = pd.Series(cash_flows)

In [43]:
cfs

0   -100
1     20
2     40
3     50
4      5
dtype: int64

In [44]:
cfs[3]

50

In [45]:
cfs[2:4]

2    40
3    50
dtype: int64

We can also use years as the index (for time series, a `datetime` index would be more appropriate, but more on that later):

In [46]:
cfs2 = pd.Series(cash_flows, index=[2013, 2014, 2015, 2016, 2017])

In [47]:
cfs2

2013   -100
2014     20
2015     40
2016     50
2017      5
dtype: int64

### Accessing elements of a Series

Two accessors:
* `.loc` for accessing by label (index value)
* `.iloc` for accessing by position

In [48]:
cfs2[2015]

40

In [49]:
cfs2.loc[2015]

40

In [50]:
cfs2.iloc[3]

50
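On an integer index such as the years in `cfs2`, labels and positions are easy to confuse. `cfs2[2015]` above works because plain `[]` treated 2015 as a label. Using `.loc` and `.iloc` makes the intent explicit. In this sketch, `cfs2` is rebuilt with the same values so the block runs on its own:

```python
import pandas as pd

cfs2 = pd.Series([-100, 20, 40, 50, 5], index=[2013, 2014, 2015, 2016, 2017])

print(cfs2.loc[2013])    # label 2013: -100
print(cfs2.iloc[0])      # position 0: -100 (the same element)
print(cfs2.iloc[-1])     # last position: 5

# cfs2.loc[0] would raise a KeyError: there is no label 0 in the index.
```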

We can call various methods on a Series, such as `sum` or `std`. See `help(pd.Series)` for more.

In [51]:
cfs2.sum()

15

In [52]:
cfs2.std()

60.166435825965294
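Note that pandas' `std` computes the *sample* standard deviation by default: it divides by $n-1$, i.e. `ddof=1`. Pass `ddof=0` for the population standard deviation. A sketch on the same five cash flows:

```python
import pandas as pd

cfs2 = pd.Series([-100, 20, 40, 50, 5], index=[2013, 2014, 2015, 2016, 2017])

print(cfs2.std())        # sample std, ddof=1 (default): about 60.17
print(cfs2.std(ddof=0))  # population std: about 53.81
```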

We can also extend a Series with another Series using `pd.concat`, which returns a new Series:

In [53]:
cfs2 = pd.concat([cfs2, pd.Series([20, 10], index=[2018, 2019])])

In [54]:
cfs2

2013   -100
2014     20
2015     40
2016     50
2017      5
2018     20
2019     10
dtype: int64

Question: does `cfs2` now contain the years 2018 and 2019? Compare it with the list `cash_flows`, which can only be indexed by position:

In [55]:
cash_flows[2018]

IndexError: list index out of range

In [56]:
cfs2[2018], cfs2[2019]

(20, 10)

The index does not have to be unique:

In [57]:
cfs2 = pd.concat([cfs2, pd.Series([25, 15], index=[2018, 2019])])

In [58]:
cfs2

2013   -100
2014     20
2015     40
2016     50
2017      5
2018     20
2019     10
2018     25
2019     15
dtype: int64

This also means that selecting the label 2018 returns more than one value:

In [59]:
cfs2.loc[2018]

2018    20
2018    25
dtype: int64
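Because labels may repeat, it can be worth checking whether an index is unique before relying on label lookups that should return a single value. A short sketch using the index's `is_unique` attribute:

```python
import pandas as pd

a = pd.Series([20, 10], index=[2018, 2019])
b = pd.Series([25, 15], index=[2018, 2019])
combined = pd.concat([a, b])

print(a.index.is_unique)         # True
print(combined.index.is_unique)  # False: 2018 and 2019 appear twice
print(combined.loc[2019])        # a Series holding both values, 10 and 15
```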

### DataFrame

* For representing tabular data.
* The most used data structure in pandas.
* Supports arithmetic operations on rows and columns.

DataFrames are usually loaded from files. For a quick demonstration, we will create a toy example manually.

In [60]:
data = [
    ['Mojca', 'Bread', 1, 1.12],
    ['Mojca', 'Milk', 2, 0.9],
    ['Mojca', 'Cereal', 1, 2.79],
    ['Maja', 'Chocolate', 3, 1.39],
    ['Maja', 'Juice', 2, 0.89],
    ['Maja', 'Lettuce', 1, 1.19],
    ['Miha', 'Jack', 1, 29.99],
    ['Miha', 'Coca-cola', 3, 1.24],
    ['Miha', 'Juice', 3, 0.89]
]

In [61]:
df = pd.DataFrame(data, columns=['Name', 'Item', 'Quantity', 'Price'])

In [62]:
df

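|   | Name | Item | Quantity | Price |
|---|---|---|---|---|
| 0 | Mojca | Bread | 1 | 1.12 |
| 1 | Mojca | Milk | 2 | 0.9 |
| 2 | Mojca | Cereal | 1 | 2.79 |
| 3 | Maja | Chocolate | 3 | 1.39 |
| 4 | Maja | Juice | 2 | 0.89 |
| 5 | Maja | Lettuce | 1 | 1.19 |
| 6 | Miha | Jack | 1 | 29.99 |
| 7 | Miha | Coca-cola | 3 | 1.24 |
| 8 | Miha | Juice | 3 | 0.89 |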
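As mentioned above, DataFrames are usually loaded from files. Below is a minimal sketch with `pd.read_csv`. To keep it self-contained, the CSV text is held in memory with `io.StringIO` instead of a real file. With a file on disk you would pass its path instead, e.g. a hypothetical `'purchases.csv'`:

```python
import io
import pandas as pd

csv_text = """Name,Item,Quantity,Price
Mojca,Bread,1,1.12
Mojca,Milk,2,0.9
Maja,Chocolate,3,1.39
"""

df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
print(df_csv.dtypes)   # Quantity is parsed as an integer column, Price as float
```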


We can use a column (here `Name`) as the index instead. `set_index` returns a new DataFrame, so we assign it back to `df`:

In [63]:
df = df.set_index('Name')

In [64]:
df

| Name | Item | Quantity | Price |
|---|---|---|---|
| Mojca | Bread | 1 | 1.12 |
| Mojca | Milk | 2 | 0.9 |
| Mojca | Cereal | 1 | 2.79 |
| Maja | Chocolate | 3 | 1.39 |
| Maja | Juice | 2 | 0.89 |
| Maja | Lettuce | 1 | 1.19 |
| Miha | Jack | 1 | 29.99 |
| Miha | Coca-cola | 3 | 1.24 |
| Miha | Juice | 3 | 0.89 |


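A DataFrame is made up of Series: each column is a Series, and each row can also be retrieved as a Series. A single column can be selected with brackets or, if its name is a valid Python identifier, as an attribute: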

In [65]:
df['Item']

Name
Mojca        Bread
Mojca         Milk
Mojca       Cereal
Maja     Chocolate
Maja         Juice
Maja       Lettuce
Miha          Jack
Miha     Coca-cola
Miha         Juice
Name: Item, dtype: object

In [66]:
df.Item

Name
Mojca        Bread
Mojca         Milk
Mojca       Cereal
Maja     Chocolate
Maja         Juice
Maja       Lettuce
Miha          Jack
Miha     Coca-cola
Miha         Juice
Name: Item, dtype: object
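To confirm that columns and rows are Series, you can check their types. The sketch below uses a two-row excerpt of the data. The variable name `small` is just for illustration:

```python
import pandas as pd

small = pd.DataFrame(
    [['Mojca', 'Bread', 1, 1.12], ['Maja', 'Juice', 2, 0.89]],
    columns=['Name', 'Item', 'Quantity', 'Price'],
).set_index('Name')

column = small['Item']     # a column: Series indexed by Name
row = small.iloc[0]        # a row: Series indexed by the column names

print(type(column).__name__)   # Series
print(type(row).__name__)      # Series
print(list(row.index))         # ['Item', 'Quantity', 'Price']
```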

To select several columns, use a list of column names (double brackets):

In [67]:
df[['Item', 'Price']]

| Name | Item | Price |
|---|---|---|
| Mojca | Bread | 1.12 |
| Mojca | Milk | 0.9 |
| Mojca | Cereal | 2.79 |
| Maja | Chocolate | 1.39 |
| Maja | Juice | 0.89 |
| Maja | Lettuce | 1.19 |
| Miha | Jack | 29.99 |
| Miha | Coca-cola | 1.24 |
| Miha | Juice | 0.89 |


How do we get all products bought by Maja?

In [68]:
df.loc['Maja']

| Name | Item | Quantity | Price |
|---|---|---|---|
| Maja | Chocolate | 3 | 1.39 |
| Maja | Juice | 2 | 0.89 |
| Maja | Lettuce | 1 | 1.19 |


To select several index values, pass a list of labels (double brackets) to `.loc`:

In [69]:
df.loc[['Maja','Miha']]

| Name | Item | Quantity | Price |
|---|---|---|---|
| Maja | Chocolate | 3 | 1.39 |
| Maja | Juice | 2 | 0.89 |
| Maja | Lettuce | 1 | 1.19 |
| Miha | Jack | 1 | 29.99 |
| Miha | Coca-cola | 3 | 1.24 |
| Miha | Juice | 3 | 0.89 |


Rows can also be selected using the values in other columns, expressed as a condition. Conditions can use all standard comparison operators (`==`, `!=`, `<`, `<=`, `>`, `>=`).

For example, to get the rows where the quantity is greater than or equal to 3, first build the Boolean condition and then use it to filter:

In [70]:
df['Quantity'] >= 3

Name
Mojca    False
Mojca    False
Mojca    False
Maja      True
Maja     False
Maja     False
Miha     False
Miha      True
Miha      True
Name: Quantity, dtype: bool

In [71]:
df[df['Quantity'] >= 3]

| Name | Item | Quantity | Price |
|---|---|---|---|
| Maja | Chocolate | 3 | 1.39 |
| Miha | Coca-cola | 3 | 1.24 |
| Miha | Juice | 3 | 0.89 |


Or where price is higher than 2 EUR:

In [72]:
df['Price'] > 2

Name
Mojca    False
Mojca    False
Mojca     True
Maja     False
Maja     False
Maja     False
Miha      True
Miha     False
Miha     False
Name: Price, dtype: bool

In [73]:
df[df['Price'] > 2]

| Name | Item | Quantity | Price |
|---|---|---|---|
| Mojca | Cereal | 1 | 2.79 |
| Miha | Jack | 1 | 29.99 |


In [74]:
df.loc[df['Price'] > 2]

| Name | Item | Quantity | Price |
|---|---|---|---|
| Mojca | Cereal | 1 | 2.79 |
| Miha | Jack | 1 | 29.99 |


In [75]:
df.loc[df.Price > 2]

| Name | Item | Quantity | Price |
|---|---|---|---|
| Mojca | Cereal | 1 | 2.79 |
| Miha | Jack | 1 | 29.99 |
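Conditions can be combined with `&` (and), `|` (or) and `~` (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons. The sketch below rebuilds the same toy data so that it runs on its own:

```python
import pandas as pd

data = [
    ['Mojca', 'Bread', 1, 1.12], ['Mojca', 'Milk', 2, 0.9],
    ['Mojca', 'Cereal', 1, 2.79], ['Maja', 'Chocolate', 3, 1.39],
    ['Maja', 'Juice', 2, 0.89], ['Maja', 'Lettuce', 1, 1.19],
    ['Miha', 'Jack', 1, 29.99], ['Miha', 'Coca-cola', 3, 1.24],
    ['Miha', 'Juice', 3, 0.89],
]
df = pd.DataFrame(data, columns=['Name', 'Item', 'Quantity', 'Price']).set_index('Name')

# Items bought at least twice AND cheaper than 1 EUR
cheap_bulk = df[(df['Quantity'] >= 2) & (df['Price'] < 1)]
print(cheap_bulk)

# Items that cost more than 2 EUR OR were bought three times
pricey_or_many = df[(df['Price'] > 2) | (df['Quantity'] == 3)]
print(pricey_or_many)

# Everything except juice
not_juice = df[~(df['Item'] == 'Juice')]
print(not_juice)
```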


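We will now add a column to the DataFrame that represents the total value of each purchase (quantity times price). First, note that `.head()` shows the first rows of a DataFrame (five by default, or as many as you pass):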

In [76]:
df.head()

| Name | Item | Quantity | Price |
|---|---|---|---|
| Mojca | Bread | 1 | 1.12 |
| Mojca | Milk | 2 | 0.9 |
| Mojca | Cereal | 1 | 2.79 |
| Maja | Chocolate | 3 | 1.39 |
| Maja | Juice | 2 | 0.89 |


In [77]:
df.head(3)

| Name | Item | Quantity | Price |
|---|---|---|---|
| Mojca | Bread | 1 | 1.12 |
| Mojca | Milk | 2 | 0.9 |
| Mojca | Cereal | 1 | 2.79 |


In [78]:
df['Value'] = df['Quantity'] * df['Price']

In [79]:
df.Value = df.Quantity * df.Price

In [80]:
df['Value2'] = df.Quantity * df.Price

In [81]:
df['Purchase Value'] = df.Quantity * df.Price

In [82]:
df

| Name | Item | Quantity | Price | Value | Value2 | Purchase Value |
|---|---|---|---|---|---|---|
| Mojca | Bread | 1 | 1.12 | 1.12 | 1.12 | 1.12 |
| Mojca | Milk | 2 | 0.9 | 1.8 | 1.8 | 1.8 |
| Mojca | Cereal | 1 | 2.79 | 2.79 | 2.79 | 2.79 |
| Maja | Chocolate | 3 | 1.39 | 4.17 | 4.17 | 4.17 |
| Maja | Juice | 2 | 0.89 | 1.78 | 1.78 | 1.78 |
| Maja | Lettuce | 1 | 1.19 | 1.19 | 1.19 | 1.19 |
| Miha | Jack | 1 | 29.99 | 29.99 | 29.99 | 29.99 |
| Miha | Coca-cola | 3 | 1.24 | 3.72 | 3.72 | 3.72 |
| Miha | Juice | 3 | 0.89 | 2.67 | 2.67 | 2.67 |
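The assignments above all produce the same numbers, but bracket notation is the safer habit. Attribute access such as `df.Value` only works for columns that already exist and whose names are valid Python identifiers (so not `'Purchase Value'`). Assigning to a *new* attribute name does not create a column, as the sketch below shows. Pandas emits a `UserWarning` in that case, which is silenced here. The small DataFrame `demo` is illustrative, so the notebook's `df` is left untouched:

```python
import warnings
import pandas as pd

demo = pd.DataFrame({'Quantity': [1, 2], 'Price': [1.12, 0.9]})

demo['Value'] = demo['Quantity'] * demo['Price']   # creates a new column

with warnings.catch_warnings():
    warnings.simplefilter('ignore', UserWarning)
    demo.Total = demo['Quantity'] * demo['Price']  # sets a plain attribute, NOT a column

print(list(demo.columns))   # ['Quantity', 'Price', 'Value']
```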


The following three tools can help you understand the data better:
* `.shape` gives the dimensions (number of rows, number of columns)
* `.describe()` computes basic statistics for the numeric columns
* `.info()` lists the columns with their data types and non-null counts

In [83]:
df.shape

(9, 6)

In [84]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9 entries, Mojca to Miha
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Item            9 non-null      object 
 1   Quantity        9 non-null      int64  
 2   Price           9 non-null      float64
 3   Value           9 non-null      float64
 4   Value2          9 non-null      float64
 5   Purchase Value  9 non-null      float64
dtypes: float64(4), int64(1), object(1)
memory usage: 804.0+ bytes


In [85]:
df.describe()

|       | Quantity | Price | Value | Value2 | Purchase Value |
|---|---|---|---|---|---|
| count | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 |
| mean | 1.888889 | 4.488889 | 5.47 | 5.47 | 5.47 |
| std | 0.927961 | 9.58102 | 9.255615 | 9.255615 | 9.255615 |
| min | 1.0 | 0.89 | 1.12 | 1.12 | 1.12 |
| 25% | 1.0 | 0.9 | 1.78 | 1.78 | 1.78 |
| 50% | 2.0 | 1.19 | 2.67 | 2.67 | 2.67 |
| 75% | 3.0 | 1.39 | 3.72 | 3.72 | 3.72 |
| max | 3.0 | 29.99 | 29.99 | 29.99 | 29.99 |


When you need only the index or only column names, use `.index` and `.columns`.

In [86]:
df.columns

Index(['Item', 'Quantity', 'Price', 'Value', 'Value2', 'Purchase Value'], dtype='object')

In [87]:
list(df.columns)

['Item', 'Quantity', 'Price', 'Value', 'Value2', 'Purchase Value']

In [88]:
df.index

Index(['Mojca', 'Mojca', 'Mojca', 'Maja', 'Maja', 'Maja', 'Miha', 'Miha',
       'Miha'],
      dtype='object', name='Name')