# Notes

## Main takeaways

### Python & libraries

- `pandas` is Python's equivalent of R's `readr` and `dplyr` packages. It is commonly imported with the alias `pd`.
- Python coding style (PEP 8) differs from R's. For example, there should be no spaces around the `=` of a keyword argument in a function call.
- Row and column indices begin at `0`.
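
A minimal sketch of the last two points. The frame `small` is a hand-typed copy of the first three rows of the tips data used in the cheatsheet, and its name is only illustrative.

```python
import pandas as pd

# First three rows of the tips data, typed in by hand for illustration
small = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01],
                      "tip": [1.01, 1.66, 3.50]})

# Keyword argument written without spaces around "=" (PEP 8)
rounded = small.round(decimals=1)

# Zero-based positions: row 0, column 0 is the top-left cell
first_cell = small.iloc[0, 0]
print(first_cell)
```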

### Jupyter notebooks
- On Windows, Jupyter won't be installed properly (*I think*) unless Anaconda is installed for all users. This is not the recommended configuration.
- Auto-complete works much as it does in RStudio.
- When we launch Jupyter, we should immediately browse to the directory where we wish to save our notebook. This location will be the working directory of the notebook.
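
To check which directory a notebook is actually using, something like the following can be run in a cell. This is a small sketch using only the standard library; the `Data/tips.csv` path is the one used in the cheatsheet.

```python
from pathlib import Path

# Directory the notebook kernel treats as its working directory
print(Path.cwd())

# Relative paths such as "Data/tips.csv" are resolved against it
print((Path.cwd() / "Data" / "tips.csv").resolve())
```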

## Cheatsheet

### Libraries and data

In [20]:
import pandas as pd
import numpy as np

tips = pd.read_csv("Data/tips.csv")

### Python

#### Select columns

In [21]:
tips['tip'].head() # returns a Series object, which also has a head method

0    1.01
1    1.66
2    3.50
3    3.31
4    3.61
Name: tip, dtype: float64

In [22]:
tips[['tip', 'total_bill']].head() # returns a DataFrame

|   | tip | total_bill |
|---|-----|------------|
| 0 | 1.01 | 16.99 |
| 1 | 1.66 | 10.34 |
| 2 | 3.5 | 21.01 |
| 3 | 3.31 | 23.68 |
| 4 | 3.61 | 24.59 |


#### Create or mutate columns

In [23]:
tips['tip_share'] = tips['tip'] / tips['total_bill']
tips.head()

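|   | total_bill | tip | sex | smoker | day | time | size | tip_share |
|---|------------|-----|-----|--------|-----|------|------|-----------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059447 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0.160542 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0.166587 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0.13978 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0.146808 |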


#### Filter rows

In [24]:
tips2 = tips.query("(time == 'Dinner' & day != 'Sun' | size > 4)")
tips2.head()

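|    | total_bill | tip | sex | smoker | day | time | size | tip_share |
|----|------------|-----|-----|--------|-----|------|------|-----------|
| 19 | 20.65 | 3.35 | Male | No | Sat | Dinner | 3 | 0.162228 |
| 20 | 17.92 | 4.08 | Male | No | Sat | Dinner | 2 | 0.227679 |
| 21 | 20.29 | 2.75 | Female | No | Sat | Dinner | 2 | 0.135535 |
| 22 | 15.77 | 2.23 | Female | No | Sat | Dinner | 2 | 0.141408 |
| 23 | 39.42 | 7.58 | Male | No | Sat | Dinner | 4 | 0.192288 |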
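
The query string mixes `&` and `|` without inner parentheses, so it is worth spelling out how it is read: inside `query`, the comparisons are evaluated first, then `&`, then `|`. The sketch below compares the query with an explicitly parenthesised boolean mask. It uses a few hand-typed rows from these notes (`sample` is an illustrative name) and lowers the threshold to `size > 3` so that the second clause matches a row in this small sample.

```python
import pandas as pd

# A few rows copied from the outputs in these notes (subset of columns)
sample = pd.DataFrame(
    {
        "day": ["Sun", "Sun", "Sat", "Sat"],
        "time": ["Dinner", "Dinner", "Dinner", "Dinner"],
        "size": [2, 4, 3, 2],
    },
    index=[0, 4, 19, 20],
)

# Threshold lowered to 3 (the notes use 4) so the "|" clause matches a row here
via_query = sample.query("time == 'Dinner' & day != 'Sun' | size > 3")

# Same filter as a boolean mask with explicit parentheses.
# Note sample["size"], not sample.size, which is a DataFrame attribute.
mask = ((sample["time"] == "Dinner") & (sample["day"] != "Sun")) | (sample["size"] > 3)
via_mask = sample[mask]

print(via_query.equals(via_mask))
```

With a plain boolean mask the parentheses around each comparison are required, because Python's own `&` and `|` bind more tightly than `==` and `>`.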


#### Slice a data frame

Slicing uses the following syntax: 

```
df[start:end:by]
```

And remember: *left inclusive, right exclusive*.

In [25]:
# First three rows
tips[0:3]

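|   | total_bill | tip | sex | smoker | day | time | size | tip_share |
|---|------------|-----|-----|--------|-----|------|------|-----------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059447 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0.160542 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0.166587 |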


In [26]:
# Until row 10 (exclusive), every 2 rows
tips[:10:2]

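|   | total_bill | tip | sex | smoker | day | time | size | tip_share |
|---|------------|-----|-----|--------|-----|------|------|-----------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059447 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0.166587 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0.146808 |
| 6 | 8.77 | 2.0 | Male | No | Sun | Dinner | 2 | 0.22805 |
| 8 | 15.04 | 1.96 | Male | No | Sun | Dinner | 2 | 0.130319 |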


In [27]:
# First row
tips[:1:]

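|   | total_bill | tip | sex | smoker | day | time | size | tip_share |
|---|------------|-----|-----|--------|-----|------|------|-----------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059447 |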
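
One caveat to the *right exclusive* rule: it holds for positional slicing (plain `df[a:b]` and `.iloc`), but label-based slicing with `.loc` includes the end label. A short sketch on a hand-typed copy of the first five tips (`small` is an illustrative name):

```python
import pandas as pd

# First five tips from the data above, typed in by hand
small = pd.DataFrame({"tip": [1.01, 1.66, 3.50, 3.31, 3.61]})

by_position = small.iloc[0:3]  # positions 0, 1, 2 (end excluded)
by_label = small.loc[0:3]      # labels 0, 1, 2, 3 (end included)

print(len(by_position), len(by_label))
```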


#### Rename

In [28]:
tips2 = tips2.rename(columns={"time": "time_of_day", "size": "party_size"})
tips2.head()

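|    | total_bill | tip | sex | smoker | day | time_of_day | party_size | tip_share |
|----|------------|-----|-----|--------|-----|-------------|------------|-----------|
| 19 | 20.65 | 3.35 | Male | No | Sat | Dinner | 3 | 0.162228 |
| 20 | 17.92 | 4.08 | Male | No | Sat | Dinner | 2 | 0.227679 |
| 21 | 20.29 | 2.75 | Female | No | Sat | Dinner | 2 | 0.135535 |
| 22 | 15.77 | 2.23 | Female | No | Sat | Dinner | 2 | 0.141408 |
| 23 | 39.42 | 7.58 | Male | No | Sat | Dinner | 4 | 0.192288 |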


#### Summarising

In [29]:
# String names avoid the deprecation warning recent pandas raises for min, max and np.mean
tips.agg({"size": "min", "tip": ["min", "max", "mean"]})

|      | size | tip |
|------|------|-----|
| max  | NaN | 10.0 |
| mean | NaN | 2.998279 |
| min  | 1.0 | 1.0 |


#### Grouping

Equivalent of dplyr's `group_by()` + `summarise()`:

In [30]:
tips.groupby("time").agg({"size": "min", "tip": ["min", "max", "mean"]})

| time | size (min) | tip (min) | tip (max) | tip (mean) |
|------|------------|-----------|-----------|------------|
| Dinner | 1 | 1.0 | 10.0 | 3.10267 |
| Lunch | 1 | 1.25 | 6.7 | 2.728088 |
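
The dictionary form returns a two-level column header. If flat column names are preferred, pandas' named aggregation is one option. The sketch below uses a hand-typed subset of the tips data and groups by `day` rather than `time`, because the rows visible in these notes are all `Dinner`. The names `sample`, `summary` and the output column names are illustrative.

```python
import pandas as pd

# Rows 0-2 (Sun) and 19-20 (Sat) copied from the outputs in these notes
sample = pd.DataFrame({
    "day": ["Sun", "Sun", "Sun", "Sat", "Sat"],
    "size": [2, 3, 3, 3, 2],
    "tip": [1.01, 1.66, 3.50, 3.35, 4.08],
})

# Named aggregation: new_column=(source_column, function_name)
summary = sample.groupby("day").agg(
    size_min=("size", "min"),
    tip_min=("tip", "min"),
    tip_max=("tip", "max"),
    tip_mean=("tip", "mean"),
)
print(summary)
```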


Equivalent of dplyr's `group_by()` + `mutate()`:

In [31]:
tips2 = tips.copy()
tips2["avg_daily_tip"] = tips2.groupby(["day"])["tip"].transform("mean")
tips2.head()

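|   | total_bill | tip | sex | smoker | day | time | size | tip_share | avg_daily_tip |
|---|------------|-----|-----|--------|-----|------|------|-----------|---------------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059447 | 3.255132 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0.160542 | 3.255132 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0.166587 | 3.255132 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0.13978 | 3.255132 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0.146808 | 3.255132 |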
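
The reason `transform` plays the role of `mutate()` here is that it returns one value per original row, aligned on the original index, whereas `agg` returns one value per group. A small sketch on hand-typed rows from these notes (`sample`, `per_group` and `per_row` are illustrative names):

```python
import pandas as pd

# Rows 0-2 (Sun) and 19-20 (Sat) copied from the outputs in these notes
sample = pd.DataFrame({
    "day": ["Sun", "Sun", "Sun", "Sat", "Sat"],
    "tip": [1.01, 1.66, 3.50, 3.35, 4.08],
})

per_group = sample.groupby("day")["tip"].agg("mean")      # one value per day
per_row = sample.groupby("day")["tip"].transform("mean")  # one value per row

print(len(per_group), len(per_row))
```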


## Other lessons

### Difference between assignment in R and Python

When we are working with DataFrames, Python's assignment operator (`=`) doesn't *create* a new DataFrame as a copy of the first, as is done in R. Instead, it binds a second name to the *same* DataFrame, i.e. it creates a *reference*. This means that any in-place change made through either name (adding a column, for instance) is visible through both.

The following example illustrates this.


In [32]:
tips_copy  = tips.copy() # copies tips
tips_equal = tips        # creates reference to tips

# Let's add a new column
tips_equal["size_dup"] = tips_equal["size"]

# Print result
tips_equal.head()

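|   | total_bill | tip | sex | smoker | day | time | size | tip_share | size_dup |
|---|------------|-----|-----|--------|-----|------|------|-----------|----------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059447 | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0.160542 | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0.166587 | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0.13978 | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0.146808 | 4 |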


Notice that `tips` is also modified, even though we didn't explicitly add the new column to it.

In [33]:
tips.head() 

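|   | total_bill | tip | sex | smoker | day | time | size | tip_share | size_dup |
|---|------------|-----|-----|--------|-----|------|------|-----------|----------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059447 | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0.160542 | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0.166587 | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0.13978 | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0.146808 | 4 |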


`tips_copy`, which was created via `tips.copy()` (the `DataFrame.copy()` method), retains the original columns of `tips`.

In [34]:
# tips_copy keeps the original columns, because we used tips.copy()
tips_copy.head()

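|   | total_bill | tip | sex | smoker | day | time | size | tip_share |
|---|------------|-----|-----|--------|-----|------|------|-----------|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059447 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0.160542 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0.166587 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0.13978 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0.146808 |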


For more information on this type of behaviour, which applies to Python lists as well, see [this Stack Overflow question on cloning or copying a list](https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list).
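
A quick way to tell a reference from a copy is the `is` operator, which tests whether two names point to the same object. The sketch below uses a small made-up frame and list (all names are illustrative) to show the same behaviour for DataFrames and lists.

```python
import pandas as pd

df = pd.DataFrame({"size": [2, 3, 3]})

alias = df         # second name for the same object
clone = df.copy()  # independent copy

print(alias is df)  # same object?
print(clone is df)

# Adding a column through the alias also changes df, but not the copy
alias["size_dup"] = alias["size"]
print("size_dup" in df.columns, "size_dup" in clone.columns)

# Lists behave the same way
original = [1, 2, 3]
same_list = original       # reference
new_list = original.copy() # copy
same_list.append(4)
print(original, new_list)
```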