# Plotting with matplotlib and seaborn

Before we turn to plotting routines in Python, we introduce a few related Python tools and packages.

### Jupyter notebooks

Jupyter is a spin-off from the [IPython](https://ipython.readthedocs.io/en/stable/) project. It enables interactive programming for Julia, Python and R (JuPytR),
and lets a mixture of documentation, code, and results appear on the same page.

Pros:
* Great for documentation
* Easy way to disseminate ideas
* Quick results

Cons:
* Unlike a script, a notebook is not executed in one run from a terminal (hard to automate)
* Hard to version control

In other words, notebooks are great for use-once code or documentation, but not for production code.

Install: `$ pip install jupyterlab`

Run by issuing `$ jupyter-lab`

We can define and print variables like this:


In [3]:
a=10
print(a)

10


In [4]:
a="kalle"

In [5]:
print(a)

kalle


# Here is my notebook
I can do maths or chemistry inside it $H_2O$.

### Numpy

NumPy enables vector and matrix calculations in Python.

Many matrix operations run about as fast in NumPy as in C or C++, because NumPy carries them out in compiled code rather than in the Python interpreter.

For example, to solve the equation system $Ax=b$:
```python
import numpy as np

A=np.array([[3,1],[1,2]])
b=np.array([9,8])
x=np.linalg.solve(A,b)
```


In [6]:
import numpy as np

A=np.array([[3,1],[1,2]])
b=np.array([9,8])
x=np.linalg.solve(A,b)

In [9]:
b

array([9, 8])

In [10]:
import numpy as np

In [13]:
np.arange(12).reshape(3,4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
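
The cells above compute `x` but never display it. As a minimal sketch, using the same `A` and `b`, the solution can be printed and checked: multiplying `A` by `x` should reproduce `b` up to floating-point rounding.

```python
import numpy as np

A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

# Solve the linear system A x = b
x = np.linalg.solve(A, b)
print(x)

# Check the solution: A @ x should equal b up to rounding error
print(np.allclose(A @ x, b))
```

For this system the solution is x = (2, 3), since 3·2 + 3 = 9 and 2 + 2·3 = 8.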

### Pandas

pandas is a package for handling tabular data in Python. Its `DataFrame` class emulates the data frame class of R.

An example, where we define a DataFrame from scratch:

```python
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
df.to_markdown()
```
|    | Name   |   Age |
|---:|:-------|------:|
|  0 | Alex   |    10 |
|  1 | Bob    |    12 |
|  2 | Clarke |    13 | 


In [14]:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])


In [16]:
df.to_markdown()

'|    | Name   |   Age |\n|---:|:-------|------:|\n|  0 | Alex   |    10 |\n|  1 | Bob    |    12 |\n|  2 | Clarke |    13 |'

|    | Name   |   Age |
|---:|:-------|------:|
|  0 | Alex   |    10 |
|  1 | Bob    |    12 |
|  2 | Clarke |    13 |
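
As a small sketch of what can be done with such a DataFrame, the example below selects a column and filters rows on a condition. It reuses the `Name`/`Age` data from above; the variable names `ages` and `older` and the threshold of 11 are chosen only for illustration.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Select a single column (returns a pandas Series)
ages = df['Age']

# Keep only the rows where a condition holds
older = df[df['Age'] > 11]

print(ages.mean())
print(older['Name'].tolist())
```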

Seaborn has several datasets available for your amusement. They can be listed using `seaborn.get_dataset_names()`.

In [42]:
sns.get_dataset_names()





['anagrams',
 'anscombe',
 'attention',
 'brain_networks',
 'car_crashes',
 'diamonds',
 'dots',
 'exercise',
 'flights',
 'fmri',
 'gammas',
 'geyser',
 'iris',
 'mpg',
 'penguins',
 'planets',
 'tips',
 'titanic']

In [41]:
import pandas as pd
import seaborn as sns

In [18]:
tips = sns.load_dataset("tips")

In [19]:
tips

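|     | total_bill | tip  | sex    | smoker | day | time   | size |
|----:|-----------:|-----:|:-------|:-------|:----|:-------|-----:|
|   0 |      16.99 | 1.01 | Female | No     | Sun | Dinner |    2 |
|   1 |      10.34 | 1.66 | Male   | No     | Sun | Dinner |    3 |
|   2 |      21.01 | 3.50 | Male   | No     | Sun | Dinner |    3 |
|   3 |      23.68 | 3.31 | Male   | No     | Sun | Dinner |    2 |
|   4 |      24.59 | 3.61 | Female | No     | Sun | Dinner |    4 |
| ... |        ... |  ... | ...    | ...    | ... | ...    |  ... |
| 239 |      29.03 | 5.92 | Male   | No     | Sat | Dinner |    3 |
| 240 |      27.18 | 2.00 | Female | Yes    | Sat | Dinner |    2 |
| 241 |      22.67 | 2.00 | Male   | Yes    | Sat | Dinner |    2 |
| 242 |      17.82 | 1.75 | Male   | No     | Sat | Dinner |    2 |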


In [20]:
tips.describe()

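|       | total_bill |      tip |     size |
|:------|-----------:|---------:|---------:|
| count |      244.0 |    244.0 |    244.0 |
| mean  |  19.785943 | 2.998279 | 2.569672 |
| std   |   8.902412 | 1.383638 |   0.9511 |
| min   |       3.07 |      1.0 |      1.0 |
| 25%   |    13.3475 |      2.0 |      2.0 |
| 50%   |     17.795 |      2.9 |      2.0 |
| 75%   |    24.1275 |   3.5625 |      3.0 |
| max   |      50.81 |     10.0 |      6.0 |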


In [32]:
tips.loc[tips["sex"]=="Female"].describe().loc["25%"]

total_bill    12.75
tip            2.00
size           2.00
Name: 25%, dtype: float64
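
The boolean-mask selection above gives the quartile for one group at a time. As a sketch of an alternative, `groupby` computes the same statistic for every group in one call. The small DataFrame `toy` below is a hypothetical stand-in for `tips` with made-up numbers, so that the example runs without downloading the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset (values are made up)
toy = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "total_bill": [10.0, 20.0, 30.0, 12.0, 24.0, 36.0],
})

# Boolean mask, as in the cell above: first quartile for one group
q_female = toy.loc[toy["sex"] == "Female", "total_bill"].quantile(0.25)

# groupby: first quartile for every group at once
q_by_sex = toy.groupby("sex")["total_bill"].quantile(0.25)

print(q_female)
print(q_by_sex)
```

The mask-based approach reads naturally for a single group, while `groupby` avoids repeating the selection once per category.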

In [33]:
tips["tip_rate"]=tips["tip"]/tips["total_bill"]

In [35]:
tips.describe()

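|       | total_bill |      tip |     size | tip_rate |
|:------|-----------:|---------:|---------:|---------:|
| count |      244.0 |    244.0 |    244.0 |    244.0 |
| mean  |  19.785943 | 2.998279 | 2.569672 | 0.160803 |
| std   |   8.902412 | 1.383638 |   0.9511 | 0.061072 |
| min   |       3.07 |      1.0 |      1.0 | 0.035638 |
| 25%   |    13.3475 |      2.0 |      2.0 | 0.129127 |
| 50%   |     17.795 |      2.9 |      2.0 |  0.15477 |
| 75%   |    24.1275 |   3.5625 |      3.0 | 0.191475 |
| max   |      50.81 |     10.0 |      6.0 | 0.710345 |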


### Wide to long

The following code converts a DataFrame from wide to long format:

In [36]:
import pandas as pd
import numpy as np
 
#Create a DataFrame
d = {
    'countries':['A','B','C'],
    'population_in_million':[100,200,120],
    'gdp_percapita':[2000,7000,15000]
    }
 
df = pd.DataFrame(d,columns=['countries','population_in_million','gdp_percapita'])
df 

|    | countries | population_in_million | gdp_percapita |
|---:|:----------|----------------------:|--------------:|
|  0 | A         |                   100 |          2000 |
|  1 | B         |                   200 |          7000 |
|  2 | C         |                   120 |         15000 |


In [37]:
df2=pd.melt(df,id_vars=['countries'],var_name='metrics', value_name='values')
df2

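|    | countries | metrics               | values |
|---:|:----------|:----------------------|-------:|
|  0 | A         | population_in_million |    100 |
|  1 | B         | population_in_million |    200 |
|  2 | C         | population_in_million |    120 |
|  3 | A         | gdp_percapita         |   2000 |
|  4 | B         | gdp_percapita         |   7000 |
|  5 | C         | gdp_percapita         |  15000 |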
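
To illustrate that melting loses no information, the sketch below melts the same country data and then pivots it back to wide form. The names `wide`, `long_df` and `back` are chosen for this example only. The column order after `pivot` may differ from the original, so the columns are selected explicitly.

```python
import pandas as pd

wide = pd.DataFrame({
    'countries': ['A', 'B', 'C'],
    'population_in_million': [100, 200, 120],
    'gdp_percapita': [2000, 7000, 15000],
})

# Wide to long: one row per (country, metric) pair
long_df = pd.melt(wide, id_vars=['countries'],
                  var_name='metrics', value_name='values')

# Long back to wide: one row per country, one column per metric
back = long_df.pivot(index='countries', columns='metrics', values='values')

# Restore the original column order and turn the index back into a column
back = back[['population_in_million', 'gdp_percapita']].reset_index()
back.columns.name = None  # drop the 'metrics' label left on the column axis

print(back)
```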


### Long to wide

The following code converts a DataFrame from long to wide format:

In [38]:
raw_data = {'patient': [1, 1, 1, 2, 2], 
        'obs': [1, 2, 3, 1, 2], 
        'treatment': [0, 1, 0, 1, 0],
        'score': [6252, 24243, 2345, 2342, 23525]} 
df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
df

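|    | patient | obs | treatment | score |
|---:|--------:|----:|----------:|------:|
|  0 |       1 |   1 |         0 |  6252 |
|  1 |       1 |   2 |         1 | 24243 |
|  2 |       1 |   3 |         0 |  2345 |
|  3 |       2 |   1 |         1 |  2342 |
|  4 |       2 |   2 |         0 | 23525 |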


In [39]:
df.pivot(index='patient', columns='obs', values='score')

| patient / obs |      1 |       2 |      3 |
|--------------:|-------:|--------:|-------:|
|             1 | 6252.0 | 24243.0 | 2345.0 |
|             2 | 2342.0 | 23525.0 |    NaN |
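
Patient 2 has no third observation, so the pivoted table has a missing value (NaN) in that cell, which is also why the scores are shown as floats. The sketch below locates the gap and shows one possible way to handle it. The fill value of -1 is arbitrary and only for illustration.

```python
import pandas as pd

raw_data = {'patient': [1, 1, 1, 2, 2],
            'obs': [1, 2, 3, 1, 2],
            'treatment': [0, 1, 0, 1, 0],
            'score': [6252, 24243, 2345, 2342, 23525]}
df = pd.DataFrame(raw_data)

wide = df.pivot(index='patient', columns='obs', values='score')

# True marks the cells that have no observation in the long table
print(wide.isna())

# Replace the missing value with a sentinel and convert back to integers
# (the sentinel -1 is an arbitrary choice for this example)
filled = wide.fillna(-1).astype(int)
print(filled)
```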


### Executing external commands

To run a shell command from a notebook cell, begin the line with an exclamation mark (`!`).

In [44]:
!wget https://opendata.ecdc.europa.eu/covid19/casedistribution/csv -O covid19.csv --no-check-certificate

--2021-06-17 10:26:07--  https://opendata.ecdc.europa.eu/covid19/casedistribution/csv
Resolving opendata.ecdc.europa.eu (opendata.ecdc.europa.eu)... 88.131.255.63
Connecting to opendata.ecdc.europa.eu (opendata.ecdc.europa.eu)|88.131.255.63|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/ [following]
--2021-06-17 10:26:07--  https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/
Reusing existing connection to opendata.ecdc.europa.eu:443.
HTTP request sent, awaiting response... 200 OK
Length: 4307302 (4,1M) [application/octet-stream]
Saving to: ‘covid19.csv’


2021-06-17 10:26:13 (764 KB/s) - ‘covid19.csv’ saved [4307302/4307302]
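
The `!` syntax is specific to IPython and Jupyter and is not valid in a plain Python script. As a sketch of a script-friendly alternative, the standard library's `subprocess.run` can launch an external command. The command used here, a second Python process that prints one line, is a placeholder chosen so the example runs anywhere; any other program and arguments could be substituted.

```python
import subprocess
import sys

# Run an external command and capture its output.
# The command is a placeholder: it starts another Python process
# that prints a single line.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output as str rather than bytes
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(result.stdout.strip())
```

Passing the command as a list of arguments avoids going through a shell, so no quoting of file names or URLs is needed.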



In [46]:
df = pd.read_csv("covid19.csv")

In [49]:
df.describe()

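|       |       day |     month |        year |       cases |     deaths |  popData2019 | Cumulative_number_for_14_days_of_COVID-19_cases_per_100000 |
|:------|----------:|----------:|------------:|------------:|-----------:|-------------:|-----------------------------------------------------------:|
| count |   61900.0 |   61900.0 |     61900.0 |     61900.0 |    61900.0 |      61777.0 |                                                    59021.0 |
| mean  | 15.628934 |  7.067157 | 2019.998918 | 1155.147237 |   26.05546 |   40987700.0 |                                                  66.320586 |
| std   |  8.841582 |  2.954776 |    0.032882 | 6779.224479 | 131.227055 |  153129400.0 |                                                  162.32924 |
| min   |       1.0 |       1.0 |      2019.0 |     -8261.0 |    -1918.0 |        815.0 |                                                -147.419587 |
| 25%   |       8.0 |       5.0 |      2020.0 |         0.0 |        0.0 |    1293120.0 |                                                   0.757526 |
| 50%   |      15.0 |       7.0 |      2020.0 |        15.0 |        0.0 |    7169456.0 |                                                   6.724045 |
| 75%   |      23.0 |      10.0 |      2020.0 |       273.0 |        4.0 |   28515830.0 |                                                  52.572719 |
| max   |      31.0 |      12.0 |      2020.0 |    234633.0 |     4928.0 | 1433784000.0 |                                                 1900.83621 |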
