# 🏋 ex3 Python basics


## Getting help

You can display the help by executing the cell below, but reading it in a separate Python console is less distracting.

In [None]:
help('list')  #help on class list in module builtins
?list  #IPython/Jupyter shortcut that prints the documentation

## Python as a calculator

In [None]:
2 * 15 + 10
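
A few more arithmetic operators, as a minimal sketch. The variable names are made up for illustration and are not part of the original exercise.

```python
total = 2 * 15 + 10      # multiplication binds tighter than addition
grouped = 2 * (15 + 10)  # parentheses change the order of evaluation
quotient = 7 / 2         # true division always returns a float
floor_q = 7 // 2         # floor division rounds down to an integer
remainder = 7 % 2        # modulo: remainder of the division
big = 2 ** 10            # exponentiation

print(total, grouped, quotient, floor_q, remainder, big)
```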

## Basic types and data structures

In [None]:
s1 = 'Hello'  #a string

s2 = " Python!"  #another string 

s3 = ''' Python
is fun'''  #a multiline string

a = 1  #int

b = 2.0  #float

c = True  #boolean true

d = False  #boolean false

print(s1 + s2 + s3)

print('a=' + str(a))

print('b is equal to', b, 'c and d are booleans equal to', c, 'and', d)

mylist = [1, 2.0, "3", '3', True, False]  #a list

print('mylist is', mylist)

mylist[0]  #indices start at 0!

#a dictionary (maps keys to values)
mymap = {'setosa': '#a6cee3', 'versicolor': '#1f78b4', 'virginica': '#b2df8a'}

print('mymap is', mymap)
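
The sketch below shows a few common ways to read from a list and a dictionary: negative indices, slices, and key lookup with a default. It reuses `mylist` and `mymap` from the cell above; the other names (`first`, `last`, `middle`, `color`, `fallback`, `types`) and the `'gray'` default are assumptions added for illustration.

```python
mylist = [1, 2.0, "3", '3', True, False]
mymap = {'setosa': '#a6cee3', 'versicolor': '#1f78b4', 'virginica': '#b2df8a'}

first = mylist[0]      # first element (index 0)
last = mylist[-1]      # negative indices count from the end
middle = mylist[1:3]   # slice: elements at index 1 and 2 (3 is excluded)

color = mymap['setosa']                  # look up a value by its key
fallback = mymap.get('unknown', 'gray')  # .get returns a default when the key is missing

types = [type(x).__name__ for x in mylist]  # a list can mix types

print(first, last, middle)
print(color, fallback)
print(types)
```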

## Functions

In [None]:
def f(x):  #declare f
    return x * x

f(2)  #call f with argument 2
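
Functions can also take default and keyword arguments. The function `power` below is a hypothetical helper written for this illustration, not something defined elsewhere in the notebook.

```python
def power(x, exponent=2):
    """Return x raised to exponent (hypothetical helper for illustration)."""
    return x ** exponent

squared = power(3)            # uses the default exponent, 2
cubed = power(3, exponent=3)  # a keyword argument overrides the default

print(squared, cubed)
```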

## Packages

[NumPy](http://www.numpy.org) (N-dimensional array objects) and [Pandas](http://pandas.pydata.org) (dataframes built on NumPy) are the most relevant packages for data visualization.

In [None]:
import numpy as np  #import numpy
import pandas as pd  #import pandas

Create a NumPy array

In [None]:
arr = np.arange(6)  #the integers 0 to 5; see ?np.arange
np.random.shuffle(arr)  #shuffle the array in place
display(arr.size)
arr

Create a pandas dataframe from a NumPy array

In [None]:
df = pd.DataFrame(arr)  #create Pandas dataframe from arr data
df
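
A two-dimensional array maps naturally onto rows and columns. The sketch below is an assumption-laden illustration: the names `arr2d` and `df_named` and the column labels `'x'` and `'y'` are made up.

```python
import numpy as np
import pandas as pd

arr2d = np.arange(6).reshape(3, 2)  # 3 rows, 2 columns: [[0, 1], [2, 3], [4, 5]]
df_named = pd.DataFrame(arr2d, columns=['x', 'y'])  # 'x' and 'y' are made-up names

print(df_named.shape)
print(df_named)
```

Without the `columns` argument, pandas labels the columns with integers starting at 0.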

Create a pandas dataframe with typed and named columns. Note that `pd.Categorical` is the pandas equivalent of R's `factor`.

In [None]:
df = pd.DataFrame({
    'A' : pd.Series([1, 2, 3, 4, 5, 6]),
    'B' : pd.Timestamp('20201001'),
    'C' : pd.Categorical(['male', 'female', 'female', 'female', 'male', 'male']),
    'D' : 'foo'})

display(df.C)

display(df.head())  #display first 5 rows
display(df.tail())  #last 5 rows

df.columns  #list column names

Accessing dataframe elements

In [None]:
display(df.A)  #this is the preferred way to access column A
display(df.C[1])  #access column 'C', then row 1

Alternative way to access dataframe elements, using brackets (sometimes needed, e.g., when creating a new column)

In [None]:
display(df['A'])  #access column 'A'
display(df['C'][1])  #access column 'C', then row 1
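
The new-column case mentioned above can be sketched as follows. Attribute access (`df.name = ...`) does not create a column, so bracket notation is required. The dataframe `df_demo` and the column name `A_squared` are made up for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'A': [1, 2, 3],
    'C': pd.Categorical(['male', 'female', 'female']),
})

df_demo['A_squared'] = df_demo['A'] ** 2  # bracket notation creates the new column
value = df_demo.loc[1, 'C']               # row label 1 and column 'C' in one lookup

print(df_demo)
print(value)
```

`df.loc[row, column]` does the row and column selection in a single step, which is also the safer form when assigning a value to one cell.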

# Working with Data

## Load a dataset from a package

In [None]:
from sklearn import datasets
iris = datasets.load_iris()  #see the help: help(datasets)

df = pd.DataFrame(iris.data)  #create dataframe from iris.data
df.head()

## Load a dataset in CSV format

In [None]:
import pandas as pd  #import pandas

df = pd.read_csv("data/heart-decease-cleveland.csv")
df.head()
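
If the data file is not at hand, `pd.read_csv` can be tried on an in-memory string. This is a minimal sketch: the CSV content below is made up and is not taken from the real data file, and the names `csv_text` and `df_csv` are assumptions.

```python
import io
import pandas as pd

# Made-up CSV content; columns and values are NOT taken from the real data file.
csv_text = """age,sex,chol
63,1,233
67,1,286
41,0,204
"""

df_csv = pd.read_csv(io.StringIO(csv_text))  # the first line becomes the column names

print(df_csv.shape)
print(df_csv.dtypes)
```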

## List files in the current directory

In [None]:
!ls "./"

# Basic stats

## Descriptive statistics

In [None]:
df = pd.read_csv("data/heart-decease-cleveland.csv")
df.describe()
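
To see what `describe()` returns, it helps to run it on a handful of numbers that can be checked by hand. The values below are made up; `values` and `summary` are illustrative names.

```python
import pandas as pd

values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # made-up values
summary = values.describe()  # count, mean, std, min, quartiles, max

print(summary)
print(summary['mean'], summary['50%'])  # '50%' is the median
```

The `std` entry is the sample standard deviation (divisor n - 1), the same value as `values.std()`.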

## Frequency table

In [None]:
arr = np.random.uniform(0, 100, 100)

df = pd.DataFrame({
    'Variable' : pd.Series(arr)
})

df['bin'] = pd.cut(df.Variable, [0, 20, 40, 60, 80, 100])
df = df['bin'].value_counts().sort_index()  #count values per bin, in bin order
df = df.rename_axis('bin').reset_index(name='count')
df['rf'] = df['count'] / df['count'].sum()  #relative frequency: divide by the total number of values
df['cf'] = df['rf'].cumsum()  #cumulative frequency

df.columns = ['Value', 'No.', 'Rel. Freq.', 'Cum. Freq.']

df.reset_index(drop=True, inplace=True)

df
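
The frequency-table steps can be wrapped in a reusable function. This is one possible sketch, not the only way to do it: `frequency_table` is a hypothetical helper, and the fixed random seed is an assumption added so the run is repeatable.

```python
import numpy as np
import pandas as pd

def frequency_table(values, edges):
    """Hypothetical helper: counts per bin plus relative and cumulative frequencies."""
    bins = pd.cut(pd.Series(values), edges)    # right-closed intervals such as (0, 20]
    counts = bins.value_counts().sort_index()  # one count per bin, in bin order
    table = counts.rename_axis('Value').reset_index(name='No.')
    table['Rel. Freq.'] = table['No.'] / table['No.'].sum()  # share of all binned values
    table['Cum. Freq.'] = table['Rel. Freq.'].cumsum()       # running total, ends at 1
    return table

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
table = frequency_table(rng.uniform(0, 100, 100), [0, 20, 40, 60, 80, 100])
print(table)
```

Two quick sanity checks: the relative frequencies should sum to 1, and the last cumulative frequency should equal 1.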

---

# Exercises

## 😜 Exercise 1 

- Create a dataframe for the values `0, 1, 1, 2, 2, 3, 4, 15`
- Use `df.describe()` to compute descriptive statistics

In [None]:
import numpy as np
import pandas as pd



## 😜 Exercise 2 

- Load `heart-decease-cleveland.csv` in a dataframe
- Use `df.describe()` to compute descriptive statistics of all the variables

In [None]:
import numpy as np
import pandas as pd



## 🤔 Exercise 3

- Load `heart-decease-cleveland.csv` in a dataframe
- Create a frequency table of the `chol` variable for the frequency ranges:

```
(120, 160]
(160, 200]
(200, 240]
(240, 280]
(280, 320]
(320, 360]
(360, 400]
(400, 440]
```
- Set `columns` to `'Chol mg/dl', 'No.', 'Rel. Freq.', 'Cum. Freq.'`

In [None]:
import numpy as np
import pandas as pd
