# Crash course on Python
Python is a general-purpose programming language and one of the most popular
programming languages. It is widely used in the scientific community because
of its ease of use and simple syntax. Python also offers a large collection of
libraries that make it easier to solve complex problems and to build robust
systems and data applications.

[PEP 8](https://realpython.com/python-pep8/) is the style guide that
specifies guidelines and best practices for writing Python code.

This is an introductory week in which we learn the foundations of machine learning and develop an understanding of what it means for a machine to learn from data. Therefore, we are not going to train any machine learning models yet.

The aim of this activity is to help you refresh your knowledge of Python Variables and Datatypes, Operators (Arithmetic, Logical, Comparison, Assignment, Membership), Conditional Statements, Iteration (for and while loops), Functions, Objects and Classes, and important Python Libraries for ML (NumPy, Matplotlib, Seaborn, Pandas, and Scikit-Learn).

You are required to do three things:
1. Carefully review the code in each cell and execute it to see the output. 
2. Read reference material(s) where provided.
3. Complete small tasks to help you practice the learned concepts.

---
## Variables and Datatypes
Python has several built-in data types. Here are the most common ones:

| Data Types   | Examples            | Explanation          | Mutable? |
| ------------ | ------------------- | -------------------- | -------- |
| Strings      | "Hi!", '1.3'        | Text                 | No       |
| Integers     | 49                  | Whole numbers        | No       |
| Floats       | 3.14                | Decimal Numbers      | No       |
| Booleans     | True, False         | Truth values         | No       |
| Lists        | \[1, 'a', [1.5, 2]\]| A collection of data | Yes      |
| Tuples       | (1, 2, 3, 4, 5)     | A collection of data | No       |
| Dictionaries | {"a": 1, "b": True} | A collection of data | Yes      |

In [1]:
# Assigning data to variables
var_1 = "Hello World"
var_2 = 254
var_3 = 25.43
var_4 = ["Anna", "Bella", "Cora"]
var_5 = {'Course': 'ML', 'Grade': 'A'} # key: value

# You can access List elements with their index
# and access Dict elements with their key
print('Student:', var_4[0], 'got', var_5['Grade'])

# When in doubt you can always check data types
print('Variables data types')
print('var_1', type(var_1))
print('var_2', type(var_2))
print('var_3', type(var_3))
print('var_4', type(var_4))
print('var_5', type(var_5))

Student: Anna got A
Variables data types
var_1 <class 'str'>
var_2 <class 'int'>
var_3 <class 'float'>
var_4 <class 'list'>
var_5 <class 'dict'>


Using lists and tuples:

In [None]:
x = [1, 2, 3]  # creating a list
y = x  # assigning a reference
y[0] += 1
# The change is in both x and y because both point to the same object
print(x, y)

In [None]:
x = (1, 2, 3)  # creating a tuple
y = x
# y[0] += 1 raises a TypeError because tuples are immutable
y += (4, 5)  # a new tuple is created and bound to y; x is unchanged
print(x, y)
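
Because assigning a list to a second name only creates another reference, it can help to see how a copy behaves differently. The following is a small sketch; the names `alias` and `clone` are made up for illustration.

```python
x = [1, 2, 3]
alias = x         # second name for the same list object
clone = x.copy()  # new list with the same elements (shallow copy)

alias[0] = 100    # mutates the one list shared by x and alias

print(x)      # [100, 2, 3]
print(clone)  # [1, 2, 3]
print(alias is x, clone is x)  # True False
```

A shallow copy is enough here because the elements are integers; nested lists inside the copy would still be shared.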

In [None]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print('x size:', len(x))
print('The second element in x:', x[1])
print('The last element in x:', x[-1])
print('The first 3 elements in x:', x[:3])  # also x[0: 3] works

### Task 1:
Print the last 3 elements of the list x

In [None]:
# Solve here

## Operators
[Reference](https://www.geeksforgeeks.org/python-operators/)

When dealing with data, knowing Python operators and how they work can make
your code shorter and more efficient.

We will look at the operators that might be new to you.
Refer to this [reference](https://www.geeksforgeeks.org/python-operators/)
for the full list of operators.

| Operator | Description                                               | Syntax        |
| -------- | --------------------------------------------------------- | ------------- |
| //       | Division (floor): divides the first operand by the second | x // y        |
| **       | Power: returns the first operand raised to the second     | x ** y        |
| is       | True if both operands are the same object                 | x is y        |
| is not   | True if the operands are not the same object              | x is not y    |
| in       | True if the value is found in the sequence                | x in y        |
| not in   | True if the value is not found in the sequence            | x not in y    |
| Ternary  | Tests a condition in a single line                        | x if a else y |

In [None]:
# // Division (floor): divides and floors the output
a = 10
b = 3
div = a / b
div_floor = a // b

print(div, div_floor)

In [None]:
# floor division of negative numbers can be counterintuitive
print(5 // 2)
print(-5 // 2)  # floor(-2.5) rounds down to the smaller integer -> -3
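
To make the rounding direction concrete, the sketch below compares floor division with truncation and checks the relationship between `//` and `%`. The variable names are illustrative only.

```python
import math

for a, b in [(5, 2), (-5, 2), (5, -2)]:
    q, r = divmod(a, b)             # same as (a // b, a % b)
    assert q == math.floor(a / b)   # // always rounds toward negative infinity
    assert a == q * b + r           # quotient and remainder rebuild a
    print(a, b, q, r)

truncated = int(-5 / 2)  # int() drops the fractional part -> -2
floored = -5 // 2        # floor division rounds down -> -3
print(truncated, floored)
```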


In [None]:
# Power
pwr = a ** b
print(pwr)

In [None]:
# is, is not
c = a
print(a is not b)
print(a is c)
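
Note that `is` compares object identity, while `==` compares values. A short sketch with lists (the names are chosen for illustration) shows the difference:

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]  # equal contents, but a separate object
list_c = list_a     # second name for the same object

print(list_a == list_b)  # True: same values
print(list_a is list_b)  # False: different objects
print(list_a is list_c)  # True: same object

value = None
print(value is None)  # True: "is" is the usual way to test for None
```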

In [None]:
# in, not in
x, y = 24, 20
l = [10, 20, 30, 40, 50]

print(x not in l)
print(y in l)


In [None]:
# Ternary operator
if a < b:
    print(a)
else:
    print(b)

# [on_true] if [expression] else [on_false]
minimum = a if a < b else b
print(minimum)

## Iteration
The two main loops in Python are the `while` and `for` loops.

[Reference](https://www.geeksforgeeks.org/loops-in-python/)

In [None]:
# While loop
count = 0
while count < 3:
    count = count + 1
    print("count = ", count)

In [None]:
# Combining else with while: the else block runs once the condition becomes
# false, and it is skipped if the loop exits through break
count = 0
while count < 3:
    count = count + 1
    print("count = ", count)
else:
    print("In the else. count = ", count)
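
The `else` branch of a loop is most useful together with `break`, since it only runs when the loop finishes without breaking. Here is a minimal sketch; `find_first_multiple` is a hypothetical helper written for this illustration.

```python
def find_first_multiple(numbers, divisor):
    """Return the first element divisible by divisor, or None if there is none."""
    index = 0
    while index < len(numbers):
        if numbers[index] % divisor == 0:
            result = numbers[index]
            break          # leaving via break skips the else block
        index += 1
    else:
        result = None      # runs only when the loop ended without break
    return result

print(find_first_multiple([3, 5, 8, 10], 4))  # 8
print(find_first_multiple([3, 5, 7], 4))      # None
```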

[`range()`](https://docs.python.org/3/library/functions.html#func-range) is
a built-in type that lazily produces a sequence of integers in a given range.

It takes up to 3 parameters: `start`, `stop`, and `step`.
If not given, `start` defaults to `0` and `step` to `1`.

Examples:
- `range(0, 10, 1)` -> `[0, 1, 2, ..., 9]`. Note that `stop` isn't included.
- `range(0, 10, 2)` -> `[0, 2, 4, 6, 8]`. Even numbers.
- `range(0, 10)` -> Only `start` and `stop`. Same as `range(0, 10, 1)`.
- `range(10)` -> Only `stop`. Same as `range(0, 10, 1)`.
- `range(10, 0, -1)` -> `[10, 9, 8, ..., 1]`.
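
The examples above can be checked by converting each range to a list. This is a quick sketch; the variable `evens` is just an illustrative name.

```python
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]
print(list(range(10, 0, -1)))  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# A range supports len(), indexing, and membership tests
# without building the full list in memory.
evens = range(0, 10, 2)
print(len(evens), evens[1], 4 in evens)  # 5 2 True
```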

In [None]:
# range for loop
my_list = [10, 20, 30, 40]
print('looping over list elements by index using "range()"')
for i in range(0, len(my_list)):
    print(my_list[i])

print('looping over list elements using "in"')
for element in my_list:
    print(element)
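
When both the index and the element are needed, the built-in `enumerate()` is a common alternative to `range(len(...))`. A small sketch, where `pairs` is an illustrative name:

```python
my_list = [10, 20, 30, 40]

pairs = []
for i, element in enumerate(my_list):  # yields (index, element) tuples
    pairs.append((i, element))
    print(i, element)
```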

In [None]:
my_list = [10, 20, 30, 40, 50, 60, 70, 80]
print('looping over list elements by index using "range()"')
for i in range(0, len(my_list)):
    if i == 5:  # break the loop after 5 iterations
        break
    elif i%2 == 1:  # skipping odd indices
        continue
    print(my_list[i])

### Task 2:
Make a list that has all the integers from 0 to 99 that are divisible by 4 but
not divisible by 6.

Hint: `my_list.append(a)` adds `a` to the end of the `my_list`.

In [None]:
# Solve here


## Functions
Functions are very useful for reusing a block of code multiple times.

In [None]:
def my_function_name():
    print('Hi from a function')

my_function_name()
my_function_name()

In [None]:
def double(x):
    return x * 2

print(double(3))
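
Functions can also take default parameter values and a docstring. The sketch below uses a hypothetical `scale` function as a generalisation of `double`; it is only an illustration.

```python
def scale(x, factor=2):
    """Return x multiplied by factor (factor defaults to 2)."""
    return x * factor

print(scale(3))             # 6, uses the default factor
print(scale(3, factor=10))  # 30, overrides the default by keyword
print(scale('ab'))          # 'abab', * repeats strings
```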

## Classes
Python is an object-oriented programming (OOP) language. Here is how classes are used.

[Reference](https://www.geeksforgeeks.org/python-classes-and-objects/)

In [None]:
class Dog:

    # A simple class
    # attribute
    attr1 = "mammal"
    attr2 = "dog"

    # A sample method
    def fun(self):
        print("I'm a", self.attr1)
        print("I'm a", self.attr2)

# Object instantiation
Rodger = Dog()

# Accessing class attributes
# and method through objects
print(Rodger.attr1)
Rodger.fun()
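
Attributes defined in the class body are shared by all objects. Per-object data is usually set in `__init__`. The sketch below uses a hypothetical `NamedDog` class to show the difference; it is an illustration, not part of the example above.

```python
class NamedDog:
    species = "dog"  # class attribute, shared by every instance

    def __init__(self, name):
        self.name = name  # instance attribute, specific to each object

    def describe(self):
        return f"{self.name} is a {self.species}"

rodger = NamedDog("Rodger")
tommy = NamedDog("Tommy")
print(rodger.describe())
print(tommy.describe())
```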

---
## Install external modules
The Python community offers a huge variety of modules, which eliminates the
need to write code from scratch.
[pip](https://pypi.org/project/pip/) is a package installer for Python. You
can use pip to install packages from the Python Package Index and other indexes.

To install a package you can run this command in your terminal:
- Unix/macOS: `python3 -m pip install <package name>` or `pip3 install <package name>`
- Windows: `py -m pip install <package name>` or `pip install <package name>`

[Further reading](https://pip.pypa.io/en/stable/cli/pip_install/)
on pip usage and how to write and install requirements files.

You can also run terminal commands from your Jupyter notebook, alongside IPython's
[magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html).

To do so, put `!` before your command so that Jupyter executes it in the
terminal instead of interpreting it as Python code.

In [None]:
# https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-sx
!pip install numpy
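
Before installing, it can be handy to check from Python whether a package is already available. The following is a minimal sketch using the standard library; `is_installed` is a hypothetical helper and the last module name is deliberately made up.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None

for name in ["json", "numpy", "a_module_that_does_not_exist"]:
    print(name, is_installed(name))
```

Note that the import name can differ from the name used with pip, so this check uses the name you would write after `import`.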

---
## Numpy
While Python is powerful, it is very slow compared to C/C++ because it is an
interpreted language with dynamic typing. NumPy is a Python library whose core
is implemented in C, giving us the ease of development of Python with
performance close to C. Depending on the operation, NumPy can be up to 100x faster.
Moreover, it has various functions to perform linear algebra and array operations.

[Reference](https://numpy.org/doc/stable/user/absolute_beginners.html)

In [None]:
import numpy as np

In [None]:
my_np_list = np.random.randint(low=-100, high=100, size=20)
print(my_np_list)

Here are some of the most commonly used functions:

In [None]:
print('shape:', my_np_list.shape)
print('sum:', my_np_list.sum())
print('min:', my_np_list.min())
print('max:', my_np_list.max())
print('abs:', np.abs(my_np_list))
my_np_list.sort()
print('after .sort():', my_np_list)
print('doubling the array', my_np_list * 2) # broadcasting
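
Broadcasting also works between arrays of different shapes, and comparisons produce boolean masks that can be used for filtering. This is a small sketch with made-up values, assuming NumPy is installed.

```python
import numpy as np

values = np.array([3, -1, 4, -5, 9])

shifted = values + 10            # the scalar 10 is applied to every element
positives = values[values > 0]   # boolean mask keeps only the positive entries

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
offsets = np.array([10, 20, 30])           # shape (3,)
shifted_rows = matrix + offsets            # offsets is added to each row

print(shifted)
print(positives)
print(shifted_rows)
```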

### Task 3:
Print the mean, variance and standard deviation of `my_np_list`

In [None]:
# Solve here


In [None]:
# 2D arrays
x = np.array(
    [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
    ],
    np.int32,
)
print(x)
print(f'{x.shape = }')
print('accessing element:', x[1, 2])
print('slice of an array:', x[0:2, 2:3])
print('row:', x[0, :])
print('column:', x[:, 0])
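
For 2D arrays, many functions accept an `axis` argument that selects the direction of the operation. The sketch below rebuilds the same 3x4 values with `np.arange` for brevity, assuming NumPy is installed.

```python
import numpy as np

x = np.arange(1, 13).reshape(3, 4)  # values 1..12 arranged in 3 rows, 4 columns

col_sums = x.sum(axis=0)  # collapse the rows: one sum per column
row_sums = x.sum(axis=1)  # collapse the columns: one sum per row
total = x.sum()           # no axis: sum of all elements

print(col_sums)
print(row_sums)
print(total)
```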

In [None]:
# speed: time only the summation, not the creation of the data
from time import perf_counter

my_list = [1] * 10000000
my_np_list = np.ones(10000000)

st = perf_counter()
sum(my_list)
list_time = perf_counter() - st

st = perf_counter()
my_np_list.sum()
np_time = perf_counter() - st

print('numpy is', list_time / np_time, 'times faster than list')

### Task 4
Install the following packages: `matplotlib`, `pandas`, `seaborn` and `scikit-learn` (imported as `sklearn`)

In [None]:
# Solve here


---
## Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python.

[Reference](https://matplotlib.org/stable/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py)

In [None]:
import matplotlib.pyplot as plt
# to plot inside the notebook
%matplotlib inline

In [None]:
# Line plot
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s)

ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='Change in voltage over time')
ax.grid()
plt.show() # Show the plot

In [None]:
# different plots for categorical data
names = ['group_a', 'group_b', 'group_c']
values = [1, 10, 100]

plt.figure(figsize=(9, 5))

plt.subplot(131)  # create a 1 by 3 grid and plot in the 1st subplot
plt.bar(names, values)
plt.subplot(132)  # plot in the 2nd subplot
plt.scatter(names, values)
plt.subplot(133)  # plot in the 3rd subplot
plt.plot(names, values)
plt.suptitle('Categorical Plotting')
plt.show()

---
## Pandas
[Pandas](https://pandas.pydata.org/docs/) is a fast, powerful, flexible and
easy to use open source data analysis and manipulation tool.


In [None]:
import pandas as pd


In [None]:
df = pd.read_csv('petrol_consumption.csv')
df.info()

In [None]:
df.head(5)

In [None]:
df.describe()


In [None]:
# access one or more columns
print(df['Petrol_tax'].head())
print(df[['Average_income', 'Paved_Highways']].head())

In [None]:
# access rows by position
df.iloc[[1, 2, 4]]

In [None]:
# dropping columns
new_df = df.drop(['Average_income', 'Paved_Highways'], axis=1)
new_df.head()

In [None]:
# dropping rows by index
new_df = df.drop([0, 1])
new_df.head()

In [None]:
# pandas supports many operations similar to numpy
print('Sum of columns\n', df.sum(), '\n')
print('Sum of all the data frame\n', df.sum().sum(), '\n')
print('Mean of columns\n', df.mean(), '\n')
print('Converting dataframe to numpy array\n', type(df.to_numpy()))

---
## Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides
a high-level interface for drawing attractive and informative statistical graphics.
It is usually used together with pandas.

[Reference](https://seaborn.pydata.org/examples/index.html)
[Examples](https://www.geeksforgeeks.org/python-seaborn-tutorial/)

In [None]:
import seaborn as sns

df = sns.load_dataset('tips')

graph = sns.FacetGrid(df, col="sex", hue="day")
graph.map(plt.scatter, "total_bill", "tip", edgecolor="w").add_legend()

plt.show()

---
## SKLearn

[SKLearn](https://scikit-learn.org/stable/modules/classes.html) (scikit-learn,
imported as `sklearn`) is a library of simple and efficient tools for predictive
data analysis.
We will use it for classification, regression and clustering algorithms.

Datasets usually have two parts: data (or features) and targets (or labels).
We train our machine learning model to predict the target of a sample given
its features.

In [None]:
# sklearn ships with some built-in datasets for learning and testing
from sklearn import datasets
iris = datasets.load_iris()
print(f'Dataset shape: {iris.data.shape = }, {iris.target.shape = }')
print(f'Example of a sample: features: {iris.data[0]}, classification: {iris.target[0]}')

SKLearn gives a simple API to build models. It mainly consists of three steps:
- Model creation: where we choose the model and set its parameters
- Fitting the data: where we give our model the training data to learn from
- Prediction: where we can predict the label of a sample

`svm.SVC` is a support vector classifier, a prediction model (we will study how it works later in the course).

Here is an example of how to use an SKLearn model:

In [None]:
from sklearn import svm
# setting the parameters needed by the algorithm
clf = svm.SVC(gamma=0.001, C=100.)
# Training on all the data except the last sample
clf.fit(iris.data[:-1], iris.target[:-1])
# predicting the last sample
print('Our prediction', clf.predict(iris.data[-1:]), 'True value:', iris.target[-1:])