# Machine Learning - An introduction


## Structure of the class
___

### Part I: Machine Learning
  - Basics & Theory
    - What's this all about?
    - Supervised, Unsupervised Learning
    - Classification, Regression, Clustering
  - Python Basics for Data Science / ML
    - NumPy, SciPy
    - scikit-learn
    - PyTorch
  - Hands-on examples
    - Classification
        * Churn prediction
        * Digit classification
    - Regression example
        * Estimating House Prices
    - Clustering
        * k-means document clustering


  - What is Machine Learning anyway?
  - Some examples


# Machine Learning


A (very informal) definition: build machines that learn from examples (data) and generalize to unseen data.


Examples we all use on a daily basis:

- car plate detection systems
- spam email classifiers
- voice recognition on smartphones
- face recognition at Facebook
- Amazon recommendations

The aim of this series is to build the foundations and principles needed to implement some of these systems from scratch.


### Introduction and definitions 



#### Supervised Learning
   
Here, the data comes with (a fixed number of) labels, for example:

- a dataset of emails where every email carries a flag saying whether it is spam or not spam
- photos of animals, where each photo is labelled as a cat or a dog (assume for a moment there is only one animal per photo)
- fraud detection: a dataset with examples of valid and fraudulent transactions
      
#### Unsupervised Learning
    
In the unsupervised learning case, the data comes without labels, for example:

- a collection of documents
- a collection of photographs
- time series data
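
In code, the difference between the two settings comes down to whether an array of labels accompanies the data. The sketch below uses made-up toy numbers; the names `X_emails`, `y_emails` and `X_documents` and the chosen features are assumptions for illustration only.

```python
import numpy as np

# Supervised: every example (a row of X) comes with a label in y.
# Hypothetical features per email: [number of links, number of exclamation marks].
X_emails = np.array([[8, 5], [0, 1], [12, 9], [1, 0]])
y_emails = np.array(["spam", "not spam", "spam", "not spam"])

# Unsupervised: only the examples are available, there is no label array.
# Hypothetical features per document: counts of three words.
X_documents = np.array([[3, 0, 1], [0, 2, 2], [4, 0, 0]])

print("supervised  :", X_emails.shape, "examples with", y_emails.shape, "labels")
print("unsupervised:", X_documents.shape, "examples, no labels")
```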
   
   
   
   
#### Regression (Παλλινδρόμηση)

In regression, the outcome we are trying to predict is, or can be thought of as, a real number (for example house prices or demand).

#### Classification (Ταξινόμηση)
    
In classification, the target variable is one of several categories, for example (cat, dog, bird), (spam, non-spam) or (fraud, non-fraud).

#### Classification 

The repeating pattern in supervised learning is that data comes with labels and we wish to build a system that can predict the label of a new data example. For example, if we are building a pet detection system, we would like the system to answer with very high accuracy that a photo shows a cat, i.e. to predict the class of the new image to be cat and not dog or bird.


<b><span style="color:red">WARNING: MATHS AHEAD</span></b>

Without being too formal, we would like to learn from the data a function $f: X \to D$, where $x \in X$ is a data example and $D$ is the set of labels to be predicted.

This function must produce correct results on new, unseen data. That is, if we take a photo of a random cat somewhere in the world and feed it into our system, the system should respond that this is a cat although it has never seen this cat before. We call this property "generalisation", and it is the most important aspect of machine learning systems: performing well on unseen data.


Given the above, we need the following ingredients to build a machine learning system:
- The function we are going to use to model our data (usually called the hypothesis)
- A way to measure how "wrong" this function is for a given configuration of its parameters, given our data (the loss function)
- A way to train this function, i.e. to modify its parameters so that the error is minimised
- A way to test this function on new, unseen data and estimate how well we expect it to behave on unseen data
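
The four ingredients above can be sketched on a toy one-dimensional classification problem. Everything in this example is an assumption made for illustration: the synthetic data, the threshold hypothesis, the error-rate loss and the brute-force search used for training. Real systems use richer hypotheses and optimisers, but the structure is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data (made up): class 0 is centred at -2, class 1 at +2.
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

# Shuffle, then hold out 25% of the examples as unseen test data.
order = rng.permutation(len(x))
x, y = x[order], y[order]
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

def hypothesis(x, threshold):
    """Ingredient 1: predict class 1 when x is above the threshold."""
    return (x > threshold).astype(int)

def error_rate(y_true, y_pred):
    """Ingredient 2: fraction of misclassified examples."""
    return float(np.mean(y_true != y_pred))

def train(x, y, candidates):
    """Ingredient 3: pick the candidate threshold with the lowest training error."""
    errors = [error_rate(y, hypothesis(x, t)) for t in candidates]
    return float(candidates[int(np.argmin(errors))])

threshold = train(x_train, y_train, np.linspace(-4.0, 4.0, 81))

# Ingredient 4: estimate generalisation on data the model has never seen.
train_error = error_rate(y_train, hypothesis(x_train, threshold))
test_error = error_rate(y_test, hypothesis(x_test, threshold))
print(f"threshold={threshold:.2f} train error={train_error:.3f} test error={test_error:.3f}")
```

The test error is the number to report: the training error was minimised directly, so it tends to be an optimistic estimate of performance on new data.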

Similarly, for regression, instead of trying to predict a label we now try to predict a real-valued number.

Again we need a hypothesis, a loss function, a way to tune the hypothesis and a way to see how well we are generalizing to unseen data.


Let's see that in practice. 


#### Linear Regression

One of the simplest parametric models in statistics and machine learning is linear regression.

Linear regression tries to find the straight line that best fits the data.


##### Linear Regression Example

![alt text](./images/lr.png "Linear Regression Example")



We are given a data set $D$ of values $\{ (x_i, y_i) \}$ and our hypothesis is that there is a straight line
$y = h(x) = w_0 + w_1 x$ that fits the data.

We are now asked to find the "best" weights $w = (w_0, w_1)$ that fit our data.

A typical way to define "best" is to minimize the squared difference between our targets $y$ and what the model predicts, $\hat{y} = h(x)$, i.e. to minimize the total loss

$$ L = \sum_i ( y_i - \hat{y}_i )^2 = \sum_i \left( y_i - (w_0 + w_1 x_i) \right)^2 $$

There are a few ways to do that: either look for a closed-form analytic expression or use an optimisation algorithm such as gradient descent.
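
Both routes can be sketched with NumPy. The data below is synthetic (an assumption for illustration: points around the line $y = 2 + 3x$), and the learning rate and number of iterations are hand-picked values that happen to work for this data, not general recommendations. The closed-form route uses `np.linalg.lstsq`; the gradient descent loop minimises the same loss divided by the number of points, which has the same minimiser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 2 + 3x plus small noise.
x = rng.uniform(0.0, 1.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, size=200)

# Design matrix with a column of ones so that w_0 acts as the intercept.
X = np.column_stack([np.ones_like(x), x])

# 1) Closed form: least-squares solution of X w = y.
w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2) Gradient descent on the mean squared error.
w_gd = np.zeros(2)
learning_rate = 0.5          # hand-picked for this data
for _ in range(5000):
    residuals = X @ w_gd - y                     # prediction minus target
    gradient = 2.0 * X.T @ residuals / len(y)    # gradient of the mean squared error
    w_gd -= learning_rate * gradient

print("closed form     :", w_closed)
print("gradient descent:", w_gd)
```

Both approaches should recover weights close to $w_0 = 2$ and $w_1 = 3$. The closed form is convenient for small problems, while gradient descent carries over to models that have no analytic solution.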



# Python & libraries


Most state-of-the-art ML stacks are Python-based. There are several reasons for that, but the most important is the high expressiveness of the language.



## Built-in functions
Python comes with a number of built-in functions that are available by default, such as `print`, `max` and `type`.


In [11]:
print("test")

test


In [12]:
type("test")

str

In [13]:
max(3, 4, 5, 1)

5

## Basic Data types

In [15]:
i = 1
type(i)

int

In [16]:
d = 0.3
type(d)

float

In [17]:
text = "this is a string"
type(text)

str

## Basic Data Structures

In [19]:
l = [1, 2, 3, 4 ]
type(l)

list

In [17]:
d = {"a": 1,
     "b": 2,
    }
type(d)
# keys are case-sensitive: "A" is not in the dict, so get() returns None
d["a"], d.get("A")

(1, None)

In [14]:
a = {1, 2}
type(a) 
1 in a, 3 in a 

(True, False)

## importing packages

In [23]:
import math
print(math.sqrt(10))

3.1622776601683795


The Python ecosystem is very rich: we can find libraries for pretty much anything we like.
Typically, the data science stack consists of a set of widely adopted libraries for data processing, linear algebra, scientific computing etc.


In [27]:
import numpy as np
nums = np.random.randn(10, 10)
nums

array([[ 1.06961965, -0.01718427, -0.31092343, -0.25623881, -0.16897694,
         0.38263021, -0.54922722,  0.16550105, -0.5555095 , -1.32079901],
       [ 0.82652078, -0.66949556, -0.91897743,  1.64098086, -0.67009731,
        -0.89543717, -0.7810434 ,  1.19158959,  1.32941825, -1.69378422],
       [ 1.66409208,  0.82858088, -0.72284334, -3.03137515, -0.72785845,
        -1.92513859, -0.40429778, -2.48304637,  1.57185654,  0.63986022],
       [ 0.86703596, -0.25284487, -0.83240088,  1.98792819,  0.40177488,
         0.2709155 , -0.43810125,  1.9159267 , -0.03900944, -0.54842681],
       [ 0.00495438,  0.40650785,  0.33682255,  0.22093528,  1.95971636,
         0.19618212, -0.43602122,  1.13467653,  1.53770081,  1.44022592],
       [-0.78506845, -0.04177141,  0.53483787,  1.43017555, -0.31305419,
        -1.27595963, -1.12651079, -0.46874791, -1.08536199,  0.72346403],
       [-0.27987282,  0.24706891, -0.40950309, -1.52316642, -1.60114369,
         0.29199042,  0.99566715,  0.08294286

## loops 




In [31]:
for i in range(4):
    print(i)

0
1
2
3


In [32]:
for i in range(5,8):
    print(i)

5
6
7


In [34]:
for i in range(0,10,2):
    print(i)

0
2
4
6
8


In [37]:
L  = [1,2,3,4,5]
for i in L:
    print(i)

1
2
3
4
5


In [38]:
L  = [1,2,3,4,5]
for idx,elem in enumerate(L):
    print(f"index {idx} has value {elem}")

index 0 has value 1
index 1 has value 2
index 2 has value 3
index 3 has value 4
index 4 has value 5


## List comprehensions and a touch of functional programming

In [41]:
L = [1,2,3,5]

L2 = []
for i in L:
    L2.append( i**2 )
L2

[1, 4, 9, 25]

In [44]:
L3 = [ i**2 for i in L]
L3

[1, 4, 9, 25]

In [48]:
L4 = map( lambda x: x**2, L)
L4 = list( L4 )
L4

[1, 4, 9, 25]

## numpy 

NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

In [2]:
import numpy as np

### vector operations

In [6]:
x = np.ones(10)
y = np.ones(10)
x, y

(array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]))
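
NumPy applies arithmetic operators element by element, which is what makes vector code short. A minimal sketch, with made-up vectors `u` and `v`:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([10.0, 20.0, 30.0])

print(u + v)               # element-wise sum
print(u * v)               # element-wise product (not the dot product)
print(2 * u)               # multiplication by a scalar
print(np.linalg.norm(u))   # Euclidean length of u
```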

### dot product

The dot (inner) product is a crucial operation in machine learning, as it is used to quantify vector similarity:

$\langle x, y \rangle = \sum_i x_i \cdot y_i$

In [10]:
x.dot(y)

10.0
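
To connect the formula with the NumPy call, the sum can be written out explicitly and compared with the built-in ways of computing it. The vectors `p` and `q` are made up for illustration.

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 5.0, 6.0])

# The sum from the formula, written out with plain Python.
explicit = sum(p_i * q_i for p_i, q_i in zip(p, q))

# Three equivalent NumPy spellings of the same operation.
print(explicit, p.dot(q), np.dot(p, q), p @ q)
```

All four values agree (1*4 + 2*5 + 3*6 = 32). In practice the NumPy versions are preferred, since they avoid a Python-level loop.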

### cosine similarity 

Generally, when we compare vectors, their length is not important. Rather, we are mostly interested in whether the vectors point in the same direction. A standard way to quantify the similarity of two vectors is therefore the angle between them: the smaller the angle, the more similar the vectors are. If the angle is zero, the vectors are "identical" in the sense that they point in the same direction.

This is typically cast into a similarity measure by using the cosine of the angle between the two vectors.


The cosine of two vectors is their dot product, normalized by the vector lengths:

$$ \cos( u , v ) = \frac{ \langle u , v \rangle } { \lVert u \rVert \, \lVert v \rVert }  =  \frac{ u^T v }{ \lVert u \rVert \, \lVert v \rVert }  $$
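
The formula translates almost directly into NumPy. This is a minimal sketch: the function name `cosine_similarity` is our own choice, and it assumes that neither vector is the zero vector (otherwise the denominator is zero).

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 2.0, 3.0])

same_direction = cosine_similarity(u, 10 * u)    # length differs, direction does not
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
opposite = cosine_similarity(u, -u)

print(same_direction, orthogonal, opposite)
```

Up to floating-point rounding, the three values are 1 (same direction), 0 (orthogonal) and -1 (opposite direction), which shows that scaling a vector does not change its cosine similarity to another vector.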


