# EVeMa 2018

![logo](assets/logo.jpg "Logo")

- Instructor: Žiga Emeršič.

- Authors: 
    - Saúl Calderón, Martín Solís, Ángel García, Blaž Meden, Felipe Meza, Juan Esquivel
    - Mauro Méndez, Manuel Zumbado. 

# Supervised Learning
Supervised learning can happen when we have both:
* independent attribute values or input variables $(X)$ and
* dependent attribute or output variables $(y)$.

We can then use various algorithms to derive the mapping function from the input to the output, defined as:

$$y = f(X)$$

During training we constantly compare the predictions with the ground-truth labels or values. The function $f$ maps the values of the independent variables to the dependent variable, up to an error $\epsilon_i$ for each example $i$. So,

$$ y_i = f(\vec{x_i}) + \epsilon_i $$
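To make the role of $f$ and $\epsilon_i$ concrete, here is a minimal sketch that assumes $f$ is a straight line and fits it by ordinary least squares. The data values and the helper name `fit_line` are made up for illustration; any other model family could take the place of the line.

```python
# Toy training set (made-up values): inputs x_i and observed outputs y_i.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]


def fit_line(xs, ys):
    """Ordinary least squares for f(x) = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def f(x):
    """The learned mapping from input to output."""
    return slope * x + intercept


# epsilon_i = y_i - f(x_i): what the learned mapping does not explain.
residuals = [y - f(x) for x, y in zip(xs, ys)]
print(slope, intercept, residuals)
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the fit.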


## Decision Trees

First we do feature selection, and based on that we build (train) a decision tree.

We can measure attribute purity with
* information gain, Gain-ratio,
<!--* distance measure, weight of
evidence, -->
* minimum description length (MDL),
* J-measure, Xi and G statistics,
* orthogonality of class distribution vectors (ORT),
* Gini-index, Relief, ReliefF, etc.

Let us do an example.
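As a minimal sketch of two of the measures listed above, the snippet below computes the information gain of each attribute and the Gini index of the labels on a tiny, made-up data set. The attribute names (`outlook`, `windy`) and the labels are assumptions chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: the label depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["play", "play", "stay", "stay"]

gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute, gini(labels))
```

The attribute with the highest gain would become the root of the tree; the same procedure is then repeated on each resulting subset.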

## k-Nearest Neighbor

An object is classified by a majority vote of its neighbors: it is assigned to the class most common among its k
nearest neighbors.

k is a positive integer, and it is typically small.

If k = 1, then the object is simply assigned to the class of that single
nearest neighbor.
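The voting rule can be sketched in a few lines. This is an illustrative version that assumes Euclidean distance and a brute-force search over all training points; the function name `knn_predict` and the toy points are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority class among the k training points closest to query."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two small, well-separated clusters (made-up coordinates).
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2), k=3))
print(knn_predict(points, labels, (5, 5), k=1))
```

With two classes, an odd k avoids tied votes; in this sketch a tie would be resolved in favor of the label encountered first among the nearest neighbors.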

## Support Vector Machine

A Support Vector Machine (SVM) model is a representation of the
examples as points in space, mapped so that the examples of the
separate categories are divided by a clear gap that is as wide as
possible. Training can be defined as maximizing the margin around the separating hyperplane. With a kernel function, data that are not linearly separable in the original space can still be separated by a hyperplane in a higher-dimensional feature space.

New examples are then mapped into that same space and predicted
to belong to a category based on which side of the gap they fall.
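The prediction step can be sketched for the linear case. The weights `w` and bias `b` below are not learned; they are chosen by hand (an assumption for illustration) so that the two closest training points lie exactly on the margin, i.e. $|w \cdot x + b| = 1$. Under that scaling the width of the gap is $2 / \lVert w \rVert$.

```python
from math import sqrt

# Hand-picked hyperplane w . x + b = 0 (assumed, not the result of training).
w = [1.0, 1.0]
b = -3.0

# Toy training points with labels -1 / +1.
training = [((0, 0), -1), ((1, 1), -1), ((2, 2), 1), ((3, 3), 1)]


def decision_value(x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x):
    """Assign the class according to the side of the gap the point falls on."""
    return 1 if decision_value(x) >= 0 else -1


# Width of the gap for a hyperplane scaled so that support vectors score +/-1.
margin_width = 2.0 / sqrt(sum(wi * wi for wi in w))

# Support vectors are the training points that sit exactly on the margin.
support_vectors = [x for x, _ in training if abs(abs(decision_value(x)) - 1.0) < 1e-12]

print(predict((3, 3)), predict((0, 0)), margin_width, support_vectors)
```

Only the support vectors determine the hyperplane; moving the other points (without crossing the margin) would leave the solution unchanged.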

## Neural Networks
An Artificial Neural Network (ANN) is based on a collection of
connected units or nodes called artificial neurons (a simplified version
of biological neurons in the brain).

Each connection (a simplified version of a synapse) between artificial
neurons can transmit a signal from one to another.

An artificial neuron that receives a signal processes it and
then forwards the result to the artificial neurons connected to it.
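A minimal sketch of this signal flow follows. It assumes that each neuron computes a weighted sum of its inputs plus a bias and passes it through a sigmoid activation; the activation choice and all weight values are assumptions for illustration, not something the text prescribes.

```python
from math import exp


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals, followed by the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def layer(inputs, weight_rows, biases):
    """Every neuron in the layer receives the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two inputs -> two hidden neurons -> one output neuron (made-up weights).
hidden = layer([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])
output = neuron(hidden, [1.0, -1.0], 0.0)
print(hidden, output)
```

The output of the hidden layer becomes the input of the next neuron, which is exactly the forwarding step described above.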

http://playground.tensorflow.org

## Convolutional Neural Networks

Convolutional neural networks (CNNs) belong to the family of deep neural networks. The main difference from traditional (deep) neural networks is that they perform convolutions, which makes them well suited to images and other data where the row-column arrangement matters.

For an in-depth understanding of convolutional neural networks we recommend: http://cs231n.github.io/

In recent years CNNs have represented the state of the art in recognition, object detection
and other computer-vision tasks.

However, as opposed to the classical approaches, where the underlying principles are
well understood (think of an arbitrary feature extractor or some classification
model), here we get a black-box solution that "just works". There is no explanation of the decisions, much as humans have trouble explaining, e.g., "what a chair is".

Some types of layers:
* Convolution Layer (no. of feature matrices, size of the matrices)
* Pooling Layer (window size, step)
* Rectified Linear Units (ReLU) Layer
* Fully Connected Layer (no. of neurons)

The architecture of a CNN is defined by the values chosen for the parameters in the brackets above, together with the number and
order of the layers.
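The first three layer types can be sketched in plain Python. This is an illustrative version only: it assumes a single-channel image, one filter, stride 1 and no padding for the convolution, and the 5x5 "X" image and the diagonal filter are made-up values.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, window=2, step=2):
    """Keep only the largest value inside each window."""
    rows = range(0, len(feature_map) - window + 1, step)
    cols = range(0, len(feature_map[0]) - window + 1, step)
    return [
        [max(feature_map[r + i][c + j]
             for i in range(window) for j in range(window))
         for c in cols]
        for r in rows
    ]


# A 5x5 "X" drawn with +1 (ink) and -1 (background).
image = [
    [1, -1, -1, -1, 1],
    [-1, 1, -1, 1, -1],
    [-1, -1, 1, -1, -1],
    [-1, 1, -1, 1, -1],
    [1, -1, -1, -1, 1],
]
# A 3x3 filter that responds to a top-left to bottom-right diagonal.
kernel = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]

feature_map = convolve2d(image, kernel)
activated = relu(feature_map)
pooled = max_pool(activated, window=2, step=1)
print(feature_map, activated, pooled)
```

The filter responds most strongly where the image patch matches it exactly, which here happens in the top-left and bottom-right corners of the "X". A fully connected layer would then take the flattened pooled values as its inputs.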

<img src="files/filters.png" width="80%">

### Example
Let's do the actual calculation!

<img src="files/x-o-examples.png" width="50%">

<img src="files/cnn.png">

### So ... how do we order "those" layers?

<img src="files/karpathy.png">

### How do CNNs learn?

Back-propagation is used to set weights:
* Errors are calculated from the ground truths and the outputs of the fully
connected layer. We measure our unhappiness with such outcomes
using a loss function (sometimes also referred to as the cost function or the
objective).
* For each feature pixel, the weight is increased or decreased; the new prediction
is then evaluated again.
 _How do we know whether to increase or decrease the weight, and by how
much? Gradient descent, of course!_
* When we are satisfied with the weights (the errors reach satisfactorily low values), the learning process is complete.
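The update rule can be illustrated on the smallest possible model. The sketch below is not back-propagation through a CNN; it assumes a single weight `w`, a model `prediction = w * x`, a mean squared error loss and a hand-picked learning rate, all chosen only to show how the gradient decides the direction and size of each change.

```python
# Toy data generated by y = 2 * x, so the ideal weight is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]


def loss(w):
    """Mean squared error between predictions w * x and ground truths y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0               # initial weight
learning_rate = 0.05  # assumed step size
history = [loss(w)]

for step in range(100):
    # Move against the gradient: its sign gives the direction,
    # its magnitude (times the learning rate) gives the amount.
    w -= learning_rate * gradient(w)
    history.append(loss(w))

print(w, history[0], history[-1])
```

In a real network the same rule is applied to every weight at once, with back-propagation supplying the gradient of the loss with respect to each of them.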

### Again to the Overfitting Problem ...

What are our options?

<img src="files/augmentation.png">
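One common option is data augmentation: enlarging the training set with modified copies of the existing images. The sketch below assumes images stored as nested lists and uses only horizontal and vertical flips; the function names are hypothetical, and real pipelines typically add rotations, crops, noise and similar transformations.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in reversed(image)]


def augment(image):
    """Return the original image together with its flipped variants."""
    return [image, flip_horizontal(image), flip_vertical(image)]


image = [
    [1, 2],
    [3, 4],
]
variants = augment(image)
print(variants)
```

Each variant keeps the label of the original image, so the network sees more examples per class without any new data being collected. Whether a flip preserves the label depends on the task and has to be checked per data set.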


## The Classical Classification Pipeline

We illustrate the classification pipeline using ear recognition as an example. Let us draw the pipeline in more detail and discuss the components we need.

<img src="files/pipeline.png">

Authors: *Saúl Calderón, Ángel García, Blaž Meden, Žiga Emeršič, Felipe Meza, Juan Esquivel, Martín Solís, Mauro Méndez, Manuel Zumbado*