# Module 1: Introduction
## 1.1. Introduction to Machine Learning

Suppose you want to sell a car. Depending on its make, model, age, mileage, and so on, you can sell it for a certain price. An expert can look at the car's features and estimate a price based on them. In other words, the expert takes data and extracts patterns from that data.

DATA -> EXPERT -> PATTERN, or equivalently, DATA -> MACHINE LEARNING -> PATTERN

Machine Learning is a technique which allows us to build models that extract these patterns from data, just like the expert in our example.

* **Features** are the characteristics of the data we have (year, make, mileage, etc.).
* The **target** is the variable we want to predict.
* A **model** is a "description of statistical patterns" that predicts a target given some input. Models are trained with algorithms that take input features together with the reference targets for those features. The algorithms then extract patterns that compute the target from the input features within some error margin, and those patterns are stored in the model.

Once we've trained a model, we can use it on completely new input and predict the target from that input's features.
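
As a minimal sketch of this train-then-predict workflow, assume a single feature (car age in years) and a tiny invented dataset; the "model" is then just a straight line fitted by ordinary least squares. The data and the names `train` and `predict` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: car age in years (feature) and sale price (target).
# These numbers are invented for illustration, not real market data.
ages = [1, 2, 3, 4, 5]
prices = [20000, 18000, 16000, 14000, 12000]

def train(xs, ys):
    """Fit price = a + b * age by ordinary least squares; return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

def predict(model, x):
    """Apply the learned pattern (the model) to a new input."""
    a, b = model
    return a + b * x

model = train(ages, prices)   # the pattern extracted from the data
print(predict(model, 6))      # estimate for an unseen 6-year-old car → 10000.0
```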

DEMONSTRATION

Our data consists of a collection of traffic traces between different SDN switches. One of the features is the PER (Packet Error Rate), which gives a good overall picture of the traffic between the switches, so it becomes the target. We will train on the data (features + target) to obtain a model through Machine Learning. Once we have the model, we will use it to predict the PER from the remaining features. In the end, it comes down to an equation:

$$features + model = PER$$
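
Before training, each trace has to be split into its features and its target. The sketch below assumes each trace is a Python dict; the field names (`delay_ms`, `jitter_ms`, `throughput_mbps`, `per`) and all values are hypothetical, invented for illustration and not taken from the real SDN dataset.

```python
# Hypothetical SDN traffic traces; field names and values are invented
# for illustration and do not come from a real dataset.
traces = [
    {"delay_ms": 12.0, "jitter_ms": 1.5, "throughput_mbps": 95.0, "per": 0.010},
    {"delay_ms": 30.0, "jitter_ms": 4.0, "throughput_mbps": 60.0, "per": 0.045},
    {"delay_ms": 18.0, "jitter_ms": 2.2, "throughput_mbps": 80.0, "per": 0.020},
]

TARGET = "per"
# Every field except the target is treated as a feature.
FEATURES = [name for name in traces[0] if name != TARGET]

# Feature matrix X: one row per trace, one column per feature.
X = [[trace[name] for name in FEATURES] for trace in traces]
# Target vector y: the PER of each trace, aligned with the rows of X.
y = [trace[TARGET] for trace in traces]

print(FEATURES)  # ['delay_ms', 'jitter_ms', 'throughput_mbps']
```

A model trained on `X` and `y` would then play the role of "model" in the equation above.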

## 1.2. ML vs. Rule-Based Systems

In the traditional programming paradigm, the developer defines how a system will behave by defining specific rules. However, for complex or ever-changing behaviors, this method can become unsustainable or even impossible.

For example, we could try to build a spam filter from specific rules, such as filtering certain words or blocking certain senders. However, human language is so complex and spam changes so quickly that the rules cannot keep up: the filter would soon become obsolete and would never reach acceptable effectiveness.

ML offers a solution to this issue:

* We can gather data (in our example, emails, both regular email and spam) to create a dataset.
* We can define and calculate the features which are relevant to our dataset and the problem we're trying to solve.
* Finally, we can train and use a model which is able to recognize the patterns that distinguish regular email from spam, allowing us to act on it by filtering spam.

ML does not necessarily discard Rule-Based Systems entirely. We can reuse (some of) the rules defined in a Rule-Based System as features for our ML model. Following the spam filter example, a feature could be whether the sender is from a specific domain, or whether the subject contains certain words.
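
A minimal sketch of turning two such rules into binary (0/1) features. The domain `example-spam.com`, the word list and the function name `rule_features` are hypothetical placeholders; a real filter would use many more signals.

```python
# Hypothetical rules turned into binary features for a spam model.
# The domain and the word list are placeholders chosen for illustration.
SUSPICIOUS_DOMAIN = "example-spam.com"
SUSPICIOUS_WORDS = {"free", "winner", "prize"}

def rule_features(sender, subject):
    """Return [sender_is_suspicious, subject_has_suspicious_word] as 0/1 values."""
    domain = sender.rsplit("@", 1)[-1].lower()
    words = set(subject.lower().split())
    return [
        int(domain == SUSPICIOUS_DOMAIN),
        int(bool(words & SUSPICIOUS_WORDS)),
    ]

print(rule_features("promo@example-spam.com", "You are a WINNER"))  # [1, 1]
print(rule_features("alice@university.edu", "Meeting notes"))       # [0, 0]
```

Instead of the rules deciding on their own, these values become columns of the feature matrix, and the model learns how much weight each one deserves.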

Essentially, ML is a paradigm shift compared to traditional programming. Traditional programming follows this structure:

$$data + code = outcome$$

ML rearranges this equation as follows:

$$data + outcome = model$$

And the resulting model allows us to replace code in the original equation:

$$data + model = outcome$$
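
A minimal sketch of the three equations, assuming a single hypothetical feature (the number of suspicious words in an email) and a handful of invented labelled examples. The "model" here is only a learned threshold, far simpler than a real spam filter, and the threshold of 3 in the hand-written rule is an arbitrary choice.

```python
# Traditional programming: data + code = outcome.
# The developer hard-codes the rule (the threshold 3 is arbitrary).
def rule_based_is_spam(num_suspicious_words):
    return num_suspicious_words >= 3

# Machine learning: data + outcome = model.
# Hypothetical labelled examples: (number of suspicious words, is_spam).
examples = [(0, False), (1, False), (2, False), (2, True), (4, True), (5, True)]

def train_threshold(data):
    """Learn the threshold that misclassifies the fewest training examples."""
    candidates = sorted({x for x, _ in data})

    def errors(t):
        return sum((x >= t) != label for x, label in data)

    # min() keeps the first (smallest) threshold among ties.
    return min(candidates, key=errors)

threshold = train_threshold(examples)

# data + model = outcome: the learned threshold replaces the hand-written rule.
def model_is_spam(num_suspicious_words):
    return num_suspicious_words >= threshold

print(threshold)  # → 2
```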

## 1.3. Supervised Machine Learning

In Supervised Machine Learning (SML), the training data always comes with labels (targets) associated with the features. The model is trained on these labelled examples and can then make predictions for new features. In this way, the model is taught by examples of features paired with their targets.

* Feature matrix (X): made of observations or objects (rows) and features (columns).
* Target variable (y): a vector with the target information we want to predict. For each row of X there's a value in y.

The model can be represented as a function g that takes the matrix X as input and tries to produce predictions as close as possible to the targets y. Obtaining the function g is what is called training.

$$g(X) = y$$

where X is the feature matrix and g is the model function that, applied to the feature matrix, returns y (the target vector).
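
One way to make "obtaining g" concrete, assuming g is linear (g(x) = w · x + b) and is trained with plain batch gradient descent on the mean squared error. This is just one of many possible model families and training algorithms. The data is synthetic, generated from y = 2·x1 + 3·x2 + 1, so the learned weights can be checked against known values; the learning rate and number of epochs are illustrative choices.

```python
# Synthetic feature matrix X (rows = observations, columns = features)
# and target vector y, generated from y = 2*x1 + 3*x2 + 1 for illustration.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
y = [2 * x1 + 3 * x2 + 1 for x1, x2 in X]

def g(row, w, b):
    """Linear model: weighted sum of the features plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, row)) + b

def train(X, y, lr=0.05, epochs=5000):
    """Fit w and b by batch gradient descent on the mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals of the current model on every observation.
        errors = [g(row, w, b) - target for row, target in zip(X, y)]
        # Gradient of the MSE with respect to each weight and the bias.
        grad_w = [2 / n * sum(e * row[j] for e, row in zip(errors, X)) for j in range(d)]
        grad_b = 2 / n * sum(errors)
        w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

w, b = train(X, y)                          # training = obtaining g
predictions = [g(row, w, b) for row in X]   # g(X), one prediction per row
print([round(v, 3) for v in w], round(b, 3))  # → [2.0, 3.0] 1.0
```

With real, noisy data g(X) would only approximate y; here the targets are exactly linear, so the fit recovers them.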

Types of SML problems:

* Regression: the output is a number (car's price).
* Classification: the output is a category (spam example).
    * Binary: there are two categories.
    * Multiclass problems: there are more than two categories.
* Ranking: the output is a list of items ordered by their scores, with the highest-scoring items first. It is applied in recommender systems.
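
A short sketch of what the output looks like for each problem type. All model outputs below are hypothetical hard-coded values standing in for a trained model, and the 0.5 cut-off is a common default rather than a fixed rule.

```python
# Regression: the model output is a number, such as a car price.
price_estimate = 14500.0  # hypothetical model output

# Binary classification: a probability is mapped to one of two categories.
spam_probability = 0.83  # hypothetical model output
label = "spam" if spam_probability >= 0.5 else "not spam"

# Multiclass classification: pick the category with the highest probability.
class_probabilities = {"cat": 0.1, "dog": 0.7, "bird": 0.2}  # hypothetical
predicted_class = max(class_probabilities, key=class_probabilities.get)

# Ranking: sort items by relevance score and keep the top ones,
# as a recommender system would.
item_scores = {"item_a": 0.20, "item_b": 0.85, "item_c": 0.55}  # hypothetical
top_2 = sorted(item_scores, key=item_scores.get, reverse=True)[:2]

print(price_estimate, label, predicted_class, top_2)
# → 14500.0 spam dog ['item_b', 'item_c']
```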

In our SDN example, we will try to predict the traffic behaviour in a network through a numerical parameter (PER), so this is a regression problem.

In summary, SML is about teaching the model by showing different examples, and the goal is to come up with a function that takes the feature matrix as a parameter and makes predictions as close as possible to the y targets.

## 1.4. CRISP-DM

CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model. Conceived in 1996, it became a European Union project under the ESPRIT funding initiative in 1997. The project was led by five companies: Integral Solutions Ltd (ISL), Teradata, Daimler AG, NCR Corporation and OHRA, an insurance company. The process consists of six phases:

1. Business understanding: a key question is why we need ML for the project. The goal of the project has to be measurable.
2. Data understanding: analyze available data sources, and decide if more data is required.
3. Data preparation: clean the data, remove noise by applying pipelines, and convert the data to a tabular format so it can be fed into an ML model.
4. Modeling: train different models and choose the best one. Based on the results of this step, we may need to go back and add new features or fix data issues.
5. Evaluation: measure how well the model is performing and if it solves the business problem.
6. Deployment: roll the model out to production for all users. Evaluation and deployment often happen together (online evaluation).

It is important to consider how maintainable the project is. In general, ML projects require many iterations.

Iteration:

1. Start simple.
2. Learn from the feedback.
3. Improve.

In our SDN problem, we will use a model to make traffic predictions based on features (PER, delay, jitter...). We will then evaluate the model.


## 1.5. Model Selection Process

Which model to choose?

* Logistic regression
* Decision tree
* Neural Network
* Or many others

The validation dataset is not used for training. The training and validation datasets each have their own feature matrix and target vector y. The model is fitted on the training data and then used to predict the y values of the validation feature matrix. Finally, the predicted y values (for example, probabilities in a classification problem) are compared with the actual y values.

Multiple comparisons problem (MCP): when many models are compared on the same validation data, one of them can obtain good results just by chance, because the evaluation of each model is subject to random variation.
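
A small simulation of the MCP, under deliberately artificial assumptions: the features and labels are random and unrelated, so no model can have real skill, and each candidate "model" is a linear rule with random weights. The dataset sizes, seed and helper names are hypothetical choices for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

N_FEATURES, N_SAMPLES, N_MODELS = 5, 100, 200

def make_dataset(n):
    """Random features and random labels: there is no pattern to learn."""
    X = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y

def predict(weights, X):
    """A linear 'model': predict 1 when the weighted sum is positive."""
    return [int(sum(w * x for w, x in zip(weights, row)) > 0) for row in X]

def accuracy(pred, true):
    return sum(p == t for p, t in zip(pred, true)) / len(true)

X_val, y_val = make_dataset(N_SAMPLES)
X_test, y_test = make_dataset(N_SAMPLES)

# Many candidate models with random weights; none has any real skill.
models = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_MODELS)]
val_accs = [accuracy(predict(m, X_val), y_val) for m in models]

best = max(range(N_MODELS), key=lambda i: val_accs[i])
test_acc = accuracy(predict(models[best], X_test), y_test)

# The winner usually looks better than 50% on validation purely by luck;
# on fresh test data its accuracy tends to fall back towards 50%.
print(f"best validation accuracy: {val_accs[best]:.2f}")
print(f"same model on test:       {test_acc:.2f}")
```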

The test set helps avoid the MCP. The best model is selected using the training and validation datasets, while the test dataset is used to confirm that the selected model really performs well on data it has never seen.

1. Split the dataset into training, validation, and test sets, for example 60%, 20%, and 20% respectively.
2. Train the models with the training data.
3. Evaluate each model on the validation dataset by comparing its predictions with the actual values.
4. Select the best model.
5. Apply the best model to the test dataset.
6. Compare the performance metrics on the validation and test sets.
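
The six steps can be sketched end to end with the standard library only. To keep the example small, the two candidates are a hypothetical "always predict the mean" baseline and a least-squares line, standing in for models such as logistic regression or decision trees; the data is synthetic (y = 3x plus Gaussian noise) and RMSE is an assumed choice of metric.

```python
import math
import random

rng = random.Random(1)  # fixed seed for reproducibility

# Synthetic data for illustration: y = 3x + Gaussian noise.
xs = [rng.uniform(0, 10) for _ in range(100)]
data = [(x, 3 * x + rng.gauss(0, 1)) for x in xs]

# 1. Shuffle, then split 60% / 20% / 20% (integer arithmetic avoids rounding issues).
rng.shuffle(data)
n = len(data)
n_train, n_val = n * 60 // 100, n * 20 // 100
train = data[:n_train]
val = data[n_train:n_train + n_val]
test = data[n_train + n_val:]

def fit_mean(rows):
    """Baseline model: always predict the mean target of the training rows."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y

def fit_linear(rows):
    """Least-squares line y = a + b * x."""
    mx = sum(x for x, _ in rows) / len(rows)
    my = sum(y for _, y in rows) / len(rows)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
    a = my - b * mx
    return lambda x: a + b * x

def rmse(model, rows):
    """Root mean squared error of a model on a list of (x, y) rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))

# 2. Train every candidate on the training set only.
candidates = {"mean": fit_mean(train), "linear": fit_linear(train)}

# 3-4. Evaluate on the validation set and keep the lowest-error model.
val_scores = {name: rmse(model, val) for name, model in candidates.items()}
best_name = min(val_scores, key=val_scores.get)

# 5-6. Apply the chosen model to the test set and compare both errors.
test_score = rmse(candidates[best_name], test)
print(best_name, round(val_scores[best_name], 2), round(test_score, 2))
```

If the validation and test errors are close, the selection was probably not just luck; a large gap would be a warning sign.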

NB: it is possible to reuse the validation data. After selecting the best model (step 4), the training and validation datasets can be combined into a single training dataset for the chosen model before testing it on the test set.
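
A minimal sketch of this refit step, assuming a linear model has already been selected with the training and validation sets. The synthetic data (y = 2x + 5 plus noise), the 30/10/10 split of 50 rows and the helper names are hypothetical.

```python
import random

rng = random.Random(7)

# Synthetic data for illustration: y = 2x + 5 + Gaussian noise,
# split 60% / 20% / 20% into training, validation and test sets.
xs = [rng.uniform(0, 10) for _ in range(50)]
data = [(x, 2 * x + 5 + rng.gauss(0, 1)) for x in xs]
train, val, test = data[:30], data[30:40], data[40:]

def fit_linear(rows):
    """Least-squares line y = a + b * x; returns (a, b)."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
    return my - b * mx, b

def mse(model, rows):
    """Mean squared error of an (a, b) line on a list of (x, y) rows."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in rows) / len(rows)

# Assume the linear model was already chosen using train and val.
# Refit it on train + val combined, so the final model sees more data.
final_model = fit_linear(train + val)

# The test set is used only once, at the very end.
print(f"test MSE: {mse(final_model, test):.2f}")
```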

```python
print('Hello world')
```