# Overview of Machine Learning for Food Sciences

## What is Machine Learning?

You have probably heard a lot about "Machine Learning" by now, and you may well have asked yourself: what does it mean? Machine Learning lies at the intersection of computer science and statistics/mathematics. It combines algorithmic concepts from computer science with mathematical models to describe data and find the meaningful patterns in it. Thus, the main goal of Machine Learning is to extract "meaning" from data. In practice, this "meaning" is represented as a mathematical equation that best describes the data. From a philosophical perspective, Machine Learning is used to obtain knowledge from data, and this extracted knowledge makes it possible to make useful predictions.

Although it might not always be evident, machine learning algorithms are already a large part of our lives. Some simple everyday examples are the face-detection features in our smartphones, the voice assistants in our electronic devices and the spam filters in our email. Another area where the use of machine learning is gathering momentum is medicine: machine learning algorithms can detect certain diseases with an accuracy similar to that of doctors, and they can also be used to predict the effectiveness of different drug combinations, which is normally an extremely time-consuming process.

[Figure 1](#ml-model) shows a high-level view of the process of building a machine learning model. As you can see, there are three main parts: *the input data*, *the process of building the model* and the *predicted output*. The **input data** is fed into the model so that it can learn a good mathematical representation of it. The **process of building the model** consists of several steps. When we build the model, we do not give it all the data that we have. Instead, we split the data into training data and test data. The reasons will become evident in [the train-test split section below](#train-test-split). The machine learning algorithm describes a series of steps used to mathematically approximate the data, and it is trained on the training data. As you can see, there is a cycle between the training data and the machine learning algorithm (ML algorithm). This is because we train the model iteratively, checking after each round how it performs on the training data. Once it performs well enough on the training data, we assume that it is ready. We then use the test data, which the model has never seen up to this point, to get a sense of how it would perform in the real world, thus producing the **predicted outputs**.

<a id="ml-model"></a>
<img src="images/ml_model.jpg" alt="Machine Learning Model">
<center><figcaption><em>Fig 1: Building a machine learning model</em></figcaption></center>
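
The process described above (split the data, train iteratively on the training data, then evaluate on the unseen test data) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the data is made up, the "model" is a straight line fitted by gradient descent, and the helper names (`train_test_split`, `train`, `mean_squared_error`) and all parameter values are hypothetical choices for this example rather than the method used later in these tutorials.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a fraction of them as test data."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(data, w, b):
    """Average squared difference between predictions w*x + b and targets y."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(train_data, learning_rate=0.05, epochs=2000):
    """Iteratively adjust w and b to reduce the error on the training data."""
    w, b = 0.0, 0.0
    n = len(train_data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up input data: y is roughly 2*x + 1 plus a little noise
noise = random.Random(42)
samples = [(i / 10, 2 * (i / 10) + 1 + noise.uniform(-0.1, 0.1)) for i in range(40)]

train_data, test_data = train_test_split(samples)
w, b = train(train_data)

# The test data was never used during training
train_error = mean_squared_error(train_data, w, b)
test_error = mean_squared_error(test_data, w, b)
print(f"w={w:.2f}, b={b:.2f}, train MSE={train_error:.4f}, test MSE={test_error:.4f}")
```

The error on the test data is the number to watch: it estimates how the model would behave on data it has not seen before.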


## Where is Machine Learning Used in Food Sciences?

## Supervised vs Unsupervised Learning

Generally, there are four types of machine learning algorithms: *supervised learning*, *unsupervised learning*, *semi-supervised learning* and *reinforcement learning*. In this series of tutorials we will explore supervised and unsupervised learning algorithms.

### Supervised Learning

**Supervised Learning** - in this setting we aim to build a model that describes the data as well as possible and is able to predict future values. This is the same as building a mathematical equation or formula that takes many input variables and derives the desired output variable. The data points that the model learns from already come with their corresponding outputs, which is how the model is able to derive a connection between inputs and outputs. There are two types of problems in the supervised setting: *regression* and *classification*. In **regression**, the output that the model learns and then tries to predict is a continuous value (e.g. the age or height of a person). In **classification**, the output that the model learns and then tries to predict is a categorical value (e.g. a class from a finite number of classes, such as whether a tumor cell is benign or malignant).


In the case of regression, once the model has learned from the data, we can give it a new sample and it will output a value similar to those it saw during the training phase. [Figure 2](#sup_lear_reg) illustrates the process.

<a id="sup_lear_reg"></a>
<img src="images/supervised_learning_regression.jpg" alt="Supervised learning regression">
<center><figcaption><em>Fig 2: Regression</em></figcaption></center>
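
As a small illustration of regression, the sketch below fits a straight line to a handful of points with the closed-form least-squares formulas and then predicts a continuous value for a new input. The ages and heights are made-up numbers, and the function names `fit_line` and `predict` are hypothetical helpers for this example only. A straight line is just one simple choice of model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Return the continuous value the fitted line gives for input x."""
    return slope * x + intercept


# Made-up training data: age in years (input) and height in cm (output)
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

slope, intercept = fit_line(ages, heights)

# Predict the height for an age that was not in the training data
predicted_height = predict(slope, intercept, 7)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted_height:.1f}")
```

The prediction is a number on a continuous scale, not one of a fixed set of labels, which is what distinguishes regression from classification.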

In the case of classification, after the model learns from the data and is ready to be used, when we give it a new, unseen sample it will output a class or a category from the set of categories that it saw during the training phase. [Figure 3](#sup_lear_cls) illustrates the idea.

<a id="sup_lear_cls"></a>
<img src="images/supervised_learning_classification.jpg" alt="Supervised learning classification">
<center><figcaption><em>Fig 3: Classification</em></figcaption></center>

We will study all the steps in this process in future sections.
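
To make the classification idea concrete, here is a minimal sketch of a nearest-centroid classifier: it averages the training samples of each class and assigns a new sample to the class whose average is closest. This particular technique, the feature values, the class names `ripe` and `unripe`, and the helpers `fit_centroids` and `predict_class` are all assumptions made for illustration, not the approach prescribed by these tutorials.

```python
import math


def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_class(centroids, features):
    """Return the class whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up training data: (firmness, sugar content) for each fruit sample
samples = [(2.0, 14.0), (2.5, 13.0), (1.8, 15.0), (6.0, 8.0), (5.5, 9.0), (6.5, 7.5)]
labels = ["ripe", "ripe", "ripe", "unripe", "unripe", "unripe"]

centroids = fit_centroids(samples, labels)

# New, unseen samples: the output is always one of the classes seen in training
first_prediction = predict_class(centroids, (2.2, 13.5))
second_prediction = predict_class(centroids, (5.8, 8.5))
print(first_prediction, second_prediction)
```

Whatever sample we pass in, the model can only answer with a class that appeared in the training labels.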

### Unsupervised Learning

**Unsupervised Learning** - 

## Datasets Used during the Tutorials

## Data Processing for Machine Learning Methods

### Train-test Split

### Standardization

### Outlier Detection

### Data Quality Control