# Machine Learning

Machine Learning is a branch of Artificial Intelligence and Computer Science that studies algorithms, methods and techniques for "learning" from data. It uses optimisation techniques to find the "optimal" parameters of a function that approximates the observed data within a certain error range. The optimisation phase is an iterative process that "imitates" how humans learn: through trial and error and gradual improvement.
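To make the iterative optimisation idea concrete, here is a minimal Python sketch that fits a straight line to a handful of points with gradient descent, one common optimisation technique. The data, learning rate and number of steps are illustrative assumptions, not taken from any particular library or dataset.

```python
# Minimal sketch: fit y ≈ w*x + b by gradient descent on the mean squared error.
# The data points, learning rate and step count are illustrative assumptions.

def mse(w, b, xs, ys):
    """Mean squared error of the line w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, lr=0.05, steps=2000):
    """Iteratively adjust w and b to reduce the error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient: a small, gradual improvement.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # roughly y = 2x + 1 with some noise
w, b = fit_line(xs, ys)
print(f"w={w:.2f}, b={b:.2f}, mse={mse(w, b, xs, ys):.3f}")
```

Each pass nudges `w` and `b` slightly in the direction that lowers the error; after enough steps they settle close to the least-squares fit (about 1.96 and 1.10 for this data).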

Machine Learning builds on fundamental mathematics such as Linear Algebra, Vector Calculus, Probability and Statistics.

It can be categorised based on different characteristics, such as:
- Type of analytics
- Modelling Approach
- Training/Learning approach
- Type of prediction
  - Type of classification
    - Classification Approach
    - Classification Output

## Type of analytics
- **Descriptive:** tells you what happened in the past.
- **Diagnostic:** helps you understand why something happened in the past.
- **Predictive:** predicts what is most likely to happen in the future.
- **Prescriptive:** recommends actions you can take to affect those outcomes.

### Descriptive Analytics

Descriptive analytics looks at data statistically to tell you what happened in the past. It helps a business understand how it is performing by giving stakeholders the context they need to interpret information, typically through data visualizations such as graphs, charts, reports, and dashboards.

How can descriptive analytics help in the real world? In a healthcare setting, for instance, say that an unusually high number of people are admitted to the emergency room in a short period of time. Descriptive analytics tells you that this is happening and provides real-time data with all the corresponding statistics (date of occurrence, volume, patient details, etc.).
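A minimal Python sketch of the descriptive step, using made-up daily ER admission counts: it only summarises what already happened and flags unusually busy days. The data and the "more than twice the median" threshold are arbitrary assumptions for illustration.

```python
import statistics

# Hypothetical daily ER admissions (date -> number of patients admitted).
admissions = {
    "2024-03-01": 41, "2024-03-02": 38, "2024-03-03": 44,
    "2024-03-04": 40, "2024-03-05": 97, "2024-03-06": 112,
}

counts = list(admissions.values())
summary = {
    "total": sum(counts),
    "mean_per_day": statistics.mean(counts),
    "median_per_day": statistics.median(counts),
    "busiest_day": max(admissions, key=admissions.get),
}

# Flag days with more than twice the median volume (the threshold is an arbitrary choice).
spike_days = [day for day, n in admissions.items() if n > 2 * summary["median_per_day"]]

print(summary)
print(spike_days)
```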

### Diagnostic Analytics

Diagnostic analytics takes descriptive data a step further and provides deeper analysis to answer the question: Why did this happen? Diagnostic analysis is often referred to as root cause analysis. It uses techniques such as data discovery, data mining, drill-down and drill-through.

In the healthcare example mentioned earlier, diagnostic analytics would explore the data and make correlations. For instance, it may help you determine that all of the patients’ symptoms—high fever, dry cough, and fatigue—point to the same infectious agent. You now have an explanation for the sudden spike in volume at the ER.
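Continuing the same hypothetical example, a diagnostic drill-down might break the spike down by reported symptoms. This Python sketch keeps only the patients admitted on the spike days and counts how often each symptom, and the full symptom triad, appears; the patient records and the chosen spike days are illustrative assumptions.

```python
from collections import Counter

# Hypothetical admissions: (day, set of reported symptoms).
patients = [
    (1, {"sprained ankle"}),
    (5, {"high fever", "dry cough", "fatigue"}),
    (5, {"high fever", "dry cough", "fatigue"}),
    (5, {"broken arm"}),
    (6, {"high fever", "dry cough", "fatigue", "headache"}),
    (6, {"high fever", "dry cough", "fatigue"}),
    (6, {"chest pain"}),
]
spike_days = {5, 6}  # days flagged by the descriptive step (assumed)

# Drill down: keep only the patients admitted during the spike.
during_spike = [symptoms for day, symptoms in patients if day in spike_days]

# How often does each symptom appear during the spike?
symptom_counts = Counter(s for symptoms in during_spike for s in symptoms)

# Share of spike patients showing the whole triad of interest.
triad = {"high fever", "dry cough", "fatigue"}
share_with_triad = sum(triad <= symptoms for symptoms in during_spike) / len(during_spike)

print(symptom_counts.most_common(3))
print(f"{share_with_triad:.0%} of spike admissions share the triad")
```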

### Predictive Analytics

Predictive analytics takes historical data and feeds it into a machine learning model that considers key trends and patterns. The model is then applied to current data to predict what will happen next.

Back in our hospital example, predictive analytics may forecast a surge in ER admissions over the next several weeks, because patterns in the data show the illness spreading at a rapid rate.
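As a deliberately simple illustration of the predictive step, this Python sketch fits a straight-line trend (ordinary least squares) to hypothetical weekly admission counts and extrapolates it a few weeks ahead. A real predictive model would usually be more sophisticated; both the linear trend and the data are assumptions made for the example.

```python
def fit_trend(ys):
    """Ordinary least squares fit of y = slope * t + intercept, with t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    return slope, y_mean - slope * t_mean

def forecast(ys, horizon):
    """Extrapolate the fitted trend for the next `horizon` periods."""
    slope, intercept = fit_trend(ys)
    n = len(ys)
    return [slope * (n + h) + intercept for h in range(horizon)]

weekly_admissions = [280, 310, 350, 395, 430]  # hypothetical history
print(forecast(weekly_admissions, horizon=3))  # [468.5, 507.0, 545.5]
```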

### Prescriptive Analytics

Prescriptive analytics takes predictive data to the next level. Now that you have an idea of what will likely happen in the future, what should you do? It suggests various courses of action and outlines what the potential implications would be for each.

Back to our hospital example: now that you know the illness is spreading, the prescriptive analytics tool may suggest that you increase the number of staff on hand to adequately treat the influx of patients.
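A minimal Python sketch of the prescriptive step: evaluate a few candidate staffing levels against a forecast, report the implication of each option, and recommend the smallest one that covers the expected demand. The patients-per-nurse ratio, the candidate options and the decision rule are all hypothetical.

```python
def evaluate_options(forecast_patients, staffing_options, patients_per_nurse=4):
    """For each candidate number of nurses, report capacity and how many patients it leaves uncovered."""
    return [
        {
            "nurses": nurses,
            "capacity": nurses * patients_per_nurse,
            "uncovered_patients": max(0, forecast_patients - nurses * patients_per_nurse),
        }
        for nurses in staffing_options
    ]

def recommend(forecast_patients, staffing_options, patients_per_nurse=4):
    """Recommend the smallest option that covers the forecast, or the largest one if none does."""
    options = evaluate_options(forecast_patients, staffing_options, patients_per_nurse)
    covering = [o for o in options if o["uncovered_patients"] == 0]
    if covering:
        return min(covering, key=lambda o: o["nurses"])
    return max(options, key=lambda o: o["nurses"])

forecast = 58  # hypothetical patients per shift, e.g. produced by a predictive model
for option in evaluate_options(forecast, (10, 13, 15, 18)):
    print(option)
print("recommended:", recommend(forecast, (10, 13, 15, 18)))
```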

### In Summary

Both **descriptive** and **diagnostic** analytics look to the past to explain what happened and why it happened, while **predictive** and **prescriptive** analytics use historical data to forecast what will happen in the future and what actions you can take to affect those outcomes.

## Modelling approach
- **Deterministic:** deterministic models, such as linear regression and decision trees, are based on precise inputs and always produce the same output for a given set of inputs. They assume that the future can be predicted with certainty from the current state, and aim to find a fixed relationship between inputs and outputs. They provide interpretable models and are often used in scenarios where the data behaves predictably.
- **Stochastic:** stochastic models, such as neural networks and random forests, incorporate randomness and uncertainty into the modeling process: randomness is built into their training (random weight initialisation and mini-batch sampling for neural networks, bootstrap samples and random feature subsets for random forests), so two training runs on the same data can produce different models. They consider the probability of different outcomes and can provide various possible results. They capture complex patterns and relationships in the data, which makes them suitable for uncertain future scenarios.

### Deterministic Modeling Produces Constant Results
Deterministic modeling gives you exactly the same results for a particular set of inputs, no matter how many times you recalculate the model. The mathematical properties are known, none of them is random, and there is only one set of specific values and only one answer or solution to a problem. With a deterministic model, the uncertain factors are external to the model.

### Stochastic Modeling Produces Changeable Results
Stochastic modeling, on the other hand, is inherently random: the uncertain factors are built into the model. Instead of a single answer, the model produces many possible estimations and outcomes, showing how different values of the uncertain variables affect the solution. The same process is then repeated many times under various scenarios.
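The contrast can be sketched in a few lines of Python: a deterministic function always returns the same value, while a stochastic version with built-in noise is run many times (a Monte Carlo simulation) to obtain a distribution of outcomes. The coefficients and the noise level are illustrative assumptions.

```python
import random
import statistics

def deterministic_model(x):
    """Fixed relationship: the same input always gives the same output."""
    return 2.0 * x + 1.0

def stochastic_model(x, rng):
    """Same relationship plus built-in random noise: each run can give a different output."""
    return 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)

print([deterministic_model(3.0) for _ in range(3)])  # [7.0, 7.0, 7.0]

rng = random.Random(42)  # seeded only so the example is reproducible
# Repeat the simulation many times to obtain a distribution of outcomes (Monte Carlo).
outcomes = [stochastic_model(3.0, rng) for _ in range(10_000)]
print(round(statistics.mean(outcomes), 2), round(statistics.stdev(outcomes), 2))
```

The deterministic model answers with a single number; the stochastic one answers with a spread of outcomes whose mean and standard deviation describe the uncertainty.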

## Training approach

- Unsupervised
- Supervised
  - Regression
  - Classification
- Semi-supervised
- Self-supervised
- Reinforcement Learning

### Unsupervised

Unsupervised learning works with unlabelled data and is used for descriptive/diagnostic tasks: clustering, visualisation and dimensionality (feature) reduction.
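A minimal Python sketch of an unsupervised task: k-means clustering groups unlabelled values without ever being told what the groups are. The data, `k=2` and the simple initialisation are illustrative assumptions; in practice a library implementation would normally be used.

```python
def kmeans_1d(points, k, iterations=20):
    """Plain k-means on 1-D data: alternate assignment and update steps."""
    centroids = sorted(points)[:: max(1, len(points) // k)][:k]  # spread-out initial guesses
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]  # no labels, just values
centroids, clusters = kmeans_1d(points, k=2)
print(centroids)
```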

### Supervised

Supervised learning learns from labelled examples and is used for predictive tasks: classification (when the target is a label/class) and regression (when the target is a numerical value).
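A minimal Python sketch of supervised learning with k-nearest neighbours, one simple method that covers both prediction types: the same procedure predicts a class by majority vote, or a number by averaging the targets of the closest labelled examples. The examples (hours studied, pass/fail, exam scores) are made up.

```python
def knn_predict(train, x, k, aggregate):
    """train: list of (feature, target) pairs. Predict from the k nearest labelled examples."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return aggregate([target for _, target in nearest])

def majority(labels):
    """Most frequent label among the neighbours."""
    return max(set(labels), key=labels.count)

def mean(values):
    return sum(values) / len(values)

# Classification: the target is a label/class (hours studied -> "pass"/"fail").
grades = [(1, "fail"), (2, "fail"), (3, "fail"), (6, "pass"), (7, "pass"), (8, "pass")]
print(knn_predict(grades, 6.5, k=3, aggregate=majority))  # pass

# Regression: the target is a number (hours studied -> exam score).
scores = [(1, 35.0), (2, 45.0), (3, 50.0), (6, 75.0), (7, 80.0), (8, 90.0)]
print(knn_predict(scores, 6.5, k=2, aggregate=mean))  # 77.5
```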

### Semi-supervised

Semi-supervised learning combines **supervised** and **unsupervised** techniques. It is used for predictive tasks when labelling data is expensive, difficult or almost impossible, or when it requires domain expertise that may not be available. Only part of the dataset is labelled.
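One common semi-supervised technique is self-training (pseudo-labelling): train on the labelled part, label the unlabelled points the model is confident about, and retrain. This Python sketch does that with a nearest-centroid classifier on 1-D toy data; the classifier, the confidence margin and the data are assumptions chosen for illustration.

```python
def centroids_from(labelled):
    """Mean feature value of each class (a nearest-centroid 'model')."""
    groups = {}
    for x, y in labelled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def self_train(labelled, unlabelled, margin=1.0, max_rounds=5):
    """Self-training loop; assumes at least two classes in the labelled data."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        centroids = centroids_from(labelled)  # supervised step on the labelled data
        confident, remaining = [], []
        for x in unlabelled:
            # Distances to each class centroid, closest first.
            dists = sorted((abs(x - c), y) for y, c in centroids.items())
            # Accept a pseudo-label only if the closest class clearly beats the runner-up.
            if dists[1][0] - dists[0][0] >= margin:
                confident.append((x, dists[0][1]))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing left that the model is confident about
        labelled += confident  # pseudo-labelled points join the training set
        unlabelled = remaining
    return centroids_from(labelled), unlabelled

labelled = [(1.0, "A"), (9.0, "B")]               # the few labelled examples
unlabelled = [1.5, 2.0, 2.5, 5.2, 7.5, 8.0, 8.5]  # the cheap, unlabelled majority
centroids, still_unlabelled = self_train(labelled, unlabelled)
print(centroids, still_unlabelled)
```

The point at 5.2 sits almost halfway between the two classes, so it is never confidently pseudo-labelled and stays unlabelled.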

### Self-supervised

Self-supervised learning, for example with **Autoencoders**, is in reality a form of unsupervised learning. These types of models don't need an external ground truth; instead, they derive the ground truth from the underlying structure (latent features) of the input data. They are still trained by minimising a cost function (for an autoencoder, the error in reconstructing its own input).
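A minimal Python sketch of the self-supervised idea. Rather than a full autoencoder, it uses a simpler masked-value pretext task (a substitute chosen for brevity): the middle value of each window in an unlabelled series is hidden and becomes the target, so the ground truth is derived from the data itself, and the weights are learned by minimising the mean squared reconstruction error. The series, learning rate and step count are assumptions.

```python
def make_pretext_pairs(series):
    """Hide each interior value and use it as the target; its two neighbours are the input."""
    return [((series[i - 1], series[i + 1]), series[i]) for i in range(1, len(series) - 1)]

def train(pairs, lr=0.01, steps=5000):
    """Learn weights so that w_left * left + w_right * right reconstructs the hidden value."""
    w_left, w_right = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_left = grad_right = 0.0
        for (left, right), target in pairs:
            error = w_left * left + w_right * right - target  # reconstruction error
            grad_left += 2 * error * left / n
            grad_right += 2 * error * right / n
        # Gradient descent on the mean squared reconstruction error (the cost function).
        w_left -= lr * grad_left
        w_right -= lr * grad_right
    return w_left, w_right

series = [float(v) for v in range(1, 11)]  # unlabelled data: 1.0, 2.0, ..., 10.0
w_left, w_right = train(make_pretext_pairs(series))
print(round(w_left, 3), round(w_right, 3))
```

On this evenly spaced series the learned weights approach 0.5 and 0.5: without any external labels, the model discovers that the hidden value is the average of its neighbours.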

### Reinforcement Learning

Reinforcement learning is the science of learning from actions. The aim is for an agent to reach a goal within a given environment. Because this goal can be reached through many different combinations of actions, we can't provide labels or right answers for the system to learn from. Every decision the agent makes changes the environment, and in return the environment provides feedback composed of its new state and a score, which is used as a penalty (or, equivalently, as a reward). This score guides the agent's actions, because its main goal is to optimise the objective function by minimising the cumulative cost of its actions (equivalently, maximising the cumulative reward).
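To make the agent/environment loop concrete, here is a minimal Python sketch of tabular Q-learning, one classic reinforcement learning algorithm, on a tiny five-cell corridor. Each move scores -1, so maximising the total score is the same as minimising the total cost of the actions. The environment, the hyperparameters (`alpha`, `gamma`, `epsilon`) and the episode count are illustrative assumptions.

```python
import random

N_STATES, GOAL = 5, 4  # corridor cells 0..4, goal at the right end (assumed toy environment)
ACTIONS = (-1, +1)     # move left, move right

def step(state, action):
    """Environment: return the new state and the score for this action."""
    new_state = min(max(state + action, 0), N_STATES - 1)
    return new_state, -1.0  # every move costs 1, so shorter paths are better

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Explore sometimes; otherwise pick the action with the best estimated value.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            new_state, reward = step(state, action)
            # Q-learning update: move the estimate towards reward + discounted best future value.
            best_next = 0.0 if new_state == GOAL else max(q[(new_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = new_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # learned action for each non-goal cell
```

No labels are given at any point; the agent learns that moving right in every cell is the cheapest way to reach the goal purely from the scores it receives.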

References:

- https://www.ibm.com/topics/semi-supervised-learning
- https://towardsdatascience.com/supervised-semi-supervised-unsupervised-and-self-supervised-learning-7fa79aa9247c

## Type of prediction

- Regression
- Classification

### Type of classification

- Binary
- Multi-Class
- Multi-Label

Reference: https://machinelearningmastery.com/types-of-classification-in-machine-learning/
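A minimal Python sketch of what the targets look like for each classification type, using a common 0/1, one-hot and multi-hot encoding. The class names and examples are made up for illustration.

```python
binary_targets = ["spam", "not spam", "spam"]               # one of exactly two classes
multiclass_targets = ["cat", "dog", "bird"]                 # one of more than two classes
multilabel_targets = [{"beach", "sunset"}, {"dog"}, set()]  # zero or more labels per example

def encode_binary(label, positive="spam"):
    """Binary: a single 0/1 value."""
    return 1 if label == positive else 0

def one_hot(label, classes):
    """Multi-class: exactly one position is 1."""
    return [1 if c == label else 0 for c in classes]

def multi_hot(labels, classes):
    """Multi-label: any number of positions can be 1, including none."""
    return [1 if c in labels else 0 for c in classes]

animal_classes = ["bird", "cat", "dog"]
photo_tags = ["beach", "dog", "sunset"]
print([encode_binary(y) for y in binary_targets])                # [1, 0, 1]
print([one_hot(y, animal_classes) for y in multiclass_targets])  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print([multi_hot(ys, photo_tags) for ys in multilabel_targets])  # [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
```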

#### Classification approach

- Generative
- Discriminative
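The difference between the two approaches can be sketched on 1-D, two-class toy data in Python. The generative model learns how each class produces its data, `p(x | y)` and `p(y)` (here assumed Gaussian), and then applies Bayes' rule; the discriminative model (here logistic regression trained by gradient descent) learns `p(y | x)` directly. The data and both model choices are illustrative assumptions.

```python
import math
import statistics

# Toy 1-D data with two slightly overlapping classes (illustrative only).
xs = [1.0, 1.5, 2.0, 3.5, 3.0, 4.5, 5.0, 5.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Generative: learn p(y) and a Gaussian p(x | y) per class, then apply Bayes' rule.
def fit_generative(xs, ys):
    params = {}
    for c in set(ys):
        xc = [x for x, y in zip(xs, ys) if y == c]
        params[c] = (len(xc) / len(xs), statistics.mean(xc), statistics.stdev(xc))
    return params

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def generative_p1(params, x):
    joint = {c: prior * gaussian_pdf(x, mu, sd) for c, (prior, mu, sd) in params.items()}
    return joint[1] / sum(joint.values())  # p(y=1 | x) = p(x | 1) p(1) / p(x)

# Discriminative: learn p(y=1 | x) = sigmoid(w*x + b) directly, without modelling x itself.
def fit_logistic(xs, ys, lr=0.1, steps=20000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]  # gradient of the log-loss
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

params = fit_generative(xs, ys)
w, b = fit_logistic(xs, ys)
for x in (2.0, 3.25, 5.0):
    print(x, round(generative_p1(params, x), 3), round(sigmoid(w * x + b), 3))
```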

#### Classification Output
- Deterministic: these models provide a single prediction for a given input without providing any information about the uncertainty of the output.
- Probabilistic: together with the output, these models provide a probabilistic characterization of the uncertainty in their predictions, typically a number between 0 and 1.
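A minimal Python sketch showing both kinds of output from one logistic model: `predict_proba` returns a probability, while `predict` thresholds it into a single label and discards the uncertainty. The weights `W` and `B` are hypothetical stand-ins for a trained model.

```python
import math

W, B = 2.0, -6.0  # hypothetical learned parameters

def predict_proba(x):
    """Probabilistic output: P(class = 1 | x), a number between 0 and 1."""
    return 1 / (1 + math.exp(-(W * x + B)))

def predict(x, threshold=0.5):
    """Deterministic output: a single label, with the uncertainty discarded."""
    return 1 if predict_proba(x) >= threshold else 0

for x in (1.0, 2.9, 3.1, 5.0):
    print(x, predict(x), round(predict_proba(x), 3))
```

Inputs 2.9 and 3.1 receive different labels, yet their probabilities (about 0.45 and 0.55) show that the model is nearly undecided about both, which the deterministic output hides.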