# Introduction to Machine Learning


# Lesson Goals

In this lesson you will start to learn about Machine Learning. More concretely:

- You will learn what Machine Learning does.
- You will learn about the main types of Machine Learning.
- You will learn what types of problems you can apply Machine Learning to.



# Introduction

Machine Learning is a fundamental part of both Data Science and Artificial Intelligence. It is an important technology for turning data into informed, data-driven decisions and for creating business value from datasets. The field builds on contributions from disciplines such as Probability Theory, Computer Science, Statistics, Information Theory, and Symbolic Reasoning.

In the simplest terms, we can think of machine learning as a trade-off between two competing concepts. On the one hand we have our data. The data represents the ground truth, but on its own it has no explanatory power: we cannot make predictions about the future from raw data alone. On the other hand we have our machine learning model. The model simplifies reality using a mathematical algorithm. It has explanatory power, but it also loses some of the granular information contained in the original dataset. Our job as machine learning experts is to minimize the loss of information that happens when we use a model.



# What is Machine Learning?

Machine Learning is the capability of software to learn from experience. For instance, the software used to compute the payroll of a small business typically cannot learn, whereas the software a car insurance company uses to estimate premiums usually relies on Machine Learning to distinguish high-risk cases from low-risk ones.

Experience is typically provided in the form of a dataset that contains inherent patterns or regularities. In the car insurance case, that dataset may be a historical database containing customer profiles, claim history, pricing, policies, a time series of risk scores, and other records. Machine Learning can discover the patterns in this database, and those patterns provide business value. For instance, Machine Learning may be used to find the profiles of car insurance customers who are most likely to purchase home insurance from the same provider. This information has business value because it can help improve the targeting of cross-selling campaigns.

The output of the Machine Learning process is the knowledge learned, expressed as a trained Machine Learning model. A Machine Learning model is an entity that uses learned knowledge to solve new problems. This entity is also known as an estimator.

Machine Learning is a fundamental part of both Data Science and Artificial Intelligence. You will gain a better understanding of Machine Learning by looking at it from both perspectives.

# The Data Science Perspective

The preceding use case of Machine Learning in the insurance sector can be seen as a problem in inferential statistics, where each existing car insurance customer is assigned a score measuring the propensity to acquire home insurance from the same company. This score is the inferred posterior ("a posteriori") probability provided by a model trained on a historical database that contains data on both past customers who purchased home insurance after purchasing car insurance from the same provider and those who did not.

# The Artificial Intelligence Perspective

The definition of Artificial Intelligence has long been controversial, but the following seems to be a good consensus definition.

Artificial Intelligence (AI) is the capability of computer systems to:

1. Learn knowledge from experience, and
2. Use that knowledge to solve problems.

The first criterion tells us that Machine Learning is a required ingredient of AI systems.

In our car insurance example, the knowledge learned from the past experience contained in the historical database is used to categorize current and future car insurance clients as good or bad targets for the home insurance cross-selling campaign.

The two perspectives provide different views of the same reality: Machine Learning finds regularities or patterns in datasets, and these can be incorporated into decision-making software to improve solutions through data-driven criteria.



# Types of Machine Learning

In your professional practice, you will face many different data analytics projects. Different projects may call for different Machine Learning approaches or solutions. Knowing the types of Machine Learning available will help you evaluate the suitability of candidate approaches.

# Supervised Learning

Supervised learning is used when we want to predict a certain output from an input dataset. The dataset used to fit the model is called the training dataset. We then evaluate the fitted model on a separate test dataset that was not used during training.
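
The split into training and test data can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the 25% test fraction, and the toy rows are assumptions made for this example, not part of the lesson's dataset.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = rows[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: (years_as_customer, bought_home_insurance)
rows = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1)]
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

The model is fitted on `train` only, so that its performance on `test` gives an estimate of how it behaves on data it has not seen.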

Supervised learning problems can be divided into the following groups:


# Classification

In classification problems, Machine Learning is used to classify instances into a predetermined set of classes. In our running example about car insurance, a classification task for Machine Learning would be to learn a model based on current and former customers and classify new incoming car insurance customers as high, medium, or low value depending on the expected propensity to also acquire home insurance in the future. Note that in this example the classes have symbolic identifiers (e.g. "high") and are disjoint and discrete.
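
As a rough sketch of what a classifier does, the example below labels a new customer with the class of the most similar known customer (a one-nearest-neighbor rule). The features `(age, years_as_customer)`, the labels, and all values are invented for illustration; a real project would likely use richer features and a dedicated library.

```python
import math


def predict_nearest_neighbor(train, x):
    """Return the label of the training example closest to x (1-NN)."""
    nearest = min(train, key=lambda example: math.dist(example[0], x))
    return nearest[1]


# Hypothetical training data: ((age, years_as_customer), value_class)
train = [
    ((25, 1), "low"),
    ((30, 2), "low"),
    ((40, 6), "medium"),
    ((45, 8), "medium"),
    ((55, 15), "high"),
    ((60, 20), "high"),
]

new_customer = (58, 18)
print(predict_nearest_neighbor(train, new_customer))  # prints "high"
```

The output is always one of the predetermined symbolic classes, never a value in between.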


# Regression

In contrast to classification, in regression problems the target attribute has a continuous (real valued) range. In our car insurance example, we might use Machine Learning to estimate the cost of a car repair from a set of photographs of the damaged parts. The output of the model would be the expected dollar cost of repairing the damage. Note that in this case, the output of the model would be a real-valued number instead of a symbol, and thus the range would be continuous as opposed to discrete.
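
To make the contrast with classification concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. It assumes the photographs have already been reduced to a single numeric feature (a hypothetical damaged area in square centimeters); the data points are made up so that the fit is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: damaged area (cm^2) and repair cost (dollars)
areas = [10, 20, 30, 40]
costs = [150, 250, 350, 450]

slope, intercept = fit_line(areas, costs)
predicted_cost = slope * 25 + intercept  # estimate for a 25 cm^2 damage
print(slope, intercept, predicted_cost)  # prints 10.0 50.0 300.0
```

Here the prediction is a real-valued dollar amount, so any value on the fitted line is a possible output.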


# Prediction

In prediction, you assume that past history is a valid indicator of future events. In our car insurance example, estimating the probability of cross-selling home insurance to current car insurance customers is a prediction problem: we are estimating the probability that a certain event will happen in the future, for instance, within the next year. Another typical example is to predict tomorrow's value of the USD / EUR exchange rate (how many dollars you buy with one euro), given the time series of the last year.
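
One of the simplest ways to act on the assumption that the past indicates the future is a moving-average forecast, sketched below. The three-day window and the exchange-rate values are assumptions chosen for illustration, not real market data, and practical forecasting models are usually more elaborate.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical daily exchange rates (dollars per euro), oldest first
rates = [1.08, 1.09, 1.10, 1.11, 1.12]

forecast = moving_average_forecast(rates)
print(round(forecast, 4))  # prints 1.11
```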



# Unsupervised Learning

Unsupervised learning is used when we want to find patterns in a dataset but have no defined output to predict from the input. An example of an unsupervised learning problem is customer clustering. Many companies want to create targeted advertising campaigns. To do this more effectively, they cluster their customers, identify what the members of each group have in common, and then target each group accordingly.
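
Customer clustering can be sketched with a small k-means loop. This is a simplified illustration: it works on a single made-up feature (annual spend), and the starting centroids are supplied by hand, whereas real implementations typically choose them automatically and handle many features.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids (k-means)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical annual spend per customer, in dollars
spend = [100, 120, 110, 900, 950, 1000]

centroids, clusters = kmeans_1d(spend, centroids=[100, 1000])
print(centroids)  # prints [110.0, 950.0]
print(clusters)   # prints [[100, 120, 110], [900, 950, 1000]]
```

Note that no labels were provided: the two groups emerge from the data alone.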


# Reinforcement Learning

Reinforcement learning is an area of machine learning that enables a machine to learn through trial and error. Instead of providing the algorithm with labeled training examples, we provide it with feedback (a positive or negative reward) in response to its actions. The applications of reinforcement learning differ from those of supervised and unsupervised learning and include examples like teaching a computer how to play a game.
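
The trial-and-error loop can be illustrated with a small epsilon-greedy "multi-armed bandit", one of the simplest reinforcement learning settings. Everything here is an assumption made for the sketch: the three actions, their win probabilities, the exploration rate `epsilon`, and the number of steps.

```python
import random


def run_bandit(win_probabilities, steps=2000, epsilon=0.1, seed=0):
    """Learn action values from reward feedback with an epsilon-greedy rule."""
    rng = random.Random(seed)
    n_actions = len(win_probabilities)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        # The environment gives feedback: reward 1 on a win, 0 otherwise.
        reward = 1.0 if rng.random() < win_probabilities[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


# Hypothetical game with three moves and hidden win probabilities
values, counts = run_bandit([0.2, 0.5, 0.8])
print(values)
print(counts)
```

The algorithm is never told which action is best. It only sees rewards, and its estimates in `values` are built up from that feedback.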


# Problems Solved by Machine Learning

Now that you have learned some concepts about Machine Learning, you are probably wondering which problems you can solve with this technology. Here we explain the most common types of problems that you will be solving with Machine Learning in the future.