# What Is Machine Learning?



Machine learning is the science (and art) of programming computers so they can learn from data.

- More general definition:

> Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.
>
> —Arthur Samuel, 1959

- More engineering-oriented one:

> A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
>
> —Tom Mitchell, 1997


## Machine Learning Example

Your spam filter is a machine learning program that, given examples of **spam emails (flagged by users)** and **regular emails (nonspam, also called “ham”)**, can learn to flag spam.


### Machine Learning Terminology in the Context of Spam Filtering

- **Task (T):** Flagging spam for new emails.

- **Experience (E):** The training data, which consists of examples of spam emails (flagged by users) and regular emails (nonspam, also called "ham").

- **Performance measure (P):** The ratio of correctly classified emails. This particular performance measure is called accuracy, and it is often used in classification tasks.



- **Training set:** The examples that the system uses to learn.

- **Training instance (or sample):** Each training example. 

- **Model:** The part of a machine learning system that learns and makes predictions. 
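
As a quick illustration of the performance measure P, here is a minimal sketch of accuracy computed by hand. The `accuracy` helper and the five labels are hypothetical examples, not part of any real dataset.

```python
def accuracy(y_true, y_pred):
    """Return the ratio of correctly classified instances."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for five emails: the true class vs. the filter's prediction
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(y_true, y_pred))  # 4 correct out of 5 -> 0.8
```

In practice, you would more likely call Scikit-Learn’s `sklearn.metrics.accuracy_score(y_true, y_pred)`, which computes the same ratio.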

# Why Use Machine Learning?


## Spam Filter (Traditional Programming vs. Machine Learning)

### 1. Consider how you would write a spam filter using traditional programming techniques

1. First you would examine what spam typically looks like. 
    - You might notice that some words or phrases (such as “4U”, “credit card”, “free”, and “amazing”) tend to come up a lot in the subject line. 
    - Perhaps you would also notice a few other patterns in the sender’s name, the email’s body, and other parts of the email.


2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns were detected.


3. You would test your program and repeat steps 1 and 2 until it was good enough to launch.
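
The three steps above might produce something like the following sketch. The pattern list and the `THRESHOLD` value are made-up assumptions for illustration; a real rule-based filter would have many more (and more complex) rules.

```python
# Hypothetical hand-written rules: each pattern is something you "noticed" in spam
SPAM_PATTERNS = ["4u", "credit card", "free", "amazing"]
THRESHOLD = 2  # flag as spam if at least this many patterns are detected


def looks_like_spam(subject: str) -> bool:
    """Flag a subject line as spam if enough hand-picked patterns appear in it."""
    text = subject.lower()
    hits = sum(pattern in text for pattern in SPAM_PATTERNS)
    return hits >= THRESHOLD


print(looks_like_spam("Amazing FREE credit card offer 4U"))  # True
print(looks_like_spam("Meeting notes for Monday"))           # False
```

Note how brittle this is: substring matching also fires on innocent words such as “freedom”, and a subject like “Free stuff For U” scores only one hit, so it slips through. Every such gap means writing another rule by hand.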


<img src="images/1.png" width="800" >


Since the problem is difficult, your program will likely become a long list of complex rules, which is pretty hard to maintain.

### 2. Spam filter based on machine learning techniques

In contrast, a spam filter based on machine learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples. The program is much shorter, easier to maintain, and most likely more accurate.
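
Here is a deliberately simple sketch of that idea in plain Python: count how often each word appears in spam versus ham, and rank words by the ratio of their (smoothed) frequencies. The function name, the tiny dataset, and the add-one smoothing are assumptions for illustration; real filters typically use a proper probabilistic model such as naive Bayes, but the intuition is the same.

```python
from collections import Counter


def learn_spam_words(spam_emails, ham_emails, top_n=3):
    """Rank words by how much more frequent they are in spam than in ham."""
    spam_counts = Counter(w for email in spam_emails for w in email.lower().split())
    ham_counts = Counter(w for email in ham_emails for w in email.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    n_spam = sum(spam_counts.values()) + len(vocab)
    n_ham = sum(ham_counts.values()) + len(vocab)

    def ratio(word):
        # Add-one (Laplace) smoothing avoids dividing by zero for words never seen in ham
        p_spam = (spam_counts[word] + 1) / n_spam
        p_ham = (ham_counts[word] + 1) / n_ham
        return p_spam / p_ham

    # Highest ratio first; ties broken alphabetically so the output is deterministic
    return sorted(vocab, key=lambda w: (-ratio(w), w))[:top_n]


# Tiny made-up training data
spam = ["free credit card 4u", "amazing offer 4u", "free offer now"]
ham = ["lunch meeting now", "project meeting notes", "credit for the project"]
print(learn_spam_words(spam, ham))  # ['4u', 'free', 'offer']
```

Nothing in the code mentions “4U” or “free”: the ranking comes entirely from the examples, so retraining on newly flagged emails updates it automatically.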

<img src="images/2.png" width="800" >

### What if spammers notice that all their emails containing “4U” are blocked?
- They might start writing “For U” instead.

#### A spam filter using traditional programming techniques 
- would need to be updated to flag “For U” emails. 
- If spammers keep working around your spam filter, you will need to keep writing new rules forever.

#### A spam filter based on machine learning techniques 
- In contrast, a spam filter based on machine learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention.

<img src="images/3.png" width="800" >

## Speech Recognition

Another area where machine learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, say you want to start simple and write a program capable of distinguishing the words “one” and “two”.

### Traditional programming techniques

- You might notice that the word “two” starts with a high-pitch sound (“T”), 
- so you could hardcode an algorithm that measures high-pitch sound intensity and use that to distinguish ones and twos
- but obviously this technique will not scale to thousands of words spoken by millions of very different people in noisy environments and in dozens of languages. 

### Machine learning techniques
- The best solution (at least today) is to write an algorithm that learns by itself, given many example recordings for each word.

### Finally, machine learning can help humans learn (Figure 1-4).

- ML models can be inspected to see what they have learned. 
- For instance, once a spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem.

- Digging into large amounts of data to discover hidden patterns is called **data mining**, and machine learning excels at it.

<img src="images/4.png" width="800" >

### To summarize, machine learning is great for:
- Problems for which existing solutions require a lot of fine-tuning or long lists of rules
- Complex problems for which using a traditional approach yields no good solution 
- Fluctuating environments (a machine learning system can easily be retrained on new data, always keeping it up to date)
- Getting insights about complex problems and large amounts of data

# Examples of Applications


### 1. Analyzing images of products on a production line to automatically classify them
This is image classification, typically performed using convolutional neural networks (CNNs; see Chapter 14) or sometimes transformers (see Chapter 16).

### 2. Detecting tumors in brain scans
This is semantic image segmentation, where each pixel in the image is classified (as we want to determine the exact location and shape of tumors), typically using CNNs or transformers.

### 3. Automatically classifying news articles
This is natural language processing (NLP), and more specifically text classification, which can be tackled using recurrent neural networks (RNNs) and CNNs, but transformers work even better (see Chapter 16).

### 4. Automatically flagging offensive comments on discussion forums
This is also text classification, using the same NLP tools. 

### 5. Summarizing long documents automatically
This is a branch of NLP called text summarization, again using the same tools.

### 6. Creating a chatbot or a personal assistant
This involves many NLP components, including natural language understanding (NLU) and question-answering modules.

### 7. Forecasting your company’s revenue next year, based on many performance metrics
This is a regression task (predicting values) that may be tackled using any regression model, such as 
- Linear regression or polynomial regression model (see Chapter 4), 
- Regression support vector machine (see Chapter 5), 
- Regression random forest (see Chapter 7), 
- An artificial neural network (see Chapter 10). 
- If you want to take into account sequences of past performance metrics, you may want to use RNNs, CNNs, or transformers (see Chapters 15 and 16).

### 8. Making your app react to voice commands
This is speech recognition, which requires processing audio samples: since they are long and complex sequences, they are typically processed using RNNs, CNNs, or transformers (see Chapters 15 and 16).

### 9. Detecting credit card fraud
This is anomaly detection, which can be tackled using isolation forests, Gaussian mixture models (see Chapter 9), or autoencoders (see Chapter 17).

### 10. Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
This is clustering, which can be achieved using k-means, DBSCAN, and more (see Chapter 9).

### 11. Representing a complex, high-dimensional dataset in a clear and insightful diagram
This is data visualization, often involving dimensionality reduction techniques (see Chapter 8).

### 12. Recommending a product that a client may be interested in, based on past purchases
This is a recommender system. One approach is to feed past purchases (and other information about the client) to an artificial neural network (see Chapter 10), and get it to output the most likely next purchase. This neural net would typically be trained on past sequences of purchases across all clients.

### 13. Building an intelligent bot for a game
This is often tackled using reinforcement learning (RL; see Chapter 18), which is a branch of machine learning that trains agents (such as bots) to pick the actions that will maximize their rewards over time (a bot may get a reward every time the player loses some life points), within a given environment (such as the game). The famous AlphaGo program that beat the world champion at the game of Go was built using RL.

This list could go on and on, but hopefully it gives you a sense of the incredible breadth and complexity of the tasks that machine learning can tackle, and the types of techniques that you would use for each task.


# Types of Machine Learning Systems

There are so many different types of machine learning systems that it is useful to classify them in broad categories, based on the following criteria:

- How they are supervised during training (supervised, unsupervised, semisupervised, self-supervised, and others)
- Whether or not they can learn incrementally on the fly (online versus batch learning)
- Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based versus model-based learning)

These criteria are not exclusive; you can combine them in any way you like. For example, a state-of-the-art spam filter may learn on the fly using a deep neural network model trained using human-provided examples of spam and ham; this makes it an online, model-based, supervised learning system. Let’s look at each of these criteria a bit more closely.


# Training Supervision
ML systems can be classified according to the amount and type of supervision they get during training. There are many categories, but we’ll discuss the main ones:

## Supervised learning
- In supervised learning, the training set you feed to the algorithm includes the desired solutions, called **labels**.

<img src="images/5.png" width="800" >

- A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.

- Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.). This sort of task is called **regression**. To train the system, you need to give it many examples of cars, including both their features and their targets (their prices).

- Note that some regression models can be used for classification as well, and vice versa. For example, logistic regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., 20% chance of being spam).
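
To make the last point concrete, here is a small sketch using Scikit-Learn’s `LogisticRegression`, whose `predict_proba()` method returns class probabilities. The single feature (a count of “spammy” words per email) and the eight training examples are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical single feature per email: number of "spammy" words it contains
X = np.array([[0], [1], [1], [2], [4], [5], [6], [7]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = ham, 1 = spam

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns one column per class: [P(ham), P(spam)]
proba = clf.predict_proba([[3]])
print(proba)
print(clf.predict([[0], [7]]))  # [0 1]
```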

<img src="images/6.png" width="800" >


The words **target** and **label** are generally treated as synonyms in supervised learning, but target is more common in regression tasks and label is more common in classification tasks. Moreover, **features** are sometimes called **predictors** or **attributes**. These terms may refer to individual samples (e.g., “this car’s mileage feature is equal to 15,000”) or to all samples (e.g., “the mileage feature is strongly correlated with price”).

## Unsupervised learning
In unsupervised learning, as you might guess, the training data is **unlabeled**. The system tries to learn without a teacher.

- Say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors. It finds those connections without your help.
    - For example, it might notice that 40% of your visitors are teenagers who love comic books and generally read your blog after school,
    - while 20% are adults who enjoy sci-fi and who visit during the weekends. 
- If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
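
A minimal sketch of this clustering step with Scikit-Learn’s `KMeans`, assuming each visitor is described by just two made-up features (age and the hour of the visit). Note that no labels are passed to `fit_predict()`: the groups come from the data alone.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical visitor features: [age, hour of the visit (0-23)]
visitors = np.array([
    [14, 16], [15, 17], [16, 16], [13, 18],  # younger visitors, late afternoon
    [35, 10], [40, 11], [38, 9], [42, 10],   # older visitors, morning
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(visitors)  # one cluster id per visitor, no labels given
print(labels)
```

For the hierarchical variant mentioned above, Scikit-Learn also provides `AgglomerativeClustering` in the same `sklearn.cluster` module.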


<img src="images/7.png" width="800" >
<img src="images/8.png" width="800" >



### Visualization algorithms are also good examples of unsupervised learning:
- You feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted. 

- These algorithms try to preserve as much structure as they can so that you can understand how the data is organized and perhaps identify unsuspected patterns.

- A related task is **dimensionality reduction**, in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. 
    - For example, a car’s mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car’s wear and tear. This is called **feature extraction**.
    
<img src="images/9.png" width="800" >

It is often a good idea to try to reduce the number of dimensions in your training data using a dimensionality reduction algorithm before you feed it to another machine learning algorithm (such as a supervised learning algorithm). It will run much faster, the data will take up less disk and memory space, and in some cases it may also perform better.
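
As a sketch of feature extraction, the code below merges two strongly correlated features (a car’s age and mileage) into a single “wear and tear” feature using principal component analysis (PCA), one common dimensionality reduction algorithm. The car data is synthetic, and standardizing the features first is a choice made here so that mileage (in the tens of thousands) does not dominate age (in years).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
age = rng.uniform(1, 15, size=100)                       # years (synthetic)
mileage = 12_000 * age + rng.normal(0, 5_000, size=100)  # strongly correlated with age
X = np.column_stack([age, mileage])

X_scaled = StandardScaler().fit_transform(X)  # put both features on the same scale
pca = PCA(n_components=1)
wear_and_tear = pca.fit_transform(X_scaled)   # 2 features -> 1 extracted feature

print(wear_and_tear.shape)            # (100, 1)
print(pca.explained_variance_ratio_)  # share of the variance kept by that one feature
```

Because the two features are so correlated, the single extracted feature keeps almost all of the variance, which is exactly the “without losing too much information” goal.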


### Anomaly Detection
- Another important unsupervised task is anomaly detection: for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm.

- The system is shown mostly normal instances during training, so it learns to recognize them; then, when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly.

- A very similar task is novelty detection: it aims to detect new instances that look different from all instances in the training set. This requires having a very “clean” training set, devoid of any instance that you would like the algorithm to detect. For example, if you have thousands of pictures of dogs, and 1% of these pictures represent Chihuahuas, then a novelty detection algorithm should not treat new pictures of Chihuahuas as novelties.

- On the other hand, anomaly detection algorithms may consider these dogs as so rare and so different from other dogs that they would likely classify them as anomalies (no offense to Chihuahuas).
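
Here is a minimal anomaly detection sketch using Scikit-Learn’s `IsolationForest` (one of the algorithms mentioned in the applications list). The transaction features and values are synthetic assumptions: the detector is trained on mostly normal transactions, then asked about two new ones.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic "normal" transactions: [amount in USD, hour of day]
normal = np.column_stack([rng.normal(50, 15, 500), rng.normal(14, 3, 500)])

detector = IsolationForest(random_state=42)
detector.fit(normal)

new_transactions = np.array([
    [55.0, 13.0],    # looks like the training data
    [5_000.0, 3.0],  # huge amount at 3 a.m.
])
preds = detector.predict(new_transactions)
print(preds)  # 1 = looks normal, -1 = likely anomaly
```

`predict()` returns 1 for instances that look normal and -1 for likely anomalies.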

<img src="images/10.png" width="800" >


Finally, another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule algorithm on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to one another.
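
To show what “interesting relations” means numerically, here is a sketch that scores a single rule with two standard measures: support (how often the items appear together) and confidence (how often the rule holds when its left side is present). The transactions are made up; algorithms such as Apriori automate the search over many candidate rules rather than scoring one by hand.

```python
# Hypothetical sales log: each transaction is the set of items in one basket
transactions = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "beer"},
    {"barbecue sauce", "potato chips"},
    {"potato chips", "soda"},
    {"steak", "salad"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


rule_if = {"barbecue sauce", "potato chips"}
rule_then = {"steak"}
print(support(rule_if | rule_then, transactions))    # 0.4
print(confidence(rule_if, rule_then, transactions))  # about 0.67
```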

## Semi-supervised learning
- Since labeling data is usually time-consuming and costly, you will often have plenty of unlabeled instances, and few labeled instances. Some algorithms can deal with data that’s partially labeled. This is called **semi-supervised learning**.
<img src="images/11.png" width="800" >


- Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just add one label per person and it is able to name everyone in every photo, which is useful for searching photos.


- Most semi-supervised learning algorithms are combinations of unsupervised and supervised algorithms. For example, a clustering algorithm may be used to group similar instances together, and then every unlabeled instance can be labeled with the most common label in its cluster. Once the whole dataset is labeled, it is possible to use any supervised learning algorithm.
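
The cluster-then-label approach just described can be sketched as follows, assuming synthetic 2D data in which only four of the 100 instances are labeled (using -1 for “unlabeled”, the same convention Scikit-Learn’s semi-supervised tools use). `KMeans` is just one possible choice of clustering algorithm here.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2D data: two well-separated blobs, only a few instances labeled
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
y_partial = np.full(100, -1)  # -1 means "unlabeled"
y_partial[[0, 1]] = 0         # two labeled instances from the first blob
y_partial[[50, 51]] = 1       # two labeled instances from the second blob

# Unsupervised part: group similar instances together
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Propagate the most common known label within each cluster
# (assumes every cluster contains at least one labeled instance)
y_full = np.empty(100, dtype=int)
for c in np.unique(clusters):
    in_cluster = clusters == c
    known = y_partial[in_cluster & (y_partial != -1)]
    y_full[in_cluster] = Counter(known.tolist()).most_common(1)[0][0]

print(np.bincount(y_full))
```

After this step, `y_full` can be fed to any supervised learning algorithm together with `X`.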


## Self-supervised learning
Another approach to machine learning involves actually generating a fully labeled dataset from a fully unlabeled one. Again, once the whole dataset is labeled, any supervised learning algorithm can be used. This approach is called **self-supervised learning.**

For example, if you have a large dataset of unlabeled images, you can randomly mask a small part of each image and then train a model to recover the original image. During training, the masked images are used as the inputs to the model, and the original images are used as the labels.
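
Here is a minimal NumPy sketch of how such a self-supervised dataset could be generated. The `make_masked_pairs` helper, the patch size, and the random “images” are illustrative assumptions; the point is only that the labels are produced from the unlabeled data itself.

```python
import numpy as np


def make_masked_pairs(images, patch_size=8, seed=42):
    """Build (input, label) pairs for self-supervised training.

    Each input is a copy of an image with a random square patch zeroed out;
    the label is the original, unmasked image.
    """
    rng = np.random.default_rng(seed)
    inputs = images.copy()
    n, height, width = images.shape
    for i in range(n):
        top = rng.integers(0, height - patch_size + 1)
        left = rng.integers(0, width - patch_size + 1)
        inputs[i, top:top + patch_size, left:left + patch_size] = 0.0
    return inputs, images.copy()


# Synthetic stand-in for a large unlabeled dataset: 10 grayscale 32x32 images
images = np.random.default_rng(0).uniform(0.1, 1.0, size=(10, 32, 32))
X_masked, y_original = make_masked_pairs(images)
print(X_masked.shape, y_original.shape)  # (10, 32, 32) (10, 32, 32)
```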

<img src="images/12.png" width="800" >


- The resulting model may be quite useful in itself—for example, to repair damaged images or to erase unwanted objects from pictures. But more often than not, a model trained using self-supervised learning is not the final goal. You’ll usually want to tweak and fine-tune the model for a slightly different task—one that you actually care about.

</br>

- For example, suppose that what you really want is to have a pet classification model: given a picture of any pet, it will tell you what species it belongs to. If you have a large dataset of unlabeled photos of pets, you can start by training an image-repairing model using self-supervised learning. Once it’s performing well, it should be able to distinguish different pet species: when it repairs an image of a cat whose face is masked, it must know not to add a dog’s face. Assuming your model’s architecture allows it (and most neural network architectures do), it is then possible to tweak the model so that it predicts pet species instead of repairing images. The final step consists of fine-tuning the model on a labeled dataset: the model already knows what cats, dogs, and other pet species look like, so this step is only needed so the model can learn the mapping between the species it already knows and the labels we expect from it.

</br>

- Transferring knowledge from one task to another is called transfer learning, and it’s one of the most important techniques in machine learning today, especially when using deep neural networks (neural networks composed of many layers of neurons).

</br>


- Some people consider self-supervised learning to be a part of unsupervised learning, since it deals with fully unlabeled datasets. But self-supervised learning uses (generated) labels during training, so in that regard it’s closer to supervised learning. And the term “unsupervised learning” is generally used when dealing with tasks like clustering, dimensionality reduction, or anomaly detection, whereas self-supervised learning focuses on the same tasks as supervised learning: mainly classification and regression. In short, it’s best to treat self-supervised learning as its own category.


## Reinforcement learning
Reinforcement learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
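
As a minimal sketch of this loop, here is an epsilon-greedy agent for a multi-armed bandit, the simplest reinforcement learning setting (the environment has no changing state, so the policy just picks an action). The reward means, the noise, and `epsilon` are illustrative assumptions, and real RL problems such as walking robots or Go involve states and delayed rewards that this toy ignores.

```python
import numpy as np


def learn_bandit_policy(true_means, n_steps=5_000, epsilon=0.1, seed=42):
    """Epsilon-greedy agent for a multi-armed bandit.

    The environment has one action per arm; pulling arm a returns a noisy
    reward around true_means[a]. The agent only sees the rewards it gets.
    """
    rng = np.random.default_rng(seed)
    n_actions = len(true_means)
    value_estimates = np.zeros(n_actions)  # estimated average reward per action
    counts = np.zeros(n_actions)
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.integers(n_actions)          # explore
        else:
            action = int(np.argmax(value_estimates))  # exploit
        reward = rng.normal(true_means[action], 1.0)  # environment's response
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward
        value_estimates[action] += (reward - value_estimates[action]) / counts[action]
    # The learned (greedy) policy: always pick the action with the best estimate
    return int(np.argmax(value_estimates)), value_estimates


best_action, estimates = learn_bandit_policy([0.2, 0.5, 1.0])
print(best_action, estimates)
```

The agent is never told which action is best; it discovers it from the rewards alone, and `epsilon` controls how often it explores instead of exploiting what it already knows.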

<img src="images/13.png" width="800" >


- For example, many robots implement reinforcement learning algorithms to learn how to walk. DeepMind’s AlphaGo program is also a good example of reinforcement learning: it made the headlines in May 2017 when it beat Ke Jie, the number one ranked player in the world at the time, at the game of Go. It learned its winning policy by analyzing millions of games, and then playing many games against itself. Note that learning was turned off during the games against the champion; AlphaGo was just applying the policy it had learned. As you will see in the next section, this is called offline learning.


# Batch Versus Online Learning

Another criterion used to classify machine learning systems is whether or not the system can learn incrementally from a stream of incoming data.

## Batch learning

- In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called **offline learning.**
</br>

- Unfortunately, a model’s performance tends to decay slowly over time, simply because the world continues to evolve while the model remains unchanged. This phenomenon is often called **model rot** or **data drift**. The solution is to regularly retrain the model on up-to-date data. How often you need to do that depends on the use case: if the model classifies pictures of cats and dogs, its performance will decay very slowly, but if the model deals with fast-evolving systems, for example making predictions on the financial market, then it is likely to decay quite fast. 

</br>

- Even a model trained to classify pictures of cats and dogs may need to be retrained regularly, not because cats and dogs will mutate overnight, but because cameras keep changing, along with image formats, sharpness, brightness, and size ratios. Moreover, people may love different breeds next year, or they may decide to dress their pets with tiny hats—who knows?

</br>

- If you want a batch learning system to know about new data (such as a new type of spam), you need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data), then replace the old model with the new one. Fortunately, the whole process of training, evaluating, and launching a machine learning system can be automated fairly easily (as we saw in Figure 1-3), so even a batch learning system can adapt to change. Simply update the data and train a new version of the system from scratch as often as needed.

</br>

- This solution is simple and often works fine, but training using the full set of data can take many hours, so you would typically train a new system only every 24 hours or even just weekly. If your system needs to adapt to rapidly changing data (e.g., to predict stock prices), then you need a more reactive solution.

</br>

- Also, training on the full set of data requires a lot of computing resources (CPU, memory space, disk space, disk I/O, network I/O, etc.). If you have a lot of data and you automate your system to train from scratch every day, it will end up costing you a lot of money. If the amount of data is huge, it may even be impossible to use a batch learning algorithm.
</br>


- Finally, if your system needs to be able to learn autonomously and it has limited resources (e.g., a smartphone application or a rover on Mars), then carrying around large amounts of training data and taking up a lot of resources to train for hours every day is a showstopper.
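
The retraining step described above can be sketched as follows. The `retrain_from_scratch` helper and the noise-free linear data are hypothetical; the key point is that every update trains a fresh model on the concatenation of all old and new data, which is exactly why the cost grows with the dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def retrain_from_scratch(X_old, y_old, X_new, y_new):
    """Batch learning: build a brand-new model on ALL the data (old + new)."""
    X_all = np.concatenate([X_old, X_new])
    y_all = np.concatenate([y_old, y_new])
    model = LinearRegression()  # a fresh, untrained model
    model.fit(X_all, y_all)     # full (and potentially slow) training run
    return model                # this would then replace the model in production


rng = np.random.default_rng(0)
X_old = rng.uniform(0, 10, size=(100, 1))
y_old = 2 * X_old[:, 0] + 1
X_new = rng.uniform(0, 10, size=(20, 1))  # newly collected data
y_new = 2 * X_new[:, 0] + 1
model = retrain_from_scratch(X_old, y_old, X_new, y_new)
print(model.coef_, model.intercept_)
```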

A better option in all these cases is to use algorithms that are capable of learning incrementally.

## Online learning
- In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called **mini-batches**. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.

<img src="images/14.png" width="800" >

</br>

- Online learning is useful for systems that need to adapt to change extremely rapidly (e.g., to detect new patterns in the stock market). It is also a good option if you have limited computing resources; for example, if the model is trained on a mobile device.
</br>

- Additionally, online learning algorithms can be used to train models on huge datasets that cannot fit in one machine’s main memory (this is called **out-of-core learning**). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
<img src="images/15.png" width="800" >
</br>

- One important parameter of online learning systems is how fast they should adapt to changing data: this is called the **learning rate**. If you set a high learning rate, then your system will rapidly adapt to new data, but it will also tend to quickly forget the old data (and you don’t want a spam filter to flag only the latest kinds of spam it was shown). 
</br>


- Conversely, if you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of nonrepresentative data points (outliers).

</br>

- **Out-of-core learning** is usually done offline (not on the live system), so online learning can be a confusing name. Think of it as incremental learning.
</br>

- A big challenge with online learning is that if bad data is fed to the system, the system’s performance will decline, possibly quickly (depending on the data quality and learning rate). If it’s a live system, your clients will notice. For example, bad data could come from a bug (e.g., a malfunctioning sensor on a robot), or it could come from someone trying to game the system (e.g., spamming a search engine to try to rank high in search results). To reduce this risk, you need to monitor your system closely and promptly switch learning off (and possibly revert to a previously working state) if you detect a drop in performance. You may also want to monitor the input data and react to abnormal data; for example, using an anomaly detection algorithm.
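
Here is a minimal online learning sketch using Scikit-Learn’s `SGDRegressor.partial_fit()`, which performs one incremental learning step per call. The stream of mini-batches is simulated with synthetic data (assumed to follow y = 2 + 3x plus noise), and `learning_rate="constant"` with `eta0=0.05` fixes the learning rate discussed above. The same loop would also work for out-of-core learning if each mini-batch were loaded from disk instead.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)

# A fixed learning rate: higher adapts faster but forgets old data faster;
# lower gives the model more inertia
model = SGDRegressor(learning_rate="constant", eta0=0.05, random_state=42)

for step in range(200):
    # Simulate a mini-batch arriving from a stream (or a chunk loaded from disk)
    X_batch = rng.uniform(0, 1, size=(32, 1))
    y_batch = 2.0 + 3.0 * X_batch[:, 0] + rng.normal(0, 0.1, size=32)
    model.partial_fit(X_batch, y_batch)  # one cheap incremental learning step

print(model.intercept_, model.coef_)  # should end up close to [2.] and [3.]
```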

# Instance-Based Versus Model-Based Learning



One more way to categorize machine learning systems is by how they generalize. Most machine learning tasks are about making predictions. This means that given a number of training examples, the system needs to be able to make good predictions for (generalize to) examples it has never seen before. Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances.

There are two main approaches to generalization: instance-based learning and model-based learning.

## Instance-based learning

Possibly the most trivial form of learning is simply to learn by heart. If you were to create a spam filter this way, it would just flag all emails that are identical to emails that have already been flagged by users: not the worst solution, but certainly not the best.

Instead of just flagging emails that are identical to known spam emails, your spam filter could be programmed to also flag emails that are very similar to known spam emails. This requires a measure of similarity between two emails. A (very basic) similarity measure between two emails could be to count the number of words they have in common. The system would flag an email as spam if it has many words in common with a known spam email.
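
The (very basic) similarity measure just described might look like this. The helper names, the threshold of three shared words, and the example emails are arbitrary assumptions.

```python
def shared_words(email_a: str, email_b: str) -> int:
    """(Very basic) similarity measure: number of distinct words two emails share."""
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))


def flag_if_similar(email, known_spam, min_shared=3):
    """Flag `email` if it shares many words with at least one known spam email."""
    return any(shared_words(email, spam) >= min_shared for spam in known_spam)


known_spam = ["win a free credit card today", "amazing free offer just for you"]
print(flag_if_similar("get your free credit card today", known_spam))     # True
print(flag_if_similar("notes from today's project meeting", known_spam))  # False
```
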

This is called instance-based learning: the system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples (or a subset of them). For example, in Figure 1-16 the new instance would be classified as a triangle because the majority of the most similar instances belong to that class.

Figure 1-16. Instance-based learning
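
This majority vote among the most similar instances is essentially what a k-nearest neighbors classifier does. Below is a small sketch with Scikit-Learn’s `KNeighborsClassifier`, using made-up 2D points rather than the figure’s actual data.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 2D instances in the spirit of Figure 1-16 (coordinates are made up)
X_train = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],  # triangles
           [5.0, 5.0], [5.5, 4.5], [6.0, 5.5]]  # squares
y_train = ["triangle"] * 3 + ["square"] * 3

knn = KNeighborsClassifier(n_neighbors=3)  # majority vote among the 3 most similar instances
knn.fit(X_train, y_train)  # "learning" here is essentially storing the examples
print(knn.predict([[2.0, 2.5]]))  # ['triangle']
```
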

## Model-based learning and a typical machine learning workflow

Another way to generalize from a set of examples is to build a model of these examples and then use that model to make predictions. This is called model-based learning (Figure 1-17).

Figure 1-17. Model-based learning

For example, suppose you want to know if money makes people happy, so you download the Better Life Index data from the OECD’s website and World Bank stats about gross domestic product (GDP) per capita. Then you join the tables and sort by GDP per capita. Table 1-1 shows an excerpt of what you get.

Table 1-1. Does money make people happier?

| Country | GDP per capita (USD) | Life satisfaction |
|---|---:|---:|
| Turkey | 28,384 | 5.5 |
| Hungary | 31,008 | 5.6 |
| France | 42,026 | 6.5 |
| United States | 60,236 | 6.9 |
| New Zealand | 42,404 | 7.3 |
| Australia | 48,698 | 7.3 |
| Denmark | 55,938 | 7.6 |

Let’s plot the data for these countries (Figure 1-18).

Figure 1-18. Do you see a trend here?

There does seem to be a trend here! Although the data is noisy (i.e., partly random), it looks like life satisfaction goes up more or less linearly as the country’s GDP per capita increases. So you decide to model life satisfaction as a linear function of GDP per capita. This step is called model selection: you selected a linear model of life satisfaction with just one attribute, GDP per capita (Equation 1-1).

Equation 1-1. A simple linear model

$$\text{life\_satisfaction} = \theta_0 + \theta_1 \times \text{GDP\_per\_capita}$$

This model has two model parameters, θ₀ and θ₁ (by convention, the Greek letter θ, theta, is frequently used to represent model parameters). By tweaking these parameters, you can make your model represent any linear function, as shown in Figure 1-19.

Figure 1-19. A few possible linear models

Before you can use your model, you need to define the parameter values θ₀ and θ₁. How can you know which values will make your model perform best? To answer this question, you need to specify a performance measure. You can either define a utility function (or fitness function) that measures how good your model is, or you can define a cost function that measures how bad it is. For linear regression problems, people typically use a cost function that measures the distance between the linear model’s predictions and the training examples; the objective is to minimize this distance.

This is where the linear regression algorithm comes in: you feed it your training examples, and it finds the parameters that make the linear model fit best to your data. This is called training the model. In our case, the algorithm finds that the optimal parameter values are θ₀ = 3.75 and θ₁ = 6.78 × 10⁻⁵.
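
To make “cost function” and “training” concrete, here is a NumPy sketch that uses the mean squared error (MSE, one common choice of cost function for linear regression) and finds the best θ₀ and θ₁ for the seven countries in Table 1-1 with an ordinary least-squares solver. Note that θ₀ = 3.75 and θ₁ = 6.78 × 10⁻⁵ appear to come from the full dataset loaded in Example 1-1 (Table 1-1 is only an excerpt), so the values fitted here on just seven rows will generally differ.

```python
import numpy as np

# GDP per capita (USD) and life satisfaction for the seven countries in Table 1-1
gdp = np.array([28_384, 31_008, 42_026, 60_236, 42_404, 48_698, 55_938], dtype=float)
life_sat = np.array([5.5, 5.6, 6.5, 6.9, 7.3, 7.3, 7.6])


def mse_cost(theta0, theta1, x, y):
    """Cost function: mean squared distance between predictions and targets."""
    predictions = theta0 + theta1 * x
    return np.mean((predictions - y) ** 2)


# "Training": solve the least-squares problem for [theta0, theta1]
X_b = np.column_stack([np.ones_like(gdp), gdp])  # column of 1s multiplies theta0
(theta0_best, theta1_best), *_ = np.linalg.lstsq(X_b, life_sat, rcond=None)

print(theta0_best, theta1_best)
print(mse_cost(theta0_best, theta1_best, gdp, life_sat))
print(mse_cost(3.75, 6.78e-5, gdp, life_sat))  # parameters quoted in the text
```

By construction, the least-squares solution has the lowest MSE of any linear model on these rows, so it scores at least as well as the quoted parameters here.
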

Confusingly, the word “model” can refer to a type of model (e.g., linear regression), to a fully specified model architecture (e.g., linear regression with one input and one output), or to the final trained model ready to be used for predictions (e.g., linear regression with one input and one output, using θ₀ = 3.75 and θ₁ = 6.78 × 10⁻⁵). Model selection consists of choosing the type of model and fully specifying its architecture. Training a model means running an algorithm to find the model parameters that will make it best fit the training data, and hopefully make good predictions on new data.

Now the model fits the training data as closely as possible (for a linear model), as you can see in Figure 1-20.

Figure 1-20. The linear model that fits the training data best

You are finally ready to run the model to make predictions. For example, say you want to know how happy Cypriots are, and the OECD data does not have the answer. Fortunately, you can use your model to make a good prediction: you look up Cyprus’s GDP per capita, find $37,655, and then apply your model and find that life satisfaction is likely to be somewhere around 3.75 + 37,655 × 6.78 × 10⁻⁵ ≈ 6.30.

To whet your appetite, Example 1-1 shows the Python code that loads the data, separates the inputs X from the labels y, creates a scatterplot for visualization, and then trains a linear model and makes a prediction. (It’s OK if you don’t understand all the code yet; I will present Scikit-Learn in the following chapters.)

Example 1-1. Training and running a linear model using Scikit-Learn

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Download and prepare the data
data_root = "https://github.com/ageron/data/raw/main/"
lifesat = pd.read_csv(data_root + "lifesat/lifesat.csv")
X = lifesat[["GDP per capita (USD)"]].values
y = lifesat[["Life satisfaction"]].values

# Visualize the data
lifesat.plot(kind='scatter', grid=True,
             x="GDP per capita (USD)", y="Life satisfaction")
plt.axis([23_500, 62_500, 4, 9])
plt.show()

# Select a linear model
model = LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[37_655.2]]  # Cyprus' GDP per capita in 2020
print(model.predict(X_new))  # output: [[6.30165767]]
```

If you had used an instance-based learning algorithm instead, you would have found that Israel has the closest GDP per capita to that of Cyprus ($38,341), and since the OECD data tells us that Israelis’ life satisfaction is 7.2, you would have predicted a life satisfaction of 7.2 for Cyprus. If you zoom out a bit and look at the two next-closest countries, you will find Lithuania and Slovenia, both with a life satisfaction of 5.9. Averaging these three values, you get 6.33, which is pretty close to your model-based prediction. This simple algorithm is called k-nearest neighbors regression (in this example, k = 3).

Replacing the linear regression model with k-nearest neighbors regression in the previous code is as easy as replacing these lines:

```python
from sklearn.linear_model import LinearRegression
model = LinearRegression()
```

with these two:

```python
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor(n_neighbors=3)
```

If all went well, your model will make good predictions. If not, you may need to use more attributes (employment rate, health, air pollution, etc.), get more or better-quality training data, or perhaps select a more powerful model (e.g., a polynomial regression model).

In summary:

- You studied the data.
- You selected a model.
- You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).
- Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.

This is what a typical machine learning project looks like. In Chapter 2 you will experience this firsthand by going through a project end to end.

We have covered a lot of ground so far: you now know what machine learning is really about, why it is useful, what some of the most common categories of ML systems are, and what a typical project workflow looks like. Now let’s look at what can go wrong in learning and prevent you from making accurate predictions.

# Main Challenges of Machine Learning



# Insufficient Quantity of Training Data



# Nonrepresentative Training Data



# Poor-Quality Data



# Irrelevant Features



# Overfitting the Training Data



# Underfitting the Training Data



# Stepping Back



# Testing and Validating



# Hyperparameter Tuning and Model Selection



# Data Mismatch



# Exercises