<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/marco-canas/taca/blob/main/prop/unidad_didac/visualizacion/2_seaborn/visualizing_categorical.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
</table>

# Chapter 1 – The Machine Learning landscape

When most people hear “Machine Learning,” they picture a robot: a dependable butler or a deadly Terminator, depending on who you ask.   

But Machine Learning is not just a futuristic fantasy; it’s already here.

In fact, it has been around for decades in some specialized applications, such as Optical Character Recognition (OCR).

But the first ML application that really became mainstream, improving the lives of hundreds of millions of people, took over the world back in the 1990s: the spam filter. 

It’s not
exactly a self-aware Skynet, but it does technically qualify as Machine
Learning (it has actually learned so well that you seldom need to flag an
email as spam anymore). It was followed by hundreds of ML applications
that now quietly power hundreds of products and features that you use
regularly, from better recommendations to voice search.


Where does Machine Learning start and where does it end? What exactly
does it mean for a machine to learn something? If I download a copy of
Wikipedia, has my computer really learned something? Is it suddenly
smarter? In this chapter we will start by clarifying what Machine Learning
is and why you may want to use it.
Then, before we set out to explore the Machine Learning continent, we
will take a look at the map and learn about the main regions and the most
notable landmarks: supervised versus unsupervised learning, online versus
batch learning, instance-based versus model-based learning. Then we will
look at the workflow of a typical ML project, discuss the main challenges
you may face, and cover how to evaluate and fine-tune a Machine
Learning system.


This chapter introduces a lot of fundamental concepts (and jargon) that
every data scientist should know by heart. It will be a high-level overview
(it’s the only chapter without much code), all rather simple, but you should make sure everything is crystal clear to you before continuing on to the
rest of the book. So grab a coffee and let’s get started!

This notebook contains the code examples from Chapter 1.

You will also find the exercise solutions at the end of the notebook.

The rest of this notebook is used to generate `lifesat.csv` from the original data sources, as well as some of this chapter's figures.

You are welcome to go through the code in this notebook if you like, but the real action starts in the next chapter.


## Setup

Python 3.7 or later is required:


In [3]:
import sys
assert sys.version_info >= (3,7) 

Make this notebook's output stable across runs:

In [1]:
import numpy as np 
np.random.seed(42)
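
As a minimal sketch of what "stable across runs" means here, assuming NumPy's legacy global random state (the one `np.random.seed` controls): reseeding with the same value should reproduce the same draws. The variable names `first_draw` and `second_draw` are illustrative only and do not appear elsewhere in this notebook.

```python
import numpy as np

# Seed the global random state, then draw three numbers
np.random.seed(42)
first_draw = np.random.rand(3)

# Reseed with the same value and draw again
np.random.seed(42)
second_draw = np.random.rand(3)

# The two draws are identical because the generator restarted from the same state
print(np.array_equal(first_draw, second_draw))
```

Note that this only pins down NumPy's global generator; other sources of randomness would need their own seeds.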

Scikit-Learn ≥1.0 is required:

In [6]:
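from packaging import version
import sklearn
# Compare parsed versions: a plain string comparison is lexicographic and fragile
assert version.parse(sklearn.__version__) >= version.parse("1.0")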

Let's define the default font sizes, to plot pretty figures:

In [7]:
import matplotlib.pyplot as plt

plt.rc('font', size=12)
plt.rc('axes', labelsize=14, titlesize=14)
plt.rc('legend', fontsize=12)
plt.rc('xtick', labelsize=10)
plt.rc('ytick', labelsize=10)

## Code example 1-1

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Download and prepare the data
data_root = "https://github.com/ageron/data/raw/main/"
lifesat = pd.read_csv(data_root + "lifesat/lifesat.csv")
X = lifesat[["GDP per capita (USD)"]].values
y = lifesat[["Life satisfaction"]].values

# Visualize the data
lifesat.plot(kind='scatter', grid=True,
             x="GDP per capita (USD)", y="Life satisfaction")
plt.axis([23_500, 62_500, 4, 9])
plt.show()

# Select a linear model
model = LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[37_655.2]]  # Cyprus' GDP per capita in 2020
print(model.predict(X_new)) # outputs [[6.30165767]]

Replacing the Linear Regression model with k-Nearest Neighbors (in this example, $k = 3$) regression in the previous code is as simple as replacing these two lines:

In [2]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
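
The cell above shows only the two lines to be replaced. A plausible replacement, sketched here with Scikit-Learn's `KNeighborsRegressor` and `n_neighbors=3` to match $k = 3$, is the import and model lines below. To keep the sketch self-contained and runnable offline, it uses a small made-up dataset rather than `lifesat.csv`; the numbers in `X_toy` and `y_toy` are hypothetical and are not real country data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor  # replaces the LinearRegression import

# Hypothetical toy data (NOT taken from lifesat.csv):
# GDP per capita in USD -> life satisfaction score
X_toy = np.array([[25_000.0], [30_000.0], [36_000.0],
                  [38_000.0], [40_000.0], [55_000.0]])
y_toy = np.array([5.5, 5.8, 6.1, 6.3, 6.5, 7.2])

# Select a k-Nearest Neighbors regression model with k = 3
model = KNeighborsRegressor(n_neighbors=3)

# Train the model (for k-NN this mostly means storing the training instances)
model.fit(X_toy, y_toy)

# Predict for the same GDP per capita value used in Code example 1-1
X_new = [[37_655.2]]
knn_prediction = model.predict(X_new)
print(knn_prediction)
```

With the default uniform weights, the prediction is the plain average of the targets of the three training instances whose GDP per capita is closest to the query. On this toy data those are the instances at 36,000, 38,000 and 40,000 USD. The rest of Code example 1-1 (loading, plotting, `fit`, `predict`) would stay unchanged.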

## References

* Notebook for Chapter 1 of Géron's book: https://github.com/ageron/handson-ml3/blob/main/01_the_machine_learning_landscape.ipynb

