# The Fundamentals of Machine Learning
## Chapter 1 - The Machine Learning Landscape

### What Is Machine Learning?
Machine Learning is the science (and art) of programming computers so they can learn from data.

**Definitions:**

- *[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.* —Arthur Samuel, 1959
- *A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.* —Tom Mitchell, 1997

**Example:** A spam filter is a Machine Learning program that learns to flag spam from examples of spam emails and regular (non-spam, or "ham") emails.

- Task (T): flag new emails as spam or not spam
- Experience (E): the training data
- Performance (P): ratio of correctly classified emails (accuracy)
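
The T/E/P breakdown above can be sketched with a toy word-count filter. This is only an illustration under stated assumptions: the four training emails, the scoring rule, and the names `train`, `predict`, and `accuracy` are all made up here, and a real spam filter would be far more elaborate.

```python
from collections import Counter

def train(examples):
    """E: count how often each word appears in spam and in ham."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in examples:
        target = spam_words if is_spam else ham_words
        target.update(text.lower().split())
    return spam_words, ham_words

def predict(model, text):
    """T: flag an email as spam if its words were seen more in spam than in ham."""
    spam_words, ham_words = model
    score = sum(spam_words[w] - ham_words[w] for w in text.lower().split())
    return score > 0

def accuracy(model, examples):
    """P: fraction of emails classified correctly."""
    correct = sum(predict(model, text) == is_spam for text, is_spam in examples)
    return correct / len(examples)

# Hypothetical training data (experience E)
train_set = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
# Hypothetical held-out emails used to measure P
test_set = [
    ("claim your free money", True),
    ("agenda for lunch meeting", False),
]

model = train(train_set)
print(accuracy(model, test_set))
```

Measuring accuracy on emails that were not used for training is what tells you whether performance on T actually improved with E.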


### Why Use Machine Learning?
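- Hand-written rule-based systems tend to grow into long lists of complex rules that are hard to maintain
- ML-based systems learn the patterns from data and can be retrained automatically when the data changes
- Useful for problems that are too complex for hand-written rules, such as speech recognition and computer vision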

**Key advantages:**
- Handles complex problems for which no good algorithmic solution is known
- Adapts to changing environments by learning from new data
- Helps humans discover patterns hidden in large amounts of data (data mining)


### Example Applications
- Image classification (CNNs)
- Tumor detection (semantic segmentation)
- Text classification (NLP)
- Offensive comment filtering
- Text summarization
- Chatbots & personal assistants
- Game AI (Reinforcement Learning, e.g., AlphaGo)


### Supervised vs Unsupervised Learning
**Supervised Learning:**
- Training set has labels
- Tasks: classification, regression
- Algorithms: k-NN, Linear/Logistic Regression, SVMs, Decision Trees, Random Forests, Neural Networks

**Unsupervised Learning:**
- Training set has no labels
- Tasks: clustering, anomaly detection, dimensionality reduction, association rule learning
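- Algorithms: K-Means and DBSCAN (clustering), One-class SVM and Isolation Forest (anomaly detection), PCA and t-SNE (dimensionality reduction), Apriori and Eclat (association rule learning)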
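
The difference between the two settings can be sketched on the same synthetic data: a nearest-centroid classifier uses the labels `y`, while a bare-bones k-means never sees them. Everything here (the two blobs, the helper names, the farthest-point initialization) is an assumption chosen for illustration, not a reference implementation of either algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs of 50 points each
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])
y = np.array([0] * 50 + [1] * 50)  # labels, used only by the supervised part

def pairwise_distances(X, centroids):
    """Euclidean distance from every point to every centroid."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Supervised: compute one centroid per labeled class, then classify new points
class_centroids = np.array([X[y == label].mean(axis=0) for label in (0, 1)])
X_new = np.array([[0.2, -0.1], [4.8, 5.3]])
predictions = pairwise_distances(X_new, class_centroids).argmin(axis=1)

# Unsupervised: k-means has to discover the groups without any labels
def k_means(X, k, n_iter=10):
    # Farthest-point initialization: start from X[0], then repeatedly add
    # the point that is farthest from the centroids chosen so far
    centroids = X[:1].copy()
    while len(centroids) < k:
        gap = pairwise_distances(X, centroids).min(axis=1)
        centroids = np.vstack([centroids, X[gap.argmax()]])
    for _ in range(n_iter):
        assignment = pairwise_distances(X, centroids).argmin(axis=1)
        for j in range(k):
            members = X[assignment == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, pairwise_distances(X, centroids).argmin(axis=1)

centroids, clusters = k_means(X, k=2)
print(predictions)
print(np.bincount(clusters))
```

Note that the cluster indices returned by `k_means` are arbitrary: the algorithm finds the groups but has no way of knowing what they mean.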


### Semisupervised & Reinforcement Learning
**Semisupervised:**
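- Training set has a few labeled instances and many unlabeled ones
- Example: Google Photos groups the faces in your photos on its own (clustering), then needs only one label per person to name that person in every photo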
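
One simple way to combine the two kinds of data is to cluster first and then spread the few known labels across each cluster. The sketch below assumes the clustering step has already run; `cluster_ids`, `known_labels`, and `propagate_labels` are hypothetical names, and this is not a description of how Google Photos works internally.

```python
from collections import Counter

def propagate_labels(cluster_ids, known_labels):
    """Give every item the most common known label within its cluster."""
    votes = {}
    for index, label in known_labels.items():
        votes.setdefault(cluster_ids[index], Counter())[label] += 1
    return [
        votes[cluster].most_common(1)[0][0] if cluster in votes else None
        for cluster in cluster_ids
    ]

cluster_ids = [0, 0, 1, 1, 0, 2]        # hypothetical output of a clustering step
known_labels = {0: "Alice", 3: "Bob"}   # photo index -> name typed in by the user

print(propagate_labels(cluster_ids, known_labels))
# ['Alice', 'Alice', 'Bob', 'Bob', 'Alice', None]
```

Items in a cluster with no labeled member stay unlabeled (`None`) until someone labels one of them.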

**Reinforcement Learning:**
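- An agent observes its environment, selects actions, and receives rewards or penalties in return
- It learns a policy (which action to take in each situation) that maximizes reward over time
- Examples: AlphaGo, robots learning to walk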
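
The reward-driven loop can be sketched with tabular Q-learning on a made-up five-cell corridor where only the rightmost cell gives a reward. The environment, the constants, and the purely random exploration are assumptions chosen to keep the example short. Q-learning is off-policy, so it can still learn a good policy from random behavior.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # action index 0 = step left, 1 = step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor

def step(state, action_index):
    """Environment: move within the corridor; reward 1 on reaching the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.randrange(2)           # explore uniformly at random
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best future value
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy policy for the non-terminal cells
policy = [row.index(max(row)) for row in q_table[:-1]]
print(policy)
```

With these settings the greedy policy should pick "right" (index 1) in every non-terminal cell, since that is the shortest path to the reward.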


### Batch vs Online Learning
**Batch Learning:**
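- Trained offline on all available data, then deployed; the system does not keep learning in production
- Incorporating new data means training a new model from scratch on the full dataset (old plus new) and replacing the old one
- Retraining is costly in time and computing resources for large datasets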

**Online Learning:**
- Learns incrementally (mini-batches or one instance at a time)
- Adapts continuously
- Supports out-of-core learning: training on datasets too large to fit in memory by feeding them in chunks (this is usually done offline)
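
Incremental learning can be sketched as stochastic gradient descent on a linear model that sees one instance at a time and never stores the data. The generator `stream`, the noise-free target `y = 3x + 2`, and the learning rate are assumptions made for this illustration.

```python
import random

def stream(n, seed=42):
    """Simulate instances arriving one at a time (noise-free y = 3x + 2)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        yield x, 3.0 * x + 2.0

def online_fit(instances, learning_rate=0.1):
    """Update the model after every instance, then discard that instance."""
    w, b = 0.0, 0.0
    for x, y in instances:
        error = (w * x + b) - y
        w -= learning_rate * error * x   # gradient of squared error w.r.t. w
        b -= learning_rate * error       # gradient of squared error w.r.t. b
    return w, b

w, b = online_fit(stream(5000))
print(w, b)
```

The learned parameters should approach `w = 3` and `b = 2`. Because `stream` is a generator, the same loop would work on data read chunk by chunk from disk, which is the idea behind out-of-core learning.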


### Instance-Based vs Model-Based Learning
**Instance-Based:**
- Learns the training examples by heart
- Generalizes to new cases by comparing them to the stored examples with a similarity measure (e.g., k-NN)
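
A minimal k-nearest-neighbors classifier shows the idea: there is no training step beyond storing the examples, and prediction is a similarity lookup. The two-feature points and the labels below are made up for illustration.

```python
from collections import Counter
import math

def knn_predict(training_set, query, k=3):
    """Return the majority label among the k stored examples closest to query."""
    neighbours = sorted(
        training_set, key=lambda item: math.dist(item[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical stored examples: (features, label)
training_set = [
    ((1.0, 1.0), "ham"), ((1.5, 2.0), "ham"), ((2.0, 1.0), "ham"),
    ((6.0, 6.0), "spam"), ((7.0, 6.5), "spam"), ((6.5, 7.0), "spam"),
]

print(knn_predict(training_set, (1.2, 1.4)))  # ham
print(knn_predict(training_set, (6.4, 6.6)))  # spam
```

The similarity measure here is Euclidean distance (`math.dist`, Python 3.8+); other measures can be swapped in without changing the structure.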

**Model-Based:**
- Builds a model from training data
- Uses it to make predictions
- Example: a Linear Regression model that predicts life satisfaction from GDP per capita


```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")

# Prepare the data (assuming prepare_country_stats is defined elsewhere)
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()

# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus's GDP per capita
print(model.predict(X_new))  # Expected output ~5.96
```
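
The code above depends on two CSV files and on `prepare_country_stats`, so it cannot run on its own. As a self-contained sketch of what fitting a one-feature linear model involves, the snippet below computes the least-squares slope and intercept in closed form. The five data points are made up for illustration and are not the OECD figures, so the prediction will not match the value quoted above.

```python
import numpy as np

# Made-up (GDP per capita, life satisfaction) pairs, for illustration only
gdp = np.array([10_000.0, 20_000.0, 30_000.0, 40_000.0, 50_000.0])
satisfaction = np.array([5.0, 5.6, 6.1, 6.4, 7.0])

# Least-squares fit of: satisfaction = intercept + slope * gdp
gdp_centered = gdp - gdp.mean()
slope = (gdp_centered * (satisfaction - satisfaction.mean())).sum() / (gdp_centered ** 2).sum()
intercept = satisfaction.mean() - slope * gdp.mean()

# Use the fitted model to predict for a new GDP value
prediction = intercept + slope * 22_587
print(slope, intercept, prediction)
```

Once `slope` and `intercept` are learned, the training data is no longer needed to make predictions, which is the defining trait of model-based learning.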


### Challenges in Machine Learning
- Insufficient data
- Nonrepresentative data
- Poor-quality data
- Irrelevant features
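- Overfitting (model too complex for the amount and noisiness of the data) and underfitting (model too simple to capture the underlying structure)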
- Data mismatch between training & production
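- No Free Lunch theorem: without assumptions about the data, no model is guaranteed a priori to work best, so candidate models have to be evaluated empirically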
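
Overfitting and underfitting from the list above can be observed by fitting polynomials of different degrees to the same noisy data and comparing the error on training points with the error on held-out points. The quadratic target, the noise level, and the chosen degrees are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = x ** 2 + rng.normal(scale=0.5, size=x.size)   # noisy quadratic

# Alternate points between a training set and a validation set
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = {
        "train": mse(coeffs, x_train, y_train),
        "val": mse(coeffs, x_val, y_val),
    }
    print(degree, results[degree])
```

The degree-1 model underfits, so its error is high on both sets. Raising the degree always lowers the training error, so only the validation error reveals whether the extra flexibility of degree 9 generalizes or merely fits the noise.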
