# Introduction to Machine Learning

## What is Machine Learning?

Machine Learning (ML) is a method of teaching computers to learn patterns and make decisions from data instead of being explicitly programmed. Think of it like teaching a child to recognize animals not by writing out rules but by showing pictures and naming them.

---

## A Brief History of Machine Learning

- **1950**: Alan Turing asked, "Can machines think?" and proposed what became known as the Turing Test.
- **1959**: Arthur Samuel coined the term "Machine Learning."
- **1980s**: Neural networks gained renewed attention as backpropagation made it practical to train multi-layer networks.
- **2010s**: Massive datasets, faster hardware, and better algorithms led to today's AI boom.

**Analogy**: Just as a student needs good notes and practice to pass exams, machines need lots of data and smart algorithms to make good predictions.

---

## Key Terminology

| Term          | Meaning                                                       |
| ------------- | ------------------------------------------------------------- |
| Algorithm     | A set of steps the computer follows to learn from data        |
| Model         | The final product after training, used for making predictions |
| Training Data | Data the model learns from                                    |
| Test Data     | Data used to check how well the model performs                |
| Features      | Input variables                                               |
| Labels        | Output variable (what we want to predict)                     |
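
To make the terms in the table concrete, here is a minimal sketch that maps each one onto a few lines of Scikit-Learn code. The dataset ("hours studied" and "exam score") is hypothetical and was invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Features (inputs): one column, hypothetical "hours studied"
X = np.arange(1, 11).reshape(-1, 1)
# Labels (outputs): hypothetical "exam score", built as 5 * hours + 40
y = 5 * X.ravel() + 40

# Hold out 30% of the rows as test data; the rest is training data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

algorithm = LinearRegression()            # the algorithm (learning procedure)
model = algorithm.fit(X_train, y_train)   # fit() returns the same object, now trained

print("Training rows:", len(X_train))
print("Test rows:", len(X_test))
print("R^2 on test data:", model.score(X_test, y_test))
```

Because the toy labels follow a perfectly straight line, the score on the test data should be very close to 1. Real data is noisier, which is exactly why a separate test set is useful.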

---

## Types of Machine Learning

1. **Supervised Learning**
2. **Unsupervised Learning**
3. **Reinforcement Learning**

---

## Supervised Learning Techniques

- Involves labeled data (we know the correct answer)

**Examples**:

- Predicting house prices
- Classifying emails as spam or not
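
The spam example is a classification task: the label is a category rather than a number. A minimal sketch follows, assuming a single made-up feature (the count of suspicious words in each email). Real spam filters use far richer features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature: number of suspicious words found in each email
X = np.array([[0], [1], [2], [8], [9], [10]])
# Labels: 0 = not spam, 1 = spam
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# Classify two new emails: one with 1 suspicious word, one with 9
predictions = clf.predict(np.array([[1], [9]]))
print(predictions)
```

The first email should come out as not spam (0) and the second as spam (1), since they sit well inside the two groups seen during training.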

### Example:

In [10]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.cluster import KMeans
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Input (feature)
X = np.array([[1], [2], [3], [4], [5]])
# Output (label)
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(X, y)
print(model.predict([[6]]))  # Predicts y for the new input x = 6 (expected: 12)

[12.]


**Explanation**:

- We give the model `X` and `y` to learn the pattern.
- It learns that `y = 2x`.
- Then, for the new input `x = 6`, it predicts `y = 12`.
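
One way to check what the model actually learned is to inspect its fitted parameters. The sketch below refits the same data and reads the slope and intercept from the `coef_` and `intercept_` attributes of `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # learned weight for the single feature
intercept = model.intercept_  # learned constant term

print(f"Learned line: y = {slope:.2f} * x + {intercept:.2f}")
```

The slope should be approximately 2 and the intercept approximately 0, which matches the pattern `y = 2x` (tiny floating-point differences are normal).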

---

## Unsupervised Learning Techniques

- No labels. The model finds patterns by itself.

**Examples**:

- Customer segmentation
- Topic discovery in documents

### Example:

In [11]:
# Data without labels
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

model = KMeans(n_clusters=2)
model.fit(X)
print(model.labels_)  # Which cluster each point belongs to

[0 0 0 1 1 1]


**Explanation**:

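- It finds two groups based on similarity (here, the points with `x = 1` versus the points with `x = 10`).
- No guidance needed! The cluster numbers themselves are arbitrary, so another run may print `[1 1 1 0 0 0]` for the same grouping.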
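
As a small follow-up sketch, the fitted model can also report where the cluster centers are and assign new points to the nearest one. The `random_state` and `n_init` values and the two new points are choices made here for illustration. They are not part of the example above.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# random_state makes the run repeatable; n_init=10 tries 10 initializations
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

centers = model.cluster_centers_
print("Cluster centers:")
print(centers)

# Assign two unseen points to their nearest cluster center
new_points = np.array([[0, 1], [11, 3]])
new_labels = model.predict(new_points)
print("Clusters for new points:", new_labels)
```

The centers should land near `[1, 2]` and `[10, 2]`. Which of the two is called cluster 0 depends on the initialization, so compare groupings rather than label numbers.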

---

## Reinforcement Learning

- Learns by trial and error.

**Analogy**: Teaching a dog tricks by giving treats. The dog learns what actions lead to rewards.

**Applications**:

- Game playing (e.g. AlphaGo)
- Robotics

### Example:

In [12]:
import gym
import numpy as np
import random

# 1. Create the environment
env = gym.make("FrozenLake-v1", is_slippery=False)  # is_slippery=False makes moves deterministic, which is easier to learn

# 2. Initialize Q-table
q_table = np.zeros((env.observation_space.n, env.action_space.n))

# 3. Set parameters
num_episodes = 1000
learning_rate = 0.1
discount_factor = 0.99
exploration_rate = 1.0
min_exploration = 0.01
exploration_decay = 0.001

# 4. Training Loop
for episode in range(num_episodes):
    state = env.reset()[0]
    done = False

    while not done:
        # 4a. Choose action (explore or exploit)
        if random.uniform(0, 1) < exploration_rate:
            action = env.action_space.sample()  # Explore
        else:
            action = np.argmax(q_table[state])  # Exploit (best known)

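        # 4b. Take action (step returns separate terminated and truncated flags)
        new_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated  # stop on goal/hole or when the step limit is hit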

        # 4c. Update Q-table
        old_value = q_table[state, action]
        next_max = np.max(q_table[new_state])

        # Q-learning formula
        q_table[state, action] = old_value + learning_rate * (reward + discount_factor * next_max - old_value)

        state = new_state

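    # 4d. Reduce exploration over time. The current rate is multiplied each episode,
    # so the decay compounds and hits min_exploration after roughly 100 episodes.
    exploration_rate = max(min_exploration, exploration_rate * np.exp(-exploration_decay * episode))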

# 5. Print final Q-table
print("Final Q-Table:")
print(q_table)


Final Q-Table:
[[1.78881228e-01 9.50990050e-01 3.69389520e-06 1.78889526e-01]
 [4.71947511e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [9.51379317e-02 9.60596010e-01 0.00000000e+00 1.78981853e-01]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.05950762e-04 0.00000000e+00 9.70299000e-01 2.88597503e-04]
 [1.82521023e-01 9.80100000e-01 9.56146023e-03 0.00000000e+00]
 [2.62586939e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 9.80100000e-02 9.90000000e-01 1.05606941e-01]
 [3.82057904e-01 2.25673746e-01 1.00000000e+00 1.82112743e-02]
 [0.00000000e+00 0.00000000e+00 0.000000
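
To read a Q-table like the one printed above, pick the action with the largest value in each row. The sketch below does this for the rows where one action clearly dominates (values copied from the output and rounded to 4 decimals) and follows the resulting path. It assumes the default 4x4 FrozenLake map, where state 15 is the goal, and uses a small hypothetical helper, `move`, in place of the real environment.

```python
# FrozenLake action ids, in the order used by the environment
ACTION_NAMES = ["Left", "Down", "Right", "Up"]

# Rows copied (rounded) from the printed Q-table, keeping only the states
# where one action clearly dominates
q_rows = {
    0:  [0.1789, 0.9510, 0.0000, 0.1789],
    4:  [0.0951, 0.9606, 0.0000, 0.1790],
    8:  [0.0001, 0.0000, 0.9703, 0.0003],
    9:  [0.1825, 0.9801, 0.0096, 0.0000],
    13: [0.0000, 0.0980, 0.9900, 0.1056],
    14: [0.3821, 0.2257, 1.0000, 0.0182],
}


def move(state, action, size=4):
    """Hypothetical helper: deterministic move on a size x size grid."""
    row, col = divmod(state, size)
    if action == 0:
        col = max(col - 1, 0)
    elif action == 1:
        row = min(row + 1, size - 1)
    elif action == 2:
        col = min(col + 1, size - 1)
    else:
        row = max(row - 1, 0)
    return row * size + col


state = 0
path = [state]
while state in q_rows:
    values = q_rows[state]
    action = values.index(max(values))  # greedy choice, same idea as np.argmax
    print(f"state {state:2d}: {ACTION_NAMES[action]:5s} (Q = {max(values):.4f})")
    state = move(state, action)
    path.append(state)

print("Greedy path:", path)
```

Along this path the best values are approximately 0.99 raised to the number of remaining steps minus one (0.99^5 at the start, 1.0 right before the goal). That is the effect of `discount_factor = 0.99` combined with a reward of 1 for reaching the goal.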

---

## Popular Frameworks and Tools

- **Scikit-Learn**: Easy-to-use for classical ML
- **TensorFlow / Keras**: Deep learning
- **PyTorch**: Popular with researchers

---

## Data Preprocessing and Feature Engineering

Before training:

1. Clean the data (handle missing values)
2. Encode categorical variables
3. Normalize or scale features

**Why?** Garbage in, garbage out. The model learns from the data, so clean and meaningful data matters!
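
The three steps above can be sketched on a tiny table. The column names and values are hypothetical, and median imputation, one-hot encoding, and standardization are just one reasonable choice for each step.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy dataset with one missing value and one categorical column
df = pd.DataFrame({
    "age": [25.0, None, 35.0, 40.0],
    "city": ["Paris", "London", "Paris", "Berlin"],
})

# 1. Clean: fill the missing age with the column median
df["age"] = df["age"].fillna(df["age"].median())

# 2. Encode: turn the categorical column into one-hot indicator columns
df = pd.get_dummies(df, columns=["city"])

# 3. Scale: standardize age to mean 0 and unit variance
df["age"] = StandardScaler().fit_transform(df[["age"]]).ravel()

print(df)
```

In a real project, the scaler (and any imputation values) would normally be fitted on the training data only and then applied to the test data, so that no information leaks from the test set.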

---

## Model Evaluation and Assessment

| Metric           | Use Case                    |
| ---------------- | --------------------------- |
| Accuracy         | Classification tasks        |
| MAE, MSE, RMSE   | Regression tasks            |
| Confusion Matrix | Check classification errors |

### Example:

In [13]:
actual = [3, 5, 7]
predicted = [2.5, 5, 8]

mse = mean_squared_error(actual, predicted)
print("MSE:", mse)

MSE: 0.4166666666666667
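
The other metrics in the table can be computed in the same way. This sketch reuses the regression values above for MAE and RMSE, and adds a small set of made-up classification labels for accuracy and the confusion matrix.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
)

# Regression: same values as in the MSE example
actual = [3, 5, 7]
predicted = [2.5, 5, 8]

mae = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))  # RMSE is the square root of MSE
print("MAE:", mae)
print("RMSE:", rmse)

# Classification: hypothetical true and predicted labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
print("Accuracy:", acc)
print("Confusion matrix:")
print(cm)
```

Here 4 of the 6 labels are predicted correctly, so accuracy is about 0.67. The confusion matrix shows where the two mistakes are: one true 0 predicted as 1, and one true 1 predicted as 0.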


---

## Ethical Considerations in Machine Learning

- **Bias**: Models can reflect societal biases present in data
- **Privacy**: ML models can leak sensitive information from their training data
- **Transparency**: "Black box" models are hard to explain

**Remember**: With great power comes great responsibility.

---

## Exercises

1. Load a dataset using `sklearn.datasets.load_iris()` and apply a classifier.
2. Try clustering with KMeans using random 2D points.
3. Add missing values to a dataset and try handling them with Pandas.
4. Train a linear regression model and evaluate it with MAE.

---

## Summary

- Machine learning helps systems learn patterns from data.
- Different types of ML are suited for different problems.
- Always clean your data before modeling.
- Evaluation and ethics are crucial to responsible ML.

You're now ready to start your Machine Learning journey!