---
# <center> *Machine Learning*
## <center> Summer 2025, Week 1 (July 9th): Introduction
---

# Foundations of Machine Learning

## What is Machine Learning?

Machine learning is a branch of computer science that focuses on building systems that learn from data and use what they learn to make predictions or decisions.

Instead of writing step-by-step instructions for a computer to follow (as in traditional programming), we give it examples. From these examples, the computer finds patterns and derives rules on its own.

**Example:**  
Suppose you want a program to recognize handwritten digits. Instead of writing rules for every possible way a person might draw the number 8, you can show the program thousands of examples of digits along with their correct labels. The program learns to associate shapes with digits and can then recognize digits it has not seen before.
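
The digit example can be tried in a few lines. The sketch below is only an illustration: the dataset (scikit-learn's bundled `load_digits`, which has 1,797 small images rather than many thousands) and the model (`KNeighborsClassifier`) are choices made here for brevity, not something this course has prescribed yet.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1,797 labeled 8x8 grayscale images of handwritten digits (0-9).
digits = load_digits()
X, y = digits.data, digits.target  # X: pixel values, y: the correct digit

# Hold out a quarter of the examples to check the model on digits it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# No digit-specific rules are written here; the model only sees labeled examples.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Accuracy on unseen digits: {accuracy:.2f}")
```

Note that the code never describes what an 8 looks like. Everything the model knows comes from `X_train` and `y_train`.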

---

## Machine Learning vs Traditional Programming

| Traditional Programming            | Machine Learning                      |
|------------------------------------|---------------------------------------|
| Programmer writes rules            | Program learns rules from data        |
| Input + Rules → Output             | Input + Output → Rules                |
| Logic is explicitly defined        | Logic is inferred from patterns       |
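
The contrast in the table can be made concrete with a toy problem. In this sketch, the pass/fail data, the cutoff of 5 hours, and the helper names `passes_by_rule` and `learn_cutoff` are all made up for illustration. The "learning" is deliberately simple: try every candidate cutoff and keep the one that makes the fewest mistakes on the examples.

```python
# Traditional programming: the programmer picks the rule (here, a cutoff of 5 hours).
def passes_by_rule(hours_studied):
    return hours_studied >= 5


# Machine learning: the cutoff is chosen from labeled examples instead.
def learn_cutoff(hours, passed):
    """Return the cutoff that misclassifies the fewest training examples."""
    values = sorted(set(hours))
    # Candidate cutoffs sit halfway between neighboring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        return sum((h >= cutoff) != p for h, p in zip(hours, passed))

    return min(candidates, key=errors)


# Made-up training data: hours studied (input) and whether the student passed (output).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [False, False, False, False, True, True, True, True]

cutoff = learn_cutoff(hours, passed)
print(cutoff)       # 4.5
print(6 >= cutoff)  # True: a student who studied 6 hours is predicted to pass
```

In the first function the logic is written by hand. In the second, the same kind of rule is inferred from input/output pairs, so changing the data changes the rule without anyone editing the code.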

---

## Core Concepts

- **Model**: A mathematical function that maps inputs to outputs.
- **Feature**: An input variable (e.g. height, temperature, number of hours studied).
- **Label**: The output or correct answer (e.g. pass/fail, price, species).
- **Training**: The process of adjusting a model so that it fits the examples in the data.
- **Prediction**: The output a trained model produces when given new input data.
- **Overfitting**: When a model learns noise or specific details in the training data that don't generalize well to new data.
- **Underfitting**: When a model is too simple and fails to capture important patterns.
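
Underfitting and overfitting are easier to see with numbers. The sketch below is an illustration under assumptions chosen here: the data is synthetic (a sine curve plus random noise), the models are polynomials of degree 1, 3, and 9 fitted with NumPy's `polyfit`, and error is measured as mean squared error (MSE).

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples of a smooth curve; the noise is what an overfit model memorizes."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y


x_train, y_train = make_data(20)   # small training set
x_test, y_test = make_data(200)    # new data the model never sees during training


def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)  # training
    train_mse[degree] = mse(coeffs, x_train, y_train)
    test_mse[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree}: train MSE {train_mse[degree]:.3f}, "
          f"test MSE {test_mse[degree]:.3f}")
```

Training error can only go down as the degree grows, because a higher-degree polynomial can always reproduce a lower-degree one. The line (degree 1) is too simple for a sine curve and does poorly on both sets, which is underfitting. A degree-9 fit to only 20 points tends to chase the noise, so its test error is often worse than its training error suggests, which is overfitting. The exact numbers depend on the random seed.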

---

## Types of Machine Learning

### 1. Supervised Learning

In supervised learning, the model is trained on labeled data. Each example in the dataset includes both input values and the correct output.

**Examples:**
- Predicting house prices from features like size and location.
- Determining whether an email is spam or not.

Supervised learning is used when you have historical data with known answers.

**Two main tasks:**
- **Classification**: Predicting categories (e.g. disease vs no disease).
- **Regression**: Predicting continuous values (e.g. house price).
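
The two tasks look almost identical in code; what differs is the kind of label. The following sketch uses made-up data and two scikit-learn models picked here for illustration (`LinearRegression` and `LogisticRegression`). The house prices are constructed to follow an exact straight line so that the result is easy to verify by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (price) from a feature (size in square meters).
# Made-up data that follows price = 50_000 + 200 * size exactly.
sizes = np.array([[50], [80], [100], [120], [150]])
prices = 50_000 + 200 * sizes.ravel()

regressor = LinearRegression().fit(sizes, prices)
predicted_price = regressor.predict([[90]])[0]
print(round(predicted_price))  # 68000

# Classification: predict a category (0 = fail, 1 = pass) from hours studied.
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

classifier = LogisticRegression().fit(hours, passed)
predicted_classes = classifier.predict([[2], [7]])
print(predicted_classes)  # [0 1]
```

Both models are trained with `fit(features, labels)` and queried with `predict(new_features)`. Real data is noisy, so predictions will not normally be exact as they are in the regression half of this toy example.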

---

### 2. Unsupervised Learning

In unsupervised learning, the data has no labels. The model tries to find structure or patterns in the data.

**Examples:**
- Grouping similar customers based on purchasing behavior.
- Visualizing high-dimensional data using fewer variables.

**Common methods:**
- Clustering (e.g. K-Means)
- Dimensionality Reduction (e.g. PCA)
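
Both methods can be sketched on the same unlabeled dataset. The data below is synthetic and stands in for "customers" described by three numeric features; the two groups, their separation, and the parameter values are assumptions made for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Made-up, unlabeled data: 100 "customers" with 3 numeric features each,
# drawn from two well-separated groups. No labels are given to the models.
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 3))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 3))
X = np.vstack([group_a, group_b])

# Clustering: assign each customer to one of two groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Dimensionality reduction: compress 3 features into 2, e.g. for plotting.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(cluster_ids[:5], cluster_ids[-5:])
print(X_2d.shape)  # (100, 2)
```

Notice that `fit_predict` and `fit_transform` receive only `X`; there is no `y`. The cluster numbers themselves (0 or 1) are arbitrary names, so only the grouping is meaningful.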

---

### 3. Semi-Supervised Learning

Semi-supervised learning combines both labeled and unlabeled data. It is useful when labeling data is expensive or time-consuming, but you have access to a large amount of unlabeled data.

**Example:**
- You have 100 labeled images of animals and 10,000 unlabeled images. A semi-supervised model learns from both, and can often do better than a model trained on the 100 labeled images alone.

This approach is common in areas such as image recognition, speech processing, and medical diagnosis, where unlabeled data is plentiful but labels are costly to obtain.
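
A scaled-down version of this idea, as a sketch: 100 synthetic 2D points of which only 6 keep their labels. The data, the choice of scikit-learn's `LabelSpreading`, and the `gamma` value are assumptions made for this illustration; other semi-supervised methods exist.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)

# Made-up data: two groups of 50 points each in 2D.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
y_true = np.array([0] * 50 + [1] * 50)

# Pretend labeling is expensive: keep 3 labels per class and mark the rest as -1,
# which scikit-learn's semi-supervised estimators treat as "unlabeled".
y_partial = np.full(100, -1)
labeled_idx = [0, 1, 2, 50, 51, 52]
y_partial[labeled_idx] = y_true[labeled_idx]

model = LabelSpreading(kernel="rbf", gamma=1.0)
model.fit(X, y_partial)

# transduction_ holds the label inferred for every training point.
unlabeled = y_partial == -1
accuracy = float(np.mean(model.transduction_[unlabeled] == y_true[unlabeled]))
print(f"Accuracy on the {unlabeled.sum()} unlabeled points: {accuracy:.2f}")
```

The unlabeled points matter because they reveal the shape of the data: labels spread from the few labeled points to their neighbors. This toy data is much easier than real images, so the result here says nothing about how well the method works in practice.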

---

### 4. Reinforcement Learning

Reinforcement learning is about learning by trial and error. An agent interacts with an environment, makes decisions, and learns from the feedback (reward or penalty).

**Examples:**
- Teaching a robot to walk.
- Training a computer to play a video game.

**Key terms:**
- **Agent**: The learner or decision-maker.
- **Environment**: What the agent interacts with.
- **Action**: A choice the agent makes.
- **Reward**: Feedback the agent receives after an action.
- **Policy**: The agent's strategy for choosing actions.

Over time, the agent learns which actions lead to higher rewards and adjusts its policy accordingly.
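
The key terms above map directly onto code. The sketch below uses Q-learning, one common reinforcement learning algorithm chosen here for illustration, on a made-up environment: a corridor of five positions where the agent starts at the left end and is rewarded for reaching the right end. All names and parameter values are assumptions of this example.

```python
import random

random.seed(0)

N_STATES = 5          # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Environment: move the agent and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_values, state):
    """Policy: mostly pick the best-known action, sometimes explore at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(q_values[state])
    return random.choice([a for a in ACTIONS if q_values[state][a] == best])


# q_values[state][action] estimates the long-term reward of taking that action.
q_values = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(500):  # 500 episodes of trial and error
    state, done = 0, False
    while not done:
        action = choose_action(q_values, state)
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        future = 0.0 if done else max(q_values[next_state])
        target = reward + GAMMA * future
        q_values[state][action] += ALPHA * (target - q_values[state][action])
        state = next_state

# The learned policy: the best action in each non-goal state.
policy = [max(ACTIONS, key=lambda a: q_values[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

The agent is never told that "right" is correct. It only receives a reward when it reaches the goal, and the update rule passes that information backward to earlier positions.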

---

## The Machine Learning Process

1. **Define the problem**: What are you trying to predict or understand?
2. **Collect data**: Gather examples relevant to the problem.
3. **Explore the data**: Understand distributions, spot missing values, visualize patterns.
4. **Prepare the data**: Clean, normalize, and structure the data for use.
5. **Choose a model**: Select an algorithm suitable for the task.
6. **Train the model**: Fit the model using the training data.
7. **Evaluate the model**: Check how well the model performs on unseen data.
8. **Tune and improve**: Adjust settings or features to improve performance.
9. **Deploy**: Use the model in the real world.
10. **Monitor and maintain**: Track the model's performance over time and retrain it as new data becomes available.
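
Most of these steps can be seen in one short script. This is a sketch rather than a template: the dataset (scikit-learn's bundled iris data), the model, and the tuning grid are assumptions made here to keep the example small, and real projects spend far more effort on steps 1 to 4.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-2. Problem and data: predict the iris species from four flower measurements.
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Step 3. Explore: summary statistics and a missing-value count.
print(X.describe())
print("Missing values:", int(X.isna().sum().sum()))

# Hold out a test set now so that evaluation uses data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Steps 4-5. Prepare and choose: scale the features, then use a linear classifier.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Steps 6 and 8. Train and tune: try several regularization strengths with
# 5-fold cross-validation on the training set only.
search = GridSearchCV(pipeline, {"classify__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Step 7. Evaluate on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("Best C:", search.best_params_["classify__C"])
print(f"Test accuracy: {test_accuracy:.2f}")

# Steps 9-10 (deploy, monitor) happen outside a script like this one.
```

Putting the scaler inside the `Pipeline` means it is refitted on each cross-validation fold, so no information from the validation data leaks into training.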

---

## Tools You Will Use in This Course

- **Python**: Programming language for writing ML code.
- **Jupyter Notebooks**: Interactive environment for running and sharing code.
- **pandas / numpy**: Libraries for loading, cleaning, and computing with tabular and numerical data.
- **matplotlib / seaborn**: For plotting and visualizing data.
- **scikit-learn**: A library for building and testing machine learning models.