# Day 1: Course Overview & ML Concepts Recap (1.1)

## Table of Contents
1. [Course Roadmap (1.1.1)](#course-roadmap)
   - Welcome & Introduction
   - Course Goal
   - Prerequisites
   - Learning Outcomes
   - Course Structure
   - Tools We Will Use
   - Learning Environment
2. [The Machine Learning Spectrum (1.1.2)](#ml-spectrum)
   - What is Machine Learning?
   - Supervised Learning
   - Unsupervised Learning
   - Semi-Supervised Learning
   - Self-Supervised Learning
   - Reinforcement Learning
   - Summary
3. [Practice Questions](#practice-questions)

<a id="course-roadmap"></a>
## 1.1.1 Course Roadmap

### Welcome & Introduction
Welcome to this introductory course on Machine Learning! Over the next five weeks (10 sessions), we will explore the fundamental concepts and practical applications of ML. The course is designed to give you both a strong theoretical foundation and hands-on experience with key ML techniques and workflows.

### Course Goal
Our primary objective is to give you a solid understanding of Machine Learning principles, together with hands-on experience using Python and key libraries such as Scikit-learn. The course emphasizes practical skills and building intuition for applying common ML techniques to real-world problems.

### Prerequisites
This course assumes familiarity with:
- Python programming fundamentals
- Data manipulation using Pandas
- Data visualization with Matplotlib/Seaborn
- Basic Exploratory Data Analysis (EDA)

We will build upon these existing skills throughout the course.

### Learning Outcomes
Upon completing this course, you should be able to:
- Explain core ML concepts, including the distinction between Supervised and Unsupervised Learning
- Describe and navigate the standard Machine Learning project workflow
- Implement key algorithms for:
  - Regression (Linear Regression)
  - Classification (Logistic Regression, Decision Trees, Random Forests)
  - Clustering (K-Means)
  - Dimensionality reduction (PCA)
- Evaluate the performance of ML models using appropriate metrics
- Recognize and apply basic techniques to address common issues like imbalanced data and model overfitting
- Utilize Scikit-learn effectively for common ML tasks

### Course Structure (10 Days)
- **Days 1-2:** Foundations, ML Workflow, Scikit-learn API, Linear Regression & Evaluation
- **Days 3-4:** Introduction to Classification (Logistic Regression), Classification Evaluation, Data Preparation Techniques
- **Days 5-6:** Handling Imbalanced Data, Decision Trees & Model Interpretability
- **Days 7-8:** Ensemble Methods (Random Forests), Model Validation (Cross-Validation), and Optimization (Hyperparameter Tuning)
- **Days 9-10:** Unsupervised Learning (K-Means Clustering, PCA), Course Wrap-up & Mini-Project

**Daily Format:** Each session will typically introduce the theoretical concepts first, then reinforce them with hands-on labs.

### Tools We Will Use
- Python 3
- Jupyter Notebooks / Google Colab
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, Imbalanced-learn
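
Before the first lab, it can help to confirm that these libraries are importable in your environment. The snippet below is a minimal sketch: `sklearn` and `imblearn` are the import names for Scikit-learn and Imbalanced-learn, while the `packages` list and `status` dictionary are illustrative names rather than part of the course materials.

```python
import importlib.util

# Import names for the libraries listed above
# (Scikit-learn is imported as "sklearn", Imbalanced-learn as "imblearn").
packages = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "imblearn"]

# find_spec returns None when a top-level package cannot be found.
status = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, is_installed in status.items():
    print(f"{name}: {'installed' if is_installed else 'missing'}")
```

Anything reported as missing can usually be installed with `pip` (for example, `pip install scikit-learn imbalanced-learn`). Google Colab typically ships with most of these preinstalled.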

### Learning Environment
Active participation is encouraged! Ask questions, share insights, and collaborate during labs. The most effective learning happens when you engage actively with the material and with your peers. Regular breaks will be scheduled to maintain focus and energy.

<a id="ml-spectrum"></a>
## 1.1.2 The Machine Learning Spectrum

### What is Machine Learning?
Machine Learning (ML) is a field of artificial intelligence focused on building systems that can learn from and make decisions based on data, without being explicitly programmed for every task. Instead of writing detailed rules, we allow the algorithm to discover patterns and relationships within the data provided.

![ML vs Traditional Programming](https://i.imgur.com/jTmGCJs.png)
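
To make the contrast concrete, here is a small sketch that sets a hand-written rule beside a rule learned from examples. The temperature data, the "hot day" label, and the function name `is_hot_rule` are invented for illustration. A depth-1 decision tree stands in for "the algorithm" only because its learned rule is easy to inspect.

```python
from sklearn.tree import DecisionTreeClassifier

# Traditional programming: a person writes the rule explicitly.
def is_hot_rule(temp_c):
    return temp_c > 25  # threshold chosen by the programmer

# Machine learning: the rule is inferred from labeled examples.
temps = [[10], [15], [20], [24], [26], [30], [35], [40]]  # one feature per sample
labels = [0, 0, 0, 0, 1, 1, 1, 1]                         # 0 = not hot, 1 = hot

model = DecisionTreeClassifier(max_depth=1, random_state=0)
model.fit(temps, labels)

# The split point the tree discovered on its own (somewhere between 24 and 26).
learned_threshold = model.tree_.threshold[0]
print("Learned threshold:", learned_threshold)
print("Predictions for 12 and 33 degrees:", model.predict([[12], [33]]))
```

In both cases the result is a threshold rule. The difference is where the threshold comes from: the programmer in the first case, the data in the second.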

### Supervised Learning
**Concept:** Learning from a dataset where each data point (sample) includes input features and a corresponding known output label (the "supervision"). The goal is to learn a function that maps inputs to outputs, enabling predictions on new, unseen data.

**Analogy:** Learning with a teacher who provides correct answers (labels) for practice questions (features).

**Types:**
- **Regression:** Predicts a continuous numerical value.
  - *Examples:* Predicting house prices, forecasting temperature, estimating customer lifetime value.
- **Classification:** Predicts a discrete category or class label.
  - *Examples:* Classifying email as spam/not spam, identifying tumor type as benign/malignant, recognizing handwritten digits.

*Course Focus: We will delve deep into algorithms like Linear Regression, Logistic Regression, Decision Trees, and Random Forests.*
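
As a preview, the two supervised task types can be sketched in a few lines of Scikit-learn. The data here is synthetic and chosen purely for illustration (a noiseless line for regression, two well-separated groups for classification), so the fit is far cleaner than on real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: inputs -> continuous value. Toy relationship: y = 3x + 2.
X_reg = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_continuous = 3 * X_reg.ravel() + 2
reg = LinearRegression().fit(X_reg, y_continuous)
value_pred = reg.predict([[6.0]])          # prediction for an unseen input

# Classification: inputs -> discrete class label (0 or 1).
X_cls = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X_cls, y_class)
class_pred = clf.predict([[0.5], [9.5]])   # predictions for unseen inputs

print("Regression prediction at x=6:", value_pred[0])
print("Predicted classes:", class_pred)
```

Both models follow the same `fit` then `predict` pattern, and both are given the labels (`y_continuous`, `y_class`) during training. That is the "supervision".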

### Unsupervised Learning
**Concept:** Learning from data that does *not* have predefined output labels. The algorithm's task is to identify patterns, structures, or groupings inherent in the data itself.

**Analogy:** Finding natural groups or themes within a collection of news articles without any prior categorization.

**Common Tasks:**
- **Clustering:** Grouping similar data points together.
  - *Examples:* Customer segmentation based on behavior, grouping similar documents, discovering subtypes within a dataset.
- **Dimensionality Reduction:** Reducing the number of features while retaining meaningful information.
  - *Examples:* Compressing data, simplifying data for visualization, noise reduction.
- *(Other unsupervised tasks exist, such as Association Rule Learning for finding co-occurrence patterns.)*

*Course Focus: We will explore K-Means for clustering and PCA for dimensionality reduction.*
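
A minimal sketch of both tasks, assuming synthetic data made of two well-separated groups of 3-feature points (the group locations and sizes are arbitrary choices for illustration). Note that no labels are passed to either algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic, unlabeled data: two groups of points in 3 dimensions.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 3))
group_b = rng.normal(loc=5.0, scale=0.5, size=(20, 3))
X = np.vstack([group_a, group_b])

# Clustering: assign each point to one of two groups based on similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Dimensionality reduction: project 3 features down to 2 (e.g., for plotting).
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("Cluster assignments:", cluster_ids)
print("Shape before and after PCA:", X.shape, X_2d.shape)
```

The cluster IDs themselves (0 or 1) are arbitrary; what matters is which points end up grouped together.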

### Semi-Supervised Learning
**Concept:** A hybrid approach that uses a large amount of unlabeled data together with a small amount of labeled data. This is useful when labeling data is costly or time-consuming.

**Goal:** To improve learning accuracy compared to using only the small labeled dataset.

*Brief Mention: Not a primary focus of this course.*
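
For the curious, here is a hedged sketch of the idea using Scikit-learn's `SelfTrainingClassifier`, which is one semi-supervised technique among several. The data is synthetic, and the choice of six labeled points out of 100 is arbitrary. Scikit-learn marks unlabeled samples with the label `-1`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic data: two groups of 50 points each in 2 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
y_true = np.array([0] * 50 + [1] * 50)   # kept only to check the result

# Pretend labeling is expensive: only 6 of the 100 points are labeled.
y_partial = np.full(100, -1)             # -1 means "unlabeled"
y_partial[[0, 1, 2]] = 0
y_partial[[50, 51, 52]] = 1

# Self-training: fit on the labeled points, then iteratively add
# confident predictions on unlabeled points as pseudo-labels.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y_partial)

accuracy = (model.predict(X) == y_true).mean()
print("Accuracy against the held-back true labels:", accuracy)
```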

### Self-Supervised Learning
**Concept:** A type of learning (often considered under the unsupervised/semi-supervised umbrella) where the supervision signal (labels) is generated automatically from the input data itself, rather than by humans. This often involves solving a "pretext task," like predicting a masked part of the input.

**Examples:** Predicting the next word in a text sequence; filling in missing parts of an image.

**Significance:** Foundational for many modern large-scale models in NLP and vision.

*Brief Mention: An advanced topic beyond the scope of our labs.*
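
The key idea, that labels come from the data itself, can be shown without any model at all. The sketch below builds training pairs for the two pretext tasks mentioned above (next-word prediction and masking) from a made-up sentence. The helper `make_masked_example` and the `[MASK]` placeholder are illustrative names, not a specific library's API.

```python
text = "the cat sat on the mat"
tokens = text.split()

# Pretext task 1: next-word prediction.
# Each pair is (words seen so far, the word that follows). No human labeling needed.
next_word_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Pretext task 2: masking. Hide one token and use it as the target.
def make_masked_example(tokens, mask_index, mask_token="[MASK]"):
    masked = list(tokens)            # copy so the original list is untouched
    target = masked[mask_index]
    masked[mask_index] = mask_token
    return masked, target

masked_input, masked_target = make_masked_example(tokens, 2)

print(next_word_pairs[0])            # (['the'], 'cat')
print(masked_input, "->", masked_target)
```

A model trained on pairs like these is doing ordinary supervised learning. What makes the setup self-supervised is that the targets were produced automatically from raw text.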

### Reinforcement Learning (RL)
**Concept:** A distinct paradigm where an "agent" learns to make sequential decisions by interacting with an "environment." The agent performs "actions" and receives "rewards" or "penalties," learning over time to maximize its cumulative reward.

**Analogy:** Training a pet through treats and consequences; learning to play a game by winning or losing.

**Examples:** Game playing AI (Chess, Go), robotics control, autonomous navigation.

*Brief Mention: A distinct branch of ML that is not covered in this course.*
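
Purely to illustrate the vocabulary, here is a small sketch of an agent learning to walk right along a five-cell corridor. It uses tabular Q-learning with epsilon-greedy exploration, one common RL algorithm chosen here for illustration. The corridor environment, the reward of 1.0 at the goal, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions made up for this example.

```python
import random

N_STATES = 5                 # positions 0..4; reaching position 4 ends an episode
ACTIONS = [-1, +1]           # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]

def choose_action(state):
    """Agent: explore at random sometimes, otherwise pick the best-known action."""
    if rng.random() < EPSILON or q[state][0] == q[state][1]:
        return rng.randrange(2)
    return 0 if q[state][0] > q[state][1] else 1

for _ in range(500):                         # 500 training episodes
    state, done = 0, False
    while not done:
        a = choose_action(state)
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move the estimate toward reward + discounted future value.
        future = 0.0 if done else GAMMA * max(q[next_state])
        q[state][a] += ALPHA * (reward + future - q[state][a])
        state = next_state

# Greedy policy learned for each non-terminal position.
policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(N_STATES - 1)]
print(policy)
```

Here the agent is the code that chooses actions, the environment is `step`, the actions are "left" and "right", and the reward is the 1.0 received on reaching the last cell.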

### Summary
While ML encompasses this broad spectrum, our practical work in this course will concentrate on mastering the fundamentals of **Supervised** and **Unsupervised Learning** using Scikit-learn.

![ML Types Overview](https://i.imgur.com/uGKbxk4.png)

<a id="practice-questions"></a>
## Practice Questions

### Course Roadmap Questions
1. What are the main learning outcomes of this Machine Learning course?
2. Why is it important to have prerequisites like Python, Pandas, and basic data visualization skills before taking this course?
3. How is the 10-day course structured? What topics will be covered in which days?
4. What tools and libraries will we be using throughout this course? Why are these important for ML practitioners?

### Machine Learning Spectrum Questions
1. What is the key difference between Machine Learning and traditional programming?
2. Explain the difference between supervised and unsupervised learning with an example of each.
3. For a scenario where you need to predict customer churn (whether a customer will leave or stay), would you use regression or classification? Explain why.
4. How does semi-supervised learning differ from supervised learning, and in what situations might it be preferred?
5. What is the main difference between self-supervised learning and supervised learning in terms of where the labels come from?
6. In reinforcement learning, what are the roles of the "agent," "environment," "actions," and "rewards"?
7. For each of the following problems, identify which type of ML would be most appropriate:
   - Grouping customers by purchasing behavior
   - Predicting stock prices
   - Teaching a robot to navigate a maze
   - Identifying fraudulent credit card transactions
   - Reducing the dimensionality of a large dataset for visualization