# Course Orientation: Machine Learning for Data Scientists Masterclass

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand the repository structure** and how materials are organized across modules.
2. **Know the recommended learning path** from foundational topics through advanced methods.
3. **Set up your local environment** with all required libraries and verify that everything works.

## Prerequisites

- **Basic Python** -- variables, functions, loops, list comprehensions, and familiarity with importing packages.
- **Basic Statistics** -- mean, variance, standard deviation, and a rough idea of probability distributions.

## Table of Contents

1. [What This Course Covers](#1.-What-This-Course-Covers)
2. [How to Use This Repository](#2.-How-to-Use-This-Repository)
3. [Environment Setup](#3.-Environment-Setup)
4. [Notebook Style Guide](#4.-Notebook-Style-Guide)
5. [Recommended Learning Path](#5.-Recommended-Learning-Path)
6. [Key Conventions](#6.-Key-Conventions)
7. [Final Import Verification](#7.-Final-Import-Verification)

---

## 1. What This Course Covers

This masterclass walks through the core machine-learning algorithms that every data scientist should know. The material is organized into seven modules:

| Module | Title | Key Topics |
|--------|-------|------------|
| **ML100** | Data Splitting & Feature Fundamentals | Train/test/validation splits, feature scaling, encoding categorical variables, data leakage |
| **ML200** | Linear Regression | Simple & multiple linear regression, OLS, assumptions, residual analysis, polynomial features |
| **ML300** | Logistic Regression & Classification | Binary & multiclass classification, sigmoid function, decision boundaries, evaluation metrics |
| **ML400** | KNN & Clustering | k-Nearest Neighbors, K-Means clustering, DBSCAN, distance metrics, the curse of dimensionality |
| **ML500** | Trees, Ensembles & Boosting | Decision trees, Random Forests, bagging, AdaBoost, Gradient Boosting, XGBoost |
| **ML600** | Optimization, Regularization & Model Selection | Gradient descent, Ridge, Lasso, ElasticNet, cross-validation, hyperparameter tuning |
| **ML700** | Advanced Topics (Optional) | Dimensionality reduction (PCA, t-SNE), pipelines, feature selection, model interpretability |

Each module builds on the ones before it, so completing them in order is strongly recommended.

---

## 2. How to Use This Repository

### Recommended Order

1. Start with **this notebook** (`00_Course_Orientation`) to understand the structure and set up your environment.
2. Optionally review `ML050_ML_Prerequisites_Quick_Ref` if you need a refresher on math, stats, or Python fundamentals.
3. Work through modules **ML100 through ML700** in numerical order.
4. Complete the exercises in the `exercises/` folder after each module to reinforce your understanding.
5. Tackle the capstone projects in `projects/` once you feel confident.

### Repository Layout

```
Machine-Learning-Algorithms-Masterclass/
|
|-- notebooks/           <- All lesson notebooks, organized by module
|   |-- ML100_.../
|   |-- ML200_.../
|   |-- ...             
|
|-- exercises/           <- Practice exercises for each module
|-- projects/            <- End-to-end capstone projects
|-- data/                <- Datasets (synthetic and real-world)
|-- src/                 <- Reusable helper utilities and plotting functions
|-- assets/              <- Images and diagrams used in notebooks
|-- requirements.txt     <- pip dependencies
|-- environment.yml      <- conda environment specification
```

### How Notebooks Are Structured

Every lesson notebook follows the same pattern:

1. **Title & Learning Objectives** -- what you will learn.
2. **Prerequisites** -- what you should already know.
3. **Table of Contents** -- quick navigation links.
4. **Theory** -- concise explanations with LaTeX formulas and diagrams.
5. **Code Walk-throughs** -- runnable cells that demonstrate each concept.
6. **Exercises / Practice** -- hands-on problems for you to solve.
7. **Summary & Next Steps** -- recap and pointers to the next notebook.

---

## 3. Environment Setup

You can set up your environment using either **pip** or **conda**.

### Option A -- pip

```bash
pip install -r requirements.txt
```

### Option B -- conda

```bash
conda env create -f environment.yml
conda activate ml-masterclass
```

Run the cell below to verify that all core libraries import correctly and print their versions.

```python
import sys
print(f"Python version : {sys.version}")
print()

# Core scientific stack
import numpy as np
print(f"NumPy          : {np.__version__}")

import pandas as pd
print(f"pandas         : {pd.__version__}")

import sklearn
print(f"scikit-learn   : {sklearn.__version__}")

import matplotlib
print(f"matplotlib     : {matplotlib.__version__}")

import matplotlib.pyplot as plt

print()
print("All core imports succeeded.")
```

---

## 4. Notebook Style Guide

All notebooks in this course follow a **consistent format** so that you always know what to expect:

| Section | Purpose |
|---------|---------|
| **Learning Objectives** | A numbered list at the very top stating what you will be able to do after completing the notebook. |
| **Prerequisites** | Concepts or notebooks you should have completed before starting. |
| **Table of Contents** | Clickable links to each major section for easy navigation. |
| **Theory Sections** | Markdown cells with clear explanations, LaTeX equations, and diagrams where helpful. |
| **Code Cells** | Fully runnable Python code that demonstrates the concept just discussed. Code cells are meant to be executed top-to-bottom. |
| **Exercises** | Practice problems embedded at the end of each notebook (or in the `exercises/` folder) for you to solve on your own. |
| **Summary & Next Steps** | A brief recap of key takeaways and a pointer to the next notebook in the sequence. |

### Why Consistency Matters

- You can **skim quickly** -- Learning Objectives tell you if a notebook is relevant to your needs.
- You can **pick up where you left off** -- the Table of Contents lets you jump to any section.
- You can **run everything end-to-end** -- cells are designed to execute in order without errors.

---

## 5. Recommended Learning Path

The table below summarizes the recommended order, estimated difficulty, and approximate time for each module.

| Order | Module | Topic | Difficulty | Approx. Time |
|:-----:|--------|-------|:----------:|:-------------:|
| 0 | **ML050** | Prerequisites Quick Reference | Beginner | 1--2 hours |
| 1 | **ML100** | Data Splitting & Feature Fundamentals | Beginner | 3--4 hours |
| 2 | **ML200** | Linear Regression | Beginner--Intermediate | 4--5 hours |
| 3 | **ML300** | Logistic Regression & Classification | Intermediate | 4--5 hours |
| 4 | **ML400** | KNN & Clustering | Intermediate | 4--5 hours |
| 5 | **ML500** | Trees, Ensembles & Boosting | Intermediate--Advanced | 5--6 hours |
| 6 | **ML600** | Optimization, Regularization & Model Selection | Intermediate--Advanced | 5--6 hours |
| 7 | **ML700** | Advanced Topics (Optional) | Advanced | 4--6 hours |

**Total estimated time: 30--39 hours** (not including exercises and projects).

> **Tip:** If you already have a strong math/stats background, you can skip ML050 and jump straight to ML100.
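
The total is simply the sum of the per-module ranges in the table. As a quick sanity check, the sketch below recomputes it and also shows the range if you skip the two optional items (ML050 and ML700); the variable names are illustrative only.

```python
# (module, low_hours, high_hours) copied from the table above
estimates = [
    ("ML050", 1, 2),
    ("ML100", 3, 4),
    ("ML200", 4, 5),
    ("ML300", 4, 5),
    ("ML400", 4, 5),
    ("ML500", 5, 6),
    ("ML600", 5, 6),
    ("ML700", 4, 6),
]

total_low = sum(low for _, low, _ in estimates)
total_high = sum(high for _, _, high in estimates)
print(f"All modules      : {total_low}--{total_high} hours")  # 30--39 hours

# Skip the optional quick reference (ML050) and optional advanced module (ML700)
optional = {"ML050", "ML700"}
core_low = sum(low for module, low, _ in estimates if module not in optional)
core_high = sum(high for module, _, high in estimates if module not in optional)
print(f"Core modules only: {core_low}--{core_high} hours")  # 25--31 hours
```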

---

## 6. Key Conventions

To keep the course materials reproducible and easy to follow, we use the following conventions throughout:

### Reproducibility

- **`random_state=42`** is used wherever randomness is involved (train/test splits, model initialization, etc.) so that rerunning a notebook with the same library versions gives the same results.
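
To illustrate the convention, here is a minimal sketch showing that two independent calls to `train_test_split` with the same `random_state` return the same split. The dataset sizes and the 80/20 split are arbitrary choices for the example, not values taken from the course notebooks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Small synthetic dataset; the sizes are arbitrary and only for illustration
X, y = make_classification(n_samples=100, n_features=5, random_state=42)

# Two independent calls with the same seed produce the same split
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Identical test sets:", np.array_equal(X_test_a, X_test_b))  # True
```

Omitting `random_state` (or passing a different value) would generally give a different split on each run.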

### Data

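- Most examples use **synthetic datasets** generated with `sklearn.datasets.make_*` functions or small, well-known datasets (Iris, Boston, etc.). This keeps the focus on the algorithm rather than data wrangling. Note that `load_boston` was removed in scikit-learn 1.2, so Boston-based examples require an older scikit-learn release or a copy of the data from another source.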
- Larger or real-world datasets live in the `data/` directory.
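
As a rough illustration of the two kinds of data mentioned above, the sketch below generates one synthetic dataset with a `make_*` function and loads one bundled dataset. The choice of `make_blobs` and its parameters is an assumption made for the example; individual notebooks may use other generators.

```python
from sklearn.datasets import load_iris, make_blobs

# Synthetic data: 3 Gaussian clusters in 2 dimensions
X_blobs, y_blobs = make_blobs(
    n_samples=150, centers=3, n_features=2, random_state=42
)
print("make_blobs:", X_blobs.shape, y_blobs.shape)  # (150, 2) (150,)

# Bundled dataset: Iris ships with scikit-learn, so no download is needed
iris = load_iris()
print("Iris      :", iris.data.shape, iris.target_names.tolist())
```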

### Reusable Utilities

- Common helper functions (e.g., custom plot functions, data loaders, evaluation helpers) are stored in the **`src/`** directory.
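- You can import them in any notebook by adding `src/` to `sys.path`. The relative path below is resolved against the notebook's working directory and assumes `src/` sits one level up; adjust it (e.g., `../../src`) for notebooks nested deeper: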

```python
import sys, os
sys.path.append(os.path.join("..", "src"))
from utils import plot_decision_boundary
```
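
Because a hard-coded relative path depends on where the notebook lives, one alternative is to search upward for the `src/` folder. The following is a sketch only: `find_src_dir` is a hypothetical helper, not part of the course's `src/` package, and it assumes `src/` sits somewhere above the notebook's working directory.

```python
import sys
from pathlib import Path


def find_src_dir(start=None, folder_name="src"):
    """Walk upward from `start` until a directory named `folder_name` is found."""
    start = Path(start) if start is not None else Path.cwd()
    start = start.resolve()
    # Check the starting directory first, then each parent up to the root
    for candidate in [start, *start.parents]:
        src_dir = candidate / folder_name
        if src_dir.is_dir():
            return src_dir
    raise FileNotFoundError(f"No '{folder_name}' directory found above {start}")


# Typical notebook usage (uncomment when running inside the repository):
# src_dir = find_src_dir()
# if str(src_dir) not in sys.path:
#     sys.path.append(str(src_dir))
# from utils import plot_decision_boundary
```

This keeps the import cell identical across notebooks regardless of how deeply they are nested.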

### Naming

- Notebook files are prefixed with their module code (e.g., `ML200_01_Simple_Linear_Regression.ipynb`).
- Variables follow Python naming conventions: `snake_case` for variables and functions, `PascalCase` for classes.
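
The notebook naming scheme can be made concrete with a small parser. This is a sketch under an assumption: the pattern (module code, two-digit lesson index, underscore-separated title) is inferred from the single example filename above, so files that deviate from it, such as the orientation or quick-reference notebooks, would not match. The function name `parse_notebook_name` is hypothetical.

```python
import re

# Assumed pattern: <module code>_<two-digit index>_<Title_With_Underscores>.ipynb
NOTEBOOK_NAME = re.compile(r"^(ML\d{3})_(\d{2})_(.+)\.ipynb$")


def parse_notebook_name(filename):
    """Split a notebook filename into module code, lesson index, and title."""
    match = NOTEBOOK_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unexpected notebook name: {filename}")
    module, index, title = match.groups()
    return module, int(index), title.replace("_", " ")


print(parse_notebook_name("ML200_01_Simple_Linear_Regression.ipynb"))
# ('ML200', 1, 'Simple Linear Regression')
```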

---

## 7. Final Import Verification

Run the cell below as a final check. Missing libraries are reported rather than raising an error, so read the report: if every required library shows `[OK]`, your environment is ready to go.

```python
# ============================================================
# Final verification: import all core libraries and confirm
# ============================================================

libraries = {}

try:
    import numpy as np
    libraries["numpy"] = np.__version__
except ImportError:
    libraries["numpy"] = "NOT INSTALLED"

try:
    import pandas as pd
    libraries["pandas"] = pd.__version__
except ImportError:
    libraries["pandas"] = "NOT INSTALLED"

try:
    import sklearn
    libraries["scikit-learn"] = sklearn.__version__
except ImportError:
    libraries["scikit-learn"] = "NOT INSTALLED"

try:
    import matplotlib
    libraries["matplotlib"] = matplotlib.__version__
except ImportError:
    libraries["matplotlib"] = "NOT INSTALLED"

try:
    import scipy
    libraries["scipy"] = scipy.__version__
except ImportError:
    libraries["scipy"] = "NOT INSTALLED"

try:
    import seaborn as sns
    libraries["seaborn"] = sns.__version__
except ImportError:
    libraries["seaborn"] = "NOT INSTALLED (optional)"

# Print results
print("=" * 45)
print("  Environment Verification Report")
print("=" * 45)
all_ok = True
for lib, ver in libraries.items():
    status = "OK" if "NOT INSTALLED" not in ver else "MISSING"
    if status == "MISSING" and "optional" not in ver:
        all_ok = False
    print(f"  {lib:<15s} {ver:<20s} [{status}]")

print("=" * 45)
if all_ok:
    print("  All required libraries are installed.")
    print("  You are ready to start the course!")
else:
    print("  WARNING: Some required libraries are missing.")
    print("  Please install them before proceeding.")
print("=" * 45)
```

---

## Summary & Next Steps

You now know:

- How this repository is organized and what each module covers.
- The recommended order for working through the material.
- How to set up and verify your Python environment.
- The conventions used throughout the course for consistency and reproducibility.

**Next up:** If you need a refresher on the math and Python fundamentals behind ML, open **`ML050_ML_Prerequisites_Quick_Ref.ipynb`**. Otherwise, jump straight into **Module ML100 -- Data Splitting & Feature Fundamentals**.

Happy learning!