<a href="https://colab.research.google.com/github/jeron-williams/ML_Reference_Library/blob/main/ML_Reference_Library.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧠 Machine Learning Reference Library (Python Edition)
**Purpose**: This notebook is a quick reference for core machine learning concepts, models, algorithms, and equations in Python. It is designed for fast lookup, project support, and practical implementation within Google Colab. Unless a cell says otherwise, the snippets assume that `df`, `X`, `y`, and the train/test splits already exist.

## 📚 Table of Contents
- [1. Data Preprocessing](#1)
- [2. Supervised Learning](#2)
- [3. Unsupervised Learning](#3)
- [4. Model Evaluation Metrics](#4)
- [5. Feature Engineering](#5)
- [6. Mathematical Foundations](#6)
- [7. Model Deployment (Optional)](#7)
- [8. Utility Functions](#8)

## 📦 1. Data Preprocessing

### 🔹 1.1 Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

### 🔹 1.2 Handle Missing Values

In [None]:
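# Assumes `df` is an existing pandas DataFrame.
# Option A: drop every row that contains a missing value
df = df.dropna()

# Option B (alternative to A): fill missing values in one numeric column with its mean
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())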
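
To make the difference between the two options concrete, here is a minimal sketch on a made-up three-row frame. The frame `toy` and its column names are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value in a numeric column.
toy = pd.DataFrame({"age": [20.0, np.nan, 40.0], "city": ["A", "B", "C"]})

# Dropping removes the whole row that contains the NaN.
dropped = toy.dropna()

# Filling keeps every row; the NaN becomes the mean of the observed values.
filled = toy.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())

print(dropped.shape)            # (2, 2)
print(filled["age"].tolist())   # [20.0, 30.0, 40.0]
```

Dropping loses the `city` value for the incomplete row, while filling keeps it at the cost of an imputed `age`.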

### 🔹 1.3 Encode Categorical Features

In [None]:
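# Label Encoding: maps each category to an integer (implies an order, so best for targets or ordinal data)
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['encoded'] = le.fit_transform(df['category'])

# One-Hot Encoding: one indicator column per category (replaces the original 'category' column)
df = pd.get_dummies(df, columns=['category'])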
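
As a rough illustration of what each encoder produces, the sketch below runs both on a hypothetical four-row frame. The data is invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data with three distinct categories.
toy = pd.DataFrame({"category": ["red", "green", "red", "blue"]})

# LabelEncoder sorts the classes alphabetically: blue=0, green=1, red=2.
le = LabelEncoder()
codes = le.fit_transform(toy["category"])

# get_dummies creates one indicator column per category.
one_hot = pd.get_dummies(toy, columns=["category"])

print(list(codes))            # [2, 1, 2, 0]
print(list(one_hot.columns))  # ['category_blue', 'category_green', 'category_red']
```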

## 🧮 2. Supervised Learning

### 🔹 2.1 Linear Regression

In [None]:
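from sklearn.linear_model import LinearRegression
# Assumes X (features) and y (numeric target) are defined; train_test_split was imported in 1.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)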
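
The following self-contained sketch shows the full split, fit, predict sequence on synthetic data. The data-generating line `y = 3x + 2` is an assumption chosen so the fitted coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noise-free data following y = 3x + 2 (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# With no noise, the fit should recover slope 3 and intercept 2 up to rounding.
print(model.coef_[0], model.intercept_)
```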

### 🔹 2.2 Decision Tree Classifier

In [None]:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

### 🔹 2.3 Random Forest Regressor

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

### 🔹 2.4 Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(max_iter=1000)  # higher iteration cap helps the solver converge on unscaled features
log_reg.fit(X_train, y_train)

## 🌀 3. Unsupervised Learning

### 🔹 3.1 K-Means Clustering

In [None]:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed and restarts for reproducible clusters
kmeans.fit(X)
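
A small sketch of reading the result back after fitting, assuming six hand-picked points that form two obvious groups. The points are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of points (made up for illustration).
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [10.0, 10.0], [10.1, 9.9], [9.9, 10.2],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_             # cluster index assigned to each row
centers = kmeans.cluster_centers_   # one centroid per cluster

print(labels)
print(centers)
```

The numeric cluster ids are arbitrary, so compare group membership rather than the ids themselves.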

### 🔹 3.2 Principal Component Analysis (PCA)

In [None]:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
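
To judge how much information the retained components keep, `explained_variance_ratio_` can be inspected after fitting. The sketch below assumes synthetic 3-D data that lies almost entirely along one direction.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: three features driven by one latent variable plus tiny noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.01, size=(200, 3))

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Fraction of total variance captured by each retained component.
ratios = pca.explained_variance_ratio_
print(X_pca.shape)  # (200, 2)
print(ratios)
```

Here the first ratio should be close to 1, which suggests a single component would suffice for this particular data.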

## 📈 4. Model Evaluation Metrics

### 🔹 4.1 Classification Metrics

In [None]:
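from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# y_pred must hold class labels, e.g. from the classifier in 2.2 (not the regression output of 2.1)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)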
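
The four imported metrics can be computed together on one set of predictions. This is a minimal sketch on a synthetic binary problem; the dataset from `make_classification` and the choice of a depth-3 tree are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (hypothetical, for illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Every metric takes the true labels first and the predicted labels second.
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(scores)
```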

### 🔹 4.2 Regression Metrics

In [None]:
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

## 🛠️ 5. Feature Engineering

### 🔹 5.1 Scaling and Normalization

In [None]:
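from sklearn.preprocessing import StandardScaler, MinMaxScaler
scaler = StandardScaler()  # swap in MinMaxScaler() to rescale each feature to [0, 1] instead
# Fit on the training split only, then reuse the fitted scaler on the test split to avoid leakage
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)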
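
A short sketch of fitting the scaler on training rows and reusing its statistics on unseen rows, which is one common way to keep test data out of the fit. The arrays are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature train and test splits.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

# Learn mean and standard deviation from the training rows only.
scaler = StandardScaler().fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # uses the training mean and std

print(scaler.mean_)      # training mean, here 2.0
print(X_test_scaled)     # (4 - 2) / std of the training column
```

`StandardScaler` divides by the population standard deviation (`ddof=0`), so the training column here has a standard deviation of `sqrt(2/3)`.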

### 🔹 5.2 Polynomial Features

In [None]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
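
For a sense of what the transform outputs, the sketch below expands a single hypothetical row with two features.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features, x0 = 2 and x1 = 3 (made-up values).
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Column order for degree 2: 1, x0, x1, x0^2, x0*x1, x1^2
print(X_poly.tolist())  # [[1.0, 2.0, 3.0, 4.0, 6.0, 9.0]]
```

The leading 1 is the bias column, which can be turned off with `include_bias=False`.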

## 🧾 6. Mathematical Foundations

### 🔹 6.1 Mean Squared Error (MSE)
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2
$$
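
The formula can be checked by hand on three points. In this sketch the values of `y_true` and `y_hat` are made up; the errors are 1, 0, and -2, so the squared errors sum to 5 and the MSE is 5/3.

```python
import numpy as np

# Hypothetical targets and predictions.
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])

# Direct translation of the formula: mean of the squared residuals.
mse = np.mean((y_true - y_hat) ** 2)

print(mse)  # 5/3, roughly 1.667
```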

### 🔹 6.2 Gradient Descent for Linear Regression

In [None]:
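def gradient_descent(X, y, lr=0.01, epochs=100):
    """Fit y ≈ X @ w + b by batch gradient descent on the MSE loss."""
    m = X.shape[0]                # number of samples
    w = np.zeros(X.shape[1])      # one weight per feature
    b = 0.0
    for _ in range(epochs):
        y_pred = np.dot(X, w) + b
        # Gradients of the MSE with respect to w and b
        dw = -(2/m) * np.dot(X.T, (y - y_pred))
        db = -(2/m) * np.sum(y - y_pred)
        # Step against the gradient
        w -= lr * dw
        b -= lr * db
    return w, b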
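
A usage sketch follows, with the function restated in compact form so the cell runs on its own. The data follows `y = 2x + 1`, and the learning rate and epoch count are assumptions that happen to converge for this input range; other data may need different values.

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=100):
    # Same update rule as the function above, restated for a standalone run.
    m = X.shape[0]
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        y_pred = np.dot(X, w) + b
        dw = -(2 / m) * np.dot(X.T, (y - y_pred))
        db = -(2 / m) * np.sum(y - y_pred)
        w -= lr * dw
        b -= lr * db
    return w, b

# Noise-free synthetic data: y = 2x + 1 on 50 evenly spaced points in [0, 1].
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0

w, b = gradient_descent(X, y, lr=0.1, epochs=5000)
print(w, b)  # should approach slope 2 and intercept 1
```

With the default `lr=0.01` and `epochs=100` the estimate would still be far from the true line, which is why both are raised here.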

### 🔹 6.3 Entropy & Gini Index
$$
\text{Entropy}(S) = - \sum_{i} p_i \log_2(p_i)
\quad
\text{Gini}(S) = 1 - \sum_{i} p_i^2
$$
where $p_i$ is the fraction of samples in $S$ that belong to class $i$.
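
Both impurity measures can be computed from class counts. The helper name `entropy_and_gini` below is hypothetical and not part of any library; it is a minimal sketch of the two formulas.

```python
import numpy as np

def entropy_and_gini(labels):
    """Return (entropy in bits, Gini index) for a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()            # class proportions p_i
    entropy = -np.sum(p * np.log2(p))    # only observed classes appear, so p > 0
    gini = 1.0 - np.sum(p ** 2)
    return entropy, gini

# A perfectly balanced two-class set has p = [0.5, 0.5].
entropy, gini = entropy_and_gini(np.array([0, 0, 1, 1]))
print(entropy, gini)  # 1.0 0.5
```

A balanced binary split gives the maximum for both measures with two classes: 1 bit of entropy and a Gini index of 0.5.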

## 🚀 7. Model Deployment (Optional)

### 🔹 7.1 Export Model

In [None]:
import joblib
joblib.dump(model, 'model.pkl')

### 🔹 7.2 Load Model

In [None]:
model = joblib.load('model.pkl')
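
A round-trip check is a cheap way to confirm that a saved model behaves like the original. The sketch below writes to a temporary directory so it leaves no files behind; the three-point dataset is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny hypothetical dataset following y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])
model = LinearRegression().fit(X, y)

# Save to a temporary location, then load the model back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)  # True
```

Files written by `joblib.dump` are pickle-based, so they should only be loaded from sources you trust.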

## 🔧 8. Utility Functions

In [None]:
def summarize_dataframe(df):
    print("Shape:", df.shape)
    print("Data types:\n", df.dtypes)
    print("Missing values:\n", df.isnull().sum())
    print("First 5 rows:\n", df.head())