# Object-Oriented Programming with Scikit-Learn

## Topics Today:

1. The Four Main Inherited Object Types in Scikit-Learn
2. Transformers (e.g., StandardScaler)
3. Classifiers (e.g., LogisticRegression)
4. Regressors (e.g., LinearRegression)
5. Clusterers (e.g., KMeans)
6. Practice Exercises

---

## Introduction: Why Scikit-Learn and OOP Matter in Data Science

Object-Oriented Programming (OOP) helps organize code using objects, just like organizing your wardrobe into drawers: socks in one, shirts in another. Scikit-Learn is built using this system, making machine learning workflows easier to build, extend, and reuse.

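At the heart of Scikit-Learn is the class `BaseEstimator`, which gives every learning algorithm and data transformer a common structure, including parameter handling through `get_params()` and `set_params()`. Each object type below combines this base class with a type-specific mixin class, such as `TransformerMixin` or `ClassifierMixin`.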
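The inheritance relationship can be checked directly with `issubclass`. This is a minimal sketch: the mixin classes imported from `sklearn.base` are how scikit-learn marks each estimator type, and the `examples` dictionary is just an illustrative pairing of the four classes used later in this lesson.

```python
from sklearn.base import (
    BaseEstimator,
    ClassifierMixin,
    ClusterMixin,
    RegressorMixin,
    TransformerMixin,
)
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler

# Pair each example class from this lesson with the mixin for its type.
examples = {
    StandardScaler: TransformerMixin,
    LogisticRegression: ClassifierMixin,
    LinearRegression: RegressorMixin,
    KMeans: ClusterMixin,
}

inheritance = {}
for cls, mixin in examples.items():
    # (inherits from BaseEstimator?, inherits from its type mixin?)
    inheritance[cls.__name__] = (
        issubclass(cls, BaseEstimator),
        issubclass(cls, mixin),
    )
    print(cls.__name__, inheritance[cls.__name__])

# BaseEstimator supplies get_params() and set_params() to every estimator.
params = LogisticRegression(C=0.5).get_params()
print(params["C"])
```

Every pair should come back as `(True, True)`, and `get_params()` returns the constructor arguments as a plain dictionary.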

---

## 1. The Four Main Inherited Object Types

Most models and transformers in Scikit-Learn fall into one of these four types:

### 1. Transformers

Used to change or prepare data.

### 2. Classifiers

Used to predict categories (like spam vs not-spam).

### 3. Regressors

Used to predict numbers (like house prices).

### 4. Clusterers

Used to group data (like segmenting customers).

These classes share a small set of methods, although no single type needs all of them:

- `.fit()` - learns from the data (every estimator has it)
- `.transform()` - modifies data (Transformers)
- `.predict()` - makes predictions (Classifiers, Regressors, and some Clusterers such as KMeans)
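To see which of these methods a given object actually exposes, one option is a quick `hasattr` check. The sketch below is only an illustration; the `available` dictionary is a hypothetical helper name, not part of scikit-learn.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler

estimators = [
    StandardScaler(),
    LogisticRegression(),
    LinearRegression(),
    KMeans(n_clusters=2),
]

available = {}
for est in estimators:
    name = type(est).__name__
    # Record which of the three core methods this object provides.
    available[name] = {
        method: hasattr(est, method)
        for method in ("fit", "transform", "predict")
    }
    print(name, available[name])
```

Note that `KMeans` offers both `.transform()` and `.predict()`, so the types are not mutually exclusive.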

---

## 2. Transformers (Example: StandardScaler)

Transformers help prepare your data. `StandardScaler`, for example, rescales each feature so that it has a mean of 0 and a standard deviation of 1.

### Code:

In [1]:
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])
scaler = StandardScaler()
scaler.fit(X)           # Learns the mean and std
X_scaled = scaler.transform(X)  # Applies the transformation
X_scaled

array([[-1.22474487, -1.22474487],
       [ 0.        ,  0.        ],
       [ 1.22474487,  1.22474487]])

**Line-by-line:**

- `StandardScaler()` creates a transformer object.
- `.fit(X)` calculates the mean and standard deviation of each column.
- `.transform(X)` subtracts the column mean and divides by the column standard deviation.
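The scaling can be reproduced by hand to confirm what the transformer does. This sketch reuses the same `X` as above; `X_manual` and `X_restored` are illustrative names, and `fit_transform()` is shown as a shortcut for calling `fit()` followed by `transform()`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # fit() and transform() in one call

# Reproduce the result by hand: subtract the column mean,
# then divide by the column standard deviation.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(scaler.mean_)   # learned column means
print(scaler.scale_)  # learned column standard deviations
print(np.allclose(X_scaled, X_manual))

# inverse_transform() undoes the scaling and recovers the original values.
X_restored = scaler.inverse_transform(X_scaled)
```

The values learned during `fit()` are stored on the object in attributes ending with an underscore, such as `mean_` and `scale_`.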

---

## 3. Classifiers (Example: LogisticRegression)

Used to predict categories.

### Code:

In [2]:
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[1.5]]))

[1]


**Explanation:**

- `LogisticRegression()` initializes the classifier.
- `.fit(X, y)` learns from labeled data.
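- `.predict([[1.5]])` guesses the label for 1.5. Because 1.5 sits exactly midway between the two classes, the predicted probability is about 0.5 and the label is a near tie.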
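Classifiers such as `LogisticRegression` can also report how confident they are through `.predict_proba()`. The sketch below uses the same training data and queries two points well away from the class boundary; the variable names `proba` and `labels` are illustrative.

```python
from sklearn.linear_model import LogisticRegression

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

# classes_ gives the column order used by predict_proba().
print(clf.classes_)

# One row per input, one column per class; each row sums to 1.
proba = clf.predict_proba([[0], [3]])
labels = clf.predict([[0], [3]])
print(proba)
print(labels)
```

`.predict()` simply returns the class with the highest probability in each row.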

---

## 4. Regressors (Example: LinearRegression)

Used when the output is a continuous number.

### Code:

In [3]:
X = [[1], [2], [3], [4]]
y = [1.5, 3.5, 5.5, 7.5]

reg = LinearRegression()
reg.fit(X, y)
print(reg.predict([[5]]))

[9.5]


**Explanation:**

- Learns the equation of a line from data.
- Predicts the y-value for a new x.

### Formula:

```
y = mx + b
```

**Python Equivalent:**

In [4]:
m = reg.coef_[0]    # slope
b = reg.intercept_  # intercept
y_pred = m * 5 + b  # same value as reg.predict([[5]]); a new name avoids overwriting y
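As a self-contained check, the sketch below refits the model and compares the hand-computed value with `.predict()`. The training points lie exactly on the line y = 2x - 0.5, so the learned slope and intercept should match those numbers up to floating-point error; `manual` and `predicted` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]
y = [1.5, 3.5, 5.5, 7.5]

reg = LinearRegression().fit(X, y)

m = reg.coef_[0]    # slope
b = reg.intercept_  # intercept

manual = m * 5 + b                    # y = mx + b by hand
predicted = reg.predict([[5]])[0]     # the same value from the model

print(m, b)
print(np.isclose(manual, predicted))
```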

---

## 5. Clusterers (Example: KMeans)

Used for unsupervised grouping of data.

### Code:

In [5]:
X = [[1], [2], [10], [11]]
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(X)
print(kmeans.predict([[0], [12]]))

[1 0]


**Explanation:**

- `n_clusters=2`: looking for 2 groups.
- `.fit(X)`: finds the centers.
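- `.predict(...)`: assigns new data to the nearest center. The label numbers themselves are arbitrary, so `[1 0]` only tells you that the two points fall in different clusters.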
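The fitted centers and training labels can be inspected after `.fit()`. In this sketch, `n_init=10` is set explicitly as an assumption to keep behavior consistent across scikit-learn versions, and the centers are sorted because the order of clusters is not guaranteed.

```python
import numpy as np
from sklearn.cluster import KMeans

X = [[1], [2], [10], [11]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

# Sort the centers so the printout does not depend on label order.
centers = np.sort(kmeans.cluster_centers_.ravel())
print(centers)

# labels_ holds the cluster assigned to each training point.
print(kmeans.labels_)

# New points go to whichever center is closest.
new_labels = kmeans.predict([[0], [12]])
print(new_labels[0] != new_labels[1])
```

The two centers sit at the mean of each group (1.5 and 10.5), and the new points 0 and 12 land in different clusters.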

---

## Summary

- Scikit-Learn is built with OOP principles to standardize model workflows.
- Every estimator has `.fit()`; most also offer `.transform()`, `.predict()`, or both.
- Transformers prep data, classifiers and regressors make predictions, clusterers find structure.
- Understanding these types helps you build reusable machine learning pipelines.

Next Step: Explore Scikit-Learn Pipelines to automate end-to-end workflows!
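As a small preview, here is a sketch of how a transformer and a classifier from this lesson might be chained with `make_pipeline` from `sklearn.pipeline`. The data is the toy classification set used earlier, and `pipe` and `pipe_labels` are illustrative names.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

# The pipeline calls fit/transform on the scaler, then fit on the classifier.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)

# predict() scales the new data with the fitted scaler before classifying it.
pipe_labels = pipe.predict([[0], [3]])
print(pipe_labels)

# Step names are generated from the lowercase class names.
print([name for name, _ in pipe.steps])
```

Because every step follows the same `.fit()` / `.transform()` / `.predict()` conventions, the pipeline itself behaves like a single estimator.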