# Step-by-Step Learning Path for scikit-learn

## Step 1: Basics of scikit-learn

#### Installation and Setup

In [None]:
# %pip installs into the environment of the running notebook kernel
%pip install scikit-learn pandas matplotlib seaborn

#### Import the necessary libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#### Loading and Splitting Data

##### 1. Load data using pandas:

In [None]:
data = pd.read_csv('data.csv')

##### 2. Split data into features (X) and target (y):

In [None]:
X = data.drop('target', axis=1)
y = data['target']

##### 3. Split data into training and testing sets:

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
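
For classification targets it often helps to keep the class proportions the same in both splits. The sketch below shows this with the `stratify` argument of `train_test_split`. It assumes the built-in Iris dataset as a stand-in for `data.csv`, and the `_demo`, `X_tr`, and `X_te` names are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris stands in for data.csv here (an assumption for illustration only).
X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Train shape:", X_tr.shape)
print("Test shape:", X_te.shape)
print("Test class counts:", y_te.value_counts().sort_index().tolist())
```

With 150 balanced samples and `test_size=0.2`, each of the three classes contributes the same number of rows to the test set.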

## Step 2: Supervised Learning

### 1. Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
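
Linear regression expects a continuous target, so its predictions are usually judged with error metrics rather than accuracy. This is a minimal sketch, assuming synthetic data from `make_regression` in place of `data.csv`. The names `reg`, `Xr_train`, and so on are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption; replace with your own X and y).
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

reg = LinearRegression()
reg.fit(Xr_train, yr_train)
yr_pred = reg.predict(Xr_test)

# Mean squared error: lower is better. R^2: closer to 1 is better.
mse = mean_squared_error(yr_test, yr_pred)
r2 = r2_score(yr_test, yr_pred)
print(f"MSE: {mse:.2f}, R^2: {r2:.3f}")
```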


### 2. Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
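
Besides hard labels, a fitted `LogisticRegression` can return class probabilities through `predict_proba`, which helps when the decision threshold matters. The sketch assumes a synthetic binary dataset from `make_classification`, and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption for illustration).
Xc, yc = make_classification(n_samples=300, n_features=6, random_state=42)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    Xc, yc, test_size=0.2, random_state=42
)

clf = LogisticRegression(max_iter=1000)
clf.fit(Xc_train, yc_train)

# One row per sample, one column per class (ordered as in clf.classes_).
proba = clf.predict_proba(Xc_test)
labels = clf.predict(Xc_test)

print("Classes:", clf.classes_)
print("First three probability rows:")
print(proba[:3])
```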


### 3. Decision Trees

In [None]:
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)


### 4. Random Forests

In [None]:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
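
A fitted random forest exposes `feature_importances_`, a quick, impurity-based view of which columns the trees relied on. The sketch assumes the built-in Iris dataset as a stand-in for your data. The names `forest` and `importances` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Iris stands in for the real dataset (an assumption for illustration).
iris = load_iris(as_frame=True)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(iris.data, iris.target)

# Pair each importance with its column name and sort from most to least important.
importances = pd.Series(
    forest.feature_importances_, index=iris.data.columns
).sort_values(ascending=False)
print(importances)
```

The importances sum to 1, so each value reads as a share of the total.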


### 5. Support Vector Machines (SVM)

In [None]:
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)


### 6. K-Nearest Neighbors (KNN)

In [None]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)


## Step 3: Unsupervised Learning

### 1. K-Means Clustering

In [None]:
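from sklearn.cluster import KMeans
# Fix n_init and random_state so the clustering is reproducible across runs
model = KMeans(n_clusters=3, n_init=10, random_state=42)
# fit_predict fits the model and returns each row's cluster label in one step
clusters = model.fit_predict(X)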
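
The number of clusters is rarely known in advance. One common heuristic is to compare silhouette scores across several values of `k` and keep the highest. This is a sketch under stated assumptions: the data are synthetic blobs with hand-picked centers, and the names `X_blobs`, `scores_by_k`, and `best_k` are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs (an assumption for illustration).
X_blobs, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.6,
    random_state=42,
)

# Silhouette score ranges from -1 to 1; higher means tighter, better-separated clusters.
scores_by_k = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    cluster_labels = km.fit_predict(X_blobs)
    scores_by_k[k] = silhouette_score(X_blobs, cluster_labels)

best_k = max(scores_by_k, key=scores_by_k.get)
print("Silhouette by k:", {k: round(s, 3) for k, s in scores_by_k.items()})
print("Best k:", best_k)
```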


### 2. Principal Component Analysis (PCA)

In [None]:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
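
PCA is sensitive to feature scale, so the features are typically standardized first. The `explained_variance_ratio_` attribute then shows how much variance the kept components retain. The sketch assumes the built-in Iris dataset, and names such as `pca_demo` and `X_2d` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris stands in for the real feature matrix (an assumption for illustration).
X_iris, _ = load_iris(return_X_y=True)

# Standardize so that no single feature dominates because of its units.
X_scaled = StandardScaler().fit_transform(X_iris)

pca_demo = PCA(n_components=2)
X_2d = pca_demo.fit_transform(X_scaled)

# Fraction of total variance captured by each component, in decreasing order.
ratio = pca_demo.explained_variance_ratio_
print("Per component:", ratio)
print("Total retained:", ratio.sum())
```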


## Step 4: Model Evaluation and Selection

### 1. Cross-Validation

In [None]:
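from sklearn.model_selection import cross_val_score
# cross_val_score needs a supervised estimator; `model` was last bound to KMeans in Step 3
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)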
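
Cross-validation returns one score per fold, usually summarized as a mean and standard deviation. The sketch below also passes an explicit `StratifiedKFold` so the folds are shuffled reproducibly. It assumes the Iris dataset and a decision tree as example inputs, and the names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Iris and a decision tree are example choices (assumptions for illustration).
X_cv, y_cv = load_iris(return_X_y=True)

# Shuffled, stratified folds with a fixed seed for reproducibility.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X_cv, y_cv, cv=cv)

print("Fold accuracies:", cv_scores)
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```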


### 2. Grid Search and Random Search for Hyperparameter Tuning

In [None]:
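from sklearn.model_selection import GridSearchCV
# Example grid for a random forest; the values are illustrative, not tuned recommendations
model = RandomForestClassifier(random_state=42)
param_grid = {'n_estimators': [100, 200], 'max_depth': [None, 5, 10]}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)
# Best parameter combination and its mean cross-validated score
print(grid_search.best_params_, grid_search.best_score_)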
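
Random search samples a fixed number of parameter combinations from distributions instead of trying every grid point, which scales better when the search space is large. This is a minimal sketch with `RandomizedSearchCV`. The dataset, the parameter ranges, and `n_iter=5` are assumptions chosen to keep the example fast.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Iris stands in for the training data (an assumption for illustration).
X_rs, y_rs = load_iris(return_X_y=True)

# Integer distributions to sample from; randint's upper bound is exclusive.
param_distributions = {
    "n_estimators": randint(20, 100),
    "max_depth": randint(2, 8),
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=5,          # number of sampled parameter combinations
    cv=3,
    random_state=42,
)
random_search.fit(X_rs, y_rs)

print("Best parameters:", random_search.best_params_)
print(f"Best CV accuracy: {random_search.best_score_:.3f}")
```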


## Step 5: Advanced Topics

### 1. Pipelines

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
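
A pipeline can be passed to `GridSearchCV` like any other estimator. The scaler is then refit on each training fold, which keeps validation data out of the scaling statistics. Step parameters are addressed as `<step name>__<parameter>`. The sketch assumes the built-in breast cancer dataset, and the `C` values are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Breast cancer data stands in for the real dataset (an assumption).
X_bc, y_bc = load_breast_cancer(return_X_y=True)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_bc, y_bc, test_size=0.2, random_state=42, stratify=y_bc
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# "model__C" targets the C parameter of the step named "model".
search = GridSearchCV(pipe, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(Xb_train, yb_train)

# score() uses the best pipeline, refit on the full training set.
test_acc = search.score(Xb_test, yb_test)
print("Best C:", search.best_params_["model__C"])
print(f"Test accuracy: {test_acc:.3f}")
```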


### 2. Ensemble Methods

In [None]:
from sklearn.ensemble import AdaBoostClassifier
model = AdaBoostClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)


The following cell trains and evaluates five classifiers (**Logistic Regression**, **Decision Tree**, **Random Forest**, **AdaBoost**, and **SVC**) on a **Customer Churn Prediction** task so their performance can be compared side by side.

The code assumes preprocessing is already done: `X` has been scaled and split into `X_train`, `X_test`, `y_train`, and `y_test`.

---

### ✅ Model Training & Evaluation with 5 Classifiers

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Initialize models
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "SVC": SVC(kernel='rbf', random_state=42)
}

# Train and evaluate each model
for name, model in models.items():
    print(f"\n--- {name} ---")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred))

    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))

    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.4f}")
```

---

### 🧠 Tips

* You can store accuracy scores in a dictionary to compare them visually later.
* You can also plot a bar chart of model performances using `matplotlib` if needed.
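
Both tips can be sketched together: collect each model's accuracy in a dictionary, then draw a bar chart from it. The example assumes synthetic data from `make_classification` in place of the churn dataset and only three of the classifiers, to stay short. The names `demo_models` and `accuracies` are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (an assumption for illustration).
X_syn, y_syn = make_classification(n_samples=500, n_features=10, random_state=42)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

demo_models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Store one test accuracy per model name.
accuracies = {}
for name, clf in demo_models.items():
    clf.fit(Xs_train, ys_train)
    accuracies[name] = accuracy_score(ys_test, clf.predict(Xs_test))

# One bar per model; in a script, call plt.show() to display the figure.
fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(list(accuracies.keys()), list(accuracies.values()))
ax.set_ylim(0, 1)
ax.set_ylabel("Accuracy")
ax.set_title("Model comparison")
fig.tight_layout()
```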

---



# Recommended Order of Learning

## 1. Basics of scikit-learn

## 2. Supervised Learning

#### 1. Linear Regression
#### 2. Logistic Regression
#### 3. Decision Trees
#### 4. Random Forests
#### 5. Support Vector Machines (SVM)
#### 6. K-Nearest Neighbors (KNN)

## 3. Unsupervised Learning

#### 1. K-Means Clustering
#### 2. Principal Component Analysis (PCA)

## 4. Model Evaluation and Selection

#### 1. Cross-Validation
#### 2. Hyperparameter Tuning (Grid Search, Random Search)

## 5. Advanced Topics

#### 1. Pipelines
#### 2. Advanced Ensemble Methods

# Projects to Implement

### 1. Predicting House Prices - Linear Regression
### 2. Customer Churn Prediction - Logistic Regression
### 3. Iris Flower Classification - Decision Trees and Random Forests
### 4. Digit Recognition - Support Vector Machines
### 5. Wine Quality Prediction - K-Nearest Neighbors
### 6. Customer Segmentation - K-Means Clustering
### 7. Dimensionality Reduction on MNIST - Principal Component Analysis (PCA)