**1. Linear Regression**
Linear Regression is a supervised learning algorithm used to predict a continuous target variable from one or more input features. It assumes a linear relationship between the input features and the target variable. The goal is to fit a line (or a hyperplane when there are several features) that minimizes the sum of squared differences between the predicted and actual values.

When to Use:
When you want to model the relationship between a continuous dependent variable and one or more independent variables.
When you believe the relationship between variables is approximately linear.
Example Problem:
Predicting house prices based on features like square footage, number of bedrooms, and age of the house.

In [None]:
# Importing the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Creating a sample dataset
# Features (X) - size of house in square feet
# Target (y) - price of the house in $1000s
X = np.array([1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400]).reshape(-1, 1)
y = np.array([300, 320, 340, 360, 380, 400, 420, 440, 460, 480])

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the Linear Regression model
lr_model = LinearRegression()

# Fitting the model
lr_model.fit(X_train, y_train)

# Predicting the test set results
y_pred = lr_model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Visualizing the regression line
plt.scatter(X, y, color='blue')  # Actual data points
plt.plot(X, lr_model.predict(X), color='red')  # Regression line
plt.title('House Prices vs. Size (Linear Regression)')
plt.xlabel('Size of house (square feet)')
plt.ylabel('Price ($1000)')
plt.show()
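
For a single feature, the least-squares line can also be computed directly. The sketch below solves the same fitting problem with `np.linalg.lstsq` on the sample data from this section. The names `A`, `slope`, and `intercept` and the 2,050 sq ft query are illustrative assumptions; treat it as a cross-check on `LinearRegression`, not a replacement for it.

```python
import numpy as np

# Same sample data as in the cell above: size (sq ft) and price ($1000s)
sizes = np.array([1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400], dtype=float)
prices = np.array([300, 320, 340, 360, 380, 400, 420, 440, 460, 480], dtype=float)

# Design matrix with a column of ones so the model can learn an intercept
A = np.column_stack([sizes, np.ones_like(sizes)])

# Least-squares solution of A @ [slope, intercept] ~= prices
(slope, intercept), *_ = np.linalg.lstsq(A, prices, rcond=None)

print(f"slope: {slope:.4f} (in $1000s per square foot)")
print(f"intercept: {intercept:.4f}")

# Prediction for a hypothetical 2,050 sq ft house
predicted_price = slope * 2050 + intercept
print(f"Predicted price: {predicted_price:.1f}")
```

Because the sample prices lie exactly on a straight line, the fitted slope and intercept reproduce the data with essentially zero error, which is also why the Mean Squared Error in the cell above is close to zero.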


**2. Decision Tree**
A Decision Tree is a supervised learning algorithm used for both classification and regression tasks. The algorithm splits the dataset into subsets based on the value of the input features, recursively creating a tree of decisions. Each internal node represents a "test" on a feature, each branch represents the outcome of the test, and each leaf node represents a class or value.

When to Use:
When you need a model that's interpretable.
When you have both categorical and numerical features.
When the dataset contains non-linear relationships.
Example Problem:
Classifying whether a person will buy a product based on features like age, income, and product category.



In [None]:
# Importing necessary libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Loading the Iris dataset (a common classification dataset)
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the Decision Tree model
dt_model = DecisionTreeClassifier()

# Fitting the model
dt_model.fit(X_train, y_train)

# Predicting the test set results
y_pred = dt_model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Visualizing the decision tree (plot_tree draws with matplotlib, so graphviz is not required)
from sklearn import tree
import matplotlib.pyplot as plt

plt.figure(figsize=(12,12))
tree.plot_tree(dt_model, filled=True)
plt.show()
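
Each "test" in the tree is chosen so that the resulting subsets are as pure as possible. `DecisionTreeClassifier` uses Gini impurity as its default criterion, and the sketch below computes it by hand for one candidate split. The helper names (`gini`, `weighted_gini_after_split`) and the 2.5 cm threshold on petal length are illustrative choices, not values read from the fitted tree above.

```python
import numpy as np
from sklearn.datasets import load_iris

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

def weighted_gini_after_split(feature_values, labels, threshold):
    """Impurity of the two children of the test `feature <= threshold`, weighted by child size."""
    left = labels[feature_values <= threshold]
    right = labels[feature_values > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

iris = load_iris()
petal_length = iris.data[:, 2]  # third column is petal length (cm)
y = iris.target

parent_impurity = gini(y)
child_impurity = weighted_gini_after_split(petal_length, y, threshold=2.5)

print(f"Gini before split: {parent_impurity:.3f}")
print(f"Gini after split:  {child_impurity:.3f}")
```

The split isolates one class completely, so the weighted impurity drops by half. A tree-growing algorithm evaluates many such candidate tests and keeps the one with the largest drop.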


**3. Random Forest**
Random Forest is an ensemble learning algorithm that trains many decision trees, each on a random sample of the data and with a random subset of features considered at each split, and combines their predictions to improve accuracy and reduce overfitting. It works for both classification and regression tasks.

When to Use:
When you want a version of a decision tree that is more robust and less prone to overfitting.
When you have large datasets with a lot of noise or missing data.
Example Problem:
Predicting customer churn in a telecom company based on usage patterns, customer complaints, and subscription history.

In [None]:
# Importing necessary libraries
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Using the Iris dataset again
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Fitting the model
rf_model.fit(X_train, y_train)

# Predicting the test set results
y_pred = rf_model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
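
To make the idea of combining many trees concrete, here is a minimal hand-rolled sketch: each tree is trained on a bootstrap sample with `max_features="sqrt"`, and the ensemble predicts by majority vote. The number of trees (25) and all variable names are arbitrary choices for illustration. `RandomForestClassifier` does this and more internally, and it averages class probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_trees = 25  # arbitrary choice for this sketch
trees = []

for i in range(n_trees):
    # Bootstrap sample: draw training rows with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" makes each split consider a random subset of features
    tree_i = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree_i.fit(X_train[idx], y_train[idx])
    trees.append(tree_i)

# Collect every tree's prediction: shape (n_trees, n_test_samples)
all_votes = np.array([t.predict(X_test) for t in trees])

# Majority vote per test sample (labels are the integers 0, 1, 2)
forest_pred = np.array([np.bincount(votes, minlength=3).argmax() for votes in all_votes.T])

print(f"Bagged-trees accuracy: {accuracy_score(y_test, forest_pred):.3f}")
```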


**4. AdaBoost**
AdaBoost (Adaptive Boosting) is a boosting algorithm that combines multiple weak learners (usually shallow decision trees) to create a stronger model. It assigns more weight to misclassified data points at each iteration, focusing the next model on harder-to-classify instances.

When to Use:
When you have weak learners and want to improve their performance.
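When you have imbalanced data and the labels are reasonably clean; because AdaBoost keeps increasing the weight of misclassified points, it is sensitive to noisy labels and outliers.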
Example Problem:
Classifying whether a customer will renew their subscription based on usage and complaints, where most customers easily renew, but some are difficult to predict.

In [None]:
# Importing necessary libraries
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Using the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the AdaBoost model
ada_model = AdaBoostClassifier(n_estimators=50, random_state=42)

# Fitting the model
ada_model.fit(X_train, y_train)

# Predicting the test set results
y_pred = ada_model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
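
The reweighting step can be sketched in a few lines of NumPy. The example below runs one round of the textbook binary AdaBoost update on made-up labels and predictions (encoded as -1/+1). The arrays are invented purely for illustration, and the rule shown is the classic discrete AdaBoost formula, which may differ in detail from what `AdaBoostClassifier` does for multi-class data.

```python
import numpy as np

# Made-up example: true labels and one weak learner's predictions, encoded as -1/+1
y_true = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1])
y_weak = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, 1])  # 3 mistakes out of 10

# Start with uniform sample weights
weights = np.full(len(y_true), 1 / len(y_true))

# Weighted error of the weak learner
misclassified = y_weak != y_true
error = np.sum(weights[misclassified])

# The learner's say in the final vote: larger when the error is smaller
alpha = 0.5 * np.log((1 - error) / error)

# Increase weights of misclassified points, decrease the rest, then renormalize
weights = weights * np.exp(-alpha * y_true * y_weak)
weights = weights / weights.sum()

print(f"weighted error: {error:.2f}, alpha: {alpha:.3f}")
print("new weight of a misclassified point:", round(weights[misclassified][0], 4))
print("new weight of a correct point:      ", round(weights[~misclassified][0], 4))
```

After the update, the misclassified points together carry half of the total weight, so the next weak learner cannot score well by repeating the same mistakes.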


**5. Gradient Boosting**
Gradient Boosting is another boosting technique where new models are added to correct the errors made by existing models. Unlike AdaBoost, which assigns weights to instances, Gradient Boosting optimizes based on the gradient of the loss function, adding models that reduce the overall error in a stepwise manner.

When to Use:
When you want a highly accurate model, especially for complex datasets with non-linear relationships.
When you are okay with slightly longer training times.
Example Problem:
Predicting credit card fraud based on spending patterns, location, and transaction history.

In [None]:
# Importing necessary libraries
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Using the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the Gradient Boosting model
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

# Fitting the model
gb_model.fit(X_train, y_train)

# Predicting the test set results
y_pred = gb_model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
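
For squared-error loss, the negative gradient is simply the residual, so "correcting the errors made by existing models" amounts to fitting each new tree to the current residuals. The sketch below shows this on a small synthetic regression problem. The data, the tree depth of 2, and the 50 rounds are illustrative assumptions, and the loop is a simplified picture of the idea rather than a description of how `GradientBoostingClassifier` is implemented.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (illustrative only)
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

# Start from a constant prediction: the mean of the targets
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

for _ in range(n_rounds):
    # For squared-error loss, the negative gradient equals the residual
    residuals = y - prediction
    small_tree = DecisionTreeRegressor(max_depth=2)
    small_tree.fit(X, residuals)
    # Take a small step in the direction that reduces the residuals
    prediction += learning_rate * small_tree.predict(X)

final_mse = np.mean((y - prediction) ** 2)
print(f"Training MSE before boosting: {initial_mse:.4f}")
print(f"Training MSE after boosting:  {final_mse:.4f}")
```

The learning rate controls how much of each tree's correction is applied. Smaller values usually need more rounds but tend to generalize better.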


**6. Logistic Regression**
Logistic Regression is a classification algorithm used when the dependent variable is categorical. It estimates the probability that a given input belongs to a particular category, making it suitable for binary or multinomial classification.

When to Use:
When you have a binary or categorical target variable.
When you need a simple and interpretable model for classification.
Example Problem:
Classifying whether an email is spam or not based on the presence of certain words and phrases.

In [None]:
# Importing necessary libraries
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Using the Iris dataset for binary classification
# (the first 100 samples contain only classes 0 and 1)
iris = load_iris()
X = iris.data[:100]
y = iris.target[:100]

# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the Logistic Regression model
logreg_model = LogisticRegression()

# Fitting the model
logreg_model.fit(X_train, y_train)

# Predicting the test set results
y_pred = logreg_model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
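
The probability estimate comes from passing a linear score through the sigmoid function. The sketch below recomputes that score by hand from a fitted model's `coef_` and `intercept_` on the same two-class Iris subset, then checks that thresholding at 0.5 reproduces `predict`. Variable names such as `scores` and `prob_class_1` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Two-class subset of Iris: the first 100 samples are classes 0 and 1
iris = load_iris()
X = iris.data[:100]
y = iris.target[:100]

model = LogisticRegression()
model.fit(X, y)

# Linear score w.x + b for each sample
scores = X @ model.coef_.ravel() + model.intercept_[0]

# The sigmoid maps the score to a probability of class 1
prob_class_1 = 1.0 / (1.0 + np.exp(-scores))

# Classify as 1 when the probability exceeds 0.5
manual_pred = (prob_class_1 > 0.5).astype(int)

print("First five probabilities of class 1:", np.round(prob_class_1[:5], 3))
print("Matches model.predict:", np.array_equal(manual_pred, model.predict(X)))
```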


**7. Support Vector Machine (SVM)**
Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It finds the hyperplane that best separates the data points of different classes, maximizing the margin between them.

When to Use:
When you have high-dimensional data.
When you need a powerful and flexible classification model that can handle non-linear data through kernel tricks.
Example Problem:
Classifying types of cancer based on the results of multiple medical tests, where the data might not be linearly separable.

In [None]:
# Importing necessary libraries
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Using the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the Support Vector Classifier model
svm_model = SVC(kernel='linear')

# Fitting the model
svm_model.fit(X_train, y_train)

# Predicting the test set results
y_pred = svm_model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
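
The example above uses a linear kernel, which is enough for Iris. To illustrate the kernel trick on data that is not linearly separable, the sketch below compares a linear and an RBF kernel on two concentric rings generated with `make_circles`. The synthetic dataset and its parameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

accuracies = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    accuracies[kernel] = accuracy_score(y_test, model.predict(X_test))
    print(f"{kernel:>6} kernel accuracy: {accuracies[kernel]:.3f}")
```

The RBF kernel implicitly maps the points into a space where the inner and outer rings can be separated, while the linear kernel is limited to a straight decision boundary.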


**8. k-Nearest Neighbors (KNN)**
k-Nearest Neighbors is a supervised learning algorithm used for both classification and regression tasks. For classification, it assigns a data point the majority class of its k nearest neighbors in the feature space; for regression, it predicts the average of the neighbors' target values.

When to Use:
When you have small to medium-sized datasets.
When you need a simple, instance-based learning model.
When your features are on comparable scales or can be normalized; KNN relies on distances, so it is sensitive to feature scaling.
Example Problem:
Classifying whether a person will purchase a product based on their demographic information by comparing them to similar past customers.

In [None]:
# Importing necessary libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Using the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the KNN model
knn_model = KNeighborsClassifier(n_neighbors=3)

# Fitting the model
knn_model.fit(X_train, y_train)

# Predicting the test set results
y_pred = knn_model.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
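
The Iris features are all measured in centimeters, so the effect of scaling is hard to see there. As a rough illustration of KNN's sensitivity to feature scaling, the sketch below swaps in scikit-learn's Wine dataset (an assumption made for this example, not part of the original notebook) and compares cross-validated accuracy with and without a `StandardScaler` step.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wine dataset: 13 features whose ranges differ by orders of magnitude
X, y = load_wine(return_X_y=True)

knn_raw = KNeighborsClassifier(n_neighbors=3)
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

# 5-fold cross-validated accuracy; the pipeline refits the scaler inside each fold
raw_scores = cross_val_score(knn_raw, X, y, cv=5)
scaled_scores = cross_val_score(knn_scaled, X, y, cv=5)

print(f"Without scaling: {raw_scores.mean():.3f}")
print(f"With scaling:    {scaled_scores.mean():.3f}")
```

Putting the scaler inside the pipeline keeps the test folds out of the scaling statistics, which avoids leaking information during evaluation.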


**9. k-Means (Clustering)**
k-Means is an unsupervised learning algorithm used for clustering data. It partitions data into k clusters based on feature similarity, where each data point belongs to the cluster with the nearest mean.

When to Use:
When you need to perform clustering to group similar data points together.
When you have unlabelled data and want to find underlying patterns.
Example Problem:
Grouping customers based on their purchasing habits into clusters for targeted marketing campaigns.

In [None]:
# Importing necessary libraries
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# Creating a sample dataset (2D points)
X = np.array([[1, 2], [2, 3], [3, 4], [5, 6], [6, 7], [8, 8], [10, 12], [11, 13], [12, 14]])

# Creating the k-Means model
kmeans_model = KMeans(n_clusters=3, n_init=10, random_state=42)  # n_init set explicitly so results do not depend on the library's default

# Fitting the model
kmeans_model.fit(X)

# Predicting the cluster for each point
y_kmeans = kmeans_model.predict(X)

# Visualizing the clusters
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='viridis')
plt.scatter(kmeans_model.cluster_centers_[:, 0], kmeans_model.cluster_centers_[:, 1], s=300, c='red')
plt.title('k-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
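
The example fixes k at 3 up front. One common heuristic for choosing k is to compare the within-cluster sum of squares (`inertia_`) across several values and look for the point where the improvement levels off. A small sketch on the same nine points follows; the range of k values and `n_init=10` are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same nine 2-D points as in the cell above
X = np.array([[1, 2], [2, 3], [3, 4], [5, 6], [6, 7], [8, 8], [10, 12], [11, 13], [12, 14]])

inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    # inertia_ is the sum of squared distances from each point to its cluster center
    inertias[k] = model.inertia_
    print(f"k={k}: inertia={model.inertia_:.2f}")
```

Inertia always shrinks as k grows, so the useful signal is where the curve bends, not where it is smallest.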


**10. Collaborative Filtering (Recommender Systems)**
Collaborative Filtering is a technique used in recommendation systems where predictions about a user's interests are made by collecting preferences or ratings from multiple users.

When to Use:
When you want to build a recommendation system based on user-item interactions.
When you have user behavior data like ratings, purchases, or clicks.
Example Problem:
Recommending movies to users based on their viewing history and the preferences of other similar users.



In [None]:
# Importing necessary libraries
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Example user-item interaction matrix (rows are users, columns are items; 0 is assumed to mean "not rated")
user_item_matrix = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 0, 5, 4],
])

# Applying SVD (Singular Value Decomposition)
svd = TruncatedSVD(n_components=2, random_state=42)
user_factors = svd.fit_transform(user_item_matrix)
item_factors = svd.components_

# Reconstructing the user-item matrix
reconstructed_matrix = np.dot(user_factors, item_factors)

# Output reconstructed matrix (a rank-2 approximation; TruncatedSVD treats the 0 entries as real ratings, so read these values as rough estimates)
print("Predicted User-Item Matrix:")
print(reconstructed_matrix)
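
The matrix-factorization cell above is one way to do collaborative filtering. The "similar users" idea from the example problem can also be sketched directly: predict a missing rating as a similarity-weighted average of the ratings given by other users. The helper `predict_rating` is hypothetical, and treating 0 as "not rated" is an assumption of this sketch.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same rating matrix as above; 0 is treated here as "not rated" (an assumption)
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 0, 5, 4],
], dtype=float)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    similarities = cosine_similarity(ratings)[user]
    # Only users (other than the target) who actually rated the item contribute
    rated = ratings[:, item] > 0
    rated[user] = False
    if not rated.any() or similarities[rated].sum() == 0:
        return float("nan")  # no usable neighbors
    return np.dot(similarities[rated], ratings[rated, item]) / similarities[rated].sum()

# User 1 has not rated item 1; users 0 and 2 have (3 and 1 respectively)
predicted = predict_rating(ratings, user=1, item=1)
print(f"Predicted rating for user 1, item 1: {predicted:.2f}")
```

User 1's ratings are closer to user 0's than to user 2's, so the estimate is pulled toward user 0's rating of 3.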
