<a href="https://colab.research.google.com/github/hariomvyas/AIhub/blob/main/AIHub_Machine_Learning_Models_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AIHub Machine Learning Models Template by Hariom Vyas

## Supervised Learning Models

Supervised learning models are trained on labeled data (data that has an outcome variable, or label) so that they can predict the outcome for new, unseen data. Examples include linear regression, logistic regression, decision trees, random forests, support vector machines, and artificial neural networks.

### Linear Regression

In [None]:
# Step 1: Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.3, random_state=42)

# Step 4: Define and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Step 5: Make predictions on the testing set
y_pred = model.predict(X_test)

# Step 6: Evaluate the model performance
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("RMSE:", rmse)


### Logistic Regression

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.3, random_state=42)

# Step 4: Define and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Step 5: Make predictions on the testing set
y_pred = model.predict(X_test)

# Step 6: Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### Decision Trees

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.3, random_state=42)

# Step 4: Define and train the model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Step 5: Make predictions on the testing set
y_pred = model.predict(X_test)

# Step 6: Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### Random Forest

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.3, random_state=42)

# Step 4: Define and train the model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Step 5: Make predictions on the testing set
y_pred = model.predict(X_test)

# Step 6: Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### Support Vector Machines (SVM)

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.3, random_state=42)

# Step 4: Define and train the model
model = SVC()
model.fit(X_train, y_train)

# Step 5: Make predictions on the testing set
y_pred = model.predict(X_test)

# Step 6: Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### Naive Bayes

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.3, random_state=42)

# Step 4: Define and train the model
model = GaussianNB()
model.fit(X_train, y_train)

# Step 5: Make predictions on the testing set
y_pred = model.predict(X_test)

# Step 6: Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### K-Nearest Neighbors (KNN)

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.3, random_state=42)

# Step 4: Define and train the model
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Step 5: Make predictions on the testing set
y_pred = model.predict(X_test)

# Step 6: Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### Neural Networks (Multilayer Perceptron)

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = MLPClassifier(hidden_layer_sizes=(100,), activation='relu', solver='adam', max_iter=1000, random_state=42)
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### Gradient Boosting Machines (GBM)

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### Extreme Gradient Boosting (XGBoost)

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = xgb.XGBClassifier()
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### LightGBM

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
d_train = lgb.Dataset(X_train, label=y_train)
params = {'objective': 'binary', 'metric': 'binary_logloss'}
model = lgb.train(params, d_train, 100)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
y_pred_binary = [1 if x >= 0.5 else 0 for x in y_pred]
accuracy = accuracy_score(y_test, y_pred_binary)
print("Accuracy:", accuracy)


### CatBoost

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
import catboost as cb
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = cb.CatBoostClassifier()
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### AdaBoost

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = AdaBoostClassifier()
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


### Ridge Regression

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

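# Step 4: Train the model
# RidgeClassifier applies ridge regression to a classification target;
# for a continuous target, use sklearn.linear_model.Ridge instead.
model = RidgeClassifier()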
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

### Lasso Regression

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


### Elastic Net

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


### Bayesian Regression

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = BayesianRidge()
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


### Gaussian Processes

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
kernel = RBF(1.0)
model = GaussianProcessRegressor(kernel=kernel, alpha=1e-10)
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


### Linear Discriminant Analysis (LDA)

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

### Quadratic Discriminant Analysis (QDA)

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)

# Step 4: Train the model
model = QuadraticDiscriminantAnalysis()
model.fit(X_train, y_train)

# Step 5: Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

## Unsupervised Learning Models

Unsupervised learning models are trained on unlabeled data (data that has no outcome variable or label), with the aim of discovering patterns, relationships, or clusters in the data. Examples include k-means clustering, hierarchical clustering, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE).

### K-means Clustering

#### K-means Clustering with Elbow Method

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Scale the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Step 4: Find the optimal number of clusters using the elbow method
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(data_scaled)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# Step 5: Train the model
n_clusters = 3 # set the number of clusters based on the elbow method
model = KMeans(n_clusters=n_clusters)
model.fit(data_scaled)

# Step 6: Evaluate the model
labels = model.labels_


Note that in Step 4 of the K-means Clustering with the Elbow Method, we are using the Within-Cluster Sum of Squares (WCSS) to determine the optimal number of clusters. The Elbow Method involves plotting the WCSS values for different values of K and looking for an "elbow" in the plot where the WCSS begins to level off. The number of clusters associated with the elbow point is then chosen as the optimal number of clusters.
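Reading the elbow off a plot is subjective. One possible way to automate the choice is to pick the point on the WCSS curve that lies farthest from the straight line joining the curve's two endpoints. The sketch below assumes that heuristic; it is not part of the template above or of scikit-learn, and the function name `elbow_by_max_distance` and the sample WCSS values are hypothetical.

```python
import numpy as np

def elbow_by_max_distance(ks, wcss):
    """Return the k whose (k, WCSS) point is farthest from the line
    joining the first and last points of the curve (a heuristic)."""
    ks = np.asarray(list(ks), dtype=float)
    wcss = np.asarray(list(wcss), dtype=float)

    # Rescale both axes to [0, 1] so neither dominates the distance
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (wcss - wcss.min()) / (wcss.max() - wcss.min())

    # Perpendicular distance of every point to the endpoint-to-endpoint line
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(ks[np.argmax(dist)])

# Hypothetical WCSS values for k = 1..6 (made up for illustration)
ks = range(1, 7)
wcss = [1000, 400, 100, 80, 65, 55]
elbow_k = elbow_by_max_distance(ks, wcss)
print("Elbow at k =", elbow_k)
```

In the template above, the same function could be called with `range(1, 11)` and the `wcss` list from Step 4. It is still worth checking the result against the plot.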

#### K-means Clustering with Silhouette Method

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Step 2: Load the data
data = pd.read_csv('data.csv')

# Step 3: Scale the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Step 4: Find the optimal number of clusters using the silhouette method
silhouette_scores = []
for n_clusters in range(2, 11):
    kmeans = KMeans(n_clusters=n_clusters)
    labels = kmeans.fit_predict(data_scaled)
    silhouette_avg = silhouette_score(data_scaled, labels)
    silhouette_scores.append(silhouette_avg)

optimal_n_clusters = silhouette_scores.index(max(silhouette_scores)) + 2

plt.plot(range(2, 11), silhouette_scores)
plt.title('Silhouette Method')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette Score')
plt.show()

# Step 5: Train the model
n_clusters = optimal_n_clusters # set the number of clusters based on the silhouette method
model = KMeans(n_clusters=n_clusters)
model.fit(data_scaled)

# Step 6: Evaluate the model
labels = model.labels_

In Step 4 of the K-means Clustering with the Silhouette Method, we are using the Silhouette Score to determine the optimal number of clusters. The Silhouette Score measures the similarity of each data point to its own cluster compared to other clusters. A higher Silhouette Score indicates that the data point is better matched to its own cluster than to neighboring clusters. The optimal number of clusters is chosen as the number of clusters associated with the highest Silhouette Score.
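To make the score concrete: for each point, let a be the mean distance to the other points in its own cluster and b the mean distance to the points of the nearest other cluster; the point's silhouette value is (b - a) / max(a, b). The sketch below computes this by hand with NumPy on a tiny made-up dataset. The function name `silhouette_values` and the data are hypothetical, and the sketch assumes at least two clusters and Euclidean distance.

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-point silhouette values, computed directly from the definition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # a singleton cluster gets a score of 0 by convention
        # a: mean distance to the other members of the same cluster
        a = dists[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dists[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

# Hypothetical 1-D data with two tight, well-separated clusters
X = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
scores = silhouette_values(X, labels)
mean_score = scores.mean()
print("Per-point silhouette values:", scores)
print("Mean silhouette score:", mean_score)
```

The mean of these per-point values is the quantity that `silhouette_score` reports in the template above.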

### Hierarchical clustering

In [None]:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

# Load the dataset
data = pd.read_csv('data.csv')

# Extract the features
X = data.iloc[:, [2, 3]].values

# Perform hierarchical clustering using ward linkage and euclidean distance
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title('Dendrogram')
plt.show()

# Fit the hierarchical clustering model using the appropriate number of clusters
# (ward linkage only supports euclidean distance, which is the default)
hc = AgglomerativeClustering(n_clusters=3, linkage='ward')
y_hc = hc.fit_predict(X)

# Visualize the clusters
plt.scatter(X[y_hc == 0, 0], X[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_hc == 2, 0], X[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.title('Hierarchical Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()

This template performs hierarchical clustering using the ward linkage method and euclidean distance. The appropriate number of clusters is determined based on the dendrogram plot. Finally, the clusters are visualized in a scatter plot.
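Once a cluster count has been read off the dendrogram, the same linkage matrix can be cut directly with SciPy instead of refitting a separate model. The sketch below is one way to do that, assuming `scipy.cluster.hierarchy.fcluster` with the `maxclust` criterion; the synthetic data is made up for illustration.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Hypothetical data: three well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Build the ward linkage matrix (the same input the dendrogram is drawn from)
Z = sch.linkage(X, method='ward')

# Cut the tree so that at most 3 flat clusters remain
labels = sch.fcluster(Z, t=3, criterion='maxclust')
print("Cluster labels found:", np.unique(labels))
```

Note that `fcluster` numbers its clusters from 1, whereas `AgglomerativeClustering.fit_predict` numbers them from 0.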

### Principal Component Analysis (PCA)

In [None]:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset
data = pd.read_csv('data.csv')

# Extract the features
X = data.iloc[:, [2, 3]].values

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform PCA with n_components=2
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Visualize the data in the reduced dimensions
plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA')
plt.show()

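This template performs PCA on two selected features of a dataset. First, the data is standardized using StandardScaler. Then, PCA is performed with n_components=2. Finally, the data is plotted along the principal components. Because two features are projected onto two components, the dimensionality is unchanged here; select more columns in the feature-extraction step to see an actual reduction.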
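To show what the standardize-then-project steps compute, here is a minimal NumPy sketch of PCA via the singular value decomposition, including the share of variance each component explains. The three-feature synthetic dataset is an assumption made for illustration; it is not the `data.csv` used above.

```python
import numpy as np

# Hypothetical data: 200 samples, 3 features, where the third feature
# is almost a copy of the first (so two components should suffice)
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 0.1 * rng.normal(size=200)])

# Standardize each feature to zero mean and unit variance
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# The rows of Vt are the principal directions, ordered by singular value
U, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first two principal components
X_pca = X_scaled @ Vt[:2].T

print("Explained variance ratio:", explained_variance_ratio)
print("Projected shape:", X_pca.shape)
```

With scikit-learn, the corresponding values are exposed as `pca.explained_variance_ratio_` after fitting.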

### Independent Component Analysis (ICA)

In [None]:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA

# Load the dataset
data = pd.read_csv('data.csv')

# Extract the features
X = data.iloc[:, [2, 3]].values

# Perform ICA with n_components=2
ica = FastICA(n_components=2)
X_ica = ica.fit_transform(X)

# Visualize the data in the independent components
plt.scatter(X_ica[:, 0], X_ica[:, 1])
plt.xlabel('Independent Component 1')
plt.ylabel('Independent Component 2')
plt.title('ICA')
plt.show()

This template performs ICA on a dataset with two features. ICA is performed with n_components=2. Finally, the data is visualized in the independent components.

### t-Distributed Stochastic Neighbor Embedding (t-SNE)

In [None]:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Load the dataset
data = pd.read_csv('data.csv')

# Extract the features
X = data.iloc[:, [2, 3]].values

# Perform t-SNE with n_components=2 and perplexity=30
tsne = TSNE(n_components=2, perplexity=30)
X_tsne = tsne.fit_transform(X)

# Visualize the data in the reduced dimensions
plt.scatter(X_tsne[:, 0], X_tsne[:, 1])
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.title('t-SNE')
plt.show()

This template performs t-SNE on a dataset with two features. t-SNE is performed with n_components=2 and perplexity=30. Finally, the data is visualized in the reduced dimensions.

### Autoencoders

In [None]:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Load the dataset
data = pd.read_csv('data.csv')

# Extract the features
X = data.iloc[:, [2, 3]].values

# Define the input shape
input_shape = (X.shape[1],)

# Define the encoder
input_layer = Input(shape=input_shape)
encoded = Dense(64, activation='relu')(input_layer)
encoded = Dense(32, activation='relu')(encoded)
encoded = Dense(16, activation='relu')(encoded)

# Define the decoder
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(decoded)
decoded = Dense(X.shape[1], activation='linear')(decoded)

# Define the autoencoder
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train the autoencoder
autoencoder.fit(X, X, epochs=50, batch_size=32, shuffle=True)

# Use the encoder to get the encoded representation of the data
encoder = Model(input_layer, encoded)
X_encoded = encoder.predict(X)

# Visualize the data in the encoded dimensions
plt.scatter(X_encoded[:, 0], X_encoded[:, 1])
plt.xlabel('Encoded Dimension 1')
plt.ylabel('Encoded Dimension 2')
plt.title('Autoencoder')
plt.show()


This template defines an autoencoder with 3 encoding layers and 3 decoding layers. The autoencoder is trained to reconstruct the input data, and then the encoder is used to get the 16-dimensional encoded representation of the data. Finally, the first two encoded dimensions are plotted.

### Gaussian Mixture Models (GMM)

In [None]:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Load the dataset
data = pd.read_csv('data.csv')

# Extract the features
X = data.iloc[:, [2, 3]].values

# Define the number of clusters
n_clusters = 3

# Initialize the Gaussian Mixture Model
gmm = GaussianMixture(n_components=n_clusters)

# Fit the GMM on the data
gmm.fit(X)

# Get the labels for each data point
labels = gmm.predict(X)

# Visualize the data with different colors for each cluster
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Gaussian Mixture Model')
plt.show()

This template defines a Gaussian Mixture Model with n_clusters=3. The model is fit on the data, and then the labels are obtained for each data point. Finally, the data is visualized with different colors for each cluster.

### Anomaly detection models (e.g. one-class SVM)

In [None]:
# Import the necessary libraries
import pandas as pd
import numpy as np
from sklearn.svm import OneClassSVM

# Load the dataset
data = pd.read_csv('data.csv')

# Extract the features
X = data.iloc[:, [2, 3]].values

# Define the nu parameter
nu = 0.05

# Initialize the One-Class SVM model
ocsvm = OneClassSVM(nu=nu)

# Fit the model on the data
ocsvm.fit(X)

# Get the predictions for each data point
y_pred = ocsvm.predict(X)

# Visualize the data with different colors for inliers and outliers
import matplotlib.pyplot as plt

plt.scatter(X[y_pred==1, 0], X[y_pred==1, 1], c='blue', label='inliers')
plt.scatter(X[y_pred==-1, 0], X[y_pred==-1, 1], c='red', label='outliers')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('One-Class SVM Anomaly Detection')
plt.legend()
plt.show()


This template defines a one-class SVM anomaly detection model with nu=0.05. The model is fit on the data, and then the predictions are obtained for each data point. Finally, the data is visualized with different colors for inliers and outliers.

### Self-Organizing Maps (SOM)

In [None]:
# Import the necessary libraries
import pandas as pd
import numpy as np
from minisom import MiniSom
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('data.csv')

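# Extract the features and the class labels
# (assumes the last column holds integer class labels 0 or 1,
# matching the two markers and colors defined below)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values.astype(int)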

# Define the SOM parameters
input_len = X.shape[1]
map_size = (10, 10)
sigma = 1.0
learning_rate = 0.5

# Initialize the SOM model
som = MiniSom(map_size[0], map_size[1], input_len, sigma=sigma, learning_rate=learning_rate)

# Train the SOM model
som.random_weights_init(X)
som.train_random(X, 100)

# Visualize the SOM model
plt.figure(figsize=(10, 10))
plt.pcolor(som.distance_map().T, cmap='bone_r')
plt.colorbar()

# Add markers for the data points
markers = ['o', 's']
colors = ['C0', 'C1']
for i, x in enumerate(X):
    w = som.winner(x)
    plt.plot(w[0]+0.5, w[1]+0.5, markers[y[i]], markerfacecolor='None',
             markeredgecolor=colors[y[i]], markersize=10, markeredgewidth=2)

plt.title('Self-Organizing Map')
plt.show()


This template defines a Self-Organizing Map (SOM) with a 10x10 grid, a sigma of 1.0, and a learning_rate of 0.5. The model is trained for 100 iterations, each using one randomly chosen sample. The SOM's distance map is then visualized, with a marker added for each data point at its winning node. The markers are colored based on their corresponding class.

### **Note that this is not an exhaustive list and there are many other unsupervised learning models.**

## Semi-supervised Learning Models

Semi-supervised learning models use a combination of labeled and unlabeled data for training. This approach is used when only a limited amount of labeled data is available or when labeling the data is expensive. Examples include self-training, co-training, and multi-view learning.
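As an illustration of self-training, the sketch below uses scikit-learn's `SelfTrainingClassifier`, which repeatedly fits a base classifier and adds its confident predictions on unlabeled points as pseudo-labels. The synthetic blobs and the choice of ten labeled points are assumptions made for this example; unlabeled samples are marked with -1, as that class expects.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Hypothetical data: two well-separated groups of points
X, y_true = make_blobs(n_samples=200, centers=[[-5, -5], [5, 5]],
                       cluster_std=1.0, random_state=0)

# Keep labels for only 5 points per class; mark the rest as unlabeled (-1)
y_partial = np.full_like(y_true, -1)
for cls in (0, 1):
    keep = np.where(y_true == cls)[0][:5]
    y_partial[keep] = cls

# Fit on the mix of labeled and unlabeled samples
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y_partial)

# Compare predictions with the labels that were hidden during training
accuracy = (model.predict(X) == y_true).mean()
print("Accuracy against the hidden labels:", accuracy)
```

Real datasets are rarely this cleanly separated, so pseudo-labels can reinforce early mistakes; the confidence threshold is the main parameter to tune.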

## Reinforcement Learning Models

Reinforcement learning models learn by interacting with an environment and receiving feedback in the form of rewards or penalties. The goal is to learn a policy (a mapping from states to actions) that maximizes the cumulative reward over time. Examples include Q-learning, SARSA, and deep reinforcement learning.
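A minimal sketch of tabular Q-learning follows. The environment is a made-up five-state corridor in which the agent starts at the left end and receives a reward of 1 for reaching the right end; the state layout, hyperparameters, and the `step` helper are all assumptions for illustration. Because Q-learning is off-policy, the sketch explores with uniformly random actions and still learns the values of the greedy policy.

```python
import numpy as np

n_states, n_actions = 5, 2      # actions: 0 = move left, 1 = move right
goal = n_states - 1             # the rightmost state is terminal
alpha, gamma = 0.5, 0.9         # learning rate and discount factor
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def step(state, action):
    """Hypothetical corridor dynamics: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

for episode in range(500):
    state = 0
    for _ in range(200):  # cap the episode length
        action = int(rng.integers(n_actions))  # random exploration
        next_state, reward, done = step(state, action)
        # Q-learning target: reward plus the discounted best next value
        target = reward if done else reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break

# Greedy policy for the non-terminal states
policy = Q[:goal].argmax(axis=1)
print("Learned Q-table:\n", Q)
print("Greedy action per state:", policy)
```

The greedy policy moves right in every state, since each step to the right brings the discounted reward closer.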

## Deep Learning Models

Deep learning models use artificial neural networks with multiple layers to learn hierarchical representations of the data. They are particularly good at handling complex, high-dimensional data, such as images, audio, and natural language text. Examples include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer models.
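To show what "multiple layers" means mechanically, the sketch below runs a forward pass through a small stack of fully connected layers in NumPy. The layer sizes, the random (untrained) weights, and the input batch are assumptions for illustration; a real model would learn its weights by gradient descent, typically in a framework such as TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical architecture: 8 inputs -> 16 -> 16 -> 3 output classes
layer_sizes = [8, 16, 16, 3]
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(X):
    """Each hidden layer transforms the output of the layer before it."""
    h = X
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    # The final layer produces class probabilities
    return softmax(h @ weights[-1] + biases[-1])

X = rng.normal(size=(4, 8))  # a batch of 4 made-up samples
probs = forward(X)
print("Output shape:", probs.shape)
print("Rows sum to 1:", np.allclose(probs.sum(axis=1), 1.0))
```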

## Other types of models

Other types of machine learning models include Bayesian models, decision-theoretic models, and ensemble models (e.g., bagging, boosting, and stacking).
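As a rough sketch of the three ensemble styles named above, the example below fits a bagging, a boosting, and a stacking classifier from scikit-learn on the same synthetic dataset. The dataset, the choice of base learners, and the hyperparameters are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dataset with clearly separated classes
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           class_sep=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    # Bagging: many trees, each fit on a bootstrap sample, then voted
    "bagging": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                 n_estimators=25, random_state=0),
    # Boosting: trees added one at a time to correct earlier errors
    "boosting": GradientBoostingClassifier(random_state=0),
    # Stacking: a final model learns from the base models' predictions
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                    ("knn", KNeighborsClassifier())],
        final_estimator=LogisticRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.3f}")
```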