# Machine Learning Model Storage in MongoDB

This tutorial walks through saving a trained machine learning model to a MongoDB database and loading it back, using Python. The example trains a Support Vector Machine (SVM) classifier on the Iris dataset and stores the fitted model as pickled bytes.

## Prerequisites

Make sure you have the following libraries installed:
- `pymongo`
- `scikit-learn`
- `pandas`
- `python-dotenv`

You can install them using pip:

```bash
pip install pymongo scikit-learn pandas python-dotenv
```

## Import libraries

In [None]:
import os
import pickle
from pymongo import MongoClient
from dotenv import load_dotenv
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Upload dataset to MongoDB

## Step 1: Load the Iris dataset

In [4]:
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

## Step 2: Connect to MongoDB

In [5]:
MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']
collection = db['iris_collection']

## Step 3: Convert DataFrame to a list of dictionaries

In [6]:
data_dict = df.to_dict("records")
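To make the conversion concrete, here is a small sketch of what `to_dict("records")` produces. It uses a two-row stand-in DataFrame rather than the full Iris data; the values are illustrative only.

```python
import pandas as pd

# A two-row stand-in for the Iris DataFrame (values are illustrative).
sample = pd.DataFrame(
    {"sepal length (cm)": [5.1, 4.9], "target": [0, 0]}
)

# orient="records" yields one dict per row, keyed by column name,
# which is the shape insert_many expects: a list of documents.
records = sample.to_dict("records")
print(records)
```

Each dictionary becomes one MongoDB document, with the column names as field names.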

## Step 4: Insert data into MongoDB collection

In [7]:
collection.insert_many(data_dict)

InsertManyResult([ObjectId('66ae80e98ec966914050d660'), ObjectId('66ae80e98ec966914050d661'), ObjectId('66ae80e98ec966914050d662'), ... (output truncated)
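Running the insert cell a second time adds another 150 documents, because `insert_many` does not check for duplicates. One possible way to keep the cell re-runnable is to empty the collection first. The helper below is a hypothetical sketch, not part of the original notebook, and it deletes everything in the collection, so it only suits a scratch dataset like this one.

```python
def replace_all_documents(collection, records):
    """Empty the collection, then insert the given records.

    `collection` is expected to be a pymongo Collection; delete_many({})
    removes every document in it before the new records are inserted.
    Returns the number of documents deleted and inserted.
    """
    deleted = collection.delete_many({}).deleted_count
    inserted = len(collection.insert_many(records).inserted_ids)
    return deleted, inserted
```

With the objects from this tutorial, the call would be `replace_all_documents(collection, data_dict)`.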

## Step 5: Check the number of documents inserted

In [8]:
document_count = collection.count_documents({})

print(f'The number of documents in the collection is: {document_count}')

The number of documents in the collection is: 150


# Retrieve the Iris dataset from the database and train a classifier on it

## Step 1: Connect to MongoDB and retrieve the data

In [10]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['iris_collection']  # Access the 'iris_collection'

# Retrieve the data from the collection
data = list(collection.find({}))
df = pd.DataFrame(data)

# Drop the '_id' field that MongoDB adds to every document
df.drop(columns=['_id'], inplace=True)
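As an alternative to dropping `_id` after loading, the field can be excluded in the query itself through a projection, which is the second argument to `find`. A minimal sketch, where `fetch_dataframe` is a hypothetical helper name:

```python
import pandas as pd


def fetch_dataframe(collection):
    """Load every document of a pymongo Collection into a DataFrame.

    The projection {"_id": 0} asks MongoDB to omit the _id field,
    so no column has to be dropped afterwards.
    """
    cursor = collection.find({}, {"_id": 0})
    return pd.DataFrame(list(cursor))
```

Calling `fetch_dataframe(collection)` on the collection above should give the same columns as the cell that drops `_id` explicitly.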

## Step 2: Preprocess the data

In [11]:
X = df.drop(columns=['target']).values
y = df['target'].values

scaler = StandardScaler()
X = scaler.fit_transform(X)

## Step 3: Split the data into training and testing sets

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Step 4: Train a Support Vector Machine (SVM) classifier

In [13]:
model = SVC(kernel='linear', random_state=42)
model.fit(X_train, y_train)

## Step 5: Evaluate the model

In [14]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Accuracy: 96.67%
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]


# Store the model's weights in a MongoDB database

## Continuing from the previous code, we only need to switch to a different collection; the database stays the same.

In [16]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['models']  # Create a collection to store models

# Serialize the model
model_bytes = pickle.dumps(model)

# Store the model in the collection
model_document = {
    'model_name': 'svm_iris',
    'model_data': model_bytes
}
collection.insert_one(model_document)

print("Model successfully saved to MongoDB")

Model successfully saved to MongoDB
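Two practical details are worth noting here. Re-running the cell inserts a second document with the same `model_name`, and MongoDB limits a single document to 16 MB. The sketch below addresses both under stated assumptions: `save_model` and the `saved_at` field are hypothetical names, and the size check is a rough guard on the pickled payload only.

```python
import pickle
from datetime import datetime, timezone

MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit


def save_model(collection, name, model):
    """Pickle `model` and upsert it under `name` in a pymongo Collection."""
    payload = pickle.dumps(model)
    if len(payload) >= MAX_BSON_BYTES:
        # Larger payloads need a different mechanism, such as GridFS.
        raise ValueError(
            f"Pickled model is {len(payload)} bytes; too large for one document"
        )
    document = {
        "model_name": name,
        "model_data": payload,
        "saved_at": datetime.now(timezone.utc),  # hypothetical extra field
    }
    # replace_one with upsert=True overwrites an existing document with the
    # same model_name, or inserts a new one if none matches.
    collection.replace_one({"model_name": name}, document, upsert=True)
    return len(payload)
```

With the objects from this tutorial, `save_model(collection, 'svm_iris', model)` would keep exactly one `svm_iris` document however often it is run.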


# Loading the Model Weights

## Step 1: Connect to MongoDB and retrieve the model

In [17]:
# MONGO_CONNECTION_STRING = "WRITE_MONGO_CONNECTION_STRING_HERE"

client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']  # Access the 'iris_database'
collection = db['models']  # Access the 'models' collection

# Retrieve the model from the collection
model_document = collection.find_one({'model_name': 'svm_iris'})
model_bytes = model_document['model_data']

## Step 2: Deserialize the model

In [18]:
model = pickle.loads(model_bytes)
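The retrieval and deserialization steps can be combined into one function that also handles a missing document, since `find_one` returns `None` when nothing matches. This is a sketch under that assumption; `load_model` is a hypothetical helper name.

```python
import pickle


def load_model(collection, name):
    """Fetch and unpickle the model stored under `name`.

    find_one returns None when no document matches, so that case is
    turned into an explicit error instead of a TypeError on indexing.
    """
    document = collection.find_one({"model_name": name})
    if document is None:
        raise LookupError(f"No model named {name!r} in the collection")
    # pickle.loads can execute arbitrary code; only unpickle data
    # written by a trusted source.
    return pickle.loads(document["model_data"])
```

Here the call would be `model = load_model(collection, 'svm_iris')`.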

## Step 3: Load the Iris dataset and preprocess it

In [19]:
iris = datasets.load_iris()
X = iris.data
y = iris.target

scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
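Note that this step fits a fresh `StandardScaler`, which reproduces the training-time scaling only because the same data is used again. The stored pickle contains the classifier but not the scaler. One possible alternative, which is not what this tutorial does, is to bundle both in a scikit-learn `Pipeline` so that a single pickle carries the preprocessing as well. In this sketch the scaler is fitted on the training split only, so its accuracy may differ from the figures reported in this tutorial.

```python
import pickle

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training split only and
# applies the same scaling automatically at prediction time.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipeline.fit(X_train, y_train)

# One pickle now carries both the fitted scaler and the classifier.
pipeline_bytes = pickle.dumps(pipeline)
restored = pickle.loads(pipeline_bytes)

# The restored pipeline accepts raw, unscaled features.
same = (restored.predict(X_test) == pipeline.predict(X_test)).all()
print(f"Predictions identical after round trip: {same}")
```

The `pipeline_bytes` value could be stored in the `model_data` field in the same way as the pickled SVM.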

## Step 4: Evaluate the loaded model

In [20]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Accuracy: 96.67%
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]
