# Machine Learning Model Storage in MongoDB Exercise

This exercise will help you practice loading a machine learning model's weights from a MongoDB database using Python.

## Prerequisites

Make sure you have the following libraries installed:
- `pymongo`
- `scikit-learn`
- `python-dotenv`
- `pandas`

You can install them using pip:

```bash
pip install pymongo scikit-learn python-dotenv pandas
```

In [1]:
! pip install pymongo scikit-learn python-dotenv


Collecting pymongo
  Downloading pymongo-4.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.6.1-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
Installing collected packages: python-dotenv, dnspython, pymongo
Successfully installed dnspython-2.6.1 pymongo-4.8.0 python-dotenv-1.0.1


## Accessing the Prepared Database

We have set up a MongoDB database that contains the Iris dataset and pre-trained model weights for you to practice with. Use the details below to connect and access the data:

### Database Structure
- **Database Name**: `iris_database`
- **Collections**:
  - **Iris Dataset**: Stored in the `iris_collection`.
  - **Model Weights**: Stored in the `models` collection.

### Connection Details
Use the connection string below to connect to the database with `MongoClient`. The value shown is a placeholder; replace it with your actual connection string:

```python
MONGO_CONNECTION_STRING = "MONGO_CONNECTION_STRING"
```

## Import libraries

In [2]:
import os
import pickle
from pymongo import MongoClient
from dotenv import load_dotenv
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Retrieve the Iris dataset from the database, and create a classifier using the retrieved data

## Step 1: Connect to MongoDB and retrieve the data

In [3]:
MONGO_CONNECTION_STRING = "MONGO_CONNECTION_STRING"

# Iris Dataset: Stored in the iris_collection.
# Model Weights: Stored in the models collection.
client = MongoClient(MONGO_CONNECTION_STRING)
db = client['iris_database']
iris_coll = db['iris_collection']  # Access the 'iris_collection'

iris_data = list(iris_coll.find({}))


iris_df = pd.DataFrame(iris_data)

# Drop the "_id" column that MongoDB adds to every document
iris_df = iris_df.drop(columns=["_id"])
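
As an alternative to dropping `_id` after loading, the field can be excluded at query time with a projection. This is a minimal sketch, assuming any object with a pymongo-style `find(filter, projection)` method; the helper name `fetch_as_dataframe` is hypothetical.

```python
import pandas as pd


def fetch_as_dataframe(collection) -> pd.DataFrame:
    """Load every document of a collection into a DataFrame without `_id` (hypothetical helper)."""
    # The second argument to find() is a projection; {"_id": 0} excludes that field.
    documents = list(collection.find({}, {"_id": 0}))
    return pd.DataFrame(documents)
```

With the connection from this step, `fetch_as_dataframe(iris_coll)` would return the feature columns and `target` directly.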


## Step 2: Preprocess the data

In [4]:
print(iris_df.head())


   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  


In [6]:
X = iris_df.drop(columns=["target"]).values
y = iris_df[["target"]].values


scaler = StandardScaler()

X = scaler.fit_transform(X)

X[:10]


array([[-0.90068117,  1.01900435, -1.34022653, -1.3154443 ],
       [-1.14301691, -0.13197948, -1.34022653, -1.3154443 ],
       [-1.38535265,  0.32841405, -1.39706395, -1.3154443 ],
       [-1.50652052,  0.09821729, -1.2833891 , -1.3154443 ],
       [-1.02184904,  1.24920112, -1.34022653, -1.3154443 ],
       [-0.53717756,  1.93979142, -1.16971425, -1.05217993],
       [-1.50652052,  0.78880759, -1.34022653, -1.18381211],
       [-1.02184904,  0.78880759, -1.2833891 , -1.3154443 ],
       [-1.74885626, -0.36217625, -1.34022653, -1.3154443 ],
       [-1.14301691,  0.09821729, -1.2833891 , -1.44707648]])
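
To make the transformation concrete, the sketch below reproduces `StandardScaler` by hand: each column has its mean subtracted and is divided by its standard deviation. It uses scikit-learn's bundled Iris data as a stand-in, on the assumption that the MongoDB collection holds the same values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_iris = load_iris().data

# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for ndarray.std().
X_scaled = StandardScaler().fit_transform(X_iris)
X_manual = (X_iris - X_iris.mean(axis=0)) / X_iris.std(axis=0)

# Check that both computations agree up to floating-point tolerance.
print(np.allclose(X_scaled, X_manual))
```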

## Step 3: Split the data into training and testing sets

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Step 4: Train a Support Vector Machine (SVM) classifier

In [10]:
svm_model = SVC(kernel="linear", random_state=42)

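# y_train is a column vector of shape (n, 1); ravel() flattens it to the
# 1-D shape that fit() expects and avoids a DataConversionWarning.
svm_model.fit(X_train, y_train.ravel())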
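
The cells above fit the scaler on the full dataset before splitting it. A common alternative, shown here as a sketch and not as the exercise's method, is to split first and let a `Pipeline` fit the scaler on the training portion only. The example uses scikit-learn's bundled Iris data as a stand-in for the MongoDB collection, and the variable names are chosen to avoid overwriting the notebook's own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_raw, y_raw = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on X_tr only, then applies the same
# scaling to X_te when scoring or predicting.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=42))
pipeline.fit(X_tr, y_tr)

pipeline_accuracy = pipeline.score(X_te, y_te)
print(f"Pipeline accuracy: {pipeline_accuracy * 100:.2f}%")
```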


## Step 5: Evaluate the model

In [11]:
y_pred = svm_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

Accuracy: 96.67%
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]
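
The figures in the classification report can be derived directly from the confusion matrix, where rows are true classes and columns are predicted classes. The sketch below recomputes them from the matrix printed above.

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([
    [10, 0, 0],
    [0, 8, 1],
    [0, 0, 11],
])

cm_accuracy = np.trace(cm) / cm.sum()         # correct predictions / all predictions
cm_recall = np.diag(cm) / cm.sum(axis=1)      # per true class (row sums)
cm_precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class (column sums)

print(cm_accuracy, cm_recall, cm_precision)
```

The single misclassified sample, a class 1 flower predicted as class 2, is what lowers the recall of class 1 to 8/9 and the precision of class 2 to 11/12.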


# Loading the Model Weights

## Step 1: Connect to MongoDB and retrieve the model

In [19]:
# Reuse the `db` handle created in the first part; the model weights
# are stored in the 'models' collection.
model_coll = db["models"]

model_doc = model_coll.find_one({
    "model_name": "svm_iris"
})

model_bytes = model_doc["model_data"]
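
For context, a document like the one retrieved above could have been produced roughly as follows. This is a sketch, not the procedure actually used to populate the database: only the field names `model_name` and `model_data` come from the query above, and the model here is trained on scikit-learn's bundled Iris data. The `insert_one` call is left commented out because it needs a live connection.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = load_iris(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)
demo_model = SVC(kernel="linear", random_state=42).fit(X_demo, y_demo)

# Serialize the fitted estimator to bytes and wrap it in a document.
demo_doc = {"model_name": "svm_iris", "model_data": pickle.dumps(demo_model)}

# With a live connection, this would store the document:
# db["models"].insert_one(demo_doc)

# Round trip: the bytes deserialize back into an equivalent estimator.
restored_model = pickle.loads(demo_doc["model_data"])
same_predictions = (restored_model.predict(X_demo) == demo_model.predict(X_demo)).all()
```

A single MongoDB document is limited to 16 MB, so this approach suits small models; larger ones are usually stored with GridFS instead.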

## Step 2: Deserialize the model

In [23]:
svc_model = pickle.loads(model_bytes)

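Note: `pickle.loads` can execute arbitrary code, and a pickled estimator may not load correctly under a different scikit-learn version, so only unpickle models from a source you trust. For details, see https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations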
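
The two steps above can be combined into one helper that fails clearly when the requested model is missing, since `find_one` returns `None` if no document matches. This is a minimal sketch: the function name `load_model` is hypothetical, and it assumes documents shaped like the one queried above.

```python
import pickle


def load_model(collection, model_name: str):
    """Fetch and unpickle a stored model by name (hypothetical helper)."""
    doc = collection.find_one({"model_name": model_name})
    if doc is None:
        # No document matched the filter, so there is nothing to deserialize.
        raise LookupError(f"No model named {model_name!r} in the collection")
    return pickle.loads(doc["model_data"])
```

With the handles from this notebook, the call would be `load_model(model_coll, "svm_iris")`.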


## Step 3: Load the Iris dataset and preprocess it

In [24]:
iris = datasets.load_iris()
X = iris.data
y = iris.target


scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Step 4: Evaluate the loaded model

In [25]:
y_pred = svc_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Accuracy: 96.67%
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]
