
---

# ***`MongoDB Experiments`***

---

- The objective of this notebook is to upload the dataset from the current folder to a MongoDB database, so that later it can be fetched directly from MongoDB.

### **Import libraries**

In [10]:
import os
import pymongo
import pandas as pd
from dotenv import load_dotenv

import warnings
warnings.filterwarnings('ignore')

### **Load dataset in notebook**

In [2]:
df = pd.read_csv("data.csv")
df.head()

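| | id | Gender | Age | Driving_License | Region_Code | Previously_Insured | Vehicle_Age | Vehicle_Damage | Annual_Premium | Policy_Sales_Channel | Vintage | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 44 | 1 | 28.0 | 0 | > 2 Years | Yes | 40454.0 | 26.0 | 217 | 1 |
| 1 | 2 | Male | 76 | 1 | 3.0 | 0 | 1-2 Year | No | 33536.0 | 26.0 | 183 | 0 |
| 2 | 3 | Male | 47 | 1 | 28.0 | 0 | > 2 Years | Yes | 38294.0 | 26.0 | 27 | 1 |
| 3 | 4 | Male | 21 | 1 | 11.0 | 1 | < 1 Year | No | 28619.0 | 152.0 | 203 | 0 |
| 4 | 5 | Female | 29 | 1 | 41.0 | 1 | < 1 Year | No | 27496.0 | 152.0 | 39 | 0 |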


### **Check info of data**

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 381109 entries, 0 to 381108
Data columns (total 12 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   id                    381109 non-null  int64  
 1   Gender                381109 non-null  object 
 2   Age                   381109 non-null  int64  
 3   Driving_License       381109 non-null  int64  
 4   Region_Code           381109 non-null  float64
 5   Previously_Insured    381109 non-null  int64  
 6   Vehicle_Age           381109 non-null  object 
 7   Vehicle_Damage        381109 non-null  object 
 8   Annual_Premium        381109 non-null  float64
 9   Policy_Sales_Channel  381109 non-null  float64
 10  Vintage               381109 non-null  int64  
 11  Response              381109 non-null  int64  
dtypes: float64(3), int64(6), object(3)
memory usage: 34.9+ MB


### **Convert data into dictionary format**

- To push data to MongoDB, each row must be a dictionary; `orient='records'` produces a list with one dictionary per row.

In [5]:
data = df.to_dict(orient='records')

In [8]:
data[0:2]  # check the first 2 converted records

[{'id': 1,
  'Gender': 'Male',
  'Age': 44,
  'Driving_License': 1,
  'Region_Code': 28.0,
  'Previously_Insured': 0,
  'Vehicle_Age': '> 2 Years',
  'Vehicle_Damage': 'Yes',
  'Annual_Premium': 40454.0,
  'Policy_Sales_Channel': 26.0,
  'Vintage': 217,
  'Response': 1},
 {'id': 2,
  'Gender': 'Male',
  'Age': 76,
  'Driving_License': 1,
  'Region_Code': 3.0,
  'Previously_Insured': 0,
  'Vehicle_Age': '1-2 Year',
  'Vehicle_Damage': 'No',
  'Annual_Premium': 33536.0,
  'Policy_Sales_Channel': 26.0,
  'Vintage': 183,
  'Response': 0}]
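
To make the `records` orientation concrete, the sketch below rebuilds the same shape in plain Python from a column-oriented dictionary. The helper name `to_records` and the two-row sample are illustrative only and not part of the notebook; in practice `df.to_dict(orient='records')` does this for you.

```python
def to_records(columns):
    """Turn {column_name: [values, ...]} into a list with one dict per row."""
    names = list(columns)
    rows = zip(*(columns[name] for name in names))
    return [dict(zip(names, row)) for row in rows]


# Hypothetical two-row sample using a few of the dataset's column names
sample = {
    "id": [1, 2],
    "Gender": ["Male", "Male"],
    "Age": [44, 76],
}
records = to_records(sample)
print(records)
```

Each element of the resulting list maps directly to one MongoDB document.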

### **Set Up MongoDB Connection**

In [14]:
# set up username and password

# Load environment variables
load_dotenv()

mongodb_username = os.getenv("MONGODB_USERNAME")
mongodb_password = os.getenv("MONGODB_PASSWORD")
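
`os.getenv` returns `None` when a variable is missing, and that `None` would otherwise end up inside the connection string unnoticed. A minimal sketch of a guard, assuming you prefer to fail early; `require_env` is a hypothetical helper, not something the notebook defines.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail loudly if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

It could be used as `mongodb_password = require_env("MONGODB_PASSWORD")` after `load_dotenv()` has run.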

In [17]:
DB_NAME = "vehicle-mlops-project"
COLLECTION_NAME = "vehicle-data"
CONNECTION_URL = f"mongodb+srv://{mongodb_username}:{mongodb_password}@cluster0.kra1i.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

# Note: remove any hard-coded credentials (or delete the MongoDB resource) before pushing this notebook to GitHub.
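
Usernames and passwords placed in a MongoDB connection string must be percent-encoded if they contain reserved characters such as `@`, `:` or `/`. The sketch below uses `urllib.parse.quote_plus` from the standard library for that. The function name `build_connection_url`, the `host` parameter, and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_connection_url(username, password, host):
    """Build a mongodb+srv URL with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?retryWrites=true&w=majority"


# Made-up credentials and host, for illustration only
url = build_connection_url("demo_user", "p@ss:word/1", "cluster0.example.mongodb.net")
print(url)
```

Without the encoding, the `@` inside the password would be read as the separator between credentials and host.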

In [18]:
# Establish a connection to the MongoDB server using the specified connection URL
client = pymongo.MongoClient(CONNECTION_URL)

# Access the database with the given name from the MongoDB client
data_base = client[DB_NAME]

# Access the specified collection within the database
collection = data_base[COLLECTION_NAME]

### **Uploading data to MongoDB**

In [19]:
rec = collection.insert_many(data)

In [20]:
print(type(rec))

<class 'pymongo.results.InsertManyResult'>
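
The returned `InsertManyResult` exposes `inserted_ids`, so its length confirms how many documents were written. For a list of this size it can also be convenient to upload in explicit slices, which makes progress visible and limits how much must be re-sent after a failure. A minimal sketch under those assumptions; `insert_in_chunks` and its `chunk_size` default are hypothetical choices, not part of the notebook.

```python
def insert_in_chunks(collection, records, chunk_size=10_000):
    """Insert records in fixed-size slices and return the number written.

    `collection` is assumed to be a pymongo Collection, or any object with
    a compatible insert_many method.
    """
    inserted = 0
    for start in range(0, len(records), chunk_size):
        batch = records[start:start + chunk_size]
        result = collection.insert_many(batch)
        inserted += len(result.inserted_ids)
    return inserted
```

A call such as `insert_in_chunks(collection, data)` should return the same count as `len(data)` when every batch succeeds.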


### **Load data back from MongoDB**

In [21]:
df = pd.DataFrame(list(collection.find()))
df.head(2)

| | _id | id | Gender | Age | Driving_License | Region_Code | Previously_Insured | Vehicle_Age | Vehicle_Damage | Annual_Premium | Policy_Sales_Channel | Vintage | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67de5954a0645bf6eb7a263b | 1 | Male | 44 | 1 | 28.0 | 0 | > 2 Years | Yes | 40454.0 | 26.0 | 217 | 1 |
| 1 | 67de5954a0645bf6eb7a263c | 2 | Male | 76 | 1 | 3.0 | 0 | 1-2 Year | No | 33536.0 | 26.0 | 183 | 0 |
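
The extra `_id` column is the identifier MongoDB attached to each document on insert. If it is not needed downstream, a projection of `{"_id": 0}` passed as the second argument to `find` leaves it out. A minimal sketch; `fetch_without_id` is a hypothetical helper name.

```python
def fetch_without_id(collection, query=None):
    """Return all matching documents as dicts, excluding MongoDB's _id field.

    `collection` is assumed to be a pymongo Collection, or any object whose
    find(filter, projection) method behaves the same way.
    """
    cursor = collection.find(query or {}, {"_id": 0})
    return list(cursor)
```

The result can be passed straight to pandas, for example `df = pd.DataFrame(fetch_without_id(collection))`, to get back the original 12 columns.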


---