# UNDERSTANDING HOW TO WORK WITH MONGODB

## GOALS

### Understand how to upload data to and download data from a database. In this case:
### - UPLOAD DATA from CSV to MONGODB
### - DOWNLOAD DATA from MONGODB to the notebook

### MANAGING THE MONGODB ACCESS

Installing the pymongo and mongoengine libraries to manage the MongoDB database and exchange data with pandas DataFrames

In [1]:
!pip3 install pymongo mongoengine



Importing the needed libraries

In [2]:
import pymongo
import mongoengine
from mongoengine import StringField, ListField, DateTimeField, DictField
from urllib.parse import quote_plus, quote
import datetime
import pprint as pp
import pandas as pd

Giving values to the variables needed to access the MongoDB database

In [3]:
user = "sophie"
passw = "Mongodb123456!"
host = "sophie.zqjpl1q.mongodb.net"
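
Storing the password in the notebook exposes it to anyone who can read the file. A minimal sketch, assuming the credentials are exported as environment variables named `MONGODB_USER` and `MONGODB_PASS` (hypothetical names), that falls back to interactive prompts when they are missing:

```python
import os
from getpass import getpass


def read_credentials(user_var="MONGODB_USER", pass_var="MONGODB_PASS"):
    """Return (user, password), preferring environment variables.

    The variable names are hypothetical defaults. If a variable is unset,
    the value is requested interactively; getpass() does not echo the password.
    """
    user = os.environ.get(user_var) or input("MongoDB user: ")
    passw = os.environ.get(pass_var) or getpass("MongoDB password: ")
    return user, passw
```

Usage: `user, passw = read_credentials()` replaces the two hard-coded assignments, while `host` can stay as it is.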

Choosing the name of the new database. MongoDB creates a database lazily, only when the first document is written to it, so it does not need to exist beforehand

In [4]:
database = "management"

Connecting to the MongoDB Atlas cluster and selecting the database

In [5]:
# Percent-encode the credentials so characters such as "@", ":" or "/" cannot break the URI
uri = "mongodb+srv://{0}:{1}@{2}/?retryWrites=true&w=majority".format(
    quote_plus(user), quote_plus(passw), host)
# Alternative connection through mongoengine:
# mongoengine.connect(db=database, host=uri)
client = pymongo.MongoClient(uri)
db = client[database]
print(client.list_database_names())

['admin', 'local']
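
Characters such as `@`, `:` or `/` in a user name or password must be percent-encoded in a MongoDB connection string, otherwise they are mistaken for the URI's own separators; `quote_plus` from `urllib.parse` does this. A sketch of a hypothetical helper, `build_srv_uri`, that wraps the connection-string construction so the escaping is never forgotten (the host in the example is made up):

```python
from urllib.parse import quote_plus


def build_srv_uri(user, password, host, options="retryWrites=true&w=majority"):
    """Build a mongodb+srv:// connection string with escaped credentials.

    quote_plus() percent-encodes reserved characters such as '@', ':' and '/'
    so they cannot be confused with the URI's own separators.
    """
    return "mongodb+srv://{0}:{1}@{2}/?{3}".format(
        quote_plus(user), quote_plus(password), host, options)


# Hypothetical host and password, chosen to show the encoding
print(build_srv_uri("sophie", "p@ss:word/1", "cluster0.example.mongodb.net"))
# mongodb+srv://sophie:p%40ss%3Aword%2F1@cluster0.example.mongodb.net/?retryWrites=true&w=majority
```

Usage: `client = pymongo.MongoClient(build_srv_uri(user, passw, host))`. The driver decodes the percent-encoding, so the server still receives the original password.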


In [6]:
db

Database(MongoClient(host=['ac-xqus6ni-shard-00-01.zqjpl1q.mongodb.net:27017', 'ac-xqus6ni-shard-00-00.zqjpl1q.mongodb.net:27017', 'ac-xqus6ni-shard-00-02.zqjpl1q.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-3pc8tt-shard-0', tls=True), 'management')

### LOADING THE DATA

Creating a function to remove the `Unnamed: 0`-style columns that pandas creates when a CSV header cell is blank (typically an index saved with `to_csv`).

In [7]:
def remove_unnamed_cols(df):
    # Keep only the columns whose name does not start with "Unnamed"
    df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
    return df
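
To see where these columns come from: when a DataFrame is saved with `to_csv()` and its index included (the default), the index is written under an empty header, and `read_csv` names that column `Unnamed: 0`. A small self-contained check, using two rows of the customer sample as inline data; passing `index_col=0` to `read_csv` is an alternative that reads the column back as the index instead of dropping it:

```python
import io

import pandas as pd

# Inline CSV with a blank first header cell, as produced by df.to_csv()
raw_csv = ",customer_id,age,gender\n0,C2448,76,female\n1,C2449,61,male\n"

df = pd.read_csv(io.StringIO(raw_csv))
print(list(df.columns))  # ['Unnamed: 0', 'customer_id', 'age', 'gender']

# Option 1: drop the column, as remove_unnamed_cols does
dropped = df.loc[:, ~df.columns.str.contains('^Unnamed')]

# Option 2: read the first column back as the index
indexed = pd.read_csv(io.StringIO(raw_csv), index_col=0)

print(list(dropped.columns))  # ['customer_id', 'age', 'gender']
print(list(indexed.columns))  # ['customer_id', 'age', 'gender']
```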

Creating a function to load the data directly from GOOGLE DRIVE.

In [8]:
def load_csv_from_drive(drive_url):
    # Turn a share link (.../file/d/<FILE_ID>/view?usp=sharing) into a direct-download URL
    url = 'https://drive.google.com/uc?id=' + drive_url.split('/')[-2]
    df = remove_unnamed_cols(pd.read_csv(url))
    return df
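
`split('/')[-2]` assumes the link ends in `/view?usp=sharing`; a link without the trailing `/view` part, or one in `?id=` form, yields the wrong segment. A slightly more defensive sketch using a regular expression (the helper name `drive_download_url` is hypothetical):

```python
import re


def drive_download_url(share_url):
    """Return a direct-download URL for a Google Drive share link.

    Handles links of the form .../file/d/<FILE_ID>/... and ...?id=<FILE_ID>.
    Raises ValueError when no file ID can be found.
    """
    match = (re.search(r"/d/([\w-]+)", share_url)
             or re.search(r"[?&]id=([\w-]+)", share_url))
    if match is None:
        raise ValueError("No Google Drive file ID found in: " + share_url)
    return "https://drive.google.com/uc?id=" + match.group(1)
```

Usage: `pd.read_csv(drive_download_url(customer_drive_url))` could replace the URL construction inside `load_csv_from_drive`.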

Loading the CUSTOMER data into a pandas DataFrame.

In [9]:
customer_drive_url = 'https://drive.google.com/file/d/1VerQ3_t3S5UBoN-6qteiuuntAf3pShBv/view?usp=sharing'
customer_df = load_csv_from_drive(customer_drive_url)

Showing a sample of data to validate the columns.

In [10]:
customer_df.head(2)

|   | customer_id | age | gender |
|---|---|---|---|
| 0 | C2448 | 76 | female |
| 1 | C2449 | 61 | male |


Applying the **info** function to the DataFrame to show the column types and the number of non-null values per column.

In [11]:
customer_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 508932 entries, 0 to 508931
Data columns (total 3 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   customer_id  508932 non-null  object
 1   age          508932 non-null  int64 
 2   gender       508932 non-null  object
dtypes: int64(1), object(2)
memory usage: 11.6+ MB
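
`info()` reports no missing values for this dataset. If a column did contain them, `to_dict("records")` would pass pandas' `NaN` through to pymongo, which stores it as a floating-point NaN rather than `null`, so a query such as `{"age": None}` would not find those documents. A minimal stdlib sketch (the helper `nan_to_none` is hypothetical) that cleans the records before they are inserted:

```python
import math


def nan_to_none(records):
    """Return copies of `records` with float NaN values replaced by None.

    Intended for the list of dicts produced by DataFrame.to_dict("records");
    pymongo stores None as BSON null, whereas NaN would be stored as a double.
    """
    return [
        {key: (None if isinstance(value, float) and math.isnan(value) else value)
         for key, value in record.items()}
        for record in records
    ]


# The second row is made up to show a missing age
rows = [{"customer_id": "C2448", "age": 76}, {"customer_id": "C9999", "age": float("nan")}]
print(nan_to_none(rows))
# [{'customer_id': 'C2448', 'age': 76}, {'customer_id': 'C9999', 'age': None}]
```

Usage: `collection.insert_many(nan_to_none(customer_df.to_dict("records")))`.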


Getting the `customer` collection; like the database, it is created when the first document is inserted into it

In [12]:
collection = db['customer']

Uploading the first 50 rows of the DataFrame to MongoDB

In [13]:
# One dict per row; insert_many also adds an "_id" key to each dict in place
customers_dict = customer_df.head(50).to_dict("records")
collection.insert_many(customers_dict)

<pymongo.results.InsertManyResult at 0x7f982269e670>
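
`insert_many` always adds new documents, so running this cell twice stores every customer twice. If re-runs should be harmless, one option is to upsert on `customer_id` with `replace_one(..., upsert=True)`; a unique index, created with `collection.create_index("customer_id", unique=True)`, would additionally make MongoDB reject duplicates. A minimal sketch, assuming `customer_id` uniquely identifies a customer (`upsert_records` is a hypothetical helper):

```python
def upsert_records(collection, records, key="customer_id"):
    """Insert or replace each record, matching existing documents on `key`.

    Running this twice leaves one document per key instead of duplicates.
    `collection` is expected to be a pymongo Collection (or anything with a
    compatible replace_one method).
    """
    written = 0
    for record in records:
        # Drop any _id that an earlier insert_many call added to the dict in place
        doc = {k: v for k, v in record.items() if k != "_id"}
        collection.replace_one({key: doc[key]}, doc, upsert=True)
        written += 1
    return written
```

Usage: `upsert_records(collection, customer_df.head(50).to_dict("records"))`. This costs one round trip per record; for large uploads, pymongo's `bulk_write` with `ReplaceOne` operations sends them in batches.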

Validating that the `management` database now exists (`list_database_names` lists databases, not collections)

In [14]:
print(client.list_database_names())

['management', 'admin', 'local']
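
`list_database_names` confirms the database, not the collection. A sketch of a hypothetical helper that checks the collection itself with `list_collection_names` and compares `count_documents({})` with the number of rows uploaded (50 here):

```python
def verify_upload(db, collection_name, expected_count):
    """Return True if `collection_name` exists in `db` and holds `expected_count` documents.

    `db` is expected to be a pymongo Database, e.g. client["management"].
    """
    if collection_name not in db.list_collection_names():
        return False
    return db[collection_name].count_documents({}) == expected_count
```

Usage: `verify_upload(db, "customer", 50)`. After a second run of the upload cell it would return `False`, because the collection would then hold 100 documents.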


Downloading the customers from the newly created MongoDB collection

In [15]:
customers_mongodb_df = pd.DataFrame(list(collection.find()))

In [16]:
customers_mongodb_df.head(5)

|   | _id | customer_id | age | gender |
|---|---|---|---|---|
| 0 | 63f771f34864bf6e2a48d67c | C2448 | 76 | female |
| 1 | 63f771f34864bf6e2a48d67d | C2449 | 61 | male |
| 2 | 63f771f34864bf6e2a48d67e | C2450 | 58 | female |
| 3 | 63f771f34864bf6e2a48d67f | C2451 | 62 | female |
| 4 | 63f771f34864bf6e2a48d680 | C2452 | 71 | male |
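
The downloaded frame gains an `_id` column holding `ObjectId` values. If it is not needed, a projection (the second argument of `find`) can tell the server to leave it out. A sketch of a hypothetical helper, `collection_to_dataframe`, that wraps the download:

```python
import pandas as pd


def collection_to_dataframe(collection, query=None, include_id=False):
    """Download the documents matching `query` into a DataFrame.

    With include_id=False the projection {"_id": 0} tells the server to leave
    out the _id field, so the frame only has the original CSV columns.
    """
    projection = None if include_id else {"_id": 0}
    cursor = collection.find(query or {}, projection)
    return pd.DataFrame(list(cursor))
```

Usage: `collection_to_dataframe(collection)` returns the same rows as above without the `_id` column, and `collection_to_dataframe(collection, {"age": {"$gt": 74}})` applies a filter as well.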


Downloading only the customers older than 74; the `$gt` (greater than) operator filters on the server, so only matching documents are transferred

In [17]:
customers_gt74_mongodb_df = pd.DataFrame(list(collection.find(
    {"age": {"$gt": 74}})))

In [18]:
customers_gt74_mongodb_df.head(5)

|   | _id | customer_id | age | gender |
|---|---|---|---|---|
| 0 | 63f771f34864bf6e2a48d67c | C2448 | 76 | female |
| 1 | 63f771f34864bf6e2a48d6a9 | C2493 | 75 | male |


Filtering the downloaded result in pandas to keep only the female customers

In [68]:
customers_gt74_mongodb_df[customers_gt74_mongodb_df["gender"] == 'female']

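|   | _id | customer_id | age | gender |
|---|---|---|---|---|
| 0 | 63f6979d19fda64196915be5 | C2448 | 76 | female |
| 1 | 63f697e019fda64196915be7 | C2448 | 76 | female |

The output of this cell appears to come from an earlier session (note the execution count 68 and `_id` values that differ from those above) in which the same records had been inserted more than once, which is why `C2448` appears twice. Applied to `customers_gt74_mongodb_df` as shown above, the filter keeps only the `C2448` row.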
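
The gender filter above runs in pandas after every customer over 74 has been downloaded. The same condition can be sent to MongoDB instead, where fields listed together in one filter document are combined with an implicit AND. A sketch of a hypothetical helper that builds such a filter from optional arguments:

```python
def build_customer_query(min_age=None, max_age=None, gender=None):
    """Build a MongoDB filter document; unset arguments are left out.

    Bounds are exclusive ($gt / $lt), matching the {"age": {"$gt": 74}} query above.
    """
    query = {}
    age = {}
    if min_age is not None:
        age["$gt"] = min_age
    if max_age is not None:
        age["$lt"] = max_age
    if age:
        query["age"] = age
    if gender is not None:
        query["gender"] = gender
    return query


print(build_customer_query(min_age=74, gender="female"))
# {'age': {'$gt': 74}, 'gender': 'female'}
```

Usage: `pd.DataFrame(list(collection.find(build_customer_query(min_age=74, gender="female"))))` downloads only the matching documents, and `collection.count_documents(query)` counts them without downloading anything.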
