# Module 2 - Create your first MongoDB database (30 minutes)

* Please feel free to ask questions at any time!

# Set up the PyMongo environment

In [None]:
# pymongo library is not installed by default 

!pip install pymongo

* The 'pymongo' library contains what we need to enable Python to shake hands with MongoDB

In [None]:
# get appropriate Python libraries

from pymongo import MongoClient  # module we need from the 'pymongo' library
import pandas as pd

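* Importing **only** the names you need from a library (instead of the whole library) keeps your namespace small and makes it obvious what the code depends on. Python still loads the whole library behind the scenes either way.
* We use many things from 'pandas', so we import the entire library under the alias 'pd'!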
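If you are curious what an import statement actually does, here is a small sketch. It uses the standard-library `fractions` module purely as an illustration (it is not part of this module's code) and checks two things: whether the module was loaded, and whether its name landed in our namespace.

```python
import sys

# 'fractions' is a standard-library module used here only as an illustration
from fractions import Fraction

# the whole module was loaded and cached, even though we asked for one name
module_loaded = 'fractions' in sys.modules

# only 'Fraction' was bound in our namespace, not the module name 'fractions'
module_name_bound = 'fractions' in globals()

print(module_loaded, module_name_bound)
```

So the benefit of `from pymongo import MongoClient` is mostly readability, not speed.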

# Get CSV data into Pandas

In [None]:
# make sure that the CSV file is available!

f = 'data/cars.csv'
df = pd.read_csv(f)

In [None]:
# verify that data is read properly into a Pandas DataFrame

n = 3  # number of rows to display
df.head(n)

# Convert DataFrame to list of dictionary elements

In [None]:
# create a list of dictionary elements for MongoDB consumption

data = df.to_dict('records')

* We convert the DataFrame into a list of dictionary elements because each dictionary maps directly onto one MongoDB document, which makes loading the data easy. The same shape also works well with JSON and with plain Python code, so it is a handy way to work with data in general.
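To see why this shape travels so well, here is a minimal sketch that round-trips a list of dictionaries through JSON using only the standard library. The two records are made-up sample values shaped like the output of `df.to_dict('records')`, not rows from the actual CSV file.

```python
import json

# hypothetical sample records, shaped like the output of df.to_dict('records')
records = [
    {'Car': 'Sample Car A', 'MPG': 18.0, 'Origin': 'US'},
    {'Car': 'Sample Car B', 'MPG': 25.5, 'Origin': 'Europe'},
]

# serialize the list of dictionaries to a JSON string ...
as_json = json.dumps(records, indent=2)

# ... and parse it back into Python objects
round_trip = json.loads(as_json)

print(as_json)
print(round_trip == records)  # the data survives the round trip unchanged
```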

# Explore the dictionary elements

In [None]:
# each element in 'data' is a dictionary

# let's explore the first dictionary

type(data[0])  # verify the datatype (index starts at 0)

* Lists (arrays), tuples, strings, and other Python sequences always start at index 0.
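A quick sketch of zero-based indexing on a throwaway list (the list contents are just an illustration):

```python
letters = ['a', 'b', 'c']

first = letters[0]                # the first element lives at index 0
last = letters[len(letters) - 1]  # so the last one is at length - 1
also_last = letters[-1]           # negative indexes count from the end

# asking for index 3 in a 3-element list is out of range
try:
    letters[len(letters)]
except IndexError:
    out_of_range = True
else:
    out_of_range = False

print(first, last, also_last, out_of_range)
```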

In [None]:
# get the keys

data[0].keys()

In [None]:
# get the values

data[0].values()

In [None]:
# get the contents of the first dictionary from the list

data[0]

In [None]:
# get MPG from the first dictionary

data[0]['MPG']

* We access dictionary values based on their respective keys.

In [None]:
# kind of a pain, but here is how we can get values of our choice

ls = ['Car', 'MPG', 'Origin']
for key in ls:
    print(data[0].get(key))

# for the programmers in the class, we can use list comprehension

ls = ['Car', 'MPG', 'Origin']

[data[0].get(key) for key in ls]
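There are two ways to read a value from a dictionary, square brackets and `.get()`, and they behave differently when the key is missing. The sketch below uses a made-up record (not a row from the CSV file) that deliberately has no 'Origin' key:

```python
# hypothetical record with no 'Origin' key
record = {'Car': 'Sample Car A', 'MPG': 18.0}

mpg = record['MPG']                               # key exists: both styles work
origin = record.get('Origin')                     # missing key: .get() returns None
origin_default = record.get('Origin', 'unknown')  # ... or a default of your choice

# square brackets raise a KeyError when the key is missing
try:
    record['Origin']
except KeyError as err:
    missing_key = err.args[0]

print(mpg, origin, origin_default, missing_key)
```

Use square brackets when a missing key should be treated as a mistake, and `.get()` when a missing key is expected.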

# Connect to MongoDB

In [None]:
# Connect to MongoDB on your local machine

client = MongoClient('localhost', port=27017)

# Connect to a MongoDB database

In [None]:
# connect to database 'test' (default)

db = client.test  # in this case, the database is 'test' (same as client['test'])

* If the database doesn't exist, MongoDB creates it automatically the first time you write data to it!
* 'test' is the database the MongoDB shell uses by default
* You can use any valid name for the database (e.g., db = client.puppy)

# Connect to a MongoDB collection

In [None]:
# connect to collection 'cars' (collections hold MongoDB data)

cars = db.cars

* If the collection doesn't exist, MongoDB creates it automatically when you first insert a document!

# Drop previous instance of collection

In [None]:
# drop previous instance of 'cars' (optional, but a good idea when learning)

cars.drop()

* If you don't drop the existing collection and then try to insert a document whose '_id' is already in use, you will get a duplicate key error because each '_id' must be unique within a collection.

# Add data to the collection

In [None]:
# add 'cars' data to 'cars' collection

for i, row in enumerate(data):
    row['_id'] = i
    cars.insert_one(row)  # create a new document for every record

In MongoDB, the **_id** key is the unique identifier for each document. We like
to set each *_id* explicitly for easier processing. If you don't do
this, MongoDB will automatically assign each document an ObjectId, which is long and harder to read.

Also, MongoDB stores data in **collections**, and each item in a collection is called a **document**.
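As an alternative sketch, you can build all the documents first and number them in one step, without changing the original rows. The `rows` list below is made-up sample data standing in for `data`, and the uniqueness check is just a sanity test we added for illustration.

```python
# hypothetical sample rows standing in for the 'data' list
rows = [
    {'Car': 'Sample Car A', 'MPG': 18.0},
    {'Car': 'Sample Car B', 'MPG': 25.5},
    {'Car': 'Sample Car C', 'MPG': 31.0},
]

# build new dictionaries with an explicit '_id'; the originals are left untouched
docs = [{'_id': i, **row} for i, row in enumerate(rows)]

# sanity check: every '_id' must be unique within a collection
ids = [doc['_id'] for doc in docs]
ids_are_unique = len(ids) == len(set(ids))

print(docs[0])
print(ids_are_unique)

# 'docs' could then be sent to MongoDB in a single call, e.g. cars.insert_many(docs)
```

Sending one list with `insert_many` means one call to the database instead of one call per row.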

# Verify all is well by querying the newly created collection

In [None]:
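# create a simple query to verify that data was added

n = 1  # number of records to return
# first argument {} matches every document; the second is a projection,
# where 0 means "leave this field out of the results"
q = cars.find({}, {'_id':0, 'Displacement':0}).limit(n)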

# use list comprehension to display cursor!

[row for row in q]

# Create a collection directly from Python

The easiest way to add your own data to MongoDB is to create a list of dictionary elements and add the list to the database.

In [None]:
# create a simple set of documents

ls = [
    {'_id':0, 'name': 'Lilly Pond', 'city':'NYC', 'state':'NY', 'cum_order_amt':1000},
    {'_id':1, 'name': 'Honey Bee', 'city':'Atlanta', 'state':'GA', 'cum_order_amt':2000},
    {'_id':2, 'name': 'Dee Liver', 'city':'Miami', 'state':'FL', 'cum_order_amt':1500},
    {'_id':3, 'name': 'Jack Pott', 'city':'Chicago', 'state':'IL', 'cum_order_amt':12000},
    {'_id':4, 'name': 'Candy Apple', 'city':'Boston', 'state':'MA', 'cum_order_amt':8000}
    ]

* Notice that we explicitly number the '_id' field. If you don't do this, MongoDB will automatically assign an ObjectId as the identifier, which is kind of hard to work with.

In [None]:
# add the data to a collection

customers = db.customers
customers.drop()  # while we are learning ...
customers.insert_many(ls)

In [None]:
# verify results

[row for i, row in enumerate(customers.find()) if i < 1]  # just display one document

In [None]:
# Create a simple query

q = customers.find()

# use list comprehension to display cursor!

[row for i, row in enumerate(q) if i < 1]

The 'enumerate' function adds a counter to the loop, so counter 'i' is created automatically. This is a nice feature because it saves us from creating a counter, initializing it, and incrementing it in the loop. Python is all about code efficiency!
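Here is a small side-by-side sketch on a plain list (using the customer names from above, no database needed). It also shows `itertools.islice`, a standard-library alternative we are suggesting for taking just the first few items.

```python
from itertools import islice

names = ['Lilly Pond', 'Honey Bee', 'Dee Liver']

# the manual way: create, initialize, and increment our own counter
manual = []
i = 0
for name in names:
    manual.append((i, name))
    i += 1

# the 'enumerate' way: the counter comes for free
with_enumerate = list(enumerate(names))

# take only the first item and stop right there
first_only = list(islice(names, 1))

print(manual == with_enumerate)
print(first_only)
```

One thing to keep in mind: filtering with `if i < 1` still walks through every item, while `islice` stops after the first one. On a MongoDB cursor, `.limit(n)` (used earlier in this module) does the same job on the database side.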

# Module 2 Exercise

* Create a MongoDB collection
* Explicitly identify each '_id' for easier processing
* Collection should contain at least 3 elements with 4 features 
* Create a query to verify results

# Our Solution

This one is pretty easy.

1. Create your own list of at least 3 dictionary elements.
2. Each dictionary element should have at least 4 features.
3. Add the data to a MongoDB collection.
4. Verify that your data was created correctly.

So, go ahead and take a few minutes and ask questions if you have trouble.

<strong><font color=blue>Hint:</font></strong> Just look at what we just did in the previous section.

# What did we learn?

1. we read a CSV file into a Pandas DataFrame
2. we converted the DataFrame into dictionary elements
3. we connected to MongoDB
4. we created a new collection and populated it with cars data
5. we queried the new 'cars' collection to verify that all worked as planned
6. we worked through an exercise to sharpen our skills

## Questions?

# <font color=red>5 minute break</font>