# Creating Databases and Collections in _pymongo_

Continuing the work of sections 4-02 and 4-03, we will again be using _pymongo_ to interact with MongoDB. We should begin by re-establishing our connection to the software:

In [1]:
from pymongo import MongoClient
import urllib.parse  # import the submodule explicitly so urllib.parse.quote_plus is available

# final version
con_str = "mongodb+srv://michaelm:" + urllib.parse.quote_plus("B!gD@t@T3(h") + "@cluster0.m2kzp.mongodb.net/sample_analytics?retryWrites=true&w=majority"

client = MongoClient(con_str)
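
The password is passed through `quote_plus` because characters such as `@`, `:` and `/` act as delimiters inside the connection string. A minimal sketch of that step is below. The `build_uri` helper, the host name, the example credentials and the `MONGO_USER` / `MONGO_PASSWORD` environment variable names are all assumptions made for illustration, not part of _pymongo_ or of the course setup:

```python
import os
from urllib.parse import quote_plus


def build_uri(user, password, host, database):
    """Assemble a mongodb+srv URI with percent-encoded credentials."""
    return (
        "mongodb+srv://"
        + quote_plus(user) + ":" + quote_plus(password)
        + "@" + host + "/" + database
        + "?retryWrites=true&w=majority"
    )


# Hypothetical values: read from the environment if set, else use placeholders.
user = os.environ.get("MONGO_USER", "example_user")
password = os.environ.get("MONGO_PASSWORD", "p@ss/w:rd")

uri = build_uri(user, password, "cluster0.example.mongodb.net", "sample_analytics")
print(uri)
```

Reading the credentials from the environment is one way to keep them out of the notebook itself; the resulting string can be handed to `MongoClient` exactly as `con_str` is above.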


We'll also create some basic data to add to MongoDB:

In [2]:
bdtvdata = {"day1": {"topics": ["module intro", "big data", "web scraping", "APIs"],
                 "staff": ["Micheal", "Jordon"]
                },
        "day2": {"topics": ["noSQL", "MapReduce", "spark", "orchestration", "FaaS"],
                 "staff": ["Micheal", "Jordon", "AK", "Liping"]
                },
        "day3": {"topics": "visualisation",
                 "staff": ["Micheal", "Jordon", "Liping"]
                },
        "day4": {"topics": ["tableau", "json", "mongodb"],
                 "staff": ["Micheal", "Jordon", "Liping"]
                },
        "day5": {"topics": False,
                 "staff": False
                }
}

To put this in MongoDB we should first create a database. _pymongo_ makes this relatively easy to do:

In [3]:
newdb = client["bdtv"]

We can check this has worked by running the following:

In [4]:
print(client.list_database_names())



['bdtv', 'sample_airbnb', 'sample_analytics', 'sample_geospatial', 'sample_mflix', 'sample_restaurants', 'sample_supplies', 'sample_training', 'sample_weatherdata', 'admin', 'local']


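Where is our database? On a first run, `bdtv` would be missing from this list (the output shown above appears to come from a later run, after documents had already been added). It's a quirk of MongoDB that even if a database is set up, it will not show until we add at least one document (record). But before we do that, we should add a collection: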

In [5]:
newcollect = newdb["schedule"]

Again, our collection should be built but we won't see it until we add a document. So let's get on with it:

In [6]:
for key in bdtvdata.keys():
    newcollect.insert_one({key: bdtvdata[key]})
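
The loop above stores each day under its own top-level key, so every document ends up with a different field name. One possible alternative, sketched here on the assumption that a shared `day` field is easier to query, is to reshape the dictionary into a list of documents and send them in a single `insert_many` call. The `to_documents` helper and the cut-down `sample` data are hypothetical:

```python
def to_documents(schedule):
    """Turn {"day1": {...}, ...} into [{"day": "day1", ...}, ...]."""
    return [{"day": day, **details} for day, details in schedule.items()]


# A cut-down stand-in for bdtvdata, so the sketch runs on its own.
sample = {
    "day1": {"topics": ["module intro", "big data"], "staff": ["AK"]},
    "day5": {"topics": False, "staff": False},
}

docs = to_documents(sample)
print(docs[0])

# With a live connection, the documents could then be written in one call:
#     newcollect.insert_many(docs)
# and a single day looked up by value rather than by field name:
#     newcollect.find_one({"day": "day1"})
```

This is a design choice rather than a requirement: MongoDB accepts either shape, but a common field name lets one filter describe any day.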

We can check this has worked by printing database names (as before):

In [7]:
print(client.list_database_names())

['bdtv', 'sample_airbnb', 'sample_analytics', 'sample_geospatial', 'sample_mflix', 'sample_restaurants', 'sample_supplies', 'sample_training', 'sample_weatherdata', 'admin', 'local']
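
Rather than scanning the printed list by eye, the same check can be wrapped in a small function. The sketch below relies on `list_database_names()` and `list_collection_names()`, which _pymongo_ provides on the client and database objects; the `has_collection` helper and the two stub classes (used only so the example runs without a server) are hypothetical:

```python
def has_collection(client, db_name, collection_name):
    """Return True only if both the database and the collection exist."""
    if db_name not in client.list_database_names():
        return False
    return collection_name in client[db_name].list_collection_names()


class _StubDatabase:
    """Stand-in for a pymongo Database, for demonstration only."""

    def __init__(self, collections):
        self._collections = collections

    def list_collection_names(self):
        return list(self._collections)


class _StubClient:
    """Stand-in for a pymongo MongoClient, for demonstration only."""

    def __init__(self, databases):
        self._databases = databases

    def list_database_names(self):
        return list(self._databases)

    def __getitem__(self, name):
        return _StubDatabase(self._databases.get(name, []))


stub_client = _StubClient({"bdtv": ["schedule"], "admin": []})
print(has_collection(stub_client, "bdtv", "schedule"))
print(has_collection(stub_client, "bdtv", "iris"))
```

With a real connection, `has_collection(client, "bdtv", "schedule")` would be expected to return `True` only after the first document has been inserted.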


We can also retrieve some data from the collection (using the techniques in 4-03):

In [8]:
record = newcollect.find_one()
record

## Creating a Collection from a _Pandas_ DataFrame

Again, this is a very easy process (this is much easier than 4-03, right?). We'll begin by importing some data:

In [9]:
import pandas as pd

irisdf = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
irisdf.head()



| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


Here we've simply imported the well-known "Iris" dataset. We can stick with the same database, but we should build a new collection:

In [10]:
newcollect = newdb["iris"]

All we need to do now is convert our DataFrame to a dictionary, and we can insert as before. Note that we convert each key (currently an integer row index) into a string, as MongoDB expects document keys to be strings:

In [11]:
data_dict = irisdf.to_dict('index') # convert to dictionary

for record in data_dict:
    newcollect.insert_one({str(record): data_dict[record]}) # MongoDB expects string keys
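
Inserting row by row makes one call per record. A possible alternative, assuming the rows are available as a list of dictionaries (which is what `irisdf.to_dict('records')` returns), is to pass the whole list to `insert_many`. The sketch below uses two hand-written rows in place of the DataFrame, and a hypothetical `prepare_records` helper that forces every key to a string:

```python
def prepare_records(rows):
    """Return copies of the rows with every key converted to a string."""
    return [{str(key): value for key, value in row.items()} for row in rows]


# Two hand-written rows standing in for irisdf.to_dict('records').
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
     "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4,
     "petal_width": 0.2, "species": "setosa"},
]

documents = prepare_records(rows)
print(len(documents), "documents ready to insert")

# With a live connection, all rows could then be written in one call:
#     newcollect.insert_many(documents)
```

In this layout each flower becomes one flat document with the column names as fields, instead of being nested under its row number.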

And that's it! As you can see, and as we would expect with NoSQL (this is _schema on read_, not _schema on write_, after all), it is relatively easy and quick to add data to MongoDB ... all we need is a Python dictionary!