# Intro to MongoDB and the Nobel Prize Dataset: Notes

Welcome to my intro to MongoDB with Python. MongoDB is a tool that helps you explore data without requiring it to have a strict, known structure. Because of this, you can handle diverse data together and unify analytics. You can also keep improving your data model and fixing issues as your requirements evolve. Most application programming interfaces, or APIs, on the web today expose a certain data format. If you can express your data in this format, then you can get started with MongoDB. Here's what I mean:


## JavaScript Object Notation (JSON)

JavaScript is the language of web browsers. JavaScript Object Notation, or JSON, is a common way that web services and client code pass data. JSON is also the basis of MongoDB's data format. So, what is JSON? JSON has two collection structures: objects and arrays.

Objects map string keys to values, and arrays order values. Values, in turn, are one of a few things. Values are strings, numbers, the value "true", the value "false", the value "null", or another object or array. That's it.

- objects: {}, {`string`: `value`}, {`string1`: `value1`, ...} {{0}}

- arrays: [], [`value`], [`value1`, ...] {{1}}

- values:  `string`, `number`, `true`, `false`, `null`, `object`, `array` {{2}}

## JSON $\longleftrightarrow$ Python

These JSON data types have equivalents in Python.

JSON objects are like Python dictionaries with string-type keys.

- objects: {}, {`string`: `value`}, {`string1`: `value1`, ...}
      ---> *dictionaries* (with `str` keys) {{0}}

Arrays are like Python lists.

- arrays: [], [`value`], [`value1`, ...]
      --->   *lists* {{1}}


And the values I mentioned also map to Python. For example, null in JSON maps to None in Python.


- values:  `string`, `number`, `true`, `false`, `null`, `object`, `array`
      ---> str, int/float, True, False, None, dict, list {{2}}
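
To see this mapping in action, here is a minimal sketch that uses Python's standard `json` module. The JSON text and its field names are made up for illustration; they are not part of the Nobel Prize data.

```python
import json

# Made-up JSON text that uses every JSON value type
text = (
    '{"name": "Ada", "age": 36, "score": 9.5, "active": true, '
    '"retired": false, "nickname": null, '
    '"tags": ["a", "b"], "address": {"city": "London"}}'
)

# Decode the JSON object into a Python dictionary
doc = json.loads(text)

# Record which Python type each JSON value was decoded into
python_types = {key: type(value).__name__ for key, value in doc.items()}
print(python_types)
```

Each JSON value lands in the Python type listed above: `true` and `false` become `bool`, `null` becomes `None`, and so on.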

## JSON $\longleftrightarrow$ Python $\longleftrightarrow$ MongoDB

Now, how are these JSON/Python types expressed in MongoDB?



A database maps names to collections. You can access collections by name the same way you would access values in a Python dictionary.

- objects: {}, {`string`: `value`}, {`string1`: `value1`, ...}
      ---> *dictionaries* (with `str` keys)
      ---> ***databases***, ***documents***, ***subdocuments*** {{0}}


A collection, in turn, is like a list of dictionaries, called "documents" by MongoDB. When a dictionary is a value within a document, that's a subdocument.

- arrays: [], [`value`], [`value1`, ...]
      --->   *lists*
      --->  ***collections*** (of documents),  ***arrays*** (within documents) {{1}}
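
One way to picture this nesting is with plain Python containers. The sketch below is only an in-memory analogy, not a real MongoDB database, and the "books" collection and its fields are hypothetical.

```python
# Analogy only: a "database" as a dictionary that maps names to collections
database = {
    "books": [  # a "collection": a list of documents
        {
            "title": "Example A",
            "tags": ["x", "y"],             # an array within a document
            "publisher": {"name": "P1"},    # a subdocument
        },
        {
            "title": "Example B",
            "tags": [],
            "publisher": {"name": "P2"},
        },
    ],
}

collection = database["books"]       # access a collection by name
document = collection[0]             # a document is a dictionary
subdocument = document["publisher"]  # a dictionary nested in a document
array = document["tags"]             # a list nested in a document
```

A real collection is not literally a list, so indexing like `collection[0]` is part of the analogy only.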

Values in a document can be any of the types I mentioned. MongoDB also supports some types native to Python but not to JSON. Two examples are dates and regular expressions.

- values:  `string`, `number`, `true`, `false`, `null`, `object`, `array`
      ---> str, int/float, True, False, None, dict, list, <datetime>, <re pattern>, ...
      ---> string, int/long/double, true, false, null, object, array, <date>, <regex>, ... {{2}}
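
To make the point about extra types concrete, the following sketch checks which values in a made-up document plain JSON can encode. It uses only the standard library; the document and the helper `json_encodable` are hypothetical names for illustration.

```python
import json
import re
from datetime import datetime

# Made-up document that mixes JSON-friendly and Python-only values
document = {
    "name": "example",
    "created": datetime(2020, 1, 1),
    "pattern": re.compile(r"^No"),
}

def json_encodable(value):
    """Return True if the standard json module can serialize the value."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False

encodable = {key: json_encodable(value) for key, value in document.items()}
print(encodable)
```

The string passes, while the `datetime` and the compiled pattern do not. Pymongo, by contrast, accepts both as document values and stores them as MongoDB's date and regex types.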

## The Nobel Prize API data(base)

Let's make concrete how JSON maps to Python and in turn to MongoDB. Here is how I accessed the Nobel Prize API and collected its data into a Mongo database for you.

First, I import the requests library, which will get the data from the API. I also import the MongoClient class from pymongo. Pymongo is the official Python driver for MongoDB.

In [None]:
import requests
from pymongo import MongoClient

Then, I connect to my local database server. I ask for a database named "nobel", and MongoDB creates it on the fly the first time I write data to it.

In [None]:
# Client connects to "localhost" by default
client = MongoClient()
# Create local "nobel" database on the fly
db = client["nobel"]

Finally, I gather JSON responses for the "prize" and "laureate" endpoints. I insert them into the "prizes" and "laureates" collections, which Mongo also creates for me.

In [None]:
# API documented at https://nobelprize.readme.io/docs/prize
for collection_name in ["prizes", "laureates"]:
    singular = collection_name[:-1]
    response = requests.get(
        "http://api.nobelprize.org/v1/{}.json".format(singular))
    # Fail early on an HTTP error instead of parsing an error page
    response.raise_for_status()
    documents = response.json()[collection_name]
    # Create collections on the fly
    db[collection_name].insert_many(documents)
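
If the loop feels dense, this offline sketch walks through its two small tricks: building each endpoint URL from the collection name, and unwrapping the list of documents from the response body. The `fake_body` string is a hypothetical, heavily trimmed stand-in for a real response, so no network or database is needed.

```python
import json

# Step 1: drop the trailing "s" to turn a collection name into an endpoint name
endpoint_urls = {}
for collection_name in ["prizes", "laureates"]:
    singular = collection_name[:-1]
    endpoint_urls[collection_name] = (
        "http://api.nobelprize.org/v1/{}.json".format(singular))

# Step 2: the response body is assumed to be an object whose single key is
# the collection name, mapping to a list of documents (placeholder content)
fake_body = '{"prizes": [{"placeholder_field": 1}, {"placeholder_field": 2}]}'
documents = json.loads(fake_body)["prizes"]

print(endpoint_urls)
print(len(documents))
```

The resulting `documents` list is the kind of value that `insert_many` expects: a list of dictionaries.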

## Counting Documents, and Finding One to Inspect

Now, let's go over how to count documents in a collection, and how to find one to inspect.

First, a note on accessing databases and collections from a client object. One way is square bracket notation, as if a client is a dictionary of databases, with database names as keys. A database in turn is like a dictionary of collections, with collection names as keys. Another way to access things is dot notation. Databases are attributes of a client, and collections are attributes of a database.

In [None]:
# You can also access dbs and collections as attributes
assert client.nobel == db
assert db.prizes == db["prizes"]

To count documents, use the "count_documents" collection method. Pass a filter document to limit what you count. In this case, I want an unfiltered total count, so I pass an empty document as the filter.

In [None]:
# Count documents
n_prizes = db.prizes.count_documents({})
n_laureates = db.laureates.count_documents({})
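
To build intuition for what a filter document does, here is a pure-Python sketch that mimics only the simplest case: exact matches on top-level fields. The helper `count_matching` and the sample documents are hypothetical; the real "count_documents" method runs on the server and supports far richer filters.

```python
def count_matching(documents, filter_doc):
    """Count dictionaries whose fields equal every key/value in filter_doc."""
    return sum(
        all(key in doc and doc[key] == value
            for key, value in filter_doc.items())
        for doc in documents
    )

# Made-up documents with placeholder values
sample = [
    {"category": "a", "year": "y1"},
    {"category": "b", "year": "y1"},
    {"category": "a", "year": "y2"},
]

total = count_matching(sample, {})                   # empty filter matches all
only_a = count_matching(sample, {"category": "a"})   # one condition
a_in_y1 = count_matching(sample, {"category": "a", "year": "y1"})
```

An empty filter places no conditions on a document, which is why it yields the total count.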

Finally, you can fetch a document and infer the schema of the raw JSON data given by the Nobel Prize API. Use the find_one method, again with no filter, to grab a document from the collection.

In [None]:
# Find one document to inspect
doc = db.prizes.find_one({})
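
Once you have a document, you may want a compact view of its shape. The sketch below is one possible helper, not part of pymongo, that replaces each value with the name of its type. The sample document is a hypothetical placeholder, not actual API output.

```python
def schema_of(value):
    """Summarize a value as type names, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {key: schema_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assume the first element is representative of the whole list
        return [schema_of(value[0])] if value else []
    return type(value).__name__

# Placeholder document standing in for the result of find_one
sample_doc = {
    "year": "2000",
    "laureates": [{"id": "1", "share": "2"}],
}

schema = schema_of(sample_doc)
print(schema)
```

With a connected client, you could pass the result of `db.prizes.find_one({})` to the same helper.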

## Let's practice!

Now, let's practice. You'll access databases and collections from a connected client. You'll count documents, and you'll inspect them.
