# Intro to MongoDB and the Nobel Prize Dataset

## MongoDB Structure

- Data objects represented by ***documents***
- Documents organized into ***collections***
- Collections make up a ***database***

<br>

## Data Structure

object → {`field`: `value`, `field1`: `value1`, ...}


fields: `string`
<br>
values:  `string`, `int/double`, `true`, `false`, `null`, `array`, `object`, ...

<br>

*example:*

```
{
    "name": "Sue",
    "age": 28,
    "lawSpecialties": ["copyright", "tax"],
    "canMeet": {
        "mon": true,
        "tues": false,
        "wed": true,
        "thurs": true,
        "fri": false
    }
}
```

<br>

## JavaScript Object Notation (JSON)

object → {`string`: `value`, `string1`: `value1`, ...}



values: `string`, `number`, `true`, `false`, `null`, `object`, `array`

<br>

## JSON $\longleftrightarrow$ Python

objects → {`string`: `value`,`string1`: `value1`, ...}

---> *dictionaries* (with `str` keys)

arrays → [`value`, `value1`, ...]

---> *lists*

values: `string`, `number`, `true`, `false`, `null`, `object`, `array`

```python
str, int, float, True, False, None, dict, list
```
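
As a quick check of the mapping above, the standard-library `json` module can parse JSON text into Python objects. This is a minimal sketch; the JSON text and its field names are made up for illustration and are not part of the Nobel dataset.

```python
import json

# Hypothetical JSON text, chosen only to exercise every JSON value type
text = """
{
    "title": "Atlas",
    "pages": 320,
    "rating": 4.5,
    "inPrint": true,
    "sequel": null,
    "tags": ["maps", "travel"],
    "publisher": {"city": "Oslo"}
}
"""

# json.loads converts a JSON object into a dict with str keys
doc = json.loads(text)

# Print the Python type that each JSON value was converted to
for key, value in doc.items():
    print(f"{key}: {type(value).__name__}")
```

A JSON `number` becomes an `int` or a `float` depending on whether it has a fractional part, which is why the Python side lists both types.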


<br>

- Construct a dictionary that mirrors the example data structure above.

In [None]:
sample_dict = {__}

<br>

## Accessing MongoDB

We can access our MongoDB databases using an instance of `MongoClient` from the `pymongo` package.

In [None]:
from pymongo import MongoClient

client = MongoClient()
print(client)

<br>

You can access databases and collections either as attributes or with dictionary-style key lookup. Both forms return the same object.

In [None]:
client.nobel == client["nobel"]

In [None]:
client.nobel.prizes == client["nobel"]["prizes"]

We can also store a reference to the `nobel` database in a variable for later use.

In [None]:
db = client.nobel
print(db)

<br>

## Searching for documents 

Let's see what a document in the `prizes` collection looks like using the `find_one()` method. The method takes an optional `filter` argument; passing an empty filter (`{}`) is the same as passing no filter. In Python, the returned document takes the form of a dictionary whose keys are the (root-level) "fields" of the document.

In [None]:
db.__.__()

We can now add a filter to our search so that the returned `prizes` document contains data for a Nobel Prize in physics.

In [None]:
criteria = {__}
db.__.__(__)

<br>

You can iterate over the documents in a collection to gather information from each one. However, a collection is not directly iterable, so `for doc in <collection>` does not work. Instead, the `find()` method returns an iterable called a _cursor_, so we write `for doc in <collection>.find()` to iterate over documents.

- Save a list of the fields present in each type of document to `prize_fields` and `laureate_fields`. Recall that the `keys()` method of a dictionary returns a _view_ of its keys, so you need to pass this view to Python's `list` constructor to obtain a list.

In [None]:
# Get lists of the fields present in each type of document
prize_fields = __(db.__.__({}).__())
laureate_fields = __(db.__.__({}).__())

print(prize_fields)
print(laureate_fields)
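
The difference between a key view and a list can be seen with a plain dictionary. This is a small sketch in which `doc` is a stand-in for a document returned by `find_one()`; its field names are hypothetical and are not taken from the Nobel dataset.

```python
# Stand-in for a returned document (hypothetical fields)
doc = {"_id": 1, "title": "Atlas", "tags": ["maps", "travel"]}

# keys() returns a view object, not a list
keys_view = doc.keys()
print(type(keys_view).__name__)

# Passing the view to list() produces an independent list of the keys
fields = list(keys_view)
print(fields)

# The view tracks later changes to the dictionary; the list does not
doc["pages"] = 320
print(len(keys_view), len(fields))
```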

- Using `find()` to iterate over laureate documents, compute the total number of prizes across all laureates. The length of `doc["prizes"]` for a laureate document `doc` is the number of prizes won by that laureate. Store the sum in the variable `count`.

In [None]:
# Compute the total number of laureate prizes
count = __(__(__) for __ in __.__.__(__))
print(count)