In [1]:
# Import libraries
import requests
import json
import numpy as np

from pymongo import MongoClient
from pprint import pprint

# 01. Flexibly Structured Data

This chapter is about getting a bird's-eye view of the Nobel Prize data's structure. You will relate MongoDB documents, collections, and databases to JSON and Python types. You'll then use filters, operators, and dot notation to explore substructure.

## 01.01 Intro to MongoDB and the Nobel Prize dataset

1. Welcome
>Welcome to my intro to MongoDB with Python. MongoDB is a tool that helps you explore data without requiring it to have a strict, known structure. Because of this, you can handle diverse data together and unify analytics. You can also keep improving and fix issues as your requirements evolve. Most application programming interfaces - or APIs - on the web today expose a certain data format. If you can express your data in this format, then you can get started with MongoDB. Here's what I mean:

2. JavaScript Object Notation (JSON)
>JavaScript is the language of web browsers. JavaScript Object Notation, or JSON, is a common way that web services and client code pass data. JSON is also the basis of MongoDB's data format. So, what is JSON? JSON has two collection structures. Objects map string keys to values, and arrays order values. Values, in turn, are one of a few things.

3. JavaScript Object Notation (JSON)
>Values are strings, numbers, the value "true", the value "false", the value "null", or another object or array. That's it.

4. JSON <> Python
>These JSON data types have equivalents in Python. JSON objects are like Python dictionaries with string-type keys. Arrays are like Python lists. And the values I mentioned also map to Python. For example, null in JSON maps to None in Python.

5. JSON <> Python <> MongoDB
>Now, how are these JSON/Python types expressed in MongoDB? A database maps names to collections. You can access collections by name the same way you would access values in a Python dictionary. A collection, in turn, is like a list of dictionaries, called "documents" by MongoDB. When a dictionary is a value within a document, that's a subdocument. Values in a document can be any of the types I mentioned. MongoDB also supports some types native to Python but not to JSON. Two examples are dates and regular expressions.

6. The Nobel Prize API data(base)
>Let's make concrete how JSON maps to Python and in turn to MongoDB. Here is how I accessed the Nobel Prize API and collected its data into a Mongo database for you. First, I import the requests library, which will get the data from the API. I also import the MongoClient class from pymongo. PyMongo is the official Python driver for MongoDB. Then, I connect to my local database server. I say that I want a database with the name "nobel", and MongoDB creates it. Finally, I gather JSON responses for the "prize" and "laureate" endpoints. I insert them into the "prizes" and "laureates" collections, which Mongo also creates for me.

7. Accessing databases and collections
>Now, let's go over how to count documents in a collection, and how to find one to inspect. First, a note on accessing databases and collections from a client object. One way is square bracket notation, as if a client is a dictionary of databases, with database names as keys. A database in turn is like a dictionary of collections, with collection names as keys. Another way to access things is dot notation. Databases are attributes of a client, and collections are attributes of a database.

8. Count documents in a collection
>To count documents, use the "count_documents" collection method. Pass a filter document to limit what you count. In this case, I want an unfiltered total count, so I pass an empty document as the filter. Finally, you can fetch a document and infer the schema of the raw JSON data given by the Nobel Prize API. Use the find_one method, again with no filter, to grab a document from the collection.

9. Let's practice!
>Now, let's practice. You'll access databases and collections from a connected client. You'll count documents, and you'll inspect them.
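
The JSON-to-Python mapping described in the transcript above can be checked with Python's standard `json` module. The snippet below is a minimal sketch: the JSON text and its field names are made up for illustration and are not taken from the Nobel Prize API.

```python
import json

# Hypothetical JSON text; the field names are invented for this illustration.
raw = '{"name": "Ada", "years": [1903, 1911], "share": 0.5, "active": false, "city": null}'

# Decode the JSON object into a Python dictionary.
doc = json.loads(raw)

# Record the Python type that each JSON value was decoded to.
python_types = {key: type(value).__name__ for key, value in doc.items()}
for key, type_name in python_types.items():
    print(f'{key}: {type_name}')

# Encoding the dictionary again restores the JSON literals false and null.
print(json.dumps(doc))
```

Objects become dictionaries, arrays become lists, and `null` becomes `None`, which is the shape PyMongo documents take in the cells that follow.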

In [2]:
# Client connects to "localhost" by default
client = MongoClient()

# PREPARING THE DATABASE
# Use empty document {} as a filter (named empty_filter to avoid shadowing the built-in filter())
empty_filter = {}
database_collections = {'nobel' : ['prizes', 'laureates'], 
                        'nobel2': ['nobelPrizes', 'laureates']}
for database in database_collections:
    db = client[database]
    for collection_name in database_collections[database]:
        db[collection_name].delete_many(empty_filter)
print('Cleaning finished...')

Cleaning finished...


In [3]:
# READING FROM API SERVICE
# Create local "nobel2" database on the fly
db = client["nobel2"]

years = np.arange(1901, 2021)
path_data = {"nobelPrizes": 'https://api.nobelprize.org/2.0/nobelPrizes', 
             "laureates": 'https://api.nobelprize.org/2.0/laureates'}
for nobelPrizeYear in years:
    for collection_name in path_data:
        # collect the data from the API
        response = requests.get(path_data[collection_name], params={'nobelPrizeYear': nobelPrizeYear})
        
        # Parse the JSON response and keep the list of documents
        documents = response.json()[collection_name]
    
        # Create collections on the fly
        if len(documents) > 0:
            db[collection_name].insert_many(documents) 
    
    # Note: `documents` here holds the result for the last collection in path_data ("laureates")
    if len(documents) > 0:
        print(f'Year {nobelPrizeYear} added...')

print('Loading finished...')

## Create local "nobel2" database on the fly
#db = client["nobel2"]
#
#path_data = {"nobelPrizes": 'https://api.nobelprize.org/2.0/nobelPrizes', 
#             "laureates": 'https://api.nobelprize.org/2.0/laureates'}
#for collection_name in path_data:
#    # collect the data from the API
#    response = requests.get(path_data[collection_name], params={'limit': 100})
#    
#    # convert the data to json
#    documents = response.json()[collection_name]
#    
#    # Create collections on the fly
#    db[collection_name].insert_many(documents)
#
#print('Loading finished...')

Year 1901 added...
Year 1902 added...
Year 1903 added...
Year 1904 added...
Year 1905 added...
Year 1906 added...
Year 1907 added...
Year 1908 added...
Year 1909 added...
Year 1910 added...
Year 1911 added...
Year 1912 added...
Year 1913 added...
Year 1914 added...
Year 1915 added...
Year 1916 added...
Year 1917 added...
Year 1918 added...
Year 1919 added...
Year 1920 added...
Year 1921 added...
Year 1922 added...
Year 1923 added...
Year 1924 added...
Year 1925 added...
Year 1926 added...
Year 1927 added...
Year 1928 added...
Year 1929 added...
Year 1930 added...
Year 1931 added...
Year 1932 added...
Year 1933 added...
Year 1934 added...
Year 1935 added...
Year 1936 added...
Year 1937 added...
Year 1938 added...
Year 1939 added...
Year 1943 added...
Year 1944 added...
Year 1945 added...
Year 1946 added...
Year 1947 added...
Year 1948 added...
Year 1949 added...
Year 1950 added...
Year 1951 added...
Year 1952 added...
Year 1953 added...
Year 1954 added...
Year 1955 added...
Year 1956 ad

In [4]:
# READING FROM LOCAL FILE
# Create local "nobel" database on the fly
db = client["nobel"]

path_file = 'data/{}.json'

for collection_name in ['prizes', 'laureates']:
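    # Load the list of documents from the local JSON file and close the file afterwards
    with open(path_file.format(collection_name)) as json_file:
        documents = json.load(json_file)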
    # Create collections on the fly
    db[collection_name].insert_many(documents)

print('Loading finished...')

Loading finished...


In [5]:
# Use empty document {} as a filter (named empty_filter to avoid shadowing the built-in filter())
empty_filter = {}

# Count documents in a collection
n_prizes = db.prizes.count_documents(empty_filter)
n_laureates = db.laureates.count_documents(empty_filter)

print('Documents in Prizes collections   :', n_prizes)
print('Documents in Laureates collections:', n_laureates)

Documents in Prizes collections   : 590
Documents in Laureates collections: 934


## 01.02 Count documents in a collection

In the video, you learned that a MongoDB database can consist of several collections. Collections, in turn, consist of documents, which store the data.

You will be working with the Nobel laureates database, which we have retrieved as __nobel__. The database has two collections, __prizes__ and __laureates__. In the __prizes__ collection, every document corresponds to a single Nobel prize, and in the __laureates__ collection, every document corresponds to a single Nobel laureate.

Recall that you can access databases by name as attributes of the client, like __client.my_database__ (a connected client is already provided to you as __client__). Similarly, collections can be accessed by name as attributes of databases (__my_database.my_collection__).

**Instructions**

_Use the console on the right to compare the number of laureates and prizes using the __.count_documents()__ method on a collection (don't forget to specify an empty filter document as the argument!), and pick a statement that is __TRUE__._

**Possible Answers**

1. The number of prizes and laureates are equal.
2. Prizes outnumber laureates.
3. __Laureates outnumber prizes.__

**Results**

<font color=darkgreen>Correct! Many laureates have shared prizes!</font>

In [6]:
# Connect to the MongoDB server on localhost (the default)
client = MongoClient()

print(client.nobel.prizes.count_documents({}))
print(client.nobel.laureates.count_documents({}))

590
934


## 01.03 Listing databases and collections

Our __MongoClient__ object is not actually a dictionary, so we can't call __.keys()__ to list the names of accessible databases. The same is true for listing collections of a database. Instead, we can list database names by calling __.list_database_names()__ on a client instance, and we can list collection names by calling __.list_collection_names()__ on a database instance.

**Instructions**

1. Save a list, called db_names, of the names of the databases managed by our connected client.
2. Similarly, save a list, called nobel_coll_names, of the names of the collections managed by the "nobel" database.

**Results**

<font color=darkgreen>Excellent! Did you notice any strange database/collection names? Every Mongo host has 'admin' and 'local' databases for internal bookkeeping (the server used here also lists 'config'). Older MongoDB versions also kept a 'system.indexes' collection in every database to store indexes that make searches faster; it does not appear in the output of the cell below.</font>

In [7]:
# Save a list of names of the databases managed by client
db_names = client.list_database_names()
print(db_names)

# Save a list of names of the collections managed by the "nobel" database
nobel_coll_names = client.nobel.list_collection_names()
print(nobel_coll_names)

['admin', 'config', 'local', 'nobel', 'nobel2']
['prizes', 'laureates']


## 01.04 List fields of a document

The __.find_one()__ method of a collection can be used to retrieve a single document. This method accepts an optional __filter__ argument that specifies the pattern that the document must match. You will learn more about filters in the next lesson, but for now, you can specify no filter or an empty document filter (__{}__), in which case MongoDB will return the document that is first in the internal order of the collection.

This method is useful when you want to learn the structure of documents in the collection.

In Python, the returned document takes the form of a dictionary:

><code>sample_doc = {'id' : 12345, 'name':'Donny Winston', 'instructor': True}</code>

The keys of the dictionary are the (root-level) "fields" of the document, e.g. __'id'__, __'name'__, __'instructor'__.
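
As a small illustration of the point above, the fields of `sample_doc` can be listed without a database connection. The nested document in the second half is hypothetical and only shows that subdocuments have fields of their own.

```python
# The example document from the text above.
sample_doc = {'id': 12345, 'name': 'Donny Winston', 'instructor': True}

# Iterating over a dictionary yields its keys, so list() returns the root-level fields.
fields = list(sample_doc)
print(fields)

# Hypothetical nested document: "prizes" holds an array of subdocuments.
nested_doc = {'name': 'example', 'prizes': [{'year': '1998', 'share': '2'}]}

# Only root-level keys count as fields of the root document.
root_fields = list(nested_doc)
# The first element of the array is a subdocument with its own fields.
subdocument_fields = list(nested_doc['prizes'][0])
print(root_fields)
print(subdocument_fields)
```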

**Instructions**

1. Connect to the nobel database.
2. Fetch one document from each of the prizes and laureates collections, and then take a look at the output in the console to see the format and type of the documents in Python.

Since prize and laureate are dictionaries, you can use the __.keys()__ method to return the keys (i.e. the field names). But it's often more convenient to work with lists of fields.

3. Use the list() constructor to save a list of the fields present in the prize and laureate documents.

**Result**

<font color=darkgreen>Way to fetch those fields! Notice that the prize document contains a laureates field that stores the information on all the laureates sharing the prize, and the laureate document contains a prizes field that stores the info on all the prizes they won.</font>

In [8]:
# Connect to the "nobel" database
db = client.nobel

# Retrieve sample prize and laureate documents
prize = db.prizes.find_one()
laureate = db.laureates.find_one()

# Print the sample prize and laureate documents
pprint(prize)
pprint(laureate)
print(type(laureate))

# Get the fields present in each type of document
prize_fields = list(prize.keys())
laureate_fields = list(laureate.keys())

print(prize_fields)
print(laureate_fields)

{'_id': ObjectId('6035cd48354dd8e354623018'),
 'category': 'physics',
 'laureates': [{'firstname': 'Arthur',
                'id': '960',
                'motivation': '"for the optical tweezers and their application '
                              'to biological systems"',
                'share': '2',
                'surname': 'Ashkin'},
               {'firstname': 'Gérard',
                'id': '961',
                'motivation': '"for their method of generating high-intensity, '
                              'ultra-short optical pulses"',
                'share': '4',
                'surname': 'Mourou'},
               {'firstname': 'Donna',
                'id': '962',
                'motivation': '"for their method of generating high-intensity, '
                              'ultra-short optical pulses"',
                'share': '4',
                'surname': 'Strickland'}],
 'overallMotivation': '“for groundbreaking inventions in the field of laser '
                   

## 01.05 Finding documents

1. Finding documents
>In the last lesson, we learned about databases, collections, and documents. We also learned how data structures in MongoDB relate to those of JSON and Python. In this lesson, we will learn how to query a collection to find documents of interest.

2. An example "laureates" document
>Here we have an example document in the laureates collection. Finding documents in MongoDB reminds me of the work of this particular laureate. The remarkable rays named after him are today called x-rays.

3. Filters as (sub)documents
>To find documents satisfying some criteria, we express those criteria as a document. This filter document mirrors the structure of documents to match in the collection. Imagine a filter document as an x-ray image like the one on the right. This image is one of the first x-ray images ever recorded. Imagine you use it to filter by eye a collection of color photos of people's hands. You might keep ones with five fingers and a ring on the ring finger.

4. The Walrus is Out
>Here's another way to think about filter documents. Hold a pair of pants up to a collection of mammals, one at a time. The pants fit a part of the structure of human beings, but not that of a walrus. Sorry, Wally.

5. Simple filters
>Filter documents can be small. For example, there are 48 laureate documents with a value for the field gender equal to female (the dataset loaded in this notebook gives 51). We can do the same for other fields, like country of death and city of birth. Also, we can merge criteria into a single filter document.

6. Composing filters
>This filter document will have the same form as matching documents. In this case, the filter matches Marie Curie. She discovered the element polonium, named after her native land. Poland was under Russian rule at the time. She hoped that naming an element after it would publicize its lack of independence. Many immigrants won Nobel Prizes, but in what proportion? Later exercises in the course will explore this question.

7. Query operators
>We've seen filters that match exact values in a document. What about satisfying other constraints? Query operators are like different ways to input values on a website form. Some values you select from options, like countries; some you select with a range slider. All operators on fields wrap around their corresponding values. You might operate a drop-down menu to select a country and a slider to pick a value in a range. Query operators in MongoDB work the same way. You place an operator in a filter document to wrap around a field and its acceptable values.

8. Query operators
>For example, let's find documents where the field 'diedCountry' is either France or USA. We use the "in" operator to wrap around acceptable values. Operators in MongoDB have a dollar-sign prefix. Another example. To find documents where 'diedCountry' is not equal to a certain string, we can use the not-equal - or en ee - operator. We can compose query operators for a field.

9. Query operators
>For example, here we query for documents with 'diedCountry' greater than - or gee tee - the string 'Belgium'. At the same time, we query for 'diedCountry' less than or equal to - or el tee ee - the string 'USA'. How rude that MongoDB considers some countries to be greater than or less than others! Actually, this highlights MongoDB's loose requirements for comparisons. Comparison operators order values in lexicographic order, meaning alphabetical order for non-numeric values. This behavior is something to keep in mind when working with raw, unprocessed data.

10. Let's Practice!
>Let's practice constructing filter documents, including those with query operators.
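
To make the filter semantics from this lesson concrete, here is a pure-Python sketch that mimics how a filter document selects documents. It is an illustration only, not MongoDB's implementation: `matches`, `count_matching`, and the tiny `laureates` list are hypothetical, and only root-level fields and the operators named in the transcript are handled.

```python
# Each operator receives the document's value and the operator's argument.
# A missing field is represented by None and never satisfies a range comparison.
OPERATORS = {
    '$in': lambda value, options: value in options,
    '$ne': lambda value, other: value != other,
    '$gt': lambda value, bound: value is not None and value > bound,
    '$gte': lambda value, bound: value is not None and value >= bound,
    '$lt': lambda value, bound: value is not None and value < bound,
    '$lte': lambda value, bound: value is not None and value <= bound,
}


def matches(doc, filter_doc):
    """Return True if doc satisfies every criterion in filter_doc."""
    for field, condition in filter_doc.items():
        value = doc.get(field)  # None when the field is absent
        if isinstance(condition, dict):
            # Operator form, e.g. {'$gt': 'Belgium', '$lte': 'USA'}: all must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain value: exact equality.
            return False
    return True


def count_matching(collection, filter_doc):
    """Count the documents in a list that satisfy filter_doc."""
    return sum(matches(doc, filter_doc) for doc in collection)


# Hypothetical documents, not real laureates.
laureates = [
    {'surname': 'A', 'gender': 'female', 'diedCountry': 'France'},
    {'surname': 'B', 'gender': 'male', 'diedCountry': 'USA'},
    {'surname': 'C', 'gender': 'female', 'diedCountry': 'Belgium'},
    {'surname': 'D', 'gender': 'male'},  # no diedCountry field
]

print(count_matching(laureates, {}))
print(count_matching(laureates, {'gender': 'female', 'diedCountry': 'France'}))
print(count_matching(laureates, {'diedCountry': {'$in': ['France', 'USA']}}))
print(count_matching(laureates, {'diedCountry': {'$ne': 'France'}}))
print(count_matching(laureates, {'diedCountry': {'$gt': 'Belgium', '$lte': 'USA'}}))
```

The range comparison works on strings because Python, like MongoDB by default, orders plain strings character by character, so `'France'` falls between `'Belgium'` and `'USA'`.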

In [9]:
# Filters as (sub)documents
filter_doc = {'born': '1845-03-27',
              'diedCountry': 'Germany',
              'gender': 'male',
              'surname': 'Röntgen'}

print('Filter:', filter_doc)
print(db.laureates.count_documents(filter_doc))

# Simple filters
for simple_filter in [{'gender': 'female'}, {'diedCountry': 'France'}, {'bornCity': 'Warsaw'}]:
    print('Filter: {}\n{}'.format(simple_filter, db.laureates.count_documents(simple_filter)))

Filter: {'born': '1845-03-27', 'diedCountry': 'Germany', 'gender': 'male', 'surname': 'Röntgen'}
1
Filter: {'gender': 'female'}
51
Filter: {'diedCountry': 'France'}
50
Filter: {'bornCity': 'Warsaw'}
2


In [10]:
# Composing filters
filter_doc = {'gender': 'female',
              'diedCountry': 'France',
              'bornCity': 'Warsaw'}
print('Filter:', filter_doc)
print(db.laureates.count_documents(filter_doc))
pprint(db.laureates.find_one(filter_doc))

Filter: {'gender': 'female', 'diedCountry': 'France', 'bornCity': 'Warsaw'}
1
{'_id': ObjectId('6035cd48354dd8e3546232aa'),
 'born': '1867-11-07',
 'bornCity': 'Warsaw',
 'bornCountry': 'Russian Empire (now Poland)',
 'bornCountryCode': 'PL',
 'died': '1934-07-04',
 'diedCity': 'Sallanches',
 'diedCountry': 'France',
 'diedCountryCode': 'FR',
 'firstname': 'Marie',
 'gender': 'female',
 'id': '6',
 'prizes': [{'affiliations': [[]],
             'category': 'physics',
             'motivation': '"in recognition of the extraordinary services they '
                           'have rendered by their joint researches on the '
                           'radiation phenomena discovered by Professor Henri '
                           'Becquerel"',
             'share': '4',
             'year': '1903'},
            {'affiliations': [{'city': 'Paris',
                               'country': 'France',
                               'name': 'Sorbonne University'}],
             'category': 'ch

In [11]:
# Query operators
query_filters = [
    {'diedCountry': {'$in': ['France', 'USA']}},         # Value is any of the listed options: $in: <list>
    {'diedCountry': {'$ne': 'France'}},                  # Not equal: $ne: <value>
    {'diedCountry': {'$gt': 'Belgium', '$lte': 'USA'}},  # Comparison: > is $gt, ≥ is $gte, < is $lt, ≤ is $lte
]
for query_filter in query_filters:
    print('Filter: {}\n{}'.format(query_filter, db.laureates.count_documents(query_filter)))

Filter: {'diedCountry': {'$in': ['France', 'USA']}}
259
Filter: {'diedCountry': {'$ne': 'France'}}
884
Filter: {'diedCountry': {'$gt': 'Belgium', '$lte': 'USA'}}
455


## 01.06 "born" approximation

The __"born"__ field in a laureate collection document records the date of birth of that laureate. __"born"__ values are of the form "YYYY-MM-DD", also known as ISO 8601 format. An example value is "1937-02-01", for February 1st, 1937. This format is convenient for lexicographic comparison. For example, the query

><code>db.laureates.count_documents({"born": {"$lt": "1900"}})</code>

returns the number of laureates with recorded dates of birth earlier than the year 1900 (__"$lt"__ is for "less than"). 

**Instructions**
    
Using the query format above, what is the number of laureates born prior to 1800? What about prior to 1700?

**Possible Answers**

1. 38 prior to 1800, and 0 prior to 1700
2. 324 prior to 1800, and 35 prior to 1700
3. __38 prior to 1800, and 38 prior to 1700__

**Results**

<font color=darkgreen>Correct! The first Nobel Prize was awarded in 1901, and prizes are not awarded posthumously, so... some laureates lived to be more than 200 years old?? It turns out that a laureate's date of birth is recorded as '0000-00-00' when it is not known. Check your assumptions when working with data!</font>
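
The string comparison behind this exercise can be reproduced in plain Python. This is a sketch under the assumption that the strings are plain ASCII, where Python's character-by-character ordering agrees with MongoDB's default string comparison. The sample values are "born" strings that appear elsewhere in this notebook, plus the placeholder.

```python
# Sample "born" values seen in this notebook, plus the '0000-00-00' placeholder.
born_values = ['1937-02-01', '1845-03-27', '0000-00-00', '1867-11-07']

# Zero-padded YYYY-MM-DD strings sort in chronological order.
print(sorted(born_values))

# A bare year works as a bound because it is a prefix of the date format.
before_1900 = [born for born in born_values if born < '1900']
before_1700 = [born for born in born_values if born < '1700']
print(before_1900)
print(before_1700)

# Without zero padding the trick breaks: as strings, '9' sorts after '10'.
print('9' > '10')
```

The placeholder sorts before every real year, which is why it is counted by both the 1800 and the 1700 query.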

In [12]:
print('Prior to 1800:', db.laureates.count_documents({'born': {'$lt':'1800'}}))
print('Prior to 1700:', db.laureates.count_documents({'born': {'$lt':'1700'}}))

Prior to 1800: 38
Prior to 1700: 38


## 01.07 Composing filters

It is often useful to incrementally build up a filter document in order to see the effect of adding constraints one at a time. In this exercise, we will count the number of laureate documents matching some criteria, and we will gradually add criteria.

**Instructions**

1. Create a filter criteria to count laureates who died ("diedCountry") in the USA ("USA"). Save the document count as count.
2. Create a filter to count laureates who died in the United States but were born ("bornCountry") in Germany.
3. Count laureates who died in the USA, were born in Germany, and whose first name ("firstname") was "Albert".
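
One way to build the three filters step by step is dictionary unpacking, which copies the previous filter and adds a key without modifying the original. This is only a sketch of the filter construction; the variable names are hypothetical, and each dictionary would still need to be passed to `db.laureates.count_documents()` to obtain a count.

```python
# Start with a single criterion.
died_in_usa = {'diedCountry': 'USA'}

# Copy the previous filter and add one more criterion each time.
born_in_germany_too = {**died_in_usa, 'bornCountry': 'Germany'}
named_albert_too = {**born_in_germany_too, 'firstname': 'Albert'}

for criteria in (died_in_usa, born_in_germany_too, named_albert_too):
    print(len(criteria), 'criteria:', criteria)
```

Because every key in a filter document must match, adding a criterion can only keep the count the same or lower it.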

**Results**

<font color=darkgreen>Great work!</font>

In [13]:
# Create a filter for laureates who died in the USA
criteria = {'diedCountry': 'USA'}
count = db.laureates.count_documents(criteria)
print(count)

# Create a filter for laureates who died in the USA but were born in Germany
criteria = {'diedCountry': 'USA', 
            'bornCountry': 'Germany'}
count = db.laureates.count_documents(criteria)
print(count)

# Create a filter for Germany-born laureates who died in the USA and with the first name "Albert"
criteria = {'diedCountry': 'USA', 
            'bornCountry': 'Germany',
            'firstname'  : 'Albert'}
count = db.laureates.count_documents(criteria)
print(count)

209
5
1


## 01.08 We've got options

Sometimes, we wish to find documents where a field's value matches any of a set of options. We saw that the __$in__ query operator can be used for this purpose. For example, how many laureates were born in any of "Canada", "Mexico", or "USA"?

If we wish to accept all but one option as a value for a field, we can use the __$ne__ (not equal) operator. For example, how many laureates died in the USA but were not born in the USA?

**Instructions**

1. How many laureates were born in "USA", "Canada", or "Mexico"? Save a filter as criteria and your count as count.
2. How many laureates died in the USA but were not born there? Save your filter as criteria and your count as count.

**Results**

<font color=darkgreen>Good work! $ne is great when you don't want to have to list all other options to $in.</font>

In [14]:
# Save a filter for laureates born in the USA, Canada, or Mexico
criteria = {'bornCountry': { "$in": ['USA', 'Canada', 'Mexico']}}
count = db.laureates.count_documents(criteria)
print(count)


# Save a filter for laureates who died in the USA and were not born there
criteria = {'bornCountry': { "$ne": 'USA'},
            'diedCountry': 'USA'}
count = db.laureates.count_documents(criteria)
print(count)

291
69


## 01.09 Dot notation: reach into substructure

1. Dot notation: reach into substructure
>In the last lesson, we learned how to construct and compose filter documents. We also learned about query operators to express criteria other than simple equality. In this lesson, we are going to learn how to query arrays and subdocuments using dot notation. Dot notation is how MongoDB allows us to query document substructure.

2. A functional density
>Let's use the find_one method to retrieve one of my favorite laureates. Walter Kohn co-developed an important technique for computational chemistry. Notice that the "prizes" field is an array. In this case, the array has one element with data on Kohn's two-way share of the 1998 prize in chemistry. Note also that a laureate may have many affiliations for a prize. The affiliations field of each prize subdocument is an array. To fit text on this slide, I use parentheses to form multiline strings, and I show only part of the document. MongoDB allows you to query document substructure using dot notation. Here's a count of laureates. We reach into the prizes array to query on the affiliations field across prizes. From there, we reach again, this time to query on the name field across affiliations. We count laureates with at least one prize affiliation name as specified. The dot notation gives a full path to a field from the document root. I'm curious how many laureates had an affiliation in Berkeley, CA when they received a prize. Here's my query. Go Bears!

3. No Country for Naipaul
>MongoDB allows you to specify and enforce a schema for a collection, but this is not required. For example, fields do not need to have the same type of value across documents in a collection. In the case of this laureate, there is an "empty" affiliation associated with his prize. Another accommodation in MongoDB is that of field presence. Even root-level fields don't need to be present in all documents. In this document, for example, the "bornCountry" field is absent. Using the "exists" operator, we can query for the existence or non-existence of fields. Here, we see that many laureate documents do not have a "bornCountry" field.

4. Multiple prizes
>Do all laureates have a prizes field? With the help of the exists operator, we see that, in fact, they do. But are any of those prizes fields empty arrays? I hope not. We can check using dot notation to access array elements. This borrows from JavaScript syntax. Here we see a filter document for the criteria that a value exists for the field "0" within the "prizes" field. You can reference an array element by its numerical index using dot notation. Thus, this expression counts documents that have a non-empty prizes array. We see, to our relief, that all laureate documents contain at least one prize. Are there laureates with more than one prize? Yes! We see that Marie Curie is in this group, along with a few other people you may recognize.

5. On to exercises!
>We've learned about dot notation to query array and subdocument fields. We've also learned about the "exists" operator to check for the presence of fields. Now, let's explore our data some more.
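
The way a dotted path reaches through arrays and subdocuments can be sketched in pure Python. This is a simplified illustration, not MongoDB's implementation: `resolve`, `count_equal`, `count_exists`, and the sample documents are all hypothetical.

```python
def resolve(value, path):
    """Collect every value reachable from `value` along a dotted path."""
    if path == '':
        return [value]
    head, _, rest = path.partition('.')
    found = []
    if isinstance(value, dict):
        # Subdocument: follow the named field if it is present.
        if head in value:
            found.extend(resolve(value[head], rest))
    elif isinstance(value, list):
        if head.isdigit():
            # Numeric part: index into the array, e.g. "prizes.1".
            index = int(head)
            if index < len(value):
                found.extend(resolve(value[index], rest))
        else:
            # Field name: look it up in every element of the array.
            for element in value:
                found.extend(resolve(element, path))
    return found


def count_equal(collection, path, target):
    """Count documents where at least one value at `path` equals `target`."""
    return sum(target in resolve(doc, path) for doc in collection)


def count_exists(collection, path, exists=True):
    """Count documents where `path` is present (or absent when exists=False)."""
    return sum(bool(resolve(doc, path)) == exists for doc in collection)


# Hypothetical documents shaped like the laureates collection.
laureates = [
    {'surname': 'One',
     'prizes': [{'year': '1998', 'affiliations': [{'name': 'University X'}]}]},
    {'surname': 'Two',
     'prizes': [{'year': '1903', 'affiliations': [[]]},
                {'year': '1911', 'affiliations': [{'name': 'University Y'}]}]},
    {'surname': 'Three', 'bornCountry': 'Nowhere',
     'prizes': [{'year': '2001', 'affiliations': [{'name': 'University X'}]}]},
]

print(count_equal(laureates, 'prizes.affiliations.name', 'University X'))
print(count_exists(laureates, 'prizes.0'))
print(count_exists(laureates, 'prizes.1'))
print(count_exists(laureates, 'bornCountry', exists=False))
```

The "empty" affiliation `[[]]` contributes no values, so it neither matches a name nor raises an error, which mirrors how the queries in the following cells tolerate irregular documents.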

In [15]:
# A functional density
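criteria = {"firstname": "Walter", "surname": "Kohn"}
# Pass the filter so that Walter Kohn's document is returned. The output recorded below
# came from an unfiltered find_one() call, which returned the first document instead.
pprint(db.laureates.find_one(criteria))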

criteria = {"prizes.affiliations.name": ("University of California")}
print(criteria, db.laureates.count_documents(criteria))

criteria = {"prizes.affiliations.city": ("Berkeley, CA")}
print(criteria, db.laureates.count_documents(criteria))

criteria = {'surname': 'Naipaul'}
print(criteria, db.laureates.count_documents(criteria))

criteria = {"bornCountry": {"$exists": False}}
print(criteria, db.laureates.count_documents(criteria))

{'_id': ObjectId('6035cd48354dd8e354623266'),
 'born': '1853-07-18',
 'bornCity': 'Arnhem',
 'bornCountry': 'the Netherlands',
 'bornCountryCode': 'NL',
 'died': '1928-02-04',
 'diedCountry': 'the Netherlands',
 'diedCountryCode': 'NL',
 'firstname': 'Hendrik Antoon',
 'gender': 'male',
 'id': '2',
 'prizes': [{'affiliations': [{'city': 'Leiden',
                               'country': 'the Netherlands',
                               'name': 'Leiden University'}],
             'category': 'physics',
             'motivation': '"in recognition of the extraordinary service they '
                           'rendered by their researches into the influence of '
                           'magnetism upon radiation phenomena"',
             'share': '2',
             'year': '1902'}],
 'surname': 'Lorentz'}
{'prizes.affiliations.name': 'University of California'} 34
{'prizes.affiliations.city': 'Berkeley, CA'} 19
{'surname': 'Naipaul'} 1
{'bornCountry': {'$exists': False}} 33


In [16]:
# Multiple prizes
criteria = {}
print('Total documents:', db.laureates.count_documents(criteria))

criteria = {"prizes": {"$exists": True}}
print('With prizes:', db.laureates.count_documents(criteria))

criteria = {"prizes.0": {"$exists": True}}
print('With at least one prize:', db.laureates.count_documents(criteria))

criteria = {"prizes.1": {"$exists": True}}
print('With more than one prize:', db.laureates.count_documents(criteria))

Total documents: 934
With prizes: 934
With at least one prize: 934
With more than one prize: 6


## 01.10 Choosing tools
(The __nobel__ database is available in the console as __db__, and Walter Kohn's document is available to you as __doc__. Feel free to examine the structure of the document __doc__ in the console, and play around with database queries!)

**Instructions**
We saw from his laureate document that Walter Kohn's country of birth was "Austria" and that his prize affiliation country was "USA". If we want to count the number of laureates born in Austria with a prize affiliation country that is not also Austria, what MongoDB concepts/tools should we use?

**Possible Answers**

1. __dot notation and the \$ne operator__
2. dot notation and the \$exists operator
3. dot notation and the \$in operator


**Results**

<font color=darkgreen>Correct! We will need dot notation to specify criteria for the prize affiliation country. We will need $ne to exclude the value "Austria".</font>

In [17]:
criteria = {'bornCountry': 'Austria', 
            'prizes.affiliations.country': {'$ne': 'Austria'}}
print(db.laureates.count_documents(criteria))

10


## 01.11 Starting our ascent

Throughout this course, we will gradually build up a set of tools to examine the proportion of Nobel prizes that were awarded to immigrants. In this exercise, you will answer a limited but related question using tools we have introduced so far.

We saw from his laureate document that Walter Kohn's country of birth was "Austria" and that his prize affiliation country was "USA". Count the number of laureates born in Austria with a prize affiliation country that is not also Austria.
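
One subtlety is worth spelling out. As far as I understand MongoDB's semantics, `$ne` on a path that runs through arrays matches a document only when none of the values at that path equals the given value, and it also matches documents where the path is missing. The pure-Python sketch below mimics that reading on hypothetical documents; the helper names are invented for illustration.

```python
def affiliation_countries(laureate):
    """Collect every prizes.affiliations.country value in a laureate document."""
    countries = []
    for prize in laureate.get('prizes', []):
        for affiliation in prize.get('affiliations', []):
            # Skip "empty" affiliations such as [] that carry no country.
            if isinstance(affiliation, dict) and 'country' in affiliation:
                countries.append(affiliation['country'])
    return countries


def born_in_but_not_affiliated_in(laureate, country):
    """Mimic {'bornCountry': country, 'prizes.affiliations.country': {'$ne': country}}."""
    return (laureate.get('bornCountry') == country
            and country not in affiliation_countries(laureate))


# Hypothetical documents, not real laureates.
laureates = [
    {'bornCountry': 'Austria', 'prizes': [{'affiliations': [{'country': 'USA'}]}]},
    {'bornCountry': 'Austria', 'prizes': [{'affiliations': [{'country': 'Austria'}]}]},
    {'bornCountry': 'Austria', 'prizes': [{'affiliations': [{'country': 'Austria'}]},
                                          {'affiliations': [{'country': 'USA'}]}]},
    {'bornCountry': 'Austria', 'prizes': [{'affiliations': [[]]}]},
    {'bornCountry': 'Germany', 'prizes': [{'affiliations': [{'country': 'USA'}]}]},
]

count = sum(born_in_but_not_affiliated_in(doc, 'Austria') for doc in laureates)
print(count)
```

Under this reading, the third document is excluded even though one of its affiliations is in the USA, and the fourth is included even though it has no affiliation country at all.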

**Instructions**

1. Save a filter criteria for laureates born in (bornCountry) "Austria" with a non-Austria prizes.affiliations.country.
2. Save your count of laureates as count.

**Results**

<font color=darkgreen>I am doting about your dot notation!</font>

In [18]:
# Filter for laureates born in Austria with non-Austria prize affiliation
criteria = {'bornCountry': 'Austria', 
            'prizes.affiliations.country': {"$ne": 'Austria'}}

# Count the number of such laureates
count = db.laureates.count_documents(criteria)
print(count)

10


## 01.12 Our 'born' approximation, and a special laureate

We saw earlier that the laureates collection encodes uncertainty about birthdate in a special way. When a birthdate is unknown, the __"born"__ field has the value __"0000-00-00"__. Thus,

><code>db.laureates.count_documents({"born": "0000-00-00"})</code>

counts the number of such laureates. Or does it?

We also saw that the total number of laureate prizes is more than the number of laureates -- some were awarded more than one prize. There is one in particular with a whopping three prizes, and this laureate holds key information to aid our quest to determine the proportion of prizes awarded to immigrants.

**Instructions**

1. Use a filter document (criteria) to count the documents that don't have a "born" field.
2. Use a filter document (criteria) to find a document for a laureate with at least three elements in its "prizes" array. In other words, does a third element exist for the array? Remember about the zero-based indexing!

**Results**

<font color=darkgreen>Well done. Take a look at the document in the console. What about this laureate presents a challenge to our goal of partitioning laureates into immigrant and non-immigrant Prize recipients?</font>

In [19]:
criteria = {"born": "0000-00-00"}
print(criteria, db.laureates.count_documents(criteria))

# Filter for documents without a "born" field
criteria = {'born': {'$exists': False}}
count = db.laureates.count_documents(criteria)
print(criteria, count)

# Filter for laureates with at least three prizes
criteria = {"prizes.2": {'$exists': True}}
count = db.laureates.count_documents(criteria)
print(criteria, count)
doc = db.laureates.find_one(criteria)
pprint(doc)

{'born': '0000-00-00'} 38
{'born': {'$exists': False}} 0
{'prizes.2': {'$exists': True}} 1
{'_id': ObjectId('6035cd48354dd8e35462339c'),
 'born': '0000-00-00',
 'died': '0000-00-00',
 'firstname': 'Comité international de la Croix Rouge (International Committee '
              'of the Red Cross)',
 'gender': 'org',
 'id': '482',
 'prizes': [{'affiliations': [[]],
             'category': 'peace',
             'share': '1',
             'year': '1917'},
            {'affiliations': [[]],
             'category': 'peace',
             'share': '1',
             'year': '1944'},
            {'affiliations': [[]],
             'category': 'peace',
             'share': '2',
             'year': '1963'}]}


# Additional material
- Datacamp course: https://learn.datacamp.com/courses/introduction-to-using-mongodb-for-data-science-with-python
- Data source: https://www.nobelprize.org/about/developer-zone-2/