## The Nobel Prize API data(base)

You will be working with the Nobel laureates database which we have retrieved as nobel. The database has two collections, prizes and laureates. In the prizes collection, every document correspond to a single Nobel prize, and in the laureates collection - to a single Nobel laureate.

Recall that you can access databases by name as attributes of the client, like client.my_database (a connected client is already provided to you as client). Similarly, collections can be accessed by name as attributes of databases (my_database.my_collection).

Use the console on the right to compare the number of laureates and prizes using the .count_documents() method on a collection (don't forget to specify an empty filter document as the argument!), and pick a statement that is TRUE.

In [2]:
import requests
from pymongo import MongoClient
# Client connects to "localhost" by default
client = MongoClient()
# Create local "nobel" database on the fly
db = client["nobel"]
for collection_name in ["prizes", "laureates"]:
    # collect the data from the API
    response = requests.get(
        "http://api.nobelprize.org/v1/{}.json".\
        format(collection_name[:-1] ))
    # convert the data to json
    documents = response.json()[collection_name]
    # Create collections on the fly
    db[collection_name].insert_many(documents)

## Listing databases and collections
Our MongoClient object is not actually a dictionary, so we can't call keys() to list the names of accessible databases. The same is true for listing collections of a database. Instead, we can list database names by calling .list_database_names() on a client instance, and we can list collection names by calling .list_collection_names() on a database instance.

In [58]:
# Save a list of names of the databases managed by client
db_names = client.list_database_names()
print(db_names)

# Save a list of names of the collections managed by the "nobel" database
nobel_coll_names = client.nobel.list_collection_names()
print(nobel_coll_names)

['admin', 'config', 'local', 'nobel', 'nobel_prize_db']
['laureates', 'prizes']


## Accessing databases and collections

In [4]:
# client is a dictionary of databases
db = client["nobel"]
# database is a dictionary of collections
prizes_collection = db["prizes"]

#### 
# databases are attributes of a client
#db = client.nobel
# collections are attributes of databases
#prizes_collection = db.prizes

## Count documents in a collection

In [6]:
# Use empty document {} as a filter
filter = {}
# Count documents in a collection
n_prizes = db.prizes.count_documents(filter)
n_laureates = db.laureates.count_documents(filter)

# Find one document to inspect
doc = db.prizes.find_one(filter)

doc

{'_id': ObjectId('67f81e84e3b001d8eab41e0f'),
 'year': '2024',
 'category': 'chemistry',
 'laureates': [{'id': '1039',
   'firstname': 'David',
   'surname': 'Baker',
   'motivation': '"for computational protein design"',
   'share': '2'},
  {'id': '1040',
   'firstname': 'Demis',
   'surname': 'Hassabis',
   'motivation': '"for protein structure prediction"',
   'share': '4'},
  {'id': '1041',
   'firstname': 'John',
   'surname': 'Jumper',
   'motivation': '"for protein structure prediction"',
   'share': '4'}]}

## Filters as (sub)documents
Count documents by providing a filter document to match.

In [8]:
filter_doc = {
'born': '1845-03-27',
'diedCountry': 'Germany',
'gender': 'male',
'surname': 'Röntgen'
}
db.laureates.count_documents(filter_doc)

4

## Simple filters

In [18]:
db.laureates.count_documents({'gender': 'female'})

260

In [20]:
db.laureates.count_documents({'diedCountry': 'France'})

208

In [22]:
db.laureates.count_documents({'bornCity': 'Warsaw'})

8

## Composing filters

In [25]:
filter_doc = {'gender': 'female',
'diedCountry': 'France',
'bornCity': 'Warsaw'}

db.laureates.count_documents(filter_doc)

4

In [27]:
db.laureates.find_one(filter_doc)

{'_id': ObjectId('67f81e8de3b001d8eab420b8'),
 'id': '6',
 'firstname': 'Marie',
 'surname': 'Curie',
 'born': '1867-11-07',
 'died': '1934-07-04',
 'bornCountry': 'Russian Empire (now Poland)',
 'bornCountryCode': 'PL',
 'bornCity': 'Warsaw',
 'diedCountry': 'France',
 'diedCountryCode': 'FR',
 'diedCity': 'Sallanches',
 'gender': 'female',
 'prizes': [{'year': '1903',
   'category': 'physics',
   'share': '4',
   'motivation': '"in recognition of the extraordinary services they have rendered by their joint researches on the radiation phenomena discovered by Professor Henri Becquerel"',
   'affiliations': [[]]},
  {'year': '1911',
   'category': 'chemistry',
   'share': '1',
   'motivation': '"in recognition of her services to the advancement of chemistry by the discovery of the elements radium and polonium, by the isolation of radium and the study of the nature and compounds of this remarkable element"',
   'affiliations': [{'name': 'Sorbonne University',
     'city': 'Paris',
     '

## Query operators

In [30]:
#Value in a range $in: <list>

db.laureates.count_documents({
'diedCountry': {
'$in': ['France', 'USA']}})

1196

In [32]:
#Not equal $ne : <value>

db.laureates.count_documents({
'diedCountry': {
'$ne': 'France'}})

3808

In [34]:
#Comparison:
#> : $gt , ≥ : $gte
#< : $lt , ≤ : $lte

db.laureates.count_documents({
'diedCountry': {
'$gt': 'Belgium',
'$lte': 'USA'}})

2084

## A functional density

In [37]:
db.laureates.find_one({
"firstname": "Walter",
"surname": "Kohn"})

{'_id': ObjectId('67f81e8de3b001d8eab421cf'),
 'id': '290',
 'firstname': 'Walter',
 'surname': 'Kohn',
 'born': '1923-03-09',
 'died': '2016-04-19',
 'bornCountry': 'Austria',
 'bornCountryCode': 'AT',
 'bornCity': 'Vienna',
 'diedCountry': 'USA',
 'diedCountryCode': 'US',
 'diedCity': 'Santa Barbara, CA',
 'gender': 'male',
 'prizes': [{'year': '1998',
   'category': 'chemistry',
   'share': '2',
   'motivation': '"for his development of the density-functional theory"',
   'affiliations': [{'name': 'University of California',
     'city': 'Santa Barbara, CA',
     'country': 'USA'}]}]}

In [39]:
db.laureates.count_documents({
"prizes.affiliations.name": (
"University of California")})

156

In [41]:
db.laureates.count_documents({
"prizes.affiliations.city": (
"Berkeley, CA")})

88

## No Country for Naipaul

In [44]:
db.laureates.find_one({'surname': 'Naipaul'})

{'_id': ObjectId('67f81e8de3b001d8eab42387'),
 'id': '747',
 'firstname': 'V. S.',
 'surname': 'Naipaul',
 'born': '1932-08-17',
 'died': '2018-08-11',
 'bornCountry': 'Trinidad and Tobago',
 'bornCountryCode': 'TT',
 'bornCity': 'Chaguanas',
 'diedCountry': 'United Kingdom',
 'diedCountryCode': 'GB',
 'diedCity': 'London',
 'gender': 'male',
 'prizes': [{'year': '2001',
   'category': 'literature',
   'share': '1',
   'motivation': '"for having united perceptive narrative and incorruptible scrutiny in works that compel us to see the presence of suppressed histories"',
   'affiliations': [[]]}]}

In [46]:
db.laureates.count_documents({"bornCountry": {"$exists": False}})

120

## Multiple prizes

In [49]:
db.laureates.count_documents({})

4016

In [51]:
db.laureates.count_documents({"prizes": {"$exists": True}})

4016

In [53]:
db.laureates.count_documents({"prizes.0": {"$exists": True}})

4016

In [55]:
db.laureates.count_documents({"prizes.1": {"$exists": True}})

28