# Sorting: Exercises

In [None]:
from pymongo import MongoClient

client = MongoClient()
db = client.nobel

## What the sort?

The block below prints the first five projected documents returned by a sorted query. Which `sort` argument fills the blank?

```python
docs = list(db.laureates.find(
    {"born": {"$gte": "1900"}, "prizes.year": {"$gte": "1954"}},
    {"born": 1, "prizes.year": 1, "_id": 0},
    sort=____))
for doc in docs[:5]:
    print(doc)
```
```
{'born': '1916-08-25', 'prizes': [{'year': '1954'}]}
{'born': '1915-06-15', 'prizes': [{'year': '1954'}]}
{'born': '1901-02-28', 'prizes': [{'year': '1954'}, {'year': '1962'}]}
{'born': '1913-07-12', 'prizes': [{'year': '1955'}]}
{'born': '1911-01-26', 'prizes': [{'year': '1955'}]}
```

1. `[("prizes.year", 1), ("born", -1)]`
1. `{"prizes.year": 1, "born": -1}`
1. `None`
1. `[("prizes.year", 1)]`
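
As a rough illustration of how a list of `(key, direction)` pairs produces a compound ordering, the sketch below imitates one on in-memory dictionaries in plain Python. The documents, field names, and the `apply_sort_spec` helper are hypothetical and are not part of PyMongo. The sketch also ignores how MongoDB compares array-valued fields such as `prizes.year`, so it is not a substitute for running the query.

```python
from operator import itemgetter

# Hypothetical in-memory documents (not taken from the nobel database)
docs = [
    {"name": "b", "year": "1955"},
    {"name": "a", "year": "1954"},
    {"name": "c", "year": "1954"},
]


def apply_sort_spec(docs, sort_spec):
    """Order docs by a list of (key, direction) pairs: 1 ascending, -1 descending."""
    result = list(docs)  # copy, so the input list is left untouched
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a compound ordering.
    for key, direction in reversed(sort_spec):
        result.sort(key=itemgetter(key), reverse=(direction == -1))
    return result


ordered = apply_sort_spec(docs, [("year", 1), ("name", -1)])
for doc in ordered:
    print(doc)
```

Here `1` means ascending and `-1` means descending, and earlier pairs take precedence over later ones.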

## Sorting together: MongoDB + Python

You will print out the names of all physics laureates, one line per award year, in chronological order. Each line will list that year's laureates in alphabetical order by surname (the "last" name).

I encourage you to print intermediate results so that you understand the nested structure of prize documents.
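
One way to get comfortable with nested documents is to try the sorting and formatting pattern on a small hand-written example first. The sketch below uses a hypothetical document (the `title`, `authors`, `first`, and `last` names are made up for illustration and do not appear in the nobel database) to show `itemgetter` as a sort key and `str.format` with an unpacked dictionary.

```python
from operator import itemgetter

# Hypothetical document holding a nested list of sub-documents
doc = {
    "title": "Example report",
    "authors": [
        {"first": "Alan", "last": "Turing"},
        {"first": "Grace", "last": "Hopper"},
        {"first": "Ada", "last": "Lovelace"},
    ],
}

# Print an intermediate result to inspect the nested structure
print(doc["authors"][0])

# Sort the sub-documents by one of their fields, then format each one;
# **author unpacks the dictionary into keyword arguments for str.format
formatted = ["{first} {last}".format(**author)
             for author in sorted(doc["authors"], key=itemgetter("last"))]
print(" and ".join(formatted))
```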

- Construct a sort specification `sort_spec` to fetch physics prizes by ascending year.

In [None]:
from operator import itemgetter

# Sort by ascending year
sort_spec = [(____, ____)]

- Use `<collection>.find` to construct a `cursor` that fetches prizes with a "category" of "physics", sorts them by ascending year, and projects the year and the laureate names (`laureates.firstname` and `laureates.surname`). Make sure you understand the printed results.


In [None]:
# Construct a cursor over physics prizes
cursor = db.prizes.____(
    {____: ____},
    [____, "laureates.firstname", "laureates.surname"],
    sort=sort_spec)
docs = list(cursor)
for doc in sorted(docs, key=itemgetter("year")):
    print("{year}: {first_laureate_surname}".format(
        year=doc["year"], first_laureate_surname=doc["laureates"][0]["surname"]))
cursor.rewind()  # Rewind the cursor so the next step can reuse it

- Complete the definition of the function `names` so that, given a prize document, it returns a list of formatted names, one for each of the "laureates" in that document, sorted by ascending "surname".


In [None]:
# Define a function names() to return a list of formatted names
def names(doc):
    formatted_names = ["{firstname} {surname}".format(**laureate)
          for laureate in sorted(doc[____], key=itemgetter(____))]
    return formatted_names

lines = ["{year}: {names}".format(year=doc["year"], names=" and ".join(names(doc)))
         for doc in cursor]
for line in lines:
    print(line)

## Gap years

The prize in economics was not added until 1969. There have also been many years for which prizes in one or more of the original categories were not awarded.

Fetch prize documents sorted first in reverse chronological order and second in alphabetical order of category. Then collect and format them to produce one entry per year that lists the categories missing for that year.


- Construct a set `original_categories` of prize categories awarded in 1901.

In [None]:
import itertools
from operator import itemgetter

# Save the set of prize categories awarded in 1901
original_categories = set(db.prizes.____("category", {____: "1901"}))

- Use `<collection>.find` to construct a `cursor` that yields prize documents only for categories in the set of original categories, sorted first by decreasing year and second by increasing category.


In [None]:
# Construct a cursor over original-category prizes
cursor = db.prizes.____({"category": {____: list(original_categories)}}, ["category", "year"],
                        sort=[(____, ____), (____, ____)])

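- Collect a list `not_awarded` of entries to be printed, one per line, each displaying a year and the categories missing for that year. For each year, collect the "category" values and set-subtract them from the original categories. Grouping by year works here because the cursor is already sorted by year: `itertools.groupby` only groups consecutive items with equal keys.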
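
The group-then-subtract pattern can be tried on toy data before running it against the database. This is a minimal sketch with hypothetical records (`day`, `shift`, and `expected_shifts` are invented for illustration and have nothing to do with the prize documents).

```python
import itertools
from operator import itemgetter

# Hypothetical full set of values we expect to see for every group
expected_shifts = {"morning", "evening", "night"}

# Hypothetical records, already ordered by "day";
# groupby only merges adjacent items that share a key
records = [
    {"day": "Tue", "shift": "evening"},
    {"day": "Tue", "shift": "morning"},
    {"day": "Tue", "shift": "night"},
    {"day": "Mon", "shift": "morning"},
]

gaps = []
for day, group in itertools.groupby(records, key=itemgetter("day")):
    seen = {record["shift"] for record in group}
    # Set difference leaves the expected values that never appeared
    missing = ", ".join(sorted(expected_shifts - seen))
    if missing:
        gaps.append("{}: {}".format(day, missing))

for line in gaps:
    print(line)
```

Groups with nothing missing produce an empty string, which is falsy, so they are skipped.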


In [None]:
# Collect entries for missing prize categories
not_awarded = []
for key, group in itertools.groupby(cursor, key=itemgetter("year")):
    year_categories = set(prize[____] for prize in group)
    missing = ", ".join(sorted(____ - ____))
    if missing:
        not_awarded.append("{}: {}".format(key, missing))

for line in not_awarded:
    print(line)