<img src="https://s3.amazonaws.com/edu-static.mongodb.com/lessons/M220/notebook_assets/screen_align.png" style="margin: 0 auto;">


<h1 style="text-align: center; font-size=58px;">Cursor Methods and Aggregation Equivalents</h1>

In this lesson we're going to discuss methods we can call against PyMongo cursors, and the aggregation stages that would perform the same tasks in a pipeline.

<h2 style="text-align: center; font-size=58px;">Limiting</h2>

In [1]:
import pymongo
from bson.json_util import dumps
uri = "mongodb+srv://max:Ocean123@mflix-4bnej.mongodb.net"
client = pymongo.MongoClient(uri)
mflix = client.sample_mflix
movies = mflix.movies

Here's (point) a collection object for the `movies` collection.

In [2]:
limited_cursor = movies.find(
    { "directors": "Sam Raimi" },
    { "_id": 0, "title": 1, "cast": 1 } 
).limit(2)

print(dumps(limited_cursor, indent=2))

[
  {
    "cast": [
      "Bruce Campbell",
      "Ellen Sandweiss",
      "Richard DeManincor",
      "Betsy Baker"
    ],
    "title": "The Evil Dead"
  },
  {
    "title": "Evil Dead II",
    "cast": [
      "Bruce Campbell",
      "Sarah Berry",
      "Dan Hicks",
      "Kassie Wesley DePaiva"
    ]
  }
]


So this is a find query with a predicate (point) and a projection (point). And the `find()` method is always gonna return a cursor to us. But before assigning that cursor to a variable, we've transformed it with the `limit()` method, to make sure no more than 2 documents are returned by this cursor.

(run command)

And we can see we only got two (point) documents back.

In [3]:
pipeline = [
    { "$match": { "directors": "Sam Raimi" } },
    { "$project": { "_id": 0, "title": 1, "cast": 1 } },
    { "$limit": 2 }
]

limited_aggregation = movies.aggregate( pipeline )

print(dumps(limited_aggregation, indent=2))

[
  {
    "cast": [
      "Bruce Campbell",
      "Ellen Sandweiss",
      "Richard DeManincor",
      "Betsy Baker"
    ],
    "title": "The Evil Dead"
  },
  {
    "title": "Evil Dead II",
    "cast": [
      "Bruce Campbell",
      "Sarah Berry",
      "Dan Hicks",
      "Kassie Wesley DePaiva"
    ]
  }
]


Now this is the equivalent operation with the aggregation framework. Instead of tacking a `.limit()` onto the end of the cursor, we add `$limit` as a stage in our pipeline.

(enter command)

And it's the same output. And these (point to `$match` and `$project`) aggregation stages represent the query predicate and the projection from when we were using the query language.
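To make the mapping concrete, here is a minimal sketch of a helper that turns the arguments of a `find(...).limit(n)` call into the equivalent pipeline. The function name `find_to_pipeline` is hypothetical and not part of PyMongo; it only builds a list of dictionaries and never talks to the database.

```python
def find_to_pipeline(query_filter, projection, limit):
    """Build a pipeline mirroring find(query_filter, projection).limit(limit).

    Hypothetical helper for illustration only; it is not a PyMongo API.
    """
    pipeline = [
        {"$match": query_filter},   # the query predicate
        {"$project": projection},   # the projection
    ]
    # On a cursor, limit(0) means "no limit", while the $limit stage
    # requires a positive integer, so the stage is left out in that case.
    if limit:
        pipeline.append({"$limit": limit})
    return pipeline


pipeline = find_to_pipeline(
    {"directors": "Sam Raimi"},
    {"_id": 0, "title": 1, "cast": 1},
    2,
)
print(pipeline)
```

The resulting list could then be passed to `movies.aggregate(pipeline)`, just like the hand-written pipeline in the cell above.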

<h2 style="text-align: center; font-size=58px;">Sorting</h2>

In [4]:
from pymongo import DESCENDING, ASCENDING

sorted_cursor = movies.find(
    { "directors": "Sam Raimi" },
    { "_id": 0, "year": 1, "title": 1, "cast": 1 } 
).sort("year", ASCENDING)

print(dumps(sorted_cursor, indent=2))

[
  {
    "cast": [
      "Bruce Campbell",
      "Ellen Sandweiss",
      "Richard DeManincor",
      "Betsy Baker"
    ],
    "title": "The Evil Dead",
    "year": 1981
  },
  {
    "year": 1987,
    "title": "Evil Dead II",
    "cast": [
      "Bruce Campbell",
      "Sarah Berry",
      "Dan Hicks",
      "Kassie Wesley DePaiva"
    ]
  },
  {
    "year": 1990,
    "title": "Darkman",
    "cast": [
      "Liam Neeson",
      "Frances McDormand",
      "Colin Friels",
      "Larry Drake"
    ]
  },
  {
    "cast": [
      "Bruce Campbell",
      "Embeth Davidtz",
      "Marcus Gilbert",
      "Ian Abercrombie"
    ],
    "title": "Army of Darkness",
    "year": 1992
  },
  {
    "cast": [
      "Sharon Stone",
      "Gene Hackman",
      "Russell Crowe",
      "Leonardo DiCaprio"
    ],
    "title": "The Quick and the Dead",
    "year": 1995
  },
  {
    "cast": [
      "Bill Paxton",
      "Bridget Fonda",
      "Billy Bob Thornton",
      "Brent Briscoe"
    ],
    "title": "A Sim

This is an example of the `sort()` (point) cursor method. `sort()` takes two parameters, the key we're sorting on and the sorting order. In this example we're sorting on year (point), in increasing (point) order.

`ASCENDING` and `DESCENDING` are constants from the `pymongo` library to specify sort direction, but they're really just the integers 1 and -1.
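As a rough illustration of what the two directions mean, the sketch below sorts a small in-memory list of dictionaries instead of querying MongoDB. The constants are redefined locally with the same integer values, and `sort_locally` is a hypothetical helper, not something PyMongo provides.

```python
# Same integer values as pymongo.ASCENDING and pymongo.DESCENDING,
# redefined here so the sketch runs without a database or driver.
ASCENDING, DESCENDING = 1, -1


def sort_locally(docs, key, direction):
    """Emulate a single-key sort on a list of dicts (illustration only)."""
    return sorted(docs, key=lambda doc: doc[key], reverse=(direction == DESCENDING))


sample = [
    {"title": "Darkman", "year": 1990},
    {"title": "The Evil Dead", "year": 1981},
    {"title": "Evil Dead II", "year": 1987},
]

oldest_first = sort_locally(sample, "year", ASCENDING)
newest_first = sort_locally(sample, "year", DESCENDING)

print([doc["year"] for doc in oldest_first])
print([doc["year"] for doc in newest_first])
```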

(enter command)

And we can see that the movies were returned to us in order of the year they were made.

In [5]:
pipeline = [
    { "$match": { "directors": "Sam Raimi" } },
    { "$project": { "_id": 0, "year": 1, "title": 1, "cast": 1 } },
    { "$sort": { "year": ASCENDING } }
]

sorted_aggregation = movies.aggregate( pipeline )

print(dumps(sorted_aggregation, indent=2))

[
  {
    "cast": [
      "Bruce Campbell",
      "Ellen Sandweiss",
      "Richard DeManincor",
      "Betsy Baker"
    ],
    "title": "The Evil Dead",
    "year": 1981
  },
  {
    "year": 1987,
    "title": "Evil Dead II",
    "cast": [
      "Bruce Campbell",
      "Sarah Berry",
      "Dan Hicks",
      "Kassie Wesley DePaiva"
    ]
  },
  {
    "year": 1990,
    "title": "Darkman",
    "cast": [
      "Liam Neeson",
      "Frances McDormand",
      "Colin Friels",
      "Larry Drake"
    ]
  },
  {
    "cast": [
      "Bruce Campbell",
      "Embeth Davidtz",
      "Marcus Gilbert",
      "Ian Abercrombie"
    ],
    "title": "Army of Darkness",
    "year": 1992
  },
  {
    "cast": [
      "Sharon Stone",
      "Gene Hackman",
      "Russell Crowe",
      "Leonardo DiCaprio"
    ],
    "title": "The Quick and the Dead",
    "year": 1995
  },
  {
    "cast": [
      "Bill Paxton",
      "Bridget Fonda",
      "Billy Bob Thornton",
      "Brent Briscoe"
    ],
    "title": "A Sim

And this is the equivalent pipeline, with a `$sort` (point) stage that takes a dictionary, giving the sort (point) field, and the direction (point) of the sort.

(enter command)

And the aggregation framework was able to sort by year here.

In [6]:
sorted_cursor = movies.find(
    { "cast": "Tom Hanks" },
    { "_id": 0, "year": 1, "title": 1, "cast": 1 }
).sort([("year", ASCENDING), ("title", ASCENDING)])

print(dumps(sorted_cursor, indent=2))

[
  {
    "cast": [
      "Tom Hanks",
      "Daryl Hannah",
      "Eugene Levy",
      "John Candy"
    ],
    "title": "Splash",
    "year": 1984
  },
  {
    "cast": [
      "Tom Hanks",
      "Jackie Gleason",
      "Eva Marie Saint",
      "Hector Elizondo"
    ],
    "title": "Nothing in Common",
    "year": 1986
  },
  {
    "cast": [
      "Tom Hanks",
      "Elizabeth Perkins",
      "Robert Loggia",
      "John Heard"
    ],
    "title": "Big",
    "year": 1988
  },
  {
    "cast": [
      "Sally Field",
      "Tom Hanks",
      "John Goodman",
      "Mark Rydell"
    ],
    "title": "Punchline",
    "year": 1988
  },
  {
    "cast": [
      "Tom Hanks",
      "Bruce Dern",
      "Carrie Fisher",
      "Rick Ducommun"
    ],
    "title": "The 'Burbs",
    "year": 1989
  },
  {
    "cast": [
      "Tom Hanks",
      "Mare Winningham",
      "Craig T. Nelson",
      "Reginald VelJohnson"
    ],
    "title": "Turner & Hooch",
    "year": 1989
  },
  {
    "cast": [
      "Tom Ha

So just a special case to note here, sorting on multiple keys in the cursor method is gonna look a little different.

When sorting on one key, the `sort()` method takes two arguments, the key and the sort order.

When sorting on two or more keys, the `sort()` method takes a single argument, a list of tuples. And each tuple has a key and a sort order.

(enter command)

And we can see that after sorting on year, the cursor sorted the movie titles alphabetically.
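The list-of-tuples form maps directly onto the dictionary that a `$sort` stage expects. The sketch below assumes Python 3.7 or later, where dictionaries keep insertion order, so the first key stays the primary sort key. The helper name `sort_stage` is hypothetical.

```python
def sort_stage(sort_spec):
    """Turn [(key, direction), ...] into a {"$sort": {...}} stage.

    Hypothetical helper; relies on dicts preserving insertion order
    (Python 3.7+), so the first tuple remains the primary sort key.
    """
    return {"$sort": dict(sort_spec)}


cursor_style = [("year", 1), ("title", 1)]  # what .sort() takes for several keys
stage = sort_stage(cursor_style)
print(stage)
```

The order of the keys matters in both forms: sorting on year and then title is not the same as sorting on title and then year.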

In [7]:
pipeline = [
    { "$match": { "cast": "Tom Hanks" } },
    { "$project": { "_id": 0, "year": 1, "title": 1, "cast": 1 } },
    { "$sort": { "year": ASCENDING, "title": ASCENDING } }
]

sorted_aggregation = movies.aggregate( pipeline )

print(dumps(sorted_aggregation, indent=2))

[
  {
    "cast": [
      "Tom Hanks",
      "Daryl Hannah",
      "Eugene Levy",
      "John Candy"
    ],
    "title": "Splash",
    "year": 1984
  },
  {
    "cast": [
      "Tom Hanks",
      "Jackie Gleason",
      "Eva Marie Saint",
      "Hector Elizondo"
    ],
    "title": "Nothing in Common",
    "year": 1986
  },
  {
    "cast": [
      "Tom Hanks",
      "Elizabeth Perkins",
      "Robert Loggia",
      "John Heard"
    ],
    "title": "Big",
    "year": 1988
  },
  {
    "cast": [
      "Sally Field",
      "Tom Hanks",
      "John Goodman",
      "Mark Rydell"
    ],
    "title": "Punchline",
    "year": 1988
  },
  {
    "cast": [
      "Tom Hanks",
      "Bruce Dern",
      "Carrie Fisher",
      "Rick Ducommun"
    ],
    "title": "The 'Burbs",
    "year": 1989
  },
  {
    "cast": [
      "Tom Hanks",
      "Mare Winningham",
      "Craig T. Nelson",
      "Reginald VelJohnson"
    ],
    "title": "Turner & Hooch",
    "year": 1989
  },
  {
    "cast": [
      "Tom Ha

<h2 style="text-align: center; font-size=58px;">Skipping</h2>

In [8]:
pipeline = [
    { "$match": { "directors": "Sam Raimi" } },
    { "$project": { "_id": 0, "title": 1, "cast": 1 } },
    { "$count": "num_movies" }
]

count_aggregation = movies.aggregate( pipeline )

print(dumps(count_aggregation, indent=2))

[
  {
    "num_movies": 13
  }
]


(enter command)

So we know from counting the documents in this aggregation, that if we don't specify anything else, we're getting 13 (point) documents returned to us.

Note that the cursor method `count()`, which counted the documents in a cursor, has been deprecated (and was removed in PyMongo 4). So if you want to know how many documents are returned by a query, you should use the `$count` aggregation stage or the collection method `count_documents()`.

In [9]:
skipped_cursor = movies.find(
    { "directors": "Sam Raimi" },
    { "_id": 0, "title": 1, "cast": 1 } 
).skip(14)

print(dumps(skipped_cursor, indent=2))

[]


The `skip()` method allows us to skip documents in a result set, so only documents we did not skip appear in the cursor. Because only 13 documents match this query, skipping 14 of them should leave us with nothing.

(enter command)

And look at that, the cursor is empty, because we skipped more documents than the query matched. Even with a smaller skip value, the issue is that we don't really know which documents we skipped over, because we haven't specified a sort key, and without one we have no guarantee about the order in which documents come back from the cursor.

In [10]:
skipped_sorted_cursor = movies.find(
    { "directors": "Sam Raimi" },
    { "_id": 0, "title": 1, "year": 1, "cast": 1 } 
).sort("year", ASCENDING).skip(10)

print(dumps(skipped_sorted_cursor, indent=2))

[
  {
    "year": 2007,
    "title": "Spider-Man 3",
    "cast": [
      "Tobey Maguire",
      "Kirsten Dunst",
      "James Franco",
      "Thomas Haden Church"
    ]
  },
  {
    "year": 2009,
    "title": "Drag Me to Hell",
    "cast": [
      "Alison Lohman",
      "Justin Long",
      "Lorna Raver",
      "Dileep Rao"
    ]
  },
  {
    "cast": [
      "James Franco",
      "Mila Kunis",
      "Rachel Weisz",
      "Michelle Williams"
    ],
    "title": "Oz the Great and Powerful",
    "year": 2013
  }
]


So here we've sorted on year (point) and then skipped the first 10. Now we know that when we're skipping 10 documents, we're skipping the 10 oldest Sam Raimi movies in this collection.

(enter command)

And we only got 3 of those 13 documents back, because we skipped 10 of them.

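These cursor methods are nice because we can chain them right onto a cursor. It even kinda makes our Python look like JavaScript, with this `.sort()` and `.skip()`. One thing to keep in mind: the order in which we chain them doesn't change the result, because the server always applies the sort first, then the skip, then the limit.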

In [11]:
pipeline = [
    { "$match": { "directors": "Sam Raimi" } },
    { "$project": { "_id": 0, "year": 1, "title": 1, "cast": 1 } },
    { "$sort": { "year": ASCENDING } },
    { "$skip": 10 }
]

sorted_skipped_aggregation = movies.aggregate( pipeline )

print(dumps(sorted_skipped_aggregation, indent=2))

[
  {
    "year": 2007,
    "title": "Spider-Man 3",
    "cast": [
      "Tobey Maguire",
      "Kirsten Dunst",
      "James Franco",
      "Thomas Haden Church"
    ]
  },
  {
    "year": 2009,
    "title": "Drag Me to Hell",
    "cast": [
      "Alison Lohman",
      "Justin Long",
      "Lorna Raver",
      "Dileep Rao"
    ]
  },
  {
    "cast": [
      "James Franco",
      "Mila Kunis",
      "Rachel Weisz",
      "Michelle Williams"
    ],
    "title": "Oz the Great and Powerful",
    "year": 2013
  }
]


So here's an example of the same query in the aggregation framework. As you can see the `$skip` stage represents the `.skip()` from before.

(run command)

And it gives us the same output.
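In a pipeline, stages run in the order they are listed, so `$sort` followed by `$skip` is not the same as `$skip` followed by `$sort`. The sketch below illustrates this with a tiny in-memory emulation of three stages. The function `run_stages` is hypothetical, handles only single-key `$sort`, `$skip` and `$limit`, and is not how MongoDB executes pipelines.

```python
def run_stages(docs, stages):
    """Apply a tiny subset of pipeline stages to a list of dicts, in order.

    Illustration only: supports single-key $sort, $skip and $limit.
    """
    result = list(docs)
    for stage in stages:
        (name, arg), = stage.items()  # each stage dict has exactly one key
        if name == "$sort":
            (key, direction), = arg.items()
            result = sorted(result, key=lambda doc: doc[key], reverse=(direction == -1))
        elif name == "$skip":
            result = result[arg:]
        elif name == "$limit":
            result = result[:arg]
        else:
            raise ValueError(f"unsupported stage: {name}")
    return result


years = [{"year": y} for y in (2007, 1981, 2013, 1990)]

sort_then_skip = run_stages(years, [{"$sort": {"year": 1}}, {"$skip": 2}])
skip_then_sort = run_stages(years, [{"$skip": 2}, {"$sort": {"year": 1}}])

print([doc["year"] for doc in sort_then_skip])
print([doc["year"] for doc in skip_then_sort])
```

Sorting first drops the two oldest entries, while skipping first drops whichever two entries happened to come first in the unsorted input.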

The `skip()` method is useful for paging results on a website, because we can sort the results chronologically, and then if we have 10 movies displayed on each page, the first page would have a skip value of 0, but then the second page would skip the first 10 movies, the third page would skip the first 20 movies, etc.
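The paging arithmetic described above can be sketched as a small pipeline builder. The function name `page_pipeline`, its parameters and the 1-based page numbering are assumptions made for illustration; the function only returns a list of stages and does not query the database.

```python
def page_pipeline(query_filter, projection, sort_key, page_number, page_size=10):
    """Build a pipeline that returns one page of sorted results.

    Hypothetical helper: pages are numbered from 1, so page 1 skips
    0 documents, page 2 skips page_size documents, and so on.
    """
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    return [
        {"$match": query_filter},
        {"$project": projection},
        {"$sort": {sort_key: 1}},                    # 1 means ascending
        {"$skip": (page_number - 1) * page_size},    # documents on earlier pages
        {"$limit": page_size},                       # at most one page of results
    ]


second_page = page_pipeline(
    {"directors": "Sam Raimi"},
    {"_id": 0, "year": 1, "title": 1, "cast": 1},
    "year",
    page_number=2,
)
print(second_page)
```

A list like `second_page` could be passed to `movies.aggregate(...)`. The sort has to come before the skip and the limit so that every page is cut from the same ordering.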

## Summary

* `.limit()` == `$limit`
* `.sort()` == `$sort`
* `.skip()` == `$skip`

So just to recap, in this lesson we covered some cursor methods and their aggregation equivalents. Remember that there won't always be a one-to-one mapping, because the aggregation framework can do a lot more than cursors can.

But these three methods exist as both aggregation stages and cursor methods.