Aggregation framework

```javascript
{
     "_id" : ObjectId("5b107bec1d2952d0da904dd7"),
     "title" : "Titan A.E.",
     "year" : 2000,
     "rated" : "PG",
     "runtime" : 94,
     "countries" : [
             "USA"
     ],
     "genres" : [
             "Animation",
             "Action",
             "Adventure"
     ],
     "director" : "Don Bluth, Gary Goldman, Art Vitello",
     "writers" : [
             "Hans Bauer",
             "Randall McCormick",
             "Ben Edlund",
             "John August",
             "Joss Whedon"
     ],
     "actors" : [
             "Matt Damon",
             "Bill Pullman",
             "John Leguizamo",
             "Nathan Lane"
     ],
     "plot" : "A young man learns that he has to find a hidden Earth ship before an enemy alien species does in order to secure the survival of humanity.",
     "poster" : "http://ia.media-imdb.com/images/M/MV5BMjE0NTU0ODg4NV5BMl5BanBnXkFtZTcwNzY3MTQyMQ@@._V1_SX300.jpg",
     "imdb" : {
             "id" : "tt0120913",
             "rating" : 6.6,
             "votes" : 50875
     },
     "tomato" : {
             "meter" : 52,
             "image" : "rotten",
             "rating" : 5.7,
             "reviews" : 99,
             "fresh" : 51,
             "consensus" : "Great visuals, but the story feels like a cut-and-paste job of other sci-fi movies.",
             "userMeter" : 60,
             "userRating" : 3.2,
             "userReviews" : 69055
     },
     "metacritic" : 48,
     "awards" : {
             "wins" : 1,
             "nominations" : 7,
             "text" : "1 win & 7 nominations."
     },
     "type" : "movie"
}
```

In [1]:
# Install PyMongo, the Python driver used to connect to the database
!pip install pymongo



In [2]:
from pymongo import MongoClient
from pprint import pprint as pp
client = MongoClient('mongodb://localhost:27017')

In [3]:
db = client.datascience

Title and IMDb rating of the first movie in the collection

In [4]:
pp(next(db.movieDetails.find(
    {},
    {"title": 1, "imdb.rating": 1}
)))

{'_id': ObjectId('5b107bec1d2952d0da9046e1'),
 'imdb': {'rating': 6.1},
 'title': 'A Million Ways to Die in the West'}


Operators
---------
Operators are distinguished by a leading \$.

* Inequality
    - \$gt
    - \$lt
    - \$gte
    - \$lte
    - \$ne
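
To make the meaning of these operators concrete, here is a minimal pure-Python sketch that applies the same comparisons to a single in-memory value. The `OPS` table and the `matches` helper are hypothetical names used only for illustration. MongoDB evaluates these operators on the server, and this sketch ignores details such as arrays, missing fields, and comparisons across types.

```python
import operator

# Hypothetical mapping from MongoDB comparison operators to Python equivalents.
OPS = {
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
    "$ne": operator.ne,
}


def matches(value, condition):
    """Return True if `value` satisfies every operator in `condition`."""
    return all(OPS[op](value, operand) for op, operand in condition.items())


print(matches(9.0, {"$gt": 8.9, "$lt": 9.5}))  # True
print(matches(6.6, {"$gt": 8.9}))              # False
```

Several operators in one condition are combined with a logical AND, which is why `{"$gt": 8.9, "$lt": 9.5}` describes an open interval.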


In [6]:
# Cursor.count() is deprecated (and removed in PyMongo 4); count on the collection instead
pp(db.movieDetails.count_documents(
    {"imdb.rating": {"$gt": 8.9}}
))

9



In [7]:
pp(next(db.movieDetails.find(
    {"imdb.rating": {"$gt": 8.9, "$lt": 9.5}},
    {"title": 1, "imdb.rating": 1}
)))

{'_id': ObjectId('5b107bec1d2952d0da9048bb'),
 'imdb': {'rating': 9.0},
 'title': 'Gamechangers Ep. 3: A Legend in the Booth'}


How many movies in our collection are rated above 8.9 and below 9.5?

In [8]:
movies_count = 0

for movie in db.movieDetails.find(
    {"imdb.rating": {"$gt": 8.9, "$lt": 9.5}},
    {"title": 1, "imdb.rating": 1}
):
    movies_count += 1

print("There are {} movies rated above 8.9 and below 9.5".format(movies_count))

There are 7 movies rated above 8.9 and below 9.5


Checking whether a field exists with \$exists

In [10]:
def count_results(cursor):
    count = 0
    for item in cursor:
        count += 1
    return count

In [12]:
count_results(db.movieDetails.find(
    {
        "tomato": {
            "$exists": 1
        },
        "imdb": {
            "$exists": 1
        }
    },
    {"title": 1, "imdb.rating": 1}
))

376

In [13]:
count_results(db.movieDetails.find(
    {"tomato": {"$exists": 0}},
    {"title": 1, "imdb.rating": 1}
))

1874
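
As a rough illustration of what `$exists` tests, the sketch below mimics it on a few in-memory documents. The sample documents and the `field_exists` helper are made up for this example and are not taken from the collection. The point to notice is that `$exists` checks for the presence of the field, so a field holding a null value still counts as existing.

```python
def field_exists(doc, field):
    """Mimic {field: {"$exists": True}} for a top-level field."""
    return field in doc


# Hypothetical sample documents, for illustration only.
docs = [
    {"title": "Movie A", "tomato": {"meter": 52}},
    {"title": "Movie B", "tomato": None},  # field present, value is null
    {"title": "Movie C"},                  # field absent
]

with_tomato = [d["title"] for d in docs if field_exists(d, "tomato")]
without_tomato = [d["title"] for d in docs if not field_exists(d, "tomato")]

print(with_tomato)     # ['Movie A', 'Movie B']
print(without_tomato)  # ['Movie C']
```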

regex Operator
--------------


In [14]:
count_results(db.movieDetails.find(
    {"title": {"$regex": "super"}},
    {"title": 1, "imdb.rating": 1}
))

0

In [15]:
count_results(db.movieDetails.find(
    {"title": {"$regex": "[Ss]uper"}},
    {"title": 1, "imdb.rating": 1}
))

6

In [16]:
count_results(db.movieDetails.find(
    {"title": {"$regex": "super", "$options": 'i'}},
    {"title": 1, "imdb.rating": 1}
))

6

In [40]:
pp(list(
    map(
        lambda x: x['title'],
        db.movieDetails.find(
            {"title": {"$regex": "super|hero", "$options": 'i'}},
            {"title": 1, "imdb.rating": 1}
        )
    )
))

["Herod's Law",
 'Big Hero 6',
 'The LS SuperTards',
 'JD Superhero',
 'Lego DC Comics Super Heroes: Justice League vs. Bizarro League',
 'LEGO Batman: The Movie - DC Super Heroes Unite',
 'My Super Ex-Girlfriend',
 'Mon Pere Ce Heros',
 'The Haunted World of El Superbeasto']
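
The counts above can be reproduced locally with Python's `re` module, which helps when experimenting with a pattern before sending it to the server. This is only a sketch: the `titles` list contains the nine titles printed above plus two titles that appear earlier in this notebook, and `regex_count` is a hypothetical helper. MongoDB uses a PCRE-style engine, so more advanced patterns may behave differently from `re`; for simple patterns like these the results agree.

```python
import re

titles = [
    "Herod's Law",
    "Big Hero 6",
    "The LS SuperTards",
    "JD Superhero",
    "Lego DC Comics Super Heroes: Justice League vs. Bizarro League",
    "LEGO Batman: The Movie - DC Super Heroes Unite",
    "My Super Ex-Girlfriend",
    "Mon Pere Ce Heros",
    "The Haunted World of El Superbeasto",
    "Titan A.E.",
    "A Million Ways to Die in the West",
]


def regex_count(pattern, options=""):
    """Count titles matching `pattern` anywhere in the string, like $regex."""
    flags = re.IGNORECASE if "i" in options else 0
    return sum(1 for title in titles if re.search(pattern, title, flags))


print(regex_count("super"))            # 0, the match is case-sensitive by default
print(regex_count("[Ss]uper"))         # 6
print(regex_count("super", "i"))       # 6
print(regex_count("super|hero", "i"))  # 9
```

Note that the pattern is not anchored, so it matches anywhere in the title; this is why "Herod's Law" is found by `hero`.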


In [18]:
count_results(db.movieDetails.find(
    {"imdb.rating": {"$gt": 9}, "year": {"$in": list(range(2005, 2019, 1))}},
    {"title": 1, "imdb.rating": 1}
))

2

Sadly, there are no movies rated over 9 after 2009.
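
The year filter above lists every year explicitly with `$in`. Assuming `year` is always stored as an integer, the same range can be written with `$gte` and `$lte`, which keeps the query short for long ranges. The helper name `rating_and_year_filter` is hypothetical; the block only builds the filter dictionary and does not contact the database.

```python
def rating_and_year_filter(min_rating, first_year, last_year):
    """Build a filter for movies rated above `min_rating` within an inclusive year range."""
    return {
        "imdb.rating": {"$gt": min_rating},
        "year": {"$gte": first_year, "$lte": last_year},
    }


# range(2005, 2019) covers 2005 through 2018, so the inclusive bounds are 2005 and 2018.
query = rating_and_year_filter(9, 2005, 2018)
print(query)
```

The resulting `query` can be passed as the first argument to `db.movieDetails.find` in place of the `$in` version.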

In [63]:
# Cursor.count() is deprecated (and removed in PyMongo 4); count on the collection instead
db.movieDetails.count_documents(
    {"title": {"$regex": "super|hero", "$options": 'i'}}
)


9