[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jkanclerz/data-science-workshop-2021/blob/main/21--no-sql-database-mongo/05--document-database_mongodb--movies.ipynb)

In [None]:
!sudo apt install -y mongodb
!sudo service mongodb start

## Sample data

In [None]:
!mkdir -p var
!wget https://github.com/SouthbankSoftware/dbkoda-data/raw/master/SampleCollections/dump/SampleCollections/video_movies.bson -O var/video_movies.bson
!wget https://github.com/SouthbankSoftware/dbkoda-data/raw/master/SampleCollections/dump/SampleCollections/video_movieDetails.bson -O var/video_movieDetails.bson
!mongorestore --db datascience --drop --collection movies var/video_movieDetails.bson
!mongorestore --db datascience --drop --collection films var/video_movies.bson

## Queries

A sample document from the `movies` collection:

```json
{
     "_id" : ObjectId("5b107bec1d2952d0da904dd7"),
     "title" : "Titan A.E.",
     "year" : 2000,
     "rated" : "PG",
     "runtime" : 94,
     "countries" : [
             "USA"
     ],
     "genres" : [
             "Animation",
             "Action",
             "Adventure"
     ],
     "director" : "Don Bluth, Gary Goldman, Art Vitello",
     "writers" : [
             "Hans Bauer",
             "Randall McCormick",
             "Ben Edlund",
             "John August",
             "Joss Whedon"
     ],
     "actors" : [
             "Matt Damon",
             "Bill Pullman",
             "John Leguizamo",
             "Nathan Lane"
     ],
     "plot" : "A young man learns that he has to find a hidden Earth ship before an enemy alien species does in order to secure the survival of humanity.",
     "poster" : "http://ia.media-imdb.com/images/M/MV5BMjE0NTU0ODg4NV5BMl5BanBnXkFtZTcwNzY3MTQyMQ@@._V1_SX300.jpg",
     "imdb" : {
             "id" : "tt0120913",
             "rating" : 6.6,
             "votes" : 50875
     },
     "tomato" : {
             "meter" : 52,
             "image" : "rotten",
             "rating" : 5.7,
             "reviews" : 99,
             "fresh" : 51,
             "consensus" : "Great visuals, but the story feels like a cut-and-paste job of other sci-fi movies.",
             "userMeter" : 60,
             "userRating" : 3.2,
             "userReviews" : 69055
     },
     "metacritic" : 48,
     "awards" : {
             "wins" : 1,
             "nominations" : 7,
             "text" : "1 win & 7 nominations."
     },
     "type" : "movie"
}
```

In [None]:
# Install the MongoDB driver for Python
!pip install pymongo

In [None]:
from pymongo import MongoClient
from pprint import pprint as pp
# Connect to the local MongoDB server
client = MongoClient('mongodb://localhost:27017')

In [None]:
db = client.datascience

Movie titles with their ratings (no filter yet; the second argument to `find` selects the fields to return)

In [None]:
pp(next(db.movies.find(
    {},
    {"title": 1, "imdb.rating": 1}
)))
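
The second argument to `find` is a projection: it lists the fields to return, and `_id` is included unless it is explicitly excluded. The block below is a rough pure-Python sketch of that behaviour for inclusion projections on nested documents. `project` is a hypothetical helper written for illustration only (it is not part of pymongo), and it ignores arrays and exclusion projections.

```python
def project(document, paths):
    """Keep only the dotted paths listed in `paths`, plus `_id` if present."""
    result = {}
    if "_id" in document:
        result["_id"] = document["_id"]
    for path in paths:
        keys = path.split(".")
        source, target = document, result
        # Walk down to the parent of the final key.
        for key in keys[:-1]:
            if not isinstance(source, dict) or key not in source:
                break
            source = source[key]
            target = target.setdefault(key, {})
        else:
            # Copy the final key only if the whole path could be followed.
            if isinstance(source, dict) and keys[-1] in source:
                target[keys[-1]] = source[keys[-1]]
    return result


# A trimmed copy of the sample document shown earlier.
sample = {
    "_id": "5b107bec1d2952d0da904dd7",
    "title": "Titan A.E.",
    "year": 2000,
    "imdb": {"id": "tt0120913", "rating": 6.6, "votes": 50875},
}

print(project(sample, ["title", "imdb.rating"]))
```

Only `_id`, `title`, and the nested `imdb.rating` survive; `year` and the other `imdb` fields are dropped.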

Operators
---------
Query operators are prefixed with `$`.

* Inequality
    - \$gt
    - \$lt
    - \$gte
    - \$lte
    - \$ne
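
To make the meaning of these operators concrete, here is a simplified pure-Python sketch of how a condition such as `{"$gt": 8.9, "$lt": 9.5}` is evaluated against a single value. `COMPARATORS` and `matches` are hypothetical names used only for this illustration. Real MongoDB matching also handles missing fields, arrays, and type ordering, which this sketch leaves out.

```python
import operator

# Map each MongoDB comparison operator to its Python counterpart.
COMPARATORS = {
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
    "$ne": operator.ne,
}


def matches(value, condition):
    """Return True if `value` satisfies every operator in `condition`."""
    return all(COMPARATORS[op](value, bound) for op, bound in condition.items())


rating = 6.6  # imdb.rating of "Titan A.E." in the sample document

print(matches(rating, {"$gt": 8.9}))
print(matches(rating, {"$gte": 6.6, "$lt": 7}))
```

Several operators in one condition are combined with a logical AND, which is why a range can be written as a single dictionary.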


In [None]:
pp(next(db.movies.find(
    {"imdb.rating": {"$gt": 8.9}},
    {"title": 1, "imdb.rating": 1}
)))

In [None]:
pp(next(db.movies.find(
    {"imdb.rating": {"$gt": 8.9, "$lt": 9.5}},
    {"title": 1, "imdb.rating": 1}
)))

How many movies in our collection are rated above 8.9 and below 9.5?

In [None]:
movies_count = 0

for movie in db.movies.find(
    {"imdb.rating": {"$gt": 8.9, "$lt": 9.5}},
    {"title": 1, "imdb.rating": 1}
):
    movies_count += 1

print("There are {} movies rated above 8.9 and below 9.5".format(movies_count))

Checking whether a field exists with `$exists`

In [None]:
def count_results(cursor):
    count = 0
    for item in cursor:
        count += 1
    return count

In [None]:
count_results(db.movies.find(
    {"tomato": {"$exists": 1}},
    {"title": 1, "imdb.rating": 1}
))

In [None]:
count_results(db.movies.find(
    {"tomato": {"$exists": 0}},
    {"title": 1, "imdb.rating": 1}
))
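
One detail worth keeping in mind: `$exists` tests for the presence of the field, not for a non-empty value, so a document whose field is set to null still counts as having it. The sketch below mimics this for top-level fields in plain Python. `has_field` and the three documents are made up for illustration and do not come from the collection.

```python
def has_field(document, field):
    """Mimic {field: {"$exists": True}} for a top-level field."""
    return field in document


# Hypothetical documents, not taken from the movies collection.
documents = [
    {"title": "With reviews", "tomato": {"meter": 52}},
    {"title": "Null reviews", "tomato": None},
    {"title": "No reviews"},
]

present = [d["title"] for d in documents if has_field(d, "tomato")]
absent = [d["title"] for d in documents if not has_field(d, "tomato")]

print(present)
print(absent)
```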

Regex operator
--------------


In [None]:
count_results(db.movies.find(
    {"title": {"$regex": "super"}},
    {"title": 1, "imdb.rating": 1}
))

In [None]:
count_results(db.movies.find(
    {"title": {"$regex": "[Ss]uper"}},
    {"title": 1, "imdb.rating": 1}
))

In [None]:
count_results(db.movies.find(
    {"title": {"$regex": "super", "$options": 'i'}},
    {"title": 1, "imdb.rating": 1}
))
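
The three queries above differ only in how they treat letter case. As a rough local illustration, the same patterns can be tried with Python's `re` module, which behaves the same way for patterns this simple. `re.search` is used because `$regex` matches anywhere in the string unless the pattern is anchored. The titles are made-up strings, and `count_matches` is a hypothetical helper.

```python
import re

# Illustrative strings only, not titles read from the collection.
titles = [
    "Superman Returns",
    "My Super Ex-Girlfriend",
    "superbad",
    "SUPER SIZE ME",
    "Titan A.E.",
]


def count_matches(pattern, titles, flags=0):
    """Count the titles that contain a match for `pattern` anywhere."""
    regex = re.compile(pattern, flags)
    return sum(1 for title in titles if regex.search(title))


print(count_matches("super", titles))                  # lowercase only
print(count_matches("[Ss]uper", titles))               # "super" or "Super"
print(count_matches("super", titles, re.IGNORECASE))   # any casing, like "$options": "i"
```

The character class `[Ss]` covers only the first letter, so an all-uppercase title is found only by the case-insensitive variant.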

In [None]:
# The movie details were restored into the `movies` collection above
pp([movie["title"] for movie in db.movies.find(
    {"title": {"$regex": "super|hero", "$options": 'i'}},
    {"title": 1, "imdb.rating": 1}
)])

In [None]:
count_results(db.movies.find(
    {"imdb.rating": {"$gt": 9}, "year": {"$in": list(range(2008, 2019, 1))}},
    {"title": 1, "imdb.rating": 1}
))
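
Note that `range(2008, 2019)` excludes its upper bound, so the `$in` list covers the years 2008 through 2018. Assuming `year` is always stored as an integer, a `$gte`/`$lte` range should select the same documents without listing every year. The sketch below builds both filters and compares them locally; all three function names are hypothetical helpers for this illustration.

```python
def year_in_filter(first, last):
    """Filter listing every year from `first` to `last` inclusive."""
    return {"year": {"$in": list(range(first, last + 1))}}


def year_range_filter(first, last):
    """Equivalent filter for integer years, expressed as a closed range."""
    return {"year": {"$gte": first, "$lte": last}}


def matches_year(year, year_filter):
    """Evaluate either filter shape against a single integer year."""
    condition = year_filter["year"]
    if "$in" in condition:
        return year in condition["$in"]
    return condition["$gte"] <= year <= condition["$lte"]


# Check that both filters agree for a span of integer years.
same = all(
    matches_year(y, year_in_filter(2008, 2018))
    == matches_year(y, year_range_filter(2008, 2018))
    for y in range(2000, 2025)
)
print(same)
```

Either dictionary can be merged into a query, for example `{"imdb.rating": {"$gt": 9}, **year_range_filter(2008, 2018)}`.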

Sadly, there are no movies rated over 9 released from 2008 to 2018.

In [None]:
db.movies.count_documents(
    {"title": {"$regex": "super|hero", "$options": 'i'}}
)