1- **API EXERCISE:** 


Browse The Movie Database API (https://developers.themoviedb.org/3/getting-started/introduction) and find the top 5 trending movies of the last week, i.e. the 5 with the highest average_vote (iterate through all the pages).

For each movie, create a dictionary with its name, release date and average vote, put the dictionaries in a list and display it.

Store the list in a collection called Movies in the ADS MongoDB cloud database.
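Every request in the solution targets the same weekly trending endpoint and only changes the `page` query parameter. The following is a minimal sketch of building those URLs with the standard library's `urllib.parse.urlencode`, which percent-encodes the values instead of concatenating strings by hand. The helper name `trending_url` and the placeholder key are hypothetical:

```python
from urllib.parse import urlencode

# Weekly trending-movies endpoint, as used in the notebook cell.
TRENDING_WEEK_URL = "https://api.themoviedb.org/3/trending/movie/week"


def trending_url(api_key: str, page: int) -> str:
    """Hypothetical helper: build the request URL for one results page."""
    # urlencode percent-encodes each value and joins the pairs with '&'.
    query = urlencode({"api_key": api_key, "page": page})
    return f"{TRENDING_WEEK_URL}?{query}"


print(trending_url("YOUR_API_KEY", 1))
# https://api.themoviedb.org/3/trending/movie/week?api_key=YOUR_API_KEY&page=1
```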

In [6]:
from urllib.request import urlopen
from tqdm import tqdm
import json

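# We need to get our API key from an external file.
# ATTENTION: As requested in class, the API key must be stored in a file named 'api_key', not 'api_key.txt'.
with open('api_key', 'r') as f:
    # strip() removes a trailing newline that would otherwise end up inside the request URL
    api_key = f.read().strip()
# The 'with' block closes the file automatically, so no explicit close is needed.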

# A first request to The Movie Database's weekly trending-movies endpoint tells us how many pages we have to iterate through.
source = urlopen('https://api.themoviedb.org/3/trending/movie/week?api_key=' + api_key + '&page=1')
# Parse the JSON response body into a dict
json_obj = json.loads(source.read())

# We define a list where we will keep all the movie-representing dicts. Now that we know the number of pages thanks to the previous request's result, we will iterate through all the pages to fetch the movies.
movie_list = []
# The tqdm package shows an informative progress bar. The first page is fetched again here; this could be avoided, but it costs only one extra request.
for i in tqdm(range(json_obj['total_pages']), desc = "Fetching information from all pages"):
    source = urlopen('https://api.themoviedb.org/3/trending/movie/week?api_key=' + api_key + '&page=' + str(i + 1))
    pageDict = json.loads(source.read())

    # For each movie on the current page, create a dict containing only the properties we are interested in.
    for movie in pageDict['results']:
        movie_list.append({
            'title' : movie.get('title'),
            'vote_average' : movie.get('vote_average'),
            'release_date' : movie.get('release_date')
        })

# Sort the list of movies found by average vote (descending)
sorted_movie_list = sorted(movie_list, key=lambda k: k['vote_average'], reverse=True) 
# Keep only the top 5 movies and show it
weekly_top_5_trending_movies = sorted_movie_list[0:5]
print(weekly_top_5_trending_movies)

Fetching information from all pages: 100%|██████████| 1000/1000 [03:39<00:00,  4.56it/s]

[{'title': 'My Name Is Pauli Murray', 'vote_average': 10.0, 'release_date': '2021-09-17'}, {'title': 'My Struggle', 'vote_average': 10.0, 'release_date': '2021-09-24'}, {'title': 'Paradox Lost', 'vote_average': 10.0, 'release_date': '2020-12-05'}, {'title': 'Under the Volcano', 'vote_average': 10.0, 'release_date': '2021-03-20'}, {'title': 'Miracle: Letters to the President', 'vote_average': 10.0, 'release_date': '2021-09-15'}]
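The cell above requests page 1 twice (once to read `total_pages`, once inside the loop) and sorts the whole list only to keep five entries. Below is a minimal sketch of both steps. The network call is abstracted behind a hypothetical `fetch_page(page)` callable that returns the parsed JSON of one page, so the logic can be exercised on fake data. `heapq.nlargest` is documented as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`. The helper names and the fake pages are illustrative assumptions:

```python
import heapq
from typing import Callable, Dict, List


def _keep_fields(movie: dict) -> Dict:
    # Keep only the three properties the exercise asks for.
    return {
        "title": movie.get("title"),
        "vote_average": movie.get("vote_average"),
        "release_date": movie.get("release_date"),
    }


def collect_movies(fetch_page: Callable[[int], dict]) -> List[Dict]:
    """Request every page exactly once; page 1 also reports total_pages."""
    first = fetch_page(1)
    movies = [_keep_fields(m) for m in first["results"]]
    for page in range(2, first["total_pages"] + 1):
        movies.extend(_keep_fields(m) for m in fetch_page(page)["results"])
    return movies


def top_n_by_vote(movies: List[Dict], n: int = 5) -> List[Dict]:
    # A missing vote_average is treated as 0 so None never reaches a comparison.
    return heapq.nlargest(n, movies, key=lambda m: m["vote_average"] or 0)


# Fake two-page response standing in for the real API.
FAKE_PAGES = {
    1: {"total_pages": 2, "results": [
        {"title": "A", "vote_average": 7.1, "release_date": "2021-01-01"},
        {"title": "B", "vote_average": 9.0, "release_date": "2021-02-01"},
    ]},
    2: {"total_pages": 2, "results": [
        {"title": "C", "vote_average": 8.4, "release_date": "2021-03-01"},
        {"title": "D", "vote_average": None, "release_date": "2021-04-01"},
    ]},
}
calls = []


def fake_fetch(page: int) -> dict:
    calls.append(page)  # record which pages were requested
    return FAKE_PAGES[page]


movies = collect_movies(fake_fetch)
print(calls)  # [1, 2]: each page is requested exactly once
print([m["title"] for m in top_n_by_vote(movies, n=2)])  # ['B', 'C']
```

In the notebook, `fetch_page` would wrap the `urlopen(...)` and `json.loads(...)` calls from the loop above.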





In [7]:
# Some of us needed this package to connect to MongoDB correctly: certifi provides an up-to-date bundle of CA certificates for TLS.
!pip install certifi



In [8]:
# Method seen in class to connect to MongoDB
# ATTENTION: Remember to add your current IP as valid IP access in your MongoDB network dashboard.
import pymongo, certifi

try:
    # Create a credentials.txt file in this folder:
    # first line: database username (not account username)
    # second line: database user password (not account password)
    # third line: database URL with port (you can find it in the cloud dashboard)
    # fourth line: database name
    if 'conn' in globals():
        conn.close()
        print("Closing")
    
    with open("credentials.txt", 'r') as f:
        [name,password,url,dbname]=f.read().splitlines()
    # Next line was changed in order to use certifi and connect correctly
    conn=pymongo.MongoClient("mongodb+srv://{}:{}@{}".format(name,password,url), tlsCAFile=certifi.where())
    # MongoClient connects lazily, so send a ping to actually verify that the server is reachable
    conn.admin.command('ping')
    print ("Connected successfully!!!")
    
except pymongo.errors.ConnectionFailure as e:
    print ("Could not connect to MongoDB: %s" % e)
db = conn[dbname]
db


Closing
Connected successfully!!!


Database(MongoClient(host=['ads-shard-00-01.trb7f.mongodb.net:27017', 'ads-shard-00-02.trb7f.mongodb.net:27017', 'ads-shard-00-00.trb7f.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-r513ol-shard-0', ssl=True, ssl_ca_certs='C:\\anaconda3\\lib\\site-packages\\certifi\\cacert.pem'), 'ads')
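The connection cell inserts the raw username and password into the URI with `str.format`. If either one contains a reserved character such as `@` or `:`, the URI becomes ambiguous. The PyMongo documentation recommends percent-encoding them with `urllib.parse.quote_plus`. Below is a minimal sketch of that. The helper names and the sample credentials and host are made up for illustration:

```python
from urllib.parse import quote_plus


def read_credentials(text: str):
    """Hypothetical helper: split the four-line credentials.txt content."""
    name, password, url, dbname = text.splitlines()
    return name, password, url, dbname


def build_srv_uri(username: str, password: str, host: str) -> str:
    """Hypothetical helper: build a mongodb+srv:// URI with encoded credentials."""
    # quote_plus turns '@' into %40 and ':' into %3A, so they cannot be
    # mistaken for the URI's own delimiters.
    return "mongodb+srv://{}:{}@{}".format(quote_plus(username), quote_plus(password), host)


# Made-up sample file content (username, password, host, database name).
sample = "student\np@ss:word\ncluster0.example.mongodb.net\nads"
name, password, url, dbname = read_credentials(sample)
print(build_srv_uri(name, password, url))
# mongodb+srv://student:p%40ss%3Aword@cluster0.example.mongodb.net
```

The resulting string can be passed to `pymongo.MongoClient` in place of the formatted URI, together with the same `tlsCAFile=certifi.where()` argument.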

In [9]:
# Get a handle to the Movies collection (MongoDB creates it on the first insert if it does not exist)
collection = db.Movies
# Delete the documents inserted by previous executions
collection.delete_many({})
# We add all the items (dicts) in the list to the collection
collection.insert_many(weekly_top_5_trending_movies)


<pymongo.results.InsertManyResult at 0x1bc85951d40>

In [10]:
# Check that the items were added to the collection
[d for d in collection.find()]

[{'_id': ObjectId('615eebcb90fdd665b2c0e56f'),
  'title': 'My Name Is Pauli Murray',
  'vote_average': 10.0,
  'release_date': '2021-09-17'},
 {'_id': ObjectId('615eebcb90fdd665b2c0e570'),
  'title': 'My Struggle',
  'vote_average': 10.0,
  'release_date': '2021-09-24'},
 {'_id': ObjectId('615eebcb90fdd665b2c0e571'),
  'title': 'Paradox Lost',
  'vote_average': 10.0,
  'release_date': '2020-12-05'},
 {'_id': ObjectId('615eebcb90fdd665b2c0e572'),
  'title': 'Under the Volcano',
  'vote_average': 10.0,
  'release_date': '2021-03-20'},
 {'_id': ObjectId('615eebcb90fdd665b2c0e573'),
  'title': 'Miracle: Letters to the President',
  'vote_average': 10.0,
  'release_date': '2021-09-15'}]