# ML: Movie Learning

We're going to do some movie text wrangling using Python. At this point, we have acquired a list of movie titles, along with some other data scraped from Wikipedia. Unfortunately, these other data are inconsistently formatted and will be a bit difficult to work with. Since we want to get plot synopses from the OMDb API anyway, we can get the other data of interest from there as well.



## 1. Data Collection

### 1.1. The OMDb API

We're going to use the OMDb API (http://www.omdbapi.com/), which is free once again. To use it, you need an API key. A free key is limited to 1,000 requests per day. More requests (and a poster API) are available if you support OMDb on Patreon. The OMDb website explains how the API works pretty well. We'll use the `requests` package to make calls to the API.
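
With a daily request cap, it may be worth avoiding repeat calls for the same movie. Here is a minimal sketch of one way to do that, assuming a small JSON file on disk is an acceptable cache. The names `cached_lookup` and `fake_fetch` are hypothetical and made up for illustration; `fake_fetch` stands in for a real API call so the example runs offline.

```python
import json
import tempfile
from pathlib import Path


def cached_lookup(key, fetch, cache_path):
    """Return the cached result for `key`, calling `fetch(key)` only on a miss.

    Hypothetical helper: `fetch` is any function returning
    JSON-serializable data, e.g. a wrapper around an API call.
    """
    cache_path = Path(cache_path)
    # Load the existing cache file, or start with an empty cache
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if key not in cache:
        cache[key] = fetch(key)
        cache_path.write_text(json.dumps(cache))
    return cache[key]


calls = []


def fake_fetch(title):
    """Stand-in for a real API call; records each time it is invoked."""
    calls.append(title)
    return {'Title': title}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'omdb_cache.json'
    first = cached_lookup('Snakes on a Plane', fake_fetch, path)
    second = cached_lookup('Snakes on a Plane', fake_fetch, path)

print(len(calls))  # → 1, the second lookup was served from the cache
```

Rereading the whole file on every lookup is fine for a few thousand movies, but a larger collection would probably want a proper key-value store.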

There are two ways to get movie data from this API: by movie title or by IMDb ID. We'll want to handle both, as I have a feeling it will be easier to get a random list of IMDb IDs than a random list of movie titles. Using IDs is also more exact, since movie titles aren't unique (e.g., "The Mummy" can refer either to the Brendan Fraser masterpiece or to the Tom Cruise dumpster fire).
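
As a rough sketch of handling both lookup styles, the query parameters might be built as below. OMDb uses `t` for a title and `i` for an IMDb ID; the helper name `build_omdb_params` and its signature are hypothetical, made up here for illustration. The snippet only builds the query string and does not call the API.

```python
from urllib.parse import urlencode


def build_omdb_params(title=None, imdb_id=None, year=None, full_plot=False):
    """Build OMDb query parameters for a title or IMDb ID lookup.

    Hypothetical helper for illustration; not part of any library.
    """
    # Require exactly one of the two lookup keys
    if (title is None) == (imdb_id is None):
        raise ValueError('Provide exactly one of `title` or `imdb_id`.')

    # `i` looks up by IMDb ID, `t` looks up by title
    params = {'i': imdb_id} if imdb_id is not None else {'t': title}
    if year is not None:
        params['y'] = year
    if full_plot:
        params['plot'] = 'full'
    return params


# A title may need a year to disambiguate; an ID does not
by_title = build_omdb_params(title='The Mummy', year=1999)
by_id = build_omdb_params(imdb_id='tt0325094')

print(urlencode(by_title))
print(urlencode(by_id))
```

A dict built this way could be handed to `requests.get(..., params=...)` in the same way as a hand-written one.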

Note that I'm using f-strings (`f'some {text}'`), a Python 3.6 feature (equivalent to `'some {}'.format(text)`). If you want to run this on 3.5 or earlier, you'll need to change them all by hand.
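
For anyone porting to an older Python, here is a quick sketch of the equivalence. The variable names are just placeholders, and the first form itself needs 3.6 or later to run.

```python
text = 'movie'

# Python 3.6+ f-string
with_fstring = f'some {text}'
# Equivalent `str.format` call that also works on older versions
with_format = 'some {}'.format(text)

print(with_fstring == with_format)  # → True
```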

In [1]:
import os
import json

import requests

from dotenv import load_dotenv, find_dotenv
# find .env automagically by walking up directories until it's found
dotenv_path = find_dotenv()
# load up the entries as environment variables
load_dotenv(dotenv_path)

True

In [6]:
API_KEY = os.environ.get('OMDB_API_KEY')

def get_movie_data(name, year=None, api_key=API_KEY, full_plot=False):
    """Returns json from OMDb API for movie."""
    api_url = f'http://private.omdbapi.com/?apikey={api_key}'
    # `requests` URL-encodes the values passed via `params` below, so the
    # title needs no manual escaping (a hand-inserted '+' would be sent as '%2B')
    
    # Can either manually extend the url with parameters or...
    #api_url += f'&t={name}'
    # if year is not None:
    #     api_url += f'&y={year}'
    # if full_plot:
    #     api_url += '&plot=full'
    # response = requests.get(api_url)
    
    # ... have `requests` do it for you!
    body = {'t': name}
    if year is not None:
        body['y'] = year
    if full_plot:
        body['plot'] = 'full'
    response = requests.get(api_url, params=body)
    
    # Throw error if API call has an error
    if response.status_code != 200:
        raise requests.HTTPError(
            f'Couldn\'t call API. Error {response.status_code}.'
        )
     
    # Throw error if movie not found
    # if response.json()['Response'] == 'False':
    #     raise ValueError(response.json()['Error'])
    
    return response.json()
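
A small aside on escaping: as far as I know, `requests` encodes `params` with the same rules as the standard library's `urlencode`. This sketch uses `urlencode` directly to show what happens to spaces and to a literal `+`, without making a network call.

```python
from urllib.parse import urlencode

# Spaces in a parameter value are encoded as '+' automatically
encoded_spaces = urlencode({'t': 'Snakes on a Plane'})
print(encoded_spaces)
# → t=Snakes+on+a+Plane

# A '+' that is already in the value gets escaped as '%2B',
# so pre-replacing spaces by hand changes what the server receives
encoded_plus = urlencode({'t': 'snakes+on+a+plane'})
print(encoded_plus)
# → t=snakes%2Bon%2Ba%2Bplane
```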

Let's test it out, and see what kind of information we get from our request.

In [7]:
response_json = get_movie_data('Snakes on a Plane')
print(json.dumps(response_json, indent=4))

{
    "Title": "Snakes on a Plane",
    "Year": "2006",
    "Rated": "R",
    "Released": "18 Aug 2006",
    "Runtime": "105 min",
    "Genre": "Action, Adventure, Crime, Thriller",
    "Director": "David R. Ellis",
    "Writer": "John Heffernan (screenplay), Sebastian Gutierrez (screenplay), David Dalessandro (story), John Heffernan (story)",
    "Actors": "Samuel L. Jackson, Julianna Margulies, Nathan Phillips, Rachel Blanchard",
    "Plot": "An F.B.I. Agent takes on a plane full of deadly venomous snakes, deliberately released to kill a witness being flown from Honolulu to Los Angeles to testify against a mob boss.",
    "Language": "English",
    "Country": "Germany, USA, Canada",
    "Awards": "3 wins & 7 nominations.",
    "Poster": "https://m.media-amazon.com/images/M/MV5BZDY3ODM2YTgtYTU5NC00MTE4LTkzNjktMzNhZWZmMzJjMWRjXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_SX300.jpg",
    "Ratings": [
        {
            "Source": "Internet Movie Database",
            "Value": "5.5/10"
        },


In [8]:
# With year provided
response_json = get_movie_data('Snakes on a Plane', year=2006)
print(json.dumps(response_json, indent=4))

{
    "Title": "Snakes on a Plane",
    "Year": "2006",
    "Rated": "R",
    "Released": "18 Aug 2006",
    "Runtime": "105 min",
    "Genre": "Action, Adventure, Crime, Thriller",
    "Director": "David R. Ellis",
    "Writer": "John Heffernan (screenplay), Sebastian Gutierrez (screenplay), David Dalessandro (story), John Heffernan (story)",
    "Actors": "Samuel L. Jackson, Julianna Margulies, Nathan Phillips, Rachel Blanchard",
    "Plot": "An F.B.I. Agent takes on a plane full of deadly venomous snakes, deliberately released to kill a witness being flown from Honolulu to Los Angeles to testify against a mob boss.",
    "Language": "English",
    "Country": "Germany, USA, Canada",
    "Awards": "3 wins & 7 nominations.",
    "Poster": "https://m.media-amazon.com/images/M/MV5BZDY3ODM2YTgtYTU5NC00MTE4LTkzNjktMzNhZWZmMzJjMWRjXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_SX300.jpg",
    "Ratings": [
        {
            "Source": "Internet Movie Database",
            "Value": "5.5/10"
        },


In [9]:
# With incorrect year provided
response_json = get_movie_data('Snakes on a Plane', year=2005)
print(json.dumps(response_json, indent=4))

{
    "Response": "False",
    "Error": "Movie not found!"
}
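
Note that a failed lookup like this one comes back as ordinary JSON rather than as an HTTP error, so a status-code check alone will not catch it. Here is a minimal sketch of turning it into an exception; `raise_if_not_found` is a hypothetical helper name, and the sample dict is copied from the output above.

```python
def raise_if_not_found(response_json):
    """Raise ValueError when OMDb reports a failed lookup.

    Hypothetical helper: it relies on the "Response" and "Error"
    keys seen in the JSON body of a failed request.
    """
    if response_json.get('Response') == 'False':
        raise ValueError(response_json.get('Error', 'Unknown OMDb error'))
    return response_json


not_found = {'Response': 'False', 'Error': 'Movie not found!'}
try:
    raise_if_not_found(not_found)
except ValueError as err:
    print(f'Lookup failed: {err}')
```

Whether to raise or to return the raw JSON depends on the caller: when looping over many titles, it may be simpler to collect the failures and carry on.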


In [10]:
# With full plot:
response_json = get_movie_data('Snakes on a Plane', full_plot=True)
print(json.dumps(response_json, indent=4))

{
    "Title": "Snakes on a Plane",
    "Year": "2006",
    "Rated": "R",
    "Released": "18 Aug 2006",
    "Runtime": "105 min",
    "Genre": "Action, Adventure, Crime, Thriller",
    "Director": "David R. Ellis",
    "Writer": "John Heffernan (screenplay), Sebastian Gutierrez (screenplay), David Dalessandro (story), John Heffernan (story)",
    "Actors": "Samuel L. Jackson, Julianna Margulies, Nathan Phillips, Rachel Blanchard",
    "Plot": "While practicing motocross in Hawaii, Sean Jones witnesses the brutal murder of an important American prosecutor by the powerful mobster Eddie Kim. He is protected and persuaded by the FBI agent Neville Flynn to testify against Eddie in Los Angeles. They embark in the red-eye Flight 121 of Pacific Air, occupying the entire first-class. However, Eddie dispatches hundred of different species of snakes airborne with a time operated device in the luggage to release the snakes in the flight with the intent of crashing the plane. Neville and the passe

In [11]:
# Film without a lot of data
# Note you can leave out the apostrophe and OMDb will still find the film
response_json = get_movie_data('Boarding School Girls\' Pajama Parade', full_plot=False)
print(json.dumps(response_json, indent=4))

{
    "Title": "Boarding School Girls' Pajama Parade",
    "Year": "1900",
    "Rated": "N/A",
    "Released": "01 Oct 1900",
    "Runtime": "N/A",
    "Genre": "N/A",
    "Director": "N/A",
    "Writer": "N/A",
    "Actors": "N/A",
    "Plot": "N/A",
    "Language": "N/A",
    "Country": "USA",
    "Awards": "N/A",
    "Poster": "N/A",
    "Ratings": [
        {
            "Source": "Internet Movie Database",
            "Value": "4.4/10"
        }
    ],
    "Metascore": "N/A",
    "imdbRating": "4.4",
    "imdbVotes": "9",
    "imdbID": "tt0325094",
    "Type": "movie",
    "DVD": "N/A",
    "BoxOffice": "N/A",
    "Production": "N/A",
    "Website": "N/A",
    "Response": "True"
}
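
Missing fields come back as the string `"N/A"` rather than as nulls, which is easy to mistake for real data later on. One possible cleanup step, sketched with a hypothetical helper named `replace_na`, swaps those placeholders for `None`:

```python
def replace_na(record, placeholder='N/A'):
    """Return a copy of `record` with placeholder strings replaced by None.

    Hypothetical helper; only top-level values are checked, so nested
    entries such as the "Ratings" list are left untouched.
    """
    return {
        key: (None if value == placeholder else value)
        for key, value in record.items()
    }


# A few fields copied from the response above
sparse = {
    'Title': "Boarding School Girls' Pajama Parade",
    'Year': '1900',
    'Runtime': 'N/A',
    'Plot': 'N/A',
    'Country': 'USA',
}
cleaned = replace_na(sparse)
print(cleaned['Plot'])     # → None
print(cleaned['Country'])  # → USA
```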


In [12]:
# Nonsense film that doesn't exist
response_json = get_movie_data('Phantom of the cheese cake')
print(json.dumps(response_json, indent=4))

{
    "Response": "False",
    "Error": "Movie not found!"
}


In [13]:
# Very short title: OMDb returns its closest match instead of an error
response_json = get_movie_data('ar')
print(json.dumps(response_json, indent=4))

{
    "Title": "The Fairy King of Ar",
    "Year": "1998",
    "Rated": "N/A",
    "Released": "14 May 1998",
    "Runtime": "93 min",
    "Genre": "Adventure, Family, Fantasy",
    "Director": "Paul Matthews",
    "Writer": "Christopher Atkins, Paul Matthews",
    "Actors": "Corbin Bernsen, Glynis Barber, Jameson Baltes, Brittney Bomann",
    "Plot": "Since as far back as Kyle and Evie Preston can remember, their grandmother told fantastical tales about elves who had been trapped underground by giants for thousands of years. Now ...",
    "Language": "English",
    "Country": "UK",
    "Awards": "N/A",
    "Poster": "https://images-na.ssl-images-amazon.com/images/M/MV5BMTU0NzA2MzM4Nl5BMl5BanBnXkFtZTcwODI5NzQyMQ@@._V1_SX300.jpg",
    "Ratings": [
        {
            "Source": "Internet Movie Database",
            "Value": "5.0/10"
        }
    ],
    "Metascore": "N/A",
    "imdbRating": "5.0",
    "imdbVotes": "272",
    "imdbID": "tt0139060",
    "Type": "movie",
    "DVD": "N/A"