### Author: Meet K Sahni
#### In this notebook:
a) The Netflix shows and movies Excel file is read into a DataFrame.
b) The show and movie titles are stored in a list, which is passed to the OMDB API to retrieve their details.
c) The OMDB API results are saved in individual lists, which are then used to create a DataFrame.
d) The resulting DataFrame is then cleaned and written to a CSV file, which is used to build a SQL database.

In [5]:
import pandas as pd
import requests
import json
from config import api_key
from pprint import pprint

In [8]:
netflix_shows = pd.read_excel("Resources/netflix.xlsx")
netflix_shows.head()

Unnamed: 0,title,rating,ratingLevel,ratingDescription,release year,user rating score,user rating size
0,White Chicks,PG-13,"crude and sexual humor, language and some drug...",80,2004,82.0,80
1,Lucky Number Slevin,R,"strong violence, sexual content and adult lang...",100,2006,,82
2,Grey's Anatomy,TV-14,Parents strongly cautioned. May be unsuitable ...,90,2016,98.0,80
3,Prison Break,TV-14,Parents strongly cautioned. May be unsuitable ...,90,2008,98.0,80
4,How I Met Your Mother,TV-PG,Parental guidance suggested. May not be suitab...,70,2014,94.0,80


In [9]:
len(netflix_shows)

1000

In [64]:
response = requests.get(f"http://www.omdbapi.com/?i=tt3896198&apikey={api_key}")
response

<Response [200]>

In [17]:
pprint(response.json())


{'Actors': 'Chris Pratt, Zoe Saldana, Dave Bautista',
 'Awards': 'Nominated for 1 Oscar. 15 wins & 58 nominations total',
 'BoxOffice': '$389,813,101',
 'Country': 'United States',
 'DVD': '10 Jul 2017',
 'Director': 'James Gunn',
 'Genre': 'Action, Adventure, Comedy',
 'Language': 'English',
 'Metascore': '67',
 'Plot': 'The Guardians struggle to keep together as a team while dealing with '
         "their personal family issues, notably Star-Lord's encounter with his "
         'father the ambitious celestial being Ego.',
 'Poster': 'https://m.media-amazon.com/images/M/MV5BNjM0NTc0NzItM2FlYS00YzEwLWE0YmUtNTA2ZWIzODc2OTgxXkEyXkFqcGdeQXVyNTgwNzIyNzg@._V1_SX300.jpg',
 'Production': 'Marvel Studios, Walt Disney Pictures',
 'Rated': 'PG-13',
 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.6/10'},
             {'Source': 'Rotten Tomatoes', 'Value': '85%'},
             {'Source': 'Metacritic', 'Value': '67/100'}],
 'Released': '05 May 2017',
 'Response': 'True',
 'Runtime': 

In [16]:
netflix_titles = netflix_shows["title"].tolist()
netflix_titles

['White Chicks',
 'Lucky Number Slevin',
 "Grey's Anatomy",
 'Prison Break',
 'How I Met Your Mother',
 'Supernatural',
 'Breaking Bad',
 'The Vampire Diaries',
 'The Walking Dead',
 'Pretty Little Liars',
 'Once Upon a Time',
 'Sherlock',
 'Death Note',
 'Naruto',
 'The Hunter',
 'Lottie Dottie Chicken',
 'Arrow',
 'Black Mirror',
 'The Originals',
 'The 100',
 'Masha and the Bear',
 'Ó Pai, Ó - Look at This',
 'Nymphomaniac: Volume 1',
 'Hunter X Hunter (2011)',
 "Marvel's Luke Cage",
 "Marvel's Iron Fist",
 'Narcos',
 'The Flash',
 'The Seven Deadly Sins',
 'Scream',
 'The Do-Over',
 'Operações Especiais',
 0.03,
 'Trollhunters',
 'Fearless',
 'Hyena Road',
 'Santa Clarita Diet',
 'Sex, Love and Therapy',
 'White Girl',
 'Girlboss',
 'City of God: 10 Years Later',
 '13 Reasons Why',
 'Sand Castle',
 'Salvador Martinha - Tip of the Tongue',
 'Sandy Wexler',
 'O Amor no Divã',
 'Stronger Than the World',
 'Back and Forth',
 'Happily Married',
 'Pandora',
 'Lucky Number Slevin',
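
The title list contains repeats (`'Lucky Number Slevin'` appears twice in the output above) and at least one non-string entry (`0.03`). A minimal sketch, assuming one wanted to avoid repeated API calls, is to normalise every entry to a string and drop repeats while keeping the original order. The names `titles` and `unique_titles` are illustrative and not part of the notebook.

```python
# Toy stand-in for netflix_titles, using values seen in the output above.
titles = ["White Chicks", "Lucky Number Slevin", 0.03, "Pandora", "Lucky Number Slevin"]

# dict.fromkeys keeps the first occurrence of each key and preserves order.
unique_titles = list(dict.fromkeys(str(t) for t in titles))

print(unique_titles)
```

Because duplicates are also removed later during cleaning, this would only reduce the number of requests, not change the final table.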
 

#### Function to retrieve a rating value from the list of dictionaries under `Ratings`

In [65]:
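def get_rating(ratings, source):
    """Return the "Value" of the entry in `ratings` whose "Source" equals `source`, or None."""
    for dic in ratings:
        if dic["Source"] == source:
            return dic["Value"]
    # No entry from this source (common for TV series): report it as missing.
    return None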
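
As a quick check of the lookup, the sketch below runs the same logic against the `Ratings` list from the sample OMDB response printed earlier. The function is repeated so the snippet runs on its own; `sample_ratings`, `rt_value` and `missing_value` are illustrative names.

```python
def get_rating(ratings, source):
    """Return the value reported by `source`, or None if that source is absent."""
    for entry in ratings:
        if entry["Source"] == source:
            return entry["Value"]
    return None

# Ratings list copied from the sample OMDB response shown earlier.
sample_ratings = [
    {"Source": "Internet Movie Database", "Value": "7.6/10"},
    {"Source": "Rotten Tomatoes", "Value": "85%"},
    {"Source": "Metacritic", "Value": "67/100"},
]

rt_value = get_rating(sample_ratings, "Rotten Tomatoes")
missing_value = get_rating(sample_ratings, "Some Other Source")
print(rt_value, missing_value)
```

A source that is not listed yields `None`, which is why `RT_rating` and `Metacritic_rating` end up with missing values for many rows.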

#### This cell reads each Netflix title from the list, passes it to the OMDB API, and appends the fields of the OMDB response to their respective lists.

In [68]:
IMDB_IDs = []
Titles = []
Year = []
Rated = []
Released = []
Runtime = []
Genre = []
Director = []
Writer = []
Actors = []
Language = []
Country = []
imdbRatings = []
rtRatings = []
metacriticRatings = []
Type = []
   
for title in netflix_titles:
    # Send the title via `params` so it is URL-encoded; in a hand-built URL,
    # a title such as "Lilo & Stitch" is cut off at the "&".
    response = requests.get("http://www.omdbapi.com/",
                            params={"t": title, "apikey": api_key}).json()

    try:
        IMDB_IDs.append(response["imdbID"])
        Titles.append(response["Title"])
        Year.append(response["Year"])
        Rated.append(response["Rated"])
        Released.append(response["Released"])
        Runtime.append(response["Runtime"])
        Genre.append(response["Genre"])
        Director.append(response["Director"])
        Writer.append(response["Writer"])
        Actors.append(response["Actors"])
        Language.append(response["Language"])
        Country.append(response["Country"])
        imdbRatings.append(response["imdbRating"])
        rtRatings.append(get_rating(response['Ratings'], "Rotten Tomatoes"))
        metacriticRatings.append(get_rating(response['Ratings'], "Metacritic"))
        Type.append(response["Type"])

    except KeyError:
        # A response without these keys means OMDB did not find the title.
        print(f"Movie/Show {title} not found!")

print(len(Titles))

http://www.omdbapi.com/?t=White Chicks&apikey=90cc3fd0
http://www.omdbapi.com/?t=Lucky Number Slevin&apikey=90cc3fd0
http://www.omdbapi.com/?t=Grey's Anatomy&apikey=90cc3fd0
http://www.omdbapi.com/?t=Prison Break&apikey=90cc3fd0
http://www.omdbapi.com/?t=How I Met Your Mother&apikey=90cc3fd0
http://www.omdbapi.com/?t=Supernatural&apikey=90cc3fd0
http://www.omdbapi.com/?t=Breaking Bad&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Vampire Diaries&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Walking Dead&apikey=90cc3fd0
http://www.omdbapi.com/?t=Pretty Little Liars&apikey=90cc3fd0
http://www.omdbapi.com/?t=Once Upon a Time&apikey=90cc3fd0
http://www.omdbapi.com/?t=Sherlock&apikey=90cc3fd0
http://www.omdbapi.com/?t=Death Note&apikey=90cc3fd0
http://www.omdbapi.com/?t=Naruto&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Hunter&apikey=90cc3fd0
http://www.omdbapi.com/?t=Lottie Dottie Chicken&apikey=90cc3fd0
http://www.omdbapi.com/?t=Arrow&apikey=90cc3fd0
http://www.omdbapi.com/?t=Black Mirro

http://www.omdbapi.com/?t=Cooper Barrett's Guide to Surviving Life&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Great British Baking Show&apikey=90cc3fd0
Movie/Show The Great British Baking Show not found!
http://www.omdbapi.com/?t=Life in Pieces&apikey=90cc3fd0
http://www.omdbapi.com/?t=Crazy Ex-Girlfriend&apikey=90cc3fd0
http://www.omdbapi.com/?t=Easy&apikey=90cc3fd0
http://www.omdbapi.com/?t=Santa Clarita Diet&apikey=90cc3fd0
http://www.omdbapi.com/?t=No Tomorrow&apikey=90cc3fd0
http://www.omdbapi.com/?t=Minimalism: A Documentary About the Important Things&apikey=90cc3fd0
http://www.omdbapi.com/?t=Girlboss&apikey=90cc3fd0
http://www.omdbapi.com/?t=Hot Girls Wanted: Turned On&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Discovery&apikey=90cc3fd0
http://www.omdbapi.com/?t=13 Reasons Why&apikey=90cc3fd0
http://www.omdbapi.com/?t=Sandy Wexler&apikey=90cc3fd0
http://www.omdbapi.com/?t=Chewing Gum&apikey=90cc3fd0
http://www.omdbapi.com/?t=Win It All&apikey=90cc3fd0
http://www.omdbapi.co

http://www.omdbapi.com/?t=Gossip Girl&apikey=90cc3fd0
http://www.omdbapi.com/?t=How I Met Your Mother&apikey=90cc3fd0
http://www.omdbapi.com/?t=Glee&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Vampire Diaries&apikey=90cc3fd0
http://www.omdbapi.com/?t=Family Guy&apikey=90cc3fd0
http://www.omdbapi.com/?t=Criminal Minds&apikey=90cc3fd0
http://www.omdbapi.com/?t=Friends&apikey=90cc3fd0
http://www.omdbapi.com/?t=Mean Girls 2&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Secret Life of the American Teenager&apikey=90cc3fd0
http://www.omdbapi.com/?t=That '70s Show&apikey=90cc3fd0
http://www.omdbapi.com/?t=Gilmore Girls&apikey=90cc3fd0
http://www.omdbapi.com/?t=Parenthood&apikey=90cc3fd0
http://www.omdbapi.com/?t=Heartland&apikey=90cc3fd0
http://www.omdbapi.com/?t=Pretty Little Liars&apikey=90cc3fd0
http://www.omdbapi.com/?t=Shameless (U.S.)&apikey=90cc3fd0
Movie/Show Shameless (U.S.) not found!
http://www.omdbapi.com/?t=Switched at Birth&apikey=90cc3fd0
http://www.omdbapi.com/?t=New Girl&ap

http://www.omdbapi.com/?t=Lie to Me&apikey=90cc3fd0
http://www.omdbapi.com/?t=Prison Break&apikey=90cc3fd0
http://www.omdbapi.com/?t=NCIS&apikey=90cc3fd0
http://www.omdbapi.com/?t=Burn Notice&apikey=90cc3fd0
http://www.omdbapi.com/?t=Breaking Bad&apikey=90cc3fd0
http://www.omdbapi.com/?t=Criminal Minds&apikey=90cc3fd0
http://www.omdbapi.com/?t=White Collar&apikey=90cc3fd0
http://www.omdbapi.com/?t=Luther&apikey=90cc3fd0
http://www.omdbapi.com/?t=Hawaii Five-0&apikey=90cc3fd0
http://www.omdbapi.com/?t=House of Cards&apikey=90cc3fd0
http://www.omdbapi.com/?t=Last Man Standing&apikey=90cc3fd0
http://www.omdbapi.com/?t=Revenge&apikey=90cc3fd0
http://www.omdbapi.com/?t=Person of Interest&apikey=90cc3fd0
http://www.omdbapi.com/?t=Scandal&apikey=90cc3fd0
http://www.omdbapi.com/?t=Touch&apikey=90cc3fd0
http://www.omdbapi.com/?t=Sherlock&apikey=90cc3fd0
http://www.omdbapi.com/?t=Breakout Kings&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Client List&apikey=90cc3fd0
http://www.omdbapi.com/?t=Ar

http://www.omdbapi.com/?t=Slugterra: Slug Fu Showdown&apikey=90cc3fd0
http://www.omdbapi.com/?t=Dinotrux&apikey=90cc3fd0
http://www.omdbapi.com/?t=Edgar Rice Burroughs' Tarzan and Jane&apikey=90cc3fd0
Movie/Show Edgar Rice Burroughs' Tarzan and Jane not found!
http://www.omdbapi.com/?t=Lego DC Comics: Batman Be-Leaguered&apikey=90cc3fd0
http://www.omdbapi.com/?t=Trollhunters&apikey=90cc3fd0
http://www.omdbapi.com/?t=Power Rangers Dino Charge&apikey=90cc3fd0
http://www.omdbapi.com/?t=Bunks&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Jungle Book&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Angry Birds Movie&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Secret Life of Pets&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Furchester Hotel&apikey=90cc3fd0
http://www.omdbapi.com/?t=Marvel's Hulk: Where Monsters Dwell&apikey=90cc3fd0
Movie/Show Marvel's Hulk: Where Monsters Dwell not found!
http://www.omdbapi.com/?t=Angry Birds&apikey=90cc3fd0
http://www.omdbapi.com/?t=Talking Tom and Friends

http://www.omdbapi.com/?t=Chicken Run&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Emperor's New Groove&apikey=90cc3fd0
http://www.omdbapi.com/?t=Lilo & Stitch&apikey=90cc3fd0
http://www.omdbapi.com/?t=Spy Kids 3: Game Over&apikey=90cc3fd0
http://www.omdbapi.com/?t=Annie&apikey=90cc3fd0
http://www.omdbapi.com/?t=Mulan 2&apikey=90cc3fd0
Movie/Show Mulan 2 not found!
http://www.omdbapi.com/?t=Chicken Little&apikey=90cc3fd0
http://www.omdbapi.com/?t=Lilo & Stitch 2: Stitch Has A Glitch&apikey=90cc3fd0
http://www.omdbapi.com/?t=Teenage Mutant Ninja Turtles&apikey=90cc3fd0
http://www.omdbapi.com/?t=Bee Movie&apikey=90cc3fd0
http://www.omdbapi.com/?t=Kung Fu Panda&apikey=90cc3fd0
http://www.omdbapi.com/?t=Cars Toons: Mater's Tall Tales&apikey=90cc3fd0
http://www.omdbapi.com/?t=Super Why!&apikey=90cc3fd0
http://www.omdbapi.com/?t=DreamWorks Shrek's Swamp Stories&apikey=90cc3fd0
http://www.omdbapi.com/?t=Beverly Hills Chihuahua 3: Viva La Fiesta!&apikey=90cc3fd0
http://www.omdbapi.com/?t=Drea

http://www.omdbapi.com/?t=The Magic School Bus&apikey=90cc3fd0
http://www.omdbapi.com/?t=Animaniacs&apikey=90cc3fd0
http://www.omdbapi.com/?t=Disney Animation Collection: Vol. 4: The Tortoise and the Hare&apikey=90cc3fd0
http://www.omdbapi.com/?t=PopPixie&apikey=90cc3fd0
Movie/Show PopPixie not found!
http://www.omdbapi.com/?t=Jack and the Cuckoo-Clock Heart&apikey=90cc3fd0
Movie/Show Jack and the Cuckoo-Clock Heart not found!
http://www.omdbapi.com/?t=The Pirate Fairy&apikey=90cc3fd0
http://www.omdbapi.com/?t=Axel: The Biggest Little Hero&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Adventures of Figaro Pho&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Boxcar Children&apikey=90cc3fd0
http://www.omdbapi.com/?t=Pet Pals in Windland&apikey=90cc3fd0
Movie/Show Pet Pals in Windland not found!
http://www.omdbapi.com/?t=Walt Disney Animation Studios Short Films Collection&apikey=90cc3fd0
http://www.omdbapi.com/?t=Alpha and Omega: Dino Digs&apikey=90cc3fd0
http://www.omdbapi.com/?t=Alice Thr

http://www.omdbapi.com/?t=Anastasia&apikey=90cc3fd0
http://www.omdbapi.com/?t=Antz&apikey=90cc3fd0
http://www.omdbapi.com/?t=Babe: Pig in the City&apikey=90cc3fd0
http://www.omdbapi.com/?t=The Road to El Dorado&apikey=90cc3fd0
http://www.omdbapi.com/?t=Fantasia 2000&apikey=90cc3fd0
http://www.omdbapi.com/?t=Fantasia&apikey=90cc3fd0
http://www.omdbapi.com/?t=Harry and the Hendersons&apikey=90cc3fd0
http://www.omdbapi.com/?t=Pooh's Grand Adventure: The Search for Christopher Robin&apikey=90cc3fd0
http://www.omdbapi.com/?t=Cats & Dogs&apikey=90cc3fd0
http://www.omdbapi.com/?t=Atlantis: The Lost Empire&apikey=90cc3fd0
http://www.omdbapi.com/?t=FernGully: The Last Rainforest&apikey=90cc3fd0
http://www.omdbapi.com/?t=Balto 2: Wolf Quest&apikey=90cc3fd0
Movie/Show Balto 2: Wolf Quest not found!
http://www.omdbapi.com/?t=Balto&apikey=90cc3fd0
http://www.omdbapi.com/?t=Back to the Secret Garden&apikey=90cc3fd0
http://www.omdbapi.com/?t=Tarzan & Jane&apikey=90cc3fd0
http://www.omdbapi.com/?t=Atl
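
Several titles in the log above contain an `&` (for example `Lilo & Stitch`), which is also the separator between query-string parameters. The sketch below uses only the standard library to show how an unencoded title is split and how encoding avoids that. `build_query_url` and the placeholder key `YOUR_KEY` are hypothetical and not part of the notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://www.omdbapi.com/"

def build_query_url(title, api_key):
    """Build an OMDB title-lookup URL with the title percent-encoded."""
    return f"{BASE_URL}?{urlencode({'t': title, 'apikey': api_key})}"

# Hand-built URL: the "&" in the title ends the `t` parameter early.
naive_url = f"{BASE_URL}?t=Lilo & Stitch&apikey=YOUR_KEY"
naive = parse_qs(urlsplit(naive_url).query)

# Encoded URL: the full title survives a round trip through the query string.
safe_url = build_query_url("Lilo & Stitch", "YOUR_KEY")
decoded = parse_qs(urlsplit(safe_url).query)

print(naive["t"], decoded["t"])
```

Passing the same dictionary to `requests.get(..., params=...)` applies this kind of encoding automatically.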

In [69]:
netflix_df = pd.DataFrame({"IMDB_id": IMDB_IDs,"Title":Titles,"Year":Year,"Rated": Rated,"Released": Released,
                         "Runtime": Runtime,"Genre": Genre,"Director": Director,"Writer": Writer, "Actors":Actors,
                         "Language":Language,"Country":Country,"IMDB_rating":imdbRatings,"RT_rating":rtRatings,
                         "Metacritic_rating":metacriticRatings, "Type":Type})

netflix_df.head()

Unnamed: 0,IMDB_id,Title,Year,Rated,Released,Runtime,Genre,Director,Writer,Actors,Language,Country,IMDB_rating,RT_rating,Metacritic_rating,Type
0,tt0381707,White Chicks,2004,PG-13,23 Jun 2004,109 min,"Comedy, Crime",Keenen Ivory Wayans,"Keenen Ivory Wayans, Shawn Wayans, Marlon Wayans","Marlon Wayans, Shawn Wayans, Busy Philipps",English,United States,5.6,15%,41/100,movie
1,tt0425210,Lucky Number Slevin,2006,R,07 Apr 2006,110 min,"Crime, Drama, Thriller",Paul McGuigan,Jason Smilovic,"Josh Hartnett, Ben Kingsley, Morgan Freeman",English,"United States, Germany, United Kingdom, Canada",7.7,,53/100,movie
2,tt0413573,Grey's Anatomy,2005–,TV-14,27 Mar 2005,41 min,"Drama, Romance",,Shonda Rhimes,"Ellen Pompeo, Chandra Wilson, James Pickens Jr.",English,United States,7.5,,,series
3,tt0455275,Prison Break,2005–2017,TV-14,29 Aug 2005,44 min,"Action, Crime, Drama",,Paul T. Scheuring,"Dominic Purcell, Wentworth Miller, Amaury Nolasco","English, Arabic, Spanish","United Kingdom, United States",8.3,,,series
4,tt0460649,How I Met Your Mother,2005–2014,TV-14,19 Sep 2005,22 min,"Comedy, Romance",,"Carter Bays, Craig Thomas","Josh Radnor, Jason Segel, Cobie Smulders","English, Persian, Chinese",United States,8.3,,,series


In [70]:
len(netflix_df)

914

### Data Cleaning

In [80]:
netflix_df["Country"].unique()

array(['United States', 'United States, Germany, United Kingdom, Canada',
       'United Kingdom, United States', 'United States, Canada', 'Japan',
       'Australia', 'Brazil', 'United Kingdom', 'N/A',
       'United States, Colombia, Mexico', 'United States, Mexico',
       'Canada', 'Portugal', 'France, Belgium', 'Mexico',
       'United States, New Zealand', 'Canada, United States',
       'Switzerland', 'United States, Japan', 'USA, Georgia, Ghana',
       'United States, Germany', 'Germany, United States',
       'Ireland, Canada, United States, United Kingdom',
       'United States, United Kingdom, Bulgaria', 'Spain', 'Turkey',
       'Spain, Portugal', 'United Kingdom, Ireland',
       'United States, South Africa, India',
       'United Kingdom, France, United States',
       'France, Germany, Belgium, United States',
       'Canada, France, United States, Germany', 'Brazil, France', 'UK',
       'United States, Australia', 'United States, United Kingdom',
       'Finland', '

In [79]:
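# Match "USA" inside comma-separated lists too; a whole-cell replace leaves 'USA, Georgia, Ghana' untouched.
netflix_df["Country"] = netflix_df["Country"].str.replace(r"\bUSA\b", "United States", regex=True)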

In [82]:
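# Same idea for "UK": replace it wherever it appears as a whole word in Country.
netflix_df["Country"] = netflix_df["Country"].str.replace(r"\bUK\b", "United Kingdom", regex=True)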
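
The `Country` values are comma-separated strings, so it matters whether a replacement matches the whole cell or a word inside it. The toy Series below (values taken from the `unique()` output above) contrasts the two behaviours; `countries`, `exact` and `by_word` are illustrative names.

```python
import pandas as pd

countries = pd.Series(["USA, Georgia, Ghana", "UK", "United States, Canada"])

# Dictionary replace on a Series swaps only cells that equal the key exactly.
exact = countries.replace({"USA": "United States", "UK": "United Kingdom"})

# Regex replace with word boundaries also rewrites names inside longer strings.
by_word = (
    countries
    .str.replace(r"\bUSA\b", "United States", regex=True)
    .str.replace(r"\bUK\b", "United Kingdom", regex=True)
)

print(exact.tolist())
print(by_word.tolist())
```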

In [83]:
netflix_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 914 entries, 0 to 913
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   IMDB_id            914 non-null    object
 1   Title              914 non-null    object
 2   Year               914 non-null    object
 3   Rated              914 non-null    object
 4   Released           914 non-null    object
 5   Runtime            914 non-null    object
 6   Genre              914 non-null    object
 7   Director           914 non-null    object
 8   Writer             914 non-null    object
 9   Actors             914 non-null    object
 10  Language           914 non-null    object
 11  Country            914 non-null    object
 12  IMDB_rating        914 non-null    object
 13  RT_rating          316 non-null    object
 14  Metacritic_rating  259 non-null    object
 15  Type               914 non-null    object
dtypes: object(16)
memory usage: 114.4+ KB
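
Every column is stored as `object`, including runtimes and ratings. If numeric columns were wanted before loading into SQL, a conversion along the following lines could be used. This is a sketch on a made-up sample in the same string formats (`109 min`, `15%`, `41/100`, `N/A`); the output column names are assumptions, and the notebook itself does not perform this step.

```python
import pandas as pd

# Made-up sample in the same string formats as the OMDB fields.
sample = pd.DataFrame({
    "Runtime": ["109 min", "41 min", "N/A"],
    "IMDB_rating": ["5.6", "7.5", "N/A"],
    "RT_rating": ["15%", None, None],
    "Metacritic_rating": ["41/100", None, None],
})

typed = pd.DataFrame({
    # Keep the leading digits of "109 min"; anything without digits becomes NaN.
    "Runtime_min": pd.to_numeric(sample["Runtime"].str.extract(r"(\d+)")[0], errors="coerce"),
    # errors="coerce" turns placeholders such as "N/A" into NaN.
    "IMDB_rating": pd.to_numeric(sample["IMDB_rating"], errors="coerce"),
    # "15%" -> 15.0
    "RT_pct": pd.to_numeric(sample["RT_rating"].str.rstrip("%"), errors="coerce"),
    # "41/100" -> 41.0
    "Metacritic_score": pd.to_numeric(sample["Metacritic_rating"].str.split("/").str[0], errors="coerce"),
})

print(typed.dtypes)
```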


In [86]:
netflix_df["Duplicate"] = netflix_df.duplicated(keep = 'first')

In [87]:
netflix_df["Duplicate"].value_counts()

True     469
False    445
Name: Duplicate, dtype: int64

In [89]:
netflix_df["Duplicate1"] = netflix_df.duplicated('IMDB_id',keep = 'first')

In [90]:
netflix_df["Duplicate1"].value_counts()

True     470
False    444
Name: Duplicate1, dtype: int64

In [92]:
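# Keep the first row per title, then drop the helper flag columns so they do not end up in the CSV.
clean_netflixdf = netflix_df.drop_duplicates(subset=['IMDB_id',"Title"]).drop(columns=["Duplicate", "Duplicate1"])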

In [93]:
len(clean_netflixdf)

444
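
The two duplicate counts above differ by one (469 full-row duplicates versus 470 duplicated `IMDB_id` values). Every full-row duplicate is also an `IMDB_id` duplicate, so the extra row must share an `IMDB_id` with an earlier row while differing in some other column. The toy frame below reproduces that situation on invented data.

```python
import pandas as pd

# Invented data: the third row repeats an ID but differs in "Rated".
toy = pd.DataFrame({
    "IMDB_id": ["tt1", "tt1", "tt1", "tt2"],
    "Title": ["A", "A", "A", "B"],
    "Rated": ["R", "R", "PG", "TV-14"],
})

# Rows identical to an earlier row in every column.
full_row_dupes = int(toy.duplicated(keep="first").sum())

# Rows whose IMDB_id already appeared, whatever the other columns contain.
id_dupes = int(toy.duplicated("IMDB_id", keep="first").sum())

# Keep the first row for each (IMDB_id, Title) pair.
deduped = toy.drop_duplicates(subset=["IMDB_id", "Title"])

print(full_row_dupes, id_dupes, len(deduped))
```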

#### Create a CSV file for import into the SQL database (via pandas)

In [94]:
clean_netflixdf.to_csv("OutputData/NetflixDB.csv", index = False)
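
The notebook does not say which SQL database the CSV is loaded into. As a minimal sketch, assuming SQLite and a table named `netflix` (both assumptions), `DataFrame.to_sql` can write the rows directly. A small in-memory sample stands in for the result of `pd.read_csv("OutputData/NetflixDB.csv")`.

```python
import sqlite3

import pandas as pd

# Small stand-in for the DataFrame read back from the CSV.
sample = pd.DataFrame({
    "IMDB_id": ["tt0381707", "tt0425210"],
    "Title": ["White Chicks", "Lucky Number Slevin"],
    "Type": ["movie", "movie"],
})

# An in-memory database keeps the sketch free of side effects.
conn = sqlite3.connect(":memory:")
sample.to_sql("netflix", conn, index=False, if_exists="replace")

row_count = conn.execute("SELECT COUNT(*) FROM netflix").fetchone()[0]
titles = pd.read_sql_query("SELECT Title FROM netflix ORDER BY IMDB_id", conn)["Title"].tolist()
conn.close()

print(row_count, titles)
```

For another database engine, the same `to_sql` call would typically take an SQLAlchemy connection instead of the `sqlite3` one.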