# <font color=blue>Movie-Novels Dataset</font>

##  Introduction 

This notebook builds a custom dataset of books and the movies based on them. The movie data comes from the Kaggle IMDb dataset, and the corresponding book information is retrieved from the Google Books store through the Google Books API. The notebook demonstrates how to acquire, extract, and clean data to create a dataset with relevant information.

The dataset provides the following information about books and movies:

- Books: the category the book is listed under, its publication date, and its author
- Movies: the director, runtime, release date, rating, and genre

The final dataset can support analyses such as finding trends in the popularity of novel-based movies over the last 10 years.



## Import Libraries

In [1]:
import pandas as pd
import numpy as np
import re
import csv, json
import urllib.parse
import urllib.request
import requests
import time
from pandas import DataFrame
from pprint import pprint

## Reading movie datasets

The movie data was gathered from the IMDb dataset on Kaggle. IMDb is a popular movie website that combines plot descriptions, ratings, reviews, release dates, and many other details. The data was scraped from the publicly available website https://www.imdb.com on January 1, 2020. The movies dataset includes 85,855 movies from 1894 to 2020.

It contains keywords and tagline columns, which can indicate that a movie was based on or inspired by a book.

In [2]:
movies = pd.read_csv("data/movies.csv")
movies.head()

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,tagline,...,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj
0,135397,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,http://www.jurassicworld.com/,Colin Trevorrow,The park is open.,...,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,2015-06-09,5562,6.5,2015,137999900.0,1392446000.0
1,76341,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,http://www.madmaxmovie.com/,George Miller,What a Lovely Day.,...,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,2015-05-13,6185,7.1,2015,137999900.0,348161300.0
2,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,One Choice Can Destroy You,...,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,2015-03-18,2480,6.3,2015,101200000.0,271619000.0
3,140607,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,http://www.starwars.com/films/star-wars-episod...,J.J. Abrams,Every generation has a story.,...,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,2015-12-15,5292,7.5,2015,183999900.0,1902723000.0
4,168259,tt2820852,9.335014,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,http://www.furious7.com/,James Wan,Vengeance Hits Home,...,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,2015-04-01,2947,7.3,2015,174799900.0,1385749000.0


In [3]:
book_based_movies = movies[["id", "original_title", "keywords", "tagline"]].reset_index(drop = True)
book_based_movies.head(10)

Unnamed: 0,id,original_title,keywords,tagline
0,135397,Jurassic World,monster|dna|tyrannosaurus rex|velociraptor|island,The park is open.
1,76341,Mad Max: Fury Road,future|chase|post-apocalyptic|dystopia|australia,What a Lovely Day.
2,262500,Insurgent,based on novel|revolution|dystopia|sequel|dyst...,One Choice Can Destroy You
3,140607,Star Wars: The Force Awakens,android|spaceship|jedi|space opera|3d,Every generation has a story.
4,168259,Furious 7,car race|speed|revenge|suspense|car,Vengeance Hits Home
5,281957,The Revenant,father-son relationship|rape|based on novel|mo...,"(n. One who has returned, as if from the dead.)"
6,87101,Terminator Genisys,saving the world|artificial intelligence|cybor...,Reset the future
7,286217,The Martian,based on novel|mars|nasa|isolation|botanist,Bring Him Home
8,211672,Minions,assistant|aftercreditsstinger|duringcreditssti...,"Before Gru, they had a history of bad bosses"
9,150540,Inside Out,dream|cartoon|imaginary friend|animation|kid,Meet the little voices inside your head.


#### <font color=darkblue>Searching the keywords column for phrases related to novel-based movies</font>

In [4]:
def search_book_keywords(keywords):
    try:
        # Raw string so that \| and \w are not treated as invalid string escapes
        match = re.search(r"(\|{0,1}[\w\s]*(book|novel)[\w\s]*\|{0,1})", keywords)
        if match:
            return match.group(0)
        else:
            return None
    except TypeError:
        return None
    
phrases_book_novel = set()
for data in set(book_based_movies["keywords"].tolist()):
    phrases = search_book_keywords(data)
    phrases_book_novel.add(phrases)

print(phrases_book_novel)

{'|book burning|', '|novelist', 'comic book|', 'bookshop|', '|novelist|', '|based on comic book', 'based on novel', 'based on novel|', None, 's book|', 'book store', '|based on graphic novel|', '|cookbook|', '|comic book artist|', '|based on comic book|', 'reference to the book of revelation', '|tell all novel|', 's book', '|book signing|', '|magic book|', 'plagiarized book|', '|based on novel', 'based on graphic novel|', '|yearbook|', 'book|', '|bookshop|', '|comic book', '|notebook|', '|phone book|', 'motivational book', '|based on novel|', '|comic book|', '|based on young adult book', '|book|'}
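
The same exploration can also be done by splitting each keyword string on the `|` delimiter, which avoids partial fragments such as `'s book|'` in the output above. The following is a minimal sketch, not part of the original notebook; the helper name `book_related_keywords` and the sample string are illustrative assumptions.

```python
import re


def book_related_keywords(keywords):
    """Return the pipe-separated keywords that mention 'book' or 'novel'."""
    if not isinstance(keywords, str):  # NaN or None for movies without keywords
        return []
    return [kw for kw in keywords.split("|") if re.search(r"book|novel", kw)]


sample = "based on novel|mars|nasa|isolation|botanist"
print(book_related_keywords(sample))  # ['based on novel']
```

Collecting the returned lists into a set would give whole keywords only, which makes it easier to decide which phrases really indicate a book adaptation.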


#### <font color=darkblue>Flagging novel-based movies using the keywords column</font>

In [5]:
def book_based_movie(keywords):
    try:
        # Raw strings so that \| and \w are not treated as invalid string escapes
        match1 = re.search(r"(\|{0,1}based[\w\s]*(book|novel)\|{0,1})", keywords)
        match2 = re.search(r"(\|{0,1}inspired[\w\s]*novel\|{0,1})", keywords)
        if match1 or match2:
            return True
        else:
            return False
    except TypeError:
        return None
    
book_based_movies["isNovelBased_0"] = book_based_movies["keywords"].apply(book_based_movie)  
book_based_movies.head()

Unnamed: 0,id,original_title,keywords,tagline,isNovelBased_0
0,135397,Jurassic World,monster|dna|tyrannosaurus rex|velociraptor|island,The park is open.,False
1,76341,Mad Max: Fury Road,future|chase|post-apocalyptic|dystopia|australia,What a Lovely Day.,False
2,262500,Insurgent,based on novel|revolution|dystopia|sequel|dyst...,One Choice Can Destroy You,True
3,140607,Star Wars: The Force Awakens,android|spaceship|jedi|space opera|3d,Every generation has a story.,False
4,168259,Furious 7,car race|speed|revenge|suspense|car,Vengeance Hits Home,False
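
Because the optional leading and trailing pipes do not affect whether a match exists, the two patterns can be collapsed into one and applied with the vectorized `Series.str.contains`. This is a sketch under assumptions: the sample keyword strings are made up for illustration, and missing keywords are mapped to `False` here rather than `None` as in the function above. It also shows that phrases such as `based on comic book` are flagged, so the flag covers more than novels.

```python
import pandas as pd

# Single pattern equivalent to match1/match2 above, without the optional pipes.
# The non-capturing group (?:...) avoids the pandas warning about match groups.
PATTERN = r"based[\w\s]*(?:book|novel)|inspired[\w\s]*novel"

# Illustrative keyword strings (not taken from the dataset)
keywords = pd.Series([
    "based on novel|mars|nasa",
    "future|chase|dystopia",
    "superhero|based on comic book",
    "novelist|writer",
    "inspired by novel",
    "based on true story|bookshop",  # "|" stops the match before "book"
    None,                            # movie without keywords
])

flags = keywords.str.contains(PATTERN, regex=True, na=False)
print(pd.DataFrame({"keywords": keywords, "flag": flags}))
```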


#### <font color=darkblue>Searching the tagline column for phrases related to novel-based movies</font>

In [6]:
def search_book_tagline(tagline):
    try:
        match = re.search("(.*(book|novel).*)", tagline)
        if match:
            return match.group(0)
        else:
            return None
    except TypeError:
        return None
    
phrases_book_novel_1 = set()
for data in set(book_based_movies["tagline"].tolist()):
    phrases = search_book_tagline(data)
    phrases_book_novel_1.add(phrases)

print(phrases_book_novel_1)

{"It's the oldest con in the book.", 'Based on the best-selling novel', "He wrote the book on life's big questions. But the truth is he hasn't got a clue.", 'Every book has a life of its own.', "From the book that's an American tradition… from the smash-hit Broadway show… the entertainment of the year!", 'That humble radiant terrific book is now a humble radiant terrific movie.', 'Weird sex · Obsession · Comic books', 'Based on the novel of Chico Xavier', None, 'The most beloved Pulitzer Prize book now comes vividly alive on the screen!', "Don't judge a book by its hair color!", 'A true story based on the award-winning book by Tobias Wolff.', 'Where her book ended, their story began.', "I've decided to go on tour and support my new book, Uganda Be Kidding Me. I think we all know how much I love the sound of my own voice!", 'Love goes toward love as schoolboys from their books', 'The #1 novel of the year - now a motion picture!', 'Based on the novel by Henry James', "You don't have to k

#### <font color=darkblue>Flagging novel-based movies using the tagline column</font>

In [7]:
def book_based_movie_1(tagline):
    try:
        match = re.search("(.*novel.*)", tagline)
        if match:
            return True
        else:
            return False
    except TypeError:
        return None
    
book_based_movies["isNovelBased_1"] = book_based_movies["tagline"].apply(book_based_movie_1)  
book_based_movies.head()

Unnamed: 0,id,original_title,keywords,tagline,isNovelBased_0,isNovelBased_1
0,135397,Jurassic World,monster|dna|tyrannosaurus rex|velociraptor|island,The park is open.,False,False
1,76341,Mad Max: Fury Road,future|chase|post-apocalyptic|dystopia|australia,What a Lovely Day.,False,False
2,262500,Insurgent,based on novel|revolution|dystopia|sequel|dyst...,One Choice Can Destroy You,True,False
3,140607,Star Wars: The Force Awakens,android|spaceship|jedi|space opera|3d,Every generation has a story.,False,False
4,168259,Furious 7,car race|speed|revenge|suspense|car,Vengeance Hits Home,False,False


#### <font color=darkblue>Combining the flags obtained from the keywords and tagline columns</font>

In [8]:
book_based_movies["novelBased"] = book_based_movies["isNovelBased_0"] + book_based_movies["isNovelBased_1"]

book_based_movies.drop(["keywords", "tagline", "isNovelBased_0", "isNovelBased_1"], axis = 1, inplace = True)
book_based_movies.head()

Unnamed: 0,id,original_title,novelBased
0,135397,Jurassic World,0
1,76341,Mad Max: Fury Road,0
2,262500,Insurgent,1
3,140607,Star Wars: The Force Awakens,0
4,168259,Furious 7,0


In [9]:
book_based_movies[book_based_movies['novelBased'] == 1]

Unnamed: 0,id,original_title,novelBased
2,262500,Insurgent,1
5,281957,The Revenant,1
7,286217,The Martian,1
10,206647,Spectre,1
23,216015,Fifty Shades of Grey,1
...,...,...,...
10667,14384,Soldier Blue,1
10722,28295,I Saw What You Did,1
10808,40060,The Manitou,1
10812,31948,Gray Lady Down,1


In [10]:
book_based_movies[book_based_movies['novelBased'] == 2]

Unnamed: 0,id,original_title,novelBased
10660,10671,Airport,2
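
Summing the two flags yields 2 when both columns match, as with Airport above, so a filter that keeps only rows equal to 1 would exclude such a movie. One possible alternative is to combine the flags with a logical OR so the result is always 0 or 1. A minimal sketch with made-up flag values follows; `Series.eq(True)` is used so that `None` entries count as no match, which is an assumption about how missing values should be treated.

```python
import pandas as pd

# Made-up flags for four placeholder movies
flags = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "isNovelBased_0": [True, False, True, None],   # keywords flag
    "isNovelBased_1": [False, False, True, True],  # tagline flag
})

# A movie counts as novel-based if either column matched
matched = flags["isNovelBased_0"].eq(True) | flags["isNovelBased_1"].eq(True)
flags["novelBased"] = matched.astype(int)
print(flags[["original_title", "novelBased"]])
```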


In [11]:

movies.drop(["keywords", "tagline"], axis = 1, inplace = True)

movies = pd.merge(movies,book_based_movies,how="left",left_on=["id", "original_title"],right_on=["id", "original_title"])

movies.head()

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj,novelBased
0,135397,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,http://www.jurassicworld.com/,Colin Trevorrow,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,2015-06-09,5562,6.5,2015,137999900.0,1392446000.0,0
1,76341,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,http://www.madmaxmovie.com/,George Miller,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,2015-05-13,6185,7.1,2015,137999900.0,348161300.0,0
2,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,2015-03-18,2480,6.3,2015,101200000.0,271619000.0,1
3,140607,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,http://www.starwars.com/films/star-wars-episod...,J.J. Abrams,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,2015-12-15,5292,7.5,2015,183999900.0,1902723000.0,0
4,168259,tt2820852,9.335014,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,http://www.furious7.com/,James Wan,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,2015-04-01,2947,7.3,2015,174799900.0,1385749000.0,0


#### <font color=darkblue>Dropping movie records that aren't based on novels</font>

In [12]:
movies.drop(movies.index[(movies["novelBased"] != 1)], axis = 0, inplace = True)

In [13]:
movies = movies.reset_index(drop=True)

In [14]:
movies.head()

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj,novelBased
0,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,2015-03-18,2480,6.3,2015,101200000.0,271619000.0,1
1,281957,tt1663202,9.1107,135000000,532950503,The Revenant,Leonardo DiCaprio|Tom Hardy|Will Poulter|Domhn...,http://www.foxmovies.com/movies/the-revenant,Alejandro González Iñárritu,"In the 1820s, a frontiersman, Hugh Glass, sets...",156,Western|Drama|Adventure|Thriller,Regency Enterprises|Appian Way|CatchPlay|Anony...,2015-12-25,3929,7.2,2015,124199900.0,490314200.0,1
2,286217,tt3659388,7.6674,108000000,595380321,The Martian,Matt Damon|Jessica Chastain|Kristen Wiig|Jeff ...,http://www.foxmovies.com/movies/the-martian,Ridley Scott,"During a manned mission to Mars, Astronaut Mar...",141,Drama|Adventure|Science Fiction,Twentieth Century Fox Film Corporation|Scott F...,2015-09-30,4572,7.6,2015,99359960.0,547749700.0,1
3,206647,tt2379713,6.200282,245000000,880674609,Spectre,Daniel Craig|Christoph Waltz|Léa Seydoux|Ralph...,http://www.sonypictures.com/movies/spectre/,Sam Mendes,A cryptic message from Bond’s past sends him o...,148,Action|Adventure|Crime,Columbia Pictures|Danjaq|B24,2015-10-26,3254,6.2,2015,225399900.0,810220300.0,1
4,216015,tt2322441,4.710402,40000000,569651467,Fifty Shades of Grey,Dakota Johnson|Jamie Dornan|Jennifer Ehle|Eloi...,https://www.facebook.com/fiftyshadesofgreymovie,Sam Taylor-Johnson,When college senior Anastasia Steele steps in ...,125,Drama|Romance,Focus Features|Trigger Street Productions|Mich...,2015-02-11,1865,5.3,2015,36799980.0,524079100.0,1


Using the keywords and tagline, we created a dataset of novel-based movies. However, the initial movies dataset (movies.csv) contains no information about the writer of each movie, so we use a second file, IMDb_movies.csv, to gather the author information.

In [15]:
moviesnewset = pd.read_csv("data/IMDb_movies.csv",low_memory=False)
moviesnewset.keys()


Index(['imdb_title_id', 'title', 'original_title', 'year', 'date_published',
       'genre', 'duration', 'country', 'language', 'director', 'writer',
       'production_company', 'actors', 'description', 'avg_vote', 'votes',
       'budget', 'usa_gross_income', 'worlwide_gross_income', 'metascore',
       'reviews_from_users', 'reviews_from_critics'],
      dtype='object')

In [16]:
moviesnewset.head()

Unnamed: 0,imdb_title_id,title,original_title,year,date_published,genre,duration,country,language,director,...,actors,description,avg_vote,votes,budget,usa_gross_income,worlwide_gross_income,metascore,reviews_from_users,reviews_from_critics
0,tt0000009,Miss Jerry,Miss Jerry,1894,1894-10-09,Romance,45,USA,,Alexander Black,...,"Blanche Bayliss, William Courtenay, Chauncey D...",The adventures of a female reporter in the 1890s.,5.9,154,,,,,1.0,2.0
1,tt0000574,The Story of the Kelly Gang,The Story of the Kelly Gang,1906,12/26/1906,"Biography, Crime, Drama",70,Australia,,Charles Tait,...,"Elizabeth Tait, John Tait, Norman Campbell, Be...",True story of notorious Australian outlaw Ned ...,6.1,589,"$2,250",,,,7.0,7.0
2,tt0001892,Den sorte drøm,Den sorte drøm,1911,8/19/1911,Drama,53,"Germany, Denmark",,Urban Gad,...,"Asta Nielsen, Valdemar Psilander, Gunnar Helse...",Two men of high rank are both wooing the beaut...,5.8,188,,,,,5.0,2.0
3,tt0002101,Cleopatra,Cleopatra,1912,11/13/1912,"Drama, History",100,USA,English,Charles L. Gaskill,...,"Helen Gardner, Pearl Sindelar, Miss Fielding, ...",The fabled queen of Egypt's affair with Roman ...,5.2,446,"$45,000",,,,25.0,3.0
4,tt0002130,L'Inferno,L'Inferno,1911,3/6/1911,"Adventure, Drama, Fantasy",68,Italy,Italian,"Francesco Bertolini, Adolfo Padovan",...,"Salvatore Papa, Arturo Pirovano, Giuseppe de L...",Loosely adapted from Dante's Divine Comedy and...,7.0,2237,,,,,31.0,14.0


In [17]:
moviesnewset.isna().sum()

imdb_title_id                0
title                        0
original_title               0
year                         0
date_published               0
genre                        0
duration                     0
country                     64
language                   833
director                    87
writer                    1572
production_company        4455
actors                      69
description               2115
avg_vote                     0
votes                        0
budget                   62145
usa_gross_income         70529
worlwide_gross_income    54839
metascore                72550
reviews_from_users        7597
reviews_from_critics     11797
dtype: int64

#### <font color=darkblue>Keeping only English-language movies made in the USA</font>

In [18]:
moviesnewset.drop(moviesnewset.index[(moviesnewset["language"] != 'English')], axis = 0, inplace = True)
moviesnewset["CountryCheck"] = moviesnewset['country'].str.contains("USA")
moviesnewset = moviesnewset[moviesnewset["CountryCheck"] == True]
moviesnewset = moviesnewset[["original_title", "writer"]].reset_index(drop = True)
moviesnewset[['original_title','writer']]

Unnamed: 0,original_title,writer
0,Cleopatra,Victorien Sardou
1,"From the Manger to the Cross; or, Jesus of Naz...",Gene Gauntier
2,Richard III,"James Keane, William Shakespeare"
3,"Home, Sweet Home","D.W. Griffith, H.E. Aitken"
4,Traffic in Souls,
...,...,...
28380,Abduction 101,"Robin Entreinger, Steve Noir"
28381,Bulletproof 2,"Don Michael Paul, Rich Wilkes"
28382,VFW,"Max Brallier, Matthew McArdle"
28383,The Pilgrim's Progress,"John Bunyan, Robert Fernandez"
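
The same filter can be written as a single boolean mask, which avoids the temporary `CountryCheck` column. This is a sketch on a few rows taken from the `head()` output above; writer values other than the one for Cleopatra are left empty as placeholders. Passing `na=False` to `str.contains` treats a missing country as a non-match.

```python
import pandas as pd

sample = pd.DataFrame({
    "original_title": ["Miss Jerry", "Den sorte drøm", "Cleopatra", "L'Inferno"],
    "language": [None, None, "English", "Italian"],
    "country": ["USA", "Germany, Denmark", "USA", "Italy"],
    "writer": [None, None, "Victorien Sardou", None],  # placeholders except Cleopatra
})

# Keep rows whose language is exactly "English" and whose country mentions "USA"
mask = sample["language"].eq("English") & sample["country"].str.contains("USA", na=False)
filtered = sample.loc[mask, ["original_title", "writer"]].reset_index(drop=True)
print(filtered)
```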


#### <font color=darkblue>Combining both datasets</font>

We take the dataset that includes the tagline and keywords and the dataset that includes the author, and merge them on the movie title.

In [19]:
movies = pd.merge(movies, moviesnewset, how = "inner", left_on = "original_title", right_on = "original_title")
movies

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,overview,...,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj,novelBased,writer
0,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,Beatrice Prior must confront her inner demons ...,...,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,2015-03-18,2480,6.3,2015,1.012000e+08,2.716190e+08,1,Veronica Roth
1,281957,tt1663202,9.110700,135000000,532950503,The Revenant,Leonardo DiCaprio|Tom Hardy|Will Poulter|Domhn...,http://www.foxmovies.com/movies/the-revenant,Alejandro González Iñárritu,"In the 1820s, a frontiersman, Hugh Glass, sets...",...,Western|Drama|Adventure|Thriller,Regency Enterprises|Appian Way|CatchPlay|Anony...,2015-12-25,3929,7.2,2015,1.241999e+08,4.903142e+08,1,D. Kerry Prior
2,216015,tt2322441,4.710402,40000000,569651467,Fifty Shades of Grey,Dakota Johnson|Jamie Dornan|Jennifer Ehle|Eloi...,https://www.facebook.com/fiftyshadesofgreymovie,Sam Taylor-Johnson,When college senior Anastasia Steele steps in ...,...,Drama|Romance,Focus Features|Trigger Street Productions|Mich...,2015-02-11,1865,5.3,2015,3.679998e+07,5.240791e+08,1,E.L. James
3,294254,tt4046784,3.968891,61000000,311256926,Maze Runner: The Scorch Trials,Dylan O'Brien|Kaya Scodelario|Thomas Brodie-Sa...,http://mazerunnermovies.com,Wes Ball,Thomas and his fellow Gladers face their great...,...,Action|Science Fiction|Thriller,Gotham Group|Temple Hill Entertainment|TSG Ent...,2015-09-09,1849,6.4,2015,5.611998e+07,2.863562e+08,1,James Dashner
4,257445,tt1051904,3.644541,58000000,150170815,Goosebumps,Jack Black|Dylan Minnette|Odeya Rush|Amy Ryan|...,http://www.goosebumps-movie.com/,Rob Letterman,A teenager teams up with the daughter of young...,...,Adventure|Horror|Comedy,Columbia Pictures|Original Film|Scholastic Ent...,2015-08-05,600,6.2,2015,5.335998e+07,1.381571e+08,1,R. L. Stine
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
209,15613,tt0106912,0.239060,19885552,19724334,Fire in the Sky,D. B. Sweeney|Robert Patrick|Craig Sheffer|Pet...,,Robert Lieberman,A group of men who were clearing bush for the ...,...,Drama|Horror|Thriller|Mystery,Paramount Pictures,1993-03-12,49,6.4,1993,3.001664e+07,2.977329e+07,1,Travis Walton
210,235,tt0092005,1.349043,8000000,52287000,Stand by Me,Wil Wheaton|River Phoenix|Corey Feldman|Jerry ...,,Rob Reiner,"The film's name comes from the song ""Stand By ...",...,Crime|Drama,Columbia Pictures,1986-08-07,792,7.5,1986,1.591525e+07,1.040201e+08,1,"Stephen King, Raynold Gideon"
211,14384,tt0066390,0.281532,0,0,Soldier Blue,Candice Bergen|Peter Strauss|Donald Pleasence|...,,Ralph Nelson,After a cavalry group is massacred by the Chey...,...,Action|Drama|Romance|Western,AVCO Embassy Pictures|Katzka-Loeb,1970-08-12,18,7.1,1970,0.000000e+00,0.000000e+00,1,Theodore V. Olsen
212,28295,tt0059297,0.014759,0,0,I Saw What You Did,Joan Crawford|John Ireland|Leif Erickson|Sara ...,,William Castle,When two teenagers make prank phone calls to s...,...,Horror|Mystery|Thriller,William Castle Productions,1965-07-21,11,4.4,1965,0.000000e+00,0.000000e+00,1,Ursula Curtiss
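
Movie titles are not unique, so joining on `original_title` can attach writers from unrelated films that share a title, which is why titles such as Room appear more than once later in the notebook. Since `movies` carries an `imdb_id` column and the IMDb file an `imdb_title_id` column, joining on those identifiers is one way to avoid this. A sketch with toy rows follows; the identifier values are invented placeholders, and it assumes `imdb_title_id` is kept when selecting columns from the second dataset.

```python
import pandas as pd

# Toy rows; the tt... identifiers are placeholders, not real IMDb ids
movies_demo = pd.DataFrame({
    "imdb_id": ["tt0000001", "tt0000002"],
    "original_title": ["Room", "Gone Girl"],
})
writers_demo = pd.DataFrame({
    "imdb_title_id": ["tt0000001", "tt0000002", "tt0000003"],
    "original_title": ["Room", "Gone Girl", "Room"],
    "writer": ["Emma Donoghue", "Gillian Flynn", "Kyle Henry"],
})

# Joining on the title duplicates "Room" because two films share that title
by_title = movies_demo.merge(
    writers_demo[["original_title", "writer"]], how="inner", on="original_title"
)

# Joining on the identifier keeps exactly one writer row per movie
by_id = movies_demo.merge(
    writers_demo[["imdb_title_id", "writer"]],
    how="inner", left_on="imdb_id", right_on="imdb_title_id",
)

print(len(by_title), len(by_id))  # 3 2
```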


In [20]:
movies[["original_title", "writer","director","overview", "genres", "vote_average","release_year","novelBased" ]].reset_index(drop = True)

Unnamed: 0,original_title,writer,director,overview,genres,vote_average,release_year,novelBased
0,Insurgent,Veronica Roth,Robert Schwentke,Beatrice Prior must confront her inner demons ...,Adventure|Science Fiction|Thriller,6.3,2015,1
1,The Revenant,D. Kerry Prior,Alejandro González Iñárritu,"In the 1820s, a frontiersman, Hugh Glass, sets...",Western|Drama|Adventure|Thriller,7.2,2015,1
2,Fifty Shades of Grey,E.L. James,Sam Taylor-Johnson,When college senior Anastasia Steele steps in ...,Drama|Romance,5.3,2015,1
3,Maze Runner: The Scorch Trials,James Dashner,Wes Ball,Thomas and his fellow Gladers face their great...,Action|Science Fiction|Thriller,6.4,2015,1
4,Goosebumps,R. L. Stine,Rob Letterman,A teenager teams up with the daughter of young...,Adventure|Horror|Comedy,6.2,2015,1
...,...,...,...,...,...,...,...,...
209,Fire in the Sky,Travis Walton,Robert Lieberman,A group of men who were clearing bush for the ...,Drama|Horror|Thriller|Mystery,6.4,1993,1
210,Stand by Me,"Stephen King, Raynold Gideon",Rob Reiner,"The film's name comes from the song ""Stand By ...",Crime|Drama,7.5,1986,1
211,Soldier Blue,Theodore V. Olsen,Ralph Nelson,After a cavalry group is massacred by the Chey...,Action|Drama|Romance|Western,7.1,1970,1
212,I Saw What You Did,Ursula Curtiss,William Castle,When two teenagers make prank phone calls to s...,Horror|Mystery|Thriller,4.4,1965,1


### For convenience, we use only the first 90 records of the movie dataset

In [21]:
moviescheck = movies.iloc[0:90]

In [22]:
moviescheck

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,overview,...,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj,novelBased,writer
0,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,Beatrice Prior must confront her inner demons ...,...,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,2015-03-18,2480,6.3,2015,1.012000e+08,2.716190e+08,1,Veronica Roth
1,281957,tt1663202,9.110700,135000000,532950503,The Revenant,Leonardo DiCaprio|Tom Hardy|Will Poulter|Domhn...,http://www.foxmovies.com/movies/the-revenant,Alejandro González Iñárritu,"In the 1820s, a frontiersman, Hugh Glass, sets...",...,Western|Drama|Adventure|Thriller,Regency Enterprises|Appian Way|CatchPlay|Anony...,2015-12-25,3929,7.2,2015,1.241999e+08,4.903142e+08,1,D. Kerry Prior
2,216015,tt2322441,4.710402,40000000,569651467,Fifty Shades of Grey,Dakota Johnson|Jamie Dornan|Jennifer Ehle|Eloi...,https://www.facebook.com/fiftyshadesofgreymovie,Sam Taylor-Johnson,When college senior Anastasia Steele steps in ...,...,Drama|Romance,Focus Features|Trigger Street Productions|Mich...,2015-02-11,1865,5.3,2015,3.679998e+07,5.240791e+08,1,E.L. James
3,294254,tt4046784,3.968891,61000000,311256926,Maze Runner: The Scorch Trials,Dylan O'Brien|Kaya Scodelario|Thomas Brodie-Sa...,http://mazerunnermovies.com,Wes Ball,Thomas and his fellow Gladers face their great...,...,Action|Science Fiction|Thriller,Gotham Group|Temple Hill Entertainment|TSG Ent...,2015-09-09,1849,6.4,2015,5.611998e+07,2.863562e+08,1,James Dashner
4,257445,tt1051904,3.644541,58000000,150170815,Goosebumps,Jack Black|Dylan Minnette|Odeya Rush|Amy Ryan|...,http://www.goosebumps-movie.com/,Rob Letterman,A teenager teams up with the daughter of young...,...,Adventure|Horror|Comedy,Columbia Pictures|Original Film|Scholastic Ent...,2015-08-05,600,6.2,2015,5.335998e+07,1.381571e+08,1,R. L. Stine
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,13,tt0109830,6.715966,55000000,677945399,Forrest Gump,Tom Hanks|Robin Wright|Gary Sinise|Mykelti Wil...,,Robert Zemeckis,A man with a low IQ has accomplished great thi...,...,Comedy|Drama|Romance,Paramount Pictures,1994-07-06,4856,8.1,1994,8.091114e+07,9.973333e+08,1,Winston Groom
86,82693,tt1045658,3.931139,21000000,205738714,Silver Linings Playbook,Bradley Cooper|Jennifer Lawrence|Robert De Nir...,http://silverliningsplaybookmovie.com/,David O. Russell,After spending eight months in a mental instit...,...,Drama|Comedy|Romance,The Weinstein Company,2012-09-08,3317,6.9,2012,1.994463e+07,1.953992e+08,1,Matthew Quick
87,75780,tt0790724,2.944554,60000000,218340595,Jack Reacher,Tom Cruise|Rosamund Pike|Richard Jenkins|David...,http://www.jackreachermovie.com/,Christopher McQuarrie,"In an innocent heartland city, five are shot d...",...,Crime|Drama|Thriller,Paramount Pictures|Mutual Film Company|Skydanc...,2012-12-20,2281,6.2,2012,5.698466e+07,2.073677e+08,1,Lee Child
88,49529,tt0401729,1.588457,260000000,284139100,John Carter,Taylor Kitsch|Lynn Collins|Mark Strong|Willem ...,http://disney.go.com/johncarter/,Andrew Stanton,Civil War vet John Carter is transplanted to M...,...,Action|Adventure|Fantasy|Science Fiction,Walt Disney Pictures,2012-03-07,1479,6.0,2012,2.469335e+08,2.698595e+08,1,Edgar Rice Burroughs


In [23]:
movies_books_titles = moviescheck['original_title'].tolist()
print(movies_books_titles)
print(len(movies_books_titles))

['Insurgent', 'The Revenant', 'Fifty Shades of Grey', 'Maze Runner: The Scorch Trials', 'Goosebumps', 'Room', 'Room', 'Paper Towns', 'Mortdecai', 'The Longest Ride', 'Dark Places', 'Dark Places', 'Z for Zachariah', 'The Diary of a Teenage Girl', 'Tales of Halloween', 'And Then There Were None', 'The Maze Runner', 'Gone Girl', 'The Amazing Spider-Man 2', '300: Rise of an Empire', 'The Giver', 'Paddington', 'The Fault in Our Stars', 'Good People', 'The Boxtrolls', 'White Bird in a Blizzard', "Winter's Tale", 'Alexander and the Terrible, Horrible, No Good, Very Bad Day', 'The Scribbler', 'Every Secret Thing', 'Delirium', 'Delirium', 'Delirium', 'Delirium', 'Audrey Rose', 'The White Buffalo', 'Watchmen', 'Love and Other Impossible Pursuits', 'Me and Orson Welles', 'Walled In', 'Like Dandelion Dust', 'Alice in Wonderland', 'Alice in Wonderland', 'Alice in Wonderland', 'Alice in Wonderland', 'Alice in Wonderland', 'Alice in Wonderland', 'The Chronicles of Narnia: The Voyage of the Dawn Tread

In [24]:
movies_books_authors = moviescheck['writer'].tolist()
print(movies_books_authors)

['Veronica Roth', 'D. Kerry Prior', 'E.L. James', 'James Dashner', 'R. L. Stine', 'Kyle Henry', 'Emma Donoghue, Emma Donoghue', 'John Green', 'Kyril Bonfiglioli', 'Nicholas Sparks', 'Guy Crawford', 'Gillian Flynn', "Robert C. O'Brien", 'Phoebe Gloeckner', 'Axelle Carolyn, Dave Parker', 'Agatha Christie', 'James Dashner', 'Gillian Flynn', 'Alex Kurtzman, Roberto Orci', 'Frank Miller', 'Lois Lowry', 'Michael Bond', 'John Green', 'Marcus Sakey', 'Irena Brignull, Adam Pava', 'Laura Kasischke', 'Mark Helprin', 'Judith Viorst', 'Daniel Schaffer', 'Laura Lippman', 'Eddie Krell, Jim Loew', 'Adam Alleca', 'Jared Stanton, Thor Wixom', 'Francisco Castro, Andy Cheng', 'Frank De Felitta', 'Richard Sale, Richard Sale', 'Alan Moore', 'Ayelet Waldman', 'Robert Kaplow', 'Serge Brussolo, Rodolphe Tissot', 'Karen Kingsbury', 'Lewis Carroll', 'Lewis Carroll', 'Joseph L. Mankiewicz, William Cameron Menzies', 'Lewis Carroll', 'Lewis Carroll, Winston Hibler', 'Lewis Carroll', 'C. S. Lewis', 'Nicholas Sparks'

### Running google books API call

The Google Books API gives access to many of the operations available on the Google Books website, so those features can be built into your own application. It is intended for developers who want to write applications that interact with Google Books. To use the API, create a Google account, register an application, and generate your own API key. More information can be found [here](https://developers.google.com/books/docs/overview).

The Google Terms of Service for use of the APIs are available here: https://developer.google.com/books/terms.html. 

Google API rate limit:
- Queries per day : 1000
- Queries per minute per user : 100
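
With a per-minute quota, a loop over many titles should space out its requests. One simple approach is to enforce a minimum interval between calls, here 60 s / 100 = 0.6 s. The generator below is a sketch, not part of the original notebook; `throttled` and its `sleep` and `clock` parameters are hypothetical names, and the two callables can be swapped out so the pacing can be checked without waiting.

```python
import time

QUERIES_PER_MINUTE = 100                  # per-user limit quoted above
MIN_INTERVAL = 60.0 / QUERIES_PER_MINUTE  # 0.6 seconds between requests


def throttled(items, min_interval=MIN_INTERVAL, sleep=time.sleep, clock=time.monotonic):
    """Yield items so that consecutive items are at least min_interval seconds apart."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


# Usage: a short interval keeps this demonstration fast.
for title in throttled(["Insurgent", "The Martian"], min_interval=0.01):
    print(title)  # a real loop would issue the API request here
```

The daily limit is not handled here; with 90 titles and one request each, it would not be reached in a single run.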

#### This is what a raw API search query looks like:

In [25]:
query = "harry potter"
ap1_key = "AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok"
address = "https://www.googleapis.com/books/v1/volumes?" +"q="+query+"&key=" + ap1_key
resp = requests.get(address)
result = resp.json()
pprint(result)

{'items': [{'accessInfo': {'accessViewStatus': 'NONE',
                           'country': 'US',
                           'embeddable': False,
                           'epub': {'isAvailable': False},
                           'pdf': {'isAvailable': False},
                           'publicDomain': False,
                           'quoteSharingAllowed': False,
                           'textToSpeechPermission': 'ALLOWED',
                           'viewability': 'NO_PAGES',
                           'webReaderLink': 'http://play.google.com/books/reader?id=eq9XvgAACAAJ&hl=&printsec=frontcover&source=gbs_api'},
            'etag': 'ycJ1mgPV+ug',
            'id': 'eq9XvgAACAAJ',
            'kind': 'books#volume',
            'saleInfo': {'country': 'US',
                         'isEbook': False,
                         'saleability': 'NOT_FOR_SALE'},
            'searchInfo': {'textSnippet': 'Relive all the magic of Newt&#39;s '
                                          'wo
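As a sketch of what the request URL for such a search looks like once its parameters are escaped, the standard library's `urllib.parse.urlencode` can build it without sending anything. `build_search_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder, not a real key.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(query, api_key):
    """Return the volumes search URL with both parameters percent-encoded."""
    return BASE_URL + "?" + urlencode({"q": query, "key": api_key})


print(build_search_url("harry potter", "YOUR_API_KEY"))
# https://www.googleapis.com/books/v1/volumes?q=harry+potter&key=YOUR_API_KEY
```

Because `urlencode` escapes each value separately, characters such as `&` or `=` inside a title cannot be mistaken for parameter separators.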

#### Integrating API calls with movie information to create a dataset

In [26]:
import os
import urllib.parse

# Assumed environment variable name; avoids hard-coding the key in the notebook.
api_key = os.environ["GOOGLE_BOOKS_API_KEY"]
base_api_link = 'https://www.googleapis.com/books/v1/volumes?q=intitle:{}+inauthor:{}&key=' + api_key
book_info = []
not_found = []
for book, auth in zip(movies_books_titles, movies_books_authors):
    # Percent-encode the query string, keeping '=' and '&' as separators
    url_split = list(urllib.parse.urlsplit(base_api_link.format(book, auth)))
    url_split[3] = urllib.parse.quote(url_split[3], safe='q=&')
    url = urllib.parse.urlunsplit(url_split)
    print(url)

    results = requests.get(url).json()
    # Keep only returned volumes that actually carry a title
    items = [item for item in results.get("items", [])
             if "title" in item.get("volumeInfo", {})]
    if not items:
        not_found.append(book)
        continue

    # Use the first usable volume; missing fields become NaN
    volume_info = items[0]
    info = volume_info["volumeInfo"]
    book_info.append({"google_id": volume_info["id"],
                      "title": info["title"],
                      "author": info.get("authors", np.nan),
                      "publish_date": info.get("publishedDate", np.nan),
                      "category": info.get("categories", np.nan),
                      "pages": info.get("pageCount", np.nan)})

https://www.googleapis.com/books/v1/volumes?q=intitle%3AInsurgent%2Binauthor%3AVeronica%20Roth&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AThe%20Revenant%2Binauthor%3AD.%20Kerry%20Prior&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AFifty%20Shades%20of%20Grey%2Binauthor%3AE.L.%20James&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AMaze%20Runner%3A%20The%20Scorch%20Trials%2Binauthor%3AJames%20Dashner&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AGoosebumps%2Binauthor%3AR.%20L.%20Stine&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3ARoom%2Binauthor%3AKyle%20Henry&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3ARoom%2Binauthor%3AEmma%20Donoghue%2C%20Emma%20Donoghue&key=AIzaS

https://www.googleapis.com/books/v1/volumes?q=intitle%3AMy%20Girlfriend%27s%20Boyfriend%2Binauthor%3AKenneth%20Schapiro&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AMy%20Girlfriend%27s%20Boyfriend%2Binauthor%3ADaryn%20Tufts%2C%20Daryn%20Tufts&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AGirl%2C%20Interrupted%2Binauthor%3ASusanna%20Kaysen&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AThe%20General%27s%20Daughter%2Binauthor%3ANelson%20DeMille&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AIn%20Dreams%2Binauthor%3ABari%20Wood&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AThe%20Shipping%20News%2Binauthor%3AAnnie%20Proulx&key=AIzaSyC9J5_C1kJfIGejuMPkIKUmnmh2ir-5dok
https://www.googleapis.com/books/v1/volumes?q=intitle%3AThe%

In [27]:
not_found

['The Revenant',
 'Room',
 'Dark Places',
 'Tales of Halloween',
 'The Amazing Spider-Man 2',
 '300: Rise of an Empire',
 'Delirium',
 'Delirium',
 'Delirium',
 'Delirium',
 'Walled In',
 'Alice in Wonderland',
 'Flipped',
 'Never Let Me Go',
 "My Girlfriend's Boyfriend",
 "My Girlfriend's Boyfriend",
 'In Dreams',
 'Soul Surfer',
 'Wuthering Heights',
 'Little Birds',
 'Age of the Dragons',
 'Anna Karenina']
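Some author strings sent to the API repeat a name (for example `'Daryn Tufts, Daryn Tufts'`) and others list several writers (`'Lewis Carroll, Winston Hibler'`). Whether this contributes to the misses above is not established here. The sketch below shows one possible clean-up that the notebook does not perform: splitting the string on commas and dropping repeats. `distinct_authors` is a hypothetical helper.

```python
def distinct_authors(author_field):
    """Split a comma-separated author string and drop repeated names."""
    names = []
    for name in author_field.split(","):
        name = name.strip()
        if name and name not in names:
            names.append(name)
    return names


print(distinct_authors("Daryn Tufts, Daryn Tufts"))
# ['Daryn Tufts']
print(distinct_authors("Lewis Carroll, Winston Hibler"))
# ['Lewis Carroll', 'Winston Hibler']
```

The first element could then be used for the `inauthor:` term. Splitting on commas is a simplification: a name that itself contains a comma would be split incorrectly.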

#### <font color=darkblue>Converting results to DataFrame </font>

In [28]:
df = pd.DataFrame(book_info, columns=['google_id', 'title', 'author', 'publish_date', 'category', 'pages'])
df

Unnamed: 0,google_id,title,author,publish_date,category,pages
0,CqyUNAEACAAJ,Insurgent,[Veronica Roth],2015-01-20,[Juvenile Fiction],592
1,bg-dAQAACAAJ,Grey - Fifty Shades of Grey von Christian selb...,[E. L. James],2015-08-21,,640
2,Y-qLDQAAQBAJ,The Maze Runner,[James Dashner],2015,[Amnesia],816
3,z2U0CwAAQBAJ,Goosebumps: Movie Novel,[R.L. Stine],2016-01-07,[Juvenile Fiction],144
4,TdKHvgEACAAJ,Room,[Emma Donoghue],2010-09-13,[Fiction],336
...,...,...,...,...,...,...
63,ebP-JwAACAAJ,Auto Focus,[Robert Graysmith],2002,[True Crime],310
64,pxWhzAEACAAJ,Forrest Gump,[Winston Groom],2012,[Fiction],228
65,upodvjM1cFAC,The Silver Linings Playbook,[Matthew Quick],2010-04-27,[Fiction],304
66,GLYsEAAAQBAJ,The Sentinel,"[Lee Child, Andrew Child]",2021-04-27,[Fiction],384


In [29]:
df = df.reset_index(drop=True)
df

Unnamed: 0,google_id,title,author,publish_date,category,pages
0,CqyUNAEACAAJ,Insurgent,[Veronica Roth],2015-01-20,[Juvenile Fiction],592
1,bg-dAQAACAAJ,Grey - Fifty Shades of Grey von Christian selb...,[E. L. James],2015-08-21,,640
2,Y-qLDQAAQBAJ,The Maze Runner,[James Dashner],2015,[Amnesia],816
3,z2U0CwAAQBAJ,Goosebumps: Movie Novel,[R.L. Stine],2016-01-07,[Juvenile Fiction],144
4,TdKHvgEACAAJ,Room,[Emma Donoghue],2010-09-13,[Fiction],336
...,...,...,...,...,...,...
63,ebP-JwAACAAJ,Auto Focus,[Robert Graysmith],2002,[True Crime],310
64,pxWhzAEACAAJ,Forrest Gump,[Winston Groom],2012,[Fiction],228
65,upodvjM1cFAC,The Silver Linings Playbook,[Matthew Quick],2010-04-27,[Fiction],304
66,GLYsEAAAQBAJ,The Sentinel,"[Lee Child, Andrew Child]",2021-04-27,[Fiction],384


In [30]:
# Drop google_id; the remaining columns are the ones used from here on
df = df.drop(columns=['google_id'])
df

Unnamed: 0,title,author,publish_date,category,pages
0,Insurgent,[Veronica Roth],2015-01-20,[Juvenile Fiction],592
1,Grey - Fifty Shades of Grey von Christian selb...,[E. L. James],2015-08-21,,640
2,The Maze Runner,[James Dashner],2015,[Amnesia],816
3,Goosebumps: Movie Novel,[R.L. Stine],2016-01-07,[Juvenile Fiction],144
4,Room,[Emma Donoghue],2010-09-13,[Fiction],336
...,...,...,...,...,...
63,Auto Focus,[Robert Graysmith],2002,[True Crime],310
64,Forrest Gump,[Winston Groom],2012,[Fiction],228
65,The Silver Linings Playbook,[Matthew Quick],2010-04-27,[Fiction],304
66,The Sentinel,"[Lee Child, Andrew Child]",2021-04-27,[Fiction],384


#### <font color=darkblue> Further dataset manipulation (removing unneeded columns, brackets, etc.) </font>

In [31]:
# Each cell holds a list; keep only its first element (missing values stay NaN)
df['author'] = df['author'].str.get(0)
df['category'] = df['category'].str.get(0)
df

Unnamed: 0,title,author,publish_date,category,pages
0,Insurgent,Veronica Roth,2015-01-20,Juvenile Fiction,592
1,Grey - Fifty Shades of Grey von Christian selb...,E. L. James,2015-08-21,,640
2,The Maze Runner,James Dashner,2015,Amnesia,816
3,Goosebumps: Movie Novel,R.L. Stine,2016-01-07,Juvenile Fiction,144
4,Room,Emma Donoghue,2010-09-13,Fiction,336
...,...,...,...,...,...
63,Auto Focus,Robert Graysmith,2002,True Crime,310
64,Forrest Gump,Winston Groom,2012,Fiction,228
65,The Silver Linings Playbook,Matthew Quick,2010-04-27,Fiction,304
66,The Sentinel,Lee Child,2021-04-27,Fiction,384
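The `publish_date` column mixes granularities: some rows hold a full date such as `2015-01-20`, others only a year such as `2015`. As a sketch, a comparable year can be pulled out of either form with a regular expression. `publish_year` is a hypothetical helper that the notebook does not define.

```python
import re


def publish_year(value):
    """Return the leading four-digit year as an int, or None if absent."""
    if not isinstance(value, str):
        return None  # e.g. NaN when publishedDate was missing
    match = re.match(r"(\d{4})", value.strip())
    return int(match.group(1)) if match else None


for raw in ["2015-01-20", "2015", "2010-09-13"]:
    print(raw, "->", publish_year(raw))
```

It could be applied with `df['publish_date'].map(publish_year)`, assuming the column holds strings or NaN as in the table above.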


In [32]:
movies = moviescheck.rename(columns={'original_title': 'title'})

In [33]:
movies = movies[['title','writer','director','genres','vote_average','runtime','release_year']]
movies

Unnamed: 0,title,writer,director,genres,vote_average,runtime,release_year
0,Insurgent,Veronica Roth,Robert Schwentke,Adventure|Science Fiction|Thriller,6.3,119,2015
1,The Revenant,D. Kerry Prior,Alejandro González Iñárritu,Western|Drama|Adventure|Thriller,7.2,156,2015
2,Fifty Shades of Grey,E.L. James,Sam Taylor-Johnson,Drama|Romance,5.3,125,2015
3,Maze Runner: The Scorch Trials,James Dashner,Wes Ball,Action|Science Fiction|Thriller,6.4,132,2015
4,Goosebumps,R. L. Stine,Rob Letterman,Adventure|Horror|Comedy,6.2,103,2015
...,...,...,...,...,...,...,...
85,Forrest Gump,Winston Groom,Robert Zemeckis,Comedy|Drama|Romance,8.1,142,1994
86,Silver Linings Playbook,Matthew Quick,David O. Russell,Drama|Comedy|Romance,6.9,122,2012
87,Jack Reacher,Lee Child,Christopher McQuarrie,Crime|Drama|Thriller,6.2,130,2012
88,John Carter,Edgar Rice Burroughs,Andrew Stanton,Action|Adventure|Fantasy|Science Fiction,6.0,132,2012


### <font color=blue> Merging the book and movie datasets </font>

In [34]:
# Drop 'writer' (the author already comes from df), join on the exact title,
# and keep one row per title
archquery = (df.merge(movies.drop(columns='writer'), on='title')
               .drop_duplicates(subset=['title'])
               .reset_index(drop=True))

In [35]:
archquery

Unnamed: 0,title,author,publish_date,category,pages,director,genres,vote_average,runtime,release_year
0,Insurgent,Veronica Roth,2015-01-20,Juvenile Fiction,592,Robert Schwentke,Adventure|Science Fiction|Thriller,6.3,119,2015
1,The Maze Runner,James Dashner,2015,Amnesia,816,Wes Ball,Action|Mystery|Science Fiction|Thriller,7.0,113,2014
2,Room,Emma Donoghue,2010-09-13,Fiction,336,Lenny Abrahamson,Drama|Thriller,8.0,117,2015
3,Paper Towns,John Green,2018-10-23,Young Adult Fiction,648,Jake Schreier,Drama|Mystery|Romance,6.2,109,2015
4,The Longest Ride,Nicholas Sparks,2015-02-24,Fiction,416,"George Tillman, Jr.",Romance|Drama,7.3,128,2015
5,Dark Places,Gillian Flynn,2015-07,Fiction,448,Gilles Paquet-Brenner,Drama|Mystery|Thriller,5.7,113,2015
6,Z for Zachariah,Robert C. O'Brien,2021-06-01,Young Adult Fiction,240,Craig Zobel,Drama|Science Fiction|Thriller,5.5,97,2015
7,The Diary of a Teenage Girl,Phoebe Gloeckner,2015,Comics & Graphic Novels,320,Marielle Heller,Drama|Romance,6.8,102,2015
8,And Then There Were None,Agatha Christie,2004-05-03,Fiction,264,Craig Viveiros,Mystery|Drama,7.7,168,2015
9,Gone Girl,Gillian Flynn,2013-01-01,Fiction,475,David Fincher,Mystery|Thriller|Drama,7.9,145,2014


In [36]:
archquery.fillna('-',inplace=True)

In [37]:
archquery = archquery.drop(['pages'],axis=1)

### <font color=blue> Outcome </font>

In [38]:
archquery[:15]

Unnamed: 0,title,author,publish_date,category,director,genres,vote_average,runtime,release_year
0,Insurgent,Veronica Roth,2015-01-20,Juvenile Fiction,Robert Schwentke,Adventure|Science Fiction|Thriller,6.3,119,2015
1,The Maze Runner,James Dashner,2015,Amnesia,Wes Ball,Action|Mystery|Science Fiction|Thriller,7.0,113,2014
2,Room,Emma Donoghue,2010-09-13,Fiction,Lenny Abrahamson,Drama|Thriller,8.0,117,2015
3,Paper Towns,John Green,2018-10-23,Young Adult Fiction,Jake Schreier,Drama|Mystery|Romance,6.2,109,2015
4,The Longest Ride,Nicholas Sparks,2015-02-24,Fiction,"George Tillman, Jr.",Romance|Drama,7.3,128,2015
5,Dark Places,Gillian Flynn,2015-07,Fiction,Gilles Paquet-Brenner,Drama|Mystery|Thriller,5.7,113,2015
6,Z for Zachariah,Robert C. O'Brien,2021-06-01,Young Adult Fiction,Craig Zobel,Drama|Science Fiction|Thriller,5.5,97,2015
7,The Diary of a Teenage Girl,Phoebe Gloeckner,2015,Comics & Graphic Novels,Marielle Heller,Drama|Romance,6.8,102,2015
8,And Then There Were None,Agatha Christie,2004-05-03,Fiction,Craig Viveiros,Mystery|Drama,7.7,168,2015
9,Gone Girl,Gillian Flynn,2013-01-01,Fiction,David Fincher,Mystery|Thriller|Drama,7.9,145,2014


### <font color=blue> Searching for a specific title </font>

In [39]:
df10 = archquery.query("title == 'Alice in Wonderland'")
df10

Unnamed: 0,title,author,publish_date,category,director,genres,vote_average,runtime,release_year
22,Alice in Wonderland,Joseph Leo Mankiewicz,1933,Alice in wonderland (Motion picture),Tim Burton,Family|Fantasy|Adventure,6.3,108,2010
