### Movie budgets

In this notebook we explore whether a film's budget is related to how much audiences like it. To do so, we combine two DataFrames: one with production budgets and box-office figures, and one with ratings.

In [1]:
import numpy as np
import os
import re
import pandas as pd
import seaborn as sns
from datetime import datetime
import squarify
import matplotlib.pyplot as plt

***

### df_budgets

In [2]:
df_budgets = pd.read_csv("top-500-movies.csv", index_col = 0)

In [3]:
df_budgets.head(2)

rank,release_date,title,url,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre,theaters,runtime,year
1,2019-04-23,Avengers: Endgame,/movie/Avengers-Endgame-(2019)#tab=summary,400000000,858373000,2797800564,357115007.0,PG-13,Action,4662.0,181.0,2019.0
2,2011-05-20,Pirates of the Caribbean: On Stranger Tides,/movie/Pirates-of-the-Caribbean-On-Stranger-Ti...,379000000,241071802,1045713802,90151958.0,PG-13,Adventure,4164.0,136.0,2011.0


First, we drop the "url" column:

In [4]:
df_budgets.drop("url", inplace = True, axis = 1)

In [5]:
df_budgets.dtypes

release_date        object
title               object
production_cost      int64
domestic_gross       int64
worldwide_gross      int64
opening_weekend    float64
mpaa                object
genre               object
theaters           float64
runtime            float64
year               float64
dtype: object

In [6]:
df_budgets.isnull().sum()

release_date        1
title               0
production_cost     0
domestic_gross      0
worldwide_gross     0
opening_weekend    21
mpaa                8
genre               5
theaters           21
runtime            13
year                1
dtype: int64

Several columns contain null values. We fill the single missing `release_date` manually with 2022 and then inspect the rows that have no theater count.

In [7]:
df_budgets["release_date"] = df_budgets["release_date"].replace(np.nan, 2022)
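Note that `release_date` holds date strings, so filling the gap with the integer 2022 leaves the column with mixed types. A minimal sketch of one alternative follows, using a toy frame with made-up titles and assuming a placeholder date string is acceptable for the missing entry (the real release date is not known from the data shown here):

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_budgets; titles and dates are made up for illustration.
toy = pd.DataFrame({
    "title": ["Film A", "Film B"],
    "release_date": ["2019-04-23", np.nan],
})

# Fill with a string in the same format as the other values
# (the placeholder date is an assumption), then parse the whole column
# so that every entry is a real timestamp.
toy["release_date"] = pd.to_datetime(toy["release_date"].fillna("2022-01-01"))

# The year can then be derived from the date instead of being filled separately.
toy["year"] = toy["release_date"].dt.year
print(toy)
```

Deriving `year` from the parsed date would also cover the one missing `year` value reported by `isnull().sum()`, provided the placeholder is a reasonable guess.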

In [8]:
df_budgets[df_budgets["theaters"].isna()]

rank,release_date,title,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre,theaters,runtime,year
9,2023-07-11,Mission: Impossible Dead Reckoning Part One,290000000,0,0,,,Action,,,2023.0
84,2020-09-04,Mulan,200000000,0,69965374,,PG-13,Adventure,,115.0,2020.0
85,2021-07-02,The Tomorrow War,200000000,0,19220000,,PG-13,Action,,140.0,2021.0
86,2022-07-13,The Gray Man,200000000,0,451178,,PG-13,Thriller/Suspense,,129.0,2022.0
110,2005-12-09,"The Chronicles of Narnia: The Lion, the Witch a…",180000000,291710957,720539572,,,,,,2005.0
141,2022-03-10,Turning Red,175000000,0,10965045,,PG,Adventure,,100.0,2022.0
180,2019-11-01,The Irishman,159000000,0,910234,,R,Drama,,210.0,2019.0
182,2010-12-10,The Chronicles of Narnia: The Voyage of the Daw…,155000000,104386950,418186950,,,,,,2010.0
235,2021-11-04,Red Notice,150000000,0,173638,,PG-13,Action,,115.0,2021.0
236,2019-12-13,6 Underground,150000000,0,0,,R,Action,,128.0,2019.0


***

### df_ratings

In [9]:
df_ratings = pd.read_csv("movie_ratings.csv")

In [10]:
df_ratings.head()

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1982,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.0,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.0,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.6,6.0,5.0,9,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.0,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0


We drop all the columns we find no use for:

In [11]:
df_ratings.drop(["actors", "humor", "rhythm", "effort", "tension", "erotism", "total_votes", "notes"], inplace = True, axis = 1)

In [12]:
df_ratings[df_ratings["title"] == "Avengers: Endgame"]

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,avg_vote,critics_vote,public_vote,description
36532,166273,Avengers: Endgame,2019,Super-hero,181,United States,"Anthony Russo, Joe Russo",6.2,6.33,6.0,Half of the living beings in the universe were...


In [13]:
df_ratings.head(2)

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,avg_vote,critics_vote,public_vote,description
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1982,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",7.7,8.0,7.0,"With two protruding front teeth, a slightly sl..."
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,6.5,6.0,7.0,"Samantha, not yet eighteen, leaves the comfort..."


In [14]:
df_budgets.head()

rank,release_date,title,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre,theaters,runtime,year
1,2019-04-23,Avengers: Endgame,400000000,858373000,2797800564,357115007.0,PG-13,Action,4662.0,181.0,2019.0
2,2011-05-20,Pirates of the Caribbean: On Stranger Tides,379000000,241071802,1045713802,90151958.0,PG-13,Adventure,4164.0,136.0,2011.0
3,2015-04-22,Avengers: Age of Ultron,365000000,459005868,1395316979,191271109.0,PG-13,Action,4276.0,141.0,2015.0
4,2015-12-16,Star Wars Ep. VII: The Force Awakens,306000000,936662225,2064615817,247966675.0,PG-13,Adventure,4134.0,136.0,2015.0
5,2018-04-25,Avengers: Infinity War,300000000,678815482,2048359754,257698183.0,PG-13,Action,4474.0,156.0,2018.0


In [15]:
df_budgets.isnull().sum()

release_date        0
title               0
production_cost     0
domestic_gross      0
worldwide_gross     0
opening_weekend    21
mpaa                8
genre               5
theaters           21
runtime            13
year                1
dtype: int64

In [16]:
df_ratings.isnull().sum()

filmtv_id          0
title              0
year               0
genre             95
duration           0
country           11
directors         33
avg_vote           0
critics_vote    4600
public_vote      474
description     1455
dtype: int64

In [17]:
df_ratings.shape[0]

40303

In [18]:
df_budgets.shape[0]

500

***

### Merging both DataFrames

In [19]:
df_final = pd.merge(df_budgets, df_ratings, on = "title", how = "outer")

In [20]:
df_final.head()

Unnamed: 0,release_date,title,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre_x,theaters,runtime,...,filmtv_id,year_y,genre_y,duration,country,directors,avg_vote,critics_vote,public_vote,description
0,2019-04-23,Avengers: Endgame,400000000.0,858373000.0,2797801000.0,357115007.0,PG-13,Action,4662.0,181.0,...,166273.0,2019.0,Super-hero,181.0,United States,"Anthony Russo, Joe Russo",6.2,6.33,6.0,Half of the living beings in the universe were...
1,2011-05-20,Pirates of the Caribbean: On Stranger Tides,379000000.0,241071802.0,1045714000.0,90151958.0,PG-13,Adventure,4164.0,136.0,...,,,,,,,,,,
2,2015-04-22,Avengers: Age of Ultron,365000000.0,459005868.0,1395317000.0,191271109.0,PG-13,Action,4276.0,141.0,...,63444.0,2015.0,Fantasy,142.0,United States,Joss Whedon,5.8,5.83,6.0,The US government has created a new robot know...
3,2015-12-16,Star Wars Ep. VII: The Force Awakens,306000000.0,936662225.0,2064616000.0,247966675.0,PG-13,Adventure,4134.0,136.0,...,,,,,,,,,,
4,2018-04-25,Avengers: Infinity War,300000000.0,678815482.0,2048360000.0,257698183.0,PG-13,Action,4474.0,156.0,...,149289.0,2018.0,Super-hero,156.0,United States,"Anthony Russo, Joe Russo",6.8,7.11,7.0,The Avengers and their allies must be prepared...
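The rows with empty rating columns are budget titles that have no exact title match in `df_ratings`. One way to count them, sketched here on two small toy frames (a few values copied from the tables above, the frame names are illustrative), is the `indicator` parameter of `pd.merge`:

```python
import pandas as pd

# Toy frames; the real notebook would use df_budgets and df_ratings.
budgets = pd.DataFrame({
    "title": ["Avengers: Endgame", "Star Wars Ep. VII: The Force Awakens"],
    "production_cost": [400000000, 306000000],
})
ratings = pd.DataFrame({
    "title": ["Avengers: Endgame", "Diner"],
    "avg_vote": [6.2, 7.0],
})

# indicator=True adds a "_merge" column telling where each row came from:
# "both", "left_only" (budget without rating) or "right_only" (rating without budget).
merged = pd.merge(budgets, ratings, on="title", how="outer", indicator=True)
counts = merged["_merge"].value_counts()
print(counts)
```

A high `left_only` count would suggest that the two sources spell titles differently, which an exact match on `title` cannot recover.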


In [21]:
df_final = df_final.dropna()

In [22]:
df_final

Unnamed: 0,release_date,title,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre_x,theaters,runtime,...,filmtv_id,year_y,genre_y,duration,country,directors,avg_vote,critics_vote,public_vote,description
0,2019-04-23,Avengers: Endgame,400000000.0,858373000.0,2.797801e+09,357115007.0,PG-13,Action,4662.0,181.0,...,166273.0,2019.0,Super-hero,181.0,United States,"Anthony Russo, Joe Russo",6.2,6.33,6.0,Half of the living beings in the universe were...
2,2015-04-22,Avengers: Age of Ultron,365000000.0,459005868.0,1.395317e+09,191271109.0,PG-13,Action,4276.0,141.0,...,63444.0,2015.0,Fantasy,142.0,United States,Joss Whedon,5.8,5.83,6.0,The US government has created a new robot know...
4,2018-04-25,Avengers: Infinity War,300000000.0,678815482.0,2.048360e+09,257698183.0,PG-13,Action,4474.0,156.0,...,149289.0,2018.0,Super-hero,156.0,United States,"Anthony Russo, Joe Russo",6.8,7.11,7.0,The Avengers and their allies must be prepared...
6,2017-11-13,Justice League,300000000.0,229024295.0,6.559452e+08,93842239.0,PG-13,Action,4051.0,121.0,...,121381.0,2017.0,Super-hero,121.0,United States,Zack Snyder,5.1,5.14,5.0,Moved by newfound faith in mankind and inspire...
7,2015-10-06,Spectre,300000000.0,200074175.0,8.795008e+08,70403148.0,PG-13,Action,3929.0,148.0,...,73835.0,2015.0,Action,148.0,"United States, Great Britain",Sam Mendes,6.1,6.38,6.0,A cryptic message from his past sends James Bo...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
588,2013-02-06,A Good Day to Die Hard,92000000.0,67349198.0,3.042492e+08,24834845.0,R,Action,3555.0,98.0,...,50764.0,2013.0,Action,99.0,United States,John Moore,4.2,4.00,4.0,Agent John McClane (Bruce Willis) finds himsel...
589,2004-04-09,The Alamo,92000000.0,22406362.0,2.391136e+07,9124701.0,PG-13,Western,2609.0,137.0,...,9618.0,1960.0,Western,192.0,United States,John Wayne,6.5,6.38,7.0,1836. In the ancient fortified mission of Alam...
590,2004-04-09,The Alamo,92000000.0,22406362.0,2.391136e+07,9124701.0,PG-13,Western,2609.0,137.0,...,26778.0,2004.0,Western,137.0,United States,John Lee Hancock,4.7,4.00,5.0,"Texas, 1836. One hundred and eighty-nine Texan..."
592,2013-12-19,The Secret Life of Walter Mitty,91000000.0,58236838.0,1.878612e+08,12765508.0,PG,Adventure,2922.0,114.0,...,13184.0,1947.0,Comedy,110.0,United States,Norman Z. McLeod,6.9,6.50,7.0,Oppressed and intimidated by an obsessive moth...


In [23]:
df_final.shape

(473, 21)
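The output above shows a side effect of merging on `title` alone: the 2004 budget row for The Alamo is paired with both the 1960 and the 2004 film, and The Secret Life of Walter Mitty (2013) is paired with the 1947 version. A possible remedy, sketched on toy frames built from values shown above, is to merge on title and year together. This assumes both sources agree on the release year and that the `year` columns share a dtype, which would need checking on the real data:

```python
import pandas as pd

# Toy frames with values taken from the tables above.
budgets = pd.DataFrame({
    "title": ["The Alamo", "Spectre"],
    "year": [2004, 2015],
    "production_cost": [92000000, 300000000],
})
ratings = pd.DataFrame({
    "title": ["The Alamo", "The Alamo", "Spectre"],
    "year": [1960, 2004, 2015],
    "avg_vote": [6.5, 4.7, 6.1],
})

# Matching on title only pairs the 2004 budget with both films called "The Alamo".
by_title = pd.merge(budgets, ratings, on="title", how="inner")

# Matching on title and year keeps only the film released in the same year.
by_title_year = pd.merge(budgets, ratings, on=["title", "year"], how="inner")

print(len(by_title))       # 3
print(len(by_title_year))  # 2
```

The stricter key trades duplicated remakes for possible misses when the two sources record different years for the same film.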

In [24]:
df_final.head()

Unnamed: 0,release_date,title,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre_x,theaters,runtime,...,filmtv_id,year_y,genre_y,duration,country,directors,avg_vote,critics_vote,public_vote,description
0,2019-04-23,Avengers: Endgame,400000000.0,858373000.0,2797801000.0,357115007.0,PG-13,Action,4662.0,181.0,...,166273.0,2019.0,Super-hero,181.0,United States,"Anthony Russo, Joe Russo",6.2,6.33,6.0,Half of the living beings in the universe were...
2,2015-04-22,Avengers: Age of Ultron,365000000.0,459005868.0,1395317000.0,191271109.0,PG-13,Action,4276.0,141.0,...,63444.0,2015.0,Fantasy,142.0,United States,Joss Whedon,5.8,5.83,6.0,The US government has created a new robot know...
4,2018-04-25,Avengers: Infinity War,300000000.0,678815482.0,2048360000.0,257698183.0,PG-13,Action,4474.0,156.0,...,149289.0,2018.0,Super-hero,156.0,United States,"Anthony Russo, Joe Russo",6.8,7.11,7.0,The Avengers and their allies must be prepared...
6,2017-11-13,Justice League,300000000.0,229024295.0,655945200.0,93842239.0,PG-13,Action,4051.0,121.0,...,121381.0,2017.0,Super-hero,121.0,United States,Zack Snyder,5.1,5.14,5.0,Moved by newfound faith in mankind and inspire...
7,2015-10-06,Spectre,300000000.0,200074175.0,879500800.0,70403148.0,PG-13,Action,3929.0,148.0,...,73835.0,2015.0,Action,148.0,"United States, Great Britain",Sam Mendes,6.1,6.38,6.0,A cryptic message from his past sends James Bo...


In [25]:
df_final = df_final.drop(["year_x", "filmtv_id", "genre_y"], axis=1)

In [26]:
df_final.columns

Index(['release_date', 'title', 'production_cost', 'domestic_gross',
       'worldwide_gross', 'opening_weekend', 'mpaa', 'genre_x', 'theaters',
       'runtime', 'year_y', 'duration', 'country', 'directors', 'avg_vote',
       'critics_vote', 'public_vote', 'description'],
      dtype='object')

In [27]:
df_final = df_final.rename(columns={"genre_x": "genre", "year_y": "year"}, errors="raise")

In [28]:
columns = list(df_final.columns)

start1 = columns.index("production_cost")
end1 = columns.index("opening_weekend")
start2 = columns.index("theaters")
end2 = columns.index("year")

# Cast the two contiguous column ranges (costs and grosses, then theaters to year) to int.
int_cols = columns[start1:end1 + 1] + columns[start2:end2 + 1]
df_final[int_cols] = df_final[int_cols].astype(int)

In [29]:
df_final.dtypes

release_date        object
title               object
production_cost      int64
domestic_gross       int64
worldwide_gross      int64
opening_weekend      int64
mpaa                object
genre               object
theaters             int64
runtime              int64
year                 int64
duration           float64
country             object
directors           object
avg_vote           float64
critics_vote       float64
public_vote        float64
description         object
dtype: object

In [30]:
df_final.head(2)

Unnamed: 0,release_date,title,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre,theaters,runtime,year,duration,country,directors,avg_vote,critics_vote,public_vote,description
0,2019-04-23,Avengers: Endgame,400000000,858373000,2797800564,357115007,PG-13,Action,4662,181,2019,181.0,United States,"Anthony Russo, Joe Russo",6.2,6.33,6.0,Half of the living beings in the universe were...
2,2015-04-22,Avengers: Age of Ultron,365000000,459005868,1395316979,191271109,PG-13,Action,4276,141,2015,142.0,United States,Joss Whedon,5.8,5.83,6.0,The US government has created a new robot know...


In [31]:
df_final = df_final.reset_index(drop=True)

In [33]:
df_final[df_final["title"] == "The Incredibles"]

Unnamed: 0,release_date,title,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre,theaters,runtime,year,duration,country,directors,avg_vote,critics_vote,public_vote,description
467,2004-10-22,The Incredibles,92000000,261441092,631441092,70467623,PG,Adventure,3933,115,2004,115.0,United States,Brad Bird,7.8,7.8,8.0,"After being two famous superheroes, Bob Parr a..."


In [34]:
df_final.head()

Unnamed: 0,release_date,title,production_cost,domestic_gross,worldwide_gross,opening_weekend,mpaa,genre,theaters,runtime,year,duration,country,directors,avg_vote,critics_vote,public_vote,description
0,2019-04-23,Avengers: Endgame,400000000,858373000,2797800564,357115007,PG-13,Action,4662,181,2019,181.0,United States,"Anthony Russo, Joe Russo",6.2,6.33,6.0,Half of the living beings in the universe were...
1,2015-04-22,Avengers: Age of Ultron,365000000,459005868,1395316979,191271109,PG-13,Action,4276,141,2015,142.0,United States,Joss Whedon,5.8,5.83,6.0,The US government has created a new robot know...
2,2018-04-25,Avengers: Infinity War,300000000,678815482,2048359754,257698183,PG-13,Action,4474,156,2018,156.0,United States,"Anthony Russo, Joe Russo",6.8,7.11,7.0,The Avengers and their allies must be prepared...
3,2017-11-13,Justice League,300000000,229024295,655945209,93842239,PG-13,Action,4051,121,2017,121.0,United States,Zack Snyder,5.1,5.14,5.0,Moved by newfound faith in mankind and inspire...
4,2015-10-06,Spectre,300000000,200074175,879500760,70403148,PG-13,Action,3929,148,2015,148.0,"United States, Great Britain",Sam Mendes,6.1,6.38,6.0,A cryptic message from his past sends James Bo...


In [35]:
df_final.to_csv("movies.csv")
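With the merged table saved, the question posed at the start can be approached with a simple correlation between `production_cost` and `public_vote`. The following is only a sketch on the five rows displayed above, not a result for the full dataset; the frame name `sample` is illustrative:

```python
import pandas as pd

# The five rows shown in df_final.head(); the real analysis would use all rows.
sample = pd.DataFrame({
    "title": ["Avengers: Endgame", "Avengers: Age of Ultron",
              "Avengers: Infinity War", "Justice League", "Spectre"],
    "production_cost": [400000000, 365000000, 300000000, 300000000, 300000000],
    "public_vote": [6.0, 6.0, 7.0, 5.0, 6.0],
})

# Pearson correlation measures linear association between budget and rating.
pearson = sample["production_cost"].corr(sample["public_vote"])

# Correlating the ranks gives a Spearman-style coefficient,
# which is less sensitive to a few very large budgets.
rank_corr = sample["production_cost"].rank().corr(sample["public_vote"].rank())

print(pearson, rank_corr)
```

Five films are far too few to conclude anything, and a correlation on the full table would still describe association rather than cause.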