# Exploring Data - Cleaning

## Project Overview

This project analyzes global box office data to identify which movie attributes are most likely to produce higher viewership and ROI. The analysis draws on financial data, user ratings, and viewership data to inform decisions about which movies to make.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Import the source files into DataFrames
movie_actors_df = pd.read_csv('data/IMDB/name.basics.csv')
global_movie_title_df = pd.read_csv('data/IMDB/title.akas.csv')
movie_title_basics_df = pd.read_csv('data/IMDB/title.basics.csv')
movie_title_crew_df = pd.read_csv('data/IMDB/title.crew.csv')
movie_title_principals_df = pd.read_csv('data/IMDB/title.principals.csv')
movie_ratings_df = pd.read_csv('data/IMDB/title.ratings.csv')
movie_gross_df = pd.read_csv('data/bom.movie_gross.csv')

# Movie Gross

In [4]:
movie_gross_df.head()

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year
0,Toy Story 3,BV,415000000.0,652000000,2010
1,Alice in Wonderland (2010),BV,334200000.0,691300000,2010
2,Harry Potter and the Deathly Hallows Part 1,WB,296000000.0,664300000,2010
3,Inception,WB,292600000.0,535700000,2010
4,Shrek Forever After,P/DW,238700000.0,513900000,2010


In [6]:
#Inspect metadata
movie_gross_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3387 entries, 0 to 3386
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   title           3387 non-null   object 
 1   studio          3382 non-null   object 
 2   domestic_gross  3359 non-null   float64
 3   foreign_gross   2037 non-null   object 
 4   year            3387 non-null   int64  
dtypes: float64(1), int64(1), object(3)
memory usage: 132.4+ KB


In [7]:
movie_gross_df['year'].unique()

array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018], dtype=int64)

In [8]:
# Fill null values in the studio column with 'Other'.
# Assigning the result back avoids calling an in-place method on a column selection.
movie_gross_df['studio'] = movie_gross_df['studio'].fillna('Other')

In [9]:
# Inspect rows where domestic_gross is null
movie_gross_df.loc[movie_gross_df['domestic_gross'].isna()]

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year
230,It's a Wonderful Afterlife,UTV,,1300000,2010
298,Celine: Through the Eyes of the World,Sony,,119000,2010
302,White Lion,Scre.,,99600,2010
306,Badmaash Company,Yash,,64400,2010
327,Aashayein (Wishes),Relbig.,,3800,2010
537,Force,FoxS,,4800000,2011
713,Empire of Silver,NeoC,,19000,2011
871,Solomon Kane,RTWC,,19600000,2012
928,The Tall Man,Imag.,,5200000,2012
933,Keith Lemon: The Film,Other,,4000000,2012


Because we want to find out which Hollywood-produced movies perform best globally, we drop every row with no domestic gross, treating those movies as not produced domestically.

In [10]:
movie_gross_df.dropna(subset=['domestic_gross'], inplace=True)

In [11]:
movie_gross_df.loc[movie_gross_df['foreign_gross'].isna()]

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year
222,Flipped,WB,1800000.0,,2010
254,The Polar Express (IMAX re-issue 2010),WB,673000.0,,2010
267,Tiny Furniture,IFC,392000.0,,2010
269,Grease (Sing-a-Long re-issue),Par.,366000.0,,2010
280,Last Train Home,Zeit.,288000.0,,2010
...,...,...,...,...,...
3382,The Quake,Magn.,6200.0,,2018
3383,Edward II (2018 re-release),FM,4800.0,,2018
3384,El Pacto,Sony,2500.0,,2018
3385,The Swan,Synergetic,2400.0,,2018


For the same reason, we also drop every movie with no foreign gross, since it has not reached the global box office.

In [12]:
movie_gross_df.dropna(subset=['foreign_gross'], inplace=True)
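As a rough check on how much data each filter removes, the sketch below applies the same two `dropna` calls to a small made-up frame and records the row count after each step. The frame `toy_gross_df` and its values are hypothetical stand-ins, not rows from `bom.movie_gross.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for movie_gross_df; the values are invented.
toy_gross_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "domestic_gross": [100.0, np.nan, 50.0, 75.0],
    "foreign_gross": ["200", "300", None, "1,000"],
})

rows_start = len(toy_gross_df)

# Keep only rows that report a domestic gross.
domestic_only_df = toy_gross_df.dropna(subset=["domestic_gross"])
rows_after_domestic = len(domestic_only_df)

# Of those, keep only rows that also report a foreign gross.
both_gross_df = domestic_only_df.dropna(subset=["foreign_gross"])
rows_after_foreign = len(both_gross_df)

print(rows_start, rows_after_domestic, rows_after_foreign)
```

Returning new frames instead of using `inplace=True` keeps the unfiltered data available, which makes it easier to see how many rows each step costs.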

In [20]:
type(movie_gross_df)

pandas.core.frame.DataFrame

In [39]:
type(movie_gross_df['domestic_gross'])

pandas.core.series.Series

In [None]:
# An assignment cannot appear inside print(); print the converted values instead.
print(pd.to_numeric(movie_gross_df['domestic_gross']))

In [28]:
movie_gross_df['domestic_gross'] = pd.to_numeric(movie_gross_df['domestic_gross'])

In [40]:
movie_gross_df['foreign_gross']=movie_gross_df['foreign_gross'].str.replace(',','')

In [41]:
movie_gross_df['foreign_gross'] = pd.to_numeric(movie_gross_df['foreign_gross'])
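The two cells above strip commas and then convert the column to numbers. A minimal sketch of the same idea follows, assuming we would also like to see which entries fail to parse instead of stopping on an error. The series `raw_gross` and its values are hypothetical, and `"n/a"` stands in for any unparseable entry.

```python
import pandas as pd

# Hypothetical raw values; "n/a" stands in for any unparseable entry.
raw_gross = pd.Series(["652000000", "1,300,000", "n/a", None])

# Remove thousands separators as plain text (regex=False), then convert.
cleaned = raw_gross.str.replace(",", "", regex=False)
numeric_gross = pd.to_numeric(cleaned, errors="coerce")

# Entries that were present but could not be parsed.
unparsed = raw_gross[numeric_gross.isna() & raw_gross.notna()]
print(numeric_gross.tolist())
print(unparsed.tolist())
```

With `errors="coerce"`, anything that cannot be parsed becomes `NaN`, so comparing against the original values shows exactly which strings were lost.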

Next, we add the foreign gross to the domestic gross to get the total gross.

In [42]:
total = movie_gross_df['domestic_gross'] + movie_gross_df['foreign_gross']
movie_gross_df['total_gross'] = total
movie_gross_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross
0,Toy Story 3,BV,415000000.0,652000000.0,2010,1.067000e+09
1,Alice in Wonderland (2010),BV,334200000.0,691300000.0,2010,1.025500e+09
2,Harry Potter and the Deathly Hallows Part 1,WB,296000000.0,664300000.0,2010,9.603000e+08
3,Inception,WB,292600000.0,535700000.0,2010,8.283000e+08
4,Shrek Forever After,P/DW,238700000.0,513900000.0,2010,7.526000e+08
...,...,...,...,...,...,...
3275,I Still See You,LGF,1400.0,1500000.0,2018,1.501400e+06
3286,The Catcher Was a Spy,IFC,725000.0,229000.0,2018,9.540000e+05
3309,Time Freak,Grindstone,10000.0,256000.0,2018,2.660000e+05
3342,Reign of Judges: Title of Liberty - Concept Short,Darin Southa,93200.0,5200.0,2018,9.840000e+04


Next, we sort the movies by total gross, from highest to lowest.

In [43]:
highest_gross_df = movie_gross_df.sort_values(["total_gross"], ascending=False)
highest_gross_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross
727,Marvel's The Avengers,BV,623400000.0,895500000.0,2012,1.518900e+09
1875,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09
3080,Black Panther,BV,700100000.0,646900000.0,2018,1.347000e+09
328,Harry Potter and the Deathly Hallows Part 2,WB,381000000.0,960500000.0,2011,1.341500e+09
2758,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1.332600e+09
...,...,...,...,...,...,...
711,I'm Glad My Mother is Alive,Strand,8700.0,13200.0,2011,2.190000e+04
322,The Thorn in the Heart,Osci.,7400.0,10500.0,2010,1.790000e+04
1110,Cirkus Columbia,Strand,3500.0,9500.0,2012,1.300000e+04
715,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04


To use this data, we need to merge it with our master CSV file, 'title.basics.csv'.

In [44]:
movie_title_basics_df

Unnamed: 0,tconst,primary_title,original_title,start_year,runtime_minutes,genres
0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama"
1,tt0066787,One Day Before the Rainy Season,Ashad Ka Ek Din,2019,114.0,"Biography,Drama"
2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama
3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama"
4,tt0100275,The Wandering Soap Opera,La Telenovela Errante,2017,80.0,"Comedy,Drama,Fantasy"
...,...,...,...,...,...,...
146139,tt9916538,Kuambil Lagi Hatiku,Kuambil Lagi Hatiku,2019,123.0,Drama
146140,tt9916622,Rodolpho Teóphilo - O Legado de um Pioneiro,Rodolpho Teóphilo - O Legado de um Pioneiro,2015,,Documentary
146141,tt9916706,Dankyavar Danka,Dankyavar Danka,2013,,Comedy
146142,tt9916730,6 Gunn,6 Gunn,2017,116.0,


In [49]:
ps_df = movie_title_basics_df[movie_title_basics_df["primary_title"] == movie_title_basics_df["original_title"]]

In [50]:
ps2_df = movie_title_basics_df[movie_title_basics_df["primary_title"] != movie_title_basics_df["original_title"]]

In [51]:
ps3_df = movie_title_basics_df[movie_title_basics_df["primary_title"] != movie_title_basics_df["original_title"]]

In [52]:
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning
# raised when a column is set on a filtered slice.
ps_df = ps_df.assign(title=ps_df['primary_title'])


In [53]:
ps_df

Unnamed: 0,tconst,primary_title,original_title,start_year,runtime_minutes,genres,title
0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama",Sunghursh
2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama,The Other Side of the Wind
3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama",Sabse Bada Sukh
5,tt0111414,A Thin Life,A Thin Life,2018,75.0,Comedy,A Thin Life
6,tt0112502,Bigfoot,Bigfoot,2017,,"Horror,Thriller",Bigfoot
...,...,...,...,...,...,...,...
146139,tt9916538,Kuambil Lagi Hatiku,Kuambil Lagi Hatiku,2019,123.0,Drama,Kuambil Lagi Hatiku
146140,tt9916622,Rodolpho Teóphilo - O Legado de um Pioneiro,Rodolpho Teóphilo - O Legado de um Pioneiro,2015,,Documentary,Rodolpho Teóphilo - O Legado de um Pioneiro
146141,tt9916706,Dankyavar Danka,Dankyavar Danka,2013,,Comedy,Dankyavar Danka
146142,tt9916730,6 Gunn,6 Gunn,2017,116.0,,6 Gunn


In [57]:
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning
# raised when a column is set on a filtered slice.
ps2_df = ps2_df.assign(title=ps2_df['primary_title'])


In [58]:
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning
# raised when a column is set on a filtered slice.
ps3_df = ps3_df.assign(title=ps3_df['original_title'])


In [60]:
pdList = [ps_df, ps2_df, ps3_df]  # DataFrames to stack into one title lookup
new_ps_df = pd.concat(pdList)

In [61]:
new_ps_df

Unnamed: 0,tconst,primary_title,original_title,start_year,runtime_minutes,genres,title
0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama",Sunghursh
2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama,The Other Side of the Wind
3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama",Sabse Bada Sukh
5,tt0111414,A Thin Life,A Thin Life,2018,75.0,Comedy,A Thin Life
6,tt0112502,Bigfoot,Bigfoot,2017,,"Horror,Thriller",Bigfoot
...,...,...,...,...,...,...,...
146026,tt9899938,Journey of the Sky Goddess,Kibaiyanse! Watashi,2019,116.0,"Comedy,Drama",Kibaiyanse! Watashi
146028,tt9900060,Lupin the Third: Fujiko Mine's Lie,Lupin the IIIrd: Mine Fujiko no Uso,2019,,"Adventure,Crime,Drama",Lupin the IIIrd: Mine Fujiko no Uso
146037,tt9900688,Big Three Dragons,Da San Yuan,2019,111.0,Comedy,Da San Yuan
146121,tt9914254,A Cherry Tale,Kirsebæreventyret,2019,85.0,Documentary,Kirsebæreventyret
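The three filtered frames above build one lookup in which each film appears under its primary title and, when it differs, under its original title as well. A more compact way to get a similar lookup is sketched below with `melt`. This is an alternative under stated assumptions, not the method used in this notebook: `toy_basics_df` is a hypothetical stand-in for `movie_title_basics_df`, and the resulting lookup keeps only `tconst` and `title`.

```python
import pandas as pd

# Hypothetical stand-in for movie_title_basics_df; rows are invented.
toy_basics_df = pd.DataFrame({
    "tconst": ["tt1", "tt2"],
    "primary_title": ["Same Name", "English Name"],
    "original_title": ["Same Name", "Original Name"],
})

# Stack both title columns into one 'title' column, one row per (film, title).
title_lookup_df = (
    toy_basics_df
    .melt(
        id_vars=["tconst"],
        value_vars=["primary_title", "original_title"],
        value_name="title",
    )
    .drop(columns="variable")
    .drop_duplicates(subset=["tconst", "title"])
    .reset_index(drop=True)
)
print(title_lookup_df)
```

Films whose two titles match collapse to a single row, so the lookup has the same film and title pairs as the concatenated frame without the intermediate copies.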


In [62]:
highest_gross_ps_df = highest_gross_df.merge(new_ps_df, on='title')
highest_gross_ps_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,runtime_minutes,genres
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi"
1,Black Panther,BV,700100000.0,646900000.0,2018,1.347000e+09,tt1825683,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi"
2,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1.332600e+09,tt2527336,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,152.0,"Action,Adventure,Fantasy"
3,Jurassic World: Fallen Kingdom,Uni.,417700000.0,891800000.0,2018,1.309500e+09,tt4881806,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,128.0,"Action,Adventure,Sci-Fi"
4,Frozen,BV,400700000.0,875700000.0,2013,1.276400e+09,tt1323045,Frozen,Frozen,2010,93.0,"Adventure,Drama,Sport"
...,...,...,...,...,...,...,...,...,...,...,...,...
2040,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8095720,Aurora,Aurora,2017,68.0,"Biography,Documentary,Drama"
2041,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8396182,Aurora,Aurora,2018,98.0,Drama
2042,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8553606,Aurora,Aurora,2019,106.0,"Comedy,Drama,Romance"
2043,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8821182,Aurora,Aurora,2018,110.0,"Horror,Thriller"
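The output above shows one box office row for "Aurora" (2011) matched to several IMDB films with different start years, because the merge key is the title alone. The sketch below shows one way to measure that fan-out and to narrow the match by also requiring the years to agree. The frames and rows are hypothetical, and matching `year` to `start_year` is an assumption: the two can legitimately differ, so a stricter key may drop some correct matches.

```python
import pandas as pd

# Hypothetical stand-ins; the rows are invented for illustration.
toy_gross_df = pd.DataFrame({
    "title": ["Aurora", "Frozen"],
    "year": [2011, 2013],
})
toy_titles_df = pd.DataFrame({
    "tconst": ["tt01", "tt02", "tt03", "tt04"],
    "title": ["Aurora", "Aurora", "Frozen", "Frozen"],
    "start_year": [2011, 2018, 2010, 2013],
})

# Title-only merge: every same-named film matches, so rows multiply.
by_title_df = toy_gross_df.merge(toy_titles_df, on="title")
matches_per_title = by_title_df.groupby("title")["tconst"].nunique()

# Title-plus-year merge: assumes release year and start_year agree.
by_title_year_df = toy_gross_df.merge(
    toy_titles_df,
    left_on=["title", "year"],
    right_on=["title", "start_year"],
)
print(matches_per_title)
print(by_title_year_df)
```

Any title with more than one match in `matches_per_title` is a candidate for double counting in later steps that count films per person.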


# Movie Actors

In [65]:
movie_actors_df

Unnamed: 0,nconst,primary_name,birth_year,death_year,primary_profession,known_for_titles
0,nm0061671,Mary Ellen Bauder,,,"miscellaneous,production_manager,producer","tt0837562,tt2398241,tt0844471,tt0118553"
1,nm0061865,Joseph Bauer,,,"composer,music_department,sound_department","tt0896534,tt6791238,tt0287072,tt1682940"
2,nm0062070,Bruce Baum,,,"miscellaneous,actor,writer","tt1470654,tt0363631,tt0104030,tt0102898"
3,nm0062195,Axel Baumann,,,"camera_department,cinematographer,art_department","tt0114371,tt2004304,tt1618448,tt1224387"
4,nm0062798,Pete Baxter,,,"production_designer,art_department,set_decorator","tt0452644,tt0452692,tt3458030,tt2178256"
...,...,...,...,...,...,...
606643,nm9990381,Susan Grobes,,,actress,
606644,nm9990690,Joo Yeon So,,,actress,"tt9090932,tt8737130"
606645,nm9991320,Madeline Smith,,,actress,"tt8734436,tt9615610"
606646,nm9991786,Michelle Modigliani,,,producer,


Now we'll separate the comma-delimited `known_for_titles` column so that each title ID gets its own row.

In [66]:
movie_actors_df.assign(known_for_titles=movie_actors_df.known_for_titles.str.split(",")).explode("known_for_titles")

Unnamed: 0,nconst,primary_name,birth_year,death_year,primary_profession,known_for_titles
0,nm0061671,Mary Ellen Bauder,,,"miscellaneous,production_manager,producer",tt0837562
0,nm0061671,Mary Ellen Bauder,,,"miscellaneous,production_manager,producer",tt2398241
0,nm0061671,Mary Ellen Bauder,,,"miscellaneous,production_manager,producer",tt0844471
0,nm0061671,Mary Ellen Bauder,,,"miscellaneous,production_manager,producer",tt0118553
1,nm0061865,Joseph Bauer,,,"composer,music_department,sound_department",tt0896534
...,...,...,...,...,...,...
606644,nm9990690,Joo Yeon So,,,actress,tt8737130
606645,nm9991320,Madeline Smith,,,actress,tt8734436
606645,nm9991320,Madeline Smith,,,actress,tt9615610
606646,nm9991786,Michelle Modigliani,,,producer,
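`assign` and `explode` both return a new DataFrame, so the one-row-per-title result only persists if it is stored in a variable. A minimal sketch of storing it and then joining on the single title ID follows. The frames `toy_people_df` and `toy_films_df`, and every ID and name in them, are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins; the IDs and names are invented.
toy_people_df = pd.DataFrame({
    "nconst": ["nm1", "nm2"],
    "primary_name": ["Person One", "Person Two"],
    "known_for_titles": ["tt10,tt20", None],
})
toy_films_df = pd.DataFrame({"tconst": ["tt20"], "title": ["Some Film"]})

# Split the comma-delimited string into lists, then one row per title ID.
# The result is a new frame, so it is kept under its own name.
people_long_df = (
    toy_people_df
    .assign(known_for_titles=toy_people_df["known_for_titles"].str.split(","))
    .explode("known_for_titles")
)

# Each known_for_titles value is now a single ID and can match tconst.
films_people_df = toy_films_df.merge(
    people_long_df, how="left", left_on="tconst", right_on="known_for_titles"
)
print(films_people_df)
```

Joining against the unsplit column would only match people whose `known_for_titles` holds exactly one ID, which leaves most film rows without a person.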


In [80]:

highest_gross_ps_actor_df = pd.merge(highest_gross_ps_df, movie_actors_df, how='left', left_on='tconst', right_on='known_for_titles')
highest_gross_ps_actor_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,runtime_minutes,genres,nconst,primary_name,birth_year,death_year,primary_profession,known_for_titles
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",,,,,,
1,Black Panther,BV,700100000.0,646900000.0,2018,1.347000e+09,tt1825683,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi",,,,,,
2,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1.332600e+09,tt2527336,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,152.0,"Action,Adventure,Fantasy",,,,,,
3,Jurassic World: Fallen Kingdom,Uni.,417700000.0,891800000.0,2018,1.309500e+09,tt4881806,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,128.0,"Action,Adventure,Sci-Fi",,,,,,
4,Frozen,BV,400700000.0,875700000.0,2013,1.276400e+09,tt1323045,Frozen,Frozen,2010,93.0,"Adventure,Drama,Sport",,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2388,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8396182,Aurora,Aurora,2018,98.0,Drama,nm9837414,Mirlan Satkymbaev,,,cinematographer,tt8396182
2389,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8553606,Aurora,Aurora,2019,106.0,"Comedy,Drama,Romance",,,,,,
2390,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8821182,Aurora,Aurora,2018,110.0,"Horror,Thriller",nm10073626,Federico Fernandez,,,producer,tt8821182
2391,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8821182,Aurora,Aurora,2018,110.0,"Horror,Thriller",nm10073629,Gary Barrozo,,,producer,tt8821182


In [77]:
actors_df = movie_title_principals_df[(movie_title_principals_df["category"] == "actor")]
actors_df

Unnamed: 0,tconst,ordering,nconst,category,job,characters
0,tt0111414,1,nm0246005,actor,,"[""The Man""]"
5,tt0323808,2,nm2694680,actor,,"[""Steve Thomson""]"
6,tt0323808,3,nm0574615,actor,,"[""Sir Lachlan Morrison""]"
14,tt0417610,1,nm0532721,actor,,"[""Lucio""]"
16,tt0417610,3,nm0069209,actor,,"[""Dr. Samaniego""]"
...,...,...,...,...,...,...
1028175,tt9681728,9,nm10397910,actor,,"[""Corpsman""]"
1028176,tt9689618,1,nm10439726,actor,,
1028177,tt9689618,2,nm10439727,actor,,
1028178,tt9689618,3,nm10439724,actor,,


In [78]:
actress_df = movie_title_principals_df[(movie_title_principals_df["category"] == "actress")]
actress_df

Unnamed: 0,tconst,ordering,nconst,category,job,characters
4,tt0323808,1,nm3579312,actress,,"[""Beth Boothby""]"
7,tt0323808,4,nm0502652,actress,,"[""Lady Delia Morrison""]"
15,tt0417610,2,nm0330974,actress,,"[""Diana""]"
17,tt0417610,4,nm0679167,actress,,"[""Adriana María""]"
24,tt0469152,1,nm0036109,actress,,"[""Eleanor Jordan""]"
...,...,...,...,...,...,...
1028142,tt9672244,1,nm0260884,actress,,"[""Marie""]"
1028145,tt9672244,4,nm0508708,actress,,
1028161,tt9679036,1,nm9742452,actress,,"[""Yuna Takahashi""]"
1028163,tt9679036,3,nm7751067,actress,,"[""Penelope (Penny) Fitzherbert""]"


In [81]:
highest_gross_ps_actor_df = pd.merge(highest_gross_ps_df, actors_df, how='left', on='tconst')
highest_gross_ps_actor_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,runtime_minutes,genres,ordering,nconst,category,job,characters
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",1.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]"
1,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",2.0,nm0262635,actor,,"[""Steve Rogers"",""Captain America""]"
2,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",3.0,nm0749263,actor,,"[""Bruce Banner"",""Hulk""]"
3,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",4.0,nm1165110,actor,,"[""Thor""]"
4,Black Panther,BV,700100000.0,646900000.0,2018,1.347000e+09,tt1825683,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi",1.0,nm1569276,actor,,"[""T'Challa"",""Black Panther""]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4954,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8821182,Aurora,Aurora,2018,110.0,"Horror,Thriller",2.0,nm0667074,actor,,"[""Eddie""]"
4955,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8821182,Aurora,Aurora,2018,110.0,"Horror,Thriller",4.0,nm4912288,actor,,"[""Ricky""]"
4956,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt2208216,Vanishing Waves,Aurora,2012,124.0,"Romance,Sci-Fi,Thriller",1.0,nm1815886,actor,,"[""Lukas""]"
4957,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt2208216,Vanishing Waves,Aurora,2012,124.0,"Romance,Sci-Fi,Thriller",3.0,nm5104879,actor,,"[""Jonas""]"


In [82]:
top_five_df = highest_gross_ps_df.head(5)

In [83]:
top_five_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,runtime_minutes,genres
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1405400000.0,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi"
1,Black Panther,BV,700100000.0,646900000.0,2018,1347000000.0,tt1825683,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi"
2,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1332600000.0,tt2527336,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,152.0,"Action,Adventure,Fantasy"
3,Jurassic World: Fallen Kingdom,Uni.,417700000.0,891800000.0,2018,1309500000.0,tt4881806,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,128.0,"Action,Adventure,Sci-Fi"
4,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt1323045,Frozen,Frozen,2010,93.0,"Adventure,Drama,Sport"


In [85]:
rainmakers_df = pd.merge(top_five_df, highest_gross_ps_actor_df, on='tconst')
rainmakers_df

Unnamed: 0,title_x,studio_x,domestic_gross_x,foreign_gross_x,year_x,total_gross_x,tconst,primary_title_x,original_title_x,start_year_x,...,primary_title_y,original_title_y,start_year_y,runtime_minutes_y,genres_y,ordering,nconst,category,job,characters
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1405400000.0,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",1.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]"
1,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1405400000.0,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",2.0,nm0262635,actor,,"[""Steve Rogers"",""Captain America""]"
2,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1405400000.0,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",3.0,nm0749263,actor,,"[""Bruce Banner"",""Hulk""]"
3,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1405400000.0,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",4.0,nm1165110,actor,,"[""Thor""]"
4,Black Panther,BV,700100000.0,646900000.0,2018,1347000000.0,tt1825683,Black Panther,Black Panther,2018,...,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi",1.0,nm1569276,actor,,"[""T'Challa"",""Black Panther""]"
5,Black Panther,BV,700100000.0,646900000.0,2018,1347000000.0,tt1825683,Black Panther,Black Panther,2018,...,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi",2.0,nm0430107,actor,,"[""Erik Killmonger""]"
6,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1332600000.0,tt2527336,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,...,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,152.0,"Action,Adventure,Fantasy",2.0,nm3915784,actor,,"[""Finn""]"
7,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1332600000.0,tt2527336,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,...,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,152.0,"Action,Adventure,Fantasy",3.0,nm0000434,actor,,"[""Luke Skywalker"",""Dobbu Scay""]"
8,Jurassic World: Fallen Kingdom,Uni.,417700000.0,891800000.0,2018,1309500000.0,tt4881806,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,...,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,128.0,"Action,Adventure,Sci-Fi",1.0,nm0695435,actor,,"[""Owen Grady""]"
9,Jurassic World: Fallen Kingdom,Uni.,417700000.0,891800000.0,2018,1309500000.0,tt4881806,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,...,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,128.0,"Action,Adventure,Sci-Fi",3.0,nm1245863,actor,,"[""Eli Mills""]"


In [86]:
rainmakers_df.groupby(['nconst']).agg(['count'])

Unnamed: 0_level_0,title_x,studio_x,domestic_gross_x,foreign_gross_x,year_x,total_gross_x,tconst,primary_title_x,original_title_x,start_year_x,...,total_gross_y,primary_title_y,original_title_y,start_year_y,runtime_minutes_y,genres_y,ordering,category,job,characters
Unnamed: 0_level_1,count,count,count,count,count,count,count,count,count,count,...,count,count,count,count,count,count,count,count,count,count
nconst,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
nm0000375,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0000434,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0039162,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0262635,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0430107,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0695435,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0749263,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0954225,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm1165110,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm1245863,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1


In [87]:
top_fifty_df = highest_gross_ps_df.head(50)

In [88]:
top_fifty_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,runtime_minutes,genres
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1405400000.0,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi"
1,Black Panther,BV,700100000.0,646900000.0,2018,1347000000.0,tt1825683,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi"
2,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1332600000.0,tt2527336,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,152.0,"Action,Adventure,Fantasy"
3,Jurassic World: Fallen Kingdom,Uni.,417700000.0,891800000.0,2018,1309500000.0,tt4881806,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,128.0,"Action,Adventure,Sci-Fi"
4,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt1323045,Frozen,Frozen,2010,93.0,"Adventure,Drama,Sport"
5,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt2294629,Frozen,Frozen,2013,102.0,"Adventure,Animation,Comedy"
6,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt1611845,Frozen,Wai nei chung ching,2010,92.0,"Fantasy,Romance"
7,Incredibles 2,BV,608600000.0,634200000.0,2018,1242800000.0,tt3606756,Incredibles 2,Incredibles 2,2018,118.0,"Action,Adventure,Animation"
8,Iron Man 3,BV,409000000.0,805800000.0,2013,1214800000.0,tt1300854,Iron Man 3,Iron Man Three,2013,130.0,"Action,Adventure,Sci-Fi"
9,Minions,Uni.,336000000.0,823400000.0,2015,1159400000.0,tt2293640,Minions,Minions,2015,91.0,"Adventure,Animation,Comedy"


In [89]:
rainmakers_df = pd.merge(top_fifty_df, highest_gross_ps_actor_df, on='tconst')
rainmakers_df

Unnamed: 0,title_x,studio_x,domestic_gross_x,foreign_gross_x,year_x,total_gross_x,tconst,primary_title_x,original_title_x,start_year_x,...,primary_title_y,original_title_y,start_year_y,runtime_minutes_y,genres_y,ordering,nconst,category,job,characters
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",1.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]"
1,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",2.0,nm0262635,actor,,"[""Steve Rogers"",""Captain America""]"
2,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",3.0,nm0749263,actor,,"[""Bruce Banner"",""Hulk""]"
3,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi",4.0,nm1165110,actor,,"[""Thor""]"
4,Black Panther,BV,700100000.0,646900000.0,2018,1.347000e+09,tt1825683,Black Panther,Black Panther,2018,...,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi",1.0,nm1569276,actor,,"[""T'Challa"",""Black Panther""]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124,Wonder Woman,WB,412600000.0,409300000.0,2017,8.219000e+08,tt0451279,Wonder Woman,Wonder Woman,2017,...,Wonder Woman,Wonder Woman,2017,141.0,"Action,Adventure,Fantasy",2.0,nm1517976,actor,,"[""Steve Trevor""]"
125,Wonder Woman,WB,412600000.0,409300000.0,2017,8.219000e+08,tt4028068,Wonder Woman,Wonder Woman,2014,...,Wonder Woman,Wonder Woman,2014,60.0,Sci-Fi,2.0,nm3715454,actor,,"[""Big Boss 1""]"
126,Wonder Woman,WB,412600000.0,409300000.0,2017,8.219000e+08,tt4028068,Wonder Woman,Wonder Woman,2014,...,Wonder Woman,Wonder Woman,2014,60.0,Sci-Fi,4.0,nm6556216,actor,,"[""Goon 2""]"
127,Wonder Woman,WB,412600000.0,409300000.0,2017,8.219000e+08,tt4283448,Wonder Woman,Wonder Woman,2016,...,Wonder Woman,Wonder Woman,2016,75.0,"Documentary,Drama,Sport",1.0,nm4276366,actor,,"[""Irate Civil Servant""]"


In [97]:
rainmakers_group = rainmakers_df.groupby(['nconst']).count()

In [98]:
rainmakers_group.sort_values(["nconst"], ascending=False)

Unnamed: 0_level_0,title_x,studio_x,domestic_gross_x,foreign_gross_x,year_x,total_gross_x,tconst,primary_title_x,original_title_x,start_year_x,...,total_gross_y,primary_title_y,original_title_y,start_year_y,runtime_minutes_y,genres_y,ordering,category,job,characters
nconst,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
nm9382480,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm9133740,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm7604997,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,0,0,1,1,0,1
nm6819854,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm6745863,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,0,0,1,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
nm0000198,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0000158,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0000146,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
nm0000138,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,0,1
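Calling `count()` on the whole frame repeats nearly the same number in every column, and sorting by `nconst` orders the result by ID, not by how often a person appears. If the goal is to rank people by the number of distinct top-grossing films they appear in, a narrower aggregation may be easier to read. The sketch below assumes that goal; `toy_rainmakers_df` and its rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for rainmakers_df; rows are invented.
toy_rainmakers_df = pd.DataFrame({
    "nconst": ["nm1", "nm1", "nm2", "nm1"],
    "tconst": ["tt1", "tt2", "tt1", "tt2"],
})

# Count distinct films per person and list the most frequent first.
films_per_person = (
    toy_rainmakers_df
    .groupby("nconst")["tconst"]
    .nunique()
    .sort_values(ascending=False)
)
print(films_per_person)
```

Using `nunique` on `tconst` means a person listed twice for the same film is counted once.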


In [104]:
grp = rainmakers_df.groupby(['nconst']).count()

In [136]:
rainmakers_df.to_csv(r'C:\Users\Jonathan\Documents\Flatiron\dsc-data-science-env-config\project_1\rainmakers.csv', index=False, header=True)

In [106]:
top_fifty_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,runtime_minutes,genres
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1405400000.0,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi"
1,Black Panther,BV,700100000.0,646900000.0,2018,1347000000.0,tt1825683,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi"
2,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1332600000.0,tt2527336,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,152.0,"Action,Adventure,Fantasy"
3,Jurassic World: Fallen Kingdom,Uni.,417700000.0,891800000.0,2018,1309500000.0,tt4881806,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,128.0,"Action,Adventure,Sci-Fi"
4,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt1323045,Frozen,Frozen,2010,93.0,"Adventure,Drama,Sport"
5,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt2294629,Frozen,Frozen,2013,102.0,"Adventure,Animation,Comedy"
6,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt1611845,Frozen,Wai nei chung ching,2010,92.0,"Fantasy,Romance"
7,Incredibles 2,BV,608600000.0,634200000.0,2018,1242800000.0,tt3606756,Incredibles 2,Incredibles 2,2018,118.0,"Action,Adventure,Animation"
8,Iron Man 3,BV,409000000.0,805800000.0,2013,1214800000.0,tt1300854,Iron Man 3,Iron Man Three,2013,130.0,"Action,Adventure,Sci-Fi"
9,Minions,Uni.,336000000.0,823400000.0,2015,1159400000.0,tt2293640,Minions,Minions,2015,91.0,"Adventure,Animation,Comedy"


In [109]:
highest_gross_ps_actor_df.groupby('nconst').count()

Unnamed: 0_level_0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,runtime_minutes,genres,ordering,category,job,characters
nconst,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
nm0000092,2,2,2,2,2,2,2,2,2,2,2,2,2,2,0,2
nm0000093,9,9,9,9,9,9,9,9,9,9,9,9,9,9,0,9
nm0000095,2,2,2,2,2,2,2,2,2,2,2,2,2,2,0,2
nm0000100,2,2,2,2,2,2,2,2,2,2,2,2,2,2,0,2
nm0000101,2,2,2,2,2,2,2,2,2,2,2,2,2,2,0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
nm9923969,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1
nm9958502,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0
nm9958934,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0
nm9962502,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1
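
In the output above, `count()` repeats the same number in nearly every column and reports 0 for `job`, because it counts non-null values per column. If the goal is simply the number of rows per person, `size()` gives one value per `nconst`. A small sketch on toy data: the `nconst` values come from the output above, while the `tconst` values and the name `appearances` are placeholders.

```python
import pandas as pd

# Toy principals table; tconst values are placeholders for illustration only.
sample_df = pd.DataFrame({
    'nconst': ['nm0000093', 'nm0000093', 'nm0000092', 'nm0000093'],
    'tconst': ['tt_a', 'tt_b', 'tt_a', 'tt_c'],
    'job': [None, None, None, None],
})

# size() counts rows per group, including rows with missing values,
# and returns a single Series instead of one count per column.
appearances = (
    sample_df.groupby('nconst')
    .size()
    .sort_values(ascending=False)
    .rename('appearances')
)

print(appearances)
```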


In [125]:
# Attach actor names to each credit; both frames share the key column 'nconst'
highest_gross_name_df = pd.merge(highest_gross_ps_actor_df, movie_actors_df, on='nconst')
highest_gross_name_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,...,ordering,nconst,category,job,characters,primary_name,birth_year,death_year,primary_profession,known_for_titles
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,1.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
1,Iron Man 3,BV,409000000.0,805800000.0,2013,1.214800e+09,tt1300854,Iron Man 3,Iron Man Three,2013,...,1.0,nm0000375,actor,,"[""Tony Stark""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
2,Captain America: Civil War,BV,408100000.0,745200000.0,2016,1.153300e+09,tt3498820,Captain America: Civil War,Captain America: Civil War,2016,...,2.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
3,Spider-Man: Homecoming,Sony,334200000.0,546000000.0,2017,8.802000e+08,tt2250912,Spider-Man: Homecoming,Spider-Man: Homecoming,2017,...,3.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
4,Avengers: Infinity War,BV,678800000.0,1369.5,2018,6.788014e+08,tt4154756,Avengers: Infinity War,Avengers: Infinity War,2018,...,1.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4758,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8553606,Aurora,Aurora,2019,...,4.0,nm0084969,actor,,"[""Reijo""]",Hannu-Pekka Björkman,1969.0,,"actor,director,editor","tt0442973,tt0885415,tt4173170,tt1188992"
4759,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8821182,Aurora,Aurora,2018,...,2.0,nm0667074,actor,,"[""Eddie""]",Allan Paule,,,actor,"tt0099717,tt2590162,tt4838342,tt1160023"
4760,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt8821182,Aurora,Aurora,2018,...,4.0,nm4912288,actor,,"[""Ricky""]",Marco Gumabao,,,actor,"tt9759754,tt9653184,tt3837166,tt3772640"
4761,Aurora,CGld,5700.0,5100.0,2011,1.080000e+04,tt2208216,Vanishing Waves,Aurora,2012,...,1.0,nm1815886,actor,,"[""Lukas""]",Marius Jampolskis,1978.0,,actor,"tt2208216,tt4841390,tt4797448,tt1245721"
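
Row 4 above shows `Avengers: Infinity War` with a `foreign_gross` of 1369.5 and a `total_gross` barely above its domestic figure, which looks like a value recorded in millions of dollars rather than dollars. A hedged sketch of one possible repair, assuming the raw `foreign_gross` strings for such rows contain a thousands comma (for example `'1,369.5'`) while ordinary rows are plain digits. That assumption is not confirmed by the output shown here and should be checked against the raw CSV; `raw_df` is a hypothetical stand-in.

```python
import pandas as pd

# Toy stand-in for the raw data, where foreign_gross is an object column.
# The comma-formatted string is an assumption about the raw file.
raw_df = pd.DataFrame({
    'title': ['Avengers: Infinity War', 'Aurora'],
    'domestic_gross': [678800000.0, 5700.0],
    'foreign_gross': ['1,369.5', '5100'],
})

# Flag rows assumed to be expressed in millions (they contain a comma).
in_millions = raw_df['foreign_gross'].str.contains(',', regex=False)

# Strip commas and convert every value to float.
foreign = raw_df['foreign_gross'].str.replace(',', '', regex=False).astype(float)

# Scale only the flagged rows from millions of dollars to dollars.
raw_df['foreign_gross'] = foreign.where(~in_millions, foreign * 1_000_000)
raw_df['total_gross'] = raw_df['domestic_gross'] + raw_df['foreign_gross']

print(raw_df)
```

Once the column has been converted to numbers the comma is gone, so a fix along these lines would have to run before the conversion step.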


In [137]:
# Same name lookup, restricted to the rainmakers subset
highest_gross_name_1_df = pd.merge(rainmakers_df, movie_actors_df, on='nconst')
highest_gross_name_1_df

Unnamed: 0,title_x,studio_x,domestic_gross_x,foreign_gross_x,year_x,total_gross_x,tconst,primary_title_x,original_title_x,start_year_x,...,ordering,nconst,category,job,characters,primary_name,birth_year,death_year,primary_profession,known_for_titles
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,1.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
1,Iron Man 3,BV,409000000.0,805800000.0,2013,1.214800e+09,tt1300854,Iron Man 3,Iron Man Three,2013,...,1.0,nm0000375,actor,,"[""Tony Stark""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
2,Captain America: Civil War,BV,408100000.0,745200000.0,2016,1.153300e+09,tt3498820,Captain America: Civil War,Captain America: Civil War,2016,...,2.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
3,Spider-Man: Homecoming,Sony,334200000.0,546000000.0,2017,8.802000e+08,tt2250912,Spider-Man: Homecoming,Spider-Man: Homecoming,2017,...,3.0,nm0000375,actor,,"[""Tony Stark"",""Iron Man""]",Robert Downey Jr.,1965.0,,"actor,producer,soundtrack","tt0848228,tt1300854,tt0988045,tt0371746"
4,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1.405400e+09,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,...,2.0,nm0262635,actor,,"[""Steve Rogers"",""Captain America""]",Chris Evans,1981.0,,"actor,producer,director","tt1843866,tt0848228,tt0458339,tt3498820"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,Inception,WB,292600000.0,535700000.0,2010,8.283000e+08,tt1375666,Inception,Inception,2010,...,4.0,nm0913822,actor,,"[""Saito""]",Ken Watanabe,1959.0,,"actor,producer,director","tt0831387,tt1375666,tt2109248,tt0325710"
121,Wonder Woman,WB,412600000.0,409300000.0,2017,8.219000e+08,tt0451279,Wonder Woman,Wonder Woman,2017,...,2.0,nm1517976,actor,,"[""Steve Trevor""]",Chris Pine,1980.0,,"actor,producer,soundtrack","tt1408101,tt0451279,tt2660888,tt0796366"
122,Wonder Woman,WB,412600000.0,409300000.0,2017,8.219000e+08,tt4028068,Wonder Woman,Wonder Woman,2014,...,2.0,nm3715454,actor,,"[""Big Boss 1""]",Donald H. Steward,,,"actor,composer,producer","tt4322728,tt3056562,tt3534598,tt2195570"
123,Wonder Woman,WB,412600000.0,409300000.0,2017,8.219000e+08,tt4028068,Wonder Woman,Wonder Woman,2014,...,4.0,nm6556216,actor,,"[""Goon 2""]",Woody Wilson Hall,,,"actor,sound_department","tt5760674,tt5826552,tt4209044,tt4478168"
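
Both merges above attach names by `nconst`, which only keeps the row count stable if the name table has a single row per `nconst`. A small sketch, on toy frames built from values in the output above, of passing `validate='many_to_one'` so that pandas raises a `MergeError` if the right-hand keys are not unique. `credits_df`, `names_df`, and `merged_df` are hypothetical names.

```python
import pandas as pd

# Toy credits: several rows may share one nconst.
credits_df = pd.DataFrame({
    'tconst': ['tt2395427', 'tt1300854', 'tt2395427'],
    'nconst': ['nm0000375', 'nm0000375', 'nm0262635'],
})

# Toy name table: expected to hold one row per nconst.
names_df = pd.DataFrame({
    'nconst': ['nm0000375', 'nm0262635'],
    'primary_name': ['Robert Downey Jr.', 'Chris Evans'],
})

# validate='many_to_one' checks that nconst is unique in names_df
# and raises pandas.errors.MergeError otherwise.
merged_df = pd.merge(credits_df, names_df, on='nconst', validate='many_to_one')

print(merged_df)
```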


In [138]:
# Export the merged rainmakers table (the absolute path is specific to the author's machine)
highest_gross_name_1_df.to_csv(r'C:\Users\Jonathan\Documents\Flatiron\dsc-data-science-env-config\project_1\highest_gross_name_1_df.csv', index=False, header=True)

In [118]:
# highest_actor_df = highest_gross_name_df.groupby('primary_name').count()

In [121]:
# highest_actor_df.sort_values(["primary_name"], ascending=False)

In [123]:
top_fifty_df

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year,total_gross,tconst,primary_title,original_title,start_year,runtime_minutes,genres
0,Avengers: Age of Ultron,BV,459000000.0,946400000.0,2015,1405400000.0,tt2395427,Avengers: Age of Ultron,Avengers: Age of Ultron,2015,141.0,"Action,Adventure,Sci-Fi"
1,Black Panther,BV,700100000.0,646900000.0,2018,1347000000.0,tt1825683,Black Panther,Black Panther,2018,134.0,"Action,Adventure,Sci-Fi"
2,Star Wars: The Last Jedi,BV,620200000.0,712400000.0,2017,1332600000.0,tt2527336,Star Wars: The Last Jedi,Star Wars: Episode VIII - The Last Jedi,2017,152.0,"Action,Adventure,Fantasy"
3,Jurassic World: Fallen Kingdom,Uni.,417700000.0,891800000.0,2018,1309500000.0,tt4881806,Jurassic World: Fallen Kingdom,Jurassic World: Fallen Kingdom,2018,128.0,"Action,Adventure,Sci-Fi"
4,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt1323045,Frozen,Frozen,2010,93.0,"Adventure,Drama,Sport"
5,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt2294629,Frozen,Frozen,2013,102.0,"Adventure,Animation,Comedy"
6,Frozen,BV,400700000.0,875700000.0,2013,1276400000.0,tt1611845,Frozen,Wai nei chung ching,2010,92.0,"Fantasy,Romance"
7,Incredibles 2,BV,608600000.0,634200000.0,2018,1242800000.0,tt3606756,Incredibles 2,Incredibles 2,2018,118.0,"Action,Adventure,Animation"
8,Iron Man 3,BV,409000000.0,805800000.0,2013,1214800000.0,tt1300854,Iron Man 3,Iron Man Three,2013,130.0,"Action,Adventure,Sci-Fi"
9,Minions,Uni.,336000000.0,823400000.0,2015,1159400000.0,tt2293640,Minions,Minions,2015,91.0,"Adventure,Animation,Comedy"
