# Project 1: Explanatory Data Analysis & Data Presentation (Movies Dataset)

# Project Brief for Self-Coders

Here you'll have the opportunity to code major parts of Project 1 on your own. If you need help or inspiration, have a look at the videos or the Jupyter Notebook with the full code.

Keep in mind that it's all about __getting the right results/conclusions__, not about reproducing identical code. Things can be coded in many different ways, so even if you reach the same conclusions, it's very unlikely that your code will match ours exactly.

## Data Import and first Inspection

1. __Import__ the movies dataset from the CSV file "movies_complete.csv". __Inspect__ the data.

__Some additional information on Features/Columns__:

* **id:** The ID of the movie (clear/unique identifier).
* **title:** The Official Title of the movie.
* **tagline:** The tagline of the movie.
* **release_date:** Theatrical Release Date of the movie.
* **genres:** Genres associated with the movie.
* **belongs_to_collection:** Gives information on the movie series/franchise the particular film belongs to.
* **original_language:** The language in which the movie was originally shot.
* **budget_musd:** The budget of the movie in million dollars.
* **revenue_musd:** The total revenue of the movie in million dollars.
* **production_companies:** Production companies involved with the making of the movie.
* **production_countries:** Countries where the movie was shot/produced.
* **vote_count:** The number of votes by users, as counted by TMDB.
* **vote_average:** The average rating of the movie.
* **popularity:** The Popularity Score assigned by TMDB.
* **runtime:** The runtime of the movie in minutes.
* **overview:** A brief blurb of the movie.
* **spoken_languages:** Spoken languages in the film.
* **poster_path:** The URL of the poster image.
* **cast:** (Main) Actors appearing in the movie.
* **cast_size:** Number of actors appearing in the movie.
* **director:** Director of the movie.
* **crew_size:** Size of the film crew (incl. director, excl. actors).

In [98]:
import pandas as pd
import numpy as np

In [5]:
movies = pd.read_csv('movies_complete.csv', index_col = 'id')

In [11]:
movies.head(n =2)

id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,...,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
862,Toy Story,,1995-10-30,Animation|Comedy|Family,Toy Story Collection,en,30.0,373.554033,Pixar Animation Studios,United States of America,...,7.7,21.946943,81.0,"Led by Woody, Andy's toys live happily in his ...",English,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Tom Hanks|Tim Allen|Don Rickles|Jim Varney|Wal...,13,106,John Lasseter
8844,Jumanji,Roll the dice and unleash the excitement!,1995-12-15,Adventure|Fantasy|Family,,en,65.0,262.797249,TriStar Pictures|Teitler Film|Interscope Commu...,United States of America,...,6.9,17.015539,104.0,When siblings Judy and Peter discover an encha...,English|Français,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Robin Williams|Jonathan Hyde|Kirsten Dunst|Bra...,26,16,Joe Johnston


In [10]:
movies.shape

(44691, 21)
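
Beyond `head()` and `shape`, a first inspection usually also covers column types and missing values. The following is a minimal sketch on a tiny made-up CSV (the three rows, the `sample` frame and the `missing_share` name are illustrative assumptions, not part of the real dataset); it also shows the optional `parse_dates` argument of `pd.read_csv`, which the notebook does not use.

```python
import io

import pandas as pd

# Made-up stand-in for "movies_complete.csv" (illustrative rows only).
csv_text = """id,title,release_date,budget_musd,revenue_musd
1,Movie A,1995-10-30,30.0,373.5
2,Movie B,1995-12-15,65.0,
3,Movie C,,,
"""

# index_col mirrors the notebook; parse_dates is an optional extra.
sample = pd.read_csv(io.StringIO(csv_text), index_col="id", parse_dates=["release_date"])

# Column dtypes and non-null counts at a glance.
sample.info()

# Share of missing values per column (hypothetical helper name).
missing_share = sample.isna().mean()
print(missing_share)
```

With the real file, the same two calls would show how sparse columns such as `budget_musd` and `revenue_musd` are before any ranking is attempted.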

## The best and the worst movies...

2. __Filter__ the Dataset and __find the best/worst n Movies__ with the

- Highest Revenue
- Highest Budget
- Highest Profit (=Revenue - Budget)
- Lowest Profit (=Revenue - Budget)
- Highest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10) 
- Lowest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10)
- Highest number of Votes
- Highest Rating (only movies with 10 or more Ratings)
- Lowest Rating (only movies with 10 or more Ratings)
- Highest Popularity

__Define__ an appropriate __user-defined function__ to reuse code.

In [24]:
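# Reusable helpers: the 5 rows with the largest / smallest value in `column_name`.
# Note: no minimum-budget or minimum-vote filter is applied here.
def get_top5(df, column_name):
    return df.nlargest(n=5, columns=column_name)[['title', column_name]]

def get_bot5(df, column_name):
    return df.nsmallest(n=5, columns=column_name)[['title', column_name]]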

In [25]:
get_bot5(movies, 'revenue_musd')

id,title,revenue_musd
51352,Anne Frank Remembered,1e-06
48787,Mute Witness,1e-06
45019,Washington Square,1e-06
274253,Belizaire the Cajun,1e-06
25471,The King of Masks,1e-06
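
The brief's side conditions (Budget >= 10 for ROI, 10 or more ratings for the rating lists) could be folded into one reusable helper. The following is only a sketch on made-up data: `best_worst`, `min_budget` and `min_votes` are hypothetical names that do not appear in the notebook, and treating missing budgets or vote counts as 0 is an assumption.

```python
import pandas as pd


def best_worst(df, by, n=5, ascending=False, min_budget=0, min_votes=0):
    """Return the n best (or worst) movies ranked by column `by`.

    min_budget and min_votes are hypothetical filter parameters;
    missing budgets / vote counts are treated as 0 (an assumption).
    """
    mask = (df["budget_musd"].fillna(0) >= min_budget) & (
        df["vote_count"].fillna(0) >= min_votes
    )
    ranked = (
        df.loc[mask, ["title", by]]
        .dropna(subset=[by])  # rows without a value cannot be ranked
        .sort_values(by=by, ascending=ascending)
    )
    return ranked.head(n)


# Made-up rows: movie "A" has a near-zero recorded budget.
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "budget_musd": [0.000001, 20.0, 50.0, 10.0],
        "revenue_musd": [12.0, 100.0, 25.0, None],
        "vote_count": [3, 500, 40, 12],
    }
)
toy["roi"] = toy["revenue_musd"] / toy["budget_musd"]

top_roi = best_worst(toy, "roi", n=2, min_budget=10)
print(top_roi)
```

Without `min_budget`, movie "A" would lead the ROI ranking purely because of its tiny recorded budget, which is the kind of distortion the Budget >= 10 condition is meant to prevent.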


__Movies Top 5 - Highest Revenue__

In [26]:
movies.nlargest(n = 5, columns = 'revenue_musd')[['title','revenue_musd']]

id,title,revenue_musd
19995,Avatar,2787.965087
140607,Star Wars: The Force Awakens,2068.223624
597,Titanic,1845.034188
24428,The Avengers,1519.55791
135397,Jurassic World,1513.52881


__Movies Top 5 - Highest Budget__

In [27]:
get_top5(movies, 'budget_musd')

id,title,budget_musd
1865,Pirates of the Caribbean: On Stranger Tides,380.0
285,Pirates of the Caribbean: At World's End,300.0
99861,Avengers: Age of Ultron,280.0
1452,Superman Returns,270.0
38757,Tangled,260.0


__Movies Top 5 - Highest Profit__

In [30]:
movies['profit'] = movies.revenue_musd - movies.budget_musd
get_top5(movies,'profit')

id,title,profit
19995,Avatar,2550.965087
140607,Star Wars: The Force Awakens,1823.223624
597,Titanic,1645.034188
135397,Jurassic World,1363.52881
168259,Furious 7,1316.24936


__Movies Top 5 - Lowest Profit__

In [31]:
get_bot5(movies,'profit')

id,title,profit
57201,The Lone Ranger,-165.71009
10733,The Alamo,-119.180039
50321,Mars Needs Moms,-111.007242
339964,Valerian and the City of a Thousand Planets,-107.447384
1911,The 13th Warrior,-98.301101


__Movies Top 5 - Highest ROI__

In [33]:
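# ROI is computed in percent here (revenue / budget * 100); the brief defines it as revenue / budget.
# The Budget >= 10 condition is not applied, so near-zero budgets likely explain the extreme values.
movies['roi'] = movies.revenue_musd*100/movies.budget_musd
get_top5(movies,'roi')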

id,title,roi
13703,Less Than Zero,1239638000.0
3082,Modern Times,850000000.0
14968,Welcome to Dongmakgol,419747700.0
114903,Aquí Entre Nos,275558400.0
8856,"The Karate Kid, Part II",101861900.0


__Movies Top 5 - Lowest ROI__

In [34]:
get_bot5(movies,'roi')

id,title,roi
14844,Chasing Liberty,5.2e-05
18475,The Cookout,7.5e-05
48781,Never Talk to Strangers,9.4e-05
38140,To Rob a Thief,0.00015
33927,Deadfall,0.00018


__Movies Top 5 - Most Votes__

In [36]:
get_top5(movies, 'vote_count')

id,title,vote_count
27205,Inception,14075.0
155,The Dark Knight,12269.0
19995,Avatar,12114.0
24428,The Avengers,12000.0
293660,Deadpool,11444.0


__Movies Top 5 - Highest Rating__

In [37]:
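# No minimum vote_count filter is applied, so movies with very few votes can dominate the rating lists.
get_top5(movies, 'vote_average')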

id,title,vote_average
58372,Reckless,10.0
278939,Girl in the Cadillac,10.0
73183,"The Haunted World of Edward D. Wood, Jr.",10.0
255546,Carmen Miranda: Bananas Is My Business,10.0
64562,Other Voices Other Rooms,10.0


__Movies Top 5 - Lowest Rating__

In [38]:
get_bot5(movies, 'vote_average')

id,title,vote_average
303693,Inside,0.0
172545,Alive and Kicking,0.0
111744,Joe and Max,0.0
40873,Pete Seeger: The Power of Song,0.0
23253,Mr. Robinson Crusoe,0.0


__Movies Top 5 - Most Popular__

In [39]:
get_top5(movies, 'popularity')

id,title,popularity
211672,Minions,547.488298
297762,Wonder Woman,294.337037
321612,Beauty and the Beast,287.253654
339403,Baby Driver,228.032744
177572,Big Hero 6,213.849907


## Find your next Movie

3. __Filter__ the Dataset for movies that meet the following conditions:

In [40]:
movies.head(n = 1)

id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,...,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director,profit,roi
862,Toy Story,,1995-10-30,Animation|Comedy|Family,Toy Story Collection,en,30.0,373.554033,Pixar Animation Studios,United States of America,...,81.0,"Led by Woody, Andy's toys live happily in his ...",English,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Tom Hanks|Tim Allen|Don Rickles|Jim Varney|Wal...,13,106,John Lasseter,343.554033,1245.18011


__Search 1: Science Fiction Action Movie with Bruce Willis (sorted from high to low Rating)__

In [63]:
cond1 = movies.cast.str.contains('Bruce Willis', na = False)
cond2 = movies.genres.str.contains('Science Fiction', na = False)
cond3 = movies.genres.str.contains('Action', na = False)
movies[cond1 & cond2 & cond3].sort_values(by = 'vote_average', ascending = False)[['title','vote_average']]

id,title,vote_average
18,The Fifth Element,7.3
59967,Looper,6.6
95,Armageddon,6.5
19959,Surrogates,5.9
72559,G.I. Joe: Retaliation,5.4
307663,Vice,4.1


__Search 2: Movies with Uma Thurman and directed by Quentin Tarantino (sorted from short to long runtime)__

In [65]:
cond1 = movies.cast.str.contains('Uma Thurman', na = False)
cond2 = movies.director.str.contains('Quentin Tarantino', na = False)
movies[cond1 & cond2].sort_values(by ='runtime', ascending = True)[['title','runtime']]

id,title,runtime
24,Kill Bill: Vol. 1,111.0
393,Kill Bill: Vol. 2,136.0
680,Pulp Fiction,154.0


__Search 3: Most Successful Pixar Studio Movies between 2010 and 2015 (sorted from high to low Revenue)__

In [76]:
cond1 = movies.production_companies.str.contains('Pixar Animation Studios', na =False)
cond2 = movies.release_date >= '2010-01-01' 
cond3 = movies.release_date <= '2015-12-31'
movies[cond1 & (cond2 & cond3)].sort_values(by = 'revenue_musd', ascending = False)[['title','revenue_musd','release_date']]

id,title,revenue_musd,release_date
10193,Toy Story 3,1066.969703,2010-06-16
150540,Inside Out,857.611174,2015-06-09
62211,Monsters University,743.559607,2013-06-20
49013,Cars 2,559.852396,2011-06-11
62177,Brave,538.983207,2012-06-21
105864,The Good Dinosaur,331.926147,2015-11-14
40619,Day & Night,,2010-06-17
200481,The Blue Umbrella,,2013-02-12
213121,Toy Story of Terror!,,2013-10-15
83564,La luna,,2011-01-01
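
The search above compares `release_date` as text, which works for ISO-formatted dates, and its result still lists titles without revenue data. As a hedged alternative, the sketch below converts the dates explicitly, uses `Series.between` for the range and drops rows with missing revenue. The four rows are made up for illustration, and dropping missing revenues is an assumption about what "most successful" should mean.

```python
import pandas as pd

# Made-up rows; titles and figures are illustrative only.
toy = pd.DataFrame(
    {
        "title": ["Short film", "Feature 2012", "Feature 2016", "Feature 2010"],
        "production_companies": [
            "Pixar Animation Studios",
            "Pixar Animation Studios",
            "Pixar Animation Studios",
            "Pixar Animation Studios|Walt Disney Pictures",
        ],
        "release_date": ["2011-01-01", "2012-06-21", "2016-06-16", "2010-06-16"],
        "revenue_musd": [None, 500.0, 1000.0, 900.0],
    }
)

# Convert to real datetimes; unparseable values become NaT.
dates = pd.to_datetime(toy["release_date"], errors="coerce")

is_pixar = toy["production_companies"].str.contains(
    "Pixar Animation Studios", na=False, regex=False
)
in_range = dates.between(pd.Timestamp("2010-01-01"), pd.Timestamp("2015-12-31"))

result = (
    toy[is_pixar & in_range]
    .dropna(subset=["revenue_musd"])  # assumption: unknown revenue is not "successful"
    .sort_values(by="revenue_musd", ascending=False)[
        ["title", "revenue_musd", "release_date"]
    ]
)
print(result)
```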


__Search 4: Action or Thriller Movie with original language English and minimum Rating of 7.5 (most recent movies first)__

In [83]:
cond1 = movies.genres.str.contains('Action', na =False) | movies.genres.str.contains('Thriller', na = False)
cond2 = movies.original_language == 'en'
cond3 = movies.vote_average >= 7.5
movies[cond1 & cond2 & cond3].sort_values(by = 'release_date', ascending = False)[['title','release_date','vote_average']]

id,title,release_date,vote_average
417320,Descendants 2,2017-07-21,7.5
374720,Dunkirk,2017-07-19,7.5
382614,The Book of Henry,2017-06-16,7.6
283995,Guardians of the Galaxy Vol. 2,2017-04-19,7.6
416445,Revengeance,2017-04-05,8.0
...,...,...,...
44892,The Music Box,1932-04-16,7.5
877,Scarface,1932-04-09,7.5
25768,"Steamboat Bill, Jr.",1928-02-14,7.9
961,The General,1926-12-31,8.0


## Are Franchises more successful?

4. __Analyze__ the Dataset and __find out whether Franchises (Movies that belong to a collection) are more successful than stand-alone movies__ in terms of:

- mean revenue
- median Return on Investment
- mean budget raised
- mean popularity
- mean rating

Hint: use `groupby()`.

In [84]:
movies.columns

Index(['title', 'tagline', 'release_date', 'genres', 'belongs_to_collection',
       'original_language', 'budget_musd', 'revenue_musd',
       'production_companies', 'production_countries', 'vote_count',
       'vote_average', 'popularity', 'runtime', 'overview', 'spoken_languages',
       'poster_path', 'cast', 'cast_size', 'crew_size', 'director', 'profit',
       'roi'],
      dtype='object')

__Franchise vs. Stand-alone: Average Revenue__

In [99]:
movies['stand-alone'] = np.where(movies.belongs_to_collection.isnull(), 'stand-alone', 'franchise')
movies.groupby(by='stand-alone').revenue_musd.mean()

stand-alone
franchise      165.708193
stand-alone     44.742814
Name: revenue_musd, dtype: float64

__Franchise vs. Stand-alone: Return on Investment / Profitability (median)__

In [102]:
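# Note: this computes the median profit; the median ROI asked for in the brief would use the `roi` column instead.
movies.groupby(by = 'stand-alone').profit.median()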

stand-alone
franchise      64.234017
stand-alone     5.000000
Name: profit, dtype: float64
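
Since the brief asks for the median Return on Investment rather than the median profit, the comparison could also be sketched as follows. The data is made up, ROI is taken as the plain ratio revenue / budget as defined in the brief (the notebook's `roi` column is scaled by 100), and the column name `franchise` is a hypothetical choice.

```python
import numpy as np
import pandas as pd

# Made-up rows: two franchise movies, three stand-alone movies.
toy = pd.DataFrame(
    {
        "belongs_to_collection": ["Saga X", "Saga X", None, None, None],
        "budget_musd": [100.0, 50.0, 10.0, 20.0, None],
        "revenue_musd": [300.0, 200.0, 5.0, 60.0, 1.0],
    }
)

# ROI as defined in the brief: revenue / budget (NaN if either is missing).
toy["roi"] = toy["revenue_musd"] / toy["budget_musd"]

# Flag each movie by whether it belongs to a collection.
toy["franchise"] = np.where(
    toy["belongs_to_collection"].notna(), "franchise", "stand-alone"
)

# median() skips NaN values by default.
median_roi = toy.groupby("franchise")["roi"].median()
print(median_roi)
```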

__Franchise vs. Stand-alone: Average Budget__

In [103]:
movies.groupby(by = 'stand-alone').budget_musd.mean()

stand-alone
franchise      38.319847
stand-alone    18.047741
Name: budget_musd, dtype: float64

__Franchise vs. Stand-alone: Average Popularity__

In [104]:
movies.groupby(by = 'stand-alone').popularity.mean()

stand-alone
franchise      6.245051
stand-alone    2.592726
Name: popularity, dtype: float64

__Franchise vs. Stand-alone: Average Rating__

In [105]:
movies.groupby(by = 'stand-alone').vote_average.mean()

stand-alone
franchise      5.956806
stand-alone    6.008787
Name: vote_average, dtype: float64

## Most Successful Franchises

5. __Find__ the __most successful Franchises__ in terms of

- __total number of movies__
- __total & mean budget__
- __total & mean revenue__
- __mean rating__

In [111]:
# total number of movies
movies_franchise = movies[movies['stand-alone'] == 'franchise']
movies_franchise.belongs_to_collection.value_counts().head()

The Bowery Boys                  29
Totò Collection                  27
James Bond Collection            26
Zatôichi: The Blind Swordsman    26
The Carry On Collection          25
Name: belongs_to_collection, dtype: int64

In [117]:
# mean budget per franchise (the total would use .sum() instead of .mean())
movies_franchise.groupby(by = 'belongs_to_collection').budget_musd.mean().sort_values(ascending = False).head()

belongs_to_collection
Tangled Collection                     260.0
Pirates of the Caribbean Collection    250.0
The Avengers Collection                250.0
The Hobbit Collection                  250.0
Man of Steel Collection                237.5
Name: budget_musd, dtype: float64

In [116]:
# total and mean revenue; a notebook cell only displays its last expression,
# so the output shown is the mean revenue per franchise
movies_franchise.groupby(by = 'belongs_to_collection').revenue_musd.sum().sort_values(ascending = False).head()
movies_franchise.groupby(by = 'belongs_to_collection').revenue_musd.mean().sort_values(ascending = False).head()

belongs_to_collection
Avatar Collection          2787.965087
The Avengers Collection    1462.480802
Frozen Collection          1274.219009
Finding Nemo Collection     984.453213
The Hobbit Collection       978.507785
Name: revenue_musd, dtype: float64
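
The count, totals and means per franchise can also be collected in a single table with named aggregation, so that no intermediate result is lost. This is a sketch on made-up data; the output column names (`n_movies`, `total_budget`, and so on) are hypothetical.

```python
import pandas as pd

# Made-up rows; "Solo" has no collection and is dropped by groupby.
toy = pd.DataFrame(
    {
        "belongs_to_collection": ["Saga X", "Saga X", "Saga Y", None],
        "title": ["X1", "X2", "Y1", "Solo"],
        "budget_musd": [100.0, 150.0, 30.0, 10.0],
        "revenue_musd": [400.0, 600.0, None, 20.0],
    }
)

summary = toy.groupby("belongs_to_collection").agg(
    n_movies=("title", "count"),
    total_budget=("budget_musd", "sum"),
    mean_budget=("budget_musd", "mean"),
    total_revenue=("revenue_musd", "sum"),
    mean_revenue=("revenue_musd", "mean"),
)

# Rank by any column, e.g. total revenue.
print(summary.sort_values(by="total_revenue", ascending=False))
```

Note that `sum` returns 0.0 for a group whose values are all missing, while `mean` returns NaN, so a total of zero does not necessarily mean the movies earned nothing.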

In [122]:
# mean rating
movies_franchise.groupby(by='belongs_to_collection').vote_average.mean().sort_values(ascending = False).head(n =10)

belongs_to_collection
Argo Collection                        9.300000
Kenji Misumi's Trilogy of the Sword    9.000000
Dreileben                              9.000000
Bloodfight                             9.000000
Алиса в стране чудес (Коллекция)       8.700000
We Were Here                           8.650000
Kizumonogatari                         8.633333
Spirits' Homecoming Collection         8.500000
OSS 117 The Original Saga              8.500000
Glass Tiger collection                 8.500000
Name: vote_average, dtype: float64

## Most Successful Directors

6. __Find__ the __most successful Directors__ in terms of

- __total number of movies__
- __total revenue__
- __mean rating__

In [124]:
director = movies.groupby('director').agg({'title':'count',"revenue_musd":'sum','vote_average':'mean'})

In [125]:
director

director,title,revenue_musd,vote_average
Dale Trevillion\t,2,0.000000,4.0
Davide Manuli,1,0.000000,6.9
E.W. Swackhamer,1,0.000000,5.9
Vitaliy Vorobyov,1,0.000000,5.5
Yeon Sang-Ho,4,2.129768,6.6
...,...,...,...
Ярополк Лапшин,1,0.000000,10.0
پیمان معادی,1,0.000000,6.0
塩谷 直義,1,0.000000,7.2
杰森·莫玛,1,0.000000,5.8


In [126]:
# total number of movies
director.nlargest(n=10, columns = 'title')

director,title,revenue_musd,vote_average
John Ford,66,85.170757,6.381818
Michael Curtiz,65,37.8175,5.998246
Werner Herzog,54,24.57258,6.805556
Alfred Hitchcock,53,250.107584,6.639623
Georges Méliès,49,0.0,5.934694
Woody Allen,49,993.970588,6.691837
Jean-Luc Godard,46,0.867433,6.804348
Sidney Lumet,46,294.522734,6.576744
Charlie Chaplin,44,26.519181,6.540909
Raoul Walsh,43,1.21388,6.004762


In [127]:
# total revenue
director.nlargest(n=10, columns = 'revenue_musd')

director,title,revenue_musd,vote_average
Steven Spielberg,33,9256.621422,6.893939
Peter Jackson,13,6528.244659,7.138462
Michael Bay,13,6437.466781,6.392308
James Cameron,11,5900.61031,6.927273
David Yates,9,5334.563196,6.7
Christopher Nolan,11,4747.408665,7.618182
Robert Zemeckis,19,4138.233542,6.794737
Tim Burton,21,4032.916124,6.733333
Ridley Scott,24,3917.52924,6.604167
Chris Columbus,15,3866.836869,6.44


In [128]:
# mean rating (no minimum number of movies is required here)
director.nlargest(n=10, columns = 'vote_average')

director,title,revenue_musd,vote_average
A.W. Vidmer,1,0.0,10.0
Amy Schatz,1,0.0,10.0
Ana Poliak,1,0.0,10.0
Andrew Bowser,1,0.0,10.0
Andrew Napier,1,0.0,10.0
Antonis Sotiropoulos,1,0.0,10.0
Barry Bruce,1,0.0,10.0
Brandon Chesbro,1,0.0,10.0
Brett M. Butler,1,0.0,10.0
Brian Skeet,1,0.0,10.0
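
The rating list above consists entirely of directors with a single movie. One possible refinement, sketched here on made-up data, is to require a minimum number of movies before ranking by mean rating and to strip stray whitespace from the names first. The threshold `min_movies` and the names `stats` and `ranked` are hypothetical choices, not part of the notebook.

```python
import pandas as pd

# Made-up rows; "Dir C\t" mimics a name with a trailing tab.
toy = pd.DataFrame(
    {
        "director": ["Dir A", "Dir A", "Dir A", "Dir B", "Dir C\t", "Dir C"],
        "title": ["A1", "A2", "A3", "B1", "C1", "C2"],
        "vote_average": [7.0, 8.0, 9.0, 10.0, 6.0, 8.0],
    }
)

# Normalise names so "Dir C\t" and "Dir C" fall into the same group.
toy["director"] = toy["director"].str.strip()

stats = toy.groupby("director").agg(
    n_movies=("title", "count"),
    mean_rating=("vote_average", "mean"),
)

min_movies = 2  # hypothetical threshold
ranked = stats[stats["n_movies"] >= min_movies].nlargest(10, "mean_rating")
print(ranked)
```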


## Most Successful Actors

In [130]:
movies.cast

id
862       Tom Hanks|Tim Allen|Don Rickles|Jim Varney|Wal...
8844      Robin Williams|Jonathan Hyde|Kirsten Dunst|Bra...
15602     Walter Matthau|Jack Lemmon|Ann-Margret|Sophia ...
31357     Whitney Houston|Angela Bassett|Loretta Devine|...
11862     Steve Martin|Diane Keaton|Martin Short|Kimberl...
                                ...                        
439050              Leila Hatami|Kourosh Tahami|Elham Korda
111109    Angel Aquino|Perry Dizon|Hazel Orencio|Joel To...
67758     Erika Eleniak|Adam Baldwin|Julie du Page|James...
227506    Iwan Mosschuchin|Nathalie Lissenko|Pavel Pavlo...
461257                                                  NaN
Name: cast, Length: 44691, dtype: object

In [131]:
movies.head(n=2)

id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,...,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director,profit,roi,stand-alone
862,Toy Story,,1995-10-30,Animation|Comedy|Family,Toy Story Collection,en,30.0,373.554033,Pixar Animation Studios,United States of America,...,"Led by Woody, Andy's toys live happily in his ...",English,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Tom Hanks|Tim Allen|Don Rickles|Jim Varney|Wal...,13,106,John Lasseter,343.554033,1245.18011,franchise
8844,Jumanji,Roll the dice and unleash the excitement!,1995-12-15,Adventure|Fantasy|Family,,en,65.0,262.797249,TriStar Pictures|Teitler Film|Interscope Commu...,United States of America,...,When siblings Judy and Peter discover an encha...,English|Français,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Robin Williams|Jonathan Hyde|Kirsten Dunst|Bra...,26,16,Joe Johnston,197.797249,404.30346,stand-alone


In [135]:
actor = movies.cast.str.split(pat = "|",expand = True)

In [139]:
actor.head()

id,0,1,2,3,4,5,6,7,8,9,...,303,304,305,306,307,308,309,310,311,312
862,Tom Hanks,Tim Allen,Don Rickles,Jim Varney,Wallace Shawn,John Ratzenberger,Annie Potts,John Morris,Erik von Detten,Laurie Metcalf,...,,,,,,,,,,
8844,Robin Williams,Jonathan Hyde,Kirsten Dunst,Bradley Pierce,Bonnie Hunt,Bebe Neuwirth,David Alan Grier,Patricia Clarkson,Adam Hann-Byrd,Laura Bell Bundy,...,,,,,,,,,,
15602,Walter Matthau,Jack Lemmon,Ann-Margret,Sophia Loren,Daryl Hannah,Burgess Meredith,Kevin Pollak,,,,...,,,,,,,,,,
31357,Whitney Houston,Angela Bassett,Loretta Devine,Lela Rochon,Gregory Hines,Dennis Haysbert,Michael Beach,Mykelti Williamson,Lamont Johnson,Wesley Snipes,...,,,,,,,,,,
11862,Steve Martin,Diane Keaton,Martin Short,Kimberly Williams-Paisley,George Newbern,Kieran Culkin,BD Wong,Peter Michael Goetz,Kate McGregor-Stewart,Jane Adams,...,,,,,,,,,,


In [161]:
act = actor.stack().reset_index(level = 1, drop = True).to_frame()
act.columns = ['Actor']

In [162]:
act = act.merge(movies[['title','revenue_musd','vote_average','popularity']],how = 'left', left_index = True, right_index = True)

In [168]:
outcome = act.groupby(by='Actor').agg({'title':'count','revenue_musd': 'mean','vote_average':'mean'})

In [172]:
outcome.sort_values(by='title', ascending = False)

Actor,title,revenue_musd,vote_average
Bess Flowers,240,14.756530,6.184186
Christopher Lee,148,324.725789,5.910204
John Wayne,125,11.242571,5.712097
Samuel L. Jackson,122,213.870258,6.266116
Michael Caine,110,191.747734,6.269444
...,...,...,...
James Vasquez,1,,5.000000
James Vieira,1,,3.100000
James Vincent Boland,1,,9.300000
James Vincent Romano,1,,


In [170]:
outcome

Actor,title,revenue_musd,vote_average
\tCheung Chi-Sing,1,,5.9000
\tDouglas Hegdahl,1,,4.0000
\tRobert Osth,1,,6.0000
\tYip Chun,2,,6.7500
Jorge de los Reyes,1,,8.1000
...,...,...,...
长泽雅美,11,0.346485,6.4000
陳美貞,1,83.061158,7.0000
高桥一生,8,166.554231,6.7375
강계열,1,,6.0000
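
Several actor names in the output start with a tab character, which splits one person into separate groups. As an alternative to the split/stack/merge steps above, the following sketch uses `DataFrame.explode` and strips whitespace before grouping. The data is made up, and the column names `n_movies` and `mean_revenue` are hypothetical.

```python
import pandas as pd

# Made-up rows; "\tAnn Lee" mimics a name with a leading tab.
toy = pd.DataFrame(
    {
        "title": ["M1", "M2", "M3"],
        "cast": ["Ann Lee|Bo Chen", "\tAnn Lee|Cy Diaz", None],
        "revenue_musd": [100.0, 300.0, 50.0],
    },
    index=pd.Index([1, 2, 3], name="id"),
)

act = (
    toy.assign(Actor=toy["cast"].str.split("|"))  # one list of names per movie
    .explode("Actor")  # one row per (movie, actor) pair
    .assign(Actor=lambda d: d["Actor"].str.strip())  # remove stray tabs/spaces
    .dropna(subset=["Actor"])  # movies without cast information
)

outcome = act.groupby("Actor").agg(
    n_movies=("title", "count"),
    mean_revenue=("revenue_musd", "mean"),
)
print(outcome.sort_values(by="n_movies", ascending=False))
```

Because every exploded row keeps its movie's columns, no separate merge back onto `movies` is needed in this variant.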
