# Using Pandas with Movie Data

### The goal of this notebook is to practice Pandas by exploring IMDB data

### Using IMDB data available [here](https://www.kaggle.com/PromptCloudHQ/imdb-data/data)


### Contents

#### 1. Data Summary - High level summary of dataset

#### 2. Missing Data - Investigation into missing data in dataset

#### 3. Revenue and Genre - Exploring the relationship between the two characteristics

#### 4. Actors - Create summary table showing each actor's highlights in dataset.

#### Edit I made to the CSV file:

1. There were inconsistencies in the spacing of the actors' names in the file. I conformed all actor data to have a single space after each comma.
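
The same cleanup could also be done in code rather than by editing the file. This is only a minimal sketch of one way to do it, not how the file was actually edited; the helper name `normalize_actor_spacing` and the sample string are made up for illustration.

```python
import re


def normalize_actor_spacing(actors: str) -> str:
    """Return the actor string with exactly one space after each comma."""
    # Collapse any whitespace around each comma into ", ", then trim the ends.
    return re.sub(r"\s*,\s*", ", ", actors).strip()


# Hypothetical input with inconsistent spacing around the commas.
raw = "Hugh Jackman, Jake Gyllenhaal,Viola Davis ,  Melissa Leo"
cleaned = normalize_actor_spacing(raw)
print(cleaned)
```

With pandas, the same function could be applied to the whole column with `dataset['Actors'].apply(normalize_actor_spacing)`.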


In [86]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import itertools

In [87]:
#Import data

dataset = pd.read_csv('IMDB-Movie-Data.csv')

dataset.head()

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
2,3,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
3,4,Sing,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
4,5,Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0


# Part 1 - Data Summary




In [88]:
#for the most part, there are not too many missing values across the dataset. 
#Revenue and metascore are the only two columns with missing data.

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
Rank                  1000 non-null int64
Title                 1000 non-null object
Genre                 1000 non-null object
Description           1000 non-null object
Director              1000 non-null object
Actors                1000 non-null object
Year                  1000 non-null int64
Runtime (Minutes)     1000 non-null int64
Rating                1000 non-null float64
Votes                 1000 non-null int64
Revenue (Millions)    872 non-null float64
Metascore             936 non-null float64
dtypes: float64(3), int64(4), object(5)
memory usage: 93.8+ KB


In [89]:
#The years span 2006 - 2016 as expected.
#The dataset has movies with bad ratings so it's not clear what this data represents exactly.


dataset.describe()

Unnamed: 0,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
count,1000.0,1000.0,1000.0,1000.0,1000.0,872.0,936.0
mean,500.5,2012.783,113.172,6.7232,169808.3,82.956376,58.985043
std,288.819436,3.205962,18.810908,0.945429,188762.6,103.25354,17.194757
min,1.0,2006.0,66.0,1.9,61.0,0.0,11.0
25%,250.75,2010.0,100.0,6.2,36309.0,13.27,47.0
50%,500.5,2014.0,111.0,6.8,110799.0,47.985,59.5
75%,750.25,2016.0,123.0,7.4,239909.8,113.715,72.0
max,1000.0,2016.0,191.0,9.0,1791916.0,936.63,100.0


### What are those poorly rated movies that found their way in this dataset?

In [90]:
#Most of the 10 worst rated movies came out in 2016, which may influence why they are included in the database.
#But why would movies from 2008 and 2009 be in the dataset too?

dataset.sort_values(by = 'Rating').head(10)

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
829,830,Disaster Movie,Comedy,"Over the course of one evening, an unsuspectin...",Jason Friedberg,"Carmen Electra, Vanessa Lachey,Nicole Parker, ...",2008,87,1.9,77207,14.17,15.0
42,43,Don't Fuck in the Woods,Horror,A group of friends are going on a camping trip...,Shawn Burkett,"Brittany Blanton, Ayse Howard, Roman Jossart,N...",2016,73,2.7,496,,
871,872,Dragonball Evolution,"Action,Adventure,Fantasy",The young warrior Son Goku sets out on a quest...,James Wong,"Justin Chatwin, James Marsters, Yun-Fat Chow, ...",2009,85,2.7,59512,9.35,45.0
647,648,Tall Men,"Fantasy,Horror,Thriller",A challenged man is stalked by tall phantoms i...,Jonathan Holbrook,"Dan Crisafulli, Kay Whitney, Richard Garcia, P...",2016,133,3.2,173,,57.0
968,969,Wrecker,"Action,Horror,Thriller",Best friends Emily and Lesley go on a road tri...,Micheal Bafaro,"Anna Hutchison, Andrea Whitburn, Jennifer Koen...",2015,83,3.5,1210,,37.0
890,891,The Intent,"Crime,Drama",Gunz (Dylan Duffus) is thrust into a world of ...,Femi Oyeniran,"Dylan Duffus, Scorcher,Shone Romulus, Jade Asha",2016,104,3.5,202,,59.0
49,50,The Last Face,Drama,A director (Charlize Theron) of an internation...,Sean Penn,"Charlize Theron, Javier Bardem, Adèle Exarchop...",2016,130,3.7,987,,16.0
269,270,Satanic,Horror,Four friends on their way to Coachella stop of...,Jeffrey G. Hunt,"Sarah Hyland, Steven Krueger, Justin Chon, Cla...",2016,85,3.7,2384,,
525,526,Birth of the Dragon,"Action,Biography,Drama","Young, up-and-coming martial artist, Bruce Lee...",George Nolfi,"Billy Magnussen, Terry Chen, Teresa Navarro,Va...",2016,103,3.9,552,93.05,61.0
401,402,The Black Room,Horror,PAUL and JENNIFER HEMDALE have just moved into...,Rolfe Kanefsky,"Natasha Henstridge, Lukas Hassel, Lin Shaye,Do...",2016,91,3.9,240,,71.0


### It's not clear what this dataset represents exactly, but it seems like it generally has most of the major movies that came out in the timespan of 2006 to 2016. 

# Part 2 - Missing Data

### From the cell above, it appears that missing data for revenue and metascore comes from movies released in 2016. This could be because the revenue and metascores weren't finalized when the data was compiled.  

### Let's see if all the missing values are from 2016 movies

In [91]:
#renaming the revenue and runtime columns

dataset.rename(columns ={'Runtime (Minutes)': 'Runtime', 'Revenue (Millions)': 'Revenue'}, inplace = True)

dataset.head()


Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime,Rating,Votes,Revenue,Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
2,3,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
3,4,Sing,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
4,5,Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0


In [92]:
#find all the movies with a missing value and group them by year made

#get an array with the positions of all rows containing a NaN value
#(Series.nonzero() was removed from newer pandas, so np.flatnonzero is used instead)
nanRows = np.flatnonzero(dataset.isnull().any(axis = 1))

#select the NaN rows and then summarize by year
dataset.loc[nanRows, ["Year"]].apply(pd.Series.value_counts).sort_index(ascending = False)

Unnamed: 0,Year
2016,99
2015,18
2014,5
2013,7
2012,2
2011,6
2010,3
2009,6
2008,4
2007,9


### It looks like most of the missing data comes from movies made in 2016, but there are also missing values for movies released before then. Since Revenue and Metascore both have missing data, let's break it down and look at the missing data for each column

### Let's start by investigating movie revenue:

In [93]:
#find all the movies with missing revenue and group them by year made

#get an array with the positions of all rows with a NaN revenue
nanRev = np.flatnonzero(dataset.Revenue.isnull())

#select the NaN rows and then summarize by year
dataset.loc[nanRev, ["Year"]].apply(pd.Series.value_counts).sort_index(ascending = False)

Unnamed: 0,Year
2016,92
2015,14
2014,4
2013,3
2011,1
2010,3
2009,4
2008,1
2007,4
2006,2
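
As a possible shortcut, the per-year counts for both columns can be produced in one table by summing the boolean mask from `isnull()` within each year. This is a sketch on a small made-up frame (`toy` and its values are assumptions, not rows from the dataset), using the renamed `Revenue` and `Metascore` columns.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the renamed dataset; the values are made up.
toy = pd.DataFrame({
    "Year": [2016, 2016, 2015, 2015, 2014],
    "Revenue": [np.nan, 10.0, np.nan, 5.0, 7.5],
    "Metascore": [np.nan, np.nan, 60.0, 55.0, 70.0],
})

# isnull() gives a boolean frame; summing it within each year counts the NaNs.
missing_by_year = (
    toy[["Revenue", "Metascore"]]
    .isnull()
    .groupby(toy["Year"])
    .sum()
    .sort_index(ascending=False)
)
print(missing_by_year)
```

Unlike the `value_counts` approach, years with no missing values still appear here, with a count of zero.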


In [94]:
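#What are the movies without revenue that came out before 2016?

#combine both conditions in one boolean mask so the index does not need to be realigned
dataset[dataset.Revenue.isnull() & (dataset.Year < 2016)]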



Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime,Rating,Votes,Revenue,Metascore
39,40,5- 25- 77,"Comedy,Drama","Alienated, hopeful-filmmaker Pat Johnson's epi...",Patrick Read Johnson,"John Francis Daley, Austin Pendleton, Colleen ...",2007,113,7.1,241,,
154,155,Twin Peaks: The Missing Pieces,"Drama,Horror,Mystery",Twin Peaks before Twin Peaks (1990) and at the...,David Lynch,"Chris Isaak, Kiefer Sutherland, C.H. Evans, Sa...",2014,91,8.1,1973,,
185,186,Love,"Drama,Romance",Murphy is an American living in Paris who ente...,Gaspar Noé,"Aomi Muyock, Karl Glusman, Klara Kristin, Juan...",2015,135,6.0,24003,,51.0
213,214,Old Boy,"Action,Drama,Mystery","Obsessed with vengeance, a man sets out to fin...",Spike Lee,"Josh Brolin, Elizabeth Olsen, Samuel L. Jackso...",2013,104,5.8,54679,,49.0
282,283,Death Proof,Thriller,Two separate sets of voluptuous women are stal...,Quentin Tarantino,"Kurt Russell, Zoë Bell, Rosario Dawson, Vaness...",2007,113,7.1,220236,,
398,399,Absolutely Anything,"Comedy,Sci-Fi",A group of eccentric aliens confer a human bei...,Terry Jones,"Simon Pegg, Kate Beckinsale, Sanjeev Bhaskar, ...",2015,85,6.0,26587,,31.0
428,429,Srpski film,"Horror,Mystery,Thriller",An aging porn star agrees to participate in an...,Srdjan Spasojevic,"Srdjan 'Zika' Todorovic, Sergej Trifunovic,Jel...",2010,104,5.2,43648,,55.0
463,464,Predestination,"Drama,Mystery,Sci-Fi","For his final assignment, a top temporal agent...",Michael Spierig,"Ethan Hawke, Sarah Snook, Noah Taylor, Madelei...",2014,97,7.5,187760,,69.0
479,480,Macbeth,"Drama,War","Macbeth, the Thane of Glamis, receives a proph...",Justin Kurzel,"Michael Fassbender, Marion Cotillard, Jack Mad...",2015,113,6.7,41642,,71.0
504,505,Mr. Nobody,"Drama,Fantasy,Romance",A boy stands on a station platform as a train ...,Jaco Van Dormael,"Jared Leto, Sarah Polley, Diane Kruger, Linh D...",2009,141,7.9,166872,,63.0


### Regarding missing revenue, the box office data is typically available on Wikipedia. For example, [Hachi: A Dog's Tale](https://en.wikipedia.org/wiki/Hachi:_A_Dog%27s_Tale) collected 46.7 million.

### Perhaps in another session, I will try to build a solution that fills in the missing box office data via the Wikipedia API
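
Whatever the source of the figures ends up being, the fill step itself could look roughly like the sketch below. The `box_office` lookup is a hypothetical hand-built dictionary (only the Hachi figure comes from the text above), and `toy` is a made-up stand-in for the dataset; nothing here calls an external API.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset; only the Hachi title is taken from the text.
toy = pd.DataFrame({
    "Title": ["Hachi: A Dog's Tale", "Some Other Movie", "Known Movie"],
    "Revenue": [np.nan, np.nan, 12.5],
})

# Hypothetical lookup of revenue in millions, keyed by title.
box_office = {"Hachi: A Dog's Tale": 46.7}

# map() yields NaN for titles that are not in the lookup, so those rows stay NaN,
# and fillna() never overwrites a revenue that is already present.
toy["Revenue"] = toy["Revenue"].fillna(toy["Title"].map(box_office))
print(toy)
```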

In [95]:
#What are the movies without a metascore that came out before 2016?

#get an array with the positions of all rows with a NaN metascore
nanMeta = np.flatnonzero(dataset.Metascore.isnull())

#select the rows with a NaN metascore that were released before 2016
dataset[dataset.Metascore.isnull() & (dataset.Year < 2016)]



Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime,Rating,Votes,Revenue,Metascore
26,27,Bahubali: The Beginning,"Action,Adventure,Drama","In ancient India, an adventurous and daring ma...",S.S. Rajamouli,"Prabhas, Rana Daggubati, Anushka Shetty,Tamann...",2015,159,8.3,76193,6.5,
39,40,5- 25- 77,"Comedy,Drama","Alienated, hopeful-filmmaker Pat Johnson's epi...",Patrick Read Johnson,"John Francis Daley, Austin Pendleton, Colleen ...",2007,113,7.1,241,,
154,155,Twin Peaks: The Missing Pieces,"Drama,Horror,Mystery",Twin Peaks before Twin Peaks (1990) and at the...,David Lynch,"Chris Isaak, Kiefer Sutherland, C.H. Evans, Sa...",2014,91,8.1,1973,,
282,283,Death Proof,Thriller,Two separate sets of voluptuous women are stal...,Quentin Tarantino,"Kurt Russell, Zoë Bell, Rosario Dawson, Vaness...",2007,113,7.1,220236,,
402,403,Bronson,"Action,Biography,Crime",A young man who was sentenced to seven years i...,Nicolas Winding Refn,"Tom Hardy, Kelly Adams, Luing Andrews,Katy Barker",2008,92,7.1,93972,0.1,
417,418,Atonement,"Drama,Mystery,Romance","Fledgling writer Briony Tallis, as a thirteen-...",Joe Wright,"Keira Knightley, James McAvoy, Brenda Blethyn,...",2007,123,7.8,202890,50.92,
435,436,Filth,"Comedy,Crime,Drama","A corrupt, junkie cop with bipolar disorder at...",Jon S. Baird,"James McAvoy, Jamie Bell, Eddie Marsan, Imogen...",2013,97,7.1,81301,0.03,
445,446,Silent Hill,"Adventure,Horror,Mystery","A woman, Rose, goes in search for her adopted ...",Christophe Gans,"Radha Mitchell, Laurie Holden, Sean Bean,Debor...",2006,125,6.6,184152,46.98,
526,527,Elysium,"Action,Drama,Sci-Fi","In the year 2154, the very wealthy live on a m...",Neill Blomkamp,"Matt Damon, Jodie Foster, Sharlto Copley, Alic...",2013,109,6.6,358932,,
532,533,Deja Vu,"Action,Sci-Fi,Thriller","After a ferry is bombed in New Orleans, an A.T...",Tony Scott,"Denzel Washington, Paula Patton, Jim Caviezel,...",2006,126,7.0,253858,,


### As with movie revenue, the metascores that are missing in this dataset are available via [Metacritic](http://www.metacritic.com/movie/the-walk).

# Part 3 - Revenue and Genre

### It seems like Action movies will be the highest-grossing genre. They have broad audiences, and studios typically invest more in this genre.

### Comedies, Dramas, and Thrillers seem like they will make less than Action movies. I would guess that Comedies would have more hits than Dramas/Thrillers because slapstick comedies can draw in broad audiences.

### Identify all the genres in the dataset: 

In [96]:
#via nachrism's work on Kaggle, this code succinctly breaks down all the unique genres listed in the dataset
#link to nachrism's work: https://www.kaggle.com/nachrism/imdb-eda

unique_genres = dataset['Genre'].unique()
individual_genres = []
for genre in unique_genres:
    individual_genres.append(genre.split(','))

individual_genres = list(itertools.chain.from_iterable(individual_genres))

individual_genres = set(individual_genres)

individual_genres

{'Action',
 'Adventure',
 'Animation',
 'Biography',
 'Comedy',
 'Crime',
 'Drama',
 'Family',
 'Fantasy',
 'History',
 'Horror',
 'Music',
 'Musical',
 'Mystery',
 'Romance',
 'Sci-Fi',
 'Sport',
 'Thriller',
 'War',
 'Western'}
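
For comparison, the same set of genres can probably be obtained with pandas alone by splitting each string and exploding the resulting lists. This is a sketch on a made-up series (`genres` is an assumption standing in for `dataset['Genre']`), and it needs a pandas version that provides `Series.explode`.

```python
import pandas as pd

# Made-up stand-in for dataset['Genre'].
genres = pd.Series(["Action,Adventure,Sci-Fi", "Horror,Thriller", "Action,Horror"])

# split() turns each string into a list; explode() gives one genre per row.
individual = sorted(genres.str.split(",").explode().unique())
print(individual)
```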

### Build Dataframes for each genre

In [97]:
#what are the average box office sales for a movie in each genre?

#start by creating a dictionary of series determining whether each genre is listed for each movie
#split on commas and test membership so that 'Music' does not also match 'Musical'

genreMembers = {x: dataset["Genre"].str.split(',').apply(lambda genres: x in genres) for x in individual_genres}

#get dataframe for each genre match

genreFrames = {x: dataset.loc[genreMembers[x], :] for x in genreMembers}
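
A note on matching genre names: a plain substring test such as `str.contains('Music')` also matches 'Musical', and both genres appear in this dataset. The sketch below contrasts the two approaches on a made-up series (`genres` is an assumption, not data from the file).

```python
import pandas as pd

# Made-up genre strings that include both 'Music' and 'Musical'.
genres = pd.Series(["Drama,Music", "Comedy,Musical", "Action"])

# Substring match: 'Musical' contains 'Music', so the second row matches too.
substring_match = genres.str.contains("Music")

# Exact match: split on commas and test membership in the resulting list.
exact_match = genres.str.split(",").apply(lambda parts: "Music" in parts)

print(substring_match.tolist())
print(exact_match.tolist())
```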

### Calculate frequency and average revenue for each genre

In [98]:
genreInfo = {}

for item in genreFrames:
    genreInfo[item] = [genreFrames[item].shape[0], genreFrames[item].loc[:, 'Revenue'].mean()]

#one row per genre; astype truncates the mean revenue to whole millions
genreInfo = pd.DataFrame.from_dict(genreInfo, orient = 'index', columns = ['Movie_Count', 'Mean_Revenue'])

genreInfo = genreInfo.astype('int64')

genreInfo.sort_values(by = 'Mean_Revenue', ascending = False)

Unnamed: 0,Movie_Count,Mean_Revenue
Animation,49,191
Adventure,259,154
Sci-Fi,120,135
Fantasy,101,131
Family,51,126
Action,303,124
Western,7,111
Musical,5,81
Comedy,279,75
Thriller,195,69
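
The count and mean per genre could also be computed without building one dataframe per genre, by exploding the genre lists and grouping. This is a sketch on a made-up frame (`toy` and its values are assumptions); as in the table above, the count includes movies with missing revenue, while the mean skips them.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset; titles and revenues are made up.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Genre": ["Action,Adventure", "Action", "Comedy", "Adventure"],
    "Revenue": [100.0, np.nan, 30.0, 300.0],
})

# One row per (movie, genre) pair.
long = toy.assign(Genre=toy["Genre"].str.split(",")).explode("Genre")

# 'size' counts every movie in the genre; 'mean' ignores NaN revenue.
summary = (
    long.groupby("Genre")["Revenue"]
    .agg(Movie_Count="size", Mean_Revenue="mean")
    .sort_values("Mean_Revenue", ascending=False)
)
print(summary)
```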


### My assumption that Action would be the highest-grossing genre was wrong. It came in sixth, well behind the two other 'A' genres, Animation and Adventure.

### Comedy, Drama, and Thriller movies all had much lower average revenue than Action movies.


### For more detail on the data, let's find more descriptive statistics for revenue for each genre

In [99]:
genreInfo = {}

for item in genreFrames:
    genreInfo[item] = genreFrames[item]["Revenue"].describe()
    
genreInfo = pd.DataFrame(genreInfo)

genreInfo = genreInfo.T.sort_values(by = 'mean', ascending = False)

genreInfo

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Animation,47.0,191.223404,119.641355,0.29,101.015,191.45,254.63,486.29
Adventure,252.0,154.177024,138.128866,0.05,49.1675,115.16,233.9375,936.63
Sci-Fi,110.0,135.552545,139.378843,0.01,35.005,85.71,197.05,652.18
Fantasy,93.0,131.850108,154.089695,0.08,31.14,75.28,218.63,936.63
Family,49.0,126.175714,92.876542,1.2,64.0,85.46,197.99,364.0
Action,286.0,124.494476,130.985135,0.03,35.1975,85.955,175.455,936.63
Western,5.0,111.824,54.178083,42.62,89.29,93.38,162.8,171.03
Musical,5.0,81.642,59.853671,24.34,38.51,52.88,143.7,148.78
Comedy,255.0,75.750784,85.803254,0.01,12.13,52.69,110.62,486.29
Thriller,153.0,69.577255,78.676445,0.0,11.28,43.0,98.9,448.13


### Note that the counts decreased from the previous table because the movies with missing revenue data are not counted.

### For nearly every genre shown, the mean revenue is higher than the median, indicating that the data is right-skewed (Animation, where the two are almost equal, is the exception). This is unsurprising since blockbusters bring in so much more revenue than other films. For example, the max revenue for each genre is almost always much further from the median than the min revenue, which cannot fall below zero.
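
The mean-versus-median reasoning can be checked on a small example. The sketch below uses made-up revenue figures with a single blockbuster-sized value; `Series.skew()` returns the sample skewness, which is positive for right-skewed data.

```python
import pandas as pd

# Made-up revenues in millions: four modest films and one blockbuster.
revenue = pd.Series([1.0, 5.0, 10.0, 20.0, 400.0])

mean = revenue.mean()      # pulled upward by the single large value
median = revenue.median()  # unaffected by how large the top value is
skew = revenue.skew()      # positive when the right tail is longer

print(mean, median, skew)
```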

# Part 4 - Actor summary table

### The actor summary table is going to provide a high level overview of each actor's work. It will:

1. Count the billings of each actor (1st lead, 2nd lead, 3rd, 4th) 
2. Identify and count the movies they were in that grossed over 100 million
3. Get the average IMDB rating for the movies they were in
4. Identify and count the most common movie genre they worked on
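
Items 2 to 4 of this plan could be sketched with `explode` and `groupby`, as below. This is only one possible approach on a made-up frame, not the method used in the rest of the notebook; `toy`, the actor names, and the output column names are all assumptions, and billing counts (item 1) are left out here.

```python
import pandas as pd

# Toy stand-in for the dataset; titles, names, and numbers are made up.
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genre": ["Action,Drama", "Drama", "Comedy,Drama"],
    "Actors": ["Ann Lee, Bo Chan", "Bo Chan, Ann Lee", "Ann Lee, Cy Diaz"],
    "Rating": [7.0, 8.0, 6.0],
    "Revenue": [150.0, 50.0, 120.0],
})

# One row per (movie, actor); strip the space left after each comma.
per_actor = (
    toy.assign(Actor=toy["Actors"].str.split(","))
    .explode("Actor")
    .reset_index(drop=True)
)
per_actor["Actor"] = per_actor["Actor"].str.strip()

# Items 2 and 3: movies grossing over 100 million, and mean rating.
summary = per_actor.groupby("Actor").agg(
    Over_100M=("Revenue", lambda r: int((r > 100).sum())),
    Mean_Rating=("Rating", "mean"),
)

# Item 4: explode the genres too, then take the most frequent one per actor.
# On a tie, mode() returns the candidates sorted, so the first alphabetically wins.
per_genre = per_actor.assign(Genre=per_actor["Genre"].str.split(",")).explode("Genre")
summary["Top_Genre"] = per_genre.groupby("Actor")["Genre"].agg(lambda g: g.mode().iloc[0])

print(summary)
```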

### Section 1 - count the billings of each actor

#### I am going to create four separate groups for 1st, 2nd, 3rd, 4th lead.

In [100]:
#each Actors list contains the main four actors in order of their billing

print(type(dataset['Actors'][0]))

print(dataset['Actors'][90])

print(dataset['Actors'][900])

<class 'str'>
Hugh Jackman, Jake Gyllenhaal, Viola Davis,Melissa Leo
Jason Sudeikis, Alison Brie, Jordan Carlos,Margarita Levieva


In [161]:
#create actors
actors = dataset.Actors

#build list of lists of actors in each movie
actorList = [people.split(',') for people in actors]

#check to see if four actors are present for all movies. It turns out one movie only has three actors. 
countActors = []
for i, movie in enumerate(actorList):
    countActors.append([len(movie), i])
    
countActors.sort()

countActors



[[3, 622],
 [4, 0],
 [4, 1],
 [4, 2],
 [4, 3],
 [4, 4],
 [4, 5],
 [4, 6],
 [4, 7],
 [4, 8],
 [4, 9],
 [4, 10],
 [4, 11],
 [4, 12],
 [4, 13],
 [4, 14],
 [4, 15],
 [4, 16],
 [4, 17],
 [4, 18],
 [4, 19],
 [4, 20],
 [4, 21],
 [4, 22],
 [4, 23],
 [4, 24],
 [4, 25],
 [4, 26],
 [4, 27],
 [4, 28],
 [4, 29],
 [4, 30],
 [4, 31],
 [4, 32],
 [4, 33],
 [4, 34],
 [4, 35],
 [4, 36],
 [4, 37],
 [4, 38],
 [4, 39],
 [4, 40],
 [4, 41],
 [4, 42],
 [4, 43],
 [4, 44],
 [4, 45],
 [4, 46],
 [4, 47],
 [4, 48],
 [4, 49],
 [4, 50],
 [4, 51],
 [4, 52],
 [4, 53],
 [4, 54],
 [4, 55],
 [4, 56],
 [4, 57],
 [4, 58],
 [4, 59],
 [4, 60],
 [4, 61],
 [4, 62],
 [4, 63],
 [4, 64],
 [4, 65],
 [4, 66],
 [4, 67],
 [4, 68],
 [4, 69],
 [4, 70],
 [4, 71],
 [4, 72],
 [4, 73],
 [4, 74],
 [4, 75],
 [4, 76],
 [4, 77],
 [4, 78],
 [4, 79],
 [4, 80],
 [4, 81],
 [4, 82],
 [4, 83],
 [4, 84],
 [4, 85],
 [4, 86],
 [4, 87],
 [4, 88],
 [4, 89],
 [4, 90],
 [4, 91],
 [4, 92],
 [4, 93],
 [4, 94],
 [4, 95],
 [4, 96],
 [4, 97],
 [4, 98],
 [4, 99],

In [162]:
#insert a dummy string so that the actor assignment works throughout the lists

actorList[622].append('')

actorList[622]

['Willem Dafoe', ' Charlotte Gainsbourg', ' Storm Acheche Sahlstrøm', '']
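
The padding matters because plain `zip` stops at the shortest list, so one three-actor movie would otherwise drop the whole 4th-lead group. As an alternative to appending a dummy string by hand, `itertools.zip_longest` can pad while zipping. This is a sketch on two short example lists (trimmed of leading spaces for readability).

```python
from itertools import zip_longest

# Two example casts; the second has only three actors.
actor_lists = [
    ["Chris Pratt", "Vin Diesel", "Bradley Cooper", "Zoe Saldana"],
    ["Willem Dafoe", "Charlotte Gainsbourg", "Storm Acheche Sahlstrøm"],
]

# Plain zip truncates to the shortest list, losing the 4th-lead group entirely.
truncated = list(zip(*actor_lists))

# zip_longest pads the shorter list with the fill value instead.
by_billing = list(zip_longest(*actor_lists, fillvalue=""))

print(len(truncated), len(by_billing))
print(by_billing[3])
```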

In [163]:
#zip the lists so that the 1st, 2nd, 3rd, 4th leads are all grouped together
zippedActors = zip(*actorList)

zippedActors = list(zippedActors)

zippedActors

[('Chris Pratt',
  'Noomi Rapace',
  'James McAvoy',
  'Matthew McConaughey',
  'Will Smith',
  'Matt Damon',
  'Ryan Gosling',
  'Essie Davis',
  'Charlie Hunnam',
  'Jennifer Lawrence',
  'Eddie Redmayne',
  'Taraji P. Henson',
  'Felicity Jones',
  "Auli'i Cravalho",
  'Anne Hathaway',
  'Louis C.K.',
  'Andrew Garfield',
  'Matt Damon',
  'Dev Patel',
  'Amy Adams',
  'Matthew McConaughey',
  'Casey Affleck',
  'Emma Booth',
  'Anna Kendrick',
  'Liam Hemsworth',
  'Fiona Gordon',
  'Prabhas',
  'Jocelin Donahue',
  'Mila Kunis',
  'Michael Fassbender',
  'Zoey Deutch',
  'Amy Adams',
  'James McAvoy',
  'Ryan Reynolds',
  'Milla Jovovich',
  'Chris Evans',
  'Matthew McConaughey',
  'Benedict Cumberbatch',
  'Denzel Washington',
  'John Francis Daley',
  'Seth Rogen',
  'Mahershala Ali',
  'Brittany Blanton',
  'Michael Keaton',
  'Gabriel Chavarria',
  'Johnny Depp',
  'Jessica Chastain',
  'Hermione Corfield',
  'Chris Pine',
  'Charlize Theron',
  'Daisy Ridley',
  'Kate Beckin

In [166]:
#build dictionary from zip to set up dataframe
actorDict = {i: group for i, group in enumerate(zippedActors, start = 1)}

actorDict

{1: ('Chris Pratt',
  'Noomi Rapace',
  'James McAvoy',
  'Matthew McConaughey',
  'Will Smith',
  'Matt Damon',
  'Ryan Gosling',
  'Essie Davis',
  'Charlie Hunnam',
  'Jennifer Lawrence',
  'Eddie Redmayne',
  'Taraji P. Henson',
  'Felicity Jones',
  "Auli'i Cravalho",
  'Anne Hathaway',
  'Louis C.K.',
  'Andrew Garfield',
  'Matt Damon',
  'Dev Patel',
  'Amy Adams',
  'Matthew McConaughey',
  'Casey Affleck',
  'Emma Booth',
  'Anna Kendrick',
  'Liam Hemsworth',
  'Fiona Gordon',
  'Prabhas',
  'Jocelin Donahue',
  'Mila Kunis',
  'Michael Fassbender',
  'Zoey Deutch',
  'Amy Adams',
  'James McAvoy',
  'Ryan Reynolds',
  'Milla Jovovich',
  'Chris Evans',
  'Matthew McConaughey',
  'Benedict Cumberbatch',
  'Denzel Washington',
  'John Francis Daley',
  'Seth Rogen',
  'Mahershala Ali',
  'Brittany Blanton',
  'Michael Keaton',
  'Gabriel Chavarria',
  'Johnny Depp',
  'Jessica Chastain',
  'Hermione Corfield',
  'Chris Pine',
  'Charlize Theron',
  'Daisy Ridley',
  'Kate Bec

In [168]:
#create dataframe of the actors' roles.
actorRoles = pd.DataFrame(actorDict)

actorRoles.head()

Unnamed: 0,1,2,3,4
0,Chris Pratt,Vin Diesel,Bradley Cooper,Zoe Saldana
1,Noomi Rapace,Logan Marshall-Green,Michael Fassbender,Charlize Theron
2,James McAvoy,Anya Taylor-Joy,Haley Lu Richardson,Jessica Sula
3,Matthew McConaughey,Reese Witherspoon,Seth MacFarlane,Scarlett Johansson
4,Will Smith,Jared Leto,Margot Robbie,Viola Davis
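
From a roles table shaped like this one, the billing counts per actor could be tallied by applying `value_counts` to each billing column. This is a sketch on a made-up two-column frame (`roles` and the names in it are assumptions); actors who never appear in a given position get a count of zero.

```python
import pandas as pd

# Made-up roles table: one column per billing position, one row per movie.
roles = pd.DataFrame({
    1: ["Ann Lee", "Bo Chan", "Ann Lee"],
    2: ["Bo Chan", "Ann Lee", "Cy Diaz"],
})

# value_counts per column gives one row per actor; missing combinations are NaN,
# so fill those with 0 and convert back to integers.
billing_counts = roles.apply(pd.Series.value_counts).fillna(0).astype(int)
print(billing_counts)
```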
