# Exploratory Data Analysis and Visualization of Films from 1980-2020

This project is an exploratory data analysis (EDA) and visualization of films released between 1980 and 2020. It examines the relationships among film features such as genre, rating, budget, gross, director, writer, and runtime.

EDA surfaces trends, patterns, and relationships in the dataset, which can then be visualized with bar charts, scatter plots, and histograms. This gives a clearer picture of the films produced during this period and can inform hypotheses about future films.

## Imports and Reading Data

In [256]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Display floats with thousands separators and two decimals,
# e.g. 1,234,567.89 (the US/UK convention)
pd.options.display.float_format = '{:,.2f}'.format
sns.set_style('darkgrid')
sns.set_palette("Set2")
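
To see what the display option above does, the same format string can be applied to plain floats. This is a minimal sketch; the three numbers are taken from the `describe()` output further down (mean budget, mean score, maximum gross), and the name `fmt` is only illustrative.

```python
# The format string used for pd.options.display.float_format, applied directly
fmt = '{:,.2f}'.format

# Comma as thousands separator, two digits after the decimal point
formatted = [fmt(35589876.19), fmt(6.39), fmt(2847246203.0)]
print(formatted)
```

The option only changes how pandas prints floats; the stored values keep full precision.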

In [257]:
df = pd.read_csv('movies.csv')

## Data Understanding

- head()
- tail()
- info()
- Dataframe shape
- describe()
- isna()


In [258]:
df.head()

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
0,The Shining,R,Drama,1980,"June 13, 1980 (United States)",8.4,927000.0,Stanley Kubrick,Stephen King,Jack Nicholson,United Kingdom,19000000.0,46998772.0,Warner Bros.,146.0
1,The Blue Lagoon,R,Adventure,1980,"July 2, 1980 (United States)",5.8,65000.0,Randal Kleiser,Henry De Vere Stacpoole,Brooke Shields,United States,4500000.0,58853106.0,Columbia Pictures,104.0
2,Star Wars: Episode V - The Empire Strikes Back,PG,Action,1980,"June 20, 1980 (United States)",8.7,1200000.0,Irvin Kershner,Leigh Brackett,Mark Hamill,United States,18000000.0,538375067.0,Lucasfilm,124.0
3,Airplane!,PG,Comedy,1980,"July 2, 1980 (United States)",7.7,221000.0,Jim Abrahams,Jim Abrahams,Robert Hays,United States,3500000.0,83453539.0,Paramount Pictures,88.0
4,Caddyshack,R,Comedy,1980,"July 25, 1980 (United States)",7.3,108000.0,Harold Ramis,Brian Doyle-Murray,Chevy Chase,United States,6000000.0,39846344.0,Orion Pictures,98.0


In [259]:
df.tail()

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
7663,More to Life,,Drama,2020,"October 23, 2020 (United States)",3.1,18.0,Joseph Ebanks,Joseph Ebanks,Shannon Bond,United States,7000.0,,,90.0
7664,Dream Round,,Comedy,2020,"February 7, 2020 (United States)",4.7,36.0,Dusty Dukatz,Lisa Huston,Michael Saquella,United States,,,Cactus Blue Entertainment,90.0
7665,Saving Mbango,,Drama,2020,"April 27, 2020 (Cameroon)",5.7,29.0,Nkanya Nkwai,Lynno Lovert,Onyama Laura,United States,58750.0,,Embi Productions,
7666,It's Just Us,,Drama,2020,"October 1, 2020 (United States)",,,James Randall,James Randall,Christina Roz,United States,15000.0,,,120.0
7667,Tee em el,,Horror,2020,"August 19, 2020 (United States)",5.7,7.0,Pereko Mosia,Pereko Mosia,Siyabonga Mabaso,South Africa,,,PK 65 Films,102.0


In [260]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7668 entries, 0 to 7667
Data columns (total 15 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   name      7668 non-null   object 
 1   rating    7591 non-null   object 
 2   genre     7668 non-null   object 
 3   year      7668 non-null   int64  
 4   released  7666 non-null   object 
 5   score     7665 non-null   float64
 6   votes     7665 non-null   float64
 7   director  7668 non-null   object 
 8   writer    7665 non-null   object 
 9   star      7667 non-null   object 
 10  country   7665 non-null   object 
 11  budget    5497 non-null   float64
 12  gross     7479 non-null   float64
 13  company   7651 non-null   object 
 14  runtime   7664 non-null   float64
dtypes: float64(5), int64(1), object(9)
memory usage: 898.7+ KB


In [261]:
df.describe()

Unnamed: 0,year,score,votes,budget,gross,runtime
count,7668.0,7665.0,7665.0,5497.0,7479.0,7664.0
mean,2000.41,6.39,88108.5,35589876.19,78500541.02,107.26
std,11.15,0.97,163323.76,41457296.6,165725124.32,18.58
min,1980.0,1.9,7.0,3000.0,309.0,55.0
25%,1991.0,5.8,9100.0,10000000.0,4532055.5,95.0
50%,2000.0,6.5,33000.0,20500000.0,20205757.0,104.0
75%,2010.0,7.1,93000.0,45000000.0,76016691.5,116.0
max,2020.0,9.3,2400000.0,356000000.0,2847246203.0,366.0


In [262]:
df.isna().sum()

name           0
rating        77
genre          0
year           0
released       2
score          3
votes          3
director       0
writer         3
star           1
country        3
budget      2171
gross        189
company       17
runtime        4
dtype: int64

In [263]:
# Converting the count of null values to a percentage
df.isna().sum() / df.shape[0] * 100

name        0.00
rating      1.00
genre       0.00
year        0.00
released    0.03
score       0.04
votes       0.04
director    0.00
writer      0.04
star        0.01
country     0.04
budget     28.31
gross       2.46
company     0.22
runtime     0.05
dtype: float64
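
The counts and percentages above can also be combined into one table, which makes it easier to rank columns by how incomplete they are. The sketch below uses a small made-up frame (`sample`) rather than `movies.csv`, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the movies data (values are made up)
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "budget": [1.0e6, np.nan, np.nan, 4.0e6],
    "gross": [5.0e6, 2.0e6, np.nan, 9.0e6],
})

# isna().mean() is the fraction of missing values per column
missing = pd.DataFrame({
    "missing": sample.isna().sum(),
    "percent": sample.isna().mean() * 100,
})

# Most incomplete columns first
missing = missing.sort_values("percent", ascending=False)
print(missing)
```

On the real DataFrame the same two expressions would be called on `df` instead of `sample`.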

## Data Preparation

- Identifying duplicated rows
- Dropping or filling missing values

In [264]:
# Select rows that repeat an earlier (name, year, director) combination; none are found
df.loc[df.duplicated(subset = ['name','year','director'])]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
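
By default `duplicated()` flags only the second and later occurrences of a key. When duplicates do exist, passing `keep=False` shows every row in each duplicate group, which helps when deciding which copy to keep. The sketch below is an illustration on a made-up frame, not part of the original analysis.

```python
import pandas as pd

# Toy frame with one repeated (name, year, director) combination
sample = pd.DataFrame({
    "name": ["Alpha", "Alpha", "Beta"],
    "year": [1990, 1990, 1991],
    "director": ["X", "X", "Y"],
})
key = ["name", "year", "director"]

# keep=False marks all members of a duplicate group, not just the repeats
dupes = sample[sample.duplicated(subset=key, keep=False)]

# The default (keep='first') counts only the extra copies
n_extra = int(sample.duplicated(subset=key).sum())
print(dupes)
print(n_extra)
```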


In [265]:
df[df['budget'].isna()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
16,Fame,R,Drama,1980,"May 16, 1980 (United States)",6.60,21000.00,Alan Parker,Christopher Gore,Eddie Barth,United States,,21202829.00,Metro-Goldwyn-Mayer (MGM),134.00
19,Stir Crazy,R,Comedy,1980,"December 12, 1980 (United States)",6.80,26000.00,Sidney Poitier,Bruce Jay Friedman,Gene Wilder,United States,,101300000.00,Columbia Pictures,111.00
24,Urban Cowboy,PG,Drama,1980,"June 6, 1980 (United States)",6.40,14000.00,James Bridges,Aaron Latham,John Travolta,United States,,46918287.00,Paramount Pictures,132.00
25,Altered States,R,Horror,1980,"December 25, 1980 (United States)",6.90,33000.00,Ken Russell,Paddy Chayefsky,William Hurt,United States,,19853892.00,Warner Bros.,102.00
26,Little Darlings,R,Comedy,1980,"March 21, 1980 (United States)",6.50,5100.00,Ron Maxwell,Kimi Peck,Tatum O'Neal,United States,,34326249.00,Stephen Friedman/Kings Road Productions,96.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7657,Leap,,Drama,2020,"September 25, 2020 (United States)",6.70,903.00,Peter Ho-Sun Chan,Ji Zhang,Gong Li,China,,25818882.00,,135.00
7659,I Am Fear,Not Rated,Horror,2020,"March 3, 2020 (United States)",3.40,447.00,Kevin Shulman,Kevin Shulman,Kristina Klebe,United States,,13266.00,Roxwell Films,87.00
7660,Aloha Surf Hotel,,Comedy,2020,"November 5, 2020 (United States)",7.10,14.00,Stefan C. Schaefer,Stefan C. Schaefer,Augie Tulba,United States,,,Abominable Pictures,90.00
7664,Dream Round,,Comedy,2020,"February 7, 2020 (United States)",4.70,36.00,Dusty Dukatz,Lisa Huston,Michael Saquella,United States,,,Cactus Blue Entertainment,90.00


### Deleting Some Values (Rating / Gross)

- Missing rating values: 77 (1.00% of rows)

- Missing gross values: 189 (2.46% of rows)

In [266]:
df.dropna(subset = ['rating'],inplace = True)

In [267]:
df.dropna(subset = ['gross'], inplace = True)

In [268]:
df.isna().sum()

name           0
rating         0
genre          0
year           0
released       0
score          0
votes          0
director       0
writer         3
star           0
country        0
budget      2001
gross          0
company        9
runtime        1
dtype: int64
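
The two `dropna` calls can also be written as one call that returns a new frame, which makes it easy to report how many rows were removed. This is a sketch on made-up data; `sample`, `cleaned`, and `dropped` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy frame: one row lacks a rating, another lacks a gross value
sample = pd.DataFrame({
    "rating": ["R", None, "PG", "R"],
    "gross": [1.0, 2.0, np.nan, 4.0],
})

before = len(sample)
# Drop a row if either listed column is missing
cleaned = sample.dropna(subset=["rating", "gross"])
dropped = before - len(cleaned)
print(f"dropped {dropped} of {before} rows")
```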

### Filling Missing Values

- director column
- budget column
- company column
- writer column
- runtime column
- released column

In [269]:
# Some rows contain the placeholder 'Directors' instead of a director's name
df[df['director'] == 'Directors']

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
117,The Fox and the Hound,G,Animation,1981,"July 10, 1981 (United States)",7.3,87000.0,Directors,Daniel P. Mannix,Mickey Rooney,United States,12000000.0,63456988.0,Walt Disney Animation Studios,83.0
317,Bugs Bunny's 3rd Movie: 1001 Rabbit Tales,G,Animation,1982,"November 19, 1982 (United States)",7.1,1800.0,Directors,John W. Dunn,Mel Blanc,United States,,78350.0,Warner Bros. Animation,74.0
355,Twilight Zone: The Movie,PG,Horror,1983,"June 24, 1983 (United States)",6.5,34000.0,Directors,John Landis,Dan Aykroyd,United States,10000000.0,29450919.0,Amblin Entertainment,101.0
773,Godzilla 1985,PG,Action,1985,"August 23, 1985 (United States)",6.2,6100.0,Directors,Reuben Bercovitch,Raymond Burr,Japan,2000000.0,4116395.0,Toho Company,87.0
783,He-Man and She-Ra: The Secret of the Sword,G,Animation,1985,"March 22, 1985 (United States)",7.3,2600.0,Directors,Larry DiTillio,John Erwin,United States,2000000.0,7660857.0,Filmation Associates,100.0
902,The Great Mouse Detective,G,Animation,1986,"July 2, 1986 (United States)",7.2,46000.0,Directors,Peter Young,Vincent Price,United States,14000000.0,38625550.0,Walt Disney Pictures,74.0
1011,My Little Pony: The Movie,G,Animation,1986,"June 6, 1986 (United States)",6.0,2500.0,Directors,George Arthur Bloom,Danny DeVito,United States,,5958456.0,Sunbow Productions,86.0
1103,Amazon Women on the Moon,R,Comedy,1987,"September 18, 1987 (United States)",6.2,10000.0,Directors,Michael Barrie,Rosanna Arquette,United States,,548696.0,Universal Pictures,85.0
1109,Aria,R,Comedy,1987,"September 15, 1987 (United States)",5.8,2600.0,Directors,Robert Altman,John Hurt,United Kingdom,,1028679.0,Lightyear Entertainment,90.0
2341,Batman: Mask of the Phantasm,PG,Animation,1993,"December 25, 1993 (United States)",7.8,45000.0,Directors,Alan Burnett,Kevin Conroy,United States,6000000.0,5635204.0,Warner Bros. Animation,76.0


In [270]:
# Found data from the internet
film_directors = {
    "The Fox and the Hound": "Ted Berman",
    "Bugs Bunny's 3rd Movie: 1001 Rabbit Tales": "Friz Freleng",
    "Twilight Zone: The Movie": "John Landis",
    "Godzilla 1985": "Koji Hashimoto",
    "He-Man and She-Ra: The Secret of the Sword": "Ed Friedman",
    "The Great Mouse Detective": "Ron Clements",
    "My Little Pony: The Movie": "Michael Joens",
    "Amazon Women on the Moon": "John Landis",
    "Aria": "Robert Altman",
    "Batman: Mask of the Phantasm": "Bruce Timm",
    "We're Back! A Dinosaur's Story": "Dick Zondag",
    "Four Rooms": "Allison Anders",
    "Beavis and Butt-Head Do America": "Mike Judge",
    "Fantasia 2000": "James Algar",
    "Digimon: The Movie": "Mamoru Hosoda",
    "Paris, je t'aime": "Olivier Assayas",
    "Grindhouse": "Robert Rodriguez",
    "New York, I Love You": "Fatih Akin",
    "V/H/S": "Matt Bettinelli-Olpin",
    "The ABCs of Death": "Kaare Andrews",
    "Movie 43": "Elizabeth Banks",
    "V/H/S/2": "Simon Barrett",
    "Southbound": "Roxanne Benjamin",
    "Moana": "Ron Clements",
    "Leap!": "Eric Summer"
}

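for movie, director in film_directors.items():
    # Only overwrite rows that still hold the 'Directors' placeholder
    df.loc[(df["name"] == movie) & (df["director"] == "Directors"), "director"] = director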

In [271]:
df[df['director'] == 'Directors']

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
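
The name-to-director lookup can also be applied in a single vectorized step with `Series.map`, limited to rows that hold the placeholder. The sketch below uses a made-up frame and a hypothetical one-entry `fixes` dictionary; in the notebook the dictionary would be `film_directors`.

```python
import pandas as pd

# Toy frame: two films share a title, only one has the placeholder
sample = pd.DataFrame({
    "name": ["Film A", "Film B", "Film A"],
    "director": ["Directors", "Jane Roe", "John Doe"],
})
fixes = {"Film A": "Alex Poe"}  # hypothetical lookup table

# Rows to repair: placeholder director AND a known replacement
mask = sample["director"].eq("Directors") & sample["name"].isin(list(fixes))

# map() translates each title to its replacement director
sample.loc[mask, "director"] = sample.loc[mask, "name"].map(fixes)
print(sample)
```

Because the mask checks the placeholder as well as the title, the second "Film A" row keeps its existing director.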


In [272]:
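# Fill missing budget values with the mean budget.
# Assigning the result back avoids calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about.
df['budget'] = df['budget'].fillna(df['budget'].mean())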

In [273]:
# Checking if there is missing value left
df[df["budget"].isnull()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
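
Filling with the overall mean gives every imputed film the same budget, and the `describe()` output shows that the mean budget (about 35.6 million) sits well above the median (20.5 million). One possible alternative, shown here only as a sketch on made-up data, is to fill with the median of each genre and to keep a flag for imputed rows. Grouping by genre and the `budget_imputed` column are assumptions of this example, not something the original analysis does.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing budget per genre
sample = pd.DataFrame({
    "genre": ["Action", "Action", "Action", "Drama", "Drama", "Drama"],
    "budget": [100.0, 300.0, np.nan, 10.0, 30.0, np.nan],
})

# Remember which rows were imputed so they can be excluded later if needed
sample["budget_imputed"] = sample["budget"].isna()

# Per-genre median, aligned to the original rows (NaN is ignored by median)
genre_median = sample.groupby("genre")["budget"].transform("median")
sample["budget"] = sample["budget"].fillna(genre_median)
print(sample)
```

Keeping the flag makes it possible to rerun budget-related plots on observed values only and compare the results.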


In [274]:
df[df["company"].isnull()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
408,A Night in Heaven,R,Drama,1983,"November 18, 1983 (United States)",4.3,1200.0,John G. Avildsen,Joan Tewkesbury,Christopher Atkins,United States,35998144.2,5563663.0,,83.0
633,The Bear,PG,Biography,1984,"September 28, 1984 (United States)",6.1,270.0,Richard C. Sarafian,Michael Kane,Gary Busey,United States,221000.0,2687148.0,,110.0
969,Modern Girls,PG-13,Comedy,1986,"November 7, 1986 (United States)",5.8,1300.0,Jerry Kramer,Laurie Craig,Daphne Zuniga,United States,35998144.2,604849.0,,84.0
1033,P.O.W. the Escape,R,Action,1986,"April 4, 1986 (United States)",5.0,533.0,Gideon Amir,Malcolm Barbour,David Carradine,United States,35998144.2,2497233.0,,90.0
1572,Heart of Dixie,PG,Drama,1989,"August 25, 1989 (United States)",5.2,677.0,Martin Davidson,Anne Rivers Siddons,Ally Sheedy,United States,8000000.0,1097333.0,,95.0
1594,Lost Angels,R,Drama,1989,"May 5, 1989 (United States)",6.0,881.0,Hugh Hudson,Michael Weller,Donald Sutherland,United States,35998144.2,1247946.0,,116.0
1630,Staying Together,R,Comedy,1989,"November 10, 1989 (United States)",6.2,761.0,Lee Grant,Monte Merrick,Sean Astin,United States,35998144.2,4348025.0,,91.0
1806,Streets,R,Action,1990,"January 19, 1990 (United States)",5.7,712.0,Katt Shea,Andy Ruben,Christina Applegate,United States,35998144.2,1510053.0,,85.0
7599,End of the Century,Unrated,Drama,2019,"August 16, 2019 (United States)",6.9,2700.0,Lucio Castro,Lucio Castro,Juan Barberini,Argentina,35998144.2,103047.0,,84.0


In [275]:
# Found data from the internet
movies_company_names = {
    'A Night in Heaven': '20th Century Fox',
    'The Bear': 'Renn Productions, Gaumont, Columbia Pictures',
    'Modern Girls': 'Atlantic Entertainment Group, New World Pictures',
    'P.O.W. the Escape': 'De Laurentiis Entertainment Group',
    'Heart of Dixie': 'HBO Pictures, Island Pictures',
    'Lost Angels': 'Hemdale Film Corporation, Island Pictures',
    'Staying Together': 'Hemdale Film Corporation',
    'Streets': 'Atlantic Entertainment Group, New World Pictures',
    'End of the Century': 'MAFILM, NFDC'
}

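for name, company in movies_company_names.items():
    # Fill only rows whose company is missing, so other films with the same title are untouched
    df.loc[(df["name"] == name) & df["company"].isna(), "company"] = company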

In [276]:
# Checking if there is missing value left
df[df["company"].isnull()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime


In [277]:
df[df["writer"].isnull()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
1820,The Garden,Unrated,Drama,1990,1990 (United States),6.4,840.0,Derek Jarman,,Tilda Swinton,United Kingdom,35998144.2,5006.0,Basilisk Communications,92.0
5834,The Trip,Not Rated,Comedy,2010,"April 24, 2011 (Greece)",7.0,22000.0,Michael Winterbottom,,Steve Coogan,United Kingdom,35998144.2,3945217.0,Baby Cow Productions,112.0
7655,Legend of Deification,TV-PG,Animation,2020,"October 1, 2020 (United States)",6.6,1300.0,Teng Cheng,,Guangtao Jiang,China,35998144.2,240663149.0,Beijing Enlight Pictures,110.0


In [278]:
df.loc[(df['name'] == "The Garden") & (df['director'] == 'Derek Jarman'),'writer'] = "Derek Jarman"
df.loc[(df['name'] == "The Trip") & (df['director'] == 'Michael Winterbottom'),'writer'] = "Steve Coogan"
df.loc[(df['name'] == "Legend of Deification") & (df['director'] == 'Teng Cheng'),'writer'] = "Li Wei"

In [279]:
# Checking if there is missing value left
df[df["writer"].isnull()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime


In [280]:
df[df["runtime"].isnull()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
6195,One for the Money,PG-13,Action,2012,"January 27, 2012 (United States)",5.3,41000.0,Julie Anne Robinson,Stacy Sherman,Katherine Heigl,United States,40000000.0,38084162.0,Lakeshore Entertainment,


In [281]:
df.loc[(df['name'] == "One for the Money") & (df['director'] == 'Julie Anne Robinson'),'runtime'] = 91.0

In [282]:
# Checking if there is missing value left
df[df["runtime"].isnull()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime


In [286]:
# No missing values remain

df.isna().sum()

name        0
rating      0
genre       0
year        0
released    0
score       0
votes       0
director    0
writer      0
star        0
country     0
budget      0
gross       0
company     0
runtime     0
dtype: int64

In [287]:
# The dataset started with 7668 rows;
# 7425 remain after cleaning
df.shape

(7425, 15)

### Editing Released Column

In [288]:
# Strip the parenthesized country from the "released" column
# and convert the column from object to datetime.

df["released"] = df["released"].astype(str)

df['released'] = df['released'].str.replace(r'\(.*\)', '',regex = True).str.strip()

# Strict parsing raises an error here because some values do not match
# the 'June 13, 1980' pattern (for example, a bare year)
#df['released'] = pd.to_datetime(df['released'], format='%B %d, %Y')

# errors='coerce' turns values that cannot be parsed into NaT instead of raising
df['released'] = pd.to_datetime(df['released'], format='%B %d, %Y', errors='coerce')

# Rows whose release date did not match the '%B %d, %Y' format
na_values = df.loc[df['released'].isna()]
na_values

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime
312,Five Days One Summer,PG,Drama,1982,NaT,6.1,1000.0,Fred Zinnemann,Michael Austin,Sean Connery,United States,15000000.0,199078.0,Cable and Wireless Finance,108.0
376,Nostalghia,Not Rated,Drama,1983,NaT,8.1,24000.0,Andrei Tarkovsky,Andrei Tarkovsky,Oleg Yankovskiy,Italy,35998144.2,55269.0,Rai 2,125.0
439,Heat and Dust,R,Drama,1983,NaT,6.6,1600.0,James Ivory,Ruth Prawer Jhabvala,Julie Christie,United Kingdom,35998144.2,1772889.0,Merchant Ivory Productions,130.0
449,Getting It on,R,Comedy,1983,NaT,3.7,208.0,William Olsen,William Olsen,Martin Yost,United States,220000.0,975414.0,Seventh Avenue Films,96.0
463,Slayground,R,Crime,1983,NaT,4.9,360.0,Terry Bedford,Trevor Preston,Peter Coyote,United Kingdom,35998144.2,108128.0,Jennie and Company,89.0
467,My Brother's Wedding,Not Rated,Drama,1983,NaT,7.2,826.0,Charles Burnett,Charles Burnett,Everett Silas,United States,50000.0,26177.0,Charles Burnett Productions,115.0
719,Tampopo,Not Rated,Comedy,1985,NaT,8.0,17000.0,Jûzô Itami,Jûzô Itami,Ken Watanabe,Japan,35998144.2,444213.0,Itami Productions,114.0
731,Trouble in Mind,R,Comedy,1985,NaT,6.5,1800.0,Alan Rudolph,Alan Rudolph,Kris Kristofferson,United States,35998144.2,19632.0,Pfeiffer/Blocker Production,111.0
786,Taipei Story,Not Rated,Drama,1985,NaT,7.7,2500.0,Edward Yang,T'ien-wen Chu,Chin Tsai,Taiwan,35998144.2,35336.0,Evergreen Film Company,119.0
800,O.C. and Stiggs,R,Comedy,1985,NaT,5.4,1200.0,Robert Altman,Tod Carroll,Daniel Jenkins,United States,7000000.0,29815.0,Metro-Goldwyn-Mayer (MGM),109.0
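
The cleaning step above can be checked on a few raw strings before running it on the whole column. The sketch below takes three `released` values that appear in the outputs of this notebook, strips the parenthesized country, and collects whatever fails to parse. The regex is a slightly stricter variant of the one in the cell (it is anchored to the end of the string), which is an assumption of this example.

```python
import pandas as pd

# Raw values as they appear in the 'released' column
raw = pd.Series([
    "June 13, 1980 (United States)",
    "1990 (United States)",
    "April 24, 2011 (Greece)",
])

# Remove the trailing "(Country)" part and surrounding whitespace
stripped = raw.str.replace(r"\s*\(.*\)$", "", regex=True).str.strip()

# Values that do not match 'Month day, year' become NaT
parsed = pd.to_datetime(stripped, format="%B %d, %Y", errors="coerce")

# Inspect the original text of the entries that failed to parse
unparsed = stripped[parsed.isna()]
print(parsed)
print(unparsed.tolist())
```

Looking at `unparsed` before overwriting the column shows which formats remain and keeps the original text available for manual lookup.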


In [289]:
# Release dates looked up online
# Fill in the NaT values

movies_and_dates = {
  "Five Days One Summer": "1982-07-23",
  "Nostalghia": "1983-06-01",
  "Heat and Dust": "1983-08-05",
  "Getting It on": "1983-11-04",
  "Slayground": "1984-01-13",
  "My Brother's Wedding": "1984-09-07",
  "Tampopo": "1985-04-19",
  "Trouble in Mind": "1985-09-13",
  "Taipei Story": "1985-11-20",
  "O.C. and Stiggs": "1987-03-27",
  "Duet for One": "1988-05-27",
  "The House on Carroll Street": "1988-08-05",
  "Five Corners": "1988-09-09",
  "A Man in Love": "1988-09-23",
  "Drowning by Numbers": "1988-10-14",
  "Chocolat": "1988-12-22",
  "Stormy Monday": "1988-12-23",
  "Pascali's Island": "1988-12-23",
  "Platoon Leader": "1988-12-31",
  "Slaves of New York": "1989-03-17",
  "Scenes from the Class Struggle in Beverly Hills": "1989-04-05",
  "Monsieur Hire": "1989-05-03",
  "Too Beautiful for You": "1989-05-24",
  "Rosalie Goes Shopping": "1989-08-11",
  "Nikita": "1990-02-21",
  "The Comfort of Strangers": "1991-06-14",
  "Life Is Sweet": "1991-10-25",
  "The Field": "1991-12-20",
  "Ju Dou": "1991-12-25",
  "Hidden Agenda": "1991-12-25",
  "My Father's Glory": "1991-12-25",
  "Come See the Paradise": "1991-12-25",
  "The Garden": "1991-12-25",
  "May Fools": "1991-12-25",
  "The Nasty Girl": "1991-12-25",
  "Hear My Song": "1991-12-27",
  "The Adjuster": "1992-05-15",
  "Impromptu": "1992-06-05",
  "Liebestraum": "1992-08-14",
  "Killing Zoe": "1994-10-01",
  "Cronos": "1994-10-07",
  "Bhaji on the Beach": "1994-11-10",
  "Torment": "1994-12-02",
  "The White Balloon": "1995-02-17",
  "Dahmer": "2002-06-21",
  "Hatchet II": "2010-08-26",
  "The Human Centipede II (Full Sequence)": "2011-10-07",
  "Romeo and Juliet": "2013-10-11"
}

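for name, date in movies_and_dates.items():
    # Fill only rows that failed to parse, so films sharing a title keep their own dates
    df.loc[(df["name"] == name) & df["released"].isna(), "released"] = pd.to_datetime(date, format='%Y-%m-%d')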

In [290]:
df.loc[df['released'].isna()]

Unnamed: 0,name,rating,genre,year,released,score,votes,director,writer,star,country,budget,gross,company,runtime


In [291]:
# Drop the year column; the year can be derived from the released column when needed

df = df.drop(['year'], axis = 1)
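
Once `released` is a datetime column, the year is available through the `.dt` accessor. The sketch below uses two films from this dataset; the column name `release_year` is illustrative.

```python
import pandas as pd

# Two rows with release dates taken from the outputs above
sample = pd.DataFrame({
    "name": ["The Shining", "The Trip"],
    "released": pd.to_datetime(["1980-06-13", "2011-04-24"]),
})

# .dt.year extracts the calendar year of each release date
sample["release_year"] = sample["released"].dt.year
print(sample)
```

The derived year is not always equal to the dropped `year` value: The Trip is listed under 2010 in the writer check above but was released in 2011.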

In [292]:
df.head()

Unnamed: 0,name,rating,genre,released,score,votes,director,writer,star,country,budget,gross,company,runtime
0,The Shining,R,Drama,1980-06-13,8.4,927000.0,Stanley Kubrick,Stephen King,Jack Nicholson,United Kingdom,19000000.0,46998772.0,Warner Bros.,146.0
1,The Blue Lagoon,R,Adventure,1980-07-02,5.8,65000.0,Randal Kleiser,Henry De Vere Stacpoole,Brooke Shields,United States,4500000.0,58853106.0,Columbia Pictures,104.0
2,Star Wars: Episode V - The Empire Strikes Back,PG,Action,1980-06-20,8.7,1200000.0,Irvin Kershner,Leigh Brackett,Mark Hamill,United States,18000000.0,538375067.0,Lucasfilm,124.0
3,Airplane!,PG,Comedy,1980-07-02,7.7,221000.0,Jim Abrahams,Jim Abrahams,Robert Hays,United States,3500000.0,83453539.0,Paramount Pictures,88.0
4,Caddyshack,R,Comedy,1980-07-25,7.3,108000.0,Harold Ramis,Brian Doyle-Murray,Chevy Chase,United States,6000000.0,39846344.0,Orion Pictures,98.0


In [293]:
df.tail()

Unnamed: 0,name,rating,genre,released,score,votes,director,writer,star,country,budget,gross,company,runtime
7652,The Eight Hundred,Not Rated,Action,2020-08-28,6.8,3700.0,Hu Guan,Hu Guan,Zhi-zhong Huang,China,80000000.0,461421559.0,Beijing Diqi Yinxiang Entertainment,149.0
7653,The Quarry,R,Crime,2020-04-17,5.4,2400.0,Scott Teems,Scott Teems,Shea Whigham,United States,35998144.2,3661.0,Prowess Pictures,98.0
7655,Legend of Deification,TV-PG,Animation,2020-10-01,6.6,1300.0,Teng Cheng,Li Wei,Guangtao Jiang,China,35998144.2,240663149.0,Beijing Enlight Pictures,110.0
7656,Tulsa,PG-13,Comedy,2020-06-03,5.0,294.0,Scott Pryor,Scott Pryor,Scott Pryor,United States,35998144.2,413378.0,Pryor Entertainment,120.0
7659,I Am Fear,Not Rated,Horror,2020-03-03,3.4,447.0,Kevin Shulman,Kevin Shulman,Kristina Klebe,United States,35998144.2,13266.0,Roxwell Films,87.0
