# IMDB TOP MOVIES IN 2021

## Introduction

### Author: Yunhao Mei

#### Movies are a form of relaxing entertainment that originated in the late 19th century and reached their present form through a series of technological developments. In 1891, inventor Thomas Edison and his lab assistant William Dickson invented the kinetoscope, a cabinet with a window through which an individual viewer could experience the illusion of a moving image. As movie projectors became more popular, two French brothers, Auguste and Louis Lumière, modified them in 1895 to make them lighter. By the end of that year, the brothers had put on the world's first commercial film screening.

#### As the demand for movies increased, production companies came into being, which also meant the rise of commercial films. As cinema became more popular among the middle class, and as feature films began to keep audiences in their seats longer, the movie theater was born. Hollywood became the largest filmmaking location because it is ideal for filmmaking: the climate is mild and sunny, and the land offers many different terrains.

#### As the film industry has developed, more and more movies have been released, and a movie's rating has become a reference for deciding whether to watch it. Many websites now recommend movies, such as IMDB, whose ratings are submitted by registered users. While these sites' reviews of films are not entirely impartial, they serve as a useful reference.

## About the Data

#### This is a public dataset on Kaggle that records movie information from IMDB. It records the top 250 movies for each year from 1996 to 2021. For research purposes, the 2021 data is the most current. This article mainly discusses the number of selected movies by release year, the highest-grossing movies, and the ratings. Data is not everything, and this article is based only on this dataset.
https://www.kaggle.com/datasets/mustafacicek/imdb-top-250-lists-1996-2020

### Data processing

In [4]:
import pandas as pd
import bqplot

In [5]:
# Read the data and keep only the 2021 list
imdb_top = pd.read_csv("imdbTop250.csv")
imdb_top_2021 = imdb_top.query('IMDByear==2021')
imdb_top_2021

Unnamed: 0,Ranking,IMDByear,IMDBlink,Title,Date,RunTime,Genre,Rating,Score,Votes,Gross,Director,Cast1,Cast2,Cast3,Cast4
6250,1,2021,/title/tt0111161/,The Shawshank Redemption,1994,142,Drama,9.3,80.0,2529673,28.34,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler
6251,2,2021,/title/tt0068646/,The Godfather,1972,175,"Crime, Drama",9.2,100.0,1741574,134.97,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton
6252,3,2021,/title/tt0071562/,The Godfather: Part II,1974,202,"Crime, Drama",9.0,90.0,1208326,57.30,Francis Ford Coppola,Al Pacino,Robert De Niro,Robert Duvall,Diane Keaton
6253,4,2021,/title/tt0468569/,The Dark Knight,2008,152,"Action, Crime, Drama",9.0,84.0,2480130,534.86,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine
6254,5,2021,/title/tt0050083/,12 Angry Men,1957,96,"Crime, Drama",9.0,96.0,747360,4.36,Sidney Lumet,Henry Fonda,Lee J. Cobb,Martin Balsam,John Fiedler
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6495,246,2021,/title/tt0058946/,The Battle of Algiers,1966,121,"Drama, War",8.1,96.0,57995,0.06,Gillo Pontecorvo,Brahim Hadjadj,Jean Martin,Yacef Saadi,Samia Kerbash
6496,247,2021,/title/tt0050783/,Nights of Cabiria,1957,110,Drama,8.1,,47318,0.75,Federico Fellini,Giulietta Masina,François Périer,Franca Marzi,Dorian Gray
6497,248,2021,/title/tt0093779/,The Princess Bride,1987,98,"Adventure, Family, Fantasy",8.1,77.0,416207,30.86,Rob Reiner,Cary Elwes,Mandy Patinkin,Robin Wright,Chris Sarandon
6498,249,2021,/title/tt7060344/,Raatchasan,2018,170,"Crime, Drama, Mystery",8.4,,37474,,Ram Kumar,Vishnu Vishal,Amala Paul,Radha Ravi,Sangili Murugan


In [6]:
# Number of selected movies per release year
title_groupby_date = imdb_top_2021.groupby('Date')['Title'].count()
title_groupby_date.index = title_groupby_date.index.astype(int)
max_year = title_groupby_date.index.max()
min_year = title_groupby_date.index.min()

# Fill release years that have no selected movie with 0.
# Note: range(min_year, max_year) stops at max_year - 1, so the latest
# release year is left out; range(min_year, max_year + 1) would include it.
title_groupby_date2 = pd.DataFrame({'Date': [i for i in range(min_year, max_year)],
                                    "Count": [title_groupby_date[i] if i in title_groupby_date.index else 0 for i in
                                              range(min_year, max_year)]})

title_groupby_date2

Unnamed: 0,Date,Count
0,1921,1
1,1922,0
2,1923,0
3,1924,1
4,1925,1
...,...,...
95,2016,5
96,2017,3
97,2018,6
98,2019,6
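
#### The cell above builds the year range with `range(min_year, max_year)`, which leaves out the latest release year. The sketch below shows one possible alternative that uses `reindex` to cover every year from the earliest to the latest, inclusive. The `sample` frame is a small stand-in built from rows shown earlier (an assumption for illustration, not the full dataset), and the names `counts`, `all_years`, and `counts_by_year` are hypothetical.

```python
import pandas as pd

# Small stand-in for imdb_top_2021; titles and release years come from rows shown above.
sample = pd.DataFrame({
    "Title": ["The Shawshank Redemption", "The Godfather", "The Godfather: Part II",
              "12 Angry Men", "Nights of Cabiria"],
    "Date": [1994, 1972, 1974, 1957, 1957],
})

# Count the selected movies per release year.
counts = sample.groupby("Date")["Title"].count()
counts.index = counts.index.astype(int)

# Reindex over every year from the earliest to the latest release year,
# inclusive, so that years without a selected movie get a count of 0.
all_years = range(counts.index.min(), counts.index.max() + 1)
counts_by_year = (
    counts.reindex(all_years, fill_value=0)
    .rename_axis("Date")
    .reset_index(name="Count")
)

print(counts_by_year.tail())
```

#### On the full data, the same steps would be applied to `imdb_top_2021` instead of `sample`.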


## The year in which the selected films were released

#### As a rule of thumb, one would expect very few old films among the most popular movies of recent years. Film history is long, the number of movies grows every year, and the people who liked the older films are aging, while new audiences find it hard to accept films that look old.
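
#### One simple way to check this expectation is to group the release years into decades and count the selected films in each. The following is a minimal sketch under stated assumptions: `sample` is a small stand-in built from rows shown earlier in this article, not the full 2021 list, and the names `decade`, `films_per_decade`, and `share_per_decade` are hypothetical.

```python
import pandas as pd

# Stand-in for imdb_top_2021 built from rows shown earlier in this article.
sample = pd.DataFrame({
    "Title": ["The Shawshank Redemption", "The Godfather", "The Godfather: Part II",
              "The Dark Knight", "12 Angry Men", "The Battle of Algiers",
              "Nights of Cabiria", "The Princess Bride", "Raatchasan"],
    "Date": [1994, 1972, 1974, 2008, 1957, 1966, 1957, 1987, 2018],
})

# Map each release year to the first year of its decade, e.g. 1994 -> 1990.
decade = (sample["Date"] // 10) * 10

# Count the selected films per decade and express each count as a share of the total.
films_per_decade = sample.groupby(decade)["Title"].count()
share_per_decade = films_per_decade / films_per_decade.sum()

print(films_per_decade)
```

#### With the full `imdb_top_2021` frame in place of `sample`, the shares show how much of the list comes from older decades.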

In [11]:
# Number of selected movies by release year, as a line chart and a bar chart
x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()

x_ax = bqplot.Axis(scale=x_sc, label='Date')
y_ax = bqplot.Axis(scale=y_sc, label='Count', orientation='vertical')

tt = bqplot.Tooltip(fields=['x', 'y'], labels=['Date', 'Count'], show_labels=True)

line_title_count = bqplot.Lines(x=title_groupby_date2['Date'], y=title_groupby_date2['Count'], data=title_groupby_date2,
                                scales={'x': x_sc, 'y': y_sc}, tooltip=tt)

fig = bqplot.Figure(marks=[line_title_count], axes=[x_ax, y_ax], title='Number-Year')

bar_title_count = bqplot.Bars(x=title_groupby_date2['Date'], y=title_groupby_date2['Count'], data=title_groupby_date2,
                              scales={'x': x_sc, 'y': y_sc}, tooltip=tt)

fig2 = bqplot.Figure(marks=[bar_title_count], axes=[x_ax, y_ax], title='Number-Year')
display(fig, fig2)

[Figure: line chart "Number-Year" (x: Date, y: Count)]

[Figure: bar chart "Number-Year" (x: Date, y: Count)]

## Box office accumulated

#### Box office has always been the most direct and intuitive way to evaluate movies. Across nearly 100 years of film history, and despite changing times, inflation, and the influence of historical circumstances, box office figures can still indicate the quality of a film to a certain extent. The IMDB

In [12]:
# Top 10 movies by box office (Gross)
title_gross_top_10 = imdb_top_2021[['Title', 'Gross']].sort_values(by=['Gross'], ascending=False).head(10)
title_gross_top_10

Unnamed: 0,Title,Gross
6325,Avengers: Endgame,858.37
6312,Avengers: Infinity War,678.82
6253,The Dark Knight,534.86
6322,The Dark Knight Rises,448.14
6288,The Lion King,422.78
6371,Toy Story 3,415.0
6416,Jurassic Park,402.45
6464,Harry Potter and the Deathly Hallows: Part 2,381.01
6426,Finding Nemo,380.84
6256,The Lord of the Rings: The Return of the King,377.85


In [13]:
# Bar chart of the ten highest-grossing movies
x_sc = bqplot.OrdinalScale()
y_sc = bqplot.LinearScale()

x_ax = bqplot.Axis(scale=x_sc, tick_rotate=45, label='Title')
y_ax = bqplot.Axis(scale=y_sc, label='Gross', orientation='vertical')

tt = bqplot.Tooltip(fields=['x', 'y'], labels=['Title', 'Gross'])

bar_gross_top_10 = bqplot.Bars(x=title_gross_top_10['Title'], y=title_gross_top_10['Gross'],
                               scales={'x': x_sc, 'y': y_sc}, tooltip=tt)

fig = bqplot.Figure(marks=[bar_gross_top_10], axes=[x_ax, y_ax], title='Gross Top10')

display(fig)

[Figure: bar chart "Gross Top10" (x: Title, y: Gross)]

### Avengers: Endgame

![jupyter](https://upload.wikimedia.org/wikipedia/en/0/0d/Avengers_Endgame_poster.jpg)

## The Movie Rating

#### Movie ratings can differ widely depending on who takes part in the rating. Although many authoritative ratings come from professional film critics, public opinion should also be a part of film ratings. The chart is based on the top 250 movies on IMDB in 2021, and the number of movies shown can be adjusted with a slider.

In [14]:
import ipywidgets

# Slider that sets how many of the top-rated movies are shown
slider = ipywidgets.IntSlider(min=0, max=250, value=50, step=1)

def top_rated(n):
    # The n highest-rated movies from the 2021 list
    return imdb_top_2021[['Title', 'Rating']].sort_values(by=['Rating'], ascending=False).head(n)

title_rank_top_n = top_rated(slider.value)

# Scales, axes, and tooltip
x_sc = bqplot.OrdinalScale()
y_sc = bqplot.LinearScale()

x_ax = bqplot.Axis(scale=x_sc, tick_rotate=-45, label='Title')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Rating')

tt = bqplot.Tooltip(fields=['x', 'y'], labels=['Title', 'Rating'])
scatters3 = bqplot.Scatter(x=title_rank_top_n['Title'], y=title_rank_top_n['Rating'],
                           scales={'x': x_sc, 'y': y_sc}, tooltip=tt)

# Redraw the scatter plot whenever the slider value changes
def on_value_change(change):
    title_rank_top_n = top_rated(change['new'])
    # Send x and y to the front end together so their lengths always match
    with scatters3.hold_sync():
        scatters3.x = title_rank_top_n['Title']
        scatters3.y = title_rank_top_n['Rating']

slider.observe(on_value_change, names='value')
fig = bqplot.Figure(marks=[scatters3], axes=[x_ax, y_ax], title='Rating N')
ipywidgets.VBox([slider, fig])


[Interactive figure: slider (default 50, maximum 250) above a scatter plot "Rating N" (x: Title, y: Rating)]

### The Shawshank Redemption

![jupyter](https://upload.wikimedia.org/wikipedia/en/8/81/ShawshankRedemptionMoviePoster.jpg)

#### The data analysis also shows that the highest-grossing film is not necessarily the best rated, and a highly rated film may not do well at the box office. In the 2021 list, The Shawshank Redemption still has the highest rating. As mentioned before, a film's box office is affected by many factors, such as the epidemic of recent years, which has had a great impact on the film industry. When we look for movies to relax with, we should pay attention to both box office and movie reviews.
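
#### The relationship between box office and rating can also be checked numerically. The sketch below is one possible approach, not part of the original analysis: it finds the highest-grossing and the highest-rated film and computes a rank correlation between `Gross` and `Rating`. The `sample` frame is a small stand-in built from rows shown earlier in this article (an assumption for illustration), and the names `complete`, `top_gross`, `top_rated`, and `rank_corr` are hypothetical.

```python
import pandas as pd

# Stand-in for imdb_top_2021 built from rows shown earlier in this article.
sample = pd.DataFrame({
    "Title": ["The Shawshank Redemption", "The Godfather", "The Godfather: Part II",
              "The Dark Knight", "12 Angry Men", "The Battle of Algiers",
              "Nights of Cabiria", "The Princess Bride", "Raatchasan"],
    "Rating": [9.3, 9.2, 9.0, 9.0, 9.0, 8.1, 8.1, 8.1, 8.4],
    "Gross": [28.34, 134.97, 57.30, 534.86, 4.36, 0.06, 0.75, 30.86, None],
})

# Keep only the films that have a recorded gross.
complete = sample.dropna(subset=["Gross"])

# The highest-grossing film and the highest-rated film need not be the same.
top_gross = complete.loc[complete["Gross"].idxmax(), "Title"]
top_rated = complete.loc[complete["Rating"].idxmax(), "Title"]

# Rank correlation: the Pearson correlation of the two rank series.
rank_corr = complete["Gross"].rank().corr(complete["Rating"].rank())

print(top_gross, "|", top_rated, "|", round(rank_corr, 2))
```

#### A rank correlation is used here because gross figures are highly skewed; on the full data, the same steps would be applied to `imdb_top_2021`.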