## Dataset
#### https://www.kaggle.com/datasets/victorsoeiro/netflix-tv-shows-and-movies?select=titles.csv

1. Developing a content-based recommender system using the genres and/or descriptions.
2. Identifying the main content available on the streaming platform.
3. Network analysis on the cast of the titles.
4. Exploratory data analysis to find interesting insights.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
import plotly.express as py
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode, iplot
import cufflinks as cf
init_notebook_mode(connected=True)
cf.go_offline()

In [3]:
import plotly.io as pio
pio.renderers.default = "notebook_connected"

### Movie & Show Titles

In [4]:
title = pd.read_csv('titles.csv')
title.head()

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score
0,ts300399,Five Came Back: The Reference Films,SHOW,This collection includes 12 World War II-era p...,1945,TV-MA,51,['documentation'],['US'],1.0,,,,0.6,
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179
2,tm154986,Deliverance,MOVIE,Intent on seeing the Cahulawassee River before...,1972,R,109,"['drama', 'action', 'thriller', 'european']",['US'],,tt0068473,7.7,107673.0,10.01,7.3
3,tm127384,Monty Python and the Holy Grail,MOVIE,"King Arthur, accompanied by his squire, recrui...",1975,PG,91,"['fantasy', 'action', 'comedy']",['GB'],,tt0071853,8.2,534486.0,15.461,7.811
4,tm120801,The Dirty Dozen,MOVIE,12 American military prisoners in World War II...,1967,,150,"['war', 'action']","['GB', 'US']",,tt0061578,7.7,72662.0,20.398,7.6


In [5]:
title.shape

(5850, 15)

In [6]:
title.genres.value_counts()

['comedy']                                                       484
['documentation']                                                329
['drama']                                                        328
['comedy', 'drama']                                              135
['drama', 'romance']                                             124
                                                                ... 
['drama', 'war', 'action', 'thriller', 'history', 'european']      1
['thriller', 'crime', 'drama', 'western']                          1
['drama', 'scifi', 'fantasy', 'horror']                            1
['horror', 'fantasy', 'thriller']                                  1
['documentation', 'music', 'reality']                              1
Name: genres, Length: 1726, dtype: int64

In [7]:
title1 = title.assign(genre=title['genres'].str.split(",")).explode('genre')

In [8]:
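# Strip the leading "['" and trailing "']" left over from the list's string form.
# A raw string plus regex=True makes the pattern explicit and avoids the FutureWarning.
title1["genre"] = title1["genre"].str.replace(r"^\['|'\]$", "", regex=True)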

In [9]:
title1['genre'] = title1["genre"].str.replace("'","")
# title['country'] = title["country"].str.replace("'","")

In [10]:
title1['genre'] = title1['genre'].str.strip()
# title['country'] = title['country'].str.strip()
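
The split/replace/strip steps above work because each `genres` cell is the string form of a Python list. As an alternative sketch (not what this notebook runs), the string can be parsed into a real list with `ast.literal_eval` and then exploded. The `toy` frame below is made up for illustration and only assumes the same column format as `titles.csv`.

```python
import ast
import pandas as pd

# Toy frame mimicking the `genres` column: each cell is a list written as a string.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["['drama', 'crime']", "['comedy']", "[]"],
})

# Parse each string into a real Python list, then explode to one row per genre.
toy["genre"] = toy["genres"].apply(ast.literal_eval)
exploded = toy.explode("genre")

# Empty lists explode to NaN, so titles without a genre can be dropped with dropna().
exploded = exploded.dropna(subset=["genre"])
print(exploded[["title", "genre"]])
```

With this approach there are no stray quotes or brackets to clean up, and titles with an empty list show up as missing values rather than as a literal `[]` string.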

In [11]:
title1.head()

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score,genre
0,ts300399,Five Came Back: The Reference Films,SHOW,This collection includes 12 World War II-era p...,1945,TV-MA,51,['documentation'],['US'],1.0,,,,0.6,,documentation
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,drama
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,crime
2,tm154986,Deliverance,MOVIE,Intent on seeing the Cahulawassee River before...,1972,R,109,"['drama', 'action', 'thriller', 'european']",['US'],,tt0068473,7.7,107673.0,10.01,7.3,drama
2,tm154986,Deliverance,MOVIE,Intent on seeing the Cahulawassee River before...,1972,R,109,"['drama', 'action', 'thriller', 'european']",['US'],,tt0068473,7.7,107673.0,10.01,7.3,action


In [12]:
title1['genre'].value_counts()

drama            2968
comedy           2325
thriller         1228
action           1157
romance           971
documentation     952
crime             936
animation         705
family            682
fantasy           630
scifi             589
european          443
horror            378
music             262
history           254
reality           234
sport             170
war               163
[]                 59
western            41
Name: genre, dtype: int64

In [13]:
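# Titles with an empty genre list survive the cleaning as the literal string '[]'.
# Only the last expression in a cell is displayed, so the filter result below is not shown.
title1[title1['genre'] == '[]']
title1.head()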

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score,genre
0,ts300399,Five Came Back: The Reference Films,SHOW,This collection includes 12 World War II-era p...,1945,TV-MA,51,['documentation'],['US'],1.0,,,,0.6,,documentation
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,drama
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,crime
2,tm154986,Deliverance,MOVIE,Intent on seeing the Cahulawassee River before...,1972,R,109,"['drama', 'action', 'thriller', 'european']",['US'],,tt0068473,7.7,107673.0,10.01,7.3,drama
2,tm154986,Deliverance,MOVIE,Intent on seeing the Cahulawassee River before...,1972,R,109,"['drama', 'action', 'thriller', 'european']",['US'],,tt0068473,7.7,107673.0,10.01,7.3,action


In [14]:
# Remove rows whose genre list was empty; a boolean mask avoids dropping by (duplicated) index label
title1 = title1[title1['genre'] != '[]']

In [15]:
title1['genre'].value_counts()

drama            2968
comedy           2325
thriller         1228
action           1157
romance           971
documentation     952
crime             936
animation         705
family            682
fantasy           630
scifi             589
european          443
horror            378
music             262
history           254
reality           234
sport             170
war               163
western            41
Name: genre, dtype: int64
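
Removing rows from an exploded frame deserves some care, because every exploded row keeps the index label of its source row. The sketch below uses a made-up `toy` frame to show how dropping by label can remove more rows than intended, while a boolean mask removes only the matching rows. In this notebook the `'[]'` rows appear to come from titles with a single exploded row, so both approaches should give the same counts here.

```python
import pandas as pd

toy = pd.DataFrame({"title": ["A", "B"], "genre": [["drama", "bad"], ["comedy"]]})
exploded = toy.explode("genre")  # index labels are now [0, 0, 1]

# Dropping by label removes *both* rows labelled 0, not just the "bad" one.
by_label = exploded.drop(exploded[exploded["genre"] == "bad"].index)

# A boolean mask removes only the row that matches.
by_mask = exploded[exploded["genre"] != "bad"]

print(by_label["genre"].tolist())  # ['comedy']
print(by_mask["genre"].tolist())   # ['drama', 'comedy']
```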

In [16]:
title.groupby('release_year')['release_year'].count().reset_index(name="count")

Unnamed: 0,release_year,count
0,1945,1
1,1954,2
2,1956,1
3,1958,1
4,1959,1
...,...,...
58,2018,773
59,2019,836
60,2020,814
61,2021,787


In [17]:
countData = title['release_year'].value_counts().rename_axis("Release Year").reset_index(name="Count")
countData.head(1)
# countData.plot(kind='bar', figsize=(12, 5))

Unnamed: 0,Release Year,Count
0,2019,836


In [18]:
# countData.iplot(kind='bar', x="Release Year", y="Count", title="Total content Release based Year")

bar = py.bar(countData, x="Release Year", y="Count", title="Titles Released per Year")
bar.show()

### Plot Based on Release Year

In [19]:
# Count titles per release year and type
release_year = title.groupby(['release_year', 'type']).size().reset_index(name='count')

# Movie: keep the year and rename the count column
movie = release_year[release_year['type'] == 'MOVIE'].drop(columns='type').rename(columns={'count': 'Movie'})

# Show: same reshaping for shows
show = release_year[release_year['type'] == 'SHOW'].drop(columns='type').rename(columns={'count': 'Show'})

In [20]:
release_year = pd.merge(movie, show, on='release_year', how='inner')
release_year.head(1)

Unnamed: 0,release_year,Movie,Show
0,1969,1,1
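
Note that the inner merge keeps only the years that have at least one movie and at least one show, which is consistent with the first row being 1969 even though the data starts in 1945. If years with only one type should be kept, one possible alternative is to pivot the counts with `unstack(fill_value=0)`. The sketch below uses a small made-up frame with the same two column names.

```python
import pandas as pd

toy = pd.DataFrame({
    "release_year": [1945, 1969, 1969, 1976],
    "type": ["SHOW", "MOVIE", "SHOW", "MOVIE"],
})

# One row per year, one column per type; missing combinations become 0.
counts = (
    toy.groupby(["release_year", "type"])
       .size()
       .unstack(fill_value=0)
       .reset_index()
)
print(counts)
```

Here 1945 and 1976 stay in the result with a zero in the missing column, whereas an inner merge would drop both years.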


In [21]:
release_year.iplot(kind="bar", x="release_year", color=["dodgerblue", "mediumseagreen"], title="Release Year")

### Plot Based on Genre and Type

In [22]:
# Count titles per genre and type
genData = title1.groupby(['genre', 'type']).size().reset_index(name='count')

# Movie: keep the genre and rename the count column
movie = genData[genData['type'] == 'MOVIE'].drop(columns='type').rename(columns={'count': 'movie'})

# Show: same reshaping for shows
show = genData[genData['type'] == 'SHOW'].drop(columns='type').rename(columns={'count': 'show'})

In [23]:
genData = pd.merge(movie, show, on='genre', how='inner')
genData.head(1)

Unnamed: 0,genre,movie,show
0,action,718,439


In [24]:
genData.iplot(kind="bar", x="genre", color=["dodgerblue", "limegreen"], title="Genres")

### Plot Based on Genre, Type & IMDb Rating (What People Like Most)

In [25]:
# Keep only titles with an IMDb score above 5, then count per genre and type
genData = title1[title1['imdb_score'] > 5].groupby(['genre', 'type']).size().reset_index(name='count')

# Movie: keep the genre and rename the count column
movie = genData[genData['type'] == 'MOVIE'].drop(columns='type').rename(columns={'count': 'movie'})

# Show: same reshaping for shows
show = genData[genData['type'] == 'SHOW'].drop(columns='type').rename(columns={'count': 'show'})

genData = pd.merge(movie, show, on='genre', how='inner')
genData.head(1)

Unnamed: 0,genre,movie,show
0,action,584,407
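
Raw counts above a score threshold tend to favour the largest genres. For example, the two tables show 584 of 718 action movies scoring above 5, roughly 81% of all action movies whether rated or not. A share per genre may therefore be easier to compare than a count. The sketch below is one way to compute it on a made-up frame; it assumes unrated titles should be excluded from the denominator, which is a choice rather than something the notebook does.

```python
import pandas as pd

toy = pd.DataFrame({
    "genre": ["action", "action", "action", "war", "war"],
    "imdb_score": [6.1, 4.0, 7.5, 8.0, None],
})

# Exclude unrated rows, then take the mean of the boolean "score > 5" per genre.
rated = toy.dropna(subset=["imdb_score"])
share = (rated["imdb_score"] > 5).groupby(rated["genre"]).mean()
print(share)
```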


In [26]:
genData.iplot(kind="bar", x="genre", title="Genres", color=["dodgerblue", "limegreen"])

# Credit Dataset

In [27]:
credit = pd.read_csv('credits.csv')
credit.head()

Unnamed: 0,person_id,id,name,character,role
0,3748,tm84618,Robert De Niro,Travis Bickle,ACTOR
1,14658,tm84618,Jodie Foster,Iris Steensma,ACTOR
2,7064,tm84618,Albert Brooks,Tom,ACTOR
3,3739,tm84618,Harvey Keitel,Matthew 'Sport' Higgins,ACTOR
4,48933,tm84618,Cybill Shepherd,Betsy,ACTOR


In [28]:
credit.shape

(77801, 5)

In [29]:
credit.id.unique().shape

(5489,)

In [30]:
title[title['id'] == 'tm84618']

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179


In [31]:
credit[credit['id'] == 'tm84618'].head()

Unnamed: 0,person_id,id,name,character,role
0,3748,tm84618,Robert De Niro,Travis Bickle,ACTOR
1,14658,tm84618,Jodie Foster,Iris Steensma,ACTOR
2,7064,tm84618,Albert Brooks,Tom,ACTOR
3,3739,tm84618,Harvey Keitel,Matthew 'Sport' Higgins,ACTOR
4,48933,tm84618,Cybill Shepherd,Betsy,ACTOR


In [32]:
df = pd.merge(title1, credit, on='id', how='inner')
df.head()

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score,genre,person_id,name,character,role
0,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,drama,3748,Robert De Niro,Travis Bickle,ACTOR
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,drama,14658,Jodie Foster,Iris Steensma,ACTOR
2,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,drama,7064,Albert Brooks,Tom,ACTOR
3,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,drama,3739,Harvey Keitel,Matthew 'Sport' Higgins,ACTOR
4,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179,drama,48933,Cybill Shepherd,Betsy,ACTOR


In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 223223 entries, 0 to 223222
Data columns (total 20 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   id                    223223 non-null  object 
 1   title                 223223 non-null  object 
 2   type                  223223 non-null  object 
 3   description           223189 non-null  object 
 4   release_year          223223 non-null  int64  
 5   age_certification     144510 non-null  object 
 6   runtime               223223 non-null  int64  
 7   genres                223223 non-null  object 
 8   production_countries  223223 non-null  object 
 9   seasons               45961 non-null   float64
 10  imdb_id               217156 non-null  object 
 11  imdb_score            216381 non-null  float64
 12  imdb_votes            216219 non-null  float64
 13  tmdb_popularity       223203 non-null  float64
 14  tmdb_score            221139 non-null  float64
 15  

In [34]:
df.shape

(223223, 20)
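
The merged frame has 223,223 rows even though `credit` has only 77,801, because `title1` holds one row per title and genre: each credit is repeated once for every genre of its title. The sketch below reproduces this fan-out on made-up frames and shows one way to guard against it with the `validate` argument of `pd.merge`. The frame and column values are illustrative only.

```python
import pandas as pd

titles = pd.DataFrame({"id": ["t1", "t1", "t2"], "genre": ["drama", "crime", "comedy"]})
credits = pd.DataFrame({"id": ["t1", "t1", "t1", "t2"], "name": ["A", "B", "C", "D"]})

# t1: 2 genres x 3 credits = 6 rows; t2: 1 x 1 = 1 row
merged = pd.merge(titles, credits, on="id", how="inner")
print(len(merged))  # 7

# Merging one row per title keeps one row per credit.
# validate="one_to_many" raises MergeError if `id` is duplicated on the left side.
unique_titles = titles.drop_duplicates("id")[["id"]]
per_credit = pd.merge(unique_titles, credits, on="id", how="inner", validate="one_to_many")
print(len(per_credit))  # 4
```

Whether the genre-level or the credit-level frame is the right one depends on the question being asked; counts of people or characters usually want one row per credit.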

In [35]:
df.groupby(['character', 'type'])['type'].count()

character            type 
"Barton"             MOVIE    2
"Big Tony" Hamilton  MOVIE    2
"Birdy"              SHOW     5
"Buła"               MOVIE    3
"Czuły Roman"        MOVIE    2
                             ..
青雄                   SHOW     4
볼트                   SHOW     1
새미                   SHOW     1
월주(아역)               SHOW     4
츄우                   SHOW     2
Name: type, Length: 48609, dtype: int64

In [36]:
countChar = df.groupby(['character', 'type'])['type'].count()
# df[df[countChar] == 'Barton']
countChar

character            type 
"Barton"             MOVIE    2
"Big Tony" Hamilton  MOVIE    2
"Birdy"              SHOW     5
"Buła"               MOVIE    3
"Czuły Roman"        MOVIE    2
                             ..
青雄                   SHOW     4
볼트                   SHOW     1
새미                   SHOW     1
월주(아역)               SHOW     4
츄우                   SHOW     2
Name: type, Length: 48609, dtype: int64
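
Because `df` repeats every credit once per genre, the counts above are likely inflated: a character that appears in a single two-genre movie is counted twice. A minimal sketch of counting each credit once, on a made-up frame that reuses the notebook's column names:

```python
import pandas as pd

df_toy = pd.DataFrame({
    "id": ["t1", "t1", "t2"],
    "person_id": [1, 1, 2],
    "character": ["Barton", "Barton", "Birdy"],
    "type": ["MOVIE", "MOVIE", "SHOW"],
    "genre": ["drama", "crime", "comedy"],
})

# Naive count: one credit in a two-genre movie is counted twice.
naive = df_toy.groupby(["character", "type"])["type"].count()

# Collapse the per-genre duplicates so each credit is counted once.
credits_once = df_toy.drop_duplicates(subset=["id", "person_id", "character"])
deduped = credits_once.groupby(["character", "type"]).size()

print(naive[("Barton", "MOVIE")], deduped[("Barton", "MOVIE")])  # 2 1
```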

In [37]:
# Deduplicate a list by unpacking a set; set iteration order is arbitrary
mylist = ["a", "a", "b", "c", "c"]
[*set(mylist)]

['a', 'c', 'b']
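
The output order differs from the input order because sets are unordered. If first-seen order matters, one common alternative is `dict.fromkeys`, since dictionaries preserve insertion order:

```python
mylist = ["a", "a", "b", "c", "c"]

# dict keys are unique and keep insertion order, so duplicates collapse in place.
unique_in_order = list(dict.fromkeys(mylist))
print(unique_in_order)  # ['a', 'b', 'c']
```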

In [38]:
from copy import copy, deepcopy
name = "Swarup"
id(name)

140679530630256

In [39]:
# Shallow copy: strings are immutable, so copy() returns the same object (same id)
n = copy(name)
id(n)

140679530630256

In [40]:
# Deep copy: for an immutable string, deepcopy() also returns the same object
n1 = deepcopy(name)
id(n1)

140679530630256
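
All three ids match because a string is immutable, so neither `copy` nor `deepcopy` needs to create a new object. The difference between the two only becomes visible with mutable, nested data. A small sketch with a made-up nested list:

```python
from copy import copy, deepcopy

nested = [[1, 2], [3, 4]]
shallow = copy(nested)      # new outer list, same inner lists
deep = deepcopy(nested)     # new outer list and new inner lists

print(shallow is nested)        # False
print(shallow[0] is nested[0])  # True
print(deep[0] is nested[0])     # False

# Mutating an inner list is visible through the shallow copy but not the deep copy.
nested[0].append(99)
print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]
```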