In [2]:
import pandas as pd
from datetime import datetime

Load the dataset

In [3]:
netflix_titles = pd.read_csv("netflix_titles.csv")
netflix_titles.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


List all columns in the dataset

In [4]:
# getting the columns of the dataset
columns = list(netflix_titles.columns)
columns

['show_id',
 'type',
 'title',
 'director',
 'cast',
 'country',
 'date_added',
 'release_year',
 'rating',
 'duration',
 'listed_in',
 'description']

Check the fraction of missing values in each column using **isnull()** and **mean()**

In [5]:
print("Missing values distribution: ")
print(netflix_titles.isnull().mean())

Missing values distribution: 
show_id         0.000000
type            0.000000
title           0.000000
director        0.299080
cast            0.093675
country         0.094357
date_added      0.001135
release_year    0.000000
rating          0.000454
duration        0.000341
listed_in       0.000000
description     0.000000
dtype: float64
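`isnull()` returns a boolean frame, and averaging booleans gives the share of `True` values, which is why `.mean()` reports the fraction of missing entries per column. A minimal sketch on made-up data (the `toy` frame is an assumption for illustration and is not part of the Netflix dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from netflix_titles.csv
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
})

missing_fraction = toy.isnull().mean()  # share of missing values per column
missing_count = toy.isnull().sum()      # absolute number of missing values

print(missing_fraction)
print(missing_count)
```

Using `.sum()` instead of `.mean()` gives raw counts, which can be easier to read when only a handful of rows are affected.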


How should columns with missing values be handled?

1. Drop the column completely. If the column isn't important to your analysis, just drop it.
2. Keep the column. That is the choice made here, because the director, cast and country columns may be important for further analysis.
3. Imputation: the process of replacing missing data with substituted values.
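The three options can be sketched on a small made-up frame. The column names and values below are assumptions chosen for illustration, and the empty string is only one possible substitute value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", np.nan, "Y"],
    "notes": [np.nan, np.nan, "z"],
})

# 1. Drop a column that is not needed for the analysis
dropped = toy.drop(columns=["notes"])

# 2. Keep the column and leave the gaps as they are
kept = toy.copy()

# 3. Impute: replace missing values with a substitute (here an empty string),
#    only in the columns named in the dictionary
imputed = toy.fillna({"director": ""})

print(dropped.columns.tolist())
print(imputed["director"].tolist())
```

Passing a dictionary to `fillna` limits the imputation to the listed columns, so other columns keep their missing values.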

Check the datatype of each column in the dataset

In [6]:
print("Column datatypes: ")
print(netflix_titles.dtypes)

Column datatypes: 
show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object


Collect all columns with the same datatype in a list

In [7]:
# getting all the columns with string/mixed type values
str_cols = list(netflix_titles.columns)
str_cols.remove('release_year')

Remove the leading and trailing whitespace in the string columns

In [8]:
# removing leading and trailing whitespace from columns with str type
for i in str_cols:
    netflix_titles[i] = netflix_titles[i].str.strip()
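Instead of listing the string columns by hand, they could also be picked by dtype. This is a sketch on made-up data; `toy` and `text_cols` are hypothetical names, and the approach assumes every non-numeric column holds text:

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "title": ["  Zoom ", "Maron"],
    "release_year": [2006, 2016],
})

# Select every column that is not numeric
text_cols = [c for c in toy.columns if not pd.api.types.is_numeric_dtype(toy[c])]

# Strip leading and trailing whitespace in each text column
for col in text_cols:
    toy[col] = toy[col].str.strip()

print(toy["title"].tolist())
```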

Put all columns with NaN/missing values in a list

In [9]:
# names of the columns
columns = ['director', 'cast', 'country', 'rating', 'date_added']

Loop through the columns and replace the NaN entries with ""

In [10]:
for column in columns:
    netflix_titles[column] = netflix_titles[column].fillna("")

In [11]:
rows = []
for i in range(len(netflix_titles)):
    if netflix_titles['date_added'].iloc[i] == "":
        rows.append(i)

In [12]:
rows

[6066, 6174, 6795, 6806, 6901, 7196, 7254, 7406, 7847, 8182]
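The same row lookup can be written with a boolean mask rather than an explicit loop. A small sketch on made-up data (the `toy` frame is an assumption); with a default 0..n-1 index the labels it returns coincide with row positions:

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "date_added": ["July 1, 2019", "", "May 1, 2018", ""],
})

# True for every row whose date_added is the empty string
mask = toy["date_added"] == ""

# Index labels of the matching rows
empty_rows = toy.index[mask].tolist()
print(empty_rows)
```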

In [13]:
# examine those rows to confirm null state
netflix_titles.loc[rows, :]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
6066,s6067,TV Show,A Young Doctor's Notebook and Other Stories,,"Daniel Radcliffe, Jon Hamm, Adam Godley, Chris...",United Kingdom,,2013,TV-MA,2 Seasons,"British TV Shows, TV Comedies, TV Dramas","Set during the Russian Revolution, this comic ..."
6174,s6175,TV Show,Anthony Bourdain: Parts Unknown,,Anthony Bourdain,United States,,2018,TV-PG,5 Seasons,Docuseries,This CNN original series has chef Anthony Bour...
6795,s6796,TV Show,Frasier,,"Kelsey Grammer, Jane Leeves, David Hyde Pierce...",United States,,2003,TV-PG,11 Seasons,"Classic & Cult TV, TV Comedies",Frasier Crane is a snooty but lovable Seattle ...
6806,s6807,TV Show,Friends,,"Jennifer Aniston, Courteney Cox, Lisa Kudrow, ...",United States,,2003,TV-14,10 Seasons,"Classic & Cult TV, TV Comedies",This hit sitcom follows the merry misadventure...
6901,s6902,TV Show,Gunslinger Girl,,"Yuuka Nanri, Kanako Mitsuhashi, Eri Sendai, Am...",Japan,,2008,TV-14,2 Seasons,"Anime Series, Crime TV Shows","On the surface, the Social Welfare Agency appe..."
7196,s7197,TV Show,Kikoriki,,Igor Dmitriev,,,2010,TV-Y,2 Seasons,Kids' TV,A wacky rabbit and his gang of animal pals hav...
7254,s7255,TV Show,La Familia P. Luche,,"Eugenio Derbez, Consuelo Duval, Luis Manuel Áv...",United States,,2012,TV-14,3 Seasons,"International TV Shows, Spanish-Language TV Sh...","This irreverent sitcom featues Ludovico, Feder..."
7406,s7407,TV Show,Maron,,"Marc Maron, Judd Hirsch, Josh Brener, Nora Zeh...",United States,,2016,TV-MA,4 Seasons,TV Comedies,"Marc Maron stars as Marc Maron, who interviews..."
7847,s7848,TV Show,Red vs. Blue,,"Burnie Burns, Jason Saldaña, Gustavo Sorola, G...",United States,,2015,NR,13 Seasons,"TV Action & Adventure, TV Comedies, TV Sci-Fi ...","This parody of first-person shooter games, mil..."
8182,s8183,TV Show,The Adventures of Figaro Pho,,"Luke Jurevicius, Craig Behenna, Charlotte Haml...",Australia,,2015,TV-Y7,2 Seasons,"Kids' TV, TV Comedies","Imagine your worst fears, then multiply them: ..."


Check whether any new variables can be derived from existing ones

In [14]:
# extracting months added and years added
month_added = []
year_added = []

In [15]:
netflix_titles

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."


Put values into month_added and year_added lists

In [16]:
for i in range(len(netflix_titles)):
    # replacing NaN values with 0
    if i in rows:
        month_added.append(0)
        year_added.append(0)
    else:
        date = netflix_titles['date_added'].iloc[i].split(" ")
        month_added.append(date[0])
        year_added.append(int(date[2])) 

Convert month names into numbers

In [17]:
# turning month names into month numbers
for i, month in enumerate(month_added):
    if month != 0:
        datetime_obj = datetime.strptime(month, "%B")
        month_number = datetime_obj.month
        month_added[i] = month_number

Verify month_added and year_added

In [18]:
# checking all months
print(set(month_added))
print(set(year_added))

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
{2016, 2017, 2018, 2019, 2020, 2021, 0, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015}
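An alternative to splitting the strings and converting month names one by one is to parse the whole column with `pd.to_datetime`. The sketch below assumes the dates follow the "Month day, year" pattern seen in the data and that empty strings should map to 0, as in the loop above; the `dates` series is made up:

```python
import pandas as pd

# Hypothetical sample in the same format as date_added
dates = pd.Series(["September 25, 2021", "", "July 1, 2019"])

# Unparseable entries (such as the empty string) become NaT
parsed = pd.to_datetime(dates, format="%B %d, %Y", errors="coerce")

# Missing dates yield NaN for month and year, replaced here with 0
month_added = parsed.dt.month.fillna(0).astype(int)
year_added = parsed.dt.year.fillna(0).astype(int)

print(month_added.tolist())
print(year_added.tolist())
```

Keeping the parsed datetime column around, rather than only the month and year, may be useful if later analysis needs the full date.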


Inserting the month and year columns into the dataset

In [19]:
netflix_titles.insert(7, "month_added", month_added, allow_duplicates = True)
netflix_titles.insert(8, "year_added", year_added, allow_duplicates = True)
netflix_titles.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,month_added,year_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",9,2021,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",9,2021,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",9,2021,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",9,2021,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",9,2021,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


Separating the original dataset into a TV show dataset and a movie dataset

In [20]:
shows = []
films = []

# looping through the dataset to identify rows that are TV shows and films
for i in range(len(netflix_titles)):
    if netflix_titles['type'].iloc[i] == "TV Show":
        shows.append(i)
    else:
        films.append(i)
 
# grouping rows that are TV shows
netflix_shows = netflix_titles.loc[shows, :]

# grouping rows that are films
netflix_films = netflix_titles.loc[films, :]

# resetting the index of the new datasets to 0..n-1
netflix_shows = netflix_shows.reset_index(drop=True)
netflix_films = netflix_films.reset_index(drop=True)
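The split can also be expressed with a boolean mask, which avoids collecting row numbers in a loop. A minimal sketch on made-up data (`toy`, `toy_shows` and `toy_films` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "type": ["Movie", "TV Show", "Movie"],
})

# True for TV shows, False for everything else
is_show = toy["type"] == "TV Show"

# Filter with the mask and its negation, then renumber the rows from 0
toy_shows = toy[is_show].reset_index(drop=True)
toy_films = toy[~is_show].reset_index(drop=True)

print(toy_shows)
print(toy_films)
```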

In [38]:
netflix_shows

Unnamed: 0,show_id,type,title,director,cast,country,date_added,month_added,year_added,release_year,rating,seasons,listed_in,description
0,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",9,2021,2021,TV-MA,2,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
1,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",9,2021,2021,TV-MA,1,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
2,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",9,2021,2021,TV-MA,1,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
3,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",9,2021,2021,TV-MA,2,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
4,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",9,2021,2021,TV-MA,1,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2671,s8796,TV Show,Yu-Gi-Oh! Arc-V,,"Mike Liscio, Emily Bauer, Billy Bob Thompson, ...","Japan, Canada","May 1, 2018",5,2018,2015,TV-Y7,2,"Anime Series, Kids' TV",Now that he's discovered the Pendulum Summonin...
2672,s8797,TV Show,Yunus Emre,,"Gökhan Atalay, Payidar Tüfekçioglu, Baran Akbu...",Turkey,"January 17, 2017",1,2017,2016,TV-PG,2,"International TV Shows, TV Dramas","During the Mongol invasions, Yunus Emre leaves..."
2673,s8798,TV Show,Zak Storm,,"Michael Johnston, Jessica Gee-George, Christin...","United States, France, South Korea, Indonesia","September 13, 2018",9,2018,2016,TV-Y7,3,Kids' TV,Teen surfer Zak Storm is mysteriously transpor...
2674,s8801,TV Show,Zindagi Gulzar Hai,,"Sanam Saeed, Fawad Khan, Ayesha Omer, Mehreen ...",Pakistan,"December 15, 2016",12,2016,2012,TV-PG,1,"International TV Shows, Romantic TV Shows, TV ...","Strong-willed, middle-class Kashaf and carefre..."


In [39]:
netflix_films

Unnamed: 0,show_id,type,title,director,cast,country,date_added,month_added,year_added,release_year,rating,length,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",9,2021,2020,PG-13,90,Documentaries,"As her father nears the end of his life, filmm..."
1,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",9,2021,2021,PG,91,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
2,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",9,2021,1993,TV-MA,125,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
3,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",9,2021,2021,PG-13,104,"Comedies, Dramas",A woman adjusting to life after a loss contend...
4,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",9,2021,2021,TV-MA,127,"Dramas, International Movies",After most of her family is murdered in a terr...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6126,s8802,Movie,Zinzana,Majid Al Ansari,"Ali Suliman, Saleh Bakri, Yasa, Ali Al-Jabri, ...","United Arab Emirates, Jordan","March 9, 2016",3,2016,2015,TV-MA,96,"Dramas, International Movies, Thrillers",Recovering alcoholic Talal wakes up inside a s...
6127,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",11,2019,2007,R,158,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
6128,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",11,2019,2009,R,88,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
6129,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",1,2020,2006,PG,88,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."


In [21]:
# get length of movie or number of seasons of show
def getDuration(data):
    durations = []
    for value in data:
        # filling in missing values (NaN is a float) with 0
        if type(value) is float:
            durations.append(0)
        else:
            # "90 min" -> 90, "2 Seasons" -> 2
            values = value.split(" ")
            durations.append(int(values[0]))
    return durations
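The leading number can also be pulled out with a regular expression on the whole column. This is a sketch under the assumption that every non-missing value starts with digits, as in "90 min" or "2 Seasons"; the `durations` series is made up:

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing film lengths, season counts and a missing value
durations = pd.Series(["90 min", "2 Seasons", np.nan, "1 Season"])

# Capture the leading digits; missing values stay missing
numbers = durations.str.extract(r"^(\d+)", expand=False)

# Convert to numbers and use 0 for missing entries, like getDuration does
numbers = pd.to_numeric(numbers).fillna(0).astype(int)

print(numbers.tolist())
```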

In [22]:
# inserting new duration type column for shows (renamed column)
netflix_shows.insert(11, 'seasons', getDuration(netflix_shows['duration']))
netflix_shows = netflix_shows.drop(['duration'], axis=1)
netflix_shows.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,month_added,year_added,release_year,rating,seasons,listed_in,description
0,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",9,2021,2021,TV-MA,2,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
1,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",9,2021,2021,TV-MA,1,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
2,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",9,2021,2021,TV-MA,1,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
3,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",9,2021,2021,TV-MA,2,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
4,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",9,2021,2021,TV-MA,1,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...


In [23]:
# inserting new duration type column for films (renamed column)
netflix_films.insert(11, 'length', getDuration(netflix_films['duration']))
netflix_films = netflix_films.drop(['duration'], axis=1)
netflix_films.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,month_added,year_added,release_year,rating,length,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",9,2021,2020,PG-13,90,Documentaries,"As her father nears the end of his life, filmm..."
1,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",9,2021,2021,PG,91,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
2,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",9,2021,1993,TV-MA,125,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
3,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",9,2021,2021,PG-13,104,"Comedies, Dramas",A woman adjusting to life after a loss contend...
4,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",9,2021,2021,TV-MA,127,"Dramas, International Movies",After most of her family is murdered in a terr...


In [24]:
# getting the unique ratings for films
netflix_films['rating'].unique()

array(['PG-13', 'PG', 'TV-MA', 'TV-PG', 'TV-14', 'TV-Y', 'R', 'TV-G',
       'TV-Y7', 'G', 'NC-17', '74 min', '84 min', '66 min', 'NR', '',
       'TV-Y7-FV', 'UR'], dtype=object)

In [25]:
# getting the unique ratings for shows
netflix_shows['rating'].unique()

array(['TV-MA', 'TV-14', 'TV-Y7', 'TV-PG', 'TV-Y', 'TV-G', 'R', 'NR', '',
       'TV-Y7-FV'], dtype=object)

In [26]:
incorrect_ratings = ['74 min', '84 min', '66 min']
for i in range(len(netflix_films)):
    if netflix_films['rating'].iloc[i] in incorrect_ratings:
        print(netflix_films.iloc[i])
        print("")

show_id                                                     s5542
type                                                        Movie
title                                             Louis C.K. 2017
director                                               Louis C.K.
cast                                                   Louis C.K.
country                                             United States
date_added                                          April 4, 2017
month_added                                                     4
year_added                                                   2017
release_year                                                 2017
rating                                                     74 min
length                                                          0
listed_in                                                  Movies
description     Louis C.K. muses on religion, eternal love, gi...
Name: 3562, dtype: object

show_id                                          

In [27]:
# getting the row indices
index = [3562, 3738, 3747]

# fixing the entries: the rating column holds the duration for these rows
for i in index:
    split_value = netflix_films.loc[i, 'rating'].split(" ")
    length = int(split_value[0])
    # one .loc call with row and column avoids chained assignment;
    # the index was reset earlier, so labels match positions
    netflix_films.loc[i, 'length'] = length
    netflix_films.loc[i, 'rating'] = "NR"
    
# double checking the entries again
for i in index:
    print(netflix_films.iloc[i])

show_id                                                     s5542
type                                                        Movie
title                                             Louis C.K. 2017
director                                               Louis C.K.
cast                                                   Louis C.K.
country                                             United States
date_added                                          April 4, 2017
month_added                                                     4
year_added                                                   2017
release_year                                                 2017
rating                                                         NR
length                                                         74
listed_in                                                  Movies
description     Louis C.K. muses on religion, eternal love, gi...
Name: 3562, dtype: object
show_id                                           


In [28]:
# fixing the entries: relabel "UR" as "NR" so both use one label
netflix_films['rating'] = netflix_films['rating'].replace("UR", "NR")

# double checking
netflix_films['rating'].unique()


array(['PG-13', 'PG', 'TV-MA', 'TV-PG', 'TV-14', 'TV-Y', 'R', 'TV-G',
       'TV-Y7', 'G', 'NC-17', 'NR', '', 'TV-Y7-FV'], dtype=object)
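Cell-level corrections like the rating fixes above are safest when the row and the column are given in a single `.loc` call. Assigning through `df[col].iloc[i] = value` is chained indexing, and pandas may then write to a temporary copy instead of the frame itself. A minimal sketch on made-up data (the `toy` frame and its values are assumptions):

```python
import pandas as pd

# Hypothetical toy data: one row has its duration stored in the rating column
toy = pd.DataFrame({
    "rating": ["PG", "74 min", "UR"],
    "length": [90, 0, 88],
})

# Rows whose rating is really a duration such as "74 min"
bad = toy["rating"].str.endswith(" min")

# Move the number into length, then mark the rating as not rated
toy.loc[bad, "length"] = toy.loc[bad, "rating"].str.split(" ").str[0].astype(int)
toy.loc[bad, "rating"] = "NR"

# Merge the "UR" label into "NR"
toy.loc[toy["rating"] == "UR", "rating"] = "NR"

print(toy)
```

Selecting the bad rows with a mask also removes the need to hard-code row numbers, which would change if the data were re-sorted or filtered.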

In [29]:
# function to get unique values of a column
def getUnique(data):
    unique_values = set()
    for value in data:
        if type(value) is float:
            unique_values.add(None)
        else:
            values = value.split(", ")
            for i in values:
                unique_values.add(i)
    return list(unique_values)
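For columns without missing values, a similar result can be obtained with `str.split` and `explode`. This is a sketch of an alternative, not a replacement for `getUnique`; the `countries` series is made up:

```python
import pandas as pd

# Hypothetical sample of comma-separated values
countries = pd.Series(["United States, Ghana", "India", "", "Ghana"])

unique_values = (
    countries.str.split(", ")  # each entry becomes a list of names
    .explode()                 # one row per name
    .unique()                  # distinct names in order of appearance
    .tolist()
)

print(unique_values)
```

As with `getUnique`, an empty entry shows up as an empty string in the result.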

In [30]:
unique_countries = getUnique(netflix_titles['country'])
unique_countries

['',
 'Lebanon',
 'Malta',
 'Australia',
 'Cambodia',
 'Mexico',
 'Guatemala',
 'Argentina',
 'Indonesia',
 'Denmark',
 'Vietnam',
 'Slovenia',
 'Italy',
 'Norway',
 'Poland,',
 'Mozambique',
 'Canada',
 'Mongolia',
 'South Korea',
 'Uruguay',
 'Chile',
 'Venezuela',
 'Cayman Islands',
 'Paraguay',
 'Netherlands',
 'Puerto Rico',
 'Kenya',
 'Iraq',
 'Spain',
 'Botswana',
 'Kazakhstan',
 'Ghana',
 'Uganda',
 'Zimbabwe',
 'New Zealand',
 'Serbia',
 'United Arab Emirates',
 'Bermuda',
 'Singapore',
 'India',
 'Slovakia',
 'Malawi',
 'Palestine',
 'Bulgaria',
 'China',
 'Ireland',
 'Namibia',
 'Turkey',
 'Russia',
 'Ecuador',
 'Iran',
 'France',
 'Cameroon',
 'Ethiopia',
 'Israel',
 'Sweden',
 'Mauritius',
 'Peru',
 'Germany',
 'Senegal',
 'Qatar',
 'Montenegro',
 'Lithuania',
 'Syria',
 'Latvia',
 'Panama',
 'Austria',
 'Nicaragua',
 'West Germany',
 'Jamaica',
 'Afghanistan',
 'East Germany',
 'Brazil',
 'Greece',
 'Soviet Union',
 'Kuwait',
 'Vatican City',
 'Ukraine',
 'Hungary',
 'Alg

In [31]:
# converting soviet union to russia and east/west germany to germany
for i in range(len(netflix_titles)):
    if type(netflix_titles['country'].iloc[i]) is not float:
        countries = netflix_titles['country'].iloc[i].split(", ")
        for j in range(len(countries)):
            if "Germany" in countries[j]:
                countries[j] = "Germany"
            elif "Soviet Union" in countries[j]:
                countries[j] = "Russia"
        # one .loc call instead of chained ['country'].iloc[i] assignment;
        # netflix_titles still has its default 0..n-1 index
        netflix_titles.loc[i, 'country'] = ", ".join(countries)
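The renaming rules can also be kept in a dictionary and applied to each comma-separated entry. The sketch below uses exact name matches rather than the substring test in the loop, and `RENAMES` and `normalize_countries` are hypothetical names; which historical names to merge remains a modelling choice:

```python
import pandas as pd

# Hypothetical mapping from historical names to present-day names
RENAMES = {
    "West Germany": "Germany",
    "East Germany": "Germany",
    "Soviet Union": "Russia",
}

def normalize_countries(value):
    """Rename each comma-separated country according to RENAMES."""
    parts = value.split(", ")
    return ", ".join(RENAMES.get(part, part) for part in parts)

# Made-up sample values
countries = pd.Series(["West Germany, France", "Soviet Union", "", "India"])
normalized = countries.apply(normalize_countries)

print(normalized.tolist())
```

Adding another rule then only requires a new dictionary entry.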

In [32]:
# getting unique film genres
unique_genres_films = getUnique(netflix_films['listed_in'])
unique_genres_films

['Romantic Movies',
 'Sports Movies',
 'Cult Movies',
 'Documentaries',
 'Horror Movies',
 'Sci-Fi & Fantasy',
 'Children & Family Movies',
 'Thrillers',
 'Faith & Spirituality',
 'Stand-Up Comedy',
 'Dramas',
 'Action & Adventure',
 'LGBTQ Movies',
 'International Movies',
 'Music & Musicals',
 'Movies',
 'Classic Movies',
 'Independent Movies',
 'Anime Features',
 'Comedies']

In [33]:
# getting unique show genres
unique_genres_shows = getUnique(netflix_shows['listed_in'])
unique_genres_shows

['TV Horror',
 'Stand-Up Comedy & Talk Shows',
 'International TV Shows',
 'TV Sci-Fi & Fantasy',
 'Teen TV Shows',
 'TV Action & Adventure',
 'Crime TV Shows',
 'Korean TV Shows',
 'Reality TV',
 'TV Shows',
 'Classic & Cult TV',
 'Anime Series',
 'TV Dramas',
 'British TV Shows',
 'Romantic TV Shows',
 'TV Thrillers',
 'Science & Nature TV',
 'Docuseries',
 'TV Mysteries',
 'TV Comedies',
 'Spanish-Language TV Shows',
 "Kids' TV"]

In [34]:
# checking for TV shows
# replace netflix_shows with netflix_films to check for movies
count = 0
index = []
for i, value in enumerate(netflix_shows['listed_in']):
    genres = value.split(", ")
    if "TV Shows" in genres:
        count += 1
        index.append(i)
print("count %s" %count)
print("index %s" %index)

count 16
index [59, 110, 272, 286, 452, 599, 991, 1432, 1548, 1808, 1840, 2107, 2160, 2190, 2465, 2559]
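The same check can be run on the whole column at once. Splitting before testing matters because a genre such as "International TV Shows" contains "TV Shows" as a substring; the sketch below, on a made-up `genres` series, only counts exact list elements:

```python
import pandas as pd

# Hypothetical sample of listed_in values
genres = pd.Series([
    "TV Shows",
    "TV Dramas, International TV Shows",
    "Docuseries, TV Shows",
])

# True where "TV Shows" appears as its own genre, not as part of another name
has_generic = genres.str.split(", ").apply(lambda parts: "TV Shows" in parts)

# Row positions and total number of matches
positions = [i for i, flag in enumerate(has_generic) if flag]
count = int(has_generic.sum())

print("count %s" % count)
print("index %s" % positions)
```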


In [35]:
# printing the first 5 rows that have TV Shows as a genre
netflix_shows.iloc[index[0:5]]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,month_added,year_added,release_year,rating,seasons,listed_in,description
59,s149,TV Show,HQ Barbers,Gerhard Mostert,"Hakeem Kae-Kazim, Chioma Omeruah, Orukotan Ade...",,"September 1, 2021",9,2021,2020,TV-14,1,TV Shows,When a family run barber shop in the heart of ...
110,s298,TV Show,Navarasa,"Bejoy Nambiar, Priyadarshan, Karthik Narain, V...","Suriya, Vijay Sethupathi, Revathy, Prakash Raj...",India,"August 6, 2021",8,2021,2021,TV-MA,1,TV Shows,"From amusement to awe, the nine human emotions..."
272,s727,TV Show,Metallica: Some Kind of Monster,"Joe Berlinger, Bruce Sinofsky","James Hetfield, Lars Ulrich, Kirk Hammett, Rob...",United States,"June 13, 2021",6,2021,2014,TV-MA,1,TV Shows,This collection includes the acclaimed rock do...
286,s772,TV Show,Pretty Guardian Sailor Moon Eternal The Movie,Chiaki Kon,"Kotono Mitsuishi, Hisako Kanemoto, Rina Satou,...",,"June 3, 2021",6,2021,2021,TV-14,1,TV Shows,When a dark power enshrouds the Earth after a ...
452,s1332,TV Show,Five Came Back: The Reference Films,,,United States,"February 9, 2021",2,2021,1945,TV-MA,1,TV Shows,This collection includes 12 World War II-era p...


In [36]:
count = 0
index = []
for i, value in enumerate(netflix_films['listed_in']):
    genres = value.split(", ")
    if "Movies" in genres:
        count += 1
        index.append(i)
print("count %s" %count)
print("index %s" %index)

count 57
index [197, 310, 456, 457, 458, 476, 477, 1906, 1938, 1941, 2146, 2165, 2621, 2711, 2758, 2862, 2863, 2867, 3036, 3137, 3138, 3139, 3140, 3141, 3142, 3225, 3226, 3228, 3232, 3517, 3562, 3652, 3694, 3722, 3738, 3747, 3789, 3824, 3883, 4271, 4273, 4543, 4544, 4784, 4910, 4911, 5006, 5178, 5259, 5290, 5292, 5293, 5295, 5476, 5477, 5478, 6092]


In [37]:
# printing the first 5 rows that have Movies as a genre
netflix_films.iloc[index[0:5]]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,month_added,year_added,release_year,rating,length,listed_in,description
197,s309,Movie,American Masters: Inventing David Geffen,Susan Lacy,David Geffen,United States,"August 4, 2021",8,2021,2012,TV-MA,115,Movies,"The son of Jewish immigrants, David Geffen eme..."
310,s471,Movie,Bridgerton - The Afterparty,,"David Spade, London Hughes, Fortune Feimster",,"July 13, 2021",7,2021,2021,TV-14,39,Movies,"""Bridgerton"" cast members share behind-the-sce..."
456,s730,Movie,Bling Empire - The Afterparty,,"David Spade, London Hughes, Fortune Feimster",,"June 12, 2021",6,2021,2021,TV-MA,36,Movies,"The stars of ""Bling Empire"" discuss the show's..."
457,s731,Movie,Cobra Kai - The Afterparty,,"David Spade, London Hughes, Fortune Feimster",,"June 12, 2021",6,2021,2021,TV-MA,34,Movies,"Ralph Macchio, William Zabka and more from the..."
458,s733,Movie,To All the Boys: Always and Forever - The Afte...,,"Cast members of the ""To All the Boys"" films di...",,"June 12, 2021",6,2021,2021,TV-MA,36,Movies,"Cast members of the ""To All the Boys"" films di..."
