<a href="https://colab.research.google.com/github/gauriagarwal18/NETFLIX_MOVIES_AND_TV_SHOWS_CLUSTERING/blob/master/Netflix_Movies_And_Tv_Shows_Clustering_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Problem Statement**

This dataset consists of TV shows and movies available on Netflix as of 2019. The data was collected from Flixable, a third-party Netflix search engine.

In 2018, Flixable released a report showing that the number of TV shows on Netflix has nearly tripled since 2010, while the number of movies has decreased by more than 2,000 titles over the same period. It will be interesting to explore what other insights can be obtained from the same dataset.

Integrating this dataset with external datasets such as IMDb ratings and Rotten Tomatoes scores could also provide many interesting findings.

## <b>In this project, you are required to do the following</b>
1. Exploratory data analysis

2. Understand what type of content is available in different countries

3. Determine whether Netflix has increasingly focused on TV shows rather than movies in recent years

4. Cluster similar content by matching text-based features
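
As a rough illustration of the question about TV shows versus movies, the share of TV shows among titles added each year can be computed with a row-normalised crosstab. This is a minimal sketch on made-up toy data, not the notebook's actual analysis; it assumes a `year_added` column like the one derived from `date_added` later in this notebook.

```python
import pandas as pd

# Toy stand-in for the catalogue; the values are invented for illustration only.
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "year_added": [2016, 2016, 2016, 2019, 2019, 2019],
})

# Row-normalised crosstab: each row holds the share of each type for that year.
share_by_year = pd.crosstab(toy["year_added"], toy["type"], normalize="index")

# A rising "TV Show" share over the years would suggest a growing focus on TV.
tv_share = share_by_year["TV Show"]
print(tv_share)
```

On the real data the same two columns would be taken from the cleaned DataFrame instead of `toy`.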



# **Attribute Information**

1. show_id : Unique ID for every Movie / TV Show

2. type : Identifier - A Movie or TV Show

3. title : Title of the Movie / TV Show

4. director : Director of the Movie

5. cast : Actors involved in the movie / show

6. country : Country where the movie / show was produced

7. date_added : Date it was added on Netflix

8. release_year : Actual release year of the movie / show

9. rating : TV Rating of the movie / show

10. duration : Total Duration - in minutes or number of seasons

11. listed_in : Genre

12. description: The Summary description
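
Because `duration` mixes two units (minutes for movies, seasons for TV shows), it may help to split it into a number and a unit before any numeric analysis. The sketch below is one possible way to do that with a regular expression; the column names `value` and `unit` are hypothetical and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Sample values in the same format as the duration column.
durations = pd.Series(["4 Seasons", "93 min", "1 Season", "123 min"])

# Split "<number> <unit>" into a numeric part and a unit label.
parts = durations.str.extract(r"^(?P<value>\d+)\s+(?P<unit>\w+)$")
parts["value"] = parts["value"].astype(int)

# Normalise "Season" / "Seasons" to a single lower-case label.
parts["unit"] = parts["unit"].str.lower().str.rstrip("s")
print(parts)
```

Keeping the unit next to the number avoids accidentally comparing minutes with seasons.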

In [None]:
import numpy as np
import scipy
import pandas as pd
import math
import random
import sklearn
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse.linalg import svds
from sklearn.decomposition import LatentDirichletAllocation
import matplotlib.pyplot as plt

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
path = '/content/drive/MyDrive/AlmaBetter_Capstone_projects/capstone_project4/Copy of NETFLIX MOVIES AND TV SHOWS CLUSTERING.csv'
netflix_original = pd.read_csv(path,parse_dates=[6])

In [None]:
netflix_data = netflix_original.copy()

In [None]:
netflix_data.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,2020-08-14,2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,2016-12-23,2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,2018-12-20,2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,2017-11-16,2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,2020-01-01,2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...


In [None]:
netflix_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7787 entries, 0 to 7786
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   show_id       7787 non-null   object        
 1   type          7787 non-null   object        
 2   title         7787 non-null   object        
 3   director      5398 non-null   object        
 4   cast          7069 non-null   object        
 5   country       7280 non-null   object        
 6   date_added    7777 non-null   datetime64[ns]
 7   release_year  7787 non-null   int64         
 8   rating        7780 non-null   object        
 9   duration      7787 non-null   object        
 10  listed_in     7787 non-null   object        
 11  description   7787 non-null   object        
dtypes: datetime64[ns](1), int64(1), object(10)
memory usage: 730.2+ KB


In [None]:
netflix_data.describe(include = 'all')

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
count,7787,7787,7787,5398,7069,7280,7777,7787.0,7780,7787,7787,7787
unique,7787,2,7787,4049,6831,681,1512,,14,216,492,7769
top,s1,Movie,3%,"Raúl Campos, Jan Suter",David Attenborough,United States,2020-01-01 00:00:00,,TV-MA,1 Season,Documentaries,Multiple women report their husbands as missin...
freq,1,5377,1,18,18,2555,119,,2863,1608,334,3
first,,,,,,,2008-01-01 00:00:00,,,,,
last,,,,,,,2021-01-16 00:00:00,,,,,
mean,,,,,,,,2013.93258,,,,
std,,,,,,,,8.757395,,,,
min,,,,,,,,1925.0,,,,
25%,,,,,,,,2013.0,,,,


## Data cleaning

In [None]:
netflix=netflix_data

In [None]:
from datetime import datetime
from datetime import date

In [None]:
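def separate_date(date_time):
  """Split a datetime column into lists of years, months and days (NaN where the date is missing)."""
  years,months,dates=[],[],[]
  for i in date_time:
    years.append(i.year)
    months.append(i.month)
    dates.append(i.day)
  return years,months,dates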
  

In [None]:
years,months,dates=separate_date(netflix["date_added"])
netflix["year_added"]=years
netflix["month_added"]=months

In [None]:
netflix.drop(["director","date_added","show_id"], axis=1, inplace=True)

In [None]:
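#missing cast entries cannot be guessed, so fill them with an empty string
netflix["cast"]=netflix["cast"].fillna("")
#columns whose nulls will be replaced by the most frequent value
count_replace=["country","rating","year_added","month_added"]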

In [None]:
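#.copy() keeps each subset independent of netflix, so in-place cleaning does not trigger SettingWithCopyWarning
netflix_shows=netflix[netflix["type"]=="TV Show"].copy()
netflix_movies=netflix[netflix["type"]=="Movie"].copy()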

In [None]:
#split the columns into numeric features and object-typed (categorical/text) features
continuous_features=list(netflix.describe().columns)
categorical_features=list(netflix.describe(include="object").columns)


In [None]:
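def print_null_percent(df):
  """Print the percentage of null values for every column that has any."""
  null_percent=pd.Series(dtype="float64")  #explicit dtype avoids the empty-Series warning
  for col in df.columns:
    null_percent[col]=((df.shape[0]-df[col].count())/(df.shape[0]))*100
  print("columns with null values\n",null_percent[null_percent!=0])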


In [None]:

def cleaning(df,continuous_col=[],discrete_col=[],print_null=True,th=20.0):
  """
  Drop sparse columns/rows and duplicate rows, then fill the remaining nulls:
  mean for columns in continuous_col, most frequent value for every other column.
  discrete_col is accepted for readability but is not used.
  """

  print(f"before cleaning\n")
  print(f"shape of data: {df.shape}")
  if(print_null):
    print_null_percent(df)
  
  #step1
  #keep columns in which at least th% of the values are not null
  df.dropna(axis=1,inplace=True,thresh=((th/100.0)*df.shape[0]))
  #keep rows in which at least th% of the values are not null
  df.dropna(axis=0,inplace=True,thresh=((th/100.0)*df.shape[1]))

  #step2
  #remove duplicate rows and renumber the index
  df.drop_duplicates(inplace=True,ignore_index=True)
  

  #step3
  #fill the remaining null values
  for c1 in df.columns:

    #continuous column: fill with the mean
    if c1 in continuous_col: 
      df[c1]=df[c1].fillna(df[c1].mean())
    else:
      #any other column: fill with the most frequent value
      df[c1]=df[c1].fillna(df[c1].value_counts().idxmax())

  print(f"\n\nAfter cleaning the data\n")
  print(f"shape of data: {df.shape}")
  print_null_percent(df)
  return df

In [None]:
netflix_shows=cleaning(netflix_shows,[],count_replace,th=20)

before cleaning

shape of data: (2410, 11)
columns with null values
 country        11.493776
rating          0.082988
year_added      0.414938
month_added     0.414938
dtype: float64


After cleaning the data

shape of data: (2410, 11)
columns with null values
 Series([], dtype: float64)


  


In [None]:
netflix_movies=cleaning(netflix_movies,[],count_replace,th=20)

before cleaning

shape of data: (5377, 11)
columns with null values
 country    4.277478
rating     0.092989
dtype: float64


After cleaning the data

shape of data: (5377, 11)
columns with null values
 Series([], dtype: float64)


  


In [None]:
netflix=pd.concat([netflix_shows,netflix_movies])
netflix

Unnamed: 0,type,title,cast,country,release_year,rating,duration,listed_in,description,year_added,month_added
0,TV Show,3%,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...,2020.0,8.0
1,TV Show,46,"Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan...",Turkey,2016,TV-MA,1 Season,"International TV Shows, TV Dramas, TV Mysteries",A genetics professor experiments with a treatm...,2017.0,7.0
2,TV Show,1983,"Robert Więckiewicz, Maciej Musiał, Michalina O...","Poland, United States",2018,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Dramas","In this dark alt-history thriller, a naïve law...",2018.0,11.0
3,TV Show,1994,,Mexico,2019,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...",Archival video and new interviews examine Mexi...,2019.0,5.0
4,TV Show,Feb-09,"Shahd El Yaseen, Shaila Sabt, Hala, Hanadi Al-...",United States,2018,TV-14,1 Season,"International TV Shows, TV Dramas","As a psychology professor faces Alzheimer's, h...",2019.0,3.0
...,...,...,...,...,...,...,...,...,...,...,...
5372,Movie,Zoom,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",2020.0,1.0
5373,Movie,Zozo,"Imad Creidi, Antoinette Turk, Elias Gergi, Car...","Sweden, Czech Republic, United Kingdom, Denmar...",2005,TV-MA,99 min,"Dramas, International Movies",When Lebanon's Civil War deprives Zozo of his ...,2020.0,10.0
5374,Movie,Zubaan,"Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...",India,2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...,2019.0,3.0
5375,Movie,Zulu Man in Japan,Nasty C,United States,2019,TV-MA,44 min,"Documentaries, International Movies, Music & M...","In this documentary, South African rapper Nast...",2020.0,9.0


In [None]:
netflix_data=netflix

## Reading Data And Importing Libraries Required For Analysis


**We are using the following libraries for analysis:**
- NumPy: NumPy arrays are comparatively faster than Python lists for numerical work, and DataFrame columns behave like NumPy arrays

- pandas: for reading the data from the CSV file, cleaning it, and preparing it for analysis

- Matplotlib, seaborn: for visualisations, exploratory data analysis, and drawing conclusions from the data.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
#Mount Google Drive and load the dataset
from google.colab import drive
drive.mount('/content/drive')
path="/content/drive/My Drive/AlmaBetter_Capstone_projects/capstone_project4/Copy of NETFLIX MOVIES AND TV SHOWS CLUSTERING.csv"
#column 6 (date_added) is parsed as a datetime
netflix_original=pd.read_csv(path,parse_dates=[6])
#work on a copy so the original data stays untouched
netflix= netflix_original.copy()

Mounted at /content/drive


In [None]:
netflix["type"].value_counts()

Movie      5377
TV Show    2410
Name: type, dtype: int64

Before any analysis or cleaning, note that the dataset contains two types of content: movies and TV shows. The two types have different characteristics, so we will separate them and handle each on its own.
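
A minimal sketch of such a split, on made-up toy data, is shown here. Calling `.copy()` on each subset is a suggestion rather than something the cells in this notebook do; it gives every subset its own data, so later fills do not affect the source frame or raise `SettingWithCopyWarning`. The `"Unknown"` fill value is a placeholder for illustration only.

```python
import pandas as pd

# Toy data; titles and countries are invented for illustration.
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "title": ["A", "B", "C"],
    "country": ["India", None, None],
})

# Boolean masks select each type; .copy() makes the subsets independent of `toy`.
movies = toy[toy["type"] == "Movie"].copy()
shows = toy[toy["type"] == "TV Show"].copy()

# Filling a subset leaves the original frame untouched.
movies["country"] = movies["country"].fillna("Unknown")
print(movies)
print(toy["country"].isna().sum())
```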

## Data Cleaning And Description
Describing the data and understanding the distribution of the columns.

Cleaning the data: removing null values if present, and removing duplicates and outliers.
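
For the outlier step, a common rule is to keep only values that lie strictly between `Q1 - 1.5*IQR` and `Q3 + 1.5*IQR`. The sketch below illustrates that rule on a small made-up array; the variable names are hypothetical.

```python
import numpy as np

# Made-up sample with one obvious outlier.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100], dtype=float)

# Quartiles and interquartile range.
q1, q3 = np.quantile(values, [0.25, 0.75])
iqr = q3 - q1

# Fences: anything on or beyond them is treated as an outlier.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
kept = values[(values > lower) & (values < upper)]
print(q1, q3, lower, upper)
print(kept)
```

Here the quartiles are 3 and 7, so the fences are -3 and 13 and only the value 100 is dropped.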

### Data Description

In [None]:
#look at the first 5 rows to get an idea of what type of data each column holds
netflix.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,2020-08-14,2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,2016-12-23,2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,2018-12-20,2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,2017-11-16,2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,2020-01-01,2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...


In [None]:
#we must note the info separately for both types as well
netflix.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7787 entries, 0 to 7786
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   show_id       7787 non-null   object        
 1   type          7787 non-null   object        
 2   title         7787 non-null   object        
 3   director      5398 non-null   object        
 4   cast          7069 non-null   object        
 5   country       7280 non-null   object        
 6   date_added    7777 non-null   datetime64[ns]
 7   release_year  7787 non-null   int64         
 8   rating        7780 non-null   object        
 9   duration      7787 non-null   object        
 10  listed_in     7787 non-null   object        
 11  description   7787 non-null   object        
dtypes: datetime64[ns](1), int64(1), object(10)
memory usage: 730.2+ KB


From this we draw the following conclusions:
- director is missing for most TV shows, so this column is of limited use and we will drop it.
- null values in cast will be replaced by an empty string, as the cast cannot be guessed.
- null values in the country and rating columns will be replaced by the most frequent country and rating, respectively.
- for the date_added column, we will first split it into three sub-columns (year, month and day) and then work with those.
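
The plan above can be sketched on a small made-up frame as follows. This is an illustrative outline under the assumption that `date_added` is already a datetime column; it uses the vectorised `.dt` accessor, which is one alternative to looping over the dates, and `day_added` is a hypothetical column name.

```python
import pandas as pd

# Toy data with the same kinds of gaps as the real columns.
toy = pd.DataFrame({
    "cast": ["A, B", None, "C"],
    "country": ["India", "India", None],
    "rating": ["TV-MA", None, "TV-MA"],
    "date_added": pd.to_datetime(["2020-08-14", None, "2016-12-23"]),
})

# cast: an empty string stands in for an unknown cast.
toy["cast"] = toy["cast"].fillna("")

# country and rating: fill with the most frequent value (the mode).
for col in ["country", "rating"]:
    toy[col] = toy[col].fillna(toy[col].mode().iloc[0])

# date_added: split into year, month and day; missing dates become NaN.
toy["year_added"] = toy["date_added"].dt.year
toy["month_added"] = toy["date_added"].dt.month
toy["day_added"] = toy["date_added"].dt.day
print(toy)
```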
 

In [None]:
netflix.describe()

Unnamed: 0,release_year
count,7787.0
mean,2013.93258
std,8.757395
min,1925.0
25%,2013.0
50%,2017.0
75%,2018.0
max,2021.0


In [None]:
netflix.describe(include="object")


Unnamed: 0,show_id,type,title,director,cast,country,rating,duration,listed_in,description
count,7787,7787,7787,5398,7069,7280,7780,7787,7787,7787
unique,7787,2,7787,4049,6831,681,14,216,492,7769
top,s1,Movie,3%,"Raúl Campos, Jan Suter",David Attenborough,United States,TV-MA,1 Season,Documentaries,Multiple women report their husbands as missin...
freq,1,5377,1,18,18,2555,2863,1608,334,3


### Data Cleaning

In [None]:
from datetime import datetime
from datetime import date

In [None]:
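def separate_date(date_time):
  """Split a datetime column into lists of years, months and days (NaN where the date is missing)."""
  years,months,dates=[],[],[]
  for i in date_time:
    years.append(i.year)
    months.append(i.month)
    dates.append(i.day)
  return years,months,dates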
  

In [None]:
years,months,dates=separate_date(netflix["date_added"])
netflix["year_added"]=years
netflix["month_added"]=months

In [None]:
netflix.drop(["director","date_added","show_id"], axis=1, inplace=True)

In [None]:
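#missing cast entries cannot be guessed, so fill them with an empty string
netflix["cast"]=netflix["cast"].fillna("")
#columns whose nulls will be replaced by the most frequent value
count_replace=["country","rating","year_added","month_added"]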

In [None]:
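#.copy() keeps each subset independent of netflix, so in-place cleaning does not trigger SettingWithCopyWarning
netflix_shows=netflix[netflix["type"]=="TV Show"].copy()
netflix_movies=netflix[netflix["type"]=="Movie"].copy()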

In [None]:
netflix_shows.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2410 entries, 0 to 7785
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   type          2410 non-null   object 
 1   title         2410 non-null   object 
 2   cast          2410 non-null   object 
 3   country       2133 non-null   object 
 4   release_year  2410 non-null   int64  
 5   rating        2408 non-null   object 
 6   duration      2410 non-null   object 
 7   listed_in     2410 non-null   object 
 8   description   2410 non-null   object 
 9   year_added    2400 non-null   float64
 10  month_added   2400 non-null   float64
dtypes: float64(2), int64(1), object(8)
memory usage: 225.9+ KB


In [None]:
netflix_movies.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5377 entries, 1 to 7786
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   type          5377 non-null   object 
 1   title         5377 non-null   object 
 2   cast          5377 non-null   object 
 3   country       5147 non-null   object 
 4   release_year  5377 non-null   int64  
 5   rating        5372 non-null   object 
 6   duration      5377 non-null   object 
 7   listed_in     5377 non-null   object 
 8   description   5377 non-null   object 
 9   year_added    5377 non-null   float64
 10  month_added   5377 non-null   float64
dtypes: float64(2), int64(1), object(8)
memory usage: 504.1+ KB


In [None]:
#split the columns into numeric features and object-typed (categorical/text) features
continuous_features=list(netflix.describe().columns)
categorical_features=list(netflix.describe(include="object").columns)


In [None]:
#shape of data we have before data cleaning
netflix.shape

(7787, 11)

In [None]:
netflix.columns

Index(['type', 'title', 'cast', 'country', 'release_year', 'rating',
       'duration', 'listed_in', 'description', 'year_added', 'month_added'],
      dtype='object')

In [None]:
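def print_null_percent(df):
  """Print the percentage of null values for every column that has any."""
  null_percent=pd.Series(dtype="float64")  #explicit dtype avoids the empty-Series warning
  for col in df.columns:
    null_percent[col]=((df.shape[0]-df[col].count())/(df.shape[0]))*100
  print("columns with null values\n",null_percent[null_percent!=0])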


In [None]:

def cleaning(df,continuous_col=[],discrete_col=[],print_null=True,th=20.0):
  """
  Drop sparse columns/rows and duplicate rows, then fill the remaining nulls:
  mean for columns in continuous_col, most frequent value for every other column.
  discrete_col is accepted for readability but is not used.
  """

  print(f"before cleaning\n")
  print(f"shape of data: {df.shape}")
  if(print_null):
    print_null_percent(df)
  
  #step1
  #keep columns in which at least th% of the values are not null
  df.dropna(axis=1,inplace=True,thresh=((th/100.0)*df.shape[0]))
  #keep rows in which at least th% of the values are not null
  df.dropna(axis=0,inplace=True,thresh=((th/100.0)*df.shape[1]))

  #step2
  #remove duplicate rows and renumber the index
  df.drop_duplicates(inplace=True,ignore_index=True)
  

  #step3
  #fill the remaining null values
  for c1 in df.columns:

    #continuous column: fill with the mean
    if c1 in continuous_col: 
      df[c1]=df[c1].fillna(df[c1].mean())
    else:
      #any other column: fill with the most frequent value
      df[c1]=df[c1].fillna(df[c1].value_counts().idxmax())

  print(f"\n\nAfter cleaning the data\n")
  print(f"shape of data: {df.shape}")
  print_null_percent(df)
  return df

In [None]:
netflix_shows=cleaning(netflix_shows,[],count_replace,th=20)

before cleaning

shape of data: (2410, 11)
columns with null values
 country        11.493776
rating          0.082988
year_added      0.414938
month_added     0.414938
dtype: float64


After cleaning the data

shape of data: (2410, 11)
columns with null values
 Series([], dtype: float64)


  


In [None]:
netflix_movies=cleaning(netflix_movies,[],count_replace,th=20)

before cleaning

shape of data: (5377, 11)
columns with null values
 country    4.277478
rating     0.092989
dtype: float64


After cleaning the data

shape of data: (5377, 11)
columns with null values
 Series([], dtype: float64)


  


In [None]:
def remove_outliers2(df,continuous_col=[]):
  """Drop rows whose values lie on or outside the 1.5*IQR fences of any continuous column."""

  #default to every numeric column
  if len(continuous_col)==0:
    continuous_col=df.describe().columns
  df[continuous_col].boxplot(rot=90)
  plt.title("before removing outliers")
  plt.show()
  
  for c in continuous_col:
    #reset the index so that the positions returned by np.where match the row labels
    df.index=np.arange(0,df.shape[0])
    Q1=np.quantile(df[c],0.25)
    Q3=np.quantile(df[c],0.75)
    IQR= Q3 - Q1
    #np.where returns a tuple; its first element is the array of positions
    upper=np.where(df[c]>=(Q3+1.5*IQR))[0]
    lower=np.where(df[c]<=(Q1-1.5*IQR))[0]
    outliers_idx=np.unique(np.append(upper,lower)) 
    df.drop(outliers_idx, inplace = True) 
     
  df[continuous_col].boxplot(rot=90)
  plt.title("after removing outliers")
  plt.show()
  return df

In [None]:
netflix.columns

Index(['type', 'title', 'cast', 'country', 'release_year', 'rating',
       'duration', 'listed_in', 'description', 'year_added', 'month_added'],
      dtype='object')

In [None]:
#split each comma-separated genre string, then flatten into one list of genre names
listed=[x.split(",") for x in netflix["listed_in"]]
listed=[c.strip() for i in listed for c in i]
#unique genres
list(set(listed))

['TV Horror',
 'British TV Shows',
 'LGBTQ Movies',
 'Comedies',
 'Docuseries',
 'TV Action & Adventure',
 'International TV Shows',
 'TV Thrillers',
 'Classic Movies',
 'Science & Nature TV',
 'Movies',
 'Action & Adventure',
 'Romantic TV Shows',
 'Thrillers',
 'Anime Series',
 "Kids' TV",
 'Sci-Fi & Fantasy',
 'Children & Family Movies',
 'Romantic Movies',
 'Classic & Cult TV',
 'TV Comedies',
 'TV Shows',
 'International Movies',
 'Cult Movies',
 'Reality TV',
 'Dramas',
 'Independent Movies',
 'Horror Movies',
 'Anime Features',
 'TV Dramas',
 'Crime TV Shows',
 'Music & Musicals',
 'Stand-Up Comedy & Talk Shows',
 'Korean TV Shows',
 'Faith & Spirituality',
 'Stand-Up Comedy',
 'Teen TV Shows',
 'Documentaries',
 'Spanish-Language TV Shows',
 'Sports Movies',
 'TV Mysteries',
 'TV Sci-Fi & Fantasy']
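
Beyond listing the unique genres, it can be useful to count how often each one occurs. The sketch below shows one way to do this with pandas string methods on a few sample `listed_in` values; it is an illustrative alternative to the list comprehensions above, not part of the original analysis.

```python
import pandas as pd

# Sample values in the same comma-separated format as the listed_in column.
listed_in = pd.Series([
    "International TV Shows, TV Dramas",
    "Dramas, International Movies",
    "Dramas",
])

# Split on commas, put each genre on its own row, trim spaces, then count.
genre_counts = (
    listed_in.str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)
print(genre_counts)
```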

In [None]:
for i in ['type', 'country', 'release_year', 'rating',
     'listed_in', 'year_added', 'month_added']:
  print(f"{i}:\n{netflix[i].value_counts()}\n\n\n")

type:
Movie      5377
TV Show    2410
Name: type, dtype: int64



title:
3%                                          1
Results                                     1
Rich in Love                                1
Ricardo Quevedo: Los amargados somos más    1
Ricardo Quevedo: Hay gente así              1
                                           ..
Hamza's Suitcase                            1
Hamid                                       1
Hamburger Hill                              1
Hamara Dil Aapke Paas Hai                   1
ZZ TOP: THAT LITTLE OL' BAND FROM TEXAS     1
Name: title, Length: 7787, dtype: int64



country:
United States                                                   2555
India                                                            923
United Kingdom                                                   397
Japan                                                            226
South Korea                                                      183
                        