# TV Shows and Movies listed on Netflix

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
netflixs = pd.read_csv("../input/netflix-shows/netflix_titles.csv")


In [None]:
netflix_datas = netflixs.copy()

### Description
>This dataset consists of TV shows and movies available on Netflix as of 2019. It was collected from Flixable, a third-party Netflix search engine.

>In 2018, Flixable released a report showing that the number of TV shows on Netflix has nearly tripled since 2010, while the number of movies has decreased by more than 2,000 titles. It will be interesting to explore what other insights can be obtained from the same dataset.

>Integrating this dataset with external datasets such as IMDb ratings or Rotten Tomatoes could also provide many interesting findings.

### Inspiration
>Some interesting questions (tasks) that can be explored with this dataset:

>Understanding what content is available in different countries

>Identifying similar content by matching text-based features

>Network analysis of actors and directors to find interesting insights

>Is Netflix increasingly focusing on TV rather than movies in recent years?

In [None]:
netflix_datas.head()

## Column details
#### show_id :- Unique ID for every movie / TV show
#### type :- Identifier: a Movie or TV Show
#### title :- Title of the movie / TV show
#### director :- Director of the movie
#### cast :- Actors involved in the movie / show
#### country :- Country where the movie / show was produced
#### date_added :- Date it was added on Netflix
#### release_year :- Actual release year of the movie / show
#### rating :- TV rating of the movie / show
#### duration :- Total duration, in minutes or number of seasons
#### listed_in :- Genre
#### description :- The summary description

In [None]:
# Shape of the data (rows, columns)
netflix_datas.shape

In [None]:
# Number of null values per column
netflix_datas.isnull().sum()

In [None]:
# Percentage of null values per column
(netflix_datas.isnull().sum() / netflix_datas.shape[0]) * 100

###### In total
  >director: about 32% null values
  
  >cast: about 9% null values
  
  >country: about 7% null values
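
The percentages above can also be read off directly with `isnull().mean()`, which returns the fraction of missing values per column. The sketch below uses a small hypothetical frame (`toy`) in place of `netflix_datas`, so the numbers it prints are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for netflix_datas.
toy = pd.DataFrame({
    "director": ["A", np.nan, np.nan, "B"],
    "cast": ["X", "Y", np.nan, "Z"],
    "country": ["India", "India", "India", "India"],
})

# isnull().mean() is the fraction of missing values per column;
# multiply by 100 for a percentage and sort so the worst columns come first.
null_pct = (toy.isnull().mean() * 100).round(2).sort_values(ascending=False)
print(null_pct)
```

Sorting the result makes it easy to see which columns need attention first.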

#### Filling null values with the mode

In [None]:
netflix2 = netflix_datas.copy()

# Fill each column's missing values with that column's most frequent value (mode)
netflix2 = netflix2.fillna(netflix2.mode().iloc[0])

In [None]:
netflix2.isnull().sum()
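
Filling with the mode removes the nulls, but it also inflates the count of whichever value was already most frequent. The sketch below, on a hypothetical toy frame, compares mode filling with filling by an explicit `"Unknown"` placeholder (the placeholder is an assumption for illustration, not something the original analysis uses).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the names are placeholders.
toy = pd.DataFrame({
    "director": ["A", "A", np.nan, "B", np.nan],
    "country": ["India", np.nan, "India", "Japan", "India"],
})

# Same approach as above: fill every column with its own mode.
mode_filled = toy.fillna(toy.mode().iloc[0])

# Alternative: keep missing values visible under an explicit label.
placeholder_filled = toy.fillna("Unknown")

mode_counts = mode_filled["director"].value_counts()
placeholder_counts = placeholder_filled["director"].value_counts()
print(mode_counts)
print(placeholder_counts)
```

With the mode fill, director `A` appears to have four titles instead of two, which matters later when ranking directors or cast.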

In [None]:
# Summary statistics: count, mean, standard deviation, min/max, and quartiles

netflix2.describe(include='all')

In [None]:
# Data types
netflix2.dtypes

In [None]:
# Overall info about the data
netflix2.info()

### Now I understand the basics of the Netflix data
 >Head, size, shape, mean, range, SD, null values, data types, quartiles, filling null values with the mode

#### My selection approaches (methods)
   >Panel data (cross-sectional & time series)
   
   >Scale of measurement: categorical & numerical data
    
   >Exploring genres

#### Number of unique show_id values & genre combinations

In [None]:
netflix2.show_id.nunique()

In [None]:
netflix2.listed_in.nunique()

#### Counts for each of the 461 unique genre combinations

In [None]:
netflix2['listed_in'].value_counts()

In [None]:
netflix2['listed_in'].value_counts().describe()
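
Each `listed_in` value is a comma-separated combination of genres, so the counts above are per combination rather than per genre. As a rough sketch, assuming the separator is `", "`, individual genres can be counted by splitting and exploding the column. The frame `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with the same column layout as netflix2.
toy = pd.DataFrame({
    "show_id": [1, 2, 3],
    "listed_in": [
        "Dramas, International Movies",
        "Dramas",
        "Comedies, International Movies",
    ],
})

# Split each combination into a list, give every genre its own row,
# then count how often each individual genre appears.
genre_counts = (
    toy["listed_in"]
    .str.split(", ")
    .explode()
    .str.strip()
    .value_counts()
)
print(genre_counts)
```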

#### I want to select the most frequent genre combinations
  >So first I need to find the outliers


#### Now find the outliers

In [None]:
genres = netflix2['listed_in'].value_counts()

## Treating values more than 3 standard deviations from the mean as outliers
  > Z-score method

In [None]:
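def find_outliers(data, max_std=3):
    """Return the values whose absolute z-score exceeds max_std."""
    # Keep the list local so repeated calls do not accumulate results
    outliers = []
    mean = np.mean(data)
    std = np.std(data)

    for i in data:
        z_score = (i - mean) / std
        if np.abs(z_score) > max_std:
            outliers.append(i)
    return outliers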

In [None]:
outlier_are = find_outliers(genres)

In [None]:
outlier_are
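
The same z-score rule can be written without an explicit loop. The sketch below is one possible vectorized version; `zscore_outliers` and `toy_counts` are hypothetical names, and `ddof=0` is passed so that the pandas standard deviation matches the `np.std` default used above.

```python
import pandas as pd

def zscore_outliers(counts: pd.Series, max_std: float = 3.0) -> pd.Series:
    """Return the entries whose absolute z-score exceeds max_std."""
    # ddof=0 gives the population standard deviation, like np.std.
    z = (counts - counts.mean()) / counts.std(ddof=0)
    return counts[z.abs() > max_std]

# Hypothetical counts: twenty rare combinations and one very common one.
toy_counts = pd.Series(
    [1] * 20 + [100],
    index=[f"genre_{i}" for i in range(21)],
)

flagged = zscore_outliers(toy_counts)
print(flagged)
```

Returning a Series keeps the genre labels attached to the flagged counts, which the list-based version discards.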

In [None]:
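# The outliers are the most popular genre combinations; plot only those
outlier_genres = genres[genres.isin(outlier_are)].index
sns.countplot(y='listed_in', data=netflix2, order=outlier_genres)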

#### Finding the percentage of Movies & TV Shows in the data

In [None]:
movies = netflix2[netflix2['type'] == 'Movie']
tv_shows = netflix2[netflix2['type'] == 'TV Show']

In [None]:
# Total number of movies (count() already returns a single number)
movies.show_id.count()

In [None]:
# Total number of TV shows
tv_shows.show_id.count()

In [None]:
(netflix2['type'].value_counts()/netflix2.shape[0])*100

#### Share of titles by type
 >About 68.4% Movies
 
 >About 31.6% TV Shows
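
The division by `shape[0]` can be replaced with `value_counts(normalize=True)`, which returns proportions directly. A minimal sketch on a hypothetical Series (`toy_type`):

```python
import pandas as pd

# Hypothetical stand-in for netflix2['type'].
toy_type = pd.Series(["Movie", "Movie", "Movie", "TV Show"], name="type")

# normalize=True returns fractions that sum to 1; scale to percentages.
share = (toy_type.value_counts(normalize=True) * 100).round(2)
print(share)
```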

#### The same split shown as a graph (Movies & TV Shows)

In [None]:
# Pass the column by keyword; recent seaborn versions do not accept it positionally
sns.countplot(x='type', data=netflix2)
plt.title('Show Type')

#### Number of titles per release year

In [None]:
# The 15 release years with the most titles
shows_by_years = netflix2.release_year.value_counts()[:15]

shows_by_years

In [None]:
# kind must be passed by keyword in current pandas versions
shows_by_years.plot(kind='bar')
plt.title('Titles by release year')

> The release years 2016 to 2018 each account for more than 600 titles
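
The counts above are based on `release_year`. If the question is when titles were added to Netflix, `date_added` is the relevant column. The sketch below assumes dates are formatted like `"September 9, 2019"`, possibly with stray whitespace or missing values. The frame `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; date strings follow the assumed "Month D, YYYY" format.
toy = pd.DataFrame({
    "date_added": ["September 9, 2019", "January 1, 2018", " March 5, 2018", None],
    "release_year": [2019, 2010, 2017, 2015],
})

# Strip whitespace, parse the dates, and turn unparseable values into NaT.
added = pd.to_datetime(
    toy["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Count titles per year added, ignoring rows without a usable date.
added_per_year = added.dt.year.dropna().astype(int).value_counts().sort_index()
print(added_per_year)
```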

#### Total number of Movies & TV Shows by country

In [None]:
plt.figure(1, figsize=(15, 7))
plt.title("Top 15 countries by number of titles")
# Show the 15 countries with the most titles, split by type
sns.countplot(x="country", hue="type", data=netflix2, palette="Accent",
              order=netflix2["country"].value_counts().index[0:15])

# Explore Indian shows

In [None]:
india = netflix2[netflix2["country"] == "India"]
# Number of titles produced in India
india.show_id.count()

In [None]:
india['type'].unique()
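
The equality filter keeps only rows whose `country` is exactly `"India"`. Some rows list several countries separated by commas, and those co-productions are excluded. A hedged sketch of a broader filter, using a hypothetical toy frame and assuming `", "` as the separator:

```python
import pandas as pd

# Hypothetical toy frame; titles and countries are placeholders.
toy = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4"],
    "country": ["India", "United States, India", "Japan", None],
})

# Exact match, as in the cell above.
exact = toy[toy["country"] == "India"]

# Split the country string and check membership, treating missing values as False.
has_india = toy["country"].str.split(", ").apply(
    lambda parts: isinstance(parts, list) and "India" in parts
)
co_produced = toy[has_india]

print(len(exact), len(co_produced))
```

Which filter is appropriate depends on whether co-productions should count as Indian titles.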

#### Finding which type of show is more common

In [None]:
sns.countplot(data=india,x="type")

### Top content among Indian shows

#### Top ratings

In [None]:
india['rating'].value_counts()

In [None]:
sns.countplot(data=india,x="rating")

#### Top 10 genre combinations among Indian shows

In [None]:
india['listed_in'].value_counts()[0:10]

#### Top 10 directors among Indian shows

In [None]:
india['director'].value_counts()[0:10]

#### Top 10 casts among Indian shows

In [None]:
india['cast'].value_counts()[0:10]
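
`value_counts()` on `cast` counts identical full cast strings, not individual actors, and any value used to fill missing entries is counted as well. The sketch below shows one way to count individual actors while skipping missing values. The frame `toy_india` and the actor names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the unfilled Indian subset.
toy_india = pd.DataFrame({
    "cast": ["Actor A, Actor B", "Actor A, Actor C", np.nan, "Actor B, Actor A"],
})

# Drop missing casts, split each string into actors, and count per actor.
top_actors = (
    toy_india["cast"]
    .dropna()
    .str.split(", ")
    .explode()
    .value_counts()
    .head(10)
)
print(top_actors)
```

The same pattern applies to `director`, where one title can list several directors.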

# Conclusion

   >This dataset has 6234 rows and 12 features
  
  >About 32% of the director column is null
   
   >Data types: 2 int64 & 10 object
   
   >Number of unique show_id values: 6234; movies: 4265 (about 68%), TV shows: 1969 (about 32%)
   
   >More than 600 titles for each release year from 2016 to 2018
   
   >The United States & India each have more than 800 titles


### Top content among Indian shows
 >By show type: Movies
 
 >By rating: TV-14
 
 >By genre: Dramas, International Movies
 
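 >By director: Raúl Campos, Jan Suter (probably the mode used to fill missing director values rather than a genuine top director for India)
 
 >By cast: David Attenborough (probably the fill value as well)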

### Future analysis in the next report
 >Correlation of actors & directors with genres
 
 >Suggesting movies & TV shows based on genres
 

## Thank you          (¬‿¬)