In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


# **Loading the dataset**

In [None]:
netflix_data = pd.read_csv('/kaggle/input/netflix-original-films-imdb-scores/NetflixOriginals.csv')

In [None]:
netflix_data.head()

In [None]:
netflix_data.info()

There are 6 columns in the dataset: Title, Genre, Premiere, Runtime, IMDB Score and Language.

There are no null values, but we do need to convert Premiere to a datetime.
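
A minimal sketch of how both checks could be done, assuming the Premiere values are strings such as "August 5, 2019". The `sample` frame below is made up for illustration and is not taken from the dataset. Passing `errors="coerce"` turns anything unparseable into `NaT`, so bad rows can be inspected rather than raising an error.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from NetflixOriginals.csv.
sample = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C"],
    "Premiere": ["August 5, 2019", "December 13, 2014", "not a date"],
})

# Count missing values per column (what .info() reports indirectly).
null_counts = sample.isnull().sum()

# Parse the strings; values that cannot be parsed become NaT.
parsed = pd.to_datetime(sample["Premiere"], errors="coerce")

# Rows whose date failed to parse, worth inspecting before continuing.
bad_rows = sample[parsed.isna()]

# Extract the year (float dtype here because of the NaT row).
years = parsed.dt.year

print(null_counts)
print(bad_rows)
print(years)
```

On the real data, an empty `bad_rows` would indicate that every Premiere value was parsed.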

In [None]:
netflix_data.describe()

In [None]:
netflix_data['Premiere'] = pd.to_datetime(netflix_data['Premiere'])
netflix_data['Premiere'] = netflix_data['Premiere'].dt.year

In [None]:
netflix_data.head()

**We have the premieres by year**

**There are a few things we can do now:**
1. Find the top-rated Originals of all time
2. See which genre, language and year they belong to, and what their runtime is
3. Find out which genre has the most Originals
4. See which languages other than English are the most common
5. Group by year to find the best Originals of each year and spot trends

# Finding Top Rated Originals of all time

Top-rated movies on IMDB are generally taken to start at a score of 8.0, so that's the cutoff we'll use as well.

In [None]:
top_rated = netflix_data[netflix_data['IMDB Score'] >= 8.0]
len(top_rated)

In [None]:
top_rated.tail(17)

As we can see, most of them belong to the Documentary genre and are in English, with a few exceptions.
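
Rather than reading this off the table, the observation could be checked with a quick tally. This is only a sketch: the `films` frame uses made-up rows with the same column names as the dataset, and the 8.0 cutoff is the one chosen above.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from NetflixOriginals.csv.
films = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "Genre": ["Documentary", "Documentary", "Drama", "Comedy", "Documentary"],
    "Language": ["English", "Portuguese", "English", "English", "English"],
    "IMDB Score": [8.4, 8.1, 8.0, 6.2, 7.9],
})

# Same filter as in the notebook: keep titles scoring 8.0 or higher.
top = films[films["IMDB Score"] >= 8.0]

# How many top-rated titles fall in each genre.
genre_counts = top["Genre"].value_counts()

# Share of top-rated titles per language (fractions summing to 1).
language_share = top["Language"].value_counts(normalize=True)

print(genre_counts)
print(language_share)
```

With the real data, `top_rated` from the cell above would take the place of `top`.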

In [None]:
sns.kdeplot(top_rated['Runtime'])
mean_runtime = top_rated['Runtime'].mean()

print(f"Average Runtime is: {mean_runtime}")

# Finding what genre has the most originals

In [None]:
netflix_data['Genre'].value_counts()

**Netflix mostly makes Documentaries, which is consistent with the observation above that the top-rated Originals were mostly Documentaries.**
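
Since Documentaries are also the largest genre, raw counts alone cannot show whether they do better than other genres. One way to look at it, sketched here on made-up rows, is to compute the share of each genre's titles that reach the 8.0 cutoff. The `is_top` column and the `summary` names are introduced only for this example.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from NetflixOriginals.csv.
films = pd.DataFrame({
    "Genre": ["Documentary", "Documentary", "Documentary", "Documentary",
              "Drama", "Drama", "Comedy", "Comedy"],
    "IMDB Score": [8.4, 8.1, 6.9, 7.0, 8.0, 6.5, 5.8, 6.1],
})

# Flag titles at or above the top-rated cutoff.
films["is_top"] = films["IMDB Score"] >= 8.0

# Per genre: number of titles, number of top-rated titles, and their share.
summary = films.groupby("Genre")["is_top"].agg(
    titles="size",
    top_rated="sum",
    top_share="mean",
)

print(summary.sort_values("titles", ascending=False))
```

A genre with many titles but a low `top_share` would be common without being especially well rated.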

# Top Languages other than English

In [None]:
netflix_data['Language'].value_counts().head(10)
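
The count above still includes English. A small sketch of how English could be excluded before counting, using a made-up `films` frame rather than the real data:

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from NetflixOriginals.csv.
films = pd.DataFrame({
    "Language": ["English", "Hindi", "Spanish", "English", "Hindi", "French"],
})

# Keep only rows whose language is not exactly "English", then count.
non_english = films.loc[films["Language"] != "English", "Language"].value_counts()

print(non_english.head(10))
```

This compares against the exact string "English", so any differently spelled or combined value would be counted as its own category.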

# Grouping by Year

In [None]:
netflix_data['Premiere'].unique()

Fact: Netflix started making originals in 2013.

In [None]:
netflix_data.groupby('Premiere').describe()
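
The `describe()` output gives summary statistics per year but does not name the best title of each year. One possible way to get that, sketched on made-up rows, is `idxmax` on the grouped scores. It assumes Premiere already holds the year, as it does after the conversion above.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from NetflixOriginals.csv.
films = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "Premiere": [2019, 2019, 2020, 2020, 2021],
    "IMDB Score": [6.5, 7.8, 8.2, 5.9, 7.1],
})

# Index label of the highest-scoring title within each year.
best_idx = films.groupby("Premiere")["IMDB Score"].idxmax()

# Look those rows up to get the title alongside its year and score.
best_per_year = films.loc[best_idx, ["Premiere", "Title", "IMDB Score"]]

# Number of titles and mean score per year, for spotting trends.
yearly = films.groupby("Premiere")["IMDB Score"].agg(["count", "mean"])

print(best_per_year)
print(yearly)
```

If two titles tie for the top score in a year, `idxmax` returns only the first one.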

In [None]:
sns.jointplot(x='Premiere', y='Runtime', data=netflix_data)

# Final Conclusion (Summary)

This Netflix dataset covers the Originals released from 2014 to 2021.

Most of the Originals were Documentaries, which also had the most top scores on IMDB.

The average runtime was in the range of 90 to 100 minutes.

English was the most common language, followed by Hindi and Spanish. One Portuguese and one Spanish Documentary were among the top rated on IMDB.



**In short: Netflix has focused on making more Documentaries, which has proven to be a success for them, and many more are likely to follow in the coming years.**