# Netflix Movies Dataset

## Problem Statement
Sometimes we want to watch a new movie but don't know how to find one whose content we would like, whether romantic, funny, or action. The goal here is to collect movie data from the Netflix website and use it to recommend movies in the genres I prefer.
We will use a content-based recommendation system, a machine learning approach that takes into account background knowledge about the products together with customer information. Based on what you have already viewed on Netflix, it suggests similar titles. For example, if you have watched a sci-fi film, a content-based recommender will suggest other films in the same genre.
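
As a rough illustration of the idea, the sketch below scores movies by how much their genre labels overlap (Jaccard similarity) and returns the closest matches. It is only a minimal sketch, not the method used later in this notebook: the helper names (`genre_set`, `jaccard`, `recommend`), the toy catalog, and the assumption that genres are stored as comma-separated strings are all hypothetical.

```python
def genre_set(genre_text):
    """Split a comma-separated genre string into a set of lowercase tokens."""
    return {g.strip().lower() for g in genre_text.split(",") if g.strip()}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(title, catalog, top_n=3):
    """Return up to top_n titles whose genres overlap most with `title`."""
    target = genre_set(catalog[title])
    scores = [
        (jaccard(target, genre_set(genres)), other)
        for other, genres in catalog.items()
        if other != title
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop candidates that share no genre at all with the target.
    return [other for score, other in scores[:top_n] if score > 0]


# Hypothetical toy catalog (invented values, not taken from the dataset).
toy_catalog = {
    "Movie A": "Sci-Fi, Action",
    "Movie B": "Sci-Fi, Drama",
    "Movie C": "Romance, Comedy",
    "Movie D": "Action, Sci-Fi, Thriller",
}

print(recommend("Movie A", toy_catalog))
```

A fuller system would typically combine several content fields (genre, director, actors), but the genre-overlap score is enough to show the principle.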

 
---

### Dataset Description 
|Column|Meaning|
|:-|:-|
|movie_name|The name of the movie|
|Duration|The duration of the movie|
|year|The year of production of the movie|
|genre|The genre of the movie|
|director|The director of the movie|
|actors|The actors of the movie|
|country|The country of production of the movie|
|rating|The rating of the movie|
|enter_in_netflix|The date the movie was added to Netflix|


## Importing

In [None]:
import requests
import re
import pandas as pd
import numpy as np
# from bs4 import BeautifulSoup
import json


# Render plots inline in the notebook
%matplotlib inline

# Plotting
import seaborn as sns
import matplotlib.pyplot as plt

import os

In [None]:
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
nRowsRead = 3000 

df1 = pd.read_csv('/kaggle/input/netflix-movies/Netflix_movies.csv', delimiter=',', nrows = nRowsRead)
df1.dataframeName = 'Netflix_movies.csv'
nRow, nCol = df1.shape
print(f'There are {nRow} rows and {nCol} columns')

In [None]:
Netflix_movies =pd.read_csv('/kaggle/input/netflix-movies/Netflix_movies.csv')

In [None]:
Netflix_movies.head()

<br>

---

<br>

# EDA

In [None]:
Netflix_movies.info()

1. Some columns need a type change.
2. The netflix column must be dropped because it is useless.
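
One cautious way to handle the type changes is sketched below. It assumes the numeric columns arrive as strings and may contain blanks or placeholder text; the toy values are invented, and whether the real columns look like this is an assumption. `pd.to_numeric` with `errors="coerce"` turns anything unparseable into `NaN` instead of raising, which a plain `astype(int)` would not tolerate.

```python
import pandas as pd

# Hypothetical sample values; the real columns may look different.
toy = pd.DataFrame({
    "year": ["2019", "2020", ""],
    "rating": ["7.5", "n/a", "6.1"],
})

# errors="coerce" replaces unparseable strings with NaN instead of raising.
# The nullable "Int64" dtype keeps whole-number years while allowing missing values.
toy["year"] = pd.to_numeric(toy["year"], errors="coerce").astype("Int64")
toy["rating"] = pd.to_numeric(toy["rating"], errors="coerce")

print(toy.dtypes)
print(toy.isna().sum())
```

Rows that end up missing can then be inspected or dropped explicitly rather than causing a conversion error.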

In [None]:
Netflix_movies

In [None]:
Netflix_movies.shape

In [None]:
# Find the row index of every empty-string cell (one entry per cell, so a row can appear more than once)
empty_cells = np.where(Netflix_movies == '')[0]
len(Netflix_movies.iloc[empty_cells])

- I'll rename the columns that need it.
- Some columns contain empty cells that need fixing.
- There are 732 empty cells across the entire dataframe.
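
Note that the figure above counts cells, not rows: `np.where` returns one row index per matching cell, so a row with two empty cells is counted twice. The sketch below shows the difference on a small invented frame (the column values are hypothetical, not from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are invented for illustration.
toy = pd.DataFrame({
    "movie_name": ["A", "B", "C"],
    "director": ["", "Someone", ""],
    "actors": ["", "Cast", "Cast"],
})

# Boolean mask that is True wherever a cell holds the empty string.
is_empty = toy.eq("")

empty_cell_count = int(is_empty.to_numpy().sum())   # number of empty cells
rows_with_empty = int(is_empty.any(axis=1).sum())   # number of distinct rows affected

# np.where yields one row index per empty cell, so row 0 appears twice here.
row_idx = np.where(is_empty.to_numpy())[0]

print(empty_cell_count, rows_with_empty, len(row_idx))
```

Dropping by these indices still removes each affected row only once, so the number of rows removed can be smaller than the number of empty cells.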

In [None]:
# Count the null values in each column
Netflix_movies.isna().sum()

In [None]:
fig, ax = plt.subplots(figsize = (18, 10))
 
sns.heatmap(Netflix_movies.isnull(), yticklabels=False, cbar=False)
ax.set_title('Netflix_movies');

In [None]:
#Features
Netflix_movies.columns

In [None]:
# Show the value counts of each column
for col in Netflix_movies.columns:
    print(col)
    print(Netflix_movies[col].value_counts())
    print('===============================')

In [None]:
#Explore my data
Netflix_movies.describe().T

---

# Feature engineering

In [None]:
# Drop the rows that contain empty cells
Netflix_movies1=Netflix_movies.drop(empty_cells)
Netflix_movies1

In [None]:
Netflix_movies1.columns

In [None]:
# Rename some columns
Netflix_movies1 = Netflix_movies1.rename(columns={'time': 'Duration',
                                                  'enter_in': 'enter_in_netflix'})
Netflix_movies1.head()  # check the result

In [None]:
# check the types of all columns
Netflix_movies1.dtypes

In [None]:
# Change the data type from object to int
Netflix_movies1['Duration'] = Netflix_movies1['Duration'].astype(int)
Netflix_movies1['year'] = Netflix_movies1['year'].astype(int)

In [None]:
# Change the data type from object to float
Netflix_movies1['rating'] = Netflix_movies1['rating'].astype(float)

In [None]:
# Confirm the new column types
Netflix_movies1.dtypes

In [None]:
Netflix_movies1['rating'].unique()

In [None]:
# Box plot to reveal outliers in rating
sns.boxplot(x=Netflix_movies1['rating'])

In [None]:
Netflix_movies1['year'].unique()

In [None]:
# Box plot to reveal outliers in year
sns.boxplot(x=Netflix_movies1['year'])


In [None]:
# show the movies name and rating
Netflix_movies_ = Netflix_movies1[['movie_name', 'rating']]
Netflix_movies_.head(10)

In [None]:
# to show the genre and rating.
Netflix_movies_ = Netflix_movies1[['genre', 'rating']]
Netflix_movies_.head(10)

In [None]:
# Column-wise maximum: the alphabetically last genre and the highest rating
# (the two values do not necessarily come from the same movie)
Netflix_movies_ = Netflix_movies1[['genre', 'rating']]
Netflix_movies_.max()


In [None]:
# Count how many movies have each rating value
ratings_count = Netflix_movies_.groupby('rating').size().reset_index(name='count')
ratings_count

In [None]:
Netflix_movies1

----------------------------

# Visualization

In [None]:
fig, ax7 = plt.subplots(figsize=(16,8))


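# Correlate numeric columns only; object columns would raise an error in recent pandas
sns.heatmap(Netflix_movies1.corr(numeric_only=True), annot=True, ax=ax7, cmap=sns.light_palette('pink'))

ax7.set_title('Netflix_movies dataset correlation', fontsize=20);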

#### There are no strong correlations between the features

In [None]:
# Check the degree of correlation between Duration, year and rating
Netflix_movies1[['Duration','year','rating']].corr()

In [None]:
# Visualize year vs rating
plt.figure(figsize=(8, 5))
sns.scatterplot(x=Netflix_movies1["year"], y=Netflix_movies1["rating"])
plt.title("year vs rating");

In [None]:
# Plot Duration vs rating
plt.figure(figsize=(8, 5))
sns.set(font_scale=1.2)
sns.scatterplot(x=Netflix_movies1["Duration"], y=Netflix_movies1["rating"])
plt.title("Duration vs rating");

In [None]:
# Visualize the distribution of rating
rating_counts = Netflix_movies1['rating'].value_counts()
plt.figure(figsize=(10, 5))
sns.lineplot(x=rating_counts.index, y=rating_counts.values)
plt.title('Rating Distribution');