# Netflix Recommender System

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span><ul class="toc-item"><li><span><a href="#Objective-of-this-notebook" data-toc-modified-id="Objective-of-this-notebook-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Objective of this notebook</a></span></li><li><span><a href="#Problem-we-seek-to-address" data-toc-modified-id="Problem-we-seek-to-address-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Problem we seek to address</a></span></li><li><span><a href="#Datasources-we-seek-to-base-our-models-on" data-toc-modified-id="Datasources-we-seek-to-base-our-models-on-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Datasources we seek to base our models on</a></span></li></ul></li><li><span><a href="#Importing-the-necessary-packages" data-toc-modified-id="Importing-the-necessary-packages-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Importing the necessary packages</a></span><ul class="toc-item"><li><span><a href="#Installing-the-necessary-packages" data-toc-modified-id="Installing-the-necessary-packages-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Installing the necessary packages</a></span></li><li><span><a href="#Importing-all-the-data-wrangling-packages" data-toc-modified-id="Importing-all-the-data-wrangling-packages-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Importing all the data-wrangling packages</a></span></li><li><span><a href="#Importing-all-the-visualization-pacakges" data-toc-modified-id="Importing-all-the-visualization-pacakges-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Importing all the visualization pacakges</a></span></li><li><span><a href="#Importing-all-the-model-building-packages" data-toc-modified-id="Importing-all-the-model-building-packages-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Importing all the model building 
packages</a></span></li></ul></li><li><span><a href="#Exploring-the-Data" data-toc-modified-id="Exploring-the-Data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Exploring the Data</a></span><ul class="toc-item"><li><span><a href="#Displaying-the-heads-of-the-tables" data-toc-modified-id="Displaying-the-heads-of-the-tables-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Displaying the heads of the tables</a></span></li><li><span><a href="#Checking-for-null-values,-shapes-and-datatypes" data-toc-modified-id="Checking-for-null-values,-shapes-and-datatypes-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Checking for null-values, shapes and datatypes</a></span></li></ul></li><li><span><a href="#Feature-Engineering" data-toc-modified-id="Feature-Engineering-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Feature Engineering</a></span></li><li><span><a href="#Splitting-the-data" data-toc-modified-id="Splitting-the-data-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Splitting the data</a></span></li><li><span><a href="#Text-Pre-processing" data-toc-modified-id="Text-Pre-processing-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Text Pre-processing</a></span><ul class="toc-item"><li><span><a href="#Standardizing-the-train-set" data-toc-modified-id="Standardizing-the-train-set-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Standardizing the train set</a></span></li></ul></li><li><span><a href="#Model-Building,-training-and-assessment" data-toc-modified-id="Model-Building,-training-and-assessment-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Model Building, training and assessment</a></span><ul class="toc-item"><li><span><a href="#Model-1:-Building-and-training" data-toc-modified-id="Model-1:-Building-and-training-7.1"><span class="toc-item-num">7.1&nbsp;&nbsp;</span>Model 1: Building and training</a></span></li><li><span><a href="#Model-1:-Assessing-the-model" data-toc-modified-id="Model-1:-Assessing-the-model-7.2"><span 
class="toc-item-num">7.2&nbsp;&nbsp;</span>Model 1: Assessing the model</a></span></li><li><span><a href="#Model-2:-Building-and-training" data-toc-modified-id="Model-2:-Building-and-training-7.3"><span class="toc-item-num">7.3&nbsp;&nbsp;</span>Model 2: Building and training</a></span></li><li><span><a href="#Model-2:-Assessing-the-model" data-toc-modified-id="Model-2:-Assessing-the-model-7.4"><span class="toc-item-num">7.4&nbsp;&nbsp;</span>Model 2: Assessing the model</a></span></li></ul></li><li><span><a href="#Model-Deployment" data-toc-modified-id="Model-Deployment-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Model Deployment</a></span><ul class="toc-item"><li><span><a href="#Exporting-the-best-model-to-the-Kaggle-leaderboard" data-toc-modified-id="Exporting-the-best-model-to-the-Kaggle-leaderboard-8.1"><span class="toc-item-num">8.1&nbsp;&nbsp;</span>Exporting the best model to the Kaggle leaderboard</a></span></li><li><span><a href="#Exporting-our-finding-via-a-streamlit-app" data-toc-modified-id="Exporting-our-finding-via-a-streamlit-app-8.2"><span class="toc-item-num">8.2&nbsp;&nbsp;</span>Exporting our finding via a streamlit app</a></span></li></ul></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Conclusion</a></span></li></ul></div>

## Introduction

### Objective of this notebook

Our objective in this notebook is to build a movie recommender algorithm that accurately predicts how a user will rate a movie based on their historical preferences. The algorithm should use either collaborative filtering or content-based filtering.

This work forms part of a public Kaggle competition, and submissions are evaluated by root mean squared error (RMSE), where a lower score is better.
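As an illustration of the metric, RMSE is the square root of the mean squared difference between actual and predicted ratings. The sketch below is a minimal pure-Python version. The function name `rmse` and the sample ratings are invented for illustration and are not taken from the competition data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative ratings only (not real competition data)
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 2.5]

print(round(rmse(actual_ratings, predicted_ratings), 3))  # approximately 0.612
```

Because the errors are squared before averaging, one large miss costs more than several small ones.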

We will also build a Streamlit app, hosted on AWS EC2, to communicate our findings to a board of executive stakeholders.

### Problem we seek to address

Viewers want to watch movies that they will enjoy. Good recommendations increase their satisfaction and make them more loyal to the Netflix platform. Loyal viewers keep their subscriptions for longer and recommend the platform to others in preference to competing platforms (such as Showmax). More loyal customers in turn mean greater income.

### Datasources we seek to base our models on

MovieLens has gathered millions of movie ratings, each recorded with the movieId, the userId, and the timestamp at which the movie was rated. We also have the following information on each movie from other public sources:

1. Movie titles
2. Movie genres
3. Title cast
4. Movie budget
5. Plot keywords
6. Director
7. Runtime
8. Genome scores

We aim to use all of the data above to train our unsupervised machine learning models and build the best recommender system within our abilities.

## Importing the necessary packages

Before we jump in, we need to install and import some Python packages that will help us in our quest. We are indebted to those who have built these packages and made them freely available. Some packages may first need to be installed on your local machine before the notebook will run. The install commands for the most common ones are commented out below: remove the leading `#` to run a command, then comment it out again once the package has been installed.

### Installing the necessary packages
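One way to see which packages still need installing is to check whether each can be imported. The sketch below uses only the standard library. The helper `missing_packages` and the names in `REQUIRED` are assumptions made for illustration; the notebook's actual package list may differ.

```python
import importlib.util

# Assumed import names, for illustration only; adjust to the packages you need.
REQUIRED = ["numpy", "pandas", "sklearn", "streamlit"]


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Anything listed here would need to be installed before running the notebook.
print(missing_packages(REQUIRED))
```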

### Importing all the data-wrangling packages

### Importing all the visualization packages

### Importing all the model building packages

## Exploring the Data

Before we build our models, we need a basic understanding of the data we are working with so that we can engineer useful features, pre-process the data correctly, and choose the right models. In this section we display the data and draw insights from it using both visual and non-visual methods.

### Displaying the heads of the tables

### Checking for null-values, shapes and datatypes
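A minimal sketch of these checks with pandas follows. The `movies` DataFrame and its values are invented for illustration; the real tables would be loaded from the competition files.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one of the movie tables
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", None],
    "runtime": [90.0, np.nan, 120.0],
})

print(movies.head())   # first rows of the table
print(movies.shape)    # (number of rows, number of columns)
print(movies.dtypes)   # datatype of each column

# Count missing values per column
null_counts = movies.isnull().sum()
print(null_counts)
```

Columns with many missing values may need to be imputed or dropped during feature engineering.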

## Feature Engineering

## Splitting the data

To obtain a validation set that is free from data leakage, we split the train set so that 20% of the data is held out before any pre-processing or training and is kept solely for validation.
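The 80/20 split can be sketched with pandas alone, as below. The `ratings` DataFrame, its values, and the seed `42` are assumptions for illustration. A common alternative is scikit-learn's `train_test_split` with `test_size=0.2`.

```python
import pandas as pd

# Hypothetical ratings table with ten rows
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "movieId": [10, 20, 10, 30, 20, 30, 10, 40, 20, 40],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0, 4.5, 1.0, 4.0, 3.5, 5.0],
})

# Randomly hold out 20% of the rows; a fixed seed keeps the split reproducible
validation = ratings.sample(frac=0.2, random_state=42)

# Everything that was not sampled stays in the training set
train = ratings.drop(validation.index)

print(len(train), len(validation))
```

Splitting before any pre-processing means that statistics computed later, such as means and standard deviations, come from the training rows only.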

## Text Pre-processing

For the models to train effectively, the data needs to be cleaned and correctly categorized. Numerical features also need to be standardized so that differences in their ranges do not distort the weight each feature contributes during training.

### Standardizing the train set
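The sketch below shows one way to standardize numerical features with NumPy, assuming z-score scaling (subtract the mean, divide by the standard deviation). The two columns stand in for runtime and budget, and all values are invented. The statistics come from the train set only and are then reused for the validation set, which avoids leakage.

```python
import numpy as np

# Hypothetical numeric features: column 0 = runtime (minutes), column 1 = budget
train_features = np.array([
    [90.0, 1.0e6],
    [120.0, 5.0e7],
    [150.0, 2.0e8],
])
validation_features = np.array([[100.0, 3.0e7]])

# Compute the statistics from the training data only
mean = train_features.mean(axis=0)
std = train_features.std(axis=0)
std[std == 0] = 1.0  # guard against division by zero for constant columns

# Apply the same transformation to both sets
train_scaled = (train_features - mean) / std
validation_scaled = (validation_features - mean) / std

print(train_scaled.mean(axis=0))  # close to 0 for each column
print(train_scaled.std(axis=0))   # close to 1 for each column
```

After scaling, runtime and budget are on comparable scales even though their raw values differ by several orders of magnitude.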

## Model Building, training and assessment

### Model 1: Building and training

### Model 1: Assessing the model

### Model 2: Building and training

### Model 2: Assessing the model

## Model Deployment

### Exporting the best model to the Kaggle leaderboard

### Exporting our finding via a streamlit app

## Conclusion