Netflix Prize - Rating Prediction

The Netflix Prize was an open competition for the best algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users or the films being identified except by numbers assigned for the contest.
This way of predicting the rating for a new (user, movie) pair is called collaborative filtering.
The grand prize was $1,000,000 and was won by the BellKor's Pragmatic Chaos team. Our dataset is the one that was used in that competition.
TRAINING DATASET
- No. of Ratings : 100 million
- No. of Users : 480k
- No. of Movies : 17k
We can't handle this much data at once: the system would crash or run out of memory.
What we did:
- Because timestamps are also provided, we sort the data by timestamp.
- Since we don't have the actual ratings for the competition's test data, we divide the Original_Train_Data into train and test data.
- We fit different models on New_Train_Data and evaluate them on New_Test_Data (both actually come from Original_Train_Data).
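
The split described above can be sketched as follows. This is a minimal illustration, not the notebook code: it assumes each rating is a `(user, movie, rating, date)` tuple, and the name `time_based_split` and the 0.8 fraction (chosen to match the roughly 80 million / 20 million split) are our own assumptions.

```python
from datetime import date


def time_based_split(ratings, train_fraction=0.8):
    """Sort ratings chronologically and split them into (train, test).

    Each rating is assumed to be a (user, movie, rating, date) tuple.
    """
    ordered = sorted(ratings, key=lambda r: r[3])  # oldest rating first
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy data, made up for illustration only.
ratings = [
    (1, 10, 4, date(2005, 3, 1)),
    (2, 10, 3, date(2004, 1, 15)),
    (1, 11, 5, date(2005, 6, 9)),
    (3, 12, 2, date(2003, 7, 30)),
    (2, 11, 1, date(2005, 12, 25)),
]
train, test = time_based_split(ratings)
```

Sorting before splitting means every test rating is at least as recent as every train rating, which mimics predicting future ratings from past ones.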
NEW_TRAIN_DATA:
- It has around 80 Million ratings
NEW_TEST_DATA :
- It has around 20 Million ratings
COLD START CASES :
- New Users : 75148 (15.65 %)
- New Movies : 346 (1.95 %)
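
A cold start case is a user or movie that appears in the test data but never in the train data. The sketch below shows one way to count them. The helper `cold_start` is hypothetical, and taking the percentage over all distinct ids (train and test together) is an assumption, although it does reproduce the user figure reported in this README.

```python
def cold_start(train_ids, test_ids):
    """Return (count, percent) of ids that occur in test but never in train.

    The percentage is taken over all distinct ids in train and test
    together; this denominator is an assumption.
    """
    train_ids, test_ids = set(train_ids), set(test_ids)
    new_ids = test_ids - train_ids
    total = len(train_ids | test_ids)
    return len(new_ids), 100.0 * len(new_ids) / total


# Toy data, made up for illustration only.
train_users = [1, 2, 3, 4]
test_users = [3, 4, 5]
new_count, new_pct = cold_start(train_users, test_users)

# Arithmetic check with the README figures: 75148 new users and
# 405041 users in the train data.
readme_pct = round(100.0 * 75148 / (405041 + 75148), 2)
```

With the toy data, user 5 is the only cold start case, one out of five distinct users.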

My approach:

Since we can't run experiments on such a huge dataset, we sampled a subset of points from New_Train and New_Test and analysed different models on it.
This was done in 3 stages:
- First with a small sample
- Next with a medium sample
- Finally, with the whole data
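
The README does not say how the samples were drawn, so the following is only one plausible sketch: pick a random subset of users and a random subset of movies, then keep the ratings where both were picked. The function `sample_ratings`, its parameters, and the seed are all assumptions made for illustration.

```python
import random


def sample_ratings(ratings, n_users, n_movies, seed=0):
    """Keep only ratings whose user AND movie fall in random subsets.

    ratings is a list of tuples whose first two fields are (user, movie).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    users = sorted({r[0] for r in ratings})
    movies = sorted({r[1] for r in ratings})
    keep_users = set(rng.sample(users, min(n_users, len(users))))
    keep_movies = set(rng.sample(movies, min(n_movies, len(movies))))
    return [r for r in ratings if r[0] in keep_users and r[1] in keep_movies]


# Toy data, made up for illustration only: (user, movie, rating).
ratings = [
    (1, 10, 4), (1, 11, 2), (2, 10, 5),
    (2, 12, 3), (3, 11, 1), (3, 12, 4),
]
small_sample = sample_ratings(ratings, n_users=2, n_movies=2, seed=42)
```

Growing `n_users` and `n_movies` from one stage to the next would give the small, medium, and full runs listed above.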

Train Data :
- Ratings : 80384405
- Users : 405041
- Movies : 17424
Test Data :
- Ratings : 20096102
- Users : 349312
- Movies : 17757
Cold Start Cases:
- With movies : 1.95 % (346)
- With users : 15.65 % (75148)
Least_RMSE : 0.986832 (using just SVD)
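
For reference, RMSE (root mean squared error) is the square root of the mean squared difference between actual and predicted ratings, so lower is better. Below is a minimal sketch of the metric in plain Python; the function name `rmse` and the toy numbers are ours, and the notebooks may compute it through a library instead.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data, made up for illustration only.
actual = [3, 5]
predicted = [4, 3]
error = rmse(actual, predicted)  # sqrt((1 + 4) / 2)
```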

GAvg : Average of all the ratings
Similar users' ratings of this movie:
- sur1, sur2, sur3, sur4, sur5 (ratings by the top 5 similar users who rated this movie)
Similar movies rated by this user:
- smr1, smr2, smr3, smr4, smr5 (ratings of the top 5 similar movies rated by this user)
UAvg : Average rating given by this user
MAvg : Average rating of this movie
rating : Rating of this movie by this user.
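
The three average features can be computed directly from the train ratings. The sketch below assumes ratings are `(user, movie, rating)` tuples; the helper `average_features` is hypothetical. The similar-user and similar-movie features are left out because they depend on a similarity measure that this README does not specify.

```python
from collections import defaultdict


def average_features(ratings):
    """Return (GAvg, UAvg per user, MAvg per movie) from rating tuples."""
    ratings = list(ratings)
    g_avg = sum(r for _, _, r in ratings) / len(ratings)

    by_user = defaultdict(list)
    by_movie = defaultdict(list)
    for user, movie, rating in ratings:
        by_user[user].append(rating)
        by_movie[movie].append(rating)

    u_avg = {u: sum(v) / len(v) for u, v in by_user.items()}
    m_avg = {m: sum(v) / len(v) for m, v in by_movie.items()}
    return g_avg, u_avg, m_avg


# Toy data, made up for illustration only.
ratings = [(1, 10, 4), (1, 11, 2), (2, 10, 5)]
g_avg, u_avg, m_avg = average_features(ratings)
```

These averages should be computed on the train data only, so that no information from the test ratings leaks into the features.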
