# Recommender System
Full production recommender systems are complex and resource-intensive to build. This notebook works through a much simpler example.

## Movies Recommendation
This notebook demonstrates how to build a movie recommendation system using the MovieLens dataset.  
The dataset is provided by GroupLens Research and contains user ratings for movies.  

- MovieLens Dataset: [MovieLens Recommendation Data](http://www.grouplens.org/node/73)

### **Dataset Description:**
The dataset contains movie rating information from multiple users. Each row represents a single user’s rating of a specific movie, along with a timestamp indicating when the rating was given.

The data is typically used for tasks such as:
- Collaborative filtering
- Recommendation system development
- Exploring user preferences and behavior

---

### **Columns Explanation:**
1. **`user_id`**:  
   - **Description**: A unique identifier for each user in the dataset.  
   - **Example**: `0`, `1`, `2`  
   - **Purpose**: Allows us to track which user rated a particular movie.

2. **`item_id`**:  
   - **Description**: A unique identifier for each movie (or item) in the dataset.  
   - **Example**: `50`, `133`, `31`  
   - **Purpose**: Refers to the specific movie being rated by the user. These IDs can be mapped to actual movie titles if you have an additional dataset with item details (e.g., `u.item`).

3. **`rating`**:  
   - **Description**: The rating given by the user for the specific movie.  
   - **Range**: Typically ranges from `1` (worst) to `5` (best).  
   - **Example**: `5`, `4`, `1`  
   - **Purpose**: Indicates the user’s level of preference or opinion about the movie.

4. **`timestamp`**:  
   - **Description**: The UNIX timestamp (seconds since January 1, 1970) when the rating was given.  
   - **Example**: `881250949`  
   - **Purpose**: Helps track when the rating was provided, which can be useful for time-based analysis or trends.  


---

### **Use Cases of the Dataset:**
1. **Recommendation Systems**:
   - Train collaborative filtering or content-based filtering models.
   - Understand user preferences based on their ratings.
   
2. **Exploratory Data Analysis**:
   - Analyze rating distributions.
   - Explore the most rated or highest-rated movies.

3. **Time-Based Analysis**:
   - Study trends over time using the `timestamp` column.
   - Analyze how user behavior changes over the years.
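
As a rough illustration of the time-based analysis listed above, the sketch below converts `timestamp` to datetimes and summarises ratings per calendar year. The `toy` frame and its values are made up for the example (only the four-column layout follows the dataset), and `rated_at` and `per_year` are names introduced here.

```python
import pandas as pd

# Toy ratings: made-up rows in the same four-column layout as the dataset.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "item_id": [50, 172, 50, 133],
    "rating": [5, 4, 3, 1],
    "timestamp": [881250949, 881250949, 891717742, 891717742],
})

# Interpret the integers as seconds since the UNIX epoch, in UTC.
toy["rated_at"] = pd.to_datetime(toy["timestamp"], unit="s", utc=True)

# Number of ratings and mean rating per calendar year.
per_year = toy.groupby(toy["rated_at"].dt.year)["rating"].agg(["count", "mean"])
print(per_year)
```

The same pattern should carry over to the full ratings frame by replacing `toy` with `df`.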

---

In [94]:
import numpy as np
import pandas as pd

In [117]:
# u.data is tab-separated ("\t") and has no header row, so the column names are passed explicitly.
# Without names= (or header=None), pandas would use the first rating row as the header.
df = pd.read_csv('u.data', sep="\t", names=['user_id', 'item_id', 'rating', 'timestamp'])
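
To see why the column names matter, here is a small sketch that reads three tab-separated rows from an in-memory string instead of `u.data`. The rows are copied from the `df.head()` output in this notebook; `sample`, `with_names`, and `without_names` are names introduced only for this example.

```python
import io
import pandas as pd

# Three tab-separated rows with no header line, as in u.data.
sample = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n196\t242\t3\t881250949\n"
cols = ["user_id", "item_id", "rating", "timestamp"]

# With names=, every line is treated as data.
with_names = pd.read_csv(io.StringIO(sample), sep="\t", names=cols)

# Without names= or header=None, the first line is consumed as the header.
without_names = pd.read_csv(io.StringIO(sample), sep="\t")

print(with_names.shape)     # three data rows
print(without_names.shape)  # one row lost to the header
```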


In [98]:
df.head()

| | user_id | item_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 0 | 50 | 5 | 881250949 |
| 1 | 0 | 172 | 5 | 881250949 |
| 2 | 0 | 133 | 1 | 881250949 |
| 3 | 196 | 242 | 3 | 881250949 |
| 4 | 186 | 302 | 3 | 891717742 |


In [100]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100003 entries, 0 to 100002
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   user_id    100003 non-null  int64
 1   item_id    100003 non-null  int64
 2   rating     100003 non-null  int64
 3   timestamp  100003 non-null  int64
dtypes: int64(4)
memory usage: 3.1 MB


### Dataset Analysis

What we actually want is the movie titles, so that each `item_id` can be shown with a readable name.

In [103]:
import datetime
readable_date = datetime.datetime.fromtimestamp(881250949)
print(readable_date)

1997-12-04 15:55:49
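
Called without a `tz` argument, `datetime.fromtimestamp()` converts to the local time zone of the machine running the notebook, so the printed value can differ from one machine to another. A minimal sketch that pins the result to UTC instead (the variable names are introduced here for illustration):

```python
from datetime import datetime, timezone

ts = 881250949  # the example timestamp used above

# Passing tz makes the result timezone-aware and independent of the local machine.
utc_dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(utc_dt.isoformat())  # 1997-12-04T15:55:49+00:00
```

The output shown above matches the UTC value, which suggests the notebook was run on a machine set to UTC.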


In [105]:
movie_titles=pd.read_csv('Movie_Id_Titles')

In [107]:
movie_titles.head()

| | item_id | title |
|---|---|---|
| 0 | 1 | Toy Story (1995) |
| 1 | 2 | GoldenEye (1995) |
| 2 | 3 | Four Rooms (1995) |
| 3 | 4 | Get Shorty (1995) |
| 4 | 5 | Copycat (1995) |


In [109]:
df['item_id'].sort_values()

57284       1
57582       1
19044       1
57252       1
3414        1
         ... 
75326    1678
67305    1679
80397    1680
92332    1681
95379    1682
Name: item_id, Length: 100003, dtype: int64

In [111]:
df=pd.merge(df,movie_titles,on='item_id',how='inner')

In [113]:
df.head()

| | user_id | item_id | rating | timestamp | title |
|---|---|---|---|---|---|
| 0 | 0 | 50 | 5 | 881250949 | Star Wars (1977) |
| 1 | 0 | 172 | 5 | 881250949 | Empire Strikes Back, The (1980) |
| 2 | 0 | 133 | 1 | 881250949 | Gone with the Wind (1939) |
| 3 | 196 | 242 | 3 | 881250949 | Kolya (1996) |
| 4 | 186 | 302 | 3 | 891717742 | L.A. Confidential (1997) |


In [115]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100003 entries, 0 to 100002
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   user_id    100003 non-null  int64 
 1   item_id    100003 non-null  int64 
 2   rating     100003 non-null  int64 
 3   timestamp  100003 non-null  int64 
 4   title      100003 non-null  object
dtypes: int64(4), object(1)
memory usage: 3.8+ MB


In [119]:
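# df was reloaded from u.data (cell In [117]) before this merge. Merging the already
# merged frame again would produce title_x and title_y columns instead of title.
df=pd.merge(df,movie_titles,on='item_id',how='left')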

In [121]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100003 entries, 0 to 100002
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   user_id    100003 non-null  int64 
 1   item_id    100003 non-null  int64 
 2   rating     100003 non-null  int64 
 3   timestamp  100003 non-null  int64 
 4   title      100003 non-null  object
dtypes: int64(4), object(1)
memory usage: 3.8+ MB
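
Both merges report 100003 rows with no null titles, which suggests that every `item_id` in the ratings has a match in `movie_titles`. The two join types only differ when some ids are unmatched. The sketch below shows that difference on made-up data: the frames, the unmatched id `9999`, and the variable names are assumptions for illustration.

```python
import pandas as pd

ratings_toy = pd.DataFrame({
    "user_id": [0, 0, 196],
    "item_id": [50, 172, 9999],  # 9999 is a made-up id with no title
    "rating": [5, 5, 3],
})
titles_toy = pd.DataFrame({
    "item_id": [50, 172],
    "title": ["Star Wars (1977)", "Empire Strikes Back, The (1980)"],
})

# inner: rows without a matching title are dropped.
inner = pd.merge(ratings_toy, titles_toy, on="item_id", how="inner")

# left: every rating is kept; unmatched rows get NaN in title.
# indicator=True adds a _merge column, and validate checks that each item_id
# appears at most once in the titles table.
left = pd.merge(
    ratings_toy, titles_toy, on="item_id", how="left",
    indicator=True, validate="many_to_one",
)
unmatched = left[left["_merge"] == "left_only"]

print(len(inner), len(left))
print(unmatched[["user_id", "item_id"]])
```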


In [123]:
df.head()

| | user_id | item_id | rating | timestamp | title |
|---|---|---|---|---|---|
| 0 | 0 | 50 | 5 | 881250949 | Star Wars (1977) |
| 1 | 0 | 172 | 5 | 881250949 | Empire Strikes Back, The (1980) |
| 2 | 0 | 133 | 1 | 881250949 | Gone with the Wind (1939) |
| 3 | 196 | 242 | 3 | 881250949 | Kolya (1996) |
| 4 | 186 | 302 | 3 | 891717742 | L.A. Confidential (1997) |


**Best-rated movies**

In [133]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [135]:
sns.set_style('white')

In [141]:
df.groupby('title')['rating'].mean()

title
'Til There Was You (1997)                2.333333
1-900 (1994)                             2.600000
101 Dalmatians (1996)                    2.908257
12 Angry Men (1957)                      4.344000
187 (1997)                               3.024390
                                           ...   
Young Guns II (1990)                     2.772727
Young Poisoner's Handbook, The (1995)    3.341463
Zeus and Roxanne (1997)                  2.166667
unknown                                  3.444444
Á köldum klaka (Cold Fever) (1994)       3.000000
Name: rating, Length: 1664, dtype: float64

In [147]:
df.groupby('title')['rating'].mean().sort_values(ascending=False).head()

title
They Made Me a Criminal (1939)                5.0
Marlene Dietrich: Shadow and Light (1996)     5.0
Saint of Fort Washington, The (1993)          5.0
Someone Else's America (1995)                 5.0
Star Kid (1997)                               5.0
Name: rating, dtype: float64
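
A mean of 5.0 can come from a single rating, so ranking by mean alone tends to favour titles with very few ratings. One common remedy, sketched here on made-up data, is to keep only titles with at least some minimum number of ratings before ranking. The threshold `min_ratings` and the `toy` frame are assumptions for illustration, not values taken from the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie B", "Movie B", "Movie C", "Movie C"],
    "rating": [5, 4, 5, 4, 2, 3],
})

min_ratings = 2  # hypothetical cut-off; tune for the real data

# For every row, how many ratings its title received in total.
n_ratings = toy.groupby("title")["rating"].transform("count")

# Rank by mean rating, using only titles that meet the cut-off.
ranked = (
    toy[n_ratings >= min_ratings]
    .groupby("title")["rating"]
    .mean()
    .sort_values(ascending=False)
)
print(ranked)
```

Here "Movie A" has the highest raw mean but only one rating, so it drops out of the ranking.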

In [155]:
df.groupby('title')['rating'].count().sort_values(ascending=False).head()

title
Star Wars (1977)             584
Contact (1997)               509
Fargo (1996)                 508
Return of the Jedi (1983)    507
Liar Liar (1997)             485
Name: rating, dtype: int64

In [161]:
df.groupby(['title', 'timestamp'])['rating'].count().sort_values(ascending=False).head()

title                      timestamp
Chasing Amy (1997)         884012614    2
Kull the Conqueror (1997)  883758119    2
                           881432735    2
Desperate Measures (1998)  886952246    2
Kull the Conqueror (1997)  883248595    2
Name: rating, dtype: int64
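
A count of 2 for a `(title, timestamp)` pair means two ratings of the same title share the same second. Whether these come from different users or are repeated records cannot be read off the counts alone. The sketch below shows one way to tell the two cases apart; the `toy` frame and its values are made up for the example.

```python
import pandas as pd

toy = pd.DataFrame({
    "user_id": [1, 2, 3, 3],
    "title": ["Movie A", "Movie A", "Movie B", "Movie B"],
    "timestamp": [100, 100, 200, 200],
    "rating": [4, 5, 3, 3],
})

# All rows whose (title, timestamp) pair occurs more than once.
same_second = toy[toy.duplicated(subset=["title", "timestamp"], keep=False)]

# Of those, rows where the same user repeats as well: likely duplicate records.
repeated = toy[toy.duplicated(subset=["user_id", "title", "timestamp"], keep=False)]

print(len(same_second), len(repeated))
```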

In [163]:
ratings=pd.DataFrame(df.groupby('title')['rating'].mean())

In [167]:
ratings.head()


| title | rating |
|---|---|
| 'Til There Was You (1997) | 2.333333 |
| 1-900 (1994) | 2.6 |
| 101 Dalmatians (1996) | 2.908257 |
| 12 Angry Men (1957) | 4.344 |
| 187 (1997) | 3.02439 |


In [169]:
ratings['Number of rating'] = df.groupby('title')['rating'].count()

In [171]:
ratings.head()

| title | rating | Number of rating |
|---|---|---|
| 'Til There Was You (1997) | 2.333333 | 9 |
| 1-900 (1994) | 2.6 | 5 |
| 101 Dalmatians (1996) | 2.908257 | 109 |
| 12 Angry Men (1957) | 4.344 | 125 |
| 187 (1997) | 3.02439 | 41 |
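
The same two-column table can also be built in a single pass by asking `groupby` for both statistics at once. This is an alternative sketch, not the approach used in the cells above; the `toy` frame is made up, and the column names are chosen to match the `ratings` table.

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Movie A", "Movie A", "Movie B"],
    "rating": [4, 2, 5],
})

# Compute mean and count together, then rename to the column names used above.
summary = (
    toy.groupby("title")["rating"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "rating", "count": "Number of rating"})
)
print(summary)
```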
