# Recommender System
Full-scale, production-grade recommendation systems are intricate and demand significant data, compute, and engineering resources.


The dataset you are working with is the **MovieLens dataset** (`u.data`), which is commonly used for building and testing recommendation systems.

### **Dataset Description:**
The dataset contains movie rating information from multiple users. Each row represents a single user’s rating of a specific movie, along with a timestamp indicating when the rating was given.

The data is typically used for tasks such as:
- Collaborative filtering
- Recommendation system development
- Exploring user preferences and behavior

---

### **Columns Explanation:**
1. **`user_id`**:  
   - **Description**: A unique identifier for each user in the dataset.  
   - **Example**: `0`, `1`, `2`  
   - **Purpose**: Allows us to track which user rated a particular movie.

2. **`item_id`**:  
   - **Description**: A unique identifier for each movie (or item) in the dataset.  
   - **Example**: `50`, `133`, `31`  
   - **Purpose**: Refers to the specific movie being rated by the user. These IDs can be mapped to actual movie titles if you have an additional dataset with item details (e.g., `u.item`).

3. **`rating`**:  
   - **Description**: The rating given by the user for the specific movie.  
   - **Range**: Typically ranges from `1` (worst) to `5` (best).  
   - **Example**: `5`, `4`, `1`  
   - **Purpose**: Indicates the user’s level of preference or opinion about the movie.

4. **`timestamp`**:  
   - **Description**: The UNIX timestamp (seconds since January 1, 1970) when the rating was given.  
   - **Example**: `881250949`  
   - **Purpose**: Helps track when the rating was provided, which can be useful for time-based analysis or trends.  
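
As a minimal sketch of how the `timestamp` column can be made readable, the snippet below converts the raw integers to dates with `pd.to_datetime(..., unit="s")`. The five rows are copied from the `df.head()` output in this notebook; the `rated_at` and `year` column names are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Sample rows copied from the df.head() output shown in this notebook.
ratings = pd.DataFrame({
    "user_id": [0, 0, 0, 196, 186],
    "item_id": [50, 172, 133, 242, 302],
    "rating": [5, 5, 1, 3, 3],
    "timestamp": [881250949, 881250949, 881250949, 881250949, 891717742],
})

# Interpret the integers as seconds since the Unix epoch, in UTC.
# "rated_at" is a hypothetical column name chosen for this sketch.
ratings["rated_at"] = pd.to_datetime(ratings["timestamp"], unit="s", utc=True)

# Derive a calendar field that can later be used for time-based grouping.
ratings["year"] = ratings["rated_at"].dt.year

print(ratings[["timestamp", "rated_at", "year"]])
```

Keeping the original integer column alongside the parsed one makes it easy to sort or filter on either representation.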


---

### **Use Cases of the Dataset:**
1. **Recommendation Systems**:
   - Train collaborative filtering or content-based filtering models.
   - Understand user preferences based on their ratings.
   
2. **Exploratory Data Analysis**:
   - Analyze rating distributions.
   - Explore the most rated or highest-rated movies.

3. **Time-Based Analysis**:
   - Study trends over time using the `timestamp` column.
   - Analyze how user behavior changes over the years.
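
The first two use cases can be sketched with a few pandas operations. The example below is only an illustration under stated assumptions: the rows are a small hand-made stand-in, not real MovieLens ratings, and the names `movie_stats`, `min_ratings`, and `user_item` are hypothetical. It shows a rating distribution, a most-rated and highest-rated summary per movie, and the user-by-item matrix that collaborative filtering approaches often start from.

```python
import pandas as pd

# Hand-made stand-in for the ratings table (hypothetical values, not real MovieLens rows).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3],
    "item_id": [10, 20, 10, 30, 10, 20, 30],
    "rating":  [5, 3, 4, 2, 5, 4, 1],
})

# Rating distribution: how often each score occurs, ordered by score.
rating_counts = ratings["rating"].value_counts().sort_index()

# Per-movie summary: number of ratings and mean rating, most-rated first.
movie_stats = (
    ratings.groupby("item_id")["rating"]
    .agg(n_ratings="count", mean_rating="mean")
    .sort_values("n_ratings", ascending=False)
)

# Highest-rated movies, ignoring movies with fewer than `min_ratings` ratings
# so that a single enthusiastic rating does not dominate the ranking.
min_ratings = 2
top_rated = (
    movie_stats[movie_stats["n_ratings"] >= min_ratings]
    .sort_values("mean_rating", ascending=False)
)

# User-by-item matrix: one row per user, one column per movie, NaN where unrated.
user_item = ratings.pivot_table(index="user_id", columns="item_id", values="rating")

print(rating_counts)
print(top_rated)
print(user_item)
```

The `min_ratings` threshold is a design choice; on the full dataset a larger value would likely be needed before mean ratings become meaningful.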

---

In [2]:
import numpy as np
import pandas as pd

In [34]:
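# Alternative calls kept for reference:
# df = pd.read_csv('u.data', sep="\t", header=None)
# df = pd.read_csv('u.data', sep="\t", names=['col1', 'col2', 'col3', 'col4'])
# u.data is tab-separated ("\t") and has no header row, so column names are passed explicitly.
df = pd.read_csv('u.data', sep="\t", names=['user_id', 'item_id', 'rating', 'timestamp'])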


In [36]:
df.head()

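|   | user_id | item_id | rating | timestamp |
|---|---------|---------|--------|-----------|
| 0 | 0 | 50 | 5 | 881250949 |
| 1 | 0 | 172 | 5 | 881250949 |
| 2 | 0 | 133 | 1 | 881250949 |
| 3 | 196 | 242 | 3 | 881250949 |
| 4 | 186 | 302 | 3 | 891717742 |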


Index(['user_id', 'item_id', 'rating', 'timestamp'], dtype='object')
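
Before analyzing the data, a few quick sanity checks on the loaded frame can be useful. The sketch below is an assumption-laden illustration rather than part of the original notebook: it parses a tab-separated string built from the five rows shown above (so it runs without the `u.data` file), and `summarize` is a hypothetical helper name.

```python
import io
import pandas as pd

# Tab-separated stand-in for u.data, built from the rows shown by df.head().
raw = (
    "0\t50\t5\t881250949\n"
    "0\t172\t5\t881250949\n"
    "0\t133\t1\t881250949\n"
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
)
columns = ["user_id", "item_id", "rating", "timestamp"]
sample = pd.read_csv(io.StringIO(raw), sep="\t", names=columns)


def summarize(frame):
    """Return basic integrity facts about a ratings frame (hypothetical helper)."""
    return {
        "shape": frame.shape,
        "missing": int(frame.isna().sum().sum()),
        "rating_min": int(frame["rating"].min()),
        "rating_max": int(frame["rating"].max()),
        "n_users": int(frame["user_id"].nunique()),
        "n_items": int(frame["item_id"].nunique()),
        # A user rating the same movie twice would show up here.
        "duplicate_pairs": int(frame.duplicated(["user_id", "item_id"]).sum()),
    }


summary = summarize(sample)
print(summary)
```

The same helper could be called on the full `df` loaded from `u.data` to confirm that ratings stay within the expected `1` to `5` range and that no values are missing.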


### Dataset Analysis