# Accuracy Metrics in Machine Learning
In this section, let us understand how to measure the accuracy of a collaborative filtering recommender, using both rating-prediction error (RMSE and MAE) and the top-N hit rate.



## Step 1: Import Required Libraries

- Import pandas to load and merge the data
- Import the model, data-loading, splitting, and accuracy utilities from surprise
- Import defaultdict from collections

In [None]:
import pandas as pd
!pip install scikit-surprise
# !conda install -y -c conda-forge scikit-surprise # If you use conda on a non-Colab environment
from surprise import SVD
from surprise import KNNBaseline
from surprise.model_selection import train_test_split
from surprise.model_selection import LeaveOneOut
from surprise import Reader
from surprise import Dataset
from surprise import accuracy
from collections import defaultdict

## Step 2: Load and Merge the Datasets

- Read two datasets: movies and ratings
- Merge the datasets and check the head of the data


In [None]:
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')
df = pd.merge(movies, ratings, on='movieId', how='inner')
df.head()

__Observation:__
- Here, we can see the head of the merged dataset.

## Step 3: Prepare the Data for the Model

- Create a Reader object with a rating scale from 0.5 to 5
- Load the data into a dataset object
- Split the data into train and test sets


In [None]:
reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)
trainset, testset = train_test_split(data, test_size=0.25, random_state=0)

## Step 4: Use SVD Algorithm, Train on the Train Set, and Predict on the Test Set

- Build a singular value decomposition (SVD) model with a fixed random state so that the results are reproducible
- Fit the model on the train set
- Make predictions on the test set


In [None]:
algo = SVD(random_state=0)
algo.fit(trainset)
predictions = algo.test(testset)

## Step 5: Calculate RMSE and MAE

- Define helper functions that return the MAE and RMSE of the predictions

In [None]:
def MAE(predictions):
    return accuracy.mae(predictions, verbose=False)

In [None]:
def RMSE(predictions):
    return accuracy.rmse(predictions, verbose=False)

In [None]:
print("RMSE:", RMSE(predictions))
print("MAE :", MAE(predictions))

__Observations:__
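- The RMSE score is 0.89 and the MAE is 0.68.
- Both metrics measure how far the predicted ratings are from the actual ratings, so lower values are better. This is one method of evaluation.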
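
To make the two metrics concrete, here is a minimal sketch that computes them by hand. The rating pairs are made up for illustration and are not taken from the dataset above; `accuracy.mae` and `accuracy.rmse` are expected to apply the same formulas to the real predictions.

```python
import math

# Hypothetical (actual, estimated) rating pairs, made up for illustration.
pairs = [(4.0, 3.5), (3.0, 3.0), (5.0, 4.0), (2.0, 3.0)]

# Prediction error for each pair.
errors = [actual - estimated for actual, estimated in pairs]

# MAE: the mean of the absolute errors.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: the square root of the mean squared error.
# Squaring makes large errors count more than small ones.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print("MAE :", mae)
print("RMSE:", rmse)
```

Because of the squaring step, the RMSE is never smaller than the MAE on the same predictions, which matches the scores reported above.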


## Step 6: Define GetTopN Function

- Another method is top-N evaluation, which looks at the N highest-rated recommendations for each user. N may be any value.
- Let's consider n = 10 and a minimum estimated rating of 4.0.
- Create an empty dictionary, topN, and for each user ID append the movie ID and estimated rating of every prediction that meets the minimum rating.
- For each user, sort the entries by estimated rating in descending order and keep the first n.





In [None]:
def GetTopN(predictions, n=10, minimumRating=4.0):
    topN = defaultdict(list)
    for userid, movieid, actualRating, estimatedRating, _ in predictions:
        if (estimatedRating >= minimumRating):
            topN[int(userid)].append((int(movieid), estimatedRating))
            
    for userid, ratings in topN.items():
        ratings.sort(key=lambda x: x[1], reverse=True)
        topN[int(userid)] = ratings[:n]
        
    return topN
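
The filtering, sorting, and truncation steps can be checked on a few hand-made predictions. The sketch below is an illustration only: `top_n_sketch` is a condensed re-implementation, the `Prediction` namedtuple is a stand-in that mirrors the field order of the predictions surprise returns (`uid`, `iid`, `r_ui`, `est`, `details`), and the ratings are invented.

```python
from collections import defaultdict, namedtuple

# Stand-in for surprise's Prediction tuple (same field order); the data is hypothetical.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

toy_predictions = [
    Prediction(1, 10, 4.0, 4.6, {}),
    Prediction(1, 20, 3.0, 3.2, {}),  # estimate below 4.0, so it is dropped
    Prediction(1, 30, 5.0, 4.9, {}),
    Prediction(2, 10, 2.0, 4.1, {}),
]

def top_n_sketch(predictions, n=2, minimum_rating=4.0):
    """Condensed illustration of the same filter, sort, and truncate logic."""
    top_n = defaultdict(list)
    for uid, iid, _actual, est, _details in predictions:
        if est >= minimum_rating:
            top_n[int(uid)].append((int(iid), est))
    # Sort each user's list by estimated rating, highest first, and keep n entries.
    return {
        uid: sorted(items, key=lambda x: x[1], reverse=True)[:n]
        for uid, items in top_n.items()
    }

toy_top_n = top_n_sketch(toy_predictions)
print(toy_top_n)
```

Note that a user can end up with fewer than n entries, or none at all, when few estimates reach the minimum rating.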

## Step 7: Perform Leave-One-Out Cross Validation

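- To evaluate the topN lists, let’s use Leave-One-Out Cross Validation (LOOCV), which holds out one rating per user as the test set.
- Fit the algo model on the LOOCV train set, predict the left-out ratings, and then predict ratings for every user-movie pair that is not in the train set (the anti-test set).
- Based on these predictions, build the top 10 recommendations for each user.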


In [None]:
LOOCV = LeaveOneOut(n_splits=1, random_state=1)

for trainset, testset in LOOCV.split(data):
    algo.fit(trainset)
    leftoutpredictions = algo.test(testset)
    bigTestset = trainset.build_anti_testset()
    allpredictions = algo.test(bigTestset)
    topNpredicted = GetTopN(allpredictions, n=10)

In [None]:
topNpredicted

__Observation:__
- Here, we have up to 10 top-rated movies for each userid.
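
To show what a leave-one-out split produces, here is a plain-Python sketch of the idea. It is an illustration under assumptions, not surprise's implementation: the rows are invented, and `leave_one_out_sketch` is a hypothetical helper that holds out one randomly chosen rating per user.

```python
import random
from collections import defaultdict

# Hypothetical (userId, movieId, rating) rows, made up for illustration.
rows = [
    (1, 10, 4.0), (1, 20, 3.5), (1, 30, 5.0),
    (2, 10, 2.0), (2, 40, 4.5),
    (3, 20, 3.0), (3, 30, 4.0), (3, 40, 1.5),
]

def leave_one_out_sketch(rows, seed=1):
    """Hold out one randomly chosen rating per user; keep the rest for training."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    train, test = [], []
    for user_rows in by_user.values():
        idx = rng.randrange(len(user_rows))
        test.append(user_rows[idx])                             # the left-out rating
        train.extend(user_rows[:idx] + user_rows[idx + 1:])     # everything else
    return train, test

toy_train, toy_test = leave_one_out_sketch(rows)
print(len(toy_train), "training rows,", len(toy_test), "held-out rows")
```

Since each left-out movie is missing from the train set, it is one of the candidates in the anti-test set, which is what makes it possible for that movie to show up in the user's top-N list.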

## Step 8: Calculate HitRate

- The HitRate is the number of hits divided by the number of test users, representing the system's overall hit rate. A hit occurs when a user's left-out movie appears in that user's top N list, so a higher value indicates that the system recommends the left-out movie more frequently.
- Calculate the HitRate from the top N predicted ratings and the left-out predictions.
- From each left-out prediction, take the user ID and the left-out movie ID.
- Compare the left-out movie ID with the movie IDs in that user's top N list, and update the hit and total counts.


In [None]:
def HitRate(topNPredicted, leftoutPredictions):
    hits = 0
    total = 0

    for leftout in leftoutPredictions:
        userid = leftout[0]
        leftoutmovieid = leftout[1]

        # A hit means the left-out movie appears in this user's top N list
        hit = False
        for movieid, predictedRating in topNPredicted[int(userid)]:
            if int(leftoutmovieid) == int(movieid):
                hit = True
                break
        if hit:
            hits += 1

        total += 1

    return hits / total

Now, let's check the HitRate.

In [None]:
print("\nHit Rate : ", HitRate(topNpredicted, leftoutpredictions))

__Observation:__
- The hit rate is 0.0245, meaning the left-out movie appears in the top 10 list for about 2.45% of users.
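
A small worked example may help in reading this number. The sketch below uses invented top-N lists and held-out pairs, and `hit_rate_sketch` is a hypothetical compact version of the same hits-over-users calculation.

```python
# Hypothetical top-N lists per user: (movieId, estimated rating) pairs.
toy_top_n = {
    1: [(30, 4.9), (10, 4.6)],
    2: [(10, 4.1)],
    3: [],
}

# Hypothetical held-out (userId, movieId) pairs, one per user.
toy_left_out = [(1, 10), (2, 40), (3, 20)]

def hit_rate_sketch(top_n, left_out):
    """Fraction of users whose held-out movie appears in their top-N list."""
    hits = sum(
        1
        for uid, iid in left_out
        if any(iid == movie_id for movie_id, _ in top_n.get(uid, []))
    )
    return hits / len(left_out)

toy_hit_rate = hit_rate_sketch(toy_top_n, toy_left_out)
print(toy_hit_rate)
```

Only user 1's held-out movie (ID 10) is in the matching list, so this toy hit rate is one hit out of three users.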