# Precision and Recall

Understanding Precision and Recall
Precision and Recall are two fundamental metrics used to evaluate the performance of a recommendation system. Here's what they mean in the context of your recommendation system:

Precision:

Definition: Precision is the ratio of recommended items that are relevant (true positives) to the total number of recommended items.
Interpretation: Precision measures the accuracy of the recommendations, i.e., how many of the recommended items are actually relevant to the user.
Formula: $\text{Precision} = \dfrac{\text{Number of relevant recommended items}}{\text{Total number of recommended items}}$
Example: If you recommend 5 books to a user and 2 of them are relevant, the precision is $\frac{2}{5} = 0.4$.
Recall:

Definition: Recall is the ratio of recommended items that are relevant (true positives) to the total number of relevant items that should have been recommended.
Interpretation: Recall measures the completeness of the recommendations, i.e., how many of the relevant items were actually recommended to the user.
Formula: $\text{Recall} = \dfrac{\text{Number of relevant recommended items}}{\text{Total number of relevant items}}$
Example: If there are 10 relevant books for a user and you recommend 5 books, out of which 2 are relevant, the recall is $\frac{2}{10} = 0.2$.
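
The two definitions can be checked with a few lines of Python. This is a minimal sketch for a single user, not the evaluation code used for the results below; the function name `precision_recall_at_k` and the book ids are made up for illustration. It divides by the number of items actually recommended, following the definition of precision given here.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for a single user.

    recommended: ranked list of item ids, best first
    relevant:    collection of item ids that are relevant to the user
    """
    top_k = recommended[:k]
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)
    # Precision: share of the recommended items that are relevant
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of the relevant items that were recommended
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data mirroring the worked examples: 5 recommendations,
# 10 relevant books, 2 of the recommendations are relevant.
recommended = ["b1", "b2", "b3", "b4", "b5"]
relevant = ["b2", "b5"] + [f"r{i}" for i in range(8)]
precision, recall = precision_recall_at_k(recommended, relevant, k=5)
print(precision, recall)  # 0.4 0.2
```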
Results Interpretation
Your current results:

Books >= 2: Mean Precision@5: 0.13, Mean Recall@5: 0.32
Books >= 5: Mean Precision@5: 0.18, Mean Recall@5: 0.33
Books >= 10: Mean Precision@5: 0.23, Mean Recall@5: 0.32
Analysis:
Precision@5 rises from 0.13 to 0.23 as the minimum number of books per user increases from 2 to 10. This suggests that filtering out users with fewer interactions improves the accuracy of the recommendations, although users with more purchases also tend to have more relevant items in the test set, which by itself raises the precision that can be achieved.
Recall@5 stays between 0.32 and 0.33. This indicates that the share of relevant items the system finds does not change significantly across thresholds.
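
One thing worth keeping in mind when reading the precision numbers is the ceiling on Precision@k. The sketch below assumes precision is computed as hits divided by k, as in the evaluation code later in this notebook; the helper name `max_precision_at_k` is made up for illustration. Under that assumption, a user with fewer than k relevant test items can never reach a precision of 1.0, so part of the upward trend may come from the data rather than from the model.

```python
def max_precision_at_k(n_relevant, k=5):
    """Best achievable precision@k for a user with n_relevant relevant items,
    assuming precision is computed as hits / k."""
    return min(n_relevant, k) / k


# Ceiling for users with 1, 2, 5 and 10 relevant items in the test set
for n_relevant in (1, 2, 5, 10):
    print(n_relevant, max_precision_at_k(n_relevant))
```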
Improving the Model
Since you're using a sparse matrix and cosine similarity approach, here are a few additional suggestions to improve your model:

Content-Based Filtering:

Use additional features (e.g., genres, authors) to enrich the similarity computation.
Hybrid Models:

Combine collaborative filtering (your current approach) with content-based filtering to leverage the strengths of both methods.
Parameter Tuning:

Experiment with different values for the number of recommendations (top_n).
Normalization:

Normalize the interaction values before computing similarities to account for different user behaviors.
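
As an illustration of the normalization suggestion, here is a minimal sketch that assumes an item-to-item cosine similarity and L2 normalization of each user row. Both are assumptions about your setup, and the helper names and toy matrix are hypothetical. Scaling each user's row to unit length keeps one heavy buyer from dominating the similarity between two books.

```python
import numpy as np


def l2_normalize_rows(matrix):
    """Scale each user row to unit length so heavy buyers do not dominate."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return matrix / norms


def item_cosine_similarity(matrix):
    """Cosine similarity between the item columns of a user-item matrix."""
    norms = np.linalg.norm(matrix, axis=0, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for unseen items
    unit_columns = matrix / norms
    return unit_columns.T @ unit_columns


# Hypothetical purchase counts: rows are users, columns are books
counts = np.array([
    [10.0, 8.0, 0.0],  # a heavy buyer
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
])

raw_similarity = item_cosine_similarity(counts)
normalized_similarity = item_cosine_similarity(l2_normalize_rows(counts))
print(np.round(raw_similarity, 2))
print(np.round(normalized_similarity, 2))
```

In the raw matrix the first two books look almost identical because one user bought both many times; after row normalization that user counts no more than the others.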


Improving Your Recommendation Model
Given your current results and the goal to improve your model, here are several strategies:

Data Enrichment:

User and Item Metadata: Incorporate additional features such as user demographics, item metadata (genre, author, etc.) to provide more context for recommendations.
Interaction Types: Differentiate between types of user interactions (e.g., clicks, purchases, ratings).
Advanced Algorithms:

Matrix Factorization: Use techniques like Singular Value Decomposition (SVD) or Alternating Least Squares (ALS) for collaborative filtering.
Content-Based Filtering: Combine collaborative filtering with content-based filtering.
Hybrid Models: Integrate multiple recommendation techniques to leverage their strengths.
Parameter Tuning:

Experiment with different values of k for recommendations.
Use cross-validation to find optimal parameters for your model.
Evaluation Metrics:

Besides precision and recall, consider using metrics like Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG).
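
For reference, here is a minimal sketch of the two ranking metrics for a single user with binary relevance. It assumes the recommendation list has no duplicates, and it uses one common convention for average precision (dividing by the smaller of k and the number of relevant items); other conventions exist. The function names are made up for illustration. MAP is then the mean of the average precision over all users.

```python
import math


def average_precision_at_k(recommended, relevant, k):
    """Average precision@k: mean of precision values at each rank with a hit."""
    relevant = set(relevant)
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    denominator = min(len(relevant), k)
    return precision_sum / denominator if denominator else 0.0


def ndcg_at_k(recommended, relevant, k):
    """NDCG@k with binary relevance: hits lower in the list count less."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    # Ideal ordering places every relevant item at the top of the list
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical ranking: the only relevant book appears in second place
print(average_precision_at_k(["x", "a", "y"], ["a"], k=3))
print(ndcg_at_k(["x", "a", "y"], ["a"], k=3))
```

Unlike Precision@k, both metrics reward placing relevant books near the top of the list.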
Implementing Matrix Factorization using SVD
Here’s how you can implement a basic SVD-based recommendation system:


In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from scipy import sparse
from scipy.sparse.linalg import svds
import time

top_n = 5
prec_k = 5
filePath = 'data/purchase_history_10.csv'

def run_rec_precision():
    total_start_time = time.time()
    print(f"Total recommendation process started at: {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(total_start_time))}")
    print("Running recommendation script...")

    # Fetch purchase history as DataFrame from file
    purchase_history = fetch_purchase_history_from_file(filePath)

    # Split the data into training and testing sets
    train_data, test_data = train_test_split(purchase_history, test_size=0.2, random_state=42)
    print(f"Training data size: {len(train_data)}, Testing data size: {len(test_data)}")

    # Create the user-item matrix for the training set
    purchase_counts = train_data.groupby(['user_id', 'book_id']).size().unstack(fill_value=0)
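    # svds requires a floating-point matrix; the purchase counts are integers
    sparse_purchase_counts = sparse.csr_matrix(purchase_counts.to_numpy(dtype=float))

    # Perform truncated SVD; k must be smaller than both matrix dimensions
    n_factors = min(50, min(sparse_purchase_counts.shape) - 1)
    U, sigma, Vt = svds(sparse_purchase_counts, k=n_factors)
    sigma = np.diag(sigma)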

    # Reconstruct the user-item matrix
    reconstructed_matrix = np.dot(np.dot(U, sigma), Vt)
    reconstructed_df = pd.DataFrame(reconstructed_matrix, columns=purchase_counts.columns, index=purchase_counts.index)

    def recommend_items(user_id, n=top_n):
        if user_id not in reconstructed_df.index:
            return []

        user_predictions = reconstructed_df.loc[user_id].sort_values(ascending=False)
        # Book ids the user already purchased (column labels, not positions)
        user_row = purchase_counts.loc[user_id]
        user_history = set(user_row[user_row > 0].index)
        recommendations = [book_id for book_id in user_predictions.index if book_id not in user_history][:n]
        return recommendations

    # Fetch book details from file
    book_details = fetch_books_from_file('data/books.csv')

    # Evaluate the model on the testing set
    test_user_ids = test_data['user_id'].unique()
    all_recommendations = []

    for user_id in test_user_ids:
        recommendations = recommend_items(user_id)
        all_recommendations.extend([(user_id, book_id) for book_id in recommendations])

    # Calculate precision and recall
    precision, recall = calculate_precision_recall(test_data, all_recommendations, k=prec_k)
    print(f"Mean Precision@{prec_k}: {precision:.2f}")
    print(f"Mean Recall@{prec_k}: {recall:.2f}")

def calculate_precision_recall(test_data, recommendations, k=prec_k):
    # Group the recommended book ids by user
    user_recommendations = {}

    for user_id, book_id in recommendations:
        if user_id not in user_recommendations:
            user_recommendations[user_id] = []
        user_recommendations[user_id].append(book_id)

    precisions = []
    recalls = []

    for user_id in test_data['user_id'].unique():
        true_positives = 0
        recommended_books = user_recommendations.get(user_id, [])
        relevant_books = test_data[test_data['user_id'] == user_id]['book_id'].tolist()

        for book_id in recommended_books[:k]:
            if book_id in relevant_books:
                true_positives += 1

        precision = true_positives / k
        recall = true_positives / len(relevant_books) if len(relevant_books) > 0 else 0

        precisions.append(precision)
        recalls.append(recall)

    mean_precision = np.mean(precisions)
    mean_recall = np.mean(recalls)

    return mean_precision, mean_recall

# Fetch purchase history from file
def fetch_purchase_history_from_file(file_path):
    print("Fetching purchase history from file")
    start_time = time.time()
    df = pd.read_csv(file_path)
    end_time = time.time()
    fetch_time = end_time - start_time
    print(f"Fetch time: {fetch_time:.2f} seconds")
    print(f"Size: {len(df)}")
    return df

# Fetch book details from file
def fetch_books_from_file(file_path):
    print("Fetching books from file...")
    start_time = time.time()
    df = pd.read_csv(file_path)
    end_time = time.time()
    fetch_time = end_time - start_time
    print(f"Fetch time: {fetch_time:.2f} seconds")
    print(f"Size: {len(df)}")
    return df

# Example usage
run_rec_precision()
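
To see what the SVD step does without needing the CSV files, here is a small self-contained sketch on a toy matrix. It swaps in `numpy.linalg.svd` on a dense array in place of `svds`, which is an assumption made to keep the example short; the function name `recommend_from_svd` and the data are hypothetical. The idea is the same as above: keep the largest singular values, rebuild the score matrix, and rank the books the user has not bought yet.

```python
import numpy as np


def recommend_from_svd(counts, user_index, n_factors=2, n=2):
    """Rank unseen items for one user from a rank-n_factors reconstruction."""
    U, sigma, Vt = np.linalg.svd(counts, full_matrices=False)
    # Keep only the largest n_factors singular values (truncated SVD)
    scores = (U[:, :n_factors] * sigma[:n_factors]) @ Vt[:n_factors, :]
    user_scores = scores[user_index].copy()
    # Never recommend items the user has already purchased
    user_scores[counts[user_index] > 0] = -np.inf
    ranked = np.argsort(-user_scores)
    unseen = [int(i) for i in ranked if np.isfinite(user_scores[i])]
    return unseen[:n]


# Hypothetical purchase counts: rows are users, columns are books
counts = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
])

# Column indices of the recommended books for the first user
print(recommend_from_svd(counts, user_index=0))
```

The returned values are column positions in the toy matrix; in the script above they correspond to the `book_id` column labels of `purchase_counts`.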
