## Experimentation and Data Extraction Notebook

This notebook extracts the data needed to train the neural network: a hybrid model that combines NCF-based (Neural Collaborative Filtering) collaborative filtering with content-based filtering.
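NCF-style models typically take integer user and item indices as input rather than raw strings. The following is a minimal sketch of that encoding step, assuming a toy frame with the same three columns this notebook produces. The `user_idx` and `game_idx` column names and the toy rows are hypothetical and not part of the original pipeline.

```python
import pandas as pd

# Toy rows standing in for the extracted (user_id, game_title, rating) triples.
toy = pd.DataFrame({
    'user_id': ['ZTGD', 'RPGamer', 'ZTGD'],
    'game_title': ['Game A', 'Game A', 'Game B'],
    'rating': [8.5, 8.0, 7.0],
})

# pd.factorize assigns each distinct value a 0-based integer code,
# in order of first appearance, and returns the distinct values.
toy['user_idx'], user_labels = pd.factorize(toy['user_id'])
toy['game_idx'], game_labels = pd.factorize(toy['game_title'])

# These counts would size the embedding tables (hypothetical usage).
n_users = len(user_labels)
n_games = len(game_labels)

print(toy)
print(f"n_users={n_users}, n_games={n_games}")
```

The returned label arrays can be kept to map indices back to the original names when interpreting predictions.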

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load the dataset
df = pd.read_csv('../data/metacritic_user_data_full.csv')


In [3]:
# Create a new dataframe with the required columns
new_df = pd.DataFrame({
    'user_id': df['Reviewer Name'],
    'game_title': df['Game Title'],
    'rating': df['Rating Given By The Reviewer']
})

# Normalize the ratings to the 0-10 range.
# Based on the sample data, the original ratings appear to be on a 0-100 scale.

def normalize_rating(rating):
    """Scale a 0-100 rating to 0-10, leaving missing values as NaN."""
    if pd.isna(rating):
        return np.nan

    # Non-missing value: rescale from 0-100 to 0-10
    return rating / 10

# Apply normalization
new_df['rating'] = new_df['rating'].apply(normalize_rating)

# Drop rows with missing ratings if needed
# new_df = new_df.dropna(subset=['rating'])

# Save to a new CSV file
new_df.to_csv('../data/metacritic_user_data.csv', index=False)

# Display the first few rows of the new dataframe
print(new_df.head())

# Provide some statistics
print("\nDataset statistics:")
print(f"Number of records: {len(new_df)}")
print(f"Number of unique users: {new_df['user_id'].nunique()}")
print(f"Number of unique games: {new_df['game_title'].nunique()}")
print(f"Rating range: {new_df['rating'].min()} to {new_df['rating'].max()}")
print(f"Average rating: {new_df['rating'].mean():.2f}")

         user_id             game_title  rating
0           ZTGD  .hackG.U. Last Recode     8.5
1        RPGamer  .hackG.U. Last Recode     8.0
2   COGconnected  .hackG.U. Last Recode     7.5
3  Worth Playing  .hackG.U. Last Recode     7.0
4     CGMagazine  .hackG.U. Last Recode     7.0

Dataset statistics:
Number of records: 513250
Number of unique users: 219209
Number of unique games: 5445
Rating range: 0.0 to 10.0
Average rating: 2.03
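
A mean of 2.03 on a 0-10 scale is unusually low for review scores. One possible explanation, which the output above does not confirm, is that the raw column mixes scales, with some rows already on 0-10 and others on 0-100. A minimal sketch of a check follows, using a toy series in place of `df['Rating Given By The Reviewer']`. The `summarize_scale` helper and the toy values (including the non-numeric `'tbd'` entry) are hypothetical.

```python
import pandas as pd

def summarize_scale(raw: pd.Series) -> dict:
    """Count raw ratings that are missing, at most 10, or above 10."""
    # Non-numeric entries are coerced to NaN instead of raising an error.
    values = pd.to_numeric(raw, errors='coerce')
    valid = values.dropna()
    return {
        'missing_or_non_numeric': int(values.isna().sum()),
        'at_most_10': int((valid <= 10).sum()),
        'above_10': int((valid > 10).sum()),
    }

# Toy data only: a mix of values that look like 0-100 and 0-10 ratings.
toy_raw = pd.Series([85, 80, 7, 9, None, 'tbd', 100])
summary = summarize_scale(toy_raw)
print(summary)
```

If a large share of the real values turns out to be at or below 10, dividing every row by 10 would compress those ratings into the 0-1 range and pull the mean down, so the normalization would need to be applied per scale rather than uniformly.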
