    Loading the Dataset:
        We start by loading a dataset containing information about video games. This dataset is stored in a CSV file named 'Cleaned_games.csv'.

    Processing Tags:
        Each game in the dataset has associated tags, describing its genre, features, etc.
        We convert the string representation of these tags into a list of strings using the ast.literal_eval function.
        Then, we perform one-hot encoding on these tags. One-hot encoding is a way to represent categorical data (like tags) as binary vectors, where each element indicates the presence or absence of a tag.
        Finally, we sum the encoded rows that belong to the same game, which gives a DataFrame where each row represents a game and each column represents a tag.
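        To make the encoding step concrete, here is a minimal library-free sketch of what it computes. The three sample rows and the helper name one_hot_rows are invented for illustration and do not come from 'Cleaned_games.csv'; the notebook itself does this step with pandas.

```python
import ast

# Hypothetical raw values, shaped like the strings in the 'tags' column
raw_tags = ["['Action', 'RPG']", "['Indie']", "['Action', 'Indie', 'Puzzle']"]

def one_hot_rows(raw):
    """Parse each string into a list of tags and build one 0/1 row per game."""
    parsed = [ast.literal_eval(s) for s in raw]
    # Sorted vocabulary: one column per distinct tag
    columns = sorted({tag for tags in parsed for tag in tags})
    # 1 marks a tag that is present for the game, 0 one that is absent
    rows = [[1 if col in tags else 0 for col in columns] for tags in parsed]
    return columns, rows

columns, rows = one_hot_rows(raw_tags)
print(columns)  # ['Action', 'Indie', 'Puzzle', 'RPG']
for row in rows:
    print(row)  # [1, 0, 0, 1] then [0, 1, 0, 0] then [1, 1, 1, 0]
```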

    Normalizing Numerical Features:
        We scale the 'positive_ratio' column, which represents the percentage of positive reviews for each game, to the range [0, 1] using Min-Max scaling. This column is the prediction target; the tag features are already 0 or 1 and need no scaling.
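        Min-Max scaling maps a value x to (x - min) / (max - min), and the inverse maps a scaled value s back to s * (max - min) + min. The sketch below works through that arithmetic on made-up ratios; the function names are hypothetical and the numbers are not taken from the dataset.

```python
def min_max_scale(values):
    """Scale values to [0, 1]; also return the min and max needed to invert.

    Assumes the values are not all identical (otherwise max - min is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def min_max_invert(scaled_value, lo, hi):
    """Map a scaled value back to the original units."""
    return scaled_value * (hi - lo) + lo

ratios = [40, 70, 100]  # made-up positive_ratio values
scaled, lo, hi = min_max_scale(ratios)
print(scaled)                       # [0.0, 0.5, 1.0]
print(min_max_invert(0.5, lo, hi))  # 70.0
```

        Note that 0.5 maps back to 70 here, not 50: multiplying a scaled value by 100 is the exact inverse only when the observed minimum is 0 and the observed maximum is 100.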

    Neural Network Architecture:
        We define a neural network architecture using Keras, a high-level deep learning library.
        The neural network consists of the following layers:
            First dense layer with 256 neurons (units), using the ReLU activation function. It takes one input per tag column.
            Dropout layer with a dropout rate of 0.5, which helps prevent overfitting by randomly zeroing a fraction of the previous layer's outputs during training.
            Dense layer with 128 neurons, using ReLU, followed by a second dropout layer with a rate of 0.3.
            Dense layer with 64 neurons, using ReLU.
            Output layer with 1 neuron, using a linear activation function. This neuron predicts the scaled rating (positive_ratio) for each game.
        We compile the model, specifying the loss function (mean squared error), optimizer (Adam), and evaluation metric (mean absolute error).
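        The size of this network depends on the number of tag columns, which the description does not state. As a rough sketch, the hypothetical helper below counts trainable parameters for a stack of fully connected layers (one weight per input-unit pair plus one bias per unit). The value n_tags = 400 is an assumed placeholder, not the dataset's real tag count. Dropout layers add no parameters, so they are left out.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases for consecutive fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units                   # this layer's output feeds the next
    return total

n_tags = 400  # assumed number of one-hot tag columns
print(dense_param_count(n_tags, [256, 128, 64, 1]))  # 143873
```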

    Training the Model:
        We split the dataset into training and testing sets using the train_test_split function from scikit-learn.
        Then, we train the neural network on the training data for 20 epochs (full passes over the training data) with a batch size of 128, holding out 20% of the training data for validation. During training, the model learns to predict the scaled positive_ratio from the one-hot encoded tags.
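        The sketch below tallies how many games land in each split and how many weight updates one epoch performs under the settings used in this notebook (test_size=0.2, validation_split=0.2, batch_size=128). The rounding rules reflect our understanding of how scikit-learn and Keras behave and should be treated as assumptions, as should the sample count of 10,000 and the helper name split_sizes.

```python
import math

def split_sizes(n_samples, test_size=0.2, validation_split=0.2, batch_size=128):
    """Estimate split sizes and the number of batches per epoch."""
    n_test = math.ceil(n_samples * test_size)                # assumed: test count rounds up
    n_train = n_samples - n_test
    n_fit = math.floor(n_train * (1.0 - validation_split))   # assumed: remainder goes to validation
    n_val = n_train - n_fit
    steps_per_epoch = math.ceil(n_fit / batch_size)           # last batch may be smaller
    return {"test": n_test, "fit": n_fit, "validation": n_val,
            "steps_per_epoch": steps_per_epoch}

sizes = split_sizes(10_000)  # 10,000 is a made-up dataset size
print(sizes)
```

        Because the validation data is carved out of the training set, the model is fitted on roughly 64% of all rows (0.8 x 0.8), not 80%.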

    Saving Model Weights:
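        Once the model is trained, we save its weights to a file named 'model_weights.h5'. These weights are the learned parameters of the neural network; the file does not store the architecture, so the same layers must be rebuilt before the weights can be loaded again.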

    Predicting Positive Ratio:
        We define a function called predict_positive_ratio that predicts the positive_ratio for a given list of input tags.
        The input tags are converted to the same one-hot format used during training; tags that did not appear in the training data are ignored.
        Using the trained model and the encoded tags, we predict the scaled rating and multiply it by 100 to express it as a percentage.
        Finally, we print the predicted positive_ratio.
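        The preprocessing step can be sketched without any libraries as follows. The vocabulary stands in for the tag columns learned during training and is invented for this example, as is the helper name encode_tags. Unlike the notebook's function, this sketch also reports tags it did not recognise, which can help spot typos in the input.

```python
def encode_tags(input_tags, vocabulary):
    """Return a 0/1 vector over vocabulary plus the tags that were not recognised."""
    wanted = set(input_tags)
    known = set(vocabulary)
    # One position per training column, in the same order as the columns
    vector = [1.0 if tag in wanted else 0.0 for tag in vocabulary]
    # Tags outside the vocabulary cannot influence the prediction
    unknown = [tag for tag in input_tags if tag not in known]
    return vector, unknown

vocabulary = ['Action', 'Adventure', 'Indie', 'RPG']  # hypothetical training columns
vector, unknown = encode_tags(['Action', 'RPG', 'Open World'], vocabulary)
print(vector)   # [1.0, 0.0, 0.0, 1.0]
print(unknown)  # ['Open World']
```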

Overall, this code demonstrates how to train a neural network to predict the positive_ratio of video games based on their associated tags, and how to use the trained model to make predictions for new sets of tags.

In [9]:
import pandas as pd
import numpy as np
import ast
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Load dataset
data = pd.read_csv('Cleaned_games.csv')

# Convert string representation of tags to list of strings
tags_list = data['tags'].apply(ast.literal_eval)

# One-hot encode tags
tags = pd.get_dummies(tags_list.apply(pd.Series).stack()).groupby(level=0).sum()

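# Keep only the rows that have tags. Select them by label before any index is
# reset, and take a copy so the new column below is not written to a slice of `data`
data_with_tags = data.loc[tags.index].copy()

# Reset both indexes so features and targets stay aligned row for row
tags.reset_index(drop=True, inplace=True)
data_with_tags.reset_index(drop=True, inplace=True)

# Normalize the target column to the range [0, 1]
scaler = MinMaxScaler()
data_with_tags['rating_scaled'] = scaler.fit_transform(data_with_tags['positive_ratio'].values.reshape(-1, 1))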

# Combine features
X = tags  # Features are just the one-hot encoded tags for rows with tags
y = data_with_tags['rating_scaled'].values  # Target variable is the scaled rating

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Neural Network Architecture
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dropout(0.5))  # Dropout layer to prevent overfitting
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='linear'))  # Output layer predicts the scaled rating

# Compile model
model.compile(loss='mse', optimizer='adam', metrics=['mae'])

# Train model
model.fit(X_train, y_train, epochs=20, batch_size=128, validation_split=0.2)

# Save weights
model.save_weights('model_weights.h5')




Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [13]:
# Function to predict positive_ratio based on input tags array
def predict_positive_ratio(input_tags):
    # Preprocess input tags array to match the format used for training
    input_tags_df = pd.DataFrame(np.zeros((1, tags.shape[1])), columns=tags.columns)
    for tag in input_tags:
        if tag in input_tags_df.columns:
            input_tags_df[tag] = 1
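    # Make prediction (the model outputs the Min-Max scaled rating)
    prediction = model.predict(input_tags_df)
    # Multiplying by 100 recovers a percentage only if positive_ratio spans 0 to 100
    # in the training data; scaler.inverse_transform(prediction) is the general inverse
    return prediction[0][0] * 100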

# Example input tags array
input_tags = ['Action', 'Adventure', 'Open World', 'RPG', 'Singleplayer']

# Predict positive_ratio based on input tags array
predicted_positive_ratio = predict_positive_ratio(input_tags)
print("Predicted Positive Ratio:", predicted_positive_ratio)

Predicted Positive Ratio: 80.9302270412445
