# Steam

Build a recommender that suggests video games to Steam users using matrix factorization.

## Variables

*int* `user_id`: numerical ID that identifies a Steam user while keeping the user anonymous  
*str* `game_title`: name of the game the user interacted with  
*str* `behavior`: type of interaction; either "purchase" or "play"  
*int* `value`: always 1 when `behavior` is "purchase"; otherwise, the number of hours the user has played the game
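
As a minimal sketch of this schema, the snippet below parses string-valued rows into typed records and checks the stated rule for `value`. The `Interaction` class and `parse_row` helper are hypothetical names that do not appear in the notebook, and the "play" row is made up for illustration. `value` is parsed with `float` on the assumption that played hours may be fractional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:  # hypothetical record type for one CSV row
    user_id: int
    game_title: str
    behavior: str
    value: float  # assumption: hours may be fractional


def parse_row(row: dict) -> Interaction:
    """Convert one string-valued row into a typed, validated record."""
    record = Interaction(
        user_id=int(row["user_id"]),
        game_title=row["game_title"],
        behavior=row["behavior"],
        value=float(row["value"]),
    )
    if record.behavior not in ("purchase", "play"):
        raise ValueError(f"unexpected behavior: {record.behavior!r}")
    if record.behavior == "purchase" and record.value != 1:
        raise ValueError("a purchase row must have value 1")
    return record


rows = [
    {"user_id": "151603712", "game_title": "Spore", "behavior": "purchase", "value": "1"},
    # Made-up "play" row, for illustration only.
    {"user_id": "151603712", "game_title": "Spore", "behavior": "play", "value": "14.9"},
]
records = [parse_row(row) for row in rows]
print(records[0])
```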

## Setup

Import the libraries and functions to be used.

In [1]:
%reset -f

import csv

import numpy as np
import tensorflow as tf
import tensorflow_recommenders as tfrs

from typing import Dict, Text

## Loading the dataset

Use `csv.DictReader` to read the file and append each purchase row, as a dictionary, to a list. `DictReader` takes the column names from the first row of the file.

In [2]:
data = []
with open('steam-200k.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['behavior'] == 'purchase':
            data.append(row)

for item in data[:3]:
    print(item)

{'user_id': '151603712', 'game_title': 'The Elder Scrolls V Skyrim', 'behavior': 'purchase', 'value': '1'}
{'user_id': '151603712', 'game_title': 'Fallout 4', 'behavior': 'purchase', 'value': '1'}
{'user_id': '151603712', 'game_title': 'Spore', 'behavior': 'purchase', 'value': '1'}
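
If a copy of the file has no header row, `csv.DictReader` would treat the first data row as column names. A possible workaround, sketched below, is to pass the names through the `fieldnames` parameter. The in-memory sample stands in for the CSV file and its rows are illustrative; the column names are the ones listed under Variables.

```python
import csv
import io

# Hypothetical headerless sample standing in for the CSV file.
sample = io.StringIO(
    "151603712,The Elder Scrolls V Skyrim,purchase,1\n"
    "151603712,The Elder Scrolls V Skyrim,play,273\n"
    "151603712,Fallout 4,purchase,1\n"
)

# Supplying fieldnames makes DictReader treat the first line as data.
columns = ["user_id", "game_title", "behavior", "value"]
reader = csv.DictReader(sample, fieldnames=columns)

# Keep only the purchase rows, as in the cell above.
data = [row for row in reader if row["behavior"] == "purchase"]

for item in data:
    print(item)
```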


## Formatting the data

Keep only the relevant features (`user_id` and `game_title`) and convert them to `tf.data.Dataset` objects that TensorFlow can read and process.

In [3]:
# One element per purchase, holding the two features the model uses.
purchases = tf.data.Dataset.from_tensor_slices({
    'user_id': [row['user_id'] for row in data],
    'game_title': [row['game_title'] for row in data],
})

# Unique game titles, used as the candidates for retrieval.
games = tf.data.Dataset.from_tensor_slices(
    list({row['game_title'] for row in data})
)
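
The dictionary passed to `from_tensor_slices` is column-oriented: one equal-length list per feature, which is then sliced into one element per row. The sketch below shows that layout in plain Python on a few illustrative rows (the user and game pairings are made up). It also uses `dict.fromkeys` instead of `set` to remove duplicates, which is an optional variation and not what the notebook does; it keeps the titles in first-seen order, so the result is the same on every run.

```python
data = [
    {"user_id": "151603712", "game_title": "Fallout 4"},
    {"user_id": "151603712", "game_title": "Spore"},
    {"user_id": "187131847", "game_title": "Dota 2"},  # made-up pairing
    {"user_id": "59945701", "game_title": "Spore"},    # made-up pairing
]

# Column-oriented layout: one equal-length list per feature.
columns = {
    "user_id": [row["user_id"] for row in data],
    "game_title": [row["game_title"] for row in data],
}

# dict.fromkeys drops duplicates while keeping first-seen order,
# unlike set(), whose iteration order is arbitrary.
unique_titles = list(dict.fromkeys(columns["game_title"]))
print(unique_titles)  # ['Fallout 4', 'Spore', 'Dota 2']
```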

## Preprocessing

Map the string features to integer indices so that they can be used to look up embeddings.

In [4]:
user_ids_vocabulary = tf.keras.layers.StringLookup()
user_ids_vocabulary.adapt(purchases.map(lambda x: x['user_id']))

game_titles_vocabulary = tf.keras.layers.StringLookup()
game_titles_vocabulary.adapt(games)
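
To illustrate what the lookup layers do, here is a plain-Python sketch of a string-to-index mapping that reserves index 0 for unknown strings, which is how `StringLookup` behaves with its default settings. `build_vocabulary` and `make_lookup` are hypothetical helpers, and the exact order in which `adapt` places equally frequent tokens is not reproduced here.

```python
from collections import Counter


def build_vocabulary(tokens):
    """Return distinct tokens, most frequent first (ties keep first-seen order)."""
    counts = Counter(tokens)
    return [token for token, _ in counts.most_common()]


def make_lookup(vocabulary, num_oov_indices=1):
    """Return a function mapping a token to its index; unknown tokens map to 0."""
    table = {token: i + num_oov_indices for i, token in enumerate(vocabulary)}

    def lookup(token):
        return table.get(token, 0)

    return lookup


user_ids = ["151603712", "151603712", "187131847",
            "151603712", "59945701", "187131847"]
vocabulary = build_vocabulary(user_ids)
lookup = make_lookup(vocabulary)

print(vocabulary)  # ['151603712', '187131847', '59945701']
print([lookup(u) for u in ["151603712", "59945701", "unknown"]])  # [1, 3, 0]

# Size including the out-of-vocabulary slot; an embedding table needs this many rows.
vocabulary_size = len(vocabulary) + 1
print(vocabulary_size)  # 4
```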

## Model design

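Define a `tfrs.Model` subclass whose `compute_loss` method embeds the users and games in a batch and passes both sets of embeddings to the retrieval task.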

In [5]:
class SteamModel(tfrs.Model):

    def __init__(
        self,
        user_model: tf.keras.Model,
        game_model: tf.keras.Model,
        task: tfrs.tasks.Retrieval
    ):

        super().__init__()

        self.user_model = user_model
        self.game_model = game_model

        self.task = task

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

        user_embeddings = self.user_model(features["user_id"])
        game_embeddings = self.game_model(features["game_title"])

        return self.task(user_embeddings, game_embeddings)
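
The loss itself is computed inside the task. As a rough sketch, assuming the task's default behavior is a softmax cross-entropy over the candidates in the batch (summed over the batch), the computation looks like the plain-Python function below. `in_batch_softmax_loss` is a hypothetical name, and the two-dimensional embeddings are made up; row `i` of the user embeddings is paired with row `i` of the game embeddings, and every other game in the batch acts as a negative.

```python
import math


def in_batch_softmax_loss(user_embeddings, game_embeddings):
    """Sum of cross-entropy losses where row i's positive candidate is column i."""
    total = 0.0
    for i, user in enumerate(user_embeddings):
        # Dot-product score of this user against every game in the batch.
        scores = [sum(u * g for u, g in zip(user, game)) for game in game_embeddings]
        # Log-sum-exp, with the maximum subtracted for numerical stability.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[i]
    return total


users = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_softmax_loss(users, [[2.0, 0.0], [0.0, 2.0]])
swapped = in_batch_softmax_loss(users, [[0.0, 2.0], [2.0, 0.0]])
print(aligned < swapped)  # True
```

The loss is lower when each user's embedding points toward the game that user actually bought than when the pairs are swapped.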

Add an embedding layer to the user and game models, and define the factorized retrieval task.

In [6]:
user_model = tf.keras.Sequential([
        user_ids_vocabulary,
        tf.keras.layers.Embedding(user_ids_vocabulary.vocabulary_size(), 64)
    ]
)

game_model = tf.keras.Sequential([
        game_titles_vocabulary,
        tf.keras.layers.Embedding(game_titles_vocabulary.vocabulary_size(), 64)
    ]
)

task = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        games.batch(128).map(game_model)
    )
)
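
The two embedding layers play the role of the two factor matrices in matrix factorization: every user and every game gets a vector, and the predicted affinity between a user and a game is the dot product of their vectors. The sketch below shows this on hand-picked two-dimensional vectors (the notebook uses 64 dimensions); the user names, the numbers, and the `affinity` helper are all hypothetical.

```python
# Hypothetical 2-dimensional embeddings; real ones are learned during training.
user_factors = {
    "alice": [1.0, 0.0],
    "bob": [0.5, 0.5],
}
game_factors = {
    "Spore": [2.0, 0.0],
    "Dota 2": [0.0, 3.0],
}


def affinity(user_vector, game_vector):
    """Dot product of a user vector and a game vector."""
    return sum(u * g for u, g in zip(user_vector, game_vector))


# Full user-by-game score table: the product of the two factor matrices.
scores = {
    user: {game: affinity(u_vec, g_vec) for game, g_vec in game_factors.items()}
    for user, u_vec in user_factors.items()
}
print(scores["alice"])  # {'Spore': 2.0, 'Dota 2': 0.0}
print(scores["bob"])    # {'Spore': 1.0, 'Dota 2': 1.5}
```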

Initialize and train the retrieval model.

In [7]:
model = SteamModel(user_model, game_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))
model.fit(purchases.batch(4096), epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x1c3b09d94c0>
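
For intuition about the optimizer, the sketch below implements the textbook Adagrad update on a one-parameter toy problem, using the same learning rate of 0.5. It is not the Keras implementation, which may differ in details such as how the accumulator is initialized; `adagrad_step` is a hypothetical helper.

```python
import math


def adagrad_step(params, grads, accumulators, learning_rate=0.5, epsilon=1e-7):
    """Apply one in-place Adagrad update to a list of scalar parameters."""
    for i, grad in enumerate(grads):
        # Accumulate squared gradients; larger totals shrink later steps.
        accumulators[i] += grad * grad
        params[i] -= learning_rate * grad / (math.sqrt(accumulators[i]) + epsilon)


# Toy problem: minimize f(x) = x^2 starting from x = 3 (gradient is 2x).
params = [3.0]
accumulators = [0.0]
for _ in range(50):
    grads = [2.0 * params[0]]
    adagrad_step(params, grads, accumulators)

print(params[0])
```

Because each parameter's step is divided by the square root of its accumulated squared gradients, parameters that are updated often (popular games, active users) take smaller steps over time than rarely updated ones.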

Build a brute-force index over the game embeddings, then query it to get video game recommendations from the model.

In [8]:
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    games.batch(100).map(lambda title: (title, model.game_model(title)))
)

<tensorflow_recommenders.layers.factorized_top_k.BruteForce at 0x1c3b12935e0>
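
Conceptually, a brute-force index scores every candidate against the query and keeps the best few. The plain-Python sketch below mirrors the `(identifier, embedding)` pairs passed to `index_from_dataset`; the `brute_force_top_k` helper and the two-dimensional vectors are hypothetical.

```python
import heapq


def brute_force_top_k(query_vector, candidates, k=3):
    """Score every candidate against the query and return the k best identifiers."""
    scored = (
        (sum(q * c for q, c in zip(query_vector, vector)), identifier)
        for identifier, vector in candidates
    )
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [identifier for _, identifier in best]


# Made-up (identifier, embedding) pairs.
candidates = [
    ("Spore", [0.9, 0.1]),
    ("Dota 2", [0.1, 0.8]),
    ("Fallout 4", [0.7, 0.3]),
    ("iRacing", [-0.5, 0.2]),
]
print(brute_force_top_k([1.0, 0.0], candidates, k=2))  # ['Spore', 'Fallout 4']
```

The cost grows linearly with the number of candidates, which is acceptable for a catalog of a few thousand games.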

In [11]:
USERS = ['151603712', '187131847', '59945701']

print("Top 3 games to play for users")
for user in USERS:
    _, ids = index(np.array([user]))
    print(f"    {user}: {ids[0, :3]}")

Top 3 games to play for users
    151603712: [b'Stonehearth' b'Guild of Dungeoneering' b'Legend of Grimrock 2']
    187131847: [b'Dota 2' b'Special Forces Team X' b'iRacing']
    59945701: [b'GUILTY GEAR XX ACCENT CORE PLUS R' b'Guilty Gear X2 #Reload'
 b'Cities in Motion 2']
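
The titles are printed as byte strings (the `b'...'` prefix). One way to display them as regular text, sketched below, is to decode each value as UTF-8. The `raw_ids` list is a stand-in copied from the output above; in the notebook, the corresponding values would presumably come from something like `ids[0, :3].numpy()`.

```python
# Stand-in for the byte strings returned by the index (copied from the output above).
raw_ids = [b'Stonehearth', b'Guild of Dungeoneering', b'Legend of Grimrock 2']

# Decode each byte string into a regular Python string.
titles = [raw.decode('utf-8') for raw in raw_ids]
print(", ".join(titles))  # Stonehearth, Guild of Dungeoneering, Legend of Grimrock 2
```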


<br>

- - -

#### Code authorship

2021 © Jessan Rendell G. Belenzo

<br>

#### Terms of use

Licensed under the GNU General Public License v3.0. See [LICENSE](https://github.com/jessanrendell/steam/blob/main/LICENSE).

<br>

## Acknowledgments

The Tamber Team (2017). Steam Video Games, version 3. Retrieved October 29, 2021 from https://www.kaggle.com/tamber/steam-video-games.