# Steam Game Recommendation

This is the first notebook in a series that builds a Steam game recommendation engine based on game tags. It focuses on obtaining the data from [steamspy](https://steamspy.com/), which is available via a publicly exposed [API](https://steamspy.com/api.php).

## Downloading Data

Import the needed modules

In [1]:
import json
import os
import time
from urllib.request import urlopen

In [2]:
os.chdir('..')

I've created a `download` module to facilitate the downloading of the data.

In [3]:
import download

There are two ways of obtaining the data:
1. Cycle through all the tags and get the games for each tag
2. Cycle through all the games and get the tags for each game

Option 1 is significantly faster (only ~400 tags) but misses a vital piece of information that option 2 provides: along with the tags, option 2 returns the number of votes each tag has received for a given game. This lets us remove tags that have few votes for a game and thus reduce the noise in the data set.
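To illustrate how the vote counts can be used, here is a minimal sketch of a vote threshold filter. It assumes the tags for one game arrive as a mapping of tag name to vote count; the function name `filter_tags`, the threshold of 10, and the example numbers are all made up for illustration.

```python
def filter_tags(tags, min_votes=10):
    """Keep only tags whose vote count is at least ``min_votes``."""
    return {tag: votes for tag, votes in tags.items() if votes >= min_votes}


# Hypothetical tag votes for a single game (made-up numbers).
example_tags = {"Strategy": 420, "Indie": 37, "Horror": 2}

# "Horror" has only 2 votes here, so it is treated as noise and dropped.
kept = filter_tags(example_tags, min_votes=10)
```

A sensible threshold depends on how the votes are distributed, which is something to look at during the analysis step.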

Going with option 2, we first need to get all the games on Steam via the `download.download_all_json_data` method. This obtains all of the game data on steamspy and returns it as JSON. To save space and query time, the game data does not contain tag information; this was done intentionally by the developers.

In [5]:
all_data = download.download_all_json_data()
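The `download` module itself is not shown in this notebook. As a rough sketch of what a helper of this kind might look like, the block below builds a query URL and decodes a JSON response with the standard library. The names `build_url`, `fetch_json`, and `fake_opener` are hypothetical, and the `request=all` parameter is an assumption based on the API page linked above. The demonstration uses a stub instead of a real network call, with made-up data.

```python
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://steamspy.com/api.php"  # API page linked in the introduction


def build_url(**params):
    """Build a query URL, e.g. build_url(request="all")."""
    return f"{API_URL}?{urlencode(params)}"


def fetch_json(url, opener=urlopen):
    """Fetch ``url`` and decode the response body as JSON."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Offline demonstration: a stub opener stands in for the real network call.
def fake_opener(url):
    # Made-up payload shaped like the key/value schema used in this notebook.
    return io.BytesIO(b'{"10": {"appid": 10, "name": "Example"}}')


all_url = build_url(request="all")  # assumed request name
sample = fetch_json(all_url, opener=fake_opener)
```

Passing the opener as a parameter keeps the parsing logic testable without hitting the server; dropping the `opener` argument would use `urlopen` for a real request.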

The game-tag data is placed in the `./data/games/` folder. If the download fails for whatever reason (e.g. a dropped connection), running the cell below will obtain a list of all the games we already have data for. When restarting the download, we can then skip those games.

In [6]:
all_games = os.listdir('./data/games')
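A slightly more defensive variant of this lookup is sketched below. It returns a set of game IDs rather than a list of filenames, which makes membership checks cheap, and it returns an empty set when the folder does not exist yet (for example on a first run). The helper name `downloaded_ids` is hypothetical, and the demonstration runs against a temporary folder rather than the real data.

```python
import os
import tempfile


def downloaded_ids(folder):
    """Return the set of game IDs that already have a ``<id>.json`` file."""
    if not os.path.isdir(folder):
        return set()  # nothing downloaded yet
    ids = set()
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if ext == ".json":
            ids.add(stem)
    return ids


# Demonstration against a throwaway folder with two data files and one stray file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("10.json", "570.json", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    done = downloaded_ids(tmp)
```

With this variant the skip check in the download loop would compare the game ID against the set instead of comparing the filename against the list.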

The section of code below downloads the tag data for each game. We iterate through the `all_data` JSON object which has the following schema: 
- key: game ID
- value: game data

For each game, we first check whether we already have data for it; if we do, we skip it and move on to the next game. Otherwise, a call to the steamspy API is made via the `download.get_game_data` method, which takes the game ID as an argument. Once the data is obtained, it is saved into the folder mentioned above with the filename schema `[game ID].json`. Finally, we wait 0.333 seconds before the next request, as there is a request limit implemented on the server side.

In [14]:
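for i, (game_id, game_data) in enumerate(all_data.items()):
    game_file = f'{game_id}.json'
    # Skip games that were already downloaded in a previous run
    if game_file in all_games:
        continue
    # Fetch the tag data for this game from the steamspy API
    tag_data = download.get_game_data(game_data['appid'])
    with open(f'./data/games/{game_file}', 'w') as json_out:
        json.dump(tag_data, json_out)
    # Progress marker (only reached for games that were not skipped)
    if i % 10000 == 0:
        print(i)
    # Stay under the server-side request limit
    time.sleep(0.333)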

30000
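A fixed `time.sleep(0.333)` caps the loop at roughly three requests per second, but it also waits the full delay even when the request itself was slow. One possible refinement, sketched here under the assumption that the limit is a minimum spacing between requests, is to sleep only for whatever part of the interval is still left. The `Throttle` class is hypothetical and not part of the `download` module.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to ``wait``."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            # Sleep only for the part of the interval that has not elapsed yet.
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(0.333)  # same spacing as the loop above
```

In the download loop, calling `throttle.wait()` just before each `download.get_game_data` call would take the place of the trailing `time.sleep(0.333)`.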


That's it. You should now have all the data you need to build a recommendation engine. The next step is to analyse the data and determine whether any cleaning is required.