# Sentiment Analysis For Steam Games

The data was collected with a Scrapy web spider.

Before the data can be used, we need to organize it and remove unnecessary elements such as punctuation and newline characters.
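As a rough illustration of that cleanup step, the sketch below replaces newlines and ASCII punctuation with spaces and then collapses repeated whitespace. The helper name `clean_text` and the sample string are assumptions made for this example; the notebook itself does not define them.

```python
import re
import string

# Translation table mapping every ASCII punctuation character to a space
PUNCTUATION_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def clean_text(text):
    """Hypothetical helper: drop newlines and punctuation, collapse whitespace."""
    # Newlines, carriage returns and tabs become single spaces
    text = re.sub(r"[\r\n\t]+", " ", text)
    # Punctuation becomes spaces so adjacent words are not glued together
    text = text.translate(PUNCTUATION_TO_SPACE)
    # Collapse runs of whitespace and trim both ends
    return re.sub(r"\s+", " ", text).strip()

sample = "This is BATTLE ROYALE.\nPlayers must fight, loot\tand survive!"
print(clean_text(sample))
# This is BATTLE ROYALE Players must fight loot and survive
```

Replacing punctuation with a space (rather than deleting it) splits contractions such as `4's` into two tokens; whether that is desirable depends on the tokenizer used later.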

In [7]:
import numpy as np
import tensorflow
import json

## Reading and Preparing Data

Each combination of genres is treated as its own class and encoded as a one-hot vector, so that a game is classified by its exact genre combination.

* This lets us differentiate between, for example, "Action" and "Action Adventure" as genres.
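To make the idea concrete, here is a minimal sketch of how a game's genre list could be turned into a single combined label. The helper `genres_to_label` is hypothetical; it sorts the genres and joins them with spaces, which mirrors how the labels are built further down from the sorted genre list.

```python
def genres_to_label(game_genres):
    """Hypothetical helper: build one label from a game's list of genres."""
    # Sorting makes the label independent of the order the genres were scraped in;
    # set() drops accidental duplicates
    return ' '.join(sorted(set(game_genres)))

print(genres_to_label(['Action']))               # Action
print(genres_to_label(['Adventure', 'Action']))  # Action Adventure
```

Because the two labels are different strings, "Action" and "Action Adventure" end up as separate classes.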

In [33]:
with open('Steam-Data/steam_genres.json', 'r') as f:
    genres = f.read()

In [34]:
genres = json.loads(genres)

In [52]:
genre_list = [genre['genre'] for genre in genres]
genre_list.sort()

In [53]:
# Games usually combine two or three genres, but in rare cases they combine five.
# Cap the combination size at 5 (or at the number of genres, if there are fewer)
# so that the number of genre labels stays manageable.
import itertools
genre_length = min(len(genre_list), 5)

In [54]:
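total_labels = []
# Note: range(1, genre_length) stops at genre_length - 1, so with genre_length = 5
# this builds combinations of 1 to 4 genres; use genre_length + 1 to include size 5.
for i in range(1, genre_length):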
    combination_of_genres = list(itertools.combinations(genre_list, i))
    for combined_genre in combination_of_genres:
        label = ' '.join(combined_genre)
        total_labels.append(label)

In [57]:
labels_length = len(total_labels)
labels_length

793
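The count can be sanity-checked with binomial coefficients. The sketch below assumes `genre_list` holds 12 distinct genres; that number is inferred from the result and is not stated in the notebook. With combination sizes 1 to 4 (what `range(1, 5)` yields), the total matches.

```python
from math import comb

n_genres = 12   # assumption: inferred from the label count, not given in the notebook
max_size = 4    # range(1, 5) produces combination sizes 1, 2, 3 and 4

# C(12,1) + C(12,2) + C(12,3) + C(12,4) = 12 + 66 + 220 + 495
total = sum(comb(n_genres, k) for k in range(1, max_size + 1))
print(total)  # 793
```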

In [58]:
# The one-hot encoded labels form an identity matrix with one row per label:
# row i is the one-hot vector for total_labels[i]
output = np.identity(labels_length)

In [59]:
output

array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])
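One way to use this matrix is to look up the row that belongs to a given label. The sketch below uses a toy three-label list in place of the real `total_labels`; the `label_to_index` dictionary and the `one_hot` helper are assumptions introduced for illustration.

```python
import numpy as np

# Toy stand-in for the real total_labels list
toy_labels = ['Action', 'Adventure', 'Action Adventure']
toy_output = np.identity(len(toy_labels))

# Map each label string to its row in the identity matrix
label_to_index = {label: i for i, label in enumerate(toy_labels)}

def one_hot(label):
    """Hypothetical helper: return the one-hot row for a label."""
    return toy_output[label_to_index[label]]

print(one_hot('Action Adventure'))  # [0. 0. 1.]
```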

In [60]:
# Preparing the game data
with open('Steam-Data/steam_game_data.json', 'r') as f:
    games = f.read()

In [61]:
games = json.loads(games)

In [65]:
game_url = [game['url'] for game in games]

In [77]:
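# Extracting the appid from the url
import re
app_url_pattern = re.compile(r"http://store\.steampowered\.com/app/(.+?)/")
game_app_id = []
kept_games = []
for game, url in zip(games, game_url):
    match = app_url_pattern.search(url)
    if match:
        game_app_id.append(match.group(1))
        kept_games.append(game)
    # Sub products have no /app/<id>/ url and are skipped.
    # Deleting from `games` by index inside the loop would shift the later
    # indices and remove the wrong entries, so build a filtered list instead.
games = kept_games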
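The extraction and filtering step can be tried on a few made-up records. The URLs and app ids below are hypothetical, and the pattern uses `\d+` on the assumption that app ids are numeric. Filtering into a new list, rather than deleting while iterating, keeps the games and their ids aligned.

```python
import re

APP_URL = re.compile(r"http://store\.steampowered\.com/app/(\d+)/")

# Hypothetical records; the middle one stands in for a sub product
sample_games = [
    {'url': 'http://store.steampowered.com/app/111/Example_Game/'},
    {'url': 'http://store.steampowered.com/sub/222/'},
    {'url': 'http://store.steampowered.com/app/333/Another_Game/'},
]

app_ids = []
kept = []
for game in sample_games:
    match = APP_URL.search(game['url'])
    if match:
        # Keep the game and its id together so both lists stay in step
        app_ids.append(match.group(1))
        kept.append(game)

print(app_ids)                   # ['111', '333']
print(len(kept) == len(app_ids)) # True
```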

In [79]:
len(games) == len(game_app_id)

True

In [83]:
len(games)

14776

In [84]:
game_length = len(games)

In [87]:
game_description_information = [game['description'] for game in games]
game_about_information = [game['about'] for game in games]

joined_about_array = []
for game_about_array in game_about_information:
    stripped_elements = [element.strip() for element in game_about_array]
    joined_about_array.append(' '.join(stripped_elements))

combine_words = []
for i in range(game_length):
    stripped_game_description = str(game_description_information[i]).strip()
    combine_words.append(stripped_game_description + ' ' + joined_about_array[i])

combine_words[0]

"PLAYERUNKNOWN'S BATTLEGROUNDS is a last-man-standing shooter being developed with community feedback. Players must fight to locate weapons and supplies in a massive 8x8 km island to be the lone survivor. This is BATTLE ROYALE.   is a last-man-standing shooter being developed with community feedback. Starting with nothing, players must fight to locate weapons and supplies in a battle to be the lone survivor. This realistic, high tension game is set on a massive 8x8 km island with a level of detail that showcases Unreal Engine 4's capabilities. aka Brendan Greene, is a pioneer of the Battle Royale genre. As the creator of the Battle Royale game-mode found in the ARMA series and H1Z1 : King of the Kill, Greene is co-developing the game with veteran team at Bluehole to create the most diverse and robust Battle Royale experience to date "

In [None]:
game_tag_information = [game['tags'] for game in games]
game_genre_information = [game['genres'] for game in games]