![logo](./images/OPTIMISE.%20Logo%20(green).png)

# Optimise.
BUSINESS INTELLIGENCE SOLUTIONS

Optimise. uses data analysis to give businesses a clear view of their current operations, along with actionable advice grounded in meticulous analysis that produces tangible results.

The analysis focuses on these main areas:     
- Product Analysis
    - Performance
    - Classification
    - Pricing
- Customer Analysis
    - Customer Profile
    - Customer Trends
    - Customer Lifetime Value
- Sales Analysis
    - Date/Time Overview
    - Discount Efficiency
    - Projections
    
The deliverables are a comprehensive report with useful visualizations and specific recommendations based on the results of the analysis.

## Steam Business Analysis
In this project, we run this analysis on Steam.

Steam is a video game digital distribution service by Valve. It is the largest digital distribution platform for PC gaming, holding around 75% of the market in 2013. By 2017, game purchases through Steam totaled roughly US$4.3 billion, representing at least 18% of global PC game sales. By 2019, the service had over 34,000 games and over 95 million monthly active users.

The data for the analysis will be obtained from two sources:
1. Steam Store Games - https://www.kaggle.com/nikdavis/steam-store-games
2. Steam API - https://steamcommunity.com/dev

## General Overview
### Import Data

In [15]:
import pandas as pd
import numpy as np

In [16]:
s = pd.read_csv("./data/steam.csv")

In [17]:
s.head()

|  | appid | name | release_date | english | developer | publisher | platforms | required_age | categories | genres | steamspy_tags | achievements | positive_ratings | negative_ratings | average_playtime | median_playtime | owners | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Counter-Strike | 2000-11-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 124534 | 3339 | 17612 | 317 | 10000000-20000000 | 7.19 |
| 1 | 20 | Team Fortress Classic | 1999-04-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 3318 | 633 | 277 | 62 | 5000000-10000000 | 3.99 |
| 2 | 30 | Day of Defeat | 2003-05-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Valve Anti-Cheat enabled | Action | FPS;World War II;Multiplayer | 0 | 3416 | 398 | 187 | 34 | 5000000-10000000 | 3.99 |
| 3 | 40 | Deathmatch Classic | 2001-06-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 1273 | 267 | 258 | 184 | 5000000-10000000 | 3.99 |
| 4 | 50 | Half-Life: Opposing Force | 1999-11-01 | 1 | Gearbox Software | Valve | windows;mac;linux | 0 | Single-player;Multi-player;Valve Anti-Cheat en... | Action | FPS;Action;Sci-fi | 0 | 5250 | 288 | 624 | 415 | 5000000-10000000 | 3.99 |


In [20]:
s.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27075 entries, 0 to 27074
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   appid             27075 non-null  int64  
 1   name              27075 non-null  object 
 2   release_date      27075 non-null  object 
 3   english           27075 non-null  int64  
 4   developer         27075 non-null  object 
 5   publisher         27075 non-null  object 
 6   platforms         27075 non-null  object 
 7   required_age      27075 non-null  int64  
 8   categories        27075 non-null  object 
 9   genres            27075 non-null  object 
 10  steamspy_tags     27075 non-null  object 
 11  achievements      27075 non-null  int64  
 12  positive_ratings  27075 non-null  int64  
 13  negative_ratings  27075 non-null  int64  
 14  average_playtime  27075 non-null  int64  
 15  median_playtime   27075 non-null  int64  
 16  owners            27075 non-null  object

### Uniques

In [23]:
print(len(s["developer"].unique()))
s["developer"].unique()

17113


array(['Valve', 'Gearbox Software', 'Valve;Hidden Path Entertainment',
       ..., 'SHEN JIAWEI', 'Semyon Maximov', 'Adept Studios GD'],
      dtype=object)

In [24]:
print(len(s["genres"].unique()))
s["genres"].unique()

1552


array(['Action', 'Action;Free to Play', 'Action;Free to Play;Strategy',
       ...,
       'Action;Adventure;Indie;Massively Multiplayer;RPG;Strategy;Early Access',
       'Action;Adventure;Casual;Free to Play;Indie;RPG;Simulation;Sports;Strategy',
       'Casual;Free to Play;Massively Multiplayer;RPG;Early Access'],
      dtype=object)

**Observations:** I need to unpack the `genres` column, because each value combines several `;`-separated labels. I also see an inconsistency among the labels: some describe the actual genre of the game (e.g. Action), while others describe the play style (e.g. Massively Multiplayer).
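One possible way to unpack the column is sketched below. It is not necessarily the approach taken later in this analysis. The `toy` frame is a stand-in with a few illustrative rows; the same calls should apply to `s["genres"]`, assuming every value is a `;`-separated string as shown in the output above.

```python
import pandas as pd

# Toy rows mimicking the ";"-separated `genres` column (illustrative values only).
toy = pd.DataFrame({
    "appid": [10, 20, 30],
    "genres": ["Action", "Action;Free to Play", "Action;Free to Play;Strategy"],
})

# Wide format: one 0/1 indicator column per individual genre label.
genre_flags = toy["genres"].str.get_dummies(sep=";")

# Long format: one row per (appid, genre) pair.
genre_long = (
    toy.assign(genre=toy["genres"].str.split(";"))
       .explode("genre")[["appid", "genre"]]
       .reset_index(drop=True)
)

# Number of games carrying each label.
print(genre_flags.sum().sort_values(ascending=False))
```

The wide format is convenient for filtering and modelling, while the long format is easier to group and count.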

In [25]:
print(len(s["categories"].unique()))
s["categories"].unique()

3333


array(['Multi-player;Online Multi-Player;Local Multi-Player;Valve Anti-Cheat enabled',
       'Multi-player;Valve Anti-Cheat enabled',
       'Single-player;Multi-player;Valve Anti-Cheat enabled', ...,
       'Online Multi-Player;Steam Achievements;Full controller support;In-App Purchases;Steam Cloud',
       'Multi-player;Local Multi-Player;Co-op;Local Co-op;Shared/Split Screen',
       'Multi-player;Online Multi-Player;Cross-Platform Multiplayer;Stats'],
      dtype=object)

In [26]:
print(len(s["steamspy_tags"].unique()))
s["steamspy_tags"].unique()

6423


array(['Action;FPS;Multiplayer', 'FPS;World War II;Multiplayer',
       'FPS;Action;Sci-fi', ..., 'Casual;Adventure;Arcade',
       'Free to Play;Visual Novel',
       'Early Access;Adventure;Sexual Content'], dtype=object)

In [27]:
print(len(s["publisher"].unique()))
s["publisher"].unique()

14354


array(['Valve', 'Mark Healey', 'Tripwire Interactive', ..., 'MonteCube',
       'Velvet Paradise Games', 'SHEN JIAWEI'], dtype=object)

**Observations:** Because the number of unique publishers is very high (14,354), I could identify the top publishers and group the rest as `Other`.
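A minimal sketch of that grouping follows, assuming "top" means the publishers with the most titles in the dataset. The cut-off `TOP_N`, the `publisher_grouped` column name, and the toy counts are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data: publisher names come from the output above, the counts are made up.
toy = pd.DataFrame({
    "publisher": [
        "Valve", "Valve", "Valve",
        "Tripwire Interactive", "Tripwire Interactive",
        "MonteCube", "SHEN JIAWEI",
    ]
})

TOP_N = 2  # hypothetical cut-off; tune it for the real data

# Publishers with the most titles.
top_publishers = toy["publisher"].value_counts().nlargest(TOP_N).index

# Keep the top publishers and replace every other name with "Other".
toy["publisher_grouped"] = toy["publisher"].where(
    toy["publisher"].isin(top_publishers), "Other"
)

print(toy["publisher_grouped"].value_counts())
```

Ranking by title count is only one option. Ranking by estimated owners or ratings would give a different set of top publishers.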

In [29]:
print(len(s["platforms"].unique()))
s["platforms"].unique()

7


array(['windows;mac;linux', 'windows;mac', 'windows', 'windows;linux',
       'mac', 'mac;linux', 'linux'], dtype=object)

In [31]:
# Confirm that the "appid" column is a unique identifier (count should equal the 27075 rows)
print(len(s["appid"].unique()))

27075
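The count above matches the 27075 rows reported by `s.info()`, so every `appid` appears exactly once. The same check can be written more directly with `Series.is_unique`, as in this small sketch on toy data. The `toy` frame is illustrative; on the real data the calls would be made on `s["appid"]`.

```python
import pandas as pd

# Toy stand-in for the `appid` column (values taken from the head() output above).
toy = pd.DataFrame({"appid": [10, 20, 30, 40, 50]})

# True when no value is repeated.
is_identifier = toy["appid"].is_unique

# Number of rows that repeat an earlier appid (0 for a valid identifier).
n_duplicates = int(toy["appid"].duplicated().sum())

print(is_identifier, n_duplicates)
```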

