# Steam videogames platform project

In this project, we study Steam's game marketplace to understand the video game market and identify the most influential factors in a game's popularity. We are provided with a dataset containing information about the video games available on the Steam platform: name, description, developer, release date, number of owners, ratings, etc. The data corresponds to what a customer of the platform can access. For a real market analysis, it nevertheless lacks important information such as production costs and duration, production means (e.g. use of generative AI), more precise details on game mechanics, and the geographical and age distribution of players.

For our analysis, we will follow the guidelines proposed in the project description.


Contents
--------
1. [Data loading and preprocessing](#loading)
2. [Macro-level analysis](#macro)
3. [Genres analysis](#genres)
4. [Platform analysis](#platform)
5. [Conclusion and perspectives](#conclusion)


Download the data from [https://full-stack-bigdata-datasets.s3.amazonaws.com/Big_Data/Project_Steam/steam_game_output.json](https://full-stack-bigdata-datasets.s3.amazonaws.com/Big_Data/Project_Steam/steam_game_output.json)

## <a name="loading"></a> Data loading and preprocessing
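
As a minimal sketch of this step, the snippet below flattens nested records into one row per game and converts prices and dates to usable types. The record layout (fields nested under a `data` key), the column names and the price unit (cents) are assumptions made for illustration and should be checked against the actual file; the sample records are hypothetical.

```python
import pandas as pd

# Hypothetical sample mimicking an ASSUMED layout: a list of records,
# each nesting the game fields under a "data" key.
sample_records = [
    {"id": "10", "data": {"name": "Game A", "price": "999",
                          "owners": "20,000 .. 50,000",
                          "release_date": "2020/01/15"}},
    {"id": "20", "data": {"name": "Game B", "price": "0",
                          "owners": "0 .. 20,000",
                          "release_date": "2021/06/01"}},
]


def load_games(records):
    """Flatten nested records into one row per game with typed columns."""
    df = pd.json_normalize([record["data"] for record in records])
    # ASSUMPTION: prices are strings expressed in cents; convert to dollars.
    df["price"] = pd.to_numeric(df["price"], errors="coerce") / 100
    # Unparseable dates become NaT instead of raising an error.
    df["release_date"] = pd.to_datetime(df["release_date"], errors="coerce")
    return df


games = load_games(sample_records)
print(games.dtypes)
```

With the downloaded file, `records` would instead come from `json.load` applied to the opened JSON file.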



## <a name="macro"></a> Macro-level analysis

We begin our study by analyzing the market globally. We will focus on the following:
- Games popularity and revenues
- Publishers and developers popularity and revenues
- Evolution of the market in terms of game releases
- Games public and availability


### Games popularity and revenues

There are interesting patterns in game usage, price and revenues which we highlight here.

The basic statistics reveal two notable characteristics of the game market.
- The average game price is below $10, and 7780 games are free. Free games generate a major part of their revenue from in-game microtransactions, an important aspect of the video game industry that this dataset does not let us analyze.
- For both game owners and revenues, the mean and the median differ by 3 orders of magnitude. The market is thus extremely inhomogeneous.
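
The statistics above can be approximated along the following lines. This is only a sketch on hypothetical toy data: it assumes owner counts are given as text ranges of the form `low .. high`, takes the midpoint of each range as a rough estimate, and approximates revenue as estimated owners times price. The column names `price`, `owners`, `owners_est` and `revenue_est` are illustrative.

```python
import pandas as pd

# Hypothetical toy data; the "low .. high" range format is an ASSUMPTION.
games = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "price": [0.0, 4.99, 9.99, 59.99],
    "owners": ["0 .. 20,000", "0 .. 20,000",
               "20,000 .. 50,000", "10,000,000 .. 20,000,000"],
})


def owners_midpoint(owner_range):
    """Return the midpoint of a range such as '20,000 .. 50,000'."""
    low, high = (int(part.replace(",", "")) for part in owner_range.split(".."))
    return (low + high) / 2


games["owners_est"] = games["owners"].map(owners_midpoint)
# Rough revenue estimate: ignores discounts and in-game purchases.
games["revenue_est"] = games["owners_est"] * games["price"]

n_free = int((games["price"] == 0).sum())
stats = games[["price", "owners_est", "revenue_est"]].agg(["mean", "median"])
print(f"free games: {n_free}")
print(stats)
```

A mean far above the median, as in this toy example, is the signature of a few very large games dominating the totals.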

Among the most popular games, we recognize well-known games and franchises (Elden Ring, Grand Theft Auto, The Witcher, etc.). More than half of them are free to play. This shows that microtransactions are an important part of game revenues on Steam's platform, a source of revenue that we cannot analyze with this dataset.

### Games publishers and developers

We now study how the market is shared among game publishers and developers. We first analyze it in terms of games released, then in terms of units sold, an important metric in the game industry.


#### Games producers by games releases

There are about 30000 publishers/developers. We recognize some well-known companies among the top publishers (SEGA, Square Enix). The developer names are less familiar, but we note the presence of individuals (Laush Dmitriy Sergeevich, Artur Smiarowski). These are independent developers, who account for a large share of the released games.
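
A ranking by number of releases can be sketched as follows, on hypothetical toy data and with an assumed `publisher` column; the same pattern would apply to a `developer` column.

```python
import pandas as pd

# Hypothetical toy data; the "publisher" column name is an ASSUMPTION.
games = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "publisher": ["Big Corp", "Big Corp", "Big Corp", "Solo Dev", "Other Dev"],
})

# Number of released games per publisher, most prolific first.
releases = games["publisher"].value_counts()

# Share of publishers that released exactly one game.
single_release_share = (releases == 1).mean()

print(releases.head(10))
print(f"publishers with a single release: {single_release_share:.0%}")
```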

#### Games producers by units owned

We consider the total number of games sold by (or downloaded from) a publisher/developer.

Here we sort the publishers/developers by the number of owners of their games. The top names are all well-known companies. The inhomogeneity in owner base is large, spanning 7 orders of magnitude. This is a combined effect: the largest companies release more games, and each of their games also attracts a larger audience.
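
The aggregation described here can be sketched as a grouped sum of estimated owners. The data is hypothetical and the column names `publisher` and `owners_est` (a per-game owner estimate) are assumptions.

```python
import math

import pandas as pd

# Hypothetical toy data; column names are ASSUMPTIONS.
games = pd.DataFrame({
    "publisher": ["Big Corp", "Big Corp", "Mid Studio", "Solo Dev"],
    "owners_est": [15_000_000, 3_500_000, 350_000, 10_000],
})

# Total estimated owners per publisher, largest first.
owners_by_publisher = (
    games.groupby("publisher")["owners_est"].sum().sort_values(ascending=False)
)

# Spread between the largest and smallest owner base, in orders of magnitude.
span = math.log10(owners_by_publisher.max() / owners_by_publisher.min())

print(owners_by_publisher)
print(f"spread: {span:.1f} orders of magnitude")
```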

### Evolution of game releases

A company can take more risks in game production when the game market is healthy. The state of the market is therefore important for our analysis. In this section, we study it through the evolution of game releases.

A better indication of the market health would be the evolution of the number of players, which is not available in this dataset.
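
Counting releases per month can be sketched as follows, assuming a `release_date` column already parsed as datetimes. The dates are hypothetical.

```python
import pandas as pd

# Hypothetical release dates; the "release_date" column name is an ASSUMPTION.
games = pd.DataFrame({
    "release_date": pd.to_datetime([
        "2020-01-05", "2020-01-20", "2020-02-10",
        "2020-03-01", "2020-03-15", "2020-03-30",
    ]),
})

# Group games by calendar month and count the releases in each month.
release_month = games["release_date"].dt.to_period("M")
monthly_releases = games.groupby(release_month).size()

print(monthly_releases)
```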

### General game availability

We conclude this section by analyzing game availability on the platform. This should give insights into the target audience for a new game. We focus here on age restrictions and language availability.

Although most games have no age requirement, some superproductions do have an age limit. Age-restricted games represent an estimated 4.3% of the global market share. Producing an age-restricted game has both advantages and disadvantages. On the one hand, the potential customer base is mechanically reduced. On the other hand, it allows producers to introduce more features and mechanics in their games.
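
One possible way to estimate such a share is sketched below. It assumes that market share is measured in estimated owners and that a `required_age` column holds 0 for unrestricted games; both are assumptions, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names and the use of owner counts
# as the measure of market share are ASSUMPTIONS.
games = pd.DataFrame({
    "required_age": [0, 0, 18, 0, 16],
    "owners_est": [10_000, 35_000, 150_000, 750_000, 55_000],
})

restricted = games["required_age"] > 0

# Share of titles with an age limit, and their share of estimated owners.
share_of_games = restricted.mean()
share_of_owners = (
    games.loc[restricted, "owners_est"].sum() / games["owners_est"].sum()
)

print(f"age-restricted titles: {share_of_games:.1%}")
print(f"age-restricted share of owners: {share_of_owners:.1%}")
```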

Almost all games are available in English. There is a large gap in language availability between English and other languages: only about 20% of games are also available in other European languages (French, German, Italian, etc.). However, games that attract many players are also available in many languages: about 40% of downloaded games are available in more than 10 languages.

It is important to note that language availability does not necessarily mean translation of in-game voices; it may be limited to menu translation and subtitles.
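
The comparison between shares of games and shares of downloads can be sketched as follows. The snippet assumes languages are stored as a comma-separated string in a `languages` column and uses a threshold of 3 languages to fit the hypothetical toy data (the text above uses 10).

```python
import pandas as pd

# Hypothetical toy data; the comma-separated format is an ASSUMPTION.
games = pd.DataFrame({
    "languages": ["English", "English, French, German",
                  "English, Japanese", "Russian"],
    "owners_est": [10_000, 750_000, 35_000, 10_000],
})

# Turn each string into a clean list of language names.
language_lists = games["languages"].str.split(",").apply(
    lambda langs: [lang.strip() for lang in langs]
)
games["n_languages"] = language_lists.apply(len)

share_english = language_lists.apply(lambda langs: "English" in langs).mean()

# Games with many languages: share of titles vs share of estimated owners.
multi = games["n_languages"] >= 3
share_games_multi = multi.mean()
share_owners_multi = (
    games.loc[multi, "owners_est"].sum() / games["owners_est"].sum()
)

print(f"available in English: {share_english:.0%}")
print(f"3+ languages: {share_games_multi:.0%} of games, "
      f"{share_owners_multi:.0%} of owners")
```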

## <a name="genres"></a> Genres analysis

Some game genres are more popular than others, or more expensive to develop. Some developers/publishers have more expertise in specific genres. The choice of genre for a game release must take these factors into account. We now analyze the market from the perspective of game genres.

Most games are from independent developers/publishers (`'Indie'` genre). This is expected since those games are in general faster to produce, and their development is accessible to many people. Among the other genres, action-adventure games are the most prominent. Casual games, aiming at a broad public rather than the hobbyist player, are also prominent. This is explained by the fact that, like independent games, they are smaller and thus faster to produce.
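
Since a game can carry several genres, counting games per genre requires one row per (game, genre) pair. A minimal sketch, assuming a comma-separated `genre` column and hypothetical data:

```python
import pandas as pd

# Hypothetical toy data; the comma-separated "genre" column is an ASSUMPTION.
games = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "genre": ["Indie, Action", "Indie, Casual", "Action, Adventure", "Indie"],
})

# One row per (game, genre) pair, then count games per genre.
genres = games["genre"].str.split(",").explode().str.strip()
genre_counts = genres.value_counts()

print(genre_counts)
```

Because of the multiple labels, the counts add up to more than the number of games.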

## <a name="platform"></a> Platform analysis

The choice of supported platforms is important in game production. Being available on several platforms brings a larger audience, but this may not be worth the extra costs (e.g. licences and software adaptation). We therefore conclude our study of the game market by focusing on the importance of the different platforms available: Windows, Mac and Linux.

Almost all games are available on Windows, while only 23% and 15% of the games are available on Mac and Linux, respectively. This makes Windows a mandatory platform for computer games.

Although only 15% and 23% of the games are available to Linux and Mac users, respectively, availability rises to 33% and 41% in terms of downloaded games. This means that popular games tend to be available on Mac and Linux.
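
The two kinds of shares compared above (per game and per download) can be sketched as follows. The snippet assumes one boolean column per platform and an `owners_est` column used as a proxy for downloads; the data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; boolean platform columns are an ASSUMPTION.
games = pd.DataFrame({
    "windows": [True, True, True, True],
    "mac": [False, False, True, True],
    "linux": [False, False, False, True],
    "owners_est": [10_000, 10_000, 35_000, 750_000],
})
platforms = ["windows", "mac", "linux"]

# Share of titles available on each platform.
share_of_games = games[platforms].mean()

# Same shares, weighting each title by its estimated number of owners.
weighted = games[platforms].mul(games["owners_est"], axis=0)
share_of_owners = weighted.sum() / games["owners_est"].sum()

print(pd.DataFrame({"games": share_of_games, "owners": share_of_owners}))
```

When the owner-weighted share exceeds the plain share, as in this toy example, popular games are more likely than average to support that platform.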

## <a name="conclusion"></a> Conclusion and perspectives

We can summarize the results of our study of Steam's game marketplace with the following points:
- In terms of shares, the market is dominated by a few superproductions from major publishers.
These games are played by many players and generate the largest revenues. This market structure is actually
quite general and occurs across the whole entertainment industry (music, movies, series, etc.).
- About half of the publishers/developers released only one game. These are independent people or very small companies.
Like the revenues, the distribution of game releases per company is fat-tailed,
with only about 10 companies having released more than 100 games.
- This structure carries over to the distribution of the companies' customer bases. The amplitude of variation here is even larger,
considering that games released by large companies also have more players.
- Game releases seem stable since early 2020, with about 700-800 releases per month.
- The genre and platform availability of released games is stable over time.


In terms of strategy for a new video game from a major company such as Ubisoft, we can offer the following advice:
- The company has enough resources to make a superproduction, which is definitely an option to consider.
However, the associated production costs are also large and so are the financial risks.
- For a video game with a completely new gameplay or concept, the risks might be unacceptable. For
such a game, one could consider being less ambitious and testing the concept on a smaller scale, for instance by
publishing a game with limited production costs.
- In terms of genres, one should focus on the main genres (action-adventure, strategy, RPG, etc.) and choose from
those in which the company has expertise.
- For a large production, translating the game is a necessity. Adapting the game to many platforms is also
strategic since the associated costs are low compared to the other production costs.
The gain in players likely outweighs the adaptation costs.


Our analysis has some limitations, some of which we mentioned in the introduction. We recall the most important ones:
- We have very limited information on actual game usage. The number of game owners is provided as a rather broad range,
which forced us to construct a very rough estimate. We do not know whether people actually play the game, nor how
much time they spend playing.
- The number of game owners corresponds to the values at the time the dataset was built. We have no information
about its evolution over time, which would be useful for making forecasts.
- Our data are limited to computer platforms. It would be very beneficial to a market analysis to include video game usage on phones and consoles.
- Even within the scope of computer platforms, we only consider Steam's marketplace.
We lack information on video games not directly available on Steam (for instance Fortnite, one of the most played games).