<a href="https://colab.research.google.com/github/lucav22/Game-Popularity-Predictor/blob/main/Game_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Understanding Game Population Dynamics: A Data Science Approach
*An in-depth analysis of player retention patterns in video games*

## Introduction
Understanding player retention and population dynamics has become crucial for game developers, publishers, and analysts. This notebook analyzes player count data to identify patterns that distinguish healthy games from those experiencing concerning population decline.

Our analysis encompasses a diverse dataset including both successful games that maintained healthy player bases (such as Destiny 2, Counter-Strike: Global Offensive, and Rainbow Six Siege) and games that struggled with player retention (like Babylon's Fall and Battlefield 2042). This variety allows us to identify key patterns and metrics that could predict population trends.

In the following sections, we'll explore:
1. Data loading and preprocessing techniques
2. Statistical analysis of player retention
3. Pattern recognition in population changes
4. Recovery attempt analysis
5. Comparative studies across different game types

## Setting Up Our Analysis Environment

Before diving into the data analysis, we need to set up our Python environment with the necessary libraries. We'll be using:
- pandas: For data manipulation and analysis
- numpy: For numerical computations
- matplotlib: For creating visualizations
- seaborn: For enhanced statistical visualizations

The combination of these libraries will give us powerful tools to process and visualize our game population data. We're also setting matplotlib to display plots inline within our notebook for better readability.


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import os

# Display plots inline within the notebook
%matplotlib inline

## Understanding Data Loading and Preprocessing

Data preparation is crucial for accurate analysis. Our game population data is organized so that each game has its own directory containing a `player_counts.csv` file. The data loading process needs to handle several important aspects:

1. File Path Management: We need to properly navigate our directory structure to access each game's data.
2. Date Handling: Converting timestamp information into proper datetime objects for temporal analysis.
3. Data Consistency: Ensuring all games' data follows the same format and structure.

The loading step should handle these aspects while staying flexible about data formats. This standardization matters because it lets us compare games fairly and accurately. Before loading anything, we first confirm that a single game's data file is reachable.

In [7]:
# First, test a single URL to make sure we can access the data
def test_github_access(game_name):
    """
    Test access to a single game's data file on GitHub.

    Prints the URL being requested and the outcome, and returns
    True if the CSV could be loaded, False otherwise.
    """
    base_url = "https://raw.githubusercontent.com/lucav22/Game-Popularity-Predictor/main/data/"
    url = f"{base_url}{game_name}/player_counts.csv"
    print(f"Attempting to access: {url}")

    try:
        df = pd.read_csv(url)
    except Exception as e:
        # Covers HTTP errors (e.g. 404) as well as CSV parsing failures
        print(f"Error accessing {game_name} data: {e}")
        return False

    print(f"Successfully loaded data for {game_name}")
    print(f"Found {len(df)} rows of data")
    return True

# Test with one game first
test_github_access('battlefield2042')


Attempting to access: https://raw.githubusercontent.com/lucav22/Game-Popularity-Predictor/main/data/battlefield2042/player_counts.csv
Error accessing battlefield2042 data: HTTP Error 404: Not Found


False
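
Once a data file is reachable, the three aspects listed at the start of this section (path management, date handling, and consistency) could be handled roughly as in the sketch below. This is a minimal illustration under stated assumptions, not the notebook's actual loader: the helper names `build_data_url`, `standardize_player_counts`, and `load_games` are hypothetical, and the `date` column name is a guess about the CSV header (only `player_count` appears elsewhere in this notebook). The demonstration at the end runs on made-up numbers held in memory, so it does not need network access.

```python
import io

import pandas as pd

# Assumption: same repository layout as the access test above.
BASE_URL = "https://raw.githubusercontent.com/lucav22/Game-Popularity-Predictor/main/data/"


def build_data_url(game_name, base_url=BASE_URL):
    """File path management: one directory per game, one CSV inside it."""
    return f"{base_url}{game_name}/player_counts.csv"


def standardize_player_counts(df, date_col="date", count_col="player_count"):
    """Date handling and consistency for one game's raw table.

    The column names are assumptions; adjust them to the real CSV headers.
    """
    out = df.rename(columns={date_col: "date", count_col: "player_count"})
    out = out[["date", "player_count"]].copy()
    # Unparseable dates or counts become NaT/NaN and are dropped below
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out["player_count"] = pd.to_numeric(out["player_count"], errors="coerce")
    out = out.dropna(subset=["date", "player_count"])
    return out.sort_values("date").reset_index(drop=True)


def load_games(game_names, base_url=BASE_URL):
    """Return {game_name: standardized DataFrame}, skipping games that fail."""
    games_data = {}
    for name in game_names:
        try:
            raw = pd.read_csv(build_data_url(name, base_url))
            games_data[name] = standardize_player_counts(raw)
        except Exception as e:
            print(f"Skipping {name}: {e}")
    return games_data


# Offline demonstration on a small in-memory CSV (made-up numbers)
sample_csv = io.StringIO(
    "date,player_count\n"
    "2024-01-03,80000\n"
    "2024-01-01,100000\n"
    "2024-01-02,bad\n"
)
sample = standardize_player_counts(pd.read_csv(sample_csv))
print(sample)
```

The dictionary returned by `load_games` has the shape that the plotting code in the next section expects from `games_data`: one DataFrame per game with a `player_count` column. Games whose files cannot be fetched or parsed are skipped with a printed message rather than stopping the whole load.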

## Initial Data Visualization and Understanding

Visualization is a powerful tool for understanding our data. In this section, we'll create two important visualizations:

1. Raw Player Counts: This shows us the absolute numbers of players for each game. This visualization is important because it helps us understand:
   - The scale of each game's player base
   - The dramatic differences between AAA and smaller titles
   - Initial launch populations versus long-term player counts

2. Normalized Player Counts: This is crucial because it allows us to compare patterns across games of different sizes. By converting all player counts to percentages of their peak, we can:
   - Identify similar decline patterns across games of different scales
   - Compare retention rates more fairly
   - Spot unusual patterns that might indicate problems or successes

Pay special attention to the shape of these curves, as they often tell us important stories about a game's lifecycle.
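
To see why normalization makes games of different sizes comparable, here is a small worked example. The helper name `normalize_to_peak` and both series are made up for illustration; they are not taken from the dataset.

```python
import pandas as pd


def normalize_to_peak(player_counts):
    """Express each value as a percentage of the series' maximum."""
    counts = pd.Series(player_counts, dtype="float64")
    peak = counts.max()
    # `not peak > 0` also rejects NaN, which is what an empty series yields
    if not peak > 0:
        raise ValueError("peak player count must be positive")
    return counts / peak * 100


# Made-up numbers for two hypothetical games of very different size
large_game = [200_000, 100_000, 50_000]
small_game = [2_000, 1_000, 500]

print(normalize_to_peak(large_game).tolist())  # [100.0, 50.0, 25.0]
print(normalize_to_peak(small_game).tolist())  # [100.0, 50.0, 25.0]
```

On the raw scale the smaller game would be almost invisible next to the larger one, yet both lose half of their players at every step. After normalization the two curves coincide, which is the kind of shared decline pattern the second plot is meant to reveal.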

In [None]:
# `games_data` must be defined before running this cell: a dict mapping each
# game name to a DataFrame with a 'player_count' column. Rows are assumed to
# be daily observations ordered from launch onward.

# Raw player count visualization
plt.figure(figsize=(15, 10))

for game_name, df in games_data.items():
    # Use the row position as the day index so every game starts at day 0
    plt.plot(range(len(df)), df['player_count'], label=game_name)

plt.title('Raw Player Counts Over Time')
plt.xlabel('Days Since Launch')
plt.ylabel('Player Count')
plt.legend()
plt.grid(True)
plt.show()

# Normalized player count visualization
plt.figure(figsize=(15, 10))

for game_name, df in games_data.items():
    # Scale each game to a percentage of its own peak player count
    normalized_counts = (df['player_count'] / df['player_count'].max()) * 100
    plt.plot(range(len(normalized_counts)), normalized_counts, label=game_name)

plt.title('Normalized Player Counts Over Time')
plt.xlabel('Days Since Launch')
plt.ylabel('Percentage of Peak Players')
plt.legend()
plt.grid(True)
plt.show()