# Player Count By Game

### Data Cleaning & Preprocessing
- Identify and handle missing values (imputation, removal, or flagging)
- Detect and fix inconsistencies (duplicate rows, incorrect data types, formatting issues)
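- Normalize or standardize numerical columns so features on very different scales (player counts vs. percentages) can be compared
- Handle categorical variables (one-hot encoding, label encoding)
- Identify and remove outliers using statistical rules (e.g. IQR or z-score thresholds) or visualization (e.g. boxplots)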
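Below is a minimal pandas sketch of this checklist. So that it runs on its own, it uses the first five rows of `data/Game Player Count.csv`, copied from the `df.head()` output. Several pieces are illustrative choices rather than steps the original notebook takes: the helper `clean_player_counts`, z-score standardization, one-hot encoding with `pd.get_dummies`, and the 1.5 * IQR rule that *flags* outliers instead of dropping them. On the full data, outliers would probably be better judged per game than across all games.

```python
import pandas as pd

# First five rows of the dataset, copied from the df.head() output.
raw = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2, 3, 4],
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Name": ["v rising", "v rising", "forza horizon 5", "forza horizon 5", "forza horizon 5"],
    "Month": ["June 30, 2022", "May 30, 2022", "June 30, 2022", "May 30, 2022", "April 30, 2022"],
    "Average_Monthly_Players": [57411, 95620, 23650588, 22650041, 19688504],
    "Monthly_Gain_Loss": [-38209, 95620, 1000547, 2961537, 983063],
    "Monthly_Gain_Loss_percentage": [-40.0, 0.0, 4.0, 15.0, 5.0],
    "Max_Players_per_Day": [146500, 152330, 7095176, 6795012, 5906551],
})

NUMERIC = ["Average_Monthly_Players", "Monthly_Gain_Loss",
           "Monthly_Gain_Loss_percentage", "Max_Players_per_Day"]


def clean_player_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one pass over the cleaning checklist."""
    # Drop the "Unnamed" columns (likely leftovers from saving with the index).
    df = df.drop(columns=[c for c in df.columns if c.startswith("Unnamed")])
    # Remove exact duplicate rows and rows missing any numeric value.
    df = df.drop_duplicates().dropna(subset=NUMERIC).copy()
    # Flag (rather than drop) rows with any value outside 1.5 * IQR of its column.
    q1, q3 = df[NUMERIC].quantile(0.25), df[NUMERIC].quantile(0.75)
    iqr = q3 - q1
    outside = (df[NUMERIC] < q1 - 1.5 * iqr) | (df[NUMERIC] > q3 + 1.5 * iqr)
    df["is_outlier"] = outside.any(axis=1)
    # Standardize each numeric column to z-scores (mean 0, sample std 1).
    for col in NUMERIC:
        df[col + "_z"] = (df[col] - df[col].mean()) / df[col].std()
    # One-hot encode the game name into game_<name> indicator columns.
    return pd.get_dummies(df, columns=["Name"], prefix="game")


clean = clean_player_counts(raw)
print(clean.filter(like="game_").columns.tolist())
print(clean[["Month", "is_outlier"]])
```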

In [2]:
import pandas as pd
import numpy as np 

In [10]:
df = pd.read_csv("data/Game Player Count.csv")
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Name | Month | Average_Monthly_Players | Monthly_Gain_Loss | Monthly_Gain_Loss_percentage | Max_Players_per_Day |
|---|---|---|---|---|---|---|---|
| 0 | 0 | v rising | June 30, 2022 | 57411 | -38209 | -40.0 | 146500 |
| 1 | 1 | v rising | May 30, 2022 | 95620 | 95620 | 0.0 | 152330 |
| 2 | 2 | forza horizon 5 | June 30, 2022 | 23650588 | 1000547 | 4.0 | 7095176 |
| 3 | 3 | forza horizon 5 | May 30, 2022 | 22650041 | 2961537 | 15.0 | 6795012 |
| 4 | 4 | forza horizon 5 | April 30, 2022 | 19688504 | 983063 | 5.0 | 5906551 |


In [11]:
print("Unique: ", df['Name'].unique())

Unique:  ['v rising' 'forza horizon 5' 'payday 2' 'elden ring' 'lost ark'
 'splitgate' 'rust' 'yu gi oh master duel' 'monster hunter series'
 'team fortress 2 2']


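In [None]:
date_series = df['Month']

# Print the parts of each "Month day, Year" string, e.g. "June 30, 2022"
for date in date_series:
    date_split = str(date).split(",")        # ["June 30", " 2022"]
    month_split = date_split[0].split(" ")   # ["June", "30"]
    print("Month: ", month_split[0])
    print("Day: ", month_split[1])
    print("Year: ", date_split[1].strip())   # strip the space left after the comma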


In [24]:
# Count the comma-separated parts of each date string ("June 30, 2022" -> 2)
df['Month'].str.split(",").str.len()

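In [6]:
# str(df['Month']) turns the whole column into one printed string, so every row
# received the same value; the .str accessor splits each row separately instead.
df['Year'] = df['Month'].str.split(",").str[1].str.strip()

In [8]:
# Extract the day first, then overwrite Month with just the month name
df['Day'] = df['Month'].str.split(",").str[0].str.split(" ").str[1]
df['Month'] = df['Month'].str.split(",").str[0].str.split(" ").str[0]

In [9]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Name | Month | Average_Monthly_Players | Monthly_Gain_Loss | Monthly_Gain_Loss_percentage | Max_Players_per_Day | Year | Day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | v rising | June | 57411 | -38209 | -40.0 | 146500 | 2022 | 30 |
| 1 | 1 | v rising | May | 95620 | 95620 | 0.0 | 152330 | 2022 | 30 |
| 2 | 2 | forza horizon 5 | June | 23650588 | 1000547 | 4.0 | 7095176 | 2022 | 30 |
| 3 | 3 | forza horizon 5 | May | 22650041 | 2961537 | 15.0 | 6795012 | 2022 | 30 |
| 4 | 4 | forza horizon 5 | April | 19688504 | 983063 | 5.0 | 5906551 | 2022 | 30 |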
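Instead of splitting strings by hand, the date could be parsed once with `pd.to_datetime`, and its parts read from the `.dt` accessor. The sketch below assumes every `Month` value follows the English `June 30, 2022` pattern (format `%B %d, %Y`). With `errors="coerce"`, anything that does not match becomes `NaT`. A real datetime column also sorts chronologically, which helps with the later time-series analysis.

```python
import pandas as pd

# A few values from the original Month column.
months = pd.Series(["June 30, 2022", "May 30, 2022", "April 30, 2022"])

# Parse "Month day, Year" strings; values that do not match become NaT.
dates = pd.to_datetime(months, format="%B %d, %Y", errors="coerce")

# Derive the same Year / Month / Day parts from the parsed dates.
parts = pd.DataFrame({
    "Date": dates,
    "Year": dates.dt.year,
    "Month": dates.dt.month_name(),
    "Day": dates.dt.day,
})
print(parts)
```

On the notebook's frame this would be `df['Date'] = pd.to_datetime(df['Month'], format='%B %d, %Y')`, run before `Month` is overwritten with the month name.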


### Exploratory Data Analysis (EDA) & Visualization
- Generate summary statistics and describe the key features of the dataset
- Create histograms, boxplots, and scatter plots to analyze distributions and relationships
- Use correlation heatmaps to understand feature relationships
- Identify trends and patterns over time with time series analysis of the monthly player counts
- Compare different segments of data using group-by operations and visualizations
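Here is a pandas-only sketch of these EDA steps on the same five `df.head()` rows:

- `describe()` for summary statistics
- `corr()` for the matrix that a correlation heatmap would display
- a per-game `groupby`
- a date-by-game pivot for time-series trends

Plotting is left out to keep the example self-contained. With seaborn installed, `seaborn.heatmap(corr, annot=True)` would draw the heatmap, and `DataFrame.hist()` and `DataFrame.boxplot()` cover the distribution plots. With only five rows the numbers are purely illustrative.

```python
import pandas as pd

# The five rows shown by df.head(), with Month parsed into a real date.
df = pd.DataFrame({
    "Name": ["v rising", "v rising", "forza horizon 5", "forza horizon 5", "forza horizon 5"],
    "Date": pd.to_datetime(["June 30, 2022", "May 30, 2022", "June 30, 2022",
                            "May 30, 2022", "April 30, 2022"], format="%B %d, %Y"),
    "Average_Monthly_Players": [57411, 95620, 23650588, 22650041, 19688504],
    "Monthly_Gain_Loss_percentage": [-40.0, 0.0, 4.0, 15.0, 5.0],
    "Max_Players_per_Day": [146500, 152330, 7095176, 6795012, 5906551],
})

numeric = df.select_dtypes("number")

# Count, mean, std, min, quartiles and max for each numeric column.
summary = numeric.describe()

# Pearson correlation matrix: the values a correlation heatmap would plot.
corr = numeric.corr()

# Segment comparison: player statistics per game.
per_game = df.groupby("Name")["Average_Monthly_Players"].agg(["mean", "max", "count"])

# Time-series layout: one row per month (chronological), one column per game.
trend = df.pivot_table(index="Date", columns="Name",
                       values="Average_Monthly_Players").sort_index()

print(summary, corr, per_game, trend, sep="\n\n")
```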

### Statistical Analysis & Hypothesis Testing
- Perform t-tests, chi-square tests, or ANOVA to compare groups
- Compute confidence intervals for key metrics
- Use regression analysis to explore relationships between variables
- Check for normality and apply transformations if necessary
- Conduct A/B testing simulations (if applicable)
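This NumPy-only sketch covers two items on the list, using the `Monthly_Gain_Loss_percentage` values from the `df.head()` rows. The first is Welch's t statistic for comparing two games, a t-test that does not assume equal variances. The second is a confidence interval for a mean. For the interval it swaps in a percentile bootstrap, chosen because it makes no normality assumption; a t-based interval is the classical alternative. `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same t statistic together with a p-value. Two or three observations per group are far too few for a real test, so here the numbers only exercise the code.

```python
import numpy as np


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    return t, dof


def bootstrap_ci(x, level=0.95, n_resamples=10_000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    # Resample with replacement and record the mean of each resample.
    means = rng.choice(x, size=(n_resamples, x.size), replace=True).mean(axis=1)
    alpha = (1 - level) / 2
    lo, hi = np.quantile(means, [alpha, 1 - alpha])
    return lo, hi


# Monthly_Gain_Loss_percentage values from the df.head() rows.
v_rising = [-40.0, 0.0]
forza = [4.0, 15.0, 5.0]

t, dof = welch_t(v_rising, forza)
ci = bootstrap_ci(v_rising + forza)
print(f"t = {t:.3f}, dof = {dof:.2f}, 95% CI of mean = ({ci[0]:.1f}, {ci[1]:.1f})")
```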

### Machine Learning Models
- Train a linear regression model for prediction tasks
- Build classification models (Logistic Regression, Decision Trees, Random Forest, SVM)
- Train a clustering model (K-Means, DBSCAN) to segment the data
- Perform feature selection to improve model performance
- Evaluate models using cross-validation, precision, recall, and F1-score
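This minimal sketch covers the first and last items: an ordinary least-squares linear regression fitted with `np.linalg.lstsq`, evaluated with k-fold cross-validation. The score is the mean absolute error on each held-out fold. With five rows and `k=5`, this reduces to leave-one-out. Predicting `Average_Monthly_Players` from `Max_Players_per_Day` is an assumed example target, and the helpers `fit_linear`, `predict`, and `k_fold_mae` are hypothetical. In practice, scikit-learn's `LinearRegression` with `cross_val_score` does the same job, and `sklearn.metrics` provides `precision_score`, `recall_score`, and `f1_score` for the classification models.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]


def k_fold_mae(X, y, k=5, seed=0):
    """Mean absolute error averaged over k shuffled, held-out folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # every row not in this fold
        coef = fit_linear(X[train], y[train])
        errors.append(np.mean(np.abs(predict(coef, X[fold]) - y[fold])))
    return float(np.mean(errors))


# Feature: Max_Players_per_Day; target: Average_Monthly_Players (df.head() rows).
X = np.array([[146500], [152330], [7095176], [6795012], [5906551]], dtype=float)
y = np.array([57411, 95620, 23650588, 22650041, 19688504], dtype=float)

coef = fit_linear(X, y)
print("intercept, slope:", coef)
print("5-fold MAE:", k_fold_mae(X, y, k=5))
```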

### (Optional) Automation & Pipeline Building
- Automate EDA and preprocessing steps using custom functions
- Implement a pipeline for data transformation and modeling
- Store processed data in a database or export it in different formats (CSV, JSON)
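One way to sketch the pipeline idea is to make each preprocessing step a plain DataFrame-to-DataFrame function and apply the steps in order with a hypothetical `run_pipeline` helper. The result is then exported to CSV and JSON strings and stored in an in-memory SQLite database via `DataFrame.to_sql`, which accepts a `sqlite3` connection directly. The step names and the `player_counts` table name are illustrative. Passing `index=False` on export avoids writing the index, a common source of `Unnamed: 0` columns like the ones in this dataset.

```python
import sqlite3

import pandas as pd


def drop_index_columns(df):
    # Remove "Unnamed: ..." columns left over from saving with the index.
    return df.loc[:, ~df.columns.str.startswith("Unnamed")]


def add_date_parts(df):
    # Parse "June 30, 2022"-style strings, then keep Year, month name and Day.
    dates = pd.to_datetime(df["Month"], format="%B %d, %Y")
    return df.assign(Year=dates.dt.year, Month=dates.dt.month_name(), Day=dates.dt.day)


def run_pipeline(df, steps):
    """Hypothetical helper: apply each preprocessing step in order."""
    for step in steps:
        df = step(df)
    return df


raw = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Name": ["v rising", "v rising"],
    "Month": ["June 30, 2022", "May 30, 2022"],
    "Average_Monthly_Players": [57411, 95620],
})

processed = run_pipeline(raw, [drop_index_columns, add_date_parts])

# Export without the index so no "Unnamed" column appears on reload.
csv_text = processed.to_csv(index=False)
json_text = processed.to_json(orient="records")

# Store in an in-memory SQLite database and read part of it back.
conn = sqlite3.connect(":memory:")
processed.to_sql("player_counts", conn, index=False)
stored = pd.read_sql("SELECT Name, Year, Day FROM player_counts", conn)
conn.close()

print(csv_text, json_text, stored, sep="\n")
```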