# Data Cleaning and Preparation | Limpeza de Dados e Preparação

## Objective | Objetivo

- Load raw data from Lipstick Casino | Carregar os dados brutos do Lipstick Casino
- Perform necessary cleaning and transformations | Realizar limpeza e transformações necessárias
- Save processed data for later analysis | Salvar dados processados para análise posterior


## 1. Loading and Initial Inspection | Carregamento e Inspeção Inicial

In [6]:
import pandas as pd
import numpy as np
import os
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Configure Paths | Configurar Paths
current_dir = Path.cwd()
data_dir = current_dir.parent / 'data'
raw_data_path = data_dir / 'Lipstick_casino_data.xlsx'
processed_data_path = data_dir / 'processed' / 'casino_data_processed.csv'

# Create directory for processed data | Criar diretório para dados processados
(data_dir / 'processed').mkdir(parents=True, exist_ok=True)

In [7]:
# Load raw data | Carregar dados brutos
print("Loading raw data...")
raw_data = pd.read_excel(raw_data_path, sheet_name='Casino Data')

# Show basic info | Mostrar informações básicas
print("\nInfo about raw data:")
print(f"Total records: {len(raw_data)}")
print(f"Total columns: {len(raw_data.columns)}")
print("\nFirst 5 rows:")
display(raw_data.head())

Loading raw data...

Info about raw data:
Total records: 3142
Total columns: 12

First 5 rows:


| | Month, Year | Casino Reference | Country | Game category | Table Type Commercial | Table name | user_currency | Month of first_bet_date | GGR Ucur | Wager Ucur | Player Game Count | Bet Spot Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-07-01 | Lipstick Casino | Estonia | Baccarat | Other | Baccarat A | EUR | January | 2 | 85 | 4 | 4 |
| 1 | 2020-07-01 | Lipstick Casino | Estonia | Baccarat | Other | Baccarat B | EUR | January | -78 | 184 | 3 | 3 |
| 2 | 2020-07-01 | Lipstick Casino | Estonia | Baccarat | Other | Baccarat B | EUR | March | -35 | 35 | 1 | 1 |
| 3 | 2020-07-01 | Lipstick Casino | Estonia | Baccarat | Other | Baccarat C | EUR | January | -25 | 25 | 1 | 1 |
| 4 | 2020-07-01 | Lipstick Casino | Estonia | Baccarat | Other | First Person Baccarat | EUR | January | 2 | 2 | 2 | 2 |


In [8]:
# Check missing values | Verificar valores ausentes
print("\nMissing values per column:")
print(raw_data.isnull().sum())


Missing values per column:
Month, Year                0
Casino Reference           0
Country                    0
Game category              0
Table Type Commercial      0
Table name                 0
user_currency              0
Month of first_bet_date    0
GGR Ucur                   0
Wager Ucur                 0
Player Game Count          0
Bet Spot Count             0
dtype: int64
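
A zero count from `isnull()` only rules out true nulls. As a minimal sketch of a slightly broader check, the hypothetical helper `quality_report` below (not part of the original notebook) also counts whitespace-only strings and fully duplicated rows. The toy frame is made-up illustration data, not the casino dataset.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarize nulls, blank strings, and duplicated rows (hypothetical helper)."""
    # Count cells that hold a string made only of whitespace (or empty)
    blank_counts = {
        col: int(df[col].map(lambda v: isinstance(v, str) and v.strip() == "").sum())
        for col in df.columns
    }
    return {
        "nulls": int(df.isnull().sum().sum()),
        "blank_strings": blank_counts,
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy data for illustration only
toy = pd.DataFrame({
    "Country": ["Estonia", " ", "Estonia"],
    "GGR Ucur": [2, -78, 2],
})
report = quality_report(toy)
print(report)
```

On the real data, the same call would be `quality_report(raw_data)`.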


In [9]:
# Data preprocessing | Pré-processamento dos dados
print("\nStarting preprocessing...")

# Convert date column | Converter coluna de data
raw_data['Month, Year'] = pd.to_datetime(raw_data['Month, Year'])

# Create new temporal features | Criar novas features temporais
raw_data['Year'] = raw_data['Month, Year'].dt.year
raw_data['Quarter'] = raw_data['Month, Year'].dt.quarter
raw_data['Month_Name'] = raw_data['Month, Year'].dt.month_name()

# Normalize categories | Normalizar categorias
raw_data['Game category'] = raw_data['Game category'].str.title().str.strip()
raw_data['Country'] = raw_data['Country'].str.title().str.strip()
raw_data['Table Type Commercial'] = raw_data['Table Type Commercial'].str.strip()

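# Handle extreme negative values first, so Hold % is not computed from them | Tratar valores negativos extremos primeiro, para que o Hold % não seja calculado a partir deles
# Cast to float so NaN can be assigned | Converter para float para permitir atribuir NaN
raw_data['GGR Ucur'] = raw_data['GGR Ucur'].astype('float64')
raw_data.loc[raw_data['GGR Ucur'] < -10000, 'GGR Ucur'] = np.nan

# Calculate Hold % as a ratio (GGR / Bet Volume) | Calcular Hold % como razão (GGR / Volume de Apostas)
raw_data['Hold_Pct'] = raw_data['GGR Ucur'] / raw_data['Wager Ucur']
raw_data['Hold_Pct'] = raw_data['Hold_Pct'].replace([np.inf, -np.inf], np.nan)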

# Consolidate table types | Consolidar tipos de mesa
raw_data['Table_Type_Simplified'] = np.where(
    raw_data['Table Type Commercial'].str.contains('High Stakes', case=False, na=False),
    'High Stakes',
    'Regular'
)


Starting preprocessing...
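
The Hold % step can be isolated into a small function so it is easy to test on its own. The sketch below is one possible arrangement, not the notebook's exact code: `add_hold_ratio` and `GGR_FLOOR` are hypothetical names, the -10000 threshold is taken from the preprocessing cell, and the toy rows are invented. It masks extreme negative GGR before dividing and treats a zero wager as missing, so no infinite values are produced.

```python
import numpy as np
import pandas as pd

GGR_FLOOR = -10000  # threshold used in the preprocessing cell


def add_hold_ratio(df, ggr_col="GGR Ucur", wager_col="Wager Ucur"):
    """Return a copy with outlier GGR masked and a Hold_Pct ratio column added."""
    out = df.copy()
    # Cast to float so NaN can be stored in the column
    out[ggr_col] = out[ggr_col].astype("float64")
    out.loc[out[ggr_col] < GGR_FLOOR, ggr_col] = np.nan
    # Replace zero wagers with NaN to avoid division by zero
    wager = out[wager_col].where(out[wager_col] != 0)
    out["Hold_Pct"] = out[ggr_col] / wager
    return out


# Toy data for illustration only
toy = pd.DataFrame({
    "GGR Ucur": [2, -78, -20000, 5],
    "Wager Ucur": [85, 184, 100, 0],
})
result = add_hold_ratio(toy)
print(result)
```

Rows with a masked GGR or a zero wager end up with `NaN` in `Hold_Pct`; the original frame is left unchanged because the function works on a copy.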


In [10]:
# Save processed data | Salvar dados processados
raw_data.to_csv(processed_data_path, index=False)
print(f"\nProcessed data saved in: {processed_data_path}")
print("Pre-processing completed successfully!")


Processed data saved in: c:\Users\lucas\Documents\Github Project\iGaming-Analytics\data\processed\casino_data_processed.csv
Pre-processing completed successfully!
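
CSV does not store dtypes, so the date column comes back as plain text unless it is parsed again on load. The following sketch shows a round-trip check on invented toy data written to a temporary directory. In the notebook, the equivalent call would presumably read `processed_data_path` instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Month, Year": pd.to_datetime(["2020-07-01", "2020-08-01"]),
    "Country": ["Estonia", "Estonia"],
    "GGR Ucur": [2, -78],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "casino_data_processed.csv"
    toy.to_csv(csv_path, index=False)
    # parse_dates restores the datetime dtype lost in the CSV round trip
    reloaded = pd.read_csv(csv_path, parse_dates=["Month, Year"])

print(reloaded.dtypes)
```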
