# Exploratory Data Analysis
I am using this notebook to learn how gamblers' behaviors are similar to those of investors.

## Define Libraries

In [1]:
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interactive, fixed, IntSlider, HBox, Layout, VBox


# Suppress pandas' SettingWithCopyWarning
pd.options.mode.chained_assignment = None
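
Silencing the warning globally also hides genuine chained-assignment mistakes. As a hedged alternative, assuming the warning comes from assigning to a filtered slice, taking an explicit `.copy()` avoids it without changing the option. The toy data and the `double_wager` column below are made up for illustration.

```python
import pandas as pd

# Toy data; only the column names playerkey and wageredamt come from the notebook.
sessions = pd.DataFrame({"playerkey": [1, 1, 2], "wageredamt": [10.0, 20.0, 5.0]})

# An explicit copy makes the slice an independent DataFrame, so adding a
# column to it does not raise SettingWithCopyWarning.
player_one = sessions[sessions["playerkey"] == 1].copy()
player_one["double_wager"] = player_one["wageredamt"] * 2  # hypothetical column
```

The original `sessions` frame is left untouched by the assignment.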

## Upload Data

In [2]:
# Set working directory
path = '/Users/mau/Library/CloudStorage/Dropbox/Mac/Documents/Dissertation/Chapter 2/Data'
os.chdir(path)

# Load data into a DataFrame
dtf = pd.read_parquet("dtf_with_change_col.parquet")

print(dtf.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 89996 entries, 0 to 90273
Data columns (total 35 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   playercashableamt             89996 non-null  float64
 1   wageredamt                    89996 non-null  float64
 2   casino_grosswin               89996 non-null  float64
 3   playerkey                     89996 non-null  int64  
 4   age                           89996 non-null  int64  
 5   maxbet                        89996 non-null  int64  
 6   assetnumber                   89996 non-null  int64  
 7   theoreticalpaybackpercent     89996 non-null  float64
 8   player_loss                   89996 non-null  float64
 9   player_wins                   89996 non-null  float64
 10  percent_return                89996 non-null  float64
 11  playercashableamt_pct_change  89819 non-null  float64
 12  time                          89996 non-null  int64  
 13  s

In [None]:
# Count how many players increase_slotdeno_lossing_2 in their first 20 sessions 

# Create a new column with the number of sessions



# Count how many players increase_slotdeno_lossing_2 in their first 20 gambling sessions
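
One way the counting step outlined above might look, as a sketch only. It assumes each row is one session, that `time` orders a player's rows, and that the flag column is boolean. The helper name `count_early_switchers` is hypothetical.

```python
import pandas as pd

def count_early_switchers(df, flag_column, first_n=20):
    """Count players flagged at least once within their first `first_n` rows."""
    # Assumption: one row per session, ordered by `time` within each player.
    ordered = df.sort_values(["playerkey", "time"])
    # Number each player's rows 1, 2, 3, ...
    session_number = ordered.groupby("playerkey").cumcount() + 1
    early = ordered[session_number <= first_n]
    # Keep only the flagged early rows and count the distinct players.
    flagged = early[early[flag_column].astype(bool)]
    return flagged["playerkey"].nunique()

# Toy data for illustration only.
toy = pd.DataFrame({
    "playerkey": [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "increase_slotdeno_lossing_2": [False, True, False, False, False, True],
})
early_count = count_early_switchers(toy, "increase_slotdeno_lossing_2", first_n=2)
```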



## Slicing DataFrames per Matched Players and Visualizing Outcomes

By this point we have several lists of players who increased or decreased their bet, either by changing slot denomination or by changing their max bet.

The first set of lists (generic):
* _players_increase_slot_
* _players_decrease_slot_
* _players_increase_maxbet_
* _players_decrease_maxbet_

The second set of lists (while losing):
* _players_inc_slot_lossing_2_: players who change to a higher-denomination slot after losing their previous 2 sessions.
* _players_inc_slot_lossing_3_: players who change to a higher-denomination slot after losing their previous 3 sessions.
* _players_inc_slot_lossing_4_: players who change to a higher-denomination slot after losing their previous 4 sessions.
* _players_inc_maxbet_lossing_2_: players who increase their maxbet after losing their previous 2 sessions.
* _players_inc_maxbet_lossing_3_: players who increase their maxbet after losing their previous 3 sessions.
* _players_inc_maxbet_lossing_4_: players who increase their maxbet after losing their previous 4 sessions.

The third set of lists (while winning):
* _players_inc_slot_winning_2_: players who change to a higher-denomination slot after winning their previous 2 sessions.
* _players_inc_slot_winning_3_: players who change to a higher-denomination slot after winning their previous 3 sessions.
* _players_inc_slot_winning_4_: players who change to a higher-denomination slot after winning their previous 4 sessions.
* _players_inc_maxbet_winning_2_: players who increase their maxbet after winning their previous 2 sessions.
* _players_inc_maxbet_winning_3_: players who increase their maxbet after winning their previous 3 sessions.

Let's import a slicing function, `filter_match`, to apply these lists to the general DataFrame.

In [16]:
from ipynb.fs.full.functions import filter_match
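
The body of `filter_match` lives in a separate `functions` notebook and is not shown here. Judging only from how it is called below (two return values, a boolean `match_column`, an odd `rolling_window`), it might behave roughly like the following stand-in. Everything in it, including the name `filter_match_sketch`, is an assumption rather than the actual implementation.

```python
import pandas as pd

def filter_match_sketch(df, players_match, match_column, rolling_window, fill_value=False):
    """Hypothetical stand-in for filter_match; the real function may differ."""
    # Keep every row that belongs to a matched player.
    df_all = df[df["playerkey"].isin(players_match)].copy()
    # Treat missing flags as fill_value so the column can be read as 0/1.
    flags = df_all[match_column].fillna(fill_value).astype(int)
    # A centered rolling max, computed per player, marks every row that lies
    # within (rolling_window - 1) // 2 rows of a flagged row.
    near_match = (
        flags.groupby(df_all["playerkey"])
        .transform(lambda s: s.rolling(rolling_window, center=True, min_periods=1).max())
        .astype(bool)
    )
    return df_all, df_all[near_match]

# Toy data: player 1 has a single flagged row at time 3.
toy = pd.DataFrame({
    "playerkey": [1, 1, 1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 4, 5, 1, 2, 3],
    "increase_slotdeno": [False, False, True, False, False, False, False, False],
})
toy_all, toy_window = filter_match_sketch(toy, [1], "increase_slotdeno", rolling_window=3)
```

With `rolling_window=17` the same logic would keep the flagged row plus up to 8 rows on either side, which is consistent with the comment in the cells below.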

First set of dataframes from generic lists

In [18]:
# Set the rolling window (17 means 8 observations before and 8 observations after):
rolling = 17

# Create a new DataFrame with only the players that appear in players_increase_slot 
dtf_all_inc_slot, dtf_inc_slot = filter_match(df=dtf, players_match=players_increase_slot, match_column="increase_slotdeno", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_decrease_slot 
dtf_all_dec_slot, dtf_dec_slot = filter_match(df=dtf, players_match=players_decrease_slot, match_column="decrease_slotdeno", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_increase_maxbet 
dtf_all_inc_maxbet, dtf_inc_maxbet = filter_match(df=dtf, players_match=players_increase_maxbet, match_column="increase_maxbet", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_decrease_maxbet 
dtf_all_dec_maxbet, dtf_dec_maxbet = filter_match(df=dtf, players_match=players_decrease_maxbet, match_column="decrease_maxbet", rolling_window=rolling, fill_value=False)

# Create a list of DataFrames that contains all the DataFrames that we want to plot
dtf_generic = [dtf_inc_slot, dtf_dec_slot, dtf_inc_maxbet, dtf_dec_maxbet]

Second set of dataframes from losing previous rounds:

In [19]:
# Set the rolling window (17 means 8 observations before and 8 observations after):
rolling = 17

# Create a new DataFrame with only the players that appear in players_inc_slot_lossing_2
dtf_all_inc_slot_lossing_2, dtf_inc_slot_lossing_2 = filter_match(df=dtf, players_match=players_inc_slot_lossing_2, match_column="increase_slotdeno_lossing_2", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_slot_lossing_3
dtf_all_inc_slot_lossing_3, dtf_inc_slot_lossing_3 = filter_match(df=dtf, players_match=players_inc_slot_lossing_3, match_column="increase_slotdeno_lossing_3", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_slot_lossing_4
dtf_all_inc_slot_lossing_4, dtf_inc_slot_lossing_4 = filter_match(df=dtf, players_match=players_inc_slot_lossing_4, match_column="increase_slotdeno_lossing_4", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_maxbet_lossing_2
dtf_all_inc_maxbet_lossing_2, dtf_inc_maxbet_lossing_2 = filter_match(df=dtf, players_match=players_inc_maxbet_lossing_2, match_column="increase_maxbet_lossing_2", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_maxbet_lossing_3
dtf_all_inc_maxbet_lossing_3, dtf_inc_maxbet_lossing_3 = filter_match(df=dtf, players_match=players_inc_maxbet_lossing_3, match_column="increase_maxbet_lossing_3", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_maxbet_lossing_4
dtf_all_inc_maxbet_lossing_4, dtf_inc_maxbet_lossing_4 = filter_match(df=dtf, players_match=players_inc_maxbet_lossing_4, match_column="increase_maxbet_lossing_4", rolling_window=rolling, fill_value=False)

# Create a list of DataFrames that contains all the DataFrames that we want to plot
dtf_lossing = [dtf_inc_slot_lossing_2, dtf_inc_slot_lossing_3, dtf_inc_slot_lossing_4, dtf_inc_maxbet_lossing_2, dtf_inc_maxbet_lossing_3, dtf_inc_maxbet_lossing_4]

Third set of dataframes from winning previous rounds:

In [20]:
# Set the rolling window (17 means 8 observations before and 8 observations after):
rolling = 17

# Create a new DataFrame with only the players that appear in players_inc_slot_winning_2
dtf_all_inc_slot_winning_2, dtf_inc_slot_winning_2 = filter_match(df=dtf, players_match=players_inc_slot_winning_2, match_column="increase_slotdeno_winning_2", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_slot_winning_3
dtf_all_inc_slot_winning_3, dtf_inc_slot_winning_3 = filter_match(df=dtf, players_match=players_inc_slot_winning_3, match_column="increase_slotdeno_winning_3", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_slot_winning_4
dtf_all_inc_slot_winning_4, dtf_inc_slot_winning_4 = filter_match(df=dtf, players_match=players_inc_slot_winning_4, match_column="increase_slotdeno_winning_4", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_maxbet_winning_2
dtf_all_inc_maxbet_winning_2, dtf_inc_maxbet_winning_2 = filter_match(df=dtf, players_match=players_inc_maxbet_winning_2, match_column="increase_maxbet_winning_2", rolling_window=rolling, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_maxbet_winning_3
dtf_all_inc_maxbet_winning_3, dtf_inc_maxbet_winning_3 = filter_match(df=dtf, players_match=players_inc_maxbet_winning_3, match_column="increase_maxbet_winning_3", rolling_window=rolling, fill_value=False)

# Create a list of DataFrames that contains all the DataFrames that we want to plot
dtf_winning = [dtf_inc_slot_winning_2, dtf_inc_slot_winning_3, dtf_inc_slot_winning_4, dtf_inc_maxbet_winning_2, dtf_inc_maxbet_winning_3]
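
The three cells above repeat one call pattern with different player lists and column names. As an optional sketch, the same slices could be collected in a loop over a small spec table. `build_slices` and its `filter_fn` parameter (which would be `filter_match` in this notebook) are hypothetical names, and the toy filter below only exists to make the example runnable.

```python
import pandas as pd

def build_slices(df, specs, filter_fn, rolling_window=17):
    """Run filter_fn once per (player_list, match_column) pair in specs."""
    slices = {}
    for label, (players, column) in specs.items():
        # filter_fn is expected to return (all rows, windowed rows), like the calls above.
        _, sliced = filter_fn(df=df, players_match=players, match_column=column,
                              rolling_window=rolling_window, fill_value=False)
        slices[label] = sliced
    return slices

def toy_filter(df, players_match, match_column, rolling_window, fill_value):
    # Stand-in filter for the demo: keeps only the flagged rows of matched players.
    subset = df[df["playerkey"].isin(players_match)]
    return subset, subset[subset[match_column]]

toy = pd.DataFrame({"playerkey": [1, 1, 2], "increase_maxbet": [True, False, True]})
toy_slices = build_slices(toy, {"inc_maxbet": ([1], "increase_maxbet")}, toy_filter)
```

A dictionary keyed by label keeps each slice addressable by name, and `list(toy_slices.values())` gives the same kind of list that the plotting cells expect.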

## Interactive Plots

This section explores the data interactively. Each cell below builds an `ipywidgets` control panel around a matplotlib scatter plot: dropdowns select the DataFrame, the player, and the variables for the x-axis, the y-axis, and an optional secondary y-axis; two numeric fields set the x-axis range; and checkboxes toggle a connecting line and a shaded area. Switching between players and variables this way can reveal patterns or relationships that a single static plot would not show.

### First Plot Generic dataframes:

List of DataFrames that contains all generic dataframes:
* dtf_generic = [dtf_inc_slot, dtf_dec_slot, dtf_inc_maxbet, dtf_dec_maxbet]

In [21]:
import matplotlib.pyplot as plt
import ipywidgets as widgets

# Make a list of all the dataframes that were matched and sliced
dtf_lists = dtf_generic

# Calculate the max and min values for the 'time' column for each DataFrame
time_max = max([df["time"].max() for df in dtf_lists])
time_min = min([df["time"].min() for df in dtf_lists])

print(time_max, time_min)

# Scatter plot of the chosen columns for one selected player (player_ID is a position in the player list)
def plot_scatters(player_ID, df_index, x="time", y="percent_return", y_2=None, x_min=None, x_max=None, show_line=False, shade_area=False):
    df = dtf_lists[df_index]
    players = df["playerkey"].unique().tolist()
    player_df = df[df["playerkey"] == players[player_ID]]
    
    fig, ax1 = plt.subplots()
    ax1.set_xlabel(x)
    ax1.set_ylabel(y, color='royalblue')
    if x_min is not None and x_max is not None:
        player_df = player_df[(player_df[x] >= x_min) & (player_df[x] <= x_max)]
    ax1.scatter(x=player_df[x], y=player_df[y], color='royalblue')
    
    if y_2 is not None:
        ax2 = ax1.twinx()
        ax2.set_ylabel(y_2, color='orangered')
        ax2.scatter(x=player_df[x], y=player_df[y_2], color='orangered', marker='s')
        ax2.tick_params(axis='y', labelcolor='orangered')
        # Add a line to the plot if show_line is True
        if show_line:
            ax2.plot(player_df[x], player_df[y_2], color='black', linewidth=0.8, linestyle='--')
        if shade_area:
            ax2.fill_between(player_df[x], player_df[y_2], color='lightcoral', alpha=0.5)
    
    if show_line:
        ax1.plot(player_df[x], player_df[y], color='black', linewidth=0.8)
        
    if shade_area:
        ax1.fill_between(player_df[x], player_df[y], color='lightblue', alpha=0.5)

    ax1.tick_params(axis='y', labelcolor='black')
    ax1.grid()
    plt.title(f"Player {players[player_ID]}")
    plt.show()

# Create widgets for playerkey, df_index, x, y, y_2, x_min, and x_max
df_index_widget = widgets.Dropdown(options=[(f"DataFrame {i}", i) for i in range(len(dtf_lists))])
x_widget = widgets.Dropdown(options=list(dtf_lists[0].columns), value="time")
y_widget = widgets.Dropdown(options=list(dtf_lists[0].columns), value="percent_return")
y_2_widget = widgets.Dropdown(options=[None]+list(dtf_lists[0].columns), value=None)
x_min_widget = widgets.FloatText(description="x_min", value=time_min)
x_max_widget = widgets.FloatText(description="x_max", value=time_max)
show_line_widget = widgets.Checkbox(description='Show line', value=False)
shade_area_widget = widgets.Checkbox(description='Shade area', value=False)

# Create a function to update the players_widget based on the selected df_index
def update_players_widget(df_index):
    df = dtf_lists[df_index]
    players = df["playerkey"].unique().tolist()
    player_key_widget.options = [(p, i) for i, p in enumerate(players)]

# Create a players_widget for the initial df_index value
initial_df_index = df_index_widget.value
initial_df = dtf_lists[initial_df_index]
initial_players = initial_df["playerkey"].unique().tolist()
player_key_widget = widgets.Dropdown(options=[], value=None)

# Call update_players_widget with the initial_df_index value to set the options for player_key_widget
update_players_widget(initial_df_index)

widgets.interact(plot_scatters, player_ID=player_key_widget, df_index=df_index_widget,
                 x=x_widget, y=y_widget, y_2=y_2_widget, x_min=x_min_widget, x_max=x_max_widget,
                 show_line=show_line_widget, shade_area=shade_area_widget)

# Update the player_key_widget options when df_index changes
def on_df_index_change(change):
    update_players_widget(change.new)

df_index_widget.observe(on_df_index_change, names='value')
update_players_widget(initial_df_index)  # update the player_key_widget options initially


8468 1


interactive(children=(Dropdown(description='player_ID', options=((2, 0), (3, 1), (4, 2), (6, 3), (7, 4), (8, 5…
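
In the cell above, the player dropdown displays `playerkey` values but passes their position in the list of unique keys, and `plot_scatters` turns that position back into a key before filtering. The data-selection step can be isolated as below. `select_player_rows` and the toy frame are illustrative names, not part of the notebook.

```python
import pandas as pd

def select_player_rows(df, player_index, x="time", x_min=None, x_max=None):
    """Return (player key, that player's rows), optionally limited to an x-range."""
    # The dropdown passes a position in the list of unique keys, not the key itself.
    players = df["playerkey"].unique().tolist()
    player_df = df[df["playerkey"] == players[player_index]]
    # Apply the range filter only when both bounds are given, as plot_scatters does.
    if x_min is not None and x_max is not None:
        player_df = player_df[(player_df[x] >= x_min) & (player_df[x] <= x_max)]
    return players[player_index], player_df

# Toy data for illustration only.
toy = pd.DataFrame({"playerkey": [7, 7, 7, 9], "time": [1, 2, 3, 1]})
key, rows = select_player_rows(toy, player_index=0, x_min=2, x_max=3)
```

Separating selection from drawing makes the filtering easy to check without rendering a figure.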

### Second Plot Losing Dataframes:

List of DataFrames that contains all losing dataframes:
* dtf_lossing = [dtf_inc_slot_lossing_2, dtf_inc_slot_lossing_3, dtf_inc_slot_lossing_4, dtf_inc_maxbet_lossing_2, dtf_inc_maxbet_lossing_3, dtf_inc_maxbet_lossing_4]

In [22]:
# Make a list of all the dataframes that were matched and sliced
dtf_lists = dtf_lossing

# Calculate the max and min values for the 'time' column for each DataFrame
time_max = max([df["time"].max() for df in dtf_lists])
time_min = min([df["time"].min() for df in dtf_lists])

print(time_max, time_min)

# Scatter plot of the chosen columns for one selected player (player_ID is a position in the player list)
def plot_scatters(player_ID, df_index, x="time", y="percent_return", y_2=None, x_min=None, x_max=None, show_line=False, shade_area=False):
    df = dtf_lists[df_index]
    players = df["playerkey"].unique().tolist()
    player_df = df[df["playerkey"] == players[player_ID]]
    
    fig, ax1 = plt.subplots()
    ax1.set_xlabel(x)
    ax1.set_ylabel(y, color='royalblue')
    if x_min is not None and x_max is not None:
        player_df = player_df[(player_df[x] >= x_min) & (player_df[x] <= x_max)]
    ax1.scatter(x=player_df[x], y=player_df[y], color='royalblue')
    
    if y_2 is not None:
        ax2 = ax1.twinx()
        ax2.set_ylabel(y_2, color='orangered')
        ax2.scatter(x=player_df[x], y=player_df[y_2], color='orangered', marker='s')
        ax2.tick_params(axis='y', labelcolor='orangered')
        # Add a line to the plot if show_line is True
        if show_line:
            ax2.plot(player_df[x], player_df[y_2], color='black', linewidth=0.8, linestyle='--')
        if shade_area:
            ax2.fill_between(player_df[x], player_df[y_2], color='lightcoral', alpha=0.5)
    
    if show_line:
        ax1.plot(player_df[x], player_df[y], color='black', linewidth=0.8)
        
    if shade_area:
        ax1.fill_between(player_df[x], player_df[y], color='lightblue', alpha=0.5)

    ax1.tick_params(axis='y', labelcolor='black')
    ax1.grid()
    plt.title(f"Player {players[player_ID]}")
    plt.show()

# Create widgets for playerkey, df_index, x, y, y_2, x_min, and x_max
df_index_widget = widgets.Dropdown(options=[(f"DataFrame {i}", i) for i in range(len(dtf_lists))])
x_widget = widgets.Dropdown(options=list(dtf_lists[0].columns), value="time")
y_widget = widgets.Dropdown(options=list(dtf_lists[0].columns), value="percent_return")
y_2_widget = widgets.Dropdown(options=[None]+list(dtf_lists[0].columns), value=None)
x_min_widget = widgets.FloatText(description="x_min", value=time_min)
x_max_widget = widgets.FloatText(description="x_max", value=time_max)
show_line_widget = widgets.Checkbox(description='Show line', value=False)
shade_area_widget = widgets.Checkbox(description='Shade area', value=False)

# Create a function to update the players_widget based on the selected df_index
def update_players_widget(df_index):
    df = dtf_lists[df_index]
    players = df["playerkey"].unique().tolist()
    player_key_widget.options = [(p, i) for i, p in enumerate(players)]

# Create a players_widget for the initial df_index value
initial_df_index = df_index_widget.value
initial_df = dtf_lists[initial_df_index]
initial_players = initial_df["playerkey"].unique().tolist()
player_key_widget = widgets.Dropdown(options=[], value=None)

# Call update_players_widget with the initial_df_index value to set the options for player_key_widget
update_players_widget(initial_df_index)

widgets.interact(plot_scatters, player_ID=player_key_widget, df_index=df_index_widget,
                 x=x_widget, y=y_widget, y_2=y_2_widget, x_min=x_min_widget, x_max=x_max_widget,
                 show_line=show_line_widget, shade_area=shade_area_widget)

# Update the player_key_widget options when df_index changes
def on_df_index_change(change):
    update_players_widget(change.new)

df_index_widget.observe(on_df_index_change, names='value')
update_players_widget(initial_df_index)  # update the player_key_widget options initially


8005 1


interactive(children=(Dropdown(description='player_ID', options=((2, 0), (3, 1), (4, 2), (8, 3), (11, 4), (12,…

### Third Plot Winning Dataframes:

List of DataFrames that contains all winning dataframes:
* dtf_winning = [dtf_inc_slot_winning_2, dtf_inc_slot_winning_3, dtf_inc_slot_winning_4, dtf_inc_maxbet_winning_2, dtf_inc_maxbet_winning_3]

In [23]:
# Make a list of all the dataframes that were matched and sliced
dtf_lists = dtf_winning

# Calculate the max and min values for the 'time' column for each DataFrame
time_max = max([df["time"].max() for df in dtf_lists])
time_min = min([df["time"].min() for df in dtf_lists])

print(time_max, time_min)

# Scatter plot of the chosen columns for one selected player (player_ID is a position in the player list)
def plot_scatters(player_ID, df_index, x="time", y="percent_return", y_2=None, x_min=None, x_max=None, show_line=False, shade_area=False):
    df = dtf_lists[df_index]
    players = df["playerkey"].unique().tolist()
    player_df = df[df["playerkey"] == players[player_ID]]
    
    fig, ax1 = plt.subplots()
    ax1.set_xlabel(x)
    ax1.set_ylabel(y, color='royalblue')
    if x_min is not None and x_max is not None:
        player_df = player_df[(player_df[x] >= x_min) & (player_df[x] <= x_max)]
    ax1.scatter(x=player_df[x], y=player_df[y], color='royalblue')
    
    if y_2 is not None:
        ax2 = ax1.twinx()
        ax2.set_ylabel(y_2, color='orangered')
        ax2.scatter(x=player_df[x], y=player_df[y_2], color='orangered', marker='s')
        ax2.tick_params(axis='y', labelcolor='orangered')
        # Add a line to the plot if show_line is True
        if show_line:
            ax2.plot(player_df[x], player_df[y_2], color='black', linewidth=0.8, linestyle='--')
        if shade_area:
            ax2.fill_between(player_df[x], player_df[y_2], color='lightcoral', alpha=0.5)
    
    if show_line:
        ax1.plot(player_df[x], player_df[y], color='black', linewidth=0.8)
        
    if shade_area:
        ax1.fill_between(player_df[x], player_df[y], color='lightblue', alpha=0.5)

    ax1.tick_params(axis='y', labelcolor='black')
    ax1.grid()
    plt.title(f"Player {players[player_ID]}")
    plt.show()

# Create widgets for playerkey, df_index, x, y, y_2, x_min, and x_max
df_index_widget = widgets.Dropdown(options=[(f"DataFrame {i}", i) for i in range(len(dtf_lists))])
x_widget = widgets.Dropdown(options=list(dtf_lists[0].columns), value="time")
y_widget = widgets.Dropdown(options=list(dtf_lists[0].columns), value="percent_return")
y_2_widget = widgets.Dropdown(options=[None]+list(dtf_lists[0].columns), value=None)
x_min_widget = widgets.FloatText(description="x_min", value=time_min)
x_max_widget = widgets.FloatText(description="x_max", value=time_max)
show_line_widget = widgets.Checkbox(description='Show line', value=False)
shade_area_widget = widgets.Checkbox(description='Shade area', value=False)

# Create a function to update the players_widget based on the selected df_index
def update_players_widget(df_index):
    df = dtf_lists[df_index]
    players = df["playerkey"].unique().tolist()
    player_key_widget.options = [(p, i) for i, p in enumerate(players)]

# Create a players_widget for the initial df_index value
initial_df_index = df_index_widget.value
initial_df = dtf_lists[initial_df_index]
initial_players = initial_df["playerkey"].unique().tolist()
player_key_widget = widgets.Dropdown(options=[], value=None)

# Call update_players_widget with the initial_df_index value to set the options for player_key_widget
update_players_widget(initial_df_index)

widgets.interact(plot_scatters, player_ID=player_key_widget, df_index=df_index_widget,
                 x=x_widget, y=y_widget, y_2=y_2_widget, x_min=x_min_widget, x_max=x_max_widget,
                 show_line=show_line_widget, shade_area=shade_area_widget)

# Update the player_key_widget options when df_index changes
def on_df_index_change(change):
    update_players_widget(change.new)

df_index_widget.observe(on_df_index_change, names='value')
update_players_widget(initial_df_index)  # update the player_key_widget options initially


2136 3


interactive(children=(Dropdown(description='player_ID', options=((6, 0), (14, 1), (18, 2), (19, 3), (57, 4), (…