# Exploratory Data Analysis
I am using this notebook to learn how gamblers' behaviors are similar to those of investors.

## Define Libraries

In [1]:
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interactive, fixed, IntSlider, HBox, Layout, VBox

# Suppress pandas' SettingWithCopyWarning
pd.options.mode.chained_assignment = None

## Upload Data

In [2]:
# Set working directory
path = '/Users/mau/Dropbox/Mac/Documents/Dissertation/Safford2018/Data'
os.chdir(path)

# Load data into a DataFrame
dtf = pd.read_csv('40PerSubjectData.csv', header=0, dtype={'Treatment (D)': int, 'Subject': int, 'Year': int})

# Create a column named 'percent_change_EAB' that holds the percent change of the EAB column within each subject
dtf['percent_change_EAB'] = dtf.groupby("Subject")['EAB'].pct_change()

print(dtf.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1520 entries, 0 to 1519
Data columns (total 19 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Treatment (D)          1520 non-null   int64  
 1   Condition              1520 non-null   object 
 2   During/Post            1520 non-null   int64  
 3   Subject                1520 non-null   int64  
 4   Year                   1520 non-null   int64  
 5   Blocks                 1520 non-null   int64  
 6   Objective              1520 non-null   float64
 7   PerAllo                1520 non-null   float64
 8   Belief                 1520 non-null   float64
 9   EAB                    1520 non-null   float64
 10  Sex                    1520 non-null   object 
 11  Age                    1520 non-null   int64  
 12  Status                 1520 non-null   int64  
 13  InvExp                 1520 non-null   float64
 14  BPay                   1520 non-null   float64
 15  TBPa
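
As a small illustration of the per-subject percent change computed above, here is a minimal sketch on made-up data. The toy values and subject IDs are assumptions for illustration only; the column names `Subject` and `EAB` come from the notebook.

```python
import pandas as pd

# Toy data: two hypothetical subjects with three observations each
toy = pd.DataFrame({
    "Subject": [1, 1, 1, 2, 2, 2],
    "EAB": [100.0, 110.0, 99.0, 50.0, 75.0, 75.0],
})

# Percent change is computed within each subject, so the first row of
# every subject has no previous value and becomes NaN
toy["percent_change_EAB"] = toy.groupby("Subject")["EAB"].pct_change()
print(toy)
```

Because the change is grouped by `Subject`, the first observation of one subject is never compared with the last observation of the previous subject.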

In [3]:
# Print the unique values of the Subject column
print(dtf['Subject'].unique())

# Count the number of unique subjects
print("Total number of Subjects:", dtf['Subject'].nunique())

[ 41  42  43  44  45  46  47  48  49  50  51  52  54  55  56  57  58  59
  60 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
 119 120]
Total number of Subjects: 38


### Change Recognition
It is time to see which subjects increased or decreased their percent allocation to the risky asset.

* Define functions that look for the desired change.
* We use _PerAllo_ to observe the change.
* Create a column _increase_allocation_ that is 1 every time _PerAllo_ increases for a subject, and 0 otherwise.
* Create a column _decrease_allocation_ that is 1 every time _PerAllo_ decreases for a subject, and 0 otherwise.
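
The steps above can be sketched as follows. `count_increase` and `count_decrease` live in a separate module that is not shown here, so `flag_changes` is a hypothetical helper, not their actual implementation. It only illustrates one way to build the two 0/1 columns with a per-subject difference.

```python
import pandas as pd

def flag_changes(df, subject_col="Subject", value_col="PerAllo"):
    """Hypothetical helper: return a copy of df with 0/1 increase and decrease flags."""
    out = df.copy()
    # Difference from the previous observation of the same subject;
    # the first observation of each subject is NaN and compares as False
    step = out.groupby(subject_col)[value_col].diff()
    out["increase_allocation"] = (step > 0).astype(int)
    out["decrease_allocation"] = (step < 0).astype(int)
    return out

# Made-up allocations for two hypothetical subjects
toy = pd.DataFrame({
    "Subject": [1, 1, 1, 1, 2, 2, 2],
    "PerAllo": [0.5, 0.6, 0.6, 0.4, 0.2, 0.1, 0.3],
})
flagged = flag_changes(toy)

# Number of increases per subject, similar in spirit to the printed counts
increase_counts = flagged.groupby("Subject")["increase_allocation"].sum().to_dict()
print(increase_counts)
```

An unchanged allocation is counted as neither an increase nor a decrease in this sketch.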

In [4]:
from ipynb.fs.full.functions import count_increase, count_decrease

In [5]:
players_increase_allocation = count_increase(dtf, "increase_allocation", "Subject", "PerAllo")
players_decrease_allocation = count_decrease(dtf, "decrease_allocation", "Subject", "PerAllo")

Count of players who increase_allocation : 36
Count of times each player increase_allocation : {41: 6, 42: 12, 43: 14, 44: 14, 45: 19, 46: 10, 48: 14, 49: 16, 50: 12, 51: 10, 52: 10, 54: 12, 55: 15, 56: 11, 57: 12, 58: 15, 59: 14, 60: 14, 102: 15, 103: 15, 105: 9, 106: 8, 107: 18, 108: 13, 109: 19, 110: 18, 111: 16, 112: 3, 113: 5, 114: 16, 115: 6, 116: 13, 117: 13, 118: 1, 119: 5, 120: 18}
Player who changes the most: 45
------------------------------------------------------------------------------------------------------------------
Count of players who decrease_allocation : 38
Count of times each player decrease_allocation : {41: 6, 42: 9, 43: 15, 44: 17, 45: 19, 46: 8, 47: 1, 48: 17, 49: 20, 50: 10, 51: 9, 52: 9, 54: 11, 55: 16, 56: 14, 57: 17, 58: 14, 59: 10, 60: 14, 102: 13, 103: 14, 104: 3, 105: 8, 106: 10, 107: 18, 108: 10, 109: 17, 110: 16, 111: 12, 112: 2, 113: 8, 114: 22, 115: 8, 116: 9, 117: 11, 118: 1, 119: 5, 120: 20}
Player who changes the most: 114
---------------------

### Change Recognition Part 2

In this section we analyze changes in percent allocation when the previous returns were either positive or negative.
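
To make the idea concrete, here is a minimal sketch of flagging a change that follows a streak of negative (or positive) returns. The notebook's own `count_subjects_inc_lossing` and `count_subjects_inc_winning` are imported from a module that is not shown, so `flag_change_after_streak` is a hypothetical stand-in. Treating "losing" as a negative value of the return column is an assumption.

```python
import pandas as pd

def flag_change_after_streak(df, n, change_col, return_col, losing=True,
                             subject_col="Subject"):
    """Hypothetical helper: True where change_col == 1 and the previous
    n returns of the same subject were all negative (or all positive)."""
    hit = (df[return_col] < 0) if losing else (df[return_col] > 0)
    # Count hits over the n rows before the current row, per subject;
    # shift(1) excludes the current row from the window
    prev_hits = hit.astype(int).groupby(df[subject_col]).transform(
        lambda s: s.shift(1).rolling(n).sum()
    )
    return (df[change_col] == 1) & (prev_hits == n)

# Made-up returns and change flags for two hypothetical subjects
toy = pd.DataFrame({
    "Subject": [1, 1, 1, 1, 1, 2, 2, 2],
    "Objective": [-1.0, -2.0, 3.0, -1.0, -1.0, -4.0, -1.0, 2.0],
    "increase_allocation": [0, 0, 1, 0, 1, 0, 0, 1],
})
mask = flag_change_after_streak(toy, 2, "increase_allocation", "Objective")
counts = toy.loc[mask].groupby("Subject").size().to_dict()
print(counts)
```

Rows with fewer than `n` earlier observations for the same subject are never flagged, because the rolling sum is NaN there.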

In [6]:

from ipynb.fs.full.functions import count_subjects_inc_lossing, count_subjects_inc_winning

Let's identify the subjects who increase their allocation after losing in the previous sessions:

In [7]:
players_inc_allo_lossing_2 = count_subjects_inc_lossing(dtf, 2, "increase_allocation", "Objective") # Look at previous 2 sessions
players_inc_allo_lossing_3 = count_subjects_inc_lossing(dtf, 3, "increase_allocation", "Objective") # Look at previous 3 sessions
players_inc_allo_lossing_4 = count_subjects_inc_lossing(dtf, 4, "increase_allocation", "Objective") # Look at previous 4 sessions

# players_inc_allo_lossing_2 = count_subjects_inc_lossing(dtf, 2, "increase_allocation", "percent_change_EAB") # Look at previous 2 sessions
# players_inc_allo_lossing_3 = count_subjects_inc_lossing(dtf, 3, "increase_allocation", "percent_change_EAB") # Look at previous 3 sessions
# players_inc_allo_lossing_4 = count_subjects_inc_lossing(dtf, 4, "increase_allocation", "percent_change_EAB") # Look at previous 4 sessions

Count of players that increase their allocation when they were losing the previous 2 sessions: 20
Count of times each player increases their allocation when they were losing the previous 2 sessions: {41: 1, 44: 1, 45: 2, 46: 1, 49: 1, 50: 3, 54: 1, 57: 2, 58: 2, 59: 1, 60: 1, 103: 1, 105: 2, 109: 2, 111: 2, 113: 1, 114: 4, 117: 3, 119: 1, 120: 2}
Player who increases their allocation when they were losing the previous 2 sessions the most: 114
------------------------------------------------------------------------------------------------------------------
Count of players that increase their allocation when they were losing the previous 3 sessions: 14
Count of times each player increases their allocation when they were losing the previous 3 sessions: {45: 1, 46: 1, 49: 1, 50: 1, 54: 1, 57: 1, 58: 2, 59: 1, 103: 1, 105: 1, 111: 1, 113: 1, 114: 3, 117: 2}
Player who increases their allocation when they were losing the previous 3 sessions the most: 114
------------------------------------

Let's identify the subjects who increase their allocation after winning in the previous sessions:

In [8]:
players_inc_allo_winning_2 = count_subjects_inc_winning(dtf, 2, "increase_allocation", "Objective") # Look at previous 2 sessions
players_inc_allo_winning_3 = count_subjects_inc_winning(dtf, 3, "increase_allocation", "Objective") # Look at previous 3 sessions
players_inc_allo_winning_4 = count_subjects_inc_winning(dtf, 4, "increase_allocation", "Objective") # Look at previous 4 sessions

# players_inc_allo_winning_2 = count_subjects_inc_winning(dtf, 2, "increase_allocation", "percent_change_EAB") # Look at previous 2 sessions
# players_inc_allo_winning_3 = count_subjects_inc_winning(dtf, 3, "increase_allocation", "percent_change_EAB") # Look at previous 3 sessions
# players_inc_allo_winning_4 = count_subjects_inc_winning(dtf, 4, "increase_allocation", "percent_change_EAB") # Look at previous 4 sessions

Count of players that increase their allocation when they were winning the previous 2 sessions: 35
Count of times each player increases their allocation when they were winning the previous 2 sessions: {41: 3, 42: 5, 43: 6, 44: 8, 45: 6, 46: 4, 48: 9, 49: 8, 50: 7, 51: 7, 52: 7, 54: 5, 55: 7, 56: 8, 57: 6, 58: 7, 59: 9, 60: 7, 102: 10, 103: 6, 105: 6, 106: 6, 107: 13, 108: 10, 109: 7, 110: 10, 111: 7, 112: 2, 113: 1, 114: 4, 115: 4, 116: 7, 117: 3, 119: 2, 120: 6}
Player who increases their allocation when they were winning the previous 2 sessions the most: 107
------------------------------------------------------------------------------------------------------------------
Count of players that increase their allocation when they were winning the previous 3 sessions: 34
Count of times each player increases their allocation when they were winning the previous 3 sessions: {41: 2, 42: 4, 43: 2, 44: 4, 45: 4, 46: 1, 48: 4, 49: 3, 50: 4, 51: 2, 52: 3, 54: 2, 55: 2, 56: 4, 57: 4, 58: 4, 59: 

Let's identify the subjects who decrease their allocation after losing in the previous sessions:

In [9]:
players_dec_allo_lossing_2 = count_subjects_inc_lossing(dtf, 2, "decrease_allocation", "Objective") # Look at previous 2 sessions
players_dec_allo_lossing_3 = count_subjects_inc_lossing(dtf, 3, "decrease_allocation", "Objective") # Look at previous 3 sessions
players_dec_allo_lossing_4 = count_subjects_inc_lossing(dtf, 4, "decrease_allocation", "Objective") # Look at previous 4 sessions

# players_dec_allo_lossing_2 = count_subjects_inc_lossing(dtf, 2, "decrease_allocation", "percent_change_EAB") # Look at previous 2 sessions
# players_dec_allo_lossing_3 = count_subjects_inc_lossing(dtf, 3, "decrease_allocation", "percent_change_EAB") # Look at previous 3 sessions
# players_dec_allo_lossing_4 = count_subjects_inc_lossing(dtf, 4, "decrease_allocation", "percent_change_EAB") # Look at previous 4 sessions

Count of players that increase their allocation when they were losing the previous 2 sessions: 32
Count of times each player increases their allocation when they were losing the previous 2 sessions: {42: 3, 43: 3, 44: 3, 45: 3, 47: 1, 48: 4, 49: 3, 50: 2, 51: 2, 52: 1, 54: 2, 55: 5, 56: 3, 57: 1, 58: 2, 59: 2, 60: 3, 102: 3, 103: 3, 105: 1, 106: 3, 107: 5, 108: 1, 109: 3, 110: 4, 111: 2, 112: 1, 113: 1, 114: 1, 116: 4, 119: 1, 120: 3}
Player who increases their allocation when they were losing the previous 2 sessions the most: 55
------------------------------------------------------------------------------------------------------------------
Count of players that increase their allocation when they were losing the previous 3 sessions: 29
Count of times each player increases their allocation when they were losing the previous 3 sessions: {42: 3, 43: 2, 44: 2, 45: 2, 48: 3, 49: 1, 50: 2, 51: 2, 52: 1, 54: 1, 55: 3, 56: 1, 57: 1, 59: 2, 60: 2, 102: 2, 103: 1, 105: 1, 106: 3, 107: 3, 108:

Let's identify the subjects who decrease their allocation after winning in the previous sessions:

In [10]:
players_dec_allo_winning_2 = count_subjects_inc_winning(dtf, 2, "decrease_allocation", "Objective") # Look at previous 2 sessions
players_dec_allo_winning_3 = count_subjects_inc_winning(dtf, 3, "decrease_allocation", "Objective") # Look at previous 3 sessions
players_dec_allo_winning_4 = count_subjects_inc_winning(dtf, 4, "decrease_allocation", "Objective") # Look at previous 4 sessions

# players_dec_allo_winning_2 = count_subjects_inc_winning(dtf, 2, "decrease_allocation", "percent_change_EAB") # Look at previous 2 sessions
# players_dec_allo_winning_3 = count_subjects_inc_winning(dtf, 3, "decrease_allocation", "percent_change_EAB") # Look at previous 3 sessions
# players_dec_allo_winning_4 = count_subjects_inc_winning(dtf, 4, "decrease_allocation", "percent_change_EAB") # Look at previous 4 sessions

Count of players that increase their allocation when they were winning the previous 2 sessions: 32
Count of times each player increases their allocation when they were winning the previous 2 sessions: {41: 2, 42: 1, 43: 5, 44: 4, 45: 9, 46: 6, 48: 4, 49: 6, 50: 1, 51: 1, 52: 2, 54: 3, 55: 5, 56: 2, 57: 6, 58: 4, 59: 1, 60: 3, 102: 2, 103: 5, 104: 3, 105: 3, 107: 1, 109: 7, 110: 3, 111: 4, 113: 4, 114: 10, 117: 6, 118: 1, 119: 2, 120: 8}
Player who increases their allocation when they were winning the previous 2 sessions the most: 114
------------------------------------------------------------------------------------------------------------------
Count of players that increase their allocation when they were winning the previous 3 sessions: 31
Count of times each player increases their allocation when they were winning the previous 3 sessions: {41: 1, 42: 1, 43: 5, 44: 4, 45: 6, 46: 6, 48: 4, 49: 6, 51: 1, 52: 2, 54: 2, 55: 5, 56: 2, 57: 5, 58: 3, 59: 1, 60: 2, 102: 2, 103: 5, 104: 2, 

## Slicing DataFrames per Matched Players and Visualizing Outcomes

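The cells below use `filter_sub_match` to slice `dtf` for each group of matched subjects and collect the sliced DataFrames into lists for plotting.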
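
`filter_sub_match` is defined in a separate module that is not shown, so its exact behavior is unknown here. As one possible reading of a rolling window around matched observations, the sketch below keeps the rows within `window // 2` observations of any matched row, per subject. The function name `window_around_matches`, the `match` column, and the toy data are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

def window_around_matches(df, match_col, window, subject_col="Subject"):
    """Hypothetical stand-in: keep rows within window // 2 observations
    of any row where match_col is True, separately for each subject."""
    half = window // 2
    pieces = []
    for _, group in df.groupby(subject_col):
        group = group.reset_index(drop=True)
        keep = np.zeros(len(group), dtype=bool)
        # Mark the neighborhood of every matched position
        for pos in np.flatnonzero(group[match_col].to_numpy()):
            keep[max(pos - half, 0): pos + half + 1] = True
        pieces.append(group[keep])
    return pd.concat(pieces, ignore_index=True)

# One hypothetical subject with a single match in the middle
toy = pd.DataFrame({
    "Subject": [1] * 7,
    "Year": list(range(7)),
    "match": [False, False, False, True, False, False, False],
})
sliced = window_around_matches(toy, "match", window=3)
print(sliced)
```

With `window=3`, one observation before and one after the match are kept along with the match itself.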


In [11]:
from ipynb.fs.full.functions import filter_sub_match

In [14]:
# Rolling window size passed to filter_sub_match (for example, 17 would mean 8 observations before and 8 observations after the match)
r_window = 9
# Create a new DataFrame with only the players that appear in players_inc_allo_lossing_2
dtf_all_inc_allo_lossing_2, dtf_inc_allo_lossing_2 = filter_sub_match(df=dtf, players_match=players_inc_allo_lossing_2, match_column="increase_allocation_lossing_2", rolling_window=r_window, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_allo_lossing_3
dtf_all_inc_allo_lossing_3, dtf_inc_allo_lossing_3 = filter_sub_match(df=dtf, players_match=players_inc_allo_lossing_3, match_column="increase_allocation_lossing_3", rolling_window=r_window, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_allo_lossing_4
dtf_all_inc_allo_lossing_4, dtf_inc_allo_lossing_4 = filter_sub_match(df=dtf, players_match=players_inc_allo_lossing_4, match_column="increase_allocation_lossing_4", rolling_window=r_window, fill_value=False)

# Create a list of DataFrames that contains all the DataFrames that we want to plot
dtf_list_inc_lossing = [dtf_inc_allo_lossing_2, dtf_inc_allo_lossing_3, dtf_inc_allo_lossing_4]

In [None]:
# Create a new DataFrame with only the players that appear in players_inc_allo_winning_2
dtf_all_inc_allo_winning_2, dtf_inc_allo_winning_2 = filter_sub_match(df=dtf, players_match=players_inc_allo_winning_2, match_column="increase_allocation_winning_2", rolling_window=r_window, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_allo_winning_3
dtf_all_inc_allo_winning_3, dtf_inc_allo_winning_3 = filter_sub_match(df=dtf, players_match=players_inc_allo_winning_3, match_column="increase_allocation_winning_3", rolling_window=r_window, fill_value=False)

# Create a new DataFrame with only the players that appear in players_inc_allo_winning_4
dtf_all_inc_allo_winning_4, dtf_inc_allo_winning_4 = filter_sub_match(df=dtf, players_match=players_inc_allo_winning_4, match_column="increase_allocation_winning_4", rolling_window=r_window, fill_value=False)

# Create a list of DataFrames that contains all the DataFrames that we want to plot
dtf_list_inc_winning = [dtf_inc_allo_winning_2, dtf_inc_allo_winning_3, dtf_inc_allo_winning_4]

In [None]:
# Create a new DataFrame with only the players that appear in players_dec_allo_lossing_2
dtf_all_dec_allo_lossing_2, dtf_dec_allo_lossing_2 = filter_sub_match(df=dtf, players_match=players_dec_allo_lossing_2, match_column="decrease_allocation_lossing_2", rolling_window=r_window, fill_value=False)

# Create a new DataFrame with only the players that appear in players_dec_allo_lossing_3
dtf_all_dec_allo_lossing_3, dtf_dec_allo_lossing_3 = filter_sub_match(df=dtf, players_match=players_dec_allo_lossing_3, match_column="decrease_allocation_lossing_3", rolling_window=r_window, fill_value=False)

# Create a new DataFrame with only the players that appear in players_dec_allo_lossing_4
dtf_all_dec_allo_lossing_4, dtf_dec_allo_lossing_4 = filter_sub_match(df=dtf, players_match=players_dec_allo_lossing_4, match_column="decrease_allocation_lossing_4", rolling_window=r_window, fill_value=False)

# Create a list of DataFrames that contains all the DataFrames that we want to plot
dtf_list_dec_lossing = [dtf_dec_allo_lossing_2, dtf_dec_allo_lossing_3, dtf_dec_allo_lossing_4]

In [None]:
# Create a new DataFrame with only the players that appear in players_dec_allo_winning_2
dtf_all_dec_allo_winning_2, dtf_dec_allo_winning_2 = filter_sub_match(df=dtf, players_match=players_dec_allo_winning_2, match_column="decrease_allocation_winning_2", rolling_window=r_window, fill_value=False)

# Create a new DataFrame with only the players that appear in players_dec_allo_winning_3
dtf_all_dec_allo_winning_3, dtf_dec_allo_winning_3 = filter_sub_match(df=dtf, players_match=players_dec_allo_winning_3, match_column="decrease_allocation_winning_3", rolling_window=r_window, fill_value=False)

# Create a new DataFrame with only the players that appear in players_dec_allo_winning_4
dtf_all_dec_allo_winning_4, dtf_dec_allo_winning_4 = filter_sub_match(df=dtf, players_match=players_dec_allo_winning_4, match_column="decrease_allocation_winning_4", rolling_window=r_window, fill_value=False)

# Create a list of DataFrames that contains all the DataFrames that we want to plot
dtf_list_dec_winning = [dtf_dec_allo_winning_2, dtf_dec_allo_winning_3, dtf_dec_allo_winning_4]

## Interactive Plots

The following section is used to explore the data interactively. The widgets below let the user pick one of the sliced DataFrames, a subject, the variables for the x-axis and up to two y-axes, and the x-axis range, and optionally add a connecting line or a shaded area. Exploring the data this way can reveal patterns or relationships that might not be apparent from static plots alone.
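
Before any widgets are involved, each plot comes down to selecting one subject's rows within an x-axis range. The sketch below isolates that step as a plain function so it can be checked without a notebook front end. `select_subject_rows` is a hypothetical helper, and unlike the plotting cell it applies each bound independently; that difference is a design choice of this sketch.

```python
import pandas as pd

def select_subject_rows(df, subject, x="Year", x_min=None, x_max=None,
                        subject_col="Subject"):
    """Hypothetical helper: rows of one subject, optionally limited to [x_min, x_max]."""
    rows = df[df[subject_col] == subject]
    if x_min is not None:
        rows = rows[rows[x] >= x_min]
    if x_max is not None:
        rows = rows[rows[x] <= x_max]
    # Sort so that a connecting line is drawn in x order
    return rows.sort_values(x)

# Made-up data for two hypothetical subjects
toy = pd.DataFrame({
    "Subject": [41, 41, 41, 41, 44, 44],
    "Year": [3, 1, 2, 4, 1, 2],
    "PerAllo": [0.4, 0.5, 0.6, 0.3, 0.2, 0.1],
})
selected = select_subject_rows(toy, 41, x_min=2, x_max=3)
print(selected)
```

The returned frame can then be passed to `ax.scatter` or `ax.plot` in the same way the plotting function uses `player_df`.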

In [15]:
import matplotlib.pyplot as plt
import ipywidgets as widgets

# List of the matched and sliced DataFrames to explore
dtf_lists = dtf_list_inc_lossing

# Calculate the overall max and min of the 'Year' column across the DataFrames
time_max = max([df["Year"].max() for df in dtf_lists])
time_min = min([df["Year"].min() for df in dtf_lists])

# Scatter plot of y (and optionally y_2 on a second axis) against x for the selected subject
def plot_scatters(player_ID, df_index, x="Year", y="PerAllo", y_2=None, x_min=None, x_max=None, show_line=False, shade_area=False):
    df = dtf_lists[df_index]
    players = df["Subject"].unique().tolist()
    player_df = df[df["Subject"] == players[player_ID]]
    
    fig, ax1 = plt.subplots()
    ax1.set_xlabel(x)
    ax1.set_ylabel(y, color='royalblue')
    if x_min is not None and x_max is not None:
        player_df = player_df[(player_df[x] >= x_min) & (player_df[x] <= x_max)]
    ax1.scatter(x=player_df[x], y=player_df[y], color='royalblue')
    
    if y_2 is not None:
        ax2 = ax1.twinx()
        ax2.set_ylabel(y_2, color='orangered')
        # player_df was already restricted to [x_min, x_max] above
        ax2.scatter(x=player_df[x], y=player_df[y_2], color='orangered', marker='s')
        ax2.tick_params(axis='y', labelcolor='orangered')
        # Add a line to the plot if show_line is True
        if show_line:
            ax2.plot(player_df[x], player_df[y_2], color='black', linewidth=0.8, linestyle='--')
        if shade_area:
            ax2.fill_between(player_df[x], player_df[y_2], color='lightcoral', alpha=0.5)
    
    if show_line:
        ax1.plot(player_df[x], player_df[y], color='black', linewidth=0.8)
        
    if shade_area:
        ax1.fill_between(player_df[x], player_df[y], color='lightblue', alpha=0.5)

    ax1.tick_params(axis='y', labelcolor='black')
    ax1.grid()
    plt.title(f"Subject {players[player_ID]}")
    plt.show()

# Create widgets for df_index, x, y, y_2, x_min, x_max, show_line, and shade_area
df_index_widget = widgets.Dropdown(options=[(f"DataFrame {i}", i) for i in range(len(dtf_lists))])
x_widget = widgets.Dropdown(options=list(dtf_lists[0].columns), value="Year")
y_widget = widgets.Dropdown(options=list(dtf_lists[0].columns), value="PerAllo")
y_2_widget = widgets.Dropdown(options=[None]+list(dtf_lists[0].columns), value=None)
x_min_widget = widgets.FloatText(description="x_min", value=time_min)
x_max_widget = widgets.FloatText(description="x_max", value=time_max)
show_line_widget = widgets.Checkbox(description='Show line', value=False)
shade_area_widget = widgets.Checkbox(description='Shade area', value=False)

# Create a function to update the players_widget based on the selected df_index
def update_players_widget(df_index):
    df = dtf_lists[df_index]
    players = df["Subject"].unique().tolist()
    player_key_widget.options = [(p, i) for i, p in enumerate(players)]

# Create a players_widget for the initial df_index value
initial_df_index = df_index_widget.value
initial_df = dtf_lists[initial_df_index]
initial_players = initial_df["Subject"].unique().tolist()
player_key_widget = widgets.Dropdown(options=[], value=None)

# Call update_players_widget with the initial_df_index value to set the options for player_key_widget
update_players_widget(initial_df_index)

widgets.interact(plot_scatters, player_ID=player_key_widget, df_index=df_index_widget,
                 x=x_widget, y=y_widget, y_2=y_2_widget, x_min=x_min_widget, x_max=x_max_widget,
                 show_line=show_line_widget, shade_area=shade_area_widget)

# Update the player_key_widget options when df_index changes
def on_df_index_change(change):
    update_players_widget(change.new)

df_index_widget.observe(on_df_index_change, names='value')
update_players_widget(initial_df_index)  # update the player_key_widget options initially


interactive(children=(Dropdown(description='player_ID', options=((41, 0), (44, 1), (45, 2), (46, 3), (49, 4), …