# Exploratory Data Analysis
I am using this notebook to learn how gamblers' behaviors resemble those of investors.

## Upload and Transform Data

## Filter Columns and Inspect

In [75]:
import pandas as pd
import os

# Set working directory
path = '/Users/mau/Library/CloudStorage/Dropbox/Mac/Documents/Dissertation/Chapter 2/Data'
os.chdir(path)

# Load data into a DataFrame
dtf = pd.read_parquet("slot_data_sample.parquet")

# Select only specific columns
filter = ['playercashableamt', 'wageredamt', 'casino_grosswin', 'playerkey',
       'slotdenominationname', 'slotthemekey']

# Load only the selected columns into a DataFrame
df = pd.read_parquet("slot_data_sample.parquet", columns=filter)

# Print column names
print(df.columns)

# Print general information about the DataFrame
print(df.info())

# Count unique players
print(df['playerkey'].nunique())

Index(['playercashableamt', 'wageredamt', 'casino_grosswin', 'playerkey',
       'slotdenominationname', 'slotthemekey'],
      dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90274 entries, 0 to 90273
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   playercashableamt     90274 non-null  float64
 1   wageredamt            90274 non-null  float64
 2   casino_grosswin       90274 non-null  float64
 3   playerkey             90274 non-null  int64  
 4   slotdenominationname  90274 non-null  object 
 5   slotthemekey          90274 non-null  int64  
dtypes: float64(3), int64(2), object(1)
memory usage: 4.1+ MB
None
98


## Calculate Fundamental Variables

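The following variables were calculated using existing data:
* _player_loss_: the player's net result on each gamble, computed as the negative of _casino_grosswin_. It is negative when the player loses money and positive when the player wins.
* _player_wins_: the amount returned to the player on each gamble, equal to the amount they bet plus their net result (_wageredamt_ + _player_loss_).
* _percent_return_: the percentage return on the player's bet for each gamble played.

$$\text{percent return} = (\frac{df[player\_wins] - df[wageredamt]}{df[wageredamt]})*100$$

* _playercashableamt_pct_change_: the rate of change of each player's outstanding gambling amount from one gamble to the next, computed separately for each player. pandas' `pct_change()` returns this value as a fraction, so it is not multiplied by 100.

$$\text{playercashableamt \% change} = \frac{df[playercashableamt_{t}] - df[playercashableamt_{t-1}]}{df[playercashableamt_{t-1}]}$$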

In [76]:
# Create a new column that is the negative of casino_grosswin, named "player_loss"
df["player_loss"] = df["casino_grosswin"] * -1

# Amount returned to the player: the wager plus the player's net result
df['player_wins'] = df['wageredamt'] + df['player_loss']
# Calculate percentage return for each gamble and add it as a new column
df["percent_return"] = (df["player_wins"] - df["wageredamt"]) / df["wageredamt"] * 100

# Calculate the rate of change of playercashableamt per playerkey (as a fraction)
df["playercashableamt_pct_change"] = df.groupby("playerkey")["playercashableamt"].pct_change()
# Print the first 5 rows of the DataFrame
print(df.head())

   playercashableamt  wageredamt  casino_grosswin  playerkey  \
0               80.0        10.0             10.0          2   
1               70.0        10.0             10.0          2   
2               60.0        10.0             10.0          2   
3               50.0        10.0            -90.0          2   
4              140.0        10.0              0.0          2   

  slotdenominationname  slotthemekey  player_loss  player_wins  \
0               $5.00            390        -10.0          0.0   
1               $5.00            390        -10.0          0.0   
2               $5.00            390        -10.0          0.0   
3               $5.00            390         90.0        100.0   
4               $5.00            390         -0.0         10.0   

   percent_return  playercashableamt_pct_change  
0          -100.0                           NaN  
1          -100.0                     -0.125000  
2          -100.0                     -0.142857  
3           900.0 
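As a sanity check on the definitions, the rows printed above can be recomputed by hand. The sketch below rebuilds the first five rows of player 2 in a small DataFrame named `toy` (a hypothetical name, not part of the notebook) and applies the same four calculations.

```python
import pandas as pd

# First five rows of player 2, copied from the printed output (toy is a hypothetical name)
toy = pd.DataFrame({
    "playerkey": [2, 2, 2, 2, 2],
    "playercashableamt": [80.0, 70.0, 60.0, 50.0, 140.0],
    "wageredamt": [10.0, 10.0, 10.0, 10.0, 10.0],
    "casino_grosswin": [10.0, 10.0, 10.0, -90.0, 0.0],
})

# Same calculations as in the notebook cell
toy["player_loss"] = toy["casino_grosswin"] * -1
toy["player_wins"] = toy["wageredamt"] + toy["player_loss"]
toy["percent_return"] = (toy["player_wins"] - toy["wageredamt"]) / toy["wageredamt"] * 100
toy["playercashableamt_pct_change"] = toy.groupby("playerkey")["playercashableamt"].pct_change()

print(toy[["player_loss", "player_wins", "percent_return", "playercashableamt_pct_change"]])
```

In the fourth row the casino's gross win is -90, so the player receives 100 on a wager of 10, which is a 900% return. The balance change in the second row is (70 - 80) / 80 = -0.125, a fraction rather than a percentage.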

## Patterns in Return Stream

In this section, I am looking for return stream patterns similar to the market returns given to subjects in Safford et al.'s (2008) experiment. These market returns followed the historical returns of the DJIA from 1925 to 1964:

$$\text{\% return DJIA 1925-1964}: [30.0, 0.3, 28.8, 48.2, -17.2, -33.8, -52.7, -23.1, 66.7, 4.1, 38.5, 24.8, -32.8, 28.1, -2.9, -12.7, -15.4, 7.6, 13.8, 12.1, 26.6,\\ -8.1, 2.2, -2.1, 12.9, 17.6, 14.4, 8.4, -3.8, 44.0, 20.8, 2.3, -12.8, 34.0, 16.4, -9.3, 18.7, -10.8, 17.0, 14.0]$$

$$\text{pattern of returns: [1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, 1, 1]}$$
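The sign pattern can be derived from the return series rather than typed by hand. The following is a minimal sketch, assuming that a return of zero or more maps to 1 and a negative return maps to -1; the variable names `djia_returns` and `derived_pattern` are illustrative.

```python
import numpy as np

# DJIA percentage returns, 1925-1964, as listed above
djia_returns = [30.0, 0.3, 28.8, 48.2, -17.2, -33.8, -52.7, -23.1, 66.7, 4.1,
                38.5, 24.8, -32.8, 28.1, -2.9, -12.7, -15.4, 7.6, 13.8, 12.1,
                26.6, -8.1, 2.2, -2.1, 12.9, 17.6, 14.4, 8.4, -3.8, 44.0,
                20.8, 2.3, -12.8, 34.0, 16.4, -9.3, 18.7, -10.8, 17.0, 14.0]

# Map each return to +1 (non-negative) or -1 (negative)
derived_pattern = np.where(np.array(djia_returns) >= 0, 1, -1).tolist()

print(derived_pattern)
print(len(derived_pattern))
```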

### Filter Data
We need to select players who have at least 40 gambles so that their return streams can be compared with the 40 investing periods of Safford's subjects.

In [77]:
# Count the number of gambles per player once and reuse the result
gamble_counts = df["playerkey"].value_counts()

# Create a list of players that appear at least 40 times
players40 = gamble_counts[gamble_counts >= 40].index.tolist()
print(players40)
print(len(players40))

# Create a list of players that appear fewer than 40 times
players40less = gamble_counts[gamble_counts < 40].index.tolist()
print(players40less)
print(len(players40less))

# Create a new DataFrame with only the players that appear at least 40 times
df40 = df[df["playerkey"].isin(players40)]
print(df40.shape)

[33, 93, 95, 94, 20, 73, 48, 66, 44, 18, 38, 76, 29, 100, 62, 6, 99, 14, 89, 90, 8, 79, 13, 9, 69, 23, 40, 4, 61, 87, 19, 43, 72, 98, 16, 2, 36, 92, 17, 63, 12, 46, 97, 54, 91, 86, 84, 3, 85, 52, 30, 74, 35, 7, 96, 56, 65, 68, 83, 27, 71, 37, 70, 47, 11, 53, 77, 49, 39, 25, 88, 82, 42, 21, 22, 41, 80, 57, 51, 81, 78, 75, 10]
83
[55, 5, 50, 34, 64, 31, 45, 32, 58, 24, 59, 67, 26, 28, 15]
15
(90000, 10)
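The same filter can also be written without building the intermediate player lists. This is a sketch on a small made-up DataFrame, where `toy`, `min_gambles`, and `toy_filtered` are hypothetical names and a threshold of 3 stands in for 40.

```python
import pandas as pd

# Made-up data: player 1 has 3 gambles, player 2 has 2, player 3 has 1
toy = pd.DataFrame({
    "playerkey": [1, 1, 1, 2, 2, 3],
    "wageredamt": [10.0, 10.0, 5.0, 1.0, 1.0, 2.0],
})

min_gambles = 3  # stands in for the threshold of 40 used in the notebook

# Attach each player's gamble count to every one of that player's rows
gambles_per_row = toy.groupby("playerkey")["playerkey"].transform("size")

# Keep only rows of players who reach the threshold; copy so new columns can be added safely
toy_filtered = toy[gambles_per_row >= min_gambles].copy()
print(toy_filtered)
```

Taking an explicit `.copy()` of the filtered frame means later column assignments act on an independent DataFrame rather than on a slice of the original.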


### Pattern Recognition
Now that we have filtered the data, we can proceed to find a pattern similar to that of Safford's experiment.

* Define the pattern to look for.
* Create the pattern variable _sign_ in our DataFrame.
* Conduct a hard match.
* Conduct a rough match.
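The steps above can be sketched on a small example. The notebook does not define the two kinds of match, so this sketch assumes that a hard match means every position in a window agrees with the pattern, and that a rough match means the share of agreeing positions reaches a chosen threshold. The helper `match_share`, the DataFrame `toy`, the three-element pattern, and the 0.6 threshold are all hypothetical. Windows are built within each player, so no window mixes gambles from two players.

```python
import numpy as np
import pandas as pd

def match_share(signs, pattern):
    """For each window end, return the share of positions that agree with pattern."""
    signs = np.asarray(signs)
    pattern = np.asarray(pattern)
    n = len(pattern)
    shares = np.full(len(signs), np.nan)  # NaN until a full window is available
    if len(signs) >= n:
        windows = np.lib.stride_tricks.sliding_window_view(signs, n)
        shares[n - 1:] = (windows == pattern).mean(axis=1)
    return shares

# Made-up sign streams for two players
toy = pd.DataFrame({
    "playerkey": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "sign":      [1, 1, -1, 1, 1, -1, -1, 1, 1, 1, -1],
})
short_pattern = [1, 1, -1]  # stands in for the 40-period pattern
rough_threshold = 0.6       # assumed cut-off for a rough match

# Compute the agreement share separately for each player
toy["match_share"] = toy.groupby("playerkey")["sign"].transform(
    lambda s: match_share(s.to_numpy(), short_pattern)
)
toy["hard_match"] = toy["match_share"] == 1.0
toy["rough_match"] = toy["match_share"] >= rough_threshold
print(toy)
```

Each flag marks the last gamble of a matching window. Raising or lowering `rough_threshold` controls how close a window must be to the pattern to count as a rough match.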

In [82]:
# Define pattern to search for
pattern = [1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, 1, 1]

# Work on an explicit copy so that adding columns does not raise SettingWithCopyWarning
df40 = df40.copy()

# Convert percent_return column to sign column (zero counts as 1, NaN counts as -1)
df40["sign"] = df40["percent_return"].apply(lambda x: 1 if x >= 0 else -1)

# Use rolling function to search for pattern in sign column
# Note: the window runs over the whole column, so it can span two players
df40["pattern_match"] = df40["sign"].rolling(len(pattern)).apply(lambda x: (x == pattern).all())
