# Preprocessing
**Copyright 2023 (c) Naomi Chaix-Echel & Nicolas P Rougier**  
Released under a BSD 2-clause license

This notebook reads and processes the original dataset to ensure that tasks are named properly.
The original dataset is untouched and the processed dataset is saved using an alternative filename.

| Name           | Type     | Description                   |
| :------------- | :------- | :---------------------------- |
| **subject_id** | string   | Identification of the subject |
| **date**       | datetime | Date when the trial was made  |
| **task_id**    | integer  | Identification of the task    | 
| **P_left**     | float    | Reward probability of the left stimulus |
| **V_left**     | float    | Reward amount of the left stimulus |
| **P_right**    | float    | Reward probability of the right stimulus |
| **V_right**    | float    | Reward amount of the right stimulus |
| **response**   | int      | Response (0: left, 1: right) |
| **reward**     | int      | Reward delivered (1) or not (0) |
| **RT**         | int      | Response time (ms) |


## Lottery description

For all the following types of lottery, we consider a choice between (x1, p1) and (x2, p2), xi being the value and pi being the probability:
* xi can be positive, zero or negative: -3, -2, -1, 0, +1, +2, +3
* pi can be: 0.25, 0.50, 0.75 or 1.00


### Type 1 : x1 > 0 and x2 < 0, p1 = p2

* Lottery pairs containing one lottery with potential losses (LPL) and one lottery with potential gains (LPG)
* assess the discrimination of losses from gains
* 36 different lottery pairs.

### Type 2 : p1 = p2 and x1 > x2 > 0

* LPG with a stochastically dominant option, differing only in the x values
* assess the discrimination of positive x-values
* 12 different lottery pairs

### Type 3 : p1 = p2 and x1 < x2 < 0

* LPL with a stochastically dominant option, differing only in the x values
* assess the discrimination of negative x-values
* 12 different lottery pairs

### Type 4 : p1 > p2 and x1 = x2 > 0

* LPG with a stochastically dominant option, differing only in the p values
* assess the discrimination of p-values associated with positive x-values
* 18 different lottery pairs

### Type 5 : p1 < p2 and x1 = x2 < 0

* LPL with a stochastically dominant option, differing only in the p values
* assess the discrimination of p-values associated with negative x-values
* 18 different lottery pairs

### Type 6 : p1 < p2 and x1 > x2 > 0

* LPG with no stochastically dominant option
* 18 different lottery pairs.

### Type 7 : p1 < p2 and x1 < x2 < 0

* LPL with no stochastically dominant option
* 18 different lottery pairs.
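
The number of pairs for each type can be enumerated by brute force. The following is a minimal sketch, assuming the values and probabilities are exactly those listed in the lottery description and that each heading's condition is applied to ordered pairs as written. The names `CONDITIONS` and `count_pairs` are hypothetical and not part of the original notebook.

```python
from itertools import product

# Values and probabilities as listed in the lottery description
X = [-3, -2, -1, 0, 1, 2, 3]
P = [0.25, 0.50, 0.75, 1.00]

# One predicate per lottery type, transcribed from the section headings
CONDITIONS = {
    1: lambda x1, p1, x2, p2: x1 > 0 and x2 < 0 and p1 == p2,
    2: lambda x1, p1, x2, p2: p1 == p2 and x1 > x2 > 0,
    3: lambda x1, p1, x2, p2: p1 == p2 and x1 < x2 < 0,
    4: lambda x1, p1, x2, p2: p1 > p2 and x1 == x2 > 0,
    5: lambda x1, p1, x2, p2: p1 < p2 and x1 == x2 < 0,
    6: lambda x1, p1, x2, p2: p1 < p2 and x1 > x2 > 0,
    7: lambda x1, p1, x2, p2: p1 < p2 and x1 < x2 < 0,
}

def count_pairs(condition):
    """Count (x1, p1), (x2, p2) combinations that satisfy `condition`."""
    return sum(1 for x1, p1, x2, p2 in product(X, P, X, P)
               if condition(x1, p1, x2, p2))

counts = {task: count_pairs(cond) for task, cond in CONDITIONS.items()}
print(counts)
```

Because the conditions are written for ordered pairs, a left/right swap of the same two lotteries is not counted twice.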

## Import packages

In [None]:
import datetime                 # Time operations
import numpy as np              # Array operations
import pandas as pd             # Database operations
import matplotlib.pyplot as plt # Figures

## Load data

In [None]:
print("Loading data... ", end="")
data_filename = "./data/data.csv"
original_data = pd.read_csv(data_filename)
print("done!")

## Filter, rename & retype fields

In [None]:
# Keep only relevant fields
data = original_data[["date",
                      "monkey",
                      "Type",
                      "stim_left_p",
                      "stim_left_x0",
                      "stim_right_p",
                      "stim_right_x0",
                      "choice",
                      "stim_dice_output",
                      "time_response"]].copy()

# Rename fields
data = data.rename(columns={"monkey" :           "subject_id",
                            "date" :             "date",
                            "Type" :             "task_id",
                            "stim_left_p" :      "P_left",
                            "stim_left_x0" :     "V_left",
                            "stim_right_p" :     "P_right",
                            "stim_right_x0" :    "V_right",
                            "choice" :           "response",
                            "stim_dice_output" : "reward",
                            "time_response" :    "RT"})

# Reset task_id to 0 (integer): the original "Type" labels are discarded
# and ids are reassigned from probabilities and values further below
data["task_id"] = 0

# Convert date type (from string to datetime64)
data["date"] = pd.to_datetime(data["date"])

## Enrich data with actual gain or loss

In [None]:
I_left = (data["reward"] == 1) & (data["response"] == 0)
I_right = (data["reward"] == 1) & (data["response"] == 1)
   
data["gain"] = 0
data.loc[I_left & (data["V_left"] > 0), "gain"] = data["V_left"]
data.loc[I_right & (data["V_right"] > 0), "gain"] = data["V_right"]
    
data["loss"] = 0
data.loc[I_left & (data["V_left"] < 0), "loss"] = data["V_left"]
data.loc[I_right & (data["V_right"] < 0), "loss"] = data["V_right"]
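
To make the masking logic concrete, here is a minimal sketch on a made-up four-trial frame (the values are illustrative only, not taken from the dataset). It uses an alternative formulation based on `Series.where` and `Series.clip` rather than the boolean masks above, and assumes that `response` is always 0 or 1 and that `reward == 1` means the amount of the chosen stimulus was delivered.

```python
import pandas as pd

# Toy trials (illustrative values, not from the dataset)
toy = pd.DataFrame({
    "V_left":   [ 2, -1,  3, -2],
    "V_right":  [-3,  1,  1, -1],
    "response": [ 0,  0,  1,  1],   # 0: left, 1: right
    "reward":   [ 1,  1,  0,  1],   # 1: outcome delivered, 0: not
})

# Amount attached to the chosen side
chosen = toy["V_left"].where(toy["response"] == 0, toy["V_right"])
# Amount actually delivered (0 when nothing was delivered)
delivered = chosen.where(toy["reward"] == 1, 0)

toy["gain"] = delivered.clip(lower=0)   # positive part, 0 otherwise
toy["loss"] = delivered.clip(upper=0)   # negative part, 0 otherwise
print(toy[["response", "reward", "gain", "loss"]])
```

Note that `loss` keeps the sign of the delivered amount, as in the cell above: a delivered loss of -1 is stored as -1, not 1.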


## Assign task id

In [None]:
# We assign task ids based on probabilities and values
p1, x1 = data["P_left"], data["V_left"]
p2, x2 = data["P_right"], data["V_right"]

data.loc[(p1 == p2) & (x1 <  0) & (x2 >   0), "task_id"] = 1
data.loc[(p1 == p2) & (x2 <  0) & (x1 >   0), "task_id"] = 1

data.loc[(p1 == p2) & (x1 >  0) & (x2 >   0), "task_id"] = 2

data.loc[(p1 == p2) & (x1 <  0) & (x2 <   0), "task_id"] = 3

data.loc[(p1 >  p2) & (x1 >  0) & (x1 == x2), "task_id"] = 4
data.loc[(p2 >  p1) & (x1 >  0) & (x1 == x2), "task_id"] = 4

data.loc[(p1 >  p2) & (x1 <  0) & (x1 == x2), "task_id"] = 5
data.loc[(p2 >  p1) & (x1 <  0) & (x1 == x2), "task_id"] = 5

data.loc[(p1 <  p2) & (x1 > x2) & (x2 >   0), "task_id"] = 6
data.loc[(p2 <  p1) & (x2 > x1) & (x1 >   0), "task_id"] = 6

data.loc[(p1 <  p2) & (x1 < x2) & (x2 <   0), "task_id"] = 7
data.loc[(p2 <  p1) & (x2 < x1) & (x1 <   0), "task_id"] = 7
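
The rules above can be checked on a small hand-made frame. The sketch below wraps the same conditions in a hypothetical helper, `assign_task_id`, and applies it to one illustrative trial per type plus one trial that matches no rule (the toy values are assumptions, not data from the experiment).

```python
import pandas as pd

def assign_task_id(df):
    """Return a Series of task ids (0 when no rule matches)."""
    p1, x1 = df["P_left"], df["V_left"]
    p2, x2 = df["P_right"], df["V_right"]
    differ = (p1 > p2) | (p2 > p1)
    rules = [
        (1, (p1 == p2) & (((x1 < 0) & (x2 > 0)) | ((x2 < 0) & (x1 > 0)))),
        (2, (p1 == p2) & (x1 > 0) & (x2 > 0)),
        (3, (p1 == p2) & (x1 < 0) & (x2 < 0)),
        (4, differ & (x1 > 0) & (x1 == x2)),
        (5, differ & (x1 < 0) & (x1 == x2)),
        (6, ((p1 < p2) & (x1 > x2) & (x2 > 0))
            | ((p2 < p1) & (x2 > x1) & (x1 > 0))),
        (7, ((p1 < p2) & (x1 < x2) & (x2 < 0))
            | ((p2 < p1) & (x2 < x1) & (x1 < 0))),
    ]
    task = pd.Series(0, index=df.index)
    for task_id, mask in rules:
        task.loc[mask] = task_id
    return task

# One illustrative trial per type, then a trial with a zero value
toy = pd.DataFrame(
    [(0.50,  2, 0.50, -1),
     (0.50,  3, 0.50,  1),
     (0.75, -3, 0.75, -1),
     (1.00,  2, 0.25,  2),
     (0.25, -2, 0.75, -2),
     (0.25,  3, 0.75,  1),
     (0.75, -1, 0.25, -3),
     (0.50,  0, 0.50,  2)],
    columns=["P_left", "V_left", "P_right", "V_right"])

toy["task_id"] = assign_task_id(toy)
print(toy)
```

Trials that match none of the rules, such as the last one where one value is 0, keep a task id of 0.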

## Save new dataset

In [None]:
import os
filename, extension = os.path.splitext(data_filename)
filename = f"{filename}-processed{extension}"

print("Saving new dataset... ", end="")
data.to_csv(filename)
print("done!")
print("New dataset:", filename)
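
CSV files do not store column types, and `to_csv` writes the index as an unnamed first column by default, so reloading the processed file needs a little care. The following is a minimal sketch of a round trip on a toy frame (the file name and values are illustrative assumptions), using `index_col=0` and `parse_dates` to recover the index and the `date` column.

```python
import os
import tempfile
import pandas as pd

# Toy frame standing in for the processed data (illustrative values)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02 10:00:00", "2023-01-02 10:00:05"]),
    "task_id": [1, 6],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy-processed.csv")
    toy.to_csv(path)                      # index is written as first column
    reloaded = pd.read_csv(path,
                           index_col=0,   # recover the saved index
                           parse_dates=["date"])

print(reloaded.dtypes)
```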