# Preprocessing Only Workflow

In this workflow, PyCFRL takes in an offline trajectory and preprocesses it 
using `SequentialPreprocessor`. The final output of the workflow is the preprocessed (debiased) 
offline trajectory. This workflow is appropriate when the user does not want to train policies using 
PyCFRL. Instead, the user can take the preprocessed trajectory and train a counterfactually fair policy 
using another reinforcement learning library or application that better fits their needs.

We begin by importing the libraries needed for this demonstration.

In [3]:
pip install pycfrl

In [None]:
import pandas as pd
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from pycfrl.reader import read_trajectory_from_dataframe, convert_trajectory_to_dataframe
from pycfrl.preprocessor import SequentialPreprocessor
np.random.seed(10) # ensure reproducibility
torch.manual_seed(10) # ensure reproducibility

<torch._C.Generator at 0x1d2f28de510>

## Data Loading

In this demonstration, we use an offline trajectory generated from a `SyntheticEnvironment` using pre-specified transition rules. Although the trajectory is synthetic, we treat it as if it came from an unknown environment, for pedagogical convenience.

The trajectory contains 500 individuals (i.e., $N=500$) and 10 transitions per individual (i.e., $T=10$). The actions are binary ($0$ or $1$) and were sampled using a random policy that selects $0$ or $1$ with equal probability. The trajectory is stored in a tabular format in a `.csv` file. The sensitive attribute is univariate and stored in the column `z1`; its valid values are $0$ and $1$. The state variable is also univariate and stored in the column `state1`. The actions are stored in the column `action` and the rewards in the column `reward`. The tabular data also includes an extra, irrelevant column `timestamp`.
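
To make this layout concrete, the following is a minimal sketch that builds a tiny table with the same column names. It assumes, as the first displayed row of the real data suggests, that each individual has one initial row with no action or reward, followed by one row per transition. The sizes `N=3` and `T=2`, the random values, and the name `toy` are all hypothetical and are not taken from the sample file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy sizes; the tutorial's file uses N=500 and T=10.
N, T = 3, 2
rng = np.random.default_rng(0)

rows = []
for i in range(1, N + 1):
    z = int(rng.integers(0, 2))      # sensitive attribute, constant within an individual
    for t in range(1, T + 2):        # assumption: T transitions give T + 1 rows
        first = (t == 1)
        rows.append({
            "ID": i,
            "timestamp": t,
            "z1": z,
            # Assumption: the initial row has no preceding action or reward.
            "action": np.nan if first else int(rng.integers(0, 2)),
            "reward": np.nan if first else float(rng.normal()),
            "state1": float(rng.normal()),
        })

toy = pd.DataFrame(rows)
print(toy)
```

Each individual occupies a contiguous block of rows, which is the long format that the column labels passed to `read_trajectory_from_dataframe()` refer to.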

We can load and view the tabular data.

In [5]:
trajectory = pd.read_csv('../data/sample_data_large_uni.csv')
trajectory

Unnamed: 0.1,Unnamed: 0,ID,timestamp,z1,action,reward,state1
0,0,1.0,1.0,0.0,,,1.324345
1,1,1.0,2.0,0.0,1.0,1.524345,-0.813722
2,2,1.0,3.0,0.0,1.0,-0.613722,-0.526683
3,3,1.0,4.0,0.0,1.0,-0.326683,-0.464447
4,4,1.0,5.0,0.0,1.0,-0.264447,-2.075518
...,...,...,...,...,...,...,...
5495,5495,500.0,7.0,1.0,1.0,-2.468460,-0.941954
5496,5496,500.0,8.0,1.0,1.0,-1.430345,-2.536595
5497,5497,500.0,9.0,1.0,0.0,-1.068298,-0.946557
5498,5498,500.0,10.0,1.0,0.0,-0.273278,-0.709017


We now read the trajectory from the tabular format into Trajectory Arrays.

In [6]:
zs, states, actions, rewards, ids = read_trajectory_from_dataframe(
                                                data=trajectory, 
                                                z_labels=['z1'], 
                                                state_labels=['state1'], 
                                                action_label='action', 
                                                reward_label='reward', 
                                                id_label='ID', 
                                                T=10
                                                )

## Preprocessor Training

Before preprocessing the trajectory, we first need to train a preprocessor. To mitigate overfitting, we train the preprocessor on a random subset of 250 individuals from the trajectory. The remaining 250 individuals are the ones that will actually be preprocessed. We now form these two sets.

In [7]:
(
    zs_train, zs_prepro, 
    states_train, states_prepro, 
    actions_train, actions_prepro, 
    rewards_train, rewards_prepro, 
    ids_train, ids_prepro
) = train_test_split(zs, states, actions, rewards, ids, test_size=0.5)
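
The split above relies on `train_test_split` applying the same random partition to every array it receives, so each individual's sensitive attribute, states, and ID stay together. The sketch below checks this on small hypothetical arrays. The shapes and values are made up for illustration and are not necessarily those returned by `read_trajectory_from_dataframe()`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy arrays whose first axis indexes individuals.
N, T = 6, 2
ids = np.arange(1, N + 1)
zs = (ids % 2).reshape(N, 1)
# Fill each individual's states with their own ID so alignment is easy to check.
states = np.repeat(ids, T + 1).reshape(N, T + 1, 1).astype(float)

zs_a, zs_b, states_a, states_b, ids_a, ids_b = train_test_split(
    zs, states, ids, test_size=0.5, random_state=10
)

# The two halves are disjoint, and every array was split along the same indices.
print(sorted(ids_a), sorted(ids_b))
print((states_a[:, 0, 0] == ids_a).all(), (zs_b[:, 0] == ids_b % 2).all())
```

Passing `random_state` makes the split reproducible on its own. The tutorial's cell instead relies on the earlier `np.random.seed(10)` call.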

We now use the training set to train a `SequentialPreprocessor`.

In [8]:
sp = SequentialPreprocessor(z_space=[[0], [1]], 
                            num_actions=2, 
                            cross_folds=1, 
                            mode='single', 
                            reg_model='nn')
sp.train_preprocessor(zs=zs_train, xs=states_train, actions=actions_train, rewards=rewards_train)

100%|██████████| 1000/1000 [00:18<00:00, 54.42it/s]

The fluctuation in the loss is not small enough in at least one of the final 10 epochs during neural network training


(array([[[-0.32461696,  0.57096795],
         [ 0.22408737,  1.34364977],
         [-0.75577441,  1.3940598 ],
         ...,
         [-0.65848739, -0.12798007],
         [-0.64196535,  0.70769511],
         [-0.40729995,  1.56825744]],
 
        [[-0.2082495 ,  0.6873354 ],
         [ 1.26062998,  2.39690117],
         [-0.63231817,  1.84835638],
         ...,
         [-0.81515169,  0.43440456],
         [-1.39260415,  0.45079704],
         [-2.79777832, -1.66740244]],
 
        [[ 1.20425904,  2.09984394],
         [-0.71320386,  1.66238533],
         [ 0.96579847,  3.91839007],
         ...,
         [ 0.53464998,  2.41311652],
         [-1.10176242,  0.37004053],
         [ 0.5821277 ,  2.51716889]],
 
        ...,
 
        [[ 1.26530747,  2.16089238],
         [-2.15263285,  0.22994996],
         [-0.98369375,  1.14478895],
         ...,
         [-2.50899961, -1.07172801],
         [-0.83936513,  0.45569243],
         [-0.8893273 ,  0.26598686]],
 
        [[ 0.93616403,  1.831

## Data Preprocessing

We now preprocess the held-out individuals, i.e., those that were not used to train the preprocessor.

In [9]:
states_tilde, rewards_tilde = sp.preprocess_multiple_steps(zs=zs_prepro, 
                                                           xs=states_prepro, 
                                                           actions=actions_prepro, 
                                                           rewards=rewards_prepro)

## Data Exporting

Finally, we convert the preprocessed trajectory back into the tabular format so that it is easier to store and manage. Note that the state variable is now bivariate because it includes both the counterfactual state under $Z=0$ and the counterfactual state under $Z=1$. For simplicity, we name the concatenated counterfactual states `state1` and `state2`, which are the default labels provided by `convert_trajectory_to_dataframe()` (so we do not need to specify the `state_labels` argument).

In [10]:
preprocessed_trajectory = convert_trajectory_to_dataframe(
                                        zs=zs_prepro, 
                                        states=states_tilde, 
                                        actions=actions_prepro, 
                                        rewards=rewards_tilde, 
                                        ids=ids_prepro, 
                                        z_labels=['z1'], 
                                        action_label='action', 
                                        reward_label='reward', 
                                        id_label='ID', 
                                        T_label='time_step'
                                        )
preprocessed_trajectory

Unnamed: 0,ID,time_step,z1,action,reward,state1,state2
0,152.0,1.0,1.0,,,-2.630423,-1.734838
1,152.0,2.0,1.0,0.0,-0.889842,-0.082420,0.435083
2,152.0,3.0,1.0,0.0,0.027992,-0.115028,0.846006
3,152.0,4.0,1.0,1.0,0.403396,0.767605,2.372561
4,152.0,5.0,1.0,0.0,0.646033,0.850248,2.133603
...,...,...,...,...,...,...,...
2745,278.0,7.0,0.0,0.0,-0.308423,-2.123443,-1.532006
2746,278.0,8.0,0.0,0.0,-0.752341,-1.777098,-1.059334
2747,278.0,9.0,0.0,0.0,-0.597660,-1.132943,-0.191109
2748,278.0,10.0,0.0,1.0,-0.751431,-3.606803,-1.930229
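
Since the preprocessed trajectory is meant to be used by another library, a natural last step is to write it to disk. The sketch below uses a two-row stand-in for `preprocessed_trajectory` (values copied from the first two rows displayed above). The output path and file name are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for `preprocessed_trajectory`, copied from the output above.
toy = pd.DataFrame({
    "ID": [152.0, 152.0],
    "time_step": [1.0, 2.0],
    "z1": [1.0, 1.0],
    "action": [float("nan"), 0.0],
    "reward": [float("nan"), -0.889842],
    "state1": [-2.630423, -0.082420],
    "state2": [-1.734838, 0.435083],
})

# Hypothetical output location; replace it with a path of your choice.
out_path = os.path.join(tempfile.mkdtemp(), "preprocessed_trajectory.csv")

# index=False keeps the row index out of the file, so reloading it
# does not produce an extra "Unnamed: 0" column.
toy.to_csv(out_path, index=False)

reloaded = pd.read_csv(out_path)
print(reloaded)
```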


## Alternative: Preprocessing All Individuals

Sometimes the trajectory contains only a small number of individuals. If we preprocess only a subset of them, the resulting preprocessed trajectory might be too small to be useful for policy learning. In this case, we can preprocess all individuals directly with `train_preprocessor()` by setting `cross_folds` to a value greater than 1.

When `cross_folds=K` where `K` is greater than 1, `train_preprocessor()` will internally divide the training data into `K` folds. For each $i=1,\dots,K$, it trains a transition dynamics model based on all the folds other than the $i$-th one, and this model is then used to preprocess data in the $i$-th fold. This results in `K` folds of preprocessed data, each of which is processed using a model that is trained on the other folds. These `K` folds of preprocessed data are then combined and returned by `train_preprocessor()`. This method allows us to preprocess all individuals in the trajectory while reducing overfitting.
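
The fold-wise procedure described above can be illustrated with a generic sketch. This is not PyCFRL's internal implementation: the helper `cross_fit_predict` is hypothetical, the data are synthetic, and an ordinary least-squares fit stands in for the transition dynamics model. The only point is that every observation is processed by a model that never saw it during fitting.

```python
import numpy as np

def cross_fit_predict(x, y, n_folds, seed=0):
    """Predict each y[i] with a model fitted on the folds that exclude i."""
    rng = np.random.default_rng(seed)
    # Randomly partition the indices into n_folds roughly equal folds.
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    preds = np.empty(len(x))
    for k, held_out in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Stand-in model: least squares with an intercept, fitted on the other folds.
        design = np.column_stack([np.ones(len(train)), x[train]])
        coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)
        # Apply the fitted model only to the held-out fold.
        preds[held_out] = coef[0] + coef[1] * x[held_out]
    return preds, folds

# Synthetic data for illustration only.
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=100)
y = 2.0 * x + 1.0 + data_rng.normal(scale=0.1, size=100)

preds, folds = cross_fit_predict(x, y, n_folds=5)
print(preds.shape, [len(f) for f in folds])
```

The combined `preds` array covers all 100 observations, mirroring how the `K` folds of preprocessed data are combined into a single output.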

To use this functionality, we first initialize a `SequentialPreprocessor` with `cross_folds` greater than 1. We use `cross_folds=5` here.

In [11]:
sp_cf5 = SequentialPreprocessor(z_space=[[0], [1]], 
                                num_actions=2, 
                                cross_folds=5, 
                                mode='single', 
                                reg_model='nn')

We now simultaneously train the preprocessor and preprocess all individuals in the trajectory using the procedure described above.

In [12]:
states_tilde_cf5, rewards_tilde_cf5 = sp_cf5.train_preprocessor(zs=zs, 
                                                                xs=states, 
                                                                actions=actions, 
                                                                rewards=rewards)

100%|██████████| 1000/1000 [00:30<00:00, 33.12it/s]
100%|██████████| 1000/1000 [00:34<00:00, 28.82it/s]
100%|██████████| 1000/1000 [00:31<00:00, 31.87it/s]
100%|██████████| 1000/1000 [00:32<00:00, 31.12it/s]

The fluctuation in the loss is not small enough in at least one of the final 10 epochs during neural network training
100%|██████████| 1000/1000 [00:32<00:00, 31.07it/s]


We can now convert the preprocessed trajectory into the tabular format.

In [13]:
preprocessed_trajectory_cf5 = convert_trajectory_to_dataframe(
                                            zs=zs, 
                                            states=states_tilde_cf5, 
                                            actions=actions, 
                                            rewards=rewards_tilde_cf5, 
                                            ids=ids, 
                                            z_labels=['z1'], 
                                            action_label='action', 
                                            reward_label='reward', 
                                            id_label='ID', 
                                            T_label='time_step'
                                            )
preprocessed_trajectory_cf5

Unnamed: 0,ID,time_step,z1,action,reward,state1,state2
0,1.0,1.0,0.0,,,1.324345,2.287555
1,1.0,2.0,0.0,1.0,1.986853,-0.813722,1.722688
2,1.0,3.0,0.0,1.0,0.579322,-0.526683,2.660096
3,1.0,4.0,0.0,1.0,1.296837,-0.464447,2.882683
4,1.0,5.0,0.0,1.0,1.448898,-2.075518,1.461430
...,...,...,...,...,...,...,...
5495,500.0,7.0,1.0,1.0,-2.710126,-2.113400,-0.941954
5496,500.0,8.0,1.0,1.0,-1.654227,-4.019493,-2.536595
5497,500.0,9.0,1.0,0.0,-1.278619,-1.877994,-0.946557
5498,500.0,10.0,1.0,0.0,-0.542704,-1.552651,-0.709017
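
Because this output covers the same individuals as the original table, the two can be compared directly. In the rows displayed, individual 1 (`z1 = 0`) keeps its observed state in `state1`, while individual 500 (`z1 = 1`) keeps it in `state2`. The sketch below checks that pattern on four rows copied from the two outputs. It only confirms the pattern for those rows and is not a statement about how the preprocessor behaves in general. The column names `cf_z0` and `cf_z1` are hypothetical labels introduced for readability.

```python
import numpy as np
import pandas as pd

# Rows copied from the original table displayed earlier.
original = pd.DataFrame({
    "ID": [1.0, 1.0, 500.0, 500.0],
    "timestamp": [1.0, 2.0, 9.0, 10.0],
    "z1": [0.0, 0.0, 1.0, 1.0],
    "state1": [1.324345, -0.813722, -0.946557, -0.709017],
})

# Matching rows copied from the cross-fold output above.
preprocessed = pd.DataFrame({
    "ID": [1.0, 1.0, 500.0, 500.0],
    "time_step": [1.0, 2.0, 9.0, 10.0],
    "state1": [1.324345, -0.813722, -1.877994, -1.552651],
    "state2": [2.287555, 1.722688, -0.946557, -0.709017],
})

# Rename so the two tables share key names and the state columns do not clash.
preprocessed = preprocessed.rename(
    columns={"time_step": "timestamp", "state1": "cf_z0", "state2": "cf_z1"}
)
merged = original.merge(preprocessed, on=["ID", "timestamp"])

# Select the counterfactual column that corresponds to each row's observed z1.
factual = np.where(merged["z1"] == 0, merged["cf_z0"], merged["cf_z1"])
matches = np.isclose(factual, merged["state1"])
print(bool(matches.all()))
```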
