<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Check-Data" data-toc-modified-id="Check-Data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Check Data</a></span></li><li><span><a href="#Prepare-Data" data-toc-modified-id="Prepare-Data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Prepare Data</a></span></li><li><span><a href="#Create-A-Test-Case" data-toc-modified-id="Create-A-Test-Case-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Create A Test Case</a></span></li><li><span><a href="#Develop-Trx-Data-Transformation-Functions" data-toc-modified-id="Develop-Trx-Data-Transformation-Functions-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Develop Trx Data Transformation Functions</a></span><ul class="toc-item"><li><span><a href="#Remove-invalid-Redemptions-(no-Purchase-on-same-Date)" data-toc-modified-id="Remove-invalid-Redemptions-(no-Purchase-on-same-Date)-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Remove invalid Redemptions (no Purchase on same Date)</a></span></li></ul></li></ul></div>

<div class='alert alert-block alert-info'>
<b>Note:</b> This notebook documents the development and testing of the trx data manipulation code. Once everything ran smoothly on the defined test case, the functions were factored out into the final script `transform_data.py`.
</div>

In [2]:
import datetime as dt
import sys
from pathlib import Path

import codebook.EDA as EDA
import codebook.clean as clean
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
%load_ext autoreload
%autoreload 2

%matplotlib inline
plt.style.use('raph-base')

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

pd.options.display.float_format = '{:,.2f}'.format
pd.set_option('display.max_columns', 30)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', 800)

np.random.seed(666)

In [4]:
print(sys.executable)
print(sys.version)
print(f'Pandas {pd.__version__}')

C:\Users\r2d4\miniconda3\envs\py3\python.exe
3.8.3 (default, May 19 2020, 06:50:17) [MSC v.1916 64 bit (AMD64)]
Pandas 1.1.3


## Check Data


In [5]:
# Load clean data from parquet file
data_clean = pd.read_parquet("data/1_trx_data_clean.parquet")

In [6]:
data_clean.head()
data_clean.info()

Unnamed: 0,member,date,trx_type,device,value,discount
0,1,2018-01-10,Purchase,Payment,83.1,60.8
1,1,2018-01-19,Purchase,Payment,146.3,0.0
2,1,2018-02-05,Activation,Loyalty Voucher,5.0,0.0
3,1,2018-02-16,Purchase,Payment,57.1,0.0
4,1,2018-02-16,Redemption,Loyalty Voucher,-5.0,0.0


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1612298 entries, 0 to 1612297
Data columns (total 6 columns):
 #   Column    Non-Null Count    Dtype         
---  ------    --------------    -----         
 0   member    1612298 non-null  object        
 1   date      1612298 non-null  datetime64[ns]
 2   trx_type  1612298 non-null  object        
 3   device    1612298 non-null  object        
 4   value     1612298 non-null  float32       
 5   discount  1612298 non-null  float32       
dtypes: datetime64[ns](1), float32(2), object(3)
memory usage: 61.5+ MB


**Findings:**
- No missing values
- Object dtype columns could be transformed to category dtype
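
As a rough illustration of the second finding, the sketch below compares the memory footprint of a low-cardinality column before and after the conversion to category dtype. The frame `toy` is made-up data that only mimics the `trx_type` column; it is not the project data, and the actual savings will differ.

```python
import pandas as pd

# Hypothetical toy data, only mimicking the structure of `trx_type`
toy = pd.DataFrame(
    {"trx_type": ["Purchase", "Activation", "Redemption", "Purchase"] * 1000}
)

# deep=True also counts the memory used by the Python string objects
mem_object = toy["trx_type"].memory_usage(deep=True)
mem_category = toy["trx_type"].astype("category").memory_usage(deep=True)

print(f"object:   {mem_object:,} bytes")
print(f"category: {mem_category:,} bytes")
```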

In [8]:
EDA.display_value_counts(data_clean[["device", "trx_type"]])

Unnamed: 0,counts,prop,cum_prop
Payment,1188433,73.7%,73.7%
Loyalty Voucher,423865,26.3%,100.0%


Unnamed: 0,counts,prop,cum_prop
Purchase,1188433,73.7%,73.7%
Activation,262115,16.3%,90.0%
Redemption,161750,10.0%,100.0%


**Findings:**
- Device == "Payment" and trx_type == "Purchase" correspond (identical counts). About 3/4 of all trx are purchases
- Device == "Loyalty Voucher" covers both Activations and Redemptions. There are fewer Redemption trx than Activations
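
The correspondence noted in the first finding could also be checked directly instead of comparing counts by eye. A minimal sketch on made-up data (the frame `toy` is hypothetical, not the project data):

```python
import pandas as pd

# Hypothetical toy data with the same two columns as above
toy = pd.DataFrame({
    "device": ["Payment", "Loyalty Voucher", "Loyalty Voucher", "Payment"],
    "trx_type": ["Purchase", "Activation", "Redemption", "Purchase"],
})

# Cross-tabulate both columns: non-matching combinations show up as 0
ct = pd.crosstab(toy["device"], toy["trx_type"])
print(ct)

# "Payment" rows should be exactly the "Purchase" rows
is_payment = toy["device"] == "Payment"
is_purchase = toy["trx_type"] == "Purchase"
assert (is_payment == is_purchase).all()
```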

## Prepare Data

**Important:** For the functions below to work properly, we have to make sure the trx_type chronology is in the right order when grouped by member and date. This means: Activation --> Purchase --> Redemption
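
The required order happens to be alphabetical, which is easy to break if a new trx_type is ever added. One possible alternative (not what the notebook uses) is an ordered categorical with an explicit category list. The names `TRX_ORDER` and `toy` are made up for this sketch.

```python
import pandas as pd

# Explicit chronological order instead of relying on the alphabet
TRX_ORDER = ["Activation", "Purchase", "Redemption"]

# Hypothetical same-day transactions in the "wrong" order
toy = pd.DataFrame({
    "member": ["1", "1", "1"],
    "date": pd.to_datetime(["2018-10-22"] * 3),
    "trx_type": ["Redemption", "Purchase", "Activation"],
})
toy["trx_type"] = pd.Categorical(
    toy["trx_type"], categories=TRX_ORDER, ordered=True)

# Sorting a categorical column follows the category order, not the strings
toy_sorted = toy.sort_values(
    ["member", "date", "trx_type"]).reset_index(drop=True)
print(toy_sorted["trx_type"].tolist())
```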

In [11]:
def prepare_trx_df(df):
    """Transform dtype object to dtype category, and - most
    important - make sure trx_types are in the correct
    chronological order when grouped by member and date.
    (Conveniently this is the alphabetical order.) Return
    a copy of the original dataframe.
    """
    df = df.copy()
    for col in df.select_dtypes(include=["object", "string"]):
        df[col] = df[col].astype("category")
    df.sort_values(["member", "date", "trx_type"], inplace=True)
    return df

In [13]:
data_prep = prepare_trx_df(data_clean)

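# Pass tests
# (`"object" in data_prep.dtypes` would only check the index labels,
# i.e. the column names, so test the dtypes themselves)
assert data_prep.select_dtypes(include="object").empty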

## Create A Test Case

**Important:** Think about edge cases when creating test data, e.g. 
- There are a few same day activations / redemptions
- There are redemptions without purchase transaction (I won't consider them a purchase)

In [19]:
# Choose a suitable "Test Member"
test_member = data_clean[data_clean["member"] == "102318"].copy()

# Add an edge case: Same-date Activation and redemption
test_member.iloc[5, 1] = dt.datetime.strptime("2018-10-22", "%Y-%m-%d")

# Add an edge case: Redemption without purchase
# (pd.concat instead of DataFrame.append, which was removed in pandas 2.0)
test_member = pd.concat([test_member, test_member.iloc[[14]]], ignore_index=True)
test_member.iloc[-1, 1] = dt.datetime.strptime("2018-4-01", "%Y-%m-%d")

# Sort again and re-index
test_member.sort_values(["member", "date", "trx_type"], inplace=True)
test_member.reset_index(drop=True, inplace=True)

test_member

Unnamed: 0,member,date,trx_type,device,value,discount
0,102318,2018-04-01,Redemption,Loyalty Voucher,-5.0,0.0
1,102318,2018-05-12,Purchase,Payment,107.7,0.0
2,102318,2018-05-12,Redemption,Loyalty Voucher,-10.0,0.0
3,102318,2018-07-09,Purchase,Payment,127.7,0.0
4,102318,2018-08-11,Purchase,Payment,20.0,49.9
5,102318,2018-08-31,Purchase,Payment,31.8,8.0
6,102318,2018-10-22,Activation,Loyalty Voucher,5.0,0.0
7,102318,2018-10-22,Purchase,Payment,49.8,0.0
8,102318,2018-10-22,Redemption,Loyalty Voucher,-5.0,0.0
9,102318,2019-01-03,Purchase,Payment,8.95,8.95


## Develop Trx Data Transformation Functions


1. Remove "invalid" Redemptions (no purchase on same date), few edge cases

Flags:
-    Flag purchase with redemption
-    Flag purchase without redemption when v_sum = 0
-    Flag purchase without redemption when v_sum > 0
-    Flag purchase by device
-    Flag purchase with discount, considering a threshold
-   _Flag purchase with value < 0 as return (not implemented)_

Intervals:
-   purchase intervals
-   interval from activation to next purchase (not considering activation)

### Remove invalid Redemptions (no Purchase on same Date)

In [66]:
# TODO - write a function for this

# Identify the stand-alone Redemptions
# Open question: is this the best way? Is the intermediate dataframe needed,
# or could I work with the groupby object directly?
test = test_member.groupby(["member", "date"]).agg({"trx_type": str})  # aggregating with list does not work in the next steps

test = test[test["trx_type"].str.contains("Redemption")]
test = test[~test["trx_type"].str.contains("Purchase")]


test["trx_type"] = "Redemption"

test_member = pd.merge(test_member, test, on=["member", "date", "trx_type"], how="left", indicator=True)
test_member = test_member[test_member["_merge"] == "left_only"]
test_member.drop(columns="_merge", inplace=True)
test_member


Unnamed: 0,member,date,trx_type,device,value,discount
1,102318,2018-05-12,Purchase,Payment,107.7,0.0
2,102318,2018-05-12,Redemption,Loyalty Voucher,-10.0,0.0
3,102318,2018-07-09,Purchase,Payment,127.7,0.0
4,102318,2018-08-11,Purchase,Payment,20.0,49.9
5,102318,2018-08-31,Purchase,Payment,31.8,8.0
6,102318,2018-10-22,Activation,Loyalty Voucher,5.0,0.0
7,102318,2018-10-22,Purchase,Payment,49.8,0.0
8,102318,2018-10-22,Redemption,Loyalty Voucher,-5.0,0.0
9,102318,2019-01-03,Purchase,Payment,8.95,8.95
10,102318,2019-03-25,Purchase,Payment,44.9,0.0
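
Regarding the TODO and the open question in the cell above: one possible way to avoid the intermediate dataframe and the merge is a groupby `transform`, which broadcasts a per-group result back to every row. This is only a sketch; the function name `remove_standalone_redemptions` and the frame `toy` are made up here and are not taken from `transform_data.py`.

```python
import pandas as pd


def remove_standalone_redemptions(df):
    """Drop Redemption rows that have no Purchase for the same
    member and date. Return a filtered copy of the dataframe.
    """
    df = df.copy()
    is_purchase = df["trx_type"] == "Purchase"
    is_redemption = df["trx_type"] == "Redemption"
    # True for every row whose (member, date) group contains a purchase
    has_purchase = is_purchase.groupby(
        [df["member"], df["date"]], observed=True).transform("any")
    return df[~(is_redemption & ~has_purchase)]


# Hypothetical rows modelled on the test member above
toy = pd.DataFrame({
    "member": ["102318"] * 4,
    "date": pd.to_datetime(
        ["2018-04-01", "2018-05-12", "2018-05-12", "2018-07-09"]),
    "trx_type": ["Redemption", "Purchase", "Redemption", "Purchase"],
    "value": [-5.0, 107.7, -10.0, 127.7],
})

result = remove_standalone_redemptions(toy)
print(result)
```

Only the redemption on 2018-04-01 is dropped; the one on 2018-05-12 stays because a purchase exists for the same member and date.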


In [38]:
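# Assumption: the definition of `g_prep` is missing from the notebook,
# the output below suggests a groupby on member and date
g_prep = test_member.groupby(["member", "date"])
list(g_prep.groups.items())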
# g_prep.get_group([('00001', Timestamp('2018-01-10 00:00:00'))])

[(('102318', Timestamp('2018-04-01 00:00:00')),
  Int64Index([0], dtype='int64')),
 (('102318', Timestamp('2018-05-12 00:00:00')),
  Int64Index([1, 2], dtype='int64')),
 (('102318', Timestamp('2018-07-09 00:00:00')),
  Int64Index([3], dtype='int64')),
 (('102318', Timestamp('2018-08-11 00:00:00')),
  Int64Index([4], dtype='int64')),
 (('102318', Timestamp('2018-08-31 00:00:00')),
  Int64Index([5], dtype='int64')),
 (('102318', Timestamp('2018-10-22 00:00:00')),
  Int64Index([6, 7, 8], dtype='int64')),
 (('102318', Timestamp('2019-01-03 00:00:00')),
  Int64Index([9], dtype='int64')),
 (('102318', Timestamp('2019-03-25 00:00:00')),
  Int64Index([10], dtype='int64')),
 (('102318', Timestamp('2019-04-25 00:00:00')),
  Int64Index([11], dtype='int64')),
 (('102318', Timestamp('2019-06-05 00:00:00')),
  Int64Index([12], dtype='int64')),
 (('102318', Timestamp('2019-09-07 00:00:00')),
  Int64Index([13], dtype='int64')),
 (('102318', Timestamp('2019-09-14 00:00:00')),
  Int64Index([14, 15], dty

In [11]:
# This can be applied to whole df in one go

def create_basic_voucher_cols(df):
    """Create separate columns containing the values for
    voucher activations ("v_a"), voucher redemptions ("v_r")
    and all of them combined ("v").
    """
    df = df.copy()
    # Column names must match the ones used by the functions further below
    df["v_a"] = np.where(
        (df["device"] == "Financial Voucher") & (df["value"] > 0),
        df["value"],
        0)
    df["v_r"] = np.where(
        (df["device"] == "Financial Voucher") & (df["value"] < 0),
        df["value"],
        0)
    df["v"] = np.where(
        df["device"] == "Financial Voucher",
        df["value"],
        0)
    return df

In [12]:
test = create_basic_voucher_cols(test)
test

Unnamed: 0,member,date,trx_type,device,value,discount,v_a,v_r,v
0,199189,2018-05-12,Purchase,Loyalty,107.7,0.0,0.0,0.0,0.0
1,199189,2018-05-12,Redemption,Financial Voucher,-10.0,0.0,0.0,-10.0,-10.0
2,199189,2018-07-09,Purchase,Loyalty,127.7,0.0,0.0,0.0,0.0
3,199189,2018-08-11,Purchase,Payment,20.0,49.9,0.0,0.0,0.0
4,199189,2018-08-31,Purchase,Loyalty,31.8,8.0,0.0,0.0,0.0
5,199189,2018-10-22,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0
6,199189,2018-10-22,Purchase,Loyalty,49.8,0.0,0.0,0.0,0.0
7,199189,2018-10-22,Redemption,Financial Voucher,-5.0,0.0,0.0,-5.0,-5.0
8,199189,2019-01-03,Purchase,Loyalty,8.95,8.95,0.0,0.0,0.0
9,199189,2019-03-25,Purchase,Loyalty,44.9,0.0,0.0,0.0,0.0


In [13]:
import itertools


def _calculate_voucher_sums(v):
    """Helper function to calculate the accumulated sum of
    voucher "credit" a customer has at any given time. We cannot
    know about the remaining credit from earlier periods but we
    make sure that the sum never becomes negative.
    """
    # Running total of the voucher values (a cumulative sum)
    v_sum = np.array(list(itertools.accumulate(v)))
    # Make sure that v_sum never has a negative value
    v_min = np.min(v_sum)
    if v_min < 0:
        # v_min is negative, so subtracting it lifts the minimum to 0
        top_up_value = v_min
        v_sum = v_sum - top_up_value
    return v_sum

def create_voucher_sum_col(df):
    """Use a groupby "window function" to insert the
    voucher sums into a new column "v_sum".
    """
    df = df.copy()
    df = df.assign(v_sum = df.groupby(
        ["member"])["v"].transform(_calculate_voucher_sums))
    df.drop("v", axis=1, inplace=True)
    return df
    

(Check this stackoverflow post for [apply vs. transform on groupby objects](https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object))
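
For a quick feel of the difference, here is a small sketch on made-up data (the frame `toy` is hypothetical): `transform` returns one value per original row, aligned to the original index, which is why it can be assigned straight into a new column. A reducing function passed to `apply` returns one value per group instead.

```python
import pandas as pd

# Hypothetical voucher values for two members
toy = pd.DataFrame({
    "member": ["a", "a", "b", "b"],
    "v": [5.0, -5.0, 10.0, -10.0],
})
g = toy.groupby("member")["v"]

# transform: same length and index as the original column
per_row = g.transform("cumsum")

# apply with a reducing function: one value per group, indexed by member
per_group = g.apply(lambda s: s.sum())

print(per_row.tolist())
print(per_group.to_dict())
```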

In [14]:
test = create_voucher_sum_col(test)
test

Unnamed: 0,member,date,trx_type,device,value,discount,v_a,v_r,v_sum
0,199189,2018-05-12,Purchase,Loyalty,107.7,0.0,0.0,0.0,10.0
1,199189,2018-05-12,Redemption,Financial Voucher,-10.0,0.0,0.0,-10.0,0.0
2,199189,2018-07-09,Purchase,Loyalty,127.7,0.0,0.0,0.0,0.0
3,199189,2018-08-11,Purchase,Payment,20.0,49.9,0.0,0.0,0.0
4,199189,2018-08-31,Purchase,Loyalty,31.8,8.0,0.0,0.0,0.0
5,199189,2018-10-22,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0
6,199189,2018-10-22,Purchase,Loyalty,49.8,0.0,0.0,0.0,5.0
7,199189,2018-10-22,Redemption,Financial Voucher,-5.0,0.0,0.0,-5.0,0.0
8,199189,2019-01-03,Purchase,Loyalty,8.95,8.95,0.0,0.0,0.0
9,199189,2019-03-25,Purchase,Loyalty,44.9,0.0,0.0,0.0,0.0
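
To make the "top up" logic easier to follow, here is the arithmetic for the ten rows shown above, redone with plain NumPy. It assumes that none of the member's later rows (not shown) push the running balance below -10.

```python
import numpy as np

# Column "v" of the ten rows displayed above, in chronological order
v = np.array([0.0, -10.0, 0.0, 0.0, 0.0, 5.0, 0.0, -5.0, 0.0, 0.0])

# Running balance; it goes negative because the credit that was
# activated before the observation period is unknown
running = np.cumsum(v)

# Lift the whole series so that its minimum becomes 0
v_sum = running - min(running.min(), 0.0)

print(running.tolist())
print(v_sum.tolist())
```

The first row ends up with a credit of 10, which is the minimum starting credit needed for the later redemption of 10 to be possible.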


In [15]:
def shift_and_drop_redemptions(df):
    """Shift redemption values in "v_r" column one row up, so they
    end up in the row of the corresponding transaction. This makes
    it possible to flag the respective transactions as ones with
    redemption in a later step. Then delete all redemption rows, as
    they are no longer needed.
    """
    df = df.copy()
    df = df.assign(v_r = df.groupby(
        ["member"])["v_r"].shift(-1)
    )
    # Copy the filtered frame to avoid a SettingWithCopyWarning below
    df = df[~df["trx_type"].isin(["Redemption"])].copy()
    # The last row of each member has no successor, so the shift leaves NaN
    df["v_r"] = df["v_r"].fillna(0)

    # Remove "Redemption" category
    df["trx_type"] = df["trx_type"].cat.remove_unused_categories()
    return df

In [16]:
test = shift_and_drop_redemptions(test)
test

Unnamed: 0,member,date,trx_type,device,value,discount,v_a,v_r,v_sum
0,199189,2018-05-12,Purchase,Loyalty,107.7,0.0,0.0,-10.0,10.0
2,199189,2018-07-09,Purchase,Loyalty,127.7,0.0,0.0,0.0,0.0
3,199189,2018-08-11,Purchase,Payment,20.0,49.9,0.0,0.0,0.0
4,199189,2018-08-31,Purchase,Loyalty,31.8,8.0,0.0,0.0,0.0
5,199189,2018-10-22,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0
6,199189,2018-10-22,Purchase,Loyalty,49.8,0.0,0.0,-5.0,5.0
8,199189,2019-01-03,Purchase,Loyalty,8.95,8.95,0.0,0.0,0.0
9,199189,2019-03-25,Purchase,Loyalty,44.9,0.0,0.0,0.0,0.0
10,199189,2019-04-25,Purchase,Loyalty,246.9,61.7,0.0,0.0,0.0
11,199189,2019-06-05,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0


In [17]:
def calculate_interval_activation_to_next_purchase(df):
    """Create a new col "delta_a" containing the interval from each
    activation to the next purchase in days (float, because the
    values of all non-activation rows are set to NaN).
    """
    df = df.copy()
    # diff(-1) is "this row minus next row", hence the sign flip
    df = df.assign(delta_a=df.groupby(
        ["member"])["date"].diff(-1) * -1)

    df["delta_a"] = np.where(
        df["trx_type"] == "Activation",
        df["delta_a"].dt.days,
        np.nan
    )
    return df

In [18]:
def calculate_purchase_interval(df):
    """Create a new col "delta_p" containing the interval from each
    purchase to the next in days (float, because the values of all
    non-purchase rows are set to NaN).
    """
    df = df.copy()
    # Grouping by trx_type as well makes diff skip the activation rows
    df = df.assign(delta_p=df.groupby(
        ["member", "trx_type"])["date"].diff(-1) * -1)

    df["delta_p"] = np.where(
        df["trx_type"] == "Purchase",
        df["delta_p"].dt.days,
        np.nan
    )
    return df
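
The `diff(-1) * -1` idiom used in both interval functions is a bit terse, so here is a small sketch of what it does on three purchase dates of the test member (the Series `dates` is built by hand for illustration).

```python
import pandas as pd

# Three consecutive purchase dates taken from the test member
dates = pd.Series(pd.to_datetime(["2018-08-31", "2018-10-22", "2019-01-03"]))

# diff(-1) yields "this row minus next row" (a negative timedelta),
# multiplying by -1 turns it into "days until the next row"
days_to_next = (dates.diff(-1) * -1).dt.days

# Equivalent, arguably more readable formulation
days_to_next_alt = (dates.shift(-1) - dates).dt.days

print(days_to_next.tolist())
```

The last row has no successor and therefore stays NaN, which is also why the resulting column is float and not int.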

In [19]:
test = calculate_interval_activation_to_next_purchase(test)
test

Unnamed: 0,member,date,trx_type,device,value,discount,v_a,v_r,v_sum,delta_a
0,199189,2018-05-12,Purchase,Loyalty,107.7,0.0,0.0,-10.0,10.0,
2,199189,2018-07-09,Purchase,Loyalty,127.7,0.0,0.0,0.0,0.0,
3,199189,2018-08-11,Purchase,Payment,20.0,49.9,0.0,0.0,0.0,
4,199189,2018-08-31,Purchase,Loyalty,31.8,8.0,0.0,0.0,0.0,
5,199189,2018-10-22,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0,0.0
6,199189,2018-10-22,Purchase,Loyalty,49.8,0.0,0.0,-5.0,5.0,
8,199189,2019-01-03,Purchase,Loyalty,8.95,8.95,0.0,0.0,0.0,
9,199189,2019-03-25,Purchase,Loyalty,44.9,0.0,0.0,0.0,0.0,
10,199189,2019-04-25,Purchase,Loyalty,246.9,61.7,0.0,0.0,0.0,
11,199189,2019-06-05,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0,94.0


In [20]:
test = calculate_purchase_interval(test)
test

Unnamed: 0,member,date,trx_type,device,value,discount,v_a,v_r,v_sum,delta_a,delta_p
0,199189,2018-05-12,Purchase,Loyalty,107.7,0.0,0.0,-10.0,10.0,,58.0
2,199189,2018-07-09,Purchase,Loyalty,127.7,0.0,0.0,0.0,0.0,,33.0
3,199189,2018-08-11,Purchase,Payment,20.0,49.9,0.0,0.0,0.0,,20.0
4,199189,2018-08-31,Purchase,Loyalty,31.8,8.0,0.0,0.0,0.0,,52.0
5,199189,2018-10-22,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0,0.0,
6,199189,2018-10-22,Purchase,Loyalty,49.8,0.0,0.0,-5.0,5.0,,73.0
8,199189,2019-01-03,Purchase,Loyalty,8.95,8.95,0.0,0.0,0.0,,81.0
9,199189,2019-03-25,Purchase,Loyalty,44.9,0.0,0.0,0.0,0.0,,31.0
10,199189,2019-04-25,Purchase,Loyalty,246.9,61.7,0.0,0.0,0.0,,135.0
11,199189,2019-06-05,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0,94.0,


In [21]:
def flag_purchases_depending_on_vouchers(df):
    """Create three new boolean columns to classify purchases into
    each of the following three categories: "p_v_red" = purchase with
    redemption, "p_v_miss" = purchase without redemption (but voucher
    credit would have been available), "p_v_empty" = no voucher credit
    available.
    """
    df = df.copy()
    df["p_v_red"] = np.where(
        (df["trx_type"] == "Purchase") & (df["v_r"] < 0), 1, 0
    ).astype("bool")
    df["p_v_miss"] = np.where(
        (df["trx_type"] == "Purchase") & (df["v_r"] == 0) & (df["v_sum"] > 0), 1, 0
    ).astype("bool")
    df["p_v_empty"] = np.where(
        (df["trx_type"] == "Purchase") & (df["v_sum"] == 0), 1, 0
    ).astype("bool")
    return df

In [22]:
test = flag_purchases_depending_on_vouchers(test)
test

Unnamed: 0,member,date,trx_type,device,value,discount,v_a,v_r,v_sum,delta_a,delta_p,p_v_red,p_v_miss,p_v_empty
0,199189,2018-05-12,Purchase,Loyalty,107.7,0.0,0.0,-10.0,10.0,,58.0,True,False,False
2,199189,2018-07-09,Purchase,Loyalty,127.7,0.0,0.0,0.0,0.0,,33.0,False,False,True
3,199189,2018-08-11,Purchase,Payment,20.0,49.9,0.0,0.0,0.0,,20.0,False,False,True
4,199189,2018-08-31,Purchase,Loyalty,31.8,8.0,0.0,0.0,0.0,,52.0,False,False,True
5,199189,2018-10-22,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0,0.0,,False,False,False
6,199189,2018-10-22,Purchase,Loyalty,49.8,0.0,0.0,-5.0,5.0,,73.0,True,False,False
8,199189,2019-01-03,Purchase,Loyalty,8.95,8.95,0.0,0.0,0.0,,81.0,False,False,True
9,199189,2019-03-25,Purchase,Loyalty,44.9,0.0,0.0,0.0,0.0,,31.0,False,False,True
10,199189,2019-04-25,Purchase,Loyalty,246.9,61.7,0.0,0.0,0.0,,135.0,False,False,True
11,199189,2019-06-05,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0,94.0,,False,False,False


In [23]:
def calculate_discount_pct(df):
    """Calculate a column "discount_pct" denoting the relative
    value of discounts. This value will be used to control for a
    threshold when setting a discount flag in the next step. (Note
    the calculation is such that discounts on returns won't reach
    the threshold.)
    """
    df = df.copy()
    # Gross value = price paid plus discount granted
    df["gross_value"] = df["value"] + df["discount"]
    df["discount_pct"] = df["discount"] / df["gross_value"]
    return df


def flag_purchases_depending_on_discounts(df, threshold_pct=0.1):
    """Create a boolean column to classify purchases having a
    discount whose value relative to the gross transaction price
    reaches a certain threshold.
    """
    df = df.copy()
    df["p_discount"] = np.where(
        df["discount_pct"] >= threshold_pct, 1, 0
    ).astype("bool")

    # The helper columns are no longer needed
    df.drop(["gross_value", "discount_pct"], axis=1, inplace=True)
    return df
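
A short worked example of the discount calculation, including the note on returns. The frame `toy` is made up: the first two rows copy values from the test member, the third is a hypothetical return. The sketch assumes that a return has a negative value that is larger in magnitude than its discount; otherwise the gross value turns positive and the ratio could reach the threshold after all.

```python
import pandas as pd

# Two purchases from the test member plus one hypothetical return
toy = pd.DataFrame({
    "value": [20.0, 127.7, -20.0],
    "discount": [49.9, 0.0, 5.0],
})

gross_value = toy["value"] + toy["discount"]
toy["discount_pct"] = toy["discount"] / gross_value

# Same threshold as the default of the flag function
toy["p_discount"] = toy["discount_pct"] >= 0.1

print(toy)
```

For the return, the gross value is negative (-15), so the ratio becomes negative and stays below any positive threshold.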

In [24]:
test = calculate_discount_pct(test)
test = flag_purchases_depending_on_discounts(test, threshold_pct=0.1)
test

Unnamed: 0,member,date,trx_type,device,value,discount,v_a,v_r,v_sum,delta_a,delta_p,p_v_red,p_v_miss,p_v_empty,p_discount
0,199189,2018-05-12,Purchase,Loyalty,107.7,0.0,0.0,-10.0,10.0,,58.0,True,False,False,False
2,199189,2018-07-09,Purchase,Loyalty,127.7,0.0,0.0,0.0,0.0,,33.0,False,False,True,False
3,199189,2018-08-11,Purchase,Payment,20.0,49.9,0.0,0.0,0.0,,20.0,False,False,True,True
4,199189,2018-08-31,Purchase,Loyalty,31.8,8.0,0.0,0.0,0.0,,52.0,False,False,True,True
5,199189,2018-10-22,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0,0.0,,False,False,False,False
6,199189,2018-10-22,Purchase,Loyalty,49.8,0.0,0.0,-5.0,5.0,,73.0,True,False,False,False
8,199189,2019-01-03,Purchase,Loyalty,8.95,8.95,0.0,0.0,0.0,,81.0,False,False,True,True
9,199189,2019-03-25,Purchase,Loyalty,44.9,0.0,0.0,0.0,0.0,,31.0,False,False,True,False
10,199189,2019-04-25,Purchase,Loyalty,246.9,61.7,0.0,0.0,0.0,,135.0,False,False,True,True
11,199189,2019-06-05,Activation,Financial Voucher,5.0,0.0,5.0,0.0,5.0,94.0,,False,False,False,False


In [25]:
# Not implemented - just an idea

def prettify_remaining_cols(df):
    """Finally transform the trx_type into a boolean column
    for purchase / non purchase. And just to save some space,
    shorten the string for vouchers in the device column.
    """
    df = df.copy()
    df["trx_type"] = np.where(df["trx_type"] == "Purchase", 1, 0).astype("bool")
    df["device"] = np.where(df["device"] == "Financial Voucher", "Voucher", df["device"])
    return df

In [26]:
test = prettify_remaining_cols(test)
test

Unnamed: 0,member,date,trx_type,device,value,discount,v_a,v_r,v_sum,delta_a,delta_p,p_v_red,p_v_miss,p_v_empty,p_discount
0,199189,2018-05-12,True,Loyalty,107.7,0.0,0.0,-10.0,10.0,,58.0,True,False,False,False
2,199189,2018-07-09,True,Loyalty,127.7,0.0,0.0,0.0,0.0,,33.0,False,False,True,False
3,199189,2018-08-11,True,Payment,20.0,49.9,0.0,0.0,0.0,,20.0,False,False,True,True
4,199189,2018-08-31,True,Loyalty,31.8,8.0,0.0,0.0,0.0,,52.0,False,False,True,True
5,199189,2018-10-22,False,Voucher,5.0,0.0,5.0,0.0,5.0,0.0,,False,False,False,False
6,199189,2018-10-22,True,Loyalty,49.8,0.0,0.0,-5.0,5.0,,73.0,True,False,False,False
8,199189,2019-01-03,True,Loyalty,8.95,8.95,0.0,0.0,0.0,,81.0,False,False,True,True
9,199189,2019-03-25,True,Loyalty,44.9,0.0,0.0,0.0,0.0,,31.0,False,False,True,False
10,199189,2019-04-25,True,Loyalty,246.9,61.7,0.0,0.0,0.0,,135.0,False,False,True,True
11,199189,2019-06-05,False,Voucher,5.0,0.0,5.0,0.0,5.0,94.0,,False,False,False,False


---