
# UK Single Use Product (SUP): Eligibility Expansion Impact Analysis
## Author: Joby George (jobyg)
## 4/10/2025

## Context: Current Operating SUP Eligibility Strategy
- Customer Risk Segmentation:
  - **Low Risk**: ~90% of customer base
    - Full access to all SUP merchants (including Amazon/eBay)
  
  - **High Risk**: ~10% of customer base
    - Limited SUP merchant access
    - Cannot access Amazon/eBay

## Experiment

## Test Design (Launch: April 16, 2025)
### Population Split
1. **Control Group** (10% of all customers)
   - Maintains current risk-based restrictions
   - High risk users continue to have limited merchant access

2. **Treatment Group A** (80% of all customers)
   - All customers get full merchant access
   - Standard transaction controls

3. **Treatment Group B** (10% of all customers)
   - All customers get full merchant access
   - Enhanced transaction privileges
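
As a rough sizing aid, the sketch below multiplies the risk-tier shares from the Context section by the arm shares listed above to get the expected share of all customers in each cell. It assumes arm assignment is independent of risk tier and treats the approximate ~90%/~10% risk split as exact. The names `RISK_SHARE`, `ARM_SHARE`, and `expected_cell_shares` are illustrative and not part of any experiment tooling.

```python
from fractions import Fraction

# Shares taken from the Context and Population Split sections (approximate in the source)
RISK_SHARE = {"low": Fraction(90, 100), "high": Fraction(10, 100)}
ARM_SHARE = {
    "control": Fraction(10, 100),
    "treatment_a": Fraction(80, 100),
    "treatment_b": Fraction(10, 100),
}


def expected_cell_shares(risk_share, arm_share):
    """Share of all customers in each (risk tier, arm) cell, assuming independence."""
    return {
        (risk, arm): r * a
        for risk, r in risk_share.items()
        for arm, a in arm_share.items()
    }


cells = expected_cell_shares(RISK_SHARE, ARM_SHARE)
for (risk, arm), share in cells.items():
    print(f"{risk:>4} risk / {arm:<11}: {float(share):.1%}")
```

Under these assumptions the High Risk Control cell is about 1% of all customers, which is worth keeping in mind when reading early results for that group.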

## Smoke Test (Launch: April 9, 2025)

On April 9th, a proof of concept was run to confirm the test roll-out would work as intended. The smoke test kept 80% of customers as BAU and gave the remaining 20% full merchant access. Of the enabled 20%, 10% were also given trusted layer privileges.



# Script Goals:

This script measures the experiment's effect on consumer purchasing behavior across the following areas:

## 1. Core Impact Analysis - High Risk Population
### Primary Metrics
- **Attempt Rate**
  - High Risk Control vs.
  - High Risk Expanded Eligibility vs.
  - High Risk Expanded Eligibility + Reduced Controls
  
- **Fraud Decline Rate**
  - Split by population groups above
  - Control for existing fraud patterns
  
- **GPV (Gross Payment Volume)**
  - Total volume
  - Average transaction size
  - Transaction frequency
  
- **NAC (New Active Customer) Rate**
  - First-time SUP usage
  - Activation patterns by group
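
The primary metrics above could be computed per group along the following lines. This is a minimal pandas sketch on toy data, not the notebook's actual query: the column names (`consumer_id`, `holdout_group`, `order_token`, `amount`) are assumptions, attempt rate is assumed to mean the share of eligible consumers with at least one attempt, and attempted amount stands in for GPV.

```python
import pandas as pd


def primary_metrics(attempts: pd.DataFrame, eligible: pd.DataFrame) -> pd.DataFrame:
    """Per-group attempt rate, attempt volume, average size and frequency.

    attempts: one row per order attempt (order_token, consumer_id, holdout_group, amount)
    eligible: one row per consumer evaluated for eligibility (consumer_id, holdout_group)
    """
    population = (
        eligible.groupby("holdout_group")["consumer_id"].nunique().rename("eligible_consumers")
    )
    activity = attempts.groupby("holdout_group").agg(
        attempting_consumers=("consumer_id", "nunique"),
        attempts=("order_token", "nunique"),
        attempt_amt=("amount", "sum"),
    )
    out = pd.concat([population, activity], axis=1).fillna(0)
    # Assumed definition: share of eligible consumers with at least one attempt
    out["attempt_rate"] = out["attempting_consumers"] / out["eligible_consumers"]
    out["avg_transaction_size"] = out["attempt_amt"] / out["attempts"]
    out["attempts_per_consumer"] = out["attempts"] / out["attempting_consumers"]
    return out


# Toy data for illustration only
eligible = pd.DataFrame({
    "consumer_id": ["c1", "c2", "c3", "c4"],
    "holdout_group": ["control", "control", "treatment", "treatment"],
})
attempts = pd.DataFrame({
    "order_token": ["t1", "t2", "t3", "t4"],
    "consumer_id": ["c1", "c3", "c3", "c4"],
    "holdout_group": ["control", "treatment", "treatment", "treatment"],
    "amount": [10.0, 20.0, 40.0, 30.0],
})
metrics = primary_metrics(attempts, eligible)
print(metrics)
```

Counting distinct consumers in the denominator keeps the attempt rate comparable across arms of very different sizes.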

### Control Variables
- Existing trusted layer assignments

## 2. Transaction Control Validation
### Technical Verification
- Confirm no overlap between:
  - Reduced transaction control group
  - GB trusted layer rule assignments
  
### Population Segmentation
- Create clean comparison groups:
  1. Base: High Risk Control
  2. Treatment 1: High Risk + Expanded Eligibility (only)
  3. Treatment 2: High Risk + Expanded Eligibility + Reduced Controls
     - Exclude any GB trusted layer assignments
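
The overlap check and the three comparison groups can be expressed with plain set operations. The sketch below is a hedged illustration: the function name and the four input sets of consumer IDs are hypothetical, and in practice they would be derived from the experiment and rule-assignment tables.

```python
def build_comparison_groups(high_risk, expanded, reduced_controls, gb_trusted_layer):
    """Split high risk consumers into the three comparison groups.

    All arguments are sets of consumer IDs (hypothetical inputs).
    """
    return {
        # Consumers in both the reduced-control group and a GB trusted layer rule
        "overlap": reduced_controls & gb_trusted_layer,
        # Base: high risk consumers who kept the current restrictions
        "base": high_risk - expanded,
        # Treatment 1: expanded eligibility only
        "treatment_1": (high_risk & expanded) - reduced_controls,
        # Treatment 2: expanded eligibility + reduced controls, excluding GB trusted layer
        "treatment_2": (high_risk & expanded & reduced_controls) - gb_trusted_layer,
    }


# Toy IDs for illustration only
groups = build_comparison_groups(
    high_risk={"a", "b", "c", "d", "e"},
    expanded={"b", "c", "d", "e"},
    reduced_controls={"d", "e"},
    gb_trusted_layer={"e", "z"},
)
print(groups)
```

A non-empty `overlap` set would mean the technical verification step has failed and the Treatment 2 exclusion is doing real work.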

## 3. Risk Tier Analysis for Reduced Controls
### Comparison Framework
Split Treatment 2 (Expanded + Reduced Controls) by:
- Low Risk
  - With existing trusted layer
  - Without existing trusted layer
- High Risk
  - With existing trusted layer
  - Without existing trusted layer

### Metrics by Segment
- Attempt success rate
- Transaction volume
- Fraud rates (takes time to mature)
- Decline rates
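
Because fraud rates take time to mature, one option is to compute them only on orders old enough to have a settled fraud label. The sketch below assumes a hypothetical `maturation_days` window (no value is specified in this notebook) and hypothetical columns `checkout_time`, `segment`, and `is_fraud`.

```python
import pandas as pd


def mature_fraud_rate(orders: pd.DataFrame, as_of: str, maturation_days: int) -> pd.Series:
    """Fraud rate per segment, restricted to orders older than the maturation window."""
    cutoff = pd.Timestamp(as_of) - pd.Timedelta(days=maturation_days)
    mature = orders[orders["checkout_time"] <= cutoff]
    return mature.groupby("segment")["is_fraud"].mean()


# Toy data for illustration only
orders = pd.DataFrame({
    "checkout_time": pd.to_datetime(["2025-04-10", "2025-04-12", "2025-05-20"]),
    "segment": ["high_risk_no_trusted_layer"] * 3,
    "is_fraud": [1, 0, 1],
})
rates = mature_fraud_rate(orders, as_of="2025-06-01", maturation_days=30)
print(rates)
```

The most recent order is excluded here because it falls inside the assumed 30-day window, so it does not pull the rate in either direction before its label has matured.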


# Prepping the data to compare the impact of expanded merchant eligibility

In [0]:
#set up

import json
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import pandas as pd
import numpy as np

# from utils import policy_dict, policy_data_dict, find_non_zero_riskweight_rules, find_zero_riskweight_rules
from pandasql import sqldf
from tqdm import tqdm
import logging


# Display settings
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.max_rows', 100)      # Show 100 rows max
pd.set_option('display.width', None)        # Auto-detect display width
pd.set_option('display.float_format', lambda x: '%.3f' % x)  # Format floats to 3 decimal places

# Plotting settings
plt.style.use('seaborn-v0_8')  # the bare 'seaborn' style name was renamed in matplotlib 3.6 and later removed
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = [12, 6]  # Set default figure size


from pysnowflake import Session
run_query = lambda query: sqldf(query, globals())




In [0]:
USER_NAME = 'jobyg' #replace it with your ldap name
sess = Session(
   connection_override_args={
       'autocommit': True,
       'authenticator': 'externalbrowser',
       'account': 'square',
       'database': f'PERSONAL_{USER_NAME.upper()}',
       'user': f'{USER_NAME}@squareup.com'
   }
   
)
conn = sess.open()



In [0]:
conn.execute('use warehouse ADHOC__LARGE')
conn.execute('use database AP_CUR_FRDRISK_G')
conn.execute('use schema public')



In [0]:
# The event_time filter controls for rule changes so that consistent logic is applied throughout
eligibility_driver = '''create or replace temp table SUP_risk_filter_events AS (

    SELECT
        a.*,
        parse_json(extra)['transient_features']['holdout_group'] as holdout_group,
        parse_json(extra)['transient_features']['pre_expansion_risk_tier'] as pre_expansion_risk_tier,
        parse_json(extra)['transient_features']['test_key'] as test_key,
        c.first_order_date,
        CASE 
            WHEN c.first_order_date IS NULL THEN 'new'
            WHEN DATEDIFF('day', c.first_order_date, a.par_process_date) < 15 THEN 'new'
            ELSE 'existing'
        END as customer_tenure
    FROM ap_cur_r_featsci.curated_feature_science_red.raw_r_e_rekarma_rl_exec_rslt_rules_rt__jobyg_dsl3_sv a
    LEFT JOIN AP_RAW_GREEN.GREEN.D_CONSUMER c
        ON a.consumer_uuid = c.uuid
    WHERE a.event_time >= '2025-04-09 19:03:38.711'
    and customer_tenure = 'new'
    and a.rule_id = 'gb_fraud_online_sup_eligibility_model_risk_filter_elgibility_experiment'
)'''
conn.execute(eligibility_driver)
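
For readers less familiar with the SQL, the `customer_tenure` CASE expression above can be restated in Python. This is only an illustrative sketch: the function name and the `NEW_CUSTOMER_WINDOW_DAYS` constant are made up here, and plain date subtraction is assumed to match `DATEDIFF('day', ...)` on dates.

```python
from datetime import date
from typing import Optional

NEW_CUSTOMER_WINDOW_DAYS = 15  # mirrors the "< 15" in the CASE expression


def customer_tenure(first_order_date: Optional[date], process_date: date) -> str:
    """Return 'new' or 'existing' using the same rules as the SQL CASE expression."""
    if first_order_date is None:
        # No first order on record: treated as a new customer
        return "new"
    if (process_date - first_order_date).days < NEW_CUSTOMER_WINDOW_DAYS:
        return "new"
    return "existing"


print(customer_tenure(None, date(2025, 4, 10)))
print(customer_tenure(date(2025, 4, 1), date(2025, 4, 15)))
print(customer_tenure(date(2025, 4, 1), date(2025, 4, 16)))
```

For whole-day differences, `< 15` is the same cut-off as the `<= 14` filter on `days_since_First_order_date` used later for `SUP_attempts`.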


In [0]:
# Check whether a consumer has multiple risk tiers per day
risk_tier_dupe_check = conn.download('select par_process_date, consumer_uuid, count(distinct pre_expansion_risk_tier) as risk_tier_count from SUP_risk_filter_events group by 1,2 order by 3 desc limit 5')
risk_tier_dupe_check  # A customer can be evaluated as both low and high risk on the same day, which makes the high risk population harder to isolate; aggregates are used to paint a clearer picture


In [0]:
eligbility_driver_cust_day_level_q = conn.execute('''
   create or replace temp table eligibility_driver_cust_day_deduped as (select 
        par_process_date
        ,consumer_uuid
        ,min(case when pre_expansion_risk_tier = 'L' then 0 
                  when pre_expansion_risk_tier = 'H' then 1 end) as min_daily_risk_tier
        ,max(case when pre_expansion_risk_tier = 'L' then 0 
                  when pre_expansion_risk_tier = 'H' then 1 end) as max_daily_risk_tier
        ,max(holdout_group) as holdout_group
        from SUP_risk_filter_events
        group by 1,2)''')
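
The same customer-day deduplication can be sketched in pandas, which may help when checking the SQL on a small sample. The function name and toy rows are illustrative, and the holdout group column is left out for brevity. The L=0 / H=1 encoding follows the query above, so the daily max is 1 whenever any evaluation that day was high risk.

```python
import pandas as pd

TIER_CODE = {"L": 0, "H": 1}  # same encoding as the CASE expressions in the query


def dedupe_daily_risk_tier(events: pd.DataFrame) -> pd.DataFrame:
    """Collapse risk filter events to one row per consumer per day."""
    coded = events.assign(tier_code=events["pre_expansion_risk_tier"].map(TIER_CODE))
    daily = coded.groupby(["par_process_date", "consumer_uuid"], as_index=False).agg(
        min_daily_risk_tier=("tier_code", "min"),
        max_daily_risk_tier=("tier_code", "max"),
    )
    # 1 when every evaluation that day returned the same tier
    daily["daily_risk_tier_was_consistent"] = (
        daily["min_daily_risk_tier"] == daily["max_daily_risk_tier"]
    ).astype(int)
    return daily


# Toy events for illustration only
events = pd.DataFrame({
    "par_process_date": ["2025-04-10"] * 4,
    "consumer_uuid": ["u1", "u1", "u2", "u2"],
    "pre_expansion_risk_tier": ["L", "H", "H", "H"],
})
daily = dedupe_daily_risk_tier(events).set_index("consumer_uuid")
print(daily)
```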
    


In [0]:
holdout_group = conn.execute('create or replace temp table holdouts_deduped as (select distinct consumer_uuid, holdout_group from SUP_Risk_Filter_Events);')

In [0]:
conn.download('select * from holdouts_deduped')

In [0]:
# Check whether a consumer has been assigned to multiple holdout groups
holdout_group_dupe_check = conn.download('select consumer_uuid, count(distinct holdout_group) as holdout_group_count from holdouts_deduped group by 1 order by 2 desc limit 5')
holdout_group_dupe_check  # A holdout_group_count above 1 means the consumer was assigned to more than one holdout group


In [0]:
# The checkout_time filter is needed to make sure the holdout group is properly applied
order_token_driver = '''create or replace temp table SUP_attempts AS (
    SELECT 
        a.order_token
        ,a.consumer_id
        ,a.checkout_time
        ,a.par_region
        ,a.bp_is_sup
        ,a.consumer_is_First_order
        ,a.days_since_First_order_date
        ,a.in_flight_order_merchant_name
        ,a.in_flight_order_merchant_id
        ,a.trust
        ,a.in_flight_order_amount
        ,a.order_amount_local
        ,a.whitepages_identity_check_score
        ,a.bp_c_batch_consumer_batch_model_v1
        ,a.MODEL_ONLINE_OD_ABUSIVE_GLOBAL_APRIL_2024_SCORE
        ,a.MODEL_ONLINE_OD_PAYBACK_NON_US_APRIL_2024_SCORE
        ,a.MODEL_ONLINE_OD_GLOBAL_PAYBACK_SUP_MODEL_AUGUST_2023_SCORE
        ,c.holdout_group
        ,min_daily_risk_tier
        ,max_daily_risk_tier
        ,case when b.min_daily_risk_tier = 0 then 'L' else 'H' end as min_daily_risk_tier_fixed
        ,case when b.max_daily_risk_tier = 0 then 'L' else 'H' end as max_daily_risk_tier_fixed
        ,case when bp_c_batch_consumer_batch_model_v1 <= 780 then 'L' else 'H' end as order_risk_tier
        ,case when b.min_daily_risk_tier = b.max_daily_risk_tier then 1 else 0 end as daily_risk_tier_was_consistent
        from  ap_cur_r_frdrisk.curated_fraud_risk_red.unified_feature_datamart_base__jobyg_dsl3_sv a
        left join eligibility_driver_cust_day_deduped b
        on a.consumer_id = b.consumer_uuid
        and a.par_process_Date =b.par_process_date
        left join holdouts_deduped c
        on a.consumer_id = c.consumer_uuid
        where a.par_Region = 'GB'
        and a.par_process_date >= '2025-04-09'
        and a.bp_is_sup = 1
        and coalesce(a.days_since_First_order_date,0) <= 14
        and dedup=1
        and checkout_time >= '2025-04-09 21:03:38.711'        );
'''
conn.execute(order_token_driver)
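
One detail of the `CASE ... ELSE 'H'` expressions above is worth keeping in mind: after a left join, attempts with no matching customer-day row have a NULL tier, and the `ELSE` branch labels them 'H'. The pandas sketch below reproduces that behaviour on toy data and contrasts it with an explicit mapping. The column `tier_explicit` and the 'unknown' label are illustrative choices, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: u3 has no customer-day row
attempts = pd.DataFrame({"consumer_id": ["u1", "u2", "u3"]})
daily = pd.DataFrame({"consumer_uuid": ["u1", "u2"], "max_daily_risk_tier": [0, 1]})

joined = attempts.merge(
    daily, how="left", left_on="consumer_id", right_on="consumer_uuid"
).set_index("consumer_id")

# Mirrors: CASE WHEN max_daily_risk_tier = 0 THEN 'L' ELSE 'H' END
joined["tier_else_h"] = np.where(joined["max_daily_risk_tier"] == 0, "L", "H")

# Explicit alternative: unmatched rows get their own label
conditions = [joined["max_daily_risk_tier"] == 0, joined["max_daily_risk_tier"] == 1]
joined["tier_explicit"] = np.select(conditions, ["L", "H"], default="unknown")

print(joined[["max_daily_risk_tier", "tier_else_h", "tier_explicit"]])
```

Any query that filters on the `_fixed` columns alone will therefore include unmatched rows unless it also requires the underlying tier to be non-null.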

In [0]:
# Confirm each order_token has a holdout group
view = conn.download('select * from SUP_attempts')
print(view.shape)


In [0]:
view.loc[view.holdout_group.isnull()]  # Edge cases where the eligibility tracker fired in between rule modifications; these can be removed

In [0]:
view = view.loc[~view.holdout_group.isnull()]

In [0]:
view.loc[view.min_daily_risk_tier.isnull()]  # The min and max daily risk tiers are null because of the join logic: these attempts occur right at midnight, so they are removed for now

In [0]:
view = view.loc[~view.min_daily_risk_tier.isnull()]

In [0]:
view.shape  # Removed 14 transactions in total (1.8% impact), which keeps the analysis clean
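
To make the impact of such clean-up steps explicit rather than reading it off `shape`, the filter and its row-count report can be wrapped together. This is a small sketch with a hypothetical helper name and toy data, and the numbers it prints are unrelated to the counts quoted above.

```python
import pandas as pd


def drop_nulls_with_report(df: pd.DataFrame, columns: list):
    """Drop rows with nulls in `columns`; return the cleaned frame, rows removed, and share removed."""
    before = len(df)
    cleaned = df.dropna(subset=columns)
    removed = before - len(cleaned)
    share = removed / before if before else 0.0
    return cleaned, removed, share


# Toy data for illustration only
toy_view = pd.DataFrame({
    "order_token": ["t1", "t2", "t3", "t4", "t5"],
    "holdout_group": ["a", "b", None, "a", "b"],
    "min_daily_risk_tier": [0, 1, 1, 0, 1],
})
cleaned, removed, share = drop_nulls_with_report(
    toy_view, ["holdout_group", "min_daily_risk_tier"]
)
print(f"removed {removed} rows ({share:.1%})")
```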


# Let's Compare High Risk BAU Control to High Risk Expanded Eligibility

In [0]:
high_risk_pop = conn.download('''select * from SUP_attempts where (min_daily_risk_tier_fixed = 'H' or max_daily_risk_tier_fixed = 'H' or order_risk_tier = 'H') and holdout_Group is not null and min_daily_risk_tier is not null ''')

In [0]:
high_risk_pop.daily_risk_tier_was_consistent.value_counts(normalize=True)  # For the most part, high risk attempts were made by customers who were consistently high risk throughout the day; next, look at where the flag = 0

In [0]:
high_risk_pop.loc[high_risk_pop.daily_risk_tier_was_consistent == 0]  # All rows where the daily risk tier was inconsistent had an order_risk_tier of H, so these are labelled high risk using the max tier

In [0]:
analysis = conn.download('''
        select 
            max_daily_risk_tier_fixed, 
            holdout_group, 
            in_flight_order_merchant_name,
            count(distinct order_token) as token_ct,
            sum(in_flight_order_amount) as attempt_amt
        from SUP_attempts
        where max_daily_risk_tier_fixed = 'H'
            and holdout_group is not null
            -- rows with no customer-day match fall through to 'H' in the CASE ... ELSE, so exclude them as in high_risk_pop
            and min_daily_risk_tier is not null
        group by 1,2,3
             ''')
analysis.sort_values(by=['holdout_group', 'token_ct'], ascending=[True, False])  # Uh oh: how are high risk customers placing orders at Amazon?

In [0]:
amazon = conn.download('''select * from SUP_attempts where in_flight_order_merchant_name ilike 'AMAZON%' ''')

In [0]:
amazon.holdout_group.value_counts(normalize=True)

In [0]:
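# NOTE: `examination` is not defined in any cell above; it needs to be loaded before this cell can run
examination.loc[examination.checkpoint == 'SUP_MERCHANT_RISK_FILTER']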