# Implementation vs Execution Analysis
> A data-driven analysis comparing planned implementation against actual execution metrics

- **Author:** Joby George
- **Date:** April 9, 2025
- **Last Updated:** April 9, 2025

# Project Context
### Single Use Product (SUP) Eligibility Expansion Test
This analysis focuses on the preliminary smoke screen testing phase for the SUP eligibility expansion project. The smoke screen test serves as a critical validation step before the full rollout scheduled for April 16, 2025.

**Test Objectives:**
- Validate the expanded eligibility criteria implementation
- Verify system behavior with new eligibility rules
- Identify potential issues before full rollout
- Ensure data consistency and accuracy

**Timeline:**
- Smoke Screen Test: April 9-15, 2025
- Full Rollout: April 16, 2025

### Test Design
The smoke screen test implements a three-way split of the customer population:

| Group | Size | Treatment |
|-------|------|-----------|
| Control | 80% | Current eligibility rules (no changes) |
| Treatment A | 10% | Expanded merchant selection |
| Treatment B | 10% | Expanded merchant selection + reduced transaction declines |

**Treatment Details:**
1. **Control Group (80%)**
   - Maintains current SUP eligibility criteria
   - Serves as baseline for comparison

2. **Treatment A (10%)**
   - Implements expanded merchant selection
   - Tests impact of broader merchant eligibility

3. **Treatment B (10%)**
   - Implements expanded merchant selection
   - Includes reduced transaction decline criteria
   - Tests combined impact of both changes
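
One way to picture the 80/10/10 design is as cutoffs on a per-customer key in the range [0, 1]. The sketch below is illustrative only: the cutoffs (0.8 and 0.9) and the group labels are assumptions, not values taken from the production rule configuration.

```python
def assign_group(test_key: float) -> str:
    """Map a key in [0, 1] to a test group.

    The 0.8 / 0.9 cutoffs and the returned labels are assumed for
    illustration; the production rules may bucket customers differently.
    """
    if not 0.0 <= test_key <= 1.0:
        raise ValueError("test_key must be in [0, 1]")
    if test_key < 0.8:
        return "control"      # 80%: current eligibility rules
    if test_key < 0.9:
        return "treatment_a"  # 10%: expanded merchant selection
    return "treatment_b"      # 10%: expanded selection + reduced declines


print(assign_group(0.42))
print(assign_group(0.95))
```

Because the key is fixed per customer, a mapping like this would keep each customer in the same group for the whole test window.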

## Overview
This notebook compares the planned implementation of the smoke screen test against the observed rule execution data. The analysis aims to:
- Identify gaps between planned and actual execution


## Table of Contents
1. Data Preparation
2. Validating the Experiment


## Step 1: Data Preparation
### Package Imports and Database Connection Setup
This section initializes our analysis environment by:
1. Importing required Python packages
2. Establishing connection to Snowflake
3. Setting up display configurations for optimal notebook readability



In [0]:
# Setup

import json
import logging
from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pandasql import sqldf
from pysnowflake import Session
from tqdm import tqdm

# from utils import policy_dict, policy_data_dict, find_non_zero_riskweight_rules, find_zero_riskweight_rules

# Display settings
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.max_rows', 100)      # Show 100 rows max
pd.set_option('display.width', None)        # Auto-detect display width
pd.set_option('display.float_format', lambda x: '%.3f' % x)  # Format floats to 3 decimal places

# Plotting settings
# The 'seaborn' style name was deprecated in matplotlib 3.6, so apply the seaborn defaults directly
sns.set_theme()
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = [12, 6]  # Set default figure size


def run_query(query):
    """Run a SQL query against the DataFrames in the notebook's global namespace."""
    return sqldf(query, globals())




In [0]:
USER_NAME = 'jobyg' #replace it with your ldap name
sess = Session(
   connection_override_args={
       'autocommit': True,
       'authenticator': 'externalbrowser',
       'account': 'square',
       'database': f'PERSONAL_{USER_NAME.upper()}',
       'user': f'{USER_NAME}@squareup.com'
   }
   
)
conn = sess.open()



In [0]:
conn.execute('use warehouse ADHOC__LARGE')
conn.execute('use database AP_CUR_FRDRISK_G')
conn.execute('use schema public')



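## Step 2: Validating the Experiment
This section queries the rule execution results to confirm that the rules behave as intended: only the target population is affected, the group split matches the test design, and each group receives the expected merchant set.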

In [0]:
# Create the base driver table for all eligibility checkpoints
driver = '''create or replace temp table base_events AS (
    SELECT 
        a.*,
        parse_json((extra))['transient_features']['holdout_group'] as holdout_group,
        parse_json((extra))['transient_features']['pre_expansion_risk_tier'] as pre_expansion_Risk_Tier,
        parse_json((extra))['transient_features']['test_key'] as test_key,
        c.first_order_date,
        CASE 
            WHEN c.first_order_date IS NULL THEN 'new'
            WHEN DATEDIFF('day', c.first_order_date, a.par_process_date) <= 14 THEN 'new'
            ELSE 'existing'
        END as customer_tenure
    FROM ap_cur_r_featsci.curated_feature_science_red.raw_r_e_rekarma_rl_exec_rslt_rules_rt__jobyg_dsl3_sv a
    LEFT JOIN AP_RAW_GREEN.GREEN.D_CONSUMER c
        ON a.consumer_uuid = c.uuid
    WHERE a.par_process_date >= '2025-04-09'  -- smoke screen test start date
)'''
conn.execute(driver)
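
The `customer_tenure` CASE expression in the driver query treats a customer as new when there is no first order on record or when the first order was placed within 14 days of the processing date. A minimal Python sketch of the same rule, useful for spot-checking individual rows (the function and constant names are hypothetical):

```python
from datetime import date
from typing import Optional

NEW_CUSTOMER_WINDOW_DAYS = 14  # same threshold as the SQL CASE expression


def customer_tenure(first_order_date: Optional[date], process_date: date) -> str:
    """Classify a customer as 'new' or 'existing', mirroring the SQL logic."""
    if first_order_date is None:
        return "new"  # no first order on record
    days_since_first_order = (process_date - first_order_date).days
    if days_since_first_order <= NEW_CUSTOMER_WINDOW_DAYS:
        return "new"
    return "existing"


print(customer_tenure(None, date(2025, 4, 9)))
print(customer_tenure(date(2025, 1, 1), date(2025, 4, 9)))
```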




In [0]:
# Confirm the impact is limited to GB new users
prelim_validation = conn.download('''select par_region, 
                        customer_tenure,
                        count(distinct(case when rule_id = 'gb_fraud_online_sup_eligibility_model_risk_filter_elgibility_experiment' and actions != '[]' then consumer_uuid  end)) as consumer_ct
                        from base_events
                        group by 1,2
                        order by 1,2
              ''')

print(prelim_validation)

In [0]:
# Confirm that, for GB new users, 80% are treated as BAU
eligibility_pop_sql = '''create or replace temp table eligibility_pop as (
    select *
    from base_events
    where rule_id = 'gb_fraud_online_sup_eligibility_model_risk_filter_elgibility_experiment'
      and actions != '[]'
);'''
conn.execute(eligibility_pop_sql)

## Confirm 80% are in bau_control
confirm_treatment_splits = conn.download('''WITH counts AS (
    SELECT 
        holdout_group,
        COUNT(distinct(consumer_uuid)) as count
    FROM eligibility_pop
    GROUP BY 1
)
SELECT 
    holdout_group,
    count,
    ROUND(100.0 * count / SUM(count) OVER (), 2) as percentage
FROM counts
ORDER BY count DESC;
''')
print(confirm_treatment_splits) #percentages are looking appropriate
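
Rather than judging the split by eye, the percentages can be compared against the 80/10/10 design with an explicit tolerance. The sketch below is one possible check, not part of the original analysis: the function names, the 1-point tolerance, the `treatment_a` label, and the example counts are all assumptions made for illustration.

```python
def split_percentages(counts):
    """Convert per-group consumer counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no consumers in any group")
    return {group: round(100.0 * n / total, 2) for group, n in counts.items()}


def split_matches_design(counts, expected_pct, tolerance_pct=1.0):
    """Return True if every group is within tolerance_pct points of its target."""
    observed = split_percentages(counts)
    return all(
        abs(observed.get(group, 0.0) - target) <= tolerance_pct
        for group, target in expected_pct.items()
    )


# Made-up counts for illustration; real values come from confirm_treatment_splits.
example_counts = {
    "bau_control": 8050,
    "treatment_a": 980,  # hypothetical label for the Treatment A group
    "expanded eligibility, less trxn control": 970,
}
design = {
    "bau_control": 80.0,
    "treatment_a": 10.0,
    "expanded eligibility, less trxn control": 10.0,
}
print(split_percentages(example_counts))
print(split_matches_design(example_counts, design))
```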

In [0]:
# Confirm BAU control enables all merchants for low-risk customers and all but two merchants for high-risk customers (Amazon and eBay are excluded)

confirm_bau_control = conn.download('''
    select
        pre_expansion_Risk_Tier,
        case when actions ilike '%400517492%' then 1 else 0 end as amazon_enabled_flag,
        case when actions ilike '%400461110%' then 1 else 0 end as ebay_enabled_flag,
        -- estimate the merchant count from the string length:
        -- 94 characters of fixed overhead, then 12 characters per merchant
        case when (length(actions) - 94) <= 0 then 0
             else round((length(actions) - 94)/12)
        end as merchant_ct,
        count(distinct(consumer_uuid)) as consumer_ct
    from eligibility_pop
    where holdout_group = 'bau_control'
    group by 1,2,3,4
''')
# No Amazon or eBay in the high-risk tiers; all merchants in the low-risk tiers
print(confirm_bau_control)
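
The `merchant_ct` expression above infers the number of enabled merchants from the length of the `actions` string instead of parsing it. A minimal Python sketch of the same arithmetic follows. It assumes, as the SQL does, 94 characters of fixed overhead and 12 characters per merchant entry (presumably a 9-digit merchant ID plus delimiters); the function and constant names are hypothetical.

```python
import math

FIXED_OVERHEAD_CHARS = 94  # constant taken from the SQL expression
CHARS_PER_MERCHANT = 12    # constant taken from the SQL expression


def estimate_merchant_count(actions: str) -> int:
    """Estimate the merchant count from the length of the actions string."""
    extra_chars = len(actions) - FIXED_OVERHEAD_CHARS
    if extra_chars <= 0:
        return 0
    # Round halves up to mirror SQL ROUND on positive values;
    # Python's built-in round() rounds halves to the nearest even integer.
    return math.floor(extra_chars / CHARS_PER_MERCHANT + 0.5)


# A string sized as if it held 54 merchant entries (placeholder content).
print(estimate_merchant_count("x" * (94 + 12 * 54)))
```

The estimate only holds while every merchant entry has the same width, so a change in the `actions` payload format would silently skew the counts.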

In [0]:
# Confirm the non-control groups work as intended: all customers should have 54 merchants enabled
confirm_treatment_groups = conn.download('''
    select
        holdout_Group,
        pre_expansion_Risk_Tier,
        case when actions ilike '%400517492%' then 1 else 0 end as amazon_enabled_flag,
        case when actions ilike '%400461110%' then 1 else 0 end as ebay_enabled_flag,
        case when (length(actions) - 94) <= 0 then 0
             else round((length(actions) - 94)/12)
        end as merchant_ct,
        count(distinct(consumer_uuid)) as consumer_ct
    from eligibility_pop
    where holdout_group != 'bau_control'
    group by 1,2,3,4,5
''')
# All customer risk tiers have access to Amazon and eBay
print(confirm_treatment_groups)

In [0]:
#analyze the trust population, confirm it's only impacting GB new users
#create driver
trust_base = '''CREATE or replace temp table trusted_layer_validation AS (SELECT
        a.*, to_number(substring(MD5(concat(a.CONSUMER_ID,'oon_expansion')),1,6),'XXXXXX')/to_number('ffffff','XXXXXX')::decimal(10,2) as test_key,
        case when merchant_id in ('400538723', '400570637', '400575272', '400568202', '400504709', '400535118', '400551784', '400527714', '400515891', '400546925', '400527604', '400562359', '400474538', '400571508', '400515894', '400569331', '400572942', '400547364', '400474394', '400527759', '400575150', '400562345', '400570636', '400527737', '400527047', '400474401', '400475455', '400527639', '400527593', '400574444', '400461110', '400488351', '400507556', '400527756', '400569337', '400527058', '400570624', '400510052', '400510042', '400515896', '400575275', '400527704', '400514646', '400527833', '400522690', '400527601', '400562315', '400579004', '400527066', '400507557', '400509583', '400507558', '400507559',   '400517492') then 1 else 0 end as gb_sup_flag,
        CASE 
            WHEN c.first_order_date IS NULL THEN 'new'
            WHEN DATEDIFF('day', c.first_order_date, a.par_process_date) <= 14 THEN 'new'
            ELSE 'existing'
        END as customer_tenure
        FROM  AP_CUR_R_FEATSCI.curated_feature_science_red.RAW_C_E_FC_DECISION_RECORD_RULES_RT__JOBYG_DSL3_SV a
        LEFT JOIN AP_RAW_GREEN.GREEN.D_CONSUMER c
        ON a.CONSUMER_ID = c.uuid
        WHERE a.checkpoint in ('CHECKOUT_CONFIRM')
        -- AND ((is_rejected = 'True' and is_in_treatment ilike 'True'))
        AND a.par_process_date >= '2025-04-09'
        and a.par_Region = 'GB'); '''

conn.execute(trust_base)



In [0]:

earliest_ts = '2025-04-09 19:03:38.711'
tl_pop_validation = conn.download(f'''select par_region, 
                        customer_tenure,
                        gb_sup_flag,
                        count(distinct  consumer_id) as total_consumer_Ct,
                        count(distinct(case when rule_id = 'gb_trusted_layer_v_eligibility_expansion' and actions != '[]' then consumer_id  end)) as impacted_consumer_ct,
                        count(distinct(case when rule_id = 'gb_trusted_layer_v_eligibility_expansion' and actions != '[]' then order_token end)) as impacted_token_ct
                        from trusted_layer_validation

                        -- add this filter to ensure event time looks from when smoke test started
                        where to_timestamp(event_info_event_time/1000) >= '{earliest_ts}'
                        group by 1,2,3
                        order by 1,2,3
              ''')
print(tl_pop_validation) #exclusively GB new user SUP, but what's the specific percentage
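
The filter in the query above divides `event_info_event_time` by 1000, which implies the column stores epoch milliseconds. The sketch below shows the same conversion and cutoff comparison in Python. It assumes the cutoff timestamp is in UTC, which the source does not state, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Cutoff copied from earliest_ts; interpreting it as UTC is an assumption.
EARLIEST_TS = datetime(2025, 4, 9, 19, 3, 38, 711000, tzinfo=timezone.utc)


def event_time_from_millis(event_time_ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # timedelta arithmetic avoids float rounding on the millisecond part
    return EPOCH + timedelta(milliseconds=event_time_ms)


def is_in_smoke_test_window(event_time_ms: int, earliest: datetime = EARLIEST_TS) -> bool:
    """Return True if the event happened at or after the cutoff."""
    return event_time_from_millis(event_time_ms) >= earliest


cutoff_ms = (EARLIEST_TS - EPOCH) // timedelta(milliseconds=1)
print(is_in_smoke_test_window(cutoff_ms))
print(is_in_smoke_test_window(cutoff_ms - 1))
```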


In [0]:

# Share of GB new SUP consumers impacted by the trusted layer rule
new_sup_mask = (tl_pop_validation['customer_tenure'] == 'new') & (tl_pop_validation['gb_sup_flag'] == 1)
numerator = tl_pop_validation.loc[new_sup_mask, 'impacted_consumer_ct'].values[0]
denominator = tl_pop_validation.loc[new_sup_mask, 'total_consumer_ct'].values[0]

print(f'impacted consumer share is {round(numerator/denominator, 3)}')

#10% split is looking valid 
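
To make "looking valid" more concrete, the observed share can be compared with the 10% target using a binomial standard error. This check is not part of the original analysis. It is a rough sketch using a normal approximation, which assumes a reasonably large sample and independently assigned customers; the function name and the example counts are made up.

```python
import math


def share_consistent_with_target(impacted: int, total: int,
                                 target: float = 0.10, z: float = 1.96) -> bool:
    """Return True if impacted/total lies within z standard errors of target."""
    if total <= 0:
        raise ValueError("total must be positive")
    observed = impacted / total
    # Standard error of a proportion under the target rate
    std_error = math.sqrt(target * (1.0 - target) / total)
    return abs(observed - target) <= z * std_error


# Made-up counts for illustration; use numerator and denominator in practice.
print(share_consistent_with_target(impacted=110, total=1000))
print(share_consistent_with_target(impacted=200, total=1000))
```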

In [0]:
impacted_customers = conn.download(f'''select test_key, consumer_id, to_timestamp(event_info_event_time/1000) as time from trusted_layer_validation where rule_id = 'gb_trusted_layer_v_eligibility_expansion' and actions != '[]'
and to_timestamp(event_info_event_time/1000) >= '{earliest_ts}' ''')

print(impacted_customers) # all impacted customers have test_key >= 0.9

In [0]:
min_test_key = impacted_customers['test_key'].min()
print(f'min test_key is {min_test_key}')
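
The `test_key` in `trusted_layer_validation` is derived in SQL from an MD5 hash of the consumer ID concatenated with the salt `'oon_expansion'`: the first six hex characters are read as a number and divided by `ffffff`. A minimal Python sketch of that derivation follows, for spot-checking individual customers. The function name and the sample ID are hypothetical, and the sketch returns the unrounded ratio without reproducing any rounding from the SQL cast.

```python
import hashlib

SALT = "oon_expansion"  # salt string used in the SQL expression


def compute_test_key(consumer_id: str, salt: str = SALT) -> float:
    """Hash consumer_id + salt and scale the first 6 hex digits into [0, 1]."""
    digest = hashlib.md5((consumer_id + salt).encode("utf-8")).hexdigest()
    return int(digest[:6], 16) / 0xFFFFFF


sample_id = "example-consumer-id"  # hypothetical ID, not a real customer
key = compute_test_key(sample_id)
print(f"test_key for {sample_id}: {key:.4f}")
```

Because the key depends only on the consumer ID and the salt, the same customer always maps to the same value.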

In [0]:
trust_pop = set(impacted_customers['consumer_id'])

eligibility_pop = conn.download('''select consumer_uuid, holdout_group, test_key from base_events where holdout_group = 'expanded eligibility, less trxn control' ''')

In [0]:
# Check that every trusted layer customer is also in the matching eligibility test group
eligibility_ids = set(eligibility_pop['consumer_uuid'])  # build the lookup set once
missing = trust_pop - eligibility_ids

# Print any customers that are missing from the eligibility test group
for consumer_uuid in missing:
    print(consumer_uuid)

if not missing:
    print('all trust customers are in the appropriate eligibility test group')