# Where apply button clicks come from

There are two main ways to look at the data (plus a third, noted in parentheses), each of which might give slightly different counts:
* LeadGeneration.ClickConversion from the button click
* PageView on the ISS page
* (and also redirect completed on ISS)

The button click event gives more context about where on the page the click happened, while the ISS pageview is the official definition of an action.


Events should be defined as per https://docs.google.com/spreadsheets/d/1HICh77BoGMIat9K3NPwz3pBayJWiAr0ohAlTuv7dr80/edit#gid=1692709656, but this hasn't been implemented consistently.

Of note for button clicks / LeadGeneration.ClickConversion:
* We used to send product_id and provider_id, but since the move to Falcon these no longer work (or give incorrect results). LPS in particular doesn't seem to have updated its implementation.
* The product comparison widget in the blog currently doesn't tell us what page it is on
* Sometimes there is no product or provider info coming through


Other things of note:
* LPS pages don't seem to be categorised as such in the database


Outstanding things not necessarily covered below:
* NPP clicks

In [314]:
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
import sqlalchemy

from data_warehouse_querying import DataWarehouseQuery

# Settings

In [315]:
num_days_to_query = 2
to_datetime = datetime.now().date() - timedelta(days=1) #datetime(year=2020, month=3, day=1)
from_datetime = to_datetime - timedelta(days=num_days_to_query)
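
One thing worth making explicit about these settings: the query further down filters with inclusive `>=` and `<=` bounds, so `num_days_to_query = 2` covers three calendar days, not two. The sketch below reproduces the same arithmetic; `query_window` and `days_covered` are hypothetical helper names, and the fixed "today" matches the run date visible in the query logs.

```python
from datetime import date, timedelta


def query_window(today, num_days):
    """Return (from_date, to_date) computed the same way as the settings cell."""
    to_date = today - timedelta(days=1)  # yesterday, the last complete day
    from_date = to_date - timedelta(days=num_days)
    return from_date, to_date


def days_covered(from_date, to_date):
    """Number of calendar days matched by an inclusive >= / <= date filter."""
    return (to_date - from_date).days + 1


from_date, to_date = query_window(date(2020, 4, 7), 2)
print(from_date.isoformat(), to_date.isoformat(), days_covered(from_date, to_date))
# 2020-04-04 2020-04-06 3
```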


# Getting Base Data

In [316]:
dq = DataWarehouseQuery()
dq.connect()

In [317]:
products = dq.query("select * from dim_product")

Starting query at 2020-04-07T02:55:37.315620
Query took 0.10


In [318]:
products.head()

Unnamed: 0,product_id,product_name,source_product_id,sys_inserted,sys_updated,status,slug,language_id,channel_id,provider_id,country_id
0,56622,CIMB Platinum Mastercard,101,2019-07-08 19:34:51.551632,2019-07-08 19:34:51.551632,0,cimb-platinum-mastercard-212cae7f-f5cd-4dd1-ba...,1,4,102,1
1,70784,Maybank DUO Platinum Mastercard,105,2019-08-16 19:34:45.935610,2019-08-16 19:34:45.935610,1,maybank-duo-platinum-mastercard,1,4,107,1
2,75681,OCBC 90°N Card,106,2019-08-30 19:34:23.890164,2019-08-30 19:34:23.890164,0,ocbc-90-n-card,1,4,108,1
3,81857,Citibank Quick Cash (Existing Loan Customers),24,2019-09-16 19:35:32.751792,2019-09-16 19:35:32.751792,1,citibank-quick-cash-existing-customers,1,16,836,1
4,93497,OCBC ExtraCash Loan,28,2019-10-18 20:10:55.807708,2019-10-18 20:10:55.807708,0,ocbc-extra-cash-loan,1,16,833,1


In [319]:
providers = dq.query("select * from dim_provider")

Starting query at 2020-04-07T02:55:37.439415
Query took 0.06


In [320]:
providers.head()

Unnamed: 0,provider_id,provider_name,sys_inserted,sys_updated,source_provider_id,slug,status,channel_id,country_id,language_id
0,833,OCBC,2019-02-19 19:32:36.488054,2019-02-19 19:32:36.488054,6.0,ocbc,1,16,1,1
1,837,Standard Chartered Bank,2019-02-19 19:32:36.488054,2019-02-19 19:32:36.488054,2.0,scb,1,16,1,1
2,100,Standard Chartered Bank,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,67.0,scb,0,4,1,1
3,104,Citibank,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,56.0,citibank,1,4,1,1
4,255,AXA,2019-02-01 01:43:57.318499,2019-02-01 01:43:57.318499,259.0,axa-direct,1,20,1,1


In [321]:
channels = dq.query("select * from dim_channel")

Starting query at 2020-04-07T02:55:37.519304
Query took 0.03


In [322]:
channels.head()


Unnamed: 0,channel_id,channel_key,channel_name,sys_inserted,sys_updated
0,3,car-loan,Car Loan,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
1,7,health-insurance,Health Insurance,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
2,11,international-health,International Health Insurance,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
3,15,personal-accident-insurance,Personal Accident Insurance,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
4,19,savings-account,Saving Accounts,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062


In [323]:
providers_channels = pd.merge(providers, channels, on="channel_id", how="left")

In [324]:
len(providers)

496

In [325]:
len(providers_channels)

496
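
The two `len()` calls above are a manual check that the left merge did not duplicate provider rows. As a possible alternative, `pd.merge` can enforce this itself via `validate`, and `indicator=True` shows which rows found no channel. This is a minimal sketch on made-up toy frames (`providers_demo` and `channels_demo` are illustrative, not warehouse data).

```python
import pandas as pd

# Toy stand-ins for dim_provider and dim_channel (values are illustrative only).
providers_demo = pd.DataFrame(
    {"provider_id": [833, 100, 255], "channel_id": [16, 4, 20]}
)
channels_demo = pd.DataFrame(
    {"channel_id": [4, 16], "channel_key": ["credit-cards", "personal-loan"]}
)

# validate="many_to_one" raises MergeError if channel_id is duplicated on the
# right-hand side, which is the situation that would inflate the row count.
merged = pd.merge(
    providers_demo,
    channels_demo,
    on="channel_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Rows flagged left_only had no matching channel.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```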

# Looking at the LeadGeneration.ClickConversion event

## Getting the click event data

In [326]:
query = """
select  

    country_code
    , dim_page_type.page_type
    , dim_page_type.page_sub_type
    , case when page_url like '%/embed/%' then true else false end as is_embed
    , page_url
    , device_os
    , device_category
    , browser
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'channel', true) as channel
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'product_slug', true) as product_slug
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'product', true) as product
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'product_id', true) as product_id
    , dim_product.slug as product_from_id
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'provider_slug', true) as provider_slug
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'provider', true) as provider
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'provider_id', true) as provider_id
    , dim_provider.slug as provider_from_id
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'affiliate_category', true) as affiliate_category
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'affiliate_location', true) as affiliate_location
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'affiliate_page_type', true) as affiliate_page_type
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'affiliate_widget_type', true) as affiliate_widget_type
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'list_position', true) as list_position
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'action', true) as action
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'source', true) as source
    , dim_activity.activity_attributes
    from 
    
    -- TODO: cut down the joins; these were just copy / pasted
    fact_activities 
    left join dim_page on fact_activities.page_id = dim_page.page_id
    left join dim_page_type on dim_page_type.page_type_id = dim_page.page_type_id
    -- left join dim_session on fact_activities.session_id = dim_session.session_id
    left join dim_activity on fact_activities.activity_id = dim_activity.activity_id
    
    left join dim_activity_type on fact_activities.activity_type_id = dim_activity_type.activity_type_id
    left join dim_date on dim_date.date_id = fact_activities.activity_date_id
    -- left join dim_time on fact_activities.activity_time_id = dim_time.time_id
    left join dim_country on fact_activities.site_country_id = dim_country.country_id
    
    left join dim_browser on fact_activities.browser_id = dim_browser.browser_id -- firefox etc
    left join dim_device on fact_activities.device_id = dim_device.device_id -- device_os, device_category (desktop / mobile...)
    
    left join dim_channel on dim_channel.channel_key = json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'channel', true)
    -- only join product and provider if the slug isn't set i.e. assume that it's pre-falcon YMMV (and it's deprecated)
    left join dim_product on (dim_product.source_product_id = json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'product_id', true) 
        and coalesce(json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'product', true), '') =''
        and dim_product.channel_id = dim_channel.channel_id 
        and dim_product.country_id = dim_country.country_id) 
    left join dim_provider on (
        dim_provider.source_provider_id = json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'provider_id', true) 
        and coalesce(json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'provider', true), '') =''
        and dim_provider.channel_id = dim_channel.channel_id 
            and dim_provider.country_id = dim_country.country_id)
    

    
    where 
        dim_activity_type.activity_name = 'LeadGeneration.ClickConversion'
        
        and user_filter_type='external_visitor'
         and dim_date.full_date>='{from_date}'
            and dim_date.full_date<='{to_date}'
        
        
        -- NB: embeds aren't currently listed as blog pages :(
        
""".format(from_date= from_datetime.isoformat(), to_date=to_datetime.isoformat())


In [327]:
dq.query("select coalesce('n', '') != ''")

Starting query at 2020-04-07T02:55:37.607853
Query took 0.02


Unnamed: 0,?column?
0,True


In [328]:
print(query)


select  

    country_code
    , dim_page_type.page_type
    , dim_page_type.page_sub_type
    , case when page_url like '%/embed/%' then true else false end as is_embed
    , page_url
    , device_os
    , device_category
    , browser
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'channel', true) as channel
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'product_slug', true) as product_slug
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'product', true) as product
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'product_id', true) as product_id
    , dim_product.slug as product_from_id
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'provider_slug', true) as provider_slug
    , json_extract_path_text(trim('"' from dim_activity.activity_attributes), 'provider', true) as provider
    , json_extract_path_text(trim('"' from dim_activit

In [329]:
type(query)

str

In [330]:
query = sqlalchemy.text(query)
apply_clicks = dq.query(query)

Starting query at 2020-04-07T02:55:37.642922
Query took 104.37


In [359]:
apply_clicks.describe()

Unnamed: 0,country_code,page_type,page_sub_type,is_embed,page_url,device_os,device_category,browser,channel,product_slug,...,provider_id,provider_from_id,affiliate_category,affiliate_location,affiliate_page_type,affiliate_widget_type,list_position,action,source,activity_attributes
count,4747,4747,4747,4747,4747,4747,4747,4747,4747,4747.0,...,4747,23,4747.0,4747.0,4747.0,4747.0,4747.0,4747.0,4747.0,4747
unique,2,4,6,2,388,7,3,19,12,1.0,...,46,7,22.0,5.0,3.0,4.0,27.0,2.0,5.0,3174
top,sg,listing,channel_listing,False,www.moneysmart.sg/personal-loan,Android,mobile,Chrome,credit-cards,,...,1,citibank,,,,,,,,"{""channel"":""credit-cards"",""country"":""sg"",""is_p..."
freq,3245,2661,1846,4559,614,2204,3353,1116,2574,4747.0,...,904,9,4482.0,4411.0,4411.0,4404.0,4296.0,4636.0,4054.0,93


In [331]:
apply_clicks.head(5)

Unnamed: 0,country_code,page_type,page_sub_type,is_embed,page_url,device_os,device_category,browser,channel,product_slug,...,provider_id,provider_from_id,affiliate_category,affiliate_location,affiliate_page_type,affiliate_widget_type,list_position,action,source,activity_attributes
0,sg,listing,channel_listing,False,www.moneysmart.sg/credit-cards,Android,mobile,Samsung Internet,credit-cards,,...,1,,,,,,,,,"{""channel"":""credit-cards"",""country"":""sg"",""is_p..."
1,sg,listing,channel_listing,False,www.moneysmart.sg/credit-cards,Android,mobile,Samsung Internet,credit-cards,,...,1,,,,,,,,,"{""channel"":""credit-cards"",""country"":""sg"",""is_p..."
2,sg,listing,channel_listing,False,www.moneysmart.sg/credit-cards,Android,mobile,Chrome Mobile,credit-cards,,...,9,,,,,,,,,"{""channel"":""credit-cards"",""country"":""sg"",""is_p..."
3,sg,listing,channel_listing,False,www.moneysmart.sg/maid-insurance,Windows,desktop,Chrome,maid-insurance,,...,21,,,,,,,,,"{""channel"":""maid-insurance"",""country"":""sg"",""is..."
4,sg,listing,provider_listing,False,www.moneysmart.sg/credit-cards/dbs,Windows,desktop,Chrome,credit-cards,,...,8,,,,,,,,,"{""channel"":""credit-cards"",""country"":""sg"",""is_p..."
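
Because the raw `activity_attributes` column is returned as well, the same fields can also be pulled out on the Python side, which is handy for spot-checking what the SQL extraction produced. A minimal sketch, assuming the column holds a JSON object that may be wrapped in stray double quotes (which is what the `trim('"' ...)` in the query suggests). `extract_attribute` is a hypothetical helper, and the sample record is a made-up, shortened one.

```python
import json


def extract_attribute(raw, key):
    """Return the value stored under `key`, '' if the key is absent, None if unparseable."""
    if not isinstance(raw, str):
        return None
    text = raw.strip('"')  # mirrors trim('"' from ...) in the SQL
    try:
        parsed = json.loads(text)
    except ValueError:
        return None
    if not isinstance(parsed, dict):
        return None
    value = parsed.get(key)
    return "" if value is None else str(value)


sample = '{"channel":"credit-cards","country":"sg"}'  # shortened, illustrative record
print(extract_attribute(sample, "channel"))
```

It could be applied to the query result with something like `apply_clicks["activity_attributes"].map(lambda raw: extract_attribute(raw, "channel"))`.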


In [332]:
product_provider_summary_cols = [ "page_url", "action", "page_type", "channel"]+ [z for z in apply_clicks.columns if "product" in z or "provider" in z]
afilliate_cols = [z for z in apply_clicks.columns if "affiliate" in z]

In [333]:
apply_clicks[product_provider_summary_cols ].head()

Unnamed: 0,page_url,action,page_type,channel,product_slug,product,product_id,product_from_id,provider_slug,provider,provider_id,provider_from_id
0,www.moneysmart.sg/credit-cards,,listing,credit-cards,,citi-cashback-plus-card,66,,,citibank,1,
1,www.moneysmart.sg/credit-cards,,listing,credit-cards,,citi-cashback-plus-card,66,,,citibank,1,
2,www.moneysmart.sg/credit-cards,,listing,credit-cards,,american-express-singapore-airlines-krisflyer-...,7,,,american-express,9,
3,www.moneysmart.sg/maid-insurance,,listing,maid-insurance,,fwd-maid-insurance-essential,3,,,fwd,21,
4,www.moneysmart.sg/credit-cards/dbs,,listing,credit-cards,,dbs-live-fresh-card,61,,,dbs,8,


## Issues

In [334]:
def format_results(df):
    def make_clickable(val):
        # target _blank to open new window
        return '<a target="_blank" href="{}">{}</a>'.format("https://"+ val, val)
    
    return df.style.format({'page_url': make_clickable})
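
`make_clickable` pastes the URL straight into HTML. Page URLs come from tracking data, so a slightly more defensive variant might escape them first. The sketch below uses the standard library's `html.escape`; it is a suggested variation, not what the notebook ran.

```python
import html


def make_clickable(val):
    """Render a schemeless URL as a link that opens in a new tab, escaping HTML characters."""
    url = "https://" + val
    return '<a target="_blank" href="{}">{}</a>'.format(
        html.escape(url, quote=True), html.escape(val)
    )


print(make_clickable("www.moneysmart.sg/credit-cards"))
```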

### Not having product / provider (slug) set (product_id or provider_id is deprecated)

In [351]:
df = apply_clicks[(apply_clicks.provider.isna()) | (apply_clicks.provider=="")][product_provider_summary_cols]
print("only first 20 shown")
format_results(df.head(20))

only first 20 shown


Unnamed: 0,page_url,action,page_type,channel,product_slug,product,product_id,product_from_id,provider_slug,provider,provider_id,provider_from_id
32,blog.moneysmart.sg/transportation/bus-mrt-fares-public-transport,,blog_page,credit-cards,,,,,,,1.0,
36,www.moneysmart.hk/zh-hk/health-insurance/vhis-ms,,Unknown,health-insurance,,,,,,,,
53,www.moneysmart.sg/investments/online-brokerages-ms,,Unknown,investments,,,,,,,,
54,www.moneysmart.sg/investments/online-brokerages-ms,,Unknown,investments,,,,,,,,
55,www.moneysmart.sg/investments/city-index-ms,,Unknown,investments,,,,,,,,
56,blog.moneysmart.sg/credit-cards/dbs-credit-cards-singapore-review,,blog_page,credit-cards,,,,,,,4.0,
69,blog.moneysmart.sg/personal-loans/best-personal-loan-singapore,,blog_page,personal-loan,,,,,,,7.0,uob
107,www.moneysmart.hk/zh-hk/credit-cards/annual-fee-waiver-ms,,Unknown,credit-cards,,,,,,,,
119,blog.moneysmart.sg/credit-cards/dbs-live-fresh-card-review,,blog_page,credit-cards,,,,,,,8.0,
148,blog.moneysmart.sg/credit-cards/credit-cards-singapore-free-airport-lounge-access,,blog_page,credit-cards,,,,,,,8.0,


In [358]:
df2 = pd.DataFrame(df.groupby(["page_url", "page_type","channel"]).size().reset_index().sort_values(0, ascending=False))
format_results(df2)

Unnamed: 0,page_url,page_type,channel,0
91,www.moneysmart.sg/investments/online-brokerages-ms,Unknown,investments,86
92,www.moneysmart.sg/investments/saxo-markets-ms,Unknown,investments,34
43,blog.moneysmart.sg/personal-loans/best-personal-loan-singapore,blog_page,personal-loan,22
73,www.moneysmart.sg/car-insurance/aig-ms,Unknown,car-insurance,14
30,blog.moneysmart.sg/credit-cards/dbs-credit-cards-singapore-review,blog_page,credit-cards,13
44,blog.moneysmart.sg/savings-accounts/dbs-multiplier-account-review,blog_page,credit-cards,12
48,blog.moneysmart.sg/shopping/lazada-promo-code-promotion,blog_page,credit-cards,12
67,www.moneysmart.hk/zh-hk/personal-loan/clear-credit-card-debts-ms,Unknown,personal-loan,12
7,blog.moneysmart.hk/zh-hk/credit-cards/%E9%9B%BB%E5%99%A8%E5%84%AA%E6%83%A0-%E4%BF%A1%E7%94%A8%E5%8D%A1-%E8%B2%B7%E9%9B%BB%E5%99%A8-%E5%84%AA%E6%83%A0,blog_page,credit-cards,10
86,www.moneysmart.sg/embed/9cb432acbab519e7863e0608254b41e7/result,Unknown,refinancing,9


### Using product_slug or provider_slug not product / provider

In [336]:
apply_clicks[~(apply_clicks.provider_slug.isna() | (apply_clicks.provider_slug==""))][product_provider_summary_cols]

Unnamed: 0,page_url,action,page_type,channel,product_slug,product,product_id,product_from_id,provider_slug,provider,provider_id,provider_from_id


In [337]:
apply_clicks[~(apply_clicks["product_slug"].isna() | (apply_clicks["product_slug"]==""))][product_provider_summary_cols]

Unnamed: 0,page_url,action,page_type,channel,product_slug,product,product_id,product_from_id,provider_slug,provider,provider_id,provider_from_id


### Not having any product or provider info

In [338]:
# No product info
df = apply_clicks[(apply_clicks["product"].isna() | (apply_clicks["product"]=="")) & (apply_clicks.product_id.isna() | ((apply_clicks["product_id"]=="")))][product_provider_summary_cols]
df2 = pd.DataFrame(df.groupby(["page_url", "page_type", "channel"]).size()).reset_index().sort_values(0, ascending=False).rename(columns={0:"click count"})
format_results(df2)

Unnamed: 0,page_url,page_type,channel,click count
77,www.moneysmart.sg/investments/online-brokerages-ms,Unknown,investments,86
78,www.moneysmart.sg/investments/saxo-markets-ms,Unknown,investments,34
43,blog.moneysmart.sg/personal-loans/best-personal-loan-singapore,blog_page,personal-loan,22
67,www.moneysmart.sg/car-insurance/aig-ms,Unknown,car-insurance,14
30,blog.moneysmart.sg/credit-cards/dbs-credit-cards-singapore-review,blog_page,credit-cards,13
48,blog.moneysmart.sg/shopping/lazada-promo-code-promotion,blog_page,credit-cards,12
44,blog.moneysmart.sg/savings-accounts/dbs-multiplier-account-review,blog_page,credit-cards,12
62,www.moneysmart.hk/zh-hk/personal-loan/clear-credit-card-debts-ms,Unknown,personal-loan,12
7,blog.moneysmart.hk/zh-hk/credit-cards/%E9%9B%BB%E5%99%A8%E5%84%AA%E6%83%A0-%E4%BF%A1%E7%94%A8%E5%8D%A1-%E8%B2%B7%E9%9B%BB%E5%99%A8-%E5%84%AA%E6%83%A0,blog_page,credit-cards,10
56,www.moneysmart.hk/zh-hk/health-insurance/vhis-ms,Unknown,health-insurance,9


In [339]:
# No provider info
apply_clicks[(apply_clicks.provider.isna() | (apply_clicks.provider==""))][product_provider_summary_cols]

Unnamed: 0,page_url,action,page_type,channel,product_slug,product,product_id,product_from_id,provider_slug,provider,provider_id,provider_from_id
32,blog.moneysmart.sg/transportation/bus-mrt-fare...,,blog_page,credit-cards,,,,,,,1,
36,www.moneysmart.hk/zh-hk/health-insurance/vhis-ms,,Unknown,health-insurance,,,,,,,,
53,www.moneysmart.sg/investments/online-brokerage...,,Unknown,investments,,,,,,,,
54,www.moneysmart.sg/investments/online-brokerage...,,Unknown,investments,,,,,,,,
55,www.moneysmart.sg/investments/city-index-ms,,Unknown,investments,,,,,,,,
56,blog.moneysmart.sg/credit-cards/dbs-credit-car...,,blog_page,credit-cards,,,,,,,4,
69,blog.moneysmart.sg/personal-loans/best-persona...,,blog_page,personal-loan,,,,,,,7,uob
107,www.moneysmart.hk/zh-hk/credit-cards/annual-fe...,,Unknown,credit-cards,,,,,,,,
119,blog.moneysmart.sg/credit-cards/dbs-live-fresh...,,blog_page,credit-cards,,,,,,,8,
148,blog.moneysmart.sg/credit-cards/credit-cards-s...,,blog_page,credit-cards,,,,,,,8,


In [340]:
# No provider info (neither provider nor provider_id set), grouped by number of clicks on the page
missing_providers = apply_clicks[((apply_clicks.provider=="" ) | (apply_clicks.provider.isna())) & ((apply_clicks.provider_id=="" ) | (apply_clicks.provider_id.isna())) ][["page_url", "provider", "provider_id"]]
missing_providers_grouped = missing_providers.groupby(["page_url"]).size().reset_index() #.rename(columns={0:"click count"})
#missing_providers_grouped.sort_values("provider_id", ascending=False)
format_results(pd.DataFrame(missing_providers_grouped.sort_values(0, ascending=False)))

Unnamed: 0,page_url,0
29,www.moneysmart.sg/investments/online-brokerages-ms,86
30,www.moneysmart.sg/investments/saxo-markets-ms,34
15,www.moneysmart.sg/car-insurance/aig-ms,14
25,www.moneysmart.sg/embed/9cb432acbab519e7863e0608254b41e7/result,13
24,www.moneysmart.sg/embed/5051cca749bae55521c34317d0799cae/result,12
10,www.moneysmart.hk/zh-hk/personal-loan/clear-credit-card-debts-ms,12
4,www.moneysmart.hk/zh-hk/health-insurance/vhis-ms,9
22,www.moneysmart.sg/car-insurance/ntuc-ms,7
5,www.moneysmart.hk/zh-hk/investments/retirement-products-deduct-taxes-ms,7
13,www.moneysmart.hk/zh-hk/personal-loan/no-credit-check-loans-ms,7


### Embed without any info about the page that it's on

In [341]:
print("all of them!")

all of them!


### Blog page without affiliate stuff set
Blog clicks should carry full affiliate details, e.g. where on the page the click came from.

In [342]:
apply_clicks.page_type.unique()

array(['listing', 'Unknown', 'blog_page', 'details'], dtype=object)

In [343]:
df = apply_clicks[ apply_clicks.page_type.isin(["blog_page"]) & ((apply_clicks.affiliate_category=="") | (apply_clicks.affiliate_location=="") | (apply_clicks.affiliate_page_type=="") |  (apply_clicks.affiliate_widget_type=="")\
             | (apply_clicks.affiliate_category.isna()) | (apply_clicks.affiliate_location.isna()) | (apply_clicks.affiliate_page_type.isna()) |  (apply_clicks.affiliate_widget_type.isna()))][product_provider_summary_cols + afilliate_cols]

format_results(df)

Unnamed: 0,page_url,action,page_type,channel,product_slug,product,product_id,product_from_id,provider_slug,provider,provider_id,provider_from_id,affiliate_category,affiliate_location,affiliate_page_type,affiliate_widget_type
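
The filter above spells out "empty or null" for each of the four affiliate columns by hand. A more compact way to express the same condition is sketched below; `any_blank` is a hypothetical helper and the demo frame is made up.

```python
import numpy as np
import pandas as pd


def any_blank(df, cols):
    """True for rows where at least one of `cols` is null or an empty string."""
    subset = df[cols]
    return (subset.isna() | subset.eq("")).any(axis=1)


demo = pd.DataFrame(
    {
        "affiliate_category": ["cards", "", "loans"],
        "affiliate_location": ["top", "sidebar", np.nan],
    }
)
mask = any_blank(demo, ["affiliate_category", "affiliate_location"])
print(mask.tolist())
```

With the notebook's frame this would be used along the lines of `apply_clicks[apply_clicks.page_type.isin(["blog_page"]) & any_blank(apply_clicks, cols)]`, where `cols` is the list of affiliate column names.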


### Fails to join on provider_id or product_id

Note that some pre-Falcon events in HK are expected not to join, as we didn't have the application database loaded for HK.

In [345]:
# product_id is set, but product_from_id is null
df = apply_clicks[((apply_clicks["product"]=="") | apply_clicks["product"].isna()) & apply_clicks.product_id.str.isnumeric() & apply_clicks.product_from_id.isna()][product_provider_summary_cols]
format_results(df)


Unnamed: 0,page_url,action,page_type,channel,product_slug,product,product_id,product_from_id,provider_slug,provider,provider_id,provider_from_id
105,www.moneysmart.sg/car-insurance/wizard/results,conversion,Unknown,car-insurance,,,106,,,FWD,88.0,
132,www.moneysmart.sg/car-insurance/wizard/results,conversion,Unknown,car-insurance,,,120,,,Tokio Marine,89.0,
167,www.moneysmart.sg/car-insurance/wizard/results,conversion,Unknown,car-insurance,,,107,,,FWD,88.0,
168,www.moneysmart.sg/embed/9cb432acbab519e7863e0608254b41e7/result,,Unknown,refinancing,,,2319,,,,,
231,www.moneysmart.sg/car-insurance/wizard/results,conversion,Unknown,car-insurance,,,106,,,FWD,88.0,
261,www.moneysmart.sg/car-insurance/wizard/results,conversion,Unknown,car-insurance,,,118,,,AXA,87.0,
279,www.moneysmart.sg/car-insurance/wizard/results,conversion,Unknown,car-insurance,,,114,,,Sompo,93.0,
297,www.moneysmart.sg/embed/5051cca749bae55521c34317d0799cae/result,,Unknown,refinancing,,,2392,,,,,
352,www.moneysmart.sg/car-insurance/wizard/results,conversion,Unknown,car-insurance,,,122,,,MSIG,86.0,
367,www.moneysmart.sg/car-insurance/wizard/results,conversion,Unknown,car-insurance,,,111,,,HLAS,92.0,


In [346]:
# provider can't be interpreted from provider_id
df = apply_clicks[((apply_clicks["provider"]=="") | apply_clicks["provider"].isna()) & apply_clicks.provider_id.str.isnumeric() & apply_clicks.provider_from_id.isna()][product_provider_summary_cols]
print(len(df))
format_results(df)


153


Unnamed: 0,page_url,action,page_type,channel,product_slug,product,product_id,product_from_id,provider_slug,provider,provider_id,provider_from_id
32,blog.moneysmart.sg/transportation/bus-mrt-fares-public-transport,,blog_page,credit-cards,,,,,,,1,
56,blog.moneysmart.sg/credit-cards/dbs-credit-cards-singapore-review,,blog_page,credit-cards,,,,,,,4,
119,blog.moneysmart.sg/credit-cards/dbs-live-fresh-card-review,,blog_page,credit-cards,,,,,,,8,
148,blog.moneysmart.sg/credit-cards/credit-cards-singapore-free-airport-lounge-access,,blog_page,credit-cards,,,,,,,8,
161,blog.moneysmart.sg/credit-cards/best-rewards-credit-cards-singapore,,blog_page,credit-cards,,,,,,,1,
181,blog.moneysmart.sg/budgeting/cheapest-sim-only-plans,,blog_page,credit-cards,,,,,,,12,
202,blog.moneysmart.hk/zh-hk/credit-cards/%E9%9B%BB%E5%99%A8%E5%84%AA%E6%83%A0-%E4%BF%A1%E7%94%A8%E5%8D%A1-%E8%B2%B7%E9%9B%BB%E5%99%A8-%E5%84%AA%E6%83%A0,,blog_page,credit-cards,,,,,,,19,
303,blog.moneysmart.sg/credit-cards/citibank-smrt-card-review,,blog_page,credit-cards,,,,,,,1,
368,blog.moneysmart.sg/shopping/lazada-promo-code-promotion,,blog_page,credit-cards,,,,,,,3,
371,blog.moneysmart.hk/zh-hk/credit-cards/%E9%9B%BB%E5%99%A8%E5%84%AA%E6%83%A0-%E4%BF%A1%E7%94%A8%E5%8D%A1-%E8%B2%B7%E9%9B%BB%E5%99%A8-%E5%84%AA%E6%83%A0,,blog_page,credit-cards,,,,,,,19,


In [347]:
providers_channels[providers_channels.source_provider_id == 1]

Unnamed: 0,provider_id,provider_name,sys_inserted_x,sys_updated_x,source_provider_id,slug,status,channel_id,country_id,language_id,channel_key,channel_name,sys_inserted_y,sys_updated_y
127,838,HSBC,2019-02-19 19:32:36.488054,2019-02-19 19:32:36.488054,1.0,hsbc,1,16,1,1,personal-loan,Personal Loan,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
140,5944,DBS,2019-06-13 19:34:26.014648,2019-06-13 19:34:26.014648,1.0,dbs,1,17,1,1,refinancing,Home Loan Refinancing,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
380,1901,HSBC,2019-03-18 07:48:04.367002,2019-03-18 07:48:04.367002,1.0,hsbc,1,24,1,1,debt-consolidation-plan,Debt Consolidation Plan,2019-03-18 07:40:50.126887,2019-03-18 07:40:50.126887
390,5970,DBS,2019-06-13 19:34:26.014648,2019-06-13 19:34:26.014648,1.0,dbs,1,10,1,1,home-loan,Home Loan,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
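
The lookup above hints at why these rows fail to join: `source_provider_id` 1 exists for personal-loan and refinancing, but not for credit-cards, and the join is keyed on channel as well as id. Unmatched rows can be isolated with a left merge plus `indicator=True`. This is a sketch on toy frames built from the values shown above; casting the event's string `provider_id` to float to match `source_provider_id` is an assumption about how the two types should be aligned in pandas.

```python
import pandas as pd

# Event side: provider_id arrives as a string extracted from JSON.
events = pd.DataFrame(
    {"channel": ["credit-cards", "personal-loan"], "provider_id": ["1", "1"]}
)
# Dimension side: two of the source_provider_id == 1 rows shown above.
providers_demo = pd.DataFrame(
    {
        "channel_key": ["personal-loan", "refinancing"],
        "source_provider_id": [1.0, 1.0],
        "slug": ["hsbc", "dbs"],
    }
).rename(columns={"channel_key": "channel"})

# Align key types before merging (non-numeric ids become NaN).
events["source_provider_id"] = pd.to_numeric(
    events["provider_id"], errors="coerce"
).astype(float)

joined = events.merge(
    providers_demo,
    on=["channel", "source_provider_id"],
    how="left",
    indicator=True,
)
unmatched = joined[joined["_merge"] == "left_only"]
print(unmatched[["channel", "provider_id"]])
```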


In [348]:
providers_channels[providers_channels.source_provider_id.notna() & (providers_channels.source_provider_id > 30)].sort_values(["source_provider_id"])


Unnamed: 0,provider_id,provider_name,sys_inserted_x,sys_updated_x,source_provider_id,slug,status,channel_id,country_id,language_id,channel_key,channel_name,sys_inserted_y,sys_updated_y
491,103,American Express,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,51.0,american-express,1,1,1,3,unknown,Unknown,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
259,112,Bank Of China,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,53.0,boc,1,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
253,102,CIMB,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,55.0,cimb,0,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
3,104,Citibank,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,56.0,citibank,1,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
129,105,DBS,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,58.0,dbs,1,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
128,101,Hitachi Capital,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,60.0,hitachi-capital,0,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
123,106,HSBC,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,62.0,hsbc,1,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
133,107,Maybank,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,63.0,maybank,1,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
258,108,OCBC,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,64.0,ocbc,1,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062
377,109,POSB,2019-01-21 19:32:53.867181,2019-01-21 19:32:53.867181,65.0,posb,1,4,1,1,credit-cards,Credit Cards,2018-09-11 10:11:04.481062,2018-09-11 10:11:04.481062


### Channels Observed (manual sense check)

In [349]:
apply_clicks.columns

Index(['country_code', 'page_type', 'page_sub_type', 'is_embed', 'page_url',
       'device_os', 'device_category', 'browser', 'channel', 'product_slug',
       'product', 'product_id', 'product_from_id', 'provider_slug', 'provider',
       'provider_id', 'provider_from_id', 'affiliate_category',
       'affiliate_location', 'affiliate_page_type', 'affiliate_widget_type',
       'list_position', 'action', 'source', 'activity_attributes'],
      dtype='object')

In [350]:
apply_clicks.groupby(["channel"]).size().sort_index()

channel
car-insurance               147
credit-cards               2574
debt-consolidation-plan      71
health-insurance              9
home-insurance               15
home-loan                     7
investments                 141
maid-insurance               59
personal-loan              1592
refinancing                  23
savings-account              51
travel-insurance             58
dtype: int64