This is one of the Objectiv [example notebooks](https://objectiv.io/docs/modeling/example-notebooks/). These notebooks can run [on your own data](https://objectiv.io/docs/modeling/get-started-in-your-notebook/), or you can instead run the [Demo](https://objectiv.io/docs/home/quickstart-guide/) to quickly try them out.

# Funnel Discovery

This example notebook shows how to use the 'Funnel Discovery' model on your data collected with Objectiv.

In classical funnel analysis you predefine the steps, and then analyze the differences for user attributes or behavior in each step. However, this means you have to make assumptions about which steps matter, and you potentially miss important, impactful flows, e.g. because they are not very obvious or still small. Yet these can represent major opportunities to boost or optimize.

This is where Funnel Discovery comes in: it discovers all the (top) user journeys that lead to conversion and those that do not, so you can run subsequent analyses on them.

In particular, we will discover in this example:

- The most popular consecutive steps overall;
- The steps/flows which lead to conversion;
- The most common drop-offs;
- The top & bottom converting journeys coming from marketing campaigns;
- Etcetera.

To get started, we first have to instantiate the model hub and an Objectiv DataFrame object.

In [43]:
from modelhub import ModelHub
from bach import display_sql_as_markdown

# instantiate the model hub and set the default time aggregation to daily
modelhub = ModelHub(time_aggregation='%Y-%m-%d')
# get a Bach DataFrame with Objectiv data within a defined timeframe
df = modelhub.get_objectiv_dataframe(start_date='2022-02-01', end_date='2022-06-30')

In [44]:
# add specific contexts to the data as columns
df['application'] = df.global_contexts.gc.application
df['feature_nice_name'] = df.location_stack.ls.nice_name

In [45]:
# select which event type to use for further analysis
df = df[df['event_type'] == 'PressEvent']

### Reference
* [modelhub.ModelHub.get_objectiv_dataframe](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/get_objectiv_dataframe/)
* [modelhub.SeriesGlobalContexts.gc](https://objectiv.io/docs/modeling/open-model-hub/api-reference/SeriesGlobalContexts/gc/)
* [modelhub.SeriesLocationStack.ls](https://objectiv.io/docs/modeling/open-model-hub/api-reference/SeriesLocationStack/ls/)

## First: define conversion

As a prerequisite for Funnel Discovery, define the events you see as conversion.

In this example we will view someone as converted when they go on to read the documentation from our website.

In [49]:
# define which data to use as conversion events; in this example, anyone who goes on to read the documentation
df['is_conversion_event'] = False
df.loc[df['application'] == 'objectiv-docs', 'is_conversion_event'] = True

Out of curiosity, let's see which features are used by users who converted, sorted by their conversion impact.

In [50]:
# calculate the percentage of converted users per feature: (converted users per feature) / (total users converted)
total_converted_users = df[df['is_conversion_event']]['user_id'].unique().count().value
top_conversion_locations = modelhub.agg.unique_users(df[df['is_conversion_event']], 
                                                     groupby='feature_nice_name')
top_conversion_locations = (top_conversion_locations / total_converted_users) * 100

# show the results, with .to_frame() for nicer formatting
top_conversion_locations = top_conversion_locations.to_frame().rename(
    columns={'unique_users': 'converted_users_percentage'})
top_conversion_locations.sort_values(by='converted_users_percentage', ascending=False).head()

| feature_nice_name | converted_users_percentage |
| --- | --- |
| Link: Quickstart Guide located at Root Location: home => Navigation: docs-sidebar | 15.946844 |
| Link: logo located at Root Location: home => Navigation: navbar-top | 10.797342 |
| Link: Tracking located at Root Location: home => Navigation: navbar-top | 10.631229 |
| Link: Taxonomy located at Root Location: modeling => Navigation: navbar-top | 10.299003 |
| Link: Modeling located at Root Location: tracking => Navigation: navbar-top | 9.966777 |
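
The calculation above can be illustrated outside Bach with a few lines of plain Python. This is only a sketch on made-up events (the user ids and shortened feature names are hypothetical), not how the model hub computes it in SQL. It shows the ratio used here: unique converted users per feature, divided by all unique converted users.

```python
from collections import defaultdict

# Toy press events as (user_id, application, feature_nice_name); hypothetical data.
events = [
    ('u1', 'objectiv-website', 'Link: docs'),
    ('u1', 'objectiv-docs', 'Link: Quickstart Guide'),
    ('u2', 'objectiv-docs', 'Link: Quickstart Guide'),
    ('u2', 'objectiv-docs', 'Link: Tracking'),
    ('u3', 'objectiv-website', 'Link: logo'),
    ('u3', 'objectiv-docs', 'Link: Tracking'),
]

# Collect the unique converted users per feature (conversion = any press in the docs).
users_per_feature = defaultdict(set)
for user_id, application, feature in events:
    if application == 'objectiv-docs':
        users_per_feature[feature].add(user_id)

# All unique users with at least one conversion event.
total_converted_users = len(set().union(*users_per_feature.values()))

# Percentage of converted users that used each feature.
converted_users_percentage = {
    feature: len(users) / total_converted_users * 100
    for feature, users in users_per_feature.items()
}
for feature, pct in sorted(converted_users_percentage.items(), key=lambda kv: -kv[1]):
    print(f'{feature}: {pct:.1f}%')
```

Because one user can use several features, the percentages do not have to add up to 100.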


### Reference
* [bach.Series.unique](https://objectiv.io/docs/modeling/bach/api-reference/Series/unique/)
* [bach.DataFrame.count](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/count/)
* [modelhub.Aggregate.unique_users](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/unique_users/)
* [bach.Series.to_frame](https://objectiv.io/docs/modeling/bach/api-reference/Series/to_frame/)
* [bach.DataFrame.rename](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/rename/)
* [bach.DataFrame.sort_values](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_values/)

## See step sequences per user

Before we see what converted and what didn't, let's look at which consecutive steps each user took (i.e. the features they used) after starting their session, based on the [location stack](https://objectiv.io/docs/tracking/core-concepts/locations). We have to specify the maximum number of steps, and use the [get_navigation_paths](TODO) operation.

In [57]:
# Instantiate the FunnelDiscovery model from the open model hub
funnel = modelhub.get_funnel_discovery()

In [58]:
# For every user starting their session, find all sequences of at most 4 consecutive steps they took
max_steps = 4
df_steps = funnel.get_navigation_paths(df, steps=max_steps, by='user_id')
df_steps.head()

| user_id | location_stack_step_1 | location_stack_step_2 | location_stack_step_3 | location_stack_step_4 |
| --- | --- | --- | --- | --- |
| 0000bb2f-66e9-4e48-8e2f-7d0a82446ef4 | Link: about-us located at Root Location: home ... | Link: logo located at Root Location: about => ... | | |
| 0000bb2f-66e9-4e48-8e2f-7d0a82446ef4 | Link: logo located at Root Location: about => ... | | | |
| 00529837-d672-4747-9b87-fd09f2919326 | Link: blog located at Root Location: home => N... | Pressable: after located at Root Location: hom... | Link: spin-up-the-demo located at Root Locatio... | Link: blog located at Root Location: home => N... |
| 00529837-d672-4747-9b87-fd09f2919326 | Pressable: after located at Root Location: hom... | Link: spin-up-the-demo located at Root Locatio... | Link: blog located at Root Location: home => N... | Link: docs located at Root Location: blog => N... |
| 00529837-d672-4747-9b87-fd09f2919326 | Link: spin-up-the-demo located at Root Locatio... | Link: blog located at Root Location: home => N... | Link: docs located at Root Location: blog => N... | Link: bach-and-sklearn located at Root Locatio... |
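
Judging from the output above, each user gets one row per starting position in their ordered steps, with shorter sequences padded with empty values. The following plain-Python sketch reproduces that shape on a toy list of steps. The helper `navigation_paths` is hypothetical and only illustrates the idea; the actual model runs on the database and may differ in details.

```python
def navigation_paths(steps, max_steps):
    """Return one row per starting position, padded with None up to max_steps."""
    rows = []
    for start in range(len(steps)):
        # Take at most max_steps consecutive steps from this starting position.
        window = steps[start:start + max_steps]
        rows.append(tuple(window) + (None,) * (max_steps - len(window)))
    return rows


# A user who pressed two features, in this order (shortened, hypothetical names).
user_steps = ['about-us', 'logo']
for row in navigation_paths(user_steps, max_steps=4):
    print(row)
```

For this two-step user, the sketch yields two rows, which matches the pattern of the first two rows in the output above.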


### Reference
* [modelhub.models.funnel_discovery.get_funnel_discovery](TODO)
* [modelhub.models.funnel_discovery.get_navigation_paths](TODO)

## See top step sequences for all users

For the bigger picture, calculate the most frequent consecutive steps that all users took after starting their session, based on the [location stack](https://objectiv.io/docs/tracking/core-concepts/locations).

In [60]:
df_steps.value_counts().to_frame().head(20)

| location_stack_step_1 | location_stack_step_2 | location_stack_step_3 | location_stack_step_4 | value_counts |
| --- | --- | --- | --- | --- |
| Pressable: after located at Root Location: home => Content: capture-data => Content: data-capture-workflow-before-after | | | | 154 |
| Pressable: after located at Root Location: home => Content: modeling => Content: modeling-workflow-before-after | | | | 126 |
| Link: about-us located at Root Location: home => Navigation: navbar-top | | | | 68 |
| Pressable: hamburger located at Root Location: home => Navigation: navbar-top | | | | 63 |
| Pressable: before located at Root Location: home => Content: capture-data => Content: data-capture-workflow-before-after | | | | 54 |
| Link: logo located at Root Location: home => Navigation: navbar-top | | | | 52 |
| Pressable: before located at Root Location: home => Content: capture-data => Content: data-capture-workflow-before-after | Pressable: after located at Root Location: home => Content: capture-data => Content: data-capture-workflow-before-after | | | 49 |
| Pressable: after located at Root Location: home => Content: capture-data => Content: data-capture-workflow-before-after | Pressable: before located at Root Location: home => Content: capture-data => Content: data-capture-workflow-before-after | | | 48 |
| Pressable: after located at Root Location: home => Content: capture-data => Content: data-capture-workflow-before-after | Pressable: after located at Root Location: home => Content: modeling => Content: modeling-workflow-before-after | | | 46 |
| Link: github located at Root Location: home => Navigation: navbar-top | | | | 42 |
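
Counting identical step sequences is conceptually a frequency count over whole rows. As a rough illustration only (toy rows with shortened, hypothetical feature names, not the Bach implementation), the same idea in plain Python looks like this:

```python
from collections import Counter

# Each tuple is one row of consecutive steps, padded with None (hypothetical data).
paths = [
    ('after', None, None, None),
    ('before', 'after', None, None),
    ('after', None, None, None),
    ('about-us', None, None, None),
    ('before', 'after', None, None),
    ('after', None, None, None),
]

# Count how often each complete sequence occurs and list the most frequent ones.
path_counts = Counter(paths)
top_paths = path_counts.most_common(2)
for path, count in top_paths:
    print(count, path)
```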


### Reference
* [bach.Series.to_frame](https://objectiv.io/docs/modeling/bach/api-reference/Series/to_frame/)
* [bach.DataFrame.value_counts](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/value_counts/)

## See step sequences that lead to conversion

Now let's find the sequences that actually lead to conversion.

In [62]:
# first, find out which step in each path is the first conversion step
df_first_conversion_step = funnel.get_navigation_paths(df, steps=max_steps, by='user_id', add_conversion_step_column=True)

df_first_conversion_step.head(10)

| user_id | location_stack_step_1 | location_stack_step_2 | location_stack_step_3 | location_stack_step_4 | _first_conversion_step_number |
| --- | --- | --- | --- | --- | --- |
| 0000bb2f-66e9-4e48-8e2f-7d0a82446ef4 | Link: about-us located at Root Location: home ... | Link: logo located at Root Location: about => ... | | | |
| 0000bb2f-66e9-4e48-8e2f-7d0a82446ef4 | Link: logo located at Root Location: about => ... | | | | |
| 00529837-d672-4747-9b87-fd09f2919326 | Link: blog located at Root Location: home => N... | Pressable: after located at Root Location: hom... | Link: spin-up-the-demo located at Root Locatio... | Link: blog located at Root Location: home => N... | |
| 00529837-d672-4747-9b87-fd09f2919326 | Pressable: after located at Root Location: hom... | Link: spin-up-the-demo located at Root Locatio... | Link: blog located at Root Location: home => N... | Link: docs located at Root Location: blog => N... | |
| 00529837-d672-4747-9b87-fd09f2919326 | Link: spin-up-the-demo located at Root Locatio... | Link: blog located at Root Location: home => N... | Link: docs located at Root Location: blog => N... | Link: bach-and-sklearn located at Root Locatio... | 4.0 |
| 00529837-d672-4747-9b87-fd09f2919326 | Link: blog located at Root Location: home => N... | Link: docs located at Root Location: blog => N... | Link: bach-and-sklearn located at Root Locatio... | Link: basic-product-analytics located at Root ... | 3.0 |
| 00529837-d672-4747-9b87-fd09f2919326 | Link: docs located at Root Location: blog => N... | Link: bach-and-sklearn located at Root Locatio... | Link: basic-product-analytics located at Root ... | | 2.0 |
| 00529837-d672-4747-9b87-fd09f2919326 | Link: bach-and-sklearn located at Root Locatio... | Link: basic-product-analytics located at Root ... | | | 1.0 |
| 00529837-d672-4747-9b87-fd09f2919326 | Link: basic-product-analytics located at Root ... | | | | 1.0 |
| 005aa19c-7e80-4960-928c-a0853355ee5f | Link: check-out-thijs-obj-on-github located at... | Link: jobs located at Root Location: about => ... | | | |
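
The `_first_conversion_step_number` column appears to hold the 1-based position of the first converting step within each row, and stays empty when the row contains no conversion. A minimal plain-Python sketch of that reading follows. The helper `first_conversion_step`, the shortened feature names and the set of converting features are assumptions made for illustration.

```python
def first_conversion_step(window, is_conversion):
    """Return the 1-based position of the first converting step, or None."""
    for step_number, step in enumerate(window, start=1):
        if step is not None and is_conversion(step):
            return step_number
    return None


# Assumption for this toy example: these two features are conversion events.
docs_features = {'bach-and-sklearn', 'basic-product-analytics'}

windows = [
    ('blog', 'after', 'spin-up-the-demo', 'blog'),
    ('spin-up-the-demo', 'blog', 'docs', 'bach-and-sklearn'),
    ('blog', 'docs', 'bach-and-sklearn', 'basic-product-analytics'),
    ('basic-product-analytics', None, None, None),
]
step_numbers = [
    first_conversion_step(window, lambda step: step in docs_features)
    for window in windows
]
print(step_numbers)
```

As the window slides forward through a journey, the conversion step moves one position closer to the start, which is the 4.0, 3.0, 2.0, 1.0 pattern visible in the output above.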


In [None]:
# now keep only the converted paths, with the steps up to the first conversion
df_steps_till_conversion = funnel.get_navigation_paths(df, steps=max_steps, by='user_id', add_conversion_step_column=True, only_converted_paths=True)
df_steps_till_conversion.head() 

In [None]:
# now let's take only the paths that converted on the 4th step
condition = df_steps_till_conversion['_first_conversion_step_number'] == 4
df_steps_till_conversion[condition].head() 

## User flow visualization

Let's use a Sankey diagram to visualize the journeys of users on our website (the flows between the location stacks).
The width of each link represents the size of the flow. Hover over a link to see its source and target node.

In [None]:
funnel.plot_sankey_diagram(df_steps_till_conversion[condition], n_top_examples=15)
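
A Sankey diagram is drawn from a list of links, each with a source node, a target node and a value that sets the link width. How `plot_sankey_diagram` derives its links internally is not shown in this notebook, so the following is only a generic sketch of the idea: count every transition between two consecutive steps across all paths. The helper `sankey_links` and the step names are hypothetical.

```python
from collections import Counter


def sankey_links(paths):
    """Count transitions between consecutive steps across all paths."""
    links = Counter()
    for path in paths:
        # Drop the padding, then pair each step with the step that follows it.
        steps = [step for step in path if step is not None]
        for source, target in zip(steps, steps[1:]):
            links[(source, target)] += 1
    return links


paths = [
    ('home', 'docs', 'quickstart', None),
    ('home', 'docs', None, None),
    ('docs', 'quickstart', None, None),
]
links = sankey_links(paths)
for (source, target), value in links.most_common():
    print(f'{source} -> {target}: {value}')
```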

# Deep dive into step details

## Top drop-off locations

In [None]:
# select only the events that are not conversion events
df_non_converted = df[~df['is_conversion_event']]
converted_users = df[df['is_conversion_event']]['user_id']

# keep only the events of users who never converted
df_non_converted = df_non_converted[~df_non_converted['user_id'].isin(converted_users)]

In [None]:
# the last location before leaving the website
drop_loc = df_non_converted.sort_values('moment').groupby('user_id')['feature_nice_name'].to_json_array().json[-1].materialize()
total_count = drop_loc.count().value

In [None]:
# calculate the percentage
drop_loc_percent = (drop_loc.value_counts() / total_count) * 100
drop_loc_percent = drop_loc_percent.to_frame().rename(columns={'value_counts': 'drop_percentage'})
drop_loc_percent.sort_values(by='drop_percentage', ascending=False).head()
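
The drop-off logic in the cells above boils down to three steps: exclude users who converted, take each remaining user's last event by time, and compute the share of users per last location. Here is a plain-Python sketch on made-up events (user ids, integer timestamps and feature names are all hypothetical), meant as an illustration rather than an equivalent of the Bach code:

```python
from collections import Counter

# Toy events as (user_id, moment, feature_nice_name, is_conversion_event).
events = [
    ('u1', 1, 'logo', False),
    ('u1', 2, 'docs', True),
    ('u2', 1, 'about-us', False),
    ('u2', 3, 'jobs', False),
    ('u3', 2, 'blog', False),
    ('u3', 4, 'jobs', False),
    ('u4', 5, 'github', False),
]

# Users with at least one conversion event are excluded entirely.
converted_users = {user for user, _, _, converted in events if converted}

# Walk through the events in time order; later events overwrite earlier ones,
# so each non-converted user ends up with their last location.
last_location = {}
for user, moment, feature, _ in sorted(events, key=lambda event: event[1]):
    if user not in converted_users:
        last_location[user] = feature

# Share of non-converted users per drop-off location.
drop_counts = Counter(last_location.values())
drop_percentage = {
    feature: count / len(last_location) * 100
    for feature, count in drop_counts.items()
}
for feature, pct in sorted(drop_percentage.items(), key=lambda kv: -kv[1]):
    print(f'{feature}: {pct:.1f}%')
```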

## Marketing campaign

In [None]:
# get marketing data
df_marketing = df.copy()
df_marketing['utm_campaign'] = df_marketing.global_contexts.gc.get_from_context_with_type_series(type='MarketingContext', key='campaign')

# get all the users that came from a marketing campaign
user_list = df_marketing[~df_marketing['utm_campaign'].isnull()].user_id
# get all the events of the users who have at least one non-null utm_campaign value
df_marketing = df_marketing[df_marketing['user_id'].isin(user_list)]
df_marketing.head()

Let's define conversion events for marketing data.

In [None]:
df_marketing['is_conversion_event'] = False

# define which data to use as conversion events
df_marketing.loc[df_marketing['application'] == 'objectiv-docs', 'is_conversion_event'] = True

In [None]:
# get converted and non converted marketing dataframes
users_converted = df_marketing[df_marketing['is_conversion_event']].user_id
users_non_converted = df_marketing[~df_marketing['user_id'].isin(users_converted)].user_id

df_marketing_converted = df_marketing[df_marketing['is_conversion_event']]
df_marketing_non_converted = df_marketing[df_marketing['user_id'].isin(users_non_converted)] 

Let's calculate the percentage of converted and non-converted users.

In [None]:
n_users_converted = df_marketing_converted['user_id'].unique().count().value
n_users_non_converted = df_marketing_non_converted['user_id'].unique().count().value
n_users_total = n_users_converted + n_users_non_converted

print(f'Converted users: {round((n_users_converted / n_users_total) * 100)}%')
print(f'Non-converted users: {round((n_users_non_converted / n_users_total) * 100)}%')
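
To make the selection logic of the marketing cells concrete, here is a small plain-Python sketch on made-up events (user ids and campaign names are hypothetical). It first keeps only users with at least one campaign value, then splits them into converted and non-converted users and computes both shares.

```python
# Toy events as (user_id, utm_campaign, application); hypothetical data.
events = [
    ('u1', 'spring', 'objectiv-website'),
    ('u1', None, 'objectiv-docs'),
    ('u2', None, 'objectiv-docs'),
    ('u3', 'spring', 'objectiv-website'),
    ('u4', 'blog', 'objectiv-website'),
    ('u4', None, 'objectiv-website'),
]

# Users with at least one event that carries a marketing campaign.
marketing_users = {user for user, campaign, _ in events if campaign is not None}

# Of those users, the ones with at least one event in the docs count as converted.
converted_users = {
    user for user, _, application in events
    if user in marketing_users and application == 'objectiv-docs'
}
non_converted_users = marketing_users - converted_users

n_users_total = len(marketing_users)
converted_pct = len(converted_users) / n_users_total * 100
non_converted_pct = len(non_converted_users) / n_users_total * 100
print(f'Converted users: {round(converted_pct)}%')
print(f'Non-converted users: {round(non_converted_pct)}%')
```

Note that user `u2` converted but never arrived through a campaign, so they are left out of both groups.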

Now let's focus on non-converted users data.

#### Top drop-off locations for non-converted users from marketing campaign

In [None]:
drop_loc = df_marketing_non_converted.sort_values('moment').groupby('user_id')['feature_nice_name'].to_json_array().json[-1].materialize()
total_count = drop_loc.count().value

drop_loc_percent = (drop_loc.value_counts() / total_count) * 100
drop_loc_percent = drop_loc_percent.to_frame().rename(columns={'value_counts': 'drop_percentage'})
drop_loc_percent.sort_values(by='drop_percentage', ascending=False).head()

#### User journey for non-converted users from marketing campaign

In [None]:
max_steps = 4
df_steps = funnel.get_navigation_paths(df_marketing_non_converted, steps=max_steps, by='user_id')
funnel.plot_sankey_diagram(df_steps, n_top_examples=20)

## Next steps

This was a basic demonstration of how to use the open model hub for funnel discovery. A possible next step is an in-depth look at how the marketing campaign data differs per source, and at how user journeys differ between converted users and those who dropped off.