This is one of the Objectiv [example notebooks](https://objectiv.io/docs/modeling/example-notebooks/). These notebooks can run [on your own data](https://objectiv.io/docs/modeling/get-started-in-your-notebook/), or you can instead run the [Demo](https://objectiv.io/docs/home/try-the-demo/) to quickly try them out.

# Funnel Discovery

This example notebook shows how to use the 'Funnel Discovery' model on your data collected with Objectiv.

In classical funnel analysis you predefine the steps, and then you analyze the differences for user attributes or behavior in each step. 

However, this means you have to make assumptions about which steps matter, and you may miss impactful flows, e.g. because they are not obvious or are still small. Yet these flows can represent major opportunities to boost or optimize.

This is where Funnel Discovery comes in: to discover _all_ the (top) user journeys that lead to conversion or drop-off, and run subsequent analyses on them.

In particular, we will discover in this example:

- The most popular consecutive steps overall;
- The steps/flows which lead to conversion;
- The most common drop-offs;
- The user journeys from marketing campaigns;
- Etcetera.

To get started, we first have to instantiate the model hub and an Objectiv DataFrame object.

In [None]:
# set the timeframe of the analysis
start_date = '2022-02-01'
end_date = None

In [None]:
# instantiate the model hub and set the default time aggregation to daily
# and set the global contexts that will be used in this example
from modelhub import ModelHub
modelhub = ModelHub(time_aggregation='%Y-%m-%d', global_contexts=['application', 'marketing'])
# get an Objectiv DataFrame within a defined timeframe
df = modelhub.get_objectiv_dataframe(start_date=start_date, end_date=end_date)

The `location_stack` column, and the columns taken from the global contexts, contain most of the event-specific data. These columns are JSON-typed, and we can extract data from them using the keys of the JSON objects with [`SeriesLocationStack`](https://objectiv.io/docs/modeling/open-model-hub/api-reference/SeriesLocationStack/SeriesLocationStack/) methods, or the `context` accessor for global context columns. See the [open taxonomy example](open-taxonomy-how-to.ipynb#Location-stack-&-global-contexts) for how to use the `location_stack` and global contexts.

In [None]:
# add specific contexts to the data as columns
df['application_id'] = df.application.context.id
df['feature_nice_name'] = df.location_stack.ls.nice_name
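
To make the idea of extracting values from JSON-typed columns concrete, here is a minimal plain-Python sketch. It does not use Bach or the model hub; the sample event, the helper names `context_value` and `readable_name`, and the output format of the readable name are all assumptions made for illustration, not the actual `nice_name` format.

```python
# Hypothetical event, shaped like the JSON found in the context columns.
event = {
    "global_contexts": [
        {"_type": "ApplicationContext", "id": "objectiv-docs"},
        {"_type": "MarketingContext", "id": "utm", "campaign": "spring-launch"},
    ],
    "location_stack": [
        {"_type": "RootLocationContext", "id": "home"},
        {"_type": "NavigationContext", "id": "navbar-top"},
        {"_type": "LinkContext", "id": "docs"},
    ],
}


def context_value(contexts, context_type, key):
    """Return `key` from the first context of the given type, or None."""
    for context in contexts:
        if context["_type"] == context_type:
            return context.get(key)
    return None


def readable_name(location_stack):
    """Describe a location from the innermost element outwards (illustrative format)."""
    parts = [
        f'{context["_type"].replace("Context", "")}: {context["id"]}'
        for context in reversed(location_stack)
    ]
    return " < ".join(parts)


application_id = context_value(event["global_contexts"], "ApplicationContext", "id")
feature_name = readable_name(event["location_stack"])
print(application_id)
print(feature_name)
```

In the notebook itself, the `context` and `ls` accessors do this work inside the database, so no event data has to be pulled into Python.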

In [None]:
# select which event type to use for further analysis - PressEvents to focus on what users directly interact with
df = df[df['event_type'] == 'PressEvent']

### Reference
* [modelhub.ModelHub](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/ModelHub/)
* [modelhub.ModelHub.get_objectiv_dataframe](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/get_objectiv_dataframe/)
* [using global context data](open-taxonomy-how-to.ipynb#Location-stack-&-global-contexts)
* [modelhub.SeriesLocationStack.ls](https://objectiv.io/docs/modeling/open-model-hub/api-reference/SeriesLocationStack/ls/)

## First: define what counts as conversion

As a prerequisite for Funnel Discovery, define the events you see as conversion.

In this example we will view someone as converted when they go on to read the documentation from our website, but you can [use any event](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/add_conversion_event/).

In [None]:
# define which data to use as conversion events; in this example, anyone who goes on to read the documentation
df['is_conversion_event'] = False
df.loc[df['application_id'] == 'objectiv-docs', 'is_conversion_event'] = True

Out of curiosity, let's see which features are used by users that converted, sorted by their conversion impact.

In [None]:
# calculate the percentage of converted users per feature: (converted users per feature) / (total users converted)
total_converted_users = df[df['is_conversion_event']]['user_id'].unique().count().value
top_conversion_locations = modelhub.agg.unique_users(df[df['is_conversion_event']], 
                                                     groupby='feature_nice_name')
top_conversion_locations = (top_conversion_locations / total_converted_users) * 100

# show the results, with .to_frame() for nicer formatting
top_conversion_locations = top_conversion_locations.to_frame().rename(
    columns={'unique_users': 'converted_users_percentage'})
top_conversion_locations.sort_values(by='converted_users_percentage', ascending=False).head()
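
The calculation above boils down to (converted users per feature) / (total converted users). A minimal plain-Python sketch of that arithmetic, on hypothetical sample events rather than a Bach DataFrame, could look like this:

```python
from collections import defaultdict

# Hypothetical press events: (user_id, application_id, feature_nice_name).
events = [
    ("u1", "objectiv-website", "Link: docs"),
    ("u1", "objectiv-docs", "Link: tracking"),
    ("u2", "objectiv-docs", "Link: tracking"),
    ("u2", "objectiv-docs", "Link: modeling"),
    ("u3", "objectiv-website", "Link: about-us"),
]

# Same rule as in the notebook: every event in the docs application converts.
conversion_events = [e for e in events if e[1] == "objectiv-docs"]

# Collect the distinct users per feature.
users_per_feature = defaultdict(set)
for user_id, _, feature in conversion_events:
    users_per_feature[feature].add(user_id)

total_converted_users = len({user_id for user_id, _, _ in conversion_events})

converted_users_percentage = {
    feature: len(users) / total_converted_users * 100
    for feature, users in users_per_feature.items()
}

# Sort from highest to lowest share.
ranked = sorted(converted_users_percentage.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Note that the percentages can add up to more than 100, because one converted user can use several features.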

### Reference
* [bach.Series.unique](https://objectiv.io/docs/modeling/bach/api-reference/Series/unique/)
* [bach.DataFrame.count](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/count/)
* [modelhub.Aggregate.unique_users](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/unique_users/)
* [bach.Series.to_frame](https://objectiv.io/docs/modeling/bach/api-reference/Series/to_frame/)
* [bach.DataFrame.rename](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/rename/)
* [bach.DataFrame.sort_values](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_values/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

## See step sequences per user

Before we see what helped conversion and what didn't, let's have a look at which consecutive steps each user took (aka the features they used) in general, after starting their session, based on the [location stack](https://objectiv.io/docs/tracking/core-concepts/locations). We have to specify the maximum n steps, and use the [get_navigation_paths](https://objectiv.io/docs/modeling/open-model-hub/models/funnels/FunnelDiscovery/get_navigation_paths) operation.

In [None]:
# instantiate the FunnelDiscovery model from the open model hub
funnel = modelhub.get_funnel_discovery()
# set the maximum n steps
max_steps = 4

In [None]:
# for every user, find all sequences of at most n consecutive steps they took after starting their session
df_steps = funnel.get_navigation_paths(df, steps=max_steps, by='user_id')
df_steps.head()
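
One way to picture the result is as a sliding window over each user's ordered events. The sketch below is plain Python on hypothetical data, not the model hub implementation; in particular, padding short windows with `None` is a choice made for this illustration, and the function name `navigation_paths` is made up.

```python
from collections import defaultdict


def navigation_paths(events, steps):
    """Return (user_id, window) pairs, one window starting at every event.

    `events` holds (user_id, moment, feature) tuples. Windows shorter than
    `steps` are padded with None (an assumption of this sketch).
    """
    features_per_user = defaultdict(list)
    for user_id, _, feature in sorted(events, key=lambda e: (e[0], e[1])):
        features_per_user[user_id].append(feature)

    paths = []
    for user_id, features in features_per_user.items():
        for start in range(len(features)):
            window = features[start:start + steps]
            window += [None] * (steps - len(window))
            paths.append((user_id, tuple(window)))
    return paths


# Hypothetical events: (user_id, moment, feature).
events = [
    ("u1", 1, "home"), ("u1", 2, "docs"), ("u1", 3, "tracking"),
    ("u2", 1, "home"), ("u2", 2, "about-us"),
]

paths = navigation_paths(events, steps=2)
for path in paths:
    print(path)
```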

### Reference
* [modelhub.ModelHub.get_funnel_discovery](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/get_funnel_discovery)
* [modelhub.models.funnel_discovery.get_navigation_paths](https://objectiv.io/docs/modeling/open-model-hub/models/funnels/FunnelDiscovery/get_navigation_paths)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

## See top step sequences for all users

For the bigger picture, calculate the most frequent consecutive steps that all users took after starting their session, based on the [location stack](https://objectiv.io/docs/tracking/core-concepts/locations).

In [None]:
# count how often each step sequence occurs and show the 20 most frequent
df_steps.value_counts().to_frame().head(20)
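
Counting identical rows is all that happens here. As a rough plain-Python equivalent on hypothetical step sequences, `collections.Counter` gives the same kind of ranking:

```python
from collections import Counter

# Hypothetical step sequences, one tuple per row of the steps dataframe.
step_sequences = [
    ("home", "docs"),
    ("home", "docs"),
    ("docs", "tracking"),
    ("home", "about-us"),
    ("docs", "tracking"),
    ("home", "docs"),
]

# Count each distinct sequence and keep the two most frequent ones.
top_sequences = Counter(step_sequences).most_common(2)
print(top_sequences)
```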

### Reference
* [bach.DataFrame.value_counts](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/value_counts/)
* [bach.Series.to_frame](https://objectiv.io/docs/modeling/bach/api-reference/Series/to_frame/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

## See step sequences that lead to conversion

Now let's find the sequences that actually lead to conversion.

First, add which step resulted in conversion to the dataframe; this value will be `NaN` for sequences that did not convert.

In [None]:
# add which step resulted in conversion to the dataframe, with the `add_conversion_step_column` param
df_first_conversion_step = funnel.get_navigation_paths(df, steps=max_steps, by='user_id', add_conversion_step_column=True)
df_first_conversion_step.head(10)

To filter down to all sequences that have actually converted, use the `only_converted_paths` parameter.

In [None]:
# filter down to all sequences that have actually converted with the `only_converted_paths` param
df_steps_till_conversion = funnel.get_navigation_paths(df, steps=max_steps, by='user_id', add_conversion_step_column=True, only_converted_paths=True)
df_steps_till_conversion.head(5) 

For instance, we can use this to see which sequences converted on the 4th step.

In [None]:
# filter down to sequences that converted on the 4th step
condition_convert_on_step_4 = df_steps_till_conversion['_first_conversion_step_number'] == 4
df_steps_till_conversion[condition_convert_on_step_4].head()
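
The notebook does not show how the conversion step number is derived internally. As a hedged illustration of the concept only, the plain-Python sketch below assumes that a step converts when its feature belongs to a known set of conversion features; the data and the helper `first_conversion_step` are hypothetical.

```python
# Hypothetical: the features that count as a conversion in this sketch.
CONVERSION_FEATURES = {"docs"}


def first_conversion_step(path, conversion_features):
    """Return the 1-based position of the first converting step, or None."""
    for number, feature in enumerate(path, start=1):
        if feature in conversion_features:
            return number
    return None


# Hypothetical paths of at most 4 steps, padded with None.
paths = [
    ("home", "about-us", "blog", "docs"),
    ("home", "docs", None, None),
    ("home", "about-us", None, None),
]

step_numbers = [first_conversion_step(path, CONVERSION_FEATURES) for path in paths]

# Keep only the paths that converted on the 4th step.
converted_on_step_4 = [
    path for path, number in zip(paths, step_numbers) if number == 4
]
print(step_numbers)
print(converted_on_step_4)
```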

#### Visualize these sequences in a Sankey diagram

We can use a Sankey diagram to visualize these customer journeys that lead to conversion (or drop-off). This helps you to select which sequences are most interesting to analyze further.

Let's plot it for the example above, where we filtered down to the sequences that converted on the 4th step. The width of each link represents the number of times that flow was used, and you can hover over each link to see the source and target node.

In [None]:
# plot the Sankey diagram using the top 15 examples via the `n_top_examples` param
funnel.plot_sankey_diagram(df_steps_till_conversion[condition_convert_on_step_4], n_top_examples=15)
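
To clarify what the link widths stand for, the sketch below counts how often each (source, target) pair of consecutive steps occurs in a set of hypothetical paths. It is a plain-Python illustration of the idea, not how `plot_sankey_diagram` is implemented; for example, it does not distinguish the same feature appearing at different step positions.

```python
from collections import Counter


def sankey_links(paths):
    """Count every pair of consecutive steps; the count is the link width."""
    links = Counter()
    for path in paths:
        # Drop the None padding at the end of short paths.
        steps = [step for step in path if step is not None]
        for source, target in zip(steps, steps[1:]):
            links[(source, target)] += 1
    return links


# Hypothetical paths, padded with None at the end.
paths = [
    ("home", "docs", "tracking"),
    ("home", "docs", None),
    ("home", "about-us", None),
]

links = sankey_links(paths)
for (source, target), width in links.most_common():
    print(f"{source} -> {target}: {width}")
```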

### Reference
* [modelhub.models.funnel_discovery.get_navigation_paths](https://objectiv.io/docs/modeling/open-model-hub/models/funnels/FunnelDiscovery/get_navigation_paths)
* [modelhub.models.funnel_discovery.plot_sankey_diagram](https://objectiv.io/docs/modeling/open-model-hub/models/funnels/FunnelDiscovery/plot_sankey_diagram)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

## Deep-dive: top drop-off features

Also interesting to analyze is which features are used relatively often before users drop off. We can do this by finding all _last used_ features by non-converted users, and calculating their usage share.

In [None]:
# keep only the events that are not conversion events
df_non_converted = df[~df['is_conversion_event']]
# get the users that converted at least once
converted_users = df[df['is_conversion_event']]['user_id']
# keep only the events of users that never converted
df_non_converted = df_non_converted[~df_non_converted['user_id'].isin(converted_users)]

In [None]:
# get the last used feature in the location_stack before dropping off
drop_loc = df_non_converted.sort_values('moment').groupby('user_id')['feature_nice_name'].to_json_array().json[-1].materialize()
total_count = drop_loc.count().value

In [None]:
# show the last used features by non-converted users, sorted by their usage share compared to all features
drop_loc_percent = (drop_loc.value_counts() / total_count) * 100
drop_loc_percent = drop_loc_percent.to_frame().rename(columns={'value_counts': 'drop_percentage'})
drop_loc_percent.sort_values(by='drop_percentage', ascending=False).head()
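
The same logic (take the last used feature of every non-converted user, then compute each feature's share) can be sketched in plain Python. The events below are hypothetical, and the sketch only mirrors the idea, not the Bach operations used above.

```python
from collections import Counter

# Hypothetical events: (user_id, moment, feature, is_conversion_event).
events = [
    ("u1", 1, "home", False), ("u1", 2, "docs", True),
    ("u2", 1, "home", False), ("u2", 2, "pricing", False),
    ("u3", 1, "home", False), ("u3", 2, "about-us", False), ("u3", 3, "pricing", False),
    ("u4", 1, "home", False),
]

converted_users = {user_id for user_id, _, _, converted in events if converted}

# Walk through the events in time order; later events overwrite earlier ones,
# so each user ends up with the feature they used last.
last_feature = {}
for user_id, _, feature, _ in sorted(events, key=lambda e: e[1]):
    if user_id not in converted_users:
        last_feature[user_id] = feature

counts = Counter(last_feature.values())
total_count = len(last_feature)
drop_percentage = {
    feature: count / total_count * 100 for feature, count in counts.items()
}
print(sorted(drop_percentage.items(), key=lambda item: item[1], reverse=True))
```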

### Reference
* [bach.DataFrame.sort_values](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_values/)
* [bach.DataFrame.groupby](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/groupby/)
* [bach.SeriesString.to_json_array](https://objectiv.io/docs/modeling/bach/api-reference/Series/String/to_json_array)
* [bach.Series.materialize](https://objectiv.io/docs/modeling/bach/api-reference/Series/materialize/)
* [bach.DataFrame.count](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/count/)
* [bach.Series.value_counts](https://objectiv.io/docs/modeling/bach/api-reference/Series/value_counts/)
* [bach.Series.to_frame](https://objectiv.io/docs/modeling/bach/api-reference/Series/to_frame/)
* [bach.DataFrame.rename](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/rename/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

## Deep-dive: marketing campaign journeys

The same analyses can be run for journeys that start from a marketing campaign, e.g. to analyze why campaigns do or do not convert.

In [None]:
# first, add marketing data to the dataframe
df_marketing = df.copy()
df_marketing['utm_campaign'] = df_marketing.marketing.context.campaign
# filter the dataframe down to users that came in via a marketing campaign
user_list = df_marketing[~df_marketing['utm_campaign'].isnull()].user_id
df_marketing = df_marketing[df_marketing['user_id'].isin(user_list)]

df_marketing.head()
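
Note that the filter keeps all events of users who have at least one event with a campaign, not only the events that carry the campaign itself. A minimal plain-Python sketch of that two-step selection, on hypothetical data:

```python
# Hypothetical events: (user_id, feature, utm_campaign).
events = [
    ("u1", "home", "spring-launch"),
    ("u1", "docs", None),
    ("u2", "home", None),
    ("u3", "home", "newsletter"),
]

# Step 1: users with at least one event that has a campaign set.
campaign_users = {user_id for user_id, _, campaign in events if campaign is not None}

# Step 2: keep every event of those users, with or without a campaign.
marketing_events = [event for event in events if event[0] in campaign_users]
print(marketing_events)
```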

Let's define what you see as conversion events for these users. In this example, we'll again view someone as converted when they go on to read the documentation from our website, but you can [use any event](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/add_conversion_event/).

In [None]:
# define which data to use as conversion events; in this example, anyone who goes on to read the documentation
df_marketing['is_conversion_event'] = False
df_marketing.loc[df_marketing['application_id'] == 'objectiv-docs', 'is_conversion_event'] = True

In [None]:
# get converted and non converted users as dataframes
users_converted = df_marketing[df_marketing['is_conversion_event']].user_id
users_non_converted = df_marketing[~df_marketing['user_id'].isin(users_converted)].user_id

df_marketing_converted = df_marketing[df_marketing['is_conversion_event']]
df_marketing_non_converted = df_marketing[df_marketing['user_id'].isin(users_non_converted)] 

For an overall look, let's calculate the share of converted and non-converted users.

In [None]:
n_users_converted = df_marketing_converted['user_id'].unique().count().value
n_users_non_converted = df_marketing_non_converted['user_id'].unique().count().value
n_users_total = n_users_converted + n_users_non_converted

print(f'Converted users: {round((n_users_converted / n_users_total) * 100)}%\n\
Non-converted users: {round((n_users_non_converted / n_users_total) * 100)}%')
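
The total is valid because the two groups do not overlap: a user either has at least one conversion event or has none. A small plain-Python sketch with hypothetical per-user conversion flags shows the split and the resulting shares:

```python
# Hypothetical data: for each user, the is_conversion_event flag of their events.
user_events = {
    "u1": [False, True],
    "u2": [False, False],
    "u3": [False],
    "u4": [True],
}

# A user counts as converted when at least one of their events is a conversion.
converted = {user for user, flags in user_events.items() if any(flags)}
non_converted = set(user_events) - converted

n_users_total = len(converted) + len(non_converted)
share_converted = round(len(converted) / n_users_total * 100)
share_non_converted = round(len(non_converted) / n_users_total * 100)
print(f"Converted users: {share_converted}%")
print(f"Non-converted users: {share_non_converted}%")
```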

Now we're most interested in the large share of users who did not convert; let's have a look at them next.

### Top drop-off features for users from a marketing campaign
Similar to before, we will look at which features are used relatively often before users drop off, this time only for users who came from a marketing campaign.

In [None]:
# get the last used feature before dropping off, per non-converted user from a marketing campaign
drop_loc = df_marketing_non_converted.sort_values('moment').groupby('user_id')['feature_nice_name'].to_json_array().json[-1].materialize()
total_count = drop_loc.count().value

# show these features, sorted by their share of all drop-offs
drop_loc_percent = (drop_loc.value_counts() / total_count) * 100
drop_loc_percent = drop_loc_percent.to_frame().rename(columns={'value_counts': 'drop_percentage'})
drop_loc_percent.sort_values(by='drop_percentage', ascending=False).head()

### Visualize the sequences in a Sankey diagram for non-converted users from a marketing campaign

Similar to before, we can use a Sankey diagram to visualize the customer journeys, this time those of users who came from a marketing campaign and dropped off.

Remember that the width of each link represents the number of times that flow was used, and you can hover over each link to see the source and target node.

In [None]:
# for BigQuery: if the query is too complex, you can add a temporary table, e.g.
# df_marketing_non_converted = df_marketing_non_converted.materialize(materialization='temp_table')

max_steps = 4
df_steps = funnel.get_navigation_paths(df_marketing_non_converted, steps=max_steps, by='user_id')
funnel.plot_sankey_diagram(df_steps, n_top_examples=15)

### Reference
* [bach.DataFrame.copy](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/copy/)
* [using global context data](open-taxonomy-how-to.ipynb#Location-stack-&-global-contexts)
* [bach.Series.isnull](https://objectiv.io/docs/modeling/bach/api-reference/Series/isnull/)
* [bach.Series.isin](https://objectiv.io/docs/modeling/bach/api-reference/Series/isin/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)
* [bach.DataFrame.loc](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/loc/)
* [bach.Series.unique](https://objectiv.io/docs/modeling/bach/api-reference/Series/unique/)
* [bach.DataFrame.count](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/count/)
* [bach.DataFrame.sort_values](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_values/)
* [bach.DataFrame.groupby](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/groupby/)
* [bach.SeriesString.to_json_array](https://objectiv.io/docs/modeling/bach/api-reference/Series/String/to_json_array)
* [bach.Series.materialize](https://objectiv.io/docs/modeling/bach/api-reference/Series/materialize/)
* [bach.Series.value_counts](https://objectiv.io/docs/modeling/bach/api-reference/Series/value_counts/)
* [bach.Series.to_frame](https://objectiv.io/docs/modeling/bach/api-reference/Series/to_frame/)
* [bach.DataFrame.rename](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/rename/)

## Get the SQL for any analysis
The SQL for any analysis can be exported with one command, so you can use models in production directly to simplify data debugging & delivery to BI tools like Metabase, dbt, etc. See how you can [quickly create BI dashboards with this](https://objectiv.io/docs/home/try-the-demo#creating-bi-dashboards).

## Where to go next
Now that you've discovered the customer journeys that lead to conversion or drop-off, you can further analyze each of them to understand which ones could be optimized, or should get more/less focus. Another next step could be to have a more in-depth look at the marketing campaign data differences per source. 

See the [open taxonomy example](open-taxonomy-how-to.ipynb) for more on how to use open taxonomy based data, or have a look at the other example notebooks for other use cases.