This is one of the Objectiv example notebooks. For more examples, visit the [example notebooks](https://objectiv.io/docs/modeling/example-notebooks/) section of our docs. The notebooks can run with the demo data set that comes with our [quickstart](https://objectiv.io/docs/home/quickstart-guide/), but they can also be used on your own collected data.

All example notebooks are also available in our [quickstart](https://objectiv.io/docs/home/quickstart-guide/). With the quickstart you can spin up a fully functional Objectiv demo pipeline in five minutes. This also allows you to run these notebooks and experiment with them on a demo data set.

# Funnel Discovery

A funnel helps you better understand your users' journey through a product or website, and measure differences in user attributes or behavior at each step. This can help companies find potential flaws in that journey and increase the conversion rate.

In classical funnel analysis, where you pre-define the steps and only then start looking at the data, you can miss valuable insights. That is why it is important to look at funnels from different angles. In this notebook, we demonstrate how the model hub can be used for funnel discovery.

In particular, we will find:

- the most popular consecutive steps on our website,
- the user steps/flows which lead to conversion,
- the most common drop-off location, etc.

## Getting started
If you are running this example on your own collected data, [see the instructions here](https://objectiv.io/docs/modeling/get-started-in-your-notebook/) on how to set up the database connection and get started in your favorite notebook tool.

### Import the required packages for this notebook
The open model hub package can be installed with `pip install objectiv-modelhub` (this installs Bach as well).  
If you are running this notebook from our quickstart, the model hub and Bach are already installed, so you don't have to install them separately.

In [None]:
from modelhub import ModelHub

In [None]:
modelhub = ModelHub(time_aggregation='%Y-%m-%d')

In [None]:
df = modelhub.get_objectiv_dataframe(start_date='2022-02-02')

# select which event type to use for further analysis
df = df[df['event_type'] == 'PressEvent']

df['application'] = df.global_contexts.gc.application
df['feature_nice_name'] = df.location_stack.ls.nice_name

### Conversion events

Let's define conversion events in the Objectiv DataFrame.

In [None]:
df['is_conversion_event'] = False

# define which data to use as conversion events
df.loc[df['application'] == 'objectiv-docs', 'is_conversion_event'] = True

What percentage of converted users land on a given location?

In [None]:
total_n_users = df[df['is_conversion_event']]['user_id'].unique().count().value
top_conversion_locations = modelhub.agg.unique_users(df[df['is_conversion_event']], groupby='feature_nice_name')

# calculate the percentage
top_conversion_locations = (top_conversion_locations / total_n_users) * 100

# calling .to_frame() for nicer formatting
top_conversion_locations = top_conversion_locations.to_frame().rename(
    columns={'unique_users': 'converted_users_percentage'})

top_conversion_locations.sort_values(by='converted_users_percentage', ascending=False).head()
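
To make the arithmetic in the cell above explicit, here is a plain-Python sketch of the same calculation: the denominator is the number of users with at least one conversion event, and the numerator is the number of unique converting users per location. It does not use the model hub; the toy events and their `(user_id, feature_nice_name, is_conversion_event)` layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy events: (user_id, feature_nice_name, is_conversion_event).
events = [
    ("u1", "Docs link", True),
    ("u1", "Docs link", True),
    ("u2", "Docs link", True),
    ("u2", "Search", True),
    ("u3", "Pricing", False),
]


def converted_users_percentage(events):
    """Share of converting users (in %) who converted at each location."""
    # Users with at least one conversion event: the denominator (total_n_users).
    converted = {user for user, _, is_conv in events if is_conv}
    # Unique users per location, counting conversion events only.
    users_per_location = defaultdict(set)
    for user, location, is_conv in events:
        if is_conv:
            users_per_location[location].add(user)
    return {
        location: len(users) / len(converted) * 100
        for location, users in users_per_location.items()
    }


result = converted_users_percentage(events)
# Sort descending, like sort_values(..., ascending=False).
for location, pct in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{location}: {pct:.1f}%")
```

Because one user can convert at several locations, the percentages can add up to more than 100%.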

### Get users' consecutive steps

Now let's calculate the consecutive steps of the users on our website (we have to specify the maximum number of steps).

In [None]:
funnel = modelhub.get_funnel_discovery()

In [None]:
# find the sequences of up to max_steps consecutive locations that each user followed
max_steps = 4
df_steps = funnel.get_navigation_paths(df, steps=max_steps, by='user_id')
df_steps.head()
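
The idea behind `get_navigation_paths` can be illustrated with a sliding window over each user's time-ordered locations. This is a plain-Python sketch, not the model hub implementation; the toy events are hypothetical, and padding shorter tails with `None` is an assumption of this sketch that may differ from how the model hub treats paths near the end of a user's history.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy events: (user_id, moment, feature_nice_name).
events = [
    ("u1", 1, "Home"), ("u1", 2, "About"), ("u1", 3, "Docs"),
    ("u2", 1, "Home"), ("u2", 2, "Docs"),
]


def navigation_paths(events, steps):
    """For every event, the location sequence of up to `steps` events starting there."""
    paths = []
    # Order by user, then by time, and treat each user's events separately.
    ordered = sorted(events, key=itemgetter(0, 1))
    for _, user_events in groupby(ordered, key=itemgetter(0)):
        locations = [location for _, _, location in user_events]
        for start in range(len(locations)):
            window = locations[start:start + steps]
            # Pad short tails with None (assumption of this sketch).
            paths.append(tuple(window + [None] * (steps - len(window))))
    return paths


for path in navigation_paths(events, steps=2):
    print(path)
```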

### Top step sequences

To get the bigger picture, it is useful to see which consecutive steps are the most frequent among our users.

In [None]:
df_steps.value_counts().to_frame().head(20)
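
Counting identical step sequences is a frequency count over tuples. As a plain-Python analogy to `value_counts()` (not the Bach implementation), `collections.Counter.most_common` returns sequences from most to least frequent; the sequences below are hypothetical.

```python
from collections import Counter

# Hypothetical step sequences, e.g. produced by a sliding window over each user's events.
paths = [
    ("Home", "Docs"), ("Home", "Docs"), ("Home", "About"),
    ("About", "Docs"), ("Home", "Docs"),
]

# Count identical sequences and list them from most to least frequent,
# analogous to df_steps.value_counts().head(20).
top_sequences = Counter(paths).most_common(20)
for sequence, count in top_sequences:
    print(count, " -> ".join(sequence))
```

Ties keep the order in which sequences were first encountered, so the order among equally frequent sequences carries no meaning.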

### Converting step sequences

Now let's find the sequences that lead to conversion.

In [None]:
# first, add a column with the number of the first conversion step in each path
df_first_conversion_step = funnel.get_navigation_paths(df, steps=max_steps, by='user_id', add_conversion_step_column=True)

df_first_conversion_step.head()

In [None]:
# keep only converted paths, up to their first conversion step
df_steps_till_conversion = funnel.get_navigation_paths(df, steps=max_steps, by='user_id', add_conversion_step_column=True, only_converted_paths=True)
df_steps_till_conversion.head()

In [None]:
# now keep only the paths whose first conversion happened at step 4
condition = df_steps_till_conversion['_first_conversion_step_number'] == 4
df_steps_till_conversion[condition].head()
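
The logic of the three cells above can be sketched in plain Python: find the 1-based number of the first converting step in each path, drop paths that never convert, cut the rest off after their first conversion, and then filter on that step number. This only mirrors the idea behind `_first_conversion_step_number` and `only_converted_paths`; the paths and the `(location, is_conversion_event)` step layout are hypothetical.

```python
# Hypothetical paths: each step is (location, is_conversion_event).
paths = [
    [("Home", False), ("About", False), ("Blog", False), ("Docs", True)],
    [("Home", False), ("Docs", True), ("Docs", True), ("Blog", False)],
    [("Home", False), ("About", False), ("Blog", False), ("Jobs", False)],
]


def first_conversion_step(path):
    """1-based number of the first converting step, or None if the path never converts."""
    for number, (_, is_conversion) in enumerate(path, start=1):
        if is_conversion:
            return number
    return None


def converted_paths(paths):
    """Keep only converting paths, truncated right after their first conversion."""
    result = []
    for path in paths:
        step = first_conversion_step(path)
        if step is not None:
            result.append((step, [location for location, _ in path[:step]]))
    return result


# Analogous to filtering on _first_conversion_step_number == 4.
converted_on_4th = [locations for step, locations in converted_paths(paths) if step == 4]
print(converted_on_4th)
```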

### User flow visualization

Let's use a Sankey diagram to visualize our users' journeys on our website (the flows between location stacks).
The width of each link represents the size of the flow. Hover the mouse over a link to see its source and target node.

In [None]:
funnel.plot_sankey_diagram(df_steps_till_conversion[condition], n_top_examples=15)
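
The link widths of a Sankey diagram come from counting transitions between adjacent steps. The plain-Python sketch below builds weighted `(source, target)` links from step sequences; it assumes (not confirmed here) that `n_top_examples` selects the n most frequent paths before the links are built, and the paths are hypothetical.

```python
from collections import Counter

# Hypothetical converted paths (each a sequence of locations, padded with None).
paths = [
    ("Home", "About", "Docs"),
    ("Home", "About", "Docs"),
    ("Home", "Blog", "Docs"),
    ("Blog", "Docs", None),
]


def sankey_links(paths, n_top_examples):
    """Weighted (source, target) links between consecutive steps of the most frequent paths."""
    links = Counter()
    # Assumption: only the n most frequent paths contribute to the diagram.
    for path, count in Counter(paths).most_common(n_top_examples):
        # Drop padding so a path that ends early does not produce a link to None.
        steps = [location for location in path if location is not None]
        for source, target in zip(steps, steps[1:]):
            # Link width is proportional to how many paths take this transition.
            links[(source, target)] += count
    return links


print(sankey_links(paths, n_top_examples=15))
```

This sketch merges a location into a single node regardless of the step at which it occurs; a diagram that keeps one node per step position would key the links on `(step_number, location)` instead.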

# Deep dive into step details

## Top drop-off locations

In [None]:
# select only non-conversion events
df_non_converted = df[~df['is_conversion_event']]
converted_users = df[df['is_conversion_event']]['user_id']

# keep only the events of users who never converted
df_non_converted = df_non_converted[~df_non_converted['user_id'].isin(converted_users)]
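
The second filter in the cell above matters: dropping conversion events alone would still keep the pre-conversion events of users who did convert. A plain-Python sketch of the user-level filter, with hypothetical toy events:

```python
# Hypothetical toy events: (user_id, feature_nice_name, is_conversion_event).
events = [
    ("u1", "Home", False), ("u1", "Docs", True),
    ("u2", "Home", False), ("u2", "Pricing", False),
    ("u3", "Blog", False),
]

# Users with at least one conversion event anywhere in their history.
converted_users = {user for user, _, is_conv in events if is_conv}

# Keep only events of users who never converted; filtering on the event flag alone
# would still leave u1's earlier "Home" event in the result.
non_converted_events = [e for e in events if e[0] not in converted_users]
print(non_converted_events)
```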

In [None]:
# the last location before leaving the website
drop_loc = df_non_converted.sort_values('moment').groupby('user_id')['feature_nice_name'].to_json_array().json[-1].materialize()
total_count = drop_loc.count().value

In [None]:
# calculate the percentage
drop_loc_percent = (drop_loc.value_counts() / total_count) * 100
drop_loc_percent = drop_loc_percent.to_frame().rename(columns={'value_counts': 'drop_percentage'})
drop_loc_percent.sort_values(by='drop_percentage', ascending=False).head()
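
The drop-off calculation takes each user's last location by time and expresses the location counts as a share of all users. A plain-Python sketch of that idea (not the Bach `to_json_array().json[-1]` implementation), with hypothetical toy events:

```python
from collections import Counter

# Hypothetical events of never-converting users: (user_id, moment, feature_nice_name).
events = [
    ("u1", 3, "Pricing"), ("u1", 1, "Home"),
    ("u2", 1, "Home"), ("u2", 2, "Blog"),
    ("u3", 5, "Pricing"),
]


def drop_off_percentages(events):
    """Percentage of users whose last recorded location was each location."""
    last_location = {}
    # Sorting by moment means later events overwrite earlier ones per user.
    for user, _, location in sorted(events, key=lambda e: e[1]):
        last_location[user] = location
    counts = Counter(last_location.values())
    total = len(last_location)
    return {location: n / total * 100 for location, n in counts.most_common()}


print(drop_off_percentages(events))
```

If a user has two events with the same `moment`, which one counts as "last" is arbitrary, both here and in any ordering-based approach.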

## Marketing campaign

In [None]:
# get marketing data
df_marketing = df.copy()
df_marketing['utm_campaign'] = df_marketing.global_contexts.gc.get_from_context_with_type_series(type='MarketingContext', key='campaign')

# get all users with at least one event from a marketing campaign
user_list = df_marketing[~df_marketing['utm_campaign'].isnull()].user_id
# keep all events of the users who had at least one non-null utm_campaign value
df_marketing = df_marketing[df_marketing['user_id'].isin(user_list)]
df_marketing.head()
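
The filter above works at the user level: any user with at least one campaign-tagged event is kept, together with all of their events, including the untagged ones. A plain-Python sketch with hypothetical events and campaign names:

```python
# Hypothetical events: (user_id, utm_campaign or None, feature_nice_name).
events = [
    ("u1", "spring_launch", "Home"), ("u1", None, "Docs"),
    ("u2", None, "Home"),
    ("u3", None, "Blog"), ("u3", "newsletter", "Pricing"),
]

# Users with at least one event carrying a campaign value.
campaign_users = {user for user, campaign, _ in events if campaign is not None}

# Keep all events of those users, including the ones without a campaign value,
# so their full journey is preserved.
marketing_events = [e for e in events if e[0] in campaign_users]
print(marketing_events)
```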

Let's define conversion events for marketing data.

In [None]:
df_marketing['is_conversion_event'] = False

# define which data to use as conversion events
df_marketing.loc[df_marketing['application'] == 'objectiv-docs', 'is_conversion_event'] = True

In [None]:
# split the marketing data into conversion events and the events of users who never converted
users_converted = df_marketing[df_marketing['is_conversion_event']].user_id
users_non_converted = df_marketing[~df_marketing['user_id'].isin(users_converted)].user_id

df_marketing_converted = df_marketing[df_marketing['is_conversion_event']]
df_marketing_non_converted = df_marketing[df_marketing['user_id'].isin(users_non_converted)] 

Let's calculate the percentage of converted and non-converted users.

In [None]:
n_users_converted = df_marketing_converted['user_id'].unique().count().value
n_users_non_converted = df_marketing_non_converted['user_id'].unique().count().value
n_users_total = n_users_converted + n_users_non_converted

print(f'Converted users: {round((n_users_converted / n_users_total) * 100)}%')
print(f'Non-converted users: {round((n_users_non_converted / n_users_total) * 100)}%')
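
The percentages above are computed from unique user counts and then rounded. Note that Python's built-in `round` uses round-half-to-even, so an exact 12.5% is printed as 12%. A small sketch with hypothetical counts:

```python
def conversion_shares(n_converted, n_non_converted):
    """Rounded percentages of converted and non-converted users."""
    total = n_converted + n_non_converted
    return (
        round(n_converted / total * 100),
        round(n_non_converted / total * 100),
    )


# Hypothetical counts: 1 of 8 users converted, i.e. 12.5% and 87.5% before rounding.
print(conversion_shares(1, 7))
```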

Now let's focus on non-converted users data.

#### Top drop-off locations for non-converted users from marketing campaign

In [None]:
drop_loc = df_marketing_non_converted.sort_values('moment').groupby('user_id')['feature_nice_name'].to_json_array().json[-1].materialize()
total_count = drop_loc.count().value

drop_loc_percent = (drop_loc.value_counts() / total_count) * 100
drop_loc_percent = drop_loc_percent.to_frame().rename(columns={'value_counts': 'drop_percentage'})
drop_loc_percent.sort_values(by='drop_percentage', ascending=False).head()

#### User journey for non-converted users from marketing campaign

In [None]:
max_steps = 4
df_steps = funnel.get_navigation_paths(df_marketing_non_converted, steps=max_steps, by='user_id')
funnel.plot_sankey_diagram(df_steps, n_top_examples=20)

## Next steps

This was a basic demonstration of how the open model hub can be used for funnel discovery. A possible next step is an in-depth look at how the marketing campaign data differs per source, and at how user journeys differ between converted and dropped-off users.