This is one of the Objectiv [example notebooks](https://objectiv.io/docs/modeling/example-notebooks/). These notebooks can run [on your own data](https://objectiv.io/docs/modeling/get-started-in-your-notebook/), or you can instead run the [Demo](https://objectiv.io/docs/home/try-the-demo/) to quickly try them out.

# Product analytics

This example notebook shows how you can easily do basic product analytics on your data. [See here how to get started in your notebook](https://objectiv.io/docs/modeling/get-started-in-your-notebook/).

## Get started
We first have to instantiate the model hub and an Objectiv DataFrame object.

In [None]:
# set the timeframe of the analysis
start_date = '2022-03-01'
end_date = None

In [None]:
from modelhub import ModelHub, display_sql_as_markdown
from datetime import datetime

# instantiate the model hub and set the default time aggregation to daily
# and set the global contexts that will be used in this example
modelhub = ModelHub(time_aggregation='%Y-%m-%d', global_contexts=['application'])
# get a Bach DataFrame with Objectiv data within a defined timeframe
df = modelhub.get_objectiv_dataframe(start_date=start_date, end_date=end_date)

The `location_stack` column, and the columns taken from the global contexts, contain most of the event-specific data. These columns are JSON typed, and we can extract data from them using the keys of the JSON objects with [`SeriesLocationStack`](https://objectiv.io/docs/modeling/open-model-hub/api-reference/SeriesLocationStack/SeriesLocationStack/) methods, or the `context` accessor for global context columns. See the [open taxonomy example](open-taxonomy-how-to.ipynb#Location-stack-&-global-contexts) for how to use the `location_stack` and global contexts.

In [None]:
df['application_id'] = df.application.context.id
df['feature_nice_name'] = df.location_stack.ls.nice_name
df['root_location'] = df.location_stack.ls.get_from_context_with_type_series(type='RootLocationContext', key='id')

### Reference
* [modelhub.ModelHub](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/ModelHub/)
* [modelhub.ModelHub.get_objectiv_dataframe](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/get_objectiv_dataframe/)
* [using global context data](open-taxonomy-how-to.ipynb#Location-stack-&-global-contexts)
* [modelhub.SeriesLocationStack.ls](https://objectiv.io/docs/modeling/open-model-hub/api-reference/SeriesLocationStack/ls/)

### Have a look at the data

In [None]:
# sort by session and hit number, latest first
df.sort_values(['session_id', 'session_hit_number'], ascending=False).head()

In [None]:
# explore the data with describe
df.describe(include='all').head()

### Reference
* [bach.DataFrame.sort_values](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_values/)
* [bach.DataFrame.describe](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/describe/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

Next we'll go through a selection of product analytics metrics. We can use models from the [open model hub](https://objectiv.io/docs/modeling/open-model-hub/), or use the [modeling library Bach](https://objectiv.io/docs/modeling/bach/) to run data analyses directly on the data store, with Pandas-like syntax.

For each example, [`head()`](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/), [`to_pandas()`](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/to_pandas/) or [`to_numpy()`](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/to_numpy/) can be used to execute the generated SQL and get the results in your notebook.

## Unique users
Let's see the number of unique users over time, with the [unique_users](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/unique_users/) model. By default it will use the `time_aggregation` set when the model hub was instantiated, in this case '%Y-%m-%d', so daily. For `monthly_users`, the default time_aggregation is overridden by using a different `groupby` argument.

In [None]:
# unique users, monthly
monthly_users = modelhub.aggregate.unique_users(df, groupby=modelhub.time_agg(df, '%Y-%m'))
monthly_users.sort_index(ascending=False).head()

In [None]:
# unique users, daily
daily_users = modelhub.aggregate.unique_users(df)
daily_users.sort_index(ascending=False).head(10)
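
To make the effect of the time aggregation concrete, here is a minimal sketch in plain Python that needs no database. It is not how the `unique_users` model is implemented; the `events` list and the `unique_users_by` helper are hypothetical and only illustrate how a `strftime`-style format such as `'%Y-%m-%d'` or `'%Y-%m'` determines the bucket in which distinct users are counted.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (user_id, moment) events; not taken from the Objectiv demo data.
events = [
    ("u1", datetime(2022, 3, 1, 9, 0)),
    ("u2", datetime(2022, 3, 1, 10, 30)),
    ("u1", datetime(2022, 3, 2, 8, 15)),
    ("u3", datetime(2022, 4, 5, 14, 0)),
    ("u1", datetime(2022, 4, 5, 16, 45)),
]


def unique_users_by(events, time_format):
    """Count distinct users per bucket, where the bucket is moment.strftime(time_format)."""
    users_per_bucket = defaultdict(set)
    for user_id, moment in events:
        users_per_bucket[moment.strftime(time_format)].add(user_id)
    return {bucket: len(users) for bucket, users in sorted(users_per_bucket.items())}


daily = unique_users_by(events, "%Y-%m-%d")
monthly = unique_users_by(events, "%Y-%m")
print(daily)
print(monthly)
```

In this toy data the monthly count for March is lower than the sum of the daily counts for March, because a user who is active on several days is still counted once per month.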

To see the number of users per main product section, group by its [root_location](https://objectiv.io/docs/taxonomy/reference/location-contexts/RootLocationContext).

In [None]:
# unique users, per main product section
users_root = modelhub.aggregate.unique_users(df, groupby=['application_id', 'root_location'])
users_root.sort_index(ascending=False).head(10)

### Reference
* [modelhub.Aggregate.unique_users](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/unique_users/)
* [bach.DataFrame.sort_index](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_index/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

## Retention

To measure how well we are doing at keeping users with us after their first interaction, we can use a retention matrix.

To calculate the retention matrix, we need to distribute the users into mutually exclusive cohorts based on the `time_period` (can be `daily`, `weekly`, `monthly`, or `yearly`) in which they first interacted.

In the retention matrix:
- each row represents a cohort;
- each column represents a time range, where time is calculated with respect to the cohort start time;
- the values of the matrix elements are the number or percentage (depending on the `percentage` parameter) of users in a given cohort that returned in a given time range.

The users' activity is counted from the `start_date` specified when the Objectiv DataFrame was created.

In [None]:
# retention matrix, monthly, with percentages
retention_matrix = modelhub.aggregate.retention_matrix(df, time_period='monthly', percentage=True, display=True)
retention_matrix.head()
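
The cohort logic described above can be sketched in plain Python. This is an illustration under assumptions, not the `retention_matrix` model itself: the `events` data and the helper names are hypothetical, only monthly cohorts are handled, and the layout of the result (a nested dict keyed by cohort and month offset) is a choice made for this sketch.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (user_id, moment) events.
events = [
    ("u1", datetime(2022, 3, 3)),
    ("u1", datetime(2022, 4, 10)),
    ("u2", datetime(2022, 3, 20)),
    ("u3", datetime(2022, 4, 2)),
    ("u3", datetime(2022, 5, 7)),
    ("u1", datetime(2022, 5, 9)),
]


def month_index(moment):
    """Map a timestamp to a running month number, so offsets are simple differences."""
    return moment.year * 12 + moment.month


def retention_matrix(events, percentage=False):
    # Each user belongs to exactly one cohort: the month of their first event.
    first_month = {}
    for user_id, moment in events:
        idx = month_index(moment)
        first_month[user_id] = min(first_month.get(user_id, idx), idx)

    # Collect the distinct users seen per (cohort, months since cohort start).
    active = defaultdict(set)
    for user_id, moment in events:
        cohort = first_month[user_id]
        active[(cohort, month_index(moment) - cohort)].add(user_id)

    matrix = {}
    for (cohort, offset), users in sorted(active.items()):
        cohort_size = len(active[(cohort, 0)])
        value = 100 * len(users) / cohort_size if percentage else len(users)
        label = f"{(cohort - 1) // 12}-{(cohort - 1) % 12 + 1:02d}"
        matrix.setdefault(label, {})[offset] = value
    return matrix


counts = retention_matrix(events)
percentages = retention_matrix(events, percentage=True)
print(counts)
print(percentages)
```

Each outer key is a cohort (row) and each inner key is the number of months since the cohort started (column), which mirrors the row and column meaning listed above.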

### Drilling down retention cohorts

In the retention matrix above, we can see that the second cohort retains noticeably fewer users in its next month than the first cohort does. We can zoom in on the individual cohorts to see how they differ.

In [None]:
# calculate the first cohort
cohorts = df[['user_id', 'moment']].groupby('user_id')['moment'].min().reset_index()
cohorts = cohorts.rename(columns={'moment': 'first_cohort'})

# add first cohort of the users to our DataFrame
df_with_cohorts = df.merge(cohorts, on='user_id')

In [None]:
# filter data where users belong to the #0 cohort
cohort0_filter = (df_with_cohorts['first_cohort'] >= datetime(2022, 3, 1)) & (df_with_cohorts['first_cohort'] < datetime(2022, 4, 1))
df_with_cohorts[cohort0_filter]['event_type'].value_counts().head()

In [None]:
# filter data where users belong to the #1 cohort (the problematic one)
cohort1_filter = (df_with_cohorts['first_cohort'] >= datetime(2022, 4, 1)) & (df_with_cohorts['first_cohort'] < datetime(2022, 5, 1))
df_with_cohorts[cohort1_filter]['event_type'].value_counts().head()

One interesting thing to note here, for example, is that there are relatively more [`VisibleEvents`](https://objectiv.io/docs/taxonomy/reference/events/VisibleEvent) in the first cohort than in the second 'problematic' one.

This is just a simple example to demonstrate the differences you can find between cohorts. You could run other models like [top product features](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/top_product_features/), or develop more in-depth analyses.
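
Because cohorts differ in size, raw event counts are easier to compare once they are turned into shares. The sketch below shows one way to do that in plain Python; the two event lists are made-up numbers, not results from the demo data, and `event_shares` is a hypothetical helper.

```python
from collections import Counter

# Made-up event types per cohort, purely for illustration.
cohort0_events = ["VisibleEvent"] * 6 + ["PressEvent"] * 3 + ["HiddenEvent"] * 1
cohort1_events = ["VisibleEvent"] * 2 + ["PressEvent"] * 6 + ["HiddenEvent"] * 2


def event_shares(event_types):
    """Return the fraction of all events that each event type accounts for."""
    counts = Counter(event_types)
    total = sum(counts.values())
    return {event_type: count / total for event_type, count in counts.most_common()}


shares0 = event_shares(cohort0_events)
shares1 = event_shares(cohort1_events)
print(shares0)
print(shares1)
```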

### Reference
* [modelhub.Aggregate.retention_matrix](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/retention_matrix/)
* [bach.DataFrame.groupby](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/groupby/)
* [bach.DataFrame.min](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/min/)
* [bach.DataFrame.reset_index](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/reset_index/)
* [bach.DataFrame.rename](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/rename/)
* [bach.DataFrame.merge](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/merge/)
* [bach.DataFrame.value_counts](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/value_counts/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

## Time spent (aka duration)
Here we calculate the average duration of a user's session, using the [session_duration model](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/session_duration/).

In [None]:
# duration, monthly average
duration_monthly = modelhub.aggregate.session_duration(df, groupby=modelhub.time_agg(df, '%Y-%m'))
duration_monthly.sort_index(ascending=False).head()

In [None]:
# duration, daily average
duration_daily = modelhub.aggregate.session_duration(df)
duration_daily.sort_index(ascending=False).head()
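
As a rough illustration of what an average session duration means, the sketch below computes it in plain Python. It assumes that a session's duration is the time between its first and last hit, and that a bounce is a session with zero duration; both are assumptions made for this sketch rather than statements about the model's internals, and the `hits` data and helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical (session_id, moment) hits.
hits = [
    ("s1", datetime(2022, 3, 1, 9, 0, 0)),
    ("s1", datetime(2022, 3, 1, 9, 4, 0)),
    ("s2", datetime(2022, 3, 1, 11, 0, 0)),  # single hit: treated as a bounce here
    ("s3", datetime(2022, 3, 2, 15, 0, 0)),
    ("s3", datetime(2022, 3, 2, 15, 10, 0)),
]


def session_durations(hits):
    """Duration per session, assumed to be last hit minus first hit."""
    first_last = {}
    for session_id, moment in hits:
        first, last = first_last.get(session_id, (moment, moment))
        first_last[session_id] = (min(first, moment), max(last, moment))
    return {s: last - first for s, (first, last) in first_last.items()}


def average_duration(hits, exclude_bounces=True):
    """Mean session duration, optionally ignoring zero-duration sessions."""
    durations = list(session_durations(hits).values())
    if exclude_bounces:
        durations = [d for d in durations if d > timedelta(0)]
    if not durations:
        return None
    return sum(durations, timedelta(0)) / len(durations)


print(average_duration(hits))
print(average_duration(hits, exclude_bounces=False))
```

Including bounces pulls the average down, which is why the choice matters when comparing numbers across analyses.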

To see the average time spent by users in each main product section (per month in this case), group by its [root_location](https://objectiv.io/docs/taxonomy/reference/location-contexts/RootLocationContext).

In [None]:
# duration, monthly average per root_location
duration_root_month = modelhub.aggregate.session_duration(df, groupby=['application_id', 'root_location', modelhub.time_agg(df, '%Y-%m')]).sort_index()
duration_root_month.head(10)

In [None]:
# how is the overall time spent distributed?
session_duration = modelhub.aggregate.session_duration(df, groupby='session_id', exclude_bounces=False)
# materialize first: the Series expression already contains an aggregation, and aggregating an aggregate is not allowed
session_duration.materialize().quantile(q=[0.25, 0.50, 0.75]).head()
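
For intuition about what the quartiles of per-session durations look like, here is a small standalone sketch using Python's standard library. The durations are made up, and `statistics.quantiles` with `method='inclusive'` is used as a stand-in that interpolates linearly between data points; it is not claimed to match the SQL that Bach generates.

```python
import statistics

# Made-up session durations in seconds, including one zero-duration session.
durations_seconds = [0, 60, 120, 180, 240]

# n=4 yields the three cut points between quartiles: 25%, 50%, and 75%.
q25, q50, q75 = statistics.quantiles(durations_seconds, n=4, method="inclusive")
print(q25, q50, q75)
```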

### Reference
* [modelhub.Aggregate.session_duration](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/session_duration/)
* [bach.DataFrame.sort_index](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_index/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)
* [bach.DataFrame.groupby](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/groupby/)
* [bach.DataFrame.materialize](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/materialize/)

## Top used product features

To see which features are most used, we can use the [top_product_features model](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/top_product_features/). 

In [None]:
# see top used product features - by default we select only user actions (InteractiveEvents)
top_product_features = modelhub.aggregate.top_product_features(df)
top_product_features.head()

### Top used features per product area
We also want to look at which features were used most in our top product areas.

In [None]:
# select only user actions, so stack_event_types must contain 'InteractiveEvent'
interactive_events = df[df.stack_event_types.json.array_contains('InteractiveEvent')]
# from these interactions, get the number of unique users per application_id, root_location, feature, and event type.
top_interactions = modelhub.agg.unique_users(interactive_events, groupby=['application_id','root_location','feature_nice_name', 'event_type'])
top_interactions = top_interactions.reset_index()

In [None]:
# let's look at the homepage on our website
home_users = top_interactions[(top_interactions.application_id == 'objectiv-website') &
                              (top_interactions.root_location == 'home')]
home_users.sort_values('unique_users', ascending=False).head()

From the same `top_interactions` object, we can see the top used features on our documentation, which is a separate application.

In [None]:
# see the top used features on our documentation application
docs_users = top_interactions[top_interactions.application_id == 'objectiv-docs']
docs_users.sort_values('unique_users', ascending=False).head()

### Reference
* [modelhub.Aggregate.top_product_features](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/top_product_features/)
* [bach.SeriesJson.json.array_contains](https://objectiv.io/docs/modeling/bach/api-reference/Series/Json/json/#array_contains)
* [modelhub.Aggregate.unique_users](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/unique_users/)
* [bach.DataFrame.reset_index](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/reset_index/)
* [bach.DataFrame.sort_values](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_values/)

## Conversions
Users have an impact on product goals, e.g. conversion to a signup. Here we look at their conversion to such goals. First you define a conversion event, which in this example we've defined as clicking a link to our GitHub repo.

In [None]:
# create a column that extracts all location stacks that lead to our GitHub repo
df['github_press'] = df.location_stack.json[{'id': 'objectiv-on-github', '_type': 'LinkContext'}:]
df.loc[df.location_stack.json[{'id': 'github', '_type': 'LinkContext'}:]!=[],'github_press'] = df.location_stack
# define which events to use as conversion events
modelhub.add_conversion_event(location_stack=df.github_press,
                              event_type='PressEvent',
                              name='github_press')

This conversion event can then be used by several models using the defined name ('github_press'). First we calculate the number of unique converted users.

In [None]:
# number of conversions, daily
df['is_conversion_event'] = modelhub.map.is_conversion_event(df, 'github_press')
conversions = modelhub.aggregate.unique_users(df[df.is_conversion_event])
conversions.to_frame().sort_index(ascending=False).head(10)

### Conversion rate
To calculate the daily conversion rate, we divide the daily number of converted users by the `daily_users` Series created earlier.

In [None]:
# conversion rate, daily
conversion_rate = conversions / daily_users
conversion_rate.sort_index(ascending=False).head(10)
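
The division above lines up both inputs by day. A minimal plain-Python sketch of that idea follows; the numbers are made up, `conversion_rate_by_day` is a hypothetical helper, and returning `None` for days without any conversion is a choice made for this sketch, not a description of how Bach treats missing index values.

```python
# Made-up daily counts, keyed by day.
daily_users_example = {"2022-03-01": 40, "2022-03-02": 50, "2022-03-03": 20}
conversions_example = {"2022-03-01": 4, "2022-03-02": 10}


def conversion_rate_by_day(conversions, daily_users):
    """Converted users divided by unique users, matched on the same day."""
    rates = {}
    for day in sorted(daily_users):
        converted = conversions.get(day)
        rates[day] = None if converted is None else converted / daily_users[day]
    return rates


rates = conversion_rate_by_day(conversions_example, daily_users_example)
print(rates)
```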

### Features before conversion
We can calculate what users did _before_ converting.

In [None]:
# features used before users converted
top_features_before_conversion = modelhub.agg.top_product_features_before_conversion(df, name='github_press')
top_features_before_conversion.head()

### Exact features that converted
Let's understand which product features actually triggered the conversion.

In [None]:
# features that triggered the conversion
conversion_locations = modelhub.agg.unique_users(df[df.is_conversion_event], 
                                                 groupby=['application_id', 'feature_nice_name', 'event_type'])
conversion_locations.sort_values(ascending=False).to_frame().head()

### Time spent before conversion
Finally, let's see how much time converted users spent before they converted.

In [None]:
# label sessions with a conversion
df['converted_users'] = modelhub.map.conversions_counter(df, name='github_press') >= 1

# label hits where at that point in time, there are 0 conversions in the session
df['zero_conversions_at_moment'] = modelhub.map.conversions_in_time(df, 'github_press') == 0

# filter on above created labels
converted_users = df[(df.converted_users & df.zero_conversions_at_moment)]

# how much time do users spend before they convert?
modelhub.aggregate.session_duration(converted_users, groupby=None).to_frame().head()
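
The two labels used above can be illustrated with a plain-Python sketch. It assumes that the running conversion count at a hit includes that hit itself, so the converting hit is excluded by the `== 0` filter; this is an assumption for illustration, and the `hits` data and the `time_before_conversion` helper are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical (session_id, moment, is_conversion_event) hits.
hits = [
    ("s1", datetime(2022, 3, 1, 9, 0), False),
    ("s1", datetime(2022, 3, 1, 9, 3), False),
    ("s1", datetime(2022, 3, 1, 9, 5), True),    # the conversion
    ("s1", datetime(2022, 3, 1, 9, 20), False),  # after converting: ignored
    ("s2", datetime(2022, 3, 1, 10, 0), False),  # session without conversion
    ("s2", datetime(2022, 3, 1, 10, 8), False),
]


def time_before_conversion(hits):
    """Per converted session: time between the first and last hit before converting."""
    by_session = {}
    for session_id, moment, is_conversion in sorted(hits, key=lambda h: (h[0], h[1])):
        by_session.setdefault(session_id, []).append((moment, is_conversion))

    result = {}
    for session_id, session_hits in by_session.items():
        # Keep only sessions that contain at least one conversion.
        if not any(is_conversion for _, is_conversion in session_hits):
            continue
        conversions_so_far = 0
        before = []
        for moment, is_conversion in session_hits:
            conversions_so_far += int(is_conversion)  # running count, this hit included
            if conversions_so_far == 0:
                before.append(moment)
        if before:
            result[session_id] = max(before) - min(before)
    return result


before_conversion = time_before_conversion(hits)
print(before_conversion)
```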

### Reference
* [bach.SeriesJson.json](https://objectiv.io/docs/modeling/bach/api-reference/Series/Json/json/)
* [modelhub.ModelHub.add_conversion_event](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/add_conversion_event/)
* [modelhub.Map.is_conversion_event](https://objectiv.io/docs/modeling/open-model-hub/models/helper-functions/is_conversion_event/)
* [modelhub.Aggregate.unique_users](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/unique_users/)
* [bach.Series.to_frame](https://objectiv.io/docs/modeling/bach/api-reference/Series/to_frame/)
* [bach.DataFrame.sort_index](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_index/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)
* [modelhub.Aggregate.top_product_features_before_conversion](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/top_product_features_before_conversion/)
* [modelhub.Map.conversions_counter](https://objectiv.io/docs/modeling/open-model-hub/models/helper-functions/conversions_counter/)
* [modelhub.Map.conversions_in_time](https://objectiv.io/docs/modeling/open-model-hub/models/helper-functions/conversions_in_time/)
* [modelhub.Aggregate.session_duration](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/session_duration/)

## Funnel Discovery
To analyze the paths that users take that impact your product goals, have a look at the [Funnel Discovery notebook](./funnel-discovery.ipynb).

## Marketing analysis
To analyze the above metrics and more for users coming from marketing efforts, have a look at the [Marketing Analytics notebook](./marketing-analytics.ipynb).

## Get the SQL for any analysis
The SQL for any analysis can be exported with one command, so you can use models in production directly to simplify data debugging & delivery to BI tools like Metabase, dbt, etc. See how you can [quickly create BI dashboards with this](https://objectiv.io/docs/home/try-the-demo#creating-bi-dashboards).

In [None]:
# show SQL for analysis; this is just one example, and works for any Objectiv model/analysis
display_sql_as_markdown(conversions)