This is one of the Objectiv example notebooks. For more examples, visit the 
[example notebooks](https://objectiv.io/docs/modeling/example-notebooks/) section of our docs. The notebooks can run with the demo data set that comes with our [quickstart](https://objectiv.io/docs/home/try-the-demo/), but can also be run on your own collected data.

All example notebooks are also available in our [quickstart](https://objectiv.io/docs/home/try-the-demo/). With the quickstart you can spin up a fully functional Objectiv demo pipeline in five minutes. This also allows you to run these notebooks and experiment with them on a demo data set.

# Intro
In this notebook, we briefly demonstrate how you can use pre-built models from the [open model hub](https://objectiv.io/docs/modeling/) in conjunction with our modeling library [Bach](https://objectiv.io/docs/modeling/bach/) to quickly build model stacks to answer common product analytics questions.

This example uses real, unaltered data that was collected from https://objectiv.io/ with Objectiv’s Tracker. All models in the open model hub are compatible with data sets that have been validated against the [open analytics taxonomy](https://objectiv.io/docs/taxonomy/).

For an overview of all available models, check out the [open model hub docs](https://objectiv.io/docs/modeling/open-model-hub/models/).

## Getting started
If you are running this example on your own collected data, [see the instructions here](https://objectiv.io/docs/modeling/get-started-in-your-notebook/) on how to set up the database connection and get started in your favorite notebook tool.

### Import the required packages for this notebook
The open model hub package can be installed with `pip install objectiv-modelhub` (this installs Bach as well).  
If you are running this notebook from our quickstart, the model hub and Bach are already installed, so you don't have to install them separately.

In [None]:
from modelhub import ModelHub, display_sql_as_markdown

# Instantiate the model hub object
As a first step, the model hub object is instantiated. The model hub contains a collection of data models and convenience functions that can be used with Objectiv data. `get_objectiv_dataframe()` creates a Bach DataFrame that already has all columns and data types set correctly, so it can always be used with model hub models.

Bach is Objectiv’s data modeling library. With Bach, you can use familiar Pandas-like DataFrame operations in your notebook. It uses a SQL abstraction layer that enables models to run on the full dataset, and you can output models to SQL with a single command.

The Objectiv Bach API is heavily inspired by the pandas API. We believe this provides a great, generic interface to handle large amounts of data in a Python environment while supporting multiple data stores. For more details on Objectiv Bach, visit the docs.

This object points to the data on which the models from the open model hub will be applied. The `time_aggregation` parameter determines the standard timeframe that is used with aggregation functions from the model hub. For example, '%Y-%m-%d' means that days are used for the time aggregation. Only data from `start_date` onward is used in all following operations.
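
To make the effect of `time_aggregation` concrete, here is a minimal sketch in plain Python. It assumes the format codes behave like those of Python's `strftime`. The timestamps are made up for illustration, and the model hub applies the format in the database rather than in Python.

```python
from datetime import datetime

# Hypothetical event timestamps, for illustration only.
moments = [
    datetime(2021, 11, 16, 9, 30),
    datetime(2021, 11, 16, 17, 5),
    datetime(2021, 12, 1, 8, 0),
]

# '%Y-%m-%d' puts each moment in a daily bucket, '%Y-%m' in a monthly one.
daily = [m.strftime('%Y-%m-%d') for m in moments]
monthly = [m.strftime('%Y-%m') for m in moments]

print(daily)    # ['2021-11-16', '2021-11-16', '2021-12-01']
print(monthly)  # ['2021-11', '2021-11', '2021-12']
```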

**Note**  
All operations and models in this notebook are run directly on the entire data set in the SQL database using Bach. While the API resembles pandas, pandas is _not_ used for any of the operations and calculations.

In [None]:
# instantiate the model hub
modelhub = ModelHub(time_aggregation='%Y-%m-%d')

In [None]:
# get the Bach DataFrame with Objectiv data
df = modelhub.get_objectiv_dataframe(start_date='2021-11-16')

# Using the open model hub
The open model hub is a growing collection of open-source, free to use data models that you can take,
combine and run for product analysis and exploration. It includes models for a wide range of typical product
analytics use cases. The source is available for all models and you're free to make any changes to them. 

The model hub has three types of functions/models:
1. [Helper functions](https://objectiv.io/docs/modeling/open-model-hub/models/helper-functions/). These helper functions simplify manipulating and analyzing the data.
2. [Aggregation models](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/). These models consist of a combination of Bach instructions that run some of the more common data analyses or product analytics metrics.
3. [Machine learning models](https://objectiv.io/docs/modeling/open-model-hub/models/machine-learning/).

Helper functions always return a series with the same shape and index as the
DataFrame they are applied to. This ensures they can be added as a column to that
DataFrame. Helper functions that return SeriesBoolean can be used to filter
the data. The helper functions can be accessed with the `map` accessor from a
model hub instance.

Aggregation models perform multiple Bach instructions that run some of the more common data analyses or
product analytics metrics. They always return aggregated data in some form from the DataFrame the model is applied to. Aggregation models can be accessed with the `aggregate` accessor from a model hub instance.

Most of the model hub helper functions and aggregation models take `data` as their first argument: this is
the DataFrame with the Objectiv data to apply the model to. For an example of a machine learning model look
at the [logistic regression example](https://objectiv.io/docs/modeling/example-notebooks/logistic-regression/).

This notebook demonstrates how to use the model hub by showcasing a selection of the models from the model hub.

## A simple aggregation model
Calculating the unique users is one of the basic models in the model hub. As it is an aggregation model, it is called with `modelhub.aggregate.unique_users()`. It uses the `time_aggregation` that was set when the model hub was instantiated. With `.head()` we immediately query the data to show the results. `.to_pandas()` can be used to get all results as a pandas object in Python. These (and the following) results are sorted in descending order, so the latest data is shown first.

In [None]:
users = modelhub.aggregate.unique_users(df)
users.sort_index(ascending=False).head(10)
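
As a rough illustration of what this aggregation computes, the sketch below counts distinct users per time bucket in plain Python. It is not the model hub's implementation (which runs as SQL in the database); the function name `unique_users_sketch` and the sample hits are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hits as (user_id, moment); not the demo data.
hits = [
    ('u1', datetime(2021, 11, 16, 9, 0)),
    ('u1', datetime(2021, 11, 16, 9, 5)),
    ('u2', datetime(2021, 11, 16, 12, 0)),
    ('u1', datetime(2021, 11, 17, 8, 0)),
]


def unique_users_sketch(hits, time_aggregation='%Y-%m-%d'):
    """Count distinct user ids per time bucket (illustrative only)."""
    users_per_bucket = defaultdict(set)
    for user_id, moment in hits:
        users_per_bucket[moment.strftime(time_aggregation)].add(user_id)
    return {bucket: len(users) for bucket, users in users_per_bucket.items()}


result = unique_users_sketch(hits)
# Print the latest bucket first, as in the cell above.
for day in sorted(result, reverse=True):
    print(day, result[day])
```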

## Using `map` with the model hub & combining models
This example shows how to use `map` to label users as new users. This uses *time_aggregation*. As *time_aggregation* was set to '%Y-%m-%d', all hits are labeled as new for the entire day on which the user had their first session.

In [None]:
df['is_new_user'] = modelhub.map.is_new_user(df)
df.is_new_user.head(10)
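
The labeling rule described above can be sketched in plain Python as follows. This is only an illustration of the logic under the stated rule (a hit is "new" when it falls in the same time bucket as the user's first hit); `is_new_user_sketch` and the sample hits are hypothetical, and the real helper runs in the database. Note that the result has one label per input hit, which is what allows it to be added as a column or used as a filter.

```python
from datetime import datetime

# Hypothetical hits as (user_id, moment), for illustration only.
hits = [
    ('u1', datetime(2021, 11, 16, 9, 0)),
    ('u1', datetime(2021, 11, 16, 18, 0)),
    ('u1', datetime(2021, 11, 17, 8, 0)),
    ('u2', datetime(2021, 11, 17, 10, 0)),
]


def is_new_user_sketch(hits, time_aggregation='%Y-%m-%d'):
    """Return one boolean per hit: True if the hit falls in the user's first bucket."""
    first_bucket = {}
    # Walk the hits in time order and remember each user's first bucket.
    for user_id, moment in sorted(hits, key=lambda hit: hit[1]):
        first_bucket.setdefault(user_id, moment.strftime(time_aggregation))
    return [
        moment.strftime(time_aggregation) == first_bucket[user_id]
        for user_id, moment in hits
    ]


labels = is_new_user_sketch(hits)
print(labels)  # [True, True, False, True]
```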

Or we can label conversion events. To do this, we first have to define what a conversion is with `add_conversion_event`, by setting the type of event and the location in the product at which this event was triggered (this location is called the location stack, see [here](https://objectiv.io/docs/modeling/example-notebooks/open-taxonomy/#location_stack) for more info).

In [None]:
modelhub.add_conversion_event(location_stack=df.location_stack.json[{'id': 'Quickstart Guide', '_type': 'LinkContext'}:],
                              event_type='PressEvent',
                              name='quickstart_presses')
df['conversion_events'] = modelhub.map.is_conversion_event(df, 'quickstart_presses')
df.conversion_events.head(10)

### Map, filter, aggregate
As the map functions above returned a SeriesBoolean, they can be used as filters in combination with aggregation models. We use the same aggregation model we showed earlier (`unique_users`), but now with the filter `df.conversion_events` applied. This gives the unique converted users per day.

In [None]:
modelhub.aggregate.unique_users(df[df.conversion_events]).sort_index(ascending=False).head(10)

Similarly, we can use other aggregation models from the model hub. In the example below, the average session duration is calculated for new users.

In [None]:
duration_new_users = modelhub.aggregate.session_duration(df[df.is_new_user])
duration_new_users.sort_index(ascending=False).head(10)

### Combining model results
Results from aggregation models can be used together if they share the same index type (similar to pandas). In this example the share of new users per day is calculated.

In [None]:
new_user_share = modelhub.agg.unique_users(df[df.is_new_user]) / modelhub.agg.unique_users(df)
new_user_share.sort_index(ascending=False).head(10)

### Using multiple model hub filters

The model hub's map results can be combined and reused. In this example we add the results of two helper functions
as columns to the original DataFrame and use both to filter the data before applying an aggregation model.
This calculates the number of users that were new in a month and that also converted twice on a day.

In [None]:
df['is_new_user_month'] = modelhub.map.is_new_user(df, time_aggregation = '%Y-%m')
df['is_twice_converted'] = modelhub.map.conversions_in_time(df, name='quickstart_presses')==2
modelhub.aggregate.unique_users(df[df.is_new_user_month & df.is_twice_converted]).sort_index(ascending=False).head()
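
The pattern in this cell is: build two boolean columns, combine them with `&`, filter, then aggregate. The plain-Python sketch below mirrors those steps on a handful of made-up rows. The field names and values are hypothetical, and the `conversions_in_time` values are given directly rather than computed.

```python
from collections import defaultdict

# Hypothetical rows; 'conversions_in_time' stands in for the helper's output.
rows = [
    {'user_id': 'u1', 'day': '2021-11-16', 'is_new_user_month': True, 'conversions_in_time': 2},
    {'user_id': 'u1', 'day': '2021-11-16', 'is_new_user_month': True, 'conversions_in_time': 1},
    {'user_id': 'u2', 'day': '2021-11-16', 'is_new_user_month': False, 'conversions_in_time': 2},
    {'user_id': 'u3', 'day': '2021-11-17', 'is_new_user_month': True, 'conversions_in_time': 2},
]

# Step 1: derive the second boolean column, like `... == 2` in the cell above.
for row in rows:
    row['is_twice_converted'] = row['conversions_in_time'] == 2

# Step 2: keep only the rows where both booleans are True.
selected = [r for r in rows if r['is_new_user_month'] and r['is_twice_converted']]

# Step 3: aggregate, here as distinct users per day.
users_per_day = defaultdict(set)
for row in selected:
    users_per_day[row['day']].add(row['user_id'])
counts = {day: len(users) for day, users in users_per_day.items()}

print(counts)  # {'2021-11-16': 1, '2021-11-17': 1}
```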

## What's next?
There are several options for continuing to work with the data or using the results elsewhere, e.g. for visualization.

### 1. Export models to SQL
As mentioned, all operations and models performed on the DataFrame are run on the SQL database. Therefore it is possible to view all objects as an SQL statement. For the `new_user_share` result that was created earlier, this looks as follows:

In [None]:
# complex SQL statement alert!
display_sql_as_markdown(new_user_share)

The SQL for any analysis can be exported with this one command, so you can use models in production directly to simplify data debugging & delivery to BI tools like Metabase, dbt, etc. See how you can [quickly create BI dashboards with this](https://objectiv.io/docs/home/try-the-demo#creating-bi-dashboards).

### 2. Further data crunching using the Bach Modeling Library
All results from the model hub are in the form of Bach DataFrames or Series. This makes the model hub and Bach work seamlessly together.

In [None]:
# We'll do a lot of operations on the data in the df DataFrame. To make this easier for the
# database (especially BigQuery), we tell Bach to materialize the current DataFrame as a temporary
# table. This statement has no direct effect, but any invocation of head() on the dataframe later
# on will consist of two queries: one to create a temporary table with the current state of the
# dataframe, and one that queries that table and does subsequent operations
df = df.materialize(materialization='temp_table')

In [None]:
# label the number of times a user has converted in a session at a given moment, using the model hub.
df['conversion_count'] = modelhub.map.conversions_in_time(df, name='quickstart_presses')

# use Bach to do any supported operation using pandas syntax.
# select users that converted
converted_users = df[df.conversion_events].user_id.unique()
# select PressEvents of users that converted
df_selection = df[(df.event_type == 'PressEvent') &
                  (df.user_id.isin(converted_users))]
# calculate the number of PressEvents before conversion per session
presses_per_session = df_selection[df_selection.conversion_count == 0].groupby('session_id').session_hit_number.count()

Show the results. Only now is the underlying query executed.

In [None]:
presses_per_session.head()
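
For readers who want to trace the logic of the cells above, here is a plain-Python sketch of the same steps on made-up events. It assumes the running conversion count includes the conversion event itself, so only hits strictly before the first conversion in a session have a count of 0. All names and data are hypothetical.

```python
from collections import Counter

# Hypothetical events as (user_id, session_id, event_type, is_conversion),
# already ordered by time within each session.
events = [
    ('u1', 1, 'PressEvent', False),
    ('u1', 1, 'ApplicationLoadedEvent', False),
    ('u1', 1, 'PressEvent', False),
    ('u1', 1, 'PressEvent', True),    # the conversion
    ('u1', 1, 'PressEvent', False),   # after conversion, not counted
    ('u2', 2, 'PressEvent', True),    # converts on the first press
    ('u3', 3, 'PressEvent', False),   # never converts, filtered out
]

# Users with at least one conversion event.
converted_users = {user for user, _, _, converted in events if converted}

running_conversions = Counter()         # conversions so far, per session
presses_before_conversion = Counter()   # result, per session
for user, session, event_type, converted in events:
    if converted:
        running_conversions[session] += 1
    if (event_type == 'PressEvent'
            and user in converted_users
            and running_conversions[session] == 0):
        presses_before_conversion[session] += 1

print(dict(presses_before_conversion))  # {1: 2}
```

Sessions without any press before the first conversion (session 2 here) do not appear in the result, just as a group with no rows is absent from a group-by count.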

There is another [example](https://objectiv.io/docs/modeling/bach/examples/) that demonstrates what you can do with the Bach modeling
library, or head over to the [API reference](https://objectiv.io/docs/modeling/bach/api-reference/) for a complete overview of the possibilities.

### 3. Export DataFrame and model hub results to pandas DataFrame
Bach DataFrames and/or model hub results can always be exported to pandas. Since Bach DataFrame operations run on the full dataset in the SQL database, it is recommended to export to pandas only once the data is small enough, e.g. after aggregation or selection.  
By exporting the data to pandas you can use all the options from pandas as well as pandas compatible ML packages.

We plot the previously calculated presses per session before conversion using pandas built-in plotting methods.

In [None]:
# presses_per_session_pd is a pandas Series
presses_per_session_pd = presses_per_session.to_pandas()
presses_per_session_pd.hist()

This concludes the open model hub demo.

We hope you’ve gotten a taste of the power and flexibility of the open model hub to quickly answer common product analytics questions. You can take it a lot further and build highly specific model stacks for in-depth analysis and exploration.

For a complete overview of all available and upcoming models, check out the [model hub docs](https://objectiv.io/docs/modeling/open-model-hub/models/).