# Intro
In this notebook, we briefly demonstrate how you can use pre-built models from the [open model hub](https://objectiv.io/docs/modeling/Objectiv/bach_open_taxonomy.ModelHub/#bach_open_taxonomy.ModelHub) in conjunction with our [modeling library](https://objectiv.io/docs/modeling/) to quickly build model stacks to answer common product analytics questions.

This example uses real, unaltered data that was collected from objectiv.io with Objectiv’s Tracker. All models in the open model hub are compatible with datasets that have been validated against the [open analytics taxonomy](https://objectiv.io/docs/taxonomy/).

For an overview of all available models, check out the [open model hub docs](https://objectiv.io/docs/modeling/Objectiv/bach_open_taxonomy.ModelHub/#bach_open_taxonomy.ModelHub).

In [None]:
from bach_open_taxonomy import ObjectivFrame

# Instantiate the data object
As a first step, the Objectiv Frame object is instantiated. The Objectiv Frame is an extension of the Bach DataFrame, intended specifically for data that was collected with Objectiv’s Tracker. Bach is Objectiv’s data modeling library. With Bach, you can use familiar pandas-like DataFrame operations in your notebook. It uses a SQL abstraction layer that enables models to run on the full dataset, and you can output models to SQL with a single command.

The Objectiv Bach API is heavily inspired by the pandas API. We believe this provides a great, generic interface to handle large amounts of data in a Python environment while supporting multiple data stores. For more details on Objectiv Bach, visit the docs.

The Objectiv Frame loads the data as stored by the Objectiv Tracker, makes a few transformations, and sets the right data types.

This object points to the data on which the models from the open model hub will be applied. The `time_aggregation` parameter determines the default timeframe that is used by aggregation functions from the model hub. For example, 'YYYY-MM-DD' means that data is aggregated per day. Only data from `start_date` onwards is used in all following operations.
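
To make the effect of `time_aggregation` and `start_date` concrete, here is a plain-Python sketch that does not use Bach. The `FORMATS` mapping and the `time_bucket` helper are hypothetical names introduced for illustration, and the toy timestamps are made up.

```python
from datetime import datetime

# Hypothetical mapping from the format strings used in this notebook
# to strftime patterns. Not part of Bach.
FORMATS = {'YYYY-MM-DD': '%Y-%m-%d', 'YYYY-MM': '%Y-%m'}


def time_bucket(moment: datetime, time_aggregation: str) -> str:
    """Return the label of the period that `moment` falls in."""
    return moment.strftime(FORMATS[time_aggregation])


moments = [
    datetime(2021, 11, 15, 23, 59),  # before start_date, dropped below
    datetime(2021, 11, 16, 9, 30),
    datetime(2021, 11, 16, 17, 5),
    datetime(2021, 12, 1, 8, 0),
]

# Only data starting at start_date is kept.
start_date = datetime(2021, 11, 16)
kept = [m for m in moments if m >= start_date]

daily = [time_bucket(m, 'YYYY-MM-DD') for m in kept]
monthly = [time_bucket(m, 'YYYY-MM') for m in kept]
print(daily)    # ['2021-11-16', '2021-11-16', '2021-12-01']
print(monthly)  # ['2021-11', '2021-11', '2021-12']
```

Hits that share a label end up in the same group when an aggregation model is applied.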

**Note**  
All operations and models in this notebook are run directly on the entire data set in the SQL database using Bach. While the API resembles pandas, pandas is _not_ used for any of the operations and calculations.

In [None]:
of = ObjectivFrame.from_objectiv_data(time_aggregation='YYYY-MM-DD', start_date='2021-11-16')

# Using the open model hub
The ModelHub contains a growing collection of open-source, free to use data models that you can take, chain and run to quickly build highly specific model stacks for product analysis and exploration. It includes models for a wide range of typical product analytics use cases. The source is available for all models and you're free to make any changes to them. 

All models are in the `.model_hub` namespace of the Objectiv Frame. You can also use `.mh` as a shorthand. 

The model hub has two main types of functions: `map` and `aggregate`.
* `map` functions always return a series with the same shape and index as the Objectiv Frame they originate from. This ensures they can be added as a column to that Objectiv Frame.
* `aggregate` functions return aggregated data in some form from the Objectiv Frame. They can also be accessed with `agg`.

Additionally, the Objectiv Frame can be filtered with the `filter` method. Any `map` function that returns a SeriesBoolean can be used with the `filter` method.
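
The contract between `map`, `filter` and `aggregate` can be pictured with a few plain-Python functions. This is only a conceptual sketch: the function names, the field names and the `OtherEvent` type are hypothetical, and nothing here calls the model hub.

```python
# Toy hits. The field names are illustrative, not the ObjectivFrame schema.
hits = [
    {'user_id': 'a', 'event_type': 'PressEvent'},
    {'user_id': 'a', 'event_type': 'OtherEvent'},
    {'user_id': 'b', 'event_type': 'PressEvent'},
]


def map_is_press(rows):
    """map: one boolean per input row, in the same order."""
    return [row['event_type'] == 'PressEvent' for row in rows]


def filter_rows(rows, mask):
    """filter: keep only the rows whose mask value is True."""
    return [row for row, keep in zip(rows, mask) if keep]


def aggregate_unique_users(rows):
    """aggregate: collapse many rows into a single number."""
    return len({row['user_id'] for row in rows})


mask = map_is_press(hits)
pressed = filter_rows(hits, mask)
print(mask)                             # [True, False, True]
print(aggregate_unique_users(pressed))  # 2
```

Because the mapped result has exactly one value per row, it can serve either as a new column or as a filter mask.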

This notebook demonstrates how to use the model hub by showcasing a selection of the models from the model hub.

## A simple aggregation model
Calculating the unique users is one of the basic models in the model hub. As it is an aggregation model, it is called with `.model_hub.aggregate.unique_users()`. It uses the `time_aggregation` that was set when the Objectiv Frame was instantiated. With `.head()` we immediately query the data to show the results. `.to_pandas()` can be used to retrieve all results as a pandas object in Python. These (and the following) results are sorted in descending order, so the latest data is shown first.

In [None]:
users = of.model_hub.aggregate.unique_users()
users.sort_index(ascending=False).head(10)
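
Conceptually, the aggregation above counts distinct users per time bucket. The following standard-library sketch shows that idea on made-up `(day, user_id)` pairs. It illustrates the calculation only and is not how Bach computes it.

```python
from collections import defaultdict

# Made-up (day, user_id) pairs, one per hit.
hits = [
    ('2021-11-16', 'a'), ('2021-11-16', 'a'), ('2021-11-16', 'b'),
    ('2021-11-17', 'b'), ('2021-11-18', 'c'), ('2021-11-18', 'a'),
]

# Collect the set of distinct users seen on each day.
users_per_day = defaultdict(set)
for day, user_id in hits:
    users_per_day[day].add(user_id)

unique_users = {day: len(ids) for day, ids in users_per_day.items()}

# Sort descending on the day so the latest data comes first.
latest_first = sorted(unique_users.items(), reverse=True)
print(latest_first)
# [('2021-11-18', 2), ('2021-11-17', 1), ('2021-11-16', 2)]
```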

## Using `map` with the model hub & combining models
This example shows how you can use `map` to label users as new users. This uses `time_aggregation`. As `time_aggregation` was set to 'YYYY-MM-DD', all hits of a user are labeled as new for the entire day on which that user had their first session.

In [None]:
is_new_user = of.model_hub.map.is_new_user()
is_new_user.head(10)
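
The labeling rule can be sketched in plain Python: a hit counts as "new" when it falls in the same day as the user's earliest hit. The data is made up and the sketch assumes daily aggregation. It is not the model hub implementation.

```python
# Made-up (user_id, day) pairs, one per hit.
hits = [
    ('a', '2021-11-16'), ('a', '2021-11-16'), ('a', '2021-11-17'),
    ('b', '2021-11-17'), ('b', '2021-11-18'),
]

# Find the first day on which each user was seen.
first_day = {}
for user_id, day in hits:
    first_day[user_id] = min(day, first_day.get(user_id, day))

# One boolean per hit: True for every hit on the user's first day.
is_new = [day == first_day[user_id] for user_id, day in hits]
print(is_new)  # [True, True, False, True, False]
```

Note that the result has one value per hit, which is what makes it a `map`-style result.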

Or we can label conversion events. To do this, we first have to define what a conversion is with `add_conversion_event`, by setting the type of event and the location in the product at which this event was triggered (this is called the location stack, see [here](open-taxonomy-how-to.ipynb#global_contexts-&-location_stack) for info).

In [None]:
of.add_conversion_event(location_stack=of.location_stack.json[{'id': 'Quickstart Guide', '_type': 'LinkContext'}:],
                        event_type='PressEvent',
                        name='quickstart_presses')
conversion_events = of.mh.map.is_conversion_event('quickstart_presses')
conversion_events.head(10)
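
One plausible reading of the conversion definition above is: the hit has the given event type, and its location stack contains a context matching the given `id` and `_type`. The sketch below illustrates that reading in plain Python. The helpers `slice_from` and `is_conversion`, the toy stacks and the context types other than `LinkContext` are assumptions made for illustration. The exact matching rules of `add_conversion_event` are not covered here.

```python
def slice_from(stack, selector):
    """Return the part of `stack` starting at the first context that
    matches every key/value pair in `selector` (empty list if none)."""
    for i, context in enumerate(stack):
        if all(context.get(key) == value for key, value in selector.items()):
            return stack[i:]
    return []


def is_conversion(hit, selector, event_type):
    """True if the event type matches and the selector is found in the stack."""
    return (hit['event_type'] == event_type
            and bool(slice_from(hit['location_stack'], selector)))


selector = {'id': 'Quickstart Guide', '_type': 'LinkContext'}

# Toy hits with simplified, made-up location stacks.
hits = [
    {'event_type': 'PressEvent', 'location_stack': [
        {'id': 'home', '_type': 'SectionContext'},
        {'id': 'Quickstart Guide', '_type': 'LinkContext'}]},
    {'event_type': 'PressEvent', 'location_stack': [
        {'id': 'home', '_type': 'SectionContext'},
        {'id': 'logo', '_type': 'LinkContext'}]},
    {'event_type': 'OtherEvent', 'location_stack': [
        {'id': 'Quickstart Guide', '_type': 'LinkContext'}]},
]

flags = [is_conversion(hit, selector, 'PressEvent') for hit in hits]
print(flags)  # [True, False, False]
```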

### Map, filter, aggregate
As the map functions above returned a SeriesBoolean, they can be combined with the filter and aggregation models in the model hub. We use the same aggregation model we showed earlier (`unique_users`), but now with the filter `conversion_events` applied. This gives the unique converted users per day.

In [None]:
of.model_hub.filter(conversion_events).model_hub.aggregate.unique_users().head(10)

Similarly, we can use other aggregation models from the model hub. In the example below, the average session duration is calculated for new users.

In [None]:
duration_new_users = of.model_hub.filter(is_new_user).model_hub.aggregate.session_duration()
duration_new_users.sort_index(ascending=False).head(10)

### Combining model results
Results from aggregation models can be used together if they share the same index type (similar to pandas). In this example the share of new users per day is calculated.

In [None]:
new_user_share = of.mh.filter(is_new_user).mh.agg.unique_users() / of.mh.agg.unique_users()
new_user_share.sort_index(ascending=False).head(10)
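
The division above works because both results are indexed by the same kind of key (here: a day). As a rough illustration in plain Python, with made-up counts, the two results are matched on that key before dividing. Returning `None` for a day that is missing on one side is a choice made for this sketch only and says nothing about how Bach handles such rows.

```python
# Made-up aggregation results, both keyed by day.
new_users = {'2021-11-16': 2, '2021-11-17': 1}
all_users = {'2021-11-16': 4, '2021-11-17': 4, '2021-11-18': 3}

# Divide values that share the same key. Days without new users get None
# in this sketch.
share = {
    day: new_users[day] / total if day in new_users else None
    for day, total in all_users.items()
}
print(share)
# {'2021-11-16': 0.5, '2021-11-17': 0.25, '2021-11-18': None}
```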

### Chaining model filters

Filter model results can be chained, as the output is always an ObjectivFrame with the same columns and at most as many rows as the initial ObjectivFrame. In this example, we calculate the number of users that were new in a month and that also converted twice on a day.

In [None]:
(of.mh.filter(of.mh.map.is_new_user(time_aggregation='YYYY-MM'))
   .mh.filter(of.mh.map.conversion_count(name='quickstart_presses') == 2)
   .mh.agg.unique_users()
   .head())
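
Why chaining is safe can be shown with a small plain-Python sketch: every filter step returns the same kind of object with a subset of the rows, so another filter or an aggregation can follow. The toy frame, the masks and `apply_filter` are hypothetical and only mimic the idea.

```python
# Toy frame: hit_id -> user_id. The masks are keyed by hit_id as well,
# so a mask computed on the full frame still lines up after filtering.
frame = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}
is_new = {1: True, 2: True, 3: False, 4: True}
converted_twice = {1: False, 2: True, 3: True, 4: False}


def apply_filter(rows, mask):
    """Return a frame with the same structure, keeping rows where mask is True."""
    return {hit_id: user_id for hit_id, user_id in rows.items() if mask[hit_id]}


step1 = apply_filter(frame, is_new)            # hits of new users
step2 = apply_filter(step1, converted_twice)   # ... that also converted twice

# Aggregate at the end of the chain: unique users left.
print(len(set(step2.values())))  # 1
```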

## What's next?
There are several options for continuing to work with the data or using the results in other ways, e.g. for visualization.

### 1. Export models to SQL
As mentioned, all operations and models performed on the ObjectivFrame are run on the SQL database. Therefore, it is possible to view all objects as an SQL statement. For the `new_user_share` result that was created, this looks as follows:

In [None]:
# complex sql statement alert!
print(new_user_share.view_sql())

### 2. Export to Metabase
Aggregation models can be exported to Metabase, to visualize and share your results. This is done for the unique new users:

In [None]:
of.model_hub.to_metabase(of.mh.filter(is_new_user).mh.agg.unique_users(), config={'name': 'Unique New Users'})

The results can be viewed here: http://localhost:3000/dashboard/1-model-hub

### 3. Further data crunching using the Bach Modeling Library
Everything from the [Bach modeling library](https://objectiv.io/docs/modeling/) is available to ObjectivFrames. This makes the model hub and Bach work seamlessly together.

In [None]:
# label conversions and the number of times a user has converted in a session at a given moment, using the model hub.
of['is_conversion'] = of.model_hub.map.is_conversion_event(name='quickstart_presses')
of['conversion_count'] = of.model_hub.map.conversion_count(name='quickstart_presses')

# use Bach to do any supported operation using pandas syntax.
# select users that converted
converted_users = of[of.is_conversion].user_id.unique()
# select PressEvents of users that converted
of_selection = of[(of.event_type == 'PressEvent') &
                  (of.user_id.isin(converted_users))]
# calculate the number of PressEvents before conversion per session
presses_per_session = of_selection[of_selection.conversion_count == 0].groupby('session_id').session_hit_number.count()

Show the results. Only now is the underlying query executed.

In [None]:
presses_per_session.head()
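
The deferred execution described above can be mimicked with the standard-library `sqlite3` module. The `LazyQuery` class below is a hypothetical toy, not Bach: it only builds up SQL text, and the database is queried when `head()` is called. Its method names merely echo the ones used in this notebook.

```python
import sqlite3


class LazyQuery:
    """Toy object that collects SQL text and only runs it in head()."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql

    def where(self, condition):
        # Wrap the current query. Nothing is sent to the database here.
        return LazyQuery(self.conn, f"SELECT * FROM ({self.sql}) WHERE {condition}")

    def view_sql(self):
        return self.sql

    def head(self, n=5):
        # This is the only place where the query is executed.
        return self.conn.execute(f"SELECT * FROM ({self.sql}) LIMIT {n}").fetchall()


conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE hits (user_id TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [('a', 'PressEvent'), ('b', 'OtherEvent'), ('c', 'PressEvent')],
)

presses = LazyQuery(conn, "SELECT * FROM hits").where("event_type = 'PressEvent'")
print(presses.view_sql())  # only SQL text so far
print(presses.head())      # the query runs here
```

Building the statement first and executing it late is what allows the whole chain of operations to be inspected as SQL before any data is fetched.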

Another notebook in the same folder, [open-taxonomy-how-to.ipynb](open-taxonomy-how-to.ipynb), demonstrates what you can do with the Bach modeling library. You can also head over to the [API reference](https://objectiv.io/docs/modeling/reference) for a complete overview of the possibilities.

### 4. Export ObjectivFrame and model hub results to pandas DataFrame
ObjectivFrames and/or model hub results can always be exported to pandas. Since ObjectivFrame operations run on the full dataset in the SQL database, it is recommended to export to pandas only if the data is small enough, e.g. after aggregation or selection.  
By exporting the data to pandas, you can use all the options from pandas as well as pandas-compatible ML packages.

We plot the previously calculated presses per session before conversion using pandas built-in plotting methods.

In [None]:
# presses_per_session_pd is a pandas Series
presses_per_session_pd = presses_per_session.to_pandas()
presses_per_session_pd.hist()

This concludes the open model hub demo.

We hope you’ve gotten a taste of the power and flexibility of the open model hub to quickly answer common product analytics questions. You can take it a lot further and build highly specific model stacks for in-depth analysis and exploration.

For a complete overview of all available and upcoming models, check out the [model hub docs](https://objectiv.io/docs/modeling/Objectiv/bach_open_taxonomy.ModelHub).