This is one of the Objectiv example notebooks. For more examples, visit the
[example notebooks](https://objectiv.io/docs/modeling/example_notebooks/) section of our docs. The notebooks run on the demo data set that comes with our [quickstart](https://objectiv.io/docs/home/quickstart-guide/), and can also be run on your own collected data.

All example notebooks are also available in our [quickstart](https://objectiv.io/docs/home/quickstart-guide/). With the quickstart you can spin up a fully functional Objectiv demo pipeline in five minutes. This also allows you to run these notebooks and experiment with them on a demo data set.

# Basic user intent analysis

In this notebook, we briefly demonstrate how you can easily do basic user intent analysis on your data.

## Getting started

### Import the required packages for this notebook
The open model hub package can be installed with `pip install objectiv-modelhub` (this installs Bach as well).  
If you are running this notebook from our quickstart, the model hub and Bach are already installed, so you don't have to install them separately.

In [49]:
from modelhub import ModelHub
from bach import display_sql_as_markdown

First, we instantiate the model hub and the Objectiv DataFrame object.

In [8]:
# instantiate the model hub
modelhub = ModelHub(time_aggregation='YYYY-MM-DD')

# get the Bach DataFrame with Objectiv data
df = modelhub.get_objectiv_dataframe(start_date='2022-02-02')


The `global_contexts` and `location_stack` columns contain most of the event-specific data. Both are JSON columns,
and data can be extracted from them by key using the methods of `SeriesGlobalContexts` (the `.gc` accessor) and `SeriesLocationStack` (the `.ls` accessor).

In [9]:
# adding specific contexts to the data
df['application'] = df.global_contexts.gc.application
df['root_location'] = df.location_stack.ls.get_from_context_with_type_series(type='RootLocationContext', key='id')
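
To make the extraction step more concrete, here is a minimal plain-Python sketch of the idea behind `get_from_context_with_type_series`: walk the location stack and return a key from the first context of a given type. This is an illustration only, not the Bach implementation (which operates on the database column). The helper name `get_from_context_with_type`, the `_type` field name, and the sample stack are assumptions made for this example.

```python
from typing import Any, Dict, List, Optional


def get_from_context_with_type(
    stack: List[Dict[str, Any]], context_type: str, key: str
) -> Optional[Any]:
    """Return `key` from the first context in `stack` whose `_type` matches.

    Hypothetical helper for illustration; assumes each context is a dict
    with a `_type` field.
    """
    for context in stack:
        if context.get("_type") == context_type:
            return context.get(key)
    return None


# Hypothetical location stack for a click on a link in the docs.
location_stack = [
    {"_type": "RootLocationContext", "id": "modeling"},
    {"_type": "NavigationContext", "id": "docs-sidebar"},
    {"_type": "LinkContext", "id": "example-notebooks"},
]

root_location = get_from_context_with_type(location_stack, "RootLocationContext", "id")
print(root_location)
```

If no context of the requested type is present, the sketch returns `None`.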

## Exploring root location
The `root_location` context in the `location_stack` uniquely represents the top-level UI location of the user. As a first step in grasping user intent, it is a good starting point to see in which main areas of your product users spend their time.

In [14]:
# model hub: unique users per root location
users_root = modelhub.aggregate.unique_users(df, groupby=['application','root_location'])
users_root.head(10)

application       root_location
objectiv-docs     home             127
                  modeling         177
                  taxonomy          85
                  tracking         196
objectiv-website  about             79
                  blog              88
                  home             309
                  jobs              66
                  join-slack        15
                  privacy           13
Name: unique_users, dtype: int64
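
Conceptually, the aggregation above is a distinct count of users per group. The following plain-Python sketch shows that idea on a handful of made-up events; it is not how the model hub computes the result, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: (user_id, application, root_location)
events = [
    ("u1", "objectiv-docs", "home"),
    ("u1", "objectiv-docs", "home"),  # same user twice: counted once
    ("u2", "objectiv-docs", "home"),
    ("u1", "objectiv-website", "blog"),
    ("u3", "objectiv-website", "blog"),
    ("u3", "objectiv-website", "home"),
]

# Collect the set of distinct users seen in each (application, root_location) group.
users_per_group = defaultdict(set)
for user_id, application, root_location in events:
    users_per_group[(application, root_location)].add(user_id)

# The size of each set is the number of unique users in that group.
unique_users = {group: len(users) for group, users in sorted(users_per_group.items())}
for group, count in unique_users.items():
    print(group, count)
```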

## Exploring session duration
The average `session_duration` model from the [open model hub](https://objectiv.io/docs/modeling/) is another good first indicator of user intent.

In [32]:
# model hub: duration, per root location
duration_root = modelhub.aggregate.session_duration(df, groupby=['application', 'root_location']).sort_index()
duration_root.head(10)

application       root_location
objectiv-docs     home            0 days 00:06:12.911952
                  modeling        0 days 00:07:48.745281
                  taxonomy        0 days 00:06:53.972140
                  tracking        0 days 00:04:24.685181
objectiv-website  about           0 days 00:03:39.784815
                  blog            0 days 00:05:50.151914
                  home            0 days 00:05:09.574889
                  jobs            0 days 00:02:18.935500
                  join-slack      0 days 00:02:54.565217
                  privacy         0 days 00:01:34.075571
Name: session_duration, dtype: timedelta64[ns]
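
As a rough illustration of what an average session duration per root location means, the sketch below works on a few made-up events. It assumes that a session's duration within a group is the time between its first and last event there, and that the result is the mean over sessions; the model hub's exact definition may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events: (session_id, root_location, moment)
events = [
    (1, "home", datetime(2022, 3, 1, 10, 0, 0)),
    (1, "home", datetime(2022, 3, 1, 10, 4, 0)),
    (2, "home", datetime(2022, 3, 1, 11, 0, 0)),
    (2, "home", datetime(2022, 3, 1, 11, 2, 0)),
    (2, "blog", datetime(2022, 3, 1, 11, 3, 0)),
    (2, "blog", datetime(2022, 3, 1, 11, 4, 0)),
]

# Step 1: first and last moment per (root_location, session_id).
bounds = {}
for session_id, root_location, moment in events:
    key = (root_location, session_id)
    first, last = bounds.get(key, (moment, moment))
    bounds[key] = (min(first, moment), max(last, moment))

# Step 2: collect the per-session durations for each root location.
durations = defaultdict(list)
for (root_location, _session_id), (first, last) in bounds.items():
    durations[root_location].append(last - first)

# Step 3: average the durations within each root location.
avg_duration = {
    location: sum(values, timedelta()) / len(values)
    for location, values in durations.items()
}
for location, value in sorted(avg_duration.items()):
    print(location, value)
```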

In [33]:
# how is this time spent distributed?
session_duration = modelhub.aggregate.session_duration(df, groupby='session_id')

# materialization is needed because the series expression already contains aggregated data, and aggregated data cannot be aggregated again.
session_duration.to_frame().materialize()['session_duration'].quantile(q=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]).head(10)

quantile
0.1   0 days 00:00:00.685500
0.2   0 days 00:00:01.445000
0.3   0 days 00:00:03.913500
0.4   0 days 00:00:19.430000
0.5   0 days 00:01:01.289000
0.6   0 days 00:02:56.851000
0.7   0 days 00:03:23.995000
0.8   0 days 00:06:16.642000
0.9   0 days 00:21:18.982500
Name: session_duration, dtype: timedelta64[ns]
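
For readers unfamiliar with deciles, this small standard-library sketch computes the nine cut points 0.1 through 0.9 for a made-up list of session durations in seconds. It uses linear interpolation between observed values; the interpolation used by the database behind Bach may differ, so treat it as an illustration of the concept only.

```python
from statistics import quantiles

# Hypothetical per-session durations in seconds.
durations_s = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# n=10 yields the nine cut points for q = 0.1 ... 0.9.
# method="inclusive" treats the data as the full population and
# interpolates linearly between the observed minimum and maximum.
deciles = quantiles(durations_s, n=10, method="inclusive")

for i, value in enumerate(deciles, start=1):
    print(f"{i / 10:.1f}  {value:.1f} s")
```

In the notebook output above, the median (0.5) is about one minute while the 0.9 quantile is over 21 minutes, which indicates a long right tail.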

## Defining different stages of user intent
After exploring the `root_location` and `session_duration`, we can make a simple definition of different stages of user intent.

Based on the objectiv.io website data in the quickstart, we could define them as:

| User intent | Root locations | Duration |
| :--- | :--- | :--- |
| 1) Inform | website: home, blog, about | less than 2 minutes |
| 2) Explore | website: home, blog, about <br>docs: home, modeling, taxonomy, tracking | between 2 and 20 minutes |
| 3) Implement | website: home, blog, about <br>docs: home, modeling, taxonomy, tracking | more than 20 minutes | 

This is for illustration purposes only; adjust these definitions based on your own collected data.
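
The definitions in the table can be expressed as a small rule function. The sketch below is one possible reading, not part of the model hub: the function name `classify_intent` and the stage labels are hypothetical, and because the table does not say where exactly 2 and 20 minutes belong, both boundaries are assigned to the "explore" stage here as an assumption.

```python
from datetime import timedelta
from typing import Optional

WEBSITE_ROOTS = {"home", "blog", "about"}
DOCS_ROOTS = {"home", "modeling", "taxonomy", "tracking"}

TWO_MINUTES = timedelta(minutes=2)
TWENTY_MINUTES = timedelta(minutes=20)


def classify_intent(
    application: str, root_location: str, duration: timedelta
) -> Optional[str]:
    """Map an (application, root location, duration) triple to an intent stage.

    Returns None when the combination is not covered by the table.
    """
    on_website = application == "objectiv-website" and root_location in WEBSITE_ROOTS
    on_docs = application == "objectiv-docs" and root_location in DOCS_ROOTS

    # Stage 1 only applies to the website locations.
    if on_website and duration < TWO_MINUTES:
        return "1 - inform"
    # Stages 2 and 3 apply to both website and docs locations.
    if on_website or on_docs:
        if TWO_MINUTES <= duration <= TWENTY_MINUTES:  # boundary choice is an assumption
            return "2 - explore"
        if duration > TWENTY_MINUTES:
            return "3 - implement"
    return None


print(classify_intent("objectiv-website", "blog", timedelta(seconds=45)))
print(classify_intent("objectiv-docs", "modeling", timedelta(minutes=7)))
print(classify_intent("objectiv-docs", "tracking", timedelta(minutes=35)))
```

Note that, read literally, the table leaves some combinations unassigned, such as docs visits shorter than 2 minutes; the sketch returns `None` for those.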

## Assigning user intent
Based on the simple definitions above, we can start assigning a stage of intent to each user. We do this per timeframe (in this case monthly), as users can progress from one stage to the next over time.

In [35]:
# model hub: calculate average session duration per user, monthly
user_duration = modelhub.aggregate.session_duration(df, groupby=['user_id', modelhub.time_agg(df, 'YYYY-MM')])
user_duration.sort_index(ascending=False).head(10)

user_id                               time_aggregation
ffc0ba50-9146-438c-bac3-38faa7183dda  2022-04            0 days 00:00:33.219000
ff48d79a-195a-476a-b49d-0e212de43c96  2022-04            0 days 00:01:10.978333
                                      2022-03            0 days 00:00:57.196500
ff33827e-671b-41c3-a6d4-6e13838c4e3a  2022-03            0 days 00:03:08.289000
fec82cdd-c052-4195-a73d-4d948d121e4d  2022-03            0 days 00:00:00.001500
fea957cf-d396-43ed-b987-e59d99dab76e  2022-03            0 days 00:03:40.536000
fd3580c3-12c8-40e3-9b35-1e3960480582  2022-04            0 days 00:03:01.933000
fc73b949-061d-4d40-b670-03a3af5bf4a8  2022-03            0 days 00:03:23.205000
fc4389c3-6931-4323-ba38-211d5eb4874d  2022-03            0 days 00:01:35.546000
fc015b47-d187-43e7-b337-7177b07e4f78  2022-03            0 days 00:00:00.008000
Name: session_duration, dtype: timedelta64[ns]
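
The monthly grouping can be pictured as truncating each session's timestamp to a `YYYY-MM` key and averaging within each (user, month) pair. The plain-Python sketch below illustrates this on made-up sessions; the data and variable names are hypothetical and this is not the model hub's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sessions: (user_id, session start, session duration)
sessions = [
    ("u1", datetime(2022, 3, 5, 9, 0), timedelta(minutes=1)),
    ("u1", datetime(2022, 3, 20, 14, 0), timedelta(minutes=3)),
    ("u1", datetime(2022, 4, 2, 8, 30), timedelta(minutes=25)),
    ("u2", datetime(2022, 4, 7, 16, 0), timedelta(seconds=30)),
]

# Bucket each session by user and calendar month.
per_user_month = defaultdict(list)
for user_id, start, duration in sessions:
    month = start.strftime("%Y-%m")  # e.g. '2022-03'
    per_user_month[(user_id, month)].append(duration)

# Average the session durations within each (user, month) bucket.
monthly_avg = {
    key: sum(values, timedelta()) / len(values)
    for key, values in per_user_month.items()
}
for key, value in sorted(monthly_avg.items()):
    print(key, value)
```

Here the hypothetical user `u1` averages 2 minutes in March and 25 minutes in April, which is the kind of month-over-month change the per-timeframe assignment is meant to capture.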

In [42]:
# model hub: calculate average session duration per user, per root location, monthly
user_root_duration = modelhub.aggregate.session_duration(df, groupby=['user_id', 'application', 'root_location', modelhub.time_agg(df, 'YYYY-MM')])
user_root_duration.sort_index(ascending=False).head(10)

user_id                               application       root_location  time_aggregation
ffc0ba50-9146-438c-bac3-38faa7183dda  objectiv-website  home           2022-04            0 days 00:00:33.219000
ff48d79a-195a-476a-b49d-0e212de43c96  objectiv-website  home           2022-04            0 days 00:01:10.978333
                                                                       2022-03            0 days 00:00:56.477500
ff33827e-671b-41c3-a6d4-6e13838c4e3a  objectiv-website  blog           2022-03            0 days 00:00:01.125000
                                      objectiv-docs     tracking       2022-03            0 days 00:00:00.004000
                                                        taxonomy       2022-03            0 days 00:00:59.272000
                                                        modeling       2022-03            0 days 00:00:19.193000
                                                        home           2022-03            0 days 00:00:00.006000
fec82cdd