This is one of the Objectiv example notebooks. For more examples, visit the 
[example notebooks](https://objectiv.io/docs/modeling/example_notebooks/) section of our docs. The notebooks can run with the demo data set that comes with our [quickstart](https://objectiv.io/docs/home/quickstart-guide/), but can also be run on your own collected data.

All example notebooks are also available in our [quickstart](https://objectiv.io/docs/home/quickstart-guide/). With the quickstart you can spin up a fully functional Objectiv demo pipeline in five minutes. This also allows you to run these notebooks and experiment with them on a demo data set.

# Basic user intent analysis

In this notebook, we briefly demonstrate how you can easily do basic user intent analysis on your data.

## Getting started

### Import the required packages for this notebook
The open model hub package can be installed with `pip install objectiv-modelhub` (this installs Bach as well).  
If you are running this notebook from our quickstart, the model hub and Bach are already installed, so you don't have to install them separately.

In [20]:
from modelhub import ModelHub
from bach import display_sql_as_markdown
import bach
import pandas as pd 
from datetime import timedelta

First, we have to instantiate the model hub and the Objectiv DataFrame object.

In [21]:
# instantiate the model hub
modelhub = ModelHub(time_aggregation='YYYY-MM-DD')

# get the Bach DataFrame with Objectiv data
df = modelhub.get_objectiv_dataframe(start_date='2022-02-02')

The `global_contexts` and `location_stack` columns contain most of the event-specific data. These columns
are JSON-type columns, and we can extract data from them based on the keys of the JSON objects, using the `SeriesGlobalContexts` or `SeriesLocationStack` methods.

In [22]:
# adding specific contexts to the data
df['application'] = df.global_contexts.gc.application
df['root_location'] = df.location_stack.ls.get_from_context_with_type_series(type='RootLocationContext', key='id')
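
To make the extraction step above more concrete, here is a minimal plain-Python sketch of the idea: walk a location stack and return a key from the first context of a given type. The helper `first_context_value` and the sample stack are hypothetical, and the `_type`/`id` key layout is an assumption about how contexts are stored; this is not how the model hub implements it, which runs the extraction in the database.

```python
from typing import Optional


def first_context_value(stack: list, type_name: str, key: str) -> Optional[str]:
    """Return `key` from the first context in `stack` whose `_type` equals `type_name`.

    Hypothetical helper for illustration; not part of the model hub or Bach.
    """
    for context in stack:
        if context.get("_type") == type_name:
            return context.get(key)
    return None


# Illustrative location stack for a single event (made-up data, not from the demo set).
location_stack = [
    {"_type": "RootLocationContext", "id": "home"},
    {"_type": "NavigationContext", "id": "navbar-top"},
    {"_type": "LinkContext", "id": "docs"},
]

root_location = first_context_value(location_stack, "RootLocationContext", "id")
print(root_location)
```

Events without a matching context yield `None`, which would correspond to the `NaN` root locations that show up in the outputs of this notebook.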

## Exploring root location
The `root_location` context in the `location_stack` uniquely represents the top-level UI location of the user. As a first step in grasping user intent, it is a good starting point for seeing in which main areas of your product users are spending time.

In [23]:
# model hub: unique users per root location
users_root = modelhub.aggregate.unique_users(df, groupby=['application','root_location'])
users_root.head(10)

application       root_location
objectiv-docs     docs             349
                  home             167
                  modeling         205
                  taxonomy         105
                  tracking         217
                  NaN               51
objectiv-website  about            202
                  blog             390
                  home             713
                  jobs             177
Name: unique_users, dtype: int64
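
Conceptually, the aggregation above counts distinct users per group. The plain-Python sketch below shows that idea on a handful of made-up events; the tuples and variable names are assumptions for illustration only, and the model hub performs the equivalent computation in SQL.

```python
from collections import defaultdict

# Hypothetical events as (user_id, application, root_location) tuples.
events = [
    ("u1", "objectiv-website", "home"),
    ("u1", "objectiv-website", "home"),   # repeat visit, same user
    ("u2", "objectiv-website", "home"),
    ("u2", "objectiv-website", "blog"),
    ("u3", "objectiv-docs", "modeling"),
]

# Collect the set of users seen in each (application, root_location) group.
users_per_group = defaultdict(set)
for user_id, application, root_location in events:
    users_per_group[(application, root_location)].add(user_id)

# The size of each set is the number of unique users in that group.
unique_users = {group: len(users) for group, users in users_per_group.items()}
print(unique_users)
```

Note that a user who visits several root locations is counted once in each of them, so the per-group counts can add up to more than the total number of users.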

## Exploring session duration
The average `session_duration` model from the [open model hub](https://objectiv.io/docs/modeling/) is another good pointer to explore first for user intent.

In [24]:
# model hub: duration, per root location
duration_root = modelhub.aggregate.session_duration(df, groupby=['application', 'root_location']).sort_index()
duration_root.head(10)

application       root_location
objectiv-docs     docs            0 days 00:07:10.198819
                  home            0 days 00:06:47.658310
                  modeling        0 days 00:08:07.080317
                  taxonomy        0 days 00:08:05.581182
                  tracking        0 days 00:06:04.432021
                  NaN             0 days 00:05:37.044739
objectiv-website  about           0 days 00:03:07.623953
                  blog            0 days 00:03:58.400109
                  home            0 days 00:06:21.426541
                  jobs            0 days 00:02:26.013513
Name: session_duration, dtype: timedelta64[ns]

In [25]:
# how is this time spent distributed?
session_duration = modelhub.aggregate.session_duration(df, groupby='session_id')

# materialization is needed because the expression of the created series contains aggregated data, and it is not allowed to aggregate that.
session_duration.to_frame().materialize()['session_duration'].quantile(q=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]).head(10)

quantile
0.1   0 days 00:00:00.615600
0.2   0 days 00:00:01.299200
0.3   0 days 00:00:03.748400
0.4   0 days 00:00:19.432400
0.5   0 days 00:01:05.058000
0.6   0 days 00:02:46.672000
0.7   0 days 00:03:25.927800
0.8   0 days 00:07:40.631800
0.9   0 days 00:21:47.009000
Name: session_duration, dtype: timedelta64[ns]
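
The distribution above is an aggregate over an aggregate: first a duration per session, then a statistic over those durations. The sketch below illustrates the two steps in plain Python, under the assumption that a session's duration is the time between its first and last event; the event data is made up, and the median stands in for the 0.5 quantile.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical events as (session_id, moment) tuples.
events = [
    (1, datetime(2022, 2, 2, 10, 0, 0)),
    (1, datetime(2022, 2, 2, 10, 5, 0)),
    (2, datetime(2022, 2, 2, 11, 0, 0)),
    (2, datetime(2022, 2, 2, 11, 0, 30)),
    (3, datetime(2022, 2, 3, 9, 0, 0)),
    (3, datetime(2022, 2, 3, 9, 2, 0)),
]

# Step 1: aggregate events to one duration per session (last event minus first event).
first_last = {}
for session_id, moment in events:
    first, last = first_last.get(session_id, (moment, moment))
    first_last[session_id] = (min(first, moment), max(last, moment))
durations = {sid: last - first for sid, (first, last) in first_last.items()}

# Step 2: aggregate the per-session durations again, here into a median.
median_duration = median(durations.values())
print(median_duration)
```

In SQL the second step cannot be nested directly inside the first, which is why the notebook materializes the intermediate result before calling `quantile`.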

## Defining different stages of user intent
After exploring the `root_location` and `session_duration`, we can make a simple definition of different stages of user intent.

Based on the objectiv.io website data in the quickstart, we could define them as:

| User intent | Root locations | Duration |
| :--- | :--- | :--- |
| 1 - Inform | home, blog, about | less than 2 minutes |
| 2 - Explore | modeling, taxonomy, tracking | between 2 and 20 minutes |
| 3 - Implement | modeling, taxonomy, tracking | more than 20 minutes | 

This is just for illustration purposes; you can adjust these definitions based on your own collected data.
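
As a rough sketch, the table can be read as a small classification function. The function name `assign_intent`, the constants, and the choice to treat the 2 and 20 minute boundaries as inclusive for "Explore" are assumptions made for this example; combinations that the table does not cover return `None`.

```python
from datetime import timedelta
from typing import Optional

# Root locations per stage, taken from the table; the names are illustrative.
INFORM_ROOTS = {"home", "blog", "about"}
EXPLORE_ROOTS = {"modeling", "taxonomy", "tracking"}

TWO_MINUTES = timedelta(minutes=2)
TWENTY_MINUTES = timedelta(minutes=20)


def assign_intent(root_location: str, duration: timedelta) -> Optional[str]:
    """Map a root location and an average session duration to a stage of intent."""
    if root_location in INFORM_ROOTS and duration < TWO_MINUTES:
        return "1 - inform"
    if root_location in EXPLORE_ROOTS:
        if TWO_MINUTES <= duration <= TWENTY_MINUTES:
            return "2 - explore"
        if duration > TWENTY_MINUTES:
            return "3 - implement"
    return None  # combination not covered by the table


print(assign_intent("blog", timedelta(seconds=30)))
print(assign_intent("modeling", timedelta(minutes=5)))
print(assign_intent("tracking", timedelta(minutes=45)))
```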

## Assigning user intent
Based on the simple definitions above, we can start assigning a stage of intent to each user. We do this per timeframe (in this case monthly), as users can progress from one stage to the next over time.

In [26]:
# model hub: calculate average session duration per user, monthly
user_duration = modelhub.aggregate.session_duration(df, groupby=['user_id', modelhub.time_agg(df, 'YYYY-MM')])
user_duration.sort_index(ascending=False).head(10)

user_id                               time_aggregation
ffc42c2e-7db3-488f-a4ae-3e6452c5050b  2022-02            0 days 00:00:33.013500
ffc233df-022a-4567-b74a-631b23c45dfc  2022-02            0 days 00:02:01.376000
ffc0ba50-9146-438c-bac3-38faa7183dda  2022-04            0 days 00:00:33.219000
ff9c6d4b-06f6-46b1-ae8d-06854e9112c3  2022-01            0 days 00:03:42.328000
ff65c618-588f-4607-ad4e-440f58871129  2022-01            0 days 00:03:20.055000
ff48d79a-195a-476a-b49d-0e212de43c96  2022-04            0 days 00:01:10.978333
                                      2022-03            0 days 00:00:57.196500
                                      2022-02            0 days 00:00:00.661667
                                      2022-01            0 days 00:00:33.684000
ff33827e-671b-41c3-a6d4-6e13838c4e3a  2022-03            0 days 00:03:08.289000
Name: session_duration, dtype: timedelta64[ns]
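
The grouping above can be pictured as averaging session durations per user and per calendar month. The following plain-Python sketch shows that idea on made-up sessions; the tuple layout and the use of the session start to pick the month are assumptions for illustration, not a description of how the model hub computes it.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sessions as (user_id, session_start, session_duration) tuples.
sessions = [
    ("u1", datetime(2022, 2, 3), timedelta(seconds=30)),
    ("u1", datetime(2022, 2, 20), timedelta(seconds=90)),
    ("u1", datetime(2022, 3, 1), timedelta(minutes=4)),
    ("u2", datetime(2022, 2, 5), timedelta(minutes=25)),
]

# Group durations by user and by month, formatted like the 'YYYY-MM' time aggregation.
grouped = defaultdict(list)
for user_id, start, duration in sessions:
    grouped[(user_id, start.strftime("%Y-%m"))].append(duration)

# Average the durations within each (user, month) group.
monthly_average = {
    key: sum(values, timedelta()) / len(values) for key, values in grouped.items()
}
print(monthly_average)
```

Because each user gets one average per month, the same user can fall into different stages of intent in different months.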

In [27]:
# model hub: calculate average session duration per user, per root location, monthly
user_root_duration = modelhub.aggregate.session_duration(df, groupby=['user_id', 'application', 'root_location', modelhub.time_agg(df, 'YYYY-MM')])
user_root_duration.sort_index(ascending=False).head(10)

user_id                               application       root_location  time_aggregation
ffc42c2e-7db3-488f-a4ae-3e6452c5050b  objectiv-docs     docs           2022-02            0 days 00:00:03.434000
ffc233df-022a-4567-b74a-631b23c45dfc  objectiv-website  home           2022-02            0 days 00:02:01.376000
ffc0ba50-9146-438c-bac3-38faa7183dda  objectiv-website  home           2022-04            0 days 00:00:33.219000
ff9c6d4b-06f6-46b1-ae8d-06854e9112c3  objectiv-docs     NaN            2022-01            0 days 00:03:42.328000
ff65c618-588f-4607-ad4e-440f58871129  objectiv-docs     NaN            2022-01            0 days 00:03:20.055000
ff48d79a-195a-476a-b49d-0e212de43c96  objectiv-website  home           2022-04            0 days 00:01:10.978333
                                                                       2022-03            0 days 00:00:56.477500
                                                                       2022-02            0 days 00:00:00.638500
        

In [28]:
# assign user intent according to the definitions above
user_root_duration_f = user_root_duration.to_frame().reset_index()

roots = bach.DataFrame.from_pandas(engine=df.engine, df=pd.DataFrame({'roots':['home','blog','about']}), convert_objects=True).roots
roots2 = bach.DataFrame.from_pandas(engine=df.engine, df=pd.DataFrame({'roots':['modeling','taxonomy', 'tracking']}), convert_objects=True).roots

inform = user_root_duration_f[(user_root_duration_f.root_location.isin(roots)) &
                              (user_root_duration_f.session_duration < timedelta(0,120))]

explore = user_root_duration_f[(((user_root_duration_f.root_location.isin(roots))) | 
                                ((user_root_duration_f.root_location.isin(roots2)))) &
                               (user_root_duration_f.session_duration >= timedelta(0,120)) &
                               (user_root_duration_f.session_duration <= timedelta(0,1200))]

implement = user_root_duration_f[(((user_root_duration_f.root_location.isin(roots))) | 
                                  ((user_root_duration_f.root_location.isin(roots2)))) &
                                 (user_root_duration_f.session_duration > timedelta(0,1200))]

inform['bucket'] = '1 - inform'
explore['bucket'] = '2 - explore'
implement['bucket'] = '3 - implement'

results = inform.append(implement).append(explore)

# total number of users per intent
results.to_pandas().groupby(['bucket']).agg({'root_location':'unique','session_duration':['min','max'],'user_id':'nunique'})

| bucket | root_location (unique) | session_duration (min) | session_duration (max) | user_id (nunique) |
| :--- | :--- | :--- | :--- | :--- |
| 1 - inform | [about, home, blog] | 0 days 00:00:00.001000 | 0 days 00:01:59.692000 | 652 |
| 2 - explore | [modeling, home, taxonomy, blog, tracking, about] | 0 days 00:02:00.367000 | 0 days 00:19:30.948000 | 368 |
| 3 - implement | [about, home, taxonomy, blog, tracking, modeling] | 0 days 00:20:11.870000 | 0 days 00:59:43.205000 | 36 |


In [31]:
# user intent over time
results.to_pandas().groupby(['bucket', 'time_aggregation']).agg({'user_id':'nunique'})

| bucket | time_aggregation | user_id (nunique) |
| :--- | :--- | :--- |
| 1 - inform | 2022-01 | 17 |
| 1 - inform | 2022-02 | 370 |
| 1 - inform | 2022-03 | 230 |
| 1 - inform | 2022-04 | 85 |
| 2 - explore | 2022-01 | 20 |
| 2 - explore | 2022-02 | 146 |
| 2 - explore | 2022-03 | 152 |
| 2 - explore | 2022-04 | 80 |
| 3 - implement | 2022-01 | 4 |
| 3 - implement | 2022-02 | 19 |
