# An example of Gloria Mark's 25 minute refocus time, calculated with real activity data

This notebook applies Gloria Mark's 25-minute refocus time to real activity data, to find examples of a user refocusing after an interruption and then being re-interrupted by digital communication tools.

First, import the packages needed for the demonstration and analysis:

In [11]:
import os, sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath('application/'))))
from application.models import User, GoogleCalendarEvent, GoogleCalendarUser, SlackUser
from application.applied_science import data_utils as du
import datetime as dt
from application.applied_science import data_annotation as da
import pandas as pd
print_columns = ['slack_conversation_read_count', 'slack_user_event_count', 
                 'google_calendar_event_id', 'focused_work_period_start']

Next, getting an authenticated user:

In [12]:
users = User.query.filter(User.fully_authenticated).all()
user = users[1]
user

User('lauriermantel','lauriermantel@gmail.com','default.jpg')

Then, we get the user's activity for this week. Note: 2016 rows * 5 minutes per row / 60 minutes per hour = 168 hours, or exactly 1 week.

In [3]:
activity_df = da.get_activity_report_df(user)
activity_df.index = activity_df.datetime_utc
activity_df

datetime_utc,id,datetime_utc,user_id,slack_conversation_read_count,slack_user_event_count,google_calendar_event_id,google_calendar_event_count
2020-03-01 05:00:00,1330526,2020-03-01 05:00:00,12,0,0,,0
2020-03-01 05:05:00,1330527,2020-03-01 05:05:00,12,0,0,,0
2020-03-01 05:10:00,1330528,2020-03-01 05:10:00,12,0,0,,0
2020-03-01 05:15:00,1330529,2020-03-01 05:15:00,12,0,0,,0
2020-03-01 05:20:00,1339169,2020-03-01 05:20:00,12,0,0,,0
...,...,...,...,...,...,...,...
2020-03-08 04:35:00,1850877,2020-03-08 04:35:00,12,0,0,,0
2020-03-08 04:40:00,1850878,2020-03-08 04:40:00,12,0,0,,0
2020-03-08 04:45:00,1850879,2020-03-08 04:45:00,12,0,0,,0
2020-03-08 04:50:00,1850880,2020-03-08 04:50:00,12,0,0,,0
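
The row-count arithmetic in the note above can be checked directly. This is a minimal sketch: the literal `2016` is copied from that note rather than read from `activity_df`, and the start time is the first timestamp shown in the table.

```python
import pandas as pd

n_rows = 2016                      # row count quoted in the note above
period = pd.Timedelta(minutes=5)   # each row covers one 5-minute period

span = n_rows * period
print(span == pd.Timedelta(weeks=1))   # True: 2016 periods cover exactly one week

# The same grid, rebuilt from the first timestamp shown in the table.
grid = pd.date_range("2020-03-01 05:00:00", periods=n_rows, freq="5min")
print(grid[-1])   # start of the final 5-minute period
```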


Next, we define the Gloria Mark interruption model, based on her paper [here](https://www.ics.uci.edu/~gmark/chi08-mark.pdf). `pfr` is "periods for refocus": the number of 5-minute periods required to refocus after an interruption. `ipl` is "interruption period length": the number of consecutive periods of message activity required for an interruption to occur. `mira` is "message interruption read amount": the minimum number of messages read in a period for it to count toward an interruption. `misa` is "message interruption send amount": the minimum number of messages sent in a period for it to count toward an interruption.

A 5-minute period counts as interrupted if the user reads at least two messages and/or sends at least one message, _or_ has a meeting (calendar event).

Then, once a user has been interrupted, they must have no interruptions for the minimum number of "periods for refocus".
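
To make the per-period rule concrete, here is a minimal sketch on made-up data. The column names `read`, `sent`, and `meetings` and the helper `is_interrupted` are illustrative assumptions, not part of the application code.

```python
import pandas as pd

def is_interrupted(df, mira=2, misa=1):
    """Flag each 5-minute period that meets the interruption rule (hypothetical helper)."""
    return (df["read"] >= mira) | (df["sent"] >= misa) | (df["meetings"] > 0)

# Made-up activity for five consecutive 5-minute periods.
toy = pd.DataFrame(
    {
        "read": [0, 2, 1, 0, 0],
        "sent": [0, 0, 0, 1, 0],
        "meetings": [0, 0, 0, 0, 1],
    },
    index=pd.date_range("2020-03-01 09:00", periods=5, freq="5min"),
)

flags = is_interrupted(toy)
print(flags.tolist())   # [False, True, False, True, True]
```

Reading one message with nothing sent (the third row) stays below both thresholds, so that period is not flagged.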

In [4]:
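def gloria_mark(mini_df, pfr=5, ipl=2, mira=2, misa=1):
  # pfr: refocus takes 5 periods (25 minutes)
  # ipl: a Slack interruption needs 2 active periods in a row (10 minutes)
  # mira / misa: a period is active if >= 2 messages are read or >= 1 is sent
  slack_read_col = mini_df[du.SLACK_CONVERSATION_READ_COLUMN_NAME]
  slack_send_col = mini_df[du.SLACK_USER_EVENT_COLUMN_NAME]
  cal_event_count_col = mini_df[du.GOOGLE_CALENDAR_EVENT_COUNT_COLUMN_NAME]

  # rows among the first `ipl` periods that meet the Slack thresholds
  interruptions = mini_df.head(ipl).loc[
    (slack_read_col.head(ipl) >= mira)
    | (slack_send_col.head(ipl) >= misa)
  ]

  # interrupted if all of the first `ipl` periods are active, or the first period has a meeting
  # (.iloc is used because the frame is indexed by datetime, not by position)
  if (len(interruptions) >= ipl) or (cal_event_count_col.iloc[0] > 0):
    interruption_dt = mini_df.datetime_utc.iloc[0]
  else:
    interruption_dt = None

  # refocused if none of the first `pfr` periods exceeds the thresholds or has a meeting
  # (these comparisons are strict `>`, unlike the `>=` used for interruptions above)
  refocus_window = mini_df.head(pfr)
  busy_periods = refocus_window.loc[
    (slack_read_col.head(pfr) > mira)
    | (slack_send_col.head(pfr) > misa)
    | (cal_event_count_col.head(pfr) > 0)
  ]
  if len(busy_periods) == 0:
    refocus_dt = refocus_window.datetime_utc.iloc[-1]
  else:
    refocus_dt = None
  return interruption_dt, refocus_dt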

activity_df['focused_work_period_start'] = da.focused_work_calculation(activity_df, gloria_mark, df_size=5)
activity_df.head()[print_columns]

datetime_utc,user_id,slack_conversation_read_count,slack_user_event_count,google_calendar_event_id,focused_work_period_start
2020-03-01 05:00:00,12,0,0,,NaT
2020-03-01 05:05:00,12,0,0,,NaT
2020-03-01 05:10:00,12,0,0,,NaT
2020-03-01 05:15:00,12,0,0,,NaT
2020-03-01 05:20:00,12,0,0,,NaT


In [5]:
activity_df.describe()

,id,user_id,slack_conversation_read_count,slack_user_event_count,google_calendar_event_id,google_calendar_event_count
count,2016.0,2016.0,2016.0,2016.0,267.0,2016.0
mean,1818506.0,12.0,0.059028,0.112599,12538.617978,0.150298
std,100046.6,0.0,0.287002,0.665879,17.808192,0.404356
min,1330526.0,12.0,0.0,0.0,12510.0,0.0
25%,1849370.0,12.0,0.0,0.0,12513.0,0.0
50%,1849874.0,12.0,0.0,0.0,12549.0,0.0
75%,1850377.0,12.0,0.0,0.0,12551.0,0.0
max,1850881.0,12.0,4.0,11.0,12553.0,2.0


We can then find places where focus periods end:

In [13]:
# eofp: "end of focus periods", the rows where a focus period start was recorded
eofp = activity_df.loc[activity_df.focused_work_period_start.notnull()]
eofp[print_columns]

datetime_utc,slack_conversation_read_count,slack_user_event_count,google_calendar_event_id,focused_work_period_start
2020-03-01 05:30:00,1,3,,2020-03-01 05:20:00
2020-03-01 19:25:00,1,1,,2020-03-01 06:05:00
2020-03-01 21:25:00,1,4,,2020-03-01 20:00:00
2020-03-02 03:00:00,2,5,,2020-03-01 22:00:00
2020-03-02 17:00:00,0,0,12510.0,2020-03-02 03:30:00
2020-03-02 23:00:00,0,0,12513.0,2020-03-02 22:20:00
2020-03-03 08:15:00,0,1,,2020-03-03 00:20:00
2020-03-03 23:20:00,0,4,,2020-03-03 08:45:00
2020-03-04 00:45:00,0,2,,2020-03-04 00:05:00
2020-03-04 01:50:00,1,1,,2020-03-04 01:15:00


Then, we can examine the lengths of these focus periods:

In [14]:
eofp.datetime_utc - eofp.focused_work_period_start

datetime_utc
2020-03-01 05:30:00   00:10:00
2020-03-01 19:25:00   13:20:00
2020-03-01 21:25:00   01:25:00
2020-03-02 03:00:00   05:00:00
2020-03-02 17:00:00   13:30:00
2020-03-02 23:00:00   00:40:00
2020-03-03 08:15:00   07:55:00
2020-03-03 23:20:00   14:35:00
2020-03-04 00:45:00   00:40:00
2020-03-04 01:50:00   00:35:00
2020-03-04 04:05:00   01:45:00
2020-03-04 04:40:00   00:05:00
2020-03-04 14:30:00   09:25:00
2020-03-04 18:45:00   02:25:00
2020-03-05 00:00:00   04:50:00
2020-03-05 14:30:00   11:40:00
2020-03-05 19:05:00   01:50:00
2020-03-06 00:00:00   03:55:00
2020-03-06 02:40:00   00:20:00
2020-03-06 05:35:00   01:35:00
2020-03-06 17:00:00   11:00:00
2020-03-07 16:00:00   16:40:00
2020-03-08 04:55:00   10:35:00
dtype: timedelta64[ns]
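
The lengths above can be summarized in a few lines. This is a sketch that retypes the 23 durations from the output above instead of recomputing them from `eofp`; the one-hour cutoff is an arbitrary threshold chosen for illustration.

```python
import pandas as pd

# Durations copied from the output above, in order.
lengths = pd.Series(pd.to_timedelta([
    "00:10:00", "13:20:00", "01:25:00", "05:00:00", "13:30:00", "00:40:00",
    "07:55:00", "14:35:00", "00:40:00", "00:35:00", "01:45:00", "00:05:00",
    "09:25:00", "02:25:00", "04:50:00", "11:40:00", "01:50:00", "03:55:00",
    "00:20:00", "01:35:00", "11:00:00", "16:40:00", "10:35:00",
]))

print(lengths.min(), lengths.median(), lengths.max())

# How many focus periods lasted at least an hour?
at_least_an_hour = int((lengths >= pd.Timedelta(hours=1)).sum())
print(at_least_an_hour, "of", len(lengths))
```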

One example is the third one, which lasts 1 hour and 25 minutes.
We can look at the time just before it began: what was the user doing before the focus period started?

In [15]:
# .iloc selects by position; the frames are indexed by datetime
interest_period_start = eofp.focused_work_period_start.iloc[2]
interest_period_end = eofp.datetime_utc.iloc[2]
uninterrupted_time_example = activity_df.loc[(activity_df.datetime_utc >= interest_period_start)
                                             & (activity_df.datetime_utc <= interest_period_end)]
# also show the 25 minutes (5 periods) leading up to the focus period
time_before_uninterrupted = uninterrupted_time_example.datetime_utc.iloc[0] - dt.timedelta(minutes=25)
activity_df.loc[(activity_df.datetime_utc >= time_before_uninterrupted)
                & (activity_df.datetime_utc <= uninterrupted_time_example.datetime_utc.iloc[-1])][print_columns]

datetime_utc,slack_conversation_read_count,slack_user_event_count,google_calendar_event_id,focused_work_period_start
2020-03-01 19:35:00,1,5,,NaT
2020-03-01 19:40:00,0,0,,NaT
2020-03-01 19:45:00,0,0,,NaT
2020-03-01 19:50:00,0,0,,NaT
2020-03-01 19:55:00,0,0,,NaT
2020-03-01 20:00:00,0,0,,NaT
2020-03-01 20:05:00,0,0,,NaT
2020-03-01 20:10:00,0,0,,NaT
2020-03-01 20:15:00,0,0,,NaT
2020-03-01 20:20:00,0,0,,NaT


This is an example of a user distracting themselves by sending and reading Slack messages: they sent 5 messages and read 1 in the 5-minute period starting at `2020-03-01 19:35:00`, just before the refocus window. At the end of the focus period (`2020-03-01 21:25:00`), they started reading and sending messages again.

Also, looking at the `focused_work_period_start` column, we see that the focused work period ended at `datetime_utc == 2020-03-01 21:25:00`. Even though the original interruption was during the 5-minute period starting at `2020-03-01 19:35:00` (UTC), the focused work period does not start until 25 minutes later, at `2020-03-01 20:00:00`, because 25 minutes are required to refocus.
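
The 25-minute delay can be reproduced on the ten rows shown above. This is a sketch under stated assumptions: `refocus_times` is a hypothetical helper written for this illustration, not `da.focused_work_calculation` (whose implementation is not shown here), and it ignores calendar events because none occur in these rows.

```python
import pandas as pd

def refocus_times(interrupted, pfr=5):
    """Timestamps at which `pfr` quiet periods in a row complete after an interruption."""
    quiet_run = 0
    pending = False   # True once an interruption has been seen and not yet recovered from
    found = []
    for ts, flag in interrupted.items():
        if flag:
            pending = True
            quiet_run = 0
        else:
            quiet_run += 1
            if pending and quiet_run == pfr:
                found.append(ts)
                pending = False
    return found

# Read and sent counts copied from the table above (19:35 through 20:20).
index = pd.date_range("2020-03-01 19:35:00", periods=10, freq="5min")
read = pd.Series([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], index=index)
sent = pd.Series([5, 0, 0, 0, 0, 0, 0, 0, 0, 0], index=index)

interrupted = (read >= 2) | (sent >= 1)
refocus = refocus_times(interrupted)
print(refocus)   # the fifth quiet period after 19:35 is the one starting at 20:00
```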

Looking at the time before this period, what were they doing? Were they having a lot of online conversation?

In [16]:
activity_df.loc[(activity_df.datetime_utc >= time_before_uninterrupted - dt.timedelta(minutes=20))
                & (activity_df.datetime_utc <= time_before_uninterrupted)][print_columns]

datetime_utc,slack_conversation_read_count,slack_user_event_count,google_calendar_event_id,focused_work_period_start
2020-03-01 19:15:00,0,0,,NaT
2020-03-01 19:20:00,0,0,,NaT
2020-03-01 19:25:00,1,1,,2020-03-01 06:05:00
2020-03-01 19:30:00,1,3,,NaT
2020-03-01 19:35:00,1,5,,NaT


We can see that over about 15 minutes (the three periods from 19:25 to 19:35) they read 3 messages and sent 9.
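
The totals for that window can be checked by summing the three active rows. This sketch retypes the counts from the table above instead of slicing `activity_df`.

```python
import pandas as pd

# The three active periods from the table above.
window = pd.DataFrame(
    {
        "slack_conversation_read_count": [1, 1, 1],
        "slack_user_event_count": [1, 3, 5],
    },
    index=pd.to_datetime([
        "2020-03-01 19:25:00",
        "2020-03-01 19:30:00",
        "2020-03-01 19:35:00",
    ]),
)

totals = window.sum()
print(totals)
```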

Although this sample data comes from the creators of Fulfilled.ai and is not necessarily representative of a software engineer's schedule or workflow, it shows that a model backed by science can at least detect interruptions and refocus periods in real activity data.

Finally, we save the data for future inspection, as the state of the database may change:

In [17]:
activity_df.to_csv('gloria-mark-model-demonstration.csv')
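
When the saved file is read back later, the timestamp columns come back as plain strings unless they are parsed explicitly. Below is a minimal round-trip sketch on a two-row stand-in frame, written to an in-memory buffer instead of the real CSV; the stand-in keeps only the index and one column, so the real file may need more columns listed in `parse_dates`.

```python
import io
import pandas as pd

# Two-row stand-in for activity_df: a datetime index plus one datetime column with a gap.
toy = pd.DataFrame(
    {"focused_work_period_start": [pd.NaT, pd.Timestamp("2020-03-01 05:20:00")]},
    index=pd.DatetimeIndex(
        ["2020-03-01 05:25:00", "2020-03-01 05:30:00"], name="datetime_utc"
    ),
)

buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# Parse both the index and the focus-start column as datetimes on the way back in.
restored = pd.read_csv(
    buffer,
    index_col="datetime_utc",
    parse_dates=["datetime_utc", "focused_work_period_start"],
)
print(restored.dtypes)
```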