# System Usage


This document investigates user interaction behaviour to support planning for future resource allocation and performance
monitoring of our system.

---

### @Tansinee T.

In [12]:
import pandas as pd
import numpy as np

from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource

In [13]:
# Ignore this cell; it only activates Bokeh output in the notebook
output_notebook()

# Write usage

Let's investigate the events that change the contents of the database (write events). A write event is one of 2 actions:
 - **content shared**: creates new content and grows the data size of the active content DB.
 - **content removed**: removes existing content; the content is moved to the archive and the active content DB becomes smaller.
 
We'll focus only on data from the second half of 2016 (the filter used below keeps 2016-06-01 through 2016-12-31).
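
As a rough illustration of the two write actions described above, each event can be treated as a signed change to the number of items in the active content DB. This is a minimal sketch on made-up events; the `EVENT_DELTA` mapping and the sample series are assumptions for illustration and are not part of the original analysis.

```python
import pandas as pd

# Hypothetical mapping: a share adds one item to the active content DB,
# a removal takes one item out (it is moved to the archive).
EVENT_DELTA = {'CONTENT SHARED': 1, 'CONTENT REMOVED': -1}

# Made-up event sequence, for illustration only.
events = pd.Series([
    'CONTENT SHARED',
    'CONTENT SHARED',
    'CONTENT REMOVED',
    'CONTENT SHARED',
])

# Translate each event into its signed effect, then sum for the net change.
deltas = events.map(EVENT_DELTA)
net_change = int(deltas.sum())
print(net_change)
```

The same sign convention (positive for shares, negative for removals) is what the daily counts later in this notebook rely on.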

## Data Exploration

In [2]:
write_log = pd.read_csv('data/deskdrop/shared_articles.csv')
write_log['timestamp'] = pd.to_datetime(write_log['timestamp'], unit='s')
write_log.head()

Unnamed: 0,timestamp,eventType,contentId,authorPersonId,authorSessionId,authorUserAgent,authorRegion,authorCountry,contentType,url,title,text,lang
0,2016-03-28 19:19:39,CONTENT REMOVED,-6451309518266745024,4340306774493623681,8940341205206233829,,,,HTML,http://www.nytimes.com/2016/03/28/business/dea...,"Ethereum, a Virtual Currency, Enables Transact...",All of this work is still very early. The firs...,en
1,2016-03-28 19:39:48,CONTENT SHARED,-4110354420726924665,4340306774493623681,8940341205206233829,,,,HTML,http://www.nytimes.com/2016/03/28/business/dea...,"Ethereum, a Virtual Currency, Enables Transact...",All of this work is still very early. The firs...,en
2,2016-03-28 19:42:26,CONTENT SHARED,-7292285110016212249,4340306774493623681,8940341205206233829,,,,HTML,http://cointelegraph.com/news/bitcoin-future-w...,Bitcoin Future: When GBPcoin of Branson Wins O...,The alarm clock wakes me at 8:00 with stream o...,en
3,2016-03-28 19:47:54,CONTENT SHARED,-6151852268067518688,3891637997717104548,-1457532940883382585,,,,HTML,https://cloudplatform.googleblog.com/2016/03/G...,Google Data Center 360° Tour,We're excited to share the Google Data Center ...,en
4,2016-03-28 19:48:17,CONTENT SHARED,2448026894306402386,4340306774493623681,8940341205206233829,,,,HTML,https://bitcoinmagazine.com/articles/ibm-wants...,"IBM Wants to ""Evolve the Internet"" With Blockc...",The Aite Group projects the blockchain market ...,en


In [3]:
(min(write_log['timestamp']), max(write_log['timestamp']))

(Timestamp('2016-03-28 19:19:39'), Timestamp('2017-02-28 18:51:11'))

## Data Movement

This section measures the daily write pressure on the content database.

In [20]:
# `pd.datetime` is no longer available in recent pandas; the `.dt` accessor yields the same date objects
write_log['date'] = write_log['timestamp'].dt.date
recent_write_log = write_log.loc[
    (write_log['timestamp'] >= '2016-06-01') &
    (write_log['timestamp'] < '2017-01-01')
]

daily_type_count = recent_write_log.groupby(['date', 'eventType'])['timestamp'].count().reset_index()
daily_type_count.rename(columns={'timestamp': 'count'}, inplace=True)

# Make the count negative for `CONTENT REMOVED` rows so removals plot below zero
daily_type_count['count'] = daily_type_count['count'].where(
    daily_type_count['eventType'] == 'CONTENT SHARED', -daily_type_count['count'])

daily_type_count.head()

Unnamed: 0,date,eventType,count
0,2016-06-01,CONTENT SHARED,27
1,2016-06-02,CONTENT REMOVED,-1
2,2016-06-02,CONTENT SHARED,40
3,2016-06-03,CONTENT SHARED,23
4,2016-06-04,CONTENT SHARED,4
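
Because removals are stored as negative counts, summing the counts per day gives the net daily change, and a running total gives the net growth over the period. The sketch below reuses only the five rows shown in the output above; the names `sample`, `daily_net`, and `cumulative_net` are introduced here for illustration and do not appear in the original notebook.

```python
import pandas as pd

# Rows copied from the daily_type_count.head() output above.
sample = pd.DataFrame({
    'date': pd.to_datetime([
        '2016-06-01', '2016-06-02', '2016-06-02', '2016-06-03', '2016-06-04',
    ]),
    'eventType': [
        'CONTENT SHARED', 'CONTENT REMOVED', 'CONTENT SHARED',
        'CONTENT SHARED', 'CONTENT SHARED',
    ],
    'count': [27, -1, 40, 23, 4],
})

# Shares are positive and removals negative, so a per-day sum is the net change.
daily_net = sample.groupby('date')['count'].sum()

# Running total of the net change across the sampled days.
cumulative_net = daily_net.cumsum()

print(daily_net)
print(cumulative_net)
```

On the full `daily_type_count` frame the same two lines would apply unchanged, assuming the sign convention set in the cell above.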


In [32]:
# Recent Bokeh releases take `width`/`height`; older releases used `plot_width`/`plot_height`
data_load_figure = figure(width=750, height=400, title='Daily Data Movement',
                          x_axis_type='datetime')

create_events = ColumnDataSource(daily_type_count[daily_type_count['eventType'] == 'CONTENT SHARED'])
data_load_figure.vbar(width=1.0, x='date', top='count', source=create_events,
                     color='blue')

delete_events = ColumnDataSource(daily_type_count[daily_type_count['eventType'] == 'CONTENT REMOVED'])
data_load_figure.vbar(width=1.0, x='date', top='count', source=delete_events,
                     color='red')

In [28]:
show(data_load_figure)

## Descriptive Statistics on Data Movement

In [29]:
create_content = daily_type_count[daily_type_count['eventType'] == 'CONTENT SHARED']

Daily **average** number of content shares (over days with at least one share)

In [30]:
create_content['count'].mean()

8.846153846153847

Daily **max** number of content shares

In [31]:
create_content['count'].max()

40
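
One caveat on the average: the `groupby` only produces rows for days on which an event occurred, so days without any share are left out of the mean. If a per-calendar-day average is wanted instead, the series can be reindexed over the full date range with zeros. This is a minimal sketch on made-up counts; `shares`, `all_days`, and `shares_all_days` are hypothetical names, and the gap on 2016-06-02 is invented for illustration.

```python
import pandas as pd

# Made-up daily share counts with one missing day (2016-06-02 has no row).
shares = pd.Series(
    [27, 23, 4],
    index=pd.to_datetime(['2016-06-01', '2016-06-03', '2016-06-04']),
)

# Every calendar day between the first and last observed day.
all_days = pd.date_range(shares.index.min(), shares.index.max(), freq='D')

# Days without a row are treated as zero shares.
shares_all_days = shares.reindex(all_days, fill_value=0)

mean_active_days = shares.mean()        # average over days that had shares
mean_all_days = shares_all_days.mean()  # average over every calendar day

print(mean_active_days, mean_all_days)
```

Which of the two averages is more useful depends on the planning question: the active-day mean describes a typical busy day, while the calendar-day mean describes long-run growth.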