# Event occurrence probability prior to SAMPLE_PIP_AND_IMO_PARAMETERS

## 1. Overview

The analysis in this notebook investigates which events occur most often before a SAMPLE_PIP_AND_IMO_PARAMETERS event.

The answer can indicate whether there is a trigger event and helps answer the following question: ***"Is there a trigger to start/run the SAMPLE_PIP_AND_IMO_PARAMETERS event?"***.

This analysis uses the data frame stored in **BOOK9.DAT**. It collects the distinct events that occur in the 10 minutes before each SAMPLE_PIP_AND_IMO_PARAMETERS event and computes, for each event, the ratio of windows in which it appears.

## 2. Notebook preparation
### 2.1 Load packages

In [1]:
for (pkg in c("dplyr", "tidyr", "ggplot2", "lubridate")) {
    # require() attaches the package if it is installed and returns FALSE otherwise
    if (!require(pkg, character.only = TRUE)) {
        install.packages(pkg, lib = '/R/library', repos = 'http://cran.us.r-project.org')
        # attach the freshly installed package (the failed require() did not)
        library(pkg, character.only = TRUE, lib.loc = '/R/library')
    }
}

Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: tidyr
Loading required package: ggplot2
Loading required package: lubridate

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date



### 2.2 Load data

Load the data from **BOOK9.DAT**, which contains the events of a single printer over a single week.

In [2]:
load('book9.dat')
str(data)

'data.frame':	5316 obs. of  18 variables:
 $ event_RowNumber  : num  1.61e+10 1.61e+10 1.61e+10 1.61e+10 1.61e+10 ...
 $ event_press      : num  4.5e+07 4.5e+07 4.5e+07 4.5e+07 4.5e+07 ...
 $ event_recNum     : num  366826 366827 366829 366828 366830 ...
 $ event_date       : POSIXct, format: "2016-07-13 03:00:00" "2016-07-13 03:00:00" ...
 $ event_time       : num  74740 74741 74847 74847 74849 ...
 $ event_name       : chr  "EVENT_HANDLER_STARTED" "PLUG_TIME_DELTA" "PLUG_TIME_DELTA" "EVENT_HANDLER_STARTED" ...
 $ event_jobid      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ event_sheets     : num  0 0 0 0 0 ...
 $ event_impressions: num  0 0 0 0 0 ...
 $ event_state      : chr  "INIT_STATE" "INIT_STATE" "INIT_STATE" "INIT_STATE" ...
 $ event_mode       : chr  "STANDARD_MODE" "STANDARD_MODE" "STANDARD_MODE" "STANDARD_MODE" ...
 $ event_p1         : chr  "5780" "0" "0" "6232" ...
 $ event_p2         : chr  "Event Log COM Server 1.0" "Old Plug" "Old Plug" "Event Log COM Server 1.0" ...
 $ event_p3

## 3. Organize the data

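Because the timestamp is split across two columns with different formats, **event_date** and **event_time** must be converted and combined into a single timestamp column called **full_time**. Only the date part of **event_date** (a POSIXct value) is kept, and **event_time** stores the time of day as a number in HHMMSS form (for example, 74740 is 07:47:40).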

In [3]:
df <- data %>%
    mutate(event_press = as.character(event_press)) %>%
    # keep only the date part (YYYY-MM-DD) of event_date
    mutate(event_date = as.character(event_date)) %>%
    mutate(event_date = substr(event_date,1,10)) %>%
    # event_time is HHMMSS stored as a number: left-pad five-digit values with a zero...
    mutate(event_time = as.character(event_time)) %>%
    mutate(event_time = sapply(event_time, function(elem) { ifelse(nchar(elem) == 5,paste('0',elem,sep=""),elem) })) %>%
    # ...then insert the colons to get HH:MM:SS
    mutate(event_time = sapply(event_time, function(elem) { paste(substr(elem,1,2),":",substr(elem,3,4),":",substr(elem,5,6),sep="") })) %>%
    # join date and time, then parse into a POSIXct column
    unite(full_time, c(event_date, event_time), sep=" ") %>%
    mutate(full_time = ymd_hms(full_time))

Warning message: 1 failed to parse.
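The padding in the cell above only handles five-digit values. Times before 01:00:00 (four digits or fewer), and values that `as.character()` renders in scientific notation (for example `100000`, i.e. 10:00:00, becomes `"1e+05"`), produce strings that cannot be parsed, which may explain the parse failure reported above. A minimal base-R sketch of a more robust conversion, assuming `event_time` always encodes HHMMSS as a number; `hhmmss_to_string` is a hypothetical helper name, not part of the original notebook:

```r
# Hypothetical helper: convert numeric HHMMSS values into "HH:MM:SS" strings
hhmmss_to_string <- function(x) {
  # sprintf("%06d") left-pads to six digits and never uses scientific notation
  s <- sprintf("%06d", as.integer(x))
  paste(substr(s, 1, 2), substr(s, 3, 4), substr(s, 5, 6), sep = ":")
}

times <- c(74740, 4523, 100000, 235959)
as.character(times)      # "74740" "4523" "1e+05" "235959"
hhmmss_to_string(times)  # "07:47:40" "00:45:23" "10:00:00" "23:59:59"

# Join with a date and parse with base R (the UTC time zone is an assumption)
full_time <- as.POSIXct(paste("2016-07-13", hhmmss_to_string(times)),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
full_time
```

Using `sprintf` avoids both the per-element `sapply` calls and the dependence on how R prints numbers.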

## 4. Get the full_time of SAMPLE_PIP_AND_IMO_PARAMETERS events

In [4]:
df2 <- df %>%
    filter(event_name == "SAMPLE_PIP_AND_IMO_PARAMETERS") %>%
    select(full_time)
fullTimes <- df2$full_time
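Iterating over `fullTimes` needs some care: R's `for` loop strips attributes such as the class, so looping directly over a POSIXct vector yields bare numbers (seconds since the epoch), while indexing into the vector keeps the POSIXct class. A minimal base-R illustration with two made-up timestamps standing in for `fullTimes`:

```r
# Two made-up timestamps standing in for fullTimes
times <- as.POSIXct(c("2016-07-13 07:47:40", "2016-07-13 08:10:00"), tz = "UTC")

# Looping directly: each element arrives as a plain double
loop_classes <- character(0)
for (x in times) loop_classes <- c(loop_classes, class(x))
loop_classes   # "numeric" "numeric"

# Looping over indices: times[i] is still a POSIXct value
index_classes <- vapply(seq_along(times), function(i) class(times[i])[1], character(1))
index_classes  # "POSIXct" "POSIXct"
```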

## 5. Get the events prior to each SAMPLE_PIP_AND_IMO_PARAMETERS event

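For each SAMPLE_PIP_AND_IMO_PARAMETERS event, we collect the events that occurred in the **ten (10) minutes** before it (the window also includes the sample event itself). Note: each event name is registered only once per window, so the counts below give the number of windows in which an event appears, not its total number of occurrences.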

In [5]:
# distinct event names from every 10-minute window, concatenated
test <- character(0)

for (i in seq_along(fullTimes)) {
    # index into the vector so the value keeps its POSIXct class
    t2 <- fullTimes[i]
    t1 <- t2 - dminutes(10)

    # distinct event names logged in [t1, t2]
    temp <- df %>%
        filter(full_time >= t1 & full_time <= t2) %>%
        distinct(event_name)

    # each event name counts at most once per window
    test <- c(test, temp$event_name)
}
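The windowing step can be sketched in base R on a small, made-up event log (the event names `A`, `B`, and `SAMPLE` are hypothetical placeholders, not events from BOOK9.DAT). It shows that an event logged twice inside one window is counted only once, and that the sample event always falls inside its own window:

```r
# Made-up event log; "SAMPLE" stands in for SAMPLE_PIP_AND_IMO_PARAMETERS
events <- data.frame(
  full_time = as.POSIXct(c("2016-07-13 08:00:00", "2016-07-13 08:05:00",
                           "2016-07-13 08:06:00", "2016-07-13 08:09:00",
                           "2016-07-13 09:00:00", "2016-07-13 09:02:00"),
                         tz = "UTC"),
  event_name = c("A", "B", "A", "SAMPLE", "B", "SAMPLE"),
  stringsAsFactors = FALSE
)
sample_times <- events$full_time[events$event_name == "SAMPLE"]

# For each sample time t2, keep the distinct names logged in [t2 - 10 min, t2]
windows <- lapply(seq_along(sample_times), function(i) {
  t2 <- sample_times[i]
  t1 <- t2 - 10 * 60  # subtracting a number from POSIXct works in seconds
  in_window <- events$full_time >= t1 & events$full_time <= t2
  unique(events$event_name[in_window])
})
test <- unlist(windows)
table(test)  # counts: A = 1, B = 2, SAMPLE = 2
```

`A` is logged twice in the first window but counted once, which matches the "unique event" rule described above.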

## 6. Calculate the event ratio

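Calculate each event's ratio relative to the number of SAMPLE_PIP_AND_IMO_PARAMETERS occurrences and show the top entries. The first row is SAMPLE_PIP_AND_IMO_PARAMETERS itself, whose ratio is 1 by construction.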
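Written out, with $t_1, \dots, t_N$ the times of the $N$ SAMPLE_PIP_AND_IMO_PARAMETERS events and $W_i$ the ten-minute window ending at $t_i$, the ratio of an event $e$ is the fraction of windows in which $e$ appears at least once:

$$
W_i = [\,t_i - 10\ \text{min},\ t_i\,], \qquad
\text{ratio}(e) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\big[\, e \text{ occurs in } W_i \,\big]
$$

Because each window contains the sample event that closes it, the frequency of SAMPLE_PIP_AND_IMO_PARAMETERS equals $N$ (provided every sample timestamp was parsed), so dividing by it yields this fraction.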

In [6]:
event_occu <- sort(table(test), decreasing = TRUE)
event_occu <- as.data.frame(event_occu)

In [7]:
sample_occu <- event_occu[(event_occu$test=='SAMPLE_PIP_AND_IMO_PARAMETERS'), ]$Freq

In [9]:
# share of SAMPLE_PIP_AND_IMO_PARAMETERS windows in which each event appears
event_occu$ratio <- event_occu$Freq / sample_occu
head(event_occu, 11)

|    | test | Freq | ratio |
|---:|:-----|-----:|------:|
| 1  | SAMPLE_PIP_AND_IMO_PARAMETERS | 70 | 1.0 |
| 2  | MIC_CD_ADDING_NOTIFY | 44 | 0.628571428571429 |
| 3  | PCN_GAP_CURRENT_SAMPLE | 40 | 0.571428571428571 |
| 4  | PM_ENGINE_RELATED_COUNTERS | 32 | 0.457142857142857 |
| 5  | PM_JOB_STATISTICS | 32 | 0.457142857142857 |
| 6  | PRINT_JOB_STATISTICS | 29 | 0.414285714285714 |
| 7  | MIC_WATER_HEATER_FAILURE | 27 | 0.385714285714286 |
| 8  | PRINT_JOB_INTERRUPT | 26 | 0.371428571428571 |
| 9  | MTC_REPORT_HUMIDITY_SENSORS | 24 | 0.342857142857143 |
| 10 | PM_ABNORMAL_PRINT_TERMINAT | 24 | 0.342857142857143 |
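The ratio step can be reproduced in isolation. A minimal base-R sketch, using a hypothetical vector of distinct event names collected from two windows (`A`, `B`, and `SAMPLE` are placeholders, with `SAMPLE` standing in for SAMPLE_PIP_AND_IMO_PARAMETERS):

```r
# Distinct event names per window, concatenated (two hypothetical windows)
test <- c("A", "B", "SAMPLE", "B", "SAMPLE")

# Count windows per event; the table's dimnames name becomes the column "test"
event_occu <- as.data.frame(sort(table(test), decreasing = TRUE),
                            stringsAsFactors = FALSE)

# Every window contains its own sample event, so this count equals the number of windows
n_windows <- event_occu$Freq[event_occu$test == "SAMPLE"]
event_occu$ratio <- event_occu$Freq / n_windows
event_occu
```

Here `A` appears in one of the two windows (ratio 0.5), while `B` and `SAMPLE` appear in both (ratio 1).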


## 7. Conclusion

Based on the result, the event "MIC_CD_ADDING_NOTIFY" appears in approximately **63%** (44 of 70) of the **ten (10) minute** windows preceding a SAMPLE_PIP_AND_IMO_PARAMETERS event, and the event "PCN_GAP_CURRENT_SAMPLE" in about **57%** (40 of 70).

In the available data, no event can be considered the trigger that starts the SAMPLE_PIP_AND_IMO_PARAMETERS event: even the most frequent one is absent from more than a third of the windows. However, the data set is extremely small (one printer over one week), so these conclusions cannot be generalized to other printer models, families, or firmware versions.