## Periodic pattern mining on Canadian TV logs
<img src="skmine_series.png" alt="logo" style="width: 60%;"/>

### The problem, informally
Let's take a simple example. 

Imagine you set an alarm to wake up every day around 7:30 AM and go to work. Sometimes you wake up a bit earlier (your body anticipates the alarm), and sometimes a bit later, for example if you press the "snooze" button and refuse to face the fact that you have to get up.

In Python, we can load those "wake up" events as logs and store them in a [pandas.Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html), like so:

In [1]:
import datetime as dt
import pandas as pd
one_day = 60 * 24  # a day in minutes
minutes = [0, one_day - 1, one_day * 2 - 1, one_day * 3, one_day * 4 + 2, one_day * 7]

S = pd.Series("wake up", index=minutes)
start = dt.datetime.strptime("16/04/2020 07:30", "%d/%m/%Y %H:%M")
S.index = S.index.map(lambda e: start + dt.timedelta(minutes=e))
S.index = S.index.round("min")  # minutes as the lowest unit of difference
S

2020-04-16 07:30:00    wake up
2020-04-17 07:29:00    wake up
2020-04-18 07:29:00    wake up
2020-04-19 07:30:00    wake up
2020-04-20 07:32:00    wake up
2020-04-23 07:30:00    wake up
dtype: object

In [3]:
S.index.to_numpy()

array(['2020-04-16T07:30:00.000000000', '2020-04-17T07:29:00.000000000',
       '2020-04-18T07:29:00.000000000', '2020-04-19T07:30:00.000000000',
       '2020-04-20T07:32:00.000000000', '2020-04-23T07:30:00.000000000'],
      dtype='datetime64[ns]')

We can see the wake-up time is not exactly the same every day, but overall a consistent regular pattern seems to emerge.
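
To make that impression concrete, the sketch below measures the gap between consecutive wake-up times. It rebuilds the six timestamps by hand so that it runs on its own; the names `wake_ups`, `gaps` and `deviation` are illustrative and not part of scikit-mine.

```python
import pandas as pd

# The six wake-up timestamps shown above, rebuilt by hand
wake_ups = pd.DatetimeIndex([
    "2020-04-16 07:30", "2020-04-17 07:29", "2020-04-18 07:29",
    "2020-04-19 07:30", "2020-04-20 07:32", "2020-04-23 07:30",
])

# Time elapsed between each event and the previous one
gaps = wake_ups.to_series().diff().dropna()

# Deviation of each gap from an exact 24-hour period
deviation = gaps - pd.Timedelta(days=1)
print(deviation)
```

The first four gaps stay within two minutes of 24 hours, while the last one is close to three days, which hints that the final event does not follow the same daily rhythm.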

Now imagine that, in addition to wake-up times, we also have records of other daily activities (meals, work, household chores, etc.), and that rather than a handful of days, those records span several years and make up several thousand events.

**How would you detect regularities in such data?**

### Introduction to periodic pattern mining
Periodic pattern mining aims at exploiting regularities not only about `what happens`, by finding coordinated event occurrences, but also about `when it happens` and `how it happens`, by **finding consistent inter-occurrence time intervals**.

Next, we introduce the concept of cycles.

#### The cycle: a building block for periodic pattern mining
Here is an explicit example of a cycle:

<img src="cycle_color.png" alt="cycle" style="width: 60%;"/>

This definition, while relatively simple, is general enough to let us find regularities in different types of logs.

#### Handling noise in our timestamps

Needless to say, it would be too easy if events in our data were equally spaced. As data is often noisy, we have to be fault tolerant and allow small errors to sneak into our cycles.

That's the role of `shift corrections`, which capture the small deviations from perfectly regular periodic repetitions and allow us to reconstruct the (noisy) original sequence of events, using the following relation:
<img src="shifts.png" alt="shifts" style="width: 60%;"/>
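
As a rough illustration of what a shift correction measures, the sketch below computes, for a given period, how much each observed gap deviates from that period. It assumes that every shift is taken relative to the previous occurrence plus the period; `shift_corrections` is a hypothetical helper written for this example, not a scikit-mine function, and the timestamps are made up.

```python
from datetime import datetime, timedelta

def shift_corrections(occurrences, period):
    """Return, for each consecutive pair, the observed gap minus the period."""
    return [
        (current - previous) - period
        for previous, current in zip(occurrences, occurrences[1:])
    ]

# A daily event, except that the third occurrence is 2 minutes late
occurrences = [
    datetime(2021, 1, 1, 9, 0),
    datetime(2021, 1, 2, 9, 0),
    datetime(2021, 1, 3, 9, 2),
    datetime(2021, 1, 4, 9, 0),
]
shifts = shift_corrections(occurrences, timedelta(days=1))
print(shifts)
```

Under this convention a late occurrence produces a positive shift, and the next gap is shortened by the same amount once the event is back on schedule.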

#### A tiny example with scikit-mine
`scikit-mine` offers a `PeriodicCycleMiner` out of the box.
You can use it to **detect regularities, in the form of cycles**, in the input data.

These regularities are evaluated against an MDL (Minimum Description Length) criterion, so that we do not mistakenly include redundant occurrences, nor forget to consider other intervals that would summarize our data in a better way.

MDL offers a framework to find `the best set of cycles`, i.e. the set that gives the most succinct representation of the data. And `as humans, we often like to deal with non-redundant, well organized data`.

In [4]:
from skmine.periodic_esther import PeriodicCycleMiner
pcm = PeriodicCycleMiner().fit(S)
pcm.discover()



Unnamed: 0,start,length,period,cost,residuals,event,dE
0,2020-04-16 07:30:00,5,1 days 00:00:30,67.885186,"{(158762700, 0)}",[wake up],"[-90000000000, -30000000000, 30000000000, 9000..."


You can see that one cycle has been extracted for our event `wake up`. The cycle covers the five consecutive days from Thursday 16 to Monday 20 April, but not the last occurrence on Thursday 23 April, which comes after a three-day gap.

It has a length of 5 and a period close to 1 day, as expected.

Also, note that we "lost" some information here: a period of roughly 1 day offers the best summary for this data, but it does not describe each wake-up time exactly.
The little "shifts" encountered in the original data remain accessible in the `dE` column returned by `.discover`.

In [5]:
pcm.discover()

Unnamed: 0,start,length,period,cost,residuals,event,dE
0,2020-04-16 07:30:00,5,1 days 00:00:30,67.885186,"{(158762700, 0)}",[wake up],"[-90000000000, -30000000000, 30000000000, 9000..."


The last column, named `dE`, contains the list of shifts to apply to our cycle in case we want to reconstruct the original data. The values are expressed in nanoseconds and are `relative to the period`. We can see there is:
 * a -90 second shift between the 1st and 2nd entry (1 day 30 s - 90 s later = waking up at 7:29 on Friday)
 * a -30 second shift between the 2nd and 3rd entry (1 day 30 s - 30 s later = still waking up at 7:29 on Saturday)
 * a 30 second shift between the 3rd and 4th entry (1 day 30 s + 30 s later = back to 7:30 on Sunday)
 * a 90 second shift between the 4th and 5th entry (1 day 30 s + 90 s later = waking up at 7:32 on Monday)
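
The arithmetic in this list can be replayed by hand. The sketch below is not how `PeriodicCycleMiner` works internally; it only assumes that each occurrence equals the previous one plus the period plus the corresponding shift, with shifts of -90 s, -30 s, +30 s and +90 s written in nanoseconds.

```python
import pandas as pd

start = pd.Timestamp("2020-04-16 07:30:00")
period = pd.Timedelta(days=1, seconds=30)
# Shift corrections in nanoseconds: -90 s, -30 s, +30 s, +90 s
shifts_ns = [-90_000_000_000, -30_000_000_000, 30_000_000_000, 90_000_000_000]

occurrences = [start]
for shift in shifts_ns:
    # next occurrence = previous occurrence + period + shift correction
    occurrences.append(occurrences[-1] + period + pd.Timedelta(shift, unit="ns"))

for occurrence in occurrences:
    print(occurrence, occurrence.day_name())
```

The five timestamps obtained this way match the first five entries of the original series.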

Also note that we can get the "uncovered" events, called `residuals`:

In [6]:
pcm.get_residuals()

0    {(2020-04-23 07:30:00, 0)}
dtype: object
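
Conceptually, residuals are the observed events that no cycle accounts for. The sketch below illustrates that idea with a plain index difference; it is an illustration only, and the variable names are not part of the scikit-mine API.

```python
import pandas as pd

# All observed wake-up times
observed = pd.DatetimeIndex([
    "2020-04-16 07:30", "2020-04-17 07:29", "2020-04-18 07:29",
    "2020-04-19 07:30", "2020-04-20 07:32", "2020-04-23 07:30",
])
# Occurrences accounted for by the single cycle of length 5
covered = pd.DatetimeIndex([
    "2020-04-16 07:30", "2020-04-17 07:29", "2020-04-18 07:29",
    "2020-04-19 07:30", "2020-04-20 07:32",
])

# Events left over once the covered occurrences are removed
residuals = observed.difference(covered)
print(residuals)
```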

This way, `pcm` does not store all the data, but holds all the information needed to reconstruct it entirely!

In [7]:
pcm.reconstruct()

Unnamed: 0,occs
0,"[1587022200000000000, 1587108540000000000, 158..."
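
The `occs` values are large integers. Assuming they are nanoseconds since the Unix epoch, which is consistent with the values displayed, `pandas.to_datetime` turns them back into readable timestamps. The sketch uses only the two values fully visible in the output.

```python
import pandas as pd

# The first two values displayed in the `occs` column
occs = [1587022200000000000, 1587108540000000000]

# Interpret the integers as nanoseconds since 1970-01-01
timestamps = pd.to_datetime(occs, unit="ns")
print(timestamps)
```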


### An example with Canadian TV programs
#### Fetching logs from Canadian TV

In this section we are going to load some event logs of TV programs (the `WHAT`), indexed by their broadcast timestamps (the `WHEN`).

`PeriodicCycleMiner` is here to help us discover regularities (the `HOW`).

In [13]:
from skmine.datasets import fetch_canadian_tv
from skmine.periodic import PeriodicCycleMiner

#### Searching for cycles in TV programs

Remember the definition of cycles?
Let's apply it to our TV programs.

In our case:

* $\alpha$ is the name of a TV program

* $r$ is the number of broadcasts (repetitions) for this TV program (inside this cycle)

* $p$ is the optimal time delta between broadcasts in this cycle. If a program is meant to be live every day at 14:00, then $p$ is likely to be `1 day`

* $\tau$ is the first broadcast time in this cycle

* $dE$ are the shift corrections between the broadcast times predicted by $p$ and the actual broadcast times of an event. If a TV program was scheduled at 8:30:00 AM and went on air at 8:30:23 AM the same day, then we keep track of a `23 seconds shift`. This way we can summarize our data (via cycles) and reconstruct it (via shift corrections).


Finally, we are going to dig a little deeper into these cycles to answer fairly complex questions about our logs. We will see that cycles contain useful information about our input data.

In [14]:
ctv_logs = fetch_canadian_tv()
ctv_logs.head()





timestamp
2020-08-01 06:00:00            The Moblees
2020-08-01 06:11:00    Big Block Sing Song
2020-08-01 06:13:00    Big Block Sing Song
2020-08-01 06:15:00               CBC Kids
2020-08-01 06:15:00               CBC Kids
Name: canadian_tv, dtype: string

In [15]:
pcm = PeriodicCycleMiner(keep_residuals=True).fit(ctv_logs)
pcm.discover()



# %snakeviz -t pcm.fit(ctv_logs)

                the model is left empty


Unnamed: 0,Unnamed: 1,start,length,period,cost
A Kandahar Away,1,2020-08-02 06:00:00,4,7 days 00:00:00,49.063694
A Kandahar Away,0,2020-08-03 07:11:00,4,1 days 00:00:00,50.757989
Absolutely Toronto,1,2020-08-01 11:00:00,4,7 days 00:00:00,49.260768
Absolutely Toronto,0,2020-08-03 09:48:00,4,1 days 00:00:00,62.851081
Across the Line,0,2020-08-03 07:30:00,4,1 days 00:00:00,50.645595
...,...,...,...,...,...
Jackie Robinson,4,2020-08-18 00:30:00,4,0 days 00:30:00,50.805869
Jackie Robinson,5,2020-08-25 02:00:00,4,0 days 00:30:00,50.805869
Jamie & Jimmy's Food Fight Club,0,2020-08-02 07:36:00,5,7 days 00:00:00,46.509879
Jamie's 15 Minute Meals,0,2020-08-11 09:12:00,3,6 days 23:59:30,53.459446


`Note`: there is no need to worry about the warning; it only notifies us that duplicate event/timestamp pairs have been found.

In [16]:
cycles = pcm.discover()
cycles

Unnamed: 0,Unnamed: 1,start,length,period,cost
A Kandahar Away,1,2020-08-02 06:00:00,4,7 days 00:00:00,49.063694
A Kandahar Away,0,2020-08-03 07:11:00,4,1 days 00:00:00,50.757989
Absolutely Toronto,1,2020-08-01 11:00:00,4,7 days 00:00:00,49.260768
Absolutely Toronto,0,2020-08-03 09:48:00,4,1 days 00:00:00,62.851081
Across the Line,0,2020-08-03 07:30:00,4,1 days 00:00:00,50.645595
...,...,...,...,...,...
Jackie Robinson,4,2020-08-18 00:30:00,4,0 days 00:30:00,50.805869
Jackie Robinson,5,2020-08-25 02:00:00,4,0 days 00:30:00,50.805869
Jamie & Jimmy's Food Fight Club,0,2020-08-02 07:36:00,5,7 days 00:00:00,46.509879
Jamie's 15 Minute Meals,0,2020-08-11 09:12:00,3,6 days 23:59:30,53.459446


The resulting dataframe has a [MultiIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html).

The first level is the event name and the second level is the cycle number, since we can detect multiple cycles for the same event.
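
For readers less familiar with hierarchical indexes, here is a minimal sketch of how such a two-level frame can be queried. It builds a toy frame from three of the rows displayed above; the level names `event` and `cycle` are chosen for this example and are not necessarily the ones scikit-mine uses.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("A Kandahar Away", 1), ("A Kandahar Away", 0), ("Across the Line", 0)],
    names=["event", "cycle"],  # level names chosen for this sketch
)
toy_cycles = pd.DataFrame(
    {
        "length": [4, 4, 4],
        "period": pd.to_timedelta(["7 days", "1 days", "1 days"]),
    },
    index=index,
)

# All cycles found for one event (selection on the first index level)
one_event = toy_cycles.loc["A Kandahar Away"]

# Number of cycles detected per event
cycles_per_event = toy_cycles.groupby(level=0).size()
print(one_event)
print(cycles_per_event)
```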

Now that we have our cycles in a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), we can play with the pandas API and answer questions about our logs

#### Did I find cycles for the TV show "Arthur Shorts"?

In [17]:
cycles.loc["Arthur Shorts"]

Unnamed: 0,start,length,period,cost
0,2020-08-03 23:00:00,4,1 days,50.297673


#### What are the top 10 longest cycles?

In [18]:
cycles.nlargest(10, ["length"])

Unnamed: 0,Unnamed: 1,start,length,period,cost
Dr. Seuss' The Lorax,0,2020-08-06 00:30:00,7,0 days 00:30:00,54.987582
Big Block Sing Song,0,2020-08-02 08:57:00,5,7 days 00:00:00,34.509879
Bondi Vet,0,2020-08-01 07:40:00,5,7 days 00:00:00,46.509879
CBC Arts: Exhibitionists,0,2020-08-02 07:30:00,5,7 days 00:00:00,34.509879
CBC Winnipeg Comedy Festival,0,2020-08-02 07:34:00,5,7 days 00:00:00,46.509879
Ethiopian Musicians,0,2020-08-10 08:59:00,5,1 days 00:00:00,57.686133
In the Making,0,2020-08-01 11:48:00,5,7 days 00:00:00,40.509879
Interrupt This Program,0,2020-08-02 11:00:00,5,7 days 00:00:00,34.509879
Jamie & Jimmy's Food Fight Club,0,2020-08-02 07:36:00,5,7 days 00:00:00,46.509879
Jamie's Super Foods,0,2020-08-03 16:00:00,5,1 days 00:00:00,47.171713


#### What are the 10 most unpunctual TV programs?
For this we are going to:
 1. extract the shift corrections along with other information about our cycles
 2. compute the sum of the absolute values of the shift corrections, for every cycle
 3. get the 10 biggest sums

In [19]:
full_cycles = pcm.discover(shifts=True)
full_cycles.head()

Unnamed: 0,Unnamed: 1,start,length,period,cost,dE
A Kandahar Away,1,2020-08-02 06:00:00,4,7 days,49.063694,"[0, 0, 0]"
A Kandahar Away,0,2020-08-03 07:11:00,4,1 days,50.757989,"[0, 0, 0]"
Absolutely Toronto,1,2020-08-01 11:00:00,4,7 days,49.260768,"[0, 0, 0]"
Absolutely Toronto,0,2020-08-03 09:48:00,4,1 days,62.851081,"[0, -60000000000, 60000000000]"
Across the Line,0,2020-08-03 07:30:00,4,1 days,50.645595,"[0, 0, 0]"


In [20]:
def absolute_sum(shifts):
    # total drift of one cycle: sum of the absolute shift corrections (in nanoseconds)
    return sum(map(abs, shifts))

# level 0 is the name of the TV program
shift_sums = full_cycles["dE"].map(absolute_sum).groupby(level=[0]).sum()
shift_sums.nlargest(10)

Frankie Drake Mysteries         1200000000000
CBC News: The National           300000000000
Anne with an E                   240000000000
Daniel Tiger's Neighbourhood     240000000000
Grand Designs                    240000000000
Burden of Truth                  180000000000
Ethiopian Musicians              180000000000
Absolutely Toronto               120000000000
Bondi Vet                        120000000000
CBC Winnipeg Comedy Festival     120000000000
Name: dE, dtype: int64
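
These sums are raw nanosecond counts, which are hard to read. As a small sketch, `pandas.to_timedelta` can convert them into durations; the three values below are copied from the output above.

```python
import pandas as pd

# Three of the sums displayed above, in nanoseconds
shift_sums_ns = pd.Series({
    "Frankie Drake Mysteries": 1_200_000_000_000,
    "CBC News: The National": 300_000_000_000,
    "Anne with an E": 240_000_000_000,
})

# Convert nanosecond counts into Timedelta values
readable = pd.to_timedelta(shift_sums_ns, unit="ns")
print(readable)
```

Read this way, the least punctual program accumulates 20 minutes of shifts over its cycles.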

#### What TV programs have been broadcast every day for at least 5 days straight?
Let's make use of the [pandas.DataFrame.query](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html) method to express our question in an SQL-like syntax. Note that the `period >= "1 days"` condition also keeps cycles with longer periods, such as weekly ones.

In [21]:
cycles.query('length >= 5 and period >= "1 days"', engine='python')

Unnamed: 0,Unnamed: 1,start,length,period,cost
Big Block Sing Song,0,2020-08-02 08:57:00,5,7 days,34.509879
Bondi Vet,0,2020-08-01 07:40:00,5,7 days,46.509879
CBC Arts: Exhibitionists,0,2020-08-02 07:30:00,5,7 days,34.509879
CBC Winnipeg Comedy Festival,0,2020-08-02 07:34:00,5,7 days,46.509879
Ethiopian Musicians,0,2020-08-10 08:59:00,5,1 days,57.686133
In the Making,0,2020-08-01 11:48:00,5,7 days,40.509879
Interrupt This Program,0,2020-08-02 11:00:00,5,7 days,34.509879
Jamie & Jimmy's Food Fight Club,0,2020-08-02 07:36:00,5,7 days,46.509879
Jamie's Super Foods,0,2020-08-03 16:00:00,5,1 days,47.171713
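
To keep only the cycles whose period is close to one day, one option is to compare the period to a target with a tolerance. The sketch below works on a toy frame built from rows shown in this notebook; the one-minute tolerance is an arbitrary choice made for the example.

```python
import pandas as pd

toy_cycles = pd.DataFrame(
    {
        "length": [5, 5, 5, 4],
        "period": pd.to_timedelta(["7 days", "7 days", "1 days", "1 days"]),
    },
    index=["Big Block Sing Song", "Bondi Vet", "Ethiopian Musicians", "Across the Line"],
)

one_day = pd.Timedelta(days=1)
tolerance = pd.Timedelta(minutes=1)  # arbitrary tolerance for this sketch

# True for cycles whose period is within the tolerance of one day
is_daily = (toy_cycles["period"] - one_day).abs() <= tolerance
daily_runs = toy_cycles[is_daily & (toy_cycles["length"] >= 5)]
print(daily_runs)
```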


#### What TV programs are broadcast only on business days?
From the previous query we see several 5-length cycles, two of them with a period of 1 day.
An intuition is that these daily cycles take place on business days. Let's confirm this by considering cycles with
 1. start timestamps on Mondays
 2. periods of roughly 1 day

In [22]:
monday_starts = cycles[cycles.start.dt.weekday == 0]  # start on monday
monday_starts.query('length == 5 and period >= "1 days"', engine='python')

Unnamed: 0,Unnamed: 1,start,length,period,cost
Ethiopian Musicians,0,2020-08-10 08:59:00,5,1 days,57.686133
Jamie's Super Foods,0,2020-08-03 16:00:00,5,1 days,47.171713
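
As a sanity check, the broadcast days of such a cycle can be rebuilt from its start, period and length. The sketch below does so for the "Jamie's Super Foods" row, ignoring shift corrections on the assumption that they are far too small to move a broadcast to another day.

```python
import pandas as pd

start = pd.Timestamp("2020-08-03 16:00:00")  # start of the "Jamie's Super Foods" cycle
period = pd.Timedelta(days=1)
length = 5

# Expected broadcast times, shift corrections ignored
broadcasts = pd.DatetimeIndex([start + k * period for k in range(length)])

# weekday: Monday is 0 and Sunday is 6, so business days are 0 to 4
on_business_days = bool((broadcasts.weekday < 5).all())
print(list(broadcasts.day_name()), on_business_days)
```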


### **ESTHER CODE INTEGRATION TEST** with Canadian TV programs
#### Fetching logs from Canadian TV

In [23]:
from skmine.datasets import fetch_canadian_tv
from skmine.periodic_esther import PeriodicCycleMiner

#### Searching for cycles in TV programs

In [24]:
ctv_logs = fetch_canadian_tv()
ctv_logs.head()





timestamp
2020-08-01 06:00:00            The Moblees
2020-08-01 06:11:00    Big Block Sing Song
2020-08-01 06:13:00    Big Block Sing Song
2020-08-01 06:15:00               CBC Kids
2020-08-01 06:15:00               CBC Kids
Name: canadian_tv, dtype: string

Compute only simple cycles on `ctv_logs` with the keyword argument `complex=False`:

In [35]:
pcm = PeriodicCycleMiner().fit(ctv_logs, complex=False)
pcm.discover()



Unnamed: 0,start,length,period,cost,residuals,event,dE
0,2020-08-02 06:00:00,5,7 days 00:00:00,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
1,2020-08-03 07:11:00,5,7 days 00:00:00,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
2,2020-08-07 07:11:00,5,6 days 00:00:00,55.275162,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
3,2020-08-03 07:11:00,5,1 days 00:00:00,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
4,2020-08-10 07:11:00,5,1 days 00:00:00,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
...,...,...,...,...,...,...,...
2021,2020-08-02 07:36:00,3,14 days 00:00:00,51.030045,"{(159704982, 15), (159653370, 6), (159628680, ...",[Thrillusionists],"[0, 0]"
2022,2020-08-02 07:36:00,5,7 days 00:00:00,66.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Thrillusionists],"[60000000000, -60000000000, 0, 0]"
2023,2020-08-11 09:12:00,3,6 days 23:59:30,59.565225,"{(159704982, 15), (159653370, 6), (159628680, ...",[True and the Rainbow Kingdom],"[30000000000, -30000000000]"
2024,2020-08-03 16:00:00,5,1 days 00:00:00,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Vet on the Hill],"[0, 0, 0, 0]"


Compute simple and complex cycles (with horizontal and vertical combinations) on `ctv_logs`:

In [25]:
pcm = PeriodicCycleMiner().fit(ctv_logs)
cycles = pcm.discover()  # keep the result: the queries below reuse `cycles`
cycles

# %snakeviz -t pcm.fit(ctv_logs)



Unnamed: 0,start,length,period,cost,residuals,event,dE
0,2020-08-02 06:00:00,5,7 days,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
1,2020-08-03 07:11:00,5,7 days,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
2,2020-08-07 07:11:00,5,6 days,55.275162,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
3,2020-08-03 07:11:00,5,1 days,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
4,2020-08-10 07:11:00,5,1 days,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
...,...,...,...,...,...,...,...
7379,2020-08-03 10:31:00,68,1 days,365.607172,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC Kids, Metro Morning, CBC Kids, Beat Bugs,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
7380,2020-08-03 19:30:00,68,1 days,339.979759,"{(159704982, 15), (159653370, 6), (159628680, ...","[Just For Laughs: Gags, Metro Morning, CBC Kid...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
7381,2020-08-03 13:00:00,68,1 days,339.630775,"{(159704982, 15), (159653370, 6), (159628680, ...","[Murdoch Mysteries, Metro Morning, CBC Kids, B...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
7382,2020-08-03 17:59:00,68,1 days,339.387772,"{(159704982, 15), (159653370, 6), (159628680, ...","[News, Metro Morning, CBC Kids, Beat Bugs, CBC...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


`Note`: there is no need to worry about the warning; it only notifies us that duplicate event/timestamp pairs have been found.

Now that we have our cycles in a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), we can play with the pandas API and answer questions about our logs

#### Did I find cycles for the TV show "Arthur Shorts"?

In [27]:
cycles[cycles["event"].apply(lambda x: "Arthur Shorts" in x)]

Unnamed: 0,start,length,period,cost,residuals,event,dE
37,2020-08-01 11:00:00,5,7 days 00:00:00,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Arthur Shorts],"[0, 0, 0, 0]"
38,2020-08-17 09:48:00,5,1 days 00:00:00,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Arthur Shorts],"[0, 0, 0, 0]"
39,2020-08-10 09:47:00,5,1 days 00:00:00,63.235114,"{(159704982, 15), (159653370, 6), (159628680, ...",[Arthur Shorts],"[0, 0, 60000000000, 0]"
40,2020-08-03 09:48:00,5,7 days 00:00:00,66.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Arthur Shorts],"[-60000000000, 60000000000, 0, 0]"
41,2020-08-07 09:48:00,5,6 days 00:00:00,67.275162,"{(159704982, 15), (159653370, 6), (159628680, ...",[Arthur Shorts],"[0, 0, -60000000000, 60000000000]"
...,...,...,...,...,...,...,...
7247,2020-08-03 09:48:00,33,6 days 23:59:00,230.020370,"{(159704982, 15), (159653370, 6), (159628680, ...","[Arthur Shorts, CBC Kids, CBC Kids]","[0, 0, 0, -60000000000, 0, 60000000000, 0, 0, ..."
7248,2020-08-03 08:05:00,33,6 days 23:59:00,230.052883,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC Kids, Arthur Shorts, CBC Kids]","[0, 0, 0, 0, -60000000000, 0, 60000000000, 0, ..."
7249,2020-08-03 09:48:00,36,6 days 23:59:00,251.696441,"{(159704982, 15), (159653370, 6), (159628680, ...","[Arthur Shorts, CBC Kids, CBC Kids, Big Block ...","[0, 0, 0, -60000000000, 0, 60000000000, 0, 0, ..."
7252,2020-08-03 09:48:00,36,6 days 23:59:00,257.696372,"{(159704982, 15), (159653370, 6), (159628680, ...","[Arthur Shorts, CBC Kids, Big Block Sing Song,...","[0, 0, 0, -60000000000, 0, 60000000000, 0, 0, ..."
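
Here the `event` column holds a list of program names per cycle, so a plain equality test is not enough. The sketch below reproduces the membership filter on a toy frame and adds one hypothetical refinement, keeping only cycles made of a single event; the toy rows are simplified from the output above.

```python
import pandas as pd

toy_cycles = pd.DataFrame(
    {
        "length": [5, 33, 5],
        "event": [
            ["Arthur Shorts"],
            ["CBC Kids", "Arthur Shorts", "CBC Kids"],
            ["Addison"],
        ],
    }
)

# Row-wise membership test: True when the program appears in the cycle's events
has_arthur = toy_cycles["event"].apply(lambda events: "Arthur Shorts" in events)
arthur_cycles = toy_cycles[has_arthur]

# Keep only the simple cycles, i.e. those made of a single event
simple_only = arthur_cycles[arthur_cycles["event"].map(len) == 1]
print(arthur_cycles)
print(simple_only)
```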


#### What are the top 10 longest cycles?

In [28]:
cycles.nlargest(10, ["length"])

Unnamed: 0,start,length,period,cost,residuals,event,dE
6162,2020-08-03 04:00:00,144,7 days,678.614626,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC News: The National, Addison, Beat Bugs, C...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
6161,2020-08-02 06:11:00,140,7 days,642.859808,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC Kids, CBC News: The National, Addison, Be...","[0, -60000000000, 60000000000, 0, 0, 0, 0, 0, ..."
6159,2020-08-03 04:00:00,132,7 days,623.550508,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC News: The National, Addison, Beat Bugs, C...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
6163,2020-08-03 04:00:00,132,7 days,623.550508,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC News: The National, Addison, Beat Bugs, C...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
6164,2020-08-03 04:00:00,132,7 days,617.462896,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC News: The National, Addison, Beat Bugs, C...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
6165,2020-08-03 04:00:00,132,7 days,617.504813,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC News: The National, Addison, Beat Bugs, C...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
5956,2020-08-01 06:11:00,128,7 days,585.402179,"{(159704982, 15), (159653370, 6), (159628680, ...","[Big Block Sing Song, CBC News: The National, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
5957,2020-08-01 06:13:00,128,7 days,585.400303,"{(159704982, 15), (159653370, 6), (159628680, ...","[Big Block Sing Song, CBC News: The National, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
5958,2020-08-01 07:55:00,128,7 days,585.304144,"{(159704982, 15), (159653370, 6), (159628680, ...","[Big Block Sing Song, CBC News: The National, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
5973,2020-08-01 06:15:00,128,7 days,581.925674,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC Kids, CBC News: The National, Addison, Be...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


#### What are the 10 most unpunctual TV programs?
For this we are going to:
 1. extract the shift corrections along with other information about our cycles
 2. compute the sum of the absolute values of the shift corrections, for every cycle
 3. get the 10 biggest sums

In [30]:
full_cycles = pcm.discover()
full_cycles.head()

Unnamed: 0,start,length,period,cost,residuals,event,dE
0,2020-08-02 06:00:00,5,7 days,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
1,2020-08-03 07:11:00,5,7 days,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
2,2020-08-07 07:11:00,5,6 days,55.275162,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
3,2020-08-03 07:11:00,5,1 days,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
4,2020-08-10 07:11:00,5,1 days,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"


In [31]:
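def absolute_sum(shifts):
    # total drift of one cycle: sum of the absolute shift corrections (in nanoseconds)
    return sum(map(abs, shifts))

# here the index is a flat cycle number, so level 0 identifies a cycle, not a TV program
shift_sums = full_cycles["dE"].map(absolute_sum).groupby(level=[0]).sum()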
shift_sums.nlargest(10)

4505    720000000000
5057    720000000000
5058    720000000000
464     660000000000
205     600000000000
5574    600000000000
5716    600000000000
5722    600000000000
6017    600000000000
6180    600000000000
Name: dE, dtype: int64

#### What TV programs have been broadcast every day for at least 5 days straight?
Let's make use of the [pandas.DataFrame.query](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html) method to express our question in an SQL-like syntax. Note that the `period >= "1 days"` condition also keeps cycles with longer periods, such as weekly ones.

In [32]:
cycles.query('length >= 5 and period >= "1 days"', engine='python')

Unnamed: 0,start,length,period,cost,residuals,event,dE
0,2020-08-02 06:00:00,5,7 days,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
1,2020-08-03 07:11:00,5,7 days,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
2,2020-08-07 07:11:00,5,6 days,55.275162,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
3,2020-08-03 07:11:00,5,1 days,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
4,2020-08-10 07:11:00,5,1 days,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
...,...,...,...,...,...,...,...
7379,2020-08-03 10:31:00,68,1 days,365.607172,"{(159704982, 15), (159653370, 6), (159628680, ...","[CBC Kids, Metro Morning, CBC Kids, Beat Bugs,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
7380,2020-08-03 19:30:00,68,1 days,339.979759,"{(159704982, 15), (159653370, 6), (159628680, ...","[Just For Laughs: Gags, Metro Morning, CBC Kid...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
7381,2020-08-03 13:00:00,68,1 days,339.630775,"{(159704982, 15), (159653370, 6), (159628680, ...","[Murdoch Mysteries, Metro Morning, CBC Kids, B...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
7382,2020-08-03 17:59:00,68,1 days,339.387772,"{(159704982, 15), (159653370, 6), (159628680, ...","[News, Metro Morning, CBC Kids, Beat Bugs, CBC...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


#### What TV programs are broadcast only on business days?
From the previous query we see that we have a lot of 5-length cycles with a period of 1 day.
An intuition is that these cycles take place on business days. Let's confirm this by considering cycles with
 1. start timestamps on Mondays
 2. periods of roughly 1 day

In [33]:
monday_starts = cycles[cycles.start.dt.weekday == 0]  # start on monday
monday_starts.query('length == 5 and period >= "1 days"', engine='python')

Unnamed: 0,start,length,period,cost,residuals,event,dE
1,2020-08-03 07:11:00,5,7 days 00:00:00,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
3,2020-08-03 07:11:00,5,1 days 00:00:00,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
4,2020-08-10 07:11:00,5,1 days 00:00:00,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
5,2020-08-17 07:11:00,5,1 days 00:00:00,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
6,2020-08-24 07:11:00,5,1 days 00:00:00,57.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Addison],"[0, 0, 0, 0]"
...,...,...,...,...,...,...,...
1880,2020-08-03 08:18:00,5,7 days 00:00:30,72.029735,"{(159704982, 15), (159653370, 6), (159628680, ...",[PJ Masks],"[30000000000, -30000000000, 30000000000, -9000..."
1881,2020-08-10 08:19:00,5,1 days 00:00:30,75.235179,"{(159704982, 15), (159653370, 6), (159628680, ...",[PJ Masks],"[30000000000, 30000000000, -30000000000, -9000..."
1928,2020-08-17 08:44:00,5,1 days 00:00:00,93.235173,"{(159704982, 15), (159653370, 6), (159628680, ...",[Rusty Rivets],"[60000000000, 120000000000, -120000000000, -60..."
1981,2020-08-03 03:30:00,5,7 days 00:00:00,54.030035,"{(159704982, 15), (159653370, 6), (159628680, ...",[This Hour Has 22 Minutes],"[0, 0, 0, 0]"


References
----------

1.
    Galbrun, E & Cellier, P & Tatti, N & Termier, A & Crémilleux, B
    "Mining Periodic Pattern with a MDL Criterion"

2.
    Galbrun, E
    "The Minimum Description Length Principle for Pattern Mining : A survey"

3. 
    Termier, A
    ["Periodic pattern mining"](http://people.irisa.fr/Alexandre.Termier/dmv/DMV_Periodic_patterns.pdf) 