# 05 Time Series Analysis

If you understand the trends of a business, you can identify its faults and plan for improvements.

The analysis and understanding of time series is an important skill.

However, we must first understand how to use Python's native date and time module, __datetime__.

## The datetime module

Our calendar is hard to translate into Python code. Leap years, leap seconds, weekends and days of the week are familiar concepts because they are part of our culture. A program, however, sees time as a linear quantity. Fortunately, everything we need to analyse temporal trends is present in the datetime module, which is part of Python's standard library.

With business trends, you have to remember three things when analysing data:

1. The human being is a creature of habits.
1. [The Law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers) tells us that, with enough data points, the observed average approaches its expected value. Events with low probability will start to show up in your data, and events with high probability will behave as expected.
1. Disruptive events destroy 1 and 2.
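
Point 2 can be checked with a small simulation. The sketch below is not tied to any business data: it rolls a fair six-sided die (expected value 3.5) and compares the average of a few rolls with the average of many. The seed and the sample sizes are arbitrary choices made for reproducibility.

```python
import random

random.seed(42)  # arbitrary seed, only so the run is reproducible


def average_roll(n_rolls):
    """Average of n_rolls throws of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n_rolls)) / n_rolls


small_sample = average_roll(10)
large_sample = average_roll(100_000)

print(small_sample)  # can land noticeably away from 3.5
print(large_sample)  # should land very close to 3.5
```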

In some business models, the glue that holds an analysis or model together is the ability to consistently track the passage of time, independently of where you are in the world (timezone).
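
As a minimal sketch of what tracking time independently of location can look like, the standard library lets you attach a UTC offset to a datetime. The sale time and the UTC-05:00 office below are hypothetical values chosen for illustration.

```python
import datetime

# A hypothetical sale recorded in UTC, half an hour into the new year
sale_utc = datetime.datetime(2024, 1, 1, 0, 30, tzinfo=datetime.timezone.utc)

# The same instant seen from a hypothetical office at UTC-05:00
office_tz = datetime.timezone(datetime.timedelta(hours=-5))
sale_office = sale_utc.astimezone(office_tz)

print(sale_utc)      # 2024-01-01 00:30:00+00:00
print(sale_office)   # 2023-12-31 19:30:00-05:00
print(sale_utc == sale_office)  # True: one instant, two representations
```

Storing timestamps with an explicit offset means two teams in different offices can compare records without guessing which local clock was used.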

In [None]:
import datetime
dir(datetime)

Notice the date, time, datetime and timedelta classes:

* __date__ handles calendar days (year, month, day).
* __time__ handles the time of day (hours, minutes, seconds, microseconds).
* __datetime__ combines the two previous classes. It holds everything needed to represent temporal information.
* __timedelta__ represents a duration and is used to perform temporal additions and subtractions.
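
A short sketch of how the first three classes relate, using hypothetical values: a `date` and a `time` can be merged into a `datetime` with `combine`, and a `datetime` can be split back into its two parts.

```python
import datetime

some_day = datetime.date(2024, 1, 15)   # calendar day only (hypothetical value)
some_time = datetime.time(9, 30)        # time of day only (hypothetical value)

# combine() merges a date and a time into a single datetime
moment = datetime.datetime.combine(some_day, some_time)
print(moment)          # 2024-01-15 09:30:00

# A datetime can be split back into its date and time parts
print(moment.date())   # 2024-01-15
print(moment.time())   # 09:30:00
```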

In [None]:
help(datetime)

[Launch date of the best music album of all time](https://en.wikipedia.org/wiki/Second_Toughest_in_the_Infants)

In [None]:
sci = datetime.datetime(1996, 3, 11)

In [None]:
sci

In [None]:
print(sci)

In [None]:
print(sci.day, sci.year, sci.month)

In [None]:
sci.weekday() #0 -> Monday, 6 -> Sunday
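
For reference, `weekday()` counts from 0 on Monday, while the related `isoweekday()` method counts from 1 on Monday. A short sketch using the same date:

```python
import datetime

sci = datetime.datetime(1996, 3, 11)

print(sci.weekday())     # 0, because 11 March 1996 was a Monday
print(sci.isoweekday())  # 1, the ISO 8601 numbering of the same day
```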

<div class="alert alert-info"> 
    <br>
    <b>Exercise: On which day of the week were you born?</b>   
    <br>
    <br>
</div>

In [None]:
bday = datetime.datetime(2022, 2, 13)
bday.weekday()

---

## Format, format, FORMAT

A datetime value can have many representations. Remember that different parts of the world use the calendar in a different manner.

By default, datetimes are printed as __yyyy-mm-dd hh:mm:ss__.

If you start your file names with a date in this format, alphabetical order will match chronological order, which saves you some organisational overhead.
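
The file-name claim can be checked with a few made-up dates. In this sketch the `_report.csv` suffix and the dates are hypothetical; `isoformat()` produces the yyyy-mm-dd layout, and the day-first names are built by hand for comparison.

```python
import datetime

dates = [
    datetime.date(2024, 11, 2),
    datetime.date(2023, 12, 25),
    datetime.date(2024, 1, 9),
]

# Year-first names (yyyy-mm-dd) versus day-first names (dd-mm-yyyy)
iso_names = [d.isoformat() + "_report.csv" for d in dates]
dmy_names = [f"{d.day:02d}-{d.month:02d}-{d.year}_report.csv" for d in dates]

print(sorted(iso_names))  # alphabetical order is also chronological
print(sorted(dmy_names))  # alphabetical order scrambles the timeline
```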

__IMPORTANT__: the datetime object stores the full information at all times. The format is just a way of printing that information.

In [None]:
sci.strftime(format="%y-%m")

In [None]:
sci.strftime(format="In %Y during %B, SCI was released on a %A")
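
To illustrate that formatting only changes the printed text, the sketch below prints the same object in two numeric layouts and then parses one of the strings back with `strptime`, the inverse of `strftime`. The two layouts are arbitrary examples.

```python
import datetime

sci = datetime.datetime(1996, 3, 11)

iso_text = sci.strftime("%Y-%m-%d")   # '1996-03-11'
dmy_text = sci.strftime("%d/%m/%Y")   # '11/03/1996'
print(iso_text, dmy_text)

# Parsing the string with the matching format recovers the same object
parsed = datetime.datetime.strptime(dmy_text, "%d/%m/%Y")
print(parsed == sci)  # True
```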

[Please check the __directives__ here](https://docs.python.org/3/library/datetime.html)

__Results may vary due to "locale"__!

<div class="alert alert-info"> 
    <br>
    <b>Exercise: Write a small output for a special date with strftime. Something like "On August 6, 2005, I saw Underworld and Fatboy Slim at Sudoeste".</b>   
    <br>
    <br>
</div>

---

## Right about now

What day is today?

In [None]:
import datetime

sci = datetime.datetime(1996, 3, 11)

In [None]:
datetime.datetime.now()

In [None]:
datetime.datetime.now().strftime(format="Today is %A.")

The `now` method reads the computer's clock and creates an object that records the moment at which your code is running. How long ago was the music album released?

In [None]:
datetime.datetime.now() - sci

You can __subtract__ datetime objects. When you perform a subtraction on __datetime__ objects you obtain a __timedelta__ object. However, __additions__ are not defined.
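
Both statements can be verified in a few lines. This sketch uses a fixed second date (the album's first anniversary, chosen here only so the result is reproducible) instead of `now()`.

```python
import datetime

sci = datetime.datetime(1996, 3, 11)
first_anniversary = datetime.datetime(1997, 3, 11)  # fixed date for reproducibility

# Subtracting two datetimes yields a timedelta
elapsed = first_anniversary - sci
print(type(elapsed).__name__, elapsed.days)  # timedelta 365

# Adding two datetimes is not defined and raises a TypeError
addition_failed = False
try:
    first_anniversary + sci
except TypeError as error:
    addition_failed = True
    print("TypeError:", error)
```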

<div class="alert alert-info"> 
    <br>
    <b>Exercise: How old are you in days?</b>   
    <br>
    <br>
</div>

In [None]:
(datetime.datetime.now() - datetime.datetime(2000, 1, 1)).days

---

## The timedelta object

Time is so ingrained in our culture and in our way of thinking that we may find it hard to translate into computer code.

If it's January 31st, in 10 days it will not be January 41st. If it's 23:00, in two hours it will already be a new day, but probably not in some other time zones.
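
Both situations can be sketched with a timedelta, which takes care of the roll-over for you. The starting point below is a hypothetical value.

```python
import datetime

january_31 = datetime.datetime(2024, 1, 31, 23, 0)  # hypothetical starting point

# Ten days later the month rolls over; there is no "January 41st"
ten_days_later = january_31 + datetime.timedelta(days=10)
print(ten_days_later)    # 2024-02-10 23:00:00

# Two hours after 23:00 the date itself changes
two_hours_later = january_31 + datetime.timedelta(hours=2)
print(two_hours_later)   # 2024-02-01 01:00:00
```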

When you define a timedelta object, you need to specify the interval.

In [None]:
wk_interval = datetime.timedelta(days=7)

In [None]:
today = datetime.datetime.now()

In [None]:
today - wk_interval

In [None]:
today + wk_interval

In [None]:
hour_interval = datetime.timedelta(hours=1)
hour_interval

In [None]:
wk_interval + hour_interval

In short:

| Object 1 | Object 2 | Result of subtraction | Result of addition |
|---|---|---|---|
| datetime | datetime | timedelta | not defined |
| datetime | timedelta | datetime | datetime |
| timedelta | timedelta | timedelta | timedelta |

__Remember__: You can't add January 2nd to February 8th. __Also__, the _timedelta_ object does not have the same methods as datetime. It exposes little beyond its `days`, `seconds` and `microseconds` attributes and the `total_seconds()` method.
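
A brief sketch of what a timedelta does offer, using a hypothetical interval of one week and one hour:

```python
import datetime

interval = datetime.timedelta(days=7, hours=1)

# A timedelta normalises everything into days, seconds and microseconds
print(interval.days)             # 7
print(interval.seconds)          # 3600
print(interval.total_seconds())  # 608400.0

# There is no .hours attribute; derive hours from the total instead
hours = interval.total_seconds() / 3600
print(hours)                     # 169.0
```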

<div class="alert alert-info"> 
    <br>
    <b>Exercise: In 137 days from now, what day of the week is it going to be?</b>   
    <br>
    <br>
</div>

In [None]:
# Format the resulting date as a weekday name to answer the question
(datetime.datetime.now() + datetime.timedelta(days=137)).strftime("%A")

---

## Why does this matter?

The data warehouses in a company's data lake may be managed by teams from different subcontractors, and those teams may follow different standards for storing chronological information. Let's assume we want to monitor two KPIs from two different tables. The first is the online sales revenue of the fictional mobile app __Astronomy (A)__, and the second is the online sales revenue of the equally fictional mobile app __Barbecuer (B)__ (the files are in the '../Files/' directory). You know that both sets of sales figures cover the same time period.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#### Astronomy

In [None]:
astro = pd.read_csv('../Files/Astronomy_sales.csv')

In [None]:
astro.head()

In [None]:
astro.plot(x='date', y='sales', figsize=(15,6))
plt.show()

In [None]:
astro.dtypes

__The dates are not in order!__ We are just plotting the values in the order in which they are stored in the file. Let's convert the "date" column to datetime with pandas:

In [None]:
astro['date'] = pd.to_datetime(astro['date'])

In [None]:
astro.plot(x='date', y='sales', figsize=(15,6))
plt.show()

Once a column holds datetime values, pandas knows how to handle it under the hood. For instance, in plots, it places each row at its correct position on a chronological time axis.
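
A minimal sketch of the difference between text and datetime columns, using a made-up three-row table rather than the course files. The column names mirror the ones above, and the day/month/year layout is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical rows stored out of order, with dates as day/month/year text
toy = pd.DataFrame({
    "date": ["03/01/2024", "15/12/2023", "01/02/2024"],
    "sales": [30, 10, 20],
})

# Sorting the raw text orders rows character by character, not by time
sorted_as_text = toy.sort_values("date")["sales"].tolist()
print(sorted_as_text)   # [20, 30, 10]

# After conversion, the same sort is chronological
toy["date"] = pd.to_datetime(toy["date"], format="%d/%m/%Y")
sorted_as_dates = toy.sort_values("date")["sales"].tolist()
print(sorted_as_dates)  # [10, 30, 20]
```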

In [None]:
astro.dtypes

#### Barbecuer

The __Barbecuer__ table is handled by another Data Engineering team. Let's see how they stored it.

In [None]:
barb = pd.read_csv('../Files/Barbecuer_sales.csv')

In [None]:
barb.head()

In [None]:
barb.dtypes

In [None]:
barb.plot(x='date', y='sales', figsize=(15,6))
plt.show()

What are we plotting now?

In [None]:
barb['date'] = pd.to_datetime(barb['date'])

In [None]:
barb.plot(x='date', y='sales', figsize=(15,6))
plt.show()

In [None]:
barb.head()

We know for sure that we are not monitoring data for a mobile app in 1970!

Whilst __to\_datetime__ is very good at inferring your time format, the 1970 dates show that it does not cover every situation. Let's restart the Barbecuer app analysis.

In [None]:
barb = pd.read_csv('../Files/Barbecuer_sales.csv')
barb.head(10)

<div class="alert alert-info"> 
    <br>
    <b>Exercise: We know this interval is for the first days of January 2024. Can you infer the datetime format?</b>   
    <br>
    <br>
</div>

In [None]:
barb['date'] = pd.to_datetime(barb['date'], format='%y%m%d%H')
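
One plausible reading of the earlier 1970 dates, assuming the column was read as integers: without a `format`, `pd.to_datetime` treats integers as nanoseconds since the Unix epoch (1 January 1970). The sketch below reproduces both interpretations with the standard library only. The raw value 24010100 is a hypothetical example, not taken from the file.

```python
import datetime

raw_value = 24010100  # hypothetical entry: year 24, month 01, day 01, hour 00

# Interpretation 1: a count of nanoseconds since the Unix epoch
epoch = datetime.datetime(1970, 1, 1)
as_epoch_offset = epoch + datetime.timedelta(microseconds=raw_value // 1000)
print(as_epoch_offset)   # 1970-01-01 00:00:00.024010

# Interpretation 2: digits laid out as %y%m%d%H
as_formatted = datetime.datetime.strptime(str(raw_value), "%y%m%d%H")
print(as_formatted)      # 2024-01-01 00:00:00
```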

In [None]:
barb.plot(x='date', y='sales', figsize=(15,6))
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(15,6))

astro.plot(x='date', y='sales', ax=ax, label='Astronomy')
barb.plot(x='date', y='sales', ax=ax, label='Barbecuer')

plt.show()

<div class="alert alert-info"> 
    <br>
    <b>Exercise: A third app, Cacophony, was released just before the new year. How is it doing compared with the other apps?</b>   
    <br>
    <br>
</div>

In [None]:
# %load ../Files/cacophony.py