# Time Series 

**[1] Python datetime module**<br>
- Datetime object<br>
    - Method: Conversion from string to datetime<br>
    - Method: Conversion from datetime to string<br>
- Timedelta object<br>

**[2] Pandas**<br>
- Function <code>to_datetime()</code><br>
- DatetimeIndex<br>
    - Subset selection<br>
    - Information extraction<br>
    - Method <code>resample()</code><br>

## [1] Python datetime module

In [None]:
import datetime as dt

- **Datetime object - creation**

In [None]:
# Use datetime()
mydt = dt.datetime(2021, 7, 1)
mydt

In [None]:
# Get current time by using datetime.now()
datetime_now = dt.datetime.now()
datetime_now

- **Datetime object - attributes**

In [None]:
mydt = dt.datetime(year = 2021, month = 7, day = 6, hour = 11, minute = 21)
mydt

In [None]:
print(mydt.year)
print(mydt.month)
print(mydt.day)
print(mydt.hour)
print(mydt.minute)

- **Format codes**

|Directive|Meaning|
|:--|:--|
|%Y|Four-digit year|
|%y|Two-digit year|
|%m|Two-digit month [01,12]|
|%d|Two-digit day [01,31]|
|%H|Hour (24-hour clock) [00,23]|
|%M|Two-digit minute [00,59]|
|%S|Two-digit second [00,59]|


- **Method - <code>strptime()</code>**<br>

Returns a datetime object parsed from the string according to the given format.

In [None]:
s1 = '2019-01-03'
t1 = dt.datetime.strptime(s1, '%Y-%m-%d')
t1

In [None]:
s2 = '03/01/2019'
t2 = dt.datetime.strptime(s2, '%d/%m/%Y')
t2

In [None]:
s3 = '03/01/19'
t3 = dt.datetime.strptime(s3, '%d/%m/%y')
t3

In [None]:
s4 = '10:30 03/01/19'
t4 = dt.datetime.strptime(s4, '%H:%M %d/%m/%y')
t4

- **Method - <code>strftime()</code>**<br>

Returns a string representation of the datetime object in the given format.

In [None]:
t5 = dt.datetime(2019,1,3)
t5

In [None]:
s5 = t5.strftime('%d-%m-%Y')
s5
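The two methods are inverses of each other, so a string can be parsed in one layout and written back in another. The following is a minimal sketch of that round trip; the variable names `parsed` and `iso_text` are illustrative only. It also shows that a string which does not match the format raises a `ValueError`.

```python
import datetime as dt

# Parse a string written in day/month/year order into a datetime object
parsed = dt.datetime.strptime("03/01/2019", "%d/%m/%Y")

# Format the same datetime back into a year-month-day string
iso_text = parsed.strftime("%Y-%m-%d")
print(iso_text)

# A string that does not match the format raises ValueError
try:
    dt.datetime.strptime("2019-01-03", "%d/%m/%Y")
except ValueError as err:
    print("Could not parse:", err)
```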

- **Timedelta object**<br>

A timedelta object represents the difference between two datetime objects.

- Example-1

In [None]:
# Create two datetime objects
t1 = dt.datetime(2021, 6, 15)
t2 = dt.datetime(2021, 7, 6)

# Create a timedelta object
diff_12 = t2 - t1
diff_12

In [None]:
# Access the days attribute of the timedelta object
diff_12.days

- Example-2

In [None]:
# Create two datetime objects
t3 = dt.datetime(2021, 6, 15, 10, 12, 40)
t4 = dt.datetime(2021, 6, 15, 10, 13, 20)

# Create a timedelta object
diff_34 = t4 - t3
diff_34

In [None]:
# Access attribute of a timedelta object
diff_34.seconds
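A timedelta stores its duration as separate `days` and `seconds` components, so `seconds` alone never exceeds one day. The sketch below, using made-up timestamps and the illustrative names `start`, `end`, and `gap`, contrasts the two attributes with the `total_seconds()` method, which returns the whole duration in seconds.

```python
import datetime as dt

# Two hypothetical timestamps: one day and 40 seconds apart
start = dt.datetime(2021, 6, 15, 10, 0, 0)
end = dt.datetime(2021, 6, 16, 10, 0, 40)

gap = end - start
print(gap.days)             # whole-day component
print(gap.seconds)          # leftover seconds after removing whole days
print(gap.total_seconds())  # entire duration expressed in seconds
```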

## Exercise.A

**(A.1) Create a datetime object named <code>dt_start</code> with the following arguments: year = 2022, month = 8, day = 15.**

**(A.2) Convert the following variable <code>str1</code> to a datetime object named <code>dt_end</code>.**

In [None]:
str1 = "2022-11-13"

**(A.3) How many days between <code>dt_start</code> and <code>dt_end</code>?**

## [2] Pandas 

In [None]:
import pandas as pd

### [2.1] Function <code>to_datetime()</code>

- **Import dataset with datetime column**

In [None]:
covid_df = pd.read_csv("../dataset/covid_2021.csv")
covid_df

- **Convert a column to a datetime type**

In [None]:
covid_df["date"] = pd.to_datetime(covid_df["date"])

covid_df.info()
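When the date strings are not in year-month-day order, it can help to state the layout explicitly with `format`, which uses the same format codes as `strptime()`. The sketch below runs on a small made-up Series rather than the COVID dataset; `errors="coerce"` turns values that cannot be parsed into `NaT` instead of raising an error.

```python
import pandas as pd

# Made-up day/month/year strings, including one invalid entry
raw = pd.Series(["03/01/2019", "15/02/2019", "not a date"])

# Parse with an explicit format; unparseable values become NaT
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)
```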

### [2.2] DatetimeIndex

- **Import dataset and parse datetime columns**<br>
    - <code>parse_dates</code>: Specify the column names that should be parsed as dates.
    - <code>index_col</code>: Specify which column should be used as the index.

In [None]:
covid_df = pd.read_csv("../dataset/covid_2021.csv", 
                       parse_dates = ["date"], 
                       index_col = 0)   # or index_col = "date"
covid_df.head(10)

### [2.2.1] DatetimeIndex: Subset selection

- **Subset selection using DatetimeIndex**

In [None]:
# by month
covid_df.loc['2021-05',:]

In [None]:
# range
covid_df.loc['2021-05-25':'2021-06-01',:]
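Unlike positional slicing, a label slice on a DatetimeIndex includes both endpoints. The sketch below checks this on a synthetic daily DataFrame (the name `demo_df` and its values are assumptions for illustration, not the COVID data).

```python
import pandas as pd

# Synthetic daily data covering May and June 2021
idx = pd.date_range("2021-05-01", "2021-06-30", freq="D")
demo_df = pd.DataFrame({"positive": range(len(idx))}, index=idx)

# Partial string: every row whose date falls in May 2021
may_rows = demo_df.loc["2021-05", :]

# Label slice: both the start date and the end date are included
span_rows = demo_df.loc["2021-05-25":"2021-06-01", :]

print(len(may_rows), len(span_rows))
```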

- **Line chart**

In [None]:
covid_df.plot(y = 'positive', figsize = (12,4))

## Exercise.B

**(B.1) Import dataset <code>fashion.csv</code> and set the column <code>Date</code> as DatetimeIndex.**

**(B.2) Draw a line chart to show Tiger_of_Sweden’s sales in 2016.**

**(B.3) Use a multiple line chart to show the sales of Eton, Levi_s, and Tiger_of_Sweden from 2014 to 2016.**

### [2.2.2] DatetimeIndex: Information extraction

- **Add "month" as a new column**

In [None]:
covid_df["month"] = covid_df.index.month
covid_df

- **Add "day_of_week" as a new column**
    - Attribute <code>weekday</code>: 0 (Monday), 1, ..., 6 (Sunday).
    - Method <code>day_name()</code>: Monday, Tuesday, ..., Sunday.

In [None]:
covid_df["day_of_week"] = covid_df.index.day_name()
covid_df

- **Group data by new column**

In [None]:
covid_df.groupby("day_of_week").positive.mean()
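Grouping by day name sorts the result alphabetically (Friday, Monday, ...), which is rarely the order wanted in a report. One possible fix, sketched here on synthetic data with the illustrative names `demo_df` and `day_order`, is to reindex the grouped result in calendar order.

```python
import pandas as pd

# Two weeks of synthetic daily data; 2021-05-03 is a Monday
idx = pd.date_range("2021-05-03", periods=14, freq="D")
demo_df = pd.DataFrame({"positive": range(14)}, index=idx)
demo_df["day_of_week"] = demo_df.index.day_name()

# Desired calendar order of the group labels
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Group, average, then reorder the rows from Monday to Sunday
by_day = demo_df.groupby("day_of_week")["positive"].mean().reindex(day_order)
print(by_day)
```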

### [2.2.3] DatetimeIndex: Resampling

- **Aggregate daily data to monthly data**

In [None]:
# Step 1: Get a Resampler object
# 'M' is the month-end alias; pandas 2.2 and later prefer 'ME'
covid_rs = covid_df.resample('M')
type(covid_rs)

In [None]:
# Step 2: Call an aggregation function
covid_month = covid_rs.positive.sum()
covid_month

- **Cast to index at a particular frequency**

In [None]:
covid_month.index = covid_month.index.to_period('M')
covid_month
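The two steps plus the period conversion can be chained on any DataFrame with a DatetimeIndex. The sketch below uses synthetic data with one case per day, so each monthly total equals the number of days in that month. As an assumption made for portability, it uses the month-start alias `"MS"` instead of the month-end alias, because the spelling of the month-end alias depends on the pandas version; after `to_period("M")` the resulting index is the same.

```python
import pandas as pd

# Synthetic daily data: one positive case per day for three months
idx = pd.date_range("2021-01-01", "2021-03-31", freq="D")
demo_df = pd.DataFrame({"positive": 1}, index=idx)

# Resample to monthly bins (labelled by month start) and sum each bin
monthly = demo_df.resample("MS")["positive"].sum()

# Replace the timestamp labels with monthly periods such as 2021-01
monthly.index = monthly.index.to_period("M")
print(monthly)
```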

- **Visualize monthly data**

In [None]:
covid_month.plot(kind = "bar", y = "positive")

- **Example-1:** Calculate the weekly total number of positive cases

In [None]:
# Aggregate the "positive" column weekly and calculate the sum of cases
covid_df.resample('W').positive.sum()

- **Example-2:** Calculate the number of complaints per day

In [None]:
complain_df = pd.read_csv("../dataset/complaints.csv", 
                          parse_dates = ["Created Date"], 
                          index_col = [1], 
                          dtype = {"Incident Zip":object})
complain_df.sort_index(inplace = True)
complain_df.head()

In [None]:
complain_df.resample("D").size().plot(figsize = (10, 4))
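Calling `size()` on a Resampler counts the rows in each bin, and days without any rows still appear with a count of 0. The sketch below illustrates this with three made-up event timestamps; the names `events` and `per_day` and the column `kind` are hypothetical and unrelated to the complaints dataset.

```python
import pandas as pd

# Three made-up events: two on 1 March, none on 2 March, one on 3 March
stamps = pd.to_datetime([
    "2021-03-01 08:15",
    "2021-03-01 17:40",
    "2021-03-03 09:05",
])
events = pd.DataFrame({"kind": ["noise", "parking", "noise"]}, index=stamps)

# Count the rows that fall in each calendar day
per_day = events.resample("D").size()
print(per_day)
```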

## Exercise.C

**(C.1) Use the dataframe obtained in Exercise B. Group the data by year and calculate the total annual sales of each brand. Store the result in a new variable named <code>year_df</code>.**<br>
Hint: <code>resample("Y").sum()</code> (pandas 2.2 and later prefer the alias <code>"YE"</code>)

**(C.2) Use the year as the index of <code>year_df</code>.**<br>
Hint: <code>to_period()</code>

**(C.3) Display the result obtained in (C.2) with a heatmap.**

- **Question**: In which year did Tiger of Sweden have the highest annual sales?
- **Answer**:     