# Wk18-Lecture01-CodeAlong: Preparing Time Series Data

## Learning Objectives

- By the end of this CodeAlong, students will be able to:
    - Create date time indices
    - Resample at various frequencies
    - Impute null values for time series 
    - Convert wide-form data to long-form
    

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# import matplotlib.ticker as mticks

import missingno as miss

import seaborn as sns
import numpy as np

pd.set_option('display.float_format',lambda x:f"{x:,.2f}")

In [None]:
sns.set_context('notebook', font_scale=0.9)
plt.style.use(['ggplot'])

In [None]:
crypto = pd.read_csv("Data/stocks/wide-form-crypto.csv")
crypto

# Working with Wide-Form Time Series Data (Regular Intervals)

## <font color='blue'> Step 1 </font>: Convert datetime to One Column

In [None]:
## First columns are id columns


In [None]:
## Melt the crypto data 

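As a minimal sketch of what melting does, the example below uses a small made-up wide-form table rather than the crypto file. The `Coin` and `Symbol` id columns and all values are assumptions for illustration; only the `Date` and `Value` names mirror the columns used later in this CodeAlong.

```python
import pandas as pd

# Hypothetical wide-form data: one row per coin, one column per date
wide = pd.DataFrame({
    "Coin": ["Bitcoin", "Ethereum"],
    "Symbol": ["BTC", "ETH"],
    "2021-01-01": [100.0, 10.0],
    "2021-01-02": [110.0, 12.0],
})

# id_vars stay as columns; every other column header becomes a value in "Date"
long_df = pd.melt(wide, id_vars=["Coin", "Symbol"],
                  var_name="Date", value_name="Value")
print(long_df)
```

Each (coin, date) pair ends up on its own row, which is the long-form shape the following steps expect.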

## <font color='blue'> Step 2: </font> Convert Datetime column to <font color='green'> datetime </font> type with <font color='green'> pd.to_datetime </font>

## Using pd.to_datetime with strftime codes!

- Datetime objects have a `.strftime()` method (string-format-time)

- 📖 **strftime cheat sheet: https://strftime.org/**
- 📖 **Official Table of Python datetime format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes**


| Directive   | Meaning                                                                                                                                                                          | Example                                                                      |
|:------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------|
| %a          | Weekday as locale’s abbreviated name.                                                                                                                                            | Sun, Mon, …, Sat (en_US); So, Mo, …, Sa (de_DE)                              |
| %A          | Weekday as locale’s full name.                                                                                                                                                   | Sunday, Monday, …, Saturday (en_US); Sonntag, Montag, …, Samstag (de_DE)     |
| %w          | Weekday as a decimal number, where 0 is Sunday and 6 is Saturday.                                                                                                                | 0, 1, …, 6                                                                   |
| %d          | Day of the month as a zero-padded decimal number.                                                                                                                                | 01, 02, …, 31                                                                |
| %b          | Month as locale’s abbreviated name.                                                                                                                                              | Jan, Feb, …, Dec (en_US); Jan, Feb, …, Dez (de_DE)                           |
| %B          | Month as locale’s full name.                                                                                                                                                     | January, February, …, December (en_US); Januar, Februar, …, Dezember (de_DE) |
| %m          | Month as a zero-padded decimal number.                                                                                                                                           | 01, 02, …, 12                                                                |
| %y          | Year without century as a zero-padded decimal number.                                                                                                                            | 00, 01, …, 99                                                                |
| %Y          | Year with century as a decimal number.                                                                                                                                           | 0001, 0002, …, 2013, 2014, …, 9998, 9999                                     |
| %H          | Hour (24-hour clock) as a zero-padded decimal number.                                                                                                                            | 00, 01, …, 23                                                                |
| %I          | Hour (12-hour clock) as a zero-padded decimal number.                                                                                                                            | 01, 02, …, 12                                                                |
| %p          | Locale’s equivalent of either AM or PM.                                                                                                                                          | AM, PM (en_US); am, pm (de_DE)                                               |
| %M          | Minute as a zero-padded decimal number.                                                                                                                                          | 00, 01, …, 59                                                                |
| %S          | Second as a zero-padded decimal number.                                                                                                                                          | 00, 01, …, 59                                                                |
| %f          | Microsecond as a decimal number, zero-padded to 6 digits.                                                                                                                        | 000000, 000001, …, 999999                                                    |
| %z          | UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive).                                                                                                 | (empty), +0000, -0400, +1030, +063415, -030712.345216                        |
| %Z          | Time zone name (empty string if the object is naive).                                                                                                                            | (empty), UTC, GMT                                                            |
| %j          | Day of the year as a zero-padded decimal number.                                                                                                                                 | 001, 002, …, 366                                                             |
| %U          | Week number of the year (Sunday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. | 00, 01, …, 53                                                                |
| %W          | Week number of the year (Monday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Monday are considered to be in week 0. | 00, 01, …, 53                                                                |
| %c          | Locale’s appropriate date and time representation.                                                                                                                               | Tue Aug 16 21:30:00 1988 (en_US); Di 16 Aug 21:30:00 1988 (de_DE)            |
| %x          | Locale’s appropriate date representation.                                                                                                                                        | 08/16/88 (None); 08/16/1988 (en_US); 16.08.1988 (de_DE)                      |
| %X          | Locale’s appropriate time representation.                                                                                                                                        | 21:30:00 (en_US); 21:30:00 (de_DE)                                           |
| %%          | A literal '%' character.                                                                                                                                                         | %                                                                            |

In [None]:
## Compare to dates in dataframe
long_crypto['Date'][0]

In [None]:
## Creating/testing our time format
import datetime
today_datetime = datetime.datetime.today()

fmt = ""  ## strftime code
today_datetime.strftime(fmt)

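- We can speed up the conversion by passing the correct time format to `pd.to_datetime` with the `format` argument, so pandas does not have to infer it. (The older `infer_datetime_format` argument is deprecated as of pandas 2.0.)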

In [None]:
## Use the fmt to convert datetime type

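A minimal sketch of testing a format string and then using it with `pd.to_datetime`. The format and the date strings below are assumptions for illustration; the format for the crypto data has to match whatever `long_crypto['Date'][0]` actually looks like.

```python
import datetime
import pandas as pd

# Assumed format for illustration only; adjust it to match your own date strings
fmt = "%m/%d/%Y %H:%M:%S"

# Format a known datetime to confirm the codes produce the layout we expect
example = datetime.datetime(2021, 3, 14, 15, 9, 26)
as_text = example.strftime(fmt)
print(as_text)

# Parse strings written in that same layout
dates = pd.Series(["01/02/2021 00:00:00", "01/03/2021 00:00:00"])
parsed = pd.to_datetime(dates, format=fmt)
print(parsed)
```

If the format does not match the strings, `pd.to_datetime` raises an error, which is a handy check that the codes are right.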

## <font color='blue'> Step 3: </font> Set datetime as Index

In [None]:
## Set Date as index


In [3]:
## Check index


## Example Slicing with Datetime Index <font color='red'> Super Powers! </font>

In [None]:
## Slice One Year:


In [None]:
## Slice One Month


In [None]:
## Slice One day and two columns


In [None]:
## Slice Range of Dates

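The slices above can be sketched on a made-up daily DataFrame. The `BTC` and `ETH` column names and the values are assumptions for illustration, not the real crypto data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data with a DatetimeIndex
idx = pd.date_range("2020-01-01", "2022-12-31", freq="D")
df = pd.DataFrame({"BTC": np.arange(len(idx), dtype=float),
                   "ETH": np.arange(len(idx), dtype=float) * 2}, index=idx)

# A partial date string selects everything that falls inside it
one_year = df.loc["2021"]
one_month = df.loc["2021-03"]

# One day and two columns
one_day_two_cols = df.loc["2021-03-15", ["BTC", "ETH"]]

# A range of dates (both endpoints are included)
span = df.loc["2021-03-01":"2021-06-30"]

print(len(one_year), len(one_month), len(span))
```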

## <font color='blue'> Step 4: </font> Resample to Desired Frequency

### Pandas Frequency Codes

![pandas frequency codes](../pandas_freq_cheatsheet.png)

### Grouping and resampling in one step!

In [None]:
# Group by currency and resample as daily. Keep only 'Value' column
crypto_ts = "" ## group and resample
crypto_ts

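A minimal sketch of grouping and resampling in one step, using a tiny made-up long-form table. The `Currency` column name, the values, and the choice of `.last()` as the aggregation are assumptions for illustration.

```python
import pandas as pd

# Hypothetical long-form data: two readings on Jan 1 and one on Jan 2 per currency
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 12:00",
                            "2021-01-02 00:00"] * 2),
    "Currency": ["BTC"] * 3 + ["ETH"] * 3,
    "Value": [1.0, 3.0, 5.0, 10.0, 30.0, 50.0],
}).set_index("Date")

# Group by currency, keep only 'Value', resample to daily, keep the last reading per day
daily = long_df.groupby("Currency")["Value"].resample("D").last()
print(daily)
```

The result is a Series with a two-level index (currency, then date), which is why the real data needs unstacking before each currency can be plotted as its own line.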
In [None]:
## Check type


In [None]:
## Check index


In [None]:
## Plot data


> What are we seeing?

In [None]:
## Check data again


In [None]:
## Unstack the data
crypto_unstacked = "" ## unstack 
crypto_unstacked

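A minimal sketch of unstacking, assuming a Series whose index has a currency level and a date level. The level names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: currency first, then date
index = pd.MultiIndex.from_product(
    [["BTC", "ETH"], pd.date_range("2021-01-01", periods=3, freq="D")],
    names=["Currency", "Date"])
stacked = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], index=index, name="Value")

# Move the currency level (level 0) from the index into the columns
unstacked = stacked.unstack(level=0)
print(unstacked)
```

After unstacking, the dates are the only index level and each currency is its own column, so `.plot()` draws one line per currency.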
In [None]:
## Plot unstacked data


# Changing matplotlib default plot size

In [None]:
## Check current Default


In [None]:
## Change default to be wider

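A minimal sketch of checking and changing the default figure size through `plt.rcParams`. The `(12, 4)` size is just an example value; pick whatever width suits your plots.

```python
import matplotlib.pyplot as plt

# Check the current default (width, height) in inches
print(plt.rcParams["figure.figsize"])

# Make every new figure wider by default
plt.rcParams["figure.figsize"] = (12, 4)
print(plt.rcParams["figure.figsize"])
```

The new default applies to figures created after the change; existing figures keep their size.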

In [None]:
## Try plotting again


## Selecting our TS for Resampling Demonstration

We are going to slice out the data for `Bitcoin` during 2021 and 2022.

In [None]:
## Slice out 2021 and 2022 Bitcoin data
ts = '' ## Slice
ts

In [None]:
## Plot new ts data


## Resampling

In [None]:
## Check the index to confirm the current freq


>It looks like we have daily-resolution data (frequency = days), since we resampled to daily in Step 4.

### Resample as Weeks Frequency

In [None]:
## Resample as weekly using correct freq code and use .asfreq as agg method
ts_W = '' ## resample
ts_W

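A minimal sketch of weekly resampling with `.asfreq()`, using a made-up daily series that starts on a Monday. The dates and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: four full weeks, Monday 2021-01-04 to Sunday 2021-01-31
idx = pd.date_range("2021-01-04", periods=28, freq="D")
ts = pd.Series(np.arange(28, dtype=float), index=idx)

# "W" means weekly, anchored on Sundays; asfreq keeps the value found at each new timestamp
ts_W = ts.resample("W").asfreq()
print(ts_W)
print(ts_W.index.freqstr)
```

Note that `.asfreq()` does not aggregate: it simply picks the value at each weekly timestamp, and gives NaN if the original data has nothing at that exact time. An aggregation such as `.last()` or `.mean()` would summarize the whole week instead.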
In [None]:
## Check index for frequency


In [None]:
## Plot the weekly data


### Let's resample and plot our ts as daily, weekly, monthly, quarterly, and annual to compare.

In [None]:
# Note: pandas >= 2.2 deprecates 'M', 'Q', 'A' in favor of 'ME', 'QE', 'YE'
freqs = ['D','W','M','Q','A']


for freq in freqs:
    ax = ts.plot(label='Original',  style='.-',
            title=f'Comparing Resampled Frequencies: {freq}');

    ts_temp = ts.resample(freq).last()
    ts_temp.plot(style='.-', label=freq,ax=ax)
    plt.legend()
    plt.show()

## <font color='blue'> Step 5: </font> Impute as Needed

There are no missing values currently, but let's make some and then impute them.

In [None]:
## Make a copy of the unstacked data
missing = ''

## Replace all 2020 values with nans

## Plot results


### Forward Fill

In [None]:
## Forward Fill missing values
missing_ff = ''

## Plot results


### Back Fill

In [None]:
## Back fill missing values
missing_bf = ''

## Plot results


### Interpolate

In [None]:
## interpolate missing values
missing_interp = ''

## plot results

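The three imputation methods above can be compared on a tiny made-up series with gaps. The dates and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with missing values
idx = pd.date_range("2021-01-01", periods=6, freq="D")
missing = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0], index=idx)

# Forward fill: carry the last known value forward
missing_ff = missing.ffill()

# Back fill: pull the next known value backward
missing_bf = missing.bfill()

# Interpolate: draw a straight line between the known values (linear by default)
missing_interp = missing.interpolate()

print(pd.DataFrame({"ffill": missing_ff, "bfill": missing_bf,
                    "interpolate": missing_interp}))
```

Forward fill and back fill produce flat steps, while interpolation produces a ramp between known points. Which one is appropriate depends on what the series measures.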

# Save the processed and unstacked data

In [None]:
import os

## Make a new folder to save data
folder = "Data/FromClass/"
os.makedirs(folder, exist_ok=True)

In [None]:
## Save the unstacked data
crypto_unstacked.to_csv(folder + 'crypto_currencies.csv')

# Appendix: Using Tick Date Formatters/Locators

- Let's add a minor xtick every 3 months.

In [None]:
import matplotlib.dates as mdates

In [None]:
## CREATE ARTISTS FOR MAJOR XTICKS (Years)
# Create a year locator
loc_major_yr = mdates.YearLocator()
# Create a year formatter using 4-digit years
fmt_major_yr = mdates.DateFormatter("%Y")


## CREATE ARTISTS FOR MINOR XTICKS ( Months)
# Create a month locator that will add months at 1,4,7,10
loc_minor_3m = mdates.MonthLocator(bymonth=[1,4,7,10])
# Create a month formatter that will use 3-letter month names
fmt_minor_3m = mdates.DateFormatter("%b")

In [None]:
## Create our plot and save the ax
fig, ax = plt.subplots()
ax.plot(crypto_unstacked)

# ax = crypto_unstacked.plot()
ax.set(ylabel="Value", title='Crypto Coins')

# Set xaxis major locator/formatter
ax.xaxis.set_major_locator(loc_major_yr)
ax.xaxis.set_major_formatter(fmt_major_yr)


# Set xaxis minor locator/formatter
ax.xaxis.set_minor_locator(loc_minor_3m)
ax.xaxis.set_minor_formatter(fmt_minor_3m)

In [None]:
## Create our plot and save the ax
fig, ax = plt.subplots()
ax.plot(crypto_unstacked)

# Set the labels and title
ax.set(ylabel="Value", title='Crypto Coins')

# Set xaxis major locator/formatter
ax.xaxis.set_major_locator(loc_major_yr)
ax.xaxis.set_major_formatter(fmt_major_yr)


# Set xaxis minor locator/formatter
ax.xaxis.set_minor_locator(loc_minor_3m)
ax.xaxis.set_minor_formatter(fmt_minor_3m)



# Add gridlines for major xaxis ticks
ax.grid(which='major',axis='x',color='k',ls=':',lw=1)

## Rotate the major tick years using fig.autofmt_xdate
fig = ax.get_figure()
fig.autofmt_xdate(which='major', rotation=90,ha='center')

In [None]:
def format_xdates(ax):
    
    # Create a year locator
    loc_major_yr = mdates.YearLocator()
    # Create a year formatter using 4-digit years
    fmt_major_yr = mdates.DateFormatter("%Y")


    # Create a month locator that will add months at 1,4,7,10
    loc_minor_3m = mdates.MonthLocator(bymonth=[1,4,7,10])
    # Create a month formatter that will use 3-letter month names
    fmt_minor_3m = mdates.DateFormatter("%b")
    

    
    # Set xaxis major locator/formatter
    ax.xaxis.set_major_locator(loc_major_yr)
    ax.xaxis.set_major_formatter(fmt_major_yr)


    # Set xaxis minor locator/formatter
    ax.xaxis.set_minor_locator(loc_minor_3m)
    ax.xaxis.set_minor_formatter(fmt_minor_3m)

    

    # Add gridlines for major xaxis ticks
    ax.grid(which='major',axis='x',color='k',ls=':',lw=1)

    ## Rotate the major tick years using fig.autofmt_xdate
    fig = ax.get_figure()
    fig.autofmt_xdate(which='major', rotation=90,ha='center')
    return fig


In [None]:
## Create the figure and axis
fig, ax = plt.subplots()

## Format the xticks
format_xdates(ax)
ax.set_title('Crypto Coins')
ax.set_ylabel('Value')
## Plot the data
ax.plot(crypto_unstacked);