<a href="https://colab.research.google.com/github/maxmatical/Machine-Learning/blob/master/datepart_function.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The add_datepart function requires fastai and fastai.tabular. You can install fastai with the following code:

In [0]:
# Set up environment and download fastai v1
!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
!pip install fastai
#!pip install fastprogress
#!pip install pathlib

In [0]:
from fastai import *
from fastai.tabular import *

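One particularly useful module made available by these imports is re, Python's standard regular expression module (it is part of the standard library, not of fastai itself):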

In [12]:
help(re)

Help on module re:

NAME
    re - Support for regular expressions (RE).

MODULE REFERENCE
    https://docs.python.org/3.6/library/re
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides regular expression matching operations similar to
    those found in Perl.  It supports both 8-bit and Unicode strings; both
    the pattern and the strings being processed can contain null bytes and
    characters outside the US ASCII range.
    
    Regular expressions can contain both special and ordinary characters.
    Most ordinary characters, like "A", "a", or "0", are the simplest
    regular expressions; they simply match themselves.  You can
    concatenate ordinary characters, so last mat

In [3]:
# check the directory of the data 
!ls -d $PWD/*

/content/sample_data


In [0]:
import pandas as pd
import numpy as np

The path will have to be changed to match the location of the relevant data.

In [0]:
path = '/content/sample_data'

In [0]:
gold = pd.read_csv(f'{path}/GC1 Comdty_t.csv')

In [7]:
gold.head(5)

Unnamed: 0.1,Unnamed: 0,GC1 Comdty,Bid,Ask,Last,High,Low,Volume,Open_Interest
0,2,1/6/2000,,,282.4,282.8,280.2,26026.0,67505.0
1,3,1/7/2000,,,282.9,284.5,282.0,19396.0,68731.0
2,4,1/10/2000,,,282.7,283.9,281.8,11612.0,66778.0
3,5,1/11/2000,,,284.4,285.3,281.9,30928.0,64731.0
4,6,1/12/2000,,,283.7,285.0,282.5,13678.0,64629.0


The dataframe currently has a single date field, "GC1 Comdty", which holds the dates as month/day/year strings.
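As a small sketch of what happens to this field later, the snippet below parses the first three date strings from the output above with pd.to_datetime. The explicit month/day/year format is an assumption about the file; it is consistent with the Month and Day values the notebook reports for these rows.

```python
import pandas as pd

# First three date strings from the head() output above
raw = pd.Series(["1/6/2000", "1/7/2000", "1/10/2000"], name="GC1 Comdty")

# Assumed format: month/day/year
parsed = pd.to_datetime(raw, format="%m/%d/%Y")

# The parsed column exposes the .dt accessor used to build date parts
print(pd.api.types.is_datetime64_any_dtype(parsed))  # True
print(parsed.dt.dayofweek.tolist())                  # [3, 4, 0] (Monday is 0)
```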

add_datepart function 

In [0]:
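import re  # imported explicitly; otherwise it is only available through the fastai star imports

def add_datepart(df, fldname, drop=True, time=False):
    "Helper function that adds columns relevant to a date."
    fld = df[fldname]
    fld_dtype = fld.dtype
    # Timezone-aware columns use an extension dtype; treat them as datetime64
    if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
        fld_dtype = np.datetime64

    # Parse the column if it is not already a datetime (e.g. date strings)
    if not np.issubdtype(fld_dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)
    # Column prefix: the field name with a trailing "Date"/"date" removed
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
            'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']
    if time: attr = attr + ['Hour', 'Minute', 'Second']
    for n in attr:
        # Newer pandas versions drop Series.dt.week; use isocalendar() when it exists
        if n == 'Week' and hasattr(fld.dt, 'isocalendar'):
            df[targ_pre + n] = fld.dt.isocalendar().week.astype(fld.dt.day.dtype)
        else:
            df[targ_pre + n] = getattr(fld.dt, n.lower())
    # Seconds since the Unix epoch (assumes nanosecond-resolution datetimes)
    df[targ_pre + 'Elapsed'] = fld.astype(np.int64) // 10 ** 9
    if drop: df.drop(fldname, axis=1, inplace=True)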
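The prefix logic in the function above can be checked in isolation. This is a minimal sketch: date_prefix is a hypothetical helper that is not part of fastai or this notebook, and "saledate" is a made-up column name used only for contrast.

```python
import re

def date_prefix(fldname):
    # Same substitution as in add_datepart: strip a trailing "Date" or "date"
    return re.sub('[Dd]ate$', '', fldname)

print(repr(date_prefix("GC1 Comdty")))  # 'GC1 Comdty' (no "Date" suffix to strip)
print(repr(date_prefix("saledate")))    # 'sale' (hypothetical column name)
print(repr(date_prefix("Date")))        # '' (new columns would be 'Year', 'Month', ...)
```

Because "GC1 Comdty" does not end in "Date", every generated column keeps the full field name as its prefix, for example "GC1 ComdtyYear". Renaming the field to "Date" before calling the function would give shorter names.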

In [0]:
add_datepart(gold, "GC1 Comdty", drop=True) #drop = True removes the original date attribute from the dataframe

In [10]:
gold.head(5)

Unnamed: 0.1,Unnamed: 0,Bid,Ask,Last,High,Low,Volume,Open_Interest,GC1 ComdtyYear,GC1 ComdtyMonth,...,GC1 ComdtyDay,GC1 ComdtyDayofweek,GC1 ComdtyDayofyear,GC1 ComdtyIs_month_end,GC1 ComdtyIs_month_start,GC1 ComdtyIs_quarter_end,GC1 ComdtyIs_quarter_start,GC1 ComdtyIs_year_end,GC1 ComdtyIs_year_start,GC1 ComdtyElapsed
0,2,,,282.4,282.8,280.2,26026.0,67505.0,2000,1,...,6,3,6,False,False,False,False,False,False,947116800
1,3,,,282.9,284.5,282.0,19396.0,68731.0,2000,1,...,7,4,7,False,False,False,False,False,False,947203200
2,4,,,282.7,283.9,281.8,11612.0,66778.0,2000,1,...,10,0,10,False,False,False,False,False,False,947462400
3,5,,,284.4,285.3,281.9,30928.0,64731.0,2000,1,...,11,1,11,False,False,False,False,False,False,947548800
4,6,,,283.7,285.0,282.5,13678.0,64629.0,2000,1,...,12,2,12,False,False,False,False,False,False,947635200
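The GC1 ComdtyElapsed column in the output above is the date expressed as whole seconds since the Unix epoch (1970-01-01). The sketch below reproduces the value for the first row without relying on the internal integer representation; the function's `// 10 ** 9` step presumes that the datetimes are stored with nanosecond resolution.

```python
import pandas as pd

ts = pd.Timestamp("2000-01-06")  # date of the first row above
epoch = pd.Timestamp("1970-01-01")

# Whole seconds between the epoch and the date
elapsed = (ts - epoch) // pd.Timedelta(seconds=1)
print(elapsed)  # 947116800

# The conversion is reversible
print(pd.to_datetime(elapsed, unit="s"))  # 2000-01-06 00:00:00
```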


We extract metadata from the date field to create 13 extra features. Some of these may be useful for financial data, which often shows seasonal effects.
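One possible way to use such features is to group a price column by a date part. The sketch below does this with the five Last values from the output above and the week numbers (1 and 2) that the notebook reports for those dates; five rows say nothing about real seasonality, so this only illustrates the mechanics.

```python
import pandas as pd

# Last prices from the five rows above, with their week-of-year values
sample = pd.DataFrame({
    "GC1 ComdtyWeek": [1, 1, 2, 2, 2],
    "Last": [282.4, 282.9, 282.7, 284.4, 283.7],
})

# Mean Last price per week of the year
weekly_mean = sample.groupby("GC1 ComdtyWeek")["Last"].mean()
print(weekly_mean)
```

The same pattern applies to GC1 ComdtyMonth or GC1 ComdtyDayofweek on the full dataframe.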

However, the elapsed-time feature (GC1 ComdtyElapsed) might not be particularly useful here, so we can drop it.

In [0]:
gold = gold.drop('GC1 ComdtyElapsed', axis = 1)

In [18]:
gold.head(5)

Unnamed: 0.1,Unnamed: 0,Bid,Ask,Last,High,Low,Volume,Open_Interest,GC1 ComdtyYear,GC1 ComdtyMonth,GC1 ComdtyWeek,GC1 ComdtyDay,GC1 ComdtyDayofweek,GC1 ComdtyDayofyear,GC1 ComdtyIs_month_end,GC1 ComdtyIs_month_start,GC1 ComdtyIs_quarter_end,GC1 ComdtyIs_quarter_start,GC1 ComdtyIs_year_end,GC1 ComdtyIs_year_start
0,2,,,282.4,282.8,280.2,26026.0,67505.0,2000,1,1,6,3,6,False,False,False,False,False,False
1,3,,,282.9,284.5,282.0,19396.0,68731.0,2000,1,1,7,4,7,False,False,False,False,False,False
2,4,,,282.7,283.9,281.8,11612.0,66778.0,2000,1,2,10,0,10,False,False,False,False,False,False
3,5,,,284.4,285.3,281.9,30928.0,64731.0,2000,1,2,11,1,11,False,False,False,False,False,False
4,6,,,283.7,285.0,282.5,13678.0,64629.0,2000,1,2,12,2,12,False,False,False,False,False,False
