# Project: Predicting Stock Price & Return  
Extract, Transform, Load (ETL)
## ver 0.1  
All guidance and guidelines provided by Romeo Kienzler

### Guidance - Extract, transform, load (ETL)  
This task is an important step in transforming the data from the source system into data suitable for analytics. In traditional data warehousing, this process includes accessing the online transaction processing (OLTP) system’s databases, transforming the data from a highly normalized data model into a Star or Snowflake Schema, and storing the data to a data warehouse. In data science projects, this step is usually much simpler. The data arrives in an exported format (for example, JSON or CSV). But, sometimes de-normalization must be done as well. The result usually ends up in a bulk storage like Cloud Object Store.

### Guidelines for Data Cleansing
During the transformation, we will also perform data cleansing. Here are the guidelines from the IBM Cloud Garage Method:  
- Data types  
Are data types of columns matching their content? E.g. is age stored as integer and not as string?
- Ranges  
Does the distribution of values in a column make sense? Use statistics (e.g. min, max, mean, standard deviation) and visualizations (e.g. boxplot, histogram) for help.
- Emptiness  
Are all values non-null where mandatory? E.g. client IDs  
- Uniqueness  
Are duplicates present where undesired? E.g. client IDs
- Set memberships  
Are only allowed values chosen for categorical or ordinal fields? E.g. Female, Male, Unknown
- Foreign key set memberships  
Are only allowed values chosen for a field? E.g. ZIP code  
- Cross-field validation  
Some fields can impact validity of other fields. E.g. a male person can't be pregnant
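
Several of the checks above can be automated with pandas. The following is only a minimal sketch: the helper `cleansing_report`, its parameters, and the `clients` toy frame are hypothetical names made up for illustration and are not part of this project's pipeline.

```python
import pandas as pd

def cleansing_report(df, mandatory=(), unique=(), allowed=None):
    """Collect simple data-quality findings for a DataFrame (hypothetical helper)."""
    allowed = allowed or {}
    return {
        # Data types: the dtype pandas inferred for each column
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Ranges: min and max of every numeric column
        "ranges": df.describe().loc[["min", "max"]].to_dict(),
        # Emptiness: number of nulls in columns that must be filled
        "nulls": {c: int(df[c].isna().sum()) for c in mandatory},
        # Uniqueness: number of repeated values where duplicates are unwanted
        "duplicates": {c: int(df[c].duplicated().sum()) for c in unique},
        # Set membership: values that fall outside the allowed set
        "invalid": {c: sorted(set(df[c].dropna()) - set(v)) for c, v in allowed.items()},
    }

# Toy data with one missing ID, one duplicated ID and one invalid category
clients = pd.DataFrame({
    "client_id": [1, 2, 2, None],
    "age": [34, 51, 51, 29],
    "gender": ["Female", "Male", "Male", "M"],
})
report = cleansing_report(
    clients,
    mandatory=["client_id"],
    unique=["client_id"],
    allowed={"gender": ["Female", "Male", "Unknown"]},
)
print(report)
```

Foreign key membership and cross-field validation depend on the specific dataset, so they are left out of this sketch.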



### Data Integration - Technology Choice and Justification  

The data is integrated using the pandas `join` function, because all datasets are indexed by date over a common period and the combined dataset is relatively small.
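
As an illustration of this choice, the sketch below outer-joins two small frames that share a date index but have different frequencies. The frames `daily` and `weekly` and their values are made up for the example.

```python
import pandas as pd

# Toy frames: a daily price series and a sparser series dated one day earlier
daily = pd.DataFrame(
    {"price": [10.0, 10.5, 11.0]},
    index=pd.DatetimeIndex(["2019-10-14", "2019-10-15", "2019-10-16"], name="date"),
)
weekly = pd.DataFrame(
    {"trend": [80.0]},
    index=pd.DatetimeIndex(["2019-10-13"], name="date"),
)

# An outer join keeps every date found in either frame and fills the gaps with NaN
joined = daily.join(weekly, how="outer")
print(joined)
```

Dates that exist in only one source show up as `NaN` in the other source's columns, which is why the joined data needs a cleaning step afterwards.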

### Data Repository - Technology Choice and Justification  
The data is saved as CSV files in the project directory. Because the dataset is relatively small, a more advanced data repository is not needed.

#### 0. Import packages

In [367]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [368]:
#let the notebook display full length of the data columns
pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)  # None means no limit (pandas >= 1.0; older versions used -1)

#### 1. Extract Data

##### Define extract functions  
Each data source has its own extract function, so the acquisition method for one source can be changed without touching the others.

In [369]:
def extract_stock_data(name):
    '''
    read stock data from csv file
    '''
    data = pd.read_csv(name + ".csv", parse_dates=['date'], index_col='date')
    return data

In [370]:
def extract_indecis_data(name):
    '''
    read index data from csv file
    '''
    data = pd.read_csv(name + ".csv", parse_dates=['date'], index_col='date')
    return data

In [371]:
def extract_exchange_data(name):
    '''
    read exchange data from csv file
    '''
    data = pd.read_csv(name + ".csv", parse_dates=['date'], index_col='date')
    return data

In [372]:
def extract_trend_data(name):
    '''
    read trend data from csv file
    '''
    data = pd.read_csv("trend_" + name + ".csv", parse_dates=['date'], index_col='date')
    return data

In [373]:
def extract_data(stock_list=['loblaw', 'metro', 'empa', 'gwl', 'atd', 'tsx', 'sp500'],
                 indecis_list=['BCPI', 'CPI', 'bank_interest'], 
                 exchange_list=['CEER'], 
                 trend_list=['grocery_store', 'loblaws', 'stock']):

    data = pd.DataFrame()
    for stock in stock_list:
        stock_data = extract_stock_data(stock)
        stock_data.columns = [stock + '_price', stock + '_volume']
        data = data.join(stock_data, how='outer')

    for index in indecis_list:
        indecis_data = extract_indecis_data(index)
        indecis_data.columns = [index]
        data = data.join(indecis_data, how='outer')

    for exchange in exchange_list:
        exchange_data = extract_exchange_data(exchange)
        exchange_data.columns = [exchange]
        data = data.join(exchange_data, how='outer')

    for trend in trend_list:
        trend_data = extract_trend_data(trend)
        trend_data.columns = ["trend_" + trend]
        data = data.join(trend_data, how='outer')

    return data


##### Extract the data using the functions  
Inspecting the data reveals many missing values. The series start and end on different dates, and some series are reported at a lower frequency, such as weekly or monthly.
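
One way to see where the gaps come from is to list, for each column, its first and last valid date and its share of missing values. This is a minimal sketch: `coverage` is a hypothetical helper and `toy` is made-up data, not the project's dataset.

```python
import numpy as np
import pandas as pd

def coverage(df):
    """Per column: first and last valid index label and the share of missing values."""
    return pd.DataFrame({
        "first_valid": pd.Series({c: df[c].first_valid_index() for c in df.columns}),
        "last_valid": pd.Series({c: df[c].last_valid_index() for c in df.columns}),
        "missing_share": df.isna().mean(),
    })

# Toy frame: one daily series that starts a day late and one sparse series
toy = pd.DataFrame(
    {"daily": [np.nan, 72.6, 71.62, 71.5, 71.22],
     "monthly": [136.8, np.nan, np.nan, np.nan, np.nan]},
    index=pd.date_range("2019-10-14", periods=5, freq="B"),
)
summary = coverage(toy)
print(summary)
```

Calling `coverage(data_extracted)` on the real data would give the same kind of per-column summary.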

In [374]:
data_extracted = extract_data(stock_list=['loblaw', 'metro', 'empa', 'gwl', 'atd', 'tsx', 'sp500'],
                              indecis_list=['BCPI', 'CPI', 'bank_interest'], 
                              exchange_list=['CEER'], 
                              trend_list=['grocery_store', 'loblaws', 'stock'])

In [375]:
data_extracted.head()

date,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
1999-01-01,,,,,,,,,,,,,,,,,,95.43,,,
1999-01-04,,,,,,,,,,,,,,,,,,95.47,,,
1999-01-05,,,,,,,,,,,,,,,,,,95.84,,,
1999-01-06,,,,,,,,,,,,,,,,,,96.66,,,
1999-01-07,,,,,,,,,,,,,,,,,,96.46,,,


In [376]:
data_extracted.tail()

date,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
2019-10-14,,,,,,,,,,,,,2966.1499,2557020000.0,,,,,,,
2019-10-15,72.6,446100.0,56.59,627000.0,34.92,386557.0,109.3,107100.0,39.68,1290026.0,16418.4004,212333000.0,2995.6799,3340740000.0,,,,,,,
2019-10-16,71.62,716200.0,56.06,435900.0,34.7,805510.0,109.22,170900.0,39.58,1623424.0,16427.1992,165968300.0,2989.6899,3222570000.0,,,,,,,
2019-10-17,71.5,528200.0,56.23,700600.0,35.13,576685.0,109.28,120000.0,39.17,1868090.0,16426.3008,168397700.0,2997.95,3115960000.0,,,,,,,
2019-10-18,71.22,284800.0,55.95,633400.0,34.84,458759.0,108.84,72500.0,39.02,1669638.0,16377.0996,166347800.0,2986.2,3264290000.0,,,,,,,


In [377]:
data_extracted.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5495 entries, 1999-01-01 to 2019-10-18
Data columns (total 21 columns):
loblaw_price           5019 non-null float64
loblaw_volume          5019 non-null float64
metro_price            5020 non-null float64
metro_volume           5020 non-null float64
empa_price             5020 non-null float64
empa_volume            5020 non-null float64
gwl_price              5019 non-null float64
gwl_volume             5019 non-null float64
atd_price              5020 non-null float64
atd_volume             5020 non-null float64
tsx_price              5020 non-null float64
tsx_volume             5020 non-null float64
sp500_price            4981 non-null float64
sp500_volume           4981 non-null float64
BCPI                   1030 non-null float64
CPI                    237 non-null float64
bank_interest          1030 non-null float64
CEER                   5421 non-null float64
trend_grocery_store    190 non-null float64
trend_loblaws         

In [378]:
data_extracted.describe()

,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
count,5019.0,5019.0,5020.0,5020.0,5020.0,5020.0,5019.0,5019.0,5020.0,5020.0,5020.0,5020.0,4981.0,4981.0,1030.0,237.0,1030.0,5421.0,190.0,190.0,190.0
mean,35.579635,529316.3,16.246786,702454.8,13.307532,228154.3,70.817707,115687.1,9.950892,2032111.0,11835.349384,172011600.0,1559.498333,3087279000.0,474.326641,115.605907,4.006359,119.529555,51.421053,38.878947,26.221053
std,12.863246,592203.5,15.055826,805103.2,8.321394,297472.5,21.306746,103432.5,12.264966,2337872.0,2967.700788,75760970.0,567.90356,1478798000.0,133.807475,11.783501,1.397609,13.244301,15.005169,8.555519,20.481081
min,0.0,0.0,1.1017,0.0,0.0,0.0,0.0,0.0,0.2445,0.0,0.0,0.0,676.53,356070000.0,222.7,93.5,2.25,95.43,34.0,24.0,11.0
25%,26.28925,251200.0,4.063425,346500.0,5.5083,60576.75,55.7085,58350.0,1.5241,952842.0,9276.5996,123839100.0,1149.5,1662000000.0,365.9825,105.4,3.0,110.3,41.25,35.0,14.0
50%,32.5224,414100.0,8.98675,545550.0,11.4928,147853.5,68.3369,92000.0,2.7479,1562724.0,12262.3003,172094300.0,1355.8101,3212320000.0,451.6,115.6,3.7,118.33,46.0,38.0,17.0
75%,44.0486,644400.0,23.034,854175.0,20.067175,301813.5,87.77395,139300.0,17.3324,2369748.0,14201.12475,213550000.0,1966.97,3919240000.0,580.9875,125.7,4.75,131.31,60.0,41.0,26.75
max,75.77,14825200.0,58.52,31572000.0,37.2337,5316156.0,117.9562,1635000.0,43.7946,65852270.0,16899.6992,858888100.0,3025.8601,11456230000.0,911.94,137.0,7.5,150.48,100.0,100.0,100.0



    
#### 2. Transform the data

In [379]:
def transform_data(data_extracted,
                              stock_list=['loblaw', 'metro', 'empa', 'gwl', 'atd', 'tsx', 'sp500'],
                              indecis_list=['BCPI', 'CPI', 'bank_interest'], 
                              exchange_list=['CEER'], 
                              trend_list=['grocery_store', 'loblaws', 'stock']):
    '''
    Transform the data into a usable form: resample to business days,
    interpolate missing values, slice the date range, and drop invalid rows.
    '''
    data_extracted = data_extracted.resample('B').interpolate()

    data_extracted = data_extracted['2003':]

    data_extracted = data_extracted[1:]

    # Rows with a zero price or volume are treated as invalid and dropped.
    # (An earlier version replaced each zero with the mean of the nearest
    # non-zero values before and after it.)
    value_columns = [column for column in data_extracted.columns
                     if ('price' in column) or ('volume' in column)]
    has_zero = (data_extracted[value_columns] == 0).any(axis=1)
    data_extracted = data_extracted[~has_zero]

    return data_extracted


In [380]:
data_transformed = transform_data(data_extracted)
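
To make the two main steps of the transformation easier to follow, here is a small sketch on made-up data. The frame `raw` and its values are assumptions for illustration only. It resamples to business days, interpolates linearly, and then drops the rows where the price is zero.

```python
import pandas as pd

# Toy input: a zero price on 15 Oct and a CPI value reported only on two dates
raw = pd.DataFrame(
    {"price": [10.0, 0.0, 13.0], "CPI": [100.0, None, 104.0]},
    index=pd.to_datetime(["2019-10-14", "2019-10-15", "2019-10-18"]),
)

# Business-day grid (Mon 14 Oct to Fri 18 Oct); missing values are filled linearly
filled = raw.resample("B").interpolate()

# Interpolation only fills NaN, so the zero price is still there and is dropped here
has_zero = (filled[["price"]] == 0).any(axis=1)
cleaned = filled[~has_zero]
print(cleaned)
```

Because interpolation runs before the zeros are removed, values interpolated next to a zero are computed from that zero. This ordering matches `transform_data` above.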

In [381]:
data_extracted.describe()

,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
count,5019.0,5019.0,5020.0,5020.0,5020.0,5020.0,5019.0,5019.0,5020.0,5020.0,5020.0,5020.0,4981.0,4981.0,1030.0,237.0,1030.0,5421.0,190.0,190.0,190.0
mean,35.579635,529316.3,16.246786,702454.8,13.307532,228154.3,70.817707,115687.1,9.950892,2032111.0,11835.349384,172011600.0,1559.498333,3087279000.0,474.326641,115.605907,4.006359,119.529555,51.421053,38.878947,26.221053
std,12.863246,592203.5,15.055826,805103.2,8.321394,297472.5,21.306746,103432.5,12.264966,2337872.0,2967.700788,75760970.0,567.90356,1478798000.0,133.807475,11.783501,1.397609,13.244301,15.005169,8.555519,20.481081
min,0.0,0.0,1.1017,0.0,0.0,0.0,0.0,0.0,0.2445,0.0,0.0,0.0,676.53,356070000.0,222.7,93.5,2.25,95.43,34.0,24.0,11.0
25%,26.28925,251200.0,4.063425,346500.0,5.5083,60576.75,55.7085,58350.0,1.5241,952842.0,9276.5996,123839100.0,1149.5,1662000000.0,365.9825,105.4,3.0,110.3,41.25,35.0,14.0
50%,32.5224,414100.0,8.98675,545550.0,11.4928,147853.5,68.3369,92000.0,2.7479,1562724.0,12262.3003,172094300.0,1355.8101,3212320000.0,451.6,115.6,3.7,118.33,46.0,38.0,17.0
75%,44.0486,644400.0,23.034,854175.0,20.067175,301813.5,87.77395,139300.0,17.3324,2369748.0,14201.12475,213550000.0,1966.97,3919240000.0,580.9875,125.7,4.75,131.31,60.0,41.0,26.75
max,75.77,14825200.0,58.52,31572000.0,37.2337,5316156.0,117.9562,1635000.0,43.7946,65852270.0,16899.6992,858888100.0,3025.8601,11456230000.0,911.94,137.0,7.5,150.48,100.0,100.0,100.0


In [382]:
data_transformed.describe()

,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
count,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4104.0,4104.0,4104.0
mean,37.109403,569422.0,18.940323,719927.0,15.265296,248076.1,73.002794,120993.5,11.782132,2197581.0,12590.821939,186065500.0,1624.011852,3407804000.0,506.435424,118.977643,3.682266,124.289658,51.271107,38.847096,26.519336
std,13.029652,579191.1,15.023725,826194.1,7.693179,284101.9,20.707893,104031.6,12.59911,2227285.0,2530.510741,71232670.0,587.518266,1349572000.0,119.183529,9.740415,1.124192,10.070194,14.764096,8.278884,20.730267
min,16.836,24500.0,2.9861,11400.0,3.8082,2385.0,33.2111,5000.0,0.5754,65598.0,6228.6001,182400.0,676.53,487220000.0,277.38,102.00625,2.25,96.7,34.0,24.0,11.0
25%,26.480875,292625.0,6.716625,372600.0,8.646075,78748.5,56.3876,63800.0,2.166825,1180112.0,11423.9004,146595300.0,1182.6,2566928000.0,407.7325,111.606395,2.9925,116.29,41.314723,35.603488,14.046512
50%,34.1724,455300.0,12.10315,563600.0,14.22745,176817.0,69.6216,96150.0,3.7431,1680761.0,12813.75,184449800.0,1400.035,3400680000.0,498.376,119.85,3.0,123.42,46.0,38.043478,16.608696
75%,48.450775,682000.0,33.40475,860025.0,21.2398,326401.5,92.7288,144325.0,25.90615,2493138.0,14594.17505,221273400.0,2060.9975,4066398000.0,605.2085,126.986379,4.25,133.8325,59.965909,40.627907,25.662285
max,75.77,14825200.0,58.52,31572000.0,37.2337,4790751.0,117.9562,1635000.0,43.7946,65852270.0,16899.6992,858888100.0,3025.8601,11456230000.0,911.94,137.0,6.25,150.48,100.0,100.0,100.0


In [383]:
data_transformed.head()

date,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
2003-01-02,31.6037,134900.0,3.1325,101400.0,5.1538,17982.0,65.4457,133000.0,0.7302,296784.0,6740.1001,76406700.0,909.03,1229200000.0,305.898,102.00625,4.5,96.7,,,
2003-01-03,31.3404,48500.0,3.1501,273600.0,5.1797,34761.0,65.367,9300.0,0.7091,2904300.0,6772.7002,74780400.0,908.59,1130800000.0,306.836,102.0125,4.5,97.08,,,
2003-01-06,31.5452,232700.0,3.1361,161700.0,5.1971,64512.0,65.5529,99400.0,0.7036,541584.0,6837.2998,142266300.0,929.01,1435900000.0,307.774,102.01875,4.5,97.34,,,
2003-01-07,31.6037,842600.0,3.1501,945900.0,5.2317,366657.0,65.7675,54400.0,0.7004,685632.0,6802.7998,150351200.0,922.93,1545200000.0,308.712,102.025,4.5,97.42,,,
2003-01-08,31.5452,423900.0,3.1325,301800.0,5.1624,179112.0,65.4814,227400.0,0.6759,1697196.0,6723.1001,145587400.0,909.93,1467600000.0,309.65,102.03125,4.5,97.21,,,


In [384]:
data_transformed.tail()

date,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
2019-10-14,72.805,437050.0,56.5,505550.0,35.025,501072.0,109.77,104600.0,39.505,1352527.0,16416.7998,206539800.0,2966.1499,2557020000.0,432.33,136.8,3.95,117.75,80.0,34.0,64.0
2019-10-15,72.6,446100.0,56.59,627000.0,34.92,386557.0,109.3,107100.0,39.68,1290026.0,16418.4004,212333000.0,2995.6799,3340740000.0,432.33,136.8,3.95,117.75,80.0,34.0,64.0
2019-10-16,71.62,716200.0,56.06,435900.0,34.7,805510.0,109.22,170900.0,39.58,1623424.0,16427.1992,165968300.0,2989.6899,3222570000.0,432.33,136.8,3.95,117.75,80.0,34.0,64.0
2019-10-17,71.5,528200.0,56.23,700600.0,35.13,576685.0,109.28,120000.0,39.17,1868090.0,16426.3008,168397700.0,2997.95,3115960000.0,432.33,136.8,3.95,117.75,80.0,34.0,64.0
2019-10-18,71.22,284800.0,55.95,633400.0,34.84,458759.0,108.84,72500.0,39.02,1669638.0,16377.0996,166347800.0,2986.2,3264290000.0,432.33,136.8,3.95,117.75,80.0,34.0,64.0


In [385]:
data_transformed.to_csv('data_transformed.csv')

#### 3. Load the Data

In [386]:
def load_data(data_transformed_filename):
    '''
    read the transformed data back from the csv file
    '''
    data_loaded = pd.read_csv(data_transformed_filename, parse_dates=['date'], index_col='date')

    return data_loaded

In [387]:
data_loaded = load_data("data_transformed.csv")
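
A quick way to confirm that saving to CSV and reading it back loses nothing is a round-trip check. The sketch below uses a made-up two-row frame and a temporary file rather than the project's `data_transformed.csv`.

```python
import os
import tempfile

import pandas as pd

# Toy frame with a named date index, mirroring the layout of the project data
frame = pd.DataFrame(
    {"price": [71.5, 71.22], "volume": [528200.0, 284800.0]},
    index=pd.DatetimeIndex(["2019-10-17", "2019-10-18"], name="date"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.csv")
    frame.to_csv(path)  # write the index as the first column
    restored = pd.read_csv(path, parse_dates=["date"], index_col="date")

# Compare labels and values of the frame read back against the original
same_index = list(restored.index) == list(frame.index)
same_values = bool((restored.values == frame.values).all())
print(same_index, same_values)
```

The same comparison could be run between `data_transformed` and `data_loaded`. Missing values would need separate handling there, because `NaN == NaN` evaluates to `False`.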

In [388]:
data_loaded.head()

date,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
2003-01-02,31.6037,134900.0,3.1325,101400.0,5.1538,17982.0,65.4457,133000.0,0.7302,296784.0,6740.1001,76406700.0,909.03,1229200000.0,305.898,102.00625,4.5,96.7,,,
2003-01-03,31.3404,48500.0,3.1501,273600.0,5.1797,34761.0,65.367,9300.0,0.7091,2904300.0,6772.7002,74780400.0,908.59,1130800000.0,306.836,102.0125,4.5,97.08,,,
2003-01-06,31.5452,232700.0,3.1361,161700.0,5.1971,64512.0,65.5529,99400.0,0.7036,541584.0,6837.2998,142266300.0,929.01,1435900000.0,307.774,102.01875,4.5,97.34,,,
2003-01-07,31.6037,842600.0,3.1501,945900.0,5.2317,366657.0,65.7675,54400.0,0.7004,685632.0,6802.7998,150351200.0,922.93,1545200000.0,308.712,102.025,4.5,97.42,,,
2003-01-08,31.5452,423900.0,3.1325,301800.0,5.1624,179112.0,65.4814,227400.0,0.6759,1697196.0,6723.1001,145587400.0,909.93,1467600000.0,309.65,102.03125,4.5,97.21,,,


In [389]:
data_loaded.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4356 entries, 2003-01-02 to 2019-10-18
Data columns (total 21 columns):
loblaw_price           4356 non-null float64
loblaw_volume          4356 non-null float64
metro_price            4356 non-null float64
metro_volume           4356 non-null float64
empa_price             4356 non-null float64
empa_volume            4356 non-null float64
gwl_price              4356 non-null float64
gwl_volume             4356 non-null float64
atd_price              4356 non-null float64
atd_volume             4356 non-null float64
tsx_price              4356 non-null float64
tsx_volume             4356 non-null float64
sp500_price            4356 non-null float64
sp500_volume           4356 non-null float64
BCPI                   4356 non-null float64
CPI                    4356 non-null float64
bank_interest          4356 non-null float64
CEER                   4356 non-null float64
trend_grocery_store    4104 non-null float64
trend_loblaws       

In [390]:
data_loaded.describe()

,loblaw_price,loblaw_volume,metro_price,metro_volume,empa_price,empa_volume,gwl_price,gwl_volume,atd_price,atd_volume,tsx_price,tsx_volume,sp500_price,sp500_volume,BCPI,CPI,bank_interest,CEER,trend_grocery_store,trend_loblaws,trend_stock
count,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4356.0,4104.0,4104.0,4104.0
mean,37.109403,569422.0,18.940323,719927.0,15.265296,248076.1,73.002794,120993.5,11.782132,2197581.0,12590.821939,186065500.0,1624.011852,3407804000.0,506.435424,118.977643,3.682266,124.289658,51.271107,38.847096,26.519336
std,13.029652,579191.1,15.023725,826194.1,7.693179,284101.9,20.707893,104031.6,12.59911,2227285.0,2530.510741,71232670.0,587.518266,1349572000.0,119.183529,9.740415,1.124192,10.070194,14.764096,8.278884,20.730267
min,16.836,24500.0,2.9861,11400.0,3.8082,2385.0,33.2111,5000.0,0.5754,65598.0,6228.6001,182400.0,676.53,487220000.0,277.38,102.00625,2.25,96.7,34.0,24.0,11.0
25%,26.480875,292625.0,6.716625,372600.0,8.646075,78748.5,56.3876,63800.0,2.166825,1180112.0,11423.9004,146595300.0,1182.6,2566928000.0,407.7325,111.606395,2.9925,116.29,41.314723,35.603488,14.046512
50%,34.1724,455300.0,12.10315,563600.0,14.22745,176817.0,69.6216,96150.0,3.7431,1680761.0,12813.75,184449800.0,1400.035,3400680000.0,498.376,119.85,3.0,123.42,46.0,38.043478,16.608696
75%,48.450775,682000.0,33.40475,860025.0,21.2398,326401.5,92.7288,144325.0,25.90615,2493138.0,14594.17505,221273400.0,2060.9975,4066398000.0,605.2085,126.986379,4.25,133.8325,59.965909,40.627907,25.662285
max,75.77,14825200.0,58.52,31572000.0,37.2337,4790751.0,117.9562,1635000.0,43.7946,65852270.0,16899.6992,858888100.0,3025.8601,11456230000.0,911.94,137.0,6.25,150.48,100.0,100.0,100.0
