# Data Preparation


This notebook contains the whole data preparation process. It starts by loading the data into pandas DataFrames and profiling it to get a preview of the problem domain. Then I compute some correlations among the datasets and visualize some of their characteristics before proceeding with feature engineering. In addition, we add moving averages and other technical trading indicators. At the end, we might consider a PCA analysis to reduce the number of features.

## Import Data and Transform It into Consistent Pandas DataFrames

In [1]:
import numpy as np
import pandas as pd
import pickle
from datetime import datetime
from collections import namedtuple
from IPython.display import display, HTML
import functools
%matplotlib inline
# Optional, for interactive plots:
# %matplotlib notebook

In [2]:
# Accounting format for floats for pandas.

pd.options.display.float_format = '{:,.2f}'.format

The raw files are in the folder raw_data. There are 8 files in total, from the exchanges Buda (Peru, Colombia and Chile) and MercadoBitcoin (Brazil). The following cells load each file into a pandas DataFrame.

In [3]:
def dateparse_buda(time_in_ms):
    # Buda timestamps are epoch milliseconds; convert them to a (local-time) datetime.
    return datetime.fromtimestamp(float(time_in_ms)/1000)
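The division by 1000 in dateparse_buda indicates that Buda stores its datetime column as epoch milliseconds. Newer pandas releases (2.0 and later) deprecate the date_parser argument of read_csv, so a minimal sketch of an alternative, assuming that millisecond format, converts the column after loading with pd.to_datetime(unit='ms'). The inline CSV is made-up sample data. Note that this produces naive UTC timestamps, whereas datetime.fromtimestamp converts to the machine's local time zone, so the hours can differ from the ones shown later in this notebook.

```python
import io

import pandas as pd

# Made-up sample in the assumed Buda layout: 'datetime' holds epoch milliseconds.
raw_csv = io.StringIO(
    "datetime,open,high,low,close,volume\n"
    "1520726400000,100.0,110.0,95.0,105.0,0.5\n"
    "1520755200000,105.0,108.0,101.0,102.0,0.2\n"
)

df = pd.read_csv(raw_csv)
# Interpret the integers as milliseconds since the Unix epoch (result is naive UTC).
df["datetime"] = pd.to_datetime(df["datetime"], unit="ms")
df = df.set_index("datetime")

print(df.index[0])  # 2018-03-11 00:00:00
```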

In [4]:
df_btc_clp=pd.read_csv('raw_data/buda_btc-clp_20161101_20180312.csv',index_col='datetime',parse_dates=True, date_parser=dateparse_buda)
df_btc_cop=pd.read_csv('raw_data/buda_btc-cop_20161101_20180312.csv',index_col='datetime',parse_dates=True,date_parser=dateparse_buda)
df_btc_pen=pd.read_csv('raw_data/buda_btc-pen_20161101_20180312.csv',index_col='datetime',parse_dates=True,date_parser=dateparse_buda)
df_eth_btc=pd.read_csv('raw_data/buda_eth-btc_20170701_20180312.csv',index_col='datetime',parse_dates=True,date_parser=dateparse_buda)
df_eth_clp=pd.read_csv('raw_data/buda_eth-clp_20170501_20180312.csv',index_col='datetime',parse_dates=True,date_parser=dateparse_buda)
df_eth_cop=pd.read_csv('raw_data/buda_eth-cop_20170601_20180312.csv',index_col='datetime',parse_dates=True,date_parser=dateparse_buda)
df_eth_pen=pd.read_csv('raw_data/buda_eth-pen_20170601_20180312.csv',index_col='datetime',parse_dates=True,date_parser=dateparse_buda)
df_btc_brl=pd.read_csv('raw_data/mercadobitcoin_BTC_20130612_20180312.csv',index_col='date',parse_dates=True)
Dataset=namedtuple('Dataset','exchange df')

datasets={'btc_clp':Dataset('Buda',df_btc_clp),
          'btc_cop':Dataset('Buda',df_btc_cop),
          'btc_pen':Dataset('Buda',df_btc_pen),
          'eth_btc':Dataset('Buda',df_eth_btc),
          'eth_clp':Dataset('Buda',df_eth_clp),
          'eth_cop':Dataset('Buda',df_eth_cop),
          'eth_pen':Dataset('Buda',df_eth_pen),
          'btc_brl':Dataset('MercadoBitcoin',df_btc_brl)}


In [5]:
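def all_datasets(func,datasets):
    # Apply func to every dataset, show which dataset and step is running,
    # and store the (possibly new) Dataset back under the same key.
    for key,dataset in datasets.items():
        display(HTML('<h3>'+key+'</h3>'))
        print(func.__name__)
        datasets[key]=func(dataset)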

In [6]:
def rename_mercado_bitcoin(dataset):
    # Align the MercadoBitcoin column and index names with the Buda files.
    if dataset.exchange=='MercadoBitcoin':
        columns_standard={'opening':'open'
                          , 'closing':'close'
                          , 'highest':'high'
                          , 'lowest':'low'}
        dataset.df.rename(columns=columns_standard,inplace=True)
        dataset.df.index.rename('datetime',inplace=True)
    return dataset

all_datasets(rename_mercado_bitcoin,datasets)


On the Buda exchange there are multiple entries per day. Therefore the data has to be resampled to one row per day to be consistent with MercadoBitcoin, which already provides daily data.
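The per-day rule used below (first open, last close, highest high, lowest low, summed volume) can also be written as a single resample().agg call. A minimal sketch on made-up intraday data follows. One difference to keep in mind: the 'first' and 'last' aggregations skip NaN values, whereas taking the first and last entry by position does not.

```python
import pandas as pd

# Made-up intraday candles: two entries on March 10, none on March 11, one on March 12.
index = pd.to_datetime(["2018-03-10 05:00", "2018-03-10 13:00", "2018-03-12 21:00"])
intraday = pd.DataFrame(
    {
        "open": [10.0, 12.0, 15.0],
        "high": [13.0, 14.0, 16.0],
        "low": [9.0, 11.0, 14.0],
        "close": [12.0, 13.0, 15.5],
        "volume": [0.5, 0.25, 1.0],
    },
    index=index,
)

# One row per calendar day; days without entries get NaN prices.
daily = intraday.resample("D").agg(
    {"open": "first", "close": "last", "high": "max", "low": "min", "volume": "sum"}
)
print(daily)
```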

In [7]:
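# Count the Buda entries per day (several on some days, none on others).
df_btc_cop.resample(rule='D').count().tail(50)
# Inspect the raw entries of a single day.
df_btc_cop.loc['2018-03-11']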

| datetime | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2018-03-11 05:00:00 | 24300700.0 | 25499990.0 | 24300700.0 | 24300707.0 | 0.36 |
| 2018-03-11 13:00:00 | 25499987.0 | 25796997.0 | 24900001.0 | 25796997.0 | 0.21 |
| 2018-03-11 21:00:00 | 25400000.0 | 25400000.0 | 25400000.0 | 25400000.0 | 0.05 |


In [8]:
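def first_entry(entry):
    # First value of a resampled bin by position; NaN for empty bins (days without trades).
    if entry.size==0:
        return np.nan
    return entry.iloc[0]

def last_entry(entry):
    # Last value of a resampled bin by position; NaN for empty bins.
    if entry.size==0:
        return np.nan
    return entry.iloc[-1]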


In [9]:
def fix_daily(dataset):
    # Buda reports several entries per day: aggregate them into one daily OHLCV row.
    if dataset.exchange=='Buda':
        open_series=dataset.df['open'].resample(rule='D').apply(first_entry)    # first open of the day
        close_series=dataset.df['close'].resample(rule='D').apply(last_entry)   # last close of the day
        high_series=dataset.df['high'].resample(rule='D').max()                 # highest high
        low_series=dataset.df['low'].resample(rule='D').min()                   # lowest low
        volume_series=dataset.df['volume'].resample(rule='D').sum()             # total volume
        # All series share the same daily index, so they can be joined column-wise.
        dataframe_daily=pd.concat([open_series,close_series,high_series,low_series,volume_series],axis=1)
        dataset=Dataset('Buda',dataframe_daily)
    return dataset



In [10]:
all_datasets(fix_daily,datasets)


### Fill Missing Values

Some days have no entries at all, so all of their values are missing. The closing price is filled with the pandas forward fill (the last known close is carried forward), and any gap at the very start is back-filled. The open, high and low of an empty day are set to that filled close, that is, the close of the most recent previous day with data. The volume of an empty day is set to 0.
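The filling rule can be checked on a small made-up frame with two empty days. This sketch uses Series.ffill and Series.bfill and assigns the results back, instead of calling fillna(method=..., inplace=True) on a selected column, which newer pandas versions deprecate.

```python
import numpy as np
import pandas as pd

# Made-up daily candles with two empty days in the middle.
days = pd.date_range("2020-01-01", periods=4, freq="D")
df = pd.DataFrame(
    {
        "open": [100.0, np.nan, np.nan, 90.0],
        "close": [105.0, np.nan, np.nan, 92.0],
        "high": [106.0, np.nan, np.nan, 95.0],
        "low": [99.0, np.nan, np.nan, 89.0],
        "volume": [2.0, np.nan, np.nan, 3.0],
    },
    index=days,
)

# Carry the last known close forward; back-fill a possible gap at the very start.
df["close"] = df["close"].ffill().bfill()
# Empty days become a flat candle at the carried-forward close.
for column in ["open", "high", "low"]:
    df[column] = df[column].fillna(df["close"])
# No trades means no volume.
df["volume"] = df["volume"].fillna(0)
print(df)
```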

In [11]:

datasets['btc_cop'].df.head(10)

| datetime | open | close | high | low | volume |
|---|---|---|---|---|---|
| 2016-10-31 | 2099950.0 | 2100000.0 | 2100000.0 | 2099950.0 | 2.28 |
| 2016-11-01 | 2115664.74 | 2100541.0 | 2115664.74 | 2100541.0 | 0.54 |
| 2016-11-02 | 2200000.0 | 2200000.0 | 2200000.0 | 2200000.0 | 0.86 |
| 2016-11-03 | 2250000.0 | 2038008.07 | 2250000.0 | 2026909.28 | 4.66 |
| 2016-11-04 | 2173361.5 | 2192224.31 | 2192224.31 | 2001026.82 | 2.76 |
| 2016-11-05 | NaN | NaN | NaN | NaN | NaN |
| 2016-11-06 | NaN | NaN | NaN | NaN | NaN |
| 2016-11-07 | NaN | NaN | NaN | NaN | NaN |
| 2016-11-08 | 2040100.23 | 2030280.91 | 2041656.9 | 2021216.07 | 3.52 |
| 2016-11-09 | 2211234.0 | 2210750.0 | 2231104.84 | 2210000.0 | 2.53 |


In [12]:
def fill_nulls(dataset):
    df=dataset.df
    # Carry the last known close forward; back-fill any gap at the very start.
    df['close']=df['close'].ffill().bfill()
    # Days without trades: open/high/low collapse to the carried-forward close.
    for column in ['open','high','low']:
        df[column]=df[column].fillna(df['close'])
    # No trades means no volume.
    df['volume']=df['volume'].fillna(0)
    return dataset

all_datasets(fill_nulls,datasets)

datasets['btc_cop'].df.head(10)


| datetime | open | close | high | low | volume |
|---|---|---|---|---|---|
| 2016-10-31 | 2099950.0 | 2100000.0 | 2100000.0 | 2099950.0 | 2.28 |
| 2016-11-01 | 2115664.74 | 2100541.0 | 2115664.74 | 2100541.0 | 0.54 |
| 2016-11-02 | 2200000.0 | 2200000.0 | 2200000.0 | 2200000.0 | 0.86 |
| 2016-11-03 | 2250000.0 | 2038008.07 | 2250000.0 | 2026909.28 | 4.66 |
| 2016-11-04 | 2173361.5 | 2192224.31 | 2192224.31 | 2001026.82 | 2.76 |
| 2016-11-05 | 2192224.31 | 2192224.31 | 2192224.31 | 2192224.31 | 0.0 |
| 2016-11-06 | 2192224.31 | 2192224.31 | 2192224.31 | 2192224.31 | 0.0 |
| 2016-11-07 | 2192224.31 | 2192224.31 | 2192224.31 | 2192224.31 | 0.0 |
| 2016-11-08 | 2040100.23 | 2030280.91 | 2041656.9 | 2021216.07 | 3.52 |
| 2016-11-09 | 2211234.0 | 2210750.0 | 2231104.84 | 2210000.0 | 2.53 |


### Profiling Data

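Describe each dataset after the daily resampling and filling: its date range, the dates on which each column reaches its minimum and maximum, and summary statistics.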

In [13]:
# Raw MercadoBitcoin columns (before renaming):
# lowest, volume, amount, avg_price, opening, closing, highest, quantity
def describe_dataset(dataset):
    display(HTML('<H1>Range Dates</H1>'))
    display(HTML('<H3>Min: '+dataset.df.index.min().strftime('%Y-%m-%d')+'</H3>'))
    display(HTML('<H3>Max: '+dataset.df.index.max().strftime('%Y-%m-%d')+'</H3>'))
    # Date on which each column reaches its minimum / maximum.
    display(HTML('<H1>Min</H1>'))
    print(dataset.df.idxmin())
    display(HTML('<H1>Max</H1>'))
    print(dataset.df.idxmax())
    display(HTML(dataset.df.describe().to_html()))
    return dataset

all_datasets(describe_dataset,datasets)

describe_dataset


open     2016-11-04
close    2016-11-12
high     2016-11-07
low      2016-11-03
volume   2016-12-31
dtype: datetime64[ns]


open     2017-12-18
close    2017-12-17
high     2017-12-07
low      2017-12-17
volume   2017-03-10
dtype: datetime64[ns]


|  | open | close | high | low | volume |
|---|---|---|---|---|---|
| count | 497.0 | 497.0 | 497.0 | 497.0 | 497.0 |
| mean | 3031204.69 | 3042093.59 | 3185056.81 | 2859642.35 | 34.9 |
| std | 3012846.52 | 3020119.25 | 3199563.2 | 2793238.47 | 29.0 |
| min | 465164.77 | 463059.45 | 472933.11 | 449879.51 | 0.0 |
| 25% | 740000.0 | 740000.0 | 774500.0 | 717653.89 | 15.81 |
| 50% | 1799000.0 | 1797000.0 | 1860000.0 | 1726270.0 | 29.2 |
| 75% | 4549808.11 | 4601423.13 | 4875000.0 | 4280101.0 | 43.27 |
| max | 13794500.0 | 13794997.0 | 14200000.0 | 13000000.0 | 233.56 |


describe_dataset


open     2016-11-08
close    2016-11-08
high     2016-11-08
low      2017-07-12
volume   2016-11-05
dtype: datetime64[ns]


open     2017-12-17
close    2017-12-16
high     2017-05-05
low      2017-12-16
volume   2017-10-10
dtype: datetime64[ns]


|  | open | close | high | low | volume |
|---|---|---|---|---|---|
| count | 497.0 | 497.0 | 497.0 | 497.0 | 497.0 |
| mean | 12878911.58 | 12956022.67 | 33782208.3 | 12165157.75 | 4.42 |
| std | 12787751.52 | 12873744.45 | 448155054.3 | 12026074.87 | 4.66 |
| min | 2040100.23 | 2030280.91 | 2041656.9 | 30200.0 | 0.0 |
| 25% | 3185544.16 | 3145147.15 | 3370685.23 | 3000487.6 | 0.96 |
| 50% | 7499999.0 | 7419000.0 | 7838507.9 | 7010002.0 | 2.83 |
| 75% | 19000000.0 | 19202001.0 | 20582980.0 | 18001000.0 | 6.49 |
| max | 51253009.0 | 51252704.0 | 10000000000.0 | 48400000.0 | 31.25 |


describe_dataset


open     2017-09-30
close    2017-09-29
high     2017-09-30
low      2017-10-24
volume   2017-08-15
dtype: datetime64[ns]


open     2017-12-07
close    2017-12-06
high     2017-12-06
low      2017-12-24
volume   2018-02-05
dtype: datetime64[ns]


|  | open | close | high | low | volume |
|---|---|---|---|---|---|
| count | 210.0 | 210.0 | 210.0 | 210.0 | 210.0 |
| mean | 29861.78 | 29952.57 | 33699.85 | 26675.23 | 0.69 |
| std | 23458.48 | 23680.31 | 31056.47 | 13270.35 | 1.14 |
| min | 9500.0 | 9500.0 | 9500.0 | 1.12 | 0.0 |
| 25% | 15134.38 | 15221.1 | 16012.48 | 14449.05 | 0.08 |
| 50% | 27008.82 | 27089.53 | 28837.22 | 25949.94 | 0.37 |
| 75% | 37175.36 | 37090.65 | 39525.93 | 35182.31 | 0.81 |
| max | 299997.0 | 299997.0 | 299997.0 | 58997.27 | 11.3 |


describe_dataset


open     2017-12-07
close    2017-11-27
high     2017-11-30
low      2017-11-06
volume   2017-07-23
dtype: datetime64[ns]


open     2018-02-01
close    2018-01-31
high     2017-12-21
low      2018-01-31
volume   2017-12-07
dtype: datetime64[ns]


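|  | open | close | high | low | volume |
|---|---|---|---|---|---|
| count | 240.0 | 240.0 | 240.0 | 240.0 | 240.0 |
| mean | 0.07 | 0.07 | 0.07 | 0.07 | 12.6 |
| std | 0.02 | 0.02 | 0.02 | 0.02 | 16.69 |
| min | 0.02 | 0.01 | 0.02 | 0.0 | 0.0 |
| 25% | 0.05 | 0.05 | 0.06 | 0.05 | 1.74 |
| 50% | 0.07 | 0.07 | 0.08 | 0.07 | 5.15 |
| 75% | 0.08 | 0.08 | 0.09 | 0.08 | 16.97 |
| max | 0.12 | 0.11 | 0.17 | 0.11 | 84.71 |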


describe_dataset


open     2017-07-16
close    2017-07-15
high     2017-07-16
low      2017-07-16
volume   2018-02-28
dtype: datetime64[ns]


open     2018-01-15
close    2018-01-14
high     2018-01-10
low      2018-01-14
volume   2017-12-13
dtype: datetime64[ns]


|  | open | close | high | low | volume |
|---|---|---|---|---|---|
| count | 252.0 | 252.0 | 252.0 | 252.0 | 252.0 |
| mean | 339486.84 | 340010.21 | 360639.79 | 317707.01 | 183.68 |
| std | 198466.29 | 198342.36 | 212333.23 | 181707.91 | 169.3 |
| min | 100867.67 | 100869.98 | 128345.0 | 100524.23 | 0.34 |
| 25% | 194160.06 | 195000.0 | 201375.0 | 189000.0 | 70.19 |
| 50% | 223690.0 | 225000.0 | 232000.0 | 210556.05 | 135.55 |
| 75% | 517356.25 | 519059.0 | 543440.18 | 493252.25 | 231.51 |
| max | 896000.0 | 896000.0 | 934999.0 | 842800.0 | 1043.09 |


describe_dataset


open     2017-09-15
close    2017-09-25
high     2017-09-14
low      2017-11-02
volume   2017-09-30
dtype: datetime64[ns]


open     2018-01-15
close    2018-01-13
high     2018-01-10
low      2018-01-14
volume   2018-01-17
dtype: datetime64[ns]


|  | open | close | high | low | volume |
|---|---|---|---|---|---|
| count | 213.0 | 213.0 | 213.0 | 213.0 | 213.0 |
| mean | 1649724.04 | 1651486.47 | 1810637.36 | 1505734.45 | 30.85 |
| std | 918667.78 | 926660.79 | 1066928.95 | 854998.13 | 31.15 |
| min | 701420.0 | 510001.0 | 794626.32 | 98000.0 | 0.01 |
| 25% | 854884.0 | 854884.0 | 912386.0 | 801000.0 | 9.85 |
| 50% | 1200012.0 | 1249999.98 | 1388000.0 | 1100101.0 | 21.74 |
| 75% | 2328000.0 | 2349994.99 | 2520000.0 | 2178000.0 | 41.01 |
| max | 4100000.0 | 4068448.91 | 6000000.0 | 3854163.93 | 184.19 |


describe_dataset


open     2017-10-24
close    2017-11-06
high     2017-09-14
low      2017-10-24
volume   2017-08-19
dtype: datetime64[ns]


open     2017-12-23
close    2017-12-15
high     2018-01-21
low      2017-12-23
volume   2018-02-15
dtype: datetime64[ns]


|  | open | close | high | low | volume |
|---|---|---|---|---|---|
| count | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 |
| mean | 1956.52 | 1994.46 | 2219.63 | 1853.97 | 3.08 |
| std | 1062.31 | 1105.04 | 1391.74 | 996.39 | 5.0 |
| min | 300.0 | 654.01 | 830.0 | 1.01 | 0.0 |
| 25% | 1028.31 | 1047.88 | 1050.75 | 1017.65 | 0.1 |
| 50% | 1463.06 | 1447.0 | 1564.5 | 1308.92 | 0.93 |
| 75% | 2779.73 | 2858.75 | 3104.41 | 2717.64 | 3.98 |
| max | 4994.0 | 4997.0 | 8000.0 | 4994.0 | 37.77 |


describe_dataset


low         2018-02-16
volume      2013-08-18
amount      2013-08-18
avg_price   2013-07-09
open        2013-07-04
close       2013-07-03
high        2013-07-10
quantity    2013-08-18
dtype: datetime64[ns]


low         2017-12-18
volume      2017-12-07
amount      2017-12-07
avg_price   2017-12-18
open        2017-12-19
close       2017-12-18
high        2017-12-17
quantity    2017-11-29
dtype: datetime64[ns]


|  | low | volume | amount | avg_price | open | close | high | quantity |
|---|---|---|---|---|---|---|---|---|
| count | 1681.0 | 1681.0 | 1681.0 | 1681.0 | 1681.0 | 1681.0 | 1681.0 | 1681.0 |
| mean | 5167.44 | 2147448.68 | 1290.21 | 5432.23 | 5421.96 | 5437.67 | 5641.17 | 158.6 |
| std | 10082.28 | 7653198.97 | 2837.94 | 10674.96 | 10672.42 | 10676.18 | 11164.1 | 196.91 |
| min | 0.0 | 24.55 | 2.0 | 200.24 | 193.0 | 190.01 | 209.0 | 0.1 |
| 25% | 936.14 | 59930.05 | 161.0 | 962.42 | 959.7 | 960.0 | 985.0 | 52.42 |
| 50% | 1630.05 | 131661.52 | 306.0 | 1690.37 | 1695.83 | 1690.11 | 1729.5 | 102.75 |
| 75% | 2881.0 | 487363.8 | 839.0 | 2945.76 | 2949.58 | 2960.0 | 3046.0 | 181.59 |
| max | 67500.0 | 108023418.47 | 33169.0 | 68942.87 | 68800.0 | 69099.0 | 69950.0 | 2629.6 |
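The profile prints the extreme dates and the summary statistics as separate outputs. A compact alternative, sketched below with a hypothetical helper extremes_table, puts each column's minimum and maximum next to the dates on which they occur. That layout makes suspicious extremes easier to spot; in the tables above, for instance, one dataset reports a maximum high of 10000000000.0 and others report minimum lows of 1.12, 1.01 and 0.0, far below their 25% quantiles.

```python
import pandas as pd


def extremes_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: min/max of each column with the dates on which they occur."""
    return pd.DataFrame(
        {
            "min": df.min(),
            "min_date": df.idxmin(),
            "max": df.max(),
            "max_date": df.idxmax(),
        }
    )


# Made-up daily data with one obviously wrong 'high' value.
days = pd.date_range("2018-01-01", periods=4, freq="D")
sample = pd.DataFrame(
    {"close": [10.0, 11.0, 9.5, 10.5], "high": [10.5, 1e10, 10.0, 11.0]},
    index=days,
)
print(extremes_table(sample))
```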


### Adding Technical Indicators on the Closing Price

Add technical indicators, mostly based on the closing prices: Bollinger Bands®, Exponential Moving Averages (EMA), Moving Average Convergence Divergence (MACD), Heikin-Ashi candles and daily returns.
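For reference, the textbook definitions differ from some of the columns computed in the cell below: the MACD line is the 12-period EMA minus the 26-period EMA, its signal line is a 9-period EMA of the MACD line itself (not of the close), and the classic Bollinger Bands are a 20-period simple moving average plus or minus two rolling standard deviations. A minimal sketch of those definitions, with hypothetical helper names and made-up prices:

```python
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Hypothetical helper: MACD line, signal line and histogram."""
    macd_line = close.ewm(span=fast).mean() - close.ewm(span=slow).mean()
    signal_line = macd_line.ewm(span=signal).mean()
    return pd.DataFrame(
        {"MACD": macd_line, "Signal": signal_line, "Histogram": macd_line - signal_line}
    )


def bollinger(close: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Hypothetical helper: Bollinger Bands around a simple moving average."""
    middle = close.rolling(window).mean()
    spread = num_std * close.rolling(window).std()
    return pd.DataFrame({"Middle": middle, "Upper": middle + spread, "Lower": middle - spread})


# Made-up, steadily rising closing prices.
close = pd.Series(
    [float(100 + i) for i in range(40)],
    index=pd.date_range("2018-01-01", periods=40, freq="D"),
)
indicators = macd(close).join(bollinger(close))
print(indicators.tail())
```

In a steadily rising series the fast EMA stays above the slow one, so the MACD line is positive after the first day.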

In [14]:
def adding_technical_indicators(dataset):
    df=dataset.df
    # Exponential moving averages of the close over 26, 12 and 9 periods (days).
    df['EWMA26']=df['close'].ewm(span=26).mean()
    df['EWMA12']=df['close'].ewm(span=12).mean()
    df['EWMA9']=df['close'].ewm(span=9).mean()
    # The conventional MACD line is EWMA12 - EWMA26; this column uses the opposite sign.
    df['MACD']=df['EWMA26']-df['EWMA12']
    # Simple daily returns of the close.
    df['Returns']=df['close'].pct_change(1)
    # Bollinger-style bands: 12-period EWMA +/- 2 exponentially weighted standard deviations
    # (the classic bands use a 20-period simple moving average and rolling standard deviation).
    df['Bollinger Upper']=df['EWMA12']+2*df['close'].ewm(span=12).std()
    df['Bollinger Lower']=df['EWMA12']-2*df['close'].ewm(span=12).std()
    # Simplified Heikin-Ashi: the open averages the previous raw open and close
    # (column names keep the original 'Heiking' spelling).
    df['Heiking_Close']=(df['close']+df['high']+df['low']+df['open'])/4
    df['Heiking High']=df[['close','high','open']].max(axis=1)
    df['Heiking Low']=df[['close','low','open']].min(axis=1)
    df['Heiking Open']=(df['close'].shift(1)+df['open'].shift(1))/2
    return dataset

all_datasets(adding_technical_indicators,datasets)
datasets['btc_cop'].df.head(10)


| datetime | open | close | high | low | volume | EWMA26 | EWMA12 | EWMA9 | MACD | Returns | Bollinger Upper | Bollinger Lower | Heiking_Close | Heiking High | Heiking Low | Heiking Open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-10-31 | 2099950.0 | 2100000.0 | 2100000.0 | 2099950.0 | 2.28 | 2100000.0 | 2100000.0 | 2100000.0 | 0.0 | NaN | NaN | NaN | 2099975.0 | 2100000.0 | 2099950.0 | NaN |
| 2016-11-01 | 2115664.74 | 2100541.0 | 2115664.74 | 2100541.0 | 0.54 | 2100280.9 | 2100293.04 | 2100300.56 | -12.14 | 0.0 | 2101058.13 | 2099527.95 | 2108102.87 | 2115664.74 | 2100541.0 | 2099975.0 |
| 2016-11-02 | 2200000.0 | 2200000.0 | 2200000.0 | 2200000.0 | 0.86 | 2136109.01 | 2139208.69 | 2141160.98 | -3099.68 | 0.05 | 2258904.21 | 2019513.17 | 2200000.0 | 2200000.0 | 2200000.0 | 2108102.87 |
| 2016-11-03 | 2250000.0 | 2038008.07 | 2250000.0 | 2026909.28 | 4.66 | 2108684.27 | 2107263.61 | 2106217.58 | 1420.66 | -0.07 | 2251040.87 | 1963486.35 | 2141229.34 | 2250000.0 | 2026909.28 | 2200000.0 |
| 2016-11-04 | 2173361.5 | 2192224.31 | 2192224.31 | 2001026.82 | 2.76 | 2128057.55 | 2130347.13 | 2131802.64 | -2289.58 | 0.08 | 2276593.2 | 1984101.06 | 2139709.24 | 2192224.31 | 2001026.82 | 2144004.04 |
| 2016-11-05 | 2192224.31 | 2192224.31 | 2192224.31 | 2192224.31 | 0.0 | 2140909.64 | 2145386.54 | 2148180.27 | -4476.9 | 0.0 | 2283273.96 | 2007499.12 | 2192224.31 | 2192224.31 | 2192224.31 | 2182792.91 |
| 2016-11-06 | 2192224.31 | 2192224.31 | 2192224.31 | 2192224.31 | 0.0 | 2150035.69 | 2155838.22 | 2159326.64 | -5802.53 | 0.0 | 2283114.71 | 2028561.73 | 2192224.31 | 2192224.31 | 2192224.31 | 2192224.31 |
| 2016-11-07 | 2192224.31 | 2192224.31 | 2192224.31 | 2192224.31 | 0.0 | 2156833.32 | 2163431.44 | 2167232.57 | -6598.12 | 0.0 | 2280070.3 | 2046792.57 | 2192224.31 | 2192224.31 | 2192224.31 | 2192224.31 |
| 2016-11-08 | 2040100.23 | 2030280.91 | 2041656.9 | 2021216.07 | 3.52 | 2138075.48 | 2137089.53 | 2135596.06 | 985.95 | -0.07 | 2291089.84 | 1983089.21 | 2033313.53 | 2041656.9 | 2021216.07 | 2192224.31 |
| 2016-11-09 | 2211234.0 | 2210750.0 | 2231104.84 | 2210000.0 | 2.53 | 2148103.85 | 2151048.16 | 2152434.9 | -2944.3 | 0.09 | 2302113.17 | 1999983.15 | 2215772.21 | 2231104.84 | 2210000.0 | 2035190.57 |
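The Heiking columns computed in In [14] are a simplified form of Heikin-Ashi: the open averages the previous raw open and close, and the high and low are taken from the raw candle only. In the usual definition the open is recursive, HA_open = (previous HA_open + previous HA_close) / 2, and the high and low also take the Heikin-Ashi open and close into account. A minimal sketch of that recursive form, with a hypothetical helper name and made-up candles; seeding the first open with the midpoint of the first candle's open and close is a common convention, assumed here:

```python
import pandas as pd


def heikin_ashi(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: recursive Heikin-Ashi candles from OHLC columns."""
    ha_close = (df["open"] + df["high"] + df["low"] + df["close"]) / 4
    ha_open = [(df["open"].iloc[0] + df["close"].iloc[0]) / 2]  # assumed seed
    for i in range(1, len(df)):
        # Each open is the midpoint of the previous Heikin-Ashi candle body.
        ha_open.append((ha_open[i - 1] + ha_close.iloc[i - 1]) / 2)
    ha_open = pd.Series(ha_open, index=df.index)
    ha_high = pd.concat([df["high"], ha_open, ha_close], axis=1).max(axis=1)
    ha_low = pd.concat([df["low"], ha_open, ha_close], axis=1).min(axis=1)
    return pd.DataFrame(
        {"HA_Open": ha_open, "HA_High": ha_high, "HA_Low": ha_low, "HA_Close": ha_close}
    )


# Made-up daily candles.
candles = pd.DataFrame(
    {
        "open": [10.0, 12.0, 11.0],
        "high": [13.0, 14.0, 12.0],
        "low": [9.0, 11.0, 8.0],
        "close": [12.0, 13.0, 9.0],
    },
    index=pd.date_range("2018-01-01", periods=3, freq="D"),
)
print(heikin_ashi(candles))
```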


In [15]:
datasets['btc_brl'].df.head(10)


| datetime | low | volume | amount | avg_price | open | close | high | quantity | EWMA26 | EWMA12 | EWMA9 | MACD | Returns | Bollinger Upper | Bollinger Lower | Heiking_Close | Heiking High | Heiking Low | Heiking Open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-08-26 | 832.5 | 90195.44 | 305 | 871.49 | 851.44 | 870.0 | 900.0 | 103.5 | 870.0 | 870.0 | 870.0 | 0.0 | NaN | NaN | NaN | 863.48 | 900.0 | 832.5 | NaN |
| 2013-06-12 | 249.0 | 2799.69 | 11 | 256.45 | 249.0 | 265.0 | 275.0 | 10.92 | 555.87 | 542.29 | 533.89 | 13.57 | -0.7 | 1397.89 | -313.31 | 259.5 | 275.0 | 249.0 | 860.72 |
| 2013-06-13 | 259.0 | 2830.41 | 16 | 266.4 | 265.0 | 269.0 | 269.0 | 10.62 | 452.8 | 435.63 | 425.33 | 17.17 | 0.02 | 1101.33 | -230.07 | 265.5 | 269.0 | 259.0 | 257.0 |
| 2013-06-14 | 245.0 | 8694.71 | 35 | 255.42 | 267.0 | 250.0 | 268.0 | 34.04 | 396.1 | 377.03 | 365.93 | 19.07 | -0.07 | 934.12 | -180.06 | 257.5 | 268.0 | 245.0 | 267.0 |
| 2013-06-15 | 246.01 | 4481.41 | 8 | 256.87 | 250.0 | 246.01 | 259.99 | 17.45 | 361.3 | 341.43 | 330.26 | 19.86 | -0.02 | 820.61 | -137.74 | 250.5 | 259.99 | 246.01 | 258.5 |
| 2013-06-16 | 246.01 | 427.69 | 14 | 256.22 | 246.01 | 252.0 | 257.43 | 1.67 | 339.41 | 319.7 | 309.05 | 19.71 | 0.02 | 737.31 | -97.92 | 250.36 | 257.43 | 246.01 | 248.0 |
| 2013-06-17 | 252.0 | 3628.96 | 12 | 254.52 | 257.43 | 254.96 | 257.43 | 14.26 | 324.39 | 305.25 | 295.36 | 19.14 | 0.01 | 673.31 | -62.81 | 255.45 | 257.43 | 252.0 | 249.0 |
| 2013-06-18 | 246.11 | 7498.59 | 22 | 256.79 | 254.95 | 263.0 | 263.0 | 29.2 | 314.5 | 296.43 | 287.58 | 18.06 | 0.03 | 622.95 | -30.08 | 256.76 | 263.0 | 246.11 | 256.19 |
| 2013-06-19 | 251.01 | 1137.09 | 18 | 262.96 | 251.01 | 260.0 | 264.98 | 4.32 | 306.42 | 289.23 | 281.21 | 17.19 | -0.01 | 581.35 | -2.89 | 256.75 | 264.98 | 251.01 | 258.98 |
| 2013-06-20 | 260.0 | 7253.13 | 28 | 267.51 | 263.0 | 269.0 | 269.0 | 27.11 | 301.26 | 285.39 | 278.47 | 15.86 | 0.03 | 547.57 | 23.21 | 265.25 | 269.0 | 260.0 | 255.5 |
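In the btc_brl output, 2015-08-26 appears before 2013-06-12, so the MercadoBitcoin rows are not in chronological order. Since ewm, pct_change and shift all depend on row order, the indicators of that dataset (for example the -0.7 return on 2013-06-12) are computed across that jump. A minimal check, sketched with a few close prices taken from that output, is to test index.is_monotonic_increasing and call sort_index before computing the indicators:

```python
import pandas as pd

# Close prices from the btc_brl head above, with the first row out of order.
unsorted = pd.DataFrame(
    {"close": [870.0, 265.0, 269.0]},
    index=pd.to_datetime(["2015-08-26", "2013-06-12", "2013-06-13"]),
)
print(unsorted.index.is_monotonic_increasing)  # False

ordered = unsorted.sort_index()
print(ordered.index.is_monotonic_increasing)  # True

# Returns computed after sorting follow calendar order.
ordered["Returns"] = ordered["close"].pct_change(1)
print(ordered)
```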


### Save Data for Plotting

Save the processed datasets to disk so that the information can be plotted and profiled graphically.

In [16]:
def save_datasets(datasets,filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(datasets, output, pickle.HIGHEST_PROTOCOL)
    
save_datasets(datasets, 'data_for_plotting/datasetsplotting.pkl')
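Because Dataset is a namedtuple created in this notebook, pickle stores it by reference (module and name), so a notebook that loads the file has to define an identical Dataset before calling pickle.load. A minimal round-trip sketch; the load_datasets helper and the temporary file are hypothetical:

```python
import os
import pickle
import tempfile
from collections import namedtuple

import pandas as pd

# Must match the definition used when the file was written.
Dataset = namedtuple('Dataset', 'exchange df')


def load_datasets(filename):
    """Hypothetical counterpart of save_datasets."""
    with open(filename, 'rb') as source:
        return pickle.load(source)


# Round trip with a tiny made-up dataset.
original = {'btc_clp': Dataset('Buda', pd.DataFrame({'close': [1.0, 2.0]}))}
path = os.path.join(tempfile.mkdtemp(), 'datasetsplotting.pkl')
with open(path, 'wb') as output:
    pickle.dump(original, output, pickle.HIGHEST_PROTOCOL)

restored = load_datasets(path)
print(restored['btc_clp'].exchange)  # Buda
```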