# T81-558: Applications of Deep Neural Networks
**Class 6: Preprocessing.**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), School of Engineering and Applied Science, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Why is Preprocessing Necessary

The feature vector, the input to a model (such as a neural network), must be completely numeric. Converting non-numeric data into numeric is one major component of preprocessing.  It is also often important to preprocess numeric values.  Scikit-learn provides a large number of preprocessing functions: 

* [Scikit-Learn Preprocessing](http://scikit-learn.org/stable/modules/preprocessing.html)

However, this is just the beginning.  The success of your neural network's predictions is often directly tied to the data representation.

# Preprocessing Functions

The following functions will be used in conjunction with TensorFlow to help preprocess the data.  Some of these were [covered previously](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class2_tensor_flow.ipynb), some are new.

It is okay to just use them. For better understanding, try to see how they work.

These functions allow you to build the feature vector for a neural network. Consider the following:

* Predictors/Inputs 
    * Fill any missing inputs with the median for that column.  Use **missing_median**.
    * Encode textual/categorical values with **encode_text_dummy** or more creative means (see last part of this class session). 
    * Encode numeric values with **encode_numeric_zscore**, **encode_numeric_binary** or **encode_numeric_range**. 
    * Consider removing outliers: **remove_outliers**
* Output
    * Discard rows with missing outputs.
    * Encode textual/categorical values with **encode_text_index**. 
    * Do not encode output numeric values.
    * Consider removing outliers: **remove_outliers**
* Produce final feature vectors (x) and expected output (y) with **to_xy**. 

# Complete Set of Preprocessing Functions

In [3]:
import pandas as pd
import sklearn.preprocessing
from sklearn.feature_extraction.text import TfidfTransformer

# Encode text values to dummie variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)    
def encode_text_dummy(df,name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = "{}-{}".format(name,x)
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)
    
# Encode text values to indexes(i.e. [1],[2],[3] for red,green,blue)    
def encode_text_index(df,name): 
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_
                
# Encode a numeric column as zscores    
def encode_numeric_zscore(df,name,mean=None,sd=None):
    if mean is None:
        mean = df[name].mean()
        
    if sd is None:
        sd = df[name].std()
        
    df[name] = (df[name]-mean)/sd
    
# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)
    
# Remove all rows where the specified column is +/- sd standard deviations
def remove_outliers(df, name, sd):
    drop_rows = df.index[(np.abs(df[name]-df[name].mean())>=(sd*df[name].std()))]
    df.drop(drop_rows,axis=0,inplace=True)
    
# Encode a column to a range between normalized_low and normalized_high.
def encode_numeric_range(df, name, normalized_low =-1, normalized_high =1, 
                         data_low=None, data_high=None):
    
    if data_low is None:
        data_low = min(df[name])
        data_high = max(df[name])
    
    df[name] = ((df[name] - data_low) / (data_high - data_low)) \
                * (normalized_high - normalized_low) + normalized_low

# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df,target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)

    # find out the type of the target column.  Is it really this hard? :(
    target_type = df[target].dtypes
    target_type = target_type[0] if hasattr(target_type, '__iter__') else target_type
    
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        return df.as_matrix(result).astype(np.float32),df.as_matrix([target]).astype(np.int32)
    else:
        # Regression
        return df.as_matrix(result).astype(np.float32),df.as_matrix([target]).astype(np.float32)
    

# Analyzing a Dataset

The following script can be used to give a high level overview of how a dataset appears.

In [4]:
ENCODING = 'utf-8'

def expand_categories(values):
    result = []
    s = values.value_counts()
    t = float(len(values))
    for v in s.index:
        result.append("{}:{}%".format(v,round(100*(s[v]/t),2)))
    return "[{}]".format(",".join(result))
        
def analyze(filename):
    print()
    print("Analyzing: {}".format(filename))
    df = pd.read_csv(filename,encoding=ENCODING)
    cols = df.columns.values
    total = float(len(df))

    print("{} rows".format(int(total)))
    for col in cols:
        uniques = df[col].unique()
        unique_count = len(uniques)
        if unique_count>100:
            print("** {}:{} ({}%)".format(col,unique_count,int(((unique_count)/total)*100)))
        else:
            print("** {}:{}".format(col,expand_categories(df[col])))
            expand_categories(df[col])

The analyze script can be run on the MPG dataset.

In [5]:
import tensorflow.contrib.learn as skflow
import pandas as pd
import os
import numpy as np
from sklearn import metrics
from scipy.stats import zscore

path = "./data/"

filename_read = os.path.join(path,"auto-mpg.csv")
analyze(filename_read)


Analyzing: ./data/auto-mpg.csv
398 rows
** mpg:129 (32%)
** cylinders:[4:51.26%,8:25.88%,6:21.11%,3:1.01%,5:0.75%]
** displacement:[97.0:5.28%,350.0:4.52%,98.0:4.52%,250.0:4.27%,318.0:4.27%,140.0:4.02%,400.0:3.27%,225.0:3.27%,91.0:3.02%,121.0:2.76%,302.0:2.76%,232.0:2.76%,151.0:2.51%,120.0:2.26%,200.0:2.01%,351.0:2.01%,85.0:2.01%,90.0:2.01%,231.0:2.01%,122.0:1.76%,105.0:1.76%,304.0:1.76%,79.0:1.51%,156.0:1.51%,119.0:1.51%,258.0:1.26%,89.0:1.26%,107.0:1.26%,108.0:1.26%,135.0:1.26%,360.0:1.01%,86.0:1.01%,134.0:1.01%,116.0:1.01%,112.0:1.01%,305.0:1.01%,70.0:0.75%,113.0:0.75%,455.0:0.75%,307.0:0.75%,168.0:0.75%,198.0:0.75%,146.0:0.75%,260.0:0.75%,173.0:0.75%,429.0:0.75%,199.0:0.5%,141.0:0.5%,163.0:0.5%,262.0:0.5%,71.0:0.5%,440.0:0.5%,383.0:0.5%,88.0:0.25%,97.5:0.25%,340.0:0.25%,144.0:0.25%,390.0:0.25%,83.0:0.25%,96.0:0.25%,80.0:0.25%,78.0:0.25%,76.0:0.25%,72.0:0.25%,81.0:0.25%,104.0:0.25%,267.0:0.25%,100.0:0.25%,101.0:0.25%,110.0:0.25%,111.0:0.25%,183.0:0.25%,181.0:0.25%,114.0:0.25%,115.0

# Preprocessing Examples

The above preprocessing functions can be used in a variety of ways.

In [6]:
import tensorflow.contrib.learn as skflow
import pandas as pd
import os
import numpy as np
from sklearn import metrics
from scipy.stats import zscore

path = "./data/"

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

# create feature vector
missing_median(df, 'horsepower')
df.drop('name',1,inplace=True)
encode_numeric_zscore(df, 'horsepower')
encode_numeric_zscore(df, 'weight')
encode_numeric_range(df, 'cylinders',0,1)
encode_numeric_range(df, 'displacement',0,1)
encode_numeric_zscore(df, 'acceleration')
#encode_numeric_binary(df,'mpg',20)
#df['origin'] = df['origin'].astype(str)
#encode_text_tfidf(df, 'origin')

# Drop outliers in horsepower
print("Length before MPG outliers dropped: {}".format(len(df)))
remove_outliers(df,'mpg',2)
print("Length after MPG outliers dropped: {}".format(len(df)))

print(df)


Length before MPG outliers dropped: 398
Length after MPG outliers dropped: 388
      mpg  cylinders  displacement  horsepower    weight  acceleration  year  \
0    18.0        1.0      0.617571    0.672271  0.630077     -1.293870    70   
1    15.0        1.0      0.728682    1.587959  0.853259     -1.475181    70   
2    18.0        1.0      0.645995    1.195522  0.549778     -1.656492    70   
3    16.0        1.0      0.609819    1.195522  0.546236     -1.293870    70   
4    17.0        1.0      0.604651    0.933897  0.565130     -1.837804    70   
5    15.0        1.0      0.932817    2.451322  1.618455     -2.019115    70   
6    14.0        1.0      0.997416    3.026898  1.633806     -2.381737    70   
7    14.0        1.0      0.961240    2.896085  1.584210     -2.563048    70   
8    14.0        1.0      1.000000    3.157710  1.717647     -2.019115    70   
9    15.0        1.0      0.832041    2.242022  1.038654     -2.563048    70   
10   15.0        1.0      0.813953    1.7

# Other Examples: Dealing with Addresses

Addresses can be difficult to encode into a neural network.  There are many different approaches, and you must consider how you can transform the address into something more meaningful.  Map coordinates can be a good approach.  [Latitude and longitude](https://en.wikipedia.org/wiki/Geographic_coordinate_system) can be a useful encoding.  Thanks to the power of the Internet, it is relatively easy to transform an address into its latitude and longitude values.  The following code determines the coordinates of [Washington University](https://wustl.edu/):

In [1]:
import requests

address = "1 Brookings Dr, St. Louis, MO 63130"

response = requests.get('https://maps.googleapis.com/maps/api/geocode/json?address='+address)

resp_json_payload = response.json()

print(resp_json_payload['results'][0]['geometry']['location'])

{'lat': 38.6470653, 'lng': -90.30263459999999}


If latitude and longitude are simply fed into the neural network as two features, they might not be overly helpful.  These two values would allow your neural network to cluster locations on a map.  Sometimes cluster locations on a map can be useful.  Consider the percentage of the population that smokes in the USA by state:

![Smokers by State](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_6_smokers.png "Smokers by State")

The above map shows that certian behaviors, like smoking, can be clustered by global region. 

However, often you will want to transform the coordinates into distances.  It is reasonably easy to estimate the distance between any two points on Earth by using the [great circle distance](https://en.wikipedia.org/wiki/Great-circle_distance) between any two points on a sphere:

The following code implements this formula:

$\Delta\sigma=\arccos\bigl(\sin\phi_1\cdot\sin\phi_2+\cos\phi_1\cdot\cos\phi_2\cdot\cos(\Delta\lambda)\bigr)$

$d = r \, \Delta\sigma$


In [7]:
from math import sin, cos, sqrt, atan2, radians

# Distance function
def distance_lat_lng(lat1,lng1,lat2,lng2):
    # approximate radius of earth in km
    R = 6373.0

    # degrees to radians (lat/lon are in degrees)
    lat1 = radians(lat1)
    lng1 = radians(lng1)
    lat2 = radians(lat2)
    lng2 = radians(lng2)

    dlng = lng2 - lng1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlng / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    return R * c

# Find lat lon for address
def lookup_lat_lng(address):
    response = requests.get('https://maps.googleapis.com/maps/api/geocode/json?address='+address)
    json = response.json()
    if len(json['results']) == 0:
        print("Can't find: {}".format(address))
        return 0,0
    map = json['results'][0]['geometry']['location']
    return map['lat'],map['lng']


# Distance between two locations

import requests

address1 = "1 Brookings Dr, St. Louis, MO 63130" 
address2 = "3301 College Ave, Fort Lauderdale, FL 33314"

lat1, lng1 = lookup_lat_lng(address1)
lat2, lng2 = lookup_lat_lng(address2)

print("Distance, St. Louis, MO to Ft. Lauderdale, FL: {} km".format(
        distance_lat_lng(lat1,lng1,lat2,lng2)))



Distance, St. Louis, MO to Ft. Lauderdale, FL: 1685.0833252717607 km


Distances can be useful to encode addresses as.  You must consider what distance might be useful for your dataset.  Consider:

* Distance to major metropolitan area
* Distance to competitor
* Distance to distribution center
* Distance to retail outlet

The following code calculates the distance between 10 universities and washu:

In [26]:
# Encoding other universities by their distance to Washington University

schools = [
    ["Princeton University, Princeton, NJ 08544", 'Princeton'],
    ["Massachusetts Hall, Cambridge, MA 02138", 'Harvard'],
    ["5801 S Ellis Ave, Chicago, IL 60637", 'University of Chicago'],
    ["Yale, New Haven, CT 06520", 'Yale'],
    ["116th St & Broadway, New York, NY 10027", 'Columbia University'],
    ["450 Serra Mall, Stanford, CA 94305", 'Stanford'],
    ["77 Massachusetts Ave, Cambridge, MA 02139", 'MIT'],
    ["Duke University, Durham, NC 27708", 'Duke University'],
    ["University of Pennsylvania, Philadelphia, PA 19104", 'University of Pennsylvania'],
    ["Johns Hopkins University, Baltimore, MD 21218", 'Johns Hopkins']
]

lat1, lng1 = lookup_lat_lng("1 Brookings Dr, St. Louis, MO 63130")

for address, name in schools:
    lat2,lng2 = lookup_lat_lng(address)
    dist = distance_lat_lng(lat1,lng1,lat2,lng2)
    print("School '{}', distance to wustl is: {}".format(name,dist))


School 'Princeton', distance to wustl is: 1354.209708261112
School 'Harvard', distance to wustl is: 1670.48400266576
School 'University of Chicago', distance to wustl is: 418.0768183943189
School 'Yale', distance to wustl is: 1504.9478116980558
School 'Columbia University', distance to wustl is: 1021.5557486863092
School 'Stanford', distance to wustl is: 2781.0358215314873
School 'MIT', distance to wustl is: 1671.8200768854172
School 'Duke University', distance to wustl is: 1047.4669155948627
School 'University of Pennsylvania', distance to wustl is: 1306.6967081436705
School 'Johns Hopkins', distance to wustl is: 1185.939948468073


# Other Examples: Bag of Words

The Bag of Words algorithm is a common means of encoding strings. (Harris, 1954) Each input represents the count of one particular word. The entire input vector would contain one value for each unique word. Consider the following strings.

```
Of Mice and Men
Three Blind Mice
Blind Man’s Bluff
Mice and More Mice
```

We have the following unique words. This is our “dictionary.”

```
Input 0 : and
Input 1 : blind
Input 2 : bluff
Input 3 : man’s
Input 4 : men
Input 5 : mice
Input 6 : more
Input 7 : of
Input 8 : three
```

The four lines above would be encoded as follows.

```
Of Mice and Men [ 0 4 5 7 ]
Three Blind Mice [ 1 5 8 ]
Blind Man ’ s Bl u f f [ 1 2 3 ]
Mice and More Mice [ 0 5 6 ]
```

Of course we have to fill in the missing words with zero, so we end up with
the following.

* Of Mice and Men [ 1 , 0 , 0 , 0 , 1 , 1 , 0 , 1 , 0 ]
* Three Blind Mice [ 0 , 1 , 0 , 0 , 0 , 1 , 0 , 0 , 1 ]
* Blind Man’s Bluff [ 0 , 1 , 1 , 1 , 0 , 0 , 0 , 0 , 0 ]
* Mice and More Mice [ 1 , 0 , 0 , 0 , 0 , 2 , 1 , 0 , 0 ]

Notice that we now have a consistent vector length of nine. Nine is the total
number of words in our “dictionary”. Each component number in the vector is
an index into our dictionary of available words. At each vector component is
stored a count of the number of words for that dictionary entry. Each string
will usually contain only a small subset of the dictionary. As a result, most of
the vector values will be zero.

As you can see, one of the most difficult aspects of machine learning programming
is translating your problem into a fixed-length array of floating point
numbers. The following section shows how to translate several examples.


* [CountVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html)

In [40]:
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?']

vectorizer = CountVectorizer(min_df=1)

vectorizer.fit(corpus)

print("Mapping")
print(vectorizer.vocabulary_)

print()
print("Encoded")
x = vectorizer.transform(corpus)
print(x.toarray())

Mapping
{'third': 7, 'this': 8, 'the': 6, 'first': 2, 'document': 1, 'second': 5, 'one': 4, 'is': 3, 'and': 0}

Encoded
[[0 1 1 1 0 0 1 0 1]
 [0 1 0 1 0 2 1 0 1]
 [1 0 0 0 1 0 1 1 0]
 [0 1 1 1 0 0 1 0 1]]


In [27]:
from sklearn.feature_extraction.text import CountVectorizer

path = "./data/"

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

corpus = df['name']

vectorizer = CountVectorizer(min_df=1)

vectorizer.fit(corpus)

print("Mapping")
print(vectorizer.vocabulary_)

print()
print("Encoded")
x = vectorizer.transform(corpus)
print(x.toarray())

print(len(vectorizer.vocabulary_))

# reverse lookup for columns
bag_cols = [0] * len(vectorizer.vocabulary_)
for i,key in enumerate(vectorizer.vocabulary_):
    bag_cols[i] = key


Mapping
{'99e': 56, 'gran': 151, 'c20': 77, 'valiant': 284, 'peugeot': 215, 'concours': 104, 'ciera': 97, 'vokswagen': 288, 'new': 206, '100': 1, 'liftback': 176, 'tc3': 271, 'arrow': 64, 'beetle': 70, 'premier': 223, 'matador': 190, 'b210': 69, 'skylark': 251, 'seville': 248, '2000': 21, 'volvo': 291, 'nissan': 208, 'astro': 66, 'dart': 121, 'luxus': 181, 'rabbit': 225, 'rampage': 226, 'capri': 80, 'cvcc': 118, 'grand': 153, 'c10': 76, 'v8': 283, 'se': 245, 'tr7': 278, 'camaro': 79, 'lemans': 174, '304': 35, 'sunbird': 265, 'zx': 299, '124b': 9, 'classic': 100, 'ford': 143, 'concord': 103, 'nova': 209, 'challenger': 88, 'supreme': 267, 'air': 60, 'chevy': 95, 'accord': 59, 'prix': 224, 'gx': 158, '5000s': 46, 'lebaron': 172, 'chevette': 92, 'aspen': 65, 'wagon': 293, 'audi': 67, 'cressida': 112, '244dl': 28, 'hatchback': 160, '1500': 16, '280s': 32, '131': 13, 'granada': 152, 'cuda': 115, 'woody': 294, 'fiesta': 141, 'thunderbird': 273, 'colt': 102, '111': 3, 'starlet': 261, 'cordoba'

In [32]:
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestRegressor

#x = x.toarray() #.as_matrix()
y = df['mpg'].as_matrix()

# Build a forest and compute the feature importances
forest = RandomForestRegressor(n_estimators=50,
                              random_state=0, verbose = True)
forest.fit(x, y)
importances = forest.feature_importances_
std = np.std([tree.feature_importances_ for tree in forest.estimators_],
             axis=0)
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")

for f in range(x.shape[1]):
    print("{}. {} ({})".format(f + 1, bag_cols[f], importances[indices[f]]))


Feature ranking:
1. 99e (0.069858186873746)
2. gran (0.0657962678049988)
3. c20 (0.051008883870636394)
4. valiant (0.04581221475540706)
5. peugeot (0.04198230223754908)
6. concours (0.040399764416811264)
7. ciera (0.032608231867797544)
8. vokswagen (0.028083396361630455)
9. new (0.025365169053742708)
10. 100 (0.020822620576517337)
11. liftback (0.02065610924024198)
12. tc3 (0.019690892181255654)
13. arrow (0.019012669846068458)
14. beetle (0.01750032394359091)
15. premier (0.014938020006637028)
16. matador (0.014369710481955338)
17. b210 (0.01252127394359739)
18. skylark (0.010751429685292812)
19. seville (0.010702428667434061)
20. 2000 (0.010542674460379622)
21. volvo (0.01025593869126505)
22. nissan (0.00986759162389823)
23. astro (0.009133861291584486)
24. dart (0.008772300799934196)
25. luxus (0.008600514953250088)
26. rabbit (0.007821470049179373)
27. rampage (0.007294524170328526)
28. capri (0.007132502583116468)
29. cvcc (0.007099310584072058)
30. grand (0.007078688726990171)
31

[Parallel(n_jobs=1)]: Done  49 tasks       | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done  50 out of  50 | elapsed:    0.3s finished


# Other Examples: Time Series

Time series data will need to be encoded for a regular feedforward neural network.  In a few classes we will see how to use a recurrent neural network to find patterns over time.  For now, we will encode the series into input neurons.

Financial forecasting is a very popular form of temporal algorithm. A temporal algorithm is one that accepts input for values that range over time. If the algorithm supports short term memory (internal state) then ranges over time are supported automatically. If your algorithm does not have an internal state then you should use an input window and a prediction window. Most algorithms do not have an internal state. To see how to use these windows, consider if you would like the algorithm to predict the stock market. You begin with the closing price for a stock over several days:

```
Day 1 : $45
Day 2 : $47
Day 3 : $48
Day 4 : $40
Day 5 : $41
Day 6 : $43
Day 7 : $45
Day 8 : $57
Day 9 : $50
Day 10 : $41
```

The first step is to normalize the data. This is necessary whether your algorithm has internal state or not. To normalize, we want to change each number into the percent movement from the previous day. For example, day 2 would become 0.04, because there is a 4% difference between $45 and $47. Once you perform this calculation for every day, the data set will look like the following:

```
Day 2 : 0. 04
Day 3 : 0. 02
Day 4:−0.16
Day 5 : 0. 02
Day 6 : 0. 04
Day 7 : 0. 04
Day 8 : 0. 04
Day 9:−0.12
Day 10:−0.18
```

In order to create an algorithm that will predict the next day’s values, we need to think about how to encode this data to be presented to the algorithm. The encoding depends on whether the algorithm has an internal state. The internal state allows the algorithm to use the last few values inputted to help establish trends.

Many machine learning algorithms have no internal state. If this is the case, then you will typically use a sliding window algorithm to encode the data. To do this, we use the last three prices to predict the next one. The inputs would be the last three-day prices, and the output would be the fourth day. The above data could be organized in the following way to provide training data.

These cases specified the ideal output for the given inputs:

```
[ 0.04 , 0.02 , −0.16 ] −> 0.02
[ 0.02 , −0.16 , 0.02 ] −> 0.04
[ −0.16 , 0.02 , 0.04 ] −> 0.04
[ 0.02 , 0.04 , 0.04 ] −> 0. 26
[ 0.04 , 0.04 , 0.26 ] −> −0.12
[ 0.04 , 0.26 , −0.12 ] −> −0.18
```

The above encoding would require that the algorithm have three inputs and one output.

In [22]:
import numpy as np

def normalize_price_change(history):
    last = None
    
    result = []
    for price in history:
        if last is not None:
            result.append( float(price-last)/last )
        last = price

    return result

def encode_timeseries_window(source, lag_size, lead_size):
    """
    Encode raw data to a time-series window.
    :param source: A 2D array that specifies the source to be encoded.
    :param lag_size: The number of rows uses to predict.
    :param lead_size: The number of rows to be predicted
    :return: A tuple that contains the x (input) & y (expected output) for training.
    """
    result_x = []
    result_y = []

    output_row_count = len(source) - (lag_size + lead_size) + 1
    

    for raw_index in range(output_row_count):
        encoded_x = []

        # Encode x (predictors)
        for j in range(lag_size):
            encoded_x.append(source[raw_index+j])

        result_x.append(encoded_x)

        # Encode y (prediction)
        encoded_y = []

        for j in range(lead_size):
            encoded_y.append(source[lag_size+raw_index+j])

        result_y.append(encoded_y)

    return result_x, result_y


price_history = [ 45, 47, 48, 40, 41, 43, 45, 57, 50, 41 ]
norm_price_history = normalize_price_change(price_history)

print("Normalized price history:")
print(norm_price_history)

print()
print("Rounded normalized price history:")
norm_price_history = np.round(norm_price_history,2)
print(norm_price_history)


print()
print("Time Boxed(time series encoded):")
x, y = encode_timeseries_window(norm_price_history, 3, 1)

for x_row, y_row in zip(x,y):
    print("{} -> {}".format(np.round(x_row,2), np.round(y_row,2)))


Normalized price history:
[0.044444444444444446, 0.02127659574468085, -0.16666666666666666, 0.025, 0.04878048780487805, 0.046511627906976744, 0.26666666666666666, -0.12280701754385964, -0.18]

Rounded normalized price history:
[ 0.04  0.02 -0.17  0.02  0.05  0.05  0.27 -0.12 -0.18]

Time Boxed(time series encoded):
[ 0.04  0.02 -0.17] -> [ 0.02]
[ 0.02 -0.17  0.02] -> [ 0.05]
[-0.17  0.02  0.05] -> [ 0.05]
[ 0.02  0.05  0.05] -> [ 0.27]
[ 0.05  0.05  0.27] -> [-0.12]
[ 0.05  0.27 -0.12] -> [-0.18]
