# The Almighty U.S. Dollar


Early humans, who were predominantly hunter-gatherers, traded commonly found materials such as stones, animal bones and bird feathers for things they did not have. As they settled down and agriculture became a way of life, cattle and grain were used as mediums of exchange. As governments formed and territorial boundaries had to be defended, societies became more organized and structured, and systems of monetary units gradually took hold. Today almost every country has its own currency. A general definition is that a currency is a system of monetary units used as a medium of exchange in order to avoid the inconveniences of the barter system.

The U.S. Dollar Index (USDX, DXY, DX) is an index (or measure) of the value of the United States dollar relative to a basket of foreign currencies, often referred to as a basket of U.S. trade partners' currencies. The index goes up when the U.S. dollar gains "strength" (value) compared to the other currencies.

The index is maintained and published by ICE (Intercontinental Exchange, Inc.), with the name "U.S. Dollar Index" a registered trademark.

It is a weighted geometric mean of the dollar's value relative to the following select currencies:

    Euro (EUR), 57.6% weight
    Japanese yen (JPY), 13.6% weight
    Pound sterling (GBP), 11.9% weight
    Canadian dollar (CAD), 9.1% weight
    Swedish krona (SEK), 4.2% weight
    Swiss franc (CHF), 3.6% weight
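
As a rough illustration of the weighted geometric mean, the sketch below applies the weights listed above to a set of exchange rates. The scaling constant 50.14348112 and the quoting convention (EUR and GBP quoted as USD per unit of foreign currency, hence the negative exponents) follow the formula commonly given for the index and should be treated as assumptions here. The function name and the sample rates are made up for the example and are not market data.

```python
import math

# Exponents are the basket weights. EUR and GBP are conventionally quoted as
# USD per unit of foreign currency, so their exponents are negative.
EXPONENTS = {
    "EURUSD": -0.576,
    "USDJPY": 0.136,
    "GBPUSD": -0.119,
    "USDCAD": 0.091,
    "USDSEK": 0.042,
    "USDCHF": 0.036,
}
SCALE = 50.14348112  # scaling constant (assumption, see the text above)


def dollar_index(rates):
    """Weighted geometric mean of the six exchange rates, scaled by SCALE."""
    log_sum = sum(w * math.log(rates[pair]) for pair, w in EXPONENTS.items())
    return SCALE * math.exp(log_sum)


# Hypothetical exchange rates, for illustration only.
sample_rates = {
    "EURUSD": 1.10, "USDJPY": 110.0, "GBPUSD": 1.30,
    "USDCAD": 1.30, "USDSEK": 9.00, "USDCHF": 0.98,
}
print(round(dollar_index(sample_rates), 2))
```

Working in logarithms keeps the product numerically stable and makes the role of each weight explicit: a cheaper euro (lower EURUSD) pushes the index up more than a comparable move in any other currency.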


## Calculating U.S. Dollar Index Movements
An index value of 115 suggests that the U.S. dollar has appreciated 15 percent versus the basket of currencies over the time period in question. Subtracting the initial value of 100 from the current value of 115 yields 15; dividing the difference by the initial value of 100 gives an appreciation of 15 percent. Simply, if the USDX goes up, that means the U.S. dollar is gaining strength or value when compared to the other currencies. 

Similarly, if the index is currently 85, having fallen 15 from its initial value, the same calculation gives a depreciation of 15 percent. The appreciation and depreciation figures depend on the time period in question.
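
The arithmetic above can be written as a small helper. The function name is illustrative, and the base value of 100 is the starting value described in the text.

```python
def percent_change_from_base(index_value, base_value=100.0):
    """Percent appreciation (positive) or depreciation (negative) versus the base."""
    return (index_value - base_value) * 100.0 / base_value


print(percent_change_from_base(115))  # 15.0
print(percent_change_from_base(85))   # -15.0
```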

## Trading the U.S. Dollar Index Movement
The index was established in 1973 with a starting value of 100. If the USDX is below 100, the USD has lost value relative to the basket since 1973; if it is above 100, the USD is stronger than it was in 1973. The index is particularly useful for traders in the bond, currency and gold markets. For example, a strong USD is usually correlated with falling gold prices, which means that gold traders are very interested in a breakout on the dollar even though they may not be trading it directly.

In the sections below, machine learning techniques are applied to freely available data to predict the price of the U.S. Dollar Index.


## Factors influencing the U.S. Dollar

A broad range of factors (which translate nicely to features in machine learning parlance) affect the U.S. Dollar.
For this project only the following factors are considered, due to the limited availability of free data and time constraints.

###### 1. Daily prices of the U.S. Dollar Index - 
The open, close, high and low prices of the U.S. Dollar Index for every day. The next day's price is unlikely to be far from the current day's price, barring catastrophic events.
###### 2. Federal Funds Rate - 
The interest rate at which banks lend reserve balances to one another overnight, guided by a target set by the Federal Reserve. This influences the money supply in the country and hence the value of the currency.

###### 3. Oil prices - 
Globally, oil prices are quoted in U.S. Dollars. This means that as the USD goes up or down, so does the amount of oil a country can buy or sell based on its U.S. Dollar reserves.

###### 4. Personal Consumption - 
There are 320 million people living in the United States. Their spending habits affect the supply and demand for the U.S. Dollar.

###### 5. Consumer Prices - 
A large share of the money earned by Americans is spent on consumer goods. The Consumer Price Index tracks the prices of those goods and services.

###### 6. Home prices - 
Buying a home is every American's dream. The Home Price Index measures the price of houses across the country.

###### 7. Inflation  - 
Inflation indicates how much a U.S. Dollar will be worth in the future: 1 year from now, 2 years from now, 5 years from now and so on.
The Federal Reserve's monetary policy aims to control inflation, as it is one of the most important factors affecting the overall U.S. economy.

###### 8. Stock Market Prices - 
The S&P 500 Index measures the stock prices of 500 of the largest U.S. companies. All of them do business globally and pay taxes in U.S. Dollars.

###### 9. Social Security - 
Government spending on benefits programmes like Social Security and Medicare accounts for anywhere between 17% and 20% of U.S. GDP (Gross Domestic Product).

###### 10. Employment Numbers - 
The number of people employed across a wide variety of industries. Only some of them are used here, such as Construction, Natural Resources and Mining, Manufacturing and Professional Services.

###### 11. Payroll Data - 
Wages paid out to people working in the non-farm sectors. A large number of U.S. workers get paid via ADP, a payroll processing company which pays workers' salaries on behalf of companies.

###### 12. Employment growth - 
Is the overall economy growing as a result of U.S. monetary and fiscal policies, foreign policies, innovation and political conditions? This can be measured by the employment growth numbers. Healthy growth contributes to a strong dollar.

###### 13. Federal retirement benefits - 
Payments made out to retired federal government workers. The U.S. government is one of the largest employers.

###### 14. Saving rate of the residents of the U.S. - 
A low savings rate tends to mean a country imports more than it exports. Export/import imbalances as a percentage of GDP affect the U.S. Dollar.

###### 15. Home construction - 
The Building Cost Index measures how expensive it is to build new homes across the country. Home construction is a strong measure of the overall performance of the U.S. economy.

###### 16. Return on Investment (ROI) on the S&P - 
Foreign investment in the United States depends on the ROI that the investor receives. A good ROI brings more investors to the United States and helps maintain a "strong" dollar.

  

## Data Sources

Two main data sources were used for this project:

1. https://www.investing.com/quotes/us-dollar-index-historical-data 

2. https://www.quandl.com/

The first site provides the daily USD Index pricing data. The second site provides all the economic data needed for the system described below.


# Building a Trade Recommender System using GraphLab

In the section below, a complete trade recommender system is built and deployed.

The steps involved are the following -

    1. Data acquisition. Data is pulled in from 2 sources - a file containing the US Dollar Index daily close of business data and economic data from Quandl
    2. The acquired data is split into training and test data
    3. The training data is used to train a model
    4. The trained model is then used to make predictions on the test data
    5. The predicted data is filtered for suitable trades. Any data that is outside of a certain range is filtered out
    6. The filtered data is returned to the user to make trades.
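
Steps 2 to 6 can be sketched with plain NumPy. This is a minimal illustration on synthetic data, not the GraphLab pipeline itself: ordinary least squares stands in for the model, and the function name, its parameters and the generated data are all assumptions made for the example.

```python
import numpy as np


def run_pipeline(features, target, train_fraction=0.8, max_abs_change=1.5, seed=0):
    """Split, fit, predict and filter; returns (rmse, kept_predictions)."""
    rng = np.random.default_rng(seed)

    # Step 2: random train/test split.
    is_train = rng.random(len(target)) < train_fraction
    x_train, y_train = features[is_train], target[is_train]
    x_test, y_test = features[~is_train], target[~is_train]

    # Step 3: fit ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)

    # Step 4: predict on the held-out rows and score the predictions.
    predictions = np.column_stack([np.ones(len(x_test)), x_test]) @ coef
    rmse = float(np.sqrt(np.mean((predictions - y_test) ** 2)))

    # Steps 5 and 6: keep only predictions inside the allowed range.
    kept = predictions[np.abs(predictions) <= max_abs_change]
    return rmse, kept


# Synthetic data, for illustration only.
data_rng = np.random.default_rng(42)
features = data_rng.normal(size=(500, 3))
target = features @ np.array([0.5, -0.2, 0.1]) + data_rng.normal(scale=0.1, size=500)

rmse, kept = run_pipeline(features, target)
print(rmse, len(kept))
```

The random split mirrors the approach used later in this notebook. For time-ordered data, a split by date is a common alternative worth considering.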


In [2]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

## Data Acquisition 

##### Function          : 
create_trade_sFrame

##### Input               :  
path - the path of the file that contains the USD Index daily price data

##### Output            : 
SFrame containing all the data needed by the Machine Learning pipeline

##### Description    : 

The function makes use of the GraphLab SFrame object. It reads the daily price data from the file provided in the input variable path. Next it makes Quandl calls to get all the economic data, which are stored in their own DataFrames. All of this data is then joined using the DataFrame methods. Once the DataFrame is created, the calculated columns are computed. The Forward Price is the price on the next business day. The Forward Percent Change is calculated by subtracting the Current Price from the Forward Price, dividing by the Current Price and multiplying by 100; it is used as the target variable. A categorical variable is created based on whether the Forward Percent Change is positive or negative. This is important since it informs the caller whether to make a BUY/SELL decision or a SELL/BUY decision.
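
The join and the calculated columns described above can be illustrated on a tiny example. This sketch uses pandas only, with made-up prices and a made-up monthly series; it uses `shift(-1)` as a compact way to obtain the next day's price, which is an alternative to looping over the rows.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices and a lower-frequency indicator, for illustration only.
daily = pd.DataFrame(
    {"Price": [100.0, 101.0, 100.0, 102.0]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07"]),
)
monthly = pd.DataFrame(
    {"InflationRate": [0.7, 1.4]},
    index=pd.to_datetime(["2015-12-31", "2016-01-06"]),
)

# Outer join on the date index, carry the last known indicator value forward,
# then drop rows that still have gaps (here, the date with no price).
data = daily.join(monthly, how="outer").ffill().dropna()

# Forward price is the next row's price; the last row has no next day.
data["ForwardPrice"] = data["Price"].shift(-1)
data = data.dropna(subset=["ForwardPrice"]).copy()

# Forward percent change and its direction (1 = up, 0 = flat or down).
data["ForwardPerChange"] = (data["ForwardPrice"] - data["Price"]) / data["Price"] * 100.0
data["ForwardPerChangeDirection"] = np.where(data["ForwardPerChange"] > 0, 1, 0)
print(data)
```

Forward-filling is what lets monthly, quarterly and yearly series line up with daily prices: each day simply carries the most recent published value.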

In [38]:
def create_trade_sFrame (path):
    
    #Import all the modules that are going to be used
    import datetime as dt
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from datetime import datetime
    from sklearn import clone
    from sklearn.metrics import accuracy_score
    from matplotlib.colors import ListedColormap
    from scipy.stats import pearsonr
    
    #Imports 
    import quandl
    import graphlab as gl
    
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import LogisticRegression
    from sklearn.linear_model import Lasso
    from sklearn.linear_model import Ridge
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,AdaBoostClassifier)
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import ElasticNet
    from sklearn.metrics import mean_squared_error
    

    
    ##################################################################################################
    # Get Daily USD Index values from CSV
    ##################################################################################################
    df_USDIndexDaily                = pd.read_csv(path,sep=',', delimiter=None, header='infer')
    df_USDIndexDaily.columns        = ['Date','Price','Open','High','Low','Vol','PercentChange']
    
    df_USDIndexDaily       = df_USDIndexDaily.dropna()

    df_USDIndexDaily['Date']  = pd.to_datetime(df_USDIndexDaily['Date'])
    df_USDIndexDaily['Price'] = df_USDIndexDaily['Price'].astype('float64')
    df_USDIndexDaily['Open']  = df_USDIndexDaily['Open'].astype('float64') 
    df_USDIndexDaily['High']  = df_USDIndexDaily['High'].astype('float64') 
    df_USDIndexDaily['Low']   = df_USDIndexDaily['Low'].astype('float64') 
    df_USDIndexDaily['Vol']   = df_USDIndexDaily['Vol'].astype('float64')
    df_USDIndexDaily.set_index(pd.DatetimeIndex(df_USDIndexDaily['Date']), inplace=True)

    #Daily Data 
    qd_d_oil_barrel_price      = quandl.get("EIA/PET_RWTC_D" , authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1989-12-01", end_date="2016-12-31")
    qd_d_fed_fund_rate         = quandl.get("FED/RIFSPFF_N_B", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1989-12-01", end_date="2016-12-31")
    # Monthly Data
    qd_m_social_security       = quandl.get("SOCSEC/RETWORK", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1989-12-01", end_date="2016-12-31")
    qd_m_personal_consumption  = quandl.get("BEA/T20806_M", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_m_SnP_Composite         = quandl.get("YALE/SPCOMP", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_m_CPI                   = quandl.get("YALE/CPIQ", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_m_real_home_price_index = quandl.get("YALE/RHPI", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_m_inflation             = quandl.get("RATEINF/INFLATION_USA", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")

    # Not enough data for this ( 15 years only)
    qd_m_emp_ind               = quandl.get("ADP/EMPL_IND", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_m_emp_ind.columns = ['Construction','NaturalResourcesAndMining', 'Manufacturing', 'ProfBusinessServices','ProfessionalServices']
    # Not enough data for this ( 15 years only)
    qd_m_emp_nonfarm_payroll   = quandl.get("ADP/EMPL_NONFARM_PRI", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    # Not enough data for this ( 15 years only)
    qd_m_emp_growth            = quandl.get("ADP/EMPL_SEC", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")


    #Quarterly
    qd_q_fed_retirement      = quandl.get("FED/FU346403033_Q", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_q_import              = quandl.get("BEA/T40205_Q", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_q_wages_0             = quandl.get("BEA/T20200A_Q", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_q_wages_1             = quandl.get("BEA/T20200B_Q", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")


    #Yearly
    qd_y_savings_by_sector     = quandl.get("BEA/T50100_A", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_y_SnP_ROI               = quandl.get("YALE/SP_RSPC", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
    qd_y_building_cost_index   = quandl.get("YALE/RBCI",    authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")


    #Rename Columns 
    qd_d_oil_barrel_price.rename(columns={'Value': 'OilBarrelPriceUSD'}, inplace=True)
    qd_d_fed_fund_rate.rename(columns={'Value': 'FedFundsAnnualInterestRate'}, inplace=True)
    qd_m_social_security.rename(columns={'Total Number'  : 'Number_SS_Recepients'       , 'Total Avg amount' : 'AmountUSD'
                                     ,'Male Number'  : 'Number_Male_SS_Recepients'  ,  'Male Avg Amount' : 'MaleAmountUSD'
                                     ,'Female Number': 'Number_Female_SS_Recepients', 'Female Avg Amount': 'FemaleAmountUSD'}, inplace=True)

    qd_m_personal_consumption.columns = ['PCE_USD'
                                     ,'Goods_USD'
                                     ,'Durable_Goods_USD'
                                     ,'NonDurable_Goods_USD' 
                                     ,'Services_USD' 
                                     ,'PCEExcludingFoodAndEnergy_USD' 
                                     ,'Food_USD'
                                     ,'EnergyGoodsAndServices_USD'  
                                     ,'MarketBasedPCE_USD'          
                                     ,'MarketBasedPCEExcludingFoodAndEnergy_USD']

    qd_m_real_home_price_index.columns = ['HomePriceIndex']
    qd_m_inflation.columns             = ['InflationRate']
    qd_q_fed_retirement.columns        = ['FundsPaidInMillionsUSD']
    qd_q_wages_0.columns               = ['WagesAndSalaries','PrivateIndustries','GoodsProducingIndustries','Manufacturing','DistributiveIndustries','ServiceIndustries','Government']
    qd_q_wages_1.columns               = ['WagesAndSalaries','PrivateIndustries','GoodsProducingIndustries','Manufacturing','ServicesProducingIndustries','TradeTransportationAndUtilities'
                                      ,'OtherServicesProducingIndustries','Government']
    qd_y_SnP_ROI.columns               = ['SnPROI']
    qd_y_building_cost_index.columns   = ['CostIndex', 'PopulationInMillions','LongRate']



    dataIndex      = df_USDIndexDaily.join(qd_m_social_security,lsuffix='_index', rsuffix='_ss',  how='outer')
    dataIndex      = dataIndex.join(qd_d_oil_barrel_price,lsuffix='_index', rsuffix='_oil',  how='outer')
    dataIndex      = dataIndex.join(qd_d_fed_fund_rate,lsuffix='_index', rsuffix='_ffr',  how='outer')
    dataIndex      = dataIndex.join(qd_m_personal_consumption,lsuffix='_index', rsuffix='_percon',  how='outer')
    dataIndex      = dataIndex.join(qd_m_SnP_Composite,lsuffix='_index', rsuffix='_SnP',  how='outer')
    dataIndex      = dataIndex.join(qd_m_CPI,lsuffix='_index', rsuffix='_CPI',  how='outer')
    dataIndex      = dataIndex.join(qd_m_real_home_price_index,lsuffix='_index', rsuffix='_HPI',  how='outer')
    dataIndex      = dataIndex.join(qd_m_inflation,lsuffix='_index', rsuffix='_Infl',  how='outer')
    dataIndex      = dataIndex.join(qd_m_emp_ind,lsuffix='_index', rsuffix='_EmpInd',  how='outer')
    dataIndex      = dataIndex.join(qd_m_emp_nonfarm_payroll,lsuffix='_index', rsuffix='_NonFarmPayRoll',  how='outer')
    dataIndex      = dataIndex.join(qd_m_emp_growth,lsuffix='_index', rsuffix='_EmpGrwth',  how='outer')
    dataIndex      = dataIndex.join(qd_q_fed_retirement,lsuffix='_index', rsuffix='_FedRet',  how='outer')
    dataIndex      = dataIndex.join(qd_y_SnP_ROI,lsuffix='_index', rsuffix='_SnPROI',  how='outer')
    dataIndex      = dataIndex.join(qd_y_building_cost_index,lsuffix='_index', rsuffix='_BldgCostInd',  how='outer')

    dataIndex      = dataIndex.ffill()
    dataIndex      = dataIndex.dropna()

    # Create extra columns
    dataIndex['RowNum']            = range(len(dataIndex))
    dataIndex['ForwardPerChange']  = range(len(dataIndex))
    dataIndex['ForwardPrice']      = range(len(dataIndex))
    dataIndex.set_index(dataIndex['RowNum'], inplace=True)

    # Get the price for the next day. This will be used as the target.
    # NaN marks the last row, which has no next-day price.
    fPrice = np.full(len(dataIndex.index), np.nan)

    for idx, row in dataIndex.iterrows() :
        if idx < len(dataIndex.index)-1 :
            fp = dataIndex.iloc[idx+1].Price
            fPrice[idx] = fp

    dataIndex['ForwardPrice'] = fPrice
    dataIndex['ForwardPerChange'] = (dataIndex['ForwardPrice'] - dataIndex['Price']) / dataIndex['Price']  * 100.00
    dataIndex['ForwardPerChangeDirection']   = np.where(dataIndex['ForwardPerChange'] > 0 , 1 , 0)

    # Drop the last row, whose forward price is unknown
    dataIndex = dataIndex.dropna(subset=['ForwardPrice'])

    dataIndex.set_index(pd.DatetimeIndex(dataIndex['Date']), inplace=True)
    dataIndex_ts   = gl.TimeSeries(dataIndex)
    sfIndex        = dataIndex_ts.to_sframe()

    return sfIndex


## Model Training

In [39]:
def train_trade_model(data):
    """
    Takes an SFrame as input, uses it to train a regression model
    and returns the trained model.
    """
    import graphlab as gl
        
    #Select the best model based on your data.
    model = gl.regression.create(data, target='ForwardPerChange'
                                                     ,features=['Price','Open', 'High','Low','PercentChange', 'Number_SS_Recepients' , 'AmountUSD', 'Number_Male_SS_Recepients', 'MaleAmountUSD'
                                                               ,'Number_Female_SS_Recepients','FemaleAmountUSD','OilBarrelPriceUSD'
                                                               ,'FedFundsAnnualInterestRate','PCE_USD','Goods_USD','Durable_Goods_USD','NonDurable_Goods_USD','Services_USD','PCEExcludingFoodAndEnergy_USD'
                                                               ,'Food_USD','EnergyGoodsAndServices_USD','MarketBasedPCE_USD','MarketBasedPCEExcludingFoodAndEnergy_USD','S&P Composite','Dividend'
                                                               ,'Earnings','CPI_index','Long Interest Rate','Real Price','Real Dividend','Real Earnings','Cyclically Adjusted PE Ratio','CPI_CPI'
                                                               ,'HomePriceIndex','InflationRate','Construction','NaturalResourcesAndMining','Manufacturing','ProfBusinessServices','ProfessionalServices'
                                                               ,'1-19','20-49','1-49','50-499','500+','500-999','1000+','Total private','Goods producing','Service providing','FundsPaidInMillionsUSD','SnPROI'
                                                               ,'CostIndex','PopulationInMillions','LongRate'
                                                               ])
    
    
    return model

## Making Recommendations

In [40]:
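def recommend_trades(model, data):
    # Import locally, as train_trade_model does, so the function does not rely on a global import
    import graphlab as gl
    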
    predictions = model.predict(data)
    results     = model.evaluate(data)
    
    sf = gl.SFrame()
    
    sf= sf.add_column(data['Date']             , name='Date')
    sf= sf.add_column(predictions              , name='PredictForwardPerChange')
    sf= sf.add_column(data['ForwardPerChange'] , name='ActualForwardPerChange' )
    sf= sf.add_column(data['Price']            , name= 'Price')
    
    # Keep the rows where the absolute predicted percent change is at most 1.5 %
    sfTrade = sf[( abs(sf['PredictForwardPerChange']) <= 1.5)  ] 
    
    return results, sfTrade

## Building a Machine Learning Pipeline in GraphLab

In [41]:
def trade_workflow(path):
    
    # Get Data 
    data = create_trade_sFrame(path)

    # Make a train-test split
    train_data, test_data = data.random_split(0.8)
    
    # Train model.
    model = train_trade_model(train_data)

    # Make recommendations
    results, sfTrade = recommend_trades(model, test_data)
    
    print(results)

    # Return the SFrame of recommendations.

    return sfTrade

## Creating and starting a job in GraphLab (on the local machine)

In [42]:
import graphlab as gl

# Deploy the job locally 
job_local = gl.deploy.job.create(trade_workflow, path = 'D:/SpringBoard/Capstone2/Data/US Dollar Index Historical Data_17Yr.csv')


[INFO] graphlab.deploy.job: Validating job.
[INFO] graphlab.deploy.job: Creating a LocalAsync environment called 'async'.
[INFO] graphlab.deploy.job: Validation complete. Job: 'trade_workflow-Sep-16-2018-11-39-07' ready for execution.
[INFO] graphlab.deploy.job: Job: 'trade_workflow-Sep-16-2018-11-39-07' scheduled.


## Getting the status of a job in GraphLab (on the local machine)

In [46]:
# get status immediately after creating this job.
job_local.get_status()

u'Completed'

## Displaying the result of the job run in the GraphLab environment (on the local machine)

In [47]:
# get the results
job_local.get_results()

Date,PredictForwardPerChange,ActualForwardPerChange,Price
2005-01-31 00:00:00+00:00,0.106586962938,-0.179425837321,83.6
2005-02-16 00:00:00+00:00,0.106586962938,-0.239034301422,83.67
2005-02-22 00:00:00+00:00,0.0733024179935,0.327630141973,82.41
2005-03-07 00:00:00+00:00,0.0418294370174,-0.978379031284,82.79
2005-03-21 00:00:00+00:00,0.08510440588,0.579150579151,82.88
2005-03-23 00:00:00+00:00,0.149861931801,0.154761904762,84.0
2005-03-30 00:00:00+00:00,0.0779440701008,-0.272867481315,84.29
2005-04-14 00:00:00+00:00,0.0706679224968,-0.517647058824,85.0
2005-04-15 00:00:00+00:00,0.0733024179935,-0.662251655629,84.56
2005-04-29 00:00:00+00:00,0.0273929536343,0.0,84.43


###### ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



# Code Walkthrough, Visualization and Detailed Analysis



The following section displays the commands run individually (unpacking the code in the functions). This was done before the functions were written. The visualization outputs and evaluation metrics were examined in building the recommender system described above.




##### All the imports ... 

In [3]:
#Import all the modules that are going to be used
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
from sklearn import clone
from sklearn.metrics import accuracy_score
from matplotlib.colors import ListedColormap
from scipy.stats import pearsonr

In [4]:
#Imports 
import quandl
import graphlab as gl

In [5]:
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,AdaBoostClassifier)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

In [6]:
#Pandas Options 
# 'display.height' is deprecated in pandas, so it is no longer set here
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)



##### Reading from a file ..

In [7]:
##################################################################################################
# Get Daily USD Index values from CSV
##################################################################################################
df_USDIndexDaily                = pd.read_csv('D:/SpringBoard/Capstone2/Data/US Dollar Index Historical Data_17Yr.csv',sep=',', delimiter=None, header='infer')
df_USDIndexDaily.columns        = ['Date','Price','Open','High','Low','Vol','PercentChange']
df_USDIndexFutureDaily          = pd.read_csv('D:/SpringBoard/Capstone2/Data/US Dollar Index Futures Historical Data_17Yr.csv'
                                              ,sep=',', delimiter=None, header='infer')
df_USDIndexFutureDaily.columns  = ['Date','Price','Open','High','Low','Vol','PercentChange'] 


In [8]:
df_USDIndexDaily       = df_USDIndexDaily.dropna()
df_USDIndexFutureDaily = df_USDIndexFutureDaily.dropna()

In [9]:
df_USDIndexDaily['Date']  = pd.to_datetime(df_USDIndexDaily['Date'])
df_USDIndexDaily['Price'] = df_USDIndexDaily['Price'].astype('float64')
df_USDIndexDaily['Open']  = df_USDIndexDaily['Open'].astype('float64') 
df_USDIndexDaily['High']  = df_USDIndexDaily['High'].astype('float64') 
df_USDIndexDaily['Low']   = df_USDIndexDaily['Low'].astype('float64') 
df_USDIndexDaily['Vol']   = df_USDIndexDaily['Vol'].astype('float64')
df_USDIndexDaily.set_index(pd.DatetimeIndex(df_USDIndexDaily['Date']), inplace=True)

##### All the Quandl calls to get economic data ..

In [10]:
#Daily Data 
qd_d_oil_barrel_price      = quandl.get("EIA/PET_RWTC_D" , authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1989-12-01", end_date="2016-12-31")
qd_d_fed_fund_rate         = quandl.get("FED/RIFSPFF_N_B", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1989-12-01", end_date="2016-12-31")
# Monthly Data
qd_m_social_security       = quandl.get("SOCSEC/RETWORK", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1989-12-01", end_date="2016-12-31")
qd_m_personal_consumption  = quandl.get("BEA/T20806_M", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_m_SnP_Composite         = quandl.get("YALE/SPCOMP", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_m_CPI                   = quandl.get("YALE/CPIQ", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_m_real_home_price_index = quandl.get("YALE/RHPI", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_m_inflation             = quandl.get("RATEINF/INFLATION_USA", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")

# Not enough data for this ( 15 years only)
qd_m_emp_ind               = quandl.get("ADP/EMPL_IND", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_m_emp_ind.columns = ['Construction','NaturalResourcesAndMining', 'Manufacturing', 'ProfBusinessServices','ProfessionalServices']
# Not enough data for this ( 15 years only)
qd_m_emp_nonfarm_payroll   = quandl.get("ADP/EMPL_NONFARM_PRI", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
# Not enough data for this ( 15 years only)
qd_m_emp_growth            = quandl.get("ADP/EMPL_SEC", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")


In [11]:
#Quarterly
qd_q_fed_retirement      = quandl.get("FED/FU346403033_Q", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_q_import              = quandl.get("BEA/T40205_Q", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_q_wages_0             = quandl.get("BEA/T20200A_Q", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_q_wages_1             = quandl.get("BEA/T20200B_Q", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")

In [12]:
#Yearly
qd_y_savings_by_sector     = quandl.get("BEA/T50100_A", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_y_SnP_ROI               = quandl.get("YALE/SP_RSPC", authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
qd_y_building_cost_index   = quandl.get("YALE/RBCI",    authtoken="BQtsSEErwHxjuXLVL4Wd",start_date="1990-01-01", end_date="2016-12-31")
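
Every Quandl call above repeats the same API token and nearly the same date range. One way to cut the repetition, and to keep the token out of the notebook, is to build the shared keyword arguments once. The helper name `quandl_params` and the environment variable `QUANDL_API_KEY` are assumptions made for this sketch; the keyword names `authtoken`, `start_date` and `end_date` are the ones already used in the calls above.

```python
import os


def quandl_params(start_date="1990-01-01", end_date="2016-12-31"):
    """Build the keyword arguments shared by the quandl.get calls.

    QUANDL_API_KEY is a hypothetical environment variable name.
    """
    token = os.environ.get("QUANDL_API_KEY")
    if not token:
        raise RuntimeError("Set the QUANDL_API_KEY environment variable first")
    return {"authtoken": token, "start_date": start_date, "end_date": end_date}


# Intended usage (requires the quandl package and network access):
#   import quandl
#   qd_m_inflation = quandl.get("RATEINF/INFLATION_USA", **quandl_params())
#   qd_d_oil = quandl.get("EIA/PET_RWTC_D", **quandl_params(start_date="1989-12-01"))
```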

In [13]:
#Rename Columns 
qd_d_oil_barrel_price.rename(columns={'Value': 'OilBarrelPriceUSD'}, inplace=True)
qd_d_fed_fund_rate.rename(columns={'Value': 'FedFundsAnnualInterestRate'}, inplace=True)
qd_m_social_security.rename(columns={'Total Number'  : 'Number_SS_Recepients'       , 'Total Avg amount' : 'AmountUSD'
                                     ,'Male Number'  : 'Number_Male_SS_Recepients'  ,  'Male Avg Amount' : 'MaleAmountUSD'
                                     ,'Female Number': 'Number_Female_SS_Recepients', 'Female Avg Amount': 'FemaleAmountUSD'}, inplace=True)

qd_m_personal_consumption.columns = ['PCE_USD'
                                     ,'Goods_USD'
                                     ,'Durable_Goods_USD'
                                     ,'NonDurable_Goods_USD' 
                                     ,'Services_USD' 
                                     ,'PCEExcludingFoodAndEnergy_USD' 
                                     ,'Food_USD'
                                     ,'EnergyGoodsAndServices_USD'  
                                     ,'MarketBasedPCE_USD'          
                                     ,'MarketBasedPCEExcludingFoodAndEnergy_USD']

qd_m_real_home_price_index.columns = ['HomePriceIndex']
qd_m_inflation.columns             = ['InflationRate']
qd_q_fed_retirement.columns        = ['FundsPaidInMillionsUSD']
qd_q_wages_0.columns               = ['WagesAndSalaries','PrivateIndustries','GoodsProducingIndustries','Manufacturing','DistributiveIndustries','ServiceIndustries','Government']
qd_q_wages_1.columns               = ['WagesAndSalaries','PrivateIndustries','GoodsProducingIndustries','Manufacturing','ServicesProducingIndustries','TradeTransportationAndUtilities'
                                      ,'OtherServicesProducingIndustries','Government']
qd_y_SnP_ROI.columns               = ['SnPROI']
qd_y_building_cost_index.columns   = ['CostIndex', 'PopulationInMillions','LongRate']

##### Join all the data on the Date column to get one DataFrame

In [14]:
dataIndex      = df_USDIndexDaily.join(qd_m_social_security,lsuffix='_index', rsuffix='_ss',  how='outer')
dataIndex      = dataIndex.join(qd_d_oil_barrel_price,lsuffix='_index', rsuffix='_oil',  how='outer')
dataIndex      = dataIndex.join(qd_d_fed_fund_rate,lsuffix='_index', rsuffix='_ffr',  how='outer')
dataIndex      = dataIndex.join(qd_m_personal_consumption,lsuffix='_index', rsuffix='_percon',  how='outer')
dataIndex      = dataIndex.join(qd_m_SnP_Composite,lsuffix='_index', rsuffix='_SnP',  how='outer')
dataIndex      = dataIndex.join(qd_m_CPI,lsuffix='_index', rsuffix='_CPI',  how='outer')
dataIndex      = dataIndex.join(qd_m_real_home_price_index,lsuffix='_index', rsuffix='_HPI',  how='outer')
dataIndex      = dataIndex.join(qd_m_inflation,lsuffix='_index', rsuffix='_Infl',  how='outer')
dataIndex      = dataIndex.join(qd_m_emp_ind,lsuffix='_index', rsuffix='_EmpInd',  how='outer')
dataIndex      = dataIndex.join(qd_m_emp_nonfarm_payroll,lsuffix='_index', rsuffix='_NonFarmPayRoll',  how='outer')
dataIndex      = dataIndex.join(qd_m_emp_growth,lsuffix='_index', rsuffix='_EmpGrwth',  how='outer')
dataIndex      = dataIndex.join(qd_q_fed_retirement,lsuffix='_index', rsuffix='_FedRet',  how='outer')
dataIndex      = dataIndex.join(qd_y_SnP_ROI,lsuffix='_index', rsuffix='_SnPROI',  how='outer')
dataIndex      = dataIndex.join(qd_y_building_cost_index,lsuffix='_index', rsuffix='_BldgCostInd',  how='outer')

dataIndex      = dataIndex.ffill()
dataIndex      = dataIndex.dropna()

# Create Extra columns 
dataIndex['RowNum']            = range(len(dataIndex))
dataIndex['ForwardPerChange']  = range(len(dataIndex))
dataIndex['ForwardPrice']      = range(len(dataIndex))
dataIndex.set_index(dataIndex['RowNum'], inplace=True)

# Get the price for the next day. This will be used as the target.
# shift(-1) aligns each row with the next row's Price. The last row has no
# next-day price, so it is dropped.
dataIndex['ForwardPrice'] = dataIndex['Price'].shift(-1)
dataIndex = dataIndex.iloc[:-1].copy()
dataIndex['ForwardPerChange'] = (dataIndex['ForwardPrice'] - dataIndex['Price']) / dataIndex['Price']  * 100.00  
dataIndex['ForwardPerChangeDirection']   = np.where(dataIndex['ForwardPerChange'] > 0 , 1 , 0)

dataIndex.set_index(pd.DatetimeIndex(dataIndex['Date']), inplace=True)
dataIndex_ts   = gl.TimeSeries(dataIndex)
sfIndex        = dataIndex_ts.to_sframe()


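The alignment and target construction in the cell above can be illustrated on toy data. This is a minimal sketch, not the notebook's code: the frame names `daily` and `monthly`, the column `Rate`, and all values are made up for illustration, and it uses `shift(-1)` to obtain the next-day price.

```python
import numpy as np
import pandas as pd

# Toy daily price series (values are made up for illustration).
daily = pd.DataFrame(
    {"Price": [100.0, 101.0, 99.99, 99.99]},
    index=pd.to_datetime(["2005-01-03", "2005-01-04", "2005-01-05", "2005-01-06"]),
)
# Toy monthly indicator, reported on a date that is not a trading day.
monthly = pd.DataFrame({"Rate": [2.5]}, index=pd.to_datetime(["2005-01-01"]))

# The outer join keeps every date from both frames; ffill carries the last
# known monthly value forward onto the daily rows; dropna removes rows that
# still have gaps (here the 2005-01-01 row, which has no Price).
merged = daily.join(monthly, how="outer").ffill().dropna()

# Next-day price as the target: shift(-1) pulls the following row's Price up.
merged["ForwardPrice"] = merged["Price"].shift(-1)
merged["ForwardPerChange"] = (
    (merged["ForwardPrice"] - merged["Price"]) / merged["Price"] * 100.0
)
merged["ForwardPerChangeDirection"] = np.where(merged["ForwardPerChange"] > 0, 1, 0)

# The last row has no next day, so its target is NaN and the row is dropped.
merged = merged.dropna(subset=["ForwardPrice"])

print(merged)
```

Forward-filling means every daily row between two monthly releases carries the same monthly value, which is worth keeping in mind when reading feature importances.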

#### Let's use GraphLab's EDA capabilities to explore the data

In [15]:
gl.canvas.set_target('ipynb', port=None)
sfIndex.show()
sfIndex.column_names()

['index',
 'Date',
 'Price',
 'Open',
 'High',
 'Low',
 'Vol',
 'PercentChange',
 'Number_SS_Recepients',
 'AmountUSD',
 'Number_Male_SS_Recepients',
 'MaleAmountUSD',
 'Number_Female_SS_Recepients',
 'FemaleAmountUSD',
 'OilBarrelPriceUSD',
 'FedFundsAnnualInterestRate',
 'PCE_USD',
 'Goods_USD',
 'Durable_Goods_USD',
 'NonDurable_Goods_USD',
 'Services_USD',
 'PCEExcludingFoodAndEnergy_USD',
 'Food_USD',
 'EnergyGoodsAndServices_USD',
 'MarketBasedPCE_USD',
 'MarketBasedPCEExcludingFoodAndEnergy_USD',
 'S&P Composite',
 'Dividend',
 'Earnings',
 'CPI_index',
 'Long Interest Rate',
 'Real Price',
 'Real Dividend',
 'Real Earnings',
 'Cyclically Adjusted PE Ratio',
 'CPI_CPI',
 'HomePriceIndex',
 'InflationRate',
 'Construction',
 'NaturalResourcesAndMining',
 'Manufacturing',
 'ProfBusinessServices',
 'ProfessionalServices',
 '1-19',
 '20-49',
 '1-49',
 '50-499',
 '500+',
 '500-999',
 '1000+',
 'Total private',
 'Goods producing',
 'Service providing',
 'FundsPaidInMillionsUSD',
 'SnP

##### Split the data into train and test data ...

In [27]:
# Make a train-test split
train_data, test_data = sfIndex.random_split(0.8)

##### Use GraphLab's regression toolkit, which automatically selects a model given the feature set and input data

In [28]:
#Select the best model based on your data.
model = gl.regression.create(train_data, target='ForwardPerChange'
                                                     ,features=['Price','Open', 'High','Low','PercentChange', 'Number_SS_Recepients' , 'AmountUSD', 'Number_Male_SS_Recepients', 'MaleAmountUSD'
                                                               ,'Number_Female_SS_Recepients','FemaleAmountUSD','OilBarrelPriceUSD'
                                                               ,'FedFundsAnnualInterestRate','PCE_USD','Goods_USD','Durable_Goods_USD','NonDurable_Goods_USD','Services_USD','PCEExcludingFoodAndEnergy_USD'
                                                               ,'Food_USD','EnergyGoodsAndServices_USD','MarketBasedPCE_USD','MarketBasedPCEExcludingFoodAndEnergy_USD','S&P Composite','Dividend'
                                                               ,'Earnings','CPI_index','Long Interest Rate','Real Price','Real Dividend','Real Earnings','Cyclically Adjusted PE Ratio','CPI_CPI'
                                                               ,'HomePriceIndex','InflationRate','Construction','NaturalResourcesAndMining','Manufacturing','ProfBusinessServices','ProfessionalServices'
                                                               ,'1-19','20-49','1-49','50-499','500+','500-999','1000+','Total private','Goods producing','Service providing','FundsPaidInMillionsUSD','SnPROI'
                                                               ,'CostIndex','PopulationInMillions','LongRate'  
                                                               ])

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



### In this case it selected the Gradient Boosted Decision Trees model for regression

In [29]:
model

Class                          : BoostedTreesRegression

Schema
------
Number of examples             : 2585
Number of feature columns      : 55
Number of unpacked features    : 55

Settings
--------
Number of trees                : 10
Max tree depth                 : 6
Training time (sec)            : 0.0469
Training rmse                  : 0.5994
Validation rmse                : 0.4986
Training max_error             : 19.7859
Validation max_error           : 2.2403

##### Make predictions and evaluate the model ...

In [30]:
#Make predictions and evaluate results.
predictions = model.predict(test_data)
results     = model.evaluate(test_data)
print(results)

{'max_error': 2.534599542617798, 'rmse': 0.5210959975858873}


##### RMSE and Max Error
The test RMSE of 0.52 is small relative to the spread of the target: the forward percent price change has a standard deviation of about 1.7, and the prices themselves have a mean of about 84, so this is a reasonable RMSE to work with.
The reasoning is that with a low enough RMSE (compared with something like 3), predictions will tend to fall within the 1.5% Day 1 to Day 2 price change.
This is the maximum risk that the Recommender allows the user to take: predictions with a forward price change of more than 1.5% are filtered out of the final data set that is returned.
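For reference, the two metrics reported by `model.evaluate` can be written out directly. The sketch below uses NumPy rather than GraphLab; the function name `rmse_and_max_error` and the toy arrays are illustrative assumptions, and it assumes the usual definitions (square root of the mean squared error, and largest absolute error).

```python
import numpy as np

def rmse_and_max_error(actual, predicted):
    """Return (rmse, max_error) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    max_error = float(np.max(np.abs(errors)))
    return rmse, max_error

# Toy values, not taken from the notebook's data.
actual = [0.5, -0.5, 1.0, 0.0]
predicted = [0.0, 0.0, 0.0, 0.0]
rmse, max_error = rmse_and_max_error(actual, predicted)
print(rmse, max_error)
```

A useful baseline is a predictor that always outputs the mean of the target: its RMSE equals the target's population standard deviation, so the RMSE is best read against the standard deviation of the forward percent change rather than against zero.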


In [31]:
##### Save to Disk
#sfIndex.save('D:/SpringBoard/Capstone2/Data/sframe')
#sfIndex =  gl.load_sframe('D:/SpringBoard/Capstone2/Data/sframe')

In [32]:
sf = gl.SFrame()

In [33]:
sf= sf.add_column(test_data['Date']             , name='Date')
sf= sf.add_column(predictions                   , name='PredictForwardPerChange')
sf= sf.add_column(test_data['ForwardPerChange'] , name='ActualForwardPerChange' )
sf= sf.add_column(test_data['Price']            , name= 'Price')

#### Filter out the predicted forward percent price changes whose magnitude is greater than 1.5%, since a 1.5% fluctuation is the maximum that the system will allow.

In [34]:
# Keep the rows where the absolute predicted percent change is at most 1.5%
sfTrade = sf[( abs(sf['PredictForwardPerChange']) <= 1.5)  ] 

### Predicted vs Actual Scatter Plot

From the scatter plot below, the predicted and actual forward percent price changes mostly point in the same direction, with a few exceptions near the origin (0,0). This matters because if the predicted percent price change is opposite in sign to the actual percent price change, the trade results in a loss for a trader using the recommender. If the signs of the predicted and actual values match, the trade does not lose money before costs, even if the profit is smaller than predicted.

In [35]:
gl.canvas.set_target('ipynb', port=None)
sfTrade.show(view="Scatter Plot" , x='ActualForwardPerChange', y='PredictForwardPerChange')

#### Heatmap of the Actuals vs Predicted

In [36]:
gl.canvas.set_target('ipynb', port=None)
sfTrade.show(view="Heat Map" , x='ActualForwardPerChange', y='PredictForwardPerChange')

#### A sample of the predicted data

In [37]:
sfTrade

Date,PredictForwardPerChange,ActualForwardPerChange,Price
2005-02-04 00:00:00+00:00,0.0156857669353,0.770233439981,84.39
2005-02-09 00:00:00+00:00,0.0706386864185,-0.564573041637,85.02
2005-02-22 00:00:00+00:00,0.330184578896,0.327630141973,82.41
2005-02-24 00:00:00+00:00,0.0156857669353,-0.265444015444,82.88
2005-03-01 00:00:00+00:00,0.0156857669353,0.374531835206,82.77
2005-03-10 00:00:00+00:00,0.265852749348,-0.183891136447,81.57
2005-03-23 00:00:00+00:00,0.0821212232113,0.154761904762,84.0
2005-03-30 00:00:00+00:00,0.123004704714,-0.272867481315,84.29
2005-04-01 00:00:00+00:00,0.0156857669353,0.414544593154,84.43
2005-04-12 00:00:00+00:00,0.0156857669353,0.0,84.42
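
Sign agreement between predicted and actual changes can also be counted rather than read off a plot. The sketch below does this for the ten rows shown above (values rounded to four decimals); the variable names are illustrative, and ten rows are far too few to characterise the model.

```python
import numpy as np

# (PredictForwardPerChange, ActualForwardPerChange) for the ten sample rows,
# rounded to four decimals.
rows = [
    (0.0157, 0.7702),
    (0.0706, -0.5646),
    (0.3302, 0.3276),
    (0.0157, -0.2654),
    (0.0157, 0.3745),
    (0.2659, -0.1839),
    (0.0821, 0.1548),
    (0.1230, -0.2729),
    (0.0157, 0.4145),
    (0.0157, 0.0),
]
predicted, actual = np.array(rows).T

# The product of the signs is +1 when directions agree, -1 when they differ,
# and 0 when either value is exactly zero.
sign_product = np.sign(predicted) * np.sign(actual)
same_sign = int(np.sum(sign_product > 0))
opposite_sign = int(np.sum(sign_product < 0))
flat = int(np.sum(actual == 0))

print(same_sign, opposite_sign, flat)
```

On these ten rows the check gives five agreements, four disagreements and one unchanged day, so the same count is worth running on the full `sfTrade` data before relying on the direction of the predictions.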


# Conclusions and further work.

##### Conclusions :
Currency trading, while very risky because of the variety of factors that affect it, can be made more tractable with machine learning algorithms, as the simple experiment above suggests.

However, it is tempting, and probably naive, to think that huge profits can be made. The reasons are outlined below:

A) Even if the "PredictForwardPerChange" is in the right direction, it might not be large enough to profit from once frictional costs (brokerage, cost of capital, etc.) are taken into account.

B) In the above experiment, the RMSE was low (0.52), which suited both the financial instrument being traded and the risk tolerance in terms of percent change (1.5%).
   To elaborate: RMSE is the square root of the mean squared deviation between the predicted and actual values, so it indicates that a prediction is typically off by about 0.52 percentage points (the target variable is a percentage). Since the trading strategy accepts a loss of up to 1.5% and filters out predictions above that size, a typical error of this magnitude stays within the accepted risk. Note that in the input dataset the target has a standard deviation (STD) of 1.783, a mean of -0.026 and a median of 0.

C) A substantial amount of capital must be employed to get even a small gain: if the price is 100 USD, then 100 units cost 10,000 USD, and a maximum 1.5% gain yields 150 USD.
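
The arithmetic in point C, combined with the frictional costs from point A, can be sketched as follows. The function name `trade_gain_usd` and the 20 USD cost figure are hypothetical and chosen only for illustration.

```python
def trade_gain_usd(price_usd, units, percent_move, cost_usd=0.0):
    """Return (capital, net gain) in USD for a long position after a percent move."""
    capital = price_usd * units
    gross = capital * percent_move / 100.0
    return capital, gross - cost_usd

# The figures from point C: 100 units at 100 USD with a 1.5% move.
capital, gain = trade_gain_usd(price_usd=100.0, units=100, percent_move=1.5)
print(capital, gain)

# The same trade with a hypothetical 20 USD of total frictional cost
# (this cost is an assumption, not a figure from the text).
_, net_gain = trade_gain_usd(100.0, 100, 1.5, cost_usd=20.0)
print(net_gain)
```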

##### Further work :
Enhancements could be made to actually run the trading system with backtesting, calculating the profit/loss in this hypothetical scenario.
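
One possible starting point for such a backtest is sketched below. It is not part of the experiment above: the function name `backtest`, the sign-following rule (long when the prediction is positive, short when negative), the flat per-trade cost, and the toy inputs are all assumptions made for illustration.

```python
import numpy as np

def backtest(predicted_pct, actual_pct, cost_pct=0.0, max_abs_pct=1.5):
    """Per-trade and cumulative percent P/L for a sign-following strategy."""
    predicted_pct = np.asarray(predicted_pct, dtype=float)
    actual_pct = np.asarray(actual_pct, dtype=float)
    # Trade only when the prediction is non-zero and within the risk limit.
    traded = (predicted_pct != 0) & (np.abs(predicted_pct) <= max_abs_pct)
    # +1 = long, -1 = short, 0 = no position.
    position = np.where(traded, np.sign(predicted_pct), 0.0)
    # Percent P/L per trade, net of a flat cost charged on each executed trade.
    pnl = position * actual_pct - traded * cost_pct
    # Simple (non-compounded) running sum of the per-trade percentages.
    return pnl, np.cumsum(pnl)

# Toy inputs, made up for illustration.
pnl, equity = backtest([0.2, -0.3, 2.0, 0.1], [0.5, 0.4, 1.0, -0.2], cost_pct=0.01)
print(pnl, equity[-1])
```

A realistic backtest would also need to respect time order (fitting only on data available before each trade) and to model position sizing, which this sketch leaves out.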