# Facebook Prophet
* [Docs for R and Python](https://facebook.github.io/prophet/docs/installation.html)
* [Paper](https://peerj.com/preprints/3190.pdf)

## Abstract
Forecasting is a common data science task that helps organizations with capacity
planning, goal setting, and anomaly detection. Despite its importance, there are
serious challenges associated with producing reliable and high quality forecasts –
especially when there are a variety of time series and analysts with expertise in
time series modeling are relatively rare. To address these challenges, we describe
a practical approach to forecasting “at scale” that combines configurable models
with analyst-in-the-loop performance analysis. We propose a modular regression
model with interpretable parameters that can be intuitively adjusted by analysts
with domain knowledge about the time series. We describe performance analyses
to compare and evaluate forecasting procedures, and automatically flag forecasts for
manual review and adjustment. Tools that help analysts to use their expertise most
effectively enable reliable, practical forecasting of business time series.
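
The "modular regression model" in the abstract is, in the Prophet paper, an additive decomposition of the form y(t) = g(t) + s(t) + h(t) + error, with trend g, seasonality s, and holiday effects h. The following is a minimal pure-Python sketch of that idea, assuming a plain linear trend and a Fourier-series seasonality. The function names and parameters are illustrative only and are not Prophet's API.

```python
import math


def fourier_seasonality(t, period, coeffs):
    """Sum of cosine/sine terms; coeffs is a list of (a_n, b_n) pairs."""
    total = 0.0
    for n, (a, b) in enumerate(coeffs, start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a * math.cos(angle) + b * math.sin(angle)
    return total


def additive_forecast(t, slope, intercept, period, coeffs, holiday_effect=0.0):
    """Trend + seasonality + holiday effect.

    Assumption: a single linear trend, which is simpler than the
    trend options described in the paper.
    """
    trend = slope * t + intercept
    return trend + fourier_seasonality(t, period, coeffs) + holiday_effect


# Hypothetical weekly pattern on daily data: one cosine term with amplitude 1.
for day in (0, 7, 14):
    print(day, round(additive_forecast(day, 2.0, 10.0, 7, [(1.0, 0.0)]), 6))
```

Each component has its own small set of parameters, which is what lets an analyst adjust one part (say, the seasonality amplitude) without touching the others.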

# ML Applied to Trading
https://medium.com/auquan/https-medium-com-auquan-machine-learning-techniques-trading-b7120cee4f05
    
* DIRECTION: identify whether an asset is cheap, expensive, or at fair value
* ENTRY TRADE: if an asset is cheap or expensive, decide whether to buy or sell it
* EXIT TRADE: if an asset is fairly priced and we hold a position in it (bought or sold earlier), decide whether to exit that position
* PRICE RANGE: which price (or range) to make the trade at
* QUANTITY: amount of capital to trade (for example, shares of a stock)
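
The decisions above can be sketched as a small rule, assuming we already have a fair-value estimate from some model. The `band` tolerance, the `fraction` of capital, and both function names are hypothetical choices for illustration; the linked article does not prescribe them.

```python
def trade_decision(price, fair_value, position, band=0.02):
    """Return 'buy', 'sell', 'exit', or 'hold'.

    position: signed units held (positive = long, negative = short, 0 = flat).
    band: assumed tolerance around fair value (2% by default).
    """
    mispricing = (price - fair_value) / fair_value
    if mispricing < -band:          # cheap
        return "buy" if position <= 0 else "hold"
    if mispricing > band:           # expensive
        return "sell" if position >= 0 else "hold"
    # Fairly priced: close any open position, otherwise do nothing.
    return "exit" if position != 0 else "hold"


def position_size(capital, price, fraction=0.1):
    """Whole units affordable with an assumed fixed fraction of capital."""
    return int((capital * fraction) // price)


print(trade_decision(price=95.0, fair_value=100.0, position=0))
print(position_size(capital=10_000, price=50.0, fraction=0.5))
```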

http://www.newsweek.com/business-technology-trading-684303

## ML (Q-Learning) able to arbitrage known advantage


Ritter explained: "I was really trying to answer the question, does machine learning have any application to trading at all, or no application; sort of a binary question. Can machine learning be applied to the problem of trading?

"I reasoned that in a system that I know admits a profitable trading strategy, because I constructed it that way, can the machine find it. If the answer to that is no, then what chance would it have in the real world where you don't even necessarily know that a profitable strategy exists in the space you are looking at.

"So luckily the answer to that was yes. In the system where I knew there was a profitable opportunity, the machine did learn to find it. So that then allows for further study."

A question on many people's lips when it comes to machine learning and AI-driven automation concerns the role that will be left for humans.

Ritter takes a sober view of this: "I think it's important to know what human beings are good at and what they are not good at. So human beings are typically not good at knowing what their true costs are.

"For example, we all know that doing a large trade or a trade that's a large fraction of the volume can have an impact in the market; you'll move the price as a result of your trading. Well, how much?

"Humans are not really good at answering that question. That's a better question for a mathematical model. So we sometimes get asked the question, does this mean humans are done? I think the answer to that is definitely not.

"Humans are probably good at coming up with the idea for a new strategy. Take the technology in my paper, for example: where should we apply it? What should we apply it to? What should the signal that drives the trade be, if any?

"But humans are not good at interacting with the microstructure; humans are not good at looking at a bid and an offer and saying, I think if I execute this many basis points here's what my impact will be. That kind of decision should be left to a machine really."


https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3015609

## Abstract

Abstract. In multi-period trading with realistic market impact, determining
the dynamic trading strategy that optimizes expected utility
of final wealth is a hard problem. In this paper we show that, with an
appropriate choice of the reward function, reinforcement learning techniques
(specifically, Q-learning) can successfully handle the risk-averse
case. We provide a proof of concept in the form of a simulated market
which permits a statistical arbitrage even with trading costs. The
Q-learning agent finds and exploits this arbitrage.
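
A minimal sketch of the two ingredients the abstract names: a reward function that penalizes risk and the tabular Q-learning update. The abstract does not spell out the reward, so the mean-variance style form below (profit minus a penalty on squared profit, scaled by a risk-aversion parameter `kappa`) is an assumption, as are the state labels and function names.

```python
import random


def risk_adjusted_reward(pnl, kappa):
    """Assumed mean-variance style reward: P&L minus a penalty on squared P&L."""
    return pnl - 0.5 * kappa * pnl ** 2


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Toy usage: actions are target holdings (short, flat, long).
actions = [-1, 0, 1]
q_table = {}
rng = random.Random(0)
action = epsilon_greedy(q_table, "cheap", actions, epsilon=0.1, rng=rng)
reward = risk_adjusted_reward(pnl=1.0, kappa=0.01)
q_update(q_table, "cheap", action, reward, "fair", actions)
print(q_table)
```

In a full experiment these pieces would sit inside a loop over a simulated price process; that loop is omitted here.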

## Conclusion
According to neoclassical finance theory, no investor should hold a portfolio
(or follow a dynamic trading strategy) that does not maximize expected
utility of final wealth, E[u(w_T)]. The concave utility function u expresses the
investor’s risk aversion, or preference for a less risky portfolio over a more
risky one with the same expected return. Only a risk-neutral investor would
maximize E[w_T]. The main contribution of this paper is that we show how
to handle the risk-averse case in a model-free way using Q-learning. We provide
a proof of concept in a controlled numerical simulation which permits
an approximate arbitrage, and we verify that the Q-learning agent finds and
exploits this arbitrage.
It is instructive to consider how this differs from approaches such as
Gârleanu and Pedersen (2013); to use their approach in practice requires
three models: a model of expected returns, a risk model which forecasts
the variance, and a (pre-trade) transaction cost model. The methods of
Gârleanu and Pedersen (2013) provide an explicit solution only when the
cost model is quadratic. By contrast, the methods of the present paper can,
in principle, be applied without directly estimating any of these three models,
or they can be applied in cases where one has an asset return model,
but one wishes to use machine learning techniques to infer the cost function
and the optimal strategy.

https://medium.com/machine-learning-world/neural-networks-for-algorithmic-trading-enhancing-classic-strategies-a517f43109bf

Use NNs to improve EMA crossover signal quality
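
For reference, a minimal sketch of the classic EMA crossover signal that the article starts from (the neural-network enhancement is not shown). The spans, the seeding of the EMA with the first price, and the function names are assumptions for illustration.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1),
    seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def crossover_signals(prices, fast=3, slow=7):
    """+1 where the fast EMA crosses above the slow EMA,
    -1 where it crosses below, 0 otherwise."""
    fast_ema, slow_ema = ema(prices, fast), ema(prices, slow)
    diff = [f - s for f, s in zip(fast_ema, slow_ema)]
    signals = [0]  # no signal on the first bar
    for prev, cur in zip(diff, diff[1:]):
        if prev <= 0 < cur:
            signals.append(1)
        elif prev >= 0 > cur:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


prices = [10, 10, 10, 12, 14, 16, 14, 12, 10, 8]
print(crossover_signals(prices))
```

A learned model could then take these raw signals (plus other features) as input and predict which crossovers are worth acting on.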

# TF / DNNs for financial time series
https://cloud.google.com/solutions/machine-learning-with-financial-time-series-data

Model SP500 closes based on markets that close earlier



# Inductive Bias
http://www.lauradhamilton.com/inductive-biases-various-machine-learning-algorithms

https://arxiv.org/pdf/1706.00948.pdf

Financial Series Prediction: Comparison Between Precision of Time
Series Models and Machine Learning Methods

Investors collect information from trading market and make investing decision based on collected
information, i.e. belief of future trend of security’s price. Therefore, several mainstream trend analysis
methodology come into being and develop gradually. However, precise trend predicting has long been a
difficult problem because of overwhelming market information. Although traditional time series models
like ARIMA and GARCH have been researched and proved to be effective in predicting, their performances
are still far from satisfying. Machine learning, as an emerging research field in recent years, has brought
about many incredible improvements in tasks such as regressing and classifying, and it’s also promising
to exploit the methodology in financial time series predicting. In this paper, the predicting precision of
financial time series between traditional time series models ARIMA, and mainstream machine learning
models including logistic regression, multiple-layer perceptron, support vector machine along with deep
learning model denoising auto-encoder are compared through experiment on real data sets composed of
three stock index data including Dow 30, S&P 500 and Nasdaq. The result show

# FINANCIAL TIME SERIES FORECASTING – A MACHINE LEARNING APPROACH 

https://pdfs.semanticscholar.org/7955/af1f5b8226ff13f915bead877c181a2917dc.pdf

## Survey

# BenjiKCF/Neural-Network-with-Financial-Time-Series-Data
https://github.com/BenjiKCF/Neural-Network-with-Financial-Time-Series-Data

# scikit-learn

## choosing the right estimator
http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

# Class boundaries
https://en.wikipedia.org/wiki/Support_vector_machine#/media/File:Svm_separating_hyperplanes_(SVG).svg

## SVMs
* Separate p-dimensional space with (p-1)-dimensional hyperplane
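
A minimal sketch of what "separating with a hyperplane" means once the weights are known: a point is classified by the sign of w·x + b. The weights below are hand-picked for a 2-dimensional example (so the hyperplane is a line); fitting them, which is what an SVM solver does, is not shown.

```python
import math


def decision_value(w, b, x):
    """Signed score w·x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance |w·x + b| / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm


# Hypothetical hyperplane x1 + x2 - 1 = 0 in 2-D.
w, b = [1.0, 1.0], -1.0
print(classify(w, b, [2.0, 2.0]), classify(w, b, [0.0, 0.0]))
print(distance_to_hyperplane(w, b, [2.0, 2.0]))
```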

# Hyperparameter tuning
* http://scikit-learn.org/stable/modules/grid_search.html#grid-search
* https://github.com/EpistasisLab/tpot
* https://github.com/automl/auto-sklearn
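
The idea behind grid search can be sketched in plain Python: try every combination of candidate values and keep the best-scoring one. scikit-learn's `GridSearchCV` (first link above) does this with cross-validated scores; the `toy_score` function here is a hypothetical stand-in for a real validation score.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Exhaustively evaluate score_fn over the Cartesian product of param_grid."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    """Hypothetical validation score that peaks at C=10, gamma=0.1."""
    return -((C - 10) ** 2) - (gamma - 0.1) ** 2


best, score = grid_search(toy_score, {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]})
print(best, score)
```

The cost grows multiplicatively with each added parameter, which is part of the motivation for automated tools such as the two linked above.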

# TODO

* feature selection http://scikit-learn.org/stable/modules/feature_selection.html

# STOCK TREND PREDICTION USING NEWS SENTIMENT ANALYSIS
https://arxiv.org/pdf/1607.01958.pdf

* text mining
* RF: ~90% (88% to 92%), SVM: ~86%, Naive Bayes: ~83%
* Optimized # of classes at ~5 (from prior research)

After comparing results, Random Forest worked very well for all test cases,
with accuracy ranging from 88% to 92%. SVM followed with a considerable accuracy
of around 86%. Naive Bayes performance was around 83%.

# Multivariate Time Series Forecasting of Crude Palm Oil Price Using Machine Learning Techniques

Abstract. The aim of this paper was to study the correlation between crude palm oil
(CPO) price, selected vegetable oil prices (such as soybean oil, coconut oil, olive
oil, rapeseed oil and sunflower oil), crude oil and the monthly exchange rate.
Comparative analysis was then performed on CPO price forecasting results using the
machine learning techniques. Monthly CPO prices, selected vegetable oil prices, crude
oil prices and monthly exchange rate data from January 1987 to February 2017 were
utilized. Preliminary analysis showed a positive and high correlation between the CPO
price and soybean oil price and also between CPO price and crude oil price.
Experiments were conducted using multi-layer perceptron, support vector regression
and Holt-Winters exponential smoothing techniques. The results were assessed by using
criteria of root mean square error (RMSE), mean absolute error (MAE), mean
absolute percentage error (MAPE) and direction of accuracy (DA). Among these three
techniques, support vector regression (SVR) with the sequential minimal optimization
(SMO) algorithm showed relatively better results compared to multi-layer perceptron
and the Holt-Winters exponential smoothing method.
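
The four evaluation criteria named in the abstract can be sketched as follows. RMSE, MAE and MAPE use their standard definitions; the abstract does not define DA, so it is assumed here to be the share of periods in which the predicted move (relative to the previous actual value) has the same sign as the actual move.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero actuals)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


def direction_accuracy(y_true, y_pred):
    """Assumed DA definition: fraction of steps where the predicted and
    actual moves from the previous actual value share the same sign."""
    hits = 0
    for t in range(1, len(y_true)):
        actual_move = y_true[t] - y_true[t - 1]
        predicted_move = y_pred[t] - y_true[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(y_true) - 1)


actual = [100, 110, 105, 120]
forecast = [100, 108, 107, 115]
print(rmse(actual, forecast), mae(actual, forecast))
print(mape(actual, forecast), direction_accuracy(actual, forecast))
```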

## Conclusions
Support vector regression, multi-layer perceptron and Holt-Winters exponential smoothing were
utilized in this study to forecast the CPO price using multivariate time series. The prediction results
show that support vector regression had higher prediction accuracy than multi-layer
perceptron and Holt-Winters exponential smoothing. In this study nine attributes were
chosen, and the results of this analysis showed the strength of support vector regression in
forecasting multivariate time series of CPO price. In future, more relevant attributes could be included
to improve forecasting of the CPO price. A feature selection method could also be added in future studies
to improve the accuracy of CPO price forecasting.

http://iopscience.iop.org/article/10.1088/1757-899X/226/1/012117/pdf

# Machine Learning Strategies for Time Series Forecasting
https://s3.amazonaws.com/academia.edu.documents/41231210/Machine_Learning_Strategies_for_Time_Ser20160114-28513-1pqgoe9.pdf20160115-19908-phg1xc.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1522946447&Signature=b3eDHMvLqu8bo4S%2BnnHK2pFqN2U%3D&response-content-disposition=inline%3B%20filename%3DMachine_Learning_Strategies_for_Time_Ser.pdf


In the last two decades, machine learning models have drawn attention and
have established themselves as serious contenders to classical statistical models
in the forecasting community [1,43,61]. These models, also called black-box or
data-driven models [40], are examples of nonparametric nonlinear models which
use only historical data to learn the stochastic dependency between the past and
the future. For instance, Werbos found that Artificial Neural Networks (ANNs)
outperform the classical statistical methods such as linear regression and Box-Jenkins
approaches [59,60]. A similar study has been conducted by Lapedes and
Farber [33] who conclude that ANNs can be successfully used for modeling and
forecasting nonlinear time series. Later, other models appeared such as decision
trees, support vector machines and nearest neighbor regression [29,3]. Moreover,
the empirical accuracy of several machine learning models has been explored in a
number of forecasting competitions under different data conditions (e.g. the NN3,
NN5, and the annual ESTSP competitions [19,20,34,35]) creating interesting
scientific debates in the area of data mining and forecasting [28,45,21].


# Time series prediction using SVMs
https://pdfs.semanticscholar.org/4092/7c5d81988a1151639fad150cbc74f64e0d68.pdf
    

# Financial Time Series Forecasting with Machine Learning Techniques: A Survey (2010)

http://epublications.bond.edu.au/cgi/viewcontent.cgi?article=1113&context=infotech_pubs

Abstract. Stock index forecasting is vital for making informed investment decisions. This
paper surveys recent literature in the domain of machine learning techniques and artificial
intelligence used to forecast stock market movements. The publications are categorised
according to the machine learning technique used, the forecasting timeframe, the input
variables used, and the evaluation techniques employed. It is found that there is a consensus
between researchers stressing the importance of stock index forecasting. *Artificial Neural
Networks (ANNs)* are identified to be the dominant machine learning technique in this area.
We conclude with possible future research directions.

![methods](img/ts-data.png)

Selecting the right input variables is very important for machine learning techniques.
Even the best machine learning technique can only learn from an input if there is
actually some kind of correlation between input and output variable.
Table 3 shows that over 75% of the reviewed papers rely in some form on
lagged index data. The most commonly used parameters are daily opening, high, low
and close prices. Also used often are technical indicators which are mathematical
transformations of lagged index data. The most common technical indicators found in
the surveyed literature are the simple moving average (SMA), exponential moving
average (EMA), relative strength index (RSI), rate of change (ROC), moving average
convergence / divergence (MACD), Williams' oscillator and average true range
(ATR).
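
Three of the indicators listed above, sketched as plain transformations of a lagged price series. The RSI here uses a simple average of gains and losses over the window rather than the smoothed variant often used in charting packages; that simplification, the neutral value of 50 for a flat series, and the function names are assumptions.

```python
def sma(prices, window):
    """Simple moving average; one value per full window."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]


def roc(prices, n):
    """Rate of change in percent versus the price n periods earlier."""
    return [100.0 * (prices[i] - prices[i - n]) / prices[i - n]
            for i in range(n, len(prices))]


def rsi(prices, window):
    """Relative strength index over the last `window` price changes
    (simple-average variant; needs at least two prices)."""
    changes = [b - a for a, b in zip(prices, prices[1:])][-window:]
    avg_gain = sum(c for c in changes if c > 0) / len(changes)
    avg_loss = sum(-c for c in changes if c < 0) / len(changes)
    if avg_gain == 0 and avg_loss == 0:
        return 50.0   # flat series: treated as neutral by assumption
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


closes = [10, 11, 10, 11, 10]
print(sma(closes, 3), roc(closes, 1), rsi(closes, 4))
```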

# An Empirical Comparison of Machine Learning Models for Time Series Forecasting (2010)
https://www.researchgate.net/profile/Nesreen_Ahmed3/publication/227612766_An_Empirical_Comparison_of_Machine_Learning_Models_for_Time_Series_Forecasting/links/00b7d526c47935d41b000000/An-Empirical-Comparison-of-Machine-Learning-Models-for-Time-Series-Forecasting.pdf

## Abstract
In this work we present a large scale comparison study for the major machine
learning models for time series forecasting. Specifically, we apply the models on
the monthly M3 time series competition data (around a thousand time series).
There have been very few, if any, large scale comparison studies for machine
learning models for the regression or the time series forecasting problems, so we
hope this study would fill this gap. The models considered are multilayer perceptron,
Bayesian neural networks, radial basis functions, generalized regression
neural networks (also called kernel regression), K-nearest neighbor regression,
CART regression trees, support vector regression, and Gaussian processes. The
study reveals significant differences between the different methods. The best
two methods turned out to be the multilayer perceptron and the Gaussian process
regression. In addition to model comparisons, we have tested different
preprocessing methods and have shown that they have different impacts on the
performance.

The two best models turned out to be MLP (ANN) and GP (Gaussian processes). This is an interesting result, as GP up until a few years
ago has not been a widely used or studied method. We believe that there is still room
for improving GP in a way that may positively reflect on its performance.

# A Comprehensive Review of Sentiment Analysis of Stocks
https://pdfs.semanticscholar.org/42b1/0e23482cd2a0dcbd4c9ac1295620d4c80be5.pdf

Naive Bayes and Support Vector Machine (SVM) are the basic machine
learning algorithms currently used; however, hybrid versions
are emerging.

# Time Series Forecasting as Supervised Learning
* Multivariate vs univariate
* One-step vs multi-step forecast
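
Framing a series as a supervised problem usually means a sliding window: the previous `n_lags` values become the input row and the next `horizon` values become the target. A minimal sketch follows; the function name and defaults are illustrative.

```python
def make_supervised(series, n_lags=3, horizon=1):
    """Turn a sequence into (X, y) pairs with a sliding window.

    X[i] holds n_lags consecutive past values;
    y[i] holds the next `horizon` values (horizon=1 is a one-step forecast).
    """
    X, y = [], []
    for i in range(n_lags, len(series) - horizon + 1):
        X.append(series[i - n_lags:i])
        y.append(series[i:i + horizon])
    return X, y


series = [1, 2, 3, 4, 5, 6]
print(make_supervised(series, n_lags=3, horizon=1))  # one-step targets
print(make_supervised(series, n_lags=3, horizon=2))  # multi-step targets
```

For a multivariate series, each element of `series` could itself be a tuple of observations at that time step; the windowing logic stays the same.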


https://machinelearningmastery.com/time-series-forecasting-supervised-learning/

http://www.ulb.ac.be/di/map/gbonte/ftp/time_ser.pdf

: ML is just a buzzword which equates to statistics plus
marketing