# Machine Learning for Trading
## Chapter 1: Introduction

**Why should you use machine learning?** The world is moving towards automation and machine learning, and the finance industry is no exception.

**What about machine learning in trading?** Machine Learning (ML) algorithms can help you build an investment portfolio.

**Is ML only for hedge funds and large institutions?** That was the case for a long time, but not anymore. With advances in technology and the spread of open-source data science resources, retail traders and individuals have also started to learn and use machine learning. For example, a trader who uses ML to implement a trading strategy can let the machine go through hundreds of technical indicators and decide which ones best predict the correct market trend. An ML algorithm can evaluate far more indicators than a person could by hand and pick the ones with the most predictive power.

This technique is known as feature selection: the machine learning algorithm compares the candidate features (here, the indicators) and keeps the ones that are most useful for prediction.
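A minimal sketch of one simple form of feature selection, a correlation filter, assuming the indicators are already computed as columns of a NumPy array and the target is a numeric value such as the next-period return. Each indicator is scored by the absolute Pearson correlation with the target and the top `k` are kept. The function `select_top_k`, the indicator names and the synthetic data are all hypothetical.

```python
import numpy as np


def select_top_k(features, target, names, k):
    """Score each feature by |Pearson correlation| with the target and keep the k best."""
    scores = {}
    for j, name in enumerate(names):
        # np.corrcoef returns a 2x2 matrix; entry [0, 1] is the correlation of the two inputs
        scores[name] = abs(np.corrcoef(features[:, j], target)[0, 1])
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical data: four synthetic "indicators"; only ind_a and ind_c drive the target
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
y = 0.8 * X[:, 0] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=n)
names = ["ind_a", "ind_b", "ind_c", "ind_d"]

best, scores = select_top_k(X, y, names, k=2)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

Correlation only captures linear, one-indicator-at-a-time relationships, so it is the simplest kind of filter; model-based methods can also account for interactions between indicators.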

**How do you implement machine learning?** ML can be implemented in any programming language, but the overwhelming majority of practitioners use Python because it is easy to learn, has clean and readable syntax, and supports a wide range of ML libraries. Depending on your requirements, you can choose a Python library such as `scikit-learn` or `TensorFlow`.

You don't need a computer science or programming degree to implement ML algorithms in your domain. You need to know the Python ML libraries and their syntax, and with a few lines of code you can get your own ML project up and running.

This book gives you a detailed, step-by-step guide to creating a machine learning trading strategy:
1. Use Python libraries to read the data.
2. Use machine learning to find patterns in the data.
3. Generate signals on whether you should buy or sell the asset.
4. Learn to analyse the performance of the model.
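The four steps above can be sketched end to end in a few lines. This is a toy illustration, not the strategy developed in this book: it assumes a synthetic random-walk price series in place of real market data, uses a single hypothetical feature (the previous day's return), fits a one-variable linear model on the first half of the data, and evaluates the resulting buy/sell signals on the second half.

```python
import numpy as np

# 1. Read the data (hypothetical: a synthetic random-walk price series stands in for real prices)
rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=1000)))
returns = np.diff(prices) / prices[:-1]

# Feature: yesterday's return; target: today's return
x, y = returns[:-1], returns[1:]
split = len(x) // 2

# 2. Find a pattern: least-squares line fitted on the training half only
slope, intercept = np.polyfit(x[:split], y[:split], deg=1)

# 3. Generate signals on the test half: +1 = buy, -1 = sell
predicted = slope * x[split:] + intercept
signals = np.where(predicted > 0, 1, -1)

# 4. Analyse performance: cumulative strategy return and hit ratio (share of profitable days)
strategy_returns = signals * y[split:]
cumulative = np.prod(1 + strategy_returns) - 1
hit_ratio = np.mean(strategy_returns > 0)
print(f"Cumulative return: {cumulative:.2%}, hit ratio: {hit_ratio:.2%}")
```

Because the synthetic prices are a random walk, no real edge should be expected here; the point is the shape of the workflow, with the model fitted only on data that precedes the period it is evaluated on.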

**What is ML?** A key difference between a regular algorithm (algo) and a machine learning algo is the "learning" model, which allows the algorithm to learn from data and make its own decisions.
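To make the distinction concrete, here is a toy contrast using hypothetical data. The rule-based version hard-codes an RSI threshold of 30 chosen by a person; the "learning" version searches for the threshold that best separated up-days from down-days in past data. Both function names and the RSI history are illustrative assumptions.

```python
def rule_based_signal(rsi):
    # Regular algorithm: the threshold is fixed by the programmer
    return "buy" if rsi < 30 else "no position"


def learn_threshold(rsi_values, next_day_up):
    """Learning algorithm: choose the RSI threshold that best fits historical outcomes."""
    best_threshold, best_accuracy = None, -1.0
    for candidate in range(5, 100, 5):
        predictions = [rsi < candidate for rsi in rsi_values]
        accuracy = sum(p == up for p, up in zip(predictions, next_day_up)) / len(rsi_values)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy


# Hypothetical history: RSI readings and whether the price rose the next day
rsi_history = [22, 35, 41, 18, 55, 63, 27, 48, 71, 39]
went_up = [True, True, True, True, False, False, True, False, False, True]

threshold, accuracy = learn_threshold(rsi_history, went_up)
print(f"Learned threshold: {threshold}, in-sample accuracy: {accuracy:.0%}")
# Learned threshold: 45, in-sample accuracy: 100%
```

The 100% figure is measured on the same data used for learning, which overstates real performance; a held-out test set is needed to judge the learned rule fairly.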

**How do machines learn?** The simple answer is: much like humans do, from experience. ML is one of the most popular approaches in Artificial Intelligence, and one of its key aspects is the use of new, continuously arriving data to keep iterating and learning.
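A minimal sketch of learning from a stream of new data, assuming a one-feature linear model that is updated by stochastic gradient descent each time a new observation arrives. The helper `sgd_update`, the learning rate and the data stream are hypothetical.

```python
def sgd_update(w, b, x, y, lr=0.05):
    """One stochastic-gradient step on squared error for the model y_hat = w * x + b."""
    error = (w * x + b) - y
    # Gradient of 0.5 * error**2 is error * x with respect to w, and error with respect to b
    return w - lr * error * x, b - lr * error


w, b = 0.0, 0.0
# Hypothetical stream generated by y = 2x + 1; the model only ever sees individual points
stream = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)] * 50
for x, y in stream:
    w, b = sgd_update(w, b, x, y)  # learn a little from each new observation as it arrives
print(round(w, 2), round(b, 2))
```

Each update uses only the newest observation, so the model keeps improving as data arrives instead of being refitted from scratch.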

**What is the difference between Machine Learning, Deep Learning, and Artificial Intelligence?** The following graph illustrates this:

![Relationship between Artificial Intelligence, Machine Learning and Deep Learning](graph001.png)



### How does Machine Learning approach different types of problems?

A machine learning algorithm can perform classification tasks, such as assigning a buy or sell label, as well as regression tasks, such as predicting a numeric value. Both involve building mathematical models, and to "train" these models you need a set of training data: the dataset from which the system builds the model.

The mathematical models are divided into two categories, depending on their training data:
1. Supervised Learning Models
2. Unsupervised Learning Models


![Supervised and unsupervised learning models](assets/graph002.png)


#### Supervised Learning
Think of supervised learning as a child learning multiplication tables from a list of worked answers. When building supervised learning models, the training data contains the required answers, or expected outputs. These answers are called labels.

For example, if the training data contains technical indicators such as RSI or ADX as inputs, along with the trading position to take (such as buy or sell) as the label, you are using a supervised learning approach.
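A minimal sketch of how such a labelled dataset might be assembled, assuming a list of daily closing prices. It computes a simplified 14-period RSI (plain averages of gains and losses rather than Wilder's smoothing) and labels each day `buy` if the next close is higher, otherwise `sell` (flat days also count as `sell` here). The prices and both helper functions are hypothetical.

```python
def simple_rsi(closes, period=14):
    """RSI from plain averages of the last `period` price changes (no Wilder smoothing)."""
    rsi = [None] * len(closes)
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    for i in range(period, len(closes)):
        window = changes[i - period:i]  # the `period` changes ending at closes[i]
        avg_gain = sum(c for c in window if c > 0) / period
        avg_loss = sum(-c for c in window if c < 0) / period
        rsi[i] = 100.0 if avg_loss == 0 else 100 - 100 / (1 + avg_gain / avg_loss)
    return rsi


def build_dataset(closes, period=14):
    """Pair each day's RSI (feature) with a buy/sell label based on the next day's close."""
    rsi = simple_rsi(closes, period)
    rows = []
    for i in range(period, len(closes) - 1):  # the last day has no next day to label
        label = "buy" if closes[i + 1] > closes[i] else "sell"
        rows.append((rsi[i], label))
    return rows


closes = [100, 101, 102, 101, 103, 104, 103, 105, 106, 105,
          107, 108, 107, 109, 110, 109, 111, 112, 111, 113]
dataset = build_dataset(closes)
for rsi_value, label in dataset:
    print(f"RSI {rsi_value:5.1f} -> {label}")
```

Each row is one observation: the indicator value is the input and the buy/sell decision is the label the model learns to predict.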

With enough data points, the machine learning algorithm will be able to classify the trading signal correctly more often than not. Supervised learning models can also be used to predict continuous numeric values such as the share price of Disney. These models are known as regression models. In this case, the labels would be the share price of Disney.

##### Types of Supervised Models
Supervised models are trained on labelled datasets. The label can be either continuous (regression) or categorical (classification).

**Regression Models:** Regression is used when one is dealing with continuous values. Popular regression models are:
+ Linear regression
+ Lasso regression
+ Ridge regression
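Linear and ridge regression differ only by a penalty term, which a short NumPy sketch makes visible. Ridge minimises `||y - Xw||^2 + lam * ||w||^2`, whose closed-form solution is `w = (X^T X + lam * I)^(-1) X^T y`; setting `lam = 0` gives ordinary linear regression. Lasso uses an L1 penalty and has no closed-form solution, so it is omitted. The data is synthetic, `ridge_fit` is a hypothetical helper, and the intercept is left out for brevity.

```python
import numpy as np


def ridge_fit(X, y, lam=0.0):
    """Closed-form ridge regression; lam=0 gives ordinary least squares (linear regression)."""
    n_features = X.shape[1]
    # Solve (X^T X + lam * I) w = X^T y rather than forming the matrix inverse explicitly
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

w_linear = ridge_fit(X, y, lam=0.0)
w_ridge = ridge_fit(X, y, lam=50.0)
print("linear:", w_linear.round(3))
print("ridge: ", w_ridge.round(3))  # the coefficient vector is shrunk towards zero
```

The penalty trades a little bias for lower variance, which can help when features are noisy or strongly correlated, as technical indicators often are.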

**Classification Models:** Classification is used for data that is separated into categories, with each category represented by a label. The training data must contain the labels and must have sufficient observations of each label. Some popular classification models include:
+ Decision Tree Classifiers
+ Random Forests Classifiers
+ Neural Network Classifiers

There are various methods for evaluating the performance of these models, which will be discussed later.
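As a preview of one common evaluation method, the sketch below computes accuracy and a confusion matrix for a binary buy/sell classifier using only the standard library. The actual and predicted labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs; rows = actual label, columns = predicted label."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


actual = ["buy", "buy", "sell", "sell", "buy", "sell", "buy", "sell"]
predicted = ["buy", "sell", "sell", "sell", "buy", "buy", "buy", "sell"]

labels = ["buy", "sell"]
matrix = confusion_matrix(actual, predicted, labels)
print(matrix)                       # [[3, 1], [1, 3]]
print(accuracy(actual, predicted))  # 0.75
```

The off-diagonal cells show which mistakes the model makes (here, one missed buy and one false buy), which a single accuracy number hides.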

**Types of Classification:** There are three main types of classification:

*Binary Classification:* This type of classification has only two categories. Usually, they are Boolean values: `1` or `0` (sometimes written as `True`/`False` or `High`/`Low`). For trading, you might build a binary classifier with labels such as buy and sell, or buy and no position.

*Multi-class Classification:* Multi-class classifiers or multinomial classifiers can distinguish between more than two classes. For example, you can have three labels such as `buy`, `neutral`, or `sell`.

*Multi-label Classification:* This type of classification occurs when a single observation can carry several labels at once. For example, a single image could contain a car, a truck and a human, and the algorithm must recognise each of them separately. It therefore has to be trained on many labels and should report `True` for each object present (here the car, the truck and the human) and `False` for every other label it was trained on.
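The three settings differ mainly in how the labels are encoded. A hypothetical sketch: binary labels as `0`/`1`, multi-class labels as one class index per observation, and multi-label targets as a `0`/`1` indicator vector per observation with one position for every known label.

```python
# Binary: one of two classes per observation (1 = buy, 0 = sell)
binary_labels = [1, 0, 0, 1]

# Multi-class: exactly one of several classes per observation, encoded as a class index
classes = ["buy", "neutral", "sell"]
multiclass_labels = ["buy", "sell", "neutral", "buy"]
multiclass_encoded = [classes.index(label) for label in multiclass_labels]

# Multi-label: any subset of the known labels per observation, stored as indicator vectors
known_objects = ["car", "truck", "human", "bicycle"]
image_contents = [{"car", "truck", "human"}, {"bicycle"}]
multilabel_encoded = [[int(obj in contents) for obj in known_objects] for contents in image_contents]

print(multiclass_encoded)  # [0, 2, 1, 0]
print(multilabel_encoded)  # [[1, 1, 1, 0], [0, 0, 0, 1]]
```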

#### Unsupervised Learning
In unsupervised learning, as the name suggests, the dataset used for training does not contain the required answers. Instead, the algorithm uses techniques such as clustering to group similar objects together.

The assumption in such a system is that the discovered clusters will match an intuitive classification reasonably well. For example, clustering stocks based on their historical data may group together stocks that belong to the same sector or industry.
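A minimal k-means sketch in NumPy shows the idea. It assumes each stock is described by its vector of daily returns and uses plain Euclidean distance, a simplifying choice. The six "stocks" are synthetic, generated from two hypothetical sector factors so that the expected grouping is known in advance. Initialisation is a deterministic farthest-first rule rather than random or k-means++ seeding; libraries such as scikit-learn provide tuned implementations.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Plain k-means (Lloyd's algorithm) with a deterministic farthest-first initialisation."""
    # Start from the first point, then repeatedly add the point farthest from the chosen centroids
    centroids = [points[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[nearest.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid, shape (n_points, k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        centroids = np.array([
            points[assignment == c].mean(axis=0) if np.any(assignment == c) else centroids[c]
            for c in range(k)
        ])
    return assignment, centroids


# Synthetic data: 6 "stocks" x 250 days of returns driven by two hypothetical sector factors
rng = np.random.default_rng(7)
tech_factor = rng.normal(0, 0.02, 250)
energy_factor = rng.normal(0, 0.02, 250)
stocks = np.array(
    [tech_factor + rng.normal(0, 0.005, 250) for _ in range(3)]
    + [energy_factor + rng.normal(0, 0.005, 250) for _ in range(3)]
)
assignment, centroids = kmeans(stocks, k=2)
print(assignment)  # stocks 0-2 and 3-5 should fall into different clusters
```

No sector information is given to the algorithm; the grouping emerges only because stocks driven by the same factor have similar return histories.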

Another application of unsupervised learning is anomaly detection, which uses a clustering algorithm to find major outliers in the data. Such techniques are used in credit card fraud detection and to detect market crashes or black swan events.
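A simpler distance-based technique than full clustering illustrates the idea: treat the bulk of the data as a single cluster, measure how far each observation lies from its centre, and flag the extremes. This sketch uses the median and the median absolute deviation (MAD), the so-called modified z-score, with a threshold of 3.5; the threshold, the helper name and the returns are illustrative assumptions, and real fraud or crash detectors are far more sophisticated.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # More than half the values equal the median; the score is undefined, so flag nothing
        return []
    flagged = []
    for i, v in enumerate(values):
        modified_z = 0.6745 * (v - median) / mad
        if abs(modified_z) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily returns (%) with one crash-like day at index 6
daily_returns = [0.4, -0.3, 0.1, 0.6, -0.2, 0.3, -9.5, 0.2, -0.5, 0.1]
print(flag_anomalies(daily_returns))  # [6]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself.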

Apart from the two types of learning models mentioned earlier, machine learning includes a third type known as reinforcement learning. Reinforcement learning methods feed rewards or punishments back into the model to improve its performance. To do so, the model must interpret its inputs correctly, decide on an action, and compare the outcome against a predefined reward system.

Reinforcement learning takes actions to maximise the reward. It is not supervised learning, because it does not strictly depend on labelled data, and it is not unsupervised learning either, because it modifies the model according to the final reward. The special feature of such models is that they learn from their mistakes and make better decisions in subsequent runs.
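Full reinforcement learning, with states and delayed rewards, is beyond a short example, but the reward-driven loop can be sketched with its simplest special case: an epsilon-greedy multi-armed bandit. The agent repeatedly picks one of several hypothetical strategies, observes a noisy reward, and updates its estimate of that strategy's value. The strategy names and reward distributions are invented for illustration.

```python
import random


def epsilon_greedy(reward_fns, n_rounds=2000, epsilon=0.1, seed=0):
    """Learn which action pays best by trial and error, using incremental average rewards."""
    rng = random.Random(seed)
    actions = list(reward_fns)
    estimates = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            action = rng.choice(actions)                       # explore a random action
        else:
            action = max(actions, key=lambda a: estimates[a])  # exploit the best estimate so far
        reward = reward_fns[action](rng)
        counts[action] += 1
        # Incremental mean: move the estimate a fraction 1/n towards the new reward
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical strategies whose average rewards are unknown to the agent
strategies = {
    "momentum": lambda rng: rng.gauss(0.5, 1.0),
    "mean_reversion": lambda rng: rng.gauss(0.2, 1.0),
    "hold_cash": lambda rng: 0.0,
}
estimates, counts = epsilon_greedy(strategies)
print({a: round(v, 2) for a, v in estimates.items()}, counts)
```

Unlike full reinforcement learning, a bandit has no state and each reward arrives immediately; methods such as Q-learning extend the same reward-feedback idea to sequences of decisions.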

Some algorithms, known as semi-supervised learning algorithms, are a mix of supervised and unsupervised learning: they combine a small amount of labelled data with a large amount of unlabelled data during training. Text classification in Natural Language Processing is one example. Knowing the types of machine learning algorithms will help you select an appropriate model for a given problem.
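One common recipe for combining a few labelled points with many unlabelled ones is self-training: fit a model on the labelled data, pseudo-label the unlabelled points it is most confident about, add them to the training set and refit. The sketch below uses a nearest-centroid classifier on hypothetical one-dimensional volatility readings with two regimes, `calm` and `volatile`; the classifier, the confidence rule (margin between centroid distances) and the batch size are illustrative choices, not a prescribed method.

```python
def nearest_centroid_fit(points, labels):
    """Return the mean of the points belonging to each label."""
    centroids = {}
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def self_train(labelled, unlabelled, per_round=2):
    """Self-training: repeatedly pseudo-label the most confident unlabelled points and refit."""
    points = [p for p, _ in labelled]
    labels = [lab for _, lab in labelled]
    pool = list(unlabelled)
    while pool:
        centroids = nearest_centroid_fit(points, labels)
        scored = []
        for p in pool:
            dists = sorted((abs(p - c), lab) for lab, c in centroids.items())
            margin = dists[1][0] - dists[0][0]  # larger margin = more confident prediction
            scored.append((margin, p, dists[0][1]))
        scored.sort(reverse=True)
        for _, p, lab in scored[:per_round]:
            points.append(p)
            labels.append(lab)
            pool.remove(p)
    return nearest_centroid_fit(points, labels)


# Hypothetical daily volatility readings (%): only two are labelled
labelled = [(1.0, "calm"), (9.0, "volatile")]
unlabelled = [1.5, 2.0, 2.5, 3.0, 7.0, 7.5, 8.0, 8.5]
result = self_train(labelled, unlabelled)
print(result)
```

Confident points are labelled first so that early mistakes are less likely to pull the centroids in the wrong direction.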



The next part of the book gives a step-by-step explanation of the different tasks involved in creating and executing machine learning algorithms.