
header


Disclosure

⚠️ NOT INVESTMENT ADVICE ⚠️

The content produced by this application is for informational purposes only; you should not construe any such information or other material as legal, tax, investment, financial, or other advice. Nothing contained in this article, this Git repo, or the output produced by this application constitutes a solicitation, recommendation, endorsement, or offer by any member working on this project, any company they represent, or any third-party service provider to buy or sell any securities or other financial instruments in this or in any other jurisdiction in which such solicitation or offer would be unlawful under the securities laws of such jurisdiction.

The use of the word "opinion" or "recommendation", or any other word with a similar meaning, in this article, within the Technitrade application, or within information produced by the application is for demonstration purposes only and is not a recommendation to buy or sell any securities or other financial instruments!

This application was created solely to satisfy the requirements of Columbia University FinTech Bootcamp Project #2 Homework, and the results produced by this application may be incorrect.


Table of Contents


Overview

Technitrade lets users track a portfolio of stocks, periodically getting News Sentiment, Twitter Sentiment, and a Machine Learning AI Stock Opinion. The machine learning model calculates its "opinion" based on market data and technical analysis, while the investor sentiment is calculated by natural language processing analysis of recent news articles and Tweets.

The user interacts with the program via an Amazon Lex chatbot. The machine learning analysis is performed using an LSTM (Long Short-Term Memory) model. The model is trained on technical analysis indicators. Sentiment analysis is performed by Google Cloud Natural Language using the NewsAPI and Twitter APIs as data sources.

Demo Jupyter Notebooks

  1. Technical Analysis Demo : technicals_demo.ipynb
  2. Machine Learning Demo : lstm_demo.ipynb
  3. Sentiment Analysis Demo : nlp_demo.ipynb

Production Code

  • Flask API
  • Application (Production Machine Learning LSTM model, Sentiment Analysis, etc.)
  • Infrastructure
  • Docker container

Can all be found here: code/api/


Application Logic

flowchart


Libraries

The following libraries are used:

Data Computation and Visualization

  • Numpy - "The fundamental package for scientific computing with Python".
  • Pandas - data analysis and manipulation tool.
  • Matplotlib - comprehensive library for creating static, animated, and interactive visualizations in Python.

Database

  • boto3 - AWS SDK for Python to create, configure, and manage AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). The SDK provides an object-oriented API as well as low-level access to AWS services.
  • psycopg2 - database adapter for the Python programming language.

Data Source APIs

  • Dotenv - python-dotenv reads key-value pairs from a .env file and can set them as environment variables.
  • Alpaca Trade API - Internet brokerage and market data connection service.
  • NewsAPI - NewsAPI locates articles and breaking news headlines from news sources and blogs across the web and returns them as JSON.
  • Twitter API - Twitter API enables programmatic access to Twitter.
    • tweepy - An easy-to-use Python library for accessing the Twitter API.

Machine Learning

  • Scikit-Learn - Machine learning library for Python.
  • Tensorflow - end-to-end open source platform for machine learning.
  • Keras - Python API used to interact with TensorFlow.
  • NLTK - leading platform for building Python programs to work with human language data.
  • Google Cloud language_v1 - API that connects to Google Cloud Natural Language.

Other Development Frameworks

  • Flask - micro web framework written in Python.
  • AWS Lex Bot - service for building conversational interfaces into any application using voice and text.
  • Twilio - service to programmatically send and receive SMS messages via Python API.
  • Twilio SendGrid - communication platform for transactional and marketing email.

Interface

The user interfaces with the application via SMS, enabled by the Twilio service. Twilio connects to the Amazon Lex bot, which handles all the conversation logic.

Amazon Lex Bot gathers the following user info:

  1. Name
  2. Email
  3. n number of portfolio stock tickers

The user gets the News Sentiment, Twitter Sentiment, and Machine Learning AI Stock Opinion via periodic emails. The first email is received right after the machine learning model finishes training and is fit with data to predict future stock prices.

The emails are distributed via Twilio's SendGrid service.
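
A minimal sketch of how such an email could be sent with the sendgrid Python package (the addresses, subject, and content below are placeholders, not the production template):

import os
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# placeholder message; the production email is built from the model and sentiment output
message = Mail(
    from_email='alerts@example.com',
    to_emails='user@example.com',
    subject='Technitrade portfolio update',
    html_content='<p>AAPL: buy, KO: hold ...</p>')

# the API key is read from an environment variable
sg = SendGridAPIClient(os.environ['SENDGRID_API_KEY'])
response = sg.send(message)
print(response.status_code)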

The resulting email looks something like this:


Flask API

Overview

A Flask API was built to handle all tasks between the following components:

  1. Amazon Lex Bot via Lambda
  2. Data sources: Market Data Connection (see [code/marketdata/] folder), NewsAPI, Twitter API
  3. Technical Analysis module : technicals.py
  4. Machine Learning module : lstm_model.py
  5. Sentiment Analysis service
  6. Amazon RDS PostgreSQL server

All events are triggered by AWS CloudWatch. AWS Lambda functions handle all of the production Python code.
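
As an illustration of that wiring (not the project's actual handler), a CloudWatch-triggered Lambda could simply call one of the Flask API endpoints; the URL and payload below are hypothetical:

import json
import urllib3

http = urllib3.PoolManager()

def lambda_handler(event, context):
    # hypothetical endpoint that kicks off model training for one user's portfolio
    resp = http.request(
        'POST',
        'http://flask-api.example.internal:5000/train',
        body=json.dumps({'user_id': event.get('user_id')}),
        headers={'Content-Type': 'application/json'})
    return {'statusCode': resp.status, 'body': resp.data.decode('utf-8')}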

Flask API steps

The steps by which the Flask API executes the application workflow are outlined in the table below.

Step | Objective | Action | Trigger
1 | User Data | User & Portfolio Creation | Amazon Lex
2 | Model - Training | Trigger the API to run the training | Lambda / CloudWatch
3 | Model - Training | Save the model in Amazon S3 | API
4 | Model - Forecast | Forecast the tickers | Lambda / CloudWatch / API
5 | User Data | Update the user portfolio | Lambda / CloudWatch / API
6 | User Data | Send email to the users | Lambda / CloudWatch / API

SQL Database

Database Overview

A PostgreSQL database hosted on Amazon RDS is utilized to store all the user data and machine learning models.

All database code can be viewed here: code/src/

Amazon RDS

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.

Postgres

PostgreSQL is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

psycopg2 was used to interface Python with the PostgreSQL database. pgAdmin was used for testing and debugging.
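
A minimal sketch of the psycopg2 usage pattern (the connection parameters and table are placeholders, not the project's actual schema):

import psycopg2

conn = psycopg2.connect(
    host='your-instance.rds.amazonaws.com',   # Amazon RDS endpoint
    dbname='technitrade',
    user='postgres',
    password='********')

with conn.cursor() as cur:
    # hypothetical table holding user portfolios
    cur.execute('SELECT email, tickers FROM portfolios WHERE user_id = %s', (1,))
    rows = cur.fetchall()

conn.close()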

Database Schematics

database_flowchart


Technical Analysis

Technical analysis is performed via technicals module. A demonstration of the module can be seen in technicals_demo.ipynb

Indicators

Relative Strength Index (RSI)

RSI is a momentum indicator which measures the magnitude of recent price changes to evaluate overbought or oversold conditions in the price of a stock. [Investopedia]

RSI Equation:

RSI = 100 - 100 / (1 + RS)

where:
relative strength (RS) = average gain / average loss
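
For illustration, RSI can be computed from closing prices with pandas; this is a generic sketch, not necessarily how the technicals module implements it:

import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()     # average gain
    loss = (-delta.clip(upper=0)).rolling(window).mean()  # average loss
    rs = gain / loss                                      # relative strength
    return 100 - 100 / (1 + rs)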

Williams Percent Range (Williams %R)

Williams %R is a momentum indicator which measures overbought and oversold levels. It has a domain between 0 and -100. The Williams %R may be used to find entry and exit points in the market. [Investopedia]

Williams %R Equation:

Williams %R = (Highest High - Close) / (Highest High - Lowest Low) × -100

where:
Highest High = Highest price in the lookback period.
Close = Most recent closing price.
Lowest Low = Lowest price in the lookback period.

Money Flow Index

The money flow index (MFI) is an oscillator that ranges from 0 to 100. It is used to show the money flow (an approximation of the dollar value of a day's trading) over several days. [Wikipedia]

Money Flow Index Equation:

Typical Price = (High + Low + Close) / 3
Money Flow = Typical Price × Volume
Money Flow Ratio = Positive Money Flow / Negative Money Flow
MFI = 100 - 100 / (1 + Money Flow Ratio)

where:
- The money flow is divided into positive and negative money flow.
- Positive money flow is calculated by adding the money flow of all the days where the typical price is higher than the previous day's typical price.
- Negative money flow is calculated by adding the money flow of all the days where the typical price is lower than the previous day's typical price.
- If the typical price is unchanged, that day is discarded.
Stochastic Oscillator

The stochastic oscillator is a momentum indicator comparing a particular closing price of a security to a range of its prices over a certain period of time. The sensitivity of the oscillator to market movements is reducible by adjusting that time period or by taking a moving average of the result. It is used to generate overbought and oversold trading signals, utilizing a 0–100 bounded range of values. [Investopedia]

Stochastic Oscillator Equation:

%K = [(C - Low_n) / (High_n - Low_n)] × 100

where:
C = The most recent closing price
Low_n = The lowest price traded of the n previous trading sessions
High_n = The highest price traded during the same n-day period
%K = The current value of the stochastic indicator

Moving Average Convergence Divergence (MACD)

MACD is a trend-following momentum indicator that shows the relationship between two moving averages of a security’s price. The MACD is calculated by subtracting the 26-period exponential moving average (EMA) from the 12-period EMA. [Investopedia]

MACD Equation:

MACD = 12-period EMA - 26-period EMA

An exponential moving average is a moving average that places greater weight on the most recent data points and less on older data points. In finance, the EMA reacts more significantly to recent price changes than a simple moving average (SMA), which applies an equal weight to all observations in the period. In statistics, a moving average (MA), also known as a simple moving average (SMA) in finance, is a calculation used to analyze data points by creating a series of averages of different subsets of the full data set.
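
A generic pandas sketch of the MACD calculation from the two EMAs (the 12/26 periods shown are the conventional defaults, not necessarily those used by the technicals module):

import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    ema_fast = close.ewm(span=fast, adjust=False).mean()  # 12-period EMA
    ema_slow = close.ewm(span=slow, adjust=False).mean()  # 26-period EMA
    return ema_fast - ema_slow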

Moving Average

The moving average is a calculation used to smooth data and in finance used as a stock indicator. [Investopedia]

Moving Average Equation:

MA = (A_1 + A_2 + ... + A_n) / n

where:
A = Average in period n
n = Number of time periods

Exponential Moving Average

The exponential moving average is a type of moving average that gives more weight to recent prices in an attempt to make it more responsive to new information. [Investopedia]

EMA Equation:

EMA_t = V_t × (s / (1 + d)) + EMA_y × (1 - s / (1 + d))

where:
EMA_t = EMA today
EMA_y = EMA yesterday
V_t = Value today
s = smoothing
d = number of days

High Low and Close Open

The high-low and close-open indicators are the difference between the high and low prices of the day and the close and open prices of the day, respectively.

High-Low and Close-Open Equations:

High-Low = High - Low
Close-Open = Close - Open


Bollinger Bands

A Bollinger Band® is a technical analysis tool defined by a set of trendlines plotted two standard deviations (positively and negatively) away from a simple moving average (SMA) of a security's price. Bollinger Bands® were developed and copyrighted by famous technical trader John Bollinger and are designed to discover opportunities that give investors a higher probability of properly identifying when an asset is oversold or overbought. [Bollinger Bands], [Investopedia]

Bollinger Bands Equation:

Upper Band = SMA(n) + m × σ
Lower Band = SMA(n) - m × σ


where:
σ = standard deviation
m = number of standard deviations
n = number of days in the smoothing period
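
A generic sketch of the bands with pandas rolling statistics (the 20-day window and 2-standard-deviation multiplier are conventional defaults, assumed here):

import pandas as pd

def bollinger_bands(close: pd.Series, n: int = 20, m: float = 2.0) -> pd.DataFrame:
    sma = close.rolling(n).mean()    # n-period simple moving average
    sigma = close.rolling(n).std()   # standard deviation over the same window
    return pd.DataFrame({'middle': sma,
                         'upper': sma + m * sigma,
                         'lower': sma - m * sigma})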

Machine Learning Model

An LSTM (Long Short-Term Memory) model built with TensorFlow and Keras is used. An example of the machine learning model code is provided in the lstm_demo.ipynb notebook.

LSTM Overview

This application utilizes an LSTM (Long Short-Term Memory) machine learning model. The LSTM model was developed by Sepp Hochreiter and Jürgen Schmidhuber and published in Neural Computation in 1997 [Hochreiter 1997]. A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell [Wikipedia].

lstm_cell

Machine Learning Libraries

TensorFlow

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets developers easily build and deploy ML powered applications.

Keras

Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library. Keras allows for easy implementation of TensorFlow methods without the need to build out complex machine learning infrastructure.

Implementation

Data Acquisition

Data is acquired from Alpaca Trade API and processed using the technicals module. The resulting DataFrame contains Closing price and all of the technical indicators.

The market data is obtained by calling the ohlcv() method within the alpaca module. The method takes a list of tickers, as well as the start_date and end_date, and returns a pd.DataFrame.

from datetime import datetime, timedelta

end_date = datetime.now()
start_date = (end_date - timedelta(days=1000)).strftime('%Y-%m-%d')  # 1000 calendar days back
end_date = end_date.strftime('%Y-%m-%d')

ohlcv_df = alpaca.ohlcv(['tickers'], start_date=start_date, end_date=end_date)

The TechnicalAnalysis class must first be instantiated with the pd.DataFrame containing market data.

tech_ind = technicals.TechnicalAnalysis(ohlcv_df)
tech_ind_df = tech_ind.get_all_technicals('ticker')

LSTM model class

The LSTM model is contained within the MachineLearningModel class located in the lstm_model module. The class must first be instantiated with a pd.DataFrame containing the technical analysis data.

my_model = lstm_model.MachineLearningModel(tech_ind_df)

Build, fit and save model

Building and fitting the model is done by calling the build_model() class method.

hist = my_model.build_model()

The model is then saved as an .h5 file.

my_model.save_model('model.h5')

MachineLearningModel.build_model() Description

The MachineLearningModel class is used to handle all machine learning methods. The build_model() class method builds and fits the model. The class method implements the following methodology:

Model overview

The LSTM model is programmed to look back 100 days to predict 14 days. The number of features is set by the shape of the DataFrame.

n_steps_in = 100
n_steps_out = 14
n_features = tech_ind_df.shape[1]

Scaling

A RobustScaler is used to scale the technical analysis data [ScikitLearn].

sklearn.preprocessing.RobustScaler()

Scale features using statistics that are robust to outliers.

This Scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th quantile) and the 3rd quartile (75th quantile). Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Median and interquartile range are then stored to be used on later data using the transform method.
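
A minimal sketch of that step (assuming the entire technical-indicator DataFrame is scaled; the production module may handle columns differently):

from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()
scaled_data = scaler.fit_transform(tech_ind_df)  # fit on the training data, then transform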

Parsing

The DataFrame is then parsed to a np.array and split into X and y subsets.

X, y = split_sequence(tech_ind_df.to_numpy(), n_steps_in, n_steps_out)

Where split_sequence() is a helper method that splits the multivariate time sequences.
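
A common implementation of such a splitter looks like the sketch below; the assumption that the target y comes from the first column (the closing price) is ours, and the production helper may differ:

import numpy as np

def split_sequence(sequences, n_steps_in, n_steps_out):
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in                    # end of the 100-day input window
        out_end_ix = end_ix + n_steps_out          # end of the 14-day forecast window
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :])           # all features as inputs
        y.append(sequences[end_ix:out_end_ix, 0])  # closing price as the target
    return np.array(X), np.array(y)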

Model type

Sequential() model is utilized as it groups a linear stack of layers into a tf.keras.Model [TensorFlow]

model = tf.keras.Sequential()

Activation function

A hyperbolic tangent activation function is used: tanh [TensorFlow]

activation_function = tf.keras.activations.tanh

Input and hidden layers

LSTM input and hidden layers are utilized. [TensorFlow]

The input layer contains 60 nodes, while the hidden layers contain 30 nodes by default but can be set by the administrator to an arbitrary amount via the n_nodes variable. The number of hidden layers defaults to 1 but can also be modified by the administrator.

Hidden layers are added with an add_hidden_layers() helper function (see the sketch after the code below).

n_nodes = 30

# input layer
model.add(LSTM(60, 
               activation=activation_function, 
               return_sequences=True, 
               input_shape=(n_steps_in, n_features)))

# hidden layers ...
model.add(LSTM(n_nodes, activation=activation_function, return_sequences=True))
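
The add_hidden_layers() helper might look roughly like the following sketch (the loop and defaults are assumptions based on the description above, not the production code):

def add_hidden_layers(model, n_layers=1, n_nodes=30):
    # append n_layers LSTM layers, each with n_nodes units
    for _ in range(n_layers):
        model.add(LSTM(n_nodes,
                       activation=activation_function,
                       return_sequences=True))
    return model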

Dense layers

Two dense layers are used in the model. Dense layers are added using the add_dense_layers class method.

model.add(Dense(30))

Optimizer

The model uses the Adam optimizer (short for Adaptive Moment Estimation) [TensorFlow]. Adam is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments. The Adam optimizer was developed by Diederik Kingma and Jimmy Ba and published in 2014 [Kingma et al. 2014]. Adam is defined by its creators as "an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments."

optimizer = tf.keras.optimizers.Adam

Loss function

The model uses Mean Squared Error loss function, which computes the mean of squares of errors between labels and predictions [TensorFlow]

loss = tf.keras.losses.MeanSquaredError

Other model parameters

The model is trained for 16 epochs with a batch size of 128. The validation split is 0.1.

Compiling and fitting

The model is then compiled and fit.

model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
hist = model.fit(X, y, epochs=16, batch_size=128, validation_split=0.1)

Training Results

An example of model training results, conducted with The Coca-Cola Company stock (KO):

Accuracy

model_accuracy_KO

Loss

model_loss_KO

Predictions

Predictions are calculated with a validator() helper method.

model_pred_KO

Forecasting stock prices

Implementation

To forecast stock prices using the saved model, the application uses the ForecastPrice class located within the lstm_model module.

The module pre-processes the data using the aforementioned methods and then utilizes the model.predict() TensorFlow method.

The application accomplishes this by:

  1. Getting stock prices for the past 200 days using the alpaca module
  2. Getting technical indicators using the get_all_technicals() method within the technicals.TechnicalAnalysis class
  3. Instantiating the ForecastPrice class with the technical data:

forecast_model = lstm_model.ForecastPrice(tech_ind_df)

  4. Calling the forecast() method within the ForecastPrice class:

forecast = forecast_model.forecast()

ForecastPrice.forecast() Description

ForecastPrice class handles all of the forecasting functions. The forecast() class method implements the following methodology:

  1. Load the model using the load_model Keras method:

from tensorflow.keras.models import load_model
forecast_model = load_model("model.h5")

  2. Pre-process the data following the same methodology as the MachineLearningModel class.

  3. Predict the prices:

forecasted_price = forecast_model.predict(tech_ind_df)

  4. Inverse scale the prices:

forecasted_price = scaler.inverse_transform(forecasted_price)[0]

Forecast Result

model_pred_KO

If the predicted price 14 days from now is higher than the current price, the application will issue a buy "opinion"; if the price is lower than the current price, it will issue a sell "opinion" on the date of the highest predicted price.
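
A simplified sketch of that decision rule (our interpretation for illustration, not the production code):

import numpy as np

def opinion(forecast: np.ndarray, current_price: float) -> str:
    # forecast holds the 14 predicted daily closing prices
    if forecast[-1] > current_price:
        return 'buy'
    if forecast[-1] < current_price:
        best_day = int(np.argmax(forecast)) + 1   # day of the highest predicted price
        return f'sell (on forecast day {best_day})'
    return 'hold'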


Sentiment Analysis

Sentiment analysis is performed using the Google Cloud Natural Language service.

gc_nlp

The data utilized in sentiment analysis is obtained from 2 sources:

  1. NewsAPI
  2. Tweepy

Implementation of NewsAPI and Tweepy can be found in the demo notebook: nlp_demo.ipynb
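
A rough sketch of gathering the raw text from both sources (using the public newsapi-python and tweepy 4.x packages; the query terms and credential names are placeholders):

import tweepy
from newsapi import NewsApiClient

# news articles mentioning the ticker
newsapi = NewsApiClient(api_key='NEWSAPI_KEY')
articles = newsapi.get_everything(q='KO', language='en', sort_by='publishedAt')
news_text = [a['title'] + '. ' + (a['description'] or '') for a in articles['articles']]

# recent tweets mentioning the ticker
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')
api = tweepy.API(auth)
tweets = [status.text for status in api.search_tweets(q='$KO', lang='en', count=100)]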

The sentiment analysis implementation:

import os

from google.cloud import language_v1

def GetSentimentAnalysisGoogle(text_content):
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = '../your_credentials_file.json'
    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT
    document = {'content': text_content, 'type_': type_}
    encoding_type = language_v1.EncodingType.UTF8
    response = client.analyze_sentiment(request={'document': document, 
                                                 'encoding_type': encoding_type})
    return {'score' : response.document_sentiment.score , 
            'magnitude' : response.document_sentiment.magnitude}
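
Called on a single headline, the helper returns the document-level score and magnitude, for example (the numbers in the comment are illustrative):

sentiment = GetSentimentAnalysisGoogle("Coca-Cola beats quarterly revenue estimates")
print(sentiment)   # e.g. {'score': 0.4, 'magnitude': 0.4}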

Team
