ML based Pairs Selection #5

Merged
merged 37 commits into from
Dec 9, 2020
a0f1c69
Initial commit for code, docs and unit test
AaronDeb Nov 8, 2020
294cde6
Got everything up to coverage; with some exclusions like plotting rel…
AaronDeb Nov 10, 2020
6ba5f69
Code style fixes
AaronDeb Nov 11, 2020
68171b2
Fixed doc style issues; still have the research notebook link
AaronDeb Nov 11, 2020
c65baf8
Added some spacings in the doc strings
AaronDeb Nov 11, 2020
c24e82e
More styling fixes and refactor of get_ticker_sector_info method in t…
AaronDeb Nov 11, 2020
c4b0984
Added some more comments
AaronDeb Nov 12, 2020
0872119
Merge branch 'develop' into ml_based_pairs_selection
PanPip Nov 12, 2020
cd62486
Added documentation section for the data importer
AaronDeb Nov 13, 2020
26a2e3c
Merge branch 'ml_based_pairs_selection' of https://github.com/hudson-…
AaronDeb Nov 13, 2020
6bb1b08
Fix ML Approach requirements
PanPip Nov 13, 2020
b3e3ae4
Fix pylint in init ML Approach
PanPip Nov 13, 2020
059d469
Add lxml to requirements
PanPip Nov 13, 2020
8fb4724
Data Importer unit test fix
AaronDeb Nov 13, 2020
80d574c
Merge branch 'develop' into ml_based_pairs_selection
PanPip Nov 13, 2020
4daed63
Added more fixes
AaronDeb Nov 14, 2020
d8758d7
Merge branch 'ml_based_pairs_selection' of https://github.com/hudson-…
AaronDeb Nov 14, 2020
f84719e
Typo fixes, added images to data importer docs
AaronDeb Nov 16, 2020
ded9734
More typo fixes
AaronDeb Nov 16, 2020
17bb707
Set limit on pair plotter and added time_delta parameter
AaronDeb Nov 16, 2020
47f9eee
Merge branch 'develop' into ml_based_pairs_selection
PanPip Nov 16, 2020
83b55f4
Remove Risk Metrics from init
PanPip Nov 16, 2020
5a5dd0f
Moved test files
AaronDeb Nov 17, 2020
b2ff7f1
Added introduction section to ML approach section
AaronDeb Nov 17, 2020
aade331
Fixed latest comments
AaronDeb Nov 18, 2020
621b70d
Merge branch 'develop' into ml_based_pairs_selection
PanPip Nov 19, 2020
6f89bff
Add ML Approach import to secure import
PanPip Nov 19, 2020
220fced
Added Knee Plot title
AaronDeb Nov 20, 2020
bcd264b
Merge branch 'ml_based_pairs_selection' of https://github.com/hudson-…
AaronDeb Nov 20, 2020
c1cf7b6
Added doc warning relating to pair selection performance
AaronDeb Nov 20, 2020
8109c4c
Added ht clients research repo link in docs
AaronDeb Nov 20, 2020
e9f7c2e
Fixed unit test to handle time varying values from asset getter methods
AaronDeb Nov 20, 2020
aad7c7e
Added single pair plotting method
AaronDeb Nov 20, 2020
a9bc418
More Fixes
AaronDeb Nov 23, 2020
0a1d6fe
Added progress bar to Select Pairs step - ML Pairs Selection
PanPip Nov 24, 2020
d743cd6
Added my changes to changelog
AaronDeb Nov 26, 2020
8a98d2e
Merge branch 'develop' into ml_based_pairs_selection
PanPip Nov 26, 2020
7 changes: 7 additions & 0 deletions arbitragelab/__init__.py
@@ -0,0 +1,7 @@
"""
ArbitrageLab helps portfolio managers and traders who want to leverage the power of statistical arbitrage by providing
reproducible, interpretable, and easy to use tools.
"""

from arbitragelab.ml_based_pairs_selection import pairs_selector
from arbitragelab.ml_based_pairs_selection import data_importer
7 changes: 7 additions & 0 deletions arbitragelab/ml_based_pairs_selection/__init__.py
@@ -0,0 +1,7 @@
"""
This module implements the ML Based Pairs Selection Technique
"""

from arbitragelab.ml_based_pairs_selection.pairs_selector import PairsSelector

from arbitragelab.ml_based_pairs_selection.data_importer import DataImporter
121 changes: 121 additions & 0 deletions arbitragelab/ml_based_pairs_selection/data_importer.py
@@ -0,0 +1,121 @@
"""
This module is a user data helper wrapping various yahoo finance libraries
"""

import pandas as pd

import yfinance as yf
import yahoo_fin.stock_info as ys


class DataImporter:
    """
    Wrapper class that imports data from yfinance and yahoo_fin.
    """

    @staticmethod
    def get_sp500_tickers() -> list:
        """
        Gets all S&P 500 stock tickers.

        :return tickers: (list) : list of tickers
        """
        tickers_sp500 = ys.tickers_sp500()

        return tickers_sp500

    @staticmethod
    def get_dow_tickers() -> list:
        """
        Gets all DOW stock tickers.

        :return tickers: (list) : list of tickers
        """
        tickers_dow = ys.tickers_dow()

        return tickers_dow

    @staticmethod
    def remove_nuns(dataframe: pd.DataFrame, threshold: int = 100) -> pd.DataFrame:
        """
        Remove tickers whose number of null values exceeds a threshold.

        :param dataframe: (pd.DataFrame) : Asset price data
        :param threshold: (int) : The number of null values allowed per ticker
        :return dataframe: (pd.DataFrame) : Price data restricted to tickers with at most `threshold` null values.
        """
        # Count the nulls in each column (one column per ticker).
        null_sum_each_ticker = dataframe.isnull().sum()
        tickers_under_threshold = \
            null_sum_each_ticker[null_sum_each_ticker <= threshold].index
        dataframe = dataframe[tickers_under_threshold]

        return dataframe

    @staticmethod
    def get_price_data(tickers: list,
                       start_date: str,
                       end_date: str,
                       interval: str = '5m') -> pd.DataFrame:
        """
        Get the price data with custom start and end date and interval.
        Only the closing prices are kept.

        :param tickers: (list) : List of tickers to download
        :param start_date: (str) : Download start date string (YYYY-MM-DD)
        :param end_date: (str) : Download end date string (YYYY-MM-DD)
        :param interval: (str) : Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        :return price_data: (pd.DataFrame) : The requested price_data.
        """
        price_data = yf.download(tickers,
                                 start=start_date, end=end_date,
                                 interval=interval,
                                 group_by='column')['Close']

        return price_data

    @staticmethod
    def get_returns_data(price_data: pd.DataFrame) -> pd.DataFrame:
        """
        Convert price data to period-over-period percentage returns.

        :param price_data: (pd.DataFrame) : Asset price data
        :return returns_data: (pd.DataFrame) : Price data converted to returns.
        """
        returns_data = price_data.pct_change()
        # The first row has no previous price to compare against, so drop it.
        returns_data = returns_data.iloc[1:]

        return returns_data

    def get_ticker_sector_info(self, tickers: list, yf_call_chunk: int = 20) -> pd.DataFrame:
        """
        This method will loop through all the tickers, using the yfinance library
        do a ticker info request and retrieve back 'sector' and 'industry' information.

        :param tickers: (list) : List of asset symbols
        :param yf_call_chunk: (int) : Ticker values allowed per 'Tickers' object. This should always be less than 200.
        :return augmented_tickers: (pd.DataFrame) : DataFrame with input asset tickers and their respective sector and industry information
        """

        # Split long ticker lists into chunks and process each chunk recursively.
        if len(tickers) > yf_call_chunk:
            ticker_sector_queue = []
            for i in range(0, len(tickers), yf_call_chunk):
                # Slicing past the end of a list is safe, so the last chunk may be shorter.
                ticker_sector_queue.append(
                    self.get_ticker_sector_info(tickers[i: i + yf_call_chunk], yf_call_chunk))
            return pd.concat(ticker_sector_queue, axis=0).reset_index(drop=True)

        tckrs = yf.Tickers(' '.join(tickers))

        tckr_info = []

        for i, tckr in enumerate(tickers):
            try:
                ticker_info = tckrs.tickers[i].info
                tckr_info.append(
                    (tckr, ticker_info['industry'], ticker_info['sector']))
            except ValueError:  # pragma: no cover # can only happen if the server sends back corrupted data
                pass
            except RuntimeError:  # pragma: no cover # can only happen if server is down
                pass

        return pd.DataFrame(data=tckr_info, columns=['ticker', 'industry', 'sector'])
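
The offline parts of `DataImporter` can be tried without any network access. The snippet below is a minimal sketch, not the library code: `drop_sparse_columns`, `to_returns` and `split_into_chunks` are hypothetical stand-alone helpers that mirror the logic of `remove_nuns`, `get_returns_data` and the chunking step in `get_ticker_sector_info`, and the price table is made-up data.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(prices: pd.DataFrame, threshold: int = 100) -> pd.DataFrame:
    """Hypothetical mirror of remove_nuns: keep columns with at most `threshold` nulls."""
    null_counts = prices.isnull().sum()
    return prices[null_counts[null_counts <= threshold].index]


def to_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical mirror of get_returns_data: simple returns without the first row."""
    return prices.pct_change().iloc[1:]


def split_into_chunks(items: list, chunk_size: int) -> list:
    """Split a list into consecutive chunks of at most `chunk_size` items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# Made-up prices: 'AAA' is complete, 'BBB' has one gap, 'CCC' is mostly missing.
prices = pd.DataFrame({
    'AAA': [100.0, 110.0, 121.0, 133.1],
    'BBB': [50.0, np.nan, 55.0, 60.5],
    'CCC': [np.nan, np.nan, np.nan, 10.0],
})

# Allow at most one null per ticker, so 'CCC' is dropped.
filtered = drop_sparse_columns(prices, threshold=1)

# Compute returns on the gap-free column only.
returns = to_returns(filtered[['AAA']])

# Five tickers with a chunk size of two give chunks of sizes 2, 2 and 1.
chunks = split_into_chunks(['A', 'B', 'C', 'D', 'E'], chunk_size=2)

print(list(filtered.columns))
print(returns['AAA'].round(4).tolist())
print(chunks)
```

Note that the threshold filter keeps columns that still contain some nulls (here `BBB`), so any remaining gaps need to be handled before returns are computed on them.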