Spread Modeling - ML Approach #25

Merged
merged 37 commits into from
Feb 15, 2021
Commits
c87c461
Init spread modeling commit
AaronDeb Nov 23, 2020
83c6265
misc stash
AaronDeb Nov 25, 2020
dc777af
Added latest changes
AaronDeb Nov 30, 2020
f016d50
Fixed filter indexing issue
AaronDeb Dec 1, 2020
9d80cf2
Set full Coverage/test commit
AaronDeb Dec 5, 2020
bb59564
Added keras requirement
AaronDeb Dec 5, 2020
952daf7
Added tensorflow requirement
AaronDeb Dec 5, 2020
7731607
Merge branch 'develop' into spread_modeling
PanPip Dec 10, 2020
9aaf52b
Update license links in Spread Modeling - ML Approach
PanPip Dec 10, 2020
a96bcc5
Merge branch 'develop' into spread_modeling
PanPip Dec 15, 2020
c3a67bb
Slightly messy commit, showcases docs
AaronDeb Dec 15, 2020
03fdff4
Merge
AaronDeb Dec 15, 2020
abe6537
removed deprecated files
AaronDeb Dec 15, 2020
18bf05b
Added new tests and full coverage
AaronDeb Dec 21, 2020
4183742
Fixed lint/coverage issues
AaronDeb Dec 21, 2020
9dee091
Reversed some changes
AaronDeb Dec 21, 2020
b263050
Added option to dynamically set the Open/Close columns in the dataset…
AaronDeb Dec 22, 2020
74dd683
Added more docstrings/comments and tidied up some sections
AaronDeb Dec 29, 2020
97bac50
Improve pylint for Spread Modelling
PanPip Jan 21, 2021
cfd4fe3
Minor code style fixes for Spread Modelling
PanPip Jan 21, 2021
e463d19
Improve docs style for Spread Modelling
PanPip Jan 21, 2021
2eb8faf
Small config file fix
PanPip Jan 22, 2021
7bd6cc6
Added latest changes.
AaronDeb Feb 10, 2021
223646e
Merge branch 'spread_modeling' of https://github.com/hudson-and-thame…
AaronDeb Feb 10, 2021
c7da3b0
pylint fix
AaronDeb Feb 10, 2021
8ea404c
another minor pylint fix
AaronDeb Feb 10, 2021
44b382d
Docs fixes
AaronDeb Feb 11, 2021
53b1d0d
Merge branch 'develop' into spread_modeling
PanPip Feb 12, 2021
eb0ee51
Minor code adjustments Spread Modelling
PanPip Feb 12, 2021
cfa318b
Minor docs adjustments Spread Modelling
PanPip Feb 12, 2021
31462b4
fixes for PR comments
AaronDeb Feb 13, 2021
e82445b
pylint fixes
AaronDeb Feb 13, 2021
1c3ee2f
Added changelog
AaronDeb Feb 13, 2021
4ea2ac1
Merge branch 'develop' into spread_modeling
PanPip Feb 15, 2021
7f72a8b
Update versions in Spread Modeling
PanPip Feb 15, 2021
180cfe7
Minor docs adjustments Spread Modelling
PanPip Feb 15, 2021
f025d1b
Added installation warning note
PanPip Feb 15, 2021
5 changes: 5 additions & 0 deletions arbitragelab/ml_approach/__init__.py
@@ -3,3 +3,8 @@
"""

from arbitragelab.ml_approach.pairs_selector import PairsSelector
from arbitragelab.ml_approach.tar import TAR
from arbitragelab.ml_approach.feature_expander import FeatureExpander
from arbitragelab.ml_approach.regressor_committee import RegressorCommittee
from arbitragelab.ml_approach.filters import ThresholdFilter, CorrelationFilter, VolatilityFilter
from arbitragelab.ml_approach.neural_networks import MultiLayerPerceptron, RecurrentNeuralNetwork, PiSigmaNeuralNetwork
156 changes: 156 additions & 0 deletions arbitragelab/ml_approach/feature_expander.py
@@ -0,0 +1,156 @@
# Copyright 2019, Hudson and Thames Quantitative Research
# All rights reserved
# Read more: https://hudson-and-thames-arbitragelab.readthedocs-hosted.com/en/latest/additional_information/license.html
"""
This module implements the Feature Expansion class.
"""

import itertools
import numpy as np
import pandas as pd

# This silencer is related to dangerous-default-value in __init__
# pylint: disable=W0102

class FeatureExpander:
    """
    Higher-order term Feature Expander implementation. The implementation consists
    of two major parts. The first part evaluates a family of polynomial basis
    functions at every feature value, ordered from the lowest-order term to the
    highest. The implemented series are [chebyshev, legendre, laguerre, power]
    polynomials. The second part is a combinatorial version of feature crossing,
    which generates every combination of n_orders features and multiplies the
    features in each combination together. This can be used by adding [product]
    to the 'methods' parameter in the constructor.
    """

    def __init__(self, methods: list = [], n_orders: int = 1):
        """
        Initializes main variables.

        :param methods: (list) Possible expansion methods [chebyshev, legendre,
            laguerre, power, product].
        :param n_orders: (int) Highest polynomial degree for the polynomial methods,
            and the number of features per combination for 'product'.
        """

        self.methods = methods
        self.n_orders = n_orders
        self.dataset = None

    @staticmethod
    def _chebyshev(series: pd.Series, degree: int):
        """
        Evaluates the Chebyshev polynomials of degree 0 up to 'degree' at each
        value of the series (pseudo-Vandermonde matrix).

        :param series: (pd.Series) Series to use.
        :param degree: (int) Highest degree to use.
        :return: (np.array) Array of shape (len(series), degree + 1).
        """

        return np.polynomial.chebyshev.chebvander(series, degree)

    @staticmethod
    def _legendre(series: pd.Series, degree: int):
        """
        Evaluates the Legendre polynomials of degree 0 up to 'degree' at each
        value of the series (pseudo-Vandermonde matrix).

        :param series: (pd.Series) Series to use.
        :param degree: (int) Highest degree to use.
        :return: (np.array) Array of shape (len(series), degree + 1).
        """

        return np.polynomial.legendre.legvander(series, degree)

    @staticmethod
    def _laguerre(series: pd.Series, degree: int):
        """
        Evaluates the Laguerre polynomials of degree 0 up to 'degree' at each
        value of the series (pseudo-Vandermonde matrix).

        :param series: (pd.Series) Series to use.
        :param degree: (int) Highest degree to use.
        :return: (np.array) Array of shape (len(series), degree + 1).
        """

        return np.polynomial.laguerre.lagvander(series, degree)

    @staticmethod
    def _power(series: pd.Series, degree: int):
        """
        Evaluates the powers x^0 up to x^degree at each value of the
        series (Vandermonde matrix).

        :param series: (pd.Series) Series to use.
        :param degree: (int) Highest degree to use.
        :return: (np.array) Array of shape (len(series), degree + 1).
        """

        return np.polynomial.polynomial.polyvander(series, degree)

    @staticmethod
    def _product(series: np.ndarray, degree: int):
        """
        Implements the feature crossing method of feature expansion, which
        generates every combination of 'degree' features and multiplies the
        features in each combination together, row by row.

        :param series: (np.array) 2D dataset to use, one observation per row.
        :param degree: (int) Number of features in each combination.
        :return: (list) List of pd.Series, one product column per combination.
        """

        # Get feature count.
        comb_range = range(len(series[0]))

        # Generate N degree combinations in relation to feature count.
        combinations = [list(comb) for comb in itertools.combinations(comb_range, degree)]

        vectorized_x = pd.DataFrame(series)

        # N-wise product for C combinations.
        return [np.prod(vectorized_x.iloc[:, comb], axis=1) for comb in combinations]

    def fit(self, frame: np.ndarray):
        """
        Stores the dataset inside the class object.

        :param frame: (np.array) 2D dataset to store, one observation per row.
        :return: (FeatureExpander) The object itself.
        """

        self.dataset = frame
        return self

    def transform(self) -> pd.DataFrame:
        """
        Returns the original dataset with the features requested in the
        'methods' parameter of the constructor appended to it.

        :return: (pd.DataFrame) Original dataset with the expanded values appended to it.
        """

        new_dataset = []

        for row in self.dataset:
            expanded_row = list(row)
            for meth in self.methods:
                if meth != "product":
                    # Dynamically call the needed method using 'getattr'.
                    math_return = getattr(self, '_' + meth)(row, self.n_orders)
                    # Ravel result and concatenate it to expanded_row.
                    expanded_row.extend(np.ravel(math_return))

            new_dataset.append(np.ravel(expanded_row).tolist())

        new_dataset_df = pd.DataFrame(new_dataset)

        if "product" in self.methods:
            # Compute the feature-crossing products over the whole dataset.
            prod_result = self._product(self.dataset, self.n_orders)
            # Transpose the result to make it compatible with original structure.
            prod_result_df = pd.DataFrame(prod_result).T

            # Return concatenated dataset parts.
            return pd.concat([new_dataset_df, prod_result_df], axis=1)

        return new_dataset_df
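The column layout that `transform` produces can be reproduced with plain NumPy, which makes the expected output width easy to check: original features first, then one (n_orders + 1)-wide block per feature for each polynomial method, then one column per feature combination for 'product'. The sketch below is a minimal standalone illustration, assuming a 2D float array with one observation per row; the `expand` function and the `VANDER` mapping are hypothetical names, not part of this PR. It relies on the `*vander` functions accepting arrays of any shape and returning `x.shape + (deg + 1,)`.

```python
import itertools

import numpy as np

# The NumPy pseudo-Vandermonde helpers that FeatureExpander calls per method.
VANDER = {
    "chebyshev": np.polynomial.chebyshev.chebvander,
    "legendre": np.polynomial.legendre.legvander,
    "laguerre": np.polynomial.laguerre.lagvander,
    "power": np.polynomial.polynomial.polyvander,
}


def expand(data, methods, n_orders):
    """Hypothetical helper mirroring the column order of FeatureExpander.transform."""
    data = np.asarray(data, dtype=float)
    n_rows = data.shape[0]
    blocks = [data]  # original features come first

    for meth in methods:
        if meth != "product":
            # Output shape is (n_rows, n_features, n_orders + 1); flattening the
            # last two axes matches np.ravel on each row's per-feature block.
            values = VANDER[meth](data, n_orders)
            blocks.append(values.reshape(n_rows, -1))

    if "product" in methods:
        # One column per combination of n_orders features: the row-wise product.
        combos = itertools.combinations(range(data.shape[1]), n_orders)
        products = [np.prod(data[:, list(combo)], axis=1) for combo in combos]
        if products:  # no combinations exist when n_orders > n_features
            blocks.append(np.column_stack(products))

    return np.hstack(blocks)


row = expand([[0.5, -1.0, 2.0]], ["chebyshev", "product"], n_orders=2)
print(row.shape)  # (1, 15): 3 original + 3 * 3 Chebyshev + C(3, 2) products
```

Because every basis starts at degree 0, each polynomial method contributes a constant column of ones per feature, which a downstream model may want to drop. With 'product' and n_orders=1, each combination holds a single feature, so the product columns duplicate the originals.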