pystacknet
kaz-Anova committed Sep 4, 2018
0 parents commit 7785cc2
Showing 11 changed files with 2,713 additions and 0 deletions.
21 changes: 21 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2017 Marios Michailidis

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
90 changes: 90 additions & 0 deletions README.md
@@ -0,0 +1,90 @@
## About

`pystacknet` is a lightweight Python version of [StackNet](https://github.com/kaz-Anova/StackNet), which was originally written in Java.

It supports many of the original features, with some new elements.


## Installation

```
git clone https://github.com/h2oai/pystacknet
cd pystacknet
python setup.py install
```

## New features

`pystacknet`'s main object is a 2-dimensional list of sklearn-style models. This list defines the StackNet structure and is the equivalent of the [parameters file](https://github.com/kaz-Anova/StackNet#parameters-file) in the Java version. A representative example could be:

```
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
models = [
    ######## First level ########
    [RandomForestClassifier(n_estimators=100, criterion="entropy", max_depth=5, max_features=0.5, random_state=1),
     ExtraTreesClassifier(n_estimators=100, criterion="entropy", max_depth=5, max_features=0.5, random_state=1),
     GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=5, max_features=0.5, random_state=1),
     LogisticRegression(random_state=1)],
    ######## Second level ########
    [RandomForestClassifier(n_estimators=200, criterion="entropy", max_depth=5, max_features=0.5, random_state=1)]
]
```

`pystacknet` is not as strict as the `Java` version and allows `Regressors`, `Classifiers` or even `Transformers` at any level of StackNet. In other words, the following could work just fine:

```
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier,GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.decomposition import PCA
models = [
    [RandomForestClassifier(n_estimators=100, criterion="entropy", max_depth=5, max_features=0.5, random_state=1),
     ExtraTreesRegressor(n_estimators=100, max_depth=5, max_features=0.5, random_state=1),
     GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=5, max_features=0.5, random_state=1),
     LogisticRegression(random_state=1),
     PCA(n_components=4, random_state=1)],
    [RandomForestClassifier(n_estimators=200, criterion="entropy", max_depth=5, max_features=0.5, random_state=1)]
]
```

**Note** that not all transformers are meaningful in this context, so use them at your own risk.


## Parameters

A typical usage for classification could be:

```
from pystacknet.pystacknet import StackNetClassifier
model = StackNetClassifier(models, metric="auc", folds=4,
                           restacking=False, use_retraining=True, use_proba=True,
                           random_state=12345, n_jobs=1, verbose=1)
model.fit(x, y)
preds = model.predict_proba(x_test)
```
Where:


Parameter | Explanation
--- | ---
models | List of models. This should be a 2-dimensional list: the outer list defines the stacking levels and each entry within a level is a model.
metric | Can be "auc", "logloss", "accuracy", "f1", "matthews" or your own custom metric, as long as it implements metric(y_true, y_pred, sample_weight=None).
folds | Either an integer defining the number of folds used in `StackNet` or an iterable yielding train/test splits (see the sketch below this table).
restacking | `True` for [restacking](https://github.com/kaz-Anova/StackNet#restacking-mode), else `False`.
use_proba | If `True`, probabilities instead of class predictions are used when evaluating the metric.
use_retraining | If `True`, one model is refit on the whole training data in order to score the test data. Otherwise the predictions of the models fitted on the folds are averaged (this takes more memory and there is no guarantee that it will work better).
random_state | Integer seed for randomised procedures.
n_jobs | Number of models to run in parallel. This is independent of any extra threads allocated by the selected algorithms, e.g. it is possible to run 4 models in parallel where one of them is a random forest that itself runs on 10 threads (if so configured).
verbose | Integer value higher than zero to enable printing progress at the console.
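
For instance, here is a minimal sketch of the last two points, reusing `models`, `x`, `y` and `x_test` from the snippets above. This is an illustration only; in particular, whether a list of `(train_index, test_index)` pairs produced by `StratifiedKFold` satisfies the `folds` iterable is an assumption based on the description in the table.

```
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from pystacknet.pystacknet import StackNetClassifier

# custom metric: any callable implementing (y_true, y_pred, sample_weight=None)
def weighted_auc(y_true, y_pred, sample_weight=None):
    return roc_auc_score(y_true, y_pred, sample_weight=sample_weight)

# explicit train/test splits instead of an integer number of folds (assumed format)
splits = list(StratifiedKFold(n_splits=4, shuffle=True, random_state=12345).split(x, y))

model = StackNetClassifier(models, metric=weighted_auc, folds=splits,
                           restacking=False, use_retraining=True, use_proba=True,
                           random_state=12345, n_jobs=1, verbose=1)
model.fit(x, y)
preds = model.predict_proba(x_test)
```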
1 change: 1 addition & 0 deletions pystacknet/__init__.py
@@ -0,0 +1 @@
__version__ = '0.0.1'
157 changes: 157 additions & 0 deletions pystacknet/metrics.py
@@ -0,0 +1,157 @@
# -*- coding: utf-8 -*-
"""
Created on Fri Aug 31 18:33:58 2018
@author: Marios Michailidis
Metrics and methods to check the metrics used within StackNet
"""

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score , mean_squared_log_error #regression metrics
from sklearn.metrics import roc_auc_score, log_loss ,accuracy_score, f1_score ,matthews_corrcoef
import numpy as np

valid_regression_metrics=["rmse","mae","rmsle","r2","mape","smape"]
valid_classification_metrics=["auc","logloss","accuracy","f1","matthews"]

############ classification metrics ############

def auc(y_true, y_pred, sample_weight=None):
    return roc_auc_score(y_true, y_pred, sample_weight=sample_weight)

def logloss(y_true, y_pred, sample_weight=None):
    return log_loss(y_true, y_pred, sample_weight=sample_weight)

def accuracy(y_true, y_pred, sample_weight=None):
    return accuracy_score(y_true, y_pred, sample_weight=sample_weight)

def f1(y_true, y_pred, sample_weight=None):
    return f1_score(y_true, y_pred, sample_weight=sample_weight)

def matthews(y_true, y_pred, sample_weight=None):
    return matthews_corrcoef(y_true, y_pred, sample_weight=sample_weight)

############ regression metrics ############

def rmse(y_true, y_pred, sample_weight=None):
    return np.sqrt(mean_squared_error(y_true, y_pred, sample_weight=sample_weight))

def mae(y_true, y_pred, sample_weight=None):
    return mean_absolute_error(y_true, y_pred, sample_weight=sample_weight)

def rmsle(y_true, y_pred, sample_weight=None):
    return np.sqrt(mean_squared_log_error(y_true, y_pred, sample_weight=sample_weight))

def r2(y_true, y_pred, sample_weight=None):
    return r2_score(y_true, y_pred, sample_weight=sample_weight)


def mape(y_true, y_pred, sample_weight=None):
    y_true = y_true.ravel()
    y_pred = y_pred.ravel()
    if sample_weight is not None:
        sample_weight = sample_weight.ravel()
    eps = 1E-15
    ape = np.abs((y_true - y_pred) / (y_true + eps)) * 100
    ape[y_true == 0] = 0
    return np.average(ape, weights=sample_weight)


def smape(y_true, y_pred, sample_weight=None):
    y_true = y_true.ravel()
    y_pred = y_pred.ravel()
    if sample_weight is not None:
        sample_weight = sample_weight.ravel()
    eps = 1E-15
    sape = (np.abs(y_true - y_pred) / (0.5 * (np.abs(y_true) + np.abs(y_pred)) + eps)) * 100
    sape[(y_true == 0) & (y_pred == 0)] = 0
    return np.average(sape, weights=sample_weight)


"""
metric: string or callable that returns a metric given (y_true, y_pred, sample_weight=None)
Currently supported metrics are "rmse", "mae", "rmsle", "r2", "mape", "smape"
"""


def check_regression_metric(metric):

    if metric is None:
        raise Exception("metric cannot be None")
    if isinstance(metric, str):
        if metric not in valid_regression_metrics:
            raise Exception("The regression metric has to be one of %s " % (", ".join([str(k) for k in valid_regression_metrics])))
        if metric == "rmse":
            return rmse, metric
        elif metric == "mae":
            return mae, metric
        elif metric == "rmsle":
            return rmsle, metric
        elif metric == "r2":
            return r2, metric
        elif metric == "mape":
            return mape, metric
        elif metric == "smape":
            return smape, metric
        else:
            raise Exception("The metric %s is not recognised " % (metric))
    else:  # a custom metric is given
        try:
            y_true_temp = np.array([[1], [2], [3]])
            y_pred_temp = np.array([[2], [1], [3]])
            sample_weight_temp = [1, 0.5, 1]
            metric(y_true_temp, y_pred_temp, sample_weight=sample_weight_temp)
            return metric, "custom"
        except Exception:
            raise Exception("The custom metric has to implement metric(y_true, y_pred, sample_weight=None)")


"""
metric: string or callable that returns a metric given (y_true, y_pred, sample_weight=None)
Currently supported metrics are "auc", "logloss", "accuracy", "f1", "matthews"
"""


def check_classification_metric(metric):

    if metric is None:
        raise Exception("metric cannot be None")
    if isinstance(metric, str):
        if metric not in valid_classification_metrics:
            raise Exception("The classification metric has to be one of %s " % (", ".join([str(k) for k in valid_classification_metrics])))
        if metric == "auc":
            return auc, metric
        elif metric == "logloss":
            return logloss, metric
        elif metric == "accuracy":
            return accuracy, metric
        elif metric == "f1":
            return f1, metric
        elif metric == "matthews":
            return matthews, metric
        else:
            raise Exception("The metric %s is not recognised " % (metric))
    else:  # a custom metric is given
        try:
            y_true_temp = np.array([[1], [0], [1]])
            y_pred_temp = np.array([[0.4], [1], [0.2]])
            sample_weight_temp = [1, 0.5, 1]
            metric(y_true_temp, y_pred_temp, sample_weight=sample_weight_temp)
            return metric, "custom"
        except Exception:
            raise Exception("The custom metric has to implement metric(y_true, y_pred, sample_weight=None)")
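

# --- Illustrative usage sketch (editor's addition, not part of the original file) ---
# A custom metric only needs to implement metric(y_true, y_pred, sample_weight=None);
# check_classification_metric then returns it together with the name "custom".
if __name__ == "__main__":

    def brier(y_true, y_pred, sample_weight=None):
        # mean squared difference between labels and predicted probabilities
        y_true = np.asarray(y_true, dtype=float).ravel()
        y_pred = np.asarray(y_pred, dtype=float).ravel()
        return np.average((y_true - y_pred) ** 2, weights=sample_weight)

    metric_fn, metric_name = check_classification_metric(brier)  # -> (brier, "custom")
    print(metric_name, metric_fn([1, 0, 1], [0.9, 0.2, 0.7]))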




