funatsu-lab/Stochastic-Threshold-Model-Trees

Stochastic Threshold Model Trees

Stochastic Threshold Model Trees is a tree-based ensemble method that gives reasonable extrapolation predictions for physicochemical and other data expected to show some degree of monotonicity.
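One ingredient of this behaviour, per the `regressor` parameter in Usage, is a regression model fitted in each terminal node instead of a constant. The following is a minimal, hypothetical pure-Python sketch, not the package's code. It makes a single split at a fixed threshold and uses either mean-valued or linear leaves, trained on monotonic data y = 2x for x in 0..9 and queried at x = 20. The helper names `fit_line` and `fit_stump` are illustrative only.

```python
# Hypothetical illustration (not the package's code): constant leaves vs
# linear leaves in a one-split tree, queried outside the training range.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_stump(xs, ys, threshold, linear_leaves):
    """Split the data at `threshold` and fit each side with a mean or a line."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    leaves = []
    for part in (left, right):
        px = [p[0] for p in part]
        py = [p[1] for p in part]
        if linear_leaves:
            leaves.append(fit_line(px, py))
        else:
            leaves.append((sum(py) / len(py), 0.0))  # constant leaf: slope 0

    def predict(x):
        a, b = leaves[0] if x <= threshold else leaves[1]
        return a + b * x

    return predict


# Monotonic training data: y = 2x on x in [0, 9]
xs = list(range(10))
ys = [2.0 * x for x in xs]

constant_tree = fit_stump(xs, ys, threshold=4.5, linear_leaves=False)
model_tree = fit_stump(xs, ys, threshold=4.5, linear_leaves=True)

print(constant_tree(20))  # → 14.0 (stuck at the right leaf's mean)
print(model_tree(20))     # → 40.0 (follows the trend)
```

The constant-leaf tree can never predict beyond the range of training targets, whereas the linear leaf carries the local slope into the extrapolation region.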

Requirements

Requirements for notebook

Installation

You can install the package into your local environment with the following command:

$ pip install git+https://github.com/funatsu-lab/Stochastic-Threshold-Model-Trees.git

Examples

As the figures below show, the proposed method makes predictions that follow the trend of the samples near the extrapolation region.

Figure: discontinuous_Proposed_5sigma

Figure: Sphere_Proposed_MLR_noise_scaling

Figure: 1dim_comparison

Usage

The module is imported and used as follows.

from StochasticThresholdModelTrees.regressor.stmt import StochasticThresholdModelTrees
from StochasticThresholdModelTrees.threshold_selector import NormalGaussianDistribution
from StochasticThresholdModelTrees.criterion import MSE
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
model = StochasticThresholdModelTrees(
    n_estimators=100,  # Number of regression trees in the ensemble
    criterion=MSE(),  # Criterion used to choose split boundaries
    regressor=LinearRegression(),  # Regression model fitted in each terminal node
    threshold_selector=NormalGaussianDistribution(5),  # Strategy for generating candidate split thresholds
    min_samples_leaf=1.0,  # Minimum number of samples required in a terminal node
    max_features='auto',  # Number of features to consider when looking for the best split
    f_select=True,  # Whether to select a subset of features to consider at each split
    ensemble_pred='median',  # How to aggregate the trees' predictions: 'mean' or 'median'
    scaling=False,  # Whether to standardize the data in each terminal node before fitting the regressor
    random_state=None,  # Random seed; None leaves it unfixed
)
data = pd.read_csv('./data/logSdataset1290.csv', index_col=0)
X = data[data.columns[1:]]  # Features: every column after the first
y = data[data.columns[0]]  # Target: the first column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)  # Model predictions on the held-out set
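To illustrate how `n_estimators`, `threshold_selector`, `regressor` and `ensemble_pred` work together, here is a toy pure-Python ensemble of one-split model trees. It is a sketch under our own assumptions, not the package's implementation. Each tree's threshold is drawn from a normal distribution centred on the median of x with standard deviation `sigma` (the package's `NormalGaussianDistribution(5)` may define its parameter differently), each leaf holds a least-squares line, and the trees' predictions are aggregated by median or mean. All function names here are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b).
    Falls back to a constant (slope 0) when all x values are equal."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_random_stump(xs, ys, rng, sigma=1.0, min_leaf=2):
    """One split with a randomly drawn threshold and a linear model per leaf.

    Assumption (not the package's rule): the threshold is drawn from
    N(median(xs), sigma) and redrawn until both leaves hold at least
    `min_leaf` samples.
    """
    centre = statistics.median(xs)
    while True:
        t = rng.gauss(centre, sigma)
        left = [(x, y) for x, y in zip(xs, ys) if x <= t]
        right = [(x, y) for x, y in zip(xs, ys) if x > t]
        if len(left) >= min_leaf and len(right) >= min_leaf:
            break
    leaves = [fit_line([x for x, _ in side], [y for _, y in side])
              for side in (left, right)]

    def predict(x):
        a, b = leaves[0] if x <= t else leaves[1]
        return a + b * x

    return predict


def fit_ensemble(xs, ys, n_estimators=100, ensemble_pred="median", seed=0):
    """Fit n_estimators random stumps and aggregate them by median or mean."""
    rng = random.Random(seed)
    trees = [fit_random_stump(xs, ys, rng) for _ in range(n_estimators)]
    aggregate = statistics.median if ensemble_pred == "median" else statistics.mean

    def predict(x):
        return aggregate([tree(x) for tree in trees])

    return predict


# Monotonic training data on x in [0, 9]: y = 3x + 1
xs = [float(x) for x in range(10)]
ys = [3.0 * x + 1.0 for x in xs]

model_median = fit_ensemble(xs, ys, ensemble_pred="median")
model_mean = fit_ensemble(xs, ys, ensemble_pred="mean")

# Query outside the training range: both follow the linear trend (close to 61)
print(model_median(20.0), model_mean(20.0))
```

Because every leaf in this toy can represent the linear trend, both aggregations land close to 61 at x = 20. When individual trees disagree strongly, the median is the more outlier-robust of the two choices.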

Reference:

Stochastic Threshold Model Trees: A Tree-Based Ensemble Method for Dealing with Extrapolation

License

MIT

About

A tree-based ensemble method for dealing with extrapolation.
