Random Forests In Python


Introduction


I started this project to better understand how decision trees and random forests work. At this point, the classifiers split only on the Gini index and the regression models split only on mean squared error. Both the classifiers and the regression models are built to work with Pandas and Scikit-Learn.
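For readers new to these two criteria, here is a minimal sketch of how Gini impurity and mean squared error can score a candidate split. The function names (`gini_impurity`, `mse_impurity`, `weighted_split_cost`) are hypothetical and chosen for illustration only; they are not taken from this library's internals, which may compute the same quantities differently.

```python
import numpy as np


def gini_impurity(labels):
    """Gini impurity of a 1-D array of class labels: 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)


def mse_impurity(values):
    """Mean squared deviation of the targets from their own mean."""
    values = np.asarray(values, dtype=float)
    return np.mean((values - values.mean()) ** 2)


def weighted_split_cost(left, right, criterion):
    """Size-weighted average of the criterion over the two child nodes."""
    n_left, n_right = len(left), len(right)
    total = n_left + n_right
    return (n_left * criterion(left) + n_right * criterion(right)) / total


# A split that separates the classes perfectly costs 0 under Gini;
# a split that leaves a 50/50 mix in each child costs 0.5.
pure = weighted_split_cost(np.array([0, 0]), np.array([1, 1]), gini_impurity)
mixed = weighted_split_cost(np.array([0, 1]), np.array([0, 1]), gini_impurity)
print(pure, mixed)
```

A tree builder would typically evaluate this cost for many candidate thresholds and keep the split with the lowest value; passing `mse_impurity` as the criterion gives the regression counterpart.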

Examples

Basic classification example using Scikit-learn:

from randomforests import RandomForestClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer
dataset = load_breast_cancer()

cols = [dataset.data[:,i] for i in range(4)]

X = pd.DataFrame({k:v for k,v in zip(dataset.feature_names,cols)})
y = pd.Series(dataset.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=24)

pipe   = Pipeline([("forest", RandomForestClassifier())])

params = {"forest__max_depth": [1,2,3]}

grid   = GridSearchCV(pipe, params, cv=5, n_jobs=-1)
model  = grid.fit(X_train,y_train)

preds  = model.predict(X_test)

print("Accuracy: ", accuracy_score(y_test, preds))

>> Accuracy:  0.9020979020979021

Basic regression example using Scikit-learn:

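from randomforests import RandomForestRegressor
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score
# load_boston was removed in scikit-learn 1.2, so this example needs an earlier release
from sklearn.datasets import load_boston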
dataset = load_boston()

cols = [dataset.data[:,i] for i in range(4)]

X = pd.DataFrame({k:v for k,v in zip(dataset.feature_names,cols)})
y = pd.Series(dataset.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=24)

pipe   = Pipeline([("forest", RandomForestRegressor())])

params = {"forest__max_depth": [1,2,3]}

grid   = GridSearchCV(pipe, params, cv=5, n_jobs=-1)
model  = grid.fit(X_train,y_train)

preds  = model.predict(X_test)

print("R^2 : ", r2_score(y_test,preds))

>> R^2 : 0.37948488681649484

Installing


The project uses the setup.py generated by PyScaffold. To install the library, run the following (for an editable development install, use `python setup.py develop` instead):

python setup.py install

Test


Tests are run through the same PyScaffold-generated setup.py:

python setup.py test

Dependencies


Dependencies are minimal:

- Python (>= 3.6)
- [Scikit-Learn](https://scikit-learn.org/stable/) (>=0.23)
- [Pandas](https://pandas.pydata.org/) (>=1.0)
