Random Forests In Python
I started this project to better understand how decision trees and random forests work. At this point, the classifiers are based only on the Gini index and the regression models are based only on the mean squared error. Both the classifiers and the regression models are built to work with datasets that are lists of lists, where the target variable values are in the rightmost column. They can also work with datasets stored as Pandas DataFrames and Pandas Series.
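As a rough illustration of the two criteria mentioned above, here is a minimal sketch that scores a candidate split on a list-of-lists dataset with the target in the rightmost column. The function names (gini_impurity, mean_squared_error, split_score) are hypothetical and are not taken from this project's code; the rows are the first four rows of the example dataset used later in this README.

```python
from collections import Counter


def gini_impurity(rows):
    """Gini impurity of the class labels stored in the last column of rows."""
    if not rows:
        return 0.0
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def mean_squared_error(rows):
    """Mean squared error of the targets (last column) around their mean."""
    if not rows:
        return 0.0
    targets = [row[-1] for row in rows]
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)


def split_score(rows, feature_index, threshold, impurity):
    """Size-weighted impurity of the two groups produced by a threshold split."""
    left = [row for row in rows if row[feature_index] < threshold]
    right = [row for row in rows if row[feature_index] >= threshold]
    total = len(rows)
    return (len(left) / total) * impurity(left) + (len(right) / total) * impurity(right)


# Target values sit in the rightmost column, as described above.
rows = [
    [2.771244718, 1.784783929, 0],
    [1.728571309, 1.169761413, 0],
    [3.678319846, 2.81281357, 1],
    [3.961043357, 2.61995032, 1],
]

print(gini_impurity(rows))                       # 0.5 (two classes, evenly mixed)
print(split_score(rows, 0, 3.0, gini_impurity))  # 0.0 (the split separates the classes)
```

A lower score means purer groups, so a tree builder would typically try many (feature, threshold) pairs and keep the one with the lowest score. Passing mean_squared_error instead of gini_impurity gives the regression version of the same idea.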
The dependencies for this project are rather minimal, including,
You can install all the dependencies (except for Python and Sphinx) using pip by entering the following on the command line,
pip install -r requirements.txt
>>> import pandas as pd
>>>
>>> # Each row is [feature_1, feature_2, target]
>>> dataset = [[2.771244718, 1.784783929, 0],
...            [1.728571309, 1.169761413, 0],
...            [3.678319846, 2.81281357, 1],
...            [3.961043357, 2.61995032, 1],
...            [2.999208922, 2.209014212, 0],
...            [7.497545867, 3.162953546, 0],
...            [9.00220326, 3.339047188, 1],
...            [7.444542326, 0.476683375, 1],
...            [10.12493903, 3.234550982, 0],
...            [6.642287351, 3.319983761, 1]]
>>>
>>> df = pd.DataFrame(data=dataset, columns=['feature_1', 'feature_2', 'target'])
>>> data_point = pd.Series([2.0, 23.0], index=['feature_1', 'feature_2'])
>>>
>>> # Fit a single decision tree and classify the data point
>>> from TreeMethods.DecisionTreeClassifier import DecisionTreeClassifier
>>> tree = DecisionTreeClassifier(max_depth=2, min_size=1)
>>> tree.fit(df, target='target')
>>>
>>> tree.predict(data_point)
0
>>>
>>> # Fit a random forest of 10 trees and classify the same data point
>>> from TreeMethods.RandomForestClassifier import RandomForestClassifier
>>> forest = RandomForestClassifier(n_trees=10, max_depth=5, min_size=1)
>>> forest.fit(df, target='target')
>>> forest.predict(data_point)
0
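For readers new to random forests, the sketch below shows the two generic ingredients a forest classifier usually adds on top of single trees: bootstrap sampling of the training rows and a majority vote over the per-tree predictions. It illustrates the general technique only; it is not this project's implementation, and the names bootstrap_sample and majority_vote, as well as the vote values, are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows from rows, with replacement."""
    return [rng.choice(rows) for _ in range(len(rows))]


def majority_vote(predictions):
    """Return the most frequent label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


# First four rows of the example dataset; the target is the last column.
rows = [
    [2.771244718, 1.784783929, 0],
    [1.728571309, 1.169761413, 0],
    [3.678319846, 2.81281357, 1],
    [3.961043357, 2.61995032, 1],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One bootstrap sample per tree; each tree would be fit on its own sample.
samples = [bootstrap_sample(rows, rng) for _ in range(3)]

# Hypothetical per-tree predictions for a single data point.
votes = [0, 1, 0]
print(majority_vote(votes))  # 0
```

Because each tree sees a different resampled dataset, the trees disagree on some points, and the vote averages out part of that variation.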
To test the code type the following command from the terminal in the
More tests will be added in the near future.
To build the documentation on your local machine type the following commands from
sphinx-apidoc -F -o doc/ TreeMethods/
Then cd into the doc/ directory and type,
The HTML documentation will be in the directory _build/html/. Open the file