
Code for my talk on regression trees, prediction power and generating confidence intervals.


suvasama/presRandomForests


The repository contains the code used for my Women in Data Science conference presentation. You can find the slides for my presentation here. I compared the performance of several random forest packages and showed how to generate confidence intervals for such models.

I use the California Housing dataset for the estimates. You can download the dataset here. A modified version of the dataset is available on Kaggle.
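The repository's preprocessing is written in R and is not reproduced here. As a rough illustration only, the Python sketch below assumes the Kaggle CSV layout (numeric columns such as `total_bedrooms` that may contain missing values, plus a categorical `ocean_proximity` column) and applies two common preprocessing steps: median imputation and one-hot encoding. The function `preprocess` and the choice of steps are assumptions, not the author's pipeline.

```python
import csv
import io
import statistics

# Assumed name of the categorical column in the Kaggle version of the data.
CATEGORICAL = "ocean_proximity"


def preprocess(csv_text):
    """Hypothetical helper: parse CSV text, fill missing numeric values
    with the column median, and one-hot encode the categorical column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    numeric_cols = [c for c in rows[0] if c != CATEGORICAL]

    # Median of the observed (non-empty) values in each numeric column.
    medians = {
        c: statistics.median(float(r[c]) for r in rows if r[c] != "")
        for c in numeric_cols
    }
    levels = sorted({r[CATEGORICAL] for r in rows})

    out = []
    for r in rows:
        rec = {c: float(r[c]) if r[c] != "" else medians[c] for c in numeric_cols}
        # One 0/1 indicator column per category level.
        for level in levels:
            rec[f"{CATEGORICAL}={level}"] = 1.0 if r[CATEGORICAL] == level else 0.0
        out.append(rec)
    return out
```

Median imputation is used here because it is robust to the skewed count columns in this kind of data; other strategies (dropping rows, model-based imputation) are equally plausible choices.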

I fitted four models from different R packages: linear regression from base R, random forests from ranger and grf, and extreme gradient boosting from xgboost. A minimal amount of hyperparameter tuning was performed to improve the performance of xgboost.
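One common way to compare the predictive power of several fitted models is root mean squared error (RMSE) on a held-out test set. The README does not state which metric the talk used, so the Python sketch below is an illustration under that assumption; `rmse` and `rank_models` are hypothetical helpers, and the prediction vectors would come from whichever models were fitted.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def rank_models(y_true, predictions):
    """Return (model_name, rmse) pairs sorted from lowest to highest error.

    `predictions` maps a model label to its test-set predictions."""
    scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, `rank_models(y_test, {"lm": ..., "ranger": ..., "grf": ..., "xgboost": ...})` would list the four models from best to worst on that test set.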

The repository is organized as follows:

  1. Load and preprocess the data here.
  2. Visualize the data using point plots, maps and decision trees. I used the original dataset for visualizations and the preprocessed dataset to estimate the models.
  3. Estimate the models: fit them, make predictions and compute confidence intervals for the predictions. Choose the optimal number of trees for xgboost by cross-validation. Also, plot the most important features chosen by the models. Estimate variances and confidence intervals using grf.
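In grf, calling `predict()` on a regression forest with `estimate.variance = TRUE` returns point predictions together with variance estimates, and a pointwise interval can then be formed with a normal approximation: prediction plus or minus z times the square root of the variance. The Python sketch below reproduces only that final arithmetic step; the helper name `normal_intervals` and the default 95% level (z of about 1.96) are assumptions, and the two input lists stand in for the `predictions` and `variance.estimates` columns that grf returns.

```python
import math
from statistics import NormalDist


def normal_intervals(predictions, variances, level=0.95):
    """Pointwise normal-approximation intervals:
    prediction +/- z * sqrt(variance), where z is the (1 + level) / 2
    quantile of the standard normal distribution."""
    if not 0 < level < 1:
        raise ValueError("level must be in (0, 1)")
    if len(predictions) != len(variances):
        raise ValueError("predictions and variances must have equal length")
    z = NormalDist().inv_cdf((1 + level) / 2)
    intervals = []
    for pred, var in zip(predictions, variances):
        half_width = z * math.sqrt(var)
        intervals.append((pred - half_width, pred + half_width))
    return intervals
```

These are pointwise intervals for each prediction, not simultaneous bands over the whole test set.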

A snapshot of the names and versions of the packages I used is available here.
