Add progress bar to RandomForest #33

Open
kenryd opened this issue Aug 30, 2011 · 1 comment

Comments

@kenryd
Collaborator

kenryd commented Aug 30, 2011

classify.RandomForest uses the VIGRA library, which is written in C++, so there is no straightforward way to show a progress bar during training. Instead, we could train a 1-tree random forest before starting the full training to measure how long a single tree takes.

On one test dataset, this looks like it would give a good estimate of total time (i.e., initial overhead and other factors don't distort it). Training 1, 2, and 3 trees took:

1 tree: 118.812502146 seconds
2 trees: 236.317131042 seconds
3 trees: 351.319090128 seconds

The total time is very close to a multiple of the 1-tree training time (2 trees take about 1.99x and 3 trees about 2.96x as long as 1 tree). The progress bar could then be based on elapsed time.
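A minimal sketch of the time-based estimate, assuming training time is linear in the number of trees. The function names `estimate_total_seconds` and `progress_fraction` are hypothetical and not part of classify.RandomForest or VIGRA. The timings are the ones reported above.

```python
import time


def estimate_total_seconds(one_tree_seconds, n_trees):
    """Linear estimate: total time is n_trees times the 1-tree time."""
    return one_tree_seconds * n_trees


def progress_fraction(start_time, expected_seconds, now=None):
    """Fraction of the expected training time that has elapsed, in [0, 1]."""
    if now is None:
        now = time.time()
    if expected_seconds <= 0:
        return 1.0
    return min(1.0, max(0.0, (now - start_time) / expected_seconds))


# Check the linear model against the timings reported in this issue.
measured = {1: 118.812502146, 2: 236.317131042, 3: 351.319090128}
one_tree = measured[1]
relative_errors = {
    n: abs(estimate_total_seconds(one_tree, n) - seconds) / seconds
    for n, seconds in measured.items()
}
```

On these three data points the linear estimate is off by roughly 1.5% at most, and it overestimates slightly, so a bar driven by `progress_fraction` would tend to finish a little early rather than stall at 100%.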

@jni
Owner

jni commented Aug 31, 2011

A few more thoughts on this:

  • Using the threading library, we could avoid wasting any time at all: train the 2- or 3-tree forest in one thread and the full forest in another. The GIL doesn't block C++ code that releases it, so these could run concurrently (provided the vigra bindings release the GIL during training).
  • Taking this idea further, we could use vigra simply as a fast tree classifier and move all the forest code to the Python side. This would enable multithreading (vigra's forest training currently is not multithreaded), which could mean an 8- to 16-fold speedup. We could also build a feature-selection stage into the classifier.
  • We could then also swap out the vigra trees for something else if we found a good alternative, which would be great since vigra is a pain to install and thus not on the cluster.
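The Python-side forest idea could be sketched roughly as follows. This is only an illustration: `train_forest`, `train_tree`, and `on_progress` are hypothetical names, and `dummy_train_tree` is a stand-in for whatever single-tree trainer is used. Threads only give a real speedup if that trainer releases the GIL while it runs, which is an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def train_forest(train_tree, n_trees, n_threads=4, on_progress=None):
    """Train n_trees independent trees in a thread pool.

    train_tree is a hypothetical callable that takes a seed and returns
    one trained tree. on_progress(done, total) is called in the calling
    thread each time a tree finishes, so it can drive a progress bar.
    """
    trees = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(train_tree, seed) for seed in range(n_trees)]
        for done, future in enumerate(as_completed(futures), start=1):
            trees.append(future.result())
            if on_progress is not None:
                on_progress(done, n_trees)
    return trees


def dummy_train_tree(seed):
    # Stand-in for a real single-tree trainer (assumption for this sketch).
    return ("tree", seed)


reports = []
forest = train_forest(
    dummy_train_tree,
    n_trees=8,
    on_progress=lambda done, total: reports.append((done, total)),
)
```

With the forest assembled in Python, progress comes for free from counting finished trees, so no separate timing run is needed, and the per-tree trainer can be swapped without touching the forest code.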
