Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failing i386 test #105

Merged
merged 3 commits into from Nov 21, 2018
Merged

Fix failing i386 test #105

merged 3 commits into from Nov 21, 2018

Conversation

falkben
Copy link
Contributor

@falkben falkben commented Nov 21, 2018

The problem was that we were splitting in such a way that resulted in an empty node. This only happened on i386 and windows.

For reference, the error occurred on the test predictions file, on the first OOB prediction:

test_that("Iris OOB Predictions", {
  # Build as large of trees as possible
  forest <- RerF(X, Y,
    seed = 1L, num.cores = 1L, store.oob = TRUE, min.parent = 1,
    max.depth = 0
  )
  oob.predictions <- OOBPredict(X, forest, num.cores = 1L)
  accuracy <- mean(Y == oob.predictions)
  expect_equal(accuracy, 143 / 150)

  # Limit depth of trees
  forest <- RerF(X, Y,
    seed = 1L, num.cores = 1L, store.oob = TRUE, min.parent = 1,
    max.depth = 2
  )
  oob.predictions <- OOBPredict(X, forest, num.cores = 2L)
  accuracy <- mean(Y == oob.predictions)
  expect_equal(accuracy, 133 / 150)
})

4 potential solutions:

  1. Set a new seed. Yes... this actually gets the tests to pass...
  2. if ClProb has any NaNs, we set it to ClProb <- rep(1/length(ClassCounts),length(ClassCounts)) and continue building the tree This also passes tests, but just is messy. It would result in at least one node with zero elements. Also, when doing predictions, elements that traverse the tree down to the node with zero elements would not yield any useful information. I tested this and it passes as well.
  3. if ClProb has any NaNs we throw out the tree and start over. This is a better solution in that we won't ever result in trees that have nodes of zero length. The one issue with this is that having to restart growing trees like this could get us into trouble (slightly slower, possible infinite loop in some weird edge cases), but the problem occurs so rarely that this shouldn't be a problem. And it has the advantage of never having trees with nodes of zero length.
  4. the proper, more long term solution, would be to fix the cause of the error - we should never split in the first place when the result would be a node of zero length. This is a fix which would take me a longer amount of time and I think we should tackle in the next release.

I chose option 3 for this PR. I vote we do option 4 in the next Sprint.

check each node and rebuild tree if no members in node

in testGeneral use the rerf command that generated the bad tree

remove the progress bar on urerf test
@falkben falkben added the bug label Nov 21, 2018
@falkben falkben requested review from j1c and MrAE November 21, 2018 20:33
@falkben falkben added this to In progress in Refactor via automation Nov 21, 2018
Copy link
Member

@j1c j1c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im fine with option 3.

@MrAE
Copy link
Collaborator

MrAE commented Nov 21, 2018

I agree with option 3 as well for now

@falkben falkben merged commit 1035f02 into staging Nov 21, 2018
Refactor automation moved this from In progress to Done Nov 21, 2018
@falkben falkben deleted the fix-failing-i386-test branch November 21, 2018 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Refactor
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants