Option Tree in HATR causes instability? #532
I've been experimenting with HATR on my personal datasets and noticed severe instabilities in the predictions (sudden error spikes), which I couldn't counter by hyperparameter tuning: the spikes changed in magnitude and additional spikes appeared or disappeared, but the pattern remained. I played with the HATR internals, changing `return pred / len(found_nodes)` to `return pred`. I added logging and saw stuff like this:
Unfortunately, I cannot share the concrete dataset here because reasons. @MaxHalford, could you give some insights? Tested on river v03.03.2021. UPD: added details.
Replies: 2 comments 5 replies
I believe @smastelini will have more insights, he's our tree hugger :)
Here's the deal: option-vote (of sorts, as registered in my old comment in the code) is an unspecified "feature" of HATC and HATR. It took me a while to understand the inner workings of the code.

First, some background about these votes: usually only one path from the root to a leaf is evaluated. HAT* trees, however, might carry alternate branches if a concept drift was previously detected. Once the "background" subtree surpasses the original one in performance, they are swapped and the old branch is discarded. The idea behind the option votes (I can only conjecture, since it is not formally discussed) is to leverage the predictive power of a subtree specialized in the new concept to alleviate performance drops. Only alternate subtrees with …

Okay, so in your case you are seeing error spikes. I can think of some possibilities:
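As an aside on the background above, here is a rough sketch of how an option vote could work. It is not river's actual code: the `Leaf` and `Split` classes, the `alternate` attribute, and both functions are hypothetical names made up for illustration. It also assumes that every alternate subtree gets a vote, whereas the real implementation filters which alternates are allowed to contribute.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    prediction: float  # constant prediction stored in the leaf (simplification)


@dataclass
class Split:
    feature: str
    threshold: float
    left: "Node"
    right: "Node"
    # Hypothetical: background subtree grown after a drift alarm at this node.
    alternate: Optional["Node"] = None


Node = Union[Leaf, Split]


def collect_leaves(node: Node, x: Dict[str, float], found: List[Leaf]) -> None:
    """Follow the main path and, where present, the alternate subtrees too."""
    if isinstance(node, Leaf):
        found.append(node)
        return
    if node.alternate is not None:
        # Assumption: every alternate votes, with no filtering condition.
        collect_leaves(node.alternate, x, found)
    child = node.left if x[node.feature] <= node.threshold else node.right
    collect_leaves(child, x, found)


def predict_one(root: Node, x: Dict[str, float]) -> float:
    """Average the responses of all reached leaves (the option vote)."""
    found: List[Leaf] = []
    collect_leaves(root, x, found)
    return sum(leaf.prediction for leaf in found) / len(found)


plain_tree = Split("x", 0.5, Leaf(1.0), Leaf(5.0))
drifted_tree = Split("x", 0.5, Leaf(1.0), Leaf(5.0), alternate=Leaf(9.0))

sample = {"x": 0.2}
plain_pred = predict_one(plain_tree, sample)      # only the main leaf votes
option_pred = predict_one(drifted_tree, sample)   # main leaf and alternate vote
print(plain_pred, option_pred)
```

With no alternate branch, the result is just the single leaf's response. Once an alternate exists, the output becomes the mean over all reached leaves, which is where the `len(found_nodes)` denominator comes from.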
That said, I don't think removing the denominator is a viable solution. In your case, the nodes are clearly underestimating the true value, i.e., they are biased towards predicting values that are lower than the expected output. That's why summing up the nodes' responses removes the spikes. But this workaround is data-dependent and will not hold in all situations. Imagine a toy example where all target values are positive and the nodes are fairly accurate: by just adding up the multiple predictions, the answers would be much higher than the expected outputs.

So, what are our options if we indeed want to change HATR a bit?
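To make the toy example above concrete, here is a small sketch with made-up numbers (none of them come from the dataset in question). The `compare` helper is hypothetical. It contrasts the averaged vote with the summed vote when the nodes underestimate the target and when they are fairly accurate.

```python
from statistics import mean
from typing import List, Tuple


def compare(y_true: float, node_preds: List[float]) -> Tuple[float, float]:
    """Return (abs. error of the averaged vote, abs. error of the summed vote)."""
    avg_err = abs(mean(node_preds) - y_true)
    sum_err = abs(sum(node_preds) - y_true)
    return avg_err, sum_err


# Made-up case 1: every node underestimates a positive target.
biased_avg_err, biased_sum_err = compare(10.0, [3.0, 3.5, 3.2])

# Made-up case 2: the nodes are fairly accurate.
accurate_avg_err, accurate_sum_err = compare(10.0, [9.5, 10.2, 10.4])

print(f"biased nodes:   avg err {biased_avg_err:.2f}, sum err {biased_sum_err:.2f}")
print(f"accurate nodes: avg err {accurate_avg_err:.2f}, sum err {accurate_sum_err:.2f}")
```

In the first case dropping the denominator happens to help, because the sum of several too-low responses lands near the target. In the second case the sum overshoots by roughly a factor of the number of voting nodes.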
Here's the deal: option-vote (of sorts, as registered in my old comment in the code) is an unspecified "feature" of HATC and HATR. It took me a while to understand the inner workings of the code skmultiflow inherited from MOA (that's why I left the comments for future generations). This undocumented change was recently discussed in this paper. The authors claim that the option votes usually improve performance, but this conclusion does not necessarily apply to all cases. To be fair, it's an unspecified feature of HATC, since HATR is not backed up by a research paper. So, from the start, we have always been walking a tightrope with this one, because no extensive benchmarki…