Option Tree in HATR causes instability? #532

ColdTeapot273K · 2021-03-19T11:15:49Z

I've been experimenting with HATR on my personal datasets and noticed severe instabilities in the predictions (sudden spikes of errors), which I couldn't counter by hyperparameter tuning (the spikes changed in volume and additional spikes appeared or disappeared, but the pattern remained).

I played with the HATR internals in river/tree/hoeffding_adaptive_tree_regressor.py, especially the Option Tree part, and came to the realization that changing

return pred / len(found_nodes)

to

return pred

inside the predict_one method solved the problem (the spikes disappeared). I guess this abandons the Option Tree algorithm, which implies averaging results from several nodes.

Added logging and saw stuff like this:

on spikes the terminal leaves predicted 9 (true value was around 10-11 for a bunch of datapoints), but Option Tree prediction was 6 and the output was 6 (so systemic episodic underprediction)
the relevant node was just below the root node (so closest to the root without being the root)

Unfortunately cannot share the concrete dataset here because reasons.

@MaxHalford, could you give some insights?

tested on river v03.03.2021

UPD: add details

Answered by smastelini

Mar 19, 2021


MaxHalford · 2021-03-19T11:25:03Z

Maintainer

I believe @smastelini will have more insights, he's our tree hugger :)

1 reply

smastelini Mar 19, 2021
Maintainer

Tree hugger, I guess that fits my current description haha

My PhD research involves trees and more trees :D

smastelini · 2021-03-19T14:26:01Z

Maintainer

Here's the deal: option-vote (of sorts, as registered in my old comment in the code) is an unspecified "feature" of HATC and HATR. It took me a while to understand the inner workings of the code skmultiflow inherited from MOA (that's why I left the comments for future generations). This undocumented change was recently discussed in this paper. The authors claim that the option votes usually benefit performance, but this conclusion does not necessarily apply to all cases. To be fair, it's an unspecified feature of HATC, since HATR is not backed up by a research paper. So, from the start, we have always been walking a tightrope with this one, because no extensive benchmarking has been performed with this tree algorithm, as far as I know.

First, some background about these votes: usually only one path from the root to a leaf is evaluated. HAT* trees, however, might carry alternate branches if a concept drift was previously detected. Once the "background" subtree surpasses the original one in performance, they are swapped and the old branch is discarded. The idea with the option votes (I can only conjecture since it is not formally discussed) is to leverage the predictive power of a subtree specialized in the new concept to alleviate performance drops. Only alternate subtrees with depth > 0 are considered to this end.

Okay, so in your case error spikes are perceived. I can think of some possibilities:

false alarms. HATR starts building alternate branches after incorrect drift signals. So it could be a matter of changing the ADWIN confidence level and making the detector more conservative in its drift signals.
there are indeed drifts, but the subtrees are not ready yet for the real world. I am not sure I follow your example. Did the option-vote perform better here?

but Option Tree prediction was 6 and the output was 6

That said, I don't think removing the denominator is a viable solution. In your case, the nodes are clearly underestimating the true value, i.e., they are biased towards predicting values that are lower than the expected output. That's why summing up the nodes' responses removes the spikes. But this workaround is data-dependent and will not carry over to all situations. Imagine a toy example where all target values are positive and the nodes are fairly accurate: by just adding the multiple predictions, the answers would be much higher than the expected outputs.
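To make the toy example concrete, here is a minimal sketch in plain Python (not river code). The three node responses and the true value are made-up numbers chosen only for illustration.

```python
# Hypothetical responses of three fairly accurate nodes for one sample.
true_value = 10.0
node_predictions = [9.5, 10.2, 10.3]

summed = sum(node_predictions)               # "no denominator" variant
averaged = summed / len(node_predictions)    # averaging, as in the original line

print(f"true: {true_value}, sum: {summed:.1f}, mean: {averaged:.1f}")
```

With accurate nodes, the sum grows with the number of nodes that respond, while the mean stays close to the target.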

So, what are our options if we indeed want to change HATR a bit?

Use weighted votes to aggregate the main tree's predictions and the alternate ones (according to their error on recent data). Oh, if only I could count how many times I was tempted to do that 🤣 But this would be yet another unspecified feature, so let's abstain from that one for now (note that HATC does weight the predicted probabilities).
Add an option to disable option-votes. Much simpler. In this case, HATR should travel only one path until it reaches a terminal node. No fancy alternate tree visits. So, it should be up to the user to decide whether to use this feature or not. Feel free to implement that!
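The two options can be sketched side by side in plain Python. This is only an illustration under stated assumptions: the function name, its parameters, and the inverse-error weighting are hypothetical and are not part of river.

```python
def aggregate(main_pred, alt_preds, main_err=None, alt_errs=None, option_votes=True):
    """Combine the main-path prediction with alternate-subtree predictions.

    Hypothetical helper for illustration; not river's implementation.
    """
    if not option_votes or not alt_preds:
        # Second option: follow only the main path to its terminal node.
        return main_pred

    preds = [main_pred, *alt_preds]
    if main_err is None or alt_errs is None:
        # Plain average, the behavior discussed in this thread.
        return sum(preds) / len(preds)

    # First option: weight each vote by the inverse of its recent error
    # (one possible weighting scheme, assumed here).
    eps = 1e-12  # avoids division by zero for a node with zero error
    weights = [1.0 / (e + eps) for e in [main_err, *alt_errs]]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


# Made-up numbers: the main leaf predicts 10, one alternate subtree predicts 4.
plain = aggregate(10.0, [4.0])
disabled = aggregate(10.0, [4.0], option_votes=False)
weighted = aggregate(10.0, [4.0], main_err=1.0, alt_errs=[6.0])
```

Here the plain average is pulled halfway towards the alternate subtree, disabling option votes returns the main leaf's answer unchanged, and the weighted vote stays closer to the node with the lower recent error.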

4 replies

ColdTeapot273K Mar 19, 2021
Author

Thanks for the in-depth dive!

I should clarify, that I got better predictions (w/o spikes) w/o the denominator and worse predictions (w/ spikes) w/ the denominator.
I.e. w/o denominator I get prediction=9 (vs truth=10), while w/ denominator I get prediction=6 (vs truth=10).

I tested the effect on a different dataset and got inconclusive results. I'd sum up my experience as:

depending on the case, removing the denominator may or may not give a better error pattern (no spikes, lower variance, etc.)
removing the denominator changes the error pattern (shape, like occasional highs/lows/spikes)

So I suppose there's some value in optionally disabling/enabling it, i.e. in your second proposed solution. I can implement that.

But I'm still eager to experiment with ADWIN and what not (so far I tweaked mainly grace_period to adjust drift sensitivity).

smastelini Mar 19, 2021
Maintainer

I suggest changing adwin_confidence and drift_window_threshold (in this order of relevance) to tweak the drift detection capabilities.

ColdTeapot273K Mar 24, 2021
Author

@smastelini okay so, I've tried tuning adwin_confidence & drift_window_threshold logarithmically (changing orders of magnitude). What I found was:

increasing adwin_confidence & drift_window_threshold from defaults made things worse
decreasing adwin_confidence & drift_window_threshold from defaults made things not [significantly] better

Acceptable prediction quality with the default Option Trees behaviour was achieved when I added GroupDetrender, which turned out to be as good as the "no denominator" behaviour with no GroupDetrender (a simpler model with fewer moving parts).

Note that my data, from high-level view, is:

with trend
heteroskedastic
multidimensional

I've managed to reproduce that phenomenon on an open-source dataset btw; I can attach it later to support the claim.

So I think there's some reason to have an option to turn off Option Trees, if "no denominator" is what it does. @smastelini please verify if I understand that correctly, since Option Trees is not just averaging but traversing as well, which I haven't addressed here; the node traversal code in HATR predict_one looks kinda cryptic to me 😌

smastelini Mar 24, 2021
Maintainer

Hi @ColdTeapot273K.

increasing adwin_confidence & drift_window_threshold from defaults made things worse

decreasing adwin_confidence & drift_window_threshold from defaults made things not [significantly] better

If the first option makes things worse, then I have reasons to believe that your data is indeed changing in some way.

with trend

heteroskedastic

Indeed, non-stationary data. I see why you chose HATR (concept drift adaptation capabilities). But keep in mind: if your data has strong temporal dependencies, river's decision trees are not your best bet, at least not in their pure form. Hoeffding Trees (in their most basic formulation) assume i.i.d. data, so now I can have a better picture of what is happening. If you want to use classic data stream models as multivariate time series forecasters, the temporal dependencies must be encoded into the training data somehow. I'm far from being an expert, but I believe feature engineering is a viable solution.
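One simple form of such feature engineering is to append the most recent target values as lag features. The sketch below is plain Python and only one possible approach; the helper name, the `y_lag_*` feature names, and the number of lags are assumptions made for illustration.

```python
from collections import deque


def add_lag_features(stream, n_lags=3):
    """Yield (x_with_lags, y), where the last n_lags targets are added to x.

    Samples are skipped until n_lags past targets are available.
    """
    history = deque(maxlen=n_lags)
    for x, y in stream:
        if len(history) == n_lags:
            # y_lag_1 is the most recent past target, y_lag_2 the one before, etc.
            lags = {f"y_lag_{i + 1}": v for i, v in enumerate(reversed(history))}
            yield {**x, **lags}, y
        # Only past targets are ever used as features for the current sample.
        history.append(y)


# Made-up stream of (features, target) pairs with a linear trend.
stream = [({"t": i}, float(i)) for i in range(6)]
augmented = list(add_lag_features(stream, n_lags=2))
```

Each yielded pair can then be passed to the usual predict-then-learn loop of a streaming regressor, so the model sees the recent history of the target alongside the original features.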

So I think there's some reason to have an option to turn off Option Trees, if "no denominator" is what it does. @smastelini please verify if I understand that correctly, since Option Trees is not just averaging but traversing as well, which I haven't addressed here; the node traversal code in HATR predict_one looks kinda cryptic to me

HATR mimics the behavior of Option Trees when making predictions; that's why I keep saying "option votes (of sorts)". Option Trees have a special kind of decision node that sends the incoming samples to all of its branches, rather than choosing only one branch (for instance, with the classical <= or > binary test). The responses of all the reached leaves are used to produce the final output. I added bringing Option Trees to river to our roadmap. In the card, you can find the paper for the regression case. HATR does not have option nodes, but it carries alternate subtrees whose construction is triggered by concept drift adaptation.

HATR sends incoming samples to all the reachable nodes, be these nodes members of the "main" tree or members of the "hidden" alternate subtrees. Each decision node could potentially carry one alternate/background subtree in construction. Therefore, we average the predictions in predict_one.
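The traversal described above can be sketched with a toy tree in plain Python. This is a simplified illustration, not river's code: the `Node` class and `collect_leaves` helper are hypothetical, and the sketch ignores details such as the depth > 0 condition on alternate subtrees.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Leaf: `value` is set. Decision node: `test`, `left` and `right` are set.
    value: Optional[float] = None
    test: Optional[Callable[[dict], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    alternate: Optional["Node"] = None  # background subtree, if one is being built


def collect_leaves(node, x, found):
    """Gather every leaf reached by x, in the main tree and in alternate subtrees."""
    if node.alternate is not None:
        collect_leaves(node.alternate, x, found)
    if node.test is None:
        found.append(node)
    else:
        collect_leaves(node.left if node.test(x) else node.right, x, found)
    return found


def predict_one(root, x):
    found_nodes = collect_leaves(root, x, [])
    return sum(leaf.value for leaf in found_nodes) / len(found_nodes)


# Made-up tree: the root carries an alternate subtree that splits on another feature.
root = Node(
    test=lambda x: x["a"] <= 0.5,
    left=Node(value=8.0),
    right=Node(value=12.0),
    alternate=Node(test=lambda x: x["b"] <= 1.0, left=Node(value=2.0), right=Node(value=4.0)),
)
prediction = predict_one(root, {"a": 0.2, "b": 0.0})
```

In this made-up case the sample reaches one leaf in the main tree (8.0) and one in the alternate subtree (2.0), so an immature alternate subtree can drag the averaged output well away from the main leaf's answer.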

As a final remark, let me share some intuition regarding the hyperparameters that you've adjusted:

adwin_confidence: determines "how sure" about a concept drift ADWIN has to be in order to trigger an alarm. I now realize that this parameter name is confusing since it is in fact the significance level: confidence = 1 - adwin_confidence :/. Pinging @jacobmontiel, who was involved in writing the original code in skmultiflow. Should we refactor that?
drift_window_threshold: after drift is signaled in a decision node, a subtree is started and rooted at this decision node. The window threshold determines how many instances the alternate subtree has to monitor before being eligible to replace the main one.
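A small sketch of how to read these two hyperparameters, in plain Python. The numeric values are illustrative only, and `eligible_for_swap` is a hypothetical helper that simplifies the rule described above; the actual replacement decision in HATR also depends on how the two subtrees perform.

```python
# adwin_confidence is really a significance level: smaller values mean
# ADWIN must be "more sure" before signaling a drift.
adwin_confidence = 0.002                 # illustrative value
actual_confidence = 1 - adwin_confidence


def eligible_for_swap(n_seen_by_alternate, drift_window_threshold=300):
    """Simplified rule: the alternate subtree must have monitored enough
    instances before it can be considered as a replacement."""
    return n_seen_by_alternate >= drift_window_threshold


print(f"confidence: {actual_confidence:.3f}")
print(eligible_for_swap(120), eligible_for_swap(450))
```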

I hope this deep dive into HATR helps you! :D
