
Adaptive Random Forest Regressor/Hoeffding Tree Regressor splitting gives AttributeError #393

Closed
JPalm1 opened this issue Nov 24, 2020 · 23 comments · Fixed by #394

@JPalm1

JPalm1 commented Nov 24, 2020

Versions

river version: River 0.1.0
Python version: Python 3.7.1
Operating system: Windows 10 Enterprise v1803

Describe the issue

Have been playing around with River/Creme for a couple of weeks now, and it's super useful for a project I am currently working on. That being said, I'm still pretty new to the workings of the algorithms, so unsure whether this is a bug or an issue with my setup.

When I call learn_one on the ARFR or HTR, I receive "AttributeError: 'NoneType' object has no attribute '_left'" from line 113 in best_evaluated_split_suggestion.py.

I have implemented a delayed prequential evaluation algorithm, and inspecting the loop, the error seems to be thrown once the first delay period has been exceeded - i.e. when the model can first make a prediction that isn't zero. Before this point, learn_one doesn't throw this error.

Currently, I am using the default ARFR as in the example, with a simple Linear Regression as the leaf_model. The linear regression model itself has been working with my data, when not used in the ARFR. I want to try the ensemble/tree models with the data to see if accuracy is improved due to the drift detection methods that are included.

Has anyone also seen this error being thrown, or know the causes of it? Let me know if more information is needed! Thanks.

@MaxHalford
Member

@smastelini I guess this is for you :)

@MaxHalford MaxHalford added the Bug label Nov 24, 2020
@smastelini
Member

Hi @JPalm1! Thanks for the detailed explanation. I'll check your provided example to try reproducing the same error. From your description, it does sound like a bug.

@smastelini
Member

smastelini commented Nov 24, 2020

Hi again :)

I've created #394 to fix the issue. If I may, let me put in my two cents regarding this situation:

  • The way evaluate.progressive_val_score deals with delayed labels is by not calling learn_one in the underlying learners when the label is missing.
  • Your problem most probably arises because your implementation passes y=None to the learners. Following the "it's better to ask forgiveness than permission" principle, we do not check for this situation in the trees. That's why you get an error: the structures used to monitor target statistics are not initialized yet. PR Protect trees against missing feature values and missing labels #394 should prevent these errors from being raised, but I suggest proceeding as progressive_val_score does.
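To illustrate the first point, here is a minimal sketch of that guard, with a hypothetical CountingModel standing in for any river regressor (names are illustrative, not the actual implementation):

```python
class CountingModel:
    """Stand-in for a river regressor; only counts how often it is trained."""
    def __init__(self):
        self.updates = 0

    def learn_one(self, x, y):
        self.updates += 1
        return self


def safe_update(model, x, y):
    # Skip training while the delayed label has not arrived yet (y is None),
    # mirroring what evaluate.progressive_val_score does internally.
    if y is not None:
        model.learn_one(x, y)
    return model


model = CountingModel()
safe_update(model, {"f": 1.0}, None)  # label still delayed: no update happens
safe_update(model, {"f": 1.0}, 3.5)   # label arrived: the model is trained
```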

@JPalm1, could you confirm that the fix in #394 solves your issue?

@JPalm1
Author

JPalm1 commented Nov 24, 2020

Hey, @smastelini - thanks for looking into it!

That explanation makes sense. Just to note: I'm not set up to use evaluate.progressive_val_score - I haven't set everything up in a pipeline yet, I am just doing the 'standard' approach of iterating through my data points, predicting for each and then evaluating if the delay time has been exceeded. Therefore I'm calling learn_one(x, y) on the ARFR model every time there is an item to be popped out of the delayed evaluation queue.

The changes you have made in #394 seem to have solved the initial exception but pushed the exception to a different location: this time line 223 in numeric_attribute_regression_observer.py

Let me know if you need any more info - I'm having a dig around too to try further my own understanding as well. :)

@smastelini
Member

My bad! I missed one additional place to protect the trees. The fix is in place.

Anyhow, my point is that the error appears because only valid labels should be passed to the learning models. I am assuming that None is being passed as target to the trees in your custom implementation. Am I wrong?

@JPalm1
Author

JPalm1 commented Nov 24, 2020

Yes, you're right - sorry, I was being slow.

Thanks for sorting that - the ARFR now runs fine and doesn't trip up for my input data.

I do get an exception when I run the HAT, at line 208 of hoeffding_adaptive_tree_regressor.py, due to a division by zero. I am unsure whether this is related to the initial issue, but it seems like a simple fix. (I'm not set up with a proper build environment, hence just posting the code block!)

def predict_one(self, x):
    if self._tree_root is not None:
        found_nodes = self._filter_instance_to_leaves(x, None, -1)
        pred = 0.

        if len(found_nodes) == 0:
            return pred

        for fn in found_nodes:
            # parent_branch == -999 means that the node is the root of an alternate tree.
            # In other words, the alternate tree is a single leaf. It is probably not accurate
            # enough to be used to predict, so skip it
            if fn.parent_branch != -999:
                leaf_node = fn.node
                if leaf_node is None:
                    leaf_node = fn.parent
                # Option Tree prediction (of sorts): combine the response of all leaves reached
                # by the instance
                pred += leaf_node.leaf_prediction(x, tree=self)
        # Mean prediction among the reached leaves
        return pred / len(found_nodes)
    else:
        return 0.

@smastelini
Member

Thanks for reporting that!

I do get an exception when I run the HAT, at line 208 of hoeffding_adaptive_tree_regressor.py, due to a division by zero. I am unsure whether this is related to the initial issue, but it seems like a simple fix.

That's interesting, indeed. From my understanding, the division error happens because the list of found nodes is empty. So, in summary, your solution is correct :)
But here's the problem: if the tree root is not None, then one of these code paths is reached:

  1. If the root is a leaf:

        def filter_instance_to_leaves(self, x, parent, parent_branch, found_nodes):
            found_nodes.append(FoundNode(self, parent, parent_branch))

  2. If the root is a decision node:

        def filter_instance_to_leaves(self, x, parent, parent_branch, found_nodes):
            child_index = self.instance_child_index(x)
            if child_index >= 0:
                child = self.get_child(child_index)
                if child is not None:
                    child.filter_instance_to_leaves(x, parent, parent_branch, found_nodes)
                else:
                    found_nodes.append(FoundNode(None, self, child_index))
            if self._alternate_tree is not None:
                self._alternate_tree.filter_instance_to_leaves(x, self, -999, found_nodes)

I am struggling to understand how the len() of found_nodes could be zero. I mean, your proposed solution works, but the reason why the problem happens in the first place is not clear. Could you please provide more information about the data and setup you are currently using to test the tree models?

Well, I might have a crazy idea about why the problem is happening, haha. Just a wild guess: are some of your features categorical? In that case, if you call predict_one with an instance containing a category not seen by the model so far, the tree would not be able to correctly sort this instance. This "emerging feature value" situation is an interesting corner case. That said, the feature would need to be used in one of the decision nodes to trigger the situation I mentioned.
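A tiny sketch of that corner case, with hypothetical names (a nominal split keeps one child per category seen during training, so an unseen category has no branch to follow):

```python
def nominal_branch(children_by_value, feature, x):
    # Sketch of the "emerging feature value" corner case: an instance whose
    # category was never seen during training has no branch to follow.
    value = x.get(feature)
    if value not in children_by_value:
        return None  # unseen category: the instance cannot be sorted down the tree
    return children_by_value[value]


# Toy mapping from seen categories to subtrees
children = {"red": "left-subtree", "blue": "right-subtree"}
```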

Could you please confirm whether or not this is the case with your data? If so, I will also add a comment to warn future contributors about the reason for adding this seemingly unnecessary check.

@JPalm1
Author

JPalm1 commented Nov 24, 2020

Hm, that is interesting. I'm also confused as to why the length is zero... In 2., is it possible for child_index < 0 and self._alternate_tree == None at the same time? That's the only thing I spotted in your code snippets, at least. I will step through my code tomorrow and let you know if I find anything more.

For my data, I have encoded the categorical variables as numerical values, but the feature set is quite sparse. There are about 250 features that are shared between all instances, and then up to ~500 more, depending on the incoming data. Most of the features are grouped: if condition X is met, there will be around ~250 extra features on top of the 250 core features, and ~250 more if condition Y is met. When condition X or Y is not met, the corresponding features are NaN and excluded from the input dictionary. I will double-check tomorrow morning, but I'm pretty sure that before the point this exception is met, some data points have had conditions X and Y be true.

The features are all numerical: the majority are scaled using the StandardScaler; the booleans are left as 0 or 1; the categorical features are encoded, but the values never exceed 1.

After processing each data type, I concatenate the three dictionaries to use in predict_one, then add to the queue to fit the model, and calculate metrics after the delay period is complete.

(As a side note: is there an existing feature, or planned feature, that would allow you to assign feature types in the input dictionary, which would then allow the different data types to be processed separately but in the same pipeline? IE some get sent to StandardScaler, some get sent to a different encoder? Apologies if this is a feature already, I may have missed it!)

My plan is to step through tomorrow to get a better understanding, as I can imagine all this info may be a bit vague to you right now, but thanks for your help!

@JPalm1
Author

JPalm1 commented Nov 25, 2020

Hi @smastelini, just had a step through, and it seems my hypothesis yesterday was correct. In the case where len(found_nodes) == 0, the variables are child_index = -1 and self._alternate_tree = None. In my case, this happened in the first recursive call: i.e. after line 287 had been called once.

Again, I'm unsure whether this is a bug or whether my setup is causing the issue; putting in the quick length fix sorts that problem, but trips me up again later:

Line 247 in hatr_nodes.py: "AttributeError: 'NumericAttributeBinaryTest' object has no attribute 'add_new_branch'"

This seems to be the issue you hypothesised, as the clause this line sits in is commented as handling the case where the instance contains a categorical value previously unseen by the split node. Which is weird, as inspecting my input dictionary, all values are numerical, as mentioned previously. Would having some inputs as floats and others as ints be an issue? I will try converting them all to floats and see what happens.

@smastelini
Member

Hi @JPalm1!

Hm, that is interesting. I'm also confused as to why the length is zero... In 2. is it possible for child_index<0 & self._alternate_tree == None? That's the only thing I spotted in your code snippets, at least.

Yes, it is! _alternate_tree is None if a concept drift has not been detected yet. The Hoeffding Adaptive Tree Regressor (HATR) starts training another tree in the background once a drift is detected: this is the alternate tree. Hence, in a stationary problem, we expect the alternate tree to always be empty. The fact that the child index is smaller than 0 means that no branch was found to accommodate the instance traversing down the tree. I found this odd because, if your features are indeed all numerical, it's just a matter of checking whether or not they are smaller than or equal to some given threshold. Are you sure your data do not contain None? (I mean the actual data passed to the trees.)

The tree module automatically treats integers and floats as numerical features and anything else as categorical. Perhaps the occurrence of None could be influencing how the trees "see" the features.
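In sketch form, the rule is simply this (illustrative helper, not the actual river code):

```python
import numbers


def treated_as_numerical(value):
    # Sketch of the rule described above: ints and floats both count as
    # numerical; anything else (strings, None, ...) would be handled as
    # categorical. This also answers the float-vs-int question: both pass.
    return isinstance(value, numbers.Number)
```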

@smastelini
Member

(As a side note: is there an existing feature, or planned feature, that would allow you to assign feature types in the input dictionary, which would then allow the different data types to be processed separately but in the same pipeline? IE some get sent to StandardScaler, some get sent to a different encoder? Apologies if this is a feature already, I may have missed it!)

You can check compose.SelectType for that! :D

Please, also check the examples section of iSOUPTreeRegressor to see that in action.
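Conceptually, that kind of routing splits the input dict by value type, something like this (plain-Python sketch of the idea, not the actual SelectType implementation):

```python
import numbers


def split_by_type(x):
    # Conceptual sketch of routing features by type, as compose.SelectType
    # allows inside a pipeline: numeric features can go to a scaler, while
    # the rest go to a different encoder.
    numeric = {k: v for k, v in x.items() if isinstance(v, numbers.Number)}
    other = {k: v for k, v in x.items() if k not in numeric}
    return numeric, other


numeric, other = split_by_type({"age": 32, "height": 1.8, "city": "Paris"})
```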

@smastelini
Member

I think I found your culprit :D

    def branch_for_instance(self, x):
        if self._att_idx not in x:
            return -1

The branching returns -1 if a split feature is absent from your instance. Such an interesting case that is prone to happen in real-world situations! I am quite happy about facing this challenge hahaha
Using a sparse representation, such as dictionaries, gives us a lot of flexibility but also brings additional challenges :D
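A toy numeric node makes the consequence visible (illustrative names): when the split feature is absent, no branch is returned, so the instance never reaches a leaf and found_nodes stays empty.

```python
def branch_for_instance(att_idx, threshold, x):
    # Mirrors the snippet above: if the split feature is absent from the
    # instance, no branch is chosen and the traversal dead-ends.
    if att_idx not in x:
        return -1
    return 0 if x[att_idx] <= threshold else 1
```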

I am working on a solution. As soon as I finish it, I'll let you know :D

@JPalm1
Author

JPalm1 commented Nov 25, 2020

I found this odd because if your features are indeed all numerical, it's just a matter of checking whether or not they are smaller or equal to some given threshold. Are you sure your data do not contain None? (the actual data passed to the trees)

Stepping through, the x at the tree level seems to be all numerical (I remove the None variables before interacting with the model). But could this still be the case if the models assume that features not present in the dictionary, but that have been present before, are None?

It's interesting that numerical features are also classed as categorical, which explains the error I'm seeing here:

Line 247 in hatr_nodes.py "AttributeError: 'NumericAttributeBinaryTest' object has no attribute 'add_new_branch'"

Will hold off til when your fix is in to worry about this one though, thanks for looking into it!

@smastelini
Member

It's interesting that numerical features are also classed as categorical, which explains the error I'm seeing here:

Yes! This corner case did not happen before in scikit-multiflow because we always assumed all features were present. I added a fix in #394. Since what started with ARFReg ended up being related to missing data handling and sparse representations, I will update the description of the PR.

@JPalm1, could you please check if the fix works for you?

@JPalm1
Author

JPalm1 commented Nov 25, 2020

The HATR is now running smoothly for me with no errors - thanks for taking the time to look into this @smastelini !

It's not giving me the best predictions yet, but that's on me to investigate... :)

@smastelini
Member

No worries @JPalm1! It was a nice adventure :)

Just keep in mind:

  • If a decision node receives an instance (plus a label) but the split feature is not present, the tree does not utilize all the information in x for learning purposes. It simply stops the tree update at the last reachable node.

In your case, a feature that does not appear very often might be very relevant for your problem.

In the future, and that's a possible research venue, one can explore some alternatives:

  1. Randomly choosing a branch to sort the instance down the tree in case the decision feature is missing
  2. Pass the instance to both branches
  3. ... other strategies

The impact of those choices is a compelling research subject to investigate! The original Hoeffding Tree framework does not deal with such situations, as far as I am aware.
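Strategy 2, for instance, could be sketched with a toy node type (purely illustrative, not river code):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    feature: Optional[str] = None  # None means this node is a leaf
    threshold: float = 0.0
    children: List["Node"] = field(default_factory=list)


def filter_to_leaves(node, x, found):
    # Strategy 2 above: when the split feature is missing from x, descend
    # into every child instead of dropping the instance on the floor.
    if not node.children:  # leaf
        found.append(node)
        return
    if node.feature not in x:
        for child in node.children:
            filter_to_leaves(child, x, found)
    else:
        branch = 0 if x[node.feature] <= node.threshold else 1
        filter_to_leaves(node.children[branch], x, found)


root = Node(feature="f1", threshold=0.5, children=[Node(), Node()])
```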

@jacobmontiel, @hmgomes, and professor @abifet, this might be an interesting problem to consider in the future. Any comments?

@jacobmontiel
Contributor

Well, that was quite some reading. Thanks to all involved in the interesting discussion.

Missing data is unavoidable in real-world applications. My comment is in two main aspects:

  1. Regarding this particular case: when the data dictionaries are created, some features are skipped, which results in missing data, as the tree has "seen" those features in the past - in other words, they are not "emerging features". In this case, the missing data is "generated" and it means something (whether a given condition was met). My suggestion is that the data should be explicitly set to some value to encode that information.

  2. In a general sense, missing data imputation is a relevant topic that has only been partially addressed for streaming data. Options 1 and 2 above seem reasonable. For simplicity, option 1 does the job and provides protection against missing data; this is the strategy used by XGBoost, for example. In practice, it is like replacing the missing data with a constant value. We can argue that this is not the most elegant imputation method, but it does the job. On the other hand, option 2 implies an overhead when processing complex trees and very sparse data.
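The constant-value idea amounts to a tiny pre-processing step applied before calling the model (illustrative helper, not a river API):

```python
def impute_constant(x, expected_features, fill_value=0.0):
    # Give every expected feature an explicit value, so the trees never see
    # a missing split feature. Effectively replaces missing data with a
    # constant, as described above.
    return {f: x.get(f, fill_value) for f in expected_features}


filled = impute_constant({"a": 1.0}, expected_features=["a", "b"])
```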

@JPalm1
Author

JPalm1 commented Nov 27, 2020

Thanks for the replies @smastelini , @jacobmontiel.

For my specific case, having played around with the model for the past day or so, the accuracy seems to be lower than other models I have implemented/baselines I have calculated - so potentially the sparsity of the data, and thus not utilizing the full feature set, is causing me issues, as you suspected. I do need to tune the hyperparameters though, which may improve things. I will also try imputing some values for the features that are non-existent and see if that improves things - thanks both for the suggestions!

@smastelini
Member

Hi @JPalm1. You might get some performance boost due to the last changes implemented in #394. Please, let me know if the models indeed improve in your setup.

@JPalm1
Author

JPalm1 commented Dec 1, 2020

Hi @smastelini, after pulling the changes, there doesn't appear to be any major accuracy improvements in the HT. Thanks for letting me know though! Will keep an eye out for any improvements.

As a side note, do any of the changes you have implemented affect the HST anomaly detection tree? Previously, I was able to input dictionaries of changing lengths (due to some inputs being NaN) - now when I do that, I get the exception from line 36 in anomaly/base.py if a new feature appears or a previous feature disappears from the input dictionary. Thanks!

Note: The algorithm does work when I keep the NaN values in the input dictionary.

@smastelini
Member

Hi @JPalm1, this question is out of the scope of the decision tree module. Could you please open another issue to directly tackle this problem?

Meanwhile, @MaxHalford do you have any idea of what might be happening?

@MaxHalford
Member

Yes, the half-space trees don't work if a feature that is used in a node split is missing. It's essentially the same problem. I could cook something up :). I'll create an issue to keep track of this.

On a side note, @smastelini, I noticed that you put all the base tree code from creme into anomaly/base.py. I guess in the future we want to have a common code for all tree related models? :)

@smastelini
Member

smastelini commented Dec 2, 2020

Hey @MaxHalford. During the merge I kept the basic tree structure shared by the previous decision tree and HST. I moved it to the anomaly submodule because it is the only place where this structure is used now.

Ideally, we should use a single shared tree representation. I plan to bring another family of tree learners to river soon, and they do not fit the current conventions defined by BaseHoeffdingTree. So I am very interested in such a discussion.
