Traversing each example in the first generated regression tree is wrong #12

Closed
Hugo101 opened this issue Jul 17, 2018 · 1 comment
Labels
bug Something isn't working

Hugo101 commented Jul 17, 2018

Take the ToyCancer dataset as an example. This is the first regression tree built (MaxDepth = 1):

   - [cancer(C):-  -0.16666666667,  cancer(C):- smokes(C) 0.5]

When building the second tree, we need the leaf node value for each example. As suggested in issue #11, three examples fall on the left side and three on the right. Namely, three should get -0.166667 and three should get 0.5.

However, this is the leaf node value for each example in the Python version:
[Screenshot, 2018-07-16 9:30:54 PM]

I have found the problematic code: it is in the function learnTree in tree.py.

 node.learnedDecisionTree.sort(key = len)
 node.learnedDecisionTree = node.learnedDecisionTree[::-1] #reverse

Sorting by length is not correct here, because the digits of each leaf value count toward the length of the clause string. With the code above, the first tree is ordered as:

  - [cancer(C):-  -0.16666666667,  cancer(C):- smokes(C) 0.5]

Because the clause without a test comes first, every example is assigned the leaf value -0.16666666667.
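The ordering problem can be reproduced in isolation. The following is a minimal sketch, not the project's code: it copies the two clause strings from this issue into a plain list (the variable name `tree` is only for illustration) and applies the same sort-then-reverse steps as learnTree. It assumes, as the issue implies but does not show, that traversal returns the first clause that matches an example.

```python
# Clause strings copied from the first learned tree in this issue.
tree = [
    "cancer(C):- smokes(C) 0.5",
    "cancer(C):-  -0.16666666667",
]

# Same two steps as in learnTree: sort by string length, then reverse,
# so the longest string ends up first.
tree.sort(key=len)
tree = tree[::-1]

# The long decimal makes the clause with no test the longest string,
# so it is placed ahead of the clause that tests smokes(C).
for clause in tree:
    print(len(clause), clause)
```

With this ordering the unconditional clause is checked first, so under a first-match traversal the smokes(C) clause is never reached.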

Solution: give the leaf value of every clause the same number of decimal places.

Modify the function expandOnBestTest in tree.py.
The original code:

if clause[-1]!='-':
    node.learnedDecisionTree.append(clause[:-1]+" "+str(Utils.getleafValue(self.examples)))
else:
    node.learnedDecisionTree.append(clause+" "+str(Utils.getleafValue(self.examples)))

The modified code:

from decimal import Decimal #added by Changbin
if clause[-1]!='-':
    # round the leaf value to exactly six decimal places
    leafValue = Decimal(Utils.getleafValue(self.examples)).quantize(Decimal('0.000000')) #added by Changbin
    node.learnedDecisionTree.append(clause[:-1]+" "+str(leafValue))
else:
    leafValue = Decimal(Utils.getleafValue(self.examples)).quantize(Decimal('0.000000')) #added by Changbin
    node.learnedDecisionTree.append(clause+" "+str(leafValue))

The new result is:

  - [cancer(C):- smokes(C) 0.500000,   cancer(C):-  -0.166667]

which can be used to look up the leaf value of each example in the tree correctly.
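The effect of the fix on the ordering can be checked with a small standalone sketch. The helper `format_leaf` is hypothetical and stands in for the quantize call added to expandOnBestTest; the clause strings and leaf values are the ones from this issue.

```python
from decimal import Decimal


def format_leaf(value):
    """Hypothetical helper: render a leaf value with exactly six decimals."""
    return str(Decimal(value).quantize(Decimal("0.000000")))


clauses = [
    "cancer(C):-  " + format_leaf(-0.16666666667),
    "cancer(C):- smokes(C) " + format_leaf(0.5),
]

# Same sort-then-reverse as in learnTree: longest clause first.
clauses.sort(key=len)
clauses = clauses[::-1]

print(clauses)
# ['cancer(C):- smokes(C) 0.500000', 'cancer(C):-  -0.166667']
```

Note that fixed decimals do not make all values exactly the same width: a minus sign or extra integer digits still add characters. The ordering works here because the clause carrying a test is now clearly longer than the one without.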

@hayesall
Owner

Resolved in fa9a104, closing issue.
