Overfitting or am I getting the ruleset interpretation wrong? #3

Closed
pagojo opened this issue Oct 8, 2012 · 2 comments

pagojo commented Oct 8, 2012

By calling the ruleset method after train on a DecisionTree::ID3Tree object (set up for continuous data), I expect to get back a number of rules. I interpret each rule as a series of ANDed clauses.

However, I often get rules in which the same attribute appears in more than one clause, even though it is already tested in a clause higher up.

e.g.,

attrib_1 < 0.02123562506819547
attrib_2 >= 0.1922781177611915
attrib_3 < 0.2879504779121489
attrib_4 < 0.26382498790056597
attrib_4 < 0.193308315974597
=> class1()

In the above case the repeated attrib_4 test looks superfluous, since the ANDed pair collapses to the stricter threshold (attrib_4 < 0.193308315974597), if the rule is interpreted as:

if attrib_1 < 0.02123562506819547 and
   attrib_2 >= 0.1922781177611915 and
   attrib_3 < 0.2879504779121489 and
   attrib_4 < 0.26382498790056597 and
   attrib_4 < 0.193308315974597
  class1()
end
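
Put differently, a quick sketch in plain Ruby (nothing to do with the gem's internals) of why the doubled clause adds nothing under the ANDed reading:

# Under an ANDed reading, several "<" tests on the same attribute collapse
# to the minimum threshold (and ">=" tests to the maximum).
clauses = [
  ['attrib_1', :<,  0.02123562506819547],
  ['attrib_2', :>=, 0.1922781177611915],
  ['attrib_3', :<,  0.2879504779121489],
  ['attrib_4', :<,  0.26382498790056597],
  ['attrib_4', :<,  0.193308315974597]
]

collapsed = clauses.group_by { |attr, op, _| [attr, op] }.map do |(attr, op), tests|
  thresholds = tests.map { |_, _, t| t }
  best = (op == :<) ? thresholds.min : thresholds.max
  [attr, op, best]
end

collapsed.each { |attr, op, t| puts "#{attr} #{op} #{t}" }
# attrib_4 now appears once, with the stricter threshold 0.193308315974597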

So, am I wrong to assume a chain of ANDed clauses?
If not, is the second occurrence of attrib_4 a sign of overfitting that I can safely ignore?
Could this just be a bug?
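
For reference, here is roughly how the ruleset is being produced. The attribute names and training values below are made up just to show the call sequence; it follows the gem's README as far as I can tell:

require 'decisiontree'

# Hypothetical continuous attributes and rows; the real data set is larger.
attributes = ['attrib_1', 'attrib_2', 'attrib_3', 'attrib_4']
training = [
  [0.01, 0.25, 0.10, 0.15, 'class1'],
  [0.05, 0.10, 0.40, 0.30, 'class2'],
  [0.02, 0.30, 0.20, 0.18, 'class1'],
  [0.07, 0.05, 0.35, 0.28, 'class2']
]

tree = DecisionTree::ID3Tree.new(attributes, training, 'class2', :continuous)
tree.train

# The ruleset discussed above; printing it to inspect the ANDed clauses.
puts tree.ruleset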

igrigorik (Owner) commented

Have you tried graphing the actual object?
https://github.com/igrigorik/decisiontree/blob/master/lib/decisiontree/id3_tree.rb#L124

That might help you understand the structure of your tree.
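
Something along these lines should do it (assuming graphviz and the graphr gem are installed, and that graph takes an output filename):

# 'my_tree' is just an example output name; graph renders the trained tree
# to an image so you can see each split along a path.
tree.graph('my_tree')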

pagojo commented Oct 10, 2012

Cheers, I did that; I had to install GraphViz and GraphR first. One thing I noticed, though, is that version 0.3.2 can't be found on RubyGems.org (for use by gem install or Bundler).

My original posting was influenced by how AI4R spits out the decision tree rules, which can then be eval'd (or copy-pasted into the code).
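
For comparison, a rough sketch of what I mean with AI4R (from memory, so treat the exact calls and the toy data as approximate):

require 'ai4r'

# Tiny made-up nominal data set; the class label is the last column.
items = [
  ['sunny', 'hot',  'no'],
  ['sunny', 'mild', 'no'],
  ['rain',  'mild', 'yes'],
  ['rain',  'cool', 'yes']
]
labels = ['outlook', 'temperature', 'play']

data_set = Ai4r::Data::DataSet.new(:data_items => items, :data_labels => labels)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)

# get_rules returns the tree as a string of Ruby if/elsif rules, which is
# what can be eval'd or pasted straight into code.
puts id3.get_rules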
