cookbook - CARTree - classification tree #3282

Merged
merged 1 commit into from
Nov 22, 2016

Member

@OXPHOS OXPHOS commented Jun 12, 2016

  • CART takes an SGVector in its parameter list. What I have in the code is the most reasonable way I found to initialize an SGVector in meta, but it didn't work. I got this error:
/Users/zora/Github/shogun/build/examples/meta/cpp/classifier/randomforest.cpp:32:16: error: 
      no template named 'CSGVector'; did you mean 'SGVector'?
auto ft = some<CSGVector<bool>>(2);
               ^~~~~~~~~
               SGVector
/Users/zora/Github/shogun/src/shogun/lib/SGString.h:22:25: note: 'SGVector'
      declared here
template<class T> class SGVector;
  • The introduction of CART needs more work, and I haven't checked the style yet because make cookbook fails due to the changes I made for the first point.

@karlnapf
Member

It should be easier to use a bool vector...

@karlnapf
Member

@OXPHOS currently the meta language does not support

  • instantiating SG* types
  • accessing elements within

I think we should add both. @sorig do you have an idea how much work that is? We would need to adjust the include path stuff (similar to what @OXPHOS did in this patch), and then wrapping the [] and () operators might be nice as well. What do you think?

@OXPHOS Let's see what @sorig says before we continue here

@OXPHOS
Member Author

OXPHOS commented Jun 21, 2016

I rebased onto #3285 but got the same error, both here and locally.

I removed parser.out, parsetab.py, parsetab.pyc, lextab.py and lextab.pyc in examples/meta/generator.

I was using:

#![set_attribute_types]
BoolVector ft(2)
ft[0] = False
ft[1] = False
#![set_attribute_types]

@OXPHOS OXPHOS mentioned this pull request Jun 21, 2016
@karlnapf
Member

  1. Remove the build dir
  2. Clean up the examples/meta/generator/ parser output

It works on my machine. Ask travis, it might confirm

@sorig
Member

sorig commented Aug 11, 2016

Let me know if you need help with this.

Java error is caused by java.json#L14 not using the correct type. You need to figure out how BoolVectors are mapped to Java and use the appropriate type here.

The C# failure is also an error with mapped types. Take the generated C# file, figure out how to make it compile, and update the csharp.json file.

cartree.cs(26,22): error CS1502: The best overloaded method match for `CARTree.CARTree(BoolVector, EProblemType, int, bool)' has some invalid arguments
/opt/shogun/build/src/interfaces/csharp_modular/modshogun.dll (Location of the symbol related to previous error)
cartree.cs(26,34): error CS1503: Argument `#1' cannot convert `bool[]' expression to type `BoolVector'

The errors in the other languages are probably similar, but need investigation.

@OXPHOS
Member Author

OXPHOS commented Aug 19, 2016

@sorig Hey I have got some results, but not solutions...
So I need to use a new BoolVector for cartree, but I looked into the errors in Java and found that there's no BoolVector type in jblas. I think this is why Heiko commented this out.
If we just use 0 and 1 for false and true, apparently C# doesn't like it, and using IntVector instead of BoolVector doesn't work either. So I am not sure how to deal with the problem.

There's another problem in Java that remains even if the one above is solved; it also appeared in #3303:

//![set_attribute_types]
DoubleMatrix ft = new DoubleMatrix(2);
ft.put(0, 0);
ft.put(1, 0);
//![set_attribute_types]

If I create a vector of length 2 in the meta language (translated into cartree.java as shown in the code above) and pass the vector to the constructor of CCARTree:

//![create_instance]
CARTree classifier = new CARTree(ft, PT_MULTICLASS, 5, true);
//![create_instance]

and output the length of the vector in the CCARTree constructor:

CCARTree::CCARTree(SGVector<bool> attribute_types, EProblemType prob_type, int32_t num_folds, bool cv_prune): CTreeMachine<CARTreeNodeData>()
{
    std::cerr << "vector length: " << attribute_types.vlen << std::endl;
}

I get: vector length: 1, even though ft has size 2.
I have tried initializing a matrix of size (1, 2) or (2, 1), and I got the same result.
So in summary, whatever size of vector/matrix I pass to the CCARTree constructor, it ends up with size 1. I have no idea why this is happening. Could you help look into this? Thanks!

@OXPHOS
Member Author

OXPHOS commented Sep 8, 2016

@karlnapf any thoughts? : )

@karlnapf
Member

@OXPHOS I can look at this. Shall I just copy the meta example in this PR? Does it reproduce the errors?
With Java, there is a problem with strong types and implicit numerical downcasting... I'll look into this.

@OXPHOS
Member Author

OXPHOS commented Sep 13, 2016

@karlnapf Maybe you could reproduce the error by using the code in #3303, where the error is simpler: When passing a vector

//![set_attribute_types]
DoubleMatrix ft = new DoubleMatrix(2);
ft.put(0, 0);
ft.put(1, 0);
//![set_attribute_types]

to the constructor:

CCHAIDTree(int32_t dependent_vartype, SGVector<int32_t> feature_types, int32_t num_breakpoints=0);

The vector will have size of 1.

@karlnapf
Member

The "other" problem should be solved in #3451
Looking into the other tomorrow or so.

@karlnapf
Member

karlnapf commented Sep 28, 2016

There is another problem with csharp and boolean vectors --- we don't have a typemap, which means we cannot pass boolean vectors as input in the csharp interface, see #3452

This, sadly, means that for now we have to change this example so that it does not use hand-constructed boolean vectors as parameters. Instead, we can create an instance of the (mapped) BoolVector as an object and then use a method (say zero) to populate it (hacky).

@karlnapf
Member

Can you rebase? Should work now

@karlnapf
Member

I guess this needs an updated data version and a squash. It is all ready otherwise

@karlnapf
Member

Can you please run the tests locally (at least the cpp version) before putting things into the PR? This would have been detected locally.

This is done with
ctest -R integration_meta_cpp-multiclass_classifier-cartree -V

@karlnapf
Member

See the error is coming from a missing shogun-data update

The generated test passed:
https://travis-ci.org/shogun-toolbox/shogun/jobs/175608246#L3717

But checking against reference data failed:
https://travis-ci.org/shogun-toolbox/shogun/jobs/175608246#L4322

@karlnapf
Member

And it even tells you the line where it differs:
https://travis-ci.org/shogun-toolbox/shogun/jobs/175608246#L4427

@OXPHOS
Member Author

OXPHOS commented Nov 16, 2016

@karlnapf Thanks! I saw that only some of the tests failed, so I assumed they were specific errors and didn't check closely. I updated the test dataset and it now works for me locally.

@karlnapf
Member

So let's get it in then :)

@OXPHOS
Member Author

OXPHOS commented Nov 17, 2016

Ruby and CSharp don't like 0 as a False value, and this might also be true for Octave. But it is weird that in the R tests integration_meta_cpp-multiclass_classifier-cartree failed because the generated output didn't match the reference. I'll check whether the algorithm can generate different results.

@karlnapf
Member

Ah sigh, I thought I had solved that. But we have a mechanism to fix this... checking...

@karlnapf
Member

@OXPHOS rebase against develop, then it should work (I tried locally for all interfaces that failed in the last build here)

@OXPHOS OXPHOS force-pushed the cookbook_cartree branch 3 times, most recently from f3aac11 to 3ebab4e Compare November 21, 2016 06:15
@karlnapf
Member

karlnapf commented Nov 21, 2016

@OXPHOS You should always try these things locally before you push. At least in one modular language.

So here is the listing I used (no fixed random seed)
https://gist.github.com/karlnapf/6cf4186dc77861681ceba938f397f2c0

Note that the meta language syntax for booleans is True and False.

Also, check out the docs for git commit --amend, which allows you to update your commit so that the PR doesn't have something like 5 commits, but just one.

@karlnapf
Member

I don't understand the failure in Octave. Checking again. The rest looks fine.

@karlnapf
Member

OK, so it works locally for Octave; I restarted the build.
We just need to squash the commits, then we should be able to merge (fingers crossed that travis is fine).

@karlnapf
Member

karlnapf commented Nov 21, 2016

This seems to be specific to the travis setup, I don't have this problem locally ....
EDIT: Managed to reproduce now.

@karlnapf
Member

karlnapf commented Nov 21, 2016

@OXPHOS I solved the problem with octave. Man, this is a beast to get merged.
But I think you can squash next time before you push (after rebasing against develop once #3559 is merged; EDIT: it is merged now), since all the rest works.

@OXPHOS
Member Author

OXPHOS commented Nov 22, 2016

@karlnapf It finally works! Thanks for the advice and sorry that I skipped the local test because of laziness.

@karlnapf
Member

Nice one, this was a big effort to get these Boolean vectors working. But it really helped in figuring out corner cases of the whole system. Thanks for the patience :)

@karlnapf karlnapf merged commit 18204b2 into shogun-toolbox:develop Nov 22, 2016
Member

@karlnapf karlnapf left a comment

Some minor style updates would be good :)

Classification And Regression Tree
==================================

Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
Member

Decision tree learning is based on trees as predictive models. (Remove the second sentence)


Decision trees are mostly used as the following two types:
Member

There are two types of decision trees:

Member

Or even better: There are decision trees for both classification (integer-valued) and regression (real-valued).

- Classification tree, where the predicted outcome is the class to which the data belongs.
- Regression tree, where predicted outcome can be considered a real number.
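
To make the difference between the two tree types concrete, here is a minimal, self-contained C++ sketch of the node impurity measures that CART-style learners commonly minimize: Gini impurity for class labels and variance for real-valued targets. This is an illustration only and not Shogun's CCARTree code; the function names `gini_impurity` and `variance` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Gini impurity of the class labels in one node: 1 - sum_k p_k^2.
// A pure node (all labels equal) has impurity 0.
double gini_impurity(const std::vector<int>& labels)
{
    assert(!labels.empty());
    std::map<int, std::size_t> counts;
    for (int label : labels)
        ++counts[label];

    double sum_sq = 0.0;
    for (const auto& kv : counts)
    {
        const double p = static_cast<double>(kv.second) / labels.size();
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

// Population variance of the real-valued targets in one node,
// the usual impurity measure for a regression tree.
double variance(const std::vector<double>& targets)
{
    assert(!targets.empty());
    double mean = 0.0;
    for (double t : targets)
        mean += t;
    mean /= targets.size();

    double sum_sq_dev = 0.0;
    for (double t : targets)
        sum_sq_dev += (t - mean) * (t - mean);
    return sum_sq_dev / targets.size();
}
```

A candidate split is then scored by the size-weighted impurity of the child nodes it produces; the lower the score, the better the split.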

Classification And Regression Tree (CART) algorithm is an umbrella method that can be applied to generate both classification tree and regression tree.
Member

The classification and regression tree (CART) algorithm is an umbrella method ....


In this example, we showed how to apply CART algorithm to multi-class dataset and predict the labels with classification tree.
Member

In this example, we show how to apply CART to multi-class datasets.

Remove the rest and fix typos


.. sgexample:: cartree.sg:create_features

We set the type of each predictive attribute (true for nominal, false for ordinal/continuous)
Member

These words you can also use above when you talk about regression/classification values

.. sgexample:: cartree.sg:set_attribute_types
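
As a rough illustration of why the learner needs the attribute-type flags described above, the following self-contained C++ sketch routes a sample to the left or right child differently depending on whether the tested attribute is nominal or ordinal/continuous. It is an assumption about how a CART-style split could consume such flags, not Shogun's implementation; the `Split` struct and the `goes_left` function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical description of one binary split in a CART-style tree.
struct Split
{
    std::size_t attribute;         // index of the attribute being tested
    double threshold;              // used when the attribute is ordinal/continuous
    std::set<int> left_categories; // used when the attribute is nominal
};

// attribute_types[i] == true  -> attribute i is nominal (category membership test)
// attribute_types[i] == false -> attribute i is ordinal/continuous (threshold test)
bool goes_left(const std::vector<double>& sample,
               const std::vector<bool>& attribute_types,
               const Split& split)
{
    assert(sample.size() == attribute_types.size());
    assert(split.attribute < sample.size());

    const double value = sample[split.attribute];
    if (attribute_types[split.attribute])
    {
        // Nominal: category codes have no order, so test set membership.
        return split.left_categories.count(static_cast<int>(value)) > 0;
    }
    // Ordinal/continuous: values are ordered, so compare with a threshold.
    return value <= split.threshold;
}
```

With both flags set to false, as in the meta example in this PR, every split would take the threshold branch.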

We create an instance of the :sgclass:`CCARTree` classifier by passing it the attribute types and the tree type.
We can also set the number of subsets used in cross-validation and whether to use cross-validation pruning.
Member

A reference to a wiki article would be good here

abhinavrai44 pushed a commit to abhinavrai44/shogun that referenced this pull request Jan 24, 2017
karasikov pushed a commit to karasikov/shogun that referenced this pull request Apr 15, 2017
karasikov pushed a commit to karasikov/shogun that referenced this pull request Apr 15, 2017
cookbook - CARTree -  classification tree