add metaexample for CHAIDTree Regression #5065

Hephaestus12 · 2020-06-12T19:30:08Z

Also removed the undocumented random forest regression example, as it has already been ported to a meta example.

gf712

Looks good, now you just have to add the data file :)

gf712 · 2020-06-12T20:46:54Z

examples/meta/src/regression/chaidtree.sg.in

+
+#![create_machine]
+Machine chaidtree = create_machine("CHAIDTree", dependent_vartype=2, feature_types=ft, num_breakpoints=50)
+chaidtree.set_labels(labels_train)


Could you pass labels either to the constructor or use put instead please?

Okay, I'll do that.

Hephaestus12 · 2020-06-12T21:05:03Z


Isn't it already present?
I'm using the same dataset used for the CARTree meta example.

gf712 · 2020-06-12T21:10:22Z

Have a look at https://github.com/shogun-toolbox/docs/blob/master/DEVELOPING.md#adding-tests-2

Hephaestus12 · 2020-06-12T23:42:40Z


The guide says I should copy the chaidtree.dat file from the build/tests/meta/generated_results/cpp/ directory, but running
make meta_examples
as well as
make build_cpp_meta_examples
isn't generating the .dat file for chaidtree. Only the chaidtree.cpp meta example file is generated (.dat files for the other meta examples are generated, though). What might be the reason for this?

gf712 · 2020-06-13T06:02:46Z

You also need to run the meta example (using ctest); that will generate the output file. You should use the output of the cpp meta example.

gf712 · 2020-06-13T08:18:33Z

examples/meta/src/regression/chaidtree.sg.in

+#![set_feature_types]
+
+#![create_machine]
+Machine chaidtree = create_machine("CHAIDTree", labels=labels_train, dependent_vartype=2, feature_types=ft, num_breakpoints=50)


Actually, the error is here: if you look at the parameter registration, the names all have an m_ prefix. You should remove the m_ from the parameter names.

shogun/src/shogun/multiclass/tree/CHAIDTree.cpp

Lines 1395 to 1404 in 7483101

SG_ADD(&m_weights,"m_weights", "weights", ParameterProperties::READONLY);
SG_ADD(&m_weights_set,"m_weights_set", "weights set", ParameterProperties::READONLY);
SG_ADD(&m_feature_types,"m_feature_types", "feature types", ParameterProperties::SETTING);
SG_ADD(&m_dependent_vartype,"m_dependent_vartype", "dependent variable type", ParameterProperties::SETTING);
SG_ADD(&m_max_tree_depth,"m_max_tree_depth", "max tree depth", ParameterProperties::HYPER);
SG_ADD(&m_min_node_size,"m_min_node_size", "min node size", ParameterProperties::SETTING);
SG_ADD(&m_alpha_merge,"m_alpha_merge", "alpha-merge", ParameterProperties::HYPER);
SG_ADD(&m_alpha_split,"m_alpha_split", "alpha-split", ParameterProperties::HYPER);
SG_ADD(&m_cont_breakpoints,"m_cont_breakpoints", "breakpoints in continuous attributes", ParameterProperties::SETTING);
SG_ADD(&m_num_breakpoints,"m_num_breakpoints", "number of breakpoints", ParameterProperties::HYPER);

Oh okay, I'll do that.
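Why the prefix matters: as the later diff to CHAIDTree.cpp shows, the second argument of SG_ADD is the name a parameter is registered under, and keyword arguments such as dependent_vartype=2 in create_machine are matched against that string. The minimal sketch below is hypothetical (ToyCHAIDTree and its put/get/has methods are illustrative only, not Shogun's API or SG_ADD implementation); it just shows that a name-keyed registry accepts the registered name and rejects the m_-prefixed member name.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified stand-in for a name-keyed parameter
// registry. It is NOT Shogun's SG_ADD machinery; it only illustrates that
// the string used at registration is the key callers must supply later.
class ToyCHAIDTree
{
public:
	ToyCHAIDTree()
	{
		// Registered under the public names, without the m_ member prefix,
		// mirroring the rename made in this PR.
		m_params["dependent_vartype"] = &m_dependent_vartype;
		m_params["num_breakpoints"] = &m_num_breakpoints;
	}

	// The registry stores pointers into this object, so copying is disabled
	// to avoid a copy whose registry points at another instance's members.
	ToyCHAIDTree(const ToyCHAIDTree&) = delete;
	ToyCHAIDTree& operator=(const ToyCHAIDTree&) = delete;

	// Sets a registered parameter; an unregistered name throws std::out_of_range.
	void put(const std::string& name, int value) { *m_params.at(name) = value; }

	// Reads a registered parameter; an unregistered name throws std::out_of_range.
	int get(const std::string& name) const { return *m_params.at(name); }

	// Reports whether a parameter was registered under `name`.
	bool has(const std::string& name) const { return m_params.count(name) > 0; }

private:
	int m_dependent_vartype = 0;
	int m_num_breakpoints = 0;
	std::map<std::string, int*> m_params;
};
```

In this toy, put("dependent_vartype", 2) succeeds while put("m_dependent_vartype", 2) throws, which is the analogue of a keyword argument that does not match the registered name.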

Hephaestus12 · 2020-06-13T12:58:14Z

When I run make test I get the following error:

The following tests FAILED:
	 19 - python_legacy-distance_director_euclidean (Child aborted)
	105 - python_legacy-structure_factor_graph_model (NUMERICAL)
	106 - python_legacy-structure_graphcuts (NUMERICAL)
	233 - generated_cpp-regression-random_forest_regression (SEGFAULT)
	354 - generated_python-regression-random_forest_regression (SEGFAULT)

I removed the regression-random-forest undocumented example as the random forest regression meta example already exists. Is that what is leading to these errors?

gf712 · 2020-06-13T13:13:38Z


No, that error is not related to this PR. See #5060.

Hephaestus12 · 2020-06-13T13:25:29Z

I tried running make test and the generated_cpp-regression-chaidtree runs successfully:

        Start 227: generated_cpp-regression-chaidtree
227/358 Test #227: generated_cpp-regression-chaidtree .......   Passed    0.03 sec

However, the .dat file is still not being created. (I have removed the m_ prefix from the parameter names.)

(python3.5) tejsukhatme@hephaestus:~/shogun/build/tests/meta/generated_results/cpp/regression$ ls
cartree.cpp                             cpp-regression-kernel_ridge_regression_nystrom  cpp-regression-svrlight              least_angle_regression.dat    random_forest_regression.dat
cartree.dat                             cpp-regression-least_angle_regression           kernel_ridge_regression.cpp          linear_ridge_regression.cpp   support_vector_regression.cpp
chaidtree.cpp                           cpp-regression-linear_ridge_regression          kernel_ridge_regression.dat          linear_ridge_regression.dat   support_vector_regression.dat
cpp-regression-cartree                  cpp-regression-multiple_kernel_learning         kernel_ridge_regression_nystrom.cpp  multiple_kernel_learning.cpp  svrlight.cpp
cpp-regression-chaidtree                cpp-regression-random_forest_regression         kernel_ridge_regression_nystrom.dat  multiple_kernel_learning.dat  svrlight.dat
cpp-regression-kernel_ridge_regression  cpp-regression-support_vector_regression        least_angle_regression.cpp           random_forest_regression.cpp

When I run ctest for the single test I get the following error:

error while loading shared libraries: libhdf5.so.103: cannot open shared object file: No such file or directory

@geektoni libhdf5 strikes again, should I make a new environment and set everything up all over again?

gf712 · 2020-06-13T13:30:48Z

You need to find libhdf5.so.103 on your system; I am guessing it is in your anaconda folder. Then you need to add its directory to LD_LIBRARY_PATH.

karlnapf · 2020-06-13T18:21:30Z

Could you name the files chaid_tree, i.e. with an underscore? Just to tidy up a bit (also for future PRs on examples). Thx

karlnapf · 2020-06-13T18:22:13Z

Just disable hdf5 in cmake... I do this locally as it causes problems otherwise

karlnapf · 2020-06-21T21:55:52Z

examples/meta/src/regression/chaidtree.sg.in

+
+#![extract_weights_labels]
+RealVector labels_vector = labels_predict.get_real_vector("labels")
+RealVector weights = chaidtree.get_real_vector("weights")


these are set by the user so I think this doesn't need to be extracted as discussed in the data PR.
Once you have removed this, you have to regenerate the data, and update the data PR, then update this PR (including the submodule)

the weights are still extracted here, you will need to remove that

Hephaestus12 · 2020-06-23T10:53:54Z

I have pushed the new data to the shogun-data PR.
After that gets merged, what do I have to do?

gf712 · 2020-06-23T10:59:14Z


You need to first update the data commit hash here to use the latest commit you just pushed. And then when the CI passes we merge both PRs and that's it :)

karlnapf · 2020-06-23T11:15:06Z

src/shogun/multiclass/tree/CHAIDTree.cpp

+	SG_ADD(&m_weights,"weights", "weights", ParameterProperties::READONLY);
+	SG_ADD(&m_weights_set,"weights_set", "weights set", ParameterProperties::READONLY);
+	SG_ADD(&m_feature_types,"feature_types", "feature types", ParameterProperties::SETTING);
+	SG_ADD(&m_dependent_vartype,"dependent_vartype", "dependent variable type", ParameterProperties::SETTING);


This causes the notebooks to fail:
https://dev.azure.com/shogunml/shogun/_build/results?buildId=3629&view=logs&j=089c709a-44eb-5f6e-96e7-15e9ee1ff5bf&t=2da3e16b-a2b2-5f01-2cbe-a20d9528195b&l=1849

Should be simple to fix: open the notebook and edit the name in there as well :)

How do I test the notebooks on my local machine? Does make test do that?

There is a script for doing that https://github.com/shogun-toolbox/shogun/blob/develop/scripts/test_notebooks.sh.

The link Heiko pasted above also shows how to use it.

It doesn't. You would have to compile shogun with the Python interface, make sure you can load it from Python, and then open the notebook in Jupyter Notebook.

However, you might be able to do a simple hack here:

Open the notebook in a text editor

Search for m_

If it is one of the renamed parameters, change it to its new name

Save the file in the text editor and submit

As this is such a simple change, that should do it without the need for you running it locally
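The manual search-and-replace above can also be done mechanically. The helper below is a hypothetical sketch (rename_params is an illustrative name, not part of Shogun or its scripts) that applies an explicit old-to-new mapping to the notebook's raw text. Using an explicit list instead of stripping every m_ leaves unrelated names untouched, such as the m_nominal parameter passed to C45ClassifierTree in the same notebook. It does plain substring replacement, so a name that is a prefix of another (m_weights inside m_weights_set) is rewritten inside the longer name too; with this PR's naming that happens to give the intended weights_set.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Rename = std::pair<std::string, std::string>;

// Hypothetical helper: replaces every occurrence of each old name with its
// new name, in the order given. Only names listed in `renames` are touched.
std::string rename_params(std::string text, const std::vector<Rename>& renames)
{
	for (const Rename& r : renames)
	{
		const std::string& from = r.first;
		const std::string& to = r.second;
		if (from.empty())
			continue; // an empty pattern would match at every position
		std::string::size_type pos = 0;
		while ((pos = text.find(from, pos)) != std::string::npos)
		{
			text.replace(pos, from.size(), to);
			pos += to.size(); // continue after the inserted text
		}
	}
	return text;
}
```

The mapping would list only the CHAIDTree parameters renamed in this PR, for example {"m_dependent_vartype", "dependent_vartype"}.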

Yes, I have already pushed the code after making those changes. :)

karlnapf · 2020-06-23T11:15:55Z

@Hephaestus12 you also need to push the updated example here (as the code in the PR still extracts the weights)
And I just saw that a notebook fails from renaming the variable...easy to fix see my other comment

Hephaestus12 · 2020-06-23T11:59:15Z

The file changes are showing that some other data file has been deleted.

Hephaestus12 · 2020-06-23T12:03:35Z

Fixed.

Hephaestus12 · 2020-06-23T13:18:52Z

@geektoni Now, this PR shows that I have deleted the weighted_degree_string.dat file when you see the files changed tab.

karlnapf · 2020-06-23T13:20:08Z

doc/ipython-notebooks/multiclass/Tree/DecisionTrees.ipynb

@@ -1405,9 +1405,9 @@
   "source": [
    "def train_chaidtree(dependent_var_type,feature_types,num_bins,feats,labels):\n",
    "    # create CHAID tree object\n",
-    "    c = sg.create_machine(\"CHAIDTree\", m_dependent_vartype=dependent_var_type,\n",


I think you forgot some, I remember seeing one name in a "get" call...double check

Are you referring to this:

shogun/doc/ipython-notebooks/multiclass/Tree/DecisionTrees.ipynb

Line 613 in 123f512

" tree = sg.create_machine(\"C45ClassifierTree\", labels=labels, m_nominal=types)\n",

I don't think we should change this, right? It isn't related to the CHAID tree, and changing it would also require a change to the C45ClassifierTree source code.

Also, the other one is:

shogun/doc/ipython-notebooks/multiclass/Tree/DecisionTrees.ipynb

Line 643 in 123f512

" output_certainty=tree.get('m_certainty')\n",

This one too, is an instance of C45ClassifierTree. Should I change these two instances too?

ah sorry. of course you are right!
We will see it in the CI for the notebooks

karlnapf · 2020-06-23T13:21:40Z

This is because I just merged another data PR.
You will need to rebase your data PR, force push it.
Then update this PR with the new submodule and force push as well
(also double check the notebook I mentioned)

Hephaestus12 · 2020-06-23T15:24:25Z


I did what you said. Now there are 4 data file changes, and I don't understand why this is happening.

karlnapf · 2020-06-23T15:36:53Z

as you can see from the merge conflict for the submodule here, you haven't updated the submodule in this PR to the version of your (refactored!) data PR

Hephaestus12 · 2020-06-24T04:43:40Z

Now the merge conflict will resolve when we merge the shogun-data pr right?

karlnapf · 2020-06-24T07:20:09Z

Make sure to read about why merge conflicts happen in git and how to resolve them...

karlnapf · 2020-06-24T09:02:01Z

All you really need to do is to update the data submodule to the latest version in your data PR (and then force push again)

karlnapf · 2020-06-25T09:37:54Z

This shouldn't take so long to sort out, @Hephaestus12. If you have questions, come to IRC and ask; we are happy to help! Please make this a priority.

Hephaestus12 · 2020-06-25T10:29:04Z

Yes, I hope this is ready now?

karlnapf · 2020-06-25T12:15:20Z

Almost! See comment in other PR (in shogun-data, you should always squash your commits, in shogun-dev that is not necessary as we can do it when merging)

gf712 · 2020-06-25T12:42:35Z

examples/meta/src/regression/chaid_tree.sg.in

+
+#![set_feature_types]
+IntVector ft(1)
+ft[0] = 2


@karlnapf this is causing issues. In Octave this becomes a scalar value :( I wrote a fix for this, but it's in another branch that is not yet merged. Also, this throws an error in the meta example, but ctest doesn't pick it up (I had this issue before) and I am not sure why. The test only fails when the serialised outputs are compared in the integration test: nothing was serialised, because of the exception thrown when you put ft.

So, unless we merge your fix this meta example will fail only in Octave, right? Could it be possible to merge this PR anyway, but somehow excluding it from testing with Octave (since it is broken atm)? Just to not have to put this on hold indefinitely...

Maybe @Hephaestus12 can just fix it here? All you need to do is replace

shogun/src/interfaces/swig/shogun.i

Line 199 in 3041ea0

#ifdef SWIGR

with #if defined(SWIGR) || defined(SWIGOCTAVE)
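Background on this one-line fix: SWIG predefines a target-specific macro (SWIGR, SWIGOCTAVE, SWIGPYTHON, ...) while processing an interface file for that target, so widening the guard makes whatever block follows it in shogun.i apply to Octave as well as R. The guarded block's contents are not shown in this thread; the hypothetical sketch below only demonstrates how the combined condition selects a branch, with placeholder return strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration of the widened guard. SWIG predefines SWIGR or
// SWIGOCTAVE while processing an interface file for that target; outside
// SWIG you can emulate that by compiling with -DSWIGR or -DSWIGOCTAVE.
std::string selected_block()
{
#if defined(SWIGR) || defined(SWIGOCTAVE)
	return "R/Octave-specific block"; // taken for either target after the fix
#else
	return "default block"; // taken for every other target
#endif
}
```

Compiled without either macro defined, selected_block() returns "default block"; adding -DSWIGOCTAVE switches it to the R/Octave branch, which the original #ifdef SWIGR alone would not have done.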

Yes, I'll do this.

I hope this works :D

karlnapf · 2020-06-25T15:42:04Z

ok data is not all in sync.... let's see what the CI says

karlnapf · 2020-06-25T17:33:40Z

Looks good, I'll merge :)

karlnapf · 2020-06-25T17:34:35Z

thanks! This was a nice one! :)

Hephaestus12 · 2020-06-25T18:00:32Z

Yayayay

^ sorry for this, I'm just really relieved. xD

gf712 reviewed Jun 12, 2020


gf712 reviewed Jun 13, 2020


Hephaestus12 mentioned this pull request Jun 20, 2020

add data for CHAID tree regression shogun-toolbox/shogun-data#198

Merged

karlnapf reviewed Jun 21, 2020


karlnapf reviewed Jun 23, 2020


Hephaestus12 force-pushed the feature/add_meta_example branch from 3c0e76f to 94e2179 on June 23, 2020 11:43

karlnapf reviewed Jun 23, 2020


Hephaestus12 force-pushed the feature/add_meta_example branch from fb41115 to a5ad4d3 on June 24, 2020 13:34

Hephaestus12 added 8 commits June 25, 2020 15:43

add metaexample for CHAIDTree Regression

a3dcdfd

Fix tiny typo.

b89ba4c

set labels in constructor

1f9edbc

remove prefix m from paramteres

95a2af7

changed filename

40613e8

add data and meta-example

7c843e2

make change in notebook

4d666ac

Re-add data for meta example.

f735ae1

Hephaestus12 force-pushed the feature/add_meta_example branch from a5ad4d3 to f735ae1 on June 25, 2020 10:19

changed local branch in data repo

ea5a144

gf712 reviewed Jun 25, 2020


Hephaestus12 added 2 commits June 25, 2020 20:06

Edit SWIG interface.

b66ca9a

squashed data commits

ad7341f

karlnapf merged commit 8c83d7e into shogun-toolbox:develop Jun 25, 2020
