-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add metaexample for CHAIDTree Regression #5065
add metaexample for CHAIDTree Regression #5065
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, now you just have to add the data file :)
|
||
#![create_machine] | ||
Machine chaidtree = create_machine("CHAIDTree", dependent_vartype=2, feature_types=ft, num_breakpoints=50) | ||
chaidtree.set_labels(labels_train) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you pass labels either to the constructor or use put instead please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll do that.
Isn't it already present? |
This says I should copy the chaidtree.dat file from the |
You need to also run the meta example (using ctest) and that will generate the output file. You should use the cpp meta example output |
#![set_feature_types] | ||
|
||
#![create_machine] | ||
Machine chaidtree = create_machine("CHAIDTree", labels=labels_train, dependent_vartype=2, feature_types=ft, num_breakpoints=50) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the error is here: if you have a look at the parameter registration they all have a m_
prefix. You should remove that m_
for the parameter name
shogun/src/shogun/multiclass/tree/CHAIDTree.cpp
Lines 1395 to 1404 in 7483101
SG_ADD(&m_weights,"m_weights", "weights", ParameterProperties::READONLY); | |
SG_ADD(&m_weights_set,"m_weights_set", "weights set", ParameterProperties::READONLY); | |
SG_ADD(&m_feature_types,"m_feature_types", "feature types", ParameterProperties::SETTING); | |
SG_ADD(&m_dependent_vartype,"m_dependent_vartype", "dependent variable type", ParameterProperties::SETTING); | |
SG_ADD(&m_max_tree_depth,"m_max_tree_depth", "max tree depth", ParameterProperties::HYPER); | |
SG_ADD(&m_min_node_size,"m_min_node_size", "min node size", ParameterProperties::SETTING); | |
SG_ADD(&m_alpha_merge,"m_alpha_merge", "alpha-merge", ParameterProperties::HYPER); | |
SG_ADD(&m_alpha_split,"m_alpha_split", "alpha-split", ParameterProperties::HYPER); | |
SG_ADD(&m_cont_breakpoints,"m_cont_breakpoints", "breakpoints in continuous attributes", ParameterProperties::SETTING); | |
SG_ADD(&m_num_breakpoints,"m_num_breakpoints", "number of breakpoints", ParameterProperties::HYPER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay, I'll do that.
When I run make test I get the following error:
I removed the regression-random-forest undocumented example as the random forest regression meta example already exists. Is that what is leading to these errors? |
No, that error is not related to this PR. see #5060 |
I tried running make test and the
However, the .dat file is still not getting created.(I have removed the m from the parameter names)
When I run ctest for the single test I get the following error:
@geektoni libhdf5 strikes again, should I make a new environment and set everything up all over again? |
you need to find |
Could you name the files chaid_tree? Ie with an underscore. Just to tidy up a bit (also for future prs on examples) thx |
Just disable hdf5 in cmake... I do this locally as it causes problems otherwise |
|
||
#![extract_weights_labels] | ||
RealVector labels_vector = labels_predict.get_real_vector("labels") | ||
RealVector weights = chaidtree.get_real_vector("weights") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are set by the user so I think this doesn't need to be extracted as discussed in the data PR.
Once you have removed this, you have to regenerate the data, and update the data PR, then update this PR (including the submodule)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the weights are still extracted here, you will need to remove that
I have pushed the new data to the shogun-data PR. |
You need to first update the data commit hash here to use the latest commit you just pushed. And then when the CI passes we merge both PRs and that's it :) |
SG_ADD(&m_weights,"weights", "weights", ParameterProperties::READONLY); | ||
SG_ADD(&m_weights_set,"weights_set", "weights set", ParameterProperties::READONLY); | ||
SG_ADD(&m_feature_types,"feature_types", "feature types", ParameterProperties::SETTING); | ||
SG_ADD(&m_dependent_vartype,"dependent_vartype", "dependent variable type", ParameterProperties::SETTING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes the notebooks to fail:
https://dev.azure.com/shogunml/shogun/_build/results?buildId=3629&view=logs&j=089c709a-44eb-5f6e-96e7-15e9ee1ff5bf&t=2da3e16b-a2b2-5f01-2cbe-a20d9528195b&l=1849
Should be simple to fix: open the notebook and edit the name in there as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I run test the notebooks on my local machine? Does make test
do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a script for doing that https://github.com/shogun-toolbox/shogun/blob/develop/scripts/test_notebooks.sh.
The link Heiko pasted here above will show also how to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesnt. You would have to compile shogun with the python interface, make sure you can load it from python, and then open the notebook in jupyter notebook.
However, you might be able to do a simple hack here:
- Open the notebook in a texteditor
- Search for
m_
- If it is one of the varnames, change it to the values you updated them to
- Save the file in the texteditor and submit
As this is such a simple change, that should do it without the need for you running it locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have already pushed the code after making those changes. :)
@Hephaestus12 you also need to push the updated example here (as the code in the PR still extracts the weights) |
3c0e76f
to
94e2179
Compare
The file changes are showing that some other data file has been deleted. |
Fixed. |
@geektoni Now, this PR shows that I have deleted the |
@@ -1405,9 +1405,9 @@ | |||
"source": [ | |||
"def train_chaidtree(dependent_var_type,feature_types,num_bins,feats,labels):\n", | |||
" # create CHAID tree object\n", | |||
" c = sg.create_machine(\"CHAIDTree\", m_dependent_vartype=dependent_var_type,\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perfect!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you forgot some, I remember seeing one name in a "get" call...double check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to this:
" tree = sg.create_machine(\"C45ClassifierTree\", labels=labels, m_nominal=types)\n", |
I don't think we should change this right? As it isn't related to CHAID tree? As then we will have to make some other change in the source code related to C45ClassifierTree
Also, the other one is :
" output_certainty=tree.get('m_certainty')\n", |
This one too, is an instance of C45ClassifierTree
. Should I change these two instances too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sorry. of course you are right!
We will see it in the CI for the notebooks
This is because I just merged another data PR. |
I did what you said, Now there are 4 data file changes, I don't understand why this is happening. |
as you can see from the merge conflict for the submodule here, you haven't updated the submodule in this PR to the version of your (refactored!) data PR |
Now the merge conflict will resolve when we merge the shogun-data pr right? |
Make sure to read about why merge conflicts happen in git and how to resolve them... |
All you really need to do is to update the data submodule to the latest version in your data PR (and then force push again) |
fb41115
to
a5ad4d3
Compare
This shouldn't take so long to sort out @Hephaestus12 if you have questions, come to irc and ask, we are happy to help! Please make this a priority |
a5ad4d3
to
f735ae1
Compare
Yes, I hope this is ready now? |
Almost! See comment in other PR (in shogun-data, you should always squash your commits, in shogun-dev that is not necessary as we can do it when merging) |
|
||
#![set_feature_types] | ||
IntVector ft(1) | ||
ft[0] = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlnapf this is causing issues. In octave this becomes a scalar value :( I wrote a fix for this but it's in another branch, not yet merged.. Also this throws an error in the meta example, but ctest doesn't pick this up (I had this issue before) and I am not sure why. The test only fails when comparing the serialised outputs in the integration test, because this will not have serialised anything because of the exception thrown when you put ft
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, unless we merge your fix this meta example will fail only in Octave, right? Could it be possible to merge this PR anyway, but somehow excluding it from testing with Octave (since it is broken atm)? Just to not have to put this on hold indefinitely...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @Hephaestus12 can just fix it here? All you need to do is replace
shogun/src/interfaces/swig/shogun.i
Line 199 in 3041ea0
#ifdef SWIGR |
with #if defined(SWIGR) || defined(SWIGOCTAVE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope this works :D
ok data is not all in sync.... let's see what the CI says |
Looks good, I'll merge :) |
thanks! This was a nice one! :) |
Yayayay ^ sorry for this, I'm just really relieved. xD |
Also removed Random forest regression undocumented example as it is already ported.