Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

id3 algorithm #1947

Closed
wants to merge 5 commits into from
Closed

Conversation

mazumdarparijat
Copy link
Contributor

@mazumdarparijat
Copy link
Contributor Author

@iglesias I have addressed monicadragen's PR on id3. While I had to rework most parts of the original commits, I have tried to keep the things same wherever possible. Please have a look!
I will add unit-tests to this very soon.
Its a relatively big PR. You may start reviewing from the tree structures and node structures that I have introduced. IMO, thats the best place to start.

{
float64_t gain = informational_gain_attribute(i,feats,class_labels);

if(gain > max){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (asdasd)
(whitespace)

@karlnapf
Copy link
Member

karlnapf commented Mar 6, 2014

Hi @mazumdarparijat
Wow, impressive work! :)
However, this patch is too big to review. We should not merge it like this as many things might slip through our hands.

I suggest the following approach:

  • Remove everything from it for now (keep a local copy).
  • Then add just one class (two files max) - the most basic one.
  • Then add unit tests just for those.
  • Send a PR.

This should not include more than 3 files and not more than a few hundred lines. It will be much easier to think about the code, what it is supposed to do, how it should be documented, what unit tests should there be.
Then we can discuss those changes (with the help of @iglesias ), and merge.

Then you add another class (few files) and their unit tests. And so on.

What do you think about that?

@mazumdarparijat
Copy link
Contributor Author

@karlnapf Let me do it in 2 parts then. In the 1st part, I will send in just the tree/node structures that I have created. In the second part, I will send in the id3 algorithm class. @iglesias ?

{

SG_REF(feats);
SG_UNREF(m_feats);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix indenting... and this is for the whole file.
most editor has a script for this nowadays

@vigsterkr
Copy link
Member

the problems with this patch is actually are:

  • CID3Classifier is implemented twice: once within the library and once within the examples. which i don't understand why would anybody do this?
  • a yet another tree is introduced to the library: there's really no need for that as there's already like 3 tree implementations in within the library. If the current multiclass/tree/TreeMachine.h is not generic enough for you, just extend/refactor it as you need, but please don't introduce a new class....

thank you for the patch. contrary to @karlnapf comment, i think you could continue this pull request, as soon as you remove all those things i've mentioned above, the PR will be much much less, and easier to review

@@ -0,0 +1,97 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header file among examples? why?
this file should not be here (examples) at all...

@mazumdarparijat
Copy link
Contributor Author

@vigsterkr These (classifier_multiclass_id3.cpp/.h) are actually artefacts of monicadragan's patch that I applied. I moved this file, renamed and also refactored it massively. I am not sure why this is showing up still.
Since this PR is very long and messed up quite a bit, I have issued a fresh PR containing only the data structures created by me for id3 (which is the 1st part). Once that is merged, I will issue another PR for id3 algorithm. That way, it will be easier to review.
But you may still have a look at the last commit of this PR to get an idea of how I have implemented id3. Thanks! :)
I am closing this for now then!

@karlnapf
Copy link
Member

karlnapf commented Mar 7, 2014

@vigsterkr I agree with you on the structure. The point is that it is impossible to review all of this at once. Rather add things one-by-one. If there is a refactoring of some existing class, do that first (including tests etc) Then, in a separate PR, add new classes. Then finally examples.
Otherwise things slip through.

@vigsterkr
Copy link
Member

@karlnapf yeah like the LibLinear patch right? :) ok i'm just pulling your legs, don't take it personally...

@karlnapf
Copy link
Member

karlnapf commented Mar 7, 2014

@vigsterkr exactly like that ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants