Added CKernelDependenceMaximization and CBAHSIC in feature selection framework #2363
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
/* | ||
* Copyright (c) The Shogun Machine Learning Toolbox | ||
* Written (w) 2014 Soumyajit De | ||
* All rights reserved. | ||
* | ||
* Redistribution and use in source and binary forms, with or without | ||
* modification, are permitted provided that the following conditions are met: | ||
* | ||
* 1. Redistributions of source code must retain the above copyright notice, this | ||
* list of conditions and the following disclaimer. | ||
* 2. Redistributions in binary form must reproduce the above copyright notice, | ||
* this list of conditions and the following disclaimer in the documentation | ||
* and/or other materials provided with the distribution. | ||
* | ||
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | ||
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR | ||
* ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES | ||
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; | ||
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND | ||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
* | ||
* The views and conclusions contained in the software and documentation are those | ||
* of the authors and should not be interpreted as representing official policies, | ||
* either expressed or implied, of the Shogun Development Team. | ||
*/ | ||
|
||
#include <shogun/statistics/HSIC.h> | ||
#include <shogun/preprocessor/BAHSIC.h> | ||
|
||
using namespace shogun; | ||
|
||
CBAHSIC::CBAHSIC() : CKernelDependenceMaximization() | ||
{ | ||
init(); | ||
} | ||
|
||
void CBAHSIC::init() | ||
{ | ||
m_estimator=new CHSIC(); | ||
SG_REF(m_estimator); | ||
m_algorithm=BACKWARD_ELIMINATION; | ||
} | ||
|
||
CBAHSIC::~CBAHSIC() | ||
{ | ||
// estimator is SG_UNREF'ed in base CDependenceMaximization destructor | ||
} | ||
|
||
void CBAHSIC::set_algorithm(EFeatureSelectionAlgorithm algorithm) | ||
{ | ||
SG_INFO("Algorithm is set to BACKWARD_ELIMINATION for %s and therefore " | ||
"cannot be set externally!\n", get_name()); | ||
} | ||
|
||
EPreprocessorType CBAHSIC::get_type() const | ||
{ | ||
return P_BAHSIC; | ||
} |
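The overridden `set_algorithm()` above ignores its argument and only logs a notice, so the algorithm chosen in `init()` stays in place. A minimal standalone sketch of that pattern follows. It does not use Shogun: `Algorithm`, `DependenceMaximizationSketch`, `BahsicSketch`, and `demo_locked_setter` are hypothetical stand-ins made up for illustration, and the base-class default is an assumption.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-ins for the Shogun types in this diff; not the real API.
enum class Algorithm { BackwardElimination, ForwardSelection };

class DependenceMaximizationSketch
{
public:
    virtual ~DependenceMaximizationSketch() = default;

    // The base class lets callers choose the algorithm freely.
    virtual void set_algorithm(Algorithm algorithm) { m_algorithm = algorithm; }
    Algorithm get_algorithm() const { return m_algorithm; }
    virtual const char* get_name() const { return "DependenceMaximizationSketch"; }

protected:
    // Assumed default, chosen only so the sketch has a visible state change.
    Algorithm m_algorithm = Algorithm::ForwardSelection;
};

class BahsicSketch : public DependenceMaximizationSketch
{
public:
    BahsicSketch() { m_algorithm = Algorithm::BackwardElimination; }

    // Ignore the request and log a notice instead of changing state.
    void set_algorithm(Algorithm) override
    {
        std::clog << "Algorithm is fixed to BackwardElimination for "
                  << get_name() << " and cannot be set externally\n";
    }

    const char* get_name() const override { return "BahsicSketch"; }
};

// Usage: the setter call is accepted but leaves the algorithm unchanged.
inline void demo_locked_setter()
{
    BahsicSketch bahsic;
    bahsic.set_algorithm(Algorithm::ForwardSelection);
    assert(bahsic.get_algorithm() == Algorithm::BackwardElimination);
}
```

One trade-off of this design is that a caller holding a base-class pointer gets a silent no-op (plus a log line) rather than an error, which may be why the class documentation spells the restriction out.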
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
/* | ||
* Copyright (c) The Shogun Machine Learning Toolbox | ||
* Written (w) 2014 Soumyajit De | ||
* All rights reserved. | ||
* | ||
* Redistribution and use in source and binary forms, with or without | ||
* modification, are permitted provided that the following conditions are met: | ||
* | ||
* 1. Redistributions of source code must retain the above copyright notice, this | ||
* list of conditions and the following disclaimer. | ||
* 2. Redistributions in binary form must reproduce the above copyright notice, | ||
* this list of conditions and the following disclaimer in the documentation | ||
* and/or other materials provided with the distribution. | ||
* | ||
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | ||
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR | ||
* ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES | ||
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; | ||
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND | ||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
* | ||
* The views and conclusions contained in the software and documentation are those | ||
* of the authors and should not be interpreted as representing official policies, | ||
* either expressed or implied, of the Shogun Development Team. | ||
*/ | ||
|
||
#ifndef BAHSIC_H__ | ||
#define BAHSIC_H__ | ||
|
||
#include <shogun/lib/config.h> | ||
#include <shogun/preprocessor/KernelDependenceMaximization.h> | ||
|
||
namespace shogun | ||
{ | ||
|
||
/** @brief Class CBAHSIC, that extends CKernelDependenceMaximization and uses | ||
* HSIC [1] to compute dependence measures for feature selection using a | ||
* backward elimination approach as described in [1]. This class serves as a | ||
* convenience class that initializes the CDependenceMaximization#m_estimator | ||
maybe the class header of both of those should mention the memory requirements additional to the HSIC computation itself?
@karlnapf the base class CDependenceMaximization doc already mentions the additional memory requirement. These two classes don't add anything extra on the top of that.
ok! |
||
* with an instance of CHSIC and allows only the ::BACKWARD_ELIMINATION algorithm, | ||
* which is set internally. Therefore, trying to select another algorithm | ||
* via set_algorithm() has no effect. Please see the class documentation of CHSIC | ||
* and [2] for more details on the mathematical description of HSIC. | ||
* | ||
* References: | ||
* [1] Song, Le and Bedo, Justin and Borgwardt, Karsten M. and Gretton, Arthur | ||
could we try to reproduce an example from the paper in the notebook for this? |
||
* and Smola, Alex. (2007). Gene Selection via the BAHSIC Family of Algorithms. | ||
* Bioinformatics, Volume 23, pages i490-i498. Oxford University Press, | ||
* Oxford, UK. | ||
* [2] Gretton, A., Fukumizu, K., Teo, C., & Song, L. (2008). A kernel | ||
* statistical test of independence. Advances in Neural Information Processing | ||
* Systems, 1-8. | ||
*/ | ||
class CBAHSIC : public CKernelDependenceMaximization | ||
{ | ||
public: | ||
/** Default constructor */ | ||
CBAHSIC(); | ||
|
||
/** Destructor */ | ||
virtual ~CBAHSIC(); | ||
|
||
/** | ||
* Since only the ::BACKWARD_ELIMINATION algorithm is applicable for BAHSIC | ||
* and it is set internally, this method is overridden to prevent the | ||
* algorithm from being changed through the public API. | ||
* | ||
* @param algorithm the feature selection algorithm to use | ||
*/ | ||
virtual void set_algorithm(EFeatureSelectionAlgorithm algorithm); | ||
|
||
/** @return the preprocessor type */ | ||
virtual EPreprocessorType get_type() const; | ||
|
||
/** @return the class name */ | ||
virtual const char* get_name() const | ||
{ | ||
return "BAHSIC"; | ||
} | ||
|
||
private: | ||
/** Register params and initialize with default values */ | ||
void init(); | ||
|
||
}; | ||
|
||
} | ||
#endif // BAHSIC_H__ |
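The class documentation above describes HSIC-based backward elimination without showing the computation. The following standalone sketch illustrates the idea under simplifying assumptions: it uses the biased estimate trace(KHLH)/m^2 from [2], a Gaussian kernel of fixed width on both features and labels, and removes exactly one feature per iteration. It is not the Shogun implementation, and `Matrix`, `gaussian_kernel`, `hsic_biased`, and `backward_elimination` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<double>>; // rows = samples, columns = features

// Gaussian kernel matrix computed over the given feature columns only.
Matrix gaussian_kernel(const Matrix& data, const std::vector<std::size_t>& cols, double width)
{
    const std::size_t m = data.size();
    Matrix K(m, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j)
        {
            double sq = 0.0;
            for (std::size_t c : cols)
            {
                const double d = data[i][c] - data[j][c];
                sq += d * d;
            }
            K[i][j] = std::exp(-sq / width);
        }
    return K;
}

// Biased HSIC estimate trace(K H L H) / m^2 with H = I - (1/m) 1 1^T.
// K and L are assumed symmetric, so centring K alone is sufficient.
double hsic_biased(const Matrix& K, const Matrix& L)
{
    const std::size_t m = K.size();
    assert(m > 0 && L.size() == m);
    std::vector<double> row_mean(m, 0.0);
    double grand_mean = 0.0;
    for (std::size_t i = 0; i < m; ++i)
    {
        double row_sum = 0.0;
        for (std::size_t j = 0; j < m; ++j)
            row_sum += K[i][j];
        row_mean[i] = row_sum / static_cast<double>(m);
        grand_mean += row_mean[i];
    }
    grand_mean /= static_cast<double>(m);

    double sum = 0.0;
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j)
            sum += (K[i][j] - row_mean[i] - row_mean[j] + grand_mean) * L[i][j];
    return sum / static_cast<double>(m * m);
}

// Repeatedly drop the feature whose removal keeps HSIC with the labels highest.
std::vector<std::size_t> backward_elimination(const Matrix& data, const std::vector<double>& labels,
                                              std::size_t num_to_keep, double width)
{
    assert(!data.empty() && data.size() == labels.size());
    Matrix label_column;
    for (double y : labels)
        label_column.push_back(std::vector<double>{y});
    const Matrix L = gaussian_kernel(label_column, std::vector<std::size_t>{0}, width);

    std::vector<std::size_t> selected(data[0].size());
    std::iota(selected.begin(), selected.end(), std::size_t{0});

    while (selected.size() > num_to_keep)
    {
        double best_score = -std::numeric_limits<double>::infinity();
        std::size_t best_pos = 0;
        for (std::size_t pos = 0; pos < selected.size(); ++pos)
        {
            std::vector<std::size_t> remaining = selected;
            remaining.erase(remaining.begin() + static_cast<std::ptrdiff_t>(pos));
            const double score = hsic_biased(gaussian_kernel(data, remaining, width), L);
            if (score > best_score)
            {
                best_score = score;
                best_pos = pos;
            }
        }
        selected.erase(selected.begin() + static_cast<std::ptrdiff_t>(best_pos));
    }
    return selected;
}
```

Calling `backward_elimination(data, labels, 1, 1.0)` on a dataset whose first column equals the labels and whose second column is constant keeps the first column, because a constant feature yields a centred kernel of all zeros. Each candidate evaluation here rebuilds an m-by-m kernel matrix, so the sketch makes the memory and time cost of the approach easy to see.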
sure this thing even needs to be exposed to modular?
@karlnapf ummm.. well, it's in the public API of CFeatures.. so.. But since we have kept the num_features thing out of CFeatures because it doesn't make sense for all feature types, maybe this method should not be here at all. Maybe a helper method in CFeatureSelection should handle it, like CFeatureSelection::get_num_features. What do you think?
mmmh, no, hiding it in such a specialised class is not a good idea. Then rather keep it public and expose it. I just thought nobody might ever call this, so rather hide it to not confuse people. But maybe somebody actually wants to call it, so keep this stuff, sorry for the confusion :)
@karlnapf well, I thought that, since our last discussion with @vigsterkr and @sonney2k regarding this num_features thing, having a copy_dimension_subset in CFeatures would ultimately result in this method being unimplemented in all feature classes except for CDenseFeatures and CSparseFeatures :(
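The concern in this thread, a base-class method that only a few feature types can implement, can be sketched roughly as follows. This is an illustration only: `FeaturesSketch`, `DenseFeaturesSketch`, and `StringFeaturesSketch` are hypothetical classes, and the `copy_dimension_subset` signature shown here is an assumption, not the one in Shogun.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical base class: the method is public, but the default refuses to run.
class FeaturesSketch
{
public:
    virtual ~FeaturesSketch() = default;

    virtual std::unique_ptr<FeaturesSketch> copy_dimension_subset(
        const std::vector<std::size_t>&) const
    {
        throw std::logic_error("copy_dimension_subset is not implemented for this feature type");
    }
};

// Dense features have a well-defined notion of dimensions, so they override it.
class DenseFeaturesSketch : public FeaturesSketch
{
public:
    explicit DenseFeaturesSketch(std::vector<std::vector<double>> dimensions)
        : m_dimensions(std::move(dimensions))
    {
    }

    std::unique_ptr<FeaturesSketch> copy_dimension_subset(
        const std::vector<std::size_t>& dims) const override
    {
        std::vector<std::vector<double>> subset;
        subset.reserve(dims.size());
        for (std::size_t d : dims)
            subset.push_back(m_dimensions.at(d)); // at() throws on a bad index
        return std::make_unique<DenseFeaturesSketch>(std::move(subset));
    }

    std::size_t num_dimensions() const { return m_dimensions.size(); }
    const std::vector<double>& dimension(std::size_t d) const { return m_dimensions.at(d); }

private:
    std::vector<std::vector<double>> m_dimensions; // one inner vector per dimension
};

// A feature type without fixed dimensions simply inherits the failing default.
class StringFeaturesSketch : public FeaturesSketch
{
};
```

With this layout, feature selection code can call `copy_dimension_subset` through the base type, and unsupported feature types fail loudly at run time, which is the trade-off the comments above are weighing against moving the logic into a helper on the feature selection class.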