-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrapper FS based on Cuckoo Search algorithm #335
base: main
Are you sure you want to change the base?
Conversation
Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application. When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated. If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public. |
Thanks for the contribution. Once you've sorted out the OCA I can go through and review this, but I'm not allowed to review it until the OCA check passes. |
I signed the OCA form 4 days ago. And now I am waiting for the confirmation. |
Ok, thanks. Once it's gone through the system we can get going with this. It shouldn't take too long. |
Thank you. |
It looks like there were some issues with the OCA you submitted, could you check the status of it and fix the issues it describes? |
The problem was that in the first OCA form, I forgot to write down my GitHub username. But few moments ago, I signed the OCA, and I provided my GitHub username. |
Dear Adam, Is there anyway to get a list or an array of the feature values from the 'selectedFeatureDataset' ?, such as getting the array of values of a certain feature based on the index. That this I need in order to compute functions like Pearson Correlation Coefficient between two features for some purposes |
Not easily from the selected feature dataset as it's stored row wise. The current information theoretic feature selection dataset converts the data into a columnar storage format before running the feature selection procedure. Depending on what you want to do it might be more efficient to do that, like if you need to compute the correlations between all features. Otherwise you pay a |
I want to compute the correlation between the entire features in the selected feature dataset and how can I construct the d*n matrix using tribuo engine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, sorry it took a while to review. The implementation looks interesting, but there's a bit of work required to get it to play nicely with Tribuo's provenance and configuration systems. Also could you add a basic unit test which runs a feature selection operation, similar to the other tests in the FeatureSelection subproject?
@@ -0,0 +1,27 @@ | |||
package FS_Wrapper_Approaches.Discreeting; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base package name needs updating to org.tribuo.classification.fs.wrapper
and then the files should be moved into the right directory. At the moment this won't compile because the directory and package names don't line up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also all the files need the copyright and license header.
@@ -0,0 +1,27 @@ | |||
package FS_Wrapper_Approaches.Discreeting; | |||
|
|||
import static org.apache.commons.math3.special.Erf.erf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tribuo no longer depends on Apache Commons Math 3, and this import isn't used.
* Enumeration that contains the types of transfer functions in which they are used to define the type of transfer function | ||
*/ | ||
public enum TransferFunction { | ||
V1, V2, V3, V4, S1, S2, S3, S4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of discreteValue could be folded in here, making the enum contain a DoubleUnaryFunction
which performs the necessary operation, and then it could have a discretise
method which applies the function the output. Then the Binarizing
interface can be removed.
import org.tribuo.classification.Label; | ||
import org.tribuo.classification.evaluation.LabelEvaluation; | ||
import org.tribuo.classification.evaluation.LabelEvaluator; | ||
import org.tribuo.common.nearest.KNNClassifierOptions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to avoid surfacing the options classes in the library code as they are not under the same compatibility guarantees as the library code. It's probably better to import KNNTrainer
directly.
/** | ||
* This interface includes the evaluation function of each solution | ||
*/ | ||
public interface FitnessFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface only contains static members. For the time being those methods can be moved onto CuckooSearchOptimizer
and be private, then if we need to use them from other places they can be easily moved and made more accessible.
private final double delta; | ||
private final int populationSize; | ||
private int [][] setOfSolutions; | ||
private final int maxIteration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the final variables here should not be final
and need to be tagged @Config
(with appropriate description fields) so they are automatically captured by the provenance system.
* @param totalNumberOfFeatures The number of features in the given dataset | ||
* @return The population of subsets of selected features | ||
*/ | ||
private int[][] GeneratePopulation(int totalNumberOfFeatures) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned elsewhere this method should accept a SplittableRandom
rather than make a fresh RNG each time.
/** | ||
* This record is used to hold subset of features with its corresponding fitness score | ||
*/ | ||
record FeatureSet_FScore_Container(int[] subSet, double score) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CuckooSearchFeatureSet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this record in order to store the solutions with their corresponding fitness values in order to use this to arrange the solutions based on their fitness scores using Comparator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I was suggesting this record should be renamed CuckooSearchFeatureSet
as it's specific to this approach and it's currently got a very general name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I am sorry, I don't get your note as it should be. You are right I should change the name, and I am currently working on the FJP issue. Thanks Dr. Adam for your suggestions. All of your suggestions are suitable, and I know that I still need more time to learn more and more in programming in Java (my best language)
KNNClassifierOptions classifier = new KNNClassifierOptions(); | ||
CrossValidation<Label, LabelEvaluation> crossValidation = new CrossValidation<>(classifier.getTrainer(), selectedFeatureDataset, new LabelEvaluator(), 10); | ||
double avgAccuracy = 0d; | ||
for (Pair<LabelEvaluation, Model<Label>> ACC : crossValidation.evaluate()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always use curly braces even for single line for loops and if statements.
import FS_Wrapper_Approaches.Discreeting.Binarizing; | ||
import FS_Wrapper_Approaches.Discreeting.TransferFunction; | ||
import com.oracle.labs.mlrg.olcut.util.Pair; | ||
import org.tribuo.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All imports should be explicit, no star imports for packages.
Thank you for your suggestions, all of them are great and I will do them, but currently I am working on my PhD proposal and I have meetings with my advisor during the current month, so I will make these suggestions during the next one. |
…ification/fs/wrapper directory
Description
< Adding a wrapper FS approach based on binary Cuckoo Search algorithm to Tribuo engine>
Motivation
< Recently Evolutionary Computation have shown alot of good results in a wide range of technologies especially in engineering problems. Moreover, many published research papers include the utilizing of EO algorithms for FS and inform good results. Therefore, adding a wrapper FS model to Tribuo engine would be great >
< Please link any relevant issues or PRs >
Paper reference
< Rodrigues, D. et al. BCS: A Binary Cuckoo Search algorithm for feature selection. Proc. - IEEE Int. Symp. Circuits Syst. 465–468 (2013) doi:10.1109/ISCAS.2013.6571881. >