using pca in pipeline #116

Open
AdoHaha opened this issue Mar 9, 2017 · 2 comments

@AdoHaha

AdoHaha commented Mar 9, 2017

Hi,
Can PCA be used in a pipeline as a feature extractor?
It is listed in the Feature Extraction examples, but when I try to add it to the pipeline via pipeline.addFeatureExtractionModule(pca); I get this error: no known conversion for argument 1 from ‘GRT::PrincipalComponentAnalysis’ to ‘const GRT::FeatureExtraction&’.

Also, can PCA be trained on TimeSeriesClassificationData, or only on matrix-type data?

@nickgillian
Owner

Hey,

In the current master branch, PrincipalComponentAnalysis can't be used directly as a feature extraction module. To use it for feature extraction, you need to run PCA outside of the pipeline and then pass the output of the PrincipalComponentAnalysis module to the pipeline as its input.
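As a rough illustration of that workaround, here is a minimal, self-contained sketch of the "project first, then feed the result onward" flow. ToyPCA and projectAll are hypothetical stand-ins written for this example, not GRT classes or GRT API calls, and the toy model only keeps the first principal component (found by power iteration).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Sample = std::vector<double>;
using Dataset = std::vector<Sample>;

// Hypothetical stand-in for a PCA model (first component only).
// This is NOT GRT::PrincipalComponentAnalysis or its API.
struct ToyPCA {
    Sample mean;       // per-dimension mean of the training data
    Sample component;  // unit-length first principal component

    // Fit the mean and the dominant eigenvector of the covariance matrix.
    void fit(const Dataset& data, int iterations = 100) {
        assert(!data.empty() && !data[0].empty());
        const std::size_t d = data[0].size();
        const double n = static_cast<double>(data.size());

        mean.assign(d, 0.0);
        for (const Sample& s : data)
            for (std::size_t i = 0; i < d; ++i) mean[i] += s[i] / n;

        // Covariance matrix of the mean-centred data.
        std::vector<Sample> cov(d, Sample(d, 0.0));
        for (const Sample& s : data)
            for (std::size_t i = 0; i < d; ++i)
                for (std::size_t j = 0; j < d; ++j)
                    cov[i][j] += (s[i] - mean[i]) * (s[j] - mean[j]) / n;

        // Power iteration: repeatedly apply cov and renormalise.
        component.assign(d, 1.0 / std::sqrt(static_cast<double>(d)));
        for (int it = 0; it < iterations; ++it) {
            Sample next(d, 0.0);
            for (std::size_t i = 0; i < d; ++i)
                for (std::size_t j = 0; j < d; ++j)
                    next[i] += cov[i][j] * component[j];
            double norm = 0.0;
            for (double v : next) norm += v * v;
            norm = std::sqrt(norm);
            if (norm == 0.0) break;  // no variance: keep the current vector
            for (std::size_t i = 0; i < d; ++i) component[i] = next[i] / norm;
        }
    }

    // Project one sample onto the first principal component.
    double project(const Sample& x) const {
        assert(x.size() == mean.size());
        double p = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
            p += (x[i] - mean[i]) * component[i];
        return p;
    }
};

// Run the projection outside the pipeline; the returned one-dimensional
// samples are what would then be handed to the pipeline as its input.
Dataset projectAll(const ToyPCA& pca, const Dataset& data) {
    Dataset out;
    for (const Sample& s : data) out.push_back({pca.project(s)});
    return out;
}
```

For example, fitting ToyPCA on the points (0,0), (1,2), (2,4) and calling projectAll on the same data yields one-dimensional samples of roughly -2.236, 0 and 2.236. The same fitted model has to be applied to any new sample before it is passed to the pipeline for prediction.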

To help improve this, I've added a new PCA module to the toolkit which allows the PrincipalComponentAnalysis algorithm to be used directly within a pipeline as a feature extraction module.

You can find this new PCA module in the dev branch: https://github.com/nickgillian/grt/tree/dev/GRT/FeatureExtractionModules/PCA

You can find an example of how to use this here: https://github.com/nickgillian/grt/blob/dev/examples/FeatureExtractionModules/PCAPipelineExample/PCAPipelineExample.cpp

I still need to test this fully, so there may be bugs/issues (which is why it is still in the dev branch and not merged with master).

One note: there is currently a hack in how you need to use the PCA module. The PCA module has to be trained before you can use it, so you need to add the PCA module to the pipeline, then get a pointer to the PCA module from the pipeline, then train the PCA model with your dataset. You can see this hack in the example above. This is bad for two reasons:

  1. It means you can't have any module before the PCA module in the pipeline (because the training data will not be pumped through that earlier module before it reaches the PCA module).
  2. The coding flow is rather ugly (as you need to add the module, then get a pointer to it, then manually train it).

I'm working on improving this so that you can add multiple modules to the pipeline before PCA and add a classifier after PCA. Then, when you call pipeline.train(data), the pipeline will automatically iterate through all the modules, pipe the data through each stage in turn, train the feature modules (like PCA), and finally train the classifier at the end of the pipeline. For now, you will need to do this manually.
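The staged training described above could look roughly like the sketch below: each stage is trained on the output of the stages before it, and the fully transformed data is what a classifier would finally be trained on. Stage, Scale, MeanCenter and ToyPipeline are hypothetical names invented for this illustration; they are not GRT classes, and MeanCenter merely stands in for a trainable feature module such as PCA.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Sample = std::vector<double>;
using Dataset = std::vector<Sample>;

// Hypothetical interface for a pipeline stage (not a GRT class).
struct Stage {
    virtual ~Stage() = default;
    virtual void train(const Dataset& data) = 0;
    virtual Sample transform(const Sample& x) const = 0;
};

// Stage with nothing to learn: multiplies every value by a fixed factor.
struct Scale : Stage {
    double factor;
    explicit Scale(double f) : factor(f) {}
    void train(const Dataset&) override {}
    Sample transform(const Sample& x) const override {
        Sample out(x);
        for (double& v : out) v *= factor;
        return out;
    }
};

// Trainable stage: learns the per-dimension mean and subtracts it.
// It stands in for a module that must be fitted, such as PCA.
struct MeanCenter : Stage {
    Sample mean;
    void train(const Dataset& data) override {
        assert(!data.empty());
        mean.assign(data[0].size(), 0.0);
        for (const Sample& s : data)
            for (std::size_t i = 0; i < s.size(); ++i) mean[i] += s[i];
        for (double& m : mean) m /= static_cast<double>(data.size());
    }
    Sample transform(const Sample& x) const override {
        assert(x.size() == mean.size());
        Sample out(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] - mean[i];
        return out;
    }
};

struct ToyPipeline {
    std::vector<std::unique_ptr<Stage>> stages;

    // Train each stage on the data produced by the stages before it,
    // then push the data through that stage. The returned dataset is
    // what a classifier at the end of the pipeline would be trained on.
    Dataset train(Dataset data) {
        for (auto& stage : stages) {
            stage->train(data);
            for (Sample& s : data) s = stage->transform(s);
        }
        return data;
    }
};
```

With a Scale(2.0) stage followed by a MeanCenter stage, the mean is learned from the scaled data rather than the raw data, which is exactly what the current manual approach cannot do when a module sits in front of PCA.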

@AdoHaha
Author

AdoHaha commented Mar 27, 2017 via email
