Gini operate on subspaces rather than “features” #91

Closed
falkben opened this issue Nov 19, 2018 · 7 comments
Comments

@falkben
Contributor

falkben commented Nov 19, 2018

@jovo @MrAE need a little more info/explanation

@falkben falkben created this issue from a note in Roadmap (Sprint1) Nov 19, 2018
@falkben falkben moved this from Sprint1 to Sprint2 in Roadmap Nov 19, 2018
@jovo
Member

jovo commented Nov 20, 2018 via email

@MrAE MrAE added this to the Sprint II milestone Jan 2, 2019
@MrAE
Collaborator

MrAE commented Jan 15, 2019

So, would we have the following features as equivalent?

[-1, +1] * [x1, x2] == [+1, -1] * [x1, x2] =?= [-1, -1] * [x1, x2] == [+1, +1] * [x1, x2]

@jovo
Member

jovo commented Jan 15, 2019 via email

@falkben
Contributor Author

falkben commented Jan 15, 2019

How do you want to handle randmat continuous? It seems like the feature importance as calculated right now would count every linear combination as different when the weights are continuous. We could tackle that at some other point though. My naive suggestion for continuous would be to just count how many times a particular feature was used in the entire forest (and not worry at all about the weights).

@jovo
Member

jovo commented Jan 15, 2019 via email

@MrAE
Collaborator

MrAE commented Jan 16, 2019

Ok, so disregard my previous statement.

Projection weights are equivalent if they parametrize the same line in $\mathbb{R}^p$. That is, for two projection vectors $w$ and $v$, if $w = av$ for some nonzero $a \in \mathbb{R}$, then we should count them as the same thing when computing feature importance.

This is easy to check if the weights are binary {1,-1} or are sampled from some finite discrete set.
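
A minimal sketch of that check for discrete weights, written in Python for illustration only (it is not the package's R code, and `canonical_projection` is a hypothetical helper name). It assumes each projection is given as a list of feature indices plus a parallel list of integer weights, and it rescales by the weight of the lowest-indexed feature so that $w$ and $av$ with $a \neq 0$ map to the same key:

```python
from fractions import Fraction


def canonical_projection(features, weights):
    """Return a hashable key that is the same for w and a*w (a != 0).

    Assumes integer weights (e.g. -1/+1) so that the rescaling is exact.
    """
    # Keep only features that actually contribute, ordered by feature index.
    pairs = sorted((f, w) for f, w in zip(features, weights) if w != 0)
    if not pairs:
        return ()
    # Rescale so the weight on the lowest-indexed feature becomes +1.
    lead = Fraction(pairs[0][1])
    return tuple((f, Fraction(w) / lead) for f, w in pairs)


# The four projections from the question earlier in the thread.
k1 = canonical_projection([0, 1], [-1, +1])
k2 = canonical_projection([0, 1], [+1, -1])
k3 = canonical_projection([0, 1], [-1, -1])
k4 = canonical_projection([0, 1], [+1, +1])
```

Under this rule `k1 == k2` and `k3 == k4`, but `k1 != k3`: the first pair and the second pair parametrize two different lines.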

If the weights are sampled continuously, I'm not sure it would be efficient to check this or if we would even have a high enough probability of sampling equivalent vectors.

As Ben suggested, we could just count the number of times a unique combination of the original features was used, disregarding the projection weights.
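
A rough sketch of that weight-free counting, again in Python for illustration rather than the package's R implementation. The function name `count_feature_combinations` and the input layout (one `(features, weights)` pair per split node in the forest) are assumptions made for this example:

```python
from collections import Counter


def count_feature_combinations(projections):
    """Count how often each set of original features is used, ignoring weights."""
    counts = Counter()
    for features, weights in projections:
        # A feature only counts as used if its weight is nonzero.
        used = frozenset(f for f, w in zip(features, weights) if w != 0)
        if used:
            counts[used] += 1
    return counts


# Hypothetical splits from a forest with continuous weights.
splits = [
    ([0, 1], [0.3, -1.2]),
    ([1, 0], [2.0, 0.5]),   # same features as above, different weights
    ([2], [1.0]),
]
combo_counts = count_feature_combinations(splits)
```

Using a `frozenset` makes the count independent of both the weights and the order in which features are listed, so the first two splits fall into the same bucket.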

@jovo
Member

jovo commented Jan 16, 2019 via email

MrAE added a commit that referenced this issue Jan 17, 2019
add some helper functions
add test for new way of computing feature importance
MrAE added a commit that referenced this issue Jan 17, 2019
MrAE added a commit that referenced this issue Jan 18, 2019
* fix issue #91 based on discussion in the comments.
add some helper functions
add test for new way of computing feature importance

* remove need for library(Matrix) and update function parameters.
fix documentation typos
[issue #91]

* update test-FeatureImportance
move `flipWeights` to helperFunctions

* update Feature Importance to be more readable [@ben].
Merge RunFeature* into the same file.
Update README with correct output names.
@MrAE MrAE closed this as completed Jan 23, 2019
MrAE added a commit that referenced this issue Feb 6, 2019
* Add Zenodo DOI Badge. (#118) [Closes #74]

* Add Zenodo DOI Badge.

* Fix link [Closes #74]

* speed up travis builds (#125)

* removed the distribution and sudo entries from travis config - faster?

* adding back sudo false and adding cache packages option

* small updates to contributing guide (#133)

* add example for running styler

move contributing to .github folder

* ignore the .github path

* fix indexing order of operations error [fixes #119]. (#134)

* added functionality to change mtry and sparsity in Urerf (#120)

* added functionality to change mtry and sparsity in Urerf

* ran styler on modified files and removed white space.

* added tests for new RandMat functions.

* Added the functionality to splitting based on BIC score using Mclust (#124)

* added functionality to change mtry and sparsity in Urerf

* Added functionality to split based on BIC score

* Add LinearCombo arg to the Urerf fn

* Add fast version of BIC

* fix some minor errors (#141)

Ran through styler and fixed some roxygen import and documentation.

* fix issue #91 based on discussion in the comments. (#140)

* fix issue #91 based on discussion in the comments.
add some helper functions
add test for new way of computing feature importance

* remove need for library(Matrix) and update function parameters.
fix documentation typos
[issue #91]

* update test-FeatureImportance
move `flipWeights` to helperFunctions

* update Feature Importance to be more readable [@ben].
Merge RunFeature* into the same file.
Update README with correct output names.

* check-as-cran warning will now cause TravisCI to fail. (#142)

* Print tree (#136)

* added functionality to change mtry and sparsity in Urerf

* ran styler on modified files and removed white space.

* added tests for new RandMat functions.

* added PrintTree function and modified NAMESPACE file to call PrintTree (I'm not sure this last step was necessary but it doesn't hurt).

* Add documentation and adjust the formatting of the output.

* the double comparison now relies on machine epsilon. (#149)

* the double comparison now relies on machine epsilon.

* fix for test not passing

* move an assignment out of an if condition. (#151)

Fixes issue #135

* Packed forest submodule (#152)

* add packedForest submodule

* update submodule to latest commit; add readme for submodule operations

* update submodule readme

* update submodule

* update submodule (#154)

* update submodule (#155)

* Draft of v2.0.3 for CRAN (#156)

* Draft of v2.0.3 for CRAN
no warnings, errors, or notes on my Mac.

* run README.Rmd

* update submodule (#159)