Gini operates on subspaces rather than “features” #91
Comments
currently, -1*x and +1*x are treated differently, but they are effectively identical, so they should be treated the same. similarly, [-1 +1]*[x1 x2] is the same as [+1 -1]*[x1 x2]. see what i mean?
…On Mon, Nov 19, 2018 at 10:58 AM Ben Falk <***@***.***> wrote:
> @jovo <https://github.com/jovo> @MrAE <https://github.com/MrAE> need a little more info/explanation
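The sign-flip point above can be checked directly: negating a projection vector only mirrors the projected values, so any split achievable with w is achievable with -w by negating the threshold. A minimal sketch in Python (illustrative only, not code from the package):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))   # toy data: 10 samples, 2 features
w = np.array([-1.0, +1.0])     # a projection vector
t = 0.3                        # an arbitrary split threshold

# Splitting on w @ x <= t partitions the samples exactly as
# splitting on (-w) @ x >= -t does, so w and -w define the
# same candidate split and should count as one feature.
left_w = X @ w <= t
left_neg_w = X @ (-w) >= -t
print(np.array_equal(left_w, left_neg_w))  # True
```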
So, would we have the following features as equivalent?
[-1 +1]*[x1 x2] == [+1 -1]*[x1 x2] =?= [-1 -1]*[x1 x2] == [+1 +1]*[x1 x2]?
if the decision boundary is the same, then the features are the same. for example, [-1]*[x1] == [+1]*[x1]. so, the first two are not equal to the last two. we need to think more about the others....
…On Tue, Jan 15, 2019 at 4:18 PM Jesse Leigh Patsolic <***@***.***> wrote:
> So, would we have the following features as equivalent?
> [-1 +1]*[x1 x2] == [+1 -1]*[x1 x2] =?= [-1 -1]*[x1 x2] == [+1 +1]*[x1 x2]?
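One way to collapse sign-flipped weight vectors into a single feature (a hypothetical helper sketched in Python for illustration; this is not the package's implementation) is to canonicalize each vector so that its first nonzero entry is positive:

```python
import numpy as np

def canonicalize(w):
    """Flip the overall sign so the first nonzero weight is positive."""
    w = np.asarray(w, dtype=float)
    nz = np.flatnonzero(w)
    if nz.size and w[nz[0]] < 0:
        w = -w
    return tuple(w)

vectors = [(-1, +1), (+1, -1), (-1, -1), (+1, +1)]
keys = [canonicalize(v) for v in vectors]
# [-1 +1] and [+1 -1] share one key; [-1 -1] and [+1 +1] share another,
# so the four candidates collapse into exactly two equivalence classes.
print(len(set(keys)))  # 2
```

This matches the discussion above: the first two vectors are equivalent to each other, the last two to each other, but the two pairs are distinct.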
How do you want to handle randmat continuous? It seems like the feature importance as calculated right now would count every linear combination as different for continuous. We could tackle that at some other point, though. My naive suggestion for continuous would be to just count how many times a particular feature was used in the entire forest (and not worry at all about the weights).
oh, i never think about continuous.
let's just count that they exist, as you propose....
…On Tue, Jan 15, 2019 at 4:30 PM Ben Falk <***@***.***> wrote:
> How do you want to handle randmat continuous? It seems like the feature importance as calculated right now would count every linear combination as different for continuous. We could tackle that at some other point though. My naive suggestion for continuous would be to just count how many times a particular feature was used in the entire forest (and not worry at all about the weights).
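Ben's counting proposal might look like the following sketch (illustrative Python; representing each projection as a list of `(feature_index, weight)` pairs is an assumption made here for the example, not the package's internal format):

```python
from collections import Counter

# Hypothetical per-split projections collected from a fitted forest,
# each a list of (feature_index, weight) pairs.
splits = [
    [(0, -1.0), (2, +1.0)],
    [(0, +1.0), (2, -1.0)],
    [(1, +0.7)],
    [(1, -0.7)],
]

# Count how often each original feature appears in any projection,
# ignoring the weights entirely.
counts = Counter(idx for proj in splits for idx, _ in proj)
print(counts)  # each of features 0, 1, 2 is used twice
```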
Ok, so disregard my previous statement. Projection weights are equivalent if they parametrize the same line in $\mathbb{R}^p$, e.g. for two projection vectors $w$ and $v$, if $w = av$ for some $a \in \mathbb{R}$, then we should count them as the same thing when computing feature importance. This is easy to check if the weights are binary {1,-1} or are sampled from some finite discrete set. If the weights are sampled continuously, I'm not sure it would be efficient to check this, or whether we would even have a high enough probability of sampling equivalent vectors. As Ben suggested, we could just count the number of times a unique combination of the original features was used, disregarding the projection weights.
agreed
…On Wed, Jan 16, 2019 at 12:03 PM Jesse Leigh Patsolic <***@***.***> wrote:
> Ok, so disregard my previous statement. Projection weights are equivalent if they parametrize the same line in $\mathbb{R}^p$, e.g. for two projection vectors $w$ and $v$, if $w = av$ for some $a \in \mathbb{R}$, then we should count them as the same thing when computing feature importance. This is easy to check if the weights are binary {1,-1} or are sampled from some finite discrete set. If the weights are sampled continuously, I'm not sure it would be efficient to check this or if we would even have a high enough probability of sampling equivalent vectors. As Ben suggested, we could just count the number of times a unique combination of the original features was used disregarding the projection weights.
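The same-line criterion discussed above could be sketched like this (illustrative Python; the tolerance-based comparison of normalized vectors is an assumption to cope with floating point, not code from the package):

```python
import numpy as np

def same_line(w, v, tol=1e-9):
    """True if w == a*v for some nonzero scalar a, i.e. if w and v
    parametrize the same line through the origin."""
    w = np.asarray(w, dtype=float)
    v = np.asarray(v, dtype=float)
    nw, nv = np.linalg.norm(w), np.linalg.norm(v)
    if nw == 0.0 or nv == 0.0:
        return False
    u, x = w / nw, v / nv
    # equal up to overall sign after normalization
    return np.allclose(u, x, atol=tol) or np.allclose(u, -x, atol=tol)

print(same_line([-1, +1], [+1, -1]))  # True: differ by a = -1
print(same_line([-1, +1], [-1, -1]))  # False: not scalar multiples
print(same_line([2, -4], [-1, 2]))    # True: w = -2 * v
```

As noted in the thread, this is cheap for weights drawn from a finite set, while for continuously sampled weights two independently drawn vectors will almost never be collinear, which is why counting feature occurrences is the more practical option there.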
Fix issue #91 based on discussion in the comments:
* add some helper functions
* add test for new way of computing feature importance
* remove need for `library(Matrix)` and update function parameters; fix documentation typos [issue #91]
* update test-FeatureImportance; move `flipWeights` to helperFunctions
* update Feature Importance to be more readable [@ben]; merge RunFeature* into the same file; update README with correct output names
* Add Zenodo DOI Badge. (#118) [Closes #74]
* Fix link [Closes #74]
* speed up travis builds (#125): removed the distribution and sudo entries from travis config; added back sudo false and the cache packages option
* small updates to contributing guide (#133): add example for running styler; move contributing to the .github folder; ignore the .github path
* fix indexing order-of-operations error [fixes #119] (#134)
* added functionality to change mtry and sparsity in Urerf (#120): ran styler on modified files and removed white space; added tests for new RandMat functions
* Added functionality to split based on BIC score using Mclust (#124): add LinearCombo arg to the Urerf fn; add fast version of BIC
* fix some minor errors (#141): ran through styler and fixed some roxygen import and documentation issues
* fix issue #91 based on discussion in the comments (#140): see the item list above
* check-as-cran warnings will now cause TravisCI to fail (#142)
* Print tree (#136): added a PrintTree function and modified the NAMESPACE file to call PrintTree; add documentation and adjust the formatting of the output
* the double comparison now relies on machine epsilon (#149): fix for test not passing
* move an assignment out of an if condition (#151) [Fixes issue #135]
* Packed forest submodule (#152): add packedForest submodule; update submodule to latest commit; add readme for submodule operations
* update submodule (#154)
* update submodule (#155)
* Draft of v2.0.3 for CRAN (#156): no warnings, errors, or notes on my Mac; run README.Rmd
* update submodule (#159)