Gini operates on subspaces rather than “features” #91
Comments
currently, -1*x and +1*x are treated differently, but they are effectively identical, so they should be treated the same. similarly, [-1 +1]*[x1 x2] is the same as [+1 -1]*[x1 x2]. see what i mean?
…On Mon, Nov 19, 2018 at 10:58 AM Ben Falk <***@***.***> wrote:
> @jovo <https://github.com/jovo> @MrAE <https://github.com/MrAE> need a little more info/explanation
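The sign-flip point above can be checked directly: negating a projection vector only mirrors the projected values, so any split achievable with w is achievable with -w by negating the threshold. A minimal sketch in Python (illustrative only, not code from the package):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))   # toy data: 10 samples, 2 features
w = np.array([-1.0, +1.0])     # a projection vector
t = 0.3                        # an arbitrary split threshold

# Splitting on w @ x <= t partitions the samples exactly as
# splitting on (-w) @ x >= -t does, so w and -w define the
# same candidate split and should count as one feature.
left_w = X @ w <= t
left_neg_w = X @ (-w) >= -t
print(np.array_equal(left_w, left_neg_w))  # True
```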
So, would we have the following features as equivalent?
[-1 +1]*[x1 x2] == [+1 -1]*[x1 x2] =?= [-1 -1]*[x1 x2] == [+1 +1]*[x1 x2]?
if the decision boundary is the same, then the features are the same. for example, [-1]*[x1] == [+1]*[x1]. so, the first two are not equal to the last two. we need to think more about the others....
…On Tue, Jan 15, 2019 at 4:18 PM Jesse Leigh Patsolic <***@***.***> wrote:
> So, would we have the following features as equivalent?
> [-1 +1]*[x1 x2] == [+1 -1]*[x1 x2] =?= [-1 -1]*[x1 x2] == [+1 +1]*[x1 x2]?
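One way to collapse sign-flipped weight vectors into a single feature (a hypothetical helper sketched in Python for illustration; this is not the package's implementation) is to canonicalize each vector so that its first nonzero entry is positive:

```python
import numpy as np

def canonicalize(w):
    """Flip the overall sign so the first nonzero weight is positive."""
    w = np.asarray(w, dtype=float)
    nz = np.flatnonzero(w)
    if nz.size and w[nz[0]] < 0:
        w = -w
    return tuple(w)

vectors = [(-1, +1), (+1, -1), (-1, -1), (+1, +1)]
keys = [canonicalize(v) for v in vectors]
# [-1 +1] and [+1 -1] share one key; [-1 -1] and [+1 +1] share another,
# so the four candidates collapse into exactly two equivalence classes.
print(len(set(keys)))  # 2
```

This matches the discussion above: the first two vectors are equivalent to each other, the last two to each other, but the two pairs are distinct.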
How do you want to handle randmat continuous? It seems like the feature importance as calculated right now would count every linear combination as different for continuous. We could tackle that at some other point, though. My naive suggestion for continuous would be to just count how many times a particular feature was used in the entire forest (and not worry at all about the weights).
oh, i never think about continuous.
let's just count that they exist, as you propose....
…On Tue, Jan 15, 2019 at 4:30 PM Ben Falk <***@***.***> wrote:
> How do you want to handle randmat continuous? It seems like the feature importance as calculated right now would count every linear combination as different for continuous. We could tackle that at some other point though. My naive suggestion for continuous would be to just count how many times a particular feature was used in the entire forest (and not worry at all about the weights).
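Ben's counting proposal might look like the following sketch (illustrative Python; representing each projection as a list of `(feature_index, weight)` pairs is an assumption made here for the example, not the package's internal format):

```python
from collections import Counter

# Hypothetical per-split projections collected from a fitted forest,
# each a list of (feature_index, weight) pairs.
splits = [
    [(0, -1.0), (2, +1.0)],
    [(0, +1.0), (2, -1.0)],
    [(1, +0.7)],
    [(1, -0.7)],
]

# Count how often each original feature appears in any projection,
# ignoring the weights entirely.
counts = Counter(idx for proj in splits for idx, _ in proj)
print(counts)  # each of features 0, 1, 2 is used twice
```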
Ok, so disregard my previous statement. Projection weights are equivalent if they parametrize the same line in $\mathbb{R}^p$, e.g. for two projection vectors $w$ and $v$, if $w = av$ for some $a \in \mathbb{R}$, then we should count them as the same thing when computing feature importance. This is easy to check if the weights are binary {1,-1} or are sampled from some finite discrete set. If the weights are sampled continuously, I'm not sure it would be efficient to check this, or whether we would even have a high enough probability of sampling equivalent vectors. As Ben suggested, we could just count the number of times a unique combination of the original features was used, disregarding the projection weights.
agreed
…On Wed, Jan 16, 2019 at 12:03 PM Jesse Leigh Patsolic <***@***.***> wrote:
> Ok, so disregard my previous statement. Projection weights are equivalent if they parametrize the same line in $\mathbb{R}^p$, e.g. for two projection vectors $w$ and $v$, if $w = av$ for some $a \in \mathbb{R}$, then we should count them as the same thing when computing feature importance. This is easy to check if the weights are binary {1,-1} or are sampled from some finite discrete set. If the weights are sampled continuously, I'm not sure it would be efficient to check this or if we would even have a high enough probability of sampling equivalent vectors. As Ben suggested, we could just count the number of times a unique combination of the original features was used disregarding the projection weights.
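The same-line criterion discussed above could be sketched like this (illustrative Python; the tolerance-based comparison of normalized vectors is an assumption to cope with floating point, not code from the package):

```python
import numpy as np

def same_line(w, v, tol=1e-9):
    """True if w == a*v for some nonzero scalar a, i.e. if w and v
    parametrize the same line through the origin."""
    w = np.asarray(w, dtype=float)
    v = np.asarray(v, dtype=float)
    nw, nv = np.linalg.norm(w), np.linalg.norm(v)
    if nw == 0.0 or nv == 0.0:
        return False
    u, x = w / nw, v / nv
    # equal up to overall sign after normalization
    return np.allclose(u, x, atol=tol) or np.allclose(u, -x, atol=tol)

print(same_line([-1, +1], [+1, -1]))  # True: differ by a = -1
print(same_line([-1, +1], [-1, -1]))  # False: not scalar multiples
print(same_line([2, -4], [-1, 2]))    # True: w = -2 * v
```

As noted in the thread, this is cheap for weights drawn from a finite set, while for continuously sampled weights two independently drawn vectors will almost never be collinear, which is why counting feature occurrences is the more practical option there.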
Fix issue #91 based on discussion in the comments:
* add some helper functions
* add test for new way of computing feature importance
* remove need for `library(Matrix)` and update function parameters; fix documentation typos [issue #91]
* update test-FeatureImportance; move `flipWeights` to helperFunctions
* update Feature Importance to be more readable [@ben]; merge RunFeature* into the same file; update README with correct output names
* Add Zenodo DOI Badge. (#118) [Closes #74]
* Fix link [Closes #74]
* speed up travis builds (#125): removed the distribution and sudo entries from travis config; added back sudo false and the cache packages option
* small updates to contributing guide (#133): add example for running styler; move contributing to the .github folder; ignore the .github path
* fix indexing order-of-operations error [fixes #119] (#134)
* added functionality to change mtry and sparsity in Urerf (#120): ran styler on modified files and removed white space; added tests for new RandMat functions
* Added functionality to split based on BIC score using Mclust (#124): add LinearCombo arg to the Urerf fn; add fast version of BIC
* fix some minor errors (#141): ran through styler and fixed some roxygen import and documentation issues
* fix issue #91 based on discussion in the comments (#140): see the item list above
* check-as-cran warnings will now cause TravisCI to fail (#142)
* Print tree (#136): added a PrintTree function and modified the NAMESPACE file to call PrintTree; add documentation and adjust the formatting of the output
* the double comparison now relies on machine epsilon (#149): fix for test not passing
* move an assignment out of an if condition (#151) [Fixes issue #135]
* Packed forest submodule (#152): add packedForest submodule; update submodule to latest commit; add readme for submodule operations
* update submodule (#154)
* update submodule (#155)
* Draft of v2.0.3 for CRAN (#156): no warnings, errors, or notes on my Mac; run README.Rmd
* update submodule (#159)