Commit

build based on 7a27869
Documenter.jl committed Apr 28, 2023
1 parent 5c4e99f commit c59d287
Showing 4 changed files with 8 additions and 8 deletions.
4 changes: 2 additions & 2 deletions dev/api/index.html
@@ -9,7 +9,7 @@
max_rules::Int=10
) -&gt; MLJModelInterface.Probabilistic</code></pre><p>Explainable rule-based model based on a random forest. This SIRUS algorithm extracts rules from a stabilized random forest. See the <a href="https://huijzer.xyz/StableTrees.jl/dev/">main page of the documentation</a> for details about how it works.</p><p><strong>Example</strong></p><p>The classifier satisfies the MLJ interface, so it can be used like any other MLJ model. For example, it can be used to create a machine:</p><pre><code class="language-julia hljs">julia&gt; using SIRUS, MLJ

julia&gt; mach = machine(StableRulesClassifier(; max_rules=15), X, y);</code></pre><p><strong>Arguments</strong></p><ul><li><code>rng</code>: Random number generator. <code>StableRNGs</code> are advised.</li><li><code>partial_sampling</code>: Ratio of samples to use in each subset of the data. The default of 0.7 should be fine for most cases.</li><li><code>n_trees</code>: The number of trees to use. The higher the number, the more likely it is that the correct rules are extracted from the trees, but also the longer model fitting will take. In most cases, 1000 rules should be more than enough, but it might be useful to run 2000 rules one time and verify that the model performance does not change much.</li><li><code>max_depth</code>: The depth of the tree. A lower depth decreases model complexity and can therefore improve accuracy when the sample size is small (reduce overfitting).</li><li><code>q</code>: Number of cutpoints to use per feature. The default value of 10 should be good for most situations.</li><li><code>min_data_in_leaf</code>: Minimum number of data points per leaf.</li><li><code>max_rules</code>: This is the most important hyperparameter. In general, the more rules, the more accurate the model. However, more rules will also decrease model interpretability. So, it is important to find a good balance here. In most cases, 10-40 rules should provide reasonable accuracy while remaining interpretable.</li><li><code>lambda</code>: The weights of the final rules are determined via a regularized regression over each rule as a binary feature. This hyperparameter specifies the strength of the ridge (L2) regularizer. 
Since the rules are quite strongly correlated, the ridge regularizer is the most useful to stabilize the weight estimates.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/f7cd25efb762538c3a8a6a562eda496b1d2410f8/src/mlj.jl#L87-L139">source</a></section></article><article class="docstring"><header><a class="docstring-binding" id="SIRUS.StableForestClassifier" href="#SIRUS.StableForestClassifier"><code>SIRUS.StableForestClassifier</code></a><span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">StableForestClassifier(;
julia&gt; mach = machine(StableRulesClassifier(; max_rules=15), X, y);</code></pre><p><strong>Arguments</strong></p><ul><li><code>rng</code>: Random number generator. <code>StableRNGs</code> are advised.</li><li><code>partial_sampling</code>: Ratio of samples to use in each subset of the data. The default of 0.7 should be fine for most cases.</li><li><code>n_trees</code>: The number of trees to use. The higher the number, the more likely it is that the correct rules are extracted from the trees, but also the longer model fitting will take. In most cases, 1000 rules should be more than enough, but it might be useful to run 2000 rules one time and verify that the model performance does not change much.</li><li><code>max_depth</code>: The depth of the tree. A lower depth decreases model complexity and can therefore improve accuracy when the sample size is small (reduce overfitting).</li><li><code>q</code>: Number of cutpoints to use per feature. The default value of 10 should be good for most situations.</li><li><code>min_data_in_leaf</code>: Minimum number of data points per leaf.</li><li><code>max_rules</code>: This is the most important hyperparameter. In general, the more rules, the more accurate the model. However, more rules will also decrease model interpretability. So, it is important to find a good balance here. In most cases, 10-40 rules should provide reasonable accuracy while remaining interpretable.</li><li><code>lambda</code>: The weights of the final rules are determined via a regularized regression over each rule as a binary feature. This hyperparameter specifies the strength of the ridge (L2) regularizer. 
Since the rules are quite strongly correlated, the ridge regularizer is the most useful to stabilize the weight estimates.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/7a27869541018686fe77a8f558c298c5c5c81646/src/mlj.jl#L87-L139">source</a></section></article><article class="docstring"><header><a class="docstring-binding" id="SIRUS.StableForestClassifier" href="#SIRUS.StableForestClassifier"><code>SIRUS.StableForestClassifier</code></a><span class="docstring-category">Type</span></header><section><div><pre><code class="language-julia hljs">StableForestClassifier(;
rng::AbstractRNG=default_rng(),
partial_sampling::Real=0.7,
n_trees::Int=1_000,
@@ -18,4 +18,4 @@
min_data_in_leaf::Int=5
) &lt;: MLJModelInterface.Probabilistic</code></pre><p>Random forest classifier with a stabilized forest structure (Bénard et al., <a href="http://proceedings.mlr.press/v130/benard21a.html">2021</a>). This stabilization increases stability when extracting rules. The impact on the predictive accuracy compared to standard random forests should be relatively small.</p><div class="admonition is-info"><header class="admonition-header">Note</header><div class="admonition-body"><p>Just like normal random forests, this model is not easily explainable. If you are interested in an explainable model, use the <code>StableRulesClassifier</code>.</p></div></div><p><strong>Example</strong></p><p>The classifier satisfies the MLJ interface, so it can be used like any other MLJ model. For example, it can be used to create a machine:</p><pre><code class="language-julia hljs">julia&gt; using SIRUS, MLJ

julia&gt; mach = machine(StableForestClassifier(), X, y);</code></pre><p><strong>Arguments</strong></p><ul><li><code>rng</code>: Random number generator. <code>StableRNGs</code> are advised.</li><li><code>partial_sampling</code>: Ratio of samples to use in each subset of the data. The default of 0.7 should be fine for most cases.</li><li><code>n_trees</code>: The number of trees to use.</li><li><code>max_depth</code>: The depth of the tree. A lower depth decreases model complexity and can therefore improve accuracy when the sample size is small (reduce overfitting).</li><li><code>q</code>: Number of cutpoints to use per feature. The default value of 10 should be good for most situations.</li><li><code>min_data_in_leaf</code>: Minimum number of data points per leaf.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/f7cd25efb762538c3a8a6a562eda496b1d2410f8/src/mlj.jl#L35-L77">source</a></section></article><h2 id="Methods"><a class="docs-heading-anchor" href="#Methods">Methods</a><a id="Methods-1"></a><a class="docs-heading-anchor-permalink" href="#Methods" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="SIRUS.feature_names" href="#SIRUS.feature_names"><code>SIRUS.feature_names</code></a><span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">feature_names(rule::Rule) -&gt; Vector{String}</code></pre><p>Return a vector of feature names; one for each clause in <code>rule</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/f7cd25efb762538c3a8a6a562eda496b1d2410f8/src/rules.jl#L107-L111">source</a></section></article><article class="docstring"><header><a class="docstring-binding" id="SIRUS.directions" href="#SIRUS.directions"><code>SIRUS.directions</code></a><span class="docstring-category">Function</span></header><section><div><pre><code 
class="language-julia hljs">directions(rule::Rule) -&gt; Vector{Symbol}</code></pre><p>Return a vector of split directions; one for each clause in <code>rule</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/f7cd25efb762538c3a8a6a562eda496b1d2410f8/src/rules.jl#L116-L120">source</a></section></article><article class="docstring"><header><a class="docstring-binding" id="Base.values-Tuple{SIRUS.Rule}" href="#Base.values-Tuple{SIRUS.Rule}"><code>Base.values</code></a><span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">values(rule::Rule) -&gt; Vector{Float64}</code></pre><p>Return a vector split values; one for each clause in <code>rule</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/f7cd25efb762538c3a8a6a562eda496b1d2410f8/src/rules.jl#L125-L129">source</a></section></article><article class="docstring"><header><a class="docstring-binding" id="SIRUS.satisfies" href="#SIRUS.satisfies"><code>SIRUS.satisfies</code></a><span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">satisfies(row::AbstractVector, rule::Rule)</code></pre><p>Return whether data <code>row</code> satisfies <code>rule</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/f7cd25efb762538c3a8a6a562eda496b1d2410f8/src/rules.jl#L448-L452">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../">« SIRUS</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p 
class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.24 on <span class="colophon-date" title="Friday 28 April 2023 11:59">Friday 28 April 2023</span>. Using Julia version 1.8.5.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
julia&gt; mach = machine(StableForestClassifier(), X, y);</code></pre><p><strong>Arguments</strong></p><ul><li><code>rng</code>: Random number generator. <code>StableRNGs</code> are advised.</li><li><code>partial_sampling</code>: Ratio of samples to use in each subset of the data. The default of 0.7 should be fine for most cases.</li><li><code>n_trees</code>: The number of trees to use.</li><li><code>max_depth</code>: The depth of the tree. A lower depth decreases model complexity and can therefore improve accuracy when the sample size is small (reduce overfitting).</li><li><code>q</code>: Number of cutpoints to use per feature. The default value of 10 should be good for most situations.</li><li><code>min_data_in_leaf</code>: Minimum number of data points per leaf.</li></ul></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/7a27869541018686fe77a8f558c298c5c5c81646/src/mlj.jl#L35-L77">source</a></section></article><h2 id="Methods"><a class="docs-heading-anchor" href="#Methods">Methods</a><a id="Methods-1"></a><a class="docs-heading-anchor-permalink" href="#Methods" title="Permalink"></a></h2><article class="docstring"><header><a class="docstring-binding" id="SIRUS.feature_names" href="#SIRUS.feature_names"><code>SIRUS.feature_names</code></a><span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">feature_names(rule::Rule) -&gt; Vector{String}</code></pre><p>Return a vector of feature names; one for each clause in <code>rule</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/7a27869541018686fe77a8f558c298c5c5c81646/src/rules.jl#L107-L111">source</a></section></article><article class="docstring"><header><a class="docstring-binding" id="SIRUS.directions" href="#SIRUS.directions"><code>SIRUS.directions</code></a><span class="docstring-category">Function</span></header><section><div><pre><code 
class="language-julia hljs">directions(rule::Rule) -&gt; Vector{Symbol}</code></pre><p>Return a vector of split directions; one for each clause in <code>rule</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/7a27869541018686fe77a8f558c298c5c5c81646/src/rules.jl#L116-L120">source</a></section></article><article class="docstring"><header><a class="docstring-binding" id="Base.values-Tuple{SIRUS.Rule}" href="#Base.values-Tuple{SIRUS.Rule}"><code>Base.values</code></a><span class="docstring-category">Method</span></header><section><div><pre><code class="language-julia hljs">values(rule::Rule) -&gt; Vector{Float64}</code></pre><p>Return a vector split values; one for each clause in <code>rule</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/7a27869541018686fe77a8f558c298c5c5c81646/src/rules.jl#L125-L129">source</a></section></article><article class="docstring"><header><a class="docstring-binding" id="SIRUS.satisfies" href="#SIRUS.satisfies"><code>SIRUS.satisfies</code></a><span class="docstring-category">Function</span></header><section><div><pre><code class="language-julia hljs">satisfies(row::AbstractVector, rule::Rule)</code></pre><p>Return whether data <code>row</code> satisfies <code>rule</code>.</p></div><a class="docs-sourcelink" target="_blank" href="https://github.com/rikhuijzer/SIRUS.jl/blob/7a27869541018686fe77a8f558c298c5c5c81646/src/rules.jl#L448-L452">source</a></section></article></article><nav class="docs-footer"><a class="docs-footer-prevpage" href="../">« SIRUS</a><div class="flexbox-break"></div><p class="footer-message">Powered by <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> and the <a href="https://julialang.org/">Julia Programming Language</a>.</p></nav></div><div class="modal" id="documenter-settings"><div class="modal-background"></div><div class="modal-card"><header class="modal-card-head"><p 
class="modal-card-title">Settings</p><button class="delete"></button></header><section class="modal-card-body"><p><label class="label">Theme</label><div class="select"><select id="documenter-themepicker"><option value="documenter-light">documenter-light</option><option value="documenter-dark">documenter-dark</option></select></div></p><hr/><p>This document was generated with <a href="https://github.com/JuliaDocs/Documenter.jl">Documenter.jl</a> version 0.27.24 on <span class="colophon-date" title="Friday 28 April 2023 12:05">Friday 28 April 2023</span>. Using Julia version 1.8.5.</p></section><footer class="modal-card-foot"></footer></div></div></div></body></html>
8 changes: 4 additions & 4 deletions dev/index.html

Large diffs are not rendered by default.

0 comments on commit c59d287
