<TaskClassif:pima> (768 x 9): Pima Indian Diabetes
@@ -3489,16 +3406,12 @@ Exercise 2: AutoML
Solution
-
+
@@ -3521,7 +3434,7 @@ Exercise 2: AutoML
Solution
-
+
# check available hyperparameters for tuning (converting to data.table for
# better readability)
tail(as.data.table(graph_learner$param_set), 10)
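On the scikit-learn side of this sheet, the analogous check is `get_params()`. A minimal sketch with a made-up two-step pipeline (not the exercise's actual graph learner):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A small pipeline; its tunable hyperparameters are exposed via get_params().
# Keys like 'lr__C' can go straight into a GridSearchCV param_grid.
graph = Pipeline([("scale", StandardScaler()), ("lr", LogisticRegression())])
params = sorted(graph.get_params())
print(params[-10:])
```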
@@ -3539,7 +3452,6 @@ Exercise 2: AutoML
# in result tables)
graph_learner$id <- "graph_learner"
-
A data.table: 10 x 11
@@ -3706,7 +3618,6 @@ Exercise 2: AutoML
-
Conveniently, there is a sugar function, tune_nested(), that takes care of nested resampling in one step. Use it to evaluate your tuned graph learner with
@@ -3749,7 +3660,6 @@ Exercise 2: AutoML
-
A rr_score: 3 x 9
@@ -3813,7 +3723,6 @@ Exercise 2: AutoML
-
classif.ce: 0.2421875
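scikit-learn has no direct `tune_nested()` equivalent, but wrapping a `GridSearchCV` inside `cross_val_score` achieves the same nesting. A self-contained sketch on synthetic data (learner, grid, and fold counts are illustrative, not the exercise setup):

```python
# Nested resampling: an inner grid search wrapped in an outer CV,
# analogous in spirit to mlr3's tune_nested().
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)

# Inner loop: tune max_depth with 3-fold CV
inner = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 4, 8]},
    cv=3,
)

# Outer loop: 3-fold CV gives a less biased performance estimate,
# since tuning never sees the outer test folds
outer_scores = cross_val_score(inner, X, y, cv=3)
print(outer_scores.mean())
```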
@@ -3974,6 +3883,7 @@ Exercise 2: AutoML
Solution
+Define resampling strategies
+Run loop
for i, (train_index, val_index) in enumerate(outer_cv.split(X_train, y_train)):
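The outer loop shown above can be fleshed out roughly as follows; everything here (data, learner, grid) is a stand-in for the notebook's actual objects:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X_train, y_train = make_classification(n_samples=150, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

fold_scores = []
for i, (train_index, val_index) in enumerate(outer_cv.split(X_train, y_train)):
    # Inner tuning runs on the outer training split only
    inner = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.1, 1.0, 10.0]},
        cv=3,
    )
    inner.fit(X_train[train_index], y_train[train_index])
    # Evaluate the tuned model on the held-out outer fold
    fold_scores.append(inner.score(X_train[val_index], y_train[val_index]))

print(np.mean(fold_scores))
```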
@@ -4009,6 +3920,7 @@ Exercise 2: AutoML
Solution
+per fold
+aggregated
+detailed
-
+
# Nested CV with parameter optimization for ensemble pipeline
clf_gs_voting = GridSearchCV(
estimator=clf_voting,
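The call above is cut off at the hunk boundary. A complete, self-contained sketch of the same pattern, with an assumed soft-voting ensemble and hypothetical parameter grids (the notebook's `clf_voting` may be configured differently):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=1)

clf_voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=1)),
    ],
    voting="soft",
)

# Parameters of the sub-estimators use the <name>__<parameter> convention
clf_gs_voting = GridSearchCV(
    estimator=clf_voting,
    param_grid={"lr__C": [0.1, 1.0], "rf__n_estimators": [50, 100]},
    cv=3,
)
clf_gs_voting.fit(X, y)
print(clf_gs_voting.best_params_)
```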
@@ -4128,13 +4042,12 @@ Exercise 2: AutoML
Accuracy does not account for imbalanced data! Let’s check how the test data is distributed:
-
+
unique, counts = np.unique(y_test, return_counts=True)
table = pd.DataFrame(data=dict(zip(unique, counts)), index=[0])  # index necessary because the dict contains only scalar values
table
-
@@ -4158,15 +4071,14 @@ Exercise 2: AutoML
-
+Confusion matrix
-
+
pred_test = clf_gs_voting.predict(X_test)
conf_matrix = pd.DataFrame(confusion_matrix(y_test, pred_test))  # y_true first, per the sklearn signature; rows: true labels
conf_matrix
-
@@ -4195,7 +4107,6 @@ Exercise 2: AutoML
-
The distribution shows a shift towards ‘false’ with \(2/3\) of all test observations.
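One way to account for this imbalance is balanced accuracy, which averages per-class recall so the majority class cannot dominate the score. A small sketch with synthetic labels mimicking the 2:1 split (not the actual test data):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Synthetic labels with a 2:1 class imbalance
y_test = np.array([0] * 8 + [1] * 4)
pred_test = np.array([0] * 10 + [1] * 2)

# Plain accuracy would be 10/12; balanced accuracy averages
# per-class recall: (8/8 + 2/4) / 2
print(confusion_matrix(y_test, pred_test))       # [[8 0] [2 2]]
print(balanced_accuracy_score(y_test, pred_test))  # → 0.75
```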
@@ -4227,7 +4138,7 @@ Exercise 3: Ka
-TBD
+Apply course contents to a real-world problem
Make yourself familiar with the Titanic Kaggle challenge.
@@ -4297,33 +4208,6 @@ Exercise 3: Ka
}
}
}
- const toggleGiscusIfUsed = (isAlternate, darkModeDefault) => {
- const baseTheme = document.querySelector('#giscus-base-theme')?.value ?? 'light';
- const alternateTheme = document.querySelector('#giscus-alt-theme')?.value ?? 'dark';
- let newTheme = '';
- if(darkModeDefault) {
- newTheme = isAlternate ? baseTheme : alternateTheme;
- } else {
- newTheme = isAlternate ? alternateTheme : baseTheme;
- }
- const changeGiscusTheme = () => {
- // From: https://github.com/giscus/giscus/issues/336
- const sendMessage = (message) => {
- const iframe = document.querySelector('iframe.giscus-frame');
- if (!iframe) return;
- iframe.contentWindow.postMessage({ giscus: message }, 'https://giscus.app');
- }
- sendMessage({
- setConfig: {
- theme: newTheme
- }
- });
- }
- const isGiscussLoaded = window.document.querySelector('iframe.giscus-frame') !== null;
- if (isGiscussLoaded) {
- changeGiscusTheme();
- }
- }
const toggleColorMode = (alternate) => {
// Switch the stylesheets
const alternateStylesheets = window.document.querySelectorAll('link.quarto-color-scheme.quarto-color-alternate');
@@ -4390,15 +4274,13 @@ Exercise 3: Ka
return localAlternateSentinel;
}
}
- const darkModeDefault = false;
- let localAlternateSentinel = darkModeDefault ? 'alternate' : 'default';
+ let localAlternateSentinel = 'default';
// Dark / light mode switch
window.quartoToggleColorScheme = () => {
// Read the current dark / light value
let toAlternate = !hasAlternateSentinel();
toggleColorMode(toAlternate);
setStyleSentinel(toAlternate);
- toggleGiscusIfUsed(toAlternate, darkModeDefault);
};
// Ensure there is a toggle, if there isn't float one in the top right
if (window.document.querySelector('.quarto-color-scheme-toggle') === null) {
@@ -4477,9 +4359,10 @@ Exercise 3: Ka
// clear code selection
e.clearSelection();
});
- function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) {
+ function tippyHover(el, contentFn) {
const config = {
allowHTML: true,
+ content: contentFn,
maxWidth: 500,
delay: 100,
arrow: false,
@@ -4489,17 +4372,8 @@ Exercise 3: Ka
interactive: true,
interactiveBorder: 10,
theme: 'quarto',
- placement: 'bottom-start',
+ placement: 'bottom-start'
};
- if (contentFn) {
- config.content = contentFn;
- }
- if (onTriggerFn) {
- config.onTrigger = onTriggerFn;
- }
- if (onUntriggerFn) {
- config.onUntrigger = onUntriggerFn;
- }
window.tippy(el, config);
}
const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]');
@@ -4513,125 +4387,6 @@ Exercise 3: Ka
const note = window.document.getElementById(id);
return note.innerHTML;
});
- }
- const xrefs = window.document.querySelectorAll('a.quarto-xref');
- const processXRef = (id, note) => {
- // Strip column container classes
- const stripColumnClz = (el) => {
- el.classList.remove("page-full", "page-columns");
- if (el.children) {
- for (const child of el.children) {
- stripColumnClz(child);
- }
- }
- }
- stripColumnClz(note)
- if (id === null || id.startsWith('sec-')) {
- // Special case sections, only their first couple elements
- const container = document.createElement("div");
- if (note.children && note.children.length > 2) {
- container.appendChild(note.children[0].cloneNode(true));
- for (let i = 1; i < note.children.length; i++) {
- const child = note.children[i];
- if (child.tagName === "P" && child.innerText === "") {
- continue;
- } else {
- container.appendChild(child.cloneNode(true));
- break;
- }
- }
- if (window.Quarto?.typesetMath) {
- window.Quarto.typesetMath(container);
- }
- return container.innerHTML
- } else {
- if (window.Quarto?.typesetMath) {
- window.Quarto.typesetMath(note);
- }
- return note.innerHTML;
- }
- } else {
- // Remove any anchor links if they are present
- const anchorLink = note.querySelector('a.anchorjs-link');
- if (anchorLink) {
- anchorLink.remove();
- }
- if (window.Quarto?.typesetMath) {
- window.Quarto.typesetMath(note);
- }
- // TODO in 1.5, we should make sure this works without a callout special case
- if (note.classList.contains("callout")) {
- return note.outerHTML;
- } else {
- return note.innerHTML;
- }
- }
- }
- for (var i=0; i res.text())
- .then(html => {
- const parser = new DOMParser();
- const htmlDoc = parser.parseFromString(html, "text/html");
- const note = htmlDoc.getElementById(id);
- if (note !== null) {
- const html = processXRef(id, note);
- instance.setContent(html);
- }
- }).finally(() => {
- instance.enable();
- instance.show();
- });
- }
- } else {
- // See if we can fetch a full url (with no hash to target)
- // This is a special case and we should probably do some content thinning / targeting
- fetch(url)
- .then(res => res.text())
- .then(html => {
- const parser = new DOMParser();
- const htmlDoc = parser.parseFromString(html, "text/html");
- const note = htmlDoc.querySelector('main.content');
- if (note !== null) {
- // This should only happen for chapter cross references
- // (since there is no id in the URL)
- // remove the first header
- if (note.children.length > 0 && note.children[0].tagName === "HEADER") {
- note.children[0].remove();
- }
- const html = processXRef(null, note);
- instance.setContent(html);
- }
- }).finally(() => {
- instance.enable();
- instance.show();
- });
- }
- }, function(instance) {
- });
}
let selectedAnnoteEl;
const selectorForAnnotation = ( cell, annotation) => {
@@ -4674,7 +4429,6 @@ Exercise 3: Ka
}
div.style.top = top - 2 + "px";
div.style.height = height + 4 + "px";
- div.style.left = 0;
let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter");
if (gutterDiv === null) {
gutterDiv = window.document.createElement("div");
@@ -4700,32 +4454,6 @@ Exercise 3: Ka
});
selectedAnnoteEl = undefined;
};
- // Handle positioning of the toggle
- window.addEventListener(
- "resize",
- throttle(() => {
- elRect = undefined;
- if (selectedAnnoteEl) {
- selectCodeLines(selectedAnnoteEl);
- }
- }, 10)
- );
- function throttle(fn, ms) {
- let throttle = false;
- let timer;
- return (...args) => {
- if(!throttle) { // first call gets through
- fn.apply(this, args);
- throttle = true;
- } else { // all the others get throttled
- if(timer) clearTimeout(timer); // cancel #2
- timer = setTimeout(() => {
- fn.apply(this, args);
- timer = throttle = false;
- }, ms);
- }
- };
- }
// Attach click handler to the DT
const annoteDls = window.document.querySelectorAll('dt[data-target-cell]');
for (const annoteDlNode of annoteDls) {
@@ -4789,5 +4517,4 @@ Exercise 3: Ka
-
\ No newline at end of file
diff --git a/exercises/nested-resampling/nested_resampling.qmd b/exercises/nested-resampling/nested_resampling.qmd
index 2a3c57acb..8e5cdb57a 100644
--- a/exercises/nested-resampling/nested_resampling.qmd
+++ b/exercises/nested-resampling/nested_resampling.qmd
@@ -4,16 +4,16 @@ subtitle: "[Introduction to Machine Learning](https://slds-lmu.github.io/i2ml/)"
notebook-view:
- notebook: ex_nested_resampling_R.ipynb
title: "Exercise sheet for R"
- url: "https://github.com/slds-lmu/lecture_i2ml/blob/exercises/nested_resampling/ex_forests_R.ipynb"
+ url: "https://github.com/slds-lmu/lecture_i2ml/blob/master/exercises/nested-resampling/ex_nested_resampling_R.ipynb"
- notebook: ex_nested_resampling_py.ipynb
title: "Exercise sheet for Python"
- url: "https://github.com/slds-lmu/lecture_i2ml/blob/exercises/nested_resampling/ex_forests_py.ipynb"
+ url: "https://github.com/slds-lmu/lecture_i2ml/blob/master/exercises/nested-resampling/ex_nested_resampling_py.ipynb"
- notebook: sol_nested_resampling_R.ipynb
title: "Solutions for R"
- url: "https://github.com/slds-lmu/lecture_i2ml/blob/exercises/nested_resampling/sol_forests_R.ipynb"
+ url: "https://github.com/slds-lmu/lecture_i2ml/blob/master/exercises/nested-resampling/sol_nested_resampling_R.ipynb"
- notebook: sol_nested_resampling_py.ipynb
title: "Solutions for Python"
- url: "https://github.com/slds-lmu/lecture_i2ml/blob/exercises/nested_resampling/sol_forests_py.ipynb"
+ url: "https://github.com/slds-lmu/lecture_i2ml/blob/master/exercises/nested-resampling/sol_nested_resampling_py.ipynb"
---
::: {.content-hidden when-format="pdf"}
@@ -37,7 +37,8 @@ notebook-view:
## Exercise 1: Tuning Principles
::: {.callout-note title="Learning goals" icon=false}
-TBD
+1. Understand model fitting procedure in nested resampling
+2. Discuss bias and variance in nested resampling
:::
@@ -155,7 +156,7 @@ ii. False -- we are relatively flexible in choosing the outer loss, but the inne
## Exercise 2: AutoML
::: {.callout-note title="Learning goals" icon=false}
-TBD
+Build an AutoML pipeline with R/Python
:::
In this exercise, we build a simple automated machine learning (AutoML) system that will make data-driven choices on which learner/estimator to use and also conduct the necessary tuning.
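One minimal way to sketch such data-driven learner selection in scikit-learn is to make the estimator itself a grid-search parameter. The setup below is illustrative only, not the sheet's solution:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=150, random_state=0)

pipe = Pipeline([("clf", LogisticRegression())])  # placeholder step

# Each dict pairs an estimator choice with its own hyperparameter grid,
# so the search picks both the learner and its tuning jointly
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0]},
    {"clf": [DecisionTreeClassifier(random_state=0)], "clf__max_depth": [2, 4]},
]

auto = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(type(auto.best_params_["clf"]).__name__)
```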
@@ -261,7 +262,7 @@ You need to define dependencies, since the tuning process is defined by which le
:::
***
-\item Conveniently, there is a sugar function, `tune_nested()`, that takes care of nested resampling in one step. Use it to evaluate your tuned graph learner with
+Conveniently, there is a sugar function, `tune_nested()`, that takes care of nested resampling in one step. Use it to evaluate your tuned graph learner with
- mean classification error as inner loss,
@@ -421,7 +422,9 @@ for i, (train_index, val_index) in enumerate(outer_cv.split(X_train, y_train)):
**Solution**
+Define resampling strategies
{{< embed sol_nested_resampling_py.ipynb#2-f-1 echo=true >}}
+Run loop
{{< embed sol_nested_resampling_py.ipynb#2-f-2 echo=true >}}
@@ -434,8 +437,11 @@ Extract performance estimates per outer fold and overall (as mean). According to
**Solution**
+per fold
{{< embed sol_nested_resampling_py.ipynb#2-g-1 echo=true >}}
+aggregated
{{< embed sol_nested_resampling_py.ipynb#2-g-2 echo=true >}}
+detailed
{{< embed sol_nested_resampling_py.ipynb#2-g-3 echo=true >}}
@@ -453,8 +459,7 @@ Lastly, evaluate the performance on the test set. Think about the imbalance of y
Accuracy does not account for imbalanced data! Let's check how the test data is distributed:
{{< embed sol_nested_resampling_py.ipynb#2-h-2 echo=true >}}
-
-
+Confusion matrix
{{< embed sol_nested_resampling_py.ipynb#2-h-3 echo=true >}}
The distribution shows a shift towards 'false' with $2/3$ of all test observations.
@@ -474,7 +479,7 @@ Congrats, you just designed a turn-key AutoML system that does (nearly) all the
## Exercise 3: Kaggle Challenge
::: {.callout-note title="Learning goals" icon=false}
-TBD
+Apply course contents to a real-world problem
:::
Make yourself familiar with the [Titanic Kaggle challenge](https://www.kaggle.com/c/titanic).
Exercise 12 – Nested Resampling
@@ -3302,7 +3216,10 @@
Exercise 1: T
-TBD
+- Understand model fitting procedure in nested resampling
+- Discuss bias and variance in nested resampling
Suppose that we want to compare four different learners:
@@ -3359,7 +3276,7 @@ Exercise 1: T
@@ -3407,7 +3324,7 @@ Exercise 2: AutoML
-TBD
+Build an AutoML pipeline with R/Python
In this exercise, we build a simple automated machine learning (AutoML) system that will make data-driven choices on which learner/estimator to use and also conduct the necessary tuning.
@@ -3431,7 +3348,7 @@