Commit
revise nr exercise
lisa-wm committed Jul 15, 2024
1 parent fa09cbc commit 4fc370d
Showing 4 changed files with 118 additions and 386 deletions.
Binary file added exercises-pdf/nested_resampling_all.pdf
Binary file added exercises-pdf/nested_resampling_ex.pdf
479 changes: 103 additions & 376 deletions exercises/nested-resampling/nested_resampling.html

Large diffs are not rendered by default.

25 changes: 15 additions & 10 deletions exercises/nested-resampling/nested_resampling.qmd
@@ -4,16 +4,16 @@ subtitle: "[Introduction to Machine Learning](https://slds-lmu.github.io/i2ml/)"
notebook-view:
- notebook: ex_nested_resampling_R.ipynb
title: "Exercise sheet for R"
-url: "https://github.com/slds-lmu/lecture_i2ml/blob/exercises/nested_resampling/ex_forests_R.ipynb"
+url: "https://github.com/slds-lmu/lecture_i2ml/blob/master/exercises/nested-resampling/ex_nested_resampling_R.ipynb"
- notebook: ex_nested_resampling_py.ipynb
title: "Exercise sheet for Python"
-url: "https://github.com/slds-lmu/lecture_i2ml/blob/exercises/nested_resampling/ex_forests_py.ipynb"
+url: "https://github.com/slds-lmu/lecture_i2ml/blob/master/exercises/nested-resampling/ex_nested_resampling_py.ipynb"
- notebook: sol_nested_resampling_R.ipynb
title: "Solutions for R"
-url: "https://github.com/slds-lmu/lecture_i2ml/blob/exercises/nested_resampling/sol_forests_R.ipynb"
+url: "https://github.com/slds-lmu/lecture_i2ml/blob/master/exercises/nested-resampling/sol_nested_resampling_R.ipynb"
- notebook: sol_nested_resampling_py.ipynb
title: "Solutions for Python"
-url: "https://github.com/slds-lmu/lecture_i2ml/blob/exercises/nested_resampling/sol_forests_py.ipynb"
+url: "https://github.com/slds-lmu/lecture_i2ml/blob/master/exercises/nested-resampling/sol_nested_resampling_py.ipynb"
---

::: {.content-hidden when-format="pdf"}
@@ -37,7 +37,8 @@ notebook-view:
## Exercise 1: Tuning Principles

::: {.callout-note title="Learning goals" icon=false}
-TBD
+1. Understand the model-fitting procedure in nested resampling
+2. Discuss bias and variance in nested resampling
:::
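As a quick illustration of these learning goals (a minimal sketch with scikit-learn on a toy dataset, not part of the exercise solution), nested resampling wraps a tuning procedure in an outer resampling loop, so the outer scores estimate the error of the *tuned* model rather than of any single hyperparameter configuration:

```python
# Sketch: nested resampling with scikit-learn (illustrative, not the exercise setup).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: tune the SVM's C with 3-fold CV.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
tuner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=inner_cv)

# Outer loop: each outer fold refits the entire tuning procedure on its
# training split, so no outer test observation ever influences tuning.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(tuner, X, y, cv=outer_cv)
print(scores.mean())  # estimate of the tuned model's generalization accuracy
```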


@@ -155,7 +156,7 @@ ii. False -- we are relatively flexible in choosing the outer loss, but the inne
## Exercise 2: AutoML

::: {.callout-note title="Learning goals" icon=false}
-TBD
+Build an AutoML pipeline with R/Python
:::

In this exercise, we build a simple automated machine learning (AutoML) system that will make data-driven choices on which learner/estimator to use and also conduct the necessary tuning.
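The core idea can be sketched in a few lines of scikit-learn (a hedged illustration, not the exercise's exact pipeline): treat the choice of learner itself as a hyperparameter and let cross-validation pick both the learner and its configuration.

```python
# Sketch of the AutoML idea: the "clf" step is itself a tunable choice.
# Dataset and grids below are illustrative, not the exercise's setup.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Each dict is one branch: a candidate learner plus its own hyperparameter grid,
# mirroring the dependency structure a graph learner needs in mlr3.
param_grid = [
    {"clf": [LogisticRegression(max_iter=5000)], "clf__C": [0.1, 1, 10]},
    {"clf": [DecisionTreeClassifier()], "clf__max_depth": [2, 5, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(type(search.best_params_["clf"]).__name__)  # the data-driven learner choice
```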
@@ -261,7 +262,7 @@ You need to define dependencies, since the tuning process is defined by which le
:::

***
-\item Conveniently, there is a sugar function, `tune_nested()`, that takes care of nested resampling in one step. Use it to evaluate your tuned graph learner with
+Conveniently, there is a sugar function, `tune_nested()`, that takes care of nested resampling in one step. Use it to evaluate your tuned graph learner with

- mean classification error as inner loss,

@@ -421,7 +422,9 @@ for i, (train_index, val_index) in enumerate(outer_cv.split(X_train, y_train)):
<details>
<summary>**Solution**</summary>

+Define the resampling strategies:
{{< embed sol_nested_resampling_py.ipynb#2-f-1 echo=true >}}
+Run the loop:
{{< embed sol_nested_resampling_py.ipynb#2-f-2 echo=true >}}

</details>
@@ -434,8 +437,11 @@ Extract performance estimates per outer fold and overall (as mean). According to
<details>
<summary>**Solution**</summary>

+Per fold:
{{< embed sol_nested_resampling_py.ipynb#2-g-1 echo=true >}}
+Aggregated:
{{< embed sol_nested_resampling_py.ipynb#2-g-2 echo=true >}}
+Detailed:
{{< embed sol_nested_resampling_py.ipynb#2-g-3 echo=true >}}

</details>
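The aggregation step above amounts to simple summary statistics over the outer-fold estimates; a minimal sketch (the fold scores below are made up, not results from the exercise):

```python
# Sketch: aggregating outer-fold performance estimates.
import numpy as np

fold_scores = np.array([0.81, 0.78, 0.85, 0.80, 0.79])  # hypothetical values
print(fold_scores.mean())  # overall estimate
print(fold_scores.std())   # spread across outer folds
```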
@@ -453,8 +459,7 @@ Lastly, evaluate the performance on the test set. Think about the imbalance of y
Accuracy does not account for imbalanced data! Let's check how the test data is distributed:

{{< embed sol_nested_resampling_py.ipynb#2-h-2 echo=true >}}


+Confusion matrix:
{{< embed sol_nested_resampling_py.ipynb#2-h-3 echo=true >}}

The distribution shows a shift towards 'false' with $2/3$ of all test observations.
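To see why this matters (an illustrative sketch with synthetic labels at the same 2:1 ratio, not the exercise data): a degenerate model that always predicts 'false' already reaches the majority-class accuracy, while balanced accuracy and the confusion matrix expose it.

```python
# Sketch: accuracy vs. balanced accuracy at a 2:1 class imbalance.
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix)

y_true = np.array([0] * 200 + [1] * 100)  # ~2/3 "false", 1/3 "true"
y_pred = np.zeros(300, dtype=int)         # degenerate model: always "false"

print(accuracy_score(y_true, y_pred))           # ~0.67, looks decent
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 -- no better than chance
print(confusion_matrix(y_true, y_pred))         # every "true" case is missed
```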
@@ -474,7 +479,7 @@ Congrats, you just designed a turn-key AutoML system that does (nearly) all the
## Exercise 3: Kaggle Challenge

::: {.callout-note title="Learning goals" icon=false}
-TBD
+Apply course contents to a real-world problem
:::

Make yourself familiar with the [Titanic Kaggle challenge](https://www.kaggle.com/c/titanic).
