changes to exercise formatting
mike-ivs committed Sep 14, 2023
1 parent f5d8460 commit 638cb1b
Showing 1 changed file: `_episodes/02-regression.md` (7 additions, 13 deletions)
Comparing the plots and errors, it seems that a polynomial regression of `N=2` is a far better fit to Dataset II than a linear fit. In fact, our polynomial fit almost perfectly matches Dataset II... which is because Dataset II was created from an `N=2` polynomial equation!
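As a standalone illustration of the idea, here is a minimal sketch of an `N=2` fit using scikit-learn's `PolynomialFeatures` together with `LinearRegression`. The hard-coded arrays below are Anscombe's Dataset II values used as stand-in data, and the code does not assume the lesson's own helper functions:

~~~
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Stand-in data: Anscombe's Dataset II as plain arrays
x = np.array([10., 8., 13., 9., 11., 14., 6., 4., 12., 7., 5.])
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

# Expand x into [1, x, x^2] features, then fit an ordinary linear regression
poly_features = PolynomialFeatures(degree=2)
x_poly = poly_features.fit_transform(x.reshape(-1, 1))
model = LinearRegression().fit(x_poly, y)

# Root mean squared error of the polynomial fit on the training data
rmse = np.sqrt(mean_squared_error(y, model.predict(x_poly)))
print("RMSE =", rmse)
~~~
{: .language-python}

Because Dataset II really is quadratic (up to rounding), the reported RMSE is close to zero.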

> ## Exercise: Perform and compare linear and polynomial fits for Datasets I, III, and IV.
> 1. Which performs better for each dataset?
> 2. Modify your polynomial regression function to take `N` as an input parameter to your regression model. How does changing the degree of polynomial fit affect each dataset?
> > ## Solution
> > Question 1.
> > ~~~
> > for ds in ["I","II","III","IV"]:
> >     # Sort our data in order of our x (feature) values
> >     data_ds = data[data["dataset"]==ds]
> >     data_ds = data_ds.sort_values("x")
> >
> >     fit_a_linear_model(data_ds["x"],data_ds["y"])
> >     fit_a_poly_model(data_ds["x"],data_ds["y"])
> >
> > ~~~
> > {: .language-python}
> > The `N=2` polynomial fit is far better for Dataset II. According to the RMSE, the polynomial is a slightly better fit for Datasets I and III, though it could be argued that a linear fit is good enough for both.
> > Dataset III looks like a linear relation with a single outlier, rather than a truly non-linear relation. The polynomial and linear fits perform equally well (or poorly) on Dataset IV.
> > For Dataset IV it looks like `y` may be a better estimator of `x` than `x` is of `y`.
> >
> > Question 2.
> > ~~~
> > def fit_a_poly_model(x,y,N):
> >     # Define our estimator/model(s)
> >     poly_features = PolynomialFeatures(degree=N)
> >     # ...
> > ~~~
> > {: .language-python}
> >
> > ~~~
> > for ds in ["I","II","III","IV"]:
> >     # Sort our data in order of our x (feature) values
> >     data_ds = data[data["dataset"]==ds]
> >     data_ds = data_ds.sort_values("x")
> >
> >     fit_a_linear_model(data_ds["x"],data_ds["y"])
> >     for N in range(2,11):
> >         print("Polynomial degree =",N)
> >         fit_a_poly_model(data_ds["x"],data_ds["y"],N)
> >         plt.show()
> > ~~~
> > {: .language-python}
> >
> > With a large enough polynomial you can fit a curve through every point that has a unique `x` value.
> > Datasets II and IV remain unchanged beyond `N=2`, as the polynomial has either converged (Dataset II) or cannot model the data (Dataset IV).
> > Datasets I and III slowly decrease their RMSE as `N` is increased, but it is likely that these more complex models are overfitting the data. Overfitting is discussed later in the lesson.
> {: .solution}
{: .challenge}
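For reference, one possible shape for the parameterised helper from the exercise is sketched below. Only the function name comes from the lesson; the body and the stand-in Dataset I arrays are assumptions, and the lesson's real version also produces plots:

~~~
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

def fit_a_poly_model(x, y, N):
    # Expand x into polynomial features up to degree N, then fit linearly
    poly_features = PolynomialFeatures(degree=N)
    x_poly = poly_features.fit_transform(np.asarray(x, dtype=float).reshape(-1, 1))
    model = LinearRegression().fit(x_poly, y)
    # Report the training RMSE for this degree
    rmse = np.sqrt(mean_squared_error(y, model.predict(x_poly)))
    print("Polynomial degree =", N, "RMSE =", rmse)
    return rmse

# Stand-in values for Anscombe's Dataset I
x = [10., 8., 13., 9., 11., 14., 6., 4., 12., 7., 5.]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Training RMSE can only shrink (in exact arithmetic) as N grows
rmses = [fit_a_poly_model(x, y, N) for N in range(2, 11)]
~~~
{: .language-python}

Because the training error never rises as `N` grows, a shrinking RMSE on its own does not mean the model generalises any better.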


## Let's explore a more realistic scenario

Now that we have some convenient Python functions to perform quick regressions on data, it's time to explore a more realistic regression modelling scenario.