Merge pull request #661 from jimbobbennett/add-videos
Adding videos guiding through regression lessons
carlotta94c committed May 31, 2023
2 parents 1d9c863 + 95a6fef commit 5fb278d
Showing 2 changed files with 36 additions and 2 deletions.
19 changes: 17 additions & 2 deletions 2-Regression/3-Linear/README.md
@@ -13,6 +13,10 @@ Now you are ready to dive deeper into regression for ML. While visualization all

In this lesson, you will learn more about two types of regression: _basic linear regression_ and _polynomial regression_, along with some of the math underlying these techniques. These models will allow us to predict pumpkin prices from different input data.

[![ML for beginners - Understanding Linear Regression](https://img.youtube.com/vi/CRxFT8oTDMg/0.jpg)](https://youtu.be/CRxFT8oTDMg "ML for beginners - Understanding Linear Regression")

> 🎥 Click the image above for a short video overview of linear regression.

> Throughout this curriculum, we assume minimal knowledge of math and seek to make it accessible for students coming from other fields, so watch for notes, 🧮 callouts, diagrams, and other learning tools to aid in comprehension.

### Prerequisite
@@ -95,6 +99,10 @@ Now that you have an understanding of the math behind linear regression, let's c

## Looking for Correlation

[![ML for beginners - Looking for Correlation: The Key to Linear Regression](https://img.youtube.com/vi/uoRq-lW2eQo/0.jpg)](https://youtu.be/uoRq-lW2eQo "ML for beginners - Looking for Correlation: The Key to Linear Regression")

> 🎥 Click the image above for a short video overview of correlation.

From the previous lesson you have probably seen that the average price for different months looks like this:

<img alt="Average price by month" src="../2-Data/images/barchart.png" width="50%"/>
@@ -151,6 +159,10 @@ Another approach would be to fill those empty values with mean values from the c

## Simple Linear Regression

[![ML for beginners - Linear and Polynomial Regression using Scikit-learn](https://img.youtube.com/vi/e4c_UP2fSjg/0.jpg)](https://youtu.be/e4c_UP2fSjg "ML for beginners - Linear and Polynomial Regression using Scikit-learn")

> 🎥 Click the image above for a short video overview of linear and polynomial regression.

To train our Linear Regression model, we will use the **Scikit-learn** library.

```python
@@ -209,7 +221,6 @@ plt.plot(X_test,pred)

<img alt="Linear regression" src="images/linear-results.png" width="50%" />
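
As a self-contained sketch of the Scikit-learn workflow used in this section (split the data, `fit` a `LinearRegression`, `predict` on the test set, then measure the error), the snippet below uses synthetic data rather than the lesson's pumpkin dataset. The `day_of_year` and `price` arrays are hypothetical, generated so that price grows roughly linearly with the day of the year.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: price grows roughly linearly with day of year
rng = np.random.default_rng(0)
day_of_year = rng.uniform(240, 300, size=100)
price = 0.1 * day_of_year + rng.normal(0, 1.0, size=100)

# Scikit-learn expects a 2D feature matrix of shape (n_samples, n_features)
X = day_of_year.reshape(-1, 1)
y = price

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

pred = lin_reg.predict(X_test)
mse = mean_squared_error(y_test, pred)

print(f"slope={lin_reg.coef_[0]:.3f}, intercept={lin_reg.intercept_:.3f}")
# score() on a regressor returns the coefficient of determination (R^2)
print(f"MSE={mse:.3f}, R^2={lin_reg.score(X_test, y_test):.3f}")
```

Because the feature matrix must be two-dimensional, a single feature column is reshaped with `reshape(-1, 1)` before fitting.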


## Polynomial Regression

Another type of Linear Regression is Polynomial Regression. Sometimes there is a linear relationship between variables (the bigger the pumpkin in volume, the higher the price), but sometimes these relationships can't be plotted as a plane or straight line.
@@ -236,7 +247,7 @@ pipeline.fit(X_train,y_train)
Using `PolynomialFeatures(2)` means that we will include all second-degree polynomial terms of the input data. In our case, it will just add `DayOfYear`<sup>2</sup>, but given two input variables X and Y, it will add X<sup>2</sup>, XY, and Y<sup>2</sup>. We may also use higher-degree polynomials if we want.

Pipelines can be used in the same way as the original `LinearRegression` object: we can `fit` the pipeline and then call `predict` to get the prediction results. Here is the graph showing the test data and the approximation curve:

<img alt="Polynomial regression" src="images/poly-results.png" width="50%" />

Using Polynomial Regression, we get a slightly lower MSE and a higher coefficient of determination, but not significantly. We need to take other features into account!
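
The comparison of MSE and R² (the coefficient of determination) between a plain linear model and a polynomial pipeline can be sketched as follows. This is a minimal example on hypothetical synthetic data with a deliberately quadratic trend, not the lesson's pumpkin dataset, so here the polynomial pipeline wins by a wide margin; on real data the gap may be small, as noted above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data with a quadratic trend plus noise
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=200)
y = 0.5 * x**2 + x + rng.normal(0, 0.5, size=200)
X = x.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "linear": LinearRegression(),
    # A pipeline behaves like a single estimator: fit() and predict() run every step
    "polynomial": make_pipeline(PolynomialFeatures(2), LinearRegression()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (mean_squared_error(y_test, pred), r2_score(y_test, pred))
    print(f"{name}: MSE={results[name][0]:.3f}, R^2={results[name][1]:.3f}")
```

Lower MSE and higher R² both indicate a better fit on the held-out test set.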
@@ -249,6 +260,10 @@ Using Polynomial Regression, we can get slightly lower MSE and higher determinat

In an ideal world, we want to be able to predict prices for different pumpkin varieties using the same model. However, the `Variety` column is somewhat different from columns like `Month`, because it contains non-numeric values. Such columns are called **categorical**.
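
One common way to feed a categorical column such as `Variety` to a linear model is one-hot encoding, which turns each category into its own 0/1 column. The sketch below applies `pandas.get_dummies` to a tiny made-up DataFrame; the variety names and prices are illustrative only.

```python
import pandas as pd

# Hypothetical sample of pumpkin data with one categorical column
df = pd.DataFrame({
    "Variety": ["PIE TYPE", "MINIATURE", "PIE TYPE", "FAIRYTALE"],
    "Price": [15.0, 12.5, 16.0, 30.0],
})

# One 0/1 column per distinct variety (columns are sorted by category name);
# dtype=int keeps the encoded values numeric rather than boolean
X = pd.get_dummies(df["Variety"], dtype=int)
y = df["Price"]

print(X)
```

Each row has exactly one 1, in the column of its variety. Scikit-learn's `OneHotEncoder` performs the same transformation and can be placed inside a pipeline.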

[![ML for beginners - Categorical Feature Predictions with Linear Regression](https://img.youtube.com/vi/DYGliioIAE0/0.jpg)](https://youtu.be/DYGliioIAE0 "ML for beginners - Categorical Feature Predictions with Linear Regression")

> 🎥 Click the image above for a short video overview of using categorical features.

Here you can see how average price depends on variety:

<img alt="Average price by variety" src="images/price-by-variety.png" width="50%" />
19 changes: 19 additions & 0 deletions 2-Regression/4-Logistic/README.md
@@ -16,6 +16,7 @@ In this lesson, you will learn:
- Techniques for logistic regression

✅ Deepen your understanding of working with this type of regression in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-classification-models?WT.mc_id=academic-77952-leestott)

## Prerequisite

Having worked with the pumpkin data, we are now familiar enough with it to realize that there's one binary category that we can work with: `Color`.
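
A minimal sketch of turning `Color` into a binary 'White' vs. 'Not White' label, assuming the column holds strings such as `'WHITE'` and `'ORANGE'`. The values below are hypothetical; the real dataset may use other spellings or contain missing values that need handling first.

```python
import pandas as pd

# Hypothetical Color values; real data may differ
colors = pd.Series(["ORANGE", "WHITE", "ORANGE", "STRIPED", "WHITE"])

# 1 = White, 0 = Not White
is_white = (colors.str.upper() == "WHITE").astype(int)

print(is_white.tolist())  # → [0, 1, 0, 0, 1]
```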
@@ -34,12 +35,17 @@ For our purposes, we will express this as a binary: 'White' or 'Not White'. Ther

Logistic regression differs from linear regression, which you learned about previously, in a few important ways.

[![ML for beginners - Logistic Regression for classification of data](https://img.youtube.com/vi/MmZS2otPrQ8/0.jpg)](https://youtu.be/MmZS2otPrQ8 "ML for beginners - Logistic Regression for classification of data")

> 🎥 Click the image above for a short video overview of logistic regression.

### Binary classification

Logistic regression does not offer the same features as linear regression. The former predicts a binary category ("orange or not orange"), whereas the latter can predict continuous values: for example, given the origin of a pumpkin and the time of harvest, _how much its price will rise_.

![Pumpkin classification Model](./images/pumpkin-classifier.png)
> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
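
The binary output comes from the logistic (sigmoid) function: logistic regression still computes a linear score, then squashes it into a probability between 0 and 1, and a class is picked by thresholding that probability (commonly at 0.5). This is a minimal NumPy sketch with made-up weights, not a trained model:

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical weight and bias for a single feature (illustrative values only)
w, b = 2.0, -1.0

features = np.array([-2.0, 0.5, 3.0])
scores = w * features + b                    # linear part, as in linear regression
probabilities = sigmoid(scores)              # probability of the positive class
labels = (probabilities >= 0.5).astype(int)  # binary decision at a 0.5 threshold

print(probabilities.round(3))
print(labels)  # → [0 1 1]
```

A score of 0 maps to a probability of exactly 0.5, so the decision boundary sits where the linear part equals zero.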
### Other classifications

There are other types of logistic regression, including multinomial and ordinal:
@@ -57,6 +63,10 @@ Remember how linear regression worked better with more correlated variables? Log

Logistic regression will give more accurate results if you use more data; our small dataset is not optimal for this task, so keep that in mind.

[![ML for beginners - Data Analysis and Preparation for Logistic Regression](https://img.youtube.com/vi/B2X4H9vcXTs/0.jpg)](https://youtu.be/B2X4H9vcXTs "ML for beginners - Data Analysis and Preparation for Logistic Regression")

> 🎥 Click the image above for a short video overview of preparing data for logistic regression.

✅ Think about the types of data that would lend themselves well to logistic regression

## Exercise - tidy the data
@@ -215,6 +225,10 @@ You can visualize variables side-by-side with Seaborn plots.

Building a model to find these binary classifications is surprisingly straightforward in Scikit-learn.

[![ML for beginners - Logistic Regression for classification of data](https://img.youtube.com/vi/MmZS2otPrQ8/0.jpg)](https://youtu.be/MmZS2otPrQ8 "ML for beginners - Logistic Regression for classification of data")

> 🎥 Click the image above for a short video overview of building a logistic regression model.

1. Select the variables you want to use in your classification model and split them into training and test sets by calling `train_test_split()`:

```python
@@ -327,6 +341,10 @@ Let's revisit the terms we saw earlier with the help of the confusion matrix's m

## Visualize the ROC curve of this model

[![ML for beginners - Analyzing Logistic Regression Performance with ROC Curves](https://img.youtube.com/vi/GApO575jTA0/0.jpg)](https://youtu.be/GApO575jTA0 "ML for beginners - Analyzing Logistic Regression Performance with ROC Curves")

> 🎥 Click the image above for a short video overview of ROC curves.

Let's do one more visualization to see the so-called 'ROC' curve:

```python
@@ -346,6 +364,7 @@ plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()
```

Using Matplotlib, plot the model's [Receiver Operating Characteristic](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html?highlight=roc) or ROC. ROC curves are often used to get a view of the output of a classifier in terms of its true vs. false positives. "ROC curves typically feature true positive rate on the Y axis, and false positive rate on the X axis." Thus, the steepness of the curve and the space between the midpoint line and the curve matter: you want a curve that quickly heads up and over the line. In our case, there are false positives to start with, and then the line heads up and over properly:

![ROC](./images/ROC_2.png)
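
To see the numbers a ROC plot is built from, here is a minimal sketch using Scikit-learn's `roc_curve` and `roc_auc_score` on four hypothetical labels and scores. In practice, the scores usually come from a fitted classifier's `predict_proba` output for the positive class.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical ground-truth labels (1 = positive class) and predicted probabilities
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Each threshold yields one (FPR, TPR) point; together they trace the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
auc = roc_auc_score(y_true, y_scores)

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc)  # → 0.75
```

An area under the curve (AUC) of 0.5 corresponds to the diagonal midpoint line (random guessing), and 1.0 to a perfect classifier.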