diff --git a/2-Regression/3-Linear/README.md b/2-Regression/3-Linear/README.md
index d213a7b436..e90295aa1a 100644
--- a/2-Regression/3-Linear/README.md
+++ b/2-Regression/3-Linear/README.md
@@ -13,6 +13,10 @@ Now you are ready to dive deeper into regression for ML. While visualization all
In this lesson, you will learn more about two types of regression: _basic linear regression_ and _polynomial regression_, along with some of the math underlying these techniques. Those models will allow us to predict pumpkin prices depending on different input data.

+[![ML for beginners - Understanding Linear Regression](https://img.youtube.com/vi/CRxFT8oTDMg/0.jpg)](https://youtu.be/CRxFT8oTDMg "ML for beginners - Understanding Linear Regression")
+
+> 🎥 Click the image above for a short video overview of linear regression.
+
> Throughout this curriculum, we assume minimal knowledge of math, and seek to make it accessible for students coming from other fields, so watch for notes, 🧮 callouts, diagrams, and other learning tools to aid in comprehension.

### Prerequisite
@@ -95,6 +99,10 @@ Now that you have an understanding of the math behind linear regression, let's c
## Looking for Correlation

+[![ML for beginners - Looking for Correlation: The Key to Linear Regression](https://img.youtube.com/vi/uoRq-lW2eQo/0.jpg)](https://youtu.be/uoRq-lW2eQo "ML for beginners - Looking for Correlation: The Key to Linear Regression")
+
+> 🎥 Click the image above for a short video overview of correlation.
+
From the previous lesson you have probably seen that the average price for different months looks like this:

Average price by month
@@ -151,6 +159,10 @@ Another approach would be to fill those empty values with mean values from the c
## Simple Linear Regression

+[![ML for beginners - Linear and Polynomial Regression using Scikit-learn](https://img.youtube.com/vi/e4c_UP2fSjg/0.jpg)](https://youtu.be/e4c_UP2fSjg "ML for beginners - Linear and Polynomial Regression using Scikit-learn")
+
+> 🎥 Click the image above for a short video overview of linear and polynomial regression.
+
To train our Linear Regression model, we will use the **Scikit-learn** library.

```python
@@ -209,7 +221,6 @@ plt.plot(X_test,pred)

Linear regression
-
## Polynomial Regression

Another type of Linear Regression is Polynomial Regression. While sometimes there's a linear relationship between variables - the bigger the pumpkin's volume, the higher the price - sometimes these relationships can't be plotted as a plane or straight line.

@@ -236,7 +247,7 @@ pipeline.fit(X_train,y_train)

Using `PolynomialFeatures(2)` means that we will include all second-degree polynomials from the input data. In our case it will just mean `DayOfYear`<sup>2</sup>, but given two input variables X and Y, this will add X<sup>2</sup>, XY and Y<sup>2</sup>. We may also use higher-degree polynomials if we want.

Pipelines can be used in the same manner as the original `LinearRegression` object, i.e. we can `fit` the pipeline, and then use `predict` to get the prediction results. Here is the graph showing the test data and the approximation curve:
-
+Polynomial regression

Using Polynomial Regression, we can get a slightly lower MSE and a higher coefficient of determination, but not significantly. We need to take into account other features!
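As a minimal, self-contained sketch of the pipeline approach just described (synthetic data stands in for the lesson's `DayOfYear` and `Price` columns, so the numbers here are illustrative only):

```python
# Sketch: polynomial regression via a Scikit-learn pipeline.
# Synthetic data replaces the lesson's pumpkin dataframe.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
day_of_year = rng.integers(1, 366, size=200).reshape(-1, 1)
price = (30 - 0.12 * day_of_year.ravel()
         + 0.0003 * day_of_year.ravel() ** 2
         + rng.normal(0, 1, 200))

X_train, X_test, y_train, y_test = train_test_split(
    day_of_year, price, test_size=0.2, random_state=0)

# PolynomialFeatures(2) expands the single input column into [1, x, x^2];
# LinearRegression then fits coefficients in that expanded feature space.
pipeline = make_pipeline(PolynomialFeatures(2), LinearRegression())
pipeline.fit(X_train, y_train)

pred = pipeline.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("R2: ", pipeline.score(X_test, y_test))
```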
@@ -249,6 +260,10 @@ Using Polynomial Regression, we can get slightly lower MSE and higher determinat
In the ideal world, we want to be able to predict prices for different pumpkin varieties using the same model. However, the `Variety` column is somewhat different from columns like `Month`, because it contains non-numeric values. Such columns are called **categorical**.

+[![ML for beginners - Categorical Feature Predictions with Linear Regression](https://img.youtube.com/vi/DYGliioIAE0/0.jpg)](https://youtu.be/DYGliioIAE0 "ML for beginners - Categorical Feature Predictions with Linear Regression")
+
+> 🎥 Click the image above for a short video overview of using categorical features.
+
Here you can see how average price depends on variety:

Average price by variety
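The diff does not show how the lesson ultimately encodes `Variety`, but a common way to feed such a categorical column to a linear model is one-hot encoding. A minimal sketch with made-up rows (the variety names and prices below are invented, not taken from the pumpkin dataset):

```python
# Sketch: one-hot encode a categorical `Variety` column so a linear model
# can use it. The rows below are made up for illustration.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "Variety": ["PIE TYPE", "PIE TYPE", "FAIRYTALE", "MINIATURE", "FAIRYTALE"],
    "Price": [10.5, 11.0, 15.0, 9.0, 14.5],
})

# get_dummies replaces the text column with one 0/1 indicator column per variety.
X = pd.get_dummies(df["Variety"], dtype=int)
y = df["Price"]

model = LinearRegression().fit(X, y)
print(X.columns.tolist())
print(model.predict(X))
```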
diff --git a/2-Regression/4-Logistic/README.md b/2-Regression/4-Logistic/README.md
index b3149b5161..43ca6aef3a 100644
--- a/2-Regression/4-Logistic/README.md
+++ b/2-Regression/4-Logistic/README.md
@@ -16,6 +16,7 @@ In this lesson, you will learn:
- Techniques for logistic regression

✅ Deepen your understanding of working with this type of regression in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-classification-models?WT.mc_id=academic-77952-leestott)
+
## Prerequisite

Having worked with the pumpkin data, we are now familiar enough with it to realize that there's one binary category that we can work with: `Color`.
@@ -34,12 +35,17 @@ For our purposes, we will express this as a binary: 'White' or 'Not White'. Ther
Logistic regression differs from linear regression, which you learned about previously, in a few important ways.

+[![ML for beginners - Logistic Regression for classification of data](https://img.youtube.com/vi/MmZS2otPrQ8/0.jpg)](https://youtu.be/MmZS2otPrQ8 "ML for beginners - Logistic Regression for classification of data")
+
+> 🎥 Click the image above for a short video overview of logistic regression.
+
### Binary classification

Logistic regression does not offer the same features as linear regression. The former offers a prediction about a binary category ("orange or not orange") whereas the latter is capable of predicting continuous values, for example, given the origin of a pumpkin and the time of harvest, _how much its price will rise_.

![Pumpkin classification Model](./images/pumpkin-classifier.png)
> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
### Other classifications

There are other types of logistic regression, including multinomial and ordinal:

@@ -57,6 +63,10 @@ Remember how linear regression worked better with more correlated variables? Log
Logistic regression will give more accurate results if you use more data; our small dataset is not optimal for this task, so keep that in mind.

+[![ML for beginners - Data Analysis and Preparation for Logistic Regression](https://img.youtube.com/vi/B2X4H9vcXTs/0.jpg)](https://youtu.be/B2X4H9vcXTs "ML for beginners - Data Analysis and Preparation for Logistic Regression")
+
+> 🎥 Click the image above for a short video overview of preparing data for logistic regression.
+
✅ Think about the types of data that would lend themselves well to logistic regression

## Exercise - tidy the data
@@ -215,6 +225,10 @@ You can visualize variables side-by-side with Seaborn plots.
Building a model to find this binary classification is surprisingly straightforward in Scikit-learn.

+[![ML for beginners - Logistic Regression for classification of data](https://img.youtube.com/vi/MmZS2otPrQ8/0.jpg)](https://youtu.be/MmZS2otPrQ8 "ML for beginners - Logistic Regression for classification of data")
+
+> 🎥 Click the image above for a short video overview of building a logistic regression model.
+
1. Select the variables you want to use in your classification model and split the data into training and test sets by calling `train_test_split()`:

```python
@@ -327,6 +341,10 @@ Let's revisit the terms we saw earlier with the help of the confusion matrix's m
## Visualize the ROC curve of this model

+[![ML for beginners - Analyzing Logistic Regression Performance with ROC Curves](https://img.youtube.com/vi/GApO575jTA0/0.jpg)](https://youtu.be/GApO575jTA0 "ML for beginners - Analyzing Logistic Regression Performance with ROC Curves")
+
+> 🎥 Click the image above for a short video overview of ROC curves.
+
Let's do one more visualization to see the so-called 'ROC' curve:

```python
@@ -346,6 +364,7 @@ plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()
```
+
Using Matplotlib, plot the model's [Receiver Operating Characteristic](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html?highlight=roc) or ROC. ROC curves are often used to get a view of the output of a classifier in terms of its true vs. false positives. "ROC curves typically feature true positive rate on the Y axis, and false positive rate on the X axis." Thus, the steepness of the curve and the space between the midpoint line and the curve matter: you want a curve that quickly heads up and over the line. In our case, there are false positives to start with, and then the line heads up and over properly:

![ROC](./images/ROC_2.png)
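For reference, a minimal sketch of how the numbers behind such a ROC curve, and the area under it, can be computed with Scikit-learn. It uses synthetic data rather than the lesson's pumpkin dataset, so treat it as an illustration of the API, not of the lesson's results:

```python
# Sketch: ROC curve points and AUC for a logistic classifier,
# using synthetic data instead of the pumpkin dataframe.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# roc_curve needs probability scores for the positive class, not hard labels.
y_scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

print("AUC:", roc_auc_score(y_test, y_scores))
print("First few (FPR, TPR) points:", list(zip(fpr[:3], tpr[:3])))
```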