# DS106 Machine Learning : Lesson Three Companion Notebook

### Table of Contents <a class="anchor" id="DS106L3_toc"></a>

* [Table of Contents](#DS106L3_toc)
    * [Page 1 - Introduction](#DS106L3_page_1)
    * [Page 2 - Quadratic Relationships](#DS106L3_page_2)
    * [Page 3 - Quadratic Modeling in R](#DS106L3_page_3)
    * [Page 4 - Exponential Relationships](#DS106L3_page_4)
    * [Page 5 - Exponential Modeling in R](#DS106L3_page_5)
    * [Page 6 - Key Terms](#DS106L3_page_6)
    * [Page 7 - Lesson 3 Hands-On](#DS106L3_page_7)
    

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 1 - Overview of this Module<a class="anchor" id="DS106L3_page_1"></a>

[Back to Top](#DS106L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

from IPython.display import VimeoVideo
# Tutorial Video Name: Non-Linear Modeling
VimeoVideo('246121345', width=720, height=480)

# Introduction

Previously, you looked at linear models with one predictor and logistic models with one predictor. In this lesson, you will start to look at models that are neither linear nor logistic! Many non-linear models exist, but you will only look at quadratic and exponential modeling. You will also add on to your work in linear and logistic regression by adding additional predictors (IVs). By the end of this lesson, you should be able to:

* Recognize quadratic and exponential relationships by their shape
* Conduct quadratic modeling in R
* Conduct exponential modeling in R

This lesson will culminate with a hands-on in which you test data to determine its shape and then run the appropriate non-linear model.

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 2 - Quadratic Relationships<a class="anchor" id="DS106L3_page_2"></a>

[Back to Top](#DS106L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Quadratic Relationships

There are many real-world situations that can be modeled with a quadratic equation. Any time a ball is thrown, or an arrow or other projectile is shot, the path of the object takes on the shape of a parabola, or U-shape. The U-shape can be right side up (looking like a smiley mouth) or upside down (looking like a frowny mouth). It may be a full U, or it may only be a partial U. Any parabola can be modeled with an equation of the form y = ax<sup>2</sup> + bx + c, where a is not zero.
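As a small illustration of the equation above, the sketch below evaluates a parabola and locates its turning point. The coefficients are made up for the example and do not come from any lesson dataset.

```{r}
# Hypothetical coefficients, chosen only for illustration
a <- -2
b <- 8
c0 <- 1

# y = a*x^2 + b*x + c
quadratic <- function(x) a * x^2 + b * x + c0

# The turning point (vertex) of a parabola sits at x = -b / (2a)
vertex_x <- -b / (2 * a)
vertex_y <- quadratic(vertex_x)

# The sign of a decides the direction: a > 0 opens upward (smiley),
# a < 0 opens downward (frowny)
opens <- if (a > 0) "upward" else "downward"
```

With these values the vertex is the highest point of the curve, because a is negative.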

Here are a few examples of some quadratic relationships:

* Some chemical reactions that will progress based on the square of the concentration of the reagents.
* The ideal model for profit vs. price in economics. 
* The stopping distance of a car. 

Below is the general shape that a quadratic relationship will take in the data:

![A graph showing the general shape of a quadratic relationship. The x axis of the graph is labeled age and runs from one to six. The y axis is labeled length and runs from forty to two hundred. Data points are plotted on the graph. A blue line starts in the bottom left corner and curves upward toward the upper right corner. A gray band surrounds the line and is thicker at each end of the line.](Media/nonlinear1.png)

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 3 - Quadratic Modeling in R<a class="anchor" id="DS106L3_page_3"></a>

[Back to Top](#DS106L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Quadratic Modeling in R

Now that you're in good *shape* with understanding what quadratic data looks like, you'll learn how to model with quadratic data in R.

---

## Load Libraries

All you will need to complete a quadratic model in R is ```ggplot2```, so that you can graph the shape of the data.

```{r}
library("ggplot2")
```

---

## Read in Data

Researchers conducted a study of bluegill fish. They had been tagging fish for years, and were interested in their growth. The data file can be found **[here](https://repo.exeterlms.com/documents/V2/DataScience/Modeling-Optimization/bluegill_fish.zip)**. 

---

## Question Setup

The question you will be answering is: ```Does the age of the bluegill fish influence their length?```

---

## Graph a Quadratic Relationship

If you were unsure whether you had a quadratic relationship in your data, you would want to graph the data against a best-fit quadratic curve to see whether it really is quadratic in nature.  You can do that in good ol' ```ggplot```! 

```{r}
# Scatterplot of length by age with a fitted quadratic curve on top
quadPlot <- ggplot(bluegill_fish, aes(x = age, y = length)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1)
quadPlot
```

You will use ```bluegill_fish``` as your dataset, specify ```age``` as your ```x=``` variable, and specify ```length``` as your ```y=``` variable.  Then you can add dots with ```geom_point()```, and add a best-fit curve with ```stat_smooth()```.  As arguments, you will add ```method="lm"```, then write out the quadratic model formula, which is ```y ~ x + I(x^2)```. The ```I()``` wrapper tells R to treat ```^``` as ordinary squaring rather than as formula syntax.

Here is the end result:

![A graph showing the quadratic fit to the bluegill data. The x axis of the graph is labeled age and runs from one to six. The y axis is labeled length and runs from forty to two hundred. Data points are plotted on the graph. A blue line starts in the bottom left corner and curves upward toward the upper right corner. A gray band surrounds the line and is thicker at each end of the line.](Media/nonlinear1.png)

Looks like a quadratic curve is a pretty good fit for the data!
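To see why the ```I()``` wrapper matters in the formula, here is a minimal sketch using simulated data (the numbers are invented, not the bluegill data). Inside a formula, ```x^2``` on its own is read as formula syntax and collapses back to ```x```, so the squared term silently disappears.

```{r}
set.seed(1)

# Simulated curved data, for illustration only
x <- seq(1, 6, by = 0.25)
y <- 3 + 50 * x - 4 * x^2 + rnorm(length(x), sd = 5)

# Without I(): ^ is formula syntax, so this is the same as y ~ x
without_I <- lm(y ~ x + x^2)

# With I(): ^ is ordinary arithmetic, so a squared term is really fitted
with_I <- lm(y ~ x + I(x^2))

# Count the fitted coefficients in each model
n_without <- length(coef(without_I))
n_with <- length(coef(with_I))
```

The first model has only an intercept and a slope, while the second also carries the ```I(x^2)``` term.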

---

## Model the Quadratic Relationship

Now that you are sure you have a quadratic relationship, you can go ahead and model it! First, however, you will need to square the x term.  In this example, your x is ```age```.  Simply square it like this and save it as its own variable, ```Agesq```:

```{r}
Agesq <- bluegill_fish$age^2
```

Then you're ready to dust off that favorite tool of yours, ```lm()```.  This time, however, you'll specify a slightly more sophisticated model so that you can make it quadratic in nature! You'll model the y, which is ```length```, by the x, which is ```age```, and then add in the ```Agesq``` variable that you created above.

```{r}
quadModel <- lm(bluegill_fish$length~bluegill_fish$age+Agesq)
summary(quadModel)
```

And here is the result you get from the ```summary()``` function: 

```text
Call:
lm(formula = bluegill_fish$length ~ bluegill_fish$age + Agesq)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.6170  -5.7699  -0.6662   5.6881  18.1085 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         2.4242     9.5976   0.253    0.801    
bluegill_fish$age  50.4923     5.2141   9.684 7.53e-15 ***
Agesq              -3.6511     0.6951  -5.253 1.36e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.911 on 75 degrees of freedom
Multiple R-squared:  0.8954,	Adjusted R-squared:  0.8926 
F-statistic: 320.9 on 2 and 75 DF,  p-value: < 2.2e-16
```

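Looking at the overall ```F-statistic``` shown on the bottom and its associated ```p-value```, this quadratic model is significant! The ```Agesq``` coefficient is also significant on its own (p = 1.36e-06), which means the squared term adds to the fit beyond a straight line. Together, these show that age is a significant quadratic predictor of bluegill fish length, with the model explaining about 90% of the variation in length (```Multiple R-squared``` of 0.8954). 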
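Once a quadratic model is fit, you may also want predictions from it. The sketch below is one possible way to do that, not the lesson's own code: it uses simulated data as a stand-in for ```bluegill_fish```, writes the model with a ```data=``` argument and ```I(age^2)``` instead of a separate ```Agesq``` variable (which makes ```predict()``` easier to use), and compares the fit against a plain straight line. The data frame ```fish``` and the ages being predicted are assumptions made for the example.

```{r}
set.seed(42)

# Simulated stand-in for bluegill_fish; column names mirror the lesson
fish <- data.frame(age = rep(1:6, each = 10))
fish$length <- 2 + 50 * fish$age - 3.6 * fish$age^2 +
  rnorm(nrow(fish), sd = 8)

# Quadratic model and a straight-line model for comparison
quadFit <- lm(length ~ age + I(age^2), data = fish)
linFit <- lm(length ~ age, data = fish)

# Predicted length at two hypothetical ages
newAges <- data.frame(age = c(2, 4))
predicted <- predict(quadFit, newdata = newAges)

# F test for nested models: does the squared term improve on a line?
comparison <- anova(linFit, quadFit)
comparison
```

A small p-value in the second row of the ```anova()``` table suggests the squared term is worth keeping.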

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>If you would like to learn about exponential regression in Python, <a href="https://www.youtube.com/watch?v=ro5ftxuD6is"> click here.</a> If you would like to learn about exponential regression in Google Sheets, <a href="https://www.youtube.com/watch?v=30yEVjbeq0o"> click here! </a></p>
    </div>
</div>

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 4 - Exponential Relationships<a class="anchor" id="DS106L3_page_4"></a>

[Back to Top](#DS106L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Exponential Relationships

There are natural phenomena that will either increase or decrease exponentially. In today's vernacular, the word "viral" loosely stands in for the word exponential. If your Tweet goes "viral," it might simply mean that you told 4 followers, and each of those 4 followers retweeted it to their own 4 followers, and so on. Before you know it, the tweet has been retweeted tens or even hundreds of thousands of times.

Exponential changes can either be growth or decay. Through the magic of compound interest, your investment account can grow exponentially. On the other hand, radioactive materials typically decay exponentially.
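A quick sketch of the arithmetic behind both examples follows. The number of sharing rounds, the starting balance, and the interest rate are all made-up values chosen for illustration.

```{r}
# Viral tweet: each share reaches 4 new followers who share again
followers_per_share <- 4
generation <- 0:9
retweets <- followers_per_share^generation

# Compound interest: a hypothetical 1000 growing at 5% per year
principal <- 1000
rate <- 0.05
years <- 0:10
balance <- principal * (1 + rate)^years

# Shares in the final two rounds, and the balance after 10 years
tail(retweets, 2)
round(balance[length(balance)], 2)
```

By the eighth and ninth rounds of sharing, the counts are already in the tens and hundreds of thousands, which is what makes exponential growth feel so sudden.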

Graphed in statistics, an exponential relationship will usually look something like this:

![A path through a forest.](Media/nonlinear4.jpg)

---

## Decibel Scale Example

There are some common things with which you are probably familiar that are exponential. For example, sound level is measured on the decibel (dB) scale, which is logarithmic: every equal step up the scale corresponds to an exponential increase in the sound itself. In fact, the scale is exponential in both intensity and perceived loudness. For instance, a sound at 40 dB would be quiet talking, whereas a sound at 50 dB (louder conversation) would be 10 times as intense and twice as loud.

A change of 10 dB is not that big of a deal, but a change of 40 dB (for instance) is a pretty big change. Again starting at 40 dB, a change to 80 dB (loud highway noise at close range) is a change in intensity of 10,000x, and a change in loudness of 16x. An 80 dB sound is much more than just twice the intensity or loudness of a 40 dB sound. Take a look:

![The decibel scale, showing sound levels in decibels, from zero to one hundred and ninety. Various audible situations are listed on the left of some of the sound levels. Normal breathing, ten. A whisper at two meters, twenty. A quote silent unquote library, thirty, and so on, up to fireworks at one meter, one hundred fifty. To the right of the scale are how these situations will sound to a person, ranging from faint at thirty decibels to intolerable at one hundred fifty decibels to loudest possible true sound at one hundred ninety decibels.](Media/L03-10.png)
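The decibel figures quoted in this section can be reproduced with a couple of one-line functions. This is only a sketch of the rule of thumb described above (10 times the intensity and twice the loudness for every 10 dB); the function names are invented for the example.

```{r}
# Intensity multiplies by 10 for every 10 dB increase
intensity_ratio <- function(db_change) 10^(db_change / 10)

# Perceived loudness roughly doubles for every 10 dB increase
loudness_ratio <- function(db_change) 2^(db_change / 10)

# 40 dB -> 50 dB
intensity_ratio(10)
loudness_ratio(10)

# 40 dB -> 80 dB
intensity_ratio(40)
loudness_ratio(40)
```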

---

## Richter Scale Example

Another common measurement that is also exponential is the Richter scale, which measures the magnitude of an earthquake. The scale goes from 1 to 9, but each increase of 1 on the Richter scale translates to an earthquake with a shaking amplitude that is 10 times higher and an energy release that is about 31.6 times as high. A magnitude 5 earthquake is usually felt by those at the epicenter, but the damage is usually minimal unless the buildings are poorly constructed. These earthquakes rarely get reported unless they are felt in heavily populated areas. On average, there are 3 to 5 of them every day. On the other hand, a magnitude 6 earthquake can usually be felt up to a couple hundred miles from the epicenter, and damage will vary depending on the quality of the construction at the epicenter. However, they still happen at least a couple of times a week. 

An earthquake that measures 7 on the Richter scale is considered to be a major quake. Buildings at the center will suffer anything from major damage to complete collapse, and buildings as much as 150 miles away will have some damage. These occur 1 - 2 times per month. At 8, an earthquake causes major damage or total destruction to even the sturdiest structures at the epicenter, and damage will be widespread. These can be felt several hundred miles away. You get about one of these each year. An earthquake that measures 9 or more on the Richter scale will happen once every 10 to 20 years, usually causes total destruction at the epicenter, and can cause permanent changes in the local topography. The most recent earthquake of magnitude 9 or higher was in Japan in 2011. Prior to that was the earthquake in Sumatra on the day after Christmas in 2004 (9.1 on the Richter scale); the tsunami that followed killed nearly a quarter of a million people. Prior to these two, the last earthquake of magnitude 9 or higher was way back in the 1960s.

![A road that has been cracked and broken during a strong earthquake.](Media/L03-11.png)
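The scaling factors quoted for the Richter scale can be checked the same way. The sketch below assumes the standard relationship in which amplitude scales as 10 raised to the magnitude difference and energy as 10 raised to 1.5 times the magnitude difference (which is where the 31.6 comes from); the function names are invented for the example.

```{r}
# Shaking amplitude multiplies by 10 per whole magnitude step
amplitude_ratio <- function(mag_diff) 10^mag_diff

# Energy released multiplies by 10^1.5 (about 31.6) per whole step
energy_ratio <- function(mag_diff) 10^(1.5 * mag_diff)

# Magnitude 6 compared with magnitude 5
amplitude_ratio(1)
round(energy_ratio(1), 1)

# Magnitude 7 compared with magnitude 5
amplitude_ratio(2)
energy_ratio(2)
```

Two steps up the scale means 100 times the shaking amplitude and roughly 1,000 times the energy.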

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 5 - Exponential Modeling in R<a class="anchor" id="DS106L3_page_5"></a>

[Back to Top](#DS106L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Exponential Modeling in R

Now that you have an idea of what to expect in an exponential model, you will try one in R! 

---

## Load Libraries

Believe it or not, you won't need any additional libraries outside of what's included in base R.

---

## Read in Data

A certain strain of bacteria was grown in a controlled environment. Most organisms will grow exponentially until something else starts to inhibit that growth - whether it be predators or limited food. The exponential growth can be modeled using a regression equation. The bacteria count was recorded for evenly spaced time periods, and **[the data are shown here](https://repo.exeterlms.com/documents/V2/DataScience/Modeling-Optimization/bacteria.zip)**.

---

## Question Setup

You are trying to answer the question of how much the bacteria grow over time. You will examine the change in ```Count```, your y variable, over time ```Period```, your x variable.

---

## Exponential Modeling

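As with quadratic modeling, you will start by using the ```lm()``` function.  However, you will need to take the log of the y variable using the ```log()``` function. Taking the natural log turns an exponential curve of the form y = ae<sup>bx</sup> into a straight line, log(y) = log(a) + bx, which ```lm()``` can fit: 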

```{r}
exMod <- lm(log(bacteria$Count)~bacteria$Period)
summary(exMod)
```

Calling a summary on this model results in this:

```text
Call:
lm(formula = log(bacteria$Count) ~ bacteria$Period)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.106956 -0.038992  0.002216  0.025141  0.076005 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        2.703652   0.024358  111.00   <2e-16 ***
bacteria$Period    0.164782   0.002647   62.25   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05106 on 16 degrees of freedom
Multiple R-squared:  0.9959,	Adjusted R-squared:  0.9956 
F-statistic:  3875 on 1 and 16 DF,  p-value: < 2.2e-16
```

By looking at the bottom ```F-statistic``` and associated ```p-value```, you see that this model is significant! That means that this particular bacteria does demonstrate exponential growth over time.  Looking at the ```Estimate``` column, the slope for ```Period``` is on the log scale, so you exponentiate it to interpret it: exp(0.1648) is about 1.18, which means that for every one additional time period, the bacteria count has increased by about 18%! 
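Because the model was fit to ```log(Count)```, its estimates live on the log scale, and exponentiating them brings them back to the scale of the counts. The sketch below does this with the two estimates copied from the ```summary()``` output above; the variable and function names are invented for the example.

```{r}
# Estimates copied from the summary() output above
intercept <- 2.703652
slope <- 0.164782

# Multiplicative change in Count for each additional Period
growth_factor <- exp(slope)
percent_increase <- (growth_factor - 1) * 100

# Modeled Count when Period is 0
starting_count <- exp(intercept)

# Number of periods needed for the modeled Count to double
doubling_time <- log(2) / slope

# Modeled Count on the original scale for any period
predict_count <- function(period) starting_count * growth_factor^period

round(c(growth_factor = growth_factor,
        percent_increase = percent_increase,
        starting_count = starting_count,
        doubling_time = doubling_time), 2)
```

With a fitted model object such as ```exMod```, the same back-transformation is ```exp(coef(exMod))```.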

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>If you would like to learn about exponential regression in Python, <a href="https://plot.ly/python/exponential-fits/"> check out Plotly. </a></p>
    </div>
</div>

---

## Summary

* Quadratic regression can be used to model data that shows a curved, parabola-shaped (non-linear) relationship.
* Exponential regression can be used to model phenomena that exhibit exponential growth or exponential decay.

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 6 - Key Terms<a class="anchor" id="DS106L3_page_6"></a>

[Back to Top](#DS106L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Key Terms

Below is a list and short description of the important keywords learned in this lesson. Please read through and go back and review any concepts you do not fully understand. Great Work!

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Quadratic Relationship</td>
        <td>A parabola, or U-shaped curve, in the data.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Exponential Relationship</td>
        <td>A curve that rises or falls at a changing rate: growth speeds up as it goes, while decay slows down as it goes.</td>
    </tr>
</table>

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 7 - Lesson 3 Hands-On<a class="anchor" id="DS106L3_page_7"></a>

[Back to Top](#DS106L3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">




# Nonlinear Regression Hands-On

This Hands-On **will** be graded. The best way to become a data scientist is to practice!

<div class="panel panel-danger">
    <div class="panel-heading">
        <h3 class="panel-title">Caution!</h3>
    </div>
    <div class="panel-body">
        <p>Do not submit your project until you have completed all requirements, as you will not be able to resubmit.</p>
    </div>
</div>

Data from **[the following spreadsheet](https://repo.exeterlms.com/documents/V2/DataScience/Modeling-Optimization/nonlinear.zip)** will be used throughout this Hands-On. You have two sets of X and Y variables here; graph and analyze both and determine what non-linear form they best follow.  These two sets of Xs and Ys might both be exponential relationships or both be quadratic relationships, or there might be one of each. The best way to figure it out is to try to fit both a quadratic function and an exponential function to each pair of variables, and then model each to determine which model is a better fit. 

To complete this Hands-On, you will need to:

1.  Create a scatterplot of the data with the Y variable on the vertical axis, and the X variable on the horizontal axis.
2.  Using eyeball analysis, make a guess about what type of model will work best for the dataset. You can also add the best-fit quadratic curve to help determine whether it's a good fit.
3.  Using the chosen model from step 2, complete the steps to perform the analysis that were listed in the lesson.
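One possible way to compare the two model types on the same pair of variables is sketched below. It uses simulated data (not the Hands-On spreadsheet), and the comparison by residual sum of squares on the original y scale is just one reasonable choice, not the required method. The exponential model's fitted values are back-transformed with ```exp()``` so that both errors are measured in the same units.

```{r}
set.seed(7)

# Simulated stand-in data that truly grows exponentially
x <- 1:20
y <- 5 * exp(0.2 * x) * exp(rnorm(length(x), sd = 0.02))

# Fit both candidate models
quadFit <- lm(y ~ x + I(x^2))
expFit <- lm(log(y) ~ x)

# Residual sum of squares, both measured on the original y scale
rss_quad <- sum((y - fitted(quadFit))^2)
rss_exp <- sum((y - exp(fitted(expFit)))^2)

# The smaller value points to the better-fitting shape
better <- if (rss_exp < rss_quad) "exponential" else "quadratic"
better
```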


<div class="panel panel-danger">
    <div class="panel-heading">
        <h3 class="panel-title">Caution!</h3>
    </div>
    <div class="panel-body">
        <p>Be sure to zip and submit your entire directory when finished!</p>
    </div>
</div>

