## Overview

In the previous objective, we briefly introduced the concept of a linear regression and the coefficients returned by the model. We missed one important part of the process: plotting our results! Let's do that now. 

### Linear Regression Coefficients

Remember that we are fitting a line to two variables: an independent variable (x axis) and a dependent variable (y axis). The equation of this line has the form

<script src="https://i.upmath.me/latex.js"></script>
<p>$$y = \beta_0 + \beta_1x$$</p>
<p>When we fit a line, we’re trying to find the coefficients $\beta_0$ and $\beta_1$. The parameter $\beta_0$ is the intercept (the value of $y$ when $x = 0$) and $\beta_1$ is the slope (the change in $y$ for a one-unit increase in $x$). The scikit-learn estimator determines the values of $\beta_0$ and $\beta_1$ that describe the line that best &quot;fits&quot; the data. How the model actually calculates the best fit is something that we will cover in the upcoming modules.</p>
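
To make the roles of the two coefficients concrete, here is a minimal sketch that evaluates the line for a few x values. The coefficient values `beta_0 = 1.0` and `beta_1 = 2.0` are made up for illustration and do not come from any fitted model.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
beta_0 = 1.0  # intercept: the value of y when x = 0
beta_1 = 2.0  # slope: the change in y for a one-unit increase in x

x = np.array([0.0, 1.0, 2.0, 3.0])
y = beta_0 + beta_1 * x

print(y)  # [1. 3. 5. 7.]
```

The first value of `y` equals the intercept, and each step of one unit in `x` raises `y` by the slope.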


In the following example, we'll fit the same data set as in the previous objective (using the scikit-learn estimator) and then plot the results of our model.

## Follow Along

Using the steps outlined in the previous objective, we'll load our data and fit a linear regression.

In [1]:
# Import pandas, NumPy, and seaborn
import pandas as pd
import numpy as np
import seaborn as sns

# Load the data into a DataFrame
penguins = sns.load_dataset("penguins")

# Drop NaNs
penguins.dropna(inplace=True)

In [2]:
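# Select the feature column (a 1-D pandas Series)
X_penguins = penguins['flipper_length_mm']

# Create the 2-D features matrix (convert to a NumPy array before adding an axis)
X_penguins_2D = X_penguins.to_numpy()[:, np.newaxis]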

# Create the target array
y_penguins = penguins['body_mass_g']

In [3]:
# Import the estimator class
from sklearn.linear_model import LinearRegression

# Instantiate the class (with default parameters)
model = LinearRegression()

# Display the model parameters
model

LinearRegression()

In [4]:
# Display the shape of X_penguins
print('Original features matrix: ', X_penguins.shape)

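# Add a new axis to create a column vector
# (convert the Series to a NumPy array first; recent pandas versions
# do not support multi-dimensional indexing on a Series)
X_penguins_2D = X_penguins.to_numpy()[:, np.newaxis]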
print(X_penguins_2D.shape)

Original features matrix:  (333,)
(333, 1)
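
scikit-learn expects the features as a 2-D array of shape `(n_samples, n_features)`. Adding an axis with `np.newaxis` is one way to get that shape; the sketch below shows two other common options. It uses a small made-up Series as a stand-in for `penguins['flipper_length_mm']`, so the values and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the flipper-length column
flipper = pd.Series([181.0, 186.0, 195.0], name="flipper_length_mm")

# Option 1: convert to a NumPy array, then add an axis
X_a = flipper.to_numpy()[:, np.newaxis]

# Option 2: reshape, letting NumPy infer the number of rows
X_b = flipper.to_numpy().reshape(-1, 1)

# Option 3: convert the Series to a one-column DataFrame
X_c = flipper.to_frame()

print(X_a.shape, X_b.shape, X_c.shape)  # (3, 1) (3, 1) (3, 1)
```

All three produce a single-column, 2-D object that can be passed to `fit`.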


In [5]:
# Fit the model
model.fit(X_penguins_2D, y_penguins)

LinearRegression()

#### Look at the coefficients

As reviewed above, the coefficients describe the slope and intercept. We access them through the following attributes of the fitted model:

In [6]:
# Slope (also called the model coefficient)
print(model.coef_)

# Intercept
print(model.intercept_)

# In equation form
print(f'\nbody_mass_g = {model.coef_[0]} x flipper_length_mm + ({model.intercept_})')

[50.15326594]
-5872.092682842825

body_mass_g = 50.15326594224113 x flipper_length_mm + (-5872.092682842825)
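
As a quick worked example of reading this equation, the sketch below plugs a single flipper length into it. The slope and intercept are copied from the output above; the flipper length of 200 mm is an arbitrary value chosen for illustration.

```python
# Coefficients copied from the fitted model's output
slope = 50.15326594224113
intercept = -5872.092682842825

# Hypothetical flipper length, chosen only for illustration
flipper_length_mm = 200

# Apply the line equation: body_mass_g = slope * flipper_length_mm + intercept
body_mass_g = slope * flipper_length_mm + intercept

print(round(body_mass_g, 1))  # 4158.6
```

In words, the fitted line estimates a body mass of roughly 4159 g for a penguin with 200 mm flippers.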


We now have the coefficients of a line! Let's plot this line along with our data. Even though we used seaborn earlier, we'll keep this plot simple and stick to the basic matplotlib tools. First, we need to generate the points on the line so there is something to plot.

In [7]:
# Generate the line from the model coefficients
x_line = np.linspace(170,240)
y_line = model.coef_*x_line + model.intercept_
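
An alternative to writing out the equation by hand is to let the fitted model compute the line with `predict`. The sketch below uses a small synthetic data set (made up, not the penguin data) so that it runs on its own; `demo_model` and the other `_demo` names are illustrative and not part of the lesson's notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data lying exactly on y = 1 + 2x (illustration only)
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = 1.0 + 2.0 * x_demo

demo_model = LinearRegression()
demo_model.fit(x_demo[:, np.newaxis], y_demo)

# Points at which to draw the line
x_line_demo = np.linspace(0, 5, num=6)

# Manual calculation: slope * x + intercept
y_manual = demo_model.coef_[0] * x_line_demo + demo_model.intercept_

# Same calculation with predict, which expects a 2-D features matrix
y_predicted = demo_model.predict(x_line_demo[:, np.newaxis])

print(np.allclose(y_manual, y_predicted))  # True
```

Both approaches give the same line; `predict` has the advantage that it keeps working unchanged when a model has more than one feature.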

In [8]:
# Import plotting libraries
import matplotlib.pyplot as plt

# Create the figure and axes objects
fig, ax = plt.subplots(1)
ax.scatter(x = X_penguins, y = y_penguins, label="Observed data")
ax.plot(x_line, y_line, color='g', label="linear regression model")
ax.set_xlabel('Penguin flipper length (mm)')
ax.set_ylabel('Penguin weight (g)')
ax.legend()

# Display the figure
plt.show()

![mod1_obj3_penguin_reg_sklearn](https://raw.githubusercontent.com/LambdaSchool/data-science-canvas-images/main/unit_2/sprint_1/mod1_obj3_penguin_reg_sklearn.png)

## Challenge

The original data set contains other physical measurements of the penguins. Pick a different pair of them, perform a linear regression, and then plot the resulting best-fit line.

Follow these suggested steps:

* Load the data set and remove the NaN values.
* Choose two variables to explore and plot them to check the relationship visually.
* Create the feature matrix and target array.
* Import the `LinearRegression` class and instantiate the model.
* Fit the model and then print out the coefficients.
* Plot the model fit along with the data set; does it look like a nice fit to the data?

## Additional Resources

* [Glossary of Common Terms and API Elements](https://scikit-learn.org/stable/glossary.html#general-concepts)
* [sklearn.linear_model.LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html?highlight=linear%20regression#sklearn.linear_model.LinearRegression)