# Linear Regression: A Demonstration

## Getting the Data

In [None]:
import seaborn as sns

In [None]:
penguins = sns.load_dataset('penguins')
penguins.sample(5)

## Cleaning the Data

We'll just eliminate any rows with missing data.

In [None]:
penguins.isna().sum()

In [None]:
penguins_clean = penguins.dropna()

In [None]:
penguins_clean.isna().sum()

## Correlations Everywhere!

Let's examine the continuous variables. A single call to `sns.pairplot` plots each one against every other one!

In [None]:
cont_vars = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']

sns.pairplot(penguins_clean[cont_vars]);
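
The pair plot shows the relationships visually. To put numbers on them, one option is a Pearson correlation matrix via `DataFrame.corr()`. The sketch below uses a small made-up table (the `toy` values are illustrative, not rows from the penguins dataset) so that it runs on its own.

```python
import pandas as pd

# Made-up illustrative values, NOT real penguin measurements.
toy = pd.DataFrame({
    "flipper_length_mm": [180.0, 190.0, 200.0, 210.0, 220.0],
    "body_mass_g": [3500.0, 3800.0, 4300.0, 4700.0, 5400.0],
    "bill_depth_mm": [19.0, 18.5, 17.0, 16.0, 15.5],
})

# Pairwise Pearson correlation between every pair of columns.
corr = toy.corr()
print(corr.round(2))
```

In this notebook the equivalent call would presumably be `penguins_clean[cont_vars].corr()`.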

## Penguin Flipper Length and Penguin Body Mass

Let's see if we can use the first to predict the second!

In [None]:
flipper_length = penguins_clean['flipper_length_mm']
body_mass = penguins_clean['body_mass_g']

In [None]:
sns.regplot(x='flipper_length_mm', y='body_mass_g', data=penguins_clean);

## Exact Calculations

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
lr = LinearRegression()

# scikit-learn expects a 2-D feature array, so reshape the 1-D series into a single column
X = flipper_length.values.reshape(-1, 1)
y = body_mass
lr.fit(X, y)
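
For a single predictor, the least-squares line has a closed form: the slope is the sum of products of the centered x and y values divided by the sum of squared centered x values, and the intercept is mean(y) minus slope times mean(x). The sketch below computes both by hand and compares them with `LinearRegression`. The helper `ols_line` and the toy arrays are hypothetical names and made-up values used only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ols_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    slope = (x_centered * y_centered).sum() / (x_centered ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Made-up illustrative values, NOT real penguin measurements.
toy_flipper = np.array([180.0, 190.0, 200.0, 210.0, 220.0])
toy_mass = np.array([3500.0, 3800.0, 4300.0, 4700.0, 5400.0])

slope, intercept = ols_line(toy_flipper, toy_mass)

# Fit the same data with scikit-learn as a cross-check.
check = LinearRegression().fit(toy_flipper.reshape(-1, 1), toy_mass)
print(slope, check.coef_[0])
print(intercept, check.intercept_)
```

Calling `ols_line(flipper_length, body_mass)` in this notebook should agree with `lr.coef_[0]` and `lr.intercept_` up to floating-point error.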

In [None]:
print(f"""
Our best-fit line is: Body Mass ~ {round(lr.coef_[0], 2)} * Flipper Length + {round(lr.intercept_, 2)}
""")

## Conclusion

Is this best-fit line perfectly accurate? No.
Is this nevertheless a useful model of penguin body mass? Absolutely!
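
One way to put numbers on "useful" is to report R² (the share of variance in body mass that the line explains) and the root-mean-squared error in grams. The sketch below is a minimal version of that check. `fit_quality` is a hypothetical helper, and the toy arrays are made-up values so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

def fit_quality(model, X, y):
    """Return (R^2, RMSE) of a fitted model on the given data."""
    predictions = model.predict(X)
    r2 = r2_score(y, predictions)
    rmse = np.sqrt(mean_squared_error(y, predictions))
    return r2, rmse

# Made-up illustrative values, NOT real penguin measurements.
toy_X = np.array([180.0, 190.0, 200.0, 210.0, 220.0]).reshape(-1, 1)
toy_y = np.array([3500.0, 3800.0, 4300.0, 4700.0, 5400.0])

toy_model = LinearRegression().fit(toy_X, toy_y)
r2, rmse = fit_quality(toy_model, toy_X, toy_y)
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.1f}")
```

In this notebook the call would be `fit_quality(lr, X, y)`. Both numbers are measured on the same data the model was fit to, so they are likely to be somewhat optimistic compared with performance on unseen penguins.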