<img src="../../Pics/MLSb-T.png" width="160">
<br><br>
<center><u><H1>Seaborn-Regression</H1></u></center>

In [None]:
import seaborn as sns
%matplotlib inline
import pandas as pd

## <u>Linear Regression:</u>

In [None]:
# Load the example tips dataset
tips = sns.load_dataset("tips")
tips.head()

#### lmplot and regplot functions:

In [None]:
sns.lmplot(x="total_bill", y="tip", data=tips)

In [None]:
sns.regplot(x=tips["total_bill"], y=tips["tip"], data=tips)

#### The main difference to know about is that regplot() accepts the x and y variables in a variety of formats including simple numpy arrays, pandas Series objects, or as references to variables in a pandas DataFrame object passed to data. In contrast, lmplot() has data as a required parameter and the x and y variables must be specified as strings. 
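
The input formats described above can be compared side by side. The following is a minimal sketch, not part of the original notebook: it builds a small synthetic dataset (the `toy` DataFrame and its random values are assumptions for illustration) so it runs without downloading anything, then calls regplot() with arrays, with Series, and with column names, and finally calls lmplot() with column names and `data`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
bill = rng.uniform(5, 50, size=100)
tip = 0.15 * bill + rng.normal(0, 1, size=100)
toy = pd.DataFrame({"total_bill": bill, "tip": tip})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# regplot() with plain numpy arrays
sns.regplot(x=bill, y=tip, ax=axes[0])
# regplot() with pandas Series
sns.regplot(x=toy["total_bill"], y=toy["tip"], ax=axes[1])
# regplot() with column names resolved against data
sns.regplot(x="total_bill", y="tip", data=toy, ax=axes[2])

# lmplot() takes column names plus data and returns a FacetGrid
grid = sns.lmplot(x="total_bill", y="tip", data=toy)
```

All three regplot() calls should draw the same fit, since only the way the data is passed changes.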

## Multiple Linear Regression:

In [None]:
sns.set(style="ticks", context="talk")

# Plot tip as a function of total bill across days
g = sns.lmplot(x="total_bill", y="tip", hue="day", data=tips,
               palette="rainbow", height=5)

# Use more informative axis labels than are provided by default
g.set_axis_labels("Total bill ($)", "Tip ($)")

## Polynomial Regression:

In [None]:
anscombe = sns.load_dataset("anscombe")
anscombe.head()

In [None]:
# Fit a second-order polynomial to Anscombe dataset II
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
           order=2, ci=None, scatter_kws={"s": 100}, height=4)
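
To see why `order=2` suits this dataset, a linear and a quadratic fit can be compared numerically. This is a rough sketch using `np.polyfit`, not necessarily what seaborn runs internally. The values for Anscombe dataset II are typed in by hand here so the block runs offline; treat them as an assumption and check them against `sns.load_dataset("anscombe")` if in doubt.

```python
import numpy as np

# Anscombe dataset II, hard-coded (assumed to match the seaborn example dataset)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

# Least-squares polynomial fits of degree 1 and degree 2
linear = np.polyfit(x, y, deg=1)
quadratic = np.polyfit(x, y, deg=2)

def sse(coeffs):
    """Sum of squared errors of a polynomial fit on (x, y)."""
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

sse_linear = sse(linear)
sse_quadratic = sse(quadratic)
print("linear SSE:   ", sse_linear)
print("quadratic SSE:", sse_quadratic)
```

The quadratic error should come out far smaller than the linear one, which is the pattern the `order=2` plot shows visually.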

### Residplot: 
residplot() can be a useful tool for checking whether a simple regression model is appropriate for a dataset. It fits and removes a simple linear regression and then plots the residual values for each observation. Ideally, these values should be randomly scattered around y = 0:

In [None]:
sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
              scatter_kws={"s": 80})
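
The "fit and remove" step behind a residual plot can be sketched with numpy alone. This is an illustration of the idea, not seaborn's actual implementation. The Anscombe dataset I values are hard-coded (an assumption made so the block runs offline).

```python
import numpy as np

# Anscombe dataset I, hard-coded (assumed to match the seaborn example dataset)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Fit a simple linear regression: y is approximately intercept + slope * x
slope, intercept = np.polyfit(x, y, deg=1)

# Remove the fitted line; what is left are the residuals
residuals = y - (intercept + slope * x)

print("slope:", slope, "intercept:", intercept)
print("residuals:", np.round(residuals, 2))
```

For a least-squares fit with an intercept, the residuals average to zero, so a well-behaved dataset shows them scattered on both sides of y = 0 with no visible structure.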

## Difference between regplot() and lmplot() on categorical data

While regplot() always shows a single relationship, lmplot() combines regplot() with FacetGrid to provide an easy interface to show a linear regression on “faceted” plots that allow you to explore interactions with up to three additional categorical variables.

In [None]:
sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips, height=4)

In [None]:
sns.lmplot(x="total_bill", y="tip", hue="smoker",
           col="time", row="sex", data=tips, height=4)

## <u>Logistic Regression:</u>

#### In logistic regression the output is a discrete value that can be only 0 or 1.
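
As a rough illustration of how a 0/1 outcome is modelled, the sketch below maps a linear predictor through the logistic (sigmoid) function to a probability between 0 and 1 and then thresholds it at 0.5. The coefficients and ages are made-up values for illustration, not estimates from any dataset.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration
intercept, slope = 0.6, -0.02

ages = np.array([5.0, 25.0, 45.0, 65.0])

# Predicted probability of the outcome being 1 at each age
prob = sigmoid(intercept + slope * ages)

# Turn probabilities into discrete 0/1 predictions
pred = (prob >= 0.5).astype(int)

print(np.round(prob, 3))
print(pred)
```

The curve that a logistic fit draws corresponds to the `prob` values; the discrete outcome appears only after thresholding.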

In [None]:
# Load the example titanic dataset
df = sns.load_dataset("titanic")
df.head()

#### In this case we can code the survived variable as 0 for not survived and 1 for survived, and likewise for sex: male = 0 and female = 1.

In [None]:
sns.set(style="darkgrid")
# Make a custom palette with gendered colors
pal = dict(male="#5882FA", female="#F5A9F2")
# Show the survival probability as a function of age and sex
g = sns.lmplot(x="age", y="survived", col="sex", hue="sex", data=df,
               palette=pal, y_jitter=.02, logistic=True, height=4)
g.set(xlim=(0, 80), ylim=(-.05, 1.05))

#### You can choose palette colors from an <b>HTML color chart</b>. For reference you can visit: http://html-color-codes.info/

## Controlling the size and shape of the plot:

#### Plots made by regplot() and lmplot() look the same, but on axes that have a different size and shape. This is because regplot() is an “axes-level” function that draws onto a specific axes. This means that you can make multi-panel figures yourself and control exactly where the regression plot goes. If no axes is provided, it simply uses the “currently active” axes, which is why the default plot has the same size and shape as most other matplotlib functions. To control the size, you need to create a figure object yourself. Let's see some examples:

In [None]:
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize=(5, 5))
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax)
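
Because regplot() draws onto whichever axes it is given, a multi-panel figure can be assembled by hand. The sketch below uses a small synthetic DataFrame (`toy`, with made-up columns and values, assumed here only so the block runs without the tips download) and places two regression plots side by side.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
toy = pd.DataFrame({"total_bill": rng.uniform(5, 50, size=80)})
toy["tip"] = 0.15 * toy["total_bill"] + rng.normal(0, 1, size=80)
toy["party_size"] = rng.integers(1, 7, size=80)

# Create the figure and both axes ourselves, then hand each axes to regplot()
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
sns.regplot(x="total_bill", y="tip", data=toy, ax=ax_left)
sns.regplot(x="party_size", y="tip", data=toy, x_jitter=0.1, ax=ax_right)

ax_left.set_title("Tip vs. total bill")
ax_right.set_title("Tip vs. party size")
```

The figure size comes from `figsize` in `plt.subplots`, not from any regplot() argument.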

In [None]:
sns.lmplot(x="total_bill", y="tip", col="day", data=tips,
           col_wrap=3, height=3)

## JointPlot:
jointplot() can use regplot() to show the linear regression fit on the joint axes by passing kind="reg".

In [None]:
sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg")

## PairPlot:
Using the pairplot() function with kind="reg" combines regplot() and PairGrid to show the linear relationship between variables in a dataset. Note how this is different from lmplot(). In the figure below, the two axes don’t show the same relationship conditioned on two levels of a third variable; rather, PairGrid() is used to show multiple relationships between different pairings of the variables in a dataset.

In [None]:
sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             height=4, aspect=.8, kind="reg")

### Adding a categorical variable

In [None]:
sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             hue="smoker", height=4, aspect=.8, kind="reg")

## Reference:

http://seaborn.pydata.org/generated/seaborn.regplot.html

https://seaborn.pydata.org/tutorial/regression.html