# Other Models
As you have now seen, linear regression is not the right model for every dataset. Here are a few other common models that might work better for your dataset or question.

### Model Assumptions
Before using any model, check that your data meet its assumptions, and walk through those assumptions explicitly in your analysis.

First we'll load a dataset, and then walk through some examples: 

In [1]:
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd
import seaborn as sns

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
df_x = pd.DataFrame(diabetes.data,
                 columns=diabetes.feature_names)
df_y = pd.DataFrame(diabetes.target,
                 columns=["diabetes"])

# split into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, 
                                                   test_size=0.2, 
                                                   random_state=42)

In [2]:
df_x.head()

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019908 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.06833 | -0.092204 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.005671 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002864 | -0.02593 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022692 | -0.009362 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031991 | -0.046641 |
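
One assumption worth checking before fitting a linear model is that the features are not strongly correlated with each other. The following is a minimal sketch of such a check on the diabetes features, not a complete assumptions audit. The 0.8 cutoff and the variable names `pairs` and `high` are illustrative choices, not part of the original analysis.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes

# Rebuild the feature frame so this cell runs on its own
diabetes = load_diabetes()
df_x = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Absolute pairwise Pearson correlations between features
corr = df_x.corr().abs()

# Keep only the upper triangle so each feature pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

# The 0.8 cutoff is an illustrative threshold, not a fixed rule
high = pairs[pairs > 0.8]
print(high)
```

Any pairs listed in `high` are candidates for closer inspection, since strongly correlated features can make individual coefficients hard to interpret.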


## Multiple Linear Regression
Multiple linear regression defines the relationship between one dependent variable and multiple independent variables. In our diabetes dataset, we can examine the relationship between each of the features and the disease progression target using multiple linear regression.

Here's a great article on how it works under the hood: 
https://medium.com/data-py-blog/multiple-linear-regression-in-python-329e60cdc7ab

It works much like simple linear regression, with the following differences:
- Instead of subsetting to a single feature, we use every independent feature that may be correlated with the target.
- Each prediction is a weighted sum of all the features, and the error is still computed from the differences between predicted and observed values.
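
As a rough illustration of these differences, the sketch below fits a two-feature model with ordinary least squares using only NumPy. The data are made up for the example, and the names `intercept`, `weights`, and `sse` are ours. It assumes the usual least-squares setup, where the prediction is an intercept plus a weighted sum of the features and the error is the sum of squared residuals.

```python
import numpy as np

# Toy data, made up for illustration: 5 samples, 2 features
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
# Roughly y = 1 + 2*x1 + 3*x2, with small perturbations
y = np.array([9.1, 7.9, 19.2, 17.8, 26.0])

# Add a column of ones so the first coefficient acts as the intercept
A = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimize the sum of squared residuals
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = beta[0], beta[1:]

# Each prediction is the intercept plus a weighted sum of all features
y_hat = intercept + X @ weights

# The error aggregates the squared differences over all samples
residuals = y - y_hat
sse = float(np.sum(residuals ** 2))

print("intercept:", intercept)
print("weights:", weights)
print("sum of squared errors:", sse)
```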

In [3]:
# instantiate the model
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True,
                        n_jobs=4)

# train
fit = model.fit(x_train, y_train) # here we pass every feature column instead of a single-feature subset

# make predictions
preds = model.predict(x_test)
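
The first cell imports `mean_squared_error` and `r2_score`, so a natural next step is to score the predictions on the test set. The sketch below repeats the load, split, and fit steps so that it runs on its own, and it assumes the same `test_size` and `random_state` as above. It uses the array form of the dataset (`return_X_y=True`) rather than the data frames, purely for brevity.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load features and target as NumPy arrays
X, y = load_diabetes(return_X_y=True)

# Same split settings as the cell above
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set, predict on the held-out test set
model = LinearRegression().fit(x_train, y_train)
preds = model.predict(x_test)

# Mean squared error: average squared difference (lower is better)
mse = mean_squared_error(y_test, preds)

# R^2: share of variance in the target explained by the model
r2 = r2_score(y_test, preds)

print(f"MSE: {mse:.2f}")
print(f"R^2: {r2:.3f}")
```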

## Logistic Regression
Logistic regression is an algorithm used to predict the probability of a categorical dependent variable. In its basic form, the dependent variable is binary, with data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y=1) as a function of X.

Here is a great article on how logistic regression works: 
https://medium.com/greyatom/logistic-regression-89e496433063
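
To make "P(Y=1) as a function of X" concrete, here is a minimal sketch of the binary case using only NumPy. The intercept, weights, and input values are hypothetical numbers chosen for illustration and do not come from a fitted model. The 0.5 decision threshold is a common default, not a requirement.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, for illustration only
intercept = -1.0
weights = np.array([0.8, -0.5])

# One observation with two features
x = np.array([2.0, 1.0])

# Linear combination of the features, as in linear regression
z = intercept + weights @ x

# The sigmoid turns the linear score into a probability P(Y=1 | x)
p = sigmoid(z)

# Convert the probability to a class label with a 0.5 threshold
label = int(p >= 0.5)

print("P(Y=1):", p)
print("predicted label:", label)
```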

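We can also fit a logistic regression model using sklearn. Note that the iris dataset used here has three classes rather than two; sklearn's `LogisticRegression` handles the multiclass case automatically: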

In [1]:
# load dataset
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)

In [12]:
# instantiate model
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
fit = clf.fit(X, y)

preds = clf.predict(X) # predict labels for X



In [13]:
# score model
clf.score(X, y)

0.96

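In practice, you would first split the data into training and test sets, as we did with linear regression, because a score computed on the training data tends to overstate how well the model generalizes. The model syntax stays the same.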
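
A minimal sketch of that workflow on the iris data follows. The split settings mirror the linear regression example. `stratify=y` and `max_iter=1000` are our additions, used to keep class proportions balanced across the split and to give the solver room to converge. The resulting score will generally differ from the training-set score shown above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows; stratify keeps class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# max_iter is raised from the default as a precaution against
# convergence warnings (an assumption, not a tuned value)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Mean accuracy on data the model has not seen during training
test_accuracy = clf.score(X_test, y_test)
print("test accuracy:", test_accuracy)

# Class probabilities for the first three test rows (one column per class)
proba = clf.predict_proba(X_test[:3])
print(proba)
```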