# Multiple Linear Regression

Linear regression is a technique for modeling the relationship between a dependent variable _y_ and a set of independent variables denoted X. Multiple Linear Regression is the case where X contains more than one independent variable.

In other words, a single response y is related to many predictors in X.

In [40]:
# Import libraries
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import math

For example, take a first-order model $ {y_i} = ({\beta_0} + {\beta_1}{x_{i1}} + {\beta_2}{x_{i2}}) + {e_i} $. We will fit this model to our dataset in order to predict the breathing habits of baby birds that live in underground burrows.

- $ {y_i} $ is the total volume of air breathed per minute by baby bird ${i}$.
- $ {x_{i1}} $ is the percentage of oxygen in the air baby bird ${i}$ breathes.
- $ {x_{i2}} $ is the percentage of carbon dioxide in the air baby bird ${i}$ breathes.
- $ {e_i} $ is an independent error term, assumed to follow a normal distribution with mean 0 and equal variance.
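
To make the roles of the terms concrete, here is a minimal sketch that simulates data from a first-order model of this form and recovers the coefficients by ordinary least squares. The coefficient values, sample size, and predictor ranges are assumptions chosen purely for illustration. They are not taken from the baby bird data.

```python
import numpy as np

# Hypothetical coefficients (beta_0, beta_1, beta_2), chosen only for illustration.
beta = np.array([80.0, -5.0, 30.0])

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(13, 21, size=n)  # made-up oxygen percentages
x2 = rng.uniform(0, 9, size=n)    # made-up carbon dioxide percentages
e = rng.normal(loc=0.0, scale=0.5, size=n)  # independent normal errors, equal variance

# y_i = beta_0 + beta_1 * x_i1 + beta_2 * x_i2 + e_i
y = beta[0] + beta[1] * x1 + beta[2] * x2 + e

# Design matrix: a column of ones for the intercept, then the two predictors.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares estimate of (beta_0, beta_1, beta_2).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

With this little noise the estimates in `beta_hat` should land close to the values in `beta`. Fitting `LinearRegression` on the same data should give matching estimates, since with its default settings it solves the same least squares problem.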


In [41]:
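def multipleLinearReg(data):
    # Columns 1 onward hold the independent variables (X);
    # column 0 holds the dependent variable (y).
    # .iloc replaces the .ix indexer, which was removed from pandas.
    x_data = data.iloc[:, 1:]
    y_data = data.iloc[:, 0]

    # Fit an ordinary least squares model with an intercept
    model = LinearRegression()
    model.fit(x_data, y_data)

    return model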

We can now fit the model to our dataset of baby bird breathing measurements.

In [42]:
dataset = pd.DataFrame(np.loadtxt(fname="../data/babybirds.txt", skiprows=1, dtype="int16"))
dataset.columns = ["$ {y_i} $", "$ {x_{i1}} $", "$ {x_{i2}} $"]
multipleLinearReg(dataset)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
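
The printed output shows only the estimator's settings, not the fitted coefficients. As a minimal sketch of how the estimates can be read back, the example below fits `LinearRegression` to a tiny made-up data set with an exact linear relationship (an assumption for illustration, unrelated to the bird measurements) and inspects `intercept_`, `coef_`, `score`, and `predict`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with an exact relationship: y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)

intercept = model.intercept_   # estimate of beta_0
slopes = model.coef_           # estimates of beta_1 and beta_2, in column order
r_squared = model.score(X, y)  # R^2 computed on the training data

# Predicted response for a new observation with x1 = 3, x2 = 2.
prediction = model.predict(np.array([[3.0, 2.0]]))

print(intercept, slopes, r_squared, prediction)
```

The same attributes are available on the model returned by `multipleLinearReg(dataset)`, where `coef_` follows the order of the predictor columns in the DataFrame.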