# Model Development

## What is a Model?

* A model can be thought of as a mathematical equation used to predict a value given one or more other values.
* It relates one or more independent variables to a dependent variable.

### Example

![model_definition](./model.svg)

* Usually, the more relevant data you have, the more accurate your model is.

![model_accuarate](./model_accuarate.svg)

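* To understand why more relevant data is important, consider the following situation.
* Pink cars sell for significantly less. If the model's independent variables do not include color, it will predict the same price for a pink car as for an otherwise identical car of another color.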

![model_diff](./model_diff.svg)

# Linear Regression 

* Simple linear regression uses one independent variable to make a prediction.
* Multiple linear regression uses multiple independent variables to make a prediction.

![linear regresion](./liearRegression.svg)

# Simple Linear Regression

Simple linear regression, or **SLR**, is a method to help us understand the relationship between two variables:
1. The predictor (independent) variable $x$
2. The target (dependent) variable $y$

$$y={ b }_{ 0 }+{ b }_{ 1 }x$$
* ${ b }_{ 0 }$: The intercept
* ${ b }_{ 1 }$: The slope

# Simple Linear Regression: Prediction

* If we assume there is a linear relationship between these variables, it is possible to use this relationship to formulate a model to determine the price of the car. Here $x$ is the highway miles per gallon and $y$ is the price.
$$y=38423-821x$$
![simplelinearregression](./modelevaluatedSLR.svg)
* For a car with 20 highway miles per gallon:
$$y=38423-821(20)$$
$$y=22003$$
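
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name `predict_price` is hypothetical, and the coefficients are the rounded values from the equation $y=38423-821x$.

```python
# Rounded parameters taken from the equation y = 38423 - 821x
B0 = 38423  # intercept
B1 = -821   # slope


def predict_price(highway_mpg):
    """Evaluate the fitted line for one predictor value (hypothetical helper)."""
    return B0 + B1 * highway_mpg


price = predict_price(20)
print(price)  # 22003
```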


# Simple Linear Regression: FIT

* To determine the line, we take data points from our dataset (marked in red) and use these training points to fit our model. The result of fitting is the parameters ${ b }_{ 0 }$ and ${ b }_{ 1 }$.

![fitregression](./fitmodelprocess.svg)

* We usually store the data points in a dataframe or in NumPy arrays.

$$x=\begin{bmatrix} 0 \\ 20 \\ 40 \end{bmatrix}\quad y=\begin{bmatrix} 38423 \\ 22003 \\ 5583 \end{bmatrix}$$

* The value we would like to predict is called the target, which we store in the array $Y$.

* We store the independent variable in the dataframe or array $X$.
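
As a rough illustration of the fitting step, the sketch below stores three training points in NumPy arrays and recovers the parameters with `np.polyfit`. The points are assumed to lie exactly on the line $y=38423-821x$ from the prediction example; real data would not be noise-free, and `np.polyfit` is used here only as one convenient least-squares routine.

```python
import numpy as np

# Training points, assumed to lie exactly on y = 38423 - 821x
x = np.array([0, 20, 40])
y = 38423 - 821 * x

# Fit a degree-1 polynomial (a straight line) by least squares.
# np.polyfit returns the coefficients from highest degree to lowest.
b1, b0 = np.polyfit(x, y, 1)

print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
```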


# Simple Linear Regression: NOISE

* In many cases, many factors influence how much people pay for a car, for example the make or how old the car is. In this model, this uncertainty is taken into account by assuming a small random value is added to the point on the line. This is called noise.

![noiseNormal](./noise.svg)

* The vertical axis shows the value added, and the horizontal axis illustrates the probability that the value will be added.
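
The noise assumption can be simulated as follows. This is a sketch under stated assumptions: the noise is drawn from a zero-mean normal distribution, and the standard deviation of 1000 and the range of `x` values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

b0, b1 = 38423, -821               # line from the prediction example
x = rng.uniform(15, 45, 10_000)    # hypothetical highway-mpg values

# Assumed noise: zero-mean normal with an illustrative standard deviation
noise = rng.normal(loc=0.0, scale=1000.0, size=x.shape)

y_line = b0 + b1 * x               # point on the line
y_observed = y_line + noise        # what we would actually observe

print(f"mean noise: {noise.mean():.1f}")
```

Because the noise averages out to roughly zero, the observed points scatter around the line rather than drifting away from it.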

# Simple Linear Regression Steps.

1. We have a set of training points.
2. We use these training points to fit, or train, the model and get the parameters.
3. We then use these parameters in the model.
4. We now have a model.
5. We can use this model to predict values that we haven't seen.

### Example:

* We have no car with 20 highway miles per gallon in our dataset, so we can use our model to predict the price of such a car. Keep in mind that models are not always correct: the prediction may differ from the actual price.

* If the linear assumption is correct, this error is due to the noise.

![linear regression process](./linearRegProc2.svg)

# Fitting a Simple Linear Model Estimator

* X: Predictor variable.
* Y: Target variable.

1. Import `LinearRegression` from scikit-learn's `linear_model` module.
    ```python
    from sklearn.linear_model import LinearRegression
    ```
2. Create a linear regression object using the constructor.
    ```python
    lm = LinearRegression()
    ```
3. Define the predictor variable and the target variable (`df` is assumed to be the dataframe holding the car data).
    ```python
    X = df[["highway-mpg"]]
    Y = df[["price"]]
    ```
4. Use `lm.fit(X, Y)` to fit the model and find the parameters ${ b }_{ 0 }$ and ${ b }_{ 1 }$.
5. Obtain a prediction.
    ```python
    Yhat = lm.predict(X)
    ```
6. View the intercept ${ b }_{ 0 }$. Fitted attributes in scikit-learn end with an underscore.
    ```python
    lm.intercept_
    ```
7. View the slope ${ b }_{ 1 }$.
    ```python
    lm.coef_
    ```
8. The relationship between **price** and **highway-mpg** is given by:
    $$\text{price} = 38423.31 - 821.73 \times \text{highway-mpg}$$
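
The steps above can be put together in one runnable sketch. The dataframe here is a small made-up stand-in for the car data (five rows lying exactly on the line $y=38423-821x$), so the fitted parameters illustrate the workflow rather than reproduce the values quoted above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the car dataset
df = pd.DataFrame({"highway-mpg": [20, 25, 30, 35, 40]})
df["price"] = 38423 - 821 * df["highway-mpg"]

X = df[["highway-mpg"]]  # predictors must be 2-D (a dataframe)
Y = df["price"]          # a 1-D target gives a scalar intercept_

lm = LinearRegression()
lm.fit(X, Y)

Yhat = lm.predict(X)
print(lm.intercept_, lm.coef_[0])

# Predict for a value that is not in the training data
new_car = pd.DataFrame({"highway-mpg": [22]})
print(lm.predict(new_car))
```

Passing the target as a one-dimensional series is a choice made for this sketch; with a two-dimensional target such as `df[["price"]]`, `intercept_` and `coef_` come back as arrays with an extra dimension.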

# Multiple Linear Regression (MLR)

* This method is used to explain the relationship between:
    * One continuous target ($Y$) variable.
    * Two or more predictor ($X$) variables.

    $$\widehat { Y } ={ b }_{ 0 }+{ b }_{ 1 }{ x }_{ 1 }+{ b }_{ 2 }{ x }_{ 2 }+{ b }_{ 3 }{ x }_{ 3 }+{ b }_{ 4 }{ x }_{ 4 }$$

    * ${ b }_{ 0 }$: The intercept (the value of $\widehat { Y }$ when every predictor is 0)
    * ${ b }_{ 1 }$: The coefficient or parameter of ${ x }_{ 1 }$
    * ${ b }_{ 2 }$: The coefficient or parameter of ${ x }_{ 2 }$, and so on

### Example
In the multivariable equation $\hat { y } =1+2{ x }_{ 1 }+3{ x }_{ 2 }$, the variables ${ x }_{ 1 }$ and ${ x }_{ 2 }$ can be visualized on a 2D plane, with the predicted value $\hat { y }$ plotted as the height above that plane.


![multiple3d](./multime3d.svg)
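
To make the example equation concrete, the sketch below evaluates $\hat { y } =1+2{ x }_{ 1 }+3{ x }_{ 2 }$ for a few made-up input rows using a matrix product.

```python
import numpy as np

b0 = 1                 # intercept
b = np.array([2, 3])   # coefficients of x1 and x2

# Each row is one hypothetical sample (x1, x2)
X = np.array([
    [0, 0],
    [1, 2],
    [3, 1],
])

# One prediction per row: b0 + 2*x1 + 3*x2
y_hat = b0 + X @ b
print(y_hat.tolist())  # [1, 9, 10]
```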

# Fitting a Multiple Linear Model Estimator

1. Extract the 4 predictor variables and store them in the variable `z`.
    ```python
    z = df[["horsepower", "curb-weight", "engine-size", "highway-mpg"]]
    ```
2. Then train the model as before.
    ```python
    lm.fit(z, df[["price"]])
    ```
3. We can also obtain a prediction.
    ```python
    Yhat = lm.predict(z)
    ```
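
A self-contained sketch of the same workflow on synthetic data follows. The dataframe, the column ranges, and the `true_intercept` and `true_coef` values are all invented so that the example runs without the car dataset; because the synthetic prices are noise-free, the fit should recover those invented parameters.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=0)
n = 50
predictors = ["horsepower", "curb-weight", "engine-size", "highway-mpg"]

# Synthetic stand-in for the car data (all values are made up)
df = pd.DataFrame({
    "horsepower": rng.uniform(50, 250, n),
    "curb-weight": rng.uniform(1500, 4000, n),
    "engine-size": rng.uniform(60, 300, n),
    "highway-mpg": rng.uniform(15, 50, n),
})

# Invented parameters, used only to generate noise-free prices
true_intercept = -15000.0
true_coef = np.array([50.0, 5.0, 80.0, 30.0])
df["price"] = true_intercept + df[predictors].to_numpy() @ true_coef

z = df[predictors]
lm = LinearRegression()
lm.fit(z, df["price"])

print(lm.intercept_)
print(lm.coef_)
```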
    
# Estimated Multiple Linear Model.

1. Find the intercept (${ b }_{ 0 }$).
    ```python
    lm.intercept_
    # approximately -15678.74
    ```
2. Find the coefficients (${ b }_{ 1 }$, ${ b }_{ 2 }$, ${ b }_{ 3 }$, ${ b }_{ 4 }$).
    ```python
    lm.coef_
    # approximately 52.66, 4.70, 81.96, 33.58 (rounded, not the exact values)
    ```
3. The estimated linear model.

    $$\text{price}=-15678.74+52.66\times\text{horsepower}+4.70\times\text{curb-weight}+81.96\times\text{engine-size}+33.58\times\text{highway-mpg}$$
   


# Regression Plot.

### Why use a regression plot?

* It gives us a good estimate of:
    1. The relationship between two variables.
    2. The strength of the correlation.
    3. The direction of the relationship (positive or negative).

A regression plot shows us a combination of:

* The scatterplot, where each point represents a different $y$.
* The fitted linear regression line ($\hat { y } $).

![regressionplot](./plotReg.svg)


# Regression Plot: Python.

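```python
import seaborn as sns
import matplotlib.pyplot as plt
```

```python
# Scatterplot of price against highway-mpg with the fitted regression line
sns.regplot(x="highway-mpg", y="price", data=df)
# Start the vertical axis at zero
plt.ylim(0,)
```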

# Residual Plot.

The residual plot represents the error between the actual and predicted values. Examining the predicted value and the actual value, we see a difference. We obtain that value, the residual, by subtracting the predicted value from the actual target value. We then plot the residual on the vertical axis with the independent variable on the horizontal axis. We expect the residuals to have zero mean and to be distributed evenly around the **x** axis with similar variance.

![residualPlot](./residualPlot.svg)
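
The residual calculation can be sketched in a few lines. The data below is synthetic (a line plus assumed zero-mean normal noise), and `np.polyfit` stands in for whichever fitting routine is used.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic data: a straight line plus assumed zero-mean normal noise
x = rng.uniform(15, 45, 200)
y = 38423 - 821 * x + rng.normal(0.0, 1000.0, x.shape)

# Fit a line and compute the predictions
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# Residual = actual value - predicted value
residuals = y - y_hat

print(f"mean residual: {residuals.mean():.6f}")
```

Plotting `residuals` against `x` gives the residual plot; seaborn also provides `sns.residplot` for drawing it directly from the data.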

# Residual Plot: Case 1.

![residualplotcase1](./residualplotcase1.png)


* If the residuals are randomly spread out around the x-axis, a linear model is appropriate.

