### Codio Activity 19.1: Regression Models for Predictions

**Expected Time = 60 minutes**

**Total Points = 50**

This activity uses regression models to predict scores for unseen content (albums). With these scores, you can recommend unheard albums to users. As in the lecture, each artist comes with a *lofi* score and a *slick* score.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)

In [1]:
import os
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

from sklearn.linear_model import LinearRegression

#### Our Data

This example uses a synthetic dataset of reviews from five individuals for five albums. The album covers and artists are shown below, followed by the loaded dataset. Two additional columns, `slick` and `lofi`, rate the nature of the music.

![](images/covers.png)

In [2]:
reviews = pd.read_csv("./data/sample_reviews.csv", index_col=0)

In [13]:
reviews.head()

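| | Alfred | Mandy | Lenny | Joan | Tino | slick | lofi |
|---|---|---|---|---|---|---|---|
| Michael Jackson | 3.0 | NaN | 2.0 | 3.0 | 1.0 | 8 | 2 |
| Clint Black | 4.0 | 9.0 | 5.0 | NaN | 1.0 | 8 | 2 |
| Dropdead | NaN | NaN | 8.0 | 9.0 | NaN | 2 | 9 |
| Anti-Cimex | 4.0 | 3.0 | 9.0 | 4.0 | 9.0 | 2 | 10 |
| Cardi B | 4.0 | 8.0 | NaN | 9.0 | 5.0 | 9 | 3 |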
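
To experiment without the CSV file, the table above can be rebuilt inline. The sketch below does that from the values shown and counts how many ratings each user is missing; those gaps are what the regression models in the problems below fill in. The names `csv_text` and `missing_per_user` are only illustrative.

```python
import io

import pandas as pd

# Values copied from the table above; blank fields are missing ratings.
csv_text = """,Alfred,Mandy,Lenny,Joan,Tino,slick,lofi
Michael Jackson,3.0,,2.0,3.0,1.0,8,2
Clint Black,4.0,9.0,5.0,,1.0,8,2
Dropdead,,,8.0,9.0,,2,9
Anti-Cimex,4.0,3.0,9.0,4.0,9.0,2,10
Cardi B,4.0,8.0,,9.0,5.0,9,3
"""
reviews = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Count the unrated albums per user (the slick and lofi columns are complete).
users = ["Alfred", "Mandy", "Lenny", "Joan", "Tino"]
missing_per_user = reviews[users].isna().sum()
print(missing_per_user)
```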


[Back to top](#-Index)

### Problem 1

#### Considering Alfred

**10 Points**

To begin, create `X` and `y` based on `Alfred`. Drop the row for **Dropdead**, which Alfred has not rated (using `dropna()`), and build a model on the remaining rows with the `slick` and `lofi` scores as inputs. Assign the inputs to `X` and the target to `y`, name your model `alfred_lr`, and assign the model's prediction of Alfred's rating for Dropdead to `alfred_dd_predict` below.

In [14]:
### GRADED
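# Keep only the albums Alfred has rated (this drops the Dropdead row).
alfred_rated = reviews[["Alfred", "slick", "lofi"]].dropna()
X = alfred_rated[["slick", "lofi"]]
y = alfred_rated["Alfred"]
alfred_lr = LinearRegression().fit(X, y)
# Predict Alfred's rating for Dropdead from its slick and lofi scores.
alfred_dd_predict = alfred_lr.predict(reviews.loc[["Dropdead"], ["slick", "lofi"]])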

### ANSWER CHECK
alfred_dd_predict

array([3.75])

[Back to top](#-Index)

### Problem 2

#### User Vector for Alfred

**10 Points**

Use your model for Alfred to construct his user vector based on the coefficients of the model. What does this tell you about Alfred's preference for slick and lofi?  Assign his user vector as a numpy array to `alfred_vector` below.

HINT: 'user vector' is simply another name for the coefficients of the linear regression model.

In [15]:
### GRADED
alfred_vector = alfred_lr.coef_

### ANSWER CHECK
pd.DataFrame(alfred_vector.reshape(1, 2), columns=["slick", "lofi"], index=["Alfred"])

| | slick | lofi |
|---|---|---|
| Alfred | 0.25 | 0.25 |
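
One way to read the user vector is as the weights in the prediction: the predicted rating is the model's intercept plus the dot product of the user vector with an album's `slick` and `lofi` scores. The sketch below rebuilds the table from the values shown earlier (so it runs without the CSV file) and checks this for Dropdead. The names `rated`, `dropdead`, `manual_predict`, and `model_predict` are illustrative and not part of the activity.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Values copied from the reviews table shown earlier.
csv_text = """,Alfred,Mandy,Lenny,Joan,Tino,slick,lofi
Michael Jackson,3.0,,2.0,3.0,1.0,8,2
Clint Black,4.0,9.0,5.0,,1.0,8,2
Dropdead,,,8.0,9.0,,2,9
Anti-Cimex,4.0,3.0,9.0,4.0,9.0,2,10
Cardi B,4.0,8.0,,9.0,5.0,9,3
"""
reviews = pd.read_csv(io.StringIO(csv_text), index_col=0)
features = ["slick", "lofi"]

# Fit Alfred's model on the albums he has rated.
rated = reviews[["Alfred"] + features].dropna()
alfred_lr = LinearRegression().fit(rated[features], rated["Alfred"])
alfred_vector = alfred_lr.coef_

# Prediction by hand: intercept + user vector . album features.
dropdead = reviews.loc["Dropdead", features].to_numpy(dtype=float)
manual_predict = alfred_lr.intercept_ + alfred_vector @ dropdead

# The same prediction from the fitted model, for comparison.
model_predict = alfred_lr.predict(reviews.loc[["Dropdead"], features])[0]
print(manual_predict, model_predict)
```

Both weights are equal and positive, so the model raises Alfred's predicted rating by the same amount for each extra point of `slick` or `lofi` and shows no preference between the two. With only four ratings behind the fit, this reading should be taken loosely.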


[Back to top](#-Index)

### Problem 3

#### Considering Tino

**10 Points**

Repeat the process above for Tino.  Use Tino's user vector to predict their rating of **Dropdead**.  Assign the prediction to `tino_dd_predict` as a numpy array below.

In [3]:
### GRADED
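# Keep only the albums Tino has rated (this drops the Dropdead row).
tino_rated = reviews[["Tino", "slick", "lofi"]].dropna()
X = tino_rated[["slick", "lofi"]]
y = tino_rated["Tino"]
tino_lr = LinearRegression().fit(X, y)
# Predict Tino's rating for Dropdead from its slick and lofi scores.
tino_dd_predict = tino_lr.predict(reviews.loc[["Dropdead"], ["slick", "lofi"]])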

### ANSWER CHECK
tino_dd_predict

array([6.71428571])

[Back to top](#-Index)

### Problem 4

#### Tino's User Vector

**10 Points**

Now, create a user vector for Tino and assign it as a numpy array to `tino_vector` below. What does this say about their preference for *slick* versus *lofi*?

In [4]:
### GRADED
tino_vector = tino_lr.coef_

### ANSWER CHECK
pd.DataFrame(tino_vector.reshape(1, 2), columns=["slick", "lofi"], index=["Tino"])

| | slick | lofi |
|---|---|---|
| Tino | 1.714286 | 2.285714 |
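
To compare users, their vectors can be placed side by side. The sketch below refits both models on the table values shown earlier (rebuilt inline so it runs without the CSV file). The helper `user_vector` is a hypothetical name introduced here for illustration; it is not part of the activity.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Values copied from the reviews table shown earlier.
csv_text = """,Alfred,Mandy,Lenny,Joan,Tino,slick,lofi
Michael Jackson,3.0,,2.0,3.0,1.0,8,2
Clint Black,4.0,9.0,5.0,,1.0,8,2
Dropdead,,,8.0,9.0,,2,9
Anti-Cimex,4.0,3.0,9.0,4.0,9.0,2,10
Cardi B,4.0,8.0,,9.0,5.0,9,3
"""
reviews = pd.read_csv(io.StringIO(csv_text), index_col=0)


def user_vector(reviews, user, features=("slick", "lofi")):
    """Fit a linear model on the albums `user` has rated; return its coefficients."""
    features = list(features)
    rated = reviews[[user] + features].dropna()
    model = LinearRegression().fit(rated[features], rated[user])
    return pd.Series(model.coef_, index=features, name=user)


# One row per user, one column per feature.
vectors = pd.DataFrame([user_vector(reviews, u) for u in ["Alfred", "Tino"]])
print(vectors.round(3))
```

In this fit, both of Tino's weights are positive and much larger than Alfred's, and `lofi` is weighted somewhat more than `slick`. That suggests Tino's predicted ratings respond more strongly to these scores, with a mild lean toward lofi.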


[Back to top](#-Index)

### Problem 5

#### Completing the Table

**10 Points**

Consider writing a function that loops over each user's column and repeats the prediction process, using `slick` and `lofi` as inputs each time. Create a DataFrame called `reviews_df_full` in which every missing score is filled in with the prediction from that user's model.

In [18]:
reviews

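| | Alfred | Mandy | Lenny | Joan | Tino | slick | lofi |
|---|---|---|---|---|---|---|---|
| Michael Jackson | 3.0 | NaN | 2.0 | 3.0 | 1.0 | 8 | 2 |
| Clint Black | 4.0 | 9.0 | 5.0 | NaN | 1.0 | 8 | 2 |
| Dropdead | NaN | NaN | 8.0 | 9.0 | NaN | 2 | 9 |
| Anti-Cimex | 4.0 | 3.0 | 9.0 | 4.0 | 9.0 | 2 | 10 |
| Cardi B | 4.0 | 8.0 | NaN | 9.0 | 5.0 | 9 | 3 |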


In [5]:
### GRADED
reviews_df_full = reviews.copy(deep=True)
features = ["slick", "lofi"]

users = ["Alfred", "Mandy", "Lenny", "Joan", "Tino"]
for user in users:
    # Fit this user's model on the albums they have rated.
    rated = reviews[[user] + features].dropna()
    X, y = rated[features], rated[user]
    model = LinearRegression().fit(X, y)
    # Fill each unrated album with the model's prediction.
    missing_vals = list(reviews.index[reviews[user].isnull()])
    for artist in missing_vals:
        yhat = model.predict(reviews.loc[[artist], features])
        reviews_df_full.loc[artist, user] = yhat[0]

### ANSWER CHECK
reviews_df_full

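| | Alfred | Mandy | Lenny | Joan | Tino | slick | lofi |
|---|---|---|---|---|---|---|---|
| Michael Jackson | 3.0 | 9.0 | 2.0 | 3.0 | 1.0 | 8 | 2 |
| Clint Black | 4.0 | 9.0 | 5.0 | 4.664444 | 1.0 | 8 | 2 |
| Dropdead | 3.75 | 3.857143 | 8.0 | 9.0 | 6.714286 | 2 | 9 |
| Anti-Cimex | 4.0 | 3.0 | 9.0 | 4.0 | 9.0 | 2 | 10 |
| Cardi B | 4.0 | 8.0 | 4.916667 | 9.0 | 5.0 | 9 | 3 |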
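
The loop above can also be wrapped in a function, as the prompt suggests. The following is a minimal sketch, not the graded solution: `complete_reviews` is a hypothetical name, and the table is rebuilt inline from the values shown earlier so the block runs without the CSV file. It predicts all of a user's missing albums in one `predict` call instead of one album at a time.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Values copied from the reviews table shown earlier.
csv_text = """,Alfred,Mandy,Lenny,Joan,Tino,slick,lofi
Michael Jackson,3.0,,2.0,3.0,1.0,8,2
Clint Black,4.0,9.0,5.0,,1.0,8,2
Dropdead,,,8.0,9.0,,2,9
Anti-Cimex,4.0,3.0,9.0,4.0,9.0,2,10
Cardi B,4.0,8.0,,9.0,5.0,9,3
"""
reviews = pd.read_csv(io.StringIO(csv_text), index_col=0)


def complete_reviews(reviews, users, features):
    """Return a copy of `reviews` with each user's missing ratings predicted."""
    full = reviews.copy()
    for user in users:
        # Fit on the albums this user has rated.
        rated = reviews[[user] + features].dropna()
        model = LinearRegression().fit(rated[features], rated[user])
        # Predict every unrated album for this user in one call.
        missing = reviews[user].isna()
        if missing.any():
            full.loc[missing, user] = model.predict(reviews.loc[missing, features])
    return full


users = ["Alfred", "Mandy", "Lenny", "Joan", "Tino"]
reviews_filled = complete_reviews(reviews, users, ["slick", "lofi"])
print(reviews_filled.round(3))
```

Returning a copy leaves the original `reviews` untouched, so the observed ratings and the predicted ones can still be told apart afterwards.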
