### Codio Activity 19.2: Models with User Feedback Values

**Expected Time = 30 minutes**

**Total Points = 20**

This activity continues the approach of using linear regression to fill in missing ratings. Here, assume that users were asked to provide `slick` and `lofi` scores when signing up for your streaming service. The goal is to use these scores across users to build one regression model per artist, with `slick` and `lofi` as the inputs and that artist's ratings as the target.
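
Written out, each artist's model has the form below. The symbols are notation introduced here for illustration and do not appear in the activity itself: $\hat{r}_{u,a}$ is the predicted rating of artist $a$ by user $u$, and $\beta_{0,a}, \beta_{1,a}, \beta_{2,a}$ are the intercept and coefficients fitted for that artist.

$$
\hat{r}_{u,a} = \beta_{0,a} + \beta_{1,a}\,\text{slick}_u + \beta_{2,a}\,\text{lofi}_u
$$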

#### Problems

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)


In [4]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

#### The Data

Below, the data is loaded and displayed. The `slick` and `lofi` columns contain each user's self-reported preference scores. The remaining columns contain that user's ratings of each artist, some of which are missing.

In [7]:
reviews = pd.read_csv('../data/user_rated.csv', index_col = 0)

In [9]:
reviews

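| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
|---|---|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 | 5 | 5 |
| Mandy | NaN | 9.0 | NaN | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN | 3 | 6 |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 | 1 | 8 |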
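
Before fitting anything, it can help to see how many ratings each column is missing. The sketch below is only an illustration: `reviews_demo` is a hypothetical stand-in that retypes the displayed table by hand so the snippet runs without the CSV file. In the notebook itself, the same call on `reviews` should behave the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `reviews`, retyped from the table shown above
reviews_demo = pd.DataFrame(
    {
        'Michael Jackson': [3.0, np.nan, 2.0, 3.0, 1.0],
        'Clint Black': [4.0, 9.0, 5.0, np.nan, 1.0],
        'Dropdead': [np.nan, np.nan, 8.0, 9.0, np.nan],
        'Anti-Cimex': [4.0, 3.0, 9.0, 4.0, 9.0],
        'Cardi B': [4.0, 8.0, np.nan, 9.0, 5.0],
        'slick': [5, 7, 3, 5, 1],
        'lofi': [5, 4, 6, 5, 8],
    },
    index=['Alfred', 'Mandy', 'Lenny', 'Joan', 'Tino'],
)

# Count the missing entries in each column
missing_counts = reviews_demo.isna().sum()
print(missing_counts)
```

Columns with a count of zero, such as `slick` and `lofi`, need no model because there is nothing to fill.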


[Back to top](#-Index)

### Problem 1

#### Michael Jackson Model

**10 Points**

Define `X` to contain only the `slick` and `lofi` columns of the `reviews` dataframe, keeping only the rows where `Michael Jackson` is not missing. Define `y` as a series containing the non-missing values of the `Michael Jackson` column.

Instantiate a new linear regression model and fit it to `X` and `y`. Assign this model to the variable `mj_lr`.

Use the `predict` method of `mj_lr` to predict the `Michael Jackson` rating for the rows of `reviews` where `Michael Jackson` is NaN, passing in those rows' `slick` and `lofi` values. Assign the result to `mandy_predict`.

Create `df_mandy` as a copy of `reviews`, then assign the predicted value to the `Michael Jackson` column of the `Mandy` row.

In [11]:
### GRADED
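# Train on the users who rated Michael Jackson
rated = reviews['Michael Jackson'].notna()
X = reviews.loc[rated, ['slick', 'lofi']]
y = reviews.loc[rated, 'Michael Jackson']
mj_lr = LinearRegression().fit(X, y)

# Predict for users with a missing Michael Jackson rating (only Mandy here)
X_missing = reviews.loc[~rated, ['slick', 'lofi']]
mandy_predict = mj_lr.predict(X_missing)

# Write the prediction into a copy so `reviews` itself is left unchanged
df_mandy = reviews.copy()
df_mandy.loc['Mandy', 'Michael Jackson'] = mandy_predict[0]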

### ANSWER CHECK
df_mandy

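| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
|---|---|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 | 5 | 5 |
| Mandy | 4.0 | 9.0 | NaN | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN | 3 | 6 |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 | 1 | 8 |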
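
To see where Mandy's 4.0 comes from, one option is to inspect the fitted intercept and coefficients and redo the prediction by hand. This is a sketch rather than part of the graded solution: `mj_demo_data` and `mj_demo` are hypothetical names, and the data is retyped from the table so the snippet runs on its own. For these five users the fit should come out close to `0.5 * slick + 0.5`, with a `lofi` coefficient near zero.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in holding only the columns needed for this model
mj_demo_data = pd.DataFrame(
    {
        'Michael Jackson': [3.0, np.nan, 2.0, 3.0, 1.0],
        'slick': [5, 7, 3, 5, 1],
        'lofi': [5, 4, 6, 5, 8],
    },
    index=['Alfred', 'Mandy', 'Lenny', 'Joan', 'Tino'],
)

# Fit on the users who rated Michael Jackson
rated = mj_demo_data['Michael Jackson'].notna()
mj_demo = LinearRegression().fit(
    mj_demo_data.loc[rated, ['slick', 'lofi']],
    mj_demo_data.loc[rated, 'Michael Jackson'],
)

# Pair each coefficient with its feature name
coefs = dict(zip(['slick', 'lofi'], mj_demo.coef_))
print(mj_demo.intercept_, coefs)

# Reproduce the prediction for Mandy (slick = 7, lofi = 4) by hand
mandy_by_hand = mj_demo.intercept_ + coefs['slick'] * 7 + coefs['lofi'] * 4
print(mandy_by_hand)
```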


[Back to top](#-Index)

### Problem 2

#### Completing the Table

**10 Points**

Complete the missing data for all users in the `reviews` dataframe using the same process as above. Assign the completed review data to `df_full` below.

HINT: Use a for loop to iterate over the artist columns. See the solution set for Activity 19.1 for an example.

In [13]:
### GRADED
# Define a copy of the original reviews DataFrame to update missing values
df_full = reviews.copy()

# Iterate over each artist column (the feature columns have nothing to fill)
for column in reviews.columns.drop(['slick', 'lofi']):
    missing = reviews[column].isna()

    # Train on the users who rated this artist
    X = reviews.loc[~missing, ['slick', 'lofi']]
    y = reviews.loc[~missing, column]

    # Skip columns with nothing to fill or with a constant target
    if missing.any() and y.nunique() > 1:
        model = LinearRegression().fit(X, y)

        # Predict the missing ratings from those users' slick and lofi scores
        X_missing = reviews.loc[missing, ['slick', 'lofi']]
        df_full.loc[missing, column] = model.predict(X_missing)

df_full

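| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
|---|---|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | 9.0 | 4.0 | 4.0 | 5 | 5 |
| Mandy | 4.0 | 9.0 | 10.0 | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | 5.0 | 3 | 6 |
| Joan | 3.0 | 6.0 | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | 6.8 | 9.0 | 5.0 | 1 | 8 |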
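
One caveat about the completed table: `Dropdead` was rated by only two users, while each model estimates three parameters (two coefficients and an intercept). Two ratings do not pin down a unique plane, so the filled `Dropdead` values are best read with caution. The sketch below flags such columns; `ratings_demo` and `n_params` are hypothetical names, and the data is retyped from the original `reviews` table so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original `reviews` table
ratings_demo = pd.DataFrame(
    {
        'Michael Jackson': [3.0, np.nan, 2.0, 3.0, 1.0],
        'Clint Black': [4.0, 9.0, 5.0, np.nan, 1.0],
        'Dropdead': [np.nan, np.nan, 8.0, 9.0, np.nan],
        'Anti-Cimex': [4.0, 3.0, 9.0, 4.0, 9.0],
        'Cardi B': [4.0, 8.0, np.nan, 9.0, 5.0],
        'slick': [5, 7, 3, 5, 1],
        'lofi': [5, 4, 6, 5, 8],
    },
    index=['Alfred', 'Mandy', 'Lenny', 'Joan', 'Tino'],
)

# Two coefficients (slick, lofi) plus an intercept
n_params = 3

# Count the observed ratings for each artist column
observed = ratings_demo.drop(columns=['slick', 'lofi']).notna().sum()

# Artists with fewer observed ratings than model parameters
underdetermined = observed[observed < n_params].index.tolist()
print(underdetermined)
```

A check like this could be added to the loop's condition, alongside the existing test for more than one unique value, to skip or flag artists with too few ratings.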
