### Colab Activity 19.2: Models with User Feedback Value

**Expected Time = 30 minutes**


This activity takes an approach similar to the earlier use of linear regression to fill in missing ratings. Here, assume that users were asked to provide `slick` and `lofi` scores when signing up for your streaming service. The goal is to use these scores across users to build regression models with `slick` and `lofi` as inputs and each artist's rating as the target.

#### Problems

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)


In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression


#### The Data

Below, the data is loaded and displayed. The `slick` and `lofi` columns contain each user's self-reported preference scores.

In [2]:
reviews = pd.read_csv('data/user_rated.csv', index_col = 0)

In [3]:
reviews

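|  | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 | 5 | 5 |
| Mandy | NaN | 9.0 | NaN | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN | 3 | 6 |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 | 1 | 8 |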
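
If `data/user_rated.csv` is not at hand, the same dataframe can be rebuilt inline from the values displayed above. This is only a convenience sketch for following along; the variable `missing_counts` is an illustrative name and is not part of the activity.

```python
import numpy as np
import pandas as pd

# Values copied from the table above; NaN marks a rating the user has not given.
reviews = pd.DataFrame(
    {
        "Michael Jackson": [3.0, np.nan, 2.0, 3.0, 1.0],
        "Clint Black": [4.0, 9.0, 5.0, np.nan, 1.0],
        "Dropdead": [np.nan, np.nan, 8.0, 9.0, np.nan],
        "Anti-Cimex": [4.0, 3.0, 9.0, 4.0, 9.0],
        "Cardi B": [4.0, 8.0, np.nan, 9.0, 5.0],
        "slick": [5, 7, 3, 5, 1],
        "lofi": [5, 4, 6, 5, 8],
    },
    index=["Alfred", "Mandy", "Lenny", "Joan", "Tino"],
)

# Count the missing ratings per column to see what needs to be filled in
missing_counts = reviews.isnull().sum()
print(missing_counts)
```

Only the artist columns contain missing values; `slick` and `lofi` are complete for every user, which is what lets them serve as regression inputs.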


[Back to top](#-Index)

### Problem 1

#### Michael Jackson Model

**10 Points**

Define `X` to contain only the `slick` and `lofi` columns of the `reviews` dataframe, keeping only the rows where `Michael Jackson` is not missing. Define `y` as a series containing the non-missing values of the `Michael Jackson` column.

Instantiate a new linear regression model and fit it to `X` and `y`. Assign this model to the variable `mj_lr`.

Use the `predict` method of `mj_lr` to predict the `Michael Jackson` rating for the rows in `reviews` where `Michael Jackson` is NaN, using their `slick` and `lofi` values. Assign the result to `mandy_predict`.

Update the `reviews` dataframe by assigning the predicted value to the `Michael Jackson` column for the `Mandy` row.

In [4]:
# For X, get rows where 'Michael Jackson' is NOT null, and select just 'slick' and 'lofi' columns
X = reviews[reviews['Michael Jackson'].notnull()][['slick', 'lofi']]

# For y, get just the 'Michael Jackson' values where they are not null
y = reviews.loc[reviews['Michael Jackson'].notnull(), 'Michael Jackson']

# Create and fit the linear regression model
mj_lr = LinearRegression()
mj_lr.fit(X, y)

# For prediction, get rows where 'Michael Jackson' IS null, and select 'slick' and 'lofi'
newx = reviews[reviews['Michael Jackson'].isnull()][['slick', 'lofi']]

# Predict the missing values
mandy_predict = mj_lr.predict(newx)

# Update the Michael Jackson column with the predicted value for Mandy
reviews.loc[reviews['Michael Jackson'].isnull(), 'Michael Jackson'] = mandy_predict

### ANSWER CHECK
reviews

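|  | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 | 5 | 5 |
| Mandy | 4.0 | 9.0 | NaN | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN | 3 | 6 |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 | 1 | 8 |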
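
To see where Mandy's predicted rating comes from, it can help to inspect the fitted intercept and coefficients. The sketch below refits the same kind of model on the four users who rated Michael Jackson, with values copied from the original table. The names `train`, `model`, `coefs`, `mandy`, and `mandy_rating` are illustrative and not part of the graded solution.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# slick, lofi, and Michael Jackson ratings for the four users who rated him
train = pd.DataFrame(
    {
        "slick": [5, 3, 5, 1],
        "lofi": [5, 6, 5, 8],
        "Michael Jackson": [3.0, 2.0, 3.0, 1.0],
    },
    index=["Alfred", "Lenny", "Joan", "Tino"],
)

model = LinearRegression().fit(train[["slick", "lofi"]], train["Michael Jackson"])

# Pair each coefficient with its feature name for readability
coefs = pd.Series(model.coef_, index=["slick", "lofi"])
print("intercept:", model.intercept_)
print(coefs)

# Mandy's inputs are slick = 7 and lofi = 4
mandy = pd.DataFrame({"slick": [7], "lofi": [4]}, index=["Mandy"])
mandy_rating = model.predict(mandy)[0]
print("Mandy:", mandy_rating)
```

On these four rows the fit works out to roughly `0.5 + 0.5 * slick`, with a `lofi` coefficient near zero, which gives about 4.0 for Mandy and matches the value filled into the table.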


[Back to top](#-Index)

### Problem 2

#### Completing the Table


Complete the missing data for all users in the `reviews` dataframe using the same process as above.  Assign the completed review data to `df_full` below. 

HINT: Use a `for` loop to iterate over all artist columns. See the solution set for Activity 19.1 for an example.

In [5]:
# Create a copy of the reviews dataframe to store the completed data
df_full = reviews.copy()

# Get all the artist columns (exclude 'slick' and 'lofi' which are user features)
artist_columns = [col for col in reviews.columns if col not in ['slick', 'lofi']]

# Loop through each artist column
for artist in artist_columns:
    # Skip if the artist column has no missing values
    if not reviews[artist].isnull().any():
        continue
    
    # Get rows where the current artist has non-null values for training
    X = reviews[reviews[artist].notnull()][['slick', 'lofi']]
    y = reviews.loc[reviews[artist].notnull(), artist]
    
    # Create and fit the regression model
    artist_model = LinearRegression()
    artist_model.fit(X, y)
    
    # Get rows where the current artist has null values for prediction
    newx = reviews[reviews[artist].isnull()][['slick', 'lofi']]
    
    # Predict the missing values
    predictions = artist_model.predict(newx)
    
    # Update the missing values in df_full
    df_full.loc[reviews[artist].isnull(), artist] = predictions

# Point `reviews` at the completed dataframe so the answer check displays it
reviews = df_full

### ANSWER CHECK
reviews

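|  | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Alfred | 3.0 | 4.0 | 9.0 | 4.0 | 4.0 | 5 | 5 |
| Mandy | 4.0 | 9.0 | 10.0 | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | 5.0 | 3 | 6 |
| Joan | 3.0 | 6.0 | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | 6.8 | 9.0 | 5.0 | 1 | 8 |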
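
One thing worth checking when reading the completed table is how many ratings each artist's model was trained on. A linear regression with an intercept and two inputs has three parameters, so a column with fewer than three rated rows cannot determine them uniquely, and its filled-in values deserve extra caution. The sketch below is an illustrative check rather than part of the activity; `ratings`, `n_params`, `rated_counts`, and `underdetermined` are assumed names, and the values are copied from the original table before any filling.

```python
import numpy as np
import pandas as pd

# Original artist ratings, before any missing values were filled in
ratings = pd.DataFrame(
    {
        "Michael Jackson": [3.0, np.nan, 2.0, 3.0, 1.0],
        "Clint Black": [4.0, 9.0, 5.0, np.nan, 1.0],
        "Dropdead": [np.nan, np.nan, 8.0, 9.0, np.nan],
        "Anti-Cimex": [4.0, 3.0, 9.0, 4.0, 9.0],
        "Cardi B": [4.0, 8.0, np.nan, 9.0, 5.0],
    },
    index=["Alfred", "Mandy", "Lenny", "Joan", "Tino"],
)

n_params = 3  # intercept + slick coefficient + lofi coefficient

# Number of users who actually rated each artist
rated_counts = ratings.notnull().sum()

# Artists whose model had fewer training rows than parameters
underdetermined = rated_counts[rated_counts < n_params].index.tolist()
print(rated_counts)
print("Fewer rated rows than parameters:", underdetermined)
```

Having at least three rated rows is necessary but not sufficient for a unique fit, since users with identical `slick` and `lofi` scores (such as Alfred and Joan) contribute only one distinct input point.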
