### Colab Activity 19.2: Models with User Feedback Value

**Expected Time = 30 minutes**


This activity takes a similar approach, using linear regression to fill in missing ratings. Here, assume that users were asked to provide `slick` and `lofi` scores when signing up for your streaming service. The goal is to use these scores across users to build regression models with `slick` and `lofi` as inputs and each artist's ratings as the target.
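
Written out, the idea is one small linear model per artist. The notation below is an assumption introduced for illustration (it does not come from the activity): $\hat{r}_{u,a}$ is the predicted rating of user $u$ for artist $a$, and the $\beta$ terms are the intercept and slopes that `LinearRegression` estimates from the users who did rate that artist.

$$
\hat{r}_{u,a} = \beta_{0,a} + \beta_{1,a}\,\mathrm{slick}_u + \beta_{2,a}\,\mathrm{lofi}_u
$$

Each artist gets its own three coefficients, so an artist with very few observed ratings gives the model little to work with.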

#### Problems

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)


In [3]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression


#### The Data

Below, the data is loaded and displayed. The `slick` and `lofi` columns contain each user's self-reported preference scores.

In [4]:
reviews = pd.read_csv('data/user_rated.csv', index_col = 0)

In [5]:
reviews

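| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
|---|---|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 | 5 | 5 |
| Mandy | NaN | 9.0 | NaN | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN | 3 | 6 |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 | 1 | 8 |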
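
If `data/user_rated.csv` is not at hand, the table above can be rebuilt by hand. The sketch below does that and counts the missing ratings per column. The names `reviews_demo` and `missing_counts` are hypothetical, chosen so they do not clash with the notebook's own variables.

```python
import numpy as np
import pandas as pd

# Rebuild the displayed table by hand (values copied from the output above).
reviews_demo = pd.DataFrame(
    {
        "Michael Jackson": [3.0, np.nan, 2.0, 3.0, 1.0],
        "Clint Black": [4.0, 9.0, 5.0, np.nan, 1.0],
        "Dropdead": [np.nan, np.nan, 8.0, 9.0, np.nan],
        "Anti-Cimex": [4.0, 3.0, 9.0, 4.0, 9.0],
        "Cardi B": [4.0, 8.0, np.nan, 9.0, 5.0],
        "slick": [5, 7, 3, 5, 1],
        "lofi": [5, 4, 6, 5, 8],
    },
    index=["Alfred", "Mandy", "Lenny", "Joan", "Tino"],
)

# Number of missing ratings in each column.
missing_counts = reviews_demo.isna().sum()
print(missing_counts)
```

The counts show which artists need a model at all: a column with no missing values can be skipped.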


[Back to top](#-Index)

### Problem 1

#### Michael Jackson Model

**10 Points**

Define `X` to contain only the `slick` and `lofi` columns of the `reviews` dataframe, keeping only the rows where `Michael Jackson` is not missing. Define `y` as a series containing the non-missing values of the `Michael Jackson` column.

Instantiate a new linear regression model and fit it to `X` and `y`. Assign this model to the variable `mj_lr`.

Use the `predict` method of `mj_lr` to predict the `Michael Jackson` rating for the rows in `reviews` where it is NaN, passing in those rows' `slick` and `lofi` values. Assign the result to `mandy_predict`.

Update the `reviews` dataframe by assigning the predicted value to the `Michael Jackson` column for the `Mandy` row.

In [6]:

X = ''
y = ''
mj_lr = ''
mandy_predict = ''

# Problem 1
# Get rows where Michael Jackson is not NaN
X = reviews[['slick', 'lofi']][reviews['Michael Jackson'].notna()]
y = reviews['Michael Jackson'].dropna()

# Fit linear regression model
mj_lr = LinearRegression()
mj_lr.fit(X, y)

# Predict missing values
mandy_predict = mj_lr.predict(reviews[['slick', 'lofi']][reviews['Michael Jackson'].isna()])

# Update reviews dataframe with predictions
reviews.loc[reviews['Michael Jackson'].isna(), 'Michael Jackson'] = mandy_predict

### ANSWER CHECK
reviews

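| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
|---|---|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 | 5 | 5 |
| Mandy | 4.0 | 9.0 | NaN | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN | 3 | 6 |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 | 1 | 8 |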
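
To see where Mandy's filled-in value comes from, it can help to look at the fitted coefficients. The following is a minimal, self-contained sketch that refits the same kind of model on the four users who rated Michael Jackson. The names `X_demo`, `y_demo`, `mj_demo`, and `mandy_demo` are hypothetical stand-ins for the notebook's variables.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Features and ratings for the four users who rated Michael Jackson
# (values copied from the table above).
users = ["Alfred", "Lenny", "Joan", "Tino"]
X_demo = pd.DataFrame({"slick": [5, 3, 5, 1], "lofi": [5, 6, 5, 8]}, index=users)
y_demo = pd.Series([3.0, 2.0, 3.0, 1.0], index=users)

# Fit the model and inspect what it learned.
mj_demo = LinearRegression().fit(X_demo, y_demo)
print("slopes (slick, lofi):", mj_demo.coef_)
print("intercept:", mj_demo.intercept_)

# Predict for Mandy, whose slick and lofi scores are 7 and 4.
mandy_features = pd.DataFrame({"slick": [7], "lofi": [4]}, index=["Mandy"])
mandy_demo = mj_demo.predict(mandy_features)
print("prediction for Mandy:", mandy_demo[0])
```

The prediction should agree with the 4.0 shown for Mandy in the table above. With only three distinct feature combinations among these four users, the model fits the training ratings very closely, so the value should be read as an illustration rather than a reliable estimate.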


[Back to top](#-Index)

### Problem 2

#### Completing the Table


Complete the missing data for all users in the `reviews` dataframe using the same process as above.  Assign the completed review data to `df_full` below. 

HINT: Use a `for` loop to iterate over the artist columns. See the solution set for Activity 19.1 for an example.

In [8]:

X = ''
y = ''
mj_lr = ''
newx = ''

# Problem 2
# Get list of artist columns (excluding slick/lofi)
artist_cols = [col for col in reviews.columns if col not in ['slick', 'lofi']]

# Create copy of reviews dataframe
df_full = reviews.copy()

# Loop through each artist
for artist in artist_cols:
    # Check if there are any missing values for this artist
    if reviews[artist].isna().any():
        # Get rows where artist rating is not NaN
        X = reviews[['slick', 'lofi']][reviews[artist].notna()]
        y = reviews[artist].dropna()

        # Fit linear regression model
        lr = LinearRegression()
        lr.fit(X, y)

        # Predict missing values
        missing_pred = lr.predict(reviews[['slick', 'lofi']][reviews[artist].isna()])

        # Update dataframe with predictions
        df_full.loc[reviews[artist].isna(), artist] = missing_pred


### ANSWER CHECK
reviews

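| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B | slick | lofi |
|---|---|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 | 5 | 5 |
| Mandy | 4.0 | 9.0 | NaN | 3.0 | 8.0 | 7 | 4 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN | 3 | 6 |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 | 5 | 5 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 | 1 | 8 |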
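
The answer check above displays `reviews`, which the loop reads from but does not modify, so the blanks are still visible there. The filled-in values are in `df_full`. The sketch below reruns the same loop on a hand-built copy of the table and checks two properties of the result. The names `reviews_demo`, `df_full_demo`, `no_missing`, and `observed_unchanged` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# State of the table after Problem 1 (values copied from the output above).
reviews_demo = pd.DataFrame(
    {
        "Michael Jackson": [3.0, 4.0, 2.0, 3.0, 1.0],
        "Clint Black": [4.0, 9.0, 5.0, np.nan, 1.0],
        "Dropdead": [np.nan, np.nan, 8.0, 9.0, np.nan],
        "Anti-Cimex": [4.0, 3.0, 9.0, 4.0, 9.0],
        "Cardi B": [4.0, 8.0, np.nan, 9.0, 5.0],
        "slick": [5, 7, 3, 5, 1],
        "lofi": [5, 4, 6, 5, 8],
    },
    index=["Alfred", "Mandy", "Lenny", "Joan", "Tino"],
)

artist_cols = [c for c in reviews_demo.columns if c not in ("slick", "lofi")]
features = reviews_demo[["slick", "lofi"]]
df_full_demo = reviews_demo.copy()

# Same per-artist procedure as in the solution above.
for artist in artist_cols:
    missing = reviews_demo[artist].isna()
    if missing.any():
        lr = LinearRegression().fit(features[~missing], reviews_demo.loc[~missing, artist])
        df_full_demo.loc[missing, artist] = lr.predict(features[missing])

# Check 1: every rating is now filled in.
no_missing = not df_full_demo.isna().any().any()

# Check 2: ratings that were already observed were left untouched.
observed = reviews_demo[artist_cols].notna()
observed_unchanged = df_full_demo[artist_cols].where(observed).equals(reviews_demo[artist_cols])

print(no_missing, observed_unchanged)
print(df_full_demo)
```

Neither check says anything about whether the predictions are good. They only confirm that the loop filled every gap without overwriting real ratings.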


## Notebook Summary: User Preference-Based Rating Prediction

This notebook demonstrates how to use linear regression to predict missing user ratings based on user preference scores. The exercise uses a music streaming service dataset where users have provided both artist ratings and preference scores for "slick" and "lofi" characteristics.

### Key Components

1. **Data Structure**
   - User ratings for various artists
   - User preference scores (slick/lofi)
   - Partially complete rating matrix with missing values

2. **Problem Solving Approach**
   - Single artist model (Problem 1)
   - Multi-artist automated solution (Problem 2)
   - Linear regression using preference scores as features

### Key Takeaways

1. **Missing Value Imputation**
   - User preferences can serve as predictive features
   - Linear regression provides a straightforward approach to rating prediction
   - Systematic handling of missing values is crucial

2. **Model Application**
   - Single models can be generalized to multiple prediction tasks
   - Automated approaches can handle multiple artists efficiently
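   - Input validation matters for robustness: `Dropdead`, for example, has only two observed ratings, fewer than the three parameters (an intercept and two slopes) that each model estimates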

3. **Practical Implementation**
   - Using pandas for data manipulation
   - Using scikit-learn to fit regression models
   - Scaling from single to multiple predictions
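
One way to act on the robustness point in the takeaways above is to refuse to fit a model when an artist has too few observed ratings. The sketch below is an illustration under stated assumptions, not part of the activity's solution: the function name `fill_missing_ratings`, its parameters, and the threshold `min_observed=3` (one observation per fitted parameter) are all hypothetical choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fill_missing_ratings(df, feature_cols=("slick", "lofi"), min_observed=3):
    """Return a filled copy of df and the list of artists that were skipped.

    Hypothetical helper: artists with fewer than min_observed ratings are
    left as-is instead of being fitted on too little data.
    """
    feature_cols = list(feature_cols)
    features = df[feature_cols]
    filled = df.copy()
    skipped = []
    for artist in [c for c in df.columns if c not in feature_cols]:
        missing = df[artist].isna()
        if not missing.any():
            continue  # nothing to fill for this artist
        if (~missing).sum() < min_observed:
            skipped.append(artist)  # too few ratings to fit a model
            continue
        lr = LinearRegression().fit(features[~missing], df.loc[~missing, artist])
        filled.loc[missing, artist] = lr.predict(features[missing])
    return filled, skipped


# Hand-built copy of the original table (values copied from the notebook output).
reviews_demo = pd.DataFrame(
    {
        "Michael Jackson": [3.0, np.nan, 2.0, 3.0, 1.0],
        "Clint Black": [4.0, 9.0, 5.0, np.nan, 1.0],
        "Dropdead": [np.nan, np.nan, 8.0, 9.0, np.nan],
        "Anti-Cimex": [4.0, 3.0, 9.0, 4.0, 9.0],
        "Cardi B": [4.0, 8.0, np.nan, 9.0, 5.0],
        "slick": [5, 7, 3, 5, 1],
        "lofi": [5, 4, 6, 5, 8],
    },
    index=["Alfred", "Mandy", "Lenny", "Joan", "Tino"],
)

filled, skipped = fill_missing_ratings(reviews_demo)
print("skipped artists:", skipped)
print(filled)
```

With this threshold, `Dropdead` is skipped because only Lenny and Joan rated it, while the other artists are filled as before. Whether to skip, fall back to a simpler estimate such as the column mean, or lower the threshold is a design decision that depends on the application.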

This exercise demonstrates a practical application of machine learning in recommendation systems, showing how user preference data can be leveraged to fill in missing ratings.