# Instructions

Please read through all cells in this notebook carefully.

**⭐Tasks that you need to do to complete the study are marked by stars.**

**⭐You can run cells using the run arrow (<span style="font-family: monospace">▶</span>) in the top toolbar or with Run->Run Selected Cell.**

# Background questions

**⭐ In the code block below, please answer the questions.**

In [None]:
# How many months/years of Python coding experience do you have?
# ANSWER HERE:

# Do you have any experience with machine learning (ML) programming? If yes, how many months/years?
# ANSWER HERE:

# Please record the current time (hours and minutes) at which you are answering this question.
# This will be used only to measure the time it took to complete the study.
# ANSWER HERE:

## Introduction

In this task, we will ask you to generate *feature contribution explanations* of ML model predictions using the Pyreal Python library.

A **feature contribution explanation** shows how much each feature contributed to an ML model's prediction. Contributions are negative if a given feature value decreased the model prediction, and positive if it increased the model prediction.

For example, consider a model that predicts the sale price of a house. A feature contribution explanation may tell you that the predicted price increased by \$8,000 because the house is in a good neighborhood, and decreased by \$1,000 because it has two bedrooms. This information could be shown as a table with the following data:

| Feature | Value | Contribution |
|---|---|---|
| Neighborhood Quality | Good | 8000 |
| Bedrooms | 2 | -1000 |
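To make the sign convention concrete, the minimal sketch below stores the hypothetical house-price explanation from the table as plain Python data and reports whether each feature raised or lowered the prediction. The `contributions` dictionary is only an illustration of the idea; it is not the object Pyreal returns.

```python
# Hypothetical explanation from the table above: feature -> (value, contribution in dollars)
contributions = {
    "Neighborhood Quality": ("Good", 8000),
    "Bedrooms": (2, -1000),
}

for feature, (value, contribution) in contributions.items():
    # Positive contributions increased the prediction, negative ones decreased it
    if contribution > 0:
        effect = f"increased the price by ${contribution:,}"
    elif contribution < 0:
        effect = f"decreased the price by ${-contribution:,}"
    else:
        effect = "did not change the price"
    print(f"{feature} = {value}: {effect}")

# Combined effect of the two listed features on the predicted price
net_effect = sum(contribution for _, contribution in contributions.values())
print(f"Combined effect of these features: {net_effect:+,}")
```

Together, these two features raise the predicted price by 7,000 dollars relative to what it would otherwise be.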

## How to Use Pyreal

Pyreal is a Python library that generates ML model explanations, such as feature contribution explanations.

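To use Pyreal, you must first make a RealApp object, which takes in training data, a trained ML model, and a list of transformers. For scikit-learn pipelines like the ones in this study, `RealApp.from_sklearn` creates the RealApp directly from the pipeline, as in the sample code below. We will provide any data, models, or pipelines you need to complete the tasks in this study.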

You can then use the RealApp to generate explanations for the input data of interest, and use Pyreal's **visualize** module to plot those explanations.

**⭐ Run the sample code below for an example of how to use Pyreal. You can use/copy this code throughout the study as needed.**

For more information on how to use Pyreal, you can consult the [documentation](https://dtail.gitbook.io/pyreal). However, the sample code below should be sufficient to complete this user study.


In [None]:
import joblib
import pandas as pd
from pyreal import RealApp
from pyreal.visualize import feature_bar_plot

DIR_BIKE = "../datasets/bike-sharing/"

# Required components - these will be given to you ------------------------------------
# Training dataset
data_train = pd.read_csv(DIR_BIKE + "bike-sharing_train-data.csv")
y_train = data_train["cnt"]
X_train = data_train.drop(columns="cnt")

# Pretrained prediction pipeline (transforms data and makes predictions using a machine learning (ML) model)
pipeline = joblib.load(DIR_BIKE + "bike-sharing_pipeline.pkl")

# Input data to explain
days_of_interest = pd.read_csv(DIR_BIKE + "bike-sharing_input.csv")
# -------------------------------------------------------------------------------------

# Use this sample code for the tasks below -------------------------------------------
# Create a new explanation application (RealApp) from the sklearn pipeline
realapp = RealApp.from_sklearn(pipeline, X_train=X_train, y_train=y_train)

# Produce feature contribution explanations for every day in days_of_interest
exp = realapp.produce_feature_contributions(days_of_interest)

# Visualize the explanation for the first day
feature_bar_plot(exp[0], num_features=8, select_by="absolute")

# Visualize the explanation for the second day
feature_bar_plot(exp[1], num_features=8, select_by="absolute")
# -------------------------------------------------------------------------------------

**⭐ Run the cell below to initialize data access filepaths.**


In [None]:
DIR_BASE = "../datasets/"
DIR_CALIFORNIA = DIR_BASE + "california-housing/"
DIR_CHURN = DIR_BASE + "cell-phone-churn/"
DIR_STUDENT = DIR_BASE + "student-performance/"

## Iranian Churn Dataset

### Preparing data and model

We will ask you to generate interpretable explanations for multiple datasets. We will begin with the [Iranian Churn Dataset](https://archive.ics.uci.edu/dataset/563/iranian+churn+dataset).

In this dataset, each row represents a customer of a cell phone company. The target variable we are predicting is whether that customer will churn (i.e., cancel their subscription with the company) after one year of service.

**⭐Run the following cell to load the training data and the pretrained prediction pipeline.**

In [None]:
import pandas as pd
import joblib

# Load in data
data_train = pd.read_csv(DIR_CHURN + "cell-phone-churn_train-data.csv")
y_train = data_train["Churn"]
X_train = data_train.drop(columns="Churn")

# Load in sklearn pipeline
pipeline = joblib.load(DIR_CHURN + "cell-phone-churn_pipeline.pkl")

### Generating explanations

In the following cell, we load a few customers of interest from the dataset.

**⭐Generate one feature contribution explanation per customer, explaining the model's prediction for each of these customers. The explanations should be presented as bar plots showing the 8 features with the largest contributions by absolute value.**

If you have done so correctly, you should end up with 3 bar plot explanations, similar to the ones generated by the sample code.
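Ranking by absolute value means that a large negative contribution outranks a small positive one. The pure-Python sketch below illustrates that ranking on made-up contribution values. The helper `top_k_by_absolute` and the feature names are hypothetical, and we assume, without having checked Pyreal's internals, that `feature_bar_plot(..., select_by="absolute")` selects features in a similar way.

```python
def top_k_by_absolute(contributions, k):
    """Return the k (feature, contribution) pairs with the largest absolute contributions."""
    # Sort by magnitude only, so the sign does not affect the ranking
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Made-up contributions for a single row (illustration only)
example = {"feature_a": 0.05, "feature_b": -0.30, "feature_c": 0.12, "feature_d": -0.01}

for feature, contribution in top_k_by_absolute(example, k=2):
    print(feature, contribution)
```

Here `feature_b` ranks first despite being negative, because its magnitude is the largest.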


In [None]:
customers_of_interest = pd.read_csv(DIR_CHURN + "cell-phone-churn_input.csv")

# YOUR CODE HERE
# FOLLOW THE EXAMPLE OF THE SAMPLE CODE ABOVE
# YOU SHOULD GENERATE 3 BAR PLOT GRAPHS TOTAL

## California Housing Dataset

Next, we will ask you to generate interpretable explanations for the [California Housing Dataset]().

In this dataset, each row refers to a block of houses in California, and the target variable to predict is the median price of houses in this block.

**⭐Run the following cell to load the training data and the pretrained prediction pipeline for this dataset.**

In [None]:
# Load in data
data_train = pd.read_csv(DIR_CALIFORNIA + "california-housing_train-data.csv")
y_train = data_train["MedianPrice"]
X_train = data_train.drop(columns="MedianPrice")

# Load in sklearn pipeline
pipeline = joblib.load(DIR_CALIFORNIA + "california-housing_pipeline.pkl")

### Generating explanations

In the following cell, we load a few housing blocks of interest from the dataset.

**⭐Generate one feature contribution explanation per block, explaining the model's prediction for each of these blocks. The explanations should be presented as bar plots showing the 8 features with the largest contributions by absolute value.**

If you have done so correctly, you should end up with 3 bar plot explanations.

In [None]:
blocks_of_interest = pd.read_csv(DIR_CALIFORNIA + "california-housing_input.csv")

# YOUR CODE HERE
# FOLLOW THE EXAMPLE OF THE SAMPLE CODE ABOVE
# YOU SHOULD GENERATE 3 BAR PLOT GRAPHS TOTAL

## Student Performance Dataset

Next, we will ask you to generate interpretable explanations for the [Student Performance Dataset]().

In this dataset, each row refers to a student at a school in Portugal, and the target variable to predict is whether they will pass or fail a class.

**⭐Run the following cell to load the training data and the pretrained prediction pipeline for this dataset.**

In [None]:
# Load in data
data_train = pd.read_csv(DIR_STUDENT + "student-performance_train-data.csv")
y_train = data_train["Pass"]
X_train = data_train.drop(columns="Pass")

# Load in sklearn pipeline
pipeline = joblib.load(DIR_STUDENT + "student-performance_pipeline.pkl")

### Generating explanations

In the following cell, we load a few students of interest from the dataset.

**⭐Generate one feature contribution explanation per student, explaining the model's prediction for each of these students. The explanations should be presented as bar plots showing the 8 features with the largest contributions by absolute value.**

If you have done so correctly, you should end up with 3 bar plot explanations.


In [None]:
students_of_interest = pd.read_csv(DIR_STUDENT + "student-performance_input.csv")

# YOUR CODE HERE
# FOLLOW THE EXAMPLE OF THE SAMPLE CODE ABOVE
# YOU SHOULD GENERATE 3 BAR PLOT GRAPHS TOTAL

**⭐Please answer the final retrospective questions in the block below:**

In [None]:
# Please record the current time (hours and minutes) at which you are answering this question.
# This will be used only to measure the time it took to complete the study.
# ANSWER HERE: 

# How easy was it to generate explanations using Pyreal? Please answer on a scale from 1 to 5, where 1 means very difficult and 5 means very easy, and briefly explain your rating.
# ANSWER HERE: (1-5, and briefly explain)

# If you found yourself in a situation where you needed to get explanations of ML model predictions, how likely would you be to use Pyreal? Please answer on a scale from 1 to 5, where 1 means very unlikely and 5 means very likely, and briefly explain your rating.
# ANSWER HERE: (1-5, and briefly explain)

**⭐Thank you for completing the user study! To submit your work, please use the Download button in the top toolbar (or File->Download) to save the notebook, and then submit the downloaded notebook on the original survey page.**