# Using Custom Transformers in Pyreal

In this tutorial, we will be using Pyreal to investigate the California Housing Dataset. 

In order to generate useful explanations, we will make a few custom transformers with functionality specific to this use case.

## Data Loading

This dataset includes 9 predictor variables, and one target variable. Each row in the dataset refers to a block of houses in California. The target variable is the median house value in this block.

**Run the cell below to load in the California Housing Dataset.**

In [9]:
import matplotlib.pyplot as plt
from urllib.parse import urljoin
import pandas as pd

AWS_BASE_URL = 'https://pyreal-data.s3.amazonaws.com/'
data_url = urljoin(AWS_BASE_URL, "usability_study/california.csv")
data = pd.read_csv(data_url)

city_url = urljoin(AWS_BASE_URL, "usability_study/cal_cities_lat_long.csv")
cities = pd.read_csv(city_url)

data = data[data["median_house_value"] < 500000]

data = data.sample(5000, random_state=100)  # we will work with a truncated dataset to avoid memory crashes

X = data.drop("median_house_value", axis=1)
y = data["median_house_value"]

data.sample(10)

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | ocean_proximity | median_house_value |
|---|---|---|---|---|---|---|---|---|---|---|
| 6026 | -117.74 | 34.07 | 42 | 2504 | 553.0 | 1550 | 509 | 3.0294 | INLAND | 135700 |
| 665 | -122.16 | 37.7 | 36 | 2239 | 391.0 | 1203 | 379 | 5.0043 | NEAR BAY | 190400 |
| 19863 | -119.32 | 36.36 | 18 | 2060 | 383.0 | 1348 | 397 | 3.4312 | INLAND | 68400 |
| 9102 | -117.9 | 34.53 | 8 | 3484 | 647.0 | 2169 | 619 | 3.9766 | INLAND | 135800 |
| 16966 | -122.31 | 37.55 | 27 | 3931 | 933.0 | 1877 | 851 | 3.9722 | NEAR OCEAN | 354100 |
| 2529 | -122.22 | 39.51 | 17 | 1201 | 268.0 | 555 | 277 | 2.1 | INLAND | 66900 |
| 5232 | -118.24 | 33.94 | 42 | 380 | 106.0 | 411 | 100 | 0.9705 | <1H OCEAN | 90000 |
| 4877 | -118.25 | 34.02 | 50 | 180 | 89.0 | 356 | 76 | 2.1944 | <1H OCEAN | 158300 |
| 1112 | -121.59 | 39.79 | 20 | 743 | 171.0 | 395 | 168 | 1.625 | INLAND | 88300 |
| 9325 | -122.52 | 37.96 | 35 | 2012 | 346.0 | 818 | 352 | 5.2818 | NEAR BAY | 331000 |


We will be working with a pretrained model to predict the median house values for each block. 

**Run the code below to load in the model.** 

In [10]:
import lightgbm
import matplotlib.pyplot as plt
import numpy as np
import requests

model_url = urljoin(AWS_BASE_URL, "usability_study/model.model")
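r = requests.get(model_url, allow_redirects=True)
r.raise_for_status()  # fail early if the download did not succeed
with open('model.model', 'wb') as f:  # close the file before LightGBM reads it
    f.write(r.content)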

model = lightgbm.Booster(model_file='model.model')

## Custom Transformers Basics

Pyreal generates ML explanations using Explainer objects, which take in data Transformers through their `transformer` parameter. These Transformers take in three flags in their initialization, two of which we will use in this tutorial:

- Transformers with a `model=True` flag take data from the original feature space (as we loaded in above) to the feature space used by the model.
- Transformers with an `interpret=True` flag take data from the original feature space to a feature space more readable or interpretable by humans. 

For information about the third transformer flag (`algorithm`), please see the [advanced_explanation_generation](pyreal/tutorials/advanced_explanation_generation.ipynb) tutorial.
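
One way to picture the flags is as labels that decide which transformers are applied for which purpose. The sketch below is plain Python, not Pyreal code: `FlaggedStep` and `select` are hypothetical stand-ins, and the flag values are assumptions chosen to match the transformers discussed later in this tutorial.

```python
from dataclasses import dataclass


@dataclass
class FlaggedStep:
    """Hypothetical stand-in for a transformer and two of its flags."""
    name: str
    model: bool = False
    interpret: bool = False


def select(steps, flag):
    # Keep the names of the steps whose given flag is set, preserving order.
    return [step.name for step in steps if getattr(step, flag)]


# Assumed flag choices, for illustration only.
steps = [
    FlaggedStep("PerHouseholdAverager", model=True, interpret=True),
    FlaggedStep("UnitScaler", interpret=True),
    FlaggedStep("CityConverter", interpret=True),
]

model_steps = select(steps, "model")          # original -> model feature space
interpret_steps = select(steps, "interpret")  # original -> interpretable space
print(model_steps)
print(interpret_steps)
```

In this picture, a transformer can carry more than one flag, which is why the averager appears in both lists.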

The Pyreal `transformer` module has some common transformers available for use, but some use cases may require you to write your own transformer. This can be done by extending the base `Transformer` class.

### Transformer Functions
When defining a custom transformer, you will have to consider three types of functions:
- `data_transform` (*required*): A single function that transforms the data from space A to space B.
- `inverse_transform_explanation_XXX` (*optional*): Functions that transform an explanation from space B to space A. These only need to be defined if the transformer is used by the explanation algorithm and makes the data less interpretable (i.e., it has the `algorithm` flag set to True and the `interpret` flag set to False).
- `transform_explanation_XXX` (*optional*): Functions that transform an explanation from space A to space B. These only need to be defined if the transformer is used to make data and explanations more interpretable than the algorithm-ready state (i.e., it has the `algorithm` flag set to False and the `interpret` flag set to True).

The `transform_explanation` type functions are written per Explanation output type. For this tutorial, we will consider additive local feature contribution and additive global feature importance explanations. At the end of this tutorial, we will consider some special cases.

### Custom Transformer Example 1: Per-Household Averager

Let's take a look at one possible custom transformer we can add, which will average the values of certain features per household. We will follow these steps to write the transformer:

1. Define the transformer `__init__()` method, using a `super()` call for the parent `Transformer` class. The function can take optional arguments to configure the transformer. We will take in a list of columns to average.
2. Define the `data_transform()` function, which takes an input DataFrame `x` and returns `x` after undergoing the transformation. In this case, we simply divide the selected columns by the households feature.
3. Consider which flags we expect to be used with this transformer. In this case, our transformation will be used for the explanation algorithm, but also make the data more interpretable, so our flags are `interpret=True` and `algorithm=True`. Therefore, we do not need to define any explanation transform functions for this use case. 

**Run the cell below to define the PerHouseholdAverager** 


In [11]:
from pyreal.transformers import OneHotEncoder, MultiTypeImputer, Transformer, fit_transformers, run_transformers
from pyreal.types.explanations.dataframe import AdditiveFeatureContributionExplanation

class PerHouseholdAverager(Transformer):
    def __init__(self, columns, **kwargs):
        # columns: the columns to average. Must be list of strings (column names)
        self.columns = columns
        super().__init__(**kwargs)

    def data_transform(self, x):
        # Transform the data by adding a new column from total_[column] called 
        #   average_[column]. This feature represents the average value of 
        #   [column] per household. 
        x = x.copy()  # avoid modifying the caller's DataFrame in place
        for column in self.columns:
            name = column.replace("total", "average")
            x[name] = x[column] / x["households"]
        return x
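
To make the arithmetic concrete, here is a small plain-Python sketch of the same per-household division, applied to the first row of the sample output shown earlier (block 6026). It does not use Pyreal; `per_household_average` is a hypothetical helper that mirrors the renaming and division in `data_transform` above.

```python
def per_household_average(row, columns):
    # Return a copy of `row` with an average_[column] entry added for each
    # total_[column], computed as total / households.
    result = dict(row)
    for column in columns:
        name = column.replace("total", "average")
        result[name] = row[column] / row["households"]
    return result


# Values taken from block 6026 in the sample output above.
block = {"total_rooms": 2504, "total_bedrooms": 553.0, "households": 509}
averaged = per_household_average(block, ["total_rooms", "total_bedrooms"])

print(round(averaged["average_rooms"], 2))     # about 4.92 rooms per household
print(round(averaged["average_bedrooms"], 2))  # about 1.09 bedrooms per household
```

The original `total_` entries are kept alongside the new `average_` entries, just as the transformer adds columns rather than replacing them.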

### Custom Transformer Example 2: City Converter

Now, let's take a look at another custom transformer, this one requiring an explanation transform. This transformer will be used to convert latitude/longitude values into city areas, based on the closest city to the given coordinates. 
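
The matching rule can be sketched in plain Python as a bounding-box test around each city. This is an illustration only: `closest_city` is a hypothetical helper, the 0.1-degree half-width is the value used by the `CityConverter` defined later in this tutorial, and the two city coordinates are approximate values supplied for the example rather than rows read from the cities file.

```python
def closest_city(latitude, longitude, cities, half_width=0.1):
    # Return the name of the last city whose bounding box contains the point,
    # or None if no box matches. Later cities overwrite earlier matches,
    # so the order of `cities` matters when boxes overlap.
    match = None
    for name, city_lat, city_lon in cities:
        in_lat = city_lat - half_width < latitude < city_lat + half_width
        in_lon = city_lon - half_width < longitude < city_lon + half_width
        if in_lat and in_lon:
            match = name
    return match


# Approximate coordinates, assumed for illustration.
cities = [
    ("Oakland", 37.80, -122.27),
    ("Berkeley", 37.87, -122.27),
]

print(closest_city(37.88, -122.23, cities))  # Berkeley (both boxes match; last wins)
print(closest_city(34.07, -117.74, cities))  # None
```

Because this is a box test rather than a true nearest-neighbor search, a point can fall inside several boxes or none at all, which is the rough estimate the transformer accepts.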

Again, we will follow these steps:

1. Define the transformer `__init__()` method, using a `super()` call for the parent `Transformer` class. The function can take optional arguments to configure the transformer. In this case, we will not take in any parameters.
2. Define the `data_transform()` function, which takes an input DataFrame `x` and returns `x` after undergoing the transformation. In this case, we convert long/lat values to nearby cities. 
3. Consider which flags we expect to be used with this transformer. In this case, our transformation will be used to make the data more interpretable, but will not be fed into the model, so our flags are `interpret=True`, `algorithm=False`, and `model=False`. Therefore, the `transform_explanation` functions will be called, but not the `inverse_transform_explanation` functions.
4. Consider the explanation output types you are interested in. For now, let's consider 


## Interpretable Transformers

You may prefer explanations in a more human-readable format. Here, we provide a few other transformers that may make explanations easier to use. Keep in mind the model will **not** accept data that has had these transformations, and therefore they should be flagged `model=False`. Rather, set `interpret=True` to mark these as transformers that *improve interpretability*.

Additionally, remember that we want the final explanation to include the `average_rooms` and `average_bedrooms` features, so the `PerHouseholdAverager` transformer should also be flagged `interpret=True`.

⭐**Run the cell below to define two more transformers, which scale the units of features to their actual value, and convert lat/long values to predicted cities.** 

In [12]:
class UnitScaler(Transformer):
  def __init__(self, column, scale, **kwargs):
    # column: string, column to scale
    # scale: value to multiply column by
    self.column = column
    self.scale = scale
    super().__init__(**kwargs)

  def data_transform(self, x):
    # Scales the data to a more human readable scale
    x = x.copy()  # avoid modifying the caller's data in place
    x[self.column] = x[self.column] * self.scale
    return x

  def transform_explanation(self, explanation):
    # This transform will not modify the explanation, so we return it as is
    return explanation

  def inverse_transform_explanation(self, explanation):
    # This transform will not modify the explanation, so we return it as is
    return explanation

class CityConverter(Transformer):
  def __init__(self, **kwargs):
    self.cities = cities
    super().__init__(**kwargs)

  def data_transform(self, x):
    # Converts latitude/longitude coordinates to closest city name. Note that 
    #    we are using a very rough estimate here, assuming constant size. 
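    was_series = isinstance(x, pd.Series)
    if was_series:
      x = x.to_frame().T  # work on a one-row DataFrame
    else:
      x = x.copy()  # avoid modifying the caller's DataFrame in place
    for _, row in self.cities.iterrows():
      lat = row["Latitude"]
      lon = row["Longitude"]
      # Rows within 0.1 degrees of this city in both directions get its name;
      #    later cities overwrite earlier matches.
      in_lat = (x["latitude"] > lat - 0.1) & (x["latitude"] < lat + 0.1)
      in_lon = (x["longitude"] > lon - 0.1) & (x["longitude"] < lon + 0.1)
      x.loc[in_lat & in_lon, "city"] = row["Name"]
    x = x.drop(["latitude", "longitude"], axis=1)
    if was_series:
      x = x.squeeze(axis=0)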
    return x

  def transform_explanation_additive_contributions(self, explanation):
    # In the case of additive contributions, we can combine the latitude and 
    #    longitude explanation contributions by adding to get the city 
    #    contribution
    explanation = explanation.get()
    explanation["city"] = explanation["longitude"] + explanation["latitude"]
    explanation = explanation.drop("longitude", axis=1)
    explanation = explanation.drop("latitude", axis=1)
    return AdditiveFeatureContributionExplanation(explanation)
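
The same combination can be followed by hand on a single explanation. The sketch below uses a plain dictionary in place of a Pyreal explanation object, and the contribution values are made up for illustration; `combine_location` is a hypothetical helper. Because the contributions are additive, merging two features leaves the total unchanged.

```python
def combine_location(contributions):
    # Merge the longitude and latitude contributions into one "city" entry.
    # Summing is only valid because the contributions are additive.
    combined = {
        feature: value
        for feature, value in contributions.items()
        if feature not in ("longitude", "latitude")
    }
    combined["city"] = contributions["longitude"] + contributions["latitude"]
    return combined


# Made-up contribution values (in dollars) for one block.
contributions = {
    "longitude": -12000.0,
    "latitude": 20000.0,
    "median_income": 35000.0,
}
combined = combine_location(contributions)

print(combined)  # {'median_income': 35000.0, 'city': 8000.0}
print(sum(combined.values()) == sum(contributions.values()))  # True
```

This shortcut would not carry over to explanation types whose values are not additive, which is why the function is written for one explanation output type.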

# Generating Global Explanations

You can now start using Pyreal Explainers to investigate the model. Remember to take a look at the links at the top of this page as needed.

We will begin by generating a *global* explanation, or an explanation of how the model makes predictions in general.

⭐**In the next cell, initialize, fit, and call the produce function on a `GlobalFeatureImportance` Explainer.**

⭐**Remember, you will need to begin by initializing and fitting the required transformers. Revisit the cells above for details on the transformers you will need and their flags. You can use the `fit_transformers` function to quickly fit them.** 

In [13]:
from pyreal.explainers import GlobalFeatureImportance
from pyreal.transformers import fit_transformers
from pyreal.utils import visualize

# Step one: Initialize and fit transformers using fit_transformers
# ---- Your code here ----

# Step two: Initialize and fit the explainer
# ---- Your code here ----

# Step three: Generate and visualize the explanation
# ---- Your code here ----

⭐**Now, please press the "next" button on the Qualtrics survey tab and answer the next set of questions, referencing this tab as needed.** ⭐




# Generating Local Explanations

⭐ **Now, please consider the `sample_block` listed below, which refers to a hypothetical block of houses that might exist in California. Run the code block.**



In [14]:
sample_block = pd.Series({
  "longitude": -122.23,
  "latitude": 37.88,
  "housing_median_age": 16,
  "total_rooms": 672,
  "total_bedrooms": 230,
  "population": 220,
  "households": 52,
  "median_income":  5.3252,
  "ocean_proximity": "NEAR BAY" 
})
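
As a preview of what an interpretable view of this block could look like, the sketch below applies a unit scaling by hand. It assumes that `median_income` is recorded in tens of thousands of dollars, so the scale factor of 10,000 is an assumption rather than a value taken from this notebook, and `scale_unit` is a hypothetical helper mirroring `UnitScaler.data_transform`.

```python
def scale_unit(row, column, scale):
    # Return a copy of `row` with `column` multiplied by `scale`.
    scaled = dict(row)
    scaled[column] = row[column] * scale
    return scaled


sample = {"median_income": 5.3252, "households": 52}
readable = scale_unit(sample, "median_income", 10_000)  # assumed scale factor

print(round(readable["median_income"]))  # 53252
print(sample["median_income"])           # 5.3252, the input is left unchanged
```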


We will now generate a *local* explanation, or an explanation of why the model makes the prediction it does for the specific `sample_block` above.

⭐**In the next cell, initialize, fit, and call the produce function on a `LocalFeatureContribution` Explainer. Generate an explanation for the prediction of the `sample_block` above.**

⭐**You can likely reuse the already-fit transformers from the previous section. If you would like to use any more, remember to fit them before using.**

In [15]:
from pyreal.explainers import LocalFeatureContribution
from pyreal.transformers import fit_transformers
from pyreal.utils import visualize

# Step one: Initialize and fit the explainer
# ---- Your code here ----

# Step two: Generate and visualize the explanation
# ---- Your code here ----

⭐**Now, please press the "next" button on the Qualtrics survey tab and answer the next set of questions, referencing this tab as needed.** ⭐


# Downloading This Notebook

⭐**Once you have finished answering all questions, save and download this notebook to a location of your choosing using `File` → `Download` → `Download .ipynb`, in the upper left toolbar of this page. You will need to upload it shortly to the Qualtrics survey.**

⭐**Please return to the Qualtrics survey now and press next, then follow the instructions to upload this notebook and answer the final reflection questions.** 