# Batch Transform for Explanations

In this notebook, we'll use Amazon SageMaker Batch Transform to obtain
explanations for an entire dataset (here, the test dataset) in a single
batch job.

**Note**: When running this notebook on SageMaker Studio, you should make
sure the 'SageMaker JumpStart Data Science 1.0' image/kernel is used. You
can run all cells or step through them one at a time.

<p align="center">
  <img src="https://github.com/awslabs/sagemaker-explaining-credit-decisions/raw/master/docs/architecture_diagrams/stage_4.png" width="1000px">
</p>

We start by importing a variety of packages that will be used throughout
the notebook. One of the most important is the Amazon SageMaker Python SDK
(i.e. `import sagemaker`). We also import modules from the solution's own
custom package, which is located in `../package` relative to this notebook.

```python
import sys
from pathlib import Path

import boto3
import sagemaker
from sagemaker.transformer import Transformer

# Make the solution's custom package importable from this notebook.
sys.path.insert(0, '../package')
from package import config, utils
```

Next, we define the current folder, a SageMaker session, and a SageMaker
client (from `boto3`).

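```python
# The current folder, computed by a helper in the solution's `utils` module.
current_folder = utils.get_current_folder(globals())
# High-level SageMaker Python SDK session, passed to the Transformer below.
sagemaker_session = sagemaker.Session()
# Low-level boto3 client, used to look up the existing model by name.
sagemaker_client = boto3.client('sagemaker')
```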

Below, we define a function that retrieves the model created in the last
stage. Here, a model refers to the package of model assets and deployment
code. We could have created another model (using the same model data from
the training stage), but reusing the existing one avoids duplication.
Because `list_models` sorts by creation time, newest first, the first match
is the most recently created model.

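```python
def get_latest_model(name_contains):
    """Return the name of the newest model whose name contains `name_contains`."""
    paginator = sagemaker_client.get_paginator('list_models')
    # Sort newest first so the first match is the latest model.
    pages = paginator.paginate(
        NameContains=name_contains,
        SortBy='CreationTime',
        SortOrder='Descending'
    )
    for page in pages:
        models = page['Models']
        if models:
            return models[0]['ModelName']
    # No page contained a matching model: fail loudly instead of returning None.
    raise ValueError("Couldn't find any models with '{}' in name.".format(name_contains))
```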
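The selection rule in `get_latest_model` can be checked without calling AWS. The sketch below uses a hypothetical helper, `first_model_name`, that applies the same rule to an iterable of page dictionaries shaped like `list_models` responses. It assumes the pages already arrive newest first (as `ListModels` returns them when sorted by `CreationTime` in descending order); the sample pages and model names are made up for illustration.

```python
from typing import Iterable, Optional


def first_model_name(pages: Iterable[dict]) -> Optional[str]:
    """Return the first ModelName found across ListModels-style pages.

    Hypothetical helper: assumes pages are ordered newest first, so the first
    match is the latest model. Returns None when no page contains a model.
    """
    for page in pages:
        models = page.get('Models', [])
        if models:
            return models[0]['ModelName']
    return None


# Made-up pages: the first is empty, the second holds two models (newest first).
sample_pages = [
    {'Models': []},
    {'Models': [{'ModelName': 'example-prefix-model-2'},
                {'ModelName': 'example-prefix-model-1'}]},
]
print(first_model_name(sample_pages))  # example-prefix-model-2
print(first_model_name([]))            # None
```

Keeping the AWS call separate from the selection rule makes the rule easy to test; a caller would typically raise an error when it gets `None`, since a missing model means the previous stage hasn't run.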

```python
latest_model = get_latest_model(config.SOLUTION_PREFIX)
print("latest model: {}".format(latest_model))
# Reuse the model name so this job's output lands in its own S3 folder.
job_name = latest_model
```

Unlike the last stage, where we deployed an endpoint, here we define a
`Transformer` to perform the batch computation. We specify the instance
type to use (`ml.c5.xlarge`) along with several other parameters.
`strategy='SingleRecord'` means the explainer receives one record per
request, and `assemble_with='Line'` joins the per-record outputs with
newlines. `accept` sets the MIME type of the output, and `output_path`
defines where in Amazon S3 the Batch Transform output should be saved.

```python
batch_explainer = Transformer(
    model_name=latest_model,
    instance_count=1,
    instance_type='ml.c5.xlarge',
    strategy='SingleRecord',    # send one record per request
    assemble_with='Line',       # join per-record outputs with newlines
    output_path='s3://' + str(Path(config.S3_BUCKET, 'explanations', job_name)) + '/',
    accept='application/json',  # MIME type requested for the output
    base_transform_job_name=config.SOLUTION_PREFIX,
    sagemaker_session=sagemaker_session,
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}]
)
```
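To make `strategy='SingleRecord'` and `assemble_with='Line'` more concrete, here is a local simulation of how a line-split input file flows through the job: each line becomes one request, and the per-record responses are joined with newlines into the output. This is only a sketch of the data flow under those settings, not how SageMaker implements it; `simulate_batch_transform` and `fake_explainer` are hypothetical, and `fake_explainer` stands in for the deployed model with a made-up response format. It assumes `split_type='Line'`, which is what the `.transform` call uses to split the input into records.

```python
import json
from typing import Callable


def simulate_batch_transform(input_text: str, predict: Callable[[str], str]) -> str:
    """Simulate Line splitting, SingleRecord requests and Line assembly (sketch only)."""
    # split_type='Line': each line of the input file is one record.
    records = input_text.splitlines()
    # strategy='SingleRecord': each record is sent in its own request.
    responses = [predict(record) for record in records]
    # assemble_with='Line': responses are joined with newline separators.
    return "\n".join(responses)


def fake_explainer(record: str) -> str:
    """Hypothetical stand-in for the model: wraps the record with a dummy score."""
    return json.dumps({"record": json.loads(record), "score": 0.5})


input_file = '{"id": 1}\n{"id": 2}\n'
print(simulate_batch_transform(input_file, fake_explainer))
```

Sending one record per request keeps each explainer call small and predictable, at the cost of more requests than a batched (`MultiRecord`) strategy.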

Creating the `Transformer` does not start the Batch Transform job; calling
`.transform` below does. At this stage we also specify the `content_type`,
which controls which entities the explainer returns for each record. As an
example, we request SHAP interaction values in this batch job.
`split_type='Line'` tells Batch Transform to treat each line of the input
files as a separate record, and `wait=True` blocks the cell until the job
finishes.

```python
# Entities the explainer should return for each record.
entities = [
    'data',
    'features',
    'prediction',
    'explanation_shap_values',
    'explanation_shap_interaction_values'
]
batch_explainer.transform(
    data='s3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'data_test')) + '/',
    content_type="application/json; entities={}".format(",".join(entities)),
    split_type='Line',  # treat each line of the input files as one record
    wait=True           # block until the job has finished
)
```
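The `content_type` string packs the MIME type and the requested entities into a single header value, such as `application/json; entities=data,prediction`. How the explainer container parses this value isn't shown in this notebook, so the hypothetical `parse_content_type` below is just one plausible way to split it back into a MIME type and an entity list. A check like this could be handy for catching a typo in the entity list before launching a long job.

```python
from typing import List, Tuple


def build_content_type(entities: List[str]) -> str:
    """Build the header value in the same format as the transform call above."""
    return "application/json; entities={}".format(",".join(entities))


def parse_content_type(header: str) -> Tuple[str, List[str]]:
    """Split 'mime; entities=a,b' into the MIME type and the entity list.

    Hypothetical parser: assumes at most one 'entities' parameter.
    """
    mime_type, _, params = header.partition(";")
    entities: List[str] = []
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "entities" and value:
            entities = [e.strip() for e in value.split(",") if e.strip()]
    return mime_type.strip(), entities


header = build_content_type(["data", "prediction", "explanation_shap_values"])
print(header)
print(parse_content_type(header))
```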

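After the Batch Transform job has completed successfully, the explanations
for every record in the test dataset will be stored in Amazon S3 under the
`output_path` defined above. Batch Transform names each output file after
its input file, with `.out` appended.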
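Once an output file has been downloaded (for example with boto3's `get_object`, omitted here so the example runs offline), it can be parsed line by line. This minimal sketch assumes every line is a standalone JSON object, which `accept='application/json'` combined with `assemble_with='Line'` suggests, although the explainer's exact output schema isn't checked here. The helper `parse_json_lines` is hypothetical and the sample fields are placeholders.

```python
import json
from typing import Any, Dict, List


def parse_json_lines(text: str) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON: one object per non-blank line."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError("Line {} is not valid JSON: {}".format(line_number, err)) from err
    return records


# Made-up output with placeholder fields.
sample_output = '{"field": "a"}\n{"field": "b"}\n'
explanations = parse_json_lines(sample_output)
print(len(explanations))  # 2
```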

## Next Stage

Next, we'll wrap things up and discuss how to clean up the solution.

[Click here to continue.](./5_conclusion.ipynb)