## Security Engineer SlackBot

#### Problem Statement

The AppSec team has built a knowledge base (KB) of questions frequently asked by service teams. Today, when a service team member asks a question, either by messaging an individual security engineer or by posting in a Slack group, and that question already exists in the knowledge base that the security team maintains, a security engineer manually looks it up in the Quip doc and replies in Slack with the answer. This works well in one respect: the security team does not have to spend time researching an answer that already exists. However, responding is still manual. It demands a security engineer's attention and a context switch away from their current work, just to answer a question that has been answered before.



#### Proposed Solution

The security team proposes a solution that automatically answers frequently asked questions from service teams. The solution relies on the knowledge base: if the answer to a question exists in the knowledge base, we assume that the question has been asked before.

*Note*: Building the knowledge base is out of scope for this project; work to mature it is already under way. This project uses the KB to automate responses to frequently asked questions that service teams post in Slack messages.

#### STEPS:

1. Build, train and deploy the model from the HuggingFace pretrained model library.

2. Create a knowledge base to fine-tune a pretrained model from Hugging Face.

3. Use the fine-tuned model to generate text responses to questions from customers.

#### AI/ML solution by: Madhur Prashant (Alias: madhurpt, madhurpt@amazon.com)

### STEP 0: INSTALL THE SAGEMAKER SDK LOCALLY

In [3]:
pip install --upgrade pip

Collecting pip
  Using cached pip-23.2.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0
    Uninstalling pip-23.0:
      Successfully uninstalled pip-23.0
Successfully installed pip-23.2.1
Note: you may need to restart the kernel to use updated packages.


In [4]:

## Before executing the notebook, complete this initial setup step. The notebook requires the latest versions of sagemaker and ipywidgets.
!pip install sagemaker ipywidgets --upgrade --quiet


---

To train and host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook instance as the AWS account role with SageMaker access. It has the necessary permissions, including access to your data in S3.

---

In [5]:
import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = get_execution_role()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

## 2. Select a pre-trained model
***
You can continue with the default model, or choose a different model from the dropdown generated by running the next cell. A complete list of JumpStart models is also available at [JumpStart Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/jumpstart.html#).
***

In [9]:
model_id = "huggingface-eqa-distilbert-base-multilingual-cased"

In [10]:
import IPython
from ipywidgets import Dropdown

# download JumpStart model_manifest file.
boto3.client("s3").download_file(
    f"jumpstart-cache-prod-{aws_region}", "models_manifest.json", "models_manifest.json"
)
with open("models_manifest.json", "rb") as json_file:
    model_list = json.load(json_file)

# Filter the Extractive Question Answering models out of the manifest list,
# dropping duplicate model ids (one per version) while preserving order.
eqa_models_all_versions = [
    model["model_id"] for model in model_list if "-eqa-" in model["model_id"]
]
eqa_models = list(dict.fromkeys(eqa_models_all_versions))

# display the model-ids in a dropdown, for user to select a model.
dropdown = Dropdown(
    value=model_id,
    options=eqa_models,
    description="JumpStart Extractive Question Answering Models:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(IPython.display.Markdown("## Select a JumpStart pre-trained model from the dropdown below"))
display(dropdown)

## Select a JumpStart pre-trained model from the dropdown below

Dropdown(description='JumpStart Extractive Question Answering Models:', index=9, layout=Layout(width='max-cont…

## 3. Run inference on the pre-trained model
***
Using JumpStart, we can perform inference on a pre-trained model, even without fine-tuning it first on a custom dataset. The model available for deployment is created by attaching an answer-extracting layer to
the output of the Text Embedding model, and then fine-tuning the entire model on the
[SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
The SQuAD2.0 dataset comprises question-context pairs, with the answer to each
question indicated by its starting and ending character positions in the context.

***

### 3.1. Retrieve JumpStart Artifacts & Deploy an Endpoint
***
We retrieve the deploy_image_uri, deploy_source_uri, and base_model_uri for the pre-trained model. To host the pre-trained model, we create an instance of [`sagemaker.model.Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it.
***

In [11]:
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# model_version="*" fetches the latest version of the model.
infer_model_id, infer_model_version = dropdown.value, "*"

endpoint_name = name_from_base(f"jumpstart-{infer_model_id}")

inference_instance_type = "ml.m5.xlarge"

# Retrieve the inference docker container uri.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=infer_model_id,
    model_version=infer_model_version,
    instance_type=inference_instance_type,
)
# Retrieve the inference script uri.
deploy_source_uri = script_uris.retrieve(
    model_id=infer_model_id, model_version=infer_model_version, script_scope="inference"
)
# Retrieve the base model uri.
base_model_uri = model_uris.retrieve(
    model_id=infer_model_id, model_version=infer_model_version, model_scope="inference"
)
# Create the SageMaker model instance. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)
# deploy the Model.
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    endpoint_name=endpoint_name,
)

----!

### 3.2. Example question-context pair for inference
***
Let's put in some example question-context pairs. You can supply any question-context pair; the model predicts the part of the context that contains the answer.
The two examples below are security-review questions of the kind service teams ask, rather than SQuAD2.0 samples. The SQuAD2.0 dataset itself is available from the [Dataset Homepage](https://rajpurkar.github.io/SQuAD-explorer/) under the [CC BY-SA 4.0 License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
***

In [17]:
question_context1 = [
    "How do I know if I need a security review for this change?",
    "A security review is required any time you're making a change (or releasing a new system or service) that could impact the security of customers, AWS, or Amazon. More specifically... All launches (alpha, beta, gamma, demo, GA, public, or private) require a security review. Any security-impacting change to a production environment, or a test environment that uses or has access to production data such as customer content/workloads needs a security review.  If you don't know what a security-impacting change means, ask yourself if this change is implemented, is there a likelihood that we negatively impact customers from a security perspective?"
]
question_context2 = [
    "Where can I create a security ticket for AppSec Review?",
    "https://appsec.corp.amazon.com",
]

### 3.3. Query endpoint and parse response
***
Input to the endpoint is a question-context pair. Response from the endpoint is the answer extracted from the context for the input question.
***

In [19]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"


def query_endpoint(encoded_text):
    response = base_model_predictor.predict(
        encoded_text, {"ContentType": "application/list-text", "Accept": "application/json;verbose"}
    )
    return response


def parse_response(query_response):
    model_predictions = json.loads(query_response)
    # Return the answer string itself; a trailing comma here would wrap it in a 1-tuple.
    answer = model_predictions["answer"]
    return answer


for question_context in [question_context1, question_context2]:
    query_response = query_endpoint(json.dumps(question_context).encode("utf-8"))
    answer = parse_response(query_response)
    print(
        f"Inference:{newline}"
        f"Question: {bold}{question_context[0]}{unbold}{newline}"
        f"Context: {question_context[1]}{newline}"
        f"model answer: {bold}{answer}{unbold}{newline}"
    )

Inference:
Question: How do I know if I need a security review for this change?
Context: A security review is required any time you're making a change (or releasing a new system or service) that could impact the security of customers, AWS, or Amazon. More specifically... All launches (alpha, beta, gamma, demo, GA, public, or private) require a security review. Any security-impacting change to a production environment, or a test environment that uses or has access to production data such as customer content/workloads needs a security review.  If you don't know what a security-impacting change means, ask yourself if this change is implemented, is there a likelihood that we negatively impact customers from a security perspective?
model answer: If you don't know what a security-impacting change

Inference:
Question: Where can I create a security ticket for AppSec Review?
Context: https://appsec.corp.amazon.com
model answer: appsec.corp.amazon.com
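The examples above hard-code each question together with its context, but an extractive model can only answer from a context it is given, so a bot would first have to pick the matching knowledge base entry for an incoming question. The following is a minimal sketch of one way to do that, using word overlap (Jaccard similarity) between the incoming question and the stored questions. The `KNOWLEDGE_BASE` structure, the `best_kb_entry` helper, and the `min_score` threshold are all hypothetical and not part of this notebook; a real deployment would likely need a more robust matching method.

```python
import re

# Hypothetical in-memory knowledge base built from the two examples above.
KNOWLEDGE_BASE = [
    {
        "question": "How do I know if I need a security review for this change?",
        "context": (
            "A security review is required any time you're making a change "
            "(or releasing a new system or service) that could impact the "
            "security of customers, AWS, or Amazon."
        ),
    },
    {
        "question": "Where can I create a security ticket for AppSec Review?",
        "context": "https://appsec.corp.amazon.com",
    },
]


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_kb_entry(user_question, kb, min_score=0.3):
    """Return the KB entry whose stored question overlaps most with the
    user's question, or None if no entry reaches min_score (an assumed
    threshold that would need tuning on real data)."""
    q_tokens = tokenize(user_question)
    best, best_score = None, 0.0
    for entry in kb:
        e_tokens = tokenize(entry["question"])
        union = q_tokens | e_tokens
        score = len(q_tokens & e_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= min_score else None


incoming = "Where do I create a ticket for an AppSec review?"
match = best_kb_entry(incoming, KNOWLEDGE_BASE)
if match is not None:
    # This pair has the same shape as question_context1 / question_context2.
    question_context = [incoming, match["context"]]
    print(question_context)
```

Returning `None` for low-overlap questions gives the bot a natural point to fall back to a human responder instead of answering from an unrelated context.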



### 3.4. Clean up the endpoint

In [20]:
# Delete the SageMaker endpoint and the attached resources
base_model_predictor.delete_model()
base_model_predictor.delete_endpoint()

## 4. Fine-tune the pre-trained model on a custom dataset
***
Previously, we saw how to run inference on a pre-trained model that had been fine-tuned on the SQuAD2.0 dataset. Next, we discuss how a model can be fine-tuned on a custom dataset.

The Text Embedding model can be fine-tuned on any extractive question 
answering dataset in the same way the model available for inference has been 
fine-tuned on the SQuAD2.0 dataset.
The model available for fine-tuning attaches an answer extracting layer to the Text Embedding model
and initializes the layer parameters to random values. The fine-tuning step fine-tunes 
all the model parameters to minimize prediction error on the input data and returns the fine-tuned model.
The model returned by fine-tuning can be further deployed for inference. Below are the instructions 
for how the training data should be formatted for input to the model. 

- **Input:**  A directory containing a 'data.csv' file.
    - The first column of the 'data.csv' should have a question.
    - The second column should have the corresponding context.
    - The third column should have the integer character starting position for the answer in the context.
    - The fourth column should have the integer character ending position for the answer in the context.
- **Output:** A trained model that can be deployed for inference. 
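As an illustration of the four-column layout described above, the sketch below turns a (question, context, answer) triple into one `data.csv` row by locating the answer inside the context. The helper names `to_training_row` and `write_data_csv` are hypothetical. Two details are assumptions rather than facts from this notebook: that the file has no header row, and that the ending position is exclusive (start plus answer length). Check both against the training script's documentation before using this on real data.

```python
import csv
import io


def to_training_row(question, context, answer):
    """Build [question, context, start, end] for one training example.

    Assumption: `end` is exclusive, so context[start:end] == answer.
    """
    start = context.find(answer)
    if start == -1:
        raise ValueError(f"Answer {answer!r} does not occur in the context")
    end = start + len(answer)
    return [question, context, start, end]


def write_data_csv(rows, file_obj):
    """Write rows in the four-column layout (assumed: no header row)."""
    csv.writer(file_obj).writerows(rows)


# Example taken from the inference section: the answer is a span of the context.
row = to_training_row(
    "Where can I create a security ticket for AppSec Review?",
    "https://appsec.corp.amazon.com",
    "appsec.corp.amazon.com",
)

buffer = io.StringIO()
write_data_csv([row], buffer)
print(buffer.getvalue())
```

To produce an actual file, the same `write_data_csv` call can be given a handle from `open("data.csv", "w", newline="")`, and the containing directory can then be uploaded to S3 as the training input.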

### 4.1. Retrieve JumpStart Training artifacts
***
Here, for the selected model, we retrieve the training docker container, the training algorithm source, the pre-trained model, and a python dictionary of the training hyper-parameters that the algorithm accepts with their default values. Note that the model_version="*" fetches the latest model. Also, we do need to specify the training_instance_type to fetch train_image_uri.
***

In [21]:
from sagemaker import image_uris, model_uris, script_uris, hyperparameters

model_id, model_version = dropdown.value, "*"
training_instance_type = "ml.p3.2xlarge"

# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)
# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)
# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

### 4.2. Set Training parameters
***
Now that the setup is complete, we are ready to fine-tune our Extractive Question Answering model. To begin, let us create a [``sagemaker.estimator.Estimator``](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) object. This estimator will launch the training job.

There are two kinds of parameters that need to be set for training. 

The first set consists of the parameters for the training job: (i) Training data path: the S3 folder in which the input data is stored. (ii) Output path: the S3 folder in which the training output is stored. (iii) Training instance type: the type of machine on which to run the training. Typically, we use GPU instances for this kind of training. We defined the training instance type above to fetch the correct train_image_uri.

The second set consists of the algorithm-specific training hyper-parameters.
***

In [22]:
# Sample training data is available in this bucket
training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
# For a quick demonstration of training we use a random subset of the SQuAD-v2 dataset.
# For the complete SQuAD-v2 dataset, replace "SQuAD-v2-tiny" with "SQuAD-v2" in the line below.
training_data_prefix = "training-datasets/SQuAD-v2-tiny/"

training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"

output_bucket = sess.default_bucket()
output_prefix = "jumpstart-eqa-training"

s3_output_location = f"s3://{output_bucket}/{output_prefix}/output"

In [23]:
from sagemaker import hyperparameters

# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["batch-size"] = "16"
print(hyperparameters)

{'epochs': '3', 'adam-learning-rate': '2e-05', 'batch-size': '16', 'reinitialize-top-layer': 'Auto', 'train-only-top-layer': 'False'}


### 4.3. Train with Automatic Model Tuning ([HPO](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html)) <a id='AMT'></a>
***
Amazon SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose. We will use a [HyperparameterTuner](https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html) object to interact with Amazon SageMaker hyperparameter tuning APIs.
***

In [25]:
from sagemaker.tuner import ContinuousParameter

# Use AMT for tuning and selecting the best model
use_amt = True

# Define objective metric per framework, based on which the best model will be selected.
metric_definitions_per_model = {
    "huggingface": {
        # Raw string keeps the regex identical while avoiding Python's invalid-escape warning for "\-".
        "metrics": [{"Name": "val_loss", "Regex": r"'eval_loss': ([0-9]+(.|e\-)[0-9]+),?"}],
        "type": "Minimize",
    }
}

# You can select from the hyperparameters supported by the model, and configure ranges of values to be searched for training the optimal model.(https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-ranges.html)
hyperparameter_ranges = {
    "adam-learning-rate": ContinuousParameter(0.00001, 0.01, scaling_type="Logarithmic")
}

# Increase the total number of training jobs run by AMT, for increased accuracy (and training time).
max_jobs = 6
# Change parallel training jobs run by AMT to reduce total training time, constrained by your account limits.
# if max_jobs=max_parallel_jobs then Bayesian search turns to Random.
max_parallel_jobs = 2
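To see what the metric definition above captures, the sketch below applies the same regular expression to sample log lines with Python's `re` module. The log lines are hypothetical stand-ins for what the training script prints, and the sketch assumes the tuning service reads the first capture group in a way comparable to `re.search`; it is meant as a local sanity check, not a description of the service's internals.

```python
import re

# Same pattern as in metric_definitions_per_model, written as a raw string.
METRIC_REGEX = r"'eval_loss': ([0-9]+(.|e\-)[0-9]+),?"


def extract_eval_loss(log_line):
    """Return the eval_loss value found in a log line, or None if absent."""
    match = re.search(METRIC_REGEX, log_line)
    return float(match.group(1)) if match else None


# Hypothetical log lines, shaped like a printed Python dict of metrics.
sample_lines = [
    "{'eval_loss': 1.2345, 'epoch': 1.0}",
    "{'eval_loss': 3e-05, 'epoch': 2.0}",
    "{'loss': 0.9, 'epoch': 1.0}",
]

for line in sample_lines:
    print(line, "->", extract_eval_loss(line))
```

Testing the pattern locally like this is a cheap way to catch a metric regex that never matches, which would otherwise only surface after a tuning job has started.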

### 4.4. Start Training
***
We start by creating the estimator object with all the required assets and then launch the training job.
***

In [26]:
from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base
from sagemaker.tuner import HyperparameterTuner

training_job_name = name_from_base(f"jumpstart-{model_id}-transfer-learning")

# Create SageMaker Estimator instance
eqa_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
    base_job_name=training_job_name,
)

if use_amt:
    metric_definitions = next(
        value for key, value in metric_definitions_per_model.items() if model_id.startswith(key)
    )

    hp_tuner = HyperparameterTuner(
        eqa_estimator,
        metric_definitions["metrics"][0]["Name"],
        hyperparameter_ranges,
        metric_definitions["metrics"],
        max_jobs=max_jobs,
        max_parallel_jobs=max_parallel_jobs,
        objective_type=metric_definitions["type"],
        base_tuning_job_name=training_job_name,
    )

    # Launch a SageMaker Tuning job to search for the best hyperparameters
    hp_tuner.fit({"training": training_dataset_s3_path})
else:
    # Launch a SageMaker Training job by passing s3 path of the training data
    eqa_estimator.fit({"training": training_dataset_s3_path}, logs=True)

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config


ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateHyperParameterTuningJob operation: The account-level service limit 'ml.p3.2xlarge for training job usage' is 1 Instances, with current utilization of 0 Instances and a request delta of 2 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.
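The ResourceLimitExceeded error above reports a quota of 1 `ml.p3.2xlarge` training instance against a request for 2, which matches `max_parallel_jobs = 2` with `instance_count=1` per job. A minimal sketch of guarding against this is to cap the parallelism before creating the tuner. The `clamp_parallel_jobs` helper is hypothetical, and `instance_quota` is a number you supply yourself (for example, read from the Service Quotas console); the sketch does not query AWS.

```python
def clamp_parallel_jobs(requested_parallel_jobs, max_jobs, instance_quota, instances_per_job=1):
    """Cap the number of parallel tuning jobs so that the total instance
    count stays within the account quota for the training instance type.

    instance_quota is supplied by the caller; this function does not call AWS.
    """
    if instance_quota < instances_per_job:
        raise ValueError(
            "The quota is too small to run even one training job; "
            "request a quota increase or choose another instance type."
        )
    quota_limited = instance_quota // instances_per_job
    return max(1, min(requested_parallel_jobs, max_jobs, quota_limited))


# Values from this notebook: 2 parallel jobs requested, 6 jobs in total,
# and the quota of 1 instance reported in the error message.
safe_parallel_jobs = clamp_parallel_jobs(
    requested_parallel_jobs=2, max_jobs=6, instance_quota=1
)
print(safe_parallel_jobs)
```

With a quota of 1 the tuning jobs run one at a time, which lengthens total tuning time; requesting a quota increase, as the error message suggests, is the alternative.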