## Security Engineer SlackBot: LLM + RETRIEVAL AUGMENTED GENERATION

#### Problem Statement

The AppSec team has done a good job of building a knowledge base of questions frequently asked by service teams. Today, when a service team member reaches out to an individual security engineer, or posts in a Slack group, with a question that already exists in the knowledge base (which the security team maintains), a security engineer manually looks the question up in the Quip doc and replies in Slack with the answer. This works well in that the security team does not have to research an answer that already exists. However, responding is still manual: it requires a security engineer's attention and a context switch away from their current work to answer a question that has been answered before.



#### Proposed Solution

The security team is building a solution that automatically answers frequently asked questions from service teams. It relies on the knowledge base: if the answer to a question exists in the knowledge base, we assume the question has been asked before.

*Note*: Building the knowledge base is out of scope for this project; work to mature it is already under way. This project will leverage the KB to automate responses to frequently asked questions that service teams post in Slack messages.

#### STEPS:

1. Build, train and deploy a model from the Hugging Face pretrained model library.

2. Create a knowledge base to fine-tune a pretrained model from Hugging Face.

3. Use the fine-tuned model to generate text responses to questions from customers.

#### AI/ML solution by: Madhur Prashant (Alias: madhurpt, madhurpt@amazon.com)

## Retrieval Augmented Generation (RAG) with LangChain

1. LangChain: Framework for orchestrating the RAG workflow
2. FAISS: In-memory vector database for storing document embeddings
3. PyPDF: Python library for loading and parsing the PDF documents

In [2]:
%pip install langchain==0.0.251 --quiet --root-user-action=ignore
%pip install faiss-cpu==1.7.4 --quiet --root-user-action=ignore
%pip install pypdf==3.15.1 --quiet --root-user-action=ignore

Note: you may need to restart the kernel to use updated packages.


### FETCHING AND PROCESSING THE AppSec Team Data

In [3]:
filenames = [
    'General AppSec Related.pdf',
]

data_root = "./data/"

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

# `filenames` and `data_root` are defined in the previous cell.
documents = []

for filename in filenames:
    loader = PyPDFLoader(data_root + filename)
    documents.extend(loader.load())  # the loader returns one Document per PDF page

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=100,
)

docs = text_splitter.split_documents(documents)

print(f'Number of Document Pages: {len(documents)}')
print(f'Number of Document Chunks: {len(docs)}')

Number of Document Pages: 28
Number of Document Chunks: 170
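To make `chunk_size=512` and `chunk_overlap=100` concrete, here is a minimal sketch of a plain character-level sliding window. It is only an illustration: `sliding_window_chunks` is a hypothetical helper, not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraph and line breaks and therefore usually produces chunks of varying length.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input: 1000 characters cycling through the alphabet.
sample = "".join(chr(ord("a") + i % 26) for i in range(1000))
chunks = sliding_window_chunks(sample, chunk_size=512, chunk_overlap=100)
for index, chunk in enumerate(chunks):
    print(index, len(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.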


### Now that we have processed the documents, let's deploy an embedding model so the chunks can be stored in a vector store and RAG can retrieve the contextually relevant AppSec documents

## Deploying the Embedding Model (all-MiniLM-L6-v2) and the LLM (Llama-2-7b-chat)

In [5]:
!pip install -qU \
    sagemaker \
    pinecone-client==2.2.1 \
    ipywidgets==7.0.0


To begin, we will initialize all of the SageMaker session variables we'll need to use throughout the walkthrough.

In [6]:
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

my_model = JumpStartModel(model_id = "meta-textgeneration-llama-2-7b-f")



#### LLaMa chat LLM endpoint: arn:aws:sagemaker:us-east-1:110011534045:endpoint-config/llama-2-generator

## Deploying the model endpoint for Sentence Transformer embedding model

In [7]:
# hub_config = {
#     "HF_MODEL_ID": "sentence-transformers/all-MiniLM-L6-v2",  # model_id from hf.co/models
#     "HF_TASK": "feature-extraction",
# }

# huggingface_model = HuggingFaceModel(
#     env=hub_config,
#     role=role,
#     transformers_version="4.6",  # transformers version used
#     pytorch_version="1.7",  # pytorch version used
#     py_version="py36",  # python version of the DLC
# )

In [8]:
from sagemaker.jumpstart.model import JumpStartModel

embedding_model_id, embedding_model_version = "huggingface-textembedding-all-MiniLM-L6-v2", "*"
model = JumpStartModel(model_id=embedding_model_id, model_version=embedding_model_version)
embedding_predictor = model.deploy()

--------!

In [9]:
embedding_model_endpoint_name = embedding_predictor.endpoint_name
embedding_model_endpoint_name

'hf-textembedding-all-minilm-l6-v2-2023-09-04-19-33-04-133'

In [10]:
import boto3
aws_region = boto3.Session().region_name

print(aws_region)

us-east-1


## Creating and Populating our Vector Database:

In [11]:
from typing import Dict, List
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
import json

class CustomEmbeddingsContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
        # This endpoint takes the texts to embed under the "text_inputs" key.
        input_str = json.dumps({"text_inputs": inputs, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        # `output` is annotated as bytes to match the base class, but at runtime
        # it is the streaming response body, hence the .read() call.
        response_json = json.loads(output.read().decode("utf-8"))
        # Fail loudly if the key is missing instead of returning an empty list.
        return response_json["embedding"]


embeddings_content_handler = CustomEmbeddingsContentHandler()

embeddings = SagemakerEndpointEmbeddings(
    endpoint_name= embedding_model_endpoint_name,
    region_name=aws_region,
    content_handler=embeddings_content_handler,
)
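The content handler's job can be checked without a live endpoint. The sketch below replays the same JSON shapes used in the cell above (`text_inputs` in the request, `embedding` in the response) against an in-memory stream. The helper names and the 3-dimensional vector are hypothetical stand-ins; the real all-MiniLM-L6-v2 model returns 384-dimensional vectors.

```python
import io
import json
from typing import Dict, List


def build_embedding_request(texts: List[str], model_kwargs: Dict) -> bytes:
    """Serialize the texts the same way transform_input does."""
    return json.dumps({"text_inputs": texts, **model_kwargs}).encode("utf-8")


def parse_embedding_response(stream) -> List[List[float]]:
    """Read a file-like response body and pull out the list of vectors."""
    return json.loads(stream.read().decode("utf-8"))["embedding"]


request_body = build_embedding_request(["What is penetration testing?"], {})

# Hypothetical endpoint reply: one (shortened) vector per input text.
fake_response = io.BytesIO(json.dumps({"embedding": [[0.1, 0.2, 0.3]]}).encode("utf-8"))
vectors = parse_embedding_response(fake_response)

print(request_body)
print(vectors)
```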

With the embedding model in place, we can convert our document chunks into vectors and store them. This project uses:

#### FAISS: In-Memory vector database

In [12]:
from langchain.schema import Document

In [13]:
from langchain.vectorstores import FAISS

#### Now, we will build our FAISS index from the document chunks


In [14]:
db = FAISS.from_documents(docs, embeddings)


### NOW, RUNNING VECTOR QUERIES!!

In [15]:
query = "What is penetration testing?"

In [16]:
results_with_scores = db.similarity_search_with_score(query)

for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}\nScore {score}\n\n")

Content: customers expect our products, services, and systems to be hardened and a
critical check that we've met this expectation is a pen test.
Penetration testing complements earlier-lifecycle activities such as design
reviews, threat modeling, code reviews for security, and security unit and
integration testing. Ideally, these earlier-lifecycle activities would obviate the
need for a pen test but our experience has shown that this late-lifecycle
Score 0.6931231021881104


Content: Status of this Document
This is a living document and will be updated as our criteria and process
improve. Unless you are making cosmetic changes, please work with subur@.
Why Do A Penetration Test?
Penetration testing, or simulating how a malicious actor could try to misuse
our systems, is an essential security activity to validate that what has been
built meets our high security bar and will protect customers and AWS. Our
customers expect our products, services, and systems to be hardened and a
Score 0.7
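A note on reading these scores: assuming the store was built with LangChain's default FAISS settings, which to my knowledge use a flat L2 index, the score is a distance, so a lower value means a closer match. The sketch below reproduces that ranking with brute force in pure Python; `squared_l2`, `top_k`, and the two-dimensional toy vectors are hypothetical and only illustrate the ordering.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (document index, distance) pairs for the k closest documents."""
    scored = [(index, squared_l2(query_vec, vec)) for index, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy embeddings: document 0 matches the query exactly, document 2 is close to it.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
toy_query = [1.0, 0.0]

ranked = top_k(toy_query, toy_docs, k=2)
for index, distance in ranked:
    print(f"doc {index}: distance {distance:.4f}")
```

A real FAISS index returns the same kind of ranking without comparing the query against every vector in Python.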

## PROMPT ENGINEERING FOR CUSTOM DATA

In [65]:
from langchain.prompts import PromptTemplate

prompt_template = """
<s>[INST] <<SYS>>
Use the context provided below to answer the question at the end. If you don't know the answer, please state that you don't know and do not attempt to make up an answer.
<</SYS>>

Context:
----------------
{context}
----------------

Question: {question} [/INST]
"""

PROMPT = PromptTemplate(
    template = prompt_template, 
    input_variables=["context", "question"]
)

#### Now that we have defined what our prompt template is going to look like, we will create and prepare our LLM

## PREPARING OUR CUSTOM LLM

In [67]:
from typing import Dict

from langchain import SagemakerEndpoint, PromptTemplate
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import RetrievalQA
import json

class QAContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"
    
    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        input_str = json.dumps(
            {"inputs" : [
                [
                    {
                        "role": "system", 
                        "content": ""
                    },
                    {
                        "role": "user", 
                        "content": prompt
                    }
                ]], 
             "parameters": {**model_kwargs}
            })
        return input_str.encode('utf-8')
    
    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generation"]["content"]
    
qa_content_handler = QAContentHandler()
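As a quick sanity check of the payload shapes this handler assumes, the sketch below builds the same request body and parses a hand-written response from an in-memory stream. `build_chat_request`, `parse_chat_response`, and the fake reply are hypothetical; the keys (`inputs`, `parameters`, `generation`, `content`) are taken from the handler above.

```python
import io
import json


def build_chat_request(prompt: str, model_kwargs: dict) -> bytes:
    """Wrap the prompt in a one-dialog chat payload, as transform_input does."""
    payload = {
        "inputs": [[
            {"role": "system", "content": ""},
            {"role": "user", "content": prompt},
        ]],
        "parameters": {**model_kwargs},
    }
    return json.dumps(payload).encode("utf-8")


def parse_chat_response(stream) -> str:
    """Read the response body and return the first dialog's generated text."""
    return json.loads(stream.read().decode("utf-8"))[0]["generation"]["content"]


body = build_chat_request("Hello", {"max_new_tokens": 64})

# Hypothetical endpoint reply containing a single generation.
fake_reply = io.BytesIO(
    json.dumps([{"generation": {"role": "assistant", "content": " Hello!"}}]).encode("utf-8")
)
reply = parse_chat_response(fake_reply)

print(json.loads(body)["inputs"][0][1])
print(repr(reply))
```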

Now that we have our content handler, we will deploy a SageMaker endpoint for our Large Language Model, which will work alongside the embedding model to generate answers.

## SageMaker LLaMa-2-7b-f LLM for our CUSTOM DATASET

In [32]:
# from sagemaker.jumpstart.model import JumpStartModel

llm_model_id, llm_model_version = "meta-textgeneration-llama-2-7b-f", "*"
llm_model = JumpStartModel(model_id=llm_model_id, model_version=llm_model_version)
llm_predictor = llm_model.deploy(
    initial_instance_count=1, instance_type="ml.g5.4xlarge")

---------------!

In [68]:
llm_model_endpoint_name = llm_predictor.endpoint_name
llm_model_endpoint_name

'meta-textgeneration-llama-2-7b-f-2023-09-04-19-49-48-116'

In [70]:
llm = SagemakerEndpoint(
    endpoint_name=llm_model_endpoint_name, 
    region_name=aws_region, 
    model_kwargs={"max_new_tokens": 1000, "top_p":0.9, "temperature": 1e-11}, 
    endpoint_kwargs={"CustomAttributes": "accept_eula=true"},
    content_handler=qa_content_handler
)

Now we can use our `llm` object to query the model directly, without any retrieved context:

In [71]:
query = "Hello"
llm.predict(query)

" Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?"

In [72]:
query = "What is penetration testing in AWS?"
llm.predict(query)

" Penetration testing, also known as pen testing or ethical hacking, is a cybersecurity assessment where a trained security professional simulates a cyber attack on an organization's computer systems, network, or web application to identify vulnerabilities and weaknesses. The goal of penetration testing is to help organizations strengthen their defenses and protect against real-world attacks.\n\nIn the context of Amazon Web Services (AWS), penetration testing can be performed on AWS resources and infrastructure to identify potential security risks and weaknesses. This can include testing the security of AWS services such as EC2 instances, RDS instances, S3 buckets, and Lambda functions, as well as the security of the network connectivity between these resources.\n\nPenetration testing in AWS typically involves the following steps:\n\n1. Planning and preparation: The security professional will work with the organization to identify the scope of the test, the systems and resources to be 

## Not a bad answer, but we will create a LangChain chain using RetrievalQA, which will:

1. Take a query as input
2. Generate query embeddings
3. Query the vector database for relevant chunks from the knowledge you supply
4. Inject the context and original query into the prompt template
5. Invoke the LLM with the completed prompt
6. Return the LLM response/completion:

In [73]:
qa_chain = RetrievalQA.from_chain_type(
    llm, 
    chain_type = 'stuff',
    retriever=db.as_retriever(), 
    return_source_documents=True, 
    chain_type_kwargs={"prompt":PROMPT}
)
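For intuition, the flow of a "stuff" retrieval chain can be sketched end to end with stand-ins for both endpoints. Everything here is hypothetical toy code rather than LangChain's implementation: `embed` counts a few keywords instead of calling the embedding model, `stub_llm` replaces the Llama 2 endpoint, and the three-sentence knowledge base is made up for the example.

```python
from typing import Callable, Dict, List

TEMPLATE = (
    "Context:\n----------------\n{context}\n----------------\n\n"
    "Question: {question}"
)
KEYWORDS = ("penetration", "review", "launch")


def embed(text: str) -> List[float]:
    """Toy embedding: how often each keyword occurs in the text."""
    lowered = text.lower()
    return [float(lowered.count(keyword)) for keyword in KEYWORDS]


def retrieve(query: str, chunks: List[str], k: int) -> List[str]:
    """Return the k chunks whose toy embeddings are closest to the query's."""
    query_vec = embed(query)

    def distance(chunk: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(query_vec, embed(chunk)))

    return sorted(chunks, key=distance)[:k]


def stuff_prompt(question: str, context_chunks: List[str]) -> str:
    """'Stuff' every retrieved chunk into a single prompt."""
    return TEMPLATE.format(context="\n\n".join(context_chunks), question=question)


def answer(question: str, chunks: List[str], llm: Callable[[str], str], k: int) -> Dict:
    context = retrieve(question, chunks, k)
    return {
        "query": question,
        "result": llm(stuff_prompt(question, context)),
        "source_documents": context,
    }


def stub_llm(prompt: str) -> str:
    """Stand-in for the LLM endpoint; reports only the prompt size."""
    return f"(stub LLM received {len(prompt)} characters)"


kb = [
    "A penetration test simulates how a malicious actor could misuse a system.",
    "A security review is required for every launch.",
    "Lunch is served at noon.",
]
out = answer("What is a penetration test?", kb, stub_llm, k=1)
print(out["source_documents"])
print(out["result"])
```

Because the whole context is placed in one prompt, this approach only works while the retrieved chunks fit in the model's context window, which is one reason to keep chunks small.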

### Now that our chain has been created, we can supply queries to it and generate responses based on our source documents

In [74]:
query = "What is penetration testing in AWS Security?"
result = qa_chain({"query": query})

print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
    print(f'{srcdoc}\n')

Query: What is penetration testing in AWS Security?

Result:  Based on the provided context, penetration testing in AWS Security refers to the process of simulating attacks on an AWS system or application to identify vulnerabilities and weaknesses, and to validate the effectiveness of the security controls in place. The purpose of penetration testing is to ensure that the system or application meets the required security standards and can protect against potential threats.

The document provides criteria for determining when penetration testing is required, which includes:

1. When a new feature or functionality meets the conditions in criterion 5 of the Penetration Testing Criteria.
2. If you are concerned about the scalability or sustainability of these security review criteria, please discuss them with your AWS security engineer or AWS security manager.

The document also provides a PT determination criteria and a PT scoping template to help identify the assets that require penetrat

In [75]:
query = "What is AWS security?"
result = qa_chain({"query": query})

print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
    print(f'{srcdoc}\n')

Query: What is AWS security?

Result:  Based on the context provided, AWS security refers to the measures and controls implemented by Amazon Web Services (AWS) to protect its infrastructure, services, and customer data from various security threats. The context mentions several security-related concepts and tools, including:

1. Baseline security control checklist: A reference guide for security controls that AWS uses to evaluate the security posture of its services.
2. Automated scanners (ACAT, MonkeyTester, and SI): Tools used to automate security testing and vulnerability assessment of AWS services.
3. AWS Security Bar: A security framework that provides a set of security controls and guidelines for AWS services to follow.
4. Penetration testing: A security assessment methodology used to identify vulnerabilities in AWS services.
5. Security canaries / tests: A process of analyzing mitigations with the service team to identify security improvements.

Overall, AWS security is focused 

In [76]:
query = "When do I need an appsec review?"
result = qa_chain({"query": query})

print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
    print(f'{srcdoc}\n')

Query: When do I need an appsec review?

Result:  Based on the provided context, you need an AppSec review any time you're making a change (or releasing a new system or service) that could impact the security of customers, AWS, or Amazon. Specifically, all launches (alpha, beta, gamma, demo, GA, public, or private) require a security review. Additionally, if you are updating or publishing documentation that includes code snippets, templates, or configuration details, the content should be reviewed by an AppSec engineer. You can create a security ticket for AppSec review through the following link: <https://appsec.corp.amazon.com>.

Context Documents: 
page_content="General AppSec Related\n●\n●\n●\n●\n●\n●\n●\n●\n●\n●\n●\n●\n●\n○\n○\n●\n●\n●\n●\n●\n● How do I know if I need a security review for this change? \nWhen do I need an Appsec review?\nA security review is required any time you're making a change (or releasing a new system or service) that could impact\nthe security of customers

In [78]:
query = "What is the BSC Structure?"
result = qa_chain({"query": query})

print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
    print(f'{srcdoc}\n')

Query: What is the BSC Structure?

Result:  Based on the provided context, the BSC Structure is as follows:

1. Applicability: The BSCs are applicable where a customer has a strong expectation of the control. The control must be related to a security control and not merely a process hook.
2. Scope: The scope of the control is sufficiently abstract while remaining specific to a topic. The control is not too broad, such as "Limit actions to the minimum necessary," but rather specific, such as "Encrypt all data at rest using automatically rotated KMS keys."
3. Preview, GA, etc.): The BSCs are intended to give teams a reasonable, transparent baseline from which to start. They help teams reason about specific controls and enable them to build security into products from the start.
4. Leadership Team Roles and Responsibilities: The BSC Leadership Team is responsible for reviewing and approving proposed BSC adds/changes/deletes, guiding, mentoring, and providing feedback on BSC adds/changes/d

## CLEAN UP YOUR ENDPOINT!

In [None]:
# sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# sagemaker_client.delete_endpoint(EndpointName=embedding_model_endpoint_name)
# sagemaker_client.delete_endpoint(EndpointName=llm_model_endpoint_name)