# Retrieval-Augmented Generation
This notebook covers the implimentation of a Retrieval-Augemented Generation framework with a pre-trained LLM and has been largely adopted from James Briggs work linked in the resources. I have contributed a few explainers as well as PDF document intergration pipeline using AWS Textract in the interest of expanding potential data sources for the RAG knowledge library (otherwise known as a 'content store' or 'data store'). 

Below is the project outline involving both .csv ingest as well as .pdf ingest for the knowledge library. A quick video primer on the basics of the Retrieval-Augmented Generation framework can be found under "What is Retrieval-Augmented Generation (RAG)?" in the resources.

<img src="./images/project_overview.png" width="700" />

#### Topics Covered:
- Retrieval-Augmented Generation (RAG) Basics
- AWS Hugging Face SDK and deploying pre-trained models
- AWS Textract for PDF Optical Character Recognition (OCR)
- Pinecone Vector Database - Creating, populating and Querying Indexes

#### Resources:
- James Briggs Hugging Face LLMs with SageMaker + RAG with Pinecone - https://github.com/pinecone-io/examples/blob/master/learn/generation/aws/sagemaker/sagemaker-huggingface-rag.ipynb
- What is Retrieval-Augmented Generation (RAG)?: https://www.youtube.com/watch?v=T-D1OfcDW1M
- Pinecone Database: https://www.pinecone.io
- SageMaker Hugging Face Documentation - https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html#hugging-face-model
- AWS Textract: https://aws.amazon.com/textract/



<div style="background-color:teal; color:white; padding:10px; font-size:20px">
Set Up

In [133]:
!pip install -qU \
    sagemaker==2.173.0 \
    pinecone-client==2.2.1 \
    ipywidgets==7.0.0

[0m

<div style="background-color:teal; color:white; padding:10px; font-size:20px">
🤗 HF LLM

The SageMaker HuggingFace SDK allows you to deploy models from 2 different sources:
- Trained models stored in s3
- Pre-trained models from the HuggingFace Hub

We will be deploying a pre-trained model from the HuggingFace Hub

In [134]:
import sagemaker
from sagemaker.huggingface import (
    HuggingFaceModel,
    get_huggingface_llm_image_uri
)

role = sagemaker.get_execution_role()

hub_config = {
    'HF_MODEL_ID':'google/flan-t5-xl', # model_id from hf.co/models
    'HF_TASK':'text-generation' # NLP task you want to use for predictions
}

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.8.2"
)

# Create Huggingface Model Class
huggingface_model = HuggingFaceModel(
    env=hub_config,
    role=role, # iam role with permissions to create an Endpoint
    image_uri=llm_image
)

Deploy Model

In [135]:
llm = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    endpoint_name="flan-t5-demo"
)

--------------!

<div style="background-color:teal; color:white; padding:10px; font-size:20px">
Context Demonstration

<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
Questioning the Model Without Context

In [136]:
question = "Which instances can I use with Managed Spot Training in SageMaker?"

out = llm.predict({"inputs": question})
out

[{'generated_text': 'SageMaker and SageMaker XL.'}]

<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
Questioning the Model With Manually Added Context

In [137]:
context = """Managed Spot Training can be used with all instances
supported in Amazon SageMaker. Managed Spot Training is supported
in all AWS Regions where Amazon SageMaker is currently available."""

In [138]:
prompt_template = """Answer the following QUESTION based on the CONTEXT
given. If you do not know the answer and the CONTEXT doesn't
contain the answer truthfully say "I don't know".

CONTEXT:
{context}

QUESTION:
{question}

ANSWER:
"""

text_input = prompt_template.replace("{context}", context).replace("{question}", question)

out = llm.predict({"inputs": text_input})
generated_text = out[0]["generated_text"]
print(f"[Input]: {question}\n[Output]: {generated_text}")

[Input]: Which instances can I use with Managed Spot Training in SageMaker?
[Output]: all instances supported in Amazon SageMaker


<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
Asking Unanswerable Questions

Notice how the mere addition of:

'If you do not know the answer and the CONTEXT doesn't contain the answer truthfully say "I don't know".' 

In the prompt template can gaurd against hallucination in the model. We will test this behavior out with the query below

In [139]:
unanswerable_question = "What color is my desk?"

text_input = prompt_template.replace("{context}", context).replace("{question}", unanswerable_question)

out = llm.predict({"inputs": text_input})
generated_text = out[0]["generated_text"]
print(f"[Input]: {unanswerable_question}\n[Output]: {generated_text}")

[Input]: What color is my desk?
[Output]: I don't know


This of course is the desired behavior which we will test for again at the end of the notebook.

<div style="background-color:teal; color:white; padding:10px; font-size:20px">
RAG-Based Approach

Deploy Embedding Model from HuggingFace Hub

In [140]:
hub_config = {
    'HF_MODEL_ID': 'sentence-transformers/all-MiniLM-L6-v2', # model_id from hf.co/models
    'HF_TASK': 'feature-extraction'
}

huggingface_model = HuggingFaceModel(
    env=hub_config,
    role=role,
    transformers_version="4.6", # transformers version used
    pytorch_version="1.7", # pytorch version used
    py_version="py36", # python version of the DLC
)

In [141]:
encoder = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.large",
    endpoint_name="minilm-demo"
)

-----!

In [142]:
out = encoder.predict({"inputs": ["some text here", "some more text goes here too"]})

In [143]:
print(len(out))
print(len(out[0]), len(out[1]))
print(len(out[0][0]))

2
8 8
384


In [144]:
import numpy as np

embeddings = np.mean(np.array(out), axis=1)
embeddings.shape

(2, 384)

In [145]:
from typing import List

def embed_docs(docs: List[str]) -> List[List[float]]:
    out = encoder.predict({'inputs': docs})
    embeddings = np.mean(np.array(out), axis=1)
    return embeddings.tolist()

<div style="background-color:teal; color:white; padding:10px; font-size:20px">
Knowledge Library

<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
AWS FAQ (.csv)

The AWS SageMaker FAQ dataset is a two-column, question & answer .csv file stored on s3

In [146]:
s3_path = f"s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv"

In [147]:
# Downloading the Database
!aws s3 cp $s3_path Amazon_SageMaker_FAQs.csv

download: s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv to ./Amazon_SageMaker_FAQs.csv


In [148]:
import pandas as pd

df_knowledge = pd.read_csv("Amazon_SageMaker_FAQs.csv", header=None, names=["Question", "Answer"])
df_knowledge.head()

Unnamed: 0,Question,Answer
0,What is Amazon SageMaker?,Amazon SageMaker is a fully managed service to...
1,In which Regions is Amazon SageMaker available...,For a list of the supported Amazon SageMaker A...
2,What is the service availability of Amazon Sag...,Amazon SageMaker is designed for high availabi...
3,How does Amazon SageMaker secure my code?,Amazon SageMaker stores code in ML storage vol...
4,What security measures does Amazon SageMaker h...,Amazon SageMaker ensures that ML model artifac...


In [149]:
df_knowledge.drop(["Question"], axis=1, inplace=True)
df_knowledge.head()

Unnamed: 0,Answer
0,Amazon SageMaker is a fully managed service to...
1,For a list of the supported Amazon SageMaker A...
2,Amazon SageMaker is designed for high availabi...
3,Amazon SageMaker stores code in ML storage vol...
4,Amazon SageMaker ensures that ML model artifac...


<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
PDF Integration (AWS Textract)

In [150]:
!python -m pip install amazon-textract-caller --upgrade -q
!python -m pip install amazon-textract-response-parser --upgrade -q

[0m

In [151]:
import boto3
import time

In [152]:
mySession = boto3.session.Session()
awsRegion = mySession.region_name

In [153]:
s3BucketName = "aws-workshops-" + awsRegion
print(s3BucketName)

aws-workshops-us-east-1


In [154]:
# Amazon S3 client
s3 = boto3.client('s3')

# Amazon Textract client
textract = boto3.client('textract')

In [155]:
# Document
documentName = "textract-samples/Amazon-Textract-Pdf.pdf"

Below are a few AWS helper functions for processing PDF's with AWS textract. Addtional information can be found: https://github.com/aws-samples/amazon-textract-code-samples/blob/master/python/Textract.ipynb

In [156]:
def startJob(s3BucketName, objectName):
    response = None
    response = textract.start_document_text_detection(
    DocumentLocation={
        'S3Object': {
            'Bucket': s3BucketName,
            'Name': objectName
        }
    })

    return response["JobId"]

def isJobComplete(jobId):
    response = textract.get_document_text_detection(JobId=jobId)
    status = response["JobStatus"]
    print("Job status: {}".format(status))

    while(status == "IN_PROGRESS"):
        time.sleep(5)
        response = textract.get_document_text_detection(JobId=jobId)
        status = response["JobStatus"]
        print("Job status: {}".format(status))

    return status

def getJobResults(jobId):

    pages = []
    response = textract.get_document_text_detection(JobId=jobId)
        
    pages.append(response)
    print("Resultset page recieved: {}".format(len(pages)))
    nextToken = None
    if('NextToken' in response):
        nextToken = response['NextToken']

    while(nextToken):
        response = textract.get_document_text_detection(JobId=jobId, NextToken=nextToken)

        pages.append(response)
        print("Resultset page recieved: {}".format(len(pages)))
        nextToken = None
        if('NextToken' in response):
            nextToken = response['NextToken']

    return pages

In [157]:
# Invoke textract API
jobId = startJob(s3BucketName, documentName)
print("Started job with id: {}".format(jobId))
if(isJobComplete(jobId)):
    response = getJobResults(jobId)

#print(response)

# Print detected text
for resultPage in response:
    for item in resultPage["Blocks"]:
        if item["BlockType"] == "LINE":
            print ('\033[94m' +  item["Text"] + '\033[0m')

Started job with id: 19227996172071c389f2660e8433ed7540473e9dcdeeeafd7ddd6f6281f96b01
Job status: IN_PROGRESS
Job status: IN_PROGRESS
Job status: SUCCEEDED
Resultset page recieved: 1
[94mAmazon Textract[0m
[94mAmazon Textract is a service that automatically extracts text and data from scanned[0m
[94mdocuments. Amazon Textract goes beyond simple optical character recognition (OCR) to[0m
[94malso identify the contents of fields in forms and information stored in tables.[0m
[94mMany companies today extract data from documents and forms through manual data[0m
[94mentry that's slow and expensive or through simple optical character recognition (OCR)[0m
[94msoftware that is difficult to customize. Rules and workflows for each document and form[0m
[94moften need to be hard-coded and updated with each change to the form or when dealing[0m
[94mwith multiple forms. If the form deviates from the rules, the output is often scrambled[0m
[94mand unusable.[0m
[94mAmazon Textract o

The response will be a JSON containing our desired text as well as metadata and additional information. In order to feed the text into the embedding model, we will need to extract it from the JSON and perform some simple processing.

In [158]:
def extract_text_from_json(data):
    try:
        text_lines = []
        for block in data['Blocks']:
            if block['BlockType'] == 'LINE':
                text_lines.append(block['Text'])
        return text_lines
    except Exception as e:
        return str(e)

In [159]:
extracted_text = extract_text_from_json(response[0])

Perfectly parsing the OCR response of AWS Textract can be an involved process depending on the PDF. For example, this document contains paragraph headers whose formatting was not explicitly translated through the OCR process. This results in chunks of text that are neither separated with punctuation, nor should they be included with neighboring sentences.

In [160]:
extracted_text

['Amazon Textract',
 'Amazon Textract is a service that automatically extracts text and data from scanned',
 'documents. Amazon Textract goes beyond simple optical character recognition (OCR) to',
 'also identify the contents of fields in forms and information stored in tables.',
 'Many companies today extract data from documents and forms through manual data',
 "entry that's slow and expensive or through simple optical character recognition (OCR)",
 'software that is difficult to customize. Rules and workflows for each document and form',
 'often need to be hard-coded and updated with each change to the form or when dealing',
 'with multiple forms. If the form deviates from the rules, the output is often scrambled',
 'and unusable.',
 'Amazon Textract overcomes these challenges by using machine learning to instantly',
 '"read" virtually any type of document to accurately extract text and data without the',
 'need for any manual effort or custom code. With Textract you can quickly auto

As noted, the current parsing is not flawless, but should be sufficient for our retrieved context as many LLMs can handle more extreme redundancies and syntax errors than this.

In [161]:
import pandas as pd

# Concatenate the texts by their separators
full_text = " ".join(extracted_text)

# Split the full text by sentence-ending punctuation to create individual sentences
sentences = [sentence.strip() for sentence in full_text.split('.') if sentence]

# Create a pandas dataframe with the column 'sentence' containing each sentence
pdf_df = pd.DataFrame(sentences, columns=['Answer'])
pdf_df

Unnamed: 0,Answer
0,Amazon Textract Amazon Textract is a service t...
1,Amazon Textract goes beyond simple optical cha...
2,Many companies today extract data from documen...
3,Rules and workflows for each document and form...
4,"If the form deviates from the rules, the outpu..."
5,Amazon Textract overcomes these challenges by ...
6,With Textract you can quickly automate documen...
7,"Once the information is captured, you can take..."
8,"Additionally, you can create smart search inde..."
9,Use cases Create smart search indexes Extract ...


In [162]:
df_sentences.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 0 to 16
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   sentence  17 non-null     object
dtypes: object(1)
memory usage: 264.0+ bytes


<div style="background-color:teal; color:white; padding:10px; font-size:20px">
Pinecone Database

<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
Initializing Database

You will need to make a free Pinecone account at: https://www.pinecone.io

Copy/paste your Pinecone account API key into "YOUR_API_KEY" and the environment in "YOUR_ENV"

In [163]:
import pinecone
import os

# add Pinecone API key from app.pinecone.io
api_key = os.environ.get("PINECONE_API_KEY") or "YOUR_API_KEY"
# set Pinecone environment - find next to API key in console
env = os.environ.get("PINECONE_ENVIRONMENT") or "YOUR_ENV"

pinecone.init(
    api_key=api_key,
    environment=env
)

In [164]:
pinecone.list_indexes()

['aws-rag-knowledge-base']

In [165]:
import time

index_name = 'aws-rag-knowledge-base'

while not pinecone.describe_index(index_name).status['ready']:
    time.sleep(1)

In [166]:
pinecone.list_indexes()

['aws-rag-knowledge-base']

<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
Upserting Records

In [167]:
from tqdm.auto import tqdm

batch_size = 2  # can increase but needs larger instance size otherwise instance runs out of memory
vector_limit = 1000

answers = df_knowledge[:vector_limit]
index = pinecone.Index(index_name)

for i in tqdm(range(0, len(answers), batch_size)):
    # find end of batch
    i_end = min(i+batch_size, len(answers))
    # create IDs batch
    ids = [str(x) for x in range(i, i_end)]
    # create metadata batch
    metadatas = [{'text': text} for text in answers["Answer"][i:i_end]]
    # create embeddings
    texts = answers["Answer"][i:i_end].tolist()
    embeddings = embed_docs(texts)
    # create records list for upsert
    records = zip(ids, embeddings, metadatas)
    # upsert to Pinecone
    index.upsert(vectors=records)

100%|██████████| 77/77 [00:26<00:00,  2.85it/s]


Lets get a quick look at our index. The following code returns 4 metrics:
- `dimesnion`: The dimensionality of the vectors stored in the index
- `index_fullness`: A value between 1-0 indicating how much of the index is being used. 0.00154 = 0.154%
- `namespaces`: Namespaces are a way to segment/categorize stored vectors in the index. There is currently the empty '' default namespace with 154 records in it
- `total_vector_count`: Total # of stored vectors in the database

In [168]:
# check number of records in the index
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.00171,
 'namespaces': {'': {'vector_count': 171}},
 'total_vector_count': 171}

Now we will do the same thing for the PDF df we created.

In [169]:
answers = pdf_df[:vector_limit]
index = pinecone.Index(index_name)

for i in tqdm(range(0, len(answers), batch_size)):
    # find end of batch
    i_end = min(i+batch_size, len(answers))
    # create IDs batch
    ids = [str(x) + 'pdf' for x in range(i, i_end)]
    # create metadata batch
    metadatas = [{'text': text} for text in answers["Answer"][i:i_end]]
    # create embeddings
    texts = answers["Answer"][i:i_end].tolist()
    embeddings = embed_docs(texts)
    # create records list for upsert
    records = zip(ids, embeddings, metadatas)
    # upsert to Pinecone
    index.upsert(vectors=records)

100%|██████████| 9/9 [00:01<00:00,  6.24it/s]


If the upsertion was successful, we should now have 171 records in the index

In [170]:
# check number of records in the index
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.00171,
 'namespaces': {'': {'vector_count': 171}},
 'total_vector_count': 171}

<div style="background-color:teal; color:white; padding:10px; font-size:20px">
Retrieval Augmented Generation

<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
Retrieving Context From Database

In [171]:
question

'Which instances can I use with Managed Spot Training in SageMaker?'

In [172]:
# extract embeddings for the questions
query_vec = embed_docs(question)[0]

# query pinecone
res = index.query(query_vec, top_k=5, include_metadata=True)

# show the results
res

{'matches': [{'id': '90',
              'metadata': {'text': 'Managed Spot Training can be used with all '
                                   'instances supported in Amazon '
                                   'SageMaker.\r\n'},
              'score': 0.881003916,
              'values': []},
             {'id': '91',
              'metadata': {'text': 'Managed Spot Training is supported in all '
                                   'AWS Regions where Amazon SageMaker is '
                                   'currently available.\r\n'},
              'score': 0.799601316,
              'values': []},
             {'id': '85',
              'metadata': {'text': 'You enable the Managed Spot Training '
                                   'option when submitting your training jobs '
                                   'and you also specify how long you want to '
                                   'wait for Spot capacity. Amazon SageMaker '
                                   'will then use Amazo

In [173]:
# Build list from text fields of database response 
contexts = [match.metadata['text'] for match in res.matches]

The block below concatenates as many returned texts as possible while not exceeding the character threshold (1000 characters). Two things to note:
- This values completed sentences over maximizing the character limit for overall context coherence.
- `max_selection_len` is a self-imposed limit to optimize the LLM response (with regard to the passed context). This is based on the general observation in LLM's that passing large context lengths results in the middle of the context being overlooked, essentially passing more context information than the model can handle which negatively impacts performance.

In [174]:
max_section_len = 1000
separator = "\n"

def construct_context(contexts: List[str]) -> str:
    chosen_sections = []
    chosen_sections_len = 0

    for text in contexts:
        text = text.strip()
        # Add contexts until we run out of space.
        chosen_sections_len += len(text) + 2
        if chosen_sections_len > max_section_len:
            break
        chosen_sections.append(text)
    concatenated_doc = separator.join(chosen_sections)
    print(
        f"With maximum sequence length {max_section_len}, selected top {len(chosen_sections)} document sections: \n{concatenated_doc}"
    )
    return concatenated_doc

In [175]:
context_str = construct_context(contexts=contexts)

With maximum sequence length 1000, selected top 4 document sections: 
Managed Spot Training can be used with all instances supported in Amazon SageMaker.
Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.
You enable the Managed Spot Training option when submitting your training jobs and you also specify how long you want to wait for Spot capacity. Amazon SageMaker will then use Amazon EC2 Spot instances to run your job and manages the Spot capacity. You have full visibility into the status of your training jobs, both while they are running and while they are waiting for capacity.
Managed Spot Training with Amazon SageMaker lets you train your ML models using Amazon EC2 Spot instances, while reducing the cost of training your models by up to 90%.


In [176]:
text_input = prompt_template.replace("{context}", context_str).replace("{question}", question)

out = llm.predict({"inputs": text_input})
generated_text = out[0]["generated_text"]
print(f"[Input]: {question}\n[Output]: {generated_text}")

[Input]: Which instances can I use with Managed Spot Training in SageMaker?
[Output]: all instances supported in Amazon SageMaker


<div style="background-color:darkblue; color:white; padding:1px; font-size:20px">
Bringing it All Together

The function below brings together the steps outlined in the green path of the project overview.

<img src="./images/query_overview.png" width="700" />

In [177]:
def rag_query(question: str) -> str:
    # create query vec
    query_vec = embed_docs(question)[0]
    # query pinecone
    res = index.query(query_vec, top_k=5, include_metadata=True)
    # get contexts
    contexts = [match.metadata['text'] for match in res.matches]
    # build the multiple contexts string
    context_str = construct_context(contexts=contexts)
    # create our retrieval augmented prompt
    text_input = prompt_template.replace("{context}", context_str).replace("{question}", question)
    # make prediction
    out = llm.predict({"inputs": text_input})
    return out[0]["generated_text"]

In [178]:
rag_query("Which instances can I use with Managed Spot Training in SageMaker?")

With maximum sequence length 1000, selected top 4 document sections: 
Managed Spot Training can be used with all instances supported in Amazon SageMaker.
Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.
You enable the Managed Spot Training option when submitting your training jobs and you also specify how long you want to wait for Spot capacity. Amazon SageMaker will then use Amazon EC2 Spot instances to run your job and manages the Spot capacity. You have full visibility into the status of your training jobs, both while they are running and while they are waiting for capacity.
Managed Spot Training with Amazon SageMaker lets you train your ML models using Amazon EC2 Spot instances, while reducing the cost of training your models by up to 90%.


'all instances supported in Amazon SageMaker'

In [179]:
rag_query("How do I create a Hugging Face instance on Sagemaker?")

With maximum sequence length 1000, selected top 1 document sections: 
To get started with Amazon SageMaker Edge Manager, you need to compile and package your trained ML models in the cloud, register your devices, and prepare your devices with the SageMaker Edge Manager SDK. To prepare your model for deployment, SageMaker Edge Manager uses SageMaker Neo to compile your model for your target edge hardware. Once a model is compiled, SageMaker Edge Manager signs the model with an AWS generated key, then packages the model with its runtime and your necessary credentials to get it ready for deployment. On the device side, you register your device with SageMaker Edge Manager, download the SageMaker Edge Manager SDK, and then follow the instructions to install the SageMaker Edge Manager agent on your devices. The tutorial notebook provides a step-by-step example of how you can prepare the models and connect your models on edge devices with SageMaker Edge Manager.


"I don't know"

In [180]:
rag_query("Does Amazon textract just do OCR?")

With maximum sequence length 1000, selected top 5 document sections: 
Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables
Amazon Textract Amazon Textract is a service that automatically extracts text and data from scanned documents
Amazon Textract overcomes these challenges by using machine learning to instantly "read" virtually any type of document to accurately extract text and data without the need for any manual effort or custom code
Many companies today extract data from documents and forms through manual data entry that's slow and expensive or through simple optical character recognition (OCR) software that is difficult to customize
Maintain compliance in document archives Because Amazon Textract identifies data types and form labels automatically, it's easy to maintain compliance with information controls


'No'

In [181]:
rag_query("Where did I leave my AWS Textract this morning?")

With maximum sequence length 1000, selected top 4 document sections: 
Amazon Textract Amazon Textract is a service that automatically extracts text and data from scanned documents
" Build automated document processing workflows Amazon Textract can provide the inputs required to automatically process forms without human intervention
Amazon Textract overcomes these challenges by using machine learning to instantly "read" virtually any type of document to accurately extract text and data without the need for any manual effort or custom code
Once a Managed Spot Training job is completed, you can see the savings in the AWS Management Console and also calculate the cost savings as the percentage difference between the duration for which the training job ran and the duration for which you were billed. Regardless of how many times your Managed Spot Training jobs are interrupted, you are charged only once for the duration for which the data was downloaded.


"I don't know"

<div style="background-color:teal; color:white; padding:10px; font-size:20px">
Clean-up Resources

In [182]:
# Delete LLM Model Endpoint
llm.delete_endpoint()

In [183]:
# Delete Embedding Model Endpoint
encoder.delete_endpoint()

You can verify endpoint deletion under 'Inference' -> 'Endpoints' in the SageMaker UI