<a href="https://colab.research.google.com/github/frank-morales2020/generative-ai-on-aws-book/blob/main/01_langchain_llama2_sagemaker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval Augmented Generation (RAG) with LangChain

In this example notebook, you will see how to perform basic Retrieval Augmented Generation (RAG) using a collection of Amazon's Letters to Shareholders to run basic Q&A.

This notebook does not have any specific CPU/GPU requirements, and was built using the `Data Science 3.0 Python 3` kernel.

## Dependencies

Install the dependencies for this example:
- LangChain: Framework for Orchestrating the RAG workflow
- FAISS: In-Memory Vector Database for storing document embeddings
- PyPDF: Python library for processing PDF documents

In [None]:
%pip install langchain==0.0.309 --quiet --root-user-action=ignore
%pip install faiss-cpu==1.7.4 --quiet --root-user-action=ignore
%pip install pypdf==3.15.1 --quiet --root-user-action=ignore

## added by frank morales
%pip install boto3 --quiet --root-user-action=ignore
%pip install colab-env --upgrade --quiet --root-user-action=ignore
%pip install sagemaker --quiet --root-user-action=ignore
%pip install --upgrade urllib3 --quiet --root-user-action=ignore

In [2]:
# https://github.com/generative-ai-on-aws/generative-ai-on-aws/tree/main
#!pip install --upgrade pip

import colab_env
import os

aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID")
aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
region=os.getenv("region")
output=os.getenv("output")

#print(aws_access_key_id)
#print()
#print(f"aws_access_key_id: ('{aws_access_key_id}')")
#print(f"aws_secret_access_key: ('{aws_secret_access_key}')")
#print()


Mounted at /content/gdrive


## Fetching and Processing the Sample Data

Next, fetch the sample data for this example. This section downloads the publicly available Amazon Letters to Shareholders, which are published yearly as a "Year in Review" of Amazon's business.

The PDFs are downloaded and stored in a `data` directory local to this notebook.

In [3]:
!mkdir -p ./data

from urllib.request import urlretrieve
urls = [
    'https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/2021-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Shareholder-Letter.pdf'
]

filenames = [
    'AMZN-2022-Shareholder-Letter.pdf',
    'AMZN-2021-Shareholder-Letter.pdf',
    'AMZN-2020-Shareholder-Letter.pdf',
    'AMZN-2019-Shareholder-Letter.pdf'
]

metadata = [
    dict(year=2022, source=filenames[0]),
    dict(year=2021, source=filenames[1]),
    dict(year=2020, source=filenames[2]),
    dict(year=2019, source=filenames[3])]

data_root = "./data/"

for idx, url in enumerate(urls):
    file_path = data_root + filenames[idx]
    urlretrieve(url, file_path)

As a part of Amazon's peculiar culture, the CEO always attaches the original 1997 Letter to Shareholders to the current letter. To reduce the amount of processing necessary, reduce bias towards that year, and improve output, you will use PyPDF to remove those pages from each file and re-save it over the original.

In [4]:
from pypdf import PdfReader, PdfWriter
import glob

local_pdfs = glob.glob(data_root + '*.pdf')

for local_pdf in local_pdfs:
    pdf_reader = PdfReader(local_pdf)
    pdf_writer = PdfWriter()
    for pagenum in range(len(pdf_reader.pages)-3):
        page = pdf_reader.pages[pagenum]
        pdf_writer.add_page(page)

    with open(local_pdf, 'wb') as new_file:
        new_file.seek(0)
        pdf_writer.write(new_file)
        new_file.truncate()


Now that you have clean PDFs to work with, they need to be broken down into manageable pieces so you can provide the most relevant sections to the LLM as part of your RAG workflow. Here, you will iterate over all the documents and break them down into 512-character chunks with an overlap of 100 characters.

The `chunk_size` dictates the size of the documents that will be embedded and stored in the vector database.

The `chunk_overlap` dictates the amount of text that is used from a previous chunk when building the next one. This allows you to maintain some of the context between chunks.

The `RecursiveCharacterTextSplitter` splits text recursively using the delimiters `["\n\n", "\n", " ", ""]`, in that order, until it reaches the desired chunk size. This tries to keep paragraphs, sentences, and words together to allow for better semantic analysis.
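To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window over a string. It is not the `RecursiveCharacterTextSplitter` algorithm (which also respects the delimiters above); the helper name `chunk_with_overlap` and the toy input are assumptions made for illustration only.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Toy input: 10 characters, windows of 4 with 1 character shared.
toy_chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(toy_chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last `chunk_overlap` characters of one chunk reappear at the start of the next.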

In [5]:
import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

documents = []

for idx, file in enumerate(filenames):
    loader = PyPDFLoader(data_root + file)
    document = loader.load()
    for document_fragment in document:
        document_fragment.metadata = metadata[idx]

    documents += document

# - in our testing Character split works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 512,
    chunk_overlap  = 100,
)

docs = text_splitter.split_documents(documents)

print(f'# of Document Pages {len(documents)}')
print(f'# of Document Chunks: {len(docs)}')

# of Document Pages 25
# of Document Chunks: 299


## Deploy Model for Embedding

In the following sections you will deploy two ML models: one for embeddings and an LLM for language generation. This example assumes you are working inside SageMaker Studio, so you can deploy them yourself or through SageMaker JumpStart.

For these examples, you will use `All MiniLM L6 v2` as the embedding model, and `LLaMa-2-7B-chat` as the LLM for language generation.

__Note:__ If you choose other options, you may have to adjust the `transform_input` and `transform_output` functions in later sections for the embedding model and the LLM to match the models you've selected.

Refer to the [SageMaker JumpStart Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html) for details on how to deploy models via JumpStart.

If you already have an embedding endpoint deployed, you can skip the following cell, and modify the `embedding_model_endpoint_name` value to match your endpoint.

In [6]:
# https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html
# https://stackoverflow.com/questions/68607118/aws-sagemaker-iam-permission-to-call-get-role

import boto3
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

iam_client = boto3.client("iam")

role = iam_client.get_role(
    RoleName=os.getenv("ROLENAME")
)

ROLE_ARN = role['Role']['Arn']

#print()
#print(f"ROLE_ARN: ('{ROLE_ARN}')")
#print()

#mistral- TBD
#model = JumpStartModel(model_id="huggingface-llm-mistral-7b-instruct")

# Use the IAM role ARN for the `executionRoleArn` parameter.

embedding_model_id, embedding_model_version = "huggingface-textembedding-all-MiniLM-L6-v2", "1.0.0"
model = JumpStartModel(model_id=embedding_model_id, model_version=embedding_model_version, role=ROLE_ARN)

embedding_predictor = model.deploy()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
--------!

__Note: the cell above deploys a SageMaker endpoint. You will need to delete the endpoint to stop charges from accumulating. See the clean up step at the end of this notebook.__

In [7]:
#this is the model endpoint NAME, not the ARN
embedding_model_endpoint_name = embedding_predictor.endpoint_name

To use your SageMaker model endpoints, you need to have a set of credentials. This section will assume them from your SageMaker Studio session.

In [8]:
import boto3
#aws_region = boto3.Session().region_name
aws_region=region
#aws_region='us-east-1'

## Creating and Populating the Vector Database

Next you need to set up how to process the embeddings for the input documents.

The provided `CustomEmbeddingsContentHandler` class has two functions, `transform_input` and `transform_output`, for processing data going into and out of the embedding model.

With the content handler defined, you will then use the `SagemakerEndpointEmbeddings` class from LangChain to create an embeddings object that corresponds to your hosted embeddings model along with the appropriate content handler for processing its inputs/outputs.
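The round trip that such a content handler performs can be sketched without an endpoint. The example below mirrors the payload keys used in this notebook (`text_inputs` going in, `embedding` coming out); the function names and the fake response body are assumptions for illustration, and other models may expect different keys.

```python
import io
import json
from typing import Dict, List


def encode_request(inputs: List[str], model_kwargs: Dict) -> bytes:
    # JSON object with the texts under "text_inputs" plus any extra model parameters.
    return json.dumps({"text_inputs": inputs, **model_kwargs}).encode("utf-8")


def decode_response(output) -> List[List[float]]:
    # The body is read as a file-like stream of bytes and parsed as JSON.
    return json.loads(output.read().decode("utf-8"))["embedding"]


# Hypothetical response body standing in for a real endpoint reply.
fake_body = io.BytesIO(
    json.dumps({"embedding": [[0.1, 0.2], [0.3, 0.4]]}).encode("utf-8")
)

request_bytes = encode_request(["first chunk", "second chunk"], {})
vectors = decode_response(fake_body)
print(request_bytes)
print(vectors)
```

One vector comes back per input text, in the same order as the inputs.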

In [9]:
from typing import Dict, List
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
import json

class CustomEmbeddingsContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
        input_str = json.dumps({"text_inputs": inputs, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["embedding"]


embeddings_content_handler = CustomEmbeddingsContentHandler()

embeddings = SagemakerEndpointEmbeddings(
    endpoint_name=embedding_model_endpoint_name,
    region_name=aws_region,
    content_handler=embeddings_content_handler,
)

With your embeddings object ready, the next step is to convert the document chunks into vectors and store them. This example uses a FAISS in-memory vector database, but there are many other options available.

In [10]:
from langchain.schema import Document
from langchain.vectorstores import FAISS

In [11]:
db = FAISS.from_documents(docs, embeddings)

## Running Vector Queries

Now that you have a populated vector database, you can run queries against it to return relevant document chunks.

Start with a simple query that corresponds to the source material.

In [12]:
query = "How has AWS evolved?"

The results that come back from the `similarity_search_with_score` API are sorted by score from lowest to highest. The score is the [L2 (Euclidean)](https://en.wikipedia.org/wiki/Lp_space) distance between the query and each result. Lower scores are better, representing a shorter distance between vectors.
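As a rough illustration of why lower scores are better, the sketch below ranks a few toy two-dimensional vectors by their Euclidean distance to a query vector. The vectors and helper names are invented for the example; whether a library reports the distance or its square does not change the ordering.

```python
import math
from typing import Dict, List, Tuple


def l2_distance(a: List[float], b: List[float]) -> float:
    # Euclidean distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(
    query_vec: List[float], doc_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    scored = [(name, l2_distance(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])  # closest (lowest score) first


# Toy embeddings, purely illustrative.
toy_docs = {
    "chunk_a": [1.0, 0.0],
    "chunk_b": [0.0, 1.0],
    "chunk_c": [0.9, 0.1],
}
ranked = rank_by_distance([1.0, 0.0], toy_docs)
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```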

In [13]:
results_with_scores = db.similarity_search_with_score(query)
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}\nMetadata: {doc.metadata}\nScore: {score}\n\n")

Content: done innovating here,and this long-term investment should prove fruitful for both customers and AWS. AWS is still in the earlystages of its evolution, and has a chance for unusual growth in the next decade.
Metadata: {'year': 2022, 'source': 'AMZN-2022-Shareholder-Letter.pdf'}
Score: 0.5685306191444397


Content: customers, AWS continues to deliver new capabilities rapidly (over 3,300 new features and services launchedin 2022), and invest in long-term inventions that change what’s possible.
Metadata: {'year': 2022, 'source': 'AMZN-2022-Shareholder-Letter.pdf'}
Score: 0.7789842486381531


Content: We had a head start on potential competitors;and if anything, we wanted to accelerate our pace of innovation. We made the long-term decision tocontinue investing in AWS. Fifteen years later, AWS is now an $85B annual revenue run rate business, withstrong profitability, that has transformed how customers from start-ups to multinational companies to publicsector organizations manage the

In [14]:
filter={"year": 2021}

results_with_scores = db.similarity_search_with_score(query,
  filter=filter)

for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}\nMetadata: {doc.metadata}\nScore: {score}\n\n")


Content: back and determining what they wanted to change coming out of the pandemic. Many concludedthat they didn’t want to continue managing their technology infrastructure themselves, and made thedecision to accelerate their move to the cloud. This shift by so many companies (along with the economyrecovering) helped re-accelerate AWS’s revenue growth to 37% Y oY in 2021.
Metadata: {'year': 2021, 'source': 'AMZN-2021-Shareholder-Letter.pdf'}
Score: 0.7898486852645874


Content: customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), butalso allowed us to arrive at the much more game-changing offering that AWS is today.
Metadata: {'year': 2021, 'source': 'AMZN-2021-Shareholder-Letter.pdf'}
Score: 0.8196681141853333


Content: AWS : As we were defining AWS and working backwards on the services we thought customers wanted, we
Metadata: {'year': 2021, 'source': 'AMZN-2021-Shareholder-Letter.pdf'}
Score: 0.8815345764160156


Content

## Creating Prompts

You've gotten results from your vector database, but currently they are just chunks of the original documents and some of them might not even contain the information you want to provide as an answer to your original query.

To generate the appropriate response, you will leverage a prompt template that takes the original question asked along with relevant context chunks from your vector database to generate a new response from your language generator model.

LangChain provides functionality to allow for easier creation and population of prompt templates. The template below has specific markup for LLaMa-2-chat, but also has placeholder values for `{context}` and `{question}`, which you will provide to fill out the template.

In [15]:
from langchain.prompts import PromptTemplate

prompt_template = """
<s>[INST] <<SYS>>
Use the context provided to answer the question at the end. If you dont know the answer just say that you don't know, don't try to make up an answer.
<</SYS>>

Context:
----------------
{context}
----------------

Question: {question} [/INST]
"""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)
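To see what filling the placeholders amounts to, here is a minimal sketch using plain Python string formatting rather than LangChain. The shortened template, the `fill_prompt` helper, and the sample chunks are assumptions for illustration; the real chain does the equivalent substitution with the retrieved chunks.

```python
from typing import List


def fill_prompt(template: str, chunks: List[str], question: str) -> str:
    # Join the retrieved chunks into one context string, then substitute both placeholders.
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


# Shortened stand-in for the full prompt template.
toy_template = "Context:\n{context}\n\nQuestion: {question}"

filled = fill_prompt(toy_template, ["chunk one", "chunk two"], "How has AWS evolved?")
print(filled)
```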

## Preparing the LLM

The next step is a process similar to the one you did earlier for the embedding model, but now for your LLM.

In the `QAContentHandler` class, you will see `transform_input` and `transform_output` functions to manipulate the inputs and outputs of your LLM.

In [16]:
from typing import Dict

from langchain import PromptTemplate, SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import RetrievalQA
import json

class QAContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        input_str = json.dumps(
            {"inputs" : [
                [
                    {
                        "role" : "system",
                        "content" : ""
                    },
                    {
                        "role" : "user",
                        "content" : prompt
                    }
                ]],
                "parameters" : {**model_kwargs}
            })
        return input_str.encode('utf-8')

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generation"]["content"]
        #return response_json[0]['generated_text']["content"]

qa_content_handler = QAContentHandler()



Now you will deploy a SageMaker endpoint for the language generation LLM. Afterward, you will create an object that points to that endpoint and passes inference parameters to the endpoint and model.

If you already have an LLM endpoint deployed, you can skip the following cell, and modify the `llm_model_endpoint_name` value to match your endpoint.

__Note: running the following cell will deploy a SageMaker endpoint, which takes a few minutes. You will need to delete the endpoint to stop charges from accumulating. See the clean up step at the end of this notebook.__

In [17]:
import colab_env
import boto3
import os
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

# added by frank morales december 13, 2023
iam_client = boto3.client("iam")

role = iam_client.get_role(
    RoleName=os.getenv("ROLENAME")
)

ROLE_ARN = role['Role']['Arn']

# https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-choose.html#jumpstart-foundation-models-choose-eula

#original
#llm_model_id, llm_model_version = "meta-textgeneration-llama-2-7b-f", "*"
#llm_model = JumpStartModel(model_id=llm_model_id, model_version=llm_model_version)
## error below

#modified by frankmorales
#llm_model_id, llm_model_version = "meta-textgeneration-llama-2-7b-f", "2.*"
llm_model_id = 'meta-textgeneration-llama-2-7b-f'
llm_model_version = '2.0.4'
llm_model = JumpStartModel(model_id=llm_model_id, model_version=llm_model_version, role=ROLE_ARN, region='us-east-1')
#modified by frankmorales
llm_predictor = llm_model.deploy(accept_eula=True)

#original
#llm_predictor = llm_model.deploy()


For forward compatibility, pin to model_version='2.*' in your JumpStartModel or JumpStartEstimator definitions. Note that major version upgrades may have different EULA acceptance terms and input/output signatures.


----------------!

In [18]:
#this is the model endpoint NAME, not the ARN
llm_model_endpoint_name = llm_predictor.endpoint_name
llm_model_endpoint_name

'meta-textgeneration-llama-2-7b-f-2023-12-28-01-04-31-733'

In [19]:
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID")
aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
region=os.getenv("region")
output=os.getenv("output")

llm = SagemakerEndpoint(
        endpoint_name=llm_model_endpoint_name,
        region_name=region,
        model_kwargs={"max_new_tokens": 1000, "top_p": 0.9, "temperature": 1e-11},
        endpoint_kwargs={"CustomAttributes": 'accept_eula=true'},
        content_handler=qa_content_handler
    )

You can invoke this LLM object directly to get a baseline response without any contextual information provided. You'll notice the answer to the question `How has AWS evolved?` is more about __what__ AWS has done than an internal take on how AWS has evolved. This is likely due to the corpus of data that the LLM was trained on, which contained a large number of articles from the internet.

Note that this is not a bad response by any stretch, but it might not be the response you're looking for.

You'll see how context can change the response in a moment.

In [20]:
# added by frank morales dec 18, 2023
# https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-2-text-completion.ipynb

#original by the book
query = "How has AWS evolved?"

##modified by Frank Morales
response = llm.predict(query)
print('Query:', query)
print()
print('Response:', response)



Query: How has AWS evolved?

Response:  AWS (Amazon Web Services) has evolved significantly since its launch in 2006. Here are some key milestones and developments in AWS's evolution:

1. 2006: Amazon Web Services (AWS) is launched as a separate business unit within Amazon, offering a limited set of cloud computing services, including Elastic Compute Cloud (EC2), Simple Storage Service (S3), and Simple Queue Service (SQS).
2. 2008: AWS introduces its first virtual private cloud (VPC), allowing customers to launch AWS resources within their own virtual network.
3. 2010: AWS launches its first data center outside of the United States, in Ireland.
4. 2011: AWS introduces the Elastic Block Store (EBS), providing block-level storage volumes for EC2 instances.
5. 2012: AWS launches its CloudFormation service, allowing customers to define and manage their AWS infrastructure using templates.
6. 2013: AWS introduces the Amazon Elastic Container Service (ECS), providing a highly scalable, high-p

With the LLM endpoint object created, you are ready to create your first chain!

This chain is a simple example using LangChain's RetrievalQA chain, which will:
- take a query as input
- generate query embeddings
- query the vector database for relevant document chunks based on the query embedding
- inject the context and original query into the prompt template
- invoke the LLM with the completed prompt
- return the LLM result

The [`stuff` chain type](https://python.langchain.com/docs/modules/chains/document/stuff) simply takes the context documents and inserts them into the prompt.

By setting `return_source_documents` to `True`, the LLM responses will also contain the document chunks from the vector database, to illustrate where the context came from.
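The steps listed above can be sketched in plain Python with stand-in functions. This is not how `RetrievalQA` is implemented; `answer_with_context`, the toy template, and the three stub functions are hypothetical and only show the order in which the pieces are called.

```python
from typing import Callable, Dict, List

TOY_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}"


def answer_with_context(
    query: str,
    embed: Callable[[str], List[float]],
    search: Callable[[List[float], int], List[str]],
    generate: Callable[[str], str],
    k: int = 4,
) -> Dict[str, object]:
    query_vector = embed(query)              # generate query embeddings
    chunks = search(query_vector, k)         # query the vector store for relevant chunks
    prompt = TOY_TEMPLATE.format(            # "stuff" the chunks and query into the prompt
        context="\n\n".join(chunks), question=query
    )
    answer = generate(prompt)                # invoke the LLM with the completed prompt
    return {"query": query, "result": answer, "source_documents": chunks}


# Hypothetical stand-ins for the embedding model, vector store, and LLM.
toy_chunks = ["AWS launched new features.", "Revenue growth re-accelerated.", "Unrelated text."]


def toy_embed(text: str) -> List[float]:
    return [float(len(text))]


def toy_search(vector: List[float], k: int) -> List[str]:
    return toy_chunks[:k]


def toy_generate(prompt: str) -> str:
    return f"(answer based on {prompt.count('AWS')} AWS mentions)"


output = answer_with_context("How has AWS evolved?", toy_embed, toy_search, toy_generate, k=2)
print(output["result"])
print(output["source_documents"])
```

Returning the chunks alongside the answer mirrors what `return_source_documents=True` provides: a way to check which context the answer was built from.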

In [21]:
# https://stackoverflow.com/questions/77352474/langchain-how-to-get-complete-prompt-retrievalqa-from-chain-type

#### original in the book
qa_chain = RetrievalQA.from_chain_type(
    llm,
    chain_type='stuff',
    retriever=db.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)


Now that your chain is set up, you can supply queries to it and generate responses based on your source documents.

A few examples have been provided.

In [22]:
query = "How has AWS evolved?"

result = qa_chain({"query": query})
print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
      print(f'{srcdoc}\n')

### error for llm_model_version = '2.*'
#ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (424) from primary with message "{
#  "code":424,
#  "message":"prediction failure",
#  "error":"can only concatenate str (not \"list\") to str"
#}". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/meta-textgeneration-llama-2-7b-2023-12-18-21-47-42-330
# in account xxxxxxxxxxxx for more information.

Query: How has AWS evolved?

Result:  Based on the provided context, AWS has evolved in the following ways:

1. Rapid innovation: AWS continues to deliver new capabilities rapidly, launching over 3,300 new features and services in 2022 alone.
2. Long-term investment: AWS has made a long-term decision to continue investing in its infrastructure, even during challenging times such as the 2008-2009 recession.
3. Expansion of services: AWS has expanded its services to cater to a wide range of customers, from start-ups to multinational companies to public sector organizations.
4. Profitability: AWS has achieved strong profitability, with an $85B annual revenue run rate business.
5. Adaptation to customer needs: AWS has adapted to changing customer needs, such as the shift towards cloud computing and the desire to manage technology infrastructure.
6. Acceleration of revenue growth: AWS's revenue growth has accelerated in recent years, with a 37% YoY increase in 2021.

Overall, AWS has evolve

In [23]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    chain_type='stuff',
    retriever=db.as_retriever(
        search_type="mmr", # Maximum Marginal Relevance (MMR)
        search_kwargs={"k": 3, "lambda_mult": 0.9}
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
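Maximum Marginal Relevance trades relevance to the query against redundancy with chunks already selected. The sketch below is a simplified, self-contained version of that idea using cosine similarity on toy vectors; it is not LangChain's or FAISS's implementation, and the helper names and vectors are assumptions. Here `lambda_mult` close to 1 favors relevance and values close to 0 favor diversity.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(
    query_vec: List[float], doc_vecs: List[List[float]], k: int, lambda_mult: float
) -> List[int]:
    """Return indices of up to k documents chosen greedily by the MMR score."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))

    def mmr_score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Highest similarity to anything already picked (0 if nothing picked yet).
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors: document 1 is a near duplicate of document 0, document 2 is different.
toy_query = [1.0, 0.0]
toy_vectors = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]

relevance_heavy = mmr_select(toy_query, toy_vectors, k=2, lambda_mult=0.9)
diversity_heavy = mmr_select(toy_query, toy_vectors, k=2, lambda_mult=0.3)
print(relevance_heavy, diversity_heavy)
```

With a high `lambda_mult` the near duplicate is still picked second; with a low value the more distinct document wins.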

This second chain retrieves with Maximum Marginal Relevance (MMR) and returns three chunks (`k=3`). Run the same query against it to compare the response and context documents with those from the previous chain.

In [24]:
query = "How has AWS evolved?"
result = qa_chain({"query": query})
print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
      print(f'{srcdoc}\n')

Query: How has AWS evolved?

Result:  Based on the provided context, AWS has evolved in the following ways:

1. Shift towards cloud computing: Many companies have moved away from managing their technology infrastructure themselves and have accelerated their move to the cloud, which has helped re-accelerate AWS's revenue growth.
2. Innovation: AWS continues to deliver new capabilities rapidly, launching over 3,300 new features and services in 2022, and investing in long-term inventions that change what's possible.
3. Focus on long-term investments: AWS is still in the early stages of its evolution and has a chance for unusual growth in the next decade, indicating a focus on long-term investments.

I don't know the answer to the question "What specific areas of AWS have seen the most growth?" as the context does not provide information on this topic.

Context Documents: 
page_content='done innovating here,and this long-term investment should prove fruitful for both customers and AWS. AWS

In [25]:
#query = "Why is Amazon successful?"
query = "What business challenges has Amazon experienced?"

result = qa_chain({"query": query})
print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
      print(f'{srcdoc}\n')

Query: What business challenges has Amazon experienced?

Result:  Based on the provided context, Amazon has experienced the following business challenges:

1. Unusual number of simultaneous challenges in the past year: The context mentions that Amazon faced an unusual number of challenges simultaneously, without providing more details.
2. Dynamic and competitive market segments: Amazon operates in large, dynamic, and global market segments with many capable and well-funded competitors, which can create challenges for the company.
3. Constant change: The context states that in the 25 years Jeff Bezos has been at Amazon, there has been constant change, much of which the company has initiated itself.
4. Difficulty in expanding internationally: The context mentions that expanding internationally and pursuing large retail market segments that are still nascent for Amazon can be challenging.
5. Investing in new areas: The context mentions that Amazon is making investments in areas that are f

In [26]:
query = "What business challenges has Amazon experienced?"
result = qa_chain({"query": query})
print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print(f'Context Documents: ')
for srcdoc in result["source_documents"]:
      print(f'{srcdoc}\n')

Query: What business challenges has Amazon experienced?

Result:  Based on the provided context, Amazon has experienced the following business challenges:

1. Unusual number of simultaneous challenges in the past year: The context mentions that Amazon faced an unusual number of challenges simultaneously, without providing more details.
2. Operating in a dynamic and competitive market: Amazon operates in large, dynamic, and global market segments with many capable and well-funded competitors, which can create challenges for the company.
3. Constant change: The context states that in the 25 years Jeff Bezos has been at Amazon, there has been constant change, much of which the company has initiated itself.
4. Difficulty in expanding internationally: The context mentions that expanding internationally, pursuing large retail market segments that are still nascent for Amazon, and using unique assets to help merchants sell more effectively on their own websites are somewhat natural extensions

## Clean Up

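The following cell deletes every endpoint, model, and endpoint configuration that the SageMaker list calls return for the configured region, not only the ones created in this notebook. Run it only in an account where that is acceptable.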

In [27]:

# Frank Morales created this cell on December 14, 2023; it automatically deletes all endpoints, models, and endpoint configurations.

import colab_env
import os

aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID")
aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
aws_region=os.getenv("AWS_DEFAULT_REGION")
aws_output=os.getenv("AWS_DEFAULT_OUTPUT")

import boto3

sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

def cleanup_sagemaker_resources(resource_name, resourceid):
    # resource_name is the plural key of the list response
    # ('Endpoints', 'Models' or 'EndpointConfigs'); resourceid selects
    # the matching list/delete calls (0, 1 or 2).
    if resourceid == 0:
        response = sagemaker_client.list_endpoints()
    elif resourceid == 1:
        response = sagemaker_client.list_models()
    elif resourceid == 2:
        response = sagemaker_client.list_endpoint_configs()

    print(resource_name)

    # 'Endpoints' -> 'EndpointName', 'Models' -> 'ModelName', ...
    name_key = resource_name[:-1] + 'Name'
    for resource in response[resource_name]:
        name = resource[name_key]
        print(name_key)
        print(name)

        if resourceid == 0:
            sagemaker_client.delete_endpoint(EndpointName=name)
        elif resourceid == 1:
            sagemaker_client.delete_model(ModelName=name)
        elif resourceid == 2:
            sagemaker_client.delete_endpoint_config(EndpointConfigName=name)

    print("\n==================================\n")


cleanup_sagemaker_resources('Endpoints',0)
cleanup_sagemaker_resources('Models',1)
cleanup_sagemaker_resources('EndpointConfigs',2)

Endpoints
EndpointName
meta-textgeneration-llama-2-7b-f-2023-12-28-01-04-31-733
EndpointName
hf-textembedding-all-minilm-l6-v2-2023-12-28-00-59-28-407


Models
ModelName
meta-textgeneration-llama-2-7b-f-2023-12-28-01-04-31-731
ModelName
hf-textembedding-all-minilm-l6-v2-2023-12-28-00-59-28-405


EndpointConfigs
EndpointConfigName
meta-textgeneration-llama-2-7b-f-2023-12-28-01-04-31-733
EndpointConfigName
hf-textembedding-all-minilm-l6-v2-2023-12-28-00-59-28-407


