## Run inference on the Llama 2 endpoint you have created.

In [2]:
import json
import boto3

### Supported Parameters

***
This model supports the following inference payload parameters:

* **max_new_tokens:** The model generates text until the output length (excluding the input context) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **temperature:** Controls the randomness of the output. A higher temperature makes low-probability words more likely to appear in the output; a lower temperature favors high-probability words. As `temperature` approaches 0, sampling approaches greedy decoding. If specified, it must be a positive float.
* **top_p:** At each step of text generation, the model samples from the smallest possible set of words whose cumulative probability reaches `top_p`. If specified, it must be a float between 0 and 1.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 
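The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SageMaker or JumpStart API: `validate_parameters` is a hypothetical helper, and rejecting unknown keys and treating 0 and 1 as valid `top_p` bounds are assumptions. The endpoint itself remains the final authority on what it accepts.

```python
from numbers import Real
from typing import Any, Dict


def validate_parameters(params: Dict[str, Any]) -> Dict[str, Any]:
    """Check a parameters dict against the documented constraints.

    Any subset of the three keys may be present. Unknown keys are rejected
    here as a conservative assumption; the endpoint may behave differently.
    """
    allowed = {"max_new_tokens", "temperature", "top_p"}
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")

    if "max_new_tokens" in params:
        value = params["max_new_tokens"]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError("max_new_tokens must be a positive integer")

    if "temperature" in params:
        value = params["temperature"]
        if isinstance(value, bool) or not isinstance(value, Real) or value <= 0:
            raise ValueError("temperature must be a positive float")

    if "top_p" in params:
        value = params["top_p"]
        # Inclusive bounds are an assumption; "between 0 and 1" is ambiguous.
        if isinstance(value, bool) or not isinstance(value, Real) or not 0 <= value <= 1:
            raise ValueError("top_p must be a float between 0 and 1")

    return params


validate_parameters({"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6})
```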

**NOTE**: If `max_new_tokens` is not set, the model may generate up to the maximum total number of tokens allowed, which is 4K for these models. This can cause endpoint query timeouts, so set `max_new_tokens` whenever possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` to no more than 1500, 1000, and 500, respectively, while keeping the total number of tokens below 4K.
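The token-budget recommendation can be expressed as a small helper. This is a sketch under stated assumptions: `choose_max_new_tokens` and its lookup table are hypothetical, "4K" is taken to mean 4,096 tokens, and the caller must supply the prompt's token count (the helper does not tokenize anything).

```python
# Recommended per-model caps from the note above.
RECOMMENDED_MAX_NEW_TOKENS = {"7b": 1500, "13b": 1000, "70b": 500}
# Assumption: "4K" means a 4,096-token window shared by prompt and output.
MAX_TOTAL_TOKENS = 4096


def choose_max_new_tokens(model_size: str, prompt_tokens: int, requested: int) -> int:
    """Return a max_new_tokens value that respects the recommended cap for the
    model size and keeps prompt + generated tokens within MAX_TOTAL_TOKENS.

    prompt_tokens must be counted by the caller (for example with the Llama 2
    tokenizer).
    """
    cap = RECOMMENDED_MAX_NEW_TOKENS[model_size.lower()]
    room = MAX_TOTAL_TOKENS - prompt_tokens
    budget = min(requested, cap, room)
    if budget < 1:
        raise ValueError(f"a prompt of {prompt_tokens} tokens leaves no room to generate")
    return budget


print(choose_max_new_tokens("70B", prompt_tokens=100, requested=1000))  # 500
```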

**NOTE**: To support a 4K context length, this model restricts query payloads to a batch size of 1. Payloads with larger batch sizes receive an endpoint error before inference runs.
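Because the endpoint accepts only one dialog per request, a batch of dialogs has to be sent as separate requests. A minimal sketch, assuming the `{"inputs": [...], "parameters": {...}}` payload shape used later in this notebook; `split_into_single_dialog_payloads` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator


def split_into_single_dialog_payloads(payload: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one payload per dialog so that every request has a batch size of 1."""
    parameters = payload.get("parameters", {})
    for dialog in payload["inputs"]:
        # Each outgoing request wraps exactly one dialog in the "inputs" list.
        yield {"inputs": [dialog], "parameters": parameters}


batch = {
    "inputs": [
        [{"role": "user", "content": "what is the recipe of mayonnaise?"}],
        [{"role": "user", "content": "How to go from Beijing to NY?"}],
    ],
    "parameters": {"max_new_tokens": 256},
}
single_payloads = list(split_into_single_dialog_payloads(batch))
```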

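**NOTE**: This model supports only the `system`, `user`, and `assistant` roles. A dialog may start with an optional `system` message, followed by `user` and `assistant` messages that alternate, beginning and ending with `user` (u/a/u/a/u...).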
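The role-ordering rule can be checked locally before invoking the endpoint. This sketch encodes the rule as described in the note (optional leading `system`, then alternating `user`/`assistant`, ending with `user`); `validate_dialog` is a hypothetical helper, and the endpoint's own checks may differ in detail.

```python
from typing import Dict, List

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_dialog(dialog: List[Dict[str, str]]) -> None:
    """Raise ValueError if `dialog` breaks the role-ordering rule."""
    if not dialog:
        raise ValueError("dialog must contain at least one message")

    roles = [message["role"] for message in dialog]
    unknown = set(roles) - ALLOWED_ROLES
    if unknown:
        raise ValueError(f"unsupported roles: {sorted(unknown)}")

    # An optional system message may appear only in the first position.
    offset = 1 if roles[0] == "system" else 0
    turns = roles[offset:]
    if not turns:
        raise ValueError("a system message must be followed by a user message")

    # The remaining messages must alternate user/assistant, starting with user.
    for i, role in enumerate(turns):
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"message {i + offset}: expected {expected!r}, got {role!r}")

    if turns[-1] != "user":
        raise ValueError("the last message must come from the user")
```

Each dialog in the `dialogs` list defined below passes this check.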

***

In [3]:
dialogs = [
    [{"role": "user", "content": "what is the recipe of mayonnaise?"}],
    [
        {"role": "user", "content": "I am going to Paris, what should I see?"},
        {
            "role": "assistant",
            "content": """\
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:

1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.

These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.""",
        },
        {"role": "user", "content": "What is so great about #1?"},
    ],
    [
        {"role": "system", "content": "Always answer with Haiku"},
        {"role": "user", "content": "I am going to Paris, what should I see?"},
    ],
    [
        {
            "role": "system",
            "content": "Always answer with emojis",
        },
        {"role": "user", "content": "How to go from Beijing to NY?"},
    ],
]

### Query the endpoint you have created

---

To run inference on these models, you must pass `custom_attributes='accept_eula=true'` in the request header. Doing so indicates that you have read and accept the model's end-user license agreement (EULA), which is available in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. The code in this notebook passes `accept_eula=true`; with `accept_eula=false`, all inference requests fail until you explicitly change the attribute.

Note: The custom attributes used to pass the EULA acceptance are key/value pairs. Each key and its value are separated by `=`, and pairs are separated by `;`. If the same key is passed more than once, the last value is kept and passed to the script handler (here, where it is used for conditional logic). For example, if `accept_eula=false; accept_eula=true` is passed to the server, `accept_eula=true` is kept and passed to the script handler.
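The last-value-wins behavior described above can be illustrated with a small parser. This only mimics the documented semantics; it is not the parser that SageMaker or the model's script handler actually uses, and `parse_custom_attributes` is a hypothetical name.

```python
from typing import Dict


def parse_custom_attributes(header: str) -> Dict[str, str]:
    """Parse 'k1=v1; k2=v2' into a dict; a repeated key keeps its last value."""
    attributes: Dict[str, str] = {}
    for pair in header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if not sep:
            # Skip empty or malformed segments (an assumption about how to treat them).
            continue
        # Later assignments overwrite earlier ones, so the last value wins.
        attributes[key.strip()] = value.strip()
    return attributes


print(parse_custom_attributes("accept_eula=false; accept_eula=true"))  # {'accept_eula': 'true'}
```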

---

In [4]:
endpoint_name = "jumpstart-dft-meta-textgeneration-llama-2-7b-f"


def query_endpoint(payload):
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
        CustomAttributes="accept_eula=true",
    )
    response = response["Body"].read().decode("utf8")
    response = json.loads(response)
    return response
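`query_endpoint` returns the decoded JSON list, and the loop below indexes `[0]["generation"]`. Pulling that step into its own function makes it testable without a live endpoint. A sketch, assuming the response shape shown in this notebook's outputs; `extract_generation` is a hypothetical helper that works with any object exposing `.read()`, such as the botocore `StreamingBody` in `response["Body"]` or an `io.BytesIO`.

```python
import io
import json
from typing import Dict


def extract_generation(body) -> Dict[str, str]:
    """Decode an invoke_endpoint response body and return the first generation.

    Assumes the shape [{"generation": {"role": "assistant", "content": "..."}}].
    Note that a StreamingBody can be read only once.
    """
    results = json.loads(body.read().decode("utf-8"))
    if not results:
        raise ValueError("endpoint returned an empty result list")
    return results[0]["generation"]


# Usage with a stand-in body instead of a live endpoint:
fake_body = io.BytesIO(json.dumps(
    [{"generation": {"role": "assistant", "content": " Hello!"}}]
).encode("utf-8"))
generation = extract_generation(fake_body)
print(f"{generation['role'].capitalize()}: {generation['content']}")
```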

In [5]:
for dialog in dialogs:
    payload = {
        "inputs": [dialog], 
        "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
    }
    result = query_endpoint(payload)[0]
    for msg in dialog:
        print(f"{msg['role'].capitalize()}: {msg['content']}\n")
    print(f"> {result['generation']['role'].capitalize()}: {result['generation']['content']}")
    print("\n==================================\n")

User: what is the recipe of mayonnaise?

> Assistant:  Thank you for your question! Mayonnaise is a popular condiment made from a mixture of egg yolks, oil, vinegar or lemon juice, and seasonings. Here is a basic recipe for homemade mayonnaise:

Ingredients:

* 2 egg yolks
* 1/2 cup (120 ml) neutral-tasting oil, such as canola or grapeseed
* 1 tablespoon (15 ml) vinegar or lemon juice
* 1/2 teaspoon (2.5 ml) salt
* 1/4 teaspoon (1.25 ml) sugar (optional)
* Flavorings, such as mustard powder, dried herbs, or grated ginger (optional)

Instructions:

1. In a small bowl, whisk together the egg yolks and salt until well combined.
2. Slowly pour the oil into the egg yolk mixture while continuously whisking. You can do this by hand with a whisk or use an electric mixer


User: I am going to Paris, what should I see?

Assistant: Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions 

## get_llm

In [175]:
client = boto3.client("sagemaker-runtime")
text = 'Building a website can be done in 10 simple steps'

In [176]:
def get_llm(text):
    # Wrap the prompt as a single-turn dialog.
    dialog = [{"role": "user", "content": text}]

    parameters = {
        "max_new_tokens": 256,
        "top_p": 0.9,
        "temperature": 0.6,
    }

    # The endpoint accepts a batch size of 1, so "inputs" holds exactly one dialog.
    payload = {
        "inputs": [dialog],
        "parameters": parameters,
    }

    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
        CustomAttributes="accept_eula=true",
    )

    # The body is a JSON list with one {"generation": {...}} entry per dialog.
    body = response["Body"].read().decode("utf8")
    body_resp = json.loads(body)
    print(body_resp[0]["generation"]["content"])

    return body_resp[0]["generation"]["content"]
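`get_llm` fixes both the dialog (a single user turn) and the sampling parameters. Since the endpoint also accepts a leading `system` message (per the role note earlier in this notebook), one option is to separate payload construction from the call. `build_chat_payload` is a hypothetical helper, not part of boto3 or JumpStart; its defaults mirror the values used in `get_llm`.

```python
from typing import Any, Dict, List, Optional

DEFAULT_PARAMETERS: Dict[str, Any] = {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}


def build_chat_payload(text: str, system: Optional[str] = None, **overrides: Any) -> Dict[str, Any]:
    """Build an invoke_endpoint request body with an optional system message."""
    dialog: List[Dict[str, str]] = []
    if system is not None:
        # A system message, if present, must come first.
        dialog.append({"role": "system", "content": system})
    dialog.append({"role": "user", "content": text})
    # Per-call overrides take precedence; DEFAULT_PARAMETERS itself is not mutated.
    parameters = {**DEFAULT_PARAMETERS, **overrides}
    # Batch size 1: "inputs" holds a single dialog.
    return {"inputs": [dialog], "parameters": parameters}


payload = build_chat_payload(
    "Building a website can be done in 10 simple steps",
    system="Answer concisely.",
    max_new_tokens=128,
)
```

The result can be sent the same way `get_llm` sends its payload, with `Body=json.dumps(payload).encode("utf-8")`.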

In [178]:
get_llm(text)

 I'm glad you're interested in building a website! However, I must respectfully point out that the statement "Building a website can be done in 10 simple steps" is not entirely accurate. While it is possible to build a basic website with a few simple steps, the process can be much more complex and involve many more steps depending on the scope and complexity of the website.

Here are some more accurate steps to consider when building a website:

1. Define your website's purpose and goals: Before starting to build your website, it's important to determine what purpose it will serve and what goals you want to achieve with it.
2. Choose a domain name and web hosting: Your domain name is the address of your website, while web hosting is the service that allows your website to be viewed on the internet.
3. Plan your website's design and layout: Think about the overall design and layout of your website, including the navigation, colors, and fonts.
4. Create content: Write and gather content 


In [6]:
import json
import boto3

In [29]:
endpoint_name = "jumpstart-dft-meta-textgeneration-llama-2-7b-f"

In [30]:
text = 'Building a website can be done in 10 simple steps'

In [286]:
dialog = [{"role": "user", "content": text}]

In [287]:
payload = {
    "inputs": [dialog], 
    "parameters":{
        "max_new_tokens": 256, 
        "top_p": 0.9, 
        "temperature": 0.6
    }
}

In [295]:
payload = {
    "inputs": "안녕하세요",  # Korean for "Hello"; sent as a bare string, not a list of dialogs
    "parameters": {
        "max_new_tokens": 256,
        "top_p": 0.9,
        "temperature": 0.6
    }
}

In [296]:
client = boto3.client("sagemaker-runtime")

In [297]:
response = client.invoke_endpoint(
    EndpointName=endpoint_name, 
    ContentType='application/json', 
    Body=json.dumps(payload).encode('utf-8'),
    CustomAttributes="accept_eula=true",
)                


ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (424) from primary with message "{
  "code":424,
  "message":"prediction failure",
  "error":"string indices must be integers"
}". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/jumpstart-dft-meta-textgeneration-llama-2-7b-f in account 677146750822 for more information.
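This request differs from the earlier, successful one only in its `inputs`: a bare string instead of a list of dialogs. A plausible explanation, which is an assumption since the endpoint's handler code is not shown here, is that the handler iterates over `inputs` expecting dialogs and indexes each message with `"role"`; iterating a string yields characters, and indexing a character with a string key raises a `TypeError` whose message begins with "string indices must be integers". The sketch below normalizes `inputs` into the list-of-dialogs shape; `as_chat_inputs` is a hypothetical helper.

```python
from typing import Dict, List, Union

Dialog = List[Dict[str, str]]


def as_chat_inputs(inputs: Union[str, Dialog, List[Dialog]]) -> List[Dialog]:
    """Normalize `inputs` to the list-of-dialogs shape.

    - a bare string becomes a single-turn user dialog
    - a single dialog (a list of message dicts) is wrapped in a list
    - a list of dialogs is returned unchanged
    """
    if isinstance(inputs, str):
        return [[{"role": "user", "content": inputs}]]
    if inputs and isinstance(inputs[0], dict):
        return [inputs]
    return inputs


payload = {
    "inputs": as_chat_inputs("안녕하세요"),  # "Hello" in Korean
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6},
}
```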

In [290]:
print('response:', response)

response: {'ResponseMetadata': {'RequestId': 'bbba00c8-638e-47f6-aa4a-89f7acac4072', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'bbba00c8-638e-47f6-aa4a-89f7acac4072', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Mon, 31 Jul 2023 06:17:18 GMT', 'content-type': 'application/json', 'content-length': '1243', 'connection': 'keep-alive'}, 'RetryAttempts': 0}, 'ContentType': 'application/json', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x7f75294f54f0>}


In [291]:
body = response["Body"].read().decode("utf8")
body_resp = json.loads(body)

In [292]:
body_resp

[{'generation': {'role': 'assistant',
   'content': " I'm glad you're interested in building a website! However, I must respectfully point out that building a website is a complex process that cannot be reduced to 10 simple steps. While it's possible to create a basic website using a website builder or CMS like WordPress, there are many other factors to consider when creating a website, such as:\n\n1. Define your website's purpose and goals\n2. Choose a domain name and web hosting provider\n3. Plan your website's structure and layout\n4. Design your website's visual elements\n5. Write and optimize your website's content\n6. Set up and configure your website's technical aspects\n7. Test and launch your website\n8. Maintain and update your website's content and functionality\n9. Monitor and analyze your website's performance\n10. Continuously improve and update your website to keep up with changing technology and user preferences.\n\nEach of these steps involves a significant amount of p

In [293]:
body_resp[0]['generation']['content']

" I'm glad you're interested in building a website! However, I must respectfully point out that building a website is a complex process that cannot be reduced to 10 simple steps. While it's possible to create a basic website using a website builder or CMS like WordPress, there are many other factors to consider when creating a website, such as:\n\n1. Define your website's purpose and goals\n2. Choose a domain name and web hosting provider\n3. Plan your website's structure and layout\n4. Design your website's visual elements\n5. Write and optimize your website's content\n6. Set up and configure your website's technical aspects\n7. Test and launch your website\n8. Maintain and update your website's content and functionality\n9. Monitor and analyze your website's performance\n10. Continuously improve and update your website to keep up with changing technology and user preferences.\n\nEach of these steps involves a significant amount of planning, design, development, and maintenance work, 

In [294]:
print(body_resp[0]['generation']['content'])

 I'm glad you're interested in building a website! However, I must respectfully point out that building a website is a complex process that cannot be reduced to 10 simple steps. While it's possible to create a basic website using a website builder or CMS like WordPress, there are many other factors to consider when creating a website, such as:

1. Define your website's purpose and goals
2. Choose a domain name and web hosting provider
3. Plan your website's structure and layout
4. Design your website's visual elements
5. Write and optimize your website's content
6. Set up and configure your website's technical aspects
7. Test and launch your website
8. Maintain and update your website's content and functionality
9. Monitor and analyze your website's performance
10. Continuously improve and update your website to keep up with changing technology and user preferences.

Each of these steps involves a significant amount of planning, design, development, and maintenance work, and cannot be 

In [238]:
!pip install langchain --quiet


In [274]:
from langchain.llms.sagemaker_endpoint import LLMContentHandler
class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"
    
    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        input_str = json.dumps({
            'inputs': prompt, 
            'parameters': model_kwargs
        })
        return input_str.encode('utf-8')
      
    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generation"]["content"]

In [275]:
content_handler = ContentHandler()

In [276]:
aws_region = boto3.Session().region_name

In [277]:
parameters = {
    "max_new_tokens": 256, 
    "top_p": 0.9, 
    "temperature": 0.6,
}  

In [278]:
from langchain import PromptTemplate, SagemakerEndpoint

In [279]:
llm = SagemakerEndpoint(
    endpoint_name=endpoint_name,
    region_name=aws_region,
    model_kwargs=parameters,
    endpoint_kwargs={"CustomAttributes": "accept_eula=true"},
    content_handler=content_handler,
)

In [280]:
llm('who is the president of usa?')

ValueError: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (424) from primary with message "{
  "code":424,
  "message":"prediction failure",
  "error":"string indices must be integers"
}". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/jumpstart-dft-meta-textgeneration-llama-2-7b-f in account 677146750822 for more information.
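The LangChain call fails with the same 424 "string indices must be integers" error seen earlier with the bare-string payload, and `ContentHandler.transform_input` also sends the prompt as a bare string in `inputs`, whereas the requests that succeeded in this notebook sent a list of dialogs. A likely fix, untested against this endpoint, is to wrap the prompt as a single-turn dialog inside a list. The sketch writes the method body as a plain function so it can be tried without LangChain.

```python
import json
from typing import Any, Dict


def transform_input(prompt: str, model_kwargs: Dict[str, Any]) -> bytes:
    """Suggested body for ContentHandler.transform_input.

    Wraps the prompt as one single-turn user dialog inside a list, matching
    the payload shape that succeeded with invoke_endpoint in this notebook.
    """
    payload = {
        "inputs": [[{"role": "user", "content": prompt}]],
        "parameters": model_kwargs,
    }
    return json.dumps(payload).encode("utf-8")


print(transform_input("who is the president of usa?", {"max_new_tokens": 256}).decode("utf-8"))
```

To apply it, replace `'inputs': prompt` in `ContentHandler.transform_input` with `'inputs': [[{"role": "user", "content": prompt}]]`; `transform_output` can stay as it is.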