# Basic Prompt Engineering

## Step 1. Prepare Large Language Model (LLM) and Embedding Model 
---

In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.append('../utils')
sys.path.append('../templates') 

In [2]:
import time
import sagemaker, boto3, json
import glob
import os
import pandas as pd
import requests
from sagemaker.session import Session
from sagemaker.model import Model
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
from typing import Any, Dict, List, Optional
from ssm import parameter_store
from termcolor import colored
from common import get_apigateway_url

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name

RESTAPI_ID, URL = get_apigateway_url()
print("RESTAPI_ID = ", RESTAPI_ID)
print("API GATEWAY URL = ", URL)

RESTAPI_ID =  a6uibyuabj
API GATEWAY URL =  https://a6uibyuabj.execute-api.us-west-2.amazonaws.com/api/


In [3]:
MODEL_NAME = "FALCON-40B" 
#MODEL_NAME = "LLAMA2-7B" 

LLM_INFO = {
    "LLAMA2-7B": f"{URL}llm/llama2_7b", # g5.12xlarge * 4ea
    "FALCON-40B": f"{URL}llm/falcon_40b",    # g5.48xlarge * 8ea 
    "KULLM-12-8B": f"{URL}llm/kkulm_12_8b", # g5.24xlarge * 4ea
}

LLM_URL = LLM_INFO[MODEL_NAME]
EMB_URL = f"{URL}emb/gptj_6b"              # g5.4xlarge * 4ea 

HEADERS = {    
    'Content-Type': 'application/json',
    'Accept': 'application/json',
}

if 'falcon_40b' in LLM_URL:
    LLM_RESPONSE_KEY = "generated_text"
else:
    LLM_RESPONSE_KEY = "generation"
    
print (f'MODEL_NAME: {MODEL_NAME}\nLLM_URL: {LLM_URL}')    

MODEL_NAME: FALCON-40B
LLM_URL: https://a6uibyuabj.execute-api.us-west-2.amazonaws.com/api/llm/falcon_40b


#### Generative Configuration
<img src="https://huggingface.co/blog/assets/02_how-to-generate/sampling_search_with_temp.png"/>

* **max_new_tokens**: The maximum number of tokens to generate. Default value is 20, max value is 512.
* repetition_penalty: Controls the likelihood of repetition. Default value is null.
* seed: The seed to use for random generation. Default value is null.
* stop: A list of stop tokens. Generation stops as soon as one of them is generated.
* **do_sample**: Whether to use sampling; greedy decoding is used otherwise. Default value is false.
* **top_k**: The number of highest-probability vocabulary tokens to keep for top-k filtering. Default value is null, which disables top-k filtering.
* **top_p**: The cumulative probability threshold for nucleus sampling: only the smallest set of highest-probability tokens whose probabilities add up to at least top_p is kept. Default value is null.
* **temperature**: Controls randomness in the model. Lower values make the model more deterministic and higher values make it more random. Default value is 1.0.
* best_of: Generate best_of sequences and return the one with the highest token log probabilities. Default value is null.
* details: Whether to return details about the generation. Default value is false.
* return_full_text: Whether to return the full text (prompt plus generation) or only the generated part. Default value is false.
* truncate: Whether to truncate the input to the maximum length of the model. Default value is true.
* typical_p: The typical probability of a token. Default value is null.
* watermark: Whether to watermark the generated text. Default value is false.
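
To make the interplay of `do_sample`, `temperature`, `top_k` and `top_p` concrete, here is a minimal, standard-library-only sketch that applies them to a toy list of logits. The function names and the toy logits are illustrative assumptions, and serving stacks may order or renormalize these filters differently, so treat it as an intuition aid rather than the endpoint's actual implementation.

```python
import math
import random


def sampling_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return token probabilities after temperature, top-k and top-p filtering."""
    # Temperature rescales the logits before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # Renormalize over the surviving tokens; every other token gets 0.
    mass = sum(probs[i] for i in order)
    filtered = [0.0] * len(probs)
    for i in order:
        filtered[i] = probs[i] / mass
    return filtered


def generate_token(logits, do_sample=False, seed=None, **filters):
    """Pick one token index: greedy unless do_sample is set."""
    if not do_sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = sampling_distribution(logits, **filters)
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
print(sampling_distribution(toy_logits, temperature=0.2))
print(sampling_distribution(toy_logits, top_k=2))
print(sampling_distribution(toy_logits, top_p=0.9))
print(generate_token(toy_logits))
```

A low temperature concentrates almost all probability on the top token, while `top_k` and `top_p` zero out the tail before sampling.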

In [11]:
PARAMS = {
    "LLAMA2-7B": {
        'max_new_tokens': 128,
        'top_p': 0.9,
        'temperature': 0.1,
        'return_full_text': False
    },    
    "FALCON-40B": {
        "max_new_tokens": 128,
        "max_length": 256,
        "top_p": 0.95,
        "do_sample": True,
        "temperature": 0.2,
        "return_full_text": False,
        "include_prompt_in_result": False
    } 
}
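
The request cells in this notebook send a JSON body with an `inputs` string and a `parameters` dictionary. As a small convenience, the sketch below builds such a payload from a base parameter set plus per-call overrides without mutating the shared dictionary. `build_payload` and `example_params` are hypothetical names introduced for illustration only.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    base_params: Dict[str, Any],
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a request body shaped like the ones posted in this notebook."""
    # Copy first so the shared defaults are never modified in place.
    parameters = dict(base_params)
    parameters.update(overrides or {})
    return {"inputs": prompt, "parameters": parameters}


# Hypothetical defaults mirroring the LLAMA2-7B entry above.
example_params = {
    "max_new_tokens": 128,
    "top_p": 0.9,
    "temperature": 0.1,
    "return_full_text": False,
}

payload = build_payload("Generative AI is", example_params, {"max_new_tokens": 64})
print(payload)
```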

<br>

## Step 2. Ask the LLM a question without RAG
---
<img src="../images/RAG-Page-1.png"/>

### Simple prompt engineering

In [19]:
from lib_en import Llama2ContentHandlerAmazonAPIGateway, FalconContentHandlerAmazonAPIGateway
from langchain.llms import AmazonAPIGateway

llm = AmazonAPIGateway(api_url=LLM_URL, headers=HEADERS)
if MODEL_NAME == "FALCON-40B":
    llm.content_handler = FalconContentHandlerAmazonAPIGateway()
elif MODEL_NAME in ["LLAMA2-7B", "LLAMA2-13B"]:
    llm.content_handler = Llama2ContentHandlerAmazonAPIGateway()
params = PARAMS[MODEL_NAME]
llm.model_kwargs = params

In [10]:
%%time

payload = {
    "inputs": "Generative AI is",
    "parameters": params
}
response = requests.post(url=LLM_URL, headers=HEADERS, json=payload)
print(response.json()[0][LLM_RESPONSE_KEY])

Generative AI is a type of AI that can create new content or data based on existing patterns or inputs. It can be used in various fields such as text generation, image generation, and music generation. Generative AI can be used to create personalized content, generate new ideas, and improve efficiency in various industries.
CPU times: user 15.3 ms, sys: 3.17 ms, total: 18.5 ms
Wall time: 3.22 s
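
The cell above indexes the response directly with `response.json()[0][LLM_RESPONSE_KEY]`, which raises an unhelpful error if the endpoint returns an error object instead. A slightly more defensive sketch is shown below. It assumes the body is a list whose first element is a dictionary, which is the shape implied by the indexing in this notebook, and `extract_generated_text` is a hypothetical helper name.

```python
from typing import Any


def extract_generated_text(body: Any, response_key: str) -> str:
    """Return the generated text from a decoded JSON response body."""
    # Assumed shape (from the cells in this notebook): [{response_key: "..."}]
    if not isinstance(body, list) or not body:
        raise ValueError(f"Unexpected response body: {body!r}")
    first = body[0]
    if not isinstance(first, dict) or response_key not in first:
        raise KeyError(f"{response_key!r} not found in response: {first!r}")
    return first[response_key]


# Hand-written stand-in for response.json(); not real endpoint output.
sample_body = [{"generated_text": " a type of AI."}]
print(extract_generated_text(sample_body, "generated_text"))
```

In the notebook this would be called as `extract_generated_text(response.json(), LLM_RESPONSE_KEY)`.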


In [14]:
%%time

payload = {
    "inputs": """A brief email message of Amazon SageMaker's main features

Hi everyone,

We are announcing""",
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}
response = requests.post(url=LLM_URL, headers=HEADERS, json=payload)
print(response.json()[0][LLM_RESPONSE_KEY])

 the launch of Amazon SageMaker, a fully managed platform that enables developers and data scientists to build, train, and deploy machine learning models quickly. Amazon SageMaker removes the heavy lifting from machine learning so you can focus on building the best models for your use case.

Here are some of the main features of Amazon
CPU times: user 18.2 ms, sys: 37 µs, total: 18.3 ms
Wall time: 3.43 s


### More complex prompts: Play the role of AWS SA


In [15]:
architect_prompt_template = """
Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
{requirements}
#Scale:
{scale}
#Features:
{features}

Describe an architecture on AWS in technical detail with sentences.
"""
prompt = architect_prompt_template.format(
    requirements="A website for computer advertising", 
    scale="Must handle 10k requests per second in peak. Must be globally available. Must be reponsive and fast", 
    features="Landing page describing our product. About page describing the company."
)

In [16]:
payload = {
    'inputs': prompt,
    'parameters': params
}
print(colored(prompt, 'green'))
response = requests.post(url=LLM_URL, headers=HEADERS, json=payload)
print(response.json()[0][LLM_RESPONSE_KEY])

Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
A website for computer advertising
#Scale:
Must handle 10k requests per second in peak. Must be globally available. Must be reponsive and fast
#Features:
Landing page describing our product. About page describing the company.

Describe an architecture on AWS in technical detail with sentences.

Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
A website for computer advertising
#Scale:
Must handle 10k requests per second in peak. Must be globally available. Must be repon

### Applying LangChain
<img src="../images/RAG-Page-2.png"/>

In [20]:
llm.model_kwargs = params
print(llm(prompt))

The architecture on AWS would consist of the following components:

1. Elastic Load Balancer (ELB) - This would distribute incoming traffic across multiple instances of the application.

2. Auto Scaling Group - This would automatically scale up or down the number of instances based on the load on the application.

3. EC2 instances - These would run the application and handle the incoming requests.

4. RDS (Relational Database Service) - This would store the data for the application.

5. S3 (Simple Storage Service) - This would store


In [21]:
from langchain.prompts import PromptTemplate

# Wrap the format string in a PromptTemplate that exposes its placeholders as named input variables
prompt = PromptTemplate(
    input_variables=["requirements", "scale", "features"],
    template=architect_prompt_template,
)

final_prompt = prompt.format(
    requirements="External facing web application written in Javascript, global deployment",
    scale="Average of 500 requests per minute, scale events up to 3000 requests per second",
    features="Mobile website, desktop version, javascript"
)
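
To see what declaring `input_variables` buys over a bare format string, the following standard-library sketch extracts the placeholders from a template and rejects missing or unexpected values before formatting. It is not LangChain's implementation. `template_fields` and `format_checked` are hypothetical helpers that only illustrate the idea.

```python
from string import Formatter


def template_fields(template: str) -> set:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def format_checked(template: str, **values) -> str:
    """Format the template, failing early on missing or unexpected variables."""
    expected = template_fields(template)
    provided = set(values)
    missing = expected - provided
    unexpected = provided - expected
    if missing or unexpected:
        raise KeyError(f"missing={sorted(missing)}, unexpected={sorted(unexpected)}")
    return template.format(**values)


demo_template = "List {number} topics to write on blog posts about {topic}"
print(sorted(template_fields(demo_template)))
print(format_checked(demo_template, number=5, topic="Machine Learning"))
```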

In [22]:
print(colored(final_prompt, 'green'))
print(llm(final_prompt))

Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
External facing web application written in Javascript, global deployment
#Scale:
Average of 500 requests per minute, scale events up to 3000 requests per second
#Features:
Mobile website, desktop version, javascript

Describe an architecture on AWS in technical detail with sentences.
The architecture on AWS would consist of the following components:

1. Elastic Load Balancer (ELB) - This would distribute incoming traffic across multiple instances of the web application.

2. Auto Scaling Group - This would automatically scale up or down the number of instances based on the load on the application.

3. EC2 instances - These would run the web application and handle incoming requests.

4. Elastic Block

In [24]:
topic_recommender_prompt = "List {number} topics to write on blog posts about {topic}"

recommend_topic_prompt = PromptTemplate(
    input_variables=['topic', 'number'],
    template=topic_recommender_prompt    
)

final_prompt = recommend_topic_prompt.format(topic="Machine Learning", number=5)
print(colored(final_prompt, 'green'))
print(llm(final_prompt))

List 5 topics to write on blog posts about Machine Learning

1. Introduction to Machine Learning
2. Supervised Learning
3. Unsupervised Learning
4. Reinforcement Learning
5. Deep Learning


In [25]:
from langchain.output_parsers import CommaSeparatedListOutputParser
parsed_recommender_prompt = topic_recommender_prompt + "\n{format_instructions}"

parser = CommaSeparatedListOutputParser()

parsed_recommender_template = PromptTemplate(
    template=parsed_recommender_prompt,
    input_variables=['topic', 'number'],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

In [27]:
gen_prompt = parsed_recommender_template.format(topic='Generative AI', number=5)

In [28]:
print(colored(gen_prompt, 'green'))
output = llm(gen_prompt)
print(output)

List 5 topics to write on blog posts about Generative AI
Your response should be a list of comma separated values, eg: `foo, bar, baz`

1. How Generative AI can help in generating creative content
2. How Generative AI can be used in the healthcare industry
3. How Generative AI can be used in the education industry
4. How Generative AI can be used in the finance industry
5. How Generative AI can be used in the entertainment industry


In [29]:
parser.parse(output)

['1. How Generative AI can help in generating creative content\n2. How Generative AI can be used in the healthcare industry\n3. How Generative AI can be used in the education industry\n4. How Generative AI can be used in the finance industry\n5. How Generative AI can be used in the entertainment industry']
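
The parser returns a single-element list here because the model ignored the format instructions and answered with a numbered list rather than comma separated values. One possible workaround, sketched below with the standard library only, is to try numbered-line parsing first and fall back to splitting on commas. `parse_list_output` is a hypothetical helper, not part of LangChain.

```python
import re
from typing import List

# Matches lines such as "1. Item" or "2) Item" and captures the item text.
_NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s+(.*\S)\s*$")


def parse_list_output(text: str) -> List[str]:
    """Parse a numbered list if every non-empty line is numbered, else split on commas."""
    lines = [line for line in text.splitlines() if line.strip()]
    matches = [_NUMBERED_ITEM.match(line) for line in lines]
    if lines and all(matches):
        return [m.group(1) for m in matches]
    return [part.strip() for part in text.split(",") if part.strip()]


numbered_output = "\n1. Supervised Learning\n2. Unsupervised Learning\n3. Deep Learning"
print(parse_list_output(numbered_output))
print(parse_list_output("foo, bar, baz"))
```

Checking for numbered lines first keeps items that themselves contain commas intact.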