# SageMaker Code Generation with Code-Llama

#### Import sys and install the required libraries: LangChain, ChromaDB (the vector database that stores our embeddings), and boto3 (the AWS SDK for Python)

In [3]:
import sys
!{sys.executable} -m pip install langchain
!{sys.executable} -m pip install chromadb
!{sys.executable} -m pip install --upgrade boto3



#### Import the remaining libraries, including LangChain's document loader and the recursive character text splitter used to split the context code into chunks

In [4]:
import argparse
import os
from langchain.document_loaders import DirectoryLoader
import chromadb
import json
import boto3
import time
import glob
from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,
    Language,
)
import ast
import sys

### Deploy the Code Llama 7B model


In [5]:
model_id = "meta-textgeneration-llama-codellama-7b"

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id=model_id)
predictor = model.deploy()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
-------------------!

In [8]:
# Get the name of the endpoint
# (Predictor.endpoint was renamed to Predictor.endpoint_name in sagemaker>=2)
endpoint_name = predictor.endpoint_name

print(endpoint_name)



meta-textgeneration-llama-codellama-7b-2023-10-19-01-05-43-652


In [9]:
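def query_endpoint(payload):
    # 'sagemaker-runtime' is the data-plane client used to invoke endpoints
    client = boto3.client('sagemaker-runtime')
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(payload).encode('utf-8'),
        # Llama-based JumpStart models require accept_eula=true as a custom attribute
        CustomAttributes="accept_eula=true",
    )
    # The body is a stream containing the JSON-encoded generation
    response = response["Body"].read().decode("utf8")
    response = json.loads(response)
    return response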

### Supported parameters

***
This model supports the following inference parameters:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.
* **stop:** If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.

You can specify any subset of these parameters when invoking the endpoint. The examples that follow show how to invoke the endpoint with some of them.
***
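Because a malformed parameter only surfaces as an error from the endpoint, it can help to check the payload locally first. The sketch below encodes the rules listed above in a hypothetical `build_payload` helper (not part of the SageMaker SDK); it assumes `num_return_sequences` defaults to 1 when omitted and treats the `top_p` bounds as inclusive, which the endpoint may handle differently.

```python
from typing import Any, Dict


def build_payload(prompt: str, **params: Any) -> Dict[str, Any]:
    """Hypothetical helper: validate parameters against the rules listed above."""

    def is_int(value: Any) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)

    def is_number(value: Any) -> bool:
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    for name in ("max_length", "max_new_tokens", "top_k"):
        if name in params and not (is_int(params[name]) and params[name] > 0):
            raise ValueError(f"{name} must be a positive integer")

    if "no_repeat_ngram_size" in params:
        value = params["no_repeat_ngram_size"]
        if not (is_int(value) and value > 1):
            raise ValueError("no_repeat_ngram_size must be an integer greater than 1")

    if "num_beams" in params:
        beams = params["num_beams"]
        # Assumption: num_return_sequences defaults to 1 when not given
        required = params.get("num_return_sequences", 1)
        if not (is_int(beams) and beams >= required):
            raise ValueError("num_beams must be an integer >= num_return_sequences")

    if "temperature" in params:
        temperature = params["temperature"]
        if not (is_number(temperature) and temperature > 0):
            raise ValueError("temperature must be a positive float")

    if "top_p" in params:
        top_p = params["top_p"]
        # Assumption: both bounds are accepted
        if not (is_number(top_p) and 0 <= top_p <= 1):
            raise ValueError("top_p must be a float between 0 and 1")

    for name in ("early_stopping", "do_sample", "return_full_text"):
        if name in params and not isinstance(params[name], bool):
            raise ValueError(f"{name} must be a boolean")

    if "stop" in params:
        stop = params["stop"]
        if not (isinstance(stop, list) and all(isinstance(s, str) for s in stop)):
            raise ValueError("stop must be a list of strings")

    return {"inputs": prompt, "parameters": dict(params)}
```

The dictionary it returns has the same shape as the payloads passed to `query_endpoint` in this notebook, for example `build_payload(prompt, max_new_tokens=256, temperature=0.2, top_p=0.9)`.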

## Code completion without context
***
This section demonstrates code completion, where the expected endpoint response is the natural continuation of the prompt. No context is provided to the model. As the output below shows, the model hallucinates when continuing the code: it invents calls such as `sagemaker.SageMakerClient()` that are not part of the SageMaker Python SDK, likely because it has little or no knowledge of the library used in the test.
***

In [10]:
def print_completion(prompt: str, response: dict) -> None:
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Input{unbold}\n{prompt}{bold}\n> Output{unbold}\n{response['generated_text']}\n")

In [16]:
%%time

prompt = """\
import sagemaker

# Create an HTML page about Amazon SageMaker
html_content = f'''
<!DOCTYPE html>
<html>
<head>
    <title>Amazon SageMaker</title>
</head>
<body>
    <h1>Welcome to Amazon SageMaker</h1>
    <p>Amazon SageMaker is a fully managed service for building, training, and deploying machine learning models.</p>
    <h2>Key Features</h2>
    <ul>
        <li>Easy to use</li>
        <li>Scalable</li>
        <li>End-to-end machine learning workflow</li>
    </ul>
    <p>Get started with SageMaker today and unlock the power of machine learning!</p>
</body>
</html>
'''

html_content
"""

payload = {"inputs": prompt, "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9}}
response = query_endpoint(payload)
print_completion(prompt, response)

[1m> Input[0m
import sagemaker

# Create an HTML page about Amazon SageMaker
html_content = f'''
<!DOCTYPE html>
<html>
<head>
    <title>Amazon SageMaker</title>
</head>
<body>
    <h1>Welcome to Amazon SageMaker</h1>
    <p>Amazon SageMaker is a fully managed service for building, training, and deploying machine learning models.</p>
    <h2>Key Features</h2>
    <ul>
        <li>Easy to use</li>
        <li>Scalable</li>
        <li>End-to-end machine learning workflow</li>
    </ul>
    <p>Get started with SageMaker today and unlock the power of machine learning!</p>
</body>
</html>
'''

html_content

> Output

# Create a SageMaker client
sagemaker_client = sagemaker.SageMakerClient()

# Create a SageMaker role
role = sagemaker.get_execution_role()

# Create a SageMaker notebook instance
sagemaker_notebook_instance = sagemaker.notebook_instance.NotebookInstance(
    role=role,
    instance_type='ml.t2.medium',
    instance_count=1,
    volume_size_in_gb=5,
    volume_kms_k

## Code generation with context using RAG

This section demonstrates code generation with context supplied through retrieval-augmented generation (RAG): relevant code from the local `sagemaker` directory is retrieved and added to the prompt to help the model produce a more accurate result.
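Conceptually, retrieval embeds the request, ranks the stored code chunks by how similar their embeddings are to it, and passes the best matches to the model as context. The dependency-free sketch below illustrates only that ranking step, using cosine similarity over toy 2-dimensional vectors; the function names are hypothetical. It is not how Chroma works internally: Chroma searches an approximate nearest-neighbour (HNSW) index and, unless a collection is configured otherwise, ranks by L2 distance rather than cosine similarity.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_documents(
    query: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings"; real Titan text embeddings are much longer vectors
store = [
    ("def deploy(): ...", [1.0, 0.0]),
    ("def train(): ...", [0.0, 1.0]),
    ("def deploy_and_train(): ...", [0.7, 0.7]),
]
print(top_k_documents([1.0, 0.1], store, k=2))
# ['def deploy(): ...', 'def deploy_and_train(): ...']
```

In the notebook, `n_results=3` in the Chroma query plays the role of `k`.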

#### Helper variables: the Chroma database directory, the root directory of the context code, the embedding model, and the Chroma client

In [23]:
txtdir = "sagemaker"
chrdir = "chroma"
embedding_model = 'amazon.titan-embed-text-v1'
chroma_client = chromadb.PersistentClient(chrdir)

#### Helper functions: listing the classes in a file, creating embeddings, retrieving context, formatting prompts, and printing results

In [24]:
from typing import Dict, List

def class_list(filename: str):
    with open(filename, "r") as f:
        file_raw = f.read()

    # Convert the loaded file into an Abstract Syntax Tree
    file_ast = ast.parse(file_raw)
    cnames = []

    # Walk every node in the tree and collect class names
    for node in ast.walk(file_ast):
        if isinstance(node, ast.ClassDef):
            cnames.append(node.name)

    return cnames

def get_embedding(text, modelId, client):
    accept = 'application/json'
    contentType = 'application/json'
    inp = json.dumps({"inputText": text})
    response = client.invoke_model(body=inp, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    embedding = response_body.get('embedding')
    return embedding

def get_context(prompt, q_filter=None):
    print("Creating embedding for question")

    # Embeddings are generated through the Bedrock runtime client ('bedrock-runtime');
    # the 'bedrock' client only exposes control-plane operations
    bedrock_runtime = boto3.client(
        service_name='bedrock-runtime',
        region_name='us-east-1'
    )

    collection = chroma_client.get_collection(name="pyrag")
    embedding = get_embedding(prompt, embedding_model, bedrock_runtime)
    if q_filter is None:
        q_embed = collection.query(query_embeddings=[embedding], n_results=3)
    else:
        q_embed = collection.query(query_embeddings=[embedding], n_results=3, where=q_filter)
    # Results are grouped per query embedding; there is only one query here
    context_docs = q_embed['documents'][0]
    print(f"Found {len(context_docs)} context docs")
    context = "\n".join(context_docs)

    return context


def format_instructions(instructions: List[Dict[str, str]]) -> str:
    """Format instructions for CodeLlama.
    
    The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and 
    alternating (u/a/u/a/u...). The last message must be from 'user'.
    """
    prompt: List[str] = []

    if instructions[0]["role"] == "system":
        content = "".join(["<<SYS>>\n", instructions[0]["content"], "\n<</SYS>>\n\n", instructions[1]["content"]])
        instructions = [{"role": instructions[1]["role"], "content": content}] + instructions[2:]

    for user, answer in zip(instructions[::2], instructions[1::2]):
        prompt.extend(["<s>", "[INST] ", (user["content"]).strip(), " [/INST] ", (answer["content"]).strip(), "</s>"])

    prompt.extend(["<s>", "[INST] ", (instructions[-1]["content"]).strip(), " [/INST] "])

    return "".join(prompt)
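The docstring of `format_instructions` states an ordering rule (an optional 'system' message first, then 'user' and 'assistant' alternating, ending with 'user'), but the function does not check it. The sketch below is a hypothetical `validate_roles` helper that could be called before formatting; it only inspects the `role` key and raises `ValueError` on the first violation.

```python
from typing import Dict, List


def validate_roles(instructions: List[Dict[str, str]]) -> None:
    """Hypothetical check for the role ordering described in format_instructions."""
    if not instructions:
        raise ValueError("at least one message is required")

    messages = instructions
    if messages[0]["role"] == "system":
        messages = messages[1:]
        if not messages:
            raise ValueError("a 'system' message must be followed by a 'user' message")

    for index, message in enumerate(messages):
        # After the optional system message, even positions are 'user', odd are 'assistant'
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {index} should have role '{expected}', got '{message['role']}'")

    if messages[-1]["role"] != "user":
        raise ValueError("the last message must come from 'user'")
```

For the single-message lists used in this notebook the check passes, and `format_instructions` then returns `"<s>[INST] " + content.strip() + " [/INST] "`.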


def print_instructions(prompt: str, response: dict) -> None:
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Output{unbold}\n{response['generated_text']}\n")
    # print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response['generated_text']}\n")

#### Create embeddings of all context code

In [37]:
print(f"Splitting python files in {txtdir}")
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=5000, chunk_overlap=0
)
texts = []
metadatas = []
txtdir_len = len(txtdir) + 1
for filename in glob.iglob(os.path.join(txtdir, '**/*.py'), recursive=True):
    sub_dir, sub_file = os.path.split(filename[txtdir_len:])
    mname = sub_file[:-3]
    parent_module = sub_dir.split("/")[-1]
    with open(filename, 'r') as IF:
        doc_lines = IF.readlines()
        doc_text = "".join(doc_lines)
    texts.append(doc_text)
    cnames = class_list(filename)
    if len(cnames) > 0:
        # Use distinct keys: a repeated 'module' key would silently overwrite mname
        metadatas.append({'module': mname, 'parent_module': parent_module, 'class': cnames[0]})
    else:
        metadatas.append({'module': mname, 'parent_module': parent_module})

python_docs = python_splitter.create_documents(texts, metadatas)
print(f"Creating embeddings")
os.makedirs(chrdir, exist_ok=True)
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

collection = chroma_client.get_or_create_collection(name="pyrag")
cnt = 0
for t in python_docs:
    embedding = get_embedding(t.page_content, embedding_model, bedrock_runtime)
    collection.add(
        embeddings=embedding,
        documents=t.page_content,
        ids=f"id{cnt}",
        metadatas=t.metadata
    )
    cnt = cnt + 1
    time.sleep(1)
print(f"Embeddings created")
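The loop above derives each chunk's metadata from the file path relative to `txtdir`: the file name without `.py`, the name of its immediate parent directory, and the first class defined in the file. The sketch below isolates that logic in a hypothetical `derive_metadata` helper built on `os.path` instead of string slicing and splitting on `'/'`; the key names `module` and `parent_module` are illustrative, and keeping only the first class name mirrors the loop, since Chroma metadata values must be scalars.

```python
import os
from typing import Dict, List


def derive_metadata(path: str, root: str, class_names: List[str]) -> Dict[str, str]:
    """Hypothetical helper: build Chroma metadata for one source file under root."""
    relative = os.path.relpath(path, root)
    sub_dir, file_name = os.path.split(relative)
    module_name, _ = os.path.splitext(file_name)
    # Files directly under root have an empty sub_dir, so their parent is ""
    parent_module = os.path.basename(sub_dir)
    metadata = {"module": module_name, "parent_module": parent_module}
    if class_names:
        # Only the first class name is stored, as in the loop above
        metadata["class"] = class_names[0]
    return metadata


print(derive_metadata("sagemaker/jumpstart/model.py", "sagemaker", ["JumpStartModel"]))
# {'module': 'model', 'parent_module': 'jumpstart', 'class': 'JumpStartModel'}
```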


Splitting python files in sagemaker
Creating embeddings
Embeddings created


### Instruction Prompt with RAG

In this pattern, a descriptive natural-language instruction is passed to the model, which generates the code.
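Each retrieved chunk can be up to 5,000 characters long (the `chunk_size` passed to the splitter above), so three chunks plus the request can make the prompt long. One option, sketched here as an assumption rather than something this notebook does, is to cap the context at a character budget before filling the same prompt template used in the next cell. `build_rag_prompt` and `max_context_chars` are hypothetical names, and a token-based limit would track the model's context window more precisely.

```python
from typing import List


def build_rag_prompt(request: str, context_docs: List[str], max_context_chars: int = 8000) -> str:
    """Fill the RAG template, keeping whole documents until the character budget is spent."""
    kept: List[str] = []
    used = 0
    for doc in context_docs:
        # Every document after the first also costs one joining newline
        cost = len(doc) + (1 if kept else 0)
        if used + cost > max_context_chars:
            break
        kept.append(doc)
        used += cost
    context = "\n".join(kept)
    return (
        "Use the following pieces of related code to respond to the request.\n\n"
        f"{context}\n\n"
        f"Request: {request}\n"
    )
```

Because documents arrive in ranked order, dropping the tail keeps the most relevant chunks.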

In [38]:
prompt="""Write code that uses SageMaker to deploy a Code Llama model and evaluates its performance"""

context = get_context(prompt)


prompt_data = f"""Use the following pieces of related code to respond to the request.

{context}

Request: {prompt}
"""

instructions = [
    {
        "role": "user",
        "content": prompt_data,
    }
]

prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 1000, "temperature": 0.2, "top_p": 0.9}
}
response = query_endpoint(payload)
print_instructions(prompt, response)

Creating embedding for question
Found 3 context docs
[1m> Output[0m




































































































































































































































































































































































































































































































































































































































































































































































































































































































































































### Code Completion Prompt with RAG

This pattern demonstrates code completion, where the expected result is the natural continuation of the prompt. Context is provided to the model using RAG, with retrieval filtered by the modules imported in the prompt.

In [34]:
import ast
import re

def extract_imports_create_filter(prompt):

    # Extract complete import statements ("import x" or "from x import y"), one per line
    import_statements = re.findall(r"^\s*((?:from\s+\S+\s+)?import\s+.+)$", prompt, re.MULTILINE)
    modules = []

    for statement in import_statements:
        # Parse each statement separately so that one incomplete line
        # (e.g. "from peft import (") does not break the others
        try:
            statement_ast = ast.parse(statement.strip())
        except SyntaxError:
            continue

        # Walk every node in the tree
        for node in ast.walk(statement_ast):

            # If the node is 'import x', then extract the module names
            if isinstance(node, ast.Import):
                modules.extend([x.name for x in node.names])

            # If the node is 'from x import y', then extract the module name
            #   and check level so we can ignore relative imports
            if isinstance(node, ast.ImportFrom) and node.level == 0:
                modules.extend([f"{node.module}.{x.name}" for x in node.names])

    # Keep the last dotted component of each name; use a set to remove duplicates
    am = sorted(set(x.split(".")[-1] for x in modules))

    # Create query filter: include documents that have any relevant module in metadata field 'module'
    # https://docs.trychroma.com/usage-guide#using-where-filters
    module_filter = [{"module": {"$eq": module}} for module in am]
    if not module_filter:
        return None  # no imports found: query without a filter
    if len(module_filter) == 1:
        return module_filter[0]  # Chroma's "$or" requires at least two conditions
    return {"$or": module_filter}


In [35]:
prompt = """\
from datetime import datetime
import os
import sys

import torch
from peft import 
    LoraConfig,
    get_peft_model,
    get_peft_model_state_dict,
    prepare_model_for_int8_training,
    set_peft_model_state_dict,
\
"""

q_filter = extract_imports_create_filter(prompt)
context = get_context(prompt, q_filter)

prompt_data = f"""
Use the following pieces of related code to complete the code in the request. Provide only code completion, no explanation

{context}

Request: {prompt}
"""

instructions = [
    {
        "role": "user",
        "content": prompt_data,
    }
]

prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 1000, "temperature": 0.2, "top_p": 0.9}
}
response = query_endpoint(payload)
print_instructions(prompt, response)

Creating embedding for question
Found 0 context docs
[1m> Output[0m


































































































































































































































































































































































































































































































































































































































































































































































































































































































































































