# Deploying Granite Code models in Amazon SageMaker

## Introduction to Granite Code Models

We introduce the Granite series of decoder-only code models for generative code tasks (e.g., fixing bugs, explaining code, documenting code), trained on code written in 116 programming languages. A comprehensive evaluation of the Granite Code model family on diverse tasks demonstrates that our models consistently reach state-of-the-art performance among available open source code LLMs.

The key advantages of Granite Code models include:

- **All-rounder Code LLM**: Granite Code models achieve competitive or state-of-the-art performance on different kinds of code-related tasks, including code generation, explanation, fixing, editing, translation, and more, demonstrating their ability to solve diverse coding tasks.
- **Trustworthy Enterprise-Grade LLM**: All our models are trained on license-permissible data collected following IBM's AI Ethics principles and guided by IBM’s Corporate Legal team for trustworthy enterprise usage. We release all our Granite Code models under an Apache 2.0 license for research and commercial use.

The family of Granite Code Models comes in two main variants:

- **Granite Code Base Models**: Base foundational models designed for code-related tasks (e.g., code repair, code explanation, code synthesis).
- **Granite Code Instruct Models**: Instruction-following models finetuned using a combination of Git commits paired with human instructions and open source synthetically generated code instruction datasets.

Both base and instruct models are available in sizes of 3B, 8B, 20B, and 34B parameters.

IBM has released the Granite Code models as open source under the permissive Apache 2.0 license, enabling their use for both research and commercial purposes. The models are available on Amazon SageMaker JumpStart, the AWS Marketplace, and on [Hugging Face](https://huggingface.co/ibm-granite).

In this notebook, we deploy a Granite Code model on Amazon SageMaker to accelerate legacy code conversion and modernization use cases.

## Pre-requisites

- Before running this notebook, make sure you obtained it from the model catalog in the SageMaker section of the AWS Management Console.
- *Note*: Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
- Ensure that the IAM role you use has the **AmazonSageMakerFullAccess** policy attached.

## Contents

1. **Deploying Granite Code models in Amazon SageMaker**
    - To subscribe to the model package
    - Select the model package
2. **Create an endpoint and perform real-time inference**
    - Define the endpoint configuration
    - Create the endpoint
3. **Run inference with the model**
    - Example 1: Code Generation
    - Example 2: Code Conversion
    - Example 3: Code Conversion (C to Java)
4. **Clean-up**
    - Delete the endpoint
    - Delete the model

## Usage Instructions

You can run this notebook one cell at a time by using **Shift+Enter** to run a cell.

## Deploying Granite Code models in Amazon SageMaker

### To subscribe to the model package:

1. Open the model package listing page [IBM Granite 20B Code Instruct - 8K](https://aws.amazon.com/marketplace/pp/prodview-ezh4cr7om23rm).
2. On the AWS Marketplace listing, click the **Continue to subscribe** button.
3. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
4. Click the **Continue to configuration** button and choose a region. A **Product Arn** is then displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.
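Before pasting an ARN into the map below, it can help to check that it has the expected shape. The following is a minimal sketch, assuming ARNs in the standard `aws` partition that look like the ones listed in this notebook; `parse_model_package_arn` is a hypothetical helper, not part of Boto3 or the SageMaker SDK.

```python
import re

# Assumed shape, based on the ARNs in this notebook:
# arn:aws:sagemaker:<region>:<12-digit account id>:model-package/<package name>
_MODEL_PACKAGE_ARN = re.compile(
    r"^arn:aws:sagemaker:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):"
    r"model-package/(?P<name>[A-Za-z0-9-]+)$"
)


def parse_model_package_arn(arn: str) -> dict:
    """Split a model package ARN into region, account, and package name."""
    match = _MODEL_PACKAGE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"Not a SageMaker model package ARN: {arn!r}")
    return match.groupdict()


example_arn = (
    "arn:aws:sagemaker:us-east-1:865070037744:model-package/"
    "granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd"
)
parts = parse_model_package_arn(example_arn)
print(parts["region"], parts["name"])
```

Comparing `parts["region"]` with the region of your SageMaker session is a quick way to catch an ARN copied for the wrong region.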

### 1. Select the model package

Confirm that you received this notebook from the model catalog in the SageMaker section of the AWS Management Console.

In [None]:
model_package_map = {
    "us-east-1": "arn:aws:sagemaker:us-east-1:865070037744:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "us-east-2": "arn:aws:sagemaker:us-east-2:057799348421:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "us-west-1": "arn:aws:sagemaker:us-west-1:382657785993:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "us-west-2": "arn:aws:sagemaker:us-west-2:594846645681:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "ca-central-1": "arn:aws:sagemaker:ca-central-1:470592106596:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "eu-central-1": "arn:aws:sagemaker:eu-central-1:446921602837:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "eu-west-1": "arn:aws:sagemaker:eu-west-1:985815980388:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "eu-west-2": "arn:aws:sagemaker:eu-west-2:856760150666:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "eu-west-3": "arn:aws:sagemaker:eu-west-3:843114510376:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "eu-north-1": "arn:aws:sagemaker:eu-north-1:136758871317:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "ap-southeast-1": "arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "ap-southeast-2": "arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "ap-northeast-2": "arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "ap-northeast-1": "arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "ap-south-1": "arn:aws:sagemaker:ap-south-1:077584701553:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd",
    "sa-east-1": "arn:aws:sagemaker:sa-east-1:270155090741:model-package/granite-20b-code-instruct-8k-0c483c60841136e9a384e4af2503f5dd"
}

In [None]:
!pip install --upgrade pip
!pip install -U sagemaker -q

In [None]:
import json
import pprint
from datetime import datetime

import boto3
import sagemaker
from sagemaker import ModelPackage, get_execution_role

In [None]:
sagemaker_session = sagemaker.Session()

try:
    execution_role_arn = sagemaker.get_execution_role()
except ValueError:
    execution_role_arn = None

if execution_role_arn is None:
    execution_role_arn = input("Enter your execution role ARN: ")

region = sagemaker_session.boto_region_name
runtime_sm_client = boto3.client("runtime.sagemaker")

print("execution_role_arn: ", execution_role_arn)
print("region: ", region)

In [None]:
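# Raising a plain string is a TypeError in Python 3, so raise a real exception.
if region not in model_package_map:
    raise ValueError(
        f"Unsupported region {region!r}. Supported regions: {sorted(model_package_map)}"
    )

model_package_arn = model_package_map[region]

print("model_package_arn: ", model_package_arn)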

## Create an endpoint and perform real-time inference

In this example, we deploy IBM Granite-20B-Code-Instruct-8K on an Amazon SageMaker real-time endpoint hosted on a GPU instance. For general information on real-time inference with Amazon SageMaker, refer to the SageMaker documentation.

This notebook provides a single sample configuration, defined in the cell below. Whether it suits you depends on your use case and the instance types available to you.

The endpoint configuration focuses on cost efficiency. It uses an ml.g5.12xlarge instance, which has four NVIDIA A10G GPUs with a combined 96 GB of GPU memory.

Granite-20B-Code-Instruct-8K is a 20B-parameter long-context instruct model fine-tuned from Granite-20B-Code-Base-8K on a combination of the permissively licensed data used to train the original Granite code instruct models and synthetically generated code instruction datasets tailored for solving long-context problems. By exposing the model to both short- and long-context data, we aim to enhance its long-context capability without sacrificing code generation performance at short input context.
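As a rough sanity check on the instance choice, the arithmetic below estimates the memory needed for the model weights alone. It is a back-of-the-envelope sketch that assumes 16-bit weights (2 bytes per parameter) and ignores the KV cache, activations, and runtime overhead, so the real footprint is higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Approximate memory for model weights only, in GB (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


weights_gb = weight_memory_gb(20e9)  # roughly 20B parameters, assumed 16-bit weights
total_gpu_gb = 4 * 24                # four A10G GPUs with 24 GB each
headroom_gb = total_gpu_gb - weights_gb

print(f"weights: ~{weights_gb:.0f} GB")
print(f"GPU memory: {total_gpu_gb} GB")
print(f"headroom for cache and overhead: ~{headroom_gb:.0f} GB")
```

Under these assumptions the weights take about 40 GB of the 96 GB available, which leaves room for the 8K-token context.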


### 2. Define the endpoint configuration

In [None]:
model_name = "granite-20b-code-instruct-8k"
inference_instance_type = "ml.g5.12xlarge"
model_download_timeout = 3600
health_check_timeout = 900
instance_count = 1

### 3. Create the endpoint

In [None]:
# create a deployable model from the model package.
model = ModelPackage(
    role=execution_role_arn, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

# create a unique endpoint name
timestamp = "{:%Y-%m-%d-%H-%M-%S}".format(datetime.now())
endpoint_name = f"{model_name}-{timestamp}"
print(f"Deploying endpoint {endpoint_name}")
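The timestamp suffix keeps endpoint names unique across runs. The sketch below wraps the same naming scheme in a function and checks the result against the naming constraints as we understand them (at most 63 characters, alphanumerics and hyphens, no leading or trailing hyphen). Treat those constraints as an assumption and confirm them in the SageMaker API reference. `make_endpoint_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed constraint: alphanumerics separated by hyphens, 63 characters at most.
_ENDPOINT_NAME = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")
_MAX_LENGTH = 63


def make_endpoint_name(model_name: str, now: datetime) -> str:
    """Build '<model_name>-<timestamp>' and validate it before use."""
    name = f"{model_name}-{now:%Y-%m-%d-%H-%M-%S}"
    if len(name) > _MAX_LENGTH or not _ENDPOINT_NAME.match(name):
        raise ValueError(f"Invalid endpoint name: {name!r}")
    return name


demo_name = make_endpoint_name(
    "granite-20b-code-instruct-8k", datetime(2024, 1, 2, 3, 4, 5)
)
print(demo_name, len(demo_name))
```

Validating the name locally surfaces a bad `model_name` before the deployment request is sent.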

In [None]:
# deploy the model
deployed_model = model.deploy(
    initial_instance_count=instance_count,
    instance_type=inference_instance_type,
    endpoint_name=endpoint_name,
    model_data_download_timeout=model_download_timeout,
    container_startup_health_check_timeout=health_check_timeout,
)


If you have already deployed the model, you can also connect to it through its `endpoint_name` and your `sagemaker_session`:

In [None]:
deployed_model = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session,
)

SageMaker now creates the endpoint and deploys the model to it. This can take 10-15 minutes. Once the endpoint is in service, you can perform real-time inference.
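If you connect to the endpoint from a different session, you may want to confirm that it is in service before sending requests. The following is a minimal polling sketch built on the `describe_endpoint` call of the Boto3 SageMaker client. The function name, poll interval, and timeout are illustrative assumptions.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll the endpoint status until it is InService, has failed, or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} is still {status}")
        time.sleep(poll_seconds)
```

In the notebook, this could be called as `wait_until_in_service(boto3.client("sagemaker"), endpoint_name)`.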

## Run inference with the model

Now that the Granite Code model is deployed to a SageMaker endpoint, we can start generating or converting code. We use the `predict` method of the predictor to run inference on the endpoint. Generation can be steered with different parameters, which are defined in the `parameters` attribute of the payload.
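The examples in this notebook all send the same parameter values, so a small helper can keep them in one place. This is a sketch only: `build_payload` and `DEFAULT_PARAMETERS` are hypothetical names, and the defaults are copied from the payload cells of this notebook.

```python
import copy
import json

# Values taken from the payload cells in this notebook.
DEFAULT_PARAMETERS = {
    "do_sample": True,
    "top_p": 0.6,
    "temperature": 0.1,
    "top_k": 50,
    "max_new_tokens": 1000,
    "repetition_penalty": 1.03,
    "stop": ["<end of code>"],
}


def build_payload(prompt: str, **overrides) -> dict:
    """Return a request payload, with any keyword overrides applied to the defaults."""
    parameters = copy.deepcopy(DEFAULT_PARAMETERS)
    parameters.update(overrides)
    return {"inputs": prompt, "parameters": parameters}


demo_payload = build_payload("Question:\n# ...\n\nAnswer:", max_new_tokens=256)
demo_body = json.dumps(demo_payload)  # JSON string to pass as the request data
print(demo_payload["parameters"]["max_new_tokens"])
```

Copying the defaults before updating them means one request cannot change the values used by the next.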

### 4. Example 1: Code Generation

In this example, we want to write a function in the Python programming language that reverses a string.

In [None]:
prompt_1 = """Using the directions below, generate Python code for the specified task.

Question:
# Write a Python function that prints 'Hello World!' string 'n' times.

Answer:
def print_n_times(n):
    for i in range(n):
        print("Hello World!")

<end of code>

Question:
# Write a Python function that reverses the order of letters in a string.
# The function named 'reversed' takes the argument 'my_string', which is a string. It returns the string in reverse order.

Answer:"""

In [None]:
# hyperparameters for llm
payload_1 = {
    "inputs": prompt_1,
    "parameters": {
        "do_sample": True,
        "top_p": 0.6,
        "temperature": 0.1,
        "top_k": 50,
        "max_new_tokens": 1000,
        "repetition_penalty": 1.03,
        "stop": ["<end of code>"],
    },
}

# send request to endpoint
response_1 = deployed_model.predict(
    data=json.dumps(payload_1),
    initial_args={"Accept": "application/json", "ContentType": "application/json"},
).decode("utf-8")

generated_text_1 = json.loads(response_1)["generated_text"]
print(generated_text_1)


The output contains Python code similar to the following snippet:

```python
def reverse_string(my_string):
    return my_string[::-1]
```

Be sure to test the generated code to verify that it works as you expect.

For example, if the generated function is named `reversed` as the prompt requests, running `reversed("good morning")` returns `gninrom doog`.
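One way to run such a check is sketched below. It assumes the generated text may end with the `<end of code>` marker and uses a hypothetical `sample_output` string in place of `generated_text_1`. Executing model output with `exec` is only appropriate for code you have reviewed, in an environment you are willing to expose to it.

```python
def strip_stop_sequence(text: str, stop: str = "<end of code>") -> str:
    """Drop the stop marker and anything after it, if the marker is present."""
    return text.split(stop, 1)[0].strip()


# Hypothetical model output, standing in for generated_text_1.
sample_output = '''
def reversed(my_string):
    return my_string[::-1]

<end of code>
'''

namespace = {}
exec(strip_stop_sequence(sample_output), namespace)  # review the code before running it
result = namespace["reversed"]("good morning")
print(result)
```

Running the code in its own `namespace` dictionary keeps the generated `reversed` function from shadowing Python's built-in `reversed` in the notebook.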

### 5. Example 2: Code Conversion

In this example, we want to convert code from one programming language to another. The prompt below converts a code snippet from C++ to Python.

In [None]:
# Raw string so the "\n" escapes in the C++ code reach the model as written.
prompt_2 = r"""
Question:
Translate the following code from C++ to Python.
C++:
#include "bits/stdc++.h"
using namespace std;
bool isPerfectSquare(long double x) {
  long double sr = sqrt(x);
  return ((sr - floor(sr)) == 0);
}
void checkSunnyNumber(int N) {
  if (isPerfectSquare(N + 1)) {
    cout << "Yes\n";
  } else {
    cout << "No\n";
  }
}
int main() {
  int N = 8;
  checkSunnyNumber(N);
  return 0;
}

Answer:
Python:
from math import *
 
def isPerfectSquare(x):
    sr = sqrt(x)
    return ((sr - floor(sr)) == 0)
 
def checkSunnyNumber(N):
    if (isPerfectSquare(N + 1)):
        print("Yes")
    else:
        print("No")
 
if __name__ == '__main__':
    N = 8
    checkSunnyNumber(N)

<end of code>

Question:
Translate the following code from C++ to Python.
C++:
#include <bits/stdc++.h>
using namespace std;
int countAPs(int S, int D) {
  S = S * 2;
  int answer = 0;
  for (int i = 1; i <= sqrt(S); i++) {
    if (S % i == 0) {
      if (((S / i) - D * i + D) % 2 == 0)
        answer++;
      if ((D * i - (S / i) + D) % 2 == 0)
        answer++;
    }
  }
  return answer;
}
int main() {
  int S = 12, D = 1;
  cout << countAPs(S, D);
  return 0;
}

Answer:
"""

You can send the prompt to the Granite Code model deployed on the SageMaker endpoint and adjust the generation parameters below as needed.

In [None]:
# hyperparameters for llm
payload_2 = {
    "inputs": prompt_2,
    "parameters": {
        "do_sample": True,
        "top_p": 0.6,
        "temperature": 0.1,
        "top_k": 50,
        "max_new_tokens": 1000,
        "repetition_penalty": 1.03,
        "stop": ["<end of code>"],
    },
}

# send request to endpoint
response_2 = deployed_model.predict(
    data=json.dumps(payload_2),
    initial_args={"Accept": "application/json", "ContentType": "application/json"},
).decode("utf-8")

generated_text_2 = json.loads(response_2)["generated_text"]
print(generated_text_2)


The output contains Python code similar to the following snippet:

```python
from math import *

def countAPs(S, D):
    S = S * 2
    answer = 0
    for i in range(1, int(sqrt(S)) + 1):
        if S % i == 0:
            if ((S // i) - D * i + D) % 2 == 0:
                answer += 1
            if (D * i - (S // i) + D) % 2 == 0:
                answer += 1
    return answer

if __name__ == '__main__':
    S = 12
    D = 1
    print(countAPs(S, D))
```

Be sure to test the generated code to verify that it works as you expect.
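A translation like this is worth checking where the two languages differ. In C++, `%` on a negative dividend yields a non-positive remainder, while Python's `%` yields a non-negative one for a positive divisor. The sketch below, with a hypothetical `c_remainder` helper that mimics the C++ behavior, shows that the "remainder equals zero" tests used here are unaffected, and runs the translated function on the sample input.

```python
import math


def c_remainder(a: int, b: int) -> int:
    """Remainder with C/C++ semantics: the result takes the sign of the dividend."""
    return int(math.fmod(a, b))


def countAPs(S, D):
    """The translated function from the sample output above."""
    S = S * 2
    answer = 0
    for i in range(1, int(math.sqrt(S)) + 1):
        if S % i == 0:
            if ((S // i) - D * i + D) % 2 == 0:
                answer += 1
            if (D * i - (S // i) + D) % 2 == 0:
                answer += 1
    return answer


print(-9 % 2, c_remainder(-9, 2))  # the two conventions disagree on the sign

# The parity test "remainder == 0" gives the same answer under both conventions.
same_parity_test = all(
    (n % 2 == 0) == (c_remainder(n, 2) == 0) for n in range(-50, 51)
)
print(same_parity_test)
print(countAPs(12, 1))
```

For `S = 12` and `D = 1` the function returns 4, which matches a hand trace of the C++ loop over `i = 1..4`.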

### 6. Example 3: Code Conversion (C to Java)

In this example, you want to convert code from one programming language to another. The prompt below converts a code snippet from C to Java.

Specifically, we cover common programming constructs like linked lists and file I/O operations. The C code is converted to Java while preserving the functionality and logic. In the Java code, we utilize classes, objects, and Java-specific APIs like **FileWriter** and **BufferedReader** to achieve similar results as the C code.

- In the first example, the C code implements a singly linked list data structure. It defines a **Node** struct containing an integer data value and a pointer to the next node. The code provides functions to create a new node, add a node to the end of the list, and print the list.

- In the second example, the C code demonstrates how to write data to a file and then read the data back from the file. It opens a file named **"example.txt"** in write mode, writes a string to the file, and closes the file. Then, it opens the same file in read mode, reads the contents into a buffer, and prints the buffer to the console.

In [None]:
# Raw string so the "\n" escapes in the C code reach the model as written.
prompt_3 = r"""
Question:
Translate the following code from C to Java.

C Code:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node* next;
} Node;

Node* createNode(int data) {
    Node* newNode = (Node*)malloc(sizeof(Node));
    newNode->data = data;
    newNode->next = NULL;
    return newNode;
}

void addNode(Node** head, int data) {
    Node* newNode = createNode(data);
    if (*head == NULL) {
        *head = newNode;
        return;
    }
    Node* temp = *head;
    while (temp->next != NULL) {
        temp = temp->next;
    }
    temp->next = newNode;
}

void printList(Node* head) {
    Node* temp = head;
    while (temp != NULL) {
        printf("%d ", temp->data);
        temp = temp->next;
    }
    printf("\n");
}

int main() {
    Node* head = NULL;
    addNode(&head, 1);
    addNode(&head, 2);
    addNode(&head, 3);
    printList(head);
    return 0;
}
```

Java Code:

```java
class Node {
    int data;
    Node next;

    Node(int data) {
        this.data = data;
        next = null;
    }
}

class LinkedList {
    Node head;

    void addNode(int data) {
        Node newNode = new Node(data);
        if (head == null) {
            head = newNode;
            return;
        }
        Node temp = head;
        while (temp.next != null) {
            temp = temp.next;
        }
        temp.next = newNode;
    }

    void printList() {
        Node temp = head;
        while (temp != null) {
            System.out.print(temp.data + " ");
            temp = temp.next;
        }
        System.out.println();
    }

    public static void main(String[] args) {
        LinkedList list = new LinkedList();
        list.addNode(1);
        list.addNode(2);
        list.addNode(3);
        list.printList();
    }
}
```

<end of code>

Question:
Translate the following code from C to Java.

C Code:

```c
#include <stdio.h>
#include <stdlib.h>

int main() {
    FILE* file = fopen("example.txt", "w");
    if (file == NULL) {
        printf("Error opening file!\n");
        return 1;
    }

    fprintf(file, "This is an example of writing to a file.\n");
    fclose(file);

    file = fopen("example.txt", "r");
    if (file == NULL) {
        printf("Error opening file!\n");
        return 1;
    }

    char buffer[100];
    while (fgets(buffer, sizeof(buffer), file) != NULL) {
        printf("%s", buffer);
    }

    fclose(file);
    return 0;
}
```
Answer:
"""

In [None]:
# hyperparameters for llm
payload_3 = {
    "inputs": prompt_3,
    "parameters": {
        "do_sample": True,
        "top_p": 0.6,
        "temperature": 0.1,
        "top_k": 50,
        "max_new_tokens": 1000,
        "repetition_penalty": 1.03,
        "stop": ["<end of code>"],
    },
}

# send request to endpoint
response_3 = deployed_model.predict(
    data=json.dumps(payload_3),
    initial_args={"Accept": "application/json", "ContentType": "application/json"},
).decode("utf-8")

generated_text_3 = json.loads(response_3)["generated_text"]
print(generated_text_3)




The output contains Java code similar to the following snippet:

```java
    import java.io.*;

    public class FileExample {
        public static void main(String[] args) {
            try (FileWriter writer = new FileWriter("example.txt");
                BufferedReader reader = new BufferedReader(new FileReader("example.txt"))) {

                writer.write("This is an example of writing to a file.");

                String line;
                while ((line = reader.readLine())!= null) {
                    System.out.println(line);
                }
            } catch (IOException e) {
                System.out.println("An error occurred: " + e.getMessage());
            }
        }
    }
```
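The conversion prompts in this notebook share one few-shot layout: a solved `Question:`/`Answer:` pair that ends with `<end of code>`, followed by an open question. The sketch below assembles such a prompt programmatically. `build_translation_prompt` is a hypothetical helper, and the exact wording and spacing are an approximation of the prompts above, not a required format.

```python
END_MARKER = "<end of code>"


def build_translation_prompt(examples, source_code, source_lang="C++", target_lang="Python"):
    """Build a few-shot prompt from (source, target) example pairs and one open question."""
    instruction = f"Translate the following code from {source_lang} to {target_lang}."
    blocks = []
    for source, target in examples:
        blocks.append(
            f"Question:\n{instruction}\n{source_lang}:\n{source.strip()}\n\n"
            f"Answer:\n{target_lang}:\n{target.strip()}\n\n{END_MARKER}\n"
        )
    # The final question is left unanswered for the model to complete.
    blocks.append(
        f"Question:\n{instruction}\n{source_lang}:\n{source_code.strip()}\n\nAnswer:\n"
    )
    return "\n".join(blocks)


demo_examples = [
    ("int add(int a, int b) { return a + b; }", "def add(a, b):\n    return a + b"),
]
demo_prompt = build_translation_prompt(demo_examples, "int sq(int x) { return x * x; }")
print(demo_prompt)
```

Because each solved example ends with the same marker that is passed in the `stop` parameter, generation halts after the model finishes its own answer.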

## Clean-up

Please don't forget to run the cells below to delete all resources and avoid unnecessary charges.

### 7. Delete the endpoint

In [None]:
model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)

### 8. Delete the model

In [None]:
model.delete_model()
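If the notebook kernel was restarted and the `model` object is no longer available, the same resources can be deleted by name with the Boto3 SageMaker client. The sketch below assumes, as in this notebook, that the endpoint configuration has the same name as the endpoint. `delete_endpoint_resources` is a hypothetical helper, and the model name has to be looked up separately, for example in the SageMaker console.

```python
def delete_endpoint_resources(sm_client, endpoint_name, model_name=None):
    """Delete an endpoint, its same-named endpoint config, and optionally a model."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    # Assumption: the endpoint config was created with the endpoint's name.
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_name)
    if model_name is not None:
        sm_client.delete_model(ModelName=model_name)
```

In the notebook, this could be called as `delete_endpoint_resources(boto3.client("sagemaker"), endpoint_name)`.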

Thank you for trying out the IBM Granite Code model on SageMaker. We have only scratched the surface of what you can do with this model.

Welcome to your IBM Granite Model support experience! You can view, start, or contribute to community discussions (sign in to contribute). View supplemental resources and [sign in](https://www.ibm.com/mysupport/s/?language=en_US) to open a new case.

## Would you like to provide feedback?

Please let us know your comments about our family of code models by visiting our collection. Select the repository of the model you would like to provide feedback about. Then, go to the Community tab and click New discussion. Alternatively, you can post questions or comments on our GitHub discussions page.

Do you have an idea for improving a product? Submit it [here](https://www.ibm.com/mysupport/s/topic/0TO5000000026r6GAA/ibm-feedback?language=en_US).