## AIGC from Papers to Practice Series (1)

# A Look at ChatGPT & LLM Training Optimization with Amazon SageMaker

### DEMO One: Build apps on open-source GPT-J Model with Amazon SageMaker

Prepared by: Haowen Huang

Feb 16, 2023

The goal of this demo is to help developers with little AI background try out ChatGPT-like technology using the open-source GPT-J model. For now it covers only deploying the model and sending it prompts to get generated responses.

## GPT-J Model Overview

GPT-J is a generative pre-trained transformer (GPT) language model whose architecture is comparable to popular proprietary large language models such as OpenAI's GPT-3. It has approximately 6 billion parameters in 28 layers, each consisting of a feedforward block and a self-attention block.

Serving GPT-J for inference requires much less memory than training it: in FP16, the model weights occupy less than 13 GB, so inference can run on a single 16 GB GPU.
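The FP16 figure follows from simple arithmetic: each parameter takes 2 bytes in FP16 and 4 bytes in FP32. Here is a minimal sketch of that estimate, assuming a round 6 billion parameters (the text says "approximately") and counting only the weights, not activations or framework overhead. The helper name `weight_size_gb` is made up for illustration.

```python
# Rough size of the model weights for a given numeric precision.
# Assumption: exactly 6e9 parameters; the real count is only "approximately" this.
NUM_PARAMS = 6e9
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_size_gb(num_params, dtype):
    """Return the size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weight_size_gb(NUM_PARAMS, dtype):.1f} GB")
# fp32: 24.0 GB
# fp16: 12.0 GB
```

At roughly 12 GB, the FP16 weights leave a few GB of headroom on a 16 GB GPU.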

### Step 1: Start the SageMaker Instance

1/ In the AWS console, go to SageMaker and then click Notebook on the left.

2/ Make sure you are in the us-east-1 region, because that is where the S3 bucket is.

3/ Create a new notebook instance and start it.

4/ I used an ml.m5.4xlarge instance. It only needs to run for a few minutes or hours; when you are done, stop both resources (the notebook instance and the endpoint).

5/ Click Open Jupyter (or Open JupyterLab) to launch the web interface.

6/ Click New, and choose the following notebook type: conda_amazonei_pytorch_latest_p37

### Step 2: Add Code to Your Jupyter Notebook

1/ Update the SageMaker Python SDK

2/ Import HuggingFaceModel

3/ Define the IAM role with permissions to create an endpoint

4/ Define the public S3 URI to the GPT-J artifact

5/ Create the Hugging Face model class

6/ Deploy the model to SageMaker Inference

In [None]:
!pip install -U sagemaker

In [None]:
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

# IAM role with permissions to create endpoint
role = sagemaker.get_execution_role()

# public S3 URI to gpt-j artifact
model_uri="s3://huggingface-sagemaker-models/transformers/4.12.3/pytorch/1.9.1/gpt-j/model.tar.gz"

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_uri,
    transformers_version='4.12.3',
    pytorch_version='1.9.1',
    py_version='py38',
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,        # number of instances
    instance_type='ml.g4dn.xlarge',  # instance type (alternative: 'ml.p3.2xlarge')
)

Then run all the code above using the Run button. While GPT-J loads, dashes ("-") are added to the output; once deployment finishes, the output looks like this: ----------------!

Common error: deploying from another region. Make sure you are in us-east-1, since that is where the S3 bucket is.

Next, add the prediction cells from Step 3 below and run them.
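One way to catch the region mistake early is a small guard that runs before the deployment cell. This is a sketch only: `check_region` is a hypothetical helper, not part of the SageMaker SDK, and the expected region is taken from the text above.

```python
# Hypothetical guard: fail fast if the notebook is not in the expected region.
EXPECTED_REGION = "us-east-1"  # from the text: where the S3 bucket lives


def check_region(region_name, expected=EXPECTED_REGION):
    """Return region_name if it matches expected, otherwise raise RuntimeError."""
    if region_name != expected:
        raise RuntimeError(
            f"Notebook is running in {region_name!r}, but the model artifact "
            f"is hosted in {expected!r}. Switch regions before deploying."
        )
    return region_name
```

In the notebook, the current region can be read from the SDK session, for example `check_region(sagemaker.Session().boto_region_name)`.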

### Step 3: Run the Prediction

Step three: run your own predictions with the GPT-J model!

In my experience, deploying this demo takes about 10 minutes, after which you can start sending predictions. Allow about 20 minutes for the whole process: deploying the GPT-J model on Amazon SageMaker and running the Q&A inference.


In [None]:
predictor.predict({
    "inputs": "The tallest building in Hong Kong is"
})
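The request can also carry generation settings alongside the prompt. The sketch below assumes the endpoint follows the usual Hugging Face inference convention of an optional `"parameters"` dictionary whose entries are passed to text generation. Whether this particular artifact honors every key is not confirmed by this demo. `build_payload` is a made-up helper name.

```python
def build_payload(prompt, max_length=50, temperature=0.7):
    """Build a request body with a prompt and optional generation settings.

    Assumption: the endpoint accepts a "parameters" dict next to "inputs".
    """
    return {
        "inputs": prompt,
        "parameters": {
            "max_length": max_length,    # total length in tokens, prompt included
            "do_sample": True,           # sample instead of greedy decoding
            "temperature": temperature,  # lower values give more predictable text
        },
    }


payload = build_payload("The tallest building in Hong Kong is")
print(payload)
```

With the endpoint from Step 2 running, the payload would be sent with `predictor.predict(payload)`.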

In [None]:
predictor.predict({
    "inputs": "The most expensive property in Hong Kong is"
})
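Step 1 mentions stopping the endpoint once you are done. A minimal sketch of that cleanup, using the `delete_model` and `delete_endpoint` methods of the SageMaker predictor returned by `deploy`, follows. The wrapper name `tear_down` is made up, and the notebook instance itself still has to be stopped from the console.

```python
def tear_down(predictor):
    """Remove the model and the endpoint behind a SageMaker predictor.

    The model is deleted first, because the predictor looks it up through
    the endpoint configuration, which delete_endpoint removes by default.
    """
    predictor.delete_model()
    predictor.delete_endpoint()
```

In the notebook this would be called as `tear_down(predictor)` after the last prediction.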