# Serve gpt-j-6B on SageMaker with DJLServing using PySDK

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

---

### Update pip packages to the latest versions

In [1]:
%%bash
pip install -U pip --quiet
pip install -U sagemaker --quiet
pip install -U boto3 --quiet

pip install -U transformers --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.34.16 requires botocore==1.35.16, but you have botocore 1.35.38 which is incompatible.
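To confirm which package versions the Python environment actually picks up after the upgrade, a small check with the standard-library `importlib.metadata` module can help. This is a minimal sketch that is not part of the original notebook; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print the versions of the packages upgraded in the cell above
for pkg, ver in installed_versions(["pip", "sagemaker", "boto3", "botocore", "transformers"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```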

### Configure instance type, execution role, and session

In [2]:
import sagemaker
from sagemaker.s3 import S3Uploader
from transformers import AutoModel, AutoTokenizer

# Replace with your own settings
instance_type = "ml.p4d.24xlarge"

role = sagemaker.get_execution_role()  # execution role for the endpoint
session = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
region = session.boto_region_name  # AWS region of the session (public attribute)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


### Download model from Hugging Face

Downloading the model from the Hugging Face Hub is time-consuming, and doing it when the endpoint starts slows down SageMaker host startup.
We recommend downloading the model once and uploading the uncompressed artifacts to an S3 bucket to speed up SageMaker startup.
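Before downloading, it can help to estimate how much local disk the weights need. The sketch below assumes weight size is roughly parameter count times bytes per parameter, and uses about 6 billion parameters for GPT-J; the helper names and the 1.2 headroom factor are hypothetical choices for illustration. Keep in mind that `from_pretrained` stores the downloaded files in the Hugging Face cache and `save_pretrained` writes a second copy, so both occupy local disk.

```python
import shutil

# Bytes per parameter for common storage dtypes
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def estimate_weights_gb(num_params, dtype="fp16"):
    """Rough size of the weight files in GB (1 GB = 1e9 bytes); ignores tokenizer and config files."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def has_room_for(num_params, dtype="fp16", path=".", headroom=1.2):
    """Check whether `path` has enough free space for the weights plus a safety margin."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= estimate_weights_gb(num_params, dtype) * headroom


# GPT-J has roughly 6 billion parameters
print(f"~{estimate_weights_gb(6_000_000_000, 'fp16'):.0f} GB of weights in fp16")
print("Enough local disk space:", has_room_for(6_000_000_000, "fp16"))
```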

In [3]:
model_id = "EleutherAI/gpt-j-6b"
model_name = "gpt-j-6b"

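import torch
from transformers import AutoModelForCausalLM  # keeps the LM head needed for text generation

# Load in fp16 (the dtype the endpoint uses) to halve the size of the saved weights
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.save_pretrained(model_name)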

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained(model_name)

bucket = session.default_bucket()      # bucket to house artifacts
s3_location = f"s3://{bucket}/djl-serving/{model_name}"
S3Uploader.upload(model_name, s3_location)



For this demo, the endpoint loads the model artifacts uploaded to `s3_location` in the previous step.

In [None]:
pretrained_model_location = s3_location
print(f"Pretrained model will be downloaded from ---- > {pretrained_model_location}")
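Before deploying, you may want to confirm that the S3 prefix really contains the uncompressed Hugging Face files. The sketch below is an assumption-laden check, not part of the original notebook: `split_s3_uri`, `missing_artifacts`, and `list_keys` are hypothetical helpers, and the "expected files" are simply `config.json` plus at least one `*.safetensors` or `*.bin` weight file, which is what `save_pretrained` writes. `list_keys` uses boto3's `list_objects_v2`, which returns at most 1,000 keys per call.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def missing_artifacts(keys):
    """Return which expected Hugging Face files are absent from a list of S3 object keys."""
    names = {key.rsplit("/", 1)[-1] for key in keys}
    missing = []
    if "config.json" not in names:
        missing.append("config.json")
    if not any(n.endswith((".safetensors", ".bin")) for n in names):
        missing.append("model weights (*.safetensors or *.bin)")
    return missing


def list_keys(uri):
    """List object keys under an S3 prefix (first page only, up to 1,000 keys)."""
    import boto3  # imported here so the helpers above work without AWS access

    bucket, prefix = split_s3_uri(uri)
    prefix = prefix.rstrip("/") + "/"  # avoid matching sibling prefixes such as "gpt-j-6b-old/"
    response = boto3.client("s3").list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

In the notebook you could then run `print(missing_artifacts(list_keys(pretrained_model_location)))`; an empty list means the expected files are present.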

### Deploy the model to SageMaker

In [None]:
from sagemaker.djl_inference import DJLModel

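model = DJLModel(
    model_id=pretrained_model_location,  # S3 URI of the uncompressed artifacts
    role=role,
    task="text-generation",
    number_of_partitions=8,  # shard the model across the 8 GPUs of an ml.p4d.24xlarge
    dtype="fp16",
)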

predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)
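`number_of_partitions=8` splits the model across the eight A100 GPUs of an `ml.p4d.24xlarge`. The sketch below gives a rough per-GPU estimate for the weights alone, assuming an even split and about 6 billion parameters; it deliberately ignores activations, the KV cache, and framework overhead, so real memory use is higher. `weights_per_gpu_gb` is a hypothetical helper name.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weights_per_gpu_gb(num_params, num_partitions, dtype="fp16"):
    """Weight memory per GPU in GB when weights are split evenly across partitions.

    Ignores activations, the KV cache, and framework overhead, so real usage is higher.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return num_params * BYTES_PER_PARAM[dtype] / num_partitions / 1e9


# ~6B parameters in fp16, sharded 8 ways
print(f"{weights_per_gpu_gb(6_000_000_000, 8):.2f} GB of weights per GPU")
```

By this rough estimate the fp16 weights (about 12 GB) would also fit on a single 40 GB A100, so sharding eight ways is a latency and throughput choice rather than a memory requirement for this model.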

### Run inference using your endpoint

In [None]:
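# Assumes the DJLServing text-generation schema: prompt under "inputs", generation options under "parameters"
data = {
    "inputs": "Who are you?",
    "parameters": {"max_new_tokens": 64},
}
outputs = predictor.predict(data)
# The response may be a single dict or a list of dicts, depending on the container version
if isinstance(outputs, dict):
    outputs = [outputs]
for output in outputs:
    print(output["generated_text"])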
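Outside this notebook (for example, from an application that does not use the SageMaker Python SDK), the same endpoint can be called through boto3's `sagemaker-runtime` client. This is a sketch under the assumption that the endpoint accepts the `{"inputs": ..., "parameters": ...}` JSON payload and returns objects with a `generated_text` field; `build_invoke_request`, `extract_generated_text`, and `invoke` are hypothetical helper names.

```python
import json


def build_invoke_request(endpoint_name, prompt, max_new_tokens=64):
    """Keyword arguments for boto3's sagemaker-runtime invoke_endpoint call.

    Assumes the {"inputs": ..., "parameters": ...} JSON schema accepted by DJLServing.
    """
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


def extract_generated_text(body):
    """Pull generated_text out of a response body that is either a dict or a list of dicts."""
    result = json.loads(body)
    items = result if isinstance(result, list) else [result]
    return [item["generated_text"] for item in items]


def invoke(endpoint_name, prompt):
    """Send one prompt to the endpoint and return the generated texts."""
    import boto3  # imported here so the helpers above can be used without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**build_invoke_request(endpoint_name, prompt))
    return extract_generated_text(response["Body"].read())
```

In the notebook, `invoke(predictor.endpoint_name, "Who are you?")` would exercise the same endpoint through the low-level API.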

### Clean up your resource after testing

In [None]:
# Delete SageMaker endpoint and model
predictor.delete_endpoint()
model.delete_model()

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)
