# Deploy Fine-tuned GPT-J-6B on SageMaker hosting with DeepSpeed


## Set model location 
Here, you set the S3 location of the model that was fine-tuned in the previous notebook.



In [None]:
import sagemaker 

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()  # default SageMaker S3 bucket for this account and Region

account = sagemaker_session.boto_session.client("sts").get_caller_identity()["Account"]
region = sagemaker_session.boto_session.region_name

# S3 prefix of the checkpoint saved by the fine-tuning notebook (note the trailing slash)
model_s3_uri = f"s3://{bucket}/fine-tune-GPTJ/checkpoint/checkpoint-120/"
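Because `model_s3_uri` already ends with `/`, appending another `/` when building sub-paths produces `//`, and S3 treats `checkpoint-120//global_step120/` as a different prefix from `checkpoint-120/global_step120/`. The following is a minimal sketch of a hypothetical `s3_join` helper (not part of the SageMaker SDK or boto3) that joins S3 URI parts with single slashes; the bucket name `my-bucket` is a placeholder.

```python
# Hypothetical helper (not part of the SageMaker SDK): join S3 URI parts
# without producing the "//" that naive string concatenation creates.
def s3_join(base: str, *parts: str) -> str:
    """Join an S3 URI prefix with further key components using single slashes."""
    pieces = [base.rstrip("/")] + [p.strip("/") for p in parts if p.strip("/")]
    joined = "/".join(pieces)
    # Keep a trailing slash if the last component was a "folder" prefix
    last = parts[-1] if parts else base
    return joined + "/" if last.endswith("/") else joined


base = "s3://my-bucket/fine-tune-GPTJ/checkpoint/checkpoint-120/"
print(s3_join(base, "global_step120/"))
# s3://my-bucket/fine-tune-GPTJ/checkpoint/checkpoint-120/global_step120/
```

Only the trailing slash of the base and the surrounding slashes of each part are stripped, so the `s3://` scheme is left untouched.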

In [None]:
!aws s3 ls $model_s3_uri

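Remove the DeepSpeed checkpoint folder `global_step<StepNumber>` from the model artifacts. It holds the training state (such as optimizer states) that DeepSpeed saves for resuming training; it is not needed for inference and only makes the artifacts larger.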


In [None]:
!aws s3 rm {model_s3_uri}global_step120/ --recursive
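To check which objects remain after the removal, you can filter a list of object keys (for example, from `aws s3 ls --recursive` or boto3's `list_objects_v2`). The following is a minimal sketch with a hypothetical `inference_files` helper; the key names in the example listing are illustrative assumptions, and the real file names depend on your fine-tuning run.

```python
import re

# Matches the DeepSpeed training-state folder, e.g. "global_step120/..."
GLOBAL_STEP = re.compile(r"(^|/)global_step\d+/")


def inference_files(keys):
    """Return the object keys that remain after dropping global_step<N>/ folders."""
    return [k for k in keys if not GLOBAL_STEP.search(k)]


# Hypothetical listing; real file names depend on the fine-tuning run
keys = [
    "checkpoint-120/config.json",
    "checkpoint-120/pytorch_model.bin",
    "checkpoint-120/global_step120/mp_rank_00_model_states.pt",
    "checkpoint-120/global_step120/zero_pp_rank_0_mp_rank_00_optim_states.pt",
]
print(inference_files(keys))
# ['checkpoint-120/config.json', 'checkpoint-120/pytorch_model.bin']
```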

## Prepare docker image

The `../Deploy_GPTJ/` directory contains a `build.sh` script that builds the container image and a `push_to_ecr.sh` script that pushes the image to Amazon ECR.


In [None]:
%%sh
cd ../Deploy_GPTJ/
chmod +x build.sh
./build.sh gptj-inference-endpoint

In [None]:
%%sh
cd ../Deploy_GPTJ/
chmod +x push_to_ecr.sh
./push_to_ecr.sh gptj-inference-endpoint

This script pushes your image to ECR. For later reference, note the address of the repository that the image is pushed to. It appears below the line `Login Succeeded` in the output of `push_to_ecr.sh`.
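The repository address follows the standard private ECR image URI format, which the deployment cell below also assumes. The following is a minimal sketch with a hypothetical `ecr_image_uri` helper, assuming the repository name equals the argument passed to `push_to_ecr.sh` and the tag is `latest`; `123456789012` is a placeholder account ID. AWS China Regions use a different domain suffix, which this sketch does not handle.

```python
def ecr_image_uri(account: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build a private ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


print(ecr_image_uri("123456789012", "us-east-1", "gptj-inference-endpoint"))
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/gptj-inference-endpoint:latest
```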

## Inference

Now, you can deploy your endpoint as follows:

### Initialize configuration variables

If you rerun this notebook and get an error saying that the model or endpoint already exists, change `sm_model_name` and `endpoint_name` (or delete the existing resources first).
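One way to avoid name collisions on reruns is to append a timestamp to the base name. The following is a minimal sketch with a hypothetical `unique_name` helper; the naming rule it enforces (at most 63 characters of letters, digits, and hyphens, starting and ending with an alphanumeric character) is our reading of the SageMaker API reference and is an assumption to verify. The SDK's `sagemaker.utils.name_from_base` serves a similar purpose.

```python
import re
import time

# Assumed SageMaker naming rule: alphanumerics and hyphens, no leading/trailing hyphen
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def unique_name(base: str, max_length: int = 63) -> str:
    """Append a UTC timestamp so each rerun gets a fresh model/endpoint name."""
    suffix = time.strftime("-%Y%m%d-%H%M%S", time.gmtime())
    # Trim the base so the full name stays within max_length
    trimmed = base[: max_length - len(suffix)].rstrip("-")
    return trimmed + suffix


name = unique_name("gptj-completion-gpu-test")
print(name, bool(NAME_PATTERN.match(name)))
```

You would then use the same generated value for both `sm_model_name` and `endpoint_name` if you want them to match.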

In [None]:
import sagemaker
from sagemaker.model import Model
from sagemaker.predictor import Predictor  # SageMaker Python SDK v2 name for RealTimePredictor

role = sagemaker.get_execution_role()

# Specify path to gptj-inference-endpoint image in ECR
image = '{}.dkr.ecr.{}.amazonaws.com/gptj-inference-endpoint:latest'.format(account, region)

# Specify sagemaker model_name
sm_model_name = "gptj-completion-gpu-test"

# Specify endpoint_name
endpoint_name = "gptj-completion-gpu-test"

# Specify instance_type
instance_type = 'ml.g4dn.2xlarge'

# Specify initial_instance_count
initial_instance_count = 1
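The choice of `ml.g4dn.2xlarge` (one NVIDIA T4 GPU with 16 GB of memory) can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes roughly 6.05 billion parameters for GPT-J-6B and counts only the weights; activations, the key/value cache, and the CUDA context need additional memory, so the real headroom is smaller. Whether the container actually loads the weights in fp16 is an assumption.

```python
def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


GPTJ_PARAMS = 6.05e9  # approximate parameter count of GPT-J-6B
T4_MEMORY_GB = 16     # nominal memory of the single T4 GPU on ml.g4dn.2xlarge

for dtype, nbytes in [("fp32", 4), ("fp16", 2)]:
    need = weights_gb(GPTJ_PARAMS, nbytes)
    print(f"{dtype}: ~{need:.1f} GB of weights, fits on one T4: {need < T4_MEMORY_GB}")
# fp32: ~24.2 GB of weights, fits on one T4: False
# fp16: ~12.1 GB of weights, fits on one T4: True
```

In other words, the model only fits on this instance type in half precision.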


### Deploy the endpoint

In [None]:
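sm_model = Model(
    image_uri=image,
    role=role,
    env={"S3_MODEL_LOCATION": model_s3_uri},  # passed to the container as an environment variable
    predictor_cls=Predictor,
    name=sm_model_name,
)

# Create the model, endpoint configuration, and endpoint (this can take several minutes)
predictor = sm_model.deploy(
    instance_type=instance_type,
    initial_instance_count=initial_instance_count,
    endpoint_name=endpoint_name,
)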
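The `env` dictionary is set as environment variables in the container, so the inference code in `gptj-inference-endpoint` presumably reads `S3_MODEL_LOCATION` to locate the weights. That container code is not shown in this notebook; the following is a hypothetical sketch of how such code could validate the value and split it into a bucket and key prefix before downloading the files (for example, with boto3). The URI in the example is a placeholder.

```python
import os
from typing import Tuple
from urllib.parse import urlparse


def model_location_from_env(var: str = "S3_MODEL_LOCATION") -> Tuple[str, str]:
    """Split the S3 URI passed to the container into (bucket, key prefix)."""
    uri = os.environ[var]
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"{var} is not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Simulate the container environment with a placeholder URI
os.environ["S3_MODEL_LOCATION"] = "s3://my-bucket/fine-tune-GPTJ/checkpoint/checkpoint-120/"
print(model_location_from_env())
# ('my-bucket', 'fine-tune-GPTJ/checkpoint/checkpoint-120/')
```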

### Query model

To query your endpoint, you can use the code below. 


#### Initialize the SageMaker Runtime client

In [None]:
import boto3
import json

# Create a boto3 session to look up the current AWS Region
sess = boto3.Session()
aws_region = sess.region_name

# Create a low-level client representing Amazon SageMaker Runtime
sagemaker_runtime = boto3.client("sagemaker-runtime", region_name=aws_region)

In [None]:
%%time

text = "love: "

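# Generation parameters; the accepted keys are defined by the inference code in the custom container
parameters = {
    "do_sample": True,
    "temperature": 0.7,
    "max_new_tokens": 200,
    "min_tokens": 100,
    "repetition_penalty": 1.1,
    "top_p": 0.9,  # nucleus-sampling probability mass; must be in (0, 1]
}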

data = {
    "inputs": {
        "text_inputs": text,
        "parameters": parameters
    }
}
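The request body nests the prompt and generation settings under `inputs`; this schema is defined by the custom container, not by SageMaker. The following is a minimal sketch of a hypothetical `build_payload` helper that serializes the same structure and rejects two out-of-range values that Hugging Face sampling also refuses (`top_p` outside (0, 1], non-positive `temperature`), assuming the container forwards these values to `generate()`.

```python
import json


def build_payload(text: str, **parameters) -> str:
    """Serialize a request in the nested format above, with basic sanity checks.

    The checks are an assumption about what the container forwards to the model;
    the container may accept or ignore other keys.
    """
    top_p = parameters.get("top_p")
    if top_p is not None and not 0 < top_p <= 1:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")
    temperature = parameters.get("temperature")
    if temperature is not None and temperature <= 0:
        raise ValueError(f"temperature must be positive, got {temperature}")
    return json.dumps({"inputs": {"text_inputs": text, "parameters": parameters}})


body = build_payload("love: ", do_sample=True, temperature=0.7, max_new_tokens=200, top_p=0.9)
print(body)
```

Calling `build_payload("love: ", top_p=500)` raises a `ValueError` before any request is sent.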


body = json.dumps(data)


response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body,
    ContentType="application/json",
)

In [None]:
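# invoke_endpoint returns a dict; the generated output is in the "Body" stream
print(response["Body"].read().decode("utf-8"))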
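`invoke_endpoint` returns a dictionary whose `Body` is a stream that can be read only once. The following is a minimal sketch of a hypothetical `read_response` helper that decodes the body and parses it as JSON when possible. It is exercised offline with `io.BytesIO` standing in for botocore's streaming body (both expose `.read()`); the `generated_text` field in the fake response is an illustrative assumption, since the real output format is defined by the container.

```python
import io
import json


def read_response(response: dict):
    """Read and decode the Body stream returned by invoke_endpoint.

    Returns parsed JSON when the body is valid JSON, otherwise the raw text.
    """
    raw = response["Body"].read().decode("utf-8")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


# Offline stand-in for a real response; the body content is hypothetical
fake = {"ContentType": "application/json", "Body": io.BytesIO(b'[{"generated_text": "love: ..."}]')}
print(read_response(fake))
# [{'generated_text': 'love: ...'}]
```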