# Deploy the Fine-tuned LLM

## Set the OCI Authentication Method

To create a model and a model deployment on Oracle Cloud Infrastructure (OCI), you must authenticate using a supported method, such as API keys or a resource principal. By this point, you should have configured a dynamic group that includes the OCI Data Science notebook sessions, and a policy that allows a notebook session to create a model in the model catalog and deploy it. We will therefore authenticate to OCI using the resource principal method.

In [None]:
import ads

ads.set_auth(auth='resource_principal')
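
If the same notebook is sometimes run outside a notebook session, the authentication method can be chosen at run time. The sketch below is an assumption-laden illustration, not part of the lab: `choose_auth_method` is a hypothetical helper, and it relies on the `OCI_RESOURCE_PRINCIPAL_VERSION` environment variable, which the OCI SDK's resource principal signer reads, being present only where a resource principal is available.

```python
import os


def choose_auth_method(environ=None):
    """Return the name of an ADS auth method suited to the given environment.

    Hypothetical helper: assumes OCI_RESOURCE_PRINCIPAL_VERSION is set only
    where a resource principal is available (e.g. a notebook session).
    """
    environ = os.environ if environ is None else environ
    if environ.get("OCI_RESOURCE_PRINCIPAL_VERSION"):
        return "resource_principal"
    # Fall back to API keys, the other method mentioned in this lab.
    return "api_key"


auth_method = choose_auth_method()
print(f"Selected auth method: {auth_method}")

# In the notebook, the result would then be passed on:
# ads.set_auth(auth=auth_method)
```

Falling back to `api_key` assumes a valid OCI configuration file exists on the machine; inside the notebook session used in this lab, the resource principal branch is the one expected to be taken.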

## Set Variables

The `model_path` variable is the file system path where the model was saved in the previous lab.

> **IMPORTANT**
> Set the values of the `inference_conda_env_pack_path` and `training_conda_env_pack_path` variables using the pack path of the Conda environment that was deployed to the Object Storage bucket in an earlier lab.

In [None]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

pretrained_model_name = "google-bert/bert-base-uncased"
model_path = "./review_model"
model_display_name = "review_model"
artifact_path = "./review_model_artifact"
python_version = "3.9"
inference_conda_env_pack_path = "oci://<BUCKET_NAME>@<TENANCY_NAMESPACE>/conda_environments/<ARCHITECTURE>/<CONDA_ENV_NAME>/<VERSION>/<CONDA_ENV_SLUG>"
training_conda_env_pack_path = "oci://<BUCKET_NAME>@<TENANCY_NAMESPACE>/conda_environments/<ARCHITECTURE>/<CONDA_ENV_NAME>/<VERSION>/<CONDA_ENV_SLUG>"
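
Because the two pack paths above are templates, it is easy to run the later cells with a placeholder still in place. The following sketch shows one possible guard. The helper names and all sample values (`my-bucket`, `mytenancy`, and so on) are hypothetical; only the path layout is taken from the cell above.

```python
import re

# Matches template placeholders such as <BUCKET_NAME>.
PLACEHOLDER = re.compile(r"<[A-Z_]+>")


def unresolved_placeholders(path):
    """Return the placeholders that have not yet been filled in."""
    return PLACEHOLDER.findall(path)


def conda_pack_path(bucket, namespace, architecture, env_name, version, slug):
    """Assemble a pack path following the layout used in this lab."""
    return (
        f"oci://{bucket}@{namespace}/conda_environments/"
        f"{architecture}/{env_name}/{version}/{slug}"
    )


template = (
    "oci://<BUCKET_NAME>@<TENANCY_NAMESPACE>/conda_environments/"
    "<ARCHITECTURE>/<CONDA_ENV_NAME>/<VERSION>/<CONDA_ENV_SLUG>"
)
print(unresolved_placeholders(template))

# Hypothetical values, for illustration only.
example_path = conda_pack_path(
    "my-bucket", "mytenancy", "cpu", "review_env", "1.0", "review_env_v1_0"
)
print(example_path, unresolved_placeholders(example_path))
```

A check such as `assert not unresolved_placeholders(inference_conda_env_pack_path)` before calling `prepare()` would fail early instead of during artifact preparation.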

## Prepare the Model Artifact

Oracle Accelerated Data Science (ADS) is a Python package maintained by the OCI Data Science service team. It provides utilities that help data scientists work with OCI components and perform common data science tasks. We will use ADS to prepare the model artifact, register the model, and then deploy it to OCI.

In [None]:
from ads.common.model_metadata import UseCaseType
from ads.model.framework.huggingface_model import HuggingFacePipelineModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Instantiate the same tokenizer used for the model training.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)

# Load the model saved from the previous lab.
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Create the pipeline.
inference = pipeline(task="sentiment-analysis", model=model, tokenizer=tokenizer)

# Instantiate the ADS HuggingFace pipeline.
huggingface_pipeline_model = HuggingFacePipelineModel(estimator=inference, artifact_dir=artifact_path)

# Prepare the pipeline for saving to the catalog and eventual deployment.
huggingface_pipeline_model.prepare(
    inference_conda_env=inference_conda_env_pack_path,
    inference_python_version=python_version,
    training_conda_env=training_conda_env_pack_path,
    use_case_type=UseCaseType.SENTIMENT_ANALYSIS,
    force_overwrite=True,
)

## Generate a Summary Table of the Model's Status

The summary table shows which methods are available to call and which are not, and outlines what each method does. If extra actions are required, the table lists those as well.

In [None]:
huggingface_pipeline_model.summary_status()

## Register the Model to the Model Catalog

In [None]:
model_id = huggingface_pipeline_model.save(
    display_name=model_display_name,
    description="Fine-tuned BERT model to automatically label a review."
)

## Validate the Saved Model

Execute the `.introspect()` method to make final checks on the artifacts. All results should show `Passed`.

In [None]:
huggingface_pipeline_model.introspect()

## Deploy the Model and Display the Deployment Endpoint URL

Deploy the registered model, specifying the compute shape required for the inference tasks. Optionally, and preferably, create an OCI log group containing two logs (an access log and a predict log) to capture the output generated when the model is accessed and when the inference task is performed.

We will print the endpoint URL for invoking the deployed model; you can also obtain it from the OCI Console. The URL has the following form:

```
https://modeldeployment.{region}.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.xxx.xxxxx
```

In [None]:
huggingface_pipeline_model.deploy(
    display_name="LiveLabs_Review_Model",
    # Optional: uncomment and set these to capture access and predict logs.
    # deployment_log_group_id="ocid1.loggroup.oc1.xxx.xxxxx",
    # deployment_access_log_id="ocid1.log.oc1.xxx.xxxxx",
    # deployment_predict_log_id="ocid1.log.oc1.xxx.xxxxx",

    # Shape config details are mandatory for flexible shapes:
    deployment_instance_shape="VM.Standard.E4.Flex",
    deployment_ocpus=1,
    deployment_memory_in_gbs=16,
)

print(f"Endpoint: {huggingface_pipeline_model.model_deployment.url}")
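
Rather than commenting the logging parameters in and out, the keyword arguments for `deploy()` can be assembled conditionally. This is a minimal sketch under stated assumptions: `build_deploy_kwargs` and the environment variable names (`LOG_GROUP_OCID`, `ACCESS_LOG_OCID`, `PREDICT_LOG_OCID`) are hypothetical, while the parameter names and shape values are the ones used in the cell above.

```python
import os


def build_deploy_kwargs(environ=None):
    """Build keyword arguments for deploy(), adding log settings only if set."""
    environ = os.environ if environ is None else environ
    kwargs = {
        "display_name": "LiveLabs_Review_Model",
        "deployment_instance_shape": "VM.Standard.E4.Flex",
        "deployment_ocpus": 1,
        "deployment_memory_in_gbs": 16,
    }
    # Hypothetical environment variable names holding the log OCIDs.
    optional = {
        "deployment_log_group_id": "LOG_GROUP_OCID",
        "deployment_access_log_id": "ACCESS_LOG_OCID",
        "deployment_predict_log_id": "PREDICT_LOG_OCID",
    }
    for parameter, variable in optional.items():
        value = environ.get(variable)
        if value:
            kwargs[parameter] = value
    return kwargs


deploy_kwargs = build_deploy_kwargs()
print(sorted(deploy_kwargs))

# In the notebook, the arguments would then be unpacked into the call:
# huggingface_pipeline_model.deploy(**deploy_kwargs)
```

With none of the variables set, the result matches the arguments of the original cell, so logging stays opt-in.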

## Test the Deployed Model

In this final cell, call the predict endpoint of the model that you have just deployed on OCI. The only input is the review text. If the call succeeds, a JSON response is returned containing the assigned label (the review rating) and a prediction score.

Example response:

```json
{"prediction": [{"label": "LABEL_2", "score": 0.2783530652523041}]}
```
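
A response of the shape shown above can be unpacked with a few lines of Python. The helpers below are hypothetical. How the numeric suffix of a label maps to a review rating depends on the label encoding used during training, which is not shown in this lab, so the sketch only extracts the index and does not interpret it.

```python
def top_prediction(response_body):
    """Return (label, score) for the highest-scoring prediction."""
    predictions = response_body["prediction"]
    best = max(predictions, key=lambda item: item["score"])
    return best["label"], best["score"]


def label_index(label):
    """Extract the integer suffix from a label such as 'LABEL_2'."""
    return int(label.rsplit("_", 1)[1])


# The example response from this lab, as a Python dictionary.
example = {"prediction": [{"label": "LABEL_2", "score": 0.2783530652523041}]}

label, score = top_prediction(example)
print(label, label_index(label), round(score, 3))  # LABEL_2 2 0.278
```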

In [None]:
import requests
import json
import oci

review = """
I have been through other brands of individual pods coffee brewing, like Keurig or Kienna, not Tassimo. This 
time I would like to try Nespresso which uses up to 19-bar high pressure to extract concentrated coffee of 
two sizes, espresso and lungo. The delivery was Prime and quick, the product is brand new and made by well 
known quality Breville brand. I have only used it for about 10 times in 4 days and so far, the temperature 
is hotter (preferred) than my Keurig or even the regular Hamilton Beach brew station. The only downside I may 
add would be when I will have to descale the machine using only the proprietary Nespresso descaling liquid 
sold separately. I usually only used white vinegar to clean my coffee drip machines or Keurig. This is part 
of the only Nespresso model ( Original Line, not Virtuo) that may be able to use cheaper compatible pods that 
may go as low as almost half the price of original Nespresso pods.
"""

response = requests.post(
    url=huggingface_pipeline_model.model_deployment.url + "/predict",
    auth=oci.auth.signers.get_resource_principals_signer(),
    json=f"[\"{review}\"]",
)

assert response.status_code == 200, "Request failed."

print(json.loads(response.content))
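
The payload in the cell above is built by interpolating the review into a string by hand, which produces malformed text as soon as a review contains a double quote. The sketch below shows one possible alternative that lets `json.dumps` handle the escaping. `predict_url` and `review_payload` are hypothetical helpers, and whether the deployed scoring script parses this inner string is not shown in this lab, so treat the payload shape as an assumption carried over from the original cell.

```python
import json


def predict_url(deployment_url):
    """Append the /predict path to a model deployment URL."""
    return deployment_url.rstrip("/") + "/predict"


def review_payload(review):
    """Serialize a single review as a JSON array containing one string."""
    return json.dumps([review])


tricky_review = 'The box said "barista quality" and it delivered.'

# Hand-built text: the embedded quotes end the JSON string too early.
hand_built = f"[\"{tricky_review}\"]"
try:
    json.loads(hand_built)
    hand_built_is_valid = True
except json.JSONDecodeError:
    hand_built_is_valid = False

payload = review_payload(tricky_review)
print(hand_built_is_valid, json.loads(payload) == [tricky_review])

# In the notebook, the payload would replace the f-string:
# requests.post(url=predict_url(...), auth=..., json=payload)
```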