Import the required pieces from the CodeFlare SDK.

In [None]:
from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration
from codeflare_sdk.cluster.auth import TokenAuthentication

Create authentication object for user permissions.

If no authentication object is used, the SDK automatically checks for a default kubeconfig, then for an in-cluster config.

KubeConfigFileAuthentication can also be used to specify the kubeconfig path manually.

In [None]:
auth = TokenAuthentication(
    token = "XXXXX",
    server = "XXXXX",
    skip_tls=False
)
auth.login()
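A token hardcoded in a notebook cell is easy to leak when the notebook is shared. As a minimal sketch, assuming the credentials are exported in two environment variables (the names `OPENSHIFT_TOKEN` and `OPENSHIFT_SERVER` are hypothetical, chosen only for illustration), they could be read like this:

```python
import os


def load_credentials(env=None):
    """Return (token, server) read from environment variables.

    The variable names are hypothetical; adjust them to your setup.
    Raises RuntimeError if either value is missing or empty.
    """
    env = os.environ if env is None else env
    required = ("OPENSHIFT_TOKEN", "OPENSHIFT_SERVER")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["OPENSHIFT_TOKEN"], env["OPENSHIFT_SERVER"]
```

The returned pair can then be passed as the `token` and `server` arguments of `TokenAuthentication` in place of the `"XXXXX"` placeholders.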

Create a cluster configuration. Change the name, namespace, and resource requests as needed. Since the cluster supports InstaScale, it is enabled here. If a g4dn.2xlarge instance isn't available, InstaScale will scale one up (provided that quota is still available).

In [None]:
cluster = Cluster(ClusterConfiguration(
    name='gptfttest',
    namespace='default',
    num_workers=1,
    min_cpus=2,
    max_cpus=2,
    min_memory=8,
    max_memory=8,
    num_gpus=1,
    instascale=True, #<---instascale enabled
    machine_types=["m5.xlarge", "g4dn.2xlarge"],
))

Request the Ray cluster and wait until it is ready. Once ready, `cluster.details()` displays the details of the Ray cluster along with a link to the Ray Dashboard.

In [None]:
cluster.up()

In [None]:
cluster.wait_ready()

In [None]:
cluster.details()

Import the necessary library, then define the argument list for the fine-tuning job. The list sets the name of the model, the dataset to fine-tune on, and other variables.

In [None]:
from codeflare_sdk.job.jobs import DDPJobDefinition

In [None]:
arg_list = [
    "--model_name_or_path", "gpt2",
    "--dataset_name", "wikitext",
    "--dataset_config_name", "wikitext-2-raw-v1",
    "--per_device_train_batch_size", "2",
    "--per_device_eval_batch_size", "2",
    "--do_train",
    "--do_eval",
    "--output_dir", "/opt/app-root/src",
    "--overwrite_output_dir"
]
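A flat list of strings like this is easy to break with a missing quote or comma. As a sketch of one alternative, the hypothetical helper `build_arg_list` below (not part of the CodeFlare SDK) builds the same kind of list from a dictionary of valued options plus a list of boolean flags:

```python
def build_arg_list(options, flags=()):
    """Build a flat CLI argument list.

    options: mapping of option name -> value, emitted as "--name", "value".
    flags:   iterable of boolean flag names, emitted as "--name".
    """
    args = []
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    args += [f"--{flag}" for flag in flags]
    return args


arg_list_example = build_arg_list(
    {
        "model_name_or_path": "gpt2",
        "dataset_name": "wikitext",
        "per_device_train_batch_size": 2,
    },
    flags=["do_train", "do_eval"],
)
```

Every value is converted to a string, so numeric settings such as the batch size can be written as plain integers.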

Now submit the job to the Ray cluster for fine-tuning. You can check the status of the job afterwards and confirm that it changes from QUEUE to RUNNING.

In [None]:
jobdef = DDPJobDefinition(
    name="gpttest",
    script="gpt_og.py",
    script_args=arg_list,
    scheduler_args={"requirements": "requirements_gpt.txt"}
)
job = jobdef.submit(cluster)

In [None]:
job.status()
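Rather than re-running the status cell by hand, the check can be wrapped in a polling loop. The sketch below is generic: it takes any zero-argument callable and compares the string form of its result against a target, because the exact return type of `job.status()` is not stated here and is treated as an assumption. The helper name `wait_for_status` is hypothetical.

```python
import time


def wait_for_status(get_status, target="RUNNING", timeout=600.0, interval=5.0,
                    sleep=time.sleep):
    """Poll get_status() until str(result) contains target.

    Returns True when the target is seen, False once `timeout` seconds
    of sleeping have elapsed without seeing it.
    """
    waited = 0.0
    while True:
        if target in str(get_status()):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Assuming `job.status()` returns something whose string form includes the state name, a call might look like `wait_for_status(job.status)`.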

Retrieve raw log output at any time with:

In [None]:
job.logs()

View live updates for status, logs, and other information with:

In [None]:
cluster.cluster_dashboard_uri()

Once the fine-tuning process is complete, the model file should show up in the left sidebar (if not, click the folder icon, which takes you to the /opt/app-root/src directory of the container). You then need to follow three steps:
1. Download the model file to your local machine.
2. Convert the model to caikit format by following the instructions outlined [here](https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/built-tip.md).
3. Containerize your model into a MinIO bucket by following the instructions outlined [here](https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/create-minio.md).

Once your converted model is in a MinIO container available at a quay.io registry, continue executing the cells below for serving of the model.

In [None]:
cluster.down()

Create a namespace for serving the GPT-2 model.

In [None]:
!oc create namespace demo-serving
!oc patch smmr/default -n istio-system --type='json' -p="[{'op': 'add', 'path': '/spec/members/-', 'value': \"demo-serving\"}]"

Deploy the MinIO image that contains the GPT-2 model.

In [None]:
# Plain Python strings; IPython expands $TEST_NS in the `!` commands below.
ACCESS_KEY_ID = "admin"
SECRET_ACCESS_KEY = "password"
TEST_NS = "demo-serving"

!oc apply -f minio.yaml -n $TEST_NS
!oc apply -f minio-secret.yaml -n $TEST_NS
!oc apply -f serviceaccount-minio.yaml -n $TEST_NS

Create a Caikit-TGIS ServingRuntime. By default, it requests 4 CPUs and 8 Gi of memory. Adjust as needed.

In [None]:
!oc apply -f caikit-servingruntime.yaml -n $TEST_NS

Deploy the Inference Service. It points to the model located at `/modelmesh-example-models/llm/gpt2`.

In [None]:
!oc apply -f caikit-isvc-demo.yaml -n $TEST_NS

Ensure that the inference service's `READY` state is `True`.

In [None]:
!oc get isvc/caikit-example-isvc -n demo-serving
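The readiness check can also be done programmatically. The sketch below assumes the resource is fetched as JSON (for example by adding `-o json` to the `oc get` command) and that its `status.conditions` list follows the usual Kubernetes convention of entries with `type` and `status` fields. The helper name `isvc_is_ready` and the sample document are illustrative only.

```python
import json


def isvc_is_ready(isvc_json):
    """Return True if the JSON document has a condition Ready == "True"."""
    obj = json.loads(isvc_json)
    conditions = obj.get("status", {}).get("conditions", [])
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in conditions
    )


# Illustrative sample, not real cluster output.
sample = json.dumps({"status": {"conditions": [{"type": "Ready", "status": "True"}]}})
ready = isvc_is_ready(sample)
```

In a notebook, the command output could be captured with `out = !oc get isvc/caikit-example-isvc -n demo-serving -o json` and passed in as `isvc_is_ready("\n".join(out))`.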

Send an inference request with a gRPC call.

In [None]:
%%bash
# Each `!` line runs in its own subshell, so an exported variable would not carry over; run both commands in one bash cell.
KSVC_HOSTNAME=$(oc get ksvc caikit-example-isvc-predictor -n demo-serving -o jsonpath='{.status.url}' | cut -d'/' -f3)
grpcurl -insecure -d '{"text": "This demo is awesome because"}' -H "mm-model-id: gpt2-caikit" ${KSVC_HOSTNAME}:443 caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
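The same request can be assembled in Python, which avoids shell quoting issues around the JSON payload. This is a sketch only: `build_grpcurl_command` is a hypothetical helper, the URL in the example is made up, and the flags and method name are copied from the `grpcurl` call above.

```python
import json
from urllib.parse import urlparse


def build_grpcurl_command(ksvc_url, text, model_id="gpt2-caikit"):
    """Return the grpcurl argument list for a text-generation request."""
    # netloc is the host part of the URL, the same field `cut -d'/' -f3` extracts.
    host = urlparse(ksvc_url).netloc
    return [
        "grpcurl", "-insecure",
        "-d", json.dumps({"text": text}),
        "-H", f"mm-model-id: {model_id}",
        f"{host}:443",
        "caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict",
    ]


# Hypothetical URL, for illustration only.
cmd = build_grpcurl_command("https://predictor.apps.example.com", "This demo is awesome because")
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where `grpcurl` is installed.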

Log out of the OpenShift cluster.

In [None]:
auth.logout()