This SDK provides helper functions that let you create and deploy custom models with ease.

Let's say we want to serve [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) with [Instill Model](https://github.com/instill-ai/model).

1. First, create a file structure like the following:

```bash
.
├── README.md
└── sd_turbo                <=== your model name
    └── 1                   <=== your model version
        ├── model.py        <=== your model file
        ├── ray_pb2.py
        ├── ray_pb2.pyi
        ├── ray_pb2_grpc.py
        └── sd_turbo        <=== model weights and dependency folder cloned from Hugging Face (remember to follow each model's LICENSE)
```
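Once every file is in place (the proto files, the weights, and `model.py` come from the steps below), a quick script can confirm the layout before you zip it. This is a minimal sketch: `missing_entries` is a hypothetical helper, not part of the SDK, and the file list simply mirrors the tree above.

```python
from pathlib import Path

# Files expected inside each version folder, mirroring the tree above.
REQUIRED_FILES = ("model.py", "ray_pb2.py", "ray_pb2.pyi", "ray_pb2_grpc.py")


def missing_entries(repo_root: str, model_name: str, version: str, weights_dir: str) -> list:
    """Return the paths (relative to repo_root) from the expected layout that do not exist."""
    root = Path(repo_root)
    version_dir = root / model_name / version
    expected = [root / "README.md"]
    expected += [version_dir / name for name in REQUIRED_FILES]
    # the weights folder, named to match the value passed to entry() in model.py
    expected.append(version_dir / weights_dir)
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_entries(".", "sd_turbo", "1", "sd_turbo")
    print("layout looks complete" if not missing else f"missing: {missing}")
```

Run it from the repository root; an empty result means every entry in the tree exists.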

In `README.md`, put the model information between the `---` lines, followed by a brief introduction below them. For example:
```
---
Task: TextToImage
Tags:
  - TextToImage
  - Text-To-Image
  - Diffusion
---

# Model-SD-Turbo

🔥🔥🔥 Deploy [Stable Diffusion Turbo](https://huggingface.co/stabilityai/sd-turbo)

```
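Before packing, you may want to check that the front matter parses into the keys you expect. This is a minimal, stdlib-only sketch: `parse_front_matter` is a hypothetical helper, not part of the SDK, and it only understands the `Key: value` and `- item` forms used in the example above. For arbitrary YAML, use a real parser such as PyYAML's `yaml.safe_load`.

```python
def parse_front_matter(text: str) -> dict:
    """Parse the `---`-delimited block at the top of a README.md.

    Only `Key: value` pairs and `Key:` followed by `- item` lines are
    supported, which covers the example above. Not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README.md must start with a '---' line")
    # index of the closing '---' line
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("front matter has no closing '---' line")

    meta: dict = {}
    list_key = None  # key whose list items are currently being collected
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            if list_key is None:
                raise ValueError(f"list item without a key: {stripped!r}")
            meta[list_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            list_key = None
        else:
            # an empty value starts a list such as `Tags:`
            meta[key] = []
            list_key = key
    return meta


if __name__ == "__main__":
    with open("README.md", encoding="utf-8") as f:
        print(parse_front_matter(f.read()))
```

For the example README above, this yields a `Task` string and a `Tags` list of three entries.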

2. Then put the 3 proto definition files inside the `./{model_name}/{version}` folder. You can find them [here](https://github.com/instill-ai/model-backend/tree/main/assets/ray/proto); we are working to remove this step in the future.
3. Now `git clone` the model weights and dependencies from Hugging Face with Git LFS, from inside the `./{model_name}/{version}` folder:
```bash
git lfs install
git clone https://huggingface.co/stabilityai/sd-turbo sd_turbo
```
4. Next, write the model file. With the help of the SDK, this is quite similar to what you would write when developing in your local environment.

```python
# import necessary packages
import ray
from ray import serve
from diffusers import DiffusionPipeline

# import SDK helper functions
# const package hosts the standard data types and the input class for each standard Instill AI task
from instill.helpers.const import DataType, TextToImageInput
# ray_io package hosts the parsers that convert the request payload into input parameters, and model outputs into the response
from instill.helpers.ray_io import StandardTaskIO
# ray_config package hosts the config for the model resources
from instill.helpers.ray_config import (
    InstillRayModelConfig,
    entry,
)
# ray_pb2 holds the generated proto definitions of the gRPC request/response
from ray_pb2 import (
    ModelReadyRequest,
    ModelReadyResponse,
    ModelMetadataRequest,
    ModelMetadataResponse,
    ModelInferRequest,
    ModelInferResponse,
    InferTensor,
)

# use Ray Serve's deployment decorator to turn the model class into a servable deployment
@serve.deployment()
class SDTurbo:

    # within __init__, set up the model instance with the desired framework, in this
    # case the pipeline from diffusers
    def __init__(self, model_path: str):
        self.pipeline = DiffusionPipeline.from_pretrained(model_path)

    # ModelMetadata tells the server what inputs the model expects and what outputs it returns
    # It is the same for every model of a given AI task
    def ModelMetadata(self, req: ModelMetadataRequest) -> ModelMetadataResponse:
        resp = ModelMetadataResponse(
            name=req.name,
            versions=req.version,
            framework="python",
            inputs=[
                ModelMetadataResponse.TensorMetadata(
                    name="prompt",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[1],
                ),
                ModelMetadataResponse.TensorMetadata(
                    name="negative_prompt",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[1],
                ),
                ModelMetadataResponse.TensorMetadata(
                    name="samples",
                    datatype=str(DataType.TYPE_UINT32.name),
                    shape=[1],
                ),
                ModelMetadataResponse.TensorMetadata(
                    name="scheduler",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[-1],
                ),
                ModelMetadataResponse.TensorMetadata(
                    name="steps",
                    datatype=str(DataType.TYPE_UINT32.name),
                    shape=[1],
                ),
                ModelMetadataResponse.TensorMetadata(
                    name="guidance_scale",
                    datatype=str(DataType.TYPE_FP32.name),
                    shape=[1],
                ),
                ModelMetadataResponse.TensorMetadata(
                    name="seed",
                    datatype=str(DataType.TYPE_UINT64.name),
                    shape=[1],
                ),
                ModelMetadataResponse.TensorMetadata(
                    name="extra_params",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[1],
                ),
            ],
            outputs=[
                ModelMetadataResponse.TensorMetadata(
                    name="images",
                    datatype=str(DataType.TYPE_FP32.name),
                    shape=[-1, -1, -1, -1],
                ),
            ],
        )
        return resp

    # ModelReady is the health-check method for the server
    # implement your own readiness logic here; the result is reflected on the console
    def ModelReady(self, req: ModelReadyRequest) -> ModelReadyResponse:
        resp = ModelReadyResponse(ready=True)
        return resp

    # ModelInfer handles the trigger requests from Instill Model
    async def ModelInfer(self, request: ModelInferRequest) -> ModelInferResponse:
        # prepare the response
        resp = ModelInferResponse(
            model_name=request.model_name,
            model_version=request.model_version,
            outputs=[],
            raw_output_contents=[],
        )

        # use StandardTaskIO to parse the request and get the corresponding input
        # for the text-to-image task
        task_text_to_image_input: TextToImageInput = (
            StandardTaskIO.parse_task_text_to_image_input(request=request)
        )

        # inference (the diffusion pipeline takes the prompt directly, no chat template needed)
        image = self.pipeline(
            prompt=task_text_to_image_input.prompt,
            negative_prompt=task_text_to_image_input.negative_prompt,
            num_images_per_prompt=task_text_to_image_input.samples,
            guidance_scale=task_text_to_image_input.guidance_scale,
            num_inference_steps=task_text_to_image_input.steps,
            **task_text_to_image_input.extra_params,
        ).images[0]

        # convert the output into the response output, again with StandardTaskIO
        task_text_to_image_output = StandardTaskIO.parse_task_text_to_image_output(
            image=image
        )

        # specify the output tensor; its name, datatype, and shape must match
        # the "images" output declared in ModelMetadata
        resp.outputs.append(
            InferTensor(
                name="images",
                shape=[-1, -1, -1, -1],
                datatype=str(DataType.TYPE_FP32.name),
            )
        )

        # finally, insert the output into the response
        resp.raw_output_contents.append(task_text_to_image_output)

        return resp

# a module-level deploy_model is necessary for the server to trigger the deployment
# nothing here normally needs to change; we are working on removing
# the need for this function as well
def deploy_model(model_config: InstillRayModelConfig):
    c_app = SDTurbo.options(
        name=model_config.application_name,
        ray_actor_options=model_config.ray_actor_options,
        max_concurrent_queries=model_config.max_concurrent_queries,
        autoscaling_config=model_config.ray_autoscaling_options,
    ).bind(model_config.model_path)

    serve.run(
        c_app, name=model_config.model_name, route_prefix=model_config.route_prefix
    )

# a module-level undeploy_model is necessary for the server to trigger the undeployment
def undeploy_model(model_name: str):
    serve.delete(model_name)


if __name__ == "__main__":
    # the value passed into entry() needs to match the dependency folder/file name
    func, model_config = entry("sd_turbo")

    ray.init(address=model_config.ray_addr)

    # set up how many resources the model needs;
    # this heavily depends on your machine,
    # see `ray_actor_options` in the Ray Serve docs
    model_config.ray_actor_options["num_cpus"] = 6

    if func == "deploy":
        deploy_model(model_config=model_config)
    elif func == "undeploy":
        undeploy_model(model_name=model_config.model_name)
```

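The tensor appended in `ModelInfer` should agree with the `outputs` declared in `ModelMetadata`: same name, same datatype string, and a compatible shape. Below is a minimal, stdlib-only sketch of such a consistency check that you could run as a unit test before zipping. `DataType` here is a local stand-in for the SDK enum (assumed to be a plain `Enum`), and `TensorSpec`, `shape_compatible`, and `output_mismatches` are hypothetical helpers, not SDK APIs. It also assumes `-1` marks a variable-size dimension, as in the KServe v2 / Triton convention.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """Hypothetical stand-in for instill.helpers.const.DataType (assumed plain Enum)."""

    TYPE_UINT32 = 1
    TYPE_UINT64 = 2
    TYPE_FP32 = 3
    TYPE_STRING = 4


@dataclass(frozen=True)
class TensorSpec:
    """Name, datatype string, and shape of one tensor, as in TensorMetadata/InferTensor."""

    name: str
    datatype: str
    shape: tuple


def shape_compatible(declared: tuple, actual: tuple) -> bool:
    """Treat -1 in the declared shape as 'any size'; other dims must match exactly."""
    return len(declared) == len(actual) and all(
        d == -1 or d == a for d, a in zip(declared, actual)
    )


def output_mismatches(declared: list, produced: list) -> list:
    """Describe every way the ModelInfer outputs disagree with ModelMetadata."""
    by_name = {spec.name: spec for spec in declared}
    problems = []
    for out in produced:
        spec = by_name.get(out.name)
        if spec is None:
            problems.append(f"{out.name!r} is not declared in ModelMetadata")
            continue
        if out.datatype != spec.datatype:
            problems.append(f"{out.name!r}: datatype {out.datatype!r} != {spec.datatype!r}")
        if not shape_compatible(spec.shape, out.shape):
            problems.append(f"{out.name!r}: shape {out.shape} does not fit {spec.shape}")
    return problems


if __name__ == "__main__":
    declared = [TensorSpec("images", DataType.TYPE_FP32.name, (-1, -1, -1, -1))]
    # str() on a plain Enum member includes the class name, unlike .name
    via_str = TensorSpec("images", str(DataType.TYPE_FP32), (-1, -1, -1, -1))
    wrong_name = TensorSpec("text", DataType.TYPE_FP32.name, (1, 1, -1, -1))
    matching = TensorSpec("images", DataType.TYPE_FP32.name, (-1, -1, -1, -1))
    for produced in (via_str, wrong_name, matching):
        print(output_mismatches(declared, [produced]))
```

The `via_str` case shows why the model file builds datatype strings with `.name`: for a plain `Enum`, `str()` returns `DataType.TYPE_FP32` rather than `TYPE_FP32`.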
5. Finally, pack it up and serve it on `Instill Model`! From the repository root, simply run:
```bash
zip -r "sd-turbo.zip" .
```
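If the `zip` CLI is not available, Python's standard `zipfile` module can build an equivalent archive. This is a minimal sketch: `pack_model` is a hypothetical helper, and skipping the `.git` folder is an assumption of this sketch (the `zip -r` command above includes it).

```python
import os
import zipfile
from pathlib import Path


def pack_model(repo_root: str, archive_path: str, skip_dirs=(".git",)) -> list:
    """Zip everything under repo_root, similar to `zip -r sd-turbo.zip .`.

    Directories named in skip_dirs are left out, and so is the archive itself
    when it is written inside repo_root. Returns the archived member names.
    """
    root = Path(repo_root).resolve()
    archive = Path(archive_path).resolve()
    members = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            # prune skipped directories in place so os.walk does not descend into them
            dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
            for filename in sorted(filenames):
                path = Path(dirpath) / filename
                if path.resolve() == archive:
                    continue  # never add the archive to itself
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                members.append(arcname)
    return members


if __name__ == "__main__":
    for name in pack_model(".", "sd-turbo.zip"):
        print(name)
```

Member names are relative to the repository root, so `README.md` and the `sd_turbo/` folder sit at the top of the archive, as with the `zip` command.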
Alternatively, if you have a Git LFS server or a DVC remote set up somewhere, you can push the files, along with the `.dvc` or LFS pointer files, to GitHub and use our GitHub import.

Now go to the `Model Hub` page on the Instill Console, create a model from a local file using this zip, and profit!