This SDK provides helper functions that let you create and deploy custom models with ease.

Let's say we want to serve [TinyLlama](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) with [Instill Model](https://github.com/instill-ai/model).

1. First, we need to create a file structure like the following:

```bash
.                    <=== your model folder
├── README.md        <=== your model README
├── model.py         <=== your model file
└── tinyllama        <=== model weights and dependency folder cloned from Hugging Face (remember to follow the LICENSE of each model)
```
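Before packaging, it can help to confirm that the folder matches this layout. The sketch below is a hypothetical helper (not part of the SDK) that assumes the weights folder is named `tinyllama`, as in the tree above, and reports which expected entries are missing.

```python
from pathlib import Path


def missing_entries(model_dir, weights_folder="tinyllama"):
    """Return the names of expected entries missing from model_dir (hypothetical check)."""
    root = Path(model_dir)
    # expected layout, taken from the tree above
    expected = {
        "README.md": root / "README.md",
        "model.py": root / "model.py",
        weights_folder: root / weights_folder,
    }
    missing = []
    for name, path in expected.items():
        # README.md and model.py must be files; the weights entry must be a directory
        ok = path.is_dir() if name == weights_folder else path.is_file()
        if not ok:
            missing.append(name)
    return missing
```

Running `missing_entries(".")` from the model folder should return an empty list once everything is in place.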

In `README.md`, put the model metadata between the `---` delimiters, followed by a brief introduction below them. For example:
```markdown
---
Task: TextGenerationChat
Tags:
  - TextGenerationChat
  - TinyLlama-1.1B-Chat
---

Learn more about it [here](https://www.instill.tech/docs/latest/model/prepare#model-card-metadata)

# Model-TinyLlama-1.1b-chat-dvc

🔥🔥🔥 Deploy [TinyLlama-1.1B-Chat](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) model.
```
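To sanity-check the metadata block, here is a minimal stdlib-only sketch that parses the `---` section. It is a hypothetical helper that only understands the simple `Key: value` and `  - item` forms used above; a real YAML parser (for example PyYAML) would be more robust.

```python
def parse_front_matter(readme_text):
    """Parse the simple `---`-delimited metadata block at the top of a README."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README must start with a '---' line")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("missing closing '---' line") from None

    meta, current_key = {}, None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # list item belonging to the most recent key with an empty value, e.g. `Tags:`
            if current_key is None:
                raise ValueError(f"list item without a list key: {line!r}")
            meta[current_key].append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value
                current_key = None
            else:
                meta[key] = []
                current_key = key
    return meta


README = """---
Task: TextGenerationChat
Tags:
  - TextGenerationChat
  - TinyLlama-1.1B-Chat
---

# Model-TinyLlama-1.1b-chat-dvc
"""

print(parse_front_matter(README))
# {'Task': 'TextGenerationChat', 'Tags': ['TextGenerationChat', 'TinyLlama-1.1B-Chat']}
```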

2. Now we can `git clone` the model weights and dependencies from Hugging Face using Git LFS:
```bash
git lfs install
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 $PROJECT_ROOT/{modelname}/{version}/tinyllama
```
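If `git lfs install` was skipped, the clone may contain small Git LFS pointer files instead of the actual weights. A pointer file is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`. The hypothetical helper below (not part of the SDK) scans the cloned folder for such pointers; it assumes the weights live in `tinyllama`.

```python
from pathlib import Path

LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(folder):
    """Return paths (relative to folder) of files that are still Git LFS pointers."""
    root = Path(folder)
    pointers = []
    for path in sorted(root.rglob("*")):
        # skip git's own metadata and anything that is not a regular file
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        # only the first bytes are needed to recognise a pointer file
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_HEADER))
        if head == LFS_POINTER_HEADER:
            pointers.append(path.relative_to(root).as_posix())
    return pointers
```

If `find_lfs_pointers("tinyllama")` returns anything, running `git lfs pull` inside `tinyllama` should fetch the real files.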
3. Next, we write the model file, `model.py`. With the help of the SDK, this is much like writing inference code in your local environment.

```python
# import necessary packages
import torch
from transformers import pipeline

# import SDK helper functions
# const package hosts the standard data types and Input class for each standard Instill AI task
from instill.helpers.const import DataType, TextGenerationChatInput

# ray_io package hosts the parsers that convert the request payload into input parameters,
# and the model outputs into a response
from instill.helpers.ray_io import StandardTaskIO

# ray_config package hosts the decorator and deployment object for the model class
from instill.helpers.ray_config import instill_deployment, InstillDeployable
from instill.helpers import (
    construct_infer_response,
    construct_metadata_response,
    Metadata,
)


# use the instill_deployment decorator to convert the model class into a servable model
@instill_deployment
class TinyLlama:
    # within __init__, set up the model instance with the desired framework; in this
    # case, a transformers text-generation pipeline loaded from the local "tinyllama" folder
    def __init__(self):
        self.pipeline = pipeline(
            "text-generation",
            model="tinyllama",
            torch_dtype=torch.bfloat16,
            device_map="auto",
        )

    # ModelMetadata tells the server what inputs and outputs the model expects
    def ModelMetadata(self, req):
        resp = construct_metadata_response(
            req=req,
            inputs=[
                Metadata(
                    name="prompt",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[1],
                ),
                Metadata(
                    name="prompt_images",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[1],
                ),
                Metadata(
                    name="chat_history",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[1],
                ),
                Metadata(
                    name="system_message",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[1],
                ),
                Metadata(
                    name="max_new_tokens",
                    datatype=str(DataType.TYPE_UINT32.name),
                    shape=[1],
                ),
                Metadata(
                    name="temperature",
                    datatype=str(DataType.TYPE_FP32.name),
                    shape=[1],
                ),
                Metadata(
                    name="top_k",
                    datatype=str(DataType.TYPE_UINT32.name),
                    shape=[1],
                ),
                Metadata(
                    name="seed",
                    datatype=str(DataType.TYPE_UINT64.name),
                    shape=[1],
                ),
                Metadata(
                    name="extra_params",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[1],
                ),
            ],
            outputs=[
                Metadata(
                    name="text",
                    datatype=str(DataType.TYPE_STRING.name),
                    shape=[-1, -1],
                ),
            ],
        )
        return resp

    # __call__ handles the trigger (ModelInfer) request from Instill Model
    async def __call__(self, request):
        resp_outputs = []
        resp_raw_outputs = []

        # use StandardTaskIO to parse the request and get the corresponding input
        # for the text-generation-chat task
        task_text_generation_chat_input: TextGenerationChatInput = (
            StandardTaskIO.parse_task_text_generation_chat_input(request=request)
        )

        # prepare the prompt with the model's chat template
        prompt = self.pipeline.tokenizer.apply_chat_template(
            task_text_generation_chat_input.chat_history,
            tokenize=False,
            add_generation_prompt=True,
        )

        # inference
        sequences = self.pipeline(
            prompt,
            max_new_tokens=task_text_generation_chat_input.max_new_tokens,
            do_sample=True,
            temperature=task_text_generation_chat_input.temperature,
            top_k=task_text_generation_chat_input.top_k,
            top_p=0.95,
        )

        # convert the pipeline output into the response format, again with StandardTaskIO
        task_text_generation_chat_output = (
            StandardTaskIO.parse_task_text_generation_chat_output(sequences=sequences)
        )

        # specify the output name, shape, and datatype
        resp_outputs.append(
            Metadata(
                name="text",
                shape=[1, len(sequences)],
                datatype=str(DataType.TYPE_STRING),
            )
        )
        # finally, insert the output into the response
        resp_raw_outputs.append(task_text_generation_chat_output)

        return construct_infer_response(
            req=request,
            outputs=resp_outputs,
            raw_outputs=resp_raw_outputs,
        )


# now simply declare a global deployable instance with the model weight or model file name,
# and specify whether this model is going to use a GPU
deployable = InstillDeployable(
    TinyLlama, model_weight_or_folder_name="tinyllama", use_gpu=True
)

# you can also control the min/max number of replicas
deployable.update_max_replicas(2)
deployable.update_min_replicas(0)

# we plan to open up more detailed resource control in the future
```

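The inference call in `model.py` forwards `max_new_tokens`, `temperature`, and `top_k` from the request and fixes `do_sample=True` and `top_p=0.95`. If you want to validate these values or fill in missing ones before calling the pipeline, a minimal sketch follows. The helper name and the fallback values in `DEFAULTS` are hypothetical assumptions; the tutorial does not say what `StandardTaskIO` returns for omitted fields.

```python
# Hypothetical fallbacks: assumptions, not values taken from the SDK.
DEFAULTS = {"max_new_tokens": 256, "temperature": 0.7, "top_k": 50}


def generation_kwargs(max_new_tokens=None, temperature=None, top_k=None):
    """Build keyword arguments for the transformers pipeline call used in model.py."""
    kwargs = {
        "max_new_tokens": DEFAULTS["max_new_tokens"] if max_new_tokens is None else int(max_new_tokens),
        "do_sample": True,
        "temperature": DEFAULTS["temperature"] if temperature is None else float(temperature),
        "top_k": DEFAULTS["top_k"] if top_k is None else int(top_k),
        "top_p": 0.95,  # fixed in the model file above
    }
    # sampling needs a strictly positive temperature
    if kwargs["temperature"] <= 0:
        raise ValueError("temperature must be > 0 when do_sample=True")
    if kwargs["max_new_tokens"] < 1:
        raise ValueError("max_new_tokens must be >= 1")
    return kwargs
```

Inside `__call__`, this would be used as `self.pipeline(prompt, **generation_kwargs(inp.max_new_tokens, inp.temperature, inp.top_k))`, where `inp` is the parsed `TextGenerationChatInput`.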
4. Finally, we can pack it all up and serve it on `Instill Core`! From the model folder, simply run:
```bash
zip -r "tiny-llama.zip" .
```
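Note that `zip -r` also archives any `.git` folder, and a Git LFS clone typically keeps a second copy of the weights under `.git/lfs/objects`, which can roughly double the archive size. As an alternative to the shell command, here is a minimal sketch using Python's standard `zipfile` module that skips `.git` directories. The helper name and the exclusion rule are our own choices, not something the Instill tooling requires.

```python
import os
import zipfile


def zip_model_folder(src_dir, zip_path, skip_dirs=(".git",)):
    """Zip src_dir into zip_path, skipping VCS folders and the archive itself."""
    src_dir = os.path.abspath(src_dir)
    zip_path = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            # prune skipped folders in place so os.walk does not descend into them
            dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                if full == zip_path:
                    continue  # do not add the archive to itself
                # store paths relative to the model folder, like `zip -r x.zip .`
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path
```

From the model folder, `zip_model_folder(".", "tiny-llama.zip")` produces an archive with `README.md`, `model.py`, and `tinyllama/` at its top level.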
Alternatively, if you have a Git LFS server or a DVC bucket set up somewhere, you can push the files, along with the `.dvc` or LFS pointer files, to GitHub and use our GitHub import.

Now go to the `Model` page in the Instill Console, create a model from a local file using this zip, and profit!

Here is a sample request to this model and its response:

_req:_
```bash
curl --location 'http://localhost:8080/model/v1alpha/users/admin/models/tinyllama/trigger' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer instill_sk_***' \
--data '{
    "task_inputs": [
        {
            "text_generation_chat": {
                "conversation": [
                    {
                        "role": "user",
                        "content": "is it unhealthy to stay up late?"
                    }
                ],
                "top_k": 5,
                "temperature": 0.7
            }
        }
    ]
}'
```
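The same request can be sent from Python. This sketch uses only the standard library (`urllib.request`); the endpoint, model path, and payload shape are copied from the `curl` example above, `instill_sk_***` remains a placeholder for your own API token, and the 120-second timeout is an arbitrary choice.

```python
import json
import urllib.request

TRIGGER_URL = "http://localhost:8080/model/v1alpha/users/admin/models/tinyllama/trigger"


def build_payload(user_message, top_k=5, temperature=0.7):
    """Build the text_generation_chat payload used in the curl example."""
    return {
        "task_inputs": [
            {
                "text_generation_chat": {
                    "conversation": [{"role": "user", "content": user_message}],
                    "top_k": top_k,
                    "temperature": temperature,
                }
            }
        ]
    }


def trigger(payload, api_token, url=TRIGGER_URL, timeout=120):
    """POST the payload to the trigger endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example: `trigger(build_payload("is it unhealthy to stay up late?"), api_token="instill_sk_***")`.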
_resp:_
```json
{
    "task": "TASK_TEXT_GENERATION_CHAT",
    "task_outputs": [
        {
            "text_generation": {
                "text": "<|user|>\nis it unhealthy to stay up late?</s>\n<|assistant|>\nYes, staying up late can be unhealthy. Longer hours of sleep are important for good health and well-being. The body needs time to rest and recover after a long day, and excessive sleep can lead to a range of health problems, including insomnia, obesity, and heart disease. It's essential to set a regular sleep schedule, limit screen time before bedtime, and get enough sleep to avoid sleep-related health issues."
            }
        }
    ]
}
```
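As the sample shows, the returned `text` contains the full templated prompt followed by the reply, because the transformers text-generation pipeline returns the prompt plus the generated continuation by default. Passing `return_full_text=False` to the pipeline call in `model.py` should return only the continuation. Alternatively, a client can strip the prompt itself; the sketch below assumes the reply follows the last `<|assistant|>` marker, as in the sample response, and that the response layout matches the sample.

```python
ASSISTANT_TAG = "<|assistant|>"


def assistant_reply(response):
    """Extract the assistant's reply from a trigger response shaped like the sample."""
    text = response["task_outputs"][0]["text_generation"]["text"]
    # keep only what follows the last assistant marker; if there is none, keep the whole text
    return text.rpartition(ASSISTANT_TAG)[2].strip()
```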