# Triton QuickStart

This notebook walks through the full cycle: training a model, converting it, and running inference on Triton.




#### What is NVIDIA Triton?
Triton Inference Server streamlines AI inference. It lets teams deploy, run, and scale trained AI models from any framework on any GPU- or CPU-based infrastructure. This gives AI researchers and data scientists the freedom to choose the right framework for their projects without affecting production deployment.


## Installing dependencies

In [1]:
!pip list | grep torch

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
torch                     1.7.1+cu110
torchaudio                0.7.2
torchvision               0.8.2+cu110
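The `huggingface/tokenizers` warning above is printed whenever the notebook process forks (for example, for each `!` shell command) after a fast tokenizer has already run in parallel. As the message itself suggests, setting `TOKENIZERS_PARALLELISM` explicitly silences it. A minimal sketch, assuming it runs at the top of the notebook before any tokenizer is created:

```python
import os

# Set before the first tokenizer is created. "false" disables the tokenizer's
# internal parallelism, so later forks (e.g. `!` shell commands) do not warn.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```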


In [2]:
!pip install transformers==4.21.2
!pip install torch #==1.7.1 
!pip install datasets==2.4.0



In [3]:
import transformers, torch, datasets
print("transformers", transformers.__version__)
print("torch", torch.__version__)
print("datasets", datasets.__version__)

transformers 4.21.2
torch 1.7.1+cu110
datasets 2.4.0


## Dataset

This example uses the [emotion](https://huggingface.co/datasets/emotion) dataset. It contains Twitter messages labeled with six emotions: sadness (0), joy (1), love (2), anger (3), fear (4), surprise (5).

In [4]:
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

dataset = load_dataset("emotion")

2022-09-01 09:30:27.311060: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Using custom data configuration default
Reusing dataset emotion (/home/jovyan/.cache/huggingface/datasets/emotion/default/0.0.0/348f63ca8e27b3713b6c04d723efe6d824a56fb3d1449794716c0f0296072705)



## Preprocessing

This step preprocesses the text messages, i.e. converts each text into a vector of token IDs.
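Conceptually, the preprocessing does two things: `preprocess_function` maps each text to token IDs (truncating long texts), and `DataCollatorWithPadding` later pads each training batch to its longest sequence and builds an attention mask that marks real tokens with 1 and padding with 0. A toy sketch of that idea, assuming a hypothetical six-word vocabulary and whitespace splitting instead of DistilBERT's WordPiece tokenizer (it also omits special tokens such as `[CLS]` and `[SEP]`):

```python
from typing import Dict, List

# Hypothetical toy vocabulary; the real tokenizer uses DistilBERT's ~30k-entry vocab.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "i": 2, "am": 3, "so": 4, "happy": 5, "sad": 6}


def encode(text: str) -> List[int]:
    """Map each lowercased, whitespace-separated word to its ID ([UNK] if unknown)."""
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


def pad_batch(batch: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Pad sequences to the longest one in the batch and build attention masks."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [VOCAB["[PAD]"]] * (max_len - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([encode("I am so happy"), encode("I am sad")])
print(batch)
# {'input_ids': [[2, 3, 4, 5], [2, 3, 6, 0]], 'attention_mask': [[1, 1, 1, 1], [1, 1, 1, 0]]}
```

Padding per batch keeps sequences short during training; for serving, the notebook later pads every input to a fixed length of 512 instead.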

In [5]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_dataset = dataset.map(preprocess_function, batched=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)


## Training

In [6]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer


model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=6)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_transform.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_clas

In [7]:
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)



In [8]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 16000
  Num Epochs = 5
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 5000




| Step | Training Loss |
|------|---------------|
| 500  | 0.7181 |
| 1000 | 0.248  |
| 1500 | 0.1671 |
| 2000 | 0.1643 |
| 2500 | 0.1139 |
| 3000 | 0.1114 |
| 3500 | 0.0839 |
| 4000 | 0.083  |
| 4500 | 0.0618 |
| 5000 | 0.0572 |


Saving model checkpoint to ./results/checkpoint-500
Configuration saved in ./results/checkpoint-500/config.json
Model weights saved in ./results/checkpoint-500/pytorch_model.bin
tokenizer config file saved in ./results/checkpoint-500/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-500/special_tokens_map.json
Saving model checkpoint to ./results/checkpoint-1000
Configuration saved in ./results/checkpoint-1000/config.json
Model weights saved in ./results/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in ./results/checkpoint-1000/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-1000/special_tokens_map.json
Saving model checkpoint to ./results/checkpoint-1500
Configuration saved in ./results/checkpoint-1500/config.json
Model weights saved in ./results/checkpoint-1500/pytorch_model.bin
tokenizer config file saved in ./results/checkpoint-1500/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-1500/special_toke

TrainOutput(global_step=5000, training_loss=0.1808741497039795, metrics={'train_runtime': 155.5923, 'train_samples_per_second': 514.164, 'train_steps_per_second': 32.135, 'total_flos': 973613755907712.0, 'train_loss': 0.1808741497039795, 'epoch': 5.0})
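The step count in the log follows from the training arguments: 16,000 examples at a batch size of 16 on one device give 1,000 optimizer steps per epoch, so 5 epochs give 5,000 steps. With the checkpoint interval of 500 steps seen in the log (the `Trainer` default `save_steps=500`), the last checkpoint is `checkpoint-5000`, which is the one loaded below. A simplified sketch of the arithmetic (it ignores `max_steps` and multi-device setups):

```python
# Values from TrainingArguments and the "Running training" log above
num_examples = 16_000   # Num examples
batch_size = 16         # per_device_train_batch_size, single device
grad_accum = 1          # gradient_accumulation_steps (default)
num_epochs = 5          # num_train_epochs
save_steps = 500        # checkpoint interval seen in the log

batches_per_epoch = -(-num_examples // batch_size)   # ceiling division
steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = steps_per_epoch * num_epochs
checkpoints = [f"checkpoint-{s}" for s in range(save_steps, total_steps + 1, save_steps)]

print(steps_per_epoch, total_steps, checkpoints[-1])  # 1000 5000 checkpoint-5000
```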

## Inference

For convenience at inference time, you can attach human-readable class names to the model config using the `label2id` and `id2label` dictionaries. Predictions can then be reported as class names instead of numeric IDs.
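Instead of typing both dictionaries by hand, they can be derived from a single list of class names in label-ID order (taken from the dataset description above), which keeps the two mappings consistent. A minimal sketch:

```python
# Class names in label-ID order, as listed in the dataset description:
# sadness (0), joy (1), love (2), anger (3), fear (4), surprise (5).
labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]

id2label = {idx: name for idx, name in enumerate(labels)}
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[1], label2id["fear"])  # joy 4
```

The resulting dictionaries can be passed to `AutoConfig.from_pretrained(..., label2id=label2id, id2label=id2label)` in the same way as the hand-written ones. If the dataset stores labels as a `ClassLabel` feature (assumed here, not shown in the original), `dataset["train"].features["label"].names` should return the same list.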

In [10]:
from transformers import AutoConfig, AutoModelForSequenceClassification

label2id = {
    "sadness": 0,
    "joy": 1,
    "love": 2,
    "anger": 3,
    "fear": 4,
    "surprise": 5
  }
id2label = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise"
  }
model_ckpt = "./results/checkpoint-5000"
config = AutoConfig.from_pretrained(model_ckpt, label2id=label2id, id2label=id2label)


loading configuration file ./results/checkpoint-5000/config.json
Model config DistilBertConfig {
  "_name_or_path": "./results/checkpoint-5000",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "sadness",
    "1": "joy",
    "2": "love",
    "3": "anger",
    "4": "fear",
    "5": "surprise"
  },
  "initializer_range": 0.02,
  "label2id": {
    "anger": 3,
    "fear": 4,
    "joy": 1,
    "love": 2,
    "sadness": 0,
    "surprise": 5
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "single_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.21.2",
  "vocab_size": 30522
}



In [11]:
from transformers import DistilBertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("./results/checkpoint-5000")
model = AutoModelForSequenceClassification.from_pretrained("./results/checkpoint-5000", config=config)

Didn't find file ./results/checkpoint-5000/added_tokens.json. We won't load it.
loading file ./results/checkpoint-5000/vocab.txt
loading file ./results/checkpoint-5000/tokenizer.json
loading file None
loading file ./results/checkpoint-5000/special_tokens_map.json
loading file ./results/checkpoint-5000/tokenizer_config.json
loading weights file ./results/checkpoint-5000/pytorch_model.bin
All model checkpoint weights were used when initializing DistilBertForSequenceClassification.

All the weights of DistilBertForSequenceClassification were initialized from the model checkpoint at ./results/checkpoint-5000.
If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertForSequenceClassification for predictions without further training.


In [12]:
text = "I am incredibly happy to start using Triton on ML-Space from Cloud.ru"

tensor = tokenizer(text, padding="max_length",  truncation=True, max_length=512, return_tensors="pt")

In [13]:
print("Example output", model(**tensor))

Example output SequenceClassifierOutput(loss=None, logits=tensor([[-1.9722,  7.3201, -2.4923, -1.7046, -2.8764, -1.7970]],
       grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)


In [14]:
logits = model(**tensor).logits
predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]

'joy'
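`argmax` gives only the winning class. To get a confidence score per class, apply softmax to the logits. A sketch using the logits printed above, copied by hand and therefore rounded to four decimals:

```python
import numpy as np

id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

# Logits copied from the model output above (rounded to 4 decimals)
example_logits = np.array([-1.9722, 7.3201, -2.4923, -1.7046, -2.8764, -1.7970])


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtracting the max does not change the result."""
    shifted = np.exp(x - x.max())
    return shifted / shifted.sum()


probs = softmax(example_logits)
scores = {id2label[i]: round(float(p), 4) for i, p in enumerate(probs)}
top = id2label[int(probs.argmax())]
print(top, scores[top])
```

With the PyTorch tensor itself, `logits.softmax(dim=-1)` computes the same probabilities.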

In [16]:
del config
del model
del tokenizer

## Preparing the model for inference on Triton


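To serve the model with Triton's PyTorch backend, the PyTorch model has to be converted to TorchScript. The conversion below uses tracing, which needs example input tensors (here `input_ids` and `attention_mask`); the traced model's output can then be compared with the original model's output to check the conversion.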
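To illustrate what tracing does without the full DistilBERT checkpoint, here is a minimal sketch with a hypothetical toy classifier that has the same call signature as the wrapper used in this section (`input_ids`, `attention_mask` in, 6 logits out). `torch.jit.trace` runs `forward` once on the example inputs and records the executed operations; control flow and some shapes are fixed to what was seen during tracing, which is likely one reason the notebook pads every input to a fixed length of 512. The sketch also saves and reloads the traced module, as Triton does with `model.pt`:

```python
import os
import tempfile

import torch


class ToyClassifier(torch.nn.Module):
    """Hypothetical stand-in for the wrapped DistilBERT: same inputs, 6 logits out."""

    def __init__(self, vocab_size: int = 100, dim: int = 8, num_labels: int = 6):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.head = torch.nn.Linear(dim, num_labels)

    def forward(self, input_ids, attention_mask):
        # Mean-pool token embeddings over positions where attention_mask == 1
        mask = attention_mask.unsqueeze(-1).to(torch.float32)   # [batch, seq, 1]
        summed = (self.embed(input_ids) * mask).sum(dim=1)      # [batch, dim]
        pooled = summed / mask.sum(dim=1).clamp(min=1.0)        # [batch, dim]
        return self.head(pooled)                                # [batch, num_labels]


torch.manual_seed(0)
toy_model = ToyClassifier().eval()

# Example inputs with the same fixed shape as in the notebook: [1, 512]
input_ids = torch.randint(1, 100, (1, 512))
attention_mask = torch.ones(1, 512, dtype=torch.long)

# Tracing runs forward() once and records the executed tensor operations
traced = torch.jit.trace(toy_model, (input_ids, attention_mask))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    traced.save(path)                      # a TorchScript file, like model.pt for Triton
    reloaded = torch.jit.load(path)
    with torch.no_grad():
        eager_out = toy_model(input_ids, attention_mask)
        reloaded_out = reloaded(input_ids, attention_mask)

print(torch.allclose(eager_out, reloaded_out))  # True
```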

In [17]:
tokenizer = AutoTokenizer.from_pretrained("./results/checkpoint-5000")

tensors = tokenizer(text, padding="max_length",  truncation=True, return_tensors='pt', max_length=512)
example_inputs = tensors['input_ids'], tensors['attention_mask']

Didn't find file ./results/checkpoint-5000/added_tokens.json. We won't load it.
loading file ./results/checkpoint-5000/vocab.txt
loading file ./results/checkpoint-5000/tokenizer.json
loading file None
loading file ./results/checkpoint-5000/special_tokens_map.json
loading file ./results/checkpoint-5000/tokenizer_config.json


In [18]:
import torch

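class PyTorch_to_TorchScript(torch.nn.Module):
    """Wrapper that makes the model traceable: it takes positional tensors
    and returns only the logits tensor instead of a SequenceClassifierOutput."""

    def __init__(self):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained("./results/checkpoint-5000")

    def forward(self, data, attention_mask=None):
        # data: input_ids [batch, seq_len]; attention_mask: same shape
        return self.model(data, attention_mask)["logits"]  # [batch, num_labels]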

In [19]:
pt_model = PyTorch_to_TorchScript().eval()

loading configuration file ./results/checkpoint-5000/config.json
Model config DistilBertConfig {
  "_name_or_path": "./results/checkpoint-5000",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "single_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.21.2",
  "vocab_size": 30522
}

loading weights fil

In [20]:
scripted_model = torch.jit.trace(pt_model, [tensors['input_ids'], tensors['attention_mask']], strict=False)



In [21]:
scripted_model.graph

graph(%self.1 : __torch__.___torch_mangle_295.PyTorch_to_TorchScript,
      %input_ids : Long(1:512, 512:1, requires_grad=0, device=cpu),
      %300 : Long(1:512, 512:1, requires_grad=0, device=cpu)):
  %1399 : __torch__.transformers.models.distilbert.modeling_distilbert.___torch_mangle_294.DistilBertForSequenceClassification = prim::GetAttr[name="model"](%self.1)
  %1495 : Tensor = prim::CallMethod[name="forward"](%1399, %input_ids, %300)
  return (%1495)

In [23]:
outs = scripted_model(tensors['input_ids'], tensors['attention_mask'])
outs

tensor([[-1.9722,  7.3201, -2.4923, -1.7046, -2.8764, -1.7970]],
       grad_fn=<AddBackward0>)

In [25]:
import numpy as np

In [26]:
np.argmax(outs.detach().numpy()[0])

1

In [28]:
predicted_class_id = outs.argmax().item()
id2label = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise"
  }
{"class": id2label[predicted_class_id]}

{'class': 'joy'}

Before saving the model, create a model repository with the following structure:

```
model_repository_path/
|- <pytorch_model_name>/
|  |- config.pbtxt
|  |- 1/
|     |- model.pt
|
```

Here **pytorch_model_name** is the model name, **config.pbtxt** is the Triton model configuration, **1/** is the model version directory, and **model.pt** is the exported TorchScript model. In this example the directory structure looks like this:

```
Triton/Predictor/
|- distil_bert_emotion/
|  |- config.pbtxt
|  |- 1/
|     |- model.pt
|
```
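Before uploading, it can help to check that the repository matches this layout. A minimal sketch; `check_model_repository` is a hypothetical helper that checks only the layout described above (`<model_name>/config.pbtxt` and `<model_name>/<version>/model.pt`), not the contents of the files:

```python
from pathlib import Path
from typing import List


def check_model_repository(repo: Path) -> List[str]:
    """Return a list of layout problems found in a Triton-style PyTorch model repository."""
    problems = []
    model_dirs = [d for d in repo.iterdir() if d.is_dir()]
    if not model_dirs:
        problems.append(f"{repo}: no model directories")
    for model_dir in model_dirs:
        if not (model_dir / "config.pbtxt").is_file():
            problems.append(f"{model_dir.name}: missing config.pbtxt")
        # Version directories are named with integers, e.g. "1"
        versions = [d for d in model_dir.iterdir() if d.is_dir() and d.name.isdigit()]
        if not versions:
            problems.append(f"{model_dir.name}: no numeric version directory")
        for version in versions:
            if not (version / "model.pt").is_file():
                problems.append(f"{model_dir.name}/{version.name}: missing model.pt")
    return problems


# Usage, once model.pt and config.pbtxt have been written:
# print(check_model_repository(Path("Triton/Predictor")))  # [] if the layout is complete
```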

In [30]:
!mkdir -p Triton/Predictor/distil_bert_emotion/1

In [31]:
scripted_model.save('./Triton/Predictor/distil_bert_emotion/1/model.pt')

Next, describe the model for Triton.

Example **config.pbtxt**:
```
name: "distil_bert_emotion"
platform: "pytorch_libtorch"
input [
  {
    name: "input__0"
    data_type: TYPE_INT32
    dims: [1, 512]
  },
  {
    name: "input__1"
    data_type: TYPE_INT32
    dims: [1, 512]
  }
]
output {
  name: "output__0"
  data_type: TYPE_FP32
  dims: [1, 6]
}

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```

If the deployment has no GPU, replace `KIND_GPU` with `KIND_CPU`.

Here **name** is the model name, **input** describes the model's input tensors, and **output** describes its output tensor.

**input** lists the input vectors. In this example there are two of them, *input_ids* and *attention_mask*, each with shape `[1, 512]` and data type `int32`. Because `max_batch_size` is not set, `dims` includes the batch dimension. TorchScript models do not store tensor names, so the names `input__0`, `input__1` and `output__0` follow the `<name>__<index>` convention that Triton's PyTorch backend uses to map them to argument positions.

**output** describes the output vector. In this example it has shape `[1, 6]` (one logit per emotion) and type fp32.

For more details on writing **config.pbtxt**, see the [Triton](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md) documentation.
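Since the same configuration appears twice in this notebook and only the instance kind differs between GPU and CPU deployments, it can be generated from a few parameters. A minimal sketch; `render_config` is a hypothetical helper that reproduces only the fields used in this example, and it rejects misspelled kinds (for example a Cyrillic "С" typed in place of a Latin "C"):

```python
def render_config(name: str = "distil_bert_emotion",
                  seq_len: int = 512,
                  num_labels: int = 6,
                  kind: str = "KIND_GPU") -> str:
    """Build the config.pbtxt text used in this example for the given instance kind."""
    if kind not in ("KIND_GPU", "KIND_CPU"):
        raise ValueError(f"unsupported instance kind: {kind!r}")
    # Two INT32 inputs named input__0 and input__1, each [1, seq_len]
    inputs = ",\n".join(
        f'  {{\n    name: "input__{i}"\n    data_type: TYPE_INT32\n    dims: [1, {seq_len}]\n  }}'
        for i in range(2)
    )
    return (
        f'name: "{name}"\n'
        'platform: "pytorch_libtorch"\n'
        f"input [\n{inputs}\n]\n"
        "output {\n"
        '  name: "output__0"\n'
        "  data_type: TYPE_FP32\n"
        f"  dims: [1, {num_labels}]\n"
        "}\n"
        f"instance_group [\n  {{\n    count: 1\n    kind: {kind}\n  }}\n]\n"
    )


print(render_config(kind="KIND_CPU"))
```

The text can then be written to disk, e.g. `Path("Triton/Predictor/distil_bert_emotion/config.pbtxt").write_text(render_config(kind="KIND_CPU"))` with `from pathlib import Path`.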

In [32]:
%%bash

cat > Triton/Predictor/distil_bert_emotion/config.pbtxt << EOF
name: "distil_bert_emotion"
platform: "pytorch_libtorch"
input [
  {
    name: "input__0"
    data_type: TYPE_INT32
    dims: [1, 512]
  },
  {
    name: "input__1"
    data_type: TYPE_INT32
    dims: [1, 512]
  }
]
output {
  name: "output__0"
  data_type: TYPE_FP32
  dims: [1, 6]
}

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
EOF



## Transformer script

The serving script receives a request, preprocesses it, sends it to the predictor, and postprocesses the predictor's response.

Preprocessing uses AutoTokenizer, which needs to know where to load the tokenizer files from.

To provide them, create a Transformer directory with a tokenizer subdirectory:

In [33]:
!mkdir -p Triton/Transformer/tokenizer


In [34]:
!cp results/checkpoint-5000/tokenizer.json Triton/Transformer/tokenizer
!cp results/checkpoint-5000/tokenizer_config.json Triton/Transformer/tokenizer
!cp results/checkpoint-5000/vocab.txt Triton/Transformer/tokenizer



In [35]:
!ls -l Triton/Transformer/tokenizer

total 936
-rw-r--r-- 1 jovyan jovyan 711494 Sep  1 09:33 tokenizer.json
-rw-r--r-- 1 jovyan jovyan    360 Sep  1 09:33 tokenizer_config.json
-rw-r--r-- 1 jovyan jovyan 231508 Sep  1 09:33 vocab.txt


In [36]:
%%writefile Triton/Transformer/kf_serving.py
import re
import os
import argparse
import logging
from typing import Dict

import kfserving
import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

logging.basicConfig(level=logging.DEBUG)


class BertTransformer(kfserving.KFModel):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host
        # Tokenizer loaded from the files saved next to this script
        self.tokenizer = AutoTokenizer.from_pretrained('./tokenizer')
        # Model name (must match `name` in config.pbtxt)
        self.model_name = "distil_bert_emotion"
        self.triton_client = None

        # Mapping: "position in the network's output vector" -> "emotion"
        self.id2label = {
            0: "sadness",
            1: "joy",
            2: "love",
            3: "anger",
            4: "fear",
            5: "surprise",
        }

    def preprocess(self, inputs: Dict) -> Dict:
        """
            Preprocess the input data
        """
        # Tokenize the first text in the request, padded/truncated to 512 tokens
        tensors = self.tokenizer(inputs["instances"][0], padding="max_length", truncation=True, return_tensors='pt', max_length=512)

        return {"input__0": tensors['input_ids'], "input__1": tensors['attention_mask']}

    def predict(self, features: Dict) -> Dict:
        """
            Send the request to the Triton predictor
        """
        if not self.triton_client:
            self.triton_client = httpclient.InferenceServerClient(
                url=self.predictor_host, verbose=True)

        input__0 = np.array(features['input__0'], dtype=np.int32)  # convert to int32
        input__1 = np.array(features['input__1'], dtype=np.int32)  # convert to int32

        input__0 = input__0.reshape(1, 512)  # reshape to [1, 512]
        input__1 = input__1.reshape(1, 512)  # reshape to [1, 512]

        # Build the Triton request
        inputs = [httpclient.InferInput('input__0', [1, 512], "INT32"),
                  httpclient.InferInput('input__1', [1, 512], "INT32")]
        # Fill the request with data from the numpy arrays
        inputs[0].set_data_from_numpy(input__0)
        inputs[1].set_data_from_numpy(input__1)

        # Request the network output as JSON (not binary)
        outputs = [httpclient.InferRequestedOutput('output__0', binary_data=False)]
        result = self.triton_client.infer(self.model_name, inputs, outputs=outputs)
        return result.get_response()

    def postprocess(self, result: Dict) -> Dict:
        """
            Postprocess the network output
        """
        logging.info(result)
        prediction = result['outputs'][0]['data']  # flat list of 6 logits
        predicted_class_id = int(np.argmax(prediction))

        return {"predictions": self.id2label[predicted_class_id]}


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--http-port", type=int, default=8080)
    parser.add_argument("--predictor-host")
    args = parser.parse_args()

    # Use the "kfserving-<N>" part of the pod hostname as the model name, if present
    x = re.compile(r'(kfserving-\d+)').search(os.environ.get('HOSTNAME', ''))
    name = "kfserving-default"
    if x:
        name = x[0]
    model = BertTransformer(name, predictor_host=args.predictor_host)
    kfserving.KFServer(workers=1, http_port=args.http_port).start([model])
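With `binary_data=False`, `result.get_response()` returns the JSON body of Triton's HTTP/REST inference response (the KServe v2 protocol), in which each output carries its `shape` and a flattened `data` list. The sketch below reproduces the `postprocess` logic on a hand-written response of that shape, so it can be checked without a running server; the response dict is a hypothetical example that reuses the logits printed earlier in this notebook:

```python
import numpy as np

id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

# Hypothetical response in the shape Triton returns for a non-binary output
response = {
    "model_name": "distil_bert_emotion",
    "model_version": "1",
    "outputs": [
        {
            "name": "output__0",
            "datatype": "FP32",
            "shape": [1, 6],
            "data": [-1.9722, 7.3201, -2.4923, -1.7046, -2.8764, -1.7970],
        }
    ],
}


def postprocess(result: dict) -> dict:
    """Same steps as BertTransformer.postprocess: take the flat logits, pick the argmax."""
    prediction = result["outputs"][0]["data"]
    predicted_class_id = int(np.argmax(prediction))
    return {"predictions": id2label[predicted_class_id]}


print(postprocess(response))  # {'predictions': 'joy'}
```

Note that `data` is flat even though `shape` is `[1, 6]`; with a batch larger than 1 it would need `np.array(data).reshape(shape)` before taking the argmax per row.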

The serving script must be saved as `Triton/Transformer/kf_serving.py`.

kf_serving.py needs its dependencies installed in the image, so list them in `requirements.txt`.

In [37]:
%%bash
 
cat > Triton/Transformer/requirements.txt << EOF
tritonclient[all]
transformers
torch
numpy
EOF



The resulting directory structure:
```
 |-Triton
 | |-Transformer
 | | |-tokenizer
 | | | |-tokenizer.json
 | | | |-tokenizer_config.json
 | | | |-vocab.txt
 | | |-requirements.txt
 | | |-kf_serving.py
 | |-Predictor
 | | |-distil_bert_emotion
 | | | |-1
 | | | | |-model.pt
 | | | |-config.pbtxt
```

## Building the images

To build the images, upload the **Transformer** and **Predictor** folders you created to an S3 bucket. If you already have a bucket, go straight to obtaining its credentials. To create an S3 bucket, open Data Catalog -> Storage overview -> Create bucket.


<img src="img/data_storage.png" alt="drawing" width="200"/>

![data storage](img/storage.png)


After the bucket is created, get its credentials so that you can connect to it with third-party tools and upload files.


<img src="img/get_cred.png" alt="drawing" width="500"/>
<img src="img/view_cred.png" alt="drawing" width="500"/>


Once you have the credentials, copy the following values:
- S3 endpoint
- S3 bucket name
- S3 access key ID
- S3 secret access key

In [38]:
import os

import boto3
from tqdm import tqdm

# Replace the placeholders with the values copied from the bucket credentials
S3_ACCESS_KEY_ID = "USER_S3_ACCESS_KEY_ID"
S3_SECRET_ACCESS_KEY_ID = "S3_SECRET_ACCESS_KEY_ID"
BUCKET_NAME = "BUCKET_NAME"
ENDPOINT_URL = "ENDPOINT_URL"

def upload_files(bucket, path):
    """Upload every file under `path` to `bucket`, keeping paths relative to `path`."""
    session = boto3.session.Session()

    s3_client = session.client(
        service_name='s3',
        aws_access_key_id=S3_ACCESS_KEY_ID,
        aws_secret_access_key=S3_SECRET_ACCESS_KEY_ID,
        endpoint_url=ENDPOINT_URL
    )

    for subdir, dirs, files in tqdm(os.walk(path)):
        for file in files:
            full_path = os.path.join(subdir, file)
            # './Triton/Transformer/kf_serving.py' -> 'Transformer/kf_serving.py'
            key = os.path.relpath(full_path, path).replace(os.sep, "/")
            with open(full_path, 'rb') as data:
                s3_client.put_object(Bucket=bucket, Key=key, Body=data)


Upload the contents of the `Triton` directory to S3:

In [39]:
upload_files(BUCKET_NAME, './Triton')

8it [00:06,  1.31it/s]
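The image build settings below refer to folders inside the bucket (`Transformer`, `BUCKET_NAME/Predictor`), so the object keys must be relative to the local `Triton` directory. A small sketch that lists the keys the upload produces, computed the same way as in `upload_files`, without contacting S3 (`preview_keys` is a hypothetical helper). Note that the `8it` in the progress bar above counts directories visited by `os.walk`, not uploaded files.

```python
import os
from typing import List


def preview_keys(path: str) -> List[str]:
    """List the S3 object keys for all files under `path`, relative to `path`."""
    keys = []
    for subdir, _dirs, files in os.walk(path):
        for file in files:
            full_path = os.path.join(subdir, file)
            keys.append(os.path.relpath(full_path, path).replace(os.sep, "/"))
    return sorted(keys)


for key in preview_keys("./Triton"):
    print(key)  # e.g. Predictor/distil_bert_emotion/1/model.pt
```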


Once the upload finishes, you can build the images. Go to Deployment -> Images and click "Create image".


<img src="img/image.png" alt="drawing" width="900"/>


First, build the "Transformer" image:

1. Image type - Triton Server
2. Container type - Transformer
3. Base image - cr.msk.sbercloud.ru/aicloud-base-images/triton22.04-py3:0.0.32
4. Storage - the S3 bucket you uploaded the files to
5. Configuration
    - Model folder - Transformer
    - Serving-script file - kf_serving.py
    - Requirements file - requirements.txt

<img src="img/image_build_transformer.png" alt="" width="900"/>

Second, build the "Predictor" image:

1. Image type - Triton Server
2. Container type - Predictor
3. Base image - cr.msk.sbercloud.ru/aicloud-base-images/triton22.04-py3:0.0.32
4. Storage - the S3 bucket you uploaded the files to
5. Configuration
    - Configuration files folder - the folder with the model, for example BUCKET_NAME/Predictor


<img src="img/image_build_predictor.png" alt="" width="900"/>



## Deployment


To deploy the model, go to Deployment -> Deployments.


<img src="img/deploi.png" alt="" width="900"/>

Click "Create deployment" and specify the following settings:

1. Name - the deployment name (can be left empty)
2. Deployment type - Separate
3. Resources - select the region and the configuration type
4. The share of the total resource configuration allocated to the Transformer container
5. Docker image - select the Docker images built earlier


<img src="img/create_deploi.png" alt="" width="900"/>



After creation, a card for the new deployment appears with the status **"Queued"**. This means the deployment is waiting for the requested resources; as soon as they become available, its status changes to **"Running"**.

Note that if the minimum number of pods is set to 0, the deployment is not kept warm and is not started right away. In that case the first request incurs extra latency while the deployment starts up.


Open the card of a running deployment to view and change its current configuration.

<img src="img/image_triron.png" alt="" width="900"/>

You can also send a test request from the "API test" tab and copy it as a cURL command.

<img src="img/image_example_requests.png" alt="" width="900"/>
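The same request can be built in Python. The serving script reads the text from `inputs["instances"][0]`, so the JSON body is `{"instances": ["..."]}`. A sketch using only the standard library; the URL and any authorization header are placeholders to be copied from the cURL command shown in the UI, since the exact endpoint format is not described here:

```python
import json
import urllib.request
from typing import Dict, Optional


def build_predict_request(url: str, text: str,
                          headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build a POST request whose JSON body matches what BertTransformer.preprocess reads."""
    body = json.dumps({"instances": [text]}).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")


# Placeholder URL: replace it with the endpoint from the cURL example in the UI
request = build_predict_request(
    "https://example.invalid/v1/models/kfserving-default:predict",
    "I am incredibly happy to start using Triton on ML-Space from Cloud.ru",
)

# To actually send it:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read()))  # expected form: {"predictions": "<emotion>"}
```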

The "Logs" tab shows the current state of the Triton deployment.

<img src="img/example_logs.png" alt="" width="900"/>
