In this project you will complete a comprehensive homework assignment on preparing, optimizing, and deploying a machine learning model with modern tooling. The goal is to learn how to train, convert, optimize, and integrate models into a production environment using Triton Inference Server, Docker, and a microservice architecture.

In [19]:
! pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu  --no-cache-dir

Looking in indexes: https://download.pytorch.org/whl/cpu



In [20]:
! pip install onnx==1.13.1 onnxruntime==1.14.1 numpy==1.26.4



In [21]:
! pip install onnxoptimizer



1. **Model training**
   - Train a hand-written model in **Torch** or **TensorFlow**.
   - Use standard layers and avoid custom ones.

In [22]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torch.ao.quantization as quantization
import torch.onnx
import onnxoptimizer

import onnx
import onnxruntime as ort
import numpy as np

In [23]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and standard deviation
])

train_dataset = datasets.MNIST(root="./data", train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root="./data", train=False, transform=transform, download=True)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

In [None]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU() 
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.fc = nn.Linear(64 * 28 * 28, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = x.view(x.shape[0], -1)
        x = self.fc(x)
        return x
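
The input size of `fc` follows from the convolution arithmetic: a 3×3 kernel with `padding=1` and the default stride of 1 keeps the 28×28 spatial size, so the flattened feature vector has 64 · 28 · 28 elements. A small sketch of that calculation (the helper name `conv_out_size` is illustrative and not part of the notebook):

```python
def conv_out_size(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28
for _ in range(2):  # conv1 and conv2 use the same kernel and padding
    side = conv_out_size(side, kernel=3, padding=1)

flat_features = 64 * side * side  # 64 output channels after conv2
print(side, flat_features)  # 28 50176
```

If pooling or a stride were added later, the same helper would give the new `in_features` for `nn.Linear`.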

In [25]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

device

device(type='cpu')

In [26]:
def train(model, train_loader, optimizer, criterion, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        
        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)}] Loss: {loss.item():.6f}')

In [27]:
def test(model, test_loader, criterion):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)  # Index of the largest logit
            correct += pred.eq(target.view_as(pred)).sum().item()
    
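    # CrossEntropyLoss averages within each batch by default, so dividing the sum of
    # batch means by the dataset size understates the loss by a factor of the batch
    # size (1000 here). Divide by len(test_loader) to get the average batch loss.
    test_loss /= len(test_loader.dataset)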
    accuracy = 100. * correct / len(test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} ({accuracy:.2f}%)\n')
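
`nn.CrossEntropyLoss` averages over the batch by default, so summing its per-batch values and dividing by the number of samples gives a number scaled down by the batch size. One way to report a per-sample average is to sum the loss inside each batch instead. The sketch below assumes a hypothetical helper `evaluate` and uses random stand-in data and a toy model only so that it runs on its own:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device="cpu"):
    """Return (mean loss per sample, accuracy in percent) over the whole loader."""
    criterion = nn.CrossEntropyLoss(reduction="sum")  # sum over the batch, not the mean
    model.eval()
    total_loss, correct = 0.0, 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            total_loss += criterion(output, target).item()
            correct += (output.argmax(dim=1) == target).sum().item()
    n = len(loader.dataset)
    return total_loss / n, 100.0 * correct / n

# Hypothetical stand-in data and model, not the MNIST setup from the notebook
torch.manual_seed(0)
dataset = TensorDataset(torch.randn(50, 1, 28, 28), torch.randint(0, 10, (50,)))
loader = DataLoader(dataset, batch_size=16)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

avg_loss, accuracy = evaluate(toy_model, loader)
print(f"avg loss {avg_loss:.4f}, accuracy {accuracy:.2f}%")
```

With `reduction="sum"` the result does not depend on the batch size or on a smaller last batch.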

In [28]:
NUM_EPOCHS = 5
MODEL_NAME = "pytorch.pth"

In [29]:
for epoch in range(1, NUM_EPOCHS + 1):
    train(model, train_loader, optimizer, criterion, epoch)
    test(model, test_loader, criterion)

torch.save(model.state_dict(), MODEL_NAME)

Train Epoch: 1 [0/60000] Loss: 2.300770
Train Epoch: 1 [6400/60000] Loss: 0.240952
Train Epoch: 1 [12800/60000] Loss: 0.137812
Train Epoch: 1 [19200/60000] Loss: 0.217435
Train Epoch: 1 [25600/60000] Loss: 0.233539
Train Epoch: 1 [32000/60000] Loss: 0.041848
Train Epoch: 1 [38400/60000] Loss: 0.032779
Train Epoch: 1 [44800/60000] Loss: 0.074851
Train Epoch: 1 [51200/60000] Loss: 0.030709
Train Epoch: 1 [57600/60000] Loss: 0.040874

Test set: Average loss: 0.0001, Accuracy: 9842/10000 (98.42%)

Train Epoch: 2 [0/60000] Loss: 0.114845
Train Epoch: 2 [6400/60000] Loss: 0.054702
Train Epoch: 2 [12800/60000] Loss: 0.114300
Train Epoch: 2 [19200/60000] Loss: 0.020022
Train Epoch: 2 [25600/60000] Loss: 0.037703
Train Epoch: 2 [32000/60000] Loss: 0.101201
Train Epoch: 2 [38400/60000] Loss: 0.026494
Train Epoch: 2 [44800/60000] Loss: 0.060995
Train Epoch: 2 [51200/60000] Loss: 0.029231
Train Epoch: 2 [57600/60000] Loss: 0.018492

Test set: Average loss: 0.0000, Accuracy: 9857/10000 (98.57%)

Tr

In [30]:
model = CNN() 
model.load_state_dict(torch.load(MODEL_NAME))
model.eval()

traced_model = torch.jit.trace(model, torch.randn(1, 1, 28, 28))

traced_model.save("pytorch.pt")
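
A traced module is worth checking against the eager model before it is deployed, ideally with a batch size different from the tracing example. A minimal sketch, assuming a small stand-in network (`toy_model`) instead of the trained `CNN` and a temporary file instead of `pytorch.pt`:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the trained CNN so the sketch runs on its own
toy_model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 28 * 28, 10),
).eval()

example = torch.randn(1, 1, 28, 28)
traced = torch.jit.trace(toy_model, example)

# Save and reload, as a serving process would do
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_traced.pt")
    traced.save(path)
    reloaded = torch.jit.load(path)

# Compare eager and reloaded outputs on a batch of 8 (tracing used a batch of 1)
batch = torch.randn(8, 1, 28, 28)
with torch.no_grad():
    same = torch.allclose(toy_model(batch), reloaded(batch), atol=1e-5)
print(same)
```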

2. **Conversion to ONNX**
   - Export the trained model to **ONNX** format.

In [31]:
model = CNN() 
model.load_state_dict(torch.load(MODEL_NAME))
model.eval()

dummy_input = torch.randn(1, 1, 28, 28)

onnx_filename = "onnx.onnx"
torch.onnx.export(
    model,  
    dummy_input,
    onnx_filename,
    input_names=["input"],
    output_names=["output"], 
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}}, 
    opset_version=11
)

In [32]:
onnx_model = onnx.load("onnx.onnx")
onnx.checker.check_model(onnx_model)  # Validate the model

ort_session = ort.InferenceSession("onnx.onnx")

x = np.random.randn(1, 1, 28, 28).astype(np.float32)
ort_inputs = {"input": x}
ort_outs = ort_session.run(["output"], ort_inputs)

3. **(Optional) Conversion to TensorRT (TRT)**
   - If needed, convert the model to **TensorRT** format to speed up inference.

4. **Model optimization with Torch/TensorFlow tools**
   - Apply built-in optimization methods (for example, quantization or pruning) to make the model more efficient.
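
Post-training static quantization replaces float32 weights and activations with 8-bit integers, using a scale and a zero point estimated from observed value ranges. The arithmetic can be sketched in plain Python. The range [-1.0, 3.0] and the helper names are illustrative assumptions, and PyTorch observers apply their own range and rounding rules:

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an unsigned 8-bit integer, clamped to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Approximate reconstruction of the original float value."""
    return (q - zero_point) * scale

# Illustrative activation range, as an observer might record it
x_min, x_max = -1.0, 3.0
scale = (x_max - x_min) / 255
zero_point = round(-x_min / scale)  # the integer that represents 0.0

for x in (-1.0, 0.0, 0.5, 3.0):
    q = quantize(x, scale, zero_point)
    print(x, q, round(dequantize(q, scale, zero_point), 4))
```

The round trip loses at most half a quantization step (`scale / 2`) for values inside the observed range, which is why the observers need representative calibration data.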

In [33]:
torch.backends.quantized.engine = 'qnnpack'

In [34]:
model = CNN()
model.load_state_dict(torch.load("pytorch.pth"))
model.eval()

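# Pick the qconfig that matches the active quantized engine set above
model.qconfig = torch.ao.quantization.get_default_qconfig(torch.backends.quantized.engine)
model_fused = torch.ao.quantization.fuse_modules(model, [['conv1', 'relu1'], ['conv2', 'relu2']])
model_prepared = torch.ao.quantization.prepare(model_fused)

# Calibrate: run a few batches so the observers record real activation ranges
with torch.no_grad():
    for batch_idx, (data, _) in enumerate(train_loader):
        model_prepared(data)
        if batch_idx >= 9:
            break

# Note: CNN defines no QuantStub/DeQuantStub, so model_int8 expects an already-quantized input
model_int8 = torch.ao.quantization.convert(model_prepared)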

scripted_model = torch.jit.script(model_int8)
scripted_model.save("pytorch_optimized.pt")


In [36]:
model = CNN()
model.load_state_dict(torch.load("pytorch.pth"))
model.eval()

model.qconfig = quantization.QConfig(
    activation=quantization.default_observer,
    weight=quantization.default_weight_observer
)

model_fused = torch.ao.quantization.fuse_modules(model, [['conv1', 'relu1'], ['conv2', 'relu2']])
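model_prepared = torch.ao.quantization.prepare(model_fused)

# Calibrate: run a few batches so the observers record real activation ranges
with torch.no_grad():
    for batch_idx, (data, _) in enumerate(train_loader):
        model_prepared(data)
        if batch_idx >= 9:
            break

model_int8 = torch.ao.quantization.convert(model_prepared)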

scripted_model = torch.jit.script(model_int8)
scripted_model.save("pytorch_quantized.pt")  # TorchScript model
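
The task also names pruning as an option. A minimal sketch with `torch.nn.utils.prune`, assuming a stand-alone `nn.Linear` as a hypothetical stand-in for a layer such as `model.fc` and an arbitrary pruning amount of 50%:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
layer = nn.Linear(100, 10)  # hypothetical stand-in for a trained layer

# Zero out the 50% of weights with the smallest absolute value
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Make the pruning permanent: removes weight_orig and weight_mask,
# leaving a plain weight parameter with the zeros baked in
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2f}")
```

Unstructured zeros alone do not make the saved file smaller or dense kernels faster, so accuracy and latency should be measured after pruning rather than assumed.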


5. **Model optimization with ONNX tools and (optionally) TRT**
   - Use ONNX optimization tools (for example, ONNX Runtime) to improve performance.
   - (Optional) Optimize the model in TensorRT format.

In [37]:
onnx_model_path = "onnx.onnx"
onnx_model = onnx.load(onnx_model_path)

passes = [
    "eliminate_deadend",        # Removes nodes whose outputs are never used
    "eliminate_identity",       # Removes Identity operations
    "eliminate_nop_transpose",  # Removes transposes that change nothing
    "fuse_bn_into_conv",        # Folds BatchNorm into the preceding Conv
    "fuse_add_bias_into_conv"   # Folds a following Add into the Conv bias
]

optimized_model = onnxoptimizer.optimize(onnx_model, passes)
optimized_onnx_path = "onnx_optimized.onnx"

onnx.save(optimized_model, optimized_onnx_path)

ort_session = ort.InferenceSession(optimized_onnx_path, providers=["CPUExecutionProvider"])

x = np.random.randn(1, 1, 28, 28).astype(np.float32)  # MNIST-shaped input

# Run inference
ort_inputs = {ort_session.get_inputs()[0].name: x}
ort_outs = ort_session.run(None, ort_inputs)

ort_outs

[array([[-17.818356, -14.739766,  -6.881494, -10.863028, -15.319208,
          -9.028692, -13.222997,  -8.228025,  -8.828684, -11.138251]],
       dtype=float32)]
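
The session returns raw logits, one per digit class. Turning them into a prediction takes an argmax, and a softmax if probabilities are wanted. A small sketch using the values printed above (the input was random noise, so the predicted digit carries no meaning here):

```python
import math

# Logits copied from the output above
logits = [-17.818356, -14.739766, -6.881494, -10.863028, -15.319208,
          -9.028692, -13.222997, -8.228025, -8.828684, -11.138251]

def softmax(values):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
predicted = max(range(len(logits)), key=logits.__getitem__)
print(predicted)  # 2
print(f"{probs[predicted]:.3f}")
```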