I solve this task with the [LightAutoML](https://github.com/sb-ai-lab/LightAutoML) library.

## 0. Preliminary setup

### 0.0. Installing LightAutoML

In [1]:
# install LightAutoML with support for neural-network-based
# text and image encoders
# !pip install -U lightautoml[all]

In [2]:
# for the neural networks and the image-task preset to work,
# an additional fix has to be installed on top of the library!
# !pip install LightAutoML-0.3.7-py3-none-any.whl[all]

In [3]:
# the EfficientNet neural network will be used as the model
# that extracts features from the images,
# so install this dependency.

# !pip install efficientnet-pytorch==0.7.0

# QUICK WORKAROUND FOR PROBLEM WITH PANDAS
# !pip install -U pandas

In [4]:
import os
import time

import numpy as np
import pandas as pd
from sklearn.metrics import f1_score, accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
import torch
import seaborn as sns
import matplotlib.pyplot as plt

# LightAutoML presets for computer vision tasks
from lightautoml.automl.presets.image_presets import TabularCVAutoML
from lightautoml.tasks import Task

In [5]:
np.random.seed(42)
torch.set_num_threads(2)

### Loading the data

In [6]:
train = pd.read_csv('data/train.csv')

In [7]:
train.head()

|  | img_name | PSNR |
|---|---|---|
| 0 | 100083_1.png | 29.543164 |
| 1 | 100083_2.png | 27.894582 |
| 2 | 100083_3.png | 29.350353 |
| 3 | 100083_4.png | 27.775248 |
| 4 | 100083_5.png | 34.358442 |


In [10]:
# add the full relative path to each train image
# as a new column
path_to_dataset_train = 'data/SBC22_IQA_dataset/train'
train['path'] = path_to_dataset_train + '/' + train['img_name']

In [11]:
# the column with the image names is no longer needed
train.drop(columns=['img_name'], inplace=True)

In [12]:
train.head()

|  | PSNR | path |
|---|---|---|
| 0 | 29.543164 | data/SBC22_IQA_dataset/train/100083_1.png |
| 1 | 27.894582 | data/SBC22_IQA_dataset/train/100083_2.png |
| 2 | 29.350353 | data/SBC22_IQA_dataset/train/100083_3.png |
| 3 | 27.775248 | data/SBC22_IQA_dataset/train/100083_4.png |
| 4 | 34.358442 | data/SBC22_IQA_dataset/train/100083_5.png |
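
Before training, it may be worth checking that every constructed path points to a real file, since a typo in the directory name would otherwise only surface later, inside the feature extractor. Below is a minimal sketch of such a check. The helper `find_missing_files` and the toy file names are hypothetical and used for illustration only; they are not part of LightAutoML.

```python
import os
import tempfile

import pandas as pd


def find_missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


# Toy demonstration in a temporary directory (hypothetical file names):
# only 'a_1.png' is created, so 'a_2.png' should be reported as missing.
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, 'a_1.png'), 'wb').close()
    toy = pd.DataFrame({'img_name': ['a_1.png', 'a_2.png']})
    toy['path'] = tmp_dir + '/' + toy['img_name']
    missing_names = [os.path.basename(p) for p in find_missing_files(toy['path'])]

print(missing_names)
```

In the notebook the same call would be `find_missing_files(train['path'])`; an empty list means all images were found.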


# Defining the task

In [13]:
# this is a regression task; set the metric and the loss according to
# the competition rules
task = Task('reg', loss='mae', metric='mae')
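
To illustrate what optimizing MAE rather than MSE means in practice, here is a small sketch on made-up numbers (the arrays are not taken from the dataset). Two sets of predictions have the same MAE, but the one with a single large error has a four times higher MSE, so MAE is the more forgiving metric for occasional large misses.

```python
import numpy as np


def mean_abs_error(y, p):
    """Mean absolute error between targets y and predictions p."""
    return float(np.mean(np.abs(y - p)))


def mean_sq_error(y, p):
    """Mean squared error between targets y and predictions p."""
    return float(np.mean((y - p) ** 2))


# Made-up PSNR-like values, for illustration only.
toy_true = np.array([28.0, 29.0, 30.0, 31.0])
pred_spread = np.array([28.5, 28.5, 30.5, 30.5])   # every prediction is off by 0.5
pred_outlier = np.array([28.0, 29.0, 30.0, 33.0])  # a single prediction is off by 2.0

mae_spread = mean_abs_error(toy_true, pred_spread)
mae_outlier = mean_abs_error(toy_true, pred_outlier)
mse_spread = mean_sq_error(toy_true, pred_spread)
mse_outlier = mean_sq_error(toy_true, pred_outlier)

print(f"MAE: {mae_spread} vs {mae_outlier}")
print(f"MSE: {mse_spread} vs {mse_outlier}")
```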

In [14]:
# the regression target is PSNR
# the image paths are stored in the path column
roles = {'target': 'PSNR',
         'path' : ['path'],
         }

In [15]:
automl = TabularCVAutoML(task=task,
                         timeout=5 * 3600,
                         cpu_limit=2,
                         reader_params={'cv': 5, 'random_state': 42})


# Training the model with cross-validation on the whole dataset

In [16]:
%%time 
# 5 cross-validation folds will be used
# the EfficientNet-b0 neural network extracts features from the images
# standard models (linear regression and gradient boosting)
# are then trained on the extracted numeric features
oof_pred = automl.fit_predict(train, roles=roles, verbose=4)

[18:13:15] Stdout logging level is DEBUG.
[18:13:15] Task: reg

[18:13:15] Start automl preset with listed constraints:
[18:13:15] - time: 18000.00 seconds
[18:13:15] - CPU: 2 cores
[18:13:15] - memory: 16 GB

[18:13:15] Train data shape: (13210, 2)

[18:13:15] Layer 1 train process start. Time left 18000.00 secs
Loaded pretrained weights for efficientnet-b0


100%|██████████| 104/104 [00:35<00:00,  2.89it/s]

[18:13:53] Feature path transformed





[18:13:54] Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...
[18:13:54] Training params: {'tol': 1e-06, 'max_iter': 100, 'cs': [1e-05, 5e-05, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000], 'early_stopping': 2, 'categorical_idx': [], 'embed_sizes': (), 'data_size': 1280}
[18:13:54] ===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LinearL2 =====
[18:13:54] Linear model: C = 1e-05 score = -0.8361170218629669
[18:13:54] Linear model: C = 5e-05 score = -0.7955806235930066
[18:13:55] Linear model: C = 0.0001 score = -0.788143205205594
[18:13:55] Linear model: C = 0.0005 score = -0.790031745178784
[18:13:55] Linear model: C = 0.001 score = -0.7941463899017929
[18:13:55] ===== Start working with fold 1 for Lvl_0_Pipe_0_Mod_0_LinearL2 =====
[18:13:56] Linear model: C = 1e-05 score = -0.8383684181777105
[18:13:56] Linear model: C = 5e-05 score = -0.7987508957118582
[18:13:56] Linear model: C

Default metric period is 5 because MAE is/are not implemented for GPU


[18:14:45] 200:	learn: 0.7815682	test: 0.8284462	best: 0.8284462 (200)	total: 219ms	remaining: 1.96s
[18:14:45] 300:	learn: 0.7469610	test: 0.8166907	best: 0.8166860 (299)	total: 329ms	remaining: 1.85s
[18:14:45] 400:	learn: 0.7195630	test: 0.8076664	best: 0.8076664 (400)	total: 439ms	remaining: 1.75s
[18:14:45] 500:	learn: 0.6973203	test: 0.8025592	best: 0.8025592 (500)	total: 547ms	remaining: 1.64s
[18:14:45] 600:	learn: 0.6772006	test: 0.7991981	best: 0.7991981 (600)	total: 657ms	remaining: 1.53s
[18:14:45] 700:	learn: 0.6611098	test: 0.7958313	best: 0.7958313 (700)	total: 766ms	remaining: 1.42s
[18:14:45] 800:	learn: 0.6454059	test: 0.7922988	best: 0.7922988 (800)	total: 875ms	remaining: 1.31s
[18:14:46] 900:	learn: 0.6304536	test: 0.7898520	best: 0.7897216 (892)	total: 985ms	remaining: 1.2s
[18:14:46] 1000:	learn: 0.6174602	test: 0.7886640	best: 0.7886640 (1000)	total: 1.09s	remaining: 1.09s
[18:14:46] 1100:	learn: 0.6057972	test: 0.7864886	best: 0.7864886 (1100)	total: 1.21s	rema

Default metric period is 5 because MAE is/are not implemented for GPU


[18:14:48] 200:	learn: 0.7834389	test: 0.8191694	best: 0.8191694 (200)	total: 219ms	remaining: 1.96s
[18:14:48] 300:	learn: 0.7484606	test: 0.8042356	best: 0.8042356 (300)	total: 328ms	remaining: 1.85s
[18:14:48] 400:	learn: 0.7231277	test: 0.7957099	best: 0.7957099 (400)	total: 438ms	remaining: 1.75s
[18:14:48] 500:	learn: 0.7014055	test: 0.7896979	best: 0.7896701 (499)	total: 547ms	remaining: 1.64s
[18:14:48] 600:	learn: 0.6804889	test: 0.7846941	best: 0.7846941 (600)	total: 656ms	remaining: 1.53s
[18:14:48] 700:	learn: 0.6628048	test: 0.7808905	best: 0.7808905 (700)	total: 765ms	remaining: 1.42s
[18:14:48] 800:	learn: 0.6481734	test: 0.7770830	best: 0.7770830 (800)	total: 875ms	remaining: 1.31s
[18:14:48] 900:	learn: 0.6349031	test: 0.7736576	best: 0.7736482 (898)	total: 984ms	remaining: 1.2s
[18:14:48] 1000:	learn: 0.6217293	test: 0.7703903	best: 0.7703703 (995)	total: 1.09s	remaining: 1.09s
[18:14:49] 1100:	learn: 0.6088568	test: 0.7671202	best: 0.7671202 (1100)	total: 1.2s	remain

Default metric period is 5 because MAE is/are not implemented for GPU


[18:14:50] 200:	learn: 0.7784074	test: 0.8327664	best: 0.8327664 (200)	total: 254ms	remaining: 2.27s
[18:14:50] 300:	learn: 0.7431740	test: 0.8190141	best: 0.8190141 (300)	total: 387ms	remaining: 2.18s
[18:14:51] 400:	learn: 0.7176778	test: 0.8142434	best: 0.8142434 (400)	total: 509ms	remaining: 2.03s
[18:14:51] 500:	learn: 0.6944672	test: 0.8078648	best: 0.8078639 (498)	total: 618ms	remaining: 1.85s
[18:14:51] 600:	learn: 0.6749945	test: 0.8038054	best: 0.8037594 (598)	total: 727ms	remaining: 1.69s
[18:14:51] 700:	learn: 0.6573799	test: 0.7990373	best: 0.7990373 (700)	total: 837ms	remaining: 1.55s
[18:14:51] 800:	learn: 0.6421226	test: 0.7955658	best: 0.7955658 (800)	total: 946ms	remaining: 1.42s
[18:14:51] 900:	learn: 0.6292915	test: 0.7917909	best: 0.7917909 (900)	total: 1.05s	remaining: 1.29s
[18:14:51] 1000:	learn: 0.6162041	test: 0.7886583	best: 0.7886583 (1000)	total: 1.16s	remaining: 1.16s
[18:14:51] 1100:	learn: 0.6047146	test: 0.7852259	best: 0.7852259 (1100)	total: 1.27s	rem

Default metric period is 5 because MAE is/are not implemented for GPU


[18:14:53] 200:	learn: 0.7870143	test: 0.8211407	best: 0.8211407 (200)	total: 220ms	remaining: 1.97s
[18:14:53] 300:	learn: 0.7516646	test: 0.8068327	best: 0.8068327 (300)	total: 329ms	remaining: 1.86s
[18:14:53] 400:	learn: 0.7237951	test: 0.7948710	best: 0.7948710 (400)	total: 439ms	remaining: 1.75s
[18:14:53] 500:	learn: 0.7008464	test: 0.7877289	best: 0.7876674 (498)	total: 548ms	remaining: 1.64s
[18:14:54] 600:	learn: 0.6808580	test: 0.7813520	best: 0.7813520 (600)	total: 658ms	remaining: 1.53s
[18:14:54] 700:	learn: 0.6636187	test: 0.7776801	best: 0.7776801 (700)	total: 768ms	remaining: 1.42s
[18:14:54] 800:	learn: 0.6470118	test: 0.7726638	best: 0.7726589 (799)	total: 878ms	remaining: 1.31s
[18:14:54] 900:	learn: 0.6336409	test: 0.7695123	best: 0.7695123 (900)	total: 988ms	remaining: 1.2s
[18:14:54] 1000:	learn: 0.6210873	test: 0.7664114	best: 0.7664114 (1000)	total: 1.1s	remaining: 1.09s
[18:14:54] 1100:	learn: 0.6099908	test: 0.7631732	best: 0.7631732 (1100)	total: 1.21s	remai

Default metric period is 5 because MAE is/are not implemented for GPU


[18:14:56] 200:	learn: 0.7824777	test: 0.8482778	best: 0.8481802 (199)	total: 220ms	remaining: 1.97s
[18:14:56] 300:	learn: 0.7491779	test: 0.8320318	best: 0.8320318 (300)	total: 329ms	remaining: 1.86s
[18:14:56] 400:	learn: 0.7219455	test: 0.8216520	best: 0.8216520 (400)	total: 439ms	remaining: 1.75s
[18:14:56] 500:	learn: 0.7008259	test: 0.8144356	best: 0.8142872 (492)	total: 548ms	remaining: 1.64s
[18:14:56] 600:	learn: 0.6801549	test: 0.8091933	best: 0.8091597 (595)	total: 658ms	remaining: 1.53s
[18:14:56] 700:	learn: 0.6637575	test: 0.8047022	best: 0.8046799 (699)	total: 768ms	remaining: 1.42s
[18:14:57] 800:	learn: 0.6478722	test: 0.7999155	best: 0.7999155 (800)	total: 878ms	remaining: 1.31s
[18:14:57] 900:	learn: 0.6346784	test: 0.7964858	best: 0.7964466 (898)	total: 987ms	remaining: 1.2s
[18:14:57] 1000:	learn: 0.6217912	test: 0.7934667	best: 0.7934667 (1000)	total: 1.1s	remaining: 1.09s
[18:14:57] 1100:	learn: 0.6101262	test: 0.7906546	best: 0.7906546 (1100)	total: 1.21s	remai
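
The `fit_predict` call returns out-of-fold (OOF) predictions: each row is predicted by a model that did not see that row during training. The sketch below shows the idea on synthetic data with a plain least-squares model. It is not LightAutoML's internal implementation; the function `oof_predictions` and the random features (standing in for the EfficientNet embeddings) are assumptions made for illustration.

```python
import numpy as np


def oof_predictions(X, y, n_folds=5, seed=42):
    """Out-of-fold predictions of a least-squares linear model with intercept."""
    rng = np.random.default_rng(seed)
    # Shuffle the row indices and split them into n_folds validation folds.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.full(len(y), np.nan)
    for k, valid_idx in enumerate(folds):
        # Train on every fold except the k-th one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        a_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
        coef, *_ = np.linalg.lstsq(a_train, y[train_idx], rcond=None)
        # Predict only the held-out rows of the k-th fold.
        a_valid = np.column_stack([X[valid_idx], np.ones(len(valid_idx))])
        oof[valid_idx] = a_valid @ coef
    return oof


# Synthetic stand-in for image features and PSNR targets.
data_rng = np.random.default_rng(0)
X_toy = data_rng.normal(size=(200, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + 28.0 + data_rng.normal(scale=0.1, size=200)

oof_toy = oof_predictions(X_toy, y_toy)
oof_mae = float(np.mean(np.abs(y_toy - oof_toy)))
print(f"OOF MAE on synthetic data = {oof_mae:.4f}")
```

Because every prediction comes from a model trained without the corresponding row, a metric computed on OOF predictions serves as an estimate of the error on unseen data.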

In [21]:
# look at the out-of-fold predictions of the regression model
preds = oof_pred.data[:, 0]
preds


array([29.51603 , 28.0579  , 28.717566, ..., 27.91448 , 28.251934,
       28.537424], dtype=float32)

In [22]:
y_true = train['PSNR'].values
y_true

array([29.54316383, 27.89458224, 29.3503531 , ..., 27.8561391 ,
       29.72078868, 28.28530245])

In [24]:
# Define the metrics for evaluating model quality
def mse(y_true, y_pred):
    return ((y_true - y_pred)**2).mean()
def mae(y_true, y_pred):
    return (np.abs(y_true - y_pred)).mean()

In [26]:
# Evaluate the model on the out-of-fold (validation) predictions:
print(f"MSE = {mse(y_true, preds)}")
print(f"MAE = {mae(y_true, preds)}")

MSE = 2.0822282802423095
MAE = 0.738451368613441
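
An MAE value is easier to interpret next to a trivial reference. One common choice, sketched below, is a constant prediction equal to the median of the target, since the median minimizes MAE among constant predictions. The helper `median_baseline_mae` and the toy values are hypothetical and are not results from this notebook.

```python
import numpy as np


def median_baseline_mae(y):
    """MAE of a constant model that always predicts the median of y."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(y - np.median(y))))


# Made-up PSNR-like values, for illustration only.
toy_psnr = np.array([27.0, 28.0, 28.5, 29.0, 34.0])
baseline = median_baseline_mae(toy_psnr)
print(f"Median baseline MAE = {baseline}")
```

In the notebook, `median_baseline_mae(y_true)` could be compared with the MAE printed above. The baseline is computed in-sample, so it is slightly optimistic, and a useful model should still score clearly below it.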


# Predicting on the test set

In [29]:
test = pd.read_csv('data/sample_submission.csv')

In [30]:
test.head()

|  | img_name | PSNR |
|---|---|---|
| 0 | 100090_1.png | 0 |
| 1 | 100090_2.png | 0 |
| 2 | 100090_3.png | 0 |
| 3 | 100090_4.png | 0 |
| 4 | 100090_5.png | 0 |


In [31]:
# add the full relative path to each test image
# as a new column
path_to_dataset_test = 'data/SBC22_IQA_dataset/test'
test['path'] = path_to_dataset_test + '/' + test['img_name']

In [34]:
# the column with the image names is no longer needed
test.drop(columns=['img_name'], inplace=True)

In [35]:
test.head()

|  | PSNR | path |
|---|---|---|
| 0 | 0 | data/SBC22_IQA_dataset/test/100090_1.png |
| 1 | 0 | data/SBC22_IQA_dataset/test/100090_2.png |
| 2 | 0 | data/SBC22_IQA_dataset/test/100090_3.png |
| 3 | 0 | data/SBC22_IQA_dataset/test/100090_4.png |
| 4 | 0 | data/SBC22_IQA_dataset/test/100090_5.png |


In [36]:
# predict on the test data with the model ensemble
test_pred = automl.predict(test)

100%|██████████| 26/26 [00:09<00:00,  2.78it/s]

[18:32:32] Feature path transformed





In [39]:
test_pred = test_pred.data[:,0]

In [40]:
test_pred

array([28.877903, 27.85306 , 27.89661 , ..., 27.892792, 28.644264,
       29.722187], dtype=float32)

# Preparing the submission

In [41]:
submission = pd.read_csv('data/sample_submission.csv')

In [42]:
submission

|  | img_name | PSNR |
|---|---|---|
| 0 | 100090_1.png | 0 |
| 1 | 100090_2.png | 0 |
| 2 | 100090_3.png | 0 |
| 3 | 100090_4.png | 0 |
| 4 | 100090_5.png | 0 |
| ... | ... | ... |
| 3305 | 99584_1.png | 0 |
| 3306 | 99584_2.png | 0 |
| 3307 | 99584_3.png | 0 |
| 3308 | 99584_4.png | 0 |


In [43]:
submission['PSNR'] = test_pred


In [44]:
submission

|  | img_name | PSNR |
|---|---|---|
| 0 | 100090_1.png | 28.877903 |
| 1 | 100090_2.png | 27.853060 |
| 2 | 100090_3.png | 27.896610 |
| 3 | 100090_4.png | 27.994972 |
| 4 | 100090_5.png | 28.971851 |
| ... | ... | ... |
| 3305 | 99584_1.png | 29.786552 |
| 3306 | 99584_2.png | 28.124371 |
| 3307 | 99584_3.png | 27.892792 |
| 3308 | 99584_4.png | 28.644264 |
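
The predictions are assigned to the submission by position, so the result is only valid if the row order matches the sample file and no prediction is missing. A minimal sanity check could look like the sketch below. The function `check_submission` and the toy frames are hypothetical, and the competition's actual validation rules may differ.

```python
import numpy as np
import pandas as pd


def check_submission(submission, template, target='PSNR', key='img_name'):
    """Return a list of problems found in the submission (empty if none)."""
    problems = []
    if len(submission) != len(template):
        problems.append('row count differs from the template')
    elif list(submission[key]) != list(template[key]):
        problems.append('image order differs from the template')
    if not np.isfinite(submission[target].to_numpy(dtype=float)).all():
        problems.append('target contains NaN or infinite values')
    return problems


# Toy frames with hypothetical file names and values.
template = pd.DataFrame({'img_name': ['a_1.png', 'a_2.png', 'a_3.png'], 'PSNR': 0})
good = template.copy()
good['PSNR'] = np.array([28.9, 27.8, 27.9], dtype=np.float32)
bad = good.copy()
bad.loc[1, 'PSNR'] = np.nan

good_problems = check_submission(good, template)
bad_problems = check_submission(bad, template)
print(good_problems, bad_problems)
```

In the notebook the call would be `check_submission(submission, pd.read_csv('data/sample_submission.csv'))`.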


In [45]:
submission_path = 'submission/lightautoml_model_final_submission.csv'

In [47]:
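os.makedirs(os.path.dirname(submission_path), exist_ok=True)  # make sure the output directory exists
submission.to_csv(submission_path, index=False)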