![Catalyst](https://raw.githubusercontent.com/catalyst-team/catalyst-pics/master/pics/catalyst_logo.png)

**This notebook aims to:**
1. Show a basic pipeline with Catalyst:
    * the declarative file structure that the library requires
    * training
    * inference

2. Create the submission

**Catalyst** is a high-level library (based on PyTorch) that helps you train neural network models.

It breaks a common training procedure into separate code blocks.

Typically, you just need to declare your model and create simple configuration files. That is it!

From the official site:

> Catalyst helps you write compact but full-featured DL & RL pipelines in a few lines of code.
>
> You get a training loop with metrics, early-stopping, model checkpointing and other features without the boilerplate.

**Features**
> 
> Universal train/inference loop.
> 
> Configuration files for model/data hyperparameters.
> 
> Reproducibility – even source code will be saved.
> 
> Callbacks – reusable train/inference pipeline parts.
> 
> Training stages support.
> 
> Easy customization.
> 
> PyTorch best practices (SWA, AdamW, 1Cycle, FP16 and more).

More info about Catalyst:

https://github.com/catalyst-team/catalyst

Examples:

https://github.com/catalyst-team/catalyst/tree/master/examples

**Install Catalyst**

Typically, you just run `pip install catalyst`. 

However, in many Kaggle competitions the notebook has no Internet connection when you make the submission.

To work around this, there is a Kaggle dataset that contains Catalyst together with its requirements.

More about this dataset: https://www.kaggle.com/lightforever/catalyst

In [None]:
! cp -a /kaggle/input/catalyst/catalyst/catalyst/install.sh /tmp/install.sh && chmod 777 /tmp/install.sh && /tmp/install.sh /kaggle/input/catalyst/catalyst/catalyst

**Explore format of the files which are required by Catalyst**

In [None]:
! ls /kaggle/input/mnistcatalyst

1. `__init__.py` - imports the base parts that Catalyst will load
2. `experiment.py` - provides Catalyst with the datasets
3. `model.py` - model declaration. We use a very simple model for this notebook
4. `dataset.py` - a trivial PyTorch dataset for the task
5. `train.yml` - configuration file for training
6. `infer.yml` - configuration file for inference

`fold.csv` is a task-specific file with a 5-fold split. It is used in the dataset.
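A 5-fold split assigns every training sample to one of five folds; one fold is held out for validation while the remaining four are used for training. The sketch below shows one way such an assignment could be produced and consumed. The function name `make_folds`, the seed, and the round-robin scheme are assumptions made for illustration only; the file in this dataset may have been generated differently.

```python
import random

def make_folds(n_samples, n_folds=5, seed=42):
    """Assign each sample index to one of n_folds folds, as evenly as possible."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [0] * n_samples
    for position, sample_index in enumerate(indices):
        # Deal the shuffled samples out to the folds in round-robin order.
        folds[sample_index] = position % n_folds
    return folds

# Hypothetical toy size; the real dataset is much larger.
folds = make_folds(n_samples=10, n_folds=5)

# Hold out fold 0 for validation and train on the other folds.
valid_idx = [i for i, f in enumerate(folds) if f == 0]
train_idx = [i for i, f in enumerate(folds) if f != 0]
```

Storing the assignment in a file keeps the split fixed across runs, so results from different experiments stay comparable.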

In [None]:
cat /kaggle/input/mnistcatalyst/__init__.py

In [None]:
cat /kaggle/input/mnistcatalyst/experiment.py

In [None]:
cat /kaggle/input/mnistcatalyst/model.py

In [None]:
cat /kaggle/input/mnistcatalyst/dataset.py

In [None]:
cat /kaggle/input/mnistcatalyst/train.yml

In [None]:
cat /kaggle/input/mnistcatalyst/infer.yml

In [None]:
!head /kaggle/input/mnistcatalyst/fold.csv

**Train**

In [None]:
! catalyst-dl run --config /kaggle/input/mnistcatalyst/train.yml --expdir /kaggle/input/mnistcatalyst/

**Infer**

In [None]:
! catalyst-dl run --config /kaggle/input/mnistcatalyst/infer.yml --expdir /kaggle/input/mnistcatalyst/

Catalyst has written the predictions to `infer.logits.npy`.

In [None]:
! ls /tmp/log

Let's read the predictions and create the submission file.

In [None]:
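import numpy as np
import pandas as pd

# Catalyst saved the raw model outputs (logits), one row per test image.
logits = np.load('/tmp/log/infer.logits.npy')
# The predicted digit is the class with the highest logit.
labels = logits.argmax(axis=1)
# ImageId starts at 1 in the submission file.
pd.DataFrame({
    'ImageId': np.arange(1, len(labels) + 1),
    'Label': labels
}).to_csv('submission.csv', index=False)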
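
As the file name `infer.logits.npy` suggests, the saved array presumably holds raw logits rather than probabilities. For the submission this makes no difference, because softmax preserves the ordering within each row and `argmax` therefore picks the same class either way. A small sketch with made-up numbers (the helper `softmax` and the example values are illustrative, not part of Catalyst or of the real model output):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up logits for 2 images and 3 classes (not real model output).
example_logits = np.array([[2.0, 0.5, -1.0],
                           [0.1, 0.2, 3.0]])
example_probs = softmax(example_logits)

# Softmax keeps the ordering within each row, so argmax is unchanged.
labels_from_logits = example_logits.argmax(axis=1)
labels_from_probs = example_probs.argmax(axis=1)
```

Converting to probabilities only matters if you need calibrated scores, for example when averaging predictions from several folds.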

**Conclusion**

Catalyst helps you set up a new experiment quickly!

You just need to include (or write, if it does not exist yet) a new callback.

Or you can change the runner. There are lots of tricks there!
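
To make the callback idea concrete, here is a minimal plain-Python sketch of the pattern. It is deliberately not Catalyst's API: the class names, the hook names `on_epoch_start` / `on_epoch_end`, and the `state` dictionary are all hypothetical, and the "training" is just a list of precomputed losses. Check the Catalyst documentation for the real base class and hook signatures of the version you use.

```python
class Callback:
    """Base class: hooks do nothing unless a subclass overrides them."""

    def on_epoch_start(self, state):
        pass

    def on_epoch_end(self, state):
        pass


class BestLossTracker(Callback):
    """Remembers the lowest loss seen so far and the epoch it occurred in."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None

    def on_epoch_end(self, state):
        if state["loss"] < self.best_loss:
            self.best_loss = state["loss"]
            self.best_epoch = state["epoch"]


def run(num_epochs, losses, callbacks):
    """Toy loop that calls every callback hook around each 'epoch'."""
    for epoch in range(num_epochs):
        state = {"epoch": epoch, "loss": None}
        for cb in callbacks:
            cb.on_epoch_start(state)
        state["loss"] = losses[epoch]  # stand-in for a real training epoch
        for cb in callbacks:
            cb.on_epoch_end(state)


tracker = BestLossTracker()
run(num_epochs=4, losses=[0.9, 0.4, 0.6, 0.5], callbacks=[tracker])
```

The loop itself never changes; new behavior such as logging, checkpointing, or early stopping is added by passing another callback object.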