Pipeline for training UNet using PyTorch+Catalyst for the problem of steel defect segmentation.

Severstal: Steel Defect Detection

Abstract: this repo contains a pipeline, built on Catalyst, for training UNet with different encoders on the steel defect detection problem. Weights for the trained models are also provided; the results are:

  • UNet with ResNet-50 - IoU 0.413
  • UNet with EfficientNet-B3 - IoU 0.541
  • UNet with EfficientNet-B4 - IoU 0.592

Important: the balanced dataset (balanced with respect to defect classes) contains 1000 images, roughly 250 per class. With the whole dataset the metrics might be better.

I have not included EDA here; in general the data appears clean (given our new balanced dataset).
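For reference, masks in the Severstal competition's train.csv are stored as run-length-encoded strings rather than image files. A minimal decoder sketch (the repo's own version lives in utils/utils.py and may differ in details):

```python
import numpy as np

def rle_to_mask(rle: str, height: int = 256, width: int = 1600) -> np.ndarray:
    """Decode a run-length-encoded mask string into a binary (H, W) array.

    The competition stores masks as space-separated 'start length' pairs
    over a column-major (Fortran-order) flattening, with 1-based starts.
    """
    mask = np.zeros(height * width, dtype=np.uint8)
    if rle:
        values = list(map(int, rle.split()))
        starts, lengths = values[0::2], values[1::2]
        for start, length in zip(starts, lengths):
            mask[start - 1 : start - 1 + length] = 1
    # Reshape in Fortran order to match the competition's encoding.
    return mask.reshape((height, width), order="F")
```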

Plan of research

First, let's identify the main architecture. UNet suits this problem better than Mask R-CNN: semantic segmentation is sufficient for the task, so there is no need for a more complex instance-segmentation model. I surveyed several Kaggle kernels and papers from sources such as arxiv.org.

So:

  • Architecture: UNet
  • Encoder: EfficientNet-B3,B4; ResNet-50
  • Loss function: DiceBCELoss, TverskyLoss (alpha=0.1, beta=0.9)
  • Optimizer: Adam (learning rate for encoder 1e-3, learning rate for decoder 1e-2), as encoder is much deeper
  • Learning-rate scheduler: ReduceLROnPlateau(factor=0.15, patience=2)
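The optimizer setup above (a lower learning rate for the deeper, pretrained encoder, plus ReduceLROnPlateau) can be sketched as follows. The stand-in model here is illustrative only; in the repo the model comes from segmentation_models.pytorch, e.g. smp.Unet(encoder_name="efficientnet-b4", encoder_weights="imagenet"):

```python
import torch
import torch.nn as nn

# Tiny stand-in with .encoder/.decoder submodules, mirroring the
# structure of an smp.Unet; 4 output channels assume one per defect class.
class TinySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 3, padding=1)
        self.decoder = nn.Conv2d(8, 4, 3, padding=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegNet()

# Per-parameter-group learning rates: encoder 1e-3, decoder 1e-2.
optimizer = torch.optim.Adam([
    {"params": model.encoder.parameters(), "lr": 1e-3},
    {"params": model.decoder.parameters(), "lr": 1e-2},
])

# Reduce both learning rates when the validation metric plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.15, patience=2
)
```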

General thoughts

It is important to notice that we have quite an imbalanced dataset with respect to the defect/no_defect classes (true positives vs. true negatives), so picking an appropriate loss matters. I tried DiceBCELoss and Tversky loss (alpha=0.1, beta=0.9); the best results in this case were obtained with DiceBCELoss.
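A sketch of both losses for binary masks, using common formulations (the repo's exact versions live in utils/losses.py and may differ, e.g. in per-channel reduction):

```python
import torch
import torch.nn.functional as F

def dice_bce_loss(logits, targets, smooth=1.0):
    """BCE on logits plus (1 - soft Dice) on sigmoid probabilities."""
    probs = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    intersection = (probs * targets).sum()
    dice = (2.0 * intersection + smooth) / (probs.sum() + targets.sum() + smooth)
    return bce + (1.0 - dice)

def tversky_loss(logits, targets, alpha=0.1, beta=0.9, smooth=1.0):
    """Tversky loss: with beta > alpha, false negatives are penalised
    more than false positives, which suits the imbalance described above."""
    probs = torch.sigmoid(logits)
    tp = (probs * targets).sum()
    fp = (probs * (1.0 - targets)).sum()
    fn = ((1.0 - probs) * targets).sum()
    tversky = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return 1.0 - tversky
```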

All encoders were pretrained on ImageNet. However, I believe one more trick could be fruitful: fine-tuning the encoders on the whole dataset as a defect/no_defect classifier. This could give somewhat better results, but train.csv contains no images of the no_defect class at all.

Also, given the class imbalance, there is a point in training only on images that contain defects (true positives). But, as noted above, train.csv contains no other images anyway.

Moreover, multi-scale training (progressively increasing the image resolution from small to large) could be tried, but I have not done that.

I should add that I was limited by CUDA memory capacity, so I could not try bigger encoders with a batch size above 8.

Results

Encoder           IoU      DiceBCELoss   Mask resolution   Epochs
ResNet-50         0.4132                 (256, 1600)
EfficientNet-B3   0.513    0.444         (256, 768)        11
EfficientNet-B4   0.597    0.36          (256, 768)        37
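The IoU metric reported above can be computed as follows for a pair of binary masks (a sketch; the repo's implementation may average over classes or batches differently):

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over union between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty.
    return float((inter + eps) / (union + eps))
```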

Link to TensorBoard for EfficientNet-B4: tap here

Inferences for validation data:

  • EfficientNet-B4: Examples 1-4 are prediction images (not rendered in this text export; see the images folder of the repo).

Installation

Required libraries are catalyst, segmentation_models.pytorch and albumentations.

P.S. I've used segmentation_models for fast prototyping.

Installation:

pip install git+https://github.com/qubvel/segmentation_models.pytorch
pip install -U git+https://github.com/albu/albumentations
pip install catalyst

Usage

The directory tree should be:

├── Predict_masks.py
├── Train.py
├── config.py
├── data
│   ├── results                #results
│   ├── test.csv
│   ├── test_images            #download test images here
│   ├── train.csv
│   ├── train_balanced.csv
│   └── train_images           #download train images here
├── images
├── readme.md
├── utils
│   ├── losses.py
│   └── utils.py
└── weights
    ├── UnetEfficientNetB4_IoU_059.pth
    └── UnetResNet50_IoU_043.pth 

Evaluation

There is a Predict_masks.py script which can be used to evaluate the model and predict masks for the test dataset (from test.csv). The weights are stored in the ./weights directory.

Pictures with the predicted masks and the source images will be stored in the data/results folder.

Important: masks for ResNet-50 are (256, 1600) px and masks for EfficientNet-B3/B4 are (256, 768) px; the free Colab tier does not provide more CUDA memory.

Usage example:

python3 Predict_masks.py -dir /Users/user/Documents/steel_defect_detection/data/  -weights_dir /Users/user/Documents/steel_defect_detection/data/weights

Arguments

-dir    : Pass the full path of a directory containing a folder "test" and "test.csv".
-num_of_images   : Number of test images from test.csv to segment.
-weights_dir   : Pass the path to the weights directory.

Predict_masks.py doesn't save binary masks; it saves pictures combining the source image and the predicted mask for better presentation.
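That overlay step can be sketched like this (the colour and blend factor here are illustrative assumptions; the exact drawing code in Predict_masks.py may differ):

```python
import numpy as np

def overlay_mask(image: np.ndarray, mask: np.ndarray,
                 color=(255, 0, 0), alpha: float = 0.5) -> np.ndarray:
    """Blend a binary mask onto an RGB uint8 image for presentation.

    Pixels where mask is set are mixed with `color` by factor `alpha`;
    all other pixels are left untouched.
    """
    out = image.astype(np.float32).copy()
    sel = mask.astype(bool)
    out[sel] = (1.0 - alpha) * out[sel] + alpha * np.array(color, np.float32)
    return out.astype(np.uint8)
```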

Training

The model is supposed to be trained on the dataset from the Kaggle competition. You can choose which encoder to use and the batch size; the default is EfficientNet-B4. The mask size is set to (256, 768) in config.py; you can set your own.

It is necessary to specify the directory where the train folder and train.csv are stored.

Usage example:

python3 Train.py -dir /Users/user/Documents/steel_defect_detection/data/ -num_of_workers 4

Arguments

-dir    : Pass the full path of a directory containing a folder "train" and "train.csv".
-encoder   : Backbone to use as encoder for UNet, default='efficientnet-b3'.
-batch_size   : Batch size for training, default=8.
-num_of_workers   : Number of workers for training, default=0.
