# Example of using SciAugment to augment scientific images with YOLO annotations

It uses Albumentations (an example of augmentation is here: https://colab.research.google.com/drive/1JuZ23u0C0gx93kV0oJ8Mq0B6CBYhPLXy) and OpenCV. The goal is to create tools better suited to the augmentation of scientific images. How a sensor captures data matters, and scientific sensors and capture methods usually differ from those used for RGB data.

Thoughtful augmentation should improve the robustness of object detection and classification. Augmentation that does not respect the characteristics of the sensor and the statistics of the data may increase errors or reduce the usability of the final model.

Install the SciAugment package with pip (commented out below), or clone the SciAugment repository and install its requirements

In [1]:
# !pip install git+https://github.com/martinschatz-cz/SciAugment.git

In [None]:
!git clone https://github.com/martinschatz-cz/SciAugment.git
!pip install -r /content/SciAugment/requirements.txt

Import functions

In [3]:
from SciAugment.SciAugment.SciAug_tools import *

Download subsection320png.zip

In [4]:
!wget https://github.com/martinschatz-cz/SciAugment/raw/v0.2.0/example_notebooks/subsection320png.zip

--2022-09-25 10:30:39--  https://github.com/martinschatz-cz/SciAugment/raw/v0.2.0/example_notebooks/subsection320png.zip
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/martinschatz-cz/SciAugment/v0.2.0/example_notebooks/subsection320png.zip [following]
--2022-09-25 10:30:39--  https://raw.githubusercontent.com/martinschatz-cz/SciAugment/v0.2.0/example_notebooks/subsection320png.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1021277 (997K) [application/zip]
Saving to: ‘subsection320png.zip’


2022-09-25 10:30:40 (22.5 MB/s) - ‘subsection320png.zip’ saved [1021277/1021277]



Unzip the test folder (subsection320png.zip)

In [5]:
!unzip -q /content/subsection320png.zip -d /content/

An ideal tool for annotating images is https://www.makesense.ai/

Specify the folder with images and YOLO annotations and run the default augmentation. The process automatically creates a train_data folder and randomly divides the images and labels into train and test folders with a 70/30 split. The percentage of the train part can be specified.
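The random 70/30 division can be pictured with a short sketch. This is not SciAugment's own code: the function name `split_train_test` and its parameters are hypothetical, and only the standard library is used.

```python
import random


def split_train_test(names, train_fraction=0.7, seed=None):
    """Randomly split a list of image base names into train and test parts."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    shuffled = sorted(names)   # sort first so the result depends only on the seed
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


names = [f"im_{i}" for i in range(1, 11)]
train, test = split_train_test(names, train_fraction=0.7, seed=7)
print(len(train), len(test))  # 7 3
```

Splitting by base name keeps each image together with its label file, since both share the same stem.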

The default input format is .png (can be changed), and output format is .jpeg. The function expects images with three channels.

The output images have a name tag appended to the file name for better control over augmentation.
name (string): a bit string of length 12 in which each position marks one applied augmentation:
 *     1:Shift
 *     2:Scale
 *     3:Rotate
 *     4:VerticalFlip
 *     5:HorizontalFlip
 *     6:RandomContrast
 *     7:MultiplicativeNoise(multiplier=0.5, p=0.2)
 *     8:RandomSizedBBoxSafeCrop (250, 250, erosion_rate=0.0, interpolation=1, p=1.0)
 *     9:Blur(blur_limit=(50, 50), p=0)
 *     10:Transpose
 *     11:RandomRotate90
 *     12:RandomBrightness

 suffix for channel augmentation:

*          No Channel Augmentation **-NA**
*          RandomBrightnessContrast(contrast_limit=0.2,p=1) **-RC**
*          MultiplicativeNoise(multiplier=0.5, p=1) **-MN**
*          Blur(blur_limit=(10, 10), p=1) **-B**
*          RandomBrightnessContrast(brightness_limit=0.2,p=1) **-RB**
*          Superpixels **-SP**
*          GaussNoise **-GN**
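The name tag can be decoded back into a list of augmentations. The sketch below is a hypothetical helper, not part of SciAugment; it assumes position 1 is the leftmost character of the tag and that file names look like `im_1_5_111000000000.jpg` or `im_1_21_000010000000_ch-3-MN`.

```python
import re

# Position -> augmentation, copied from the list above (position 1 is the leftmost bit).
GEOMETRIC_TAGS = [
    "Shift", "Scale", "Rotate", "VerticalFlip", "HorizontalFlip",
    "RandomContrast", "MultiplicativeNoise", "RandomSizedBBoxSafeCrop",
    "Blur", "Transpose", "RandomRotate90", "RandomBrightness",
]
# Suffix -> channel augmentation, copied from the suffix list above.
CHANNEL_SUFFIXES = {
    "NA": "No Channel Augmentation",
    "RC": "RandomBrightnessContrast (contrast)",
    "MN": "MultiplicativeNoise",
    "B": "Blur",
    "RB": "RandomBrightnessContrast (brightness)",
    "SP": "Superpixels",
    "GN": "GaussNoise",
}
NAME_PATTERN = re.compile(
    r"_(?P<bits>[01]{12})(?:_ch-(?P<channel>\d+)-(?P<suffix>[A-Z]+))?(?:\.\w+)?$"
)


def decode_name(file_name):
    """Return (applied augmentations, channel number or None, channel augmentation or None)."""
    match = NAME_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no augmentation tag found in {file_name!r}")
    applied = [name for bit, name in zip(match["bits"], GEOMETRIC_TAGS) if bit == "1"]
    channel = int(match["channel"]) if match["channel"] else None
    suffix = CHANNEL_SUFFIXES.get(match["suffix"]) if match["suffix"] else None
    return applied, channel, suffix


print(decode_name("im_1_5_111000000000.jpg"))
# (['Shift', 'Scale', 'Rotate'], None, None)
print(decode_name("im_1_21_000010000000_ch-3-MN"))
# (['HorizontalFlip'], 3, 'MultiplicativeNoise')
```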

In [6]:
# @markdown Specify a path to the folder with images and YOLO annotations
input_images_folder = "/content/subsection320/"  # @param{type: 'string'}
input_image_format = ".png"  # @param{type: 'string'}
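Before augmenting, it can help to confirm that every image in the folder has a label file. The sketch below assumes labels are `.txt` files with the same stem as the image (as in `im_1.png` and `im_1.txt`); `find_unlabelled_images` is a hypothetical helper, not a SciAugment function.

```python
from pathlib import Path


def find_unlabelled_images(folder, image_format=".png"):
    """Return names of images that have no .txt label file with the same stem."""
    folder = Path(folder)
    return sorted(
        image.name
        for image in folder.glob(f"*{image_format}")
        if not image.with_suffix(".txt").exists()
    )


# Example call with the variables defined in the cell above:
# find_unlabelled_images(input_images_folder, input_image_format)
```

An empty list means every image has a matching label file.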

For a reproducible train/test split, set a specific seed for the random number generator.

In [7]:
import random  # explicit import, so the cell does not rely on the star import above
random.seed(7)

Create default augmentation object.

The default augmentation does not augment brightness, because the Albumentations package mainly offers RGB augmentation, which is not always usable for multi-channel scientific images.

It notifies the user about the selected augmentation. Each augmentation creates one new image and label.

In [8]:
aug1 = SciAugment()

New instance of SciAugment.
Selected augmentation type: Default


Version: 0.2.0


Selected augmentation:
no augmentation
HorizontalFlip(p=1)
RandomSizedBBoxSafeCrop(250, 250, erosion_rate=0.0, interpolation=1, p=1.0)
Transpose(1)
RandomRotate90(p=1)
ShiftScaleRotate(p=1)
VerticalFlip(p=1)
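Geometric transforms such as the flips listed above move the objects, so each new image needs a new label as well. The sketch below illustrates this for normalized YOLO boxes (`class x_center y_center width height`). It is an illustration only, not the code SciAugment or Albumentations runs, and both function names are hypothetical.

```python
def parse_yolo_line(line):
    """Parse one YOLO label line into (class_id, x_center, y_center, width, height)."""
    parts = line.split()
    return (int(parts[0]), *map(float, parts[1:5]))


def flip_yolo_box(box, horizontal=True):
    """Mirror one normalized YOLO box; width and height stay the same."""
    class_id, x_center, y_center, width, height = box
    if horizontal:
        x_center = 1.0 - x_center  # left-right mirror
    else:
        y_center = 1.0 - y_center  # top-bottom mirror
    return (class_id, x_center, y_center, width, height)


box = parse_yolo_line("0 0.25 0.40 0.10 0.20")
print(flip_yolo_box(box))  # (0, 0.75, 0.4, 0.1, 0.2)
```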


Apply the augmentation to the selected folder of images and YOLO labels (if a train_data folder already exists, the function will stop).
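One way to avoid that stop is to move an existing output folder out of the way first. The helper below is a hypothetical sketch using only the standard library; the timestamped backup name is an assumption, and any other unused name works as well.

```python
import os
from datetime import datetime


def free_output_folder(folder="train_data"):
    """Rename an existing output folder; return the backup name, or None if nothing existed."""
    if not os.path.isdir(folder):
        return None
    backup = f"{folder}_{datetime.now():%Y%m%d_%H%M%S}"
    os.rename(folder, backup)
    return backup
```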

In [9]:
aug1.augment_data(images_path=input_images_folder, image_format=input_image_format)

Num of files: 63
Processing: im_1.png
/content/subsection320/
im_1.png
Processing: im_1.txt
/content/subsection320/im_1.txt
Writing im_1_0_000000000000.jpg
Writing im_1_1_000010000000.jpg
Writing im_1_2_000000010000.jpg
Writing im_1_3_000000000100.jpg
Writing im_1_4_000000000010.jpg
Writing im_1_5_111000000000.jpg
Writing im_1_6_000100000000.jpg
Processing: im_10.png
/content/subsection320/
im_10.png
Processing: im_10.txt
/content/subsection320/im_10.txt
Writing im_10_7_000000000000.jpg
Writing im_10_8_000010000000.jpg
Writing im_10_9_000000010000.jpg
Writing im_10_10_000000000100.jpg
Writing im_10_11_000000000010.jpg
Writing im_10_12_111000000000.jpg
Writing im_10_13_000100000000.jpg
Processing: im_11.png
/content/subsection320/
im_11.png
Processing: im_11.txt
/content/subsection320/im_11.txt
Writing im_11_14_000000000000.jpg
Writing im_11_15_000010000000.jpg
Writing im_11_16_000000010000.jpg
Writing im_11_17_000000000100.jpg
Writing im_11_18_000000000010.jpg
Writing im_11_19_11100000

In [10]:
import os

os.rename("train_data", "train_data_aug1")

There is another prepared version of augmentation (it will be tuned in the future after testing)

In [11]:
aug2 = SciAugment(aug_type="fluorescece_microscopy", channel_aug=True)

New instance of SciAugment.
Selected augmentation type: fluorescece_microscopy


Version: 0.2.0


Selected augmentation:
no augmentation
HorizontalFlip(p=1)
RandomBrightnessContrast(contrast_limit=0.2,p=1)
MultiplicativeNoise(multiplier=0.5, p=1)
RandomSizedBBoxSafeCrop(250, 250, erosion_rate=0.0, interpolation=1, p=1.0)
Blur(blur_limit=(10, 10), p=0)
Transpose(1)
RandomRotate90(p=1)
ShiftScaleRotate(p=1)
VerticalFlip(p=1)


Selected channel wise augmentation:
no augmentation
MultiplicativeNoise(multiplier=0.5, p=1)
Blur(blur_limit=(10, 10), p=0)
RandomBrightnessContrast(brightness_limit=0.2,p=1)
Superpixels (p_replace=0.1, n_segments=20, max_size=64, interpolation=1, p=1)
GaussNoise (var_limit=(10.0, 50.0), mean=0, p=1)


It can be applied in the same way (after renaming the already existing train_data folder)

In [12]:
aug2.augment_data(images_path=input_images_folder, image_format=input_image_format)

Num of files: 63
Processing: im_1.png
/content/subsection320/
im_1.png
Processing: im_1.txt
/content/subsection320/im_1.txt
Writing im_1_0_000000000000.jpg
Writing im_1_1_000010000000.jpg
Writing im_1_2_000001000000.jpg
Writing im_1_3_000000100000.jpg
Writing im_1_4_000000010000.jpg
Writing im_1_5_000000001000.jpg
Writing im_1_6_000000000100.jpg
Writing im_1_7_000000000010.jpg
Writing im_1_8_111000000000.jpg
Writing im_1_9_000100000000.jpg
Processing: im_10.png
/content/subsection320/
im_10.png
Processing: im_10.txt
/content/subsection320/im_10.txt
Writing im_10_10_000000000000.jpg
Writing im_10_11_000010000000.jpg
Writing im_10_12_000001000000.jpg
Writing im_10_13_000000100000.jpg
Writing im_10_14_000000010000.jpg
Writing im_10_15_000000001000.jpg
Writing im_10_16_000000000100.jpg
Writing im_10_17_000000000010.jpg
Writing im_10_18_111000000000.jpg
Writing im_10_19_000100000000.jpg
Processing: im_11.png
/content/subsection320/
im_11.png
Processing: im_11.txt
/content/subsection320/im_1

In [13]:
import os

os.rename("train_data", "train_data_aug2")

Use all augmentations

In [14]:
aug3 = SciAugment(aug_type="all")

New instance of SciAugment.
Selected augmentation type: all


Version: 0.2.0


Selected augmentation:
no augmentation
HorizontalFlip(p=1)
RandomBrightnessContrast(contrast_limit=0.2,p=1)
MultiplicativeNoise(multiplier=0.5, p=1)
RandomSizedBBoxSafeCrop(250, 250, erosion_rate=0.0, interpolation=1, p=1.0)
Blur(blur_limit=(10, 10), p=0)
Transpose(1)
RandomRotate90(p=1)
ShiftScaleRotate(p=1)
VerticalFlip(p=1)


In [15]:
aug3.augment_data(images_path=input_images_folder, image_format=input_image_format)

Num of files: 63
Processing: im_1.png
/content/subsection320/
im_1.png
Processing: im_1.txt
/content/subsection320/im_1.txt
Writing im_1_0_000000000000.jpg
Writing im_1_1_000010000000.jpg
Writing im_1_2_000001000000.jpg
Writing im_1_3_000000100000.jpg
Writing im_1_4_000000010000.jpg
Writing im_1_5_000000001000.jpg
Writing im_1_6_000000000100.jpg
Writing im_1_7_000000000010.jpg
Writing im_1_8_111000000000.jpg
Writing im_1_9_000100000000.jpg
Processing: im_10.png
/content/subsection320/
im_10.png
Processing: im_10.txt
/content/subsection320/im_10.txt
Writing im_10_10_000000000000.jpg
Writing im_10_11_000010000000.jpg
Writing im_10_12_000001000000.jpg
Writing im_10_13_000000100000.jpg
Writing im_10_14_000000010000.jpg
Writing im_10_15_000000001000.jpg
Writing im_10_16_000000000100.jpg
Writing im_10_17_000000000010.jpg
Writing im_10_18_111000000000.jpg
Writing im_10_19_000100000000.jpg
Processing: im_11.png
/content/subsection320/
im_11.png
Processing: im_11.txt
/content/subsection320/im_1

In [16]:
import os

os.rename("train_data", "train_data_aug3")

Or skip augmentation and just prepare the train/test dataset

In [17]:
aug0 = SciAugment(aug_type="no_augment")

New instance of SciAugment.
Selected augmentation type: no_augment
No augment setting will only divide images and labels to train_data folder.


Version: 0.2.0


Selected augmentation:
no augmentation


Augmentation can also be applied per channel

In [18]:
output_image_format = ".jpeg"
aug2.augment_data_per_channel(
    images_path=input_images_folder,
    image_format=input_image_format,
    output_image_format=output_image_format,
)

Num of files: 63
Processing: im_1.png
/content/subsection320/
im_1.png
Processing: im_1.txt
/content/subsection320/im_1.txt
Writing im_1_0_000000000000_ch-1-NA
Writing im_1_1_000000000000_ch-2-NA
Writing im_1_2_000000000000_ch-3-NA
Writing im_1_3_000000000000_ch-1-MN
Writing im_1_4_000000000000_ch-2-MN
Writing im_1_5_000000000000_ch-3-MN
Writing im_1_6_000000000000_ch-1-B
Writing im_1_7_000000000000_ch-2-B
Writing im_1_8_000000000000_ch-3-B
Writing im_1_9_000000000000_ch-1-RB
Writing im_1_10_000000000000_ch-2-RB
Writing im_1_11_000000000000_ch-3-RB
Writing im_1_12_000000000000_ch-1-SP
Writing im_1_13_000000000000_ch-2-SP
Writing im_1_14_000000000000_ch-3-SP
Writing im_1_15_000000000000_ch-1-GN
Writing im_1_16_000000000000_ch-2-GN
Writing im_1_17_000000000000_ch-3-GN
Writing im_1_18_000010000000_ch-1-NA
Writing im_1_19_000010000000_ch-2-NA
Writing im_1_20_000010000000_ch-3-NA
Writing im_1_21_000010000000_ch-1-MN
Writing im_1_22_000010000000_ch-2-MN
Writing im_1_23_000010000000_ch-3-MN
W

  segments = skimage.segmentation.slic(image, n_segments=n_segments, compactness=10)


Output stream was truncated to the last 5000 lines.
Writing im_12_717_000100000000_ch-1-GN
Writing im_12_718_000100000000_ch-2-GN
Writing im_12_719_000100000000_ch-3-GN
Processing: im_13.png
/content/subsection320/
im_13.png
Processing: im_13.txt
/content/subsection320/im_13.txt
Writing im_13_720_000000000000_ch-1-NA
Writing im_13_721_000000000000_ch-2-NA
Writing im_13_722_000000000000_ch-3-NA
Writing im_13_723_000000000000_ch-1-MN
Writing im_13_724_000000000000_ch-2-MN
Writing im_13_725_000000000000_ch-3-MN
Writing im_13_726_000000000000_ch-1-B
Writing im_13_727_000000000000_ch-2-B
Writing im_13_728_000000000000_ch-3-B
Writing im_13_729_000000000000_ch-1-RB
Writing im_13_730_000000000000_ch-2-RB
Writing im_13_731_000000000000_ch-3-RB
Writing im_13_732_000000000000_ch-1-SP
Writing im_13_733_000000000000_ch-2-SP
Writing im_13_734_000000000000_ch-3-SP
Writing im_13_735_000000000000_ch-1-GN
Writing im_13_736_000000000000_ch-2-GN
Writing im_13_737_000000000000_ch-3-GN
Writi
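The running index in the log above can be checked with simple arithmetic. The counts below are read from the printed lists for aug2 (10 entries under "Selected augmentation" and 6 under "Selected channel wise augmentation", each including "no augmentation") and from the three-channel input; the loop order is inferred from the log, not taken from the SciAugment source.

```python
from itertools import product

geometric = 10     # entries under "Selected augmentation" for aug2
channel_wise = 6   # entries under "Selected channel wise augmentation"
channels = 3       # the function expects three-channel images

per_image = geometric * channel_wise * channels
print(per_image)  # 180

# im_13 is the fifth image processed (im_1, im_10, im_11, im_12, im_13),
# so its first output index is 4 * 180 = 720, as in the log.
print(4 * per_image)  # 720

# Tag order within one geometric augmentation, as it appears in the log.
suffixes = ["NA", "MN", "B", "RB", "SP", "GN"]
tags = [f"ch-{ch}-{s}" for s, ch in product(suffixes, range(1, channels + 1))]
print(tags[:4])  # ['ch-1-NA', 'ch-2-NA', 'ch-3-NA', 'ch-1-MN']
```

Per-channel augmentation therefore multiplies the dataset size considerably, which is worth keeping in mind for storage and training time.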

In [19]:
# import os
# os.rename("train_data","train_data_aug2ch")

Now the train_data directory is ready to be used for training and validation with a YOLO_v5 model. You can find practical examples here: https://github.com/martinschatz-cz/SciCount

Zip up the prepared train_data folder with augmented images and YOLO annotations for backup.

In [20]:
import shutil

shutil.make_archive("train_data", "zip", "/content/", base_dir="train_data")

'/content/train_data.zip'
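Since the archive serves as a backup, a quick integrity check can be useful. The sketch below uses the standard `zipfile` module; `summarize_archive` is a hypothetical helper name.

```python
import zipfile


def summarize_archive(zip_path):
    """Return (member names, first corrupt member or None) for a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        bad_member = archive.testzip()  # None when every member passes its CRC check
        names = archive.namelist()
    return names, bad_member


# Example call for the archive created above:
# names, bad_member = summarize_archive("/content/train_data.zip")
```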

Install and apply watermark

In [21]:
!pip install watermark

%load_ext watermark

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting watermark
  Downloading watermark-2.3.1-py2.py3-none-any.whl (7.2 kB)
Collecting jedi>=0.10
  Downloading jedi-0.18.1-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: jedi, watermark
Successfully installed jedi-0.18.1 watermark-2.3.1


In [22]:
%watermark -v -p albumentations,opencv-python-headless,imgaug,cv2

Python implementation: CPython
Python version       : 3.7.14
IPython version      : 7.9.0

albumentations        : 1.3.0
opencv-python-headless: not installed
imgaug                : 0.4.0
cv2                   : 4.1.2
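A similar version report can be produced without installing an extra package. The sketch below uses `importlib.metadata`, which is in the standard library from Python 3.8 (the run above used Python 3.7.14, so this is an alternative for newer runtimes). It looks up distribution names such as `opencv-python-headless`, not import names such as `cv2`; `report_versions` is a hypothetical helper.

```python
from importlib import metadata


def report_versions(distributions):
    """Map each distribution name to its installed version or 'not installed'."""
    report = {}
    for name in distributions:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


wanted = ["albumentations", "opencv-python-headless", "imgaug"]
for name, version in report_versions(wanted).items():
    print(f"{name:<24}: {version}")
```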



In [23]:
!pip freeze > req.txt