## SPLIT TRAIN TEST
This notebook contains the code we used to split the images into train and test sets.  
Because of the Sun's roughly 11-year activity cycle, we wanted to avoid a train set dominated by "darker" solar-disk images with only a few "brighter" ones, or vice versa.  
For training we could use images from 2011 to 2021. We randomly assigned 80% of the images from each year to the train set and the remaining 20% to the test set. The train set should therefore contain 80% of the images from 2011, 80% from 2012, and so on through 2021, while the test set should contain 20% of the images from each of those years.  
Later, while training SCSS-Net, we used 20% of the training set as a validation set.
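
The per-year split and the later validation hold-out described above can be sketched on plain file names, without touching the disk. This is only an illustration: the helper names `stratified_split` and `hold_out_validation`, the fixed seed, and the made-up file names are assumptions, not part of the original notebook.

```python
import random


def stratified_split(names_by_year, train_frac=0.8, seed=0):
    """Split each year's names separately so every year keeps the same ratio."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for year in sorted(names_by_year):
        names = sorted(names_by_year[year])  # stable order before shuffling
        rng.shuffle(names)
        cut = int(len(names) * train_frac)
        train.extend(names[:cut])
        test.extend(names[cut:])
    return train, test


def hold_out_validation(train, val_frac=0.2, seed=0):
    """Take val_frac of the train names as a validation set."""
    names = sorted(train)
    random.Random(seed).shuffle(names)
    cut = int(len(names) * val_frac)
    return names[cut:], names[:cut]  # (remaining train, validation)


# Hypothetical file names: 10 images per year for 2011-2021.
names_by_year = {
    year: [f"{year}_{i:02d}.png" for i in range(10)]
    for year in range(2011, 2022)
}
train, test = stratified_split(names_by_year)
fit, val = hold_out_validation(train)
```

With these fractions, roughly 64% of all images end up used for fitting, 16% for validation, and 20% for testing.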

In [1]:
from PIL import Image
from tqdm.notebook import tqdm
import glob
import random

In [2]:
def split(paths_to_split, save_path_train, save_path_test):
    """Randomly select 80% of the images of a given year for the train set and 20% for the test set.

    Args:
        paths_to_split (string): glob pattern matching the images from one year
        save_path_train (string): path to the folder where train images are saved
        save_path_test (string): path to the folder where test images are saved
    """
    # glob returns paths in arbitrary, OS-dependent order, not a random one,
    # so sort for a stable baseline and shuffle explicitly.
    paths = sorted(glob.glob(paths_to_split))
    random.shuffle(paths)

    cut = int(len(paths) * 0.8)
    train_imgs = paths[:cut]
    test_imgs = paths[cut:]

    for path in train_imgs:
        img = Image.open(path)
        # file names are assumed to be exactly 29 characters long
        name_clean = path[-29:]
        download_path = save_path_train + name_clean
        img.save(download_path)

    for path in test_imgs:
        img = Image.open(path)
        name_clean = path[-29:]
        download_path = save_path_test + name_clean
        img.save(download_path)
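
As an alternative to opening and re-saving every image with PIL, the files could be copied byte for byte. The following is a minimal sketch under that assumption; `split_copy`, its `seed` parameter, and the use of `shutil.copy2` are illustrative choices and not what the notebook originally did.

```python
import glob
import os
import random
import shutil


def split_copy(pattern, train_dir, test_dir, train_frac=0.8, seed=0):
    """Copy a seeded random train_frac of the matched files to train_dir, the rest to test_dir."""
    paths = sorted(glob.glob(pattern))   # stable order before shuffling
    random.Random(seed).shuffle(paths)   # same seed gives the same split
    cut = int(len(paths) * train_frac)
    train_paths, test_paths = paths[:cut], paths[cut:]

    for subset, target in ((train_paths, train_dir), (test_paths, test_dir)):
        os.makedirs(target, exist_ok=True)
        for path in subset:
            # basename keeps the full file name regardless of its length
            shutil.copy2(path, os.path.join(target, os.path.basename(path)))
    return train_paths, test_paths
```

Copying avoids re-encoding the image and does not depend on file names having a fixed length.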

In [5]:
# paths
imgs = "/Users/majirky/Desktop/ar_spoca/ARspocaBWimgs/*"
save_path_train = "/Users/majirky/Desktop/1BP/scss-net/data/ARtrain_masks/"
save_path_test = "/Users/majirky/Desktop/1BP/scss-net/data/ARtest_masks/"

In [6]:
# script
split(imgs, save_path_train, save_path_test)

## COPY

In [24]:
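def compare_masks_imgs(path, save_path, all_masks):
    """Copy an image to save_path if a mask with the same name exists.

    Args:
        path (string): path to the source image
        save_path (string): path to the folder where matching images are saved
        all_masks (list): file names of all masks already in the split
    """
    # keep the 25-character name without its 4-character extension,
    # then use the .png extension that the mask files have
    image_name_clean = path[-29: -4]
    image_name_clean = image_name_clean + ".png"

    if image_name_clean in all_masks:
        img = Image.open(path)
        download_path = save_path + image_name_clean
        img.save(download_path)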
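
The matching step in `compare_masks_imgs` can also be sketched with `pathlib` and a set, which avoids relying on fixed-length file names and makes each lookup constant time. The helper `matching_mask_name` and the example file names below are hypothetical and only illustrate the idea.

```python
from pathlib import Path


def matching_mask_name(image_path):
    """Return the mask file name expected for an image: same stem, .png extension."""
    return Path(image_path).stem + ".png"


# Hypothetical mask names and image paths, for illustration only.
mask_names = {"20110101_0000.png", "20110102_0000.png"}
image_paths = [
    "/data/imgs/20110101_0000.jpg",
    "/data/imgs/20110103_0000.jpg",
]

# Keep only the images that have a mask with the same stem.
selected = [p for p in image_paths if matching_mask_name(p) in mask_names]
```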

In [28]:
all_masks = []
for name in glob.glob("/Users/majirky/Desktop/1BP/scss-net/data/ARtrain_masks/*"):
    image_name_clean = name[-29:]
    all_masks.append(image_name_clean)

In [29]:
all_masks = sorted(all_masks)

In [30]:
save_path = "/Users/majirky/Desktop/1BP/scss-net/data/ARtrain_imgs/"

# process every year from 2011 to 2021
for year_to_find in range(2011, 2022):

    resource_path = f"/Users/majirky/Desktop/slnko_ar/arfotky_{year_to_find}/*"
    print(f"year_of_splitting: {year_to_find}")

    imgs_paths = glob.glob(resource_path)

    for path in tqdm(imgs_paths):
        compare_masks_imgs(path, save_path, all_masks)

year_of_splitting: 2011

  0%|          | 0/349 [00:00<?, ?it/s]

year_of_splitting: 2012

  0%|          | 0/318 [00:00<?, ?it/s]

year_of_splitting: 2013

  0%|          | 0/336 [00:00<?, ?it/s]

year_of_splitting: 2014

  0%|          | 0/334 [00:00<?, ?it/s]

year_of_splitting: 2015

  0%|          | 0/342 [00:00<?, ?it/s]

year_of_splitting: 2016

  0%|          | 0/349 [00:00<?, ?it/s]

year_of_splitting: 2017

  0%|          | 0/207 [00:00<?, ?it/s]

year_of_splitting: 2018

  0%|          | 0/173 [00:00<?, ?it/s]

year_of_splitting: 2019

  0%|          | 0/282 [00:00<?, ?it/s]

year_of_splitting: 2020

  0%|          | 0/326 [00:00<?, ?it/s]

year_of_splitting: 2021

  0%|          | 0/303 [00:00<?, ?it/s]

## CONTROL

In [100]:
paths1 = []
for name in glob.glob("/Users/majirky/Desktop/1BP/scss-net/data/test_masks/*"):
    image_name_clean = name[-29:]
    paths1.append(image_name_clean)

In [101]:
paths2 = []
for name in glob.glob("/Users/majirky/Desktop/1BP/scss-net/data/test_imgs/*"):
    image_name_clean = name[-29:]
    paths2.append(image_name_clean)

In [102]:
for path in paths1:
    if path not in paths2:
        print(path)
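
The check above only reports masks that have no image. A two-way version, sketched here with set differences, would also catch images that have no mask. The function name `report_mismatches` and the sample names are assumptions made for illustration.

```python
def report_mismatches(mask_names, image_names):
    """Return (masks without an image, images without a mask), both sorted."""
    masks, images = set(mask_names), set(image_names)
    return sorted(masks - images), sorted(images - masks)


# Hypothetical file names, for illustration only.
missing_imgs, missing_masks = report_mismatches(
    ["a.png", "b.png", "c.png"],
    ["b.png", "c.png", "d.png"],
)
```

Two empty lists would mean the mask and image folders contain exactly the same file names.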