# Fractal Generative Adversarial Network

In [None]:
# Check that the intended Python version is in use
!pip -V               # Note: with several Python versions installed, pip may be called pip3, pip3.7 or pip3.9
!python -V            # Default Python on the PATH, not necessarily the one used by this Jupyter notebook
import sys
print(sys.executable) # Path of the Python interpreter used by this Jupyter kernel
print(sys.version_info)


## 1 Abstract

Recent advances in artificial intelligence and machine learning have produced many network architectures and algorithmic procedures that improve accuracy and computational efficiency on their target problems.
Through the interplay of gradient descent, optimizers, loss functions and parallel computing, artificial intelligence can solve tasks that handwritten algorithms can hardly achieve.
A neural network fits a given input to an output, so in supervised learning every feature needs a label that the network tries to predict.
This means that a solution has to be provided for every observation. The error, described for example by the accumulated element-wise distance between the given value and the predicted value, is passed to the optimizer, which adjusts the weights and biases of the model to lower its loss.
This becomes a problem when no labels are available to train the model on. Many datasets do not contain labels, so they have to be created painstakingly by hand for every element, which results in hours of work spent on dataset preparation alone.
Autoencoder networks work around this limitation of supervised learning by using two neural networks. The first network compresses the given input by passing it to a constrained output space. The second network tries to reconstruct the input from the compressed, encoded version. This enables encoded messaging and compression for signal transmission over bandwidth-limited channels.
By adding noise or masks to the input, a model can be trained for noise reduction or neural inpainting.
Adding another network called the discriminator, which tries to distinguish between original and generated data, can improve the overall accuracy in certain tasks. This network structure is called a generative adversarial network (GAN) or, in combination with an autoencoder, sometimes an adversarial autoencoder.
This work combines recent developments in neural network architectures, such as residual neural networks and inception networks, with fractal auxiliary indicators in a custom neural network architecture. The approach generates many different network architectures from initialization parameters, which enables the search for optimal architectures by random and evolutionary procedures.


## 2 Introduction

The underlying idea is that if knowledge or patterns can be acquired from the entries of a dataset, the information can be compressed and fitted onto optimized dimensions without knowing them in advance. The same algorithmic architecture can compress and encode the data and clean it of noise and unwanted artifacts, which makes it a versatile tool.
One problem in generating new images was that, when an arbitrary output is generated, the user cannot decide or modify what happens in the generated picture. Introducing spacial box counting feature extraction [1] into the networks addresses this lack of artistic control by showing the network a coarse version of the image to be generated, so the network knows which structures or topologies should occur where.

Since this kind of unsupervised learning has many parameters that affect the algorithm's performance, an evolutionary generative algorithm is used to find the best network architectures for the given task.

This work proposes a new methodology that couples generative adversarial networks with an evolutionary network generator and evaluates its potential benefit.
The focus lies on training and testing machine learning models whose custom network architectures are generated and chosen by an evolutionary network generator, with built-in functionality for denoising altered MNIST images as an example dataset.
Other functionality, such as neural inpainting and super-resolution, is already incorporated into the software but has not been tested well enough and is therefore not part of this work.


## 3 Experimental Setup and Methods

This setup uses a modern 64-bit computer running a Linux operating system. The system consists of an AMD Ryzen 3900X 12-core processor, 64 GB of RAM and an Nvidia GTX 1080 Ti with 11 GB of VRAM. To train the neural networks in a reasonable amount of time, a capable graphics card should be available, but it is not necessary to execute the code. To test the trained networks, a CPU with sufficient RAM is enough.


### 3.1 Prerequisites

To execute the code, some additional modules have to be installed, preferably via pip, a Python package manager. The commands below install all necessary libraries. NumPy handles multidimensional arrays in Python and is used by many other libraries. Numba is a just-in-time compiler that translates Python and NumPy code into machine code, which speeds up some functions in dataset assembly significantly. Tqdm provides an easy-to-use progress bar to visualize progress in the terminal. Matplotlib is a plotting library that can generate charts and diagrams or display arrays as images. Pillow, a fork of the Python Imaging Library (PIL), handles image modification efficiently.
Prettytable generates readable tables within the terminal window. Hyperopt is a free Python library for hyperparameter optimization, which is used to find machine learning parameters that minimize loss and training time.
The machine learning library used is PyTorch, which can be installed with GPU support or with CPU support only. The required Python version is at least 3.6 and at most 3.9, and when using an Nvidia GPU, CUDA 11 should be installed.
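
The version constraint and the list of libraries above can also be checked programmatically before anything is installed. The following is only a minimal sketch: the helper names `python_version_supported` and `missing_packages` are hypothetical and not part of the repository.

```python
import importlib.util
import sys

# Supported interpreter range stated in the text: Python 3.6 up to 3.9.
MIN_VERSION = (3, 6)
MAX_VERSION = (3, 9)

def python_version_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION

def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]

print("Python version supported:", python_version_supported())
# Import names of the libraries listed above (Pillow is imported as "PIL").
required = ["numpy", "numba", "tqdm", "matplotlib", "PIL", "prettytable", "hyperopt", "torch"]
print("Missing packages:", missing_packages(required))
```

Any name reported as missing still has to be installed with the commands in the next cell.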



In [None]:

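# Note: IPython shell escapes ("!") do not raise Python exceptions when a command fails,
# so the except branches below are not triggered automatically. If tkinter is still
# missing afterwards, run the command matching your operating system manually.
try:
    print("Try Windows installation method")
    !pip install tk
except:
    print("Try Unix installation method")
    try:
        print("Try Debian installation method")
        !sudo apt install python3-tk
    except:
        print("Try Arch installation method")
        !sudo pacman -S tk

!pip3 install numpy numba tqdm matplotlib pillow prettytable hyperopt 

!pip install pandas filetype

# Install PyTorch with CUDA 11.3 support; for a CPU-only setup run the command in the except branch manually.
try:
    !pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
    #!pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
except:
    !pip install torch==1.10.1+cpu torchvision==0.11.2+cpu torchaudio==0.10.1+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html

# Upgrade pytz (works around an issue on the author's machine)
!pip3 install --upgrade pytz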

        

### 3.2 Modules and hyperparameters

#### 3.2.1 Modules

Several modules have to be imported to provide the required functionality. The time module is imported to measure the runtime performance of the algorithm. The NumPy library is imported to handle multidimensional arrays and tensors.
The sys library is imported to retrieve system-specific parameters in functions.
The os library is imported to use operating-system-specific functions, such as modifying files and listing directories.
Tqdm shows a terminal progress bar, matplotlib shows plots, and the Python Imaging Library (PIL) loads and modifies images.
The tkinter library provides convenient file dialogs for choosing the data going into the network and for saving and loading network models. The pickle module is used to save and load datasets. Since the dataset can be created with multiple processes, the built-in multiprocessing library is needed.
The machine learning part and the dataset are handled by PyTorch, from which several classes and functions are imported.
The built-in itertools module is used to iterate through Python dictionaries and to generate network parameter chains, and defaultdict is imported to create dictionaries with more than one keyword.
Evolutionary algorithms often need random numbers for initialization, which requires the random module. Linecache is imported to print detailed information if an error occurs.

Finally, the class Path is imported from pathlib to create a variable for the current working directory, so that the code works on operating systems with different path structures.

Documentation for all Python libraries used can be found in chapter 6 at [2-19].



#### 3.2.2 Self written modules

The algorithm is designed so that the model can be tailored by changing hyperparameters and other useful options during training or testing. Within the repository, the 'config.py' file contains the necessary structure to set options at runtime. To reload an imported library or file that has changed after importing, the reload function of the importlib module is imported.
To alter the options or hyperparameters, set the ‘ON_OFF_Switch’ to true; the values are then updated to those given in the config file. The parameters are refreshed every 50 batches of data during training and at every step during testing.
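
The reload mechanism can be sketched as follows. This is an illustration under assumptions, not the repository's code: the function `refresh_options`, the class `Options` and the throwaway module `demo_config` (standing in for 'config.py', whose actual contents are not shown here) are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid stale bytecode caches in this short demo

def refresh_options(opt, cfg_module, batch_index, every=50):
    """Reload cfg_module every `every` batches and copy its values onto opt
    when its ON_OFF_Switch is True. Returns the (possibly reloaded) module."""
    if batch_index % every == 0:
        cfg_module = importlib.reload(cfg_module)
        if getattr(cfg_module, "ON_OFF_Switch", False):
            opt.lr = cfg_module.lr
    return cfg_module

class Options:
    """Stand-in for the notebook's option object."""
    lr = 0.001

# A throwaway module stands in for 'config.py' (its contents are assumed).
tmp_dir = tempfile.mkdtemp()
cfg_file = Path(tmp_dir) / "demo_config.py"
cfg_file.write_text("ON_OFF_Switch = False\nlr = 0.001\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
import demo_config

opt = Options()
# The file is edited while "training" is running ...
cfg_file.write_text("ON_OFF_Switch = True\nlr = 0.00025\n")
demo_config = refresh_options(opt, demo_config, batch_index=49)  # no reload yet
lr_at_batch_49 = opt.lr
demo_config = refresh_options(opt, demo_config, batch_index=50)  # reload happens here
lr_at_batch_50 = opt.lr
print(lr_at_batch_49, lr_at_batch_50)
```

Reloading only every few batches keeps the file access cheap while still letting edits to the config file take effect during a long training run.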

To create and handle datasets, a data handler class is imported through the loader module, which was developed in advance of this work and will not be discussed further. The generated dataset contains the chosen pictures and, derived from them, the spacial box counts and lacunarities stored in NumPy array format.
To extract spacial box counts from images and arrays, the self-written box count feature extraction module, which was created in [20], has to be loaded.

The generative adversarial network is contained in the ‘FractalGAN’ module, which is imported accordingly, and the computation device is fetched. If the chosen device is the central processing unit while a GPU is available, the GPU is disabled manually.

The evolutionary network generator is the class ‘Network_Generator’ in the local module ‘EvoNet’.
 


#### 3.2.3 Hyperparameters and variables

Since an argument parser is not practical in a Jupyter notebook, an option object is created to handle all parameters on a global level.
The hyperparameter ‘n_cpu’ sets the number of processes used to fully utilize the CPU while creating the dataset.
The machine learning hyperparameters comprise the batch size, the learning rate, and the decay rates ‘b1’ and ‘b2’ of the first and second moment estimates of the Adam optimizer. The compression factor is determined by the size of the latent dimension. The image size is set to ‘(32, 32)’; each side length can be chosen as a power of two, n = 2^x with x = 1, 2, 3, …
Since the dataset consists of grayscale images and all color images are split into their individual channels, the option ‘channels’ is set to 1.
The sample interval sets the number of steps between the sample images generated during training.
To pass residual information through the neural network, the option ‘residual’ can be set to ‘True’. When it is ‘False’, normal convolutional neural network layers are used.
Initial hyperparameter optimization is handled by the Python library hyperopt; specific hyperparameters can be injected instead when 'opt.hyperopt = "off"' is chosen.
The option ‘autoencoder’ toggles the discriminator network. When it is ‘on’, the discriminator network is disabled and a variational autoencoder is used; when it is ‘off’, the discriminator is enabled.
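
The power-of-two constraint on the image size and the resulting image shape can be illustrated with a short sketch. The helper names are hypothetical, and the definition of the compression factor as the ratio of input values to latent values is an assumption made for this example.

```python
def is_power_of_two(n):
    """True for 2, 4, 8, ... (x >= 1 as in the text, so 1 is excluded)."""
    return n >= 2 and (n & (n - 1)) == 0

def make_img_shape(channels, img_size):
    """Validate the (height, width) tuple and return (channels, height, width)."""
    height, width = img_size
    if not (is_power_of_two(height) and is_power_of_two(width)):
        raise ValueError(f"img_size {img_size} must consist of powers of two")
    return (channels, height, width)

def compression_factor(img_shape, latent_dim):
    """Ratio of input values to latent values (assumed definition)."""
    channels, height, width = img_shape
    return channels * height * width / latent_dim

img_shape = make_img_shape(channels=1, img_size=(32, 32))
print(img_shape)
print(compression_factor(img_shape, latent_dim=16))
```

With the default values of this notebook, 1024 pixel values are squeezed into 16 latent values.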

To restrict the size of the neural network and its memory consumption, the maximum number of parallel layers and the maximum network length can be set as integers. The number of latent spaces can be chosen, but it is set to '1' for simplicity.
The larger the latent dimension, the image size, the number of channels, and the number of parallel layers and the length of the network, the more RAM and VRAM are needed to store all weights and biases of the neural networks.

By modifying the input data with noise or a mask, denoising and inpainting can be implemented. To denoise the data, set the noise bool to ‘True’ and choose a standard deviation, which sets the amplitude of the noise.
To mask the incoming data at a specific position, set the mask bool to ‘True’, define a mask mean consisting of x and y coordinates, and choose the lateral x and y dimensions of the mask. The masking feature is already implemented but is not a subject of this work.
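
How noise and a mask alter an input can be sketched without any dependencies. This is not the repository's implementation: the functions `add_gaussian_noise` and `apply_mask`, the value range [0, 1], the clipping and the fill value are assumptions chosen for illustration.

```python
import random

def add_gaussian_noise(image, std, rng):
    """Return a copy of `image` (list of rows, values in [0, 1]) with zero-mean
    Gaussian noise of standard deviation `std`, clipped back to [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, std))) for px in row] for row in image]

def apply_mask(image, mean_xy, dim_xy, fill=0.0):
    """Return a copy of `image` where a rectangle centred at mean_xy = (x, y)
    with size dim_xy = (width, height) is overwritten by `fill`."""
    cx, cy = mean_xy
    w, h = dim_xy
    x0, x1 = int(cx - w / 2), int(cx + w / 2)
    y0, y1 = int(cy - h / 2), int(cy + h / 2)
    return [[fill if (x0 <= x < x1 and y0 <= y < y1) else px
             for x, px in enumerate(row)]
            for y, row in enumerate(image)]

rng = random.Random(0)                      # fixed seed for reproducibility
clean = [[0.5] * 8 for _ in range(8)]       # tiny uniform gray test image
noisy = add_gaussian_noise(clean, std=0.01, rng=rng)
masked = apply_mask(clean, mean_xy=(4, 4), dim_xy=(4, 2))
for row in masked:
    print(row)
```

The network then receives `noisy` or `masked` as input and the unaltered image as the reconstruction target.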



In [None]:
#IMPORTS
import time
#import argparse   #Argument parser just available when executing a .py script/program
import numpy as np
import sys
import os
from tqdm import tqdm
import matplotlib.pyplot as plt
from PIL import Image                                  # Python Image Library

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials  # hyperparameter optimization library

#Import tkinter to be able to use choose file/directory prompt in window-mode
from tkinter import *
from tkinter import filedialog
from tkinter.filedialog import askdirectory

import pickle
import multiprocessing 

# Import the modules needed to create machine learning networks
import torch                                    #Pytorch machine learning framework.
import torch.nn as nn                           
import torch.nn.functional as F
import torchvision.transforms as transforms
from torchvision.utils import save_image
from torch.utils.data import DataLoader
from torchvision import datasets
from torch.autograd import Variable
import torch.optim as optim

import itertools                                       #To iterate through dictionaries and generate network chains to pass to an optimizer
from itertools import permutations

# EVONET imports
np.set_printoptions(threshold=sys.maxsize)
from collections import defaultdict                    #Used to create dictionaries with more than one keyword
import random                                          #Functions about random numbers
import linecache                                       #To get detailed Error description


from pathlib import Path
#For the directory of the script being run:
FileParentPath  = os.path.abspath('')
print(f"Working Directory is: {FileParentPath}")
#FileParentPath = str(Path(__file__).parent.absolute())  #Alternative when executing .py script.
import pathlib              #Import pathlib to create a link to the directory where the file is at.

#Import own scripts
from importlib import reload    # to reload config.py when file changes
import config

import Loader       #loads the data and converts it to the dataset
import BoxcountFeatureExtr_v2 # Calculates the box count structure map and appends it to the data as a label, so the GPU version can later compute it and run gradient descent
import FractalGANv13 as FractalGAN

from EvoNet import Network_Generator
print("IMPORTS DONE \n")


# SET VARIABLES AND OPTIONS in original code when executing the .py script
'''
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--MainFunction", type=str, default="NOT any", help="Type: boxcounting(bc),  mkDataset(mkdata), ConvNet_BoxCounting(cnn_bc), DataExplorer(da) , or FractalGAN(gan) :")
parser.add_argument("--n_epochs", type=int, default=5, help="number of epochs of training")
#parser.add_argument("--hyperopt", type=str, default="on", help="type --hyperopt=off to use the given/default arguments instead of Hyperparameteroptimization")
parser.add_argument("--batch_size", type=int, default=2, help="size of the batches")
parser.add_argument("--lr", type=float, default=0.001, help="adam: learning rate")
parser.add_argument("--b1", type=float, default=0.9, help="adam: decay of first order momentum of gradient")
parser.add_argument("--b2", type=float, default=0.999, help="adam: decay of second order momentum of gradient")
parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
parser.add_argument("--latent_dim", type=int, default=16, help="dimensionality of the 2d latent code like AxB, 8x0 by default 0 has to be calced into the right format")
parser.add_argument("--img_size", type=tuple, default=(32,32), help="size of each image dimension in (y,x), must be a power of 2, e.g. 2,4,8,16,32,...,512")
parser.add_argument("--channels", type=int, default=1, help="number of image channels")
parser.add_argument("--sample_interval", type=int, default=10000000, help="interval between image sampling")
parser.add_argument("--autoencoder", type=str, default="off", help="type --autoencoder=on to use just the encoder/decoder without the discriminator")
parser.add_argument("--verbosity", type=bool, default=False, help="Set verbosity True/False for display results, or show additional data for debugging")
parser.add_argument("--device", type=str, default="gpu", help="Set device to cpu/gpu  default is gpu")
'''

MainFunctionDiscription = " Type: mkDataset(mkdata), or FractalGAN(gan) : "

# Create an option object to store all variables
class OptionObject:
  def __init__(self):
    self.init = True

opt = OptionObject()

opt.device = FractalGAN.get_device()
if opt.device == "cpu":
    print("CPU USED FOR NN")
    time.sleep(1)
    os.environ["CUDA_VISIBLE_DEVICES"]="-1" # Hide all GPUs from CUDA so that only the CPU is used

#un-comment to overwrite the used device
#opt.device = 'cpu'
#opt.device = 'gpu'
    
opt.FileParentPath = FileParentPath

#ADD ALL HYPERPARAMETERS
opt.n_cpu = os.cpu_count()
#opt.n_cpu = 6  #Un-comment to overwrite used cpu-cores 
opt.batch_size = 2
opt.lr = 0.001
opt.b1 = 0.9
opt.b2 = 0.999
opt.latent_dim = 16
opt.img_size = (32,32)
opt.channels = 1
opt.sample_interval = 1000000

opt.verbosity = False
opt.residual = True    #if True, a residual flows through the layers, if False: Just normal cnn layers
opt.hyperopt = "on"    #If "on", automatic hyperparameter optimization is enabled. If "off",  specific hyperparameters have to be chosen 
opt.Population_Init_Threshold = 200   # the initial population for random network architecture search
opt.autoencoder = "on"   #  if on discriminator  network is disabled,  if off discriminator network is enabled.
#opt.autoencoder = "off"

opt.Max_parallel_layers = 4  #  choose the max width of the network.
opt.Max_Lenght = 6           # choose the maximum length of each network
# Caution: high values for parallel layers and maximum length lead to high memory consumption
opt.No_latent_spaces = 1
opt.img_shape = (opt.channels, opt.img_size[0], opt.img_size[1])

### Inpainting parameters: describe how the input data is altered so that the network learns to undo the alteration.

opt.noisebool = True
#opt.std = 0.01      # hard disturbance
#opt.std = 0.001     # moderate disturbance
opt.std = 0.0001    # light disturbance
opt.std_decay_rate = 0 

# If maskbool is None, random masking is applied
opt.maskbool = False
#              x                       y
opt.maskmean = opt.img_size[1]/2 , opt.img_size[0]/2    # mask center (here the image center, for exploring)
#                   x, y
opt.maskdimension = 50,25
opt.UpdateMaskEvery = 8      # How often the mask is updated; each time it is placed randomly


opt.LetterboxBool = False
opt.LetterboxHeight = 30

opt.PillarboxBool = False
opt.PillarboxWidth = 10
opt.superres = False                
opt.InpaintingParameters = {
                    'opt': opt,
                    'superresolution': (opt.superres, 2),
                    'noise': (opt.noisebool, opt.std, opt.std_decay_rate),
                    'mask': (opt.maskbool, opt.maskmean , opt.maskdimension),
                    'Letterbox': (opt.LetterboxBool, opt.LetterboxHeight),
                    'Pillarbox': (opt.PillarboxBool, opt.PillarboxWidth),

                }

import pprint
print("Chosen options are:")
pprint.pprint(vars(opt))

## 3.3 Helper functions

Two recurring helper functions are defined: one prints an extended error description, and a dataset worker creates a new dataset.



In [None]:

def PrintException():
    """Print file name, line number and source line of the exception currently being handled."""
    exc_type, exc_obj, tb = sys.exc_info()
    f = tb.tb_frame
    lineno = tb.tb_lineno
    filename = f.f_code.co_filename
    linecache.checkcache(filename)
    line = linecache.getline(filename, lineno, f.f_globals)
    print('EXCEPTION IN ({}, LINE {} "{}"): {}'.format(filename, lineno, line.strip(), exc_obj))



def Create_Dataset_Worker(opt,DataHandler):
    """Create the train/test datasets, including box counts, using multiple processes."""
    DataHandler = Loader.make_train_test_Datasets_multicore(DataHandler,opt)    #Generate Dataset with BOXCOUNTS on the run in multicore mode
    #DataHandler = Loader.make_train_test_Datasets(DataHandler,opt)    #Generate Dataset in singlecore/Multithread mode
    #print(DataHandler.Dataset)
    print(f'{len(DataHandler.Dataset)} entries are in the generated dataset')
    print("Dataset created!")
    print("ID of Create_Dataset_Worker: {}".format(os.getpid())) 






## 3.4 Methods and Algorithms Used

By incorporating recent advances into the generative adversarial network, training accuracy should be relatively high within a constrained training time, while problems like the vanishing gradient are avoided. The network architecture is based on a convolutional neural network.
Additionally, the proposed architecture adopts features from two major research papers: the residual network described in [21] and the inception network described in [22].
To control which features appear at which position of the generated output, a recently developed spacial box counting algorithm [20] is integrated into the network architecture.

### 3.4.1 Convolutional Neural Networks

Convolutional neural networks, as depicted in figure 1, are a class of artificial neural networks that compute their output with a kernel sliding over the input in steps of a given stride and write the results to feature maps. Because these feature maps grow quickly in size, they are reduced by pooling layers. A pooling layer slides a window of a given size over the feature maps and pools each window, for example by its maximum, minimum or average value. The proposed architecture applies PyTorch’s adaptive average pooling.
This particular architecture also contains layers of Gaussian noise. The noise often helps the network to generalize better and reach higher accuracy [23].
To further improve generalization, dropout layers are implemented so that the networks do not rely on the same nodes all the time. Because some nodes are randomly switched off during training, the network has to find a workaround when important nodes are disabled.
A leaky ReLU is used as the activation function. It tends to converge quickly to a solution with acceptable accuracy [24].
To further stabilize the network, batch normalization can be inserted as the last sublayer after the activation function. This method makes networks more stable and performant by re-centering and re-scaling the layer outputs across each batch [25].
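
The size bookkeeping of convolution and pooling and the leaky ReLU can be written down in a few lines. This is a dependency-free sketch with hypothetical helper names; the slope of 0.01 is PyTorch's default for leaky ReLU and not necessarily the value used in this work.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: identity for positive inputs, small slope for negative ones."""
    return x if x > 0 else negative_slope * x

# A 32-pixel side with a 3x3 kernel, stride 1 and padding 1 keeps its size ...
after_conv = conv_output_size(32, kernel=3, stride=1, padding=1)
# ... and a 2x2 pooling window with stride 2 halves it.
after_pool = conv_output_size(after_conv, kernel=2, stride=2)
print(after_conv, after_pool)
print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])
```

Unlike a plain ReLU, the leaky variant keeps a small gradient for negative inputs, so nodes cannot become permanently inactive.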

![ConvNetURL](https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png "Convolutional Neural Network")

Figure 1: Schematic representation of a convolutional neural network 
Source: <https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png>

#### 3.4.1.1 ResNet

The residual neural network differs from a normal network in that it not only passes the computed result on to the output, but also adds the input of the current layer to it. In the original residual network, a few convolutional layers are packed into a residual block with standard CNN layer functionality, and the residual is passed along the network chain as input for the next residual block. This architecture is loosely inspired by structures found in the human brain.
In machine learning, the main reason to use this kind of network is to mitigate the vanishing gradient problem and the related degradation effect, where deep learning models tend to show a higher training error as more and more layers are added.
The main difference between the original structure described in [21] and the proposed network is that the residual is passed from every layer to the next one.
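
The effect of passing the residual from every layer to the next can be shown with a toy example. The element-wise `layer` function below is only a stand-in for a convolutional layer, and all names are hypothetical.

```python
def layer(values, weight, bias):
    """Toy stand-in for a convolutional layer: an element-wise affine map."""
    return [weight * v + bias for v in values]

def plain_chain(x, params):
    """Plain network: each layer only sees the previous layer's output."""
    for weight, bias in params:
        x = layer(x, weight, bias)
    return x

def residual_chain(x, params):
    """Residual network: each layer's input is added to its output."""
    for weight, bias in params:
        x = [f + v for f, v in zip(layer(x, weight, bias), x)]
    return x

x = [1.0, 2.0, 3.0]
params = [(0.0, 0.0)] * 4         # four layers that contribute nothing yet
print(plain_chain(x, params))     # the signal is lost
print(residual_chain(x, params))  # the input passes through unchanged
```

Because the identity path is always available, additional layers only have to learn a correction to their input, which keeps information and gradients flowing through deep chains.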

![ResNetURL](https://upload.wikimedia.org/wikipedia/commons/9/98/ResNet50.png "Residual Neural Network")

Figure 2: Schematic representation of a residual neural network 
Source: <https://upload.wikimedia.org/wikipedia/commons/9/98/ResNet50.png>

#### 3.4.1.2 Inception Net

The inception network, originating from GoogLeNet [22], uses more than one convolution within each layer. A given input is fed into parallel convolution layers with various kernel sizes. The idea is that a convolution filter can only detect features within its own size: a 1x1 filter only sees a single position, so the input is also fed to, for example, a 3x3 and a 5x5 convolution, and each filter detects features at its respective scale.
After the parallel convolution operations, a filter brings the individual outputs to the same dimensionality so that they can be concatenated into one tensor and passed to the next layer block of the network.
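
Why the parallel branches can be concatenated is a matter of shape arithmetic. The sketch below assumes stride 1 and a padding of (kernel - 1) // 2 per branch; the function name and the channel counts are illustrative and not taken from [22] or from this repository.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def inception_block_shape(in_size, branch_channels):
    """branch_channels maps kernel size -> number of output channels.
    Each branch uses stride 1 and padding (kernel - 1) // 2, which keeps the
    lateral size for odd kernels. Returns (channels, height, width)."""
    lateral_sizes = {k: conv_output_size(in_size, k, padding=(k - 1) // 2)
                     for k in branch_channels}
    if len(set(lateral_sizes.values())) != 1:
        raise ValueError(f"branch outputs differ: {lateral_sizes}")
    out_size = next(iter(lateral_sizes.values()))
    # Concatenation stacks the branch outputs along the channel axis.
    return (sum(branch_channels.values()), out_size, out_size)

print(inception_block_shape(32, {1: 8, 3: 16, 5: 4}))
```

The channel counts add up, while the lateral size has to be identical across all branches.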


![GoogleNetURL](https://production-media.paperswithcode.com/methods/Screen_Shot_2020-06-22_at_3.22.39_PM.png "Inception Neural Network")

Figure 3: Schematic representation of an inception neural network 
Source: [22]


#### 3.4.1.3 Spacial box counting 
The spacial box counting algorithm was developed in advance of this work in [20] and is based on the standard box counting explained in [26].


‘The function is executed with a 2D, picture-like array with intrinsic values ranging depending on the given data format. The maximum value of the intrinsic dimensionality depends on the numeric encoding of the data. The maximum intrinsic value for hexcode is 15, for a normal 8-bit grayscale image it is 255, while a high dynamic range picture with 12 bit has a value of 4095. The z-boxcounting gets a chunk of the original picture or array so x and y borders are defined externally. The input array is now seen as a volume with lateral borders (x, y) and the height defined by the maximum intrinsic value into the z direction, hence the name z-boxcounting. The algorithm counts the sum of all, at least partially, filled boxes in a fixed grid scan scaled by the specified box size. The actual count ranges inversely proportional to its chosen box size and can get really high with a 2³ box. To get the same value range for each box size, the counted boxes are divided by the total number of boxes resulting in the boxcount ratio. This ratio ranges from zero, with no counted boxes, to one, describing that all possible boxes were counted.’ (Peters 2021: 3.3.1)


![Lacunarity formula](https://raw.githubusercontent.com/ollimacp/spacial-boxcounting-cpu-gpu/main/0Data/Document_images/lacunarity_formula.png "Lacunarity formula for indicating spacial heterogeneity")

Figure 4: Lacunarity formula for indicating spacial heterogeneity
Source: [20]



‘At the same time a list of counted values is recorded per box, so the lacunarity, or spacial heterogeneity can be calculated. The lacunarity is calculated by taking the standard deviation σ of the list of counted values for each box and is divided by the mean µ of it [2]. So that all output lacunaritys are positive the former term is taken to the power of 2. This results in a property where lacunaritys against zero describing a homogeneous structures and by increasing lacunarity spacial heterogeneity emerges.’ (Peters 2021: 3.3.1)
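
The two quantities described in the quotations can be illustrated with a strongly simplified sketch. It is not the z-boxcounting of [20]: it works on a binary 2D image, uses one global grid instead of spatially resolved chunks, takes the number of filled pixels per box as the counted value, and uses the population standard deviation. All function names are hypothetical.

```python
from statistics import mean, pstdev

def box_masses(image, box_size):
    """Number of filled (non-zero) pixels in each box of a fixed grid."""
    height, width = len(image), len(image[0])
    masses = []
    for y0 in range(0, height, box_size):
        for x0 in range(0, width, box_size):
            masses.append(sum(1 for y in range(y0, min(y0 + box_size, height))
                              for x in range(x0, min(x0 + box_size, width))
                              if image[y][x]))
    return masses

def box_count_ratio(masses):
    """Share of boxes that are at least partially filled (0 to 1)."""
    return sum(1 for m in masses if m > 0) / len(masses)

def lacunarity(masses):
    """(sigma / mu) ** 2 of the per-box values; 0 for a homogeneous pattern."""
    mu = mean(masses)
    return (pstdev(masses) / mu) ** 2 if mu else 0.0

checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]                  # homogeneous
corner = [[1 if x < 4 and y < 4 else 0 for x in range(8)] for y in range(8)]  # clustered
for name, img in (("checker", checker), ("corner", corner)):
    m = box_masses(img, box_size=2)
    print(name, box_count_ratio(m), lacunarity(m))
```

The checkerboard fills every box equally and therefore has a lacunarity of zero, while the clustered pattern leaves most boxes empty and yields a clearly higher value.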




![Laser treated surface](https://raw.githubusercontent.com/ollimacp/spacial-boxcounting-cpu-gpu/main/0Data/MISC/17_3_Rand.bmp "Laser treated surface as image input for spacial box counting")

Figure 5: Laser treated surface as image input for spacial box counting
Source: [20]

![Spacial Box Counting](https://raw.githubusercontent.com/ollimacp/spacial-boxcounting-cpu-gpu/main/0Data/generated_imgs/17_3_Rand.png "Spacial box counting output depicting spacial box count ratio and lacunarity arrays with boxsizes from 2 to 16")

Figure 6: Spacial box counting output depicting spacial box count ratio and lacunarity arrays with boxsizes from 2 to 16
Source: [20]

The resulting arrays can be used to analyze and compare data by summing all box count ratios and lacunarities for each specimen. The box count ratios and lacunarities can be used for similarity search and for sorting any kind of unlabeled data by spacial complexity and homogeneity or heterogeneity. The similarity is simply the weighted sum of the differences between the box count ratios (bcr) and lacunarities of two specimens.
The weights depend on the user's intentions. The scale of the searched features can be weighted by preferring arrays with a specific box size: small box sizes capture fine structural components, while large box sizes only capture coarse structures.
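
The weighted comparison of two specimens could look like the following sketch. The function name, the use of absolute differences and the feature values are assumptions made for illustration; they are not data from [20].

```python
def dissimilarity(features_a, features_b, weights):
    """Weighted sum of absolute differences between two specimens.

    features_* map box size -> (summed box count ratio, summed lacunarity);
    weights map box size -> weight. A value of 0 means identical features."""
    total = 0.0
    for box_size, weight in weights.items():
        bcr_a, lac_a = features_a[box_size]
        bcr_b, lac_b = features_b[box_size]
        total += weight * (abs(bcr_a - bcr_b) + abs(lac_a - lac_b))
    return total

# Made-up feature values: the specimens differ at box size 2 only.
specimen_a = {2: (0.5, 1.0), 16: (0.25, 2.0)}
specimen_b = {2: (0.75, 1.5), 16: (0.25, 2.0)}

fine_weights = {2: 1.0, 16: 0.0}    # emphasize fine structures
coarse_weights = {2: 0.0, 16: 1.0}  # emphasize coarse structures
print(dissimilarity(specimen_a, specimen_b, fine_weights))
print(dissimilarity(specimen_a, specimen_b, coarse_weights))
```

With the coarse weighting the two specimens count as identical, whereas the fine weighting separates them.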


### 3.4.2 Variational Autoencoder

A variational autoencoder combines two networks, an encoder and a decoder, and is explained further in [27]. This and the following methods fall into the category of unsupervised machine learning, where no labels, i.e. solutions that the network has to predict, have to be provided. Instead, the data is fed to the encoder network. The output of the encoder has to be smaller than its input, which forces the network to compress the information and thus, ideally, to extract useful features from the data so that the decoder network can reconstruct the original data from them. The constrained output of the encoder is the input of the decoder and is called the latent space or latent variable. These types of networks can be used to reconstruct the original from disturbed data, for example images with noise or masked regions.

![Variational Autoencoder]( https://upload.wikimedia.org/wikipedia/commons/4/4a/VAE_Basic.png "Schematic depiction of a variational Autoencoder")

Figure 7: Schematic depiction of a variational Autoencoder
Source: https://upload.wikimedia.org/wikipedia/commons/4/4a/VAE_Basic.png




### 3.4.3 Generative Adversarial Network (GAN)

A generative adversarial network, proposed in [28], combines a network used as a content generator, often an autoencoder, with an adversarial discriminator network, which tries to distinguish between generated content and the original content of the dataset.

![GAN]( https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/A-Standard-GAN-and-b-conditional-GAN-architecturpn.png/800px-A-Standard-GAN-and-b-conditional-GAN-architecturpn.png?20211004091917 "Schematic depiction of a generative adversarial network")

Figure 8: Schematic depiction of a generative adversarial network
Source: https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/A-Standard-GAN-and-b-conditional-GAN-architecturpn.png/800px-A-Standard-GAN-and-b-conditional-GAN-architecturpn.png?20211004091917
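
The opposing objectives of generator and discriminator can be expressed with the binary cross-entropy. The sketch below uses the common non-saturating generator loss on single scalar predictions; it is a generic illustration with hypothetical function names and not the loss code of the FractalGAN module.

```python
import math

def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for one prediction in (0, 1) and a 0/1 target."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """The discriminator should output 1 for original and 0 for generated data."""
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    """The generator is rewarded when the discriminator takes its output for real."""
    return bce(d_fake, 1)

# d_real / d_fake: discriminator outputs for an original and a generated sample.
print(discriminator_loss(d_real=0.9, d_fake=0.1))  # confident and correct: small loss
print(discriminator_loss(d_real=0.5, d_fake=0.5))  # undecided: 2 * ln(2)
print(generator_loss(d_fake=0.1), generator_loss(d_fake=0.9))
```

Lowering one loss raises the other, which is what makes the two networks adversaries.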



### 3.4.4 Adversarial Autoencoder

Building upon generative adversarial networks, the adversarial autoencoder described in [29] tries to match the encoder's aggregated posterior latent distribution to an arbitrarily chosen prior distribution. This leads to an autoencoder that learns to encode the input data into a parameterized latent distribution, which enables semi-supervised classification and the disentangling of image style and content. The images can, for example, be fitted onto a Gaussian or another distribution of the user's choice. The data can then be analyzed by sampling this chosen latent distribution and generating new images with the decoder network. This is an advantage over the variational autoencoder, whose latent distribution cannot be sampled and analyzed as easily.


### 3.4.5 FractalGAN

The proposed FractalGAN combines generative adversarial networks, variational and adversarial autoencoders, inception networks, residual networks and spacial box counting, as described in [20-29]. 
The network architectures are generated from a list of chosen elements and are optimized by an evolutionary algorithm. This approach is somewhat comparable to the NEAT algorithm [30], but differs in its methodology and execution.


#### 3.4.5.1 Layer description

![Layer Description]( https://raw.githubusercontent.com/ollimacp/FractalGan/main/LayerDescription.png "LayerDescription")

Figure 9: Layer description for generating a neural network

Before going into the details of the algorithm, some terms have to be clarified.
As shown in Figure 9, the architecture of the neural network is generated from a list called the layer description. A layer of the proposed neural network consists of sublayers. The first sublayer is a Gaussian noise function, which is switched on or off by setting it to 1 or 0. Gaussian noise helps the neural network generalize and can lead to a more robust model, as shown in [23].
The next sublayer describes the layer magnification, which is the output size of the layer in x or y divided by its input size. It is needed because the encoder compresses the information into a latent space of a certain size, and the input of the decoder network must have this same size. Likewise, the output of the decoder network must have the same size as the input image. The layer magnification helps to keep track of all array sizes from start to finish. 

The third element is called parallel layers and holds the multiplicators for the kernel size of the convolutional layers. The corresponding kernel, stride and pooling values are calculated via the ‘Sample_layerdict’ dictionary. Passing a list of multiplicators generates parallel layers with different kernel sizes, stride values and pooling layers. The ‘Sample_layerdict’ dictionary ensures that the output sizes of all parallel layers are compatible with one another. The outputs of the parallel layers are concatenated in the third dimension by stacking them on top of one another. This only works when the lateral dimensions of all outputs of a layer are equal.
The third dimension is called the channel dimension; for a colour input picture it holds, for example, the RGB values. The channels are specified as a list of two integers: the first is the input channel size and the second is the output channel size. The output channel size of the current layer is also the input channel size of the next layer. The very first channel size in the encoder is the number of channels of the input images; since only grayscale images are used, this value is 1. The last channel value of the encoder is the number of latent spaces, which is also 1 for simplicity. In the decoder, the order is reversed.
To help the networks generalize and not rely on only a few specific network nodes, a dropout layer, described further in [31], can be added through a list of two values. The first value turns dropout on or off by setting it to '1' or '0'. The second value is the fraction of network nodes that are randomly muted during training. During testing and evaluation, dropout is disabled by default, because it only counteracts overfitting during training and would reduce accuracy without any benefit at test time.
The last element of the layer description is the batch normalization switch, which is enabled by setting it to 1. Batch normalization tries to speed up and stabilize training by normalizing the mean and variance of the input arrays across the batch, i.e. the input arrays or images of one training step. It can only be activated when the batch size is greater than 1.

The elements discussed above are grouped in a list, which constitutes what is here called a layer. To add a new layer, another list of carefully chosen values is appended to the layer description. If done correctly, the custom-written encoder, decoder and discriminator classes can generate a network architecture from it. The layer description forms the basis for random and evolutionary search for efficient network architectures.
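
To make the structure of one entry easier to read, the following minimal sketch gives each position of a layer description a name. The class `LayerSpec` is a hypothetical helper introduced only for this illustration and is not part of the notebook; the example values are copied from the docstring of the Decoder class.

```python
from typing import List, NamedTuple


class LayerSpec(NamedTuple):
    """Named view of one entry of the layer description (hypothetical helper)."""
    gaussian_noise: int          # 1 = add gaussian noise, 0 = off
    layermagn: str               # layer magnification, also the key into Sample_layerdict
    parallel_layers: List[int]   # kernel size multiplicators, one per parallel layer
    channellist: List[int]       # [input channels, output channels]
    dropout: list                # [switch, fraction of muted nodes]
    batchnorm_switch: int        # 1 = batch normalization on, 0 = off


# Example layer description, copied from the docstring of the Decoder class.
layer_description = [
    [1, '1', [1], [1, 4], [1, 0.112], 0],
    [1, '2', [1, 2, 4, 8], [4, 8], [1, 0.112], 1],
    [0, '2', [1, 2, 16], [8, 16], [1, 0.112], 1],
    [0, '4', [1, 2, 4, 8, 16], [16, 32], [1, 0.112], 1],
    [1, '1', [1, 2, 4, 8, 16], [32, 1], [1, 0.112], 0],
]

specs = [LayerSpec(*entry) for entry in layer_description]

for index, spec in enumerate(specs):
    print(index, spec.layermagn, len(spec.parallel_layers), spec.channellist)
```

The notebook itself unpacks the plain lists directly, so a wrapper like this would only serve readability and early validation of hand-written descriptions.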



#### 3.4.5.2 Encoder Network

The encoder class is initialized by passing it all necessary values from the option object together with the layer description. A for loop then iterates over the individual layers with their sublayers: Gaussian noise, layer magnification, parallel layers, channel list, dropout and batch normalization. 
The aggregate magnification starts at 1 and is multiplied by the layer magnification of each layer to keep track of the current overall magnification. It is used to check whether the architecture results in the predefined latent dimension size and to determine where the spacial box counts have to be added to the network. 
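
The bookkeeping described above can be sketched in plain Python. This is a simplified illustration, not the notebook's implementation: the function `find_intercepts` and the sequence of layer magnifications are assumptions made for the example, while the four intercept values are the ones used by the Encoder class.

```python
# Magnifications at which the encoder merges the spacial box count arrays
# (same values as InterceptLayerMagnification in the Encoder class).
INTERCEPT_MAGNIFICATIONS = [0.5, 0.25, 0.125, 0.0625]


def find_intercepts(layer_magnifications):
    """Return the final aggregate magnification and all
    [layer index, box count index] pairs (hypothetical helper)."""
    aggregate = 1.0
    intercepts = []
    for layer_index, layermagn in enumerate(layer_magnifications):
        aggregate *= float(layermagn)
        # Exact comparison is safe here only because all values are powers of two.
        if aggregate in INTERCEPT_MAGNIFICATIONS:
            box_count_index = INTERCEPT_MAGNIFICATIONS.index(aggregate)
            intercepts.append([layer_index, box_count_index])
    return aggregate, intercepts


# Assumed encoder with five layers, three of which halve the lateral size.
aggregate, intercepts = find_intercepts(['1', '0.5', '0.5', '1', '0.5'])
print(aggregate)     # 0.125
print(intercepts)    # [[1, 0], [2, 1], [3, 1], [4, 2]]
```

Layer 3 keeps the size unchanged, so its aggregate magnification still equals 0.25 and the same box count array is merged a second time. A final aggregate of 0.125 means that, for example, a 128 pixel wide image would be reduced to a lateral size of 16.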

The spacial box counts developed in [20] consist of the spacial box count ratio, an indicator of fractal dimensional complexity, and the lacunarity, an indicator of dimensional heterogeneity. The hypothesis is that adding them inside the network “reminds” it on each scaling level which structure or feature is located where within the image. 
Afterwards the channels are initialized with the values from the channel list of the layer description. Note that, because inception layers (here called parallel layers) are used, the input channel count of every layer after the first is the output channel count of the previous layer multiplied by the number of parallel layers of the previous layer. A for loop then initializes all parallel layers from the layer description one after another.
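
The channel rule can be checked with a few lines of plain Python. The sketch below is an illustration under stated assumptions: `input_channels` is a hypothetical helper, and the layer description is the example from the docstring of the Decoder class, which applies the same channel rule as the encoder.

```python
# Example layer description, copied from the docstring of the Decoder class.
layer_description = [
    [1, '1', [1], [1, 4], [1, 0.112], 0],
    [1, '2', [1, 2, 4, 8], [4, 8], [1, 0.112], 1],
    [0, '2', [1, 2, 16], [8, 16], [1, 0.112], 1],
    [0, '4', [1, 2, 4, 8, 16], [16, 32], [1, 0.112], 1],
    [1, '1', [1, 2, 4, 8, 16], [32, 1], [1, 0.112], 0],
]


def input_channels(description):
    """Return the effective input channel count of every layer (hypothetical helper)."""
    effective_inputs = []
    previous_out = None
    previous_branches = None
    for _noise, _magn, parallel_layers, channellist, _dropout, _batchnorm in description:
        channels_in, channels_out = channellist
        if previous_out is not None:
            # Every parallel layer of the previous layer emits `previous_out`
            # channels, and all of them are concatenated along the channel axis.
            channels_in = previous_out * previous_branches
        effective_inputs.append(channels_in)
        previous_out = channels_out
        previous_branches = len(parallel_layers)
    return effective_inputs


print(input_channels(layer_description))   # [1, 4, 32, 48, 160]
```

Only the first layer takes its input channel count from the channel list; for all later layers the listed input value is overridden by the computed one.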

If the option ‘residual’ is set to ‘True’, the layer input is resized to the output size with PyTorch's 2D adaptive average pooling layer and passed around the parallel layers. It is then combined with the layer output by channel-wise and element-wise addition. The sum is divided by two, i.e. the two arrays are averaged, so that the result stays within the value range of the two inputs, for example below 1 if both inputs are below 1. If the output has fewer channels than the input, the surplus residual channels are discarded.
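
A minimal sketch of such a residual merge is shown below. It assumes PyTorch is installed; the function name `merge_residual` and the tensor shapes are chosen for illustration, and the sketch isolates the averaging step rather than reproducing the full forward pass of the notebook.

```python
import torch
import torch.nn.functional as F


def merge_residual(layer_input, layer_output):
    """Average the resized layer input into the leading channels of the output."""
    # Resize the bypassed input to the lateral size of the layer output.
    target_size = tuple(layer_output.shape[-2:])
    residual = F.adaptive_avg_pool2d(layer_input, target_size)
    # Surplus residual channels are discarded.
    shared = min(residual.shape[1], layer_output.shape[1])
    merged = (residual[:, :shared] + layer_output[:, :shared]) / 2
    # Output channels without a residual counterpart pass through unchanged.
    return torch.cat((merged, layer_output[:, shared:]), dim=1)


layer_input = torch.ones(2, 4, 16, 16)     # batch, channels, height, width
layer_output = torch.zeros(2, 6, 8, 8)
result = merge_residual(layer_input, layer_output)
print(result.shape)                        # torch.Size([2, 6, 8, 8])
```

In this example the first four output channels become the average of ones and zeros, i.e. 0.5, while the remaining two channels keep their original values.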

Continuing with the sublayers from the layer description, Gaussian noise is added, and the kernel, stride and pool values are calculated for each parallel layer.

After this calculation, the actual convolution layers are added to the network. Here 2D transposed convolution layers are used. Their biases can be deactivated if batch normalization is applied, which results in faster computation and lower memory usage, because batch normalization centers the values and makes a bias redundant.
The convolution layers often inflate the array sizes, so 2D adaptive average pooling layers follow to bring the arrays to the correct output size.

The activation layer is a leaky rectified linear unit, aka. leaky ReLU. ReLU is one of the most common activation functions because it converges quickly to a solution of the given problem. The standard ReLU can get stuck for inputs below zero, because its output there is constantly zero and the gradient vanishes, so the affected weights are no longer updated. Leaky ReLU instead applies a small linear slope to inputs below zero, ‘0.01 x’ with the PyTorch default, so the gradient never becomes zero. 
After the previous sublayers have been initialized, a dropout layer with the given fraction of disabled network nodes is attached to improve generalization.
Finally, the batch normalization operation is added if the layer description requests it.
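
The difference between the two activation functions mentioned above can be illustrated in plain Python. The functions below are a didactic sketch and not the PyTorch implementation; the slope of 0.01 matches the default `negative_slope` of `torch.nn.LeakyReLU`.

```python
def relu(x):
    """Standard ReLU: negative inputs are clipped to zero."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: zero for all negative inputs."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x, negative_slope=0.01):
    """Gradient of leaky ReLU: never zero."""
    return 1.0 if x > 0 else negative_slope


for value in (-2.0, 3.0):
    print(value, relu(value), relu_grad(value), leaky_relu(value), leaky_relu_grad(value))
```

For the negative input the ReLU gradient is zero, so no learning signal passes through that node, whereas leaky ReLU still passes on one percent of the gradient.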

The initialization of the individual sublayers and layers is repeated until the whole layer description has been processed. 
When the network is run as an adversarial autoencoder, the latent space produced by the encoder is described by a Gaussian normal distribution, so the output has to be parameterized with the reparameterization trick described in [32]. For this purpose a mean ‘mu’ and a log-variance ‘logvar’ are each formed by a 2D convolution layer, followed by a 2D average pooling layer that brings them to the correct output size. 

After this, some information about the generated network is printed for completeness and debugging.

Now that the encoder network has been generated, data can be passed through it with the forward function and its required arguments. It iterates over the layer description in the same way as the initialization function, but passes the images or data and their corresponding spacial box count arrays to the input layers and the intermediate results to the following layers in the correct order. The output is parameterized by the reparameterization function, which samples from a standard Gaussian normal distribution in the shape of the latent space. The standard deviation is calculated from ‘logvar’ and multiplied with the sampled noise. Finally the mean ‘mu’ is added, which yields the latent variable ‘z’. This latent variable is the parameterized, compressed representation of the input image.
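
The sampling step can be sketched as follows, assuming PyTorch is installed. The function `reparameterize` is a simplified stand-in written for this illustration; the notebook's own `reparameterization` function takes additional arguments and may differ in detail.

```python
import torch


def reparameterize(mu, logvar):
    """Sample z = mu + std * eps, with eps drawn from a standard normal distribution."""
    std = torch.exp(0.5 * logvar)      # standard deviation from the log-variance
    eps = torch.randn_like(std)        # noise with the same shape as mu
    return mu + std * eps


mu = torch.zeros(4, 1, 8, 8)           # batch, latent spaces, height, width
logvar = torch.zeros(4, 1, 8, 8)       # log(1) = 0, i.e. unit variance
z = reparameterize(mu, logvar)
print(z.shape)                         # torch.Size([4, 1, 8, 8])
```

Because the randomness is confined to `eps`, the result stays differentiable with respect to ‘mu’ and ‘logvar’, which is what allows the encoder to be trained by gradient descent.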
 



In [None]:

class Encoder(nn.Module):
    def __init__(self,Parameter):
        super(Encoder, self).__init__()
        print("-------------INITIALIZE ENCODER----------------")
        #The Encoder gets an 2d np array imagelike or MPM'd aud,vid etc; and converts it to the latent space
        # -----------------------------------------------------------------------------
        self.input_shape = Parameter['input_shape']
        print("self.input_shape",self.input_shape)
        self.LayerDescription = Parameter['LayerDescription']
        self.LayerCount = len(self.LayerDescription)
        self.Layers = nn.ModuleList()
        self.magnification = Parameter['magnification']
        self.opt = Parameter['opt']
        #print(self.opt)
        self.device = Parameter['device']
        self.InterceptLayerMagnification = np.array([0.5,0.25,0.125,0.0625])
        self.AggregateMagnification = 1.0       # tracks the magnification from the beginning up to the present layer
        self.resultedIndexes = []

        for i in range(self.LayerCount):        #iterate over Layers 
            if i > 0:
                #Cause the input of a channel has to be the output of the last channels, adjust the input channels accordingly by multiply the previous output channels of the last layers
                previous_Output_channels = OUT
                #print("previous_Output_channels",previous_Output_channels)
                previous_parallel_layer_count = len(parallel_layers)
                #print("previous_parallel_layer_count",previous_parallel_layer_count)

            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = self.LayerDescription[i]
            #print("Layer",i," with Parameters layermagn, parallel_layers , channellist, batchnorm_switch", layermagn, parallel_layers , channellist, batchnorm_switch)
            self.AggregateMagnification = self.AggregateMagnification*float(layermagn)
            
            #print("InterceptLayerMagnification", self.InterceptLayerMagnification)
            #print("AggregateMagnification", self.AggregateMagnification)

            #Indexes where the box counts have to be merged with the NN layers
            matches = np.where(self.InterceptLayerMagnification == self.AggregateMagnification)[0]
            if matches.size > 0:
                #resulted index describes: [index in LayerCount, index for BCR/LAC]
                self.resultedIndexes.append([i, int(matches[0])])

            IN, OUT = channellist
            if i > 0:
                IN = previous_Output_channels * previous_parallel_layer_count
            
            layer_multiplicator =  np.array(Sample_layerdict[layermagn])   #multiplicator for each layer, so that with every chosen kernelsize the stride and pooling is scaled accordingly
            
            for parallel_layer in parallel_layers:

                if self.opt.residual == True:
                    output_size = int(float(self.opt.img_size[0]) * self.AggregateMagnification)
                    #print("residual output size for adaptive pool to pass onto the next layer", output_size)
                    self.Layers.append(nn.AdaptiveAvgPool2d((output_size,output_size)))

                if gaussian_noise == 1:
                    std = 0.001     #moderate disturbance
                    std_decay_rate = 0
                    self.Layers.append(  GaussianNoise(std, std_decay_rate) )

                kernel, stride, pool = layer_multiplicator * float(parallel_layer)
                kernel, stride, pool = int(kernel), int(stride), int(pool)
                #for the moment square but later can be implemented in x-y manner 
                Kx, Ky = kernel, kernel
                Sx, Sy = stride, stride
                Px,Py = 0,0     #no padding will be needed

                #New Pool version with adaptive average pooling
                output_size = int(float(self.opt.img_size[0]) * self.AggregateMagnification)
                print("output size for adaptive pool", output_size)
                

                #if batchnorm is applied, then bias calc of conv layers is not needed -> performance gain, less mem usage
                self.Layers.append(nn.ConvTranspose2d(IN, OUT , kernel_size=(Kx, Ky), stride=(Sx, Sy), output_padding=(Px, Py), bias = (batchnorm_switch != 1) ))

                self.Layers.append(nn.AdaptiveAvgPool2d((output_size,output_size)))
                #self.Layers.append(nn.AvgPool2d((pool, pool), stride=(pool, pool)))        #Old version of pooling caused pictures looking blocky 

                # MaxPool(ReLU(x)) = ReLU(MaxPool(x)) holds for max pooling, but not exactly for the average pooling used here; the activation is applied after the pooling, because fewer values have to be activated -> proof of performance pending
                self.Layers.append(nn.LeakyReLU(inplace = True))

                if dropout[0] == 1:
                    #if dropout switch is 1, then add dropout to layers
                    p = dropout[1] #with percentage of ...
                    self.Layers.append(nn.Dropout2d(p=p))

                if batchnorm_switch == 1:
                    self.Layers.append( nn.BatchNorm2d(OUT, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) )

        print(" ENCODER : Resulted layer indexes, where BCR/LAC arrays have to be passed to are: "+ str(self.resultedIndexes))

        previous_Output_channels = OUT
        print("Last Output of inception layer_channels",previous_Output_channels)
        previous_parallel_layer_count = len(parallel_layers)
        print("previous_parallel_layer_count",previous_parallel_layer_count)
        IN = previous_Output_channels * previous_parallel_layer_count
        
        #self.Layers.append(nn.Conv2d(IN,No_latent_spaces, kernel_size=(1, 1), stride=(1, 1), padding=(0, 0)) )      #1x1 channel reduction layer
        # a kernelsize of 2 with stride 1 results in a mag of 1/2 , 
        print("self.AggregateMagnification", self.AggregateMagnification)
        print("self.opt.latent_dim", self.opt.latent_dim)
        print("self.opt.latent_dim_x", self.opt.latent_dim_x)
        print("self.opt.img_size", self.opt.img_size)

        print(f"CALCING NEEDED POOLSIZE  = int(self.AggregateMagnification/(self.opt.latent_dim/ self.opt.img_size[0])) = {self.AggregateMagnification}/({self.opt.latent_dim}/{self.opt.img_size[0]})")
        neededpoolsize_y = int(self.AggregateMagnification/(self.opt.latent_dim/ self.opt.img_size[0]))
        neededpoolsize_x = int(self.AggregateMagnification/(self.opt.latent_dim_x/ self.opt.img_size[1]))


        #print("reparametrization trick with cnn architecture")
        self.mu = nn.Conv2d(IN,self.opt.No_latent_spaces, kernel_size=(1, 1), stride=(1, 1), padding=(0, 0))
        self.logvar = nn.Conv2d(IN,self.opt.No_latent_spaces, kernel_size=(1, 1), stride=(1, 1), padding=(0, 0))
        print("NeededPoolsizes y, x", neededpoolsize_y, neededpoolsize_x)
        self.reparamPool = nn.AvgPool2d((neededpoolsize_y, neededpoolsize_x), stride=(neededpoolsize_y, neededpoolsize_x))          #should be pool with poolkernel = stride, cause be reducing with 4 the poolkernel has to be 4 and the stride also 4
    
        self.Tensor = torch.cuda.FloatTensor if self.device == "cuda" else torch.FloatTensor

        print("Encoder structure")
        print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
        print(f"Input Size: {self.opt.img_size}")
        print(f"output/latent Size: {self.opt.img_size[0] * self.AggregateMagnification}")
        print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
        print(self.Layers)
        print("Reparameterization structure")
        print("mu",self.mu)
        print("logvar",self.logvar)
        print("reparamPool",self.reparamPool)
        print("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
        print("-------------INITIALIZE ENCODER COMPLETE----------------")




    def forward(self, x, BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16):
        #print("FORWARDING ENCODER--------------------------------")
        layerindex = 0
        
        for i in range(self.LayerCount):  
            #layermagn, parallel_layers , channellist, batchnorm_switch = self.LayerDescription[i]
            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = self.LayerDescription[i]
            
            data_branch = [None] * len(parallel_layers)      # dynamic setup of variables containing multiple outputs 
            passed_data = x  

            for  index, parallel_layer in enumerate(parallel_layers):

                if self.opt.residual == True:
                    residual = self.Layers[layerindex](x)
                    layerindex +=1 

                if gaussian_noise == 1:
                    data_branch[index] = self.Layers[layerindex](passed_data)  #gaussian noise
                    #print(self.Layers[layerindex])
                    layerindex +=1 
                    data_branch[index] = self.Layers[layerindex](data_branch[index])
                else:
                    data_branch[index] = self.Layers[layerindex](passed_data)      #convlayer

                #print(self.Layers[layerindex])
                layerindex +=1
                data_branch[index] = self.Layers[layerindex](data_branch[index])   # pool layer
                #print(self.Layers[layerindex])
                layerindex +=1
                data_branch[index] = self.Layers[layerindex](data_branch[index])   # activation layer
                #print(self.Layers[layerindex])
                layerindex+=1

                if dropout[0] == 1:
                    data_branch[index] = self.Layers[layerindex](data_branch[index]) #dropout layer
                    #print(self.Layers[layerindex])
                    layerindex +=1

                if batchnorm_switch == 1:
                    data_branch[index] = self.Layers[layerindex](data_branch[index]) # batchnorm layer
                    #print(self.Layers[layerindex])
                    layerindex +=1

                #if it is the first element of the parallel layers just inherit x and after that concatenate the channels, because the dimensions are the same in x, y and batch size
                if index == 0:
                    x = data_branch[index]
                    #------------------------------------------------------
                    #print("x.shape for intercept layer",x.shape)
                    #SBC SHOULD BE INTERCEPTED HERE
                    #SBC = [BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16]
                    #self.resultedIndexes = [[1,0],[2,1],[3,3]]  [layercountindex,bcrindex]
                    #IF the Layercount matches the resulted indexes, then the adding can happen
                    #------------------------------------------------------
                    for layercountindex, bcrindex in self.resultedIndexes:
                        if i == layercountindex:
                            # if needed_idx == bcrindex
                            #select the box count array matching the current scale
                            BCR_LAC = (BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16)[bcrindex]
                            #.cuda() is not in-place, so move the array to the device of x and keep the result
                            BCR_LAC = BCR_LAC.to(x.device)

                            #BCR_LAC = SBC[bcrindex]  #getting the correct shaped BCR and LAC
                            #print("BCR_LAC",BCR_LAC.shape)
                            #print("  x.shape", x.shape) # for checking shape of tensor

                            merged_channels = torch.add(BCR_LAC, x[:,:2,:,:] )
                            merged_channels = torch.div(merged_channels,2)
                            #print("merged_channels",merged_channels)
                            #print("  merged_channels.shape", merged_channels.shape)
                            #print("x[:,2:,:,:].shape", x[:,2:,:,:].shape)
                            x = torch.cat(( merged_channels  , x[:,2:,:,:]),1)
                            #TODO: IF torch.cat is applied before relu OR JUST APPLY A ACTIVATION FUNCTION AFTER CAT bcr+residual to stay below 1. then we dont need torch.div 2                    
                    #------------------------------------------------------
                else:
                    x = torch.cat((x,data_branch[index]),1)

                if self.opt.residual == True:
                    residual_channellenght = int(residual.shape[1])
                    output_channellength = int(x.shape[1])
                    # if channels in layer before are more than now, just trash the last channels from past layer
                    if residual_channellenght > output_channellength:
                        residual_channellenght = output_channellength

                    passed_residual = torch.add(residual[:,:residual_channellenght,:,:], x[:,:residual_channellenght,:,:] )
                    passed_residual = torch.div(passed_residual,2)
                    #print("passed_residual",passed_residual)
                    #print("  passed_residual.shape", passed_residual.shape)
                    #print("x[:,2:,:,:].shape", x[:,2:,:,:].shape)
                    x = torch.cat(( passed_residual  , x[:,residual_channellenght:,:,:]),1)

        #x = torch.flatten(x,start_dim=1)
        #print("EncoderOutput flattend for  reparameterization x.shape",x.shape)  
        #print("IMG Compressed size")
        #print(x.size())
        mu = self.mu(x)
        mu = self.reparamPool(mu)
        logvar = self.logvar(x)
        logvar = self.reparamPool(logvar)
        z = reparameterization(mu, logvar, self.Tensor,self.opt)
        #print("z.shape Encoder Output shape latent dim",z.shape)  
        #z = torch.reshape(z,(opt.batch_size,1,opt.latent_dim,opt.latent_dim_x))
        # print("z.shape Encoder Output shape latent dim",z.shape)  

        return z


#### 3.4.5.3 Decoder Network

The decoder network is generated very similarly to the encoder network, but without the reparameterization trick, because the incoming latent space already follows a Gaussian normal distribution and only the image has to be generated from it.
The layer description is passed to the initialization function of the decoder class to generate the specified network architecture.
The layers have to produce the correct magnification, starting from the size of the latent space and ending at the size of the original image, to yield an executable custom convolutional neural network architecture.
After initialization, the forward method can be called with an encoded latent space and the spacial box count arrays to generate an output image.
The hypothesis here is that passing the spacial box count arrays to the decoder network reminds it on every scale roughly how the picture should look.
Because different network architectures use more or less of this additional data, or only the data of a specific scale, this should allow some control over the impact the spacial box count arrays have on the image. 
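
Which scales a given decoder architecture actually uses can be worked out before building the network. The following plain Python sketch mirrors the size bookkeeping of the Decoder class under stated assumptions: `decoder_plan` is a hypothetical helper, the latent size of 8 and image size of 128 are example values, and the magnifications are taken from the docstring of the Decoder class.

```python
def decoder_plan(latent_dim, img_size, layer_magnifications):
    """Return the final output size and all
    [layer index, box count index] pairs (hypothetical helper)."""
    # Lateral sizes at which box count arrays are merged, smallest first
    # (same values as InterceptLayerMagnification in the Decoder class).
    intercept_sizes = [img_size / 16.0, img_size / 8.0, img_size / 4.0, img_size / 2.0]
    size = float(latent_dim)
    intercepts = []
    for layer_index, layermagn in enumerate(layer_magnifications):
        size *= float(layermagn)
        if size in intercept_sizes:
            # The decoder walks through the scales backwards, hence 3 - index.
            intercepts.append([layer_index, 3 - intercept_sizes.index(size)])
    return int(size), intercepts


output_size, intercepts = decoder_plan(8, 128, ['1', '2', '2', '4', '1'])
print(output_size)    # 128
print(intercepts)     # [[0, 3], [1, 2], [2, 1]]
```

In this example the output size matches the image size, so the architecture is valid. The step from 32 to 128 skips the size 64, so the box count array of the largest scale (index 0) is never merged, which illustrates how the choice of layer magnifications controls which scales influence the image.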


In [None]:

class Decoder(nn.Module):

    def __init__(self,Parameter):
        super(Decoder, self).__init__()
        print("--------INITIALIZE Generator/DECODER----------------")
        self.LayerDescription = Parameter['LayerDescription']
        self.LayerCount = len(self.LayerDescription)
        self.Layers = nn.ModuleList()
        self.magnification = Parameter['magnification'] 
        self.opt = Parameter ['opt']

        '''
        LayerDescription = [        gaussian_noise , layermagn, no. parallel_layers , channellist, dropout ,   batchnorm_switch 
                                    [1,              '1',       [1],                  [1,4]  ,     [1, 0.112], 0 ],
                                    [1, '2', [1,2,4,8]    , [4,8]  , [1, 0.112], 1 ],
                                    [0, '2', [1,2,16]     , [8,16] , [1, 0.112], 1 ],
                                    [0, '4', [1,2,4,8,16] , [16,32], [1, 0.112], 1 ],
                                    [1, '1', [1,2,4,8,16] , [32,1] , [1, 0.112], 0 ],
                                ]         
        '''
        
        self.InterceptLayerMagnification = np.array([ self.opt.img_size[0]/16.0 , self.opt.img_size[0]/8.0 , self.opt.img_size[0]/4.0  , self.opt.img_size[0]/2.0])
        self.AggregateMagnification = self.opt.latent_dim
        self.resultedIndexes = []

        for i in range(self.LayerCount): 
            if i > 0:
                #Cause the input of a channel has to be the output of the last channels, 
                #adjust the input channels accordingly by multiply the previous output channels of the last layers except the first layer.
                previous_Output_channels = OUT
                #print("previous_Output_channels",previous_Output_channels)
                previous_parallel_layer_count = len(parallel_layers)
                #print("previous_parallel_layer_count",previous_parallel_layer_count)

            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = self.LayerDescription[i]
            self.AggregateMagnification = self.AggregateMagnification * float(layermagn)
            
            print("Magnifications, where BCR/LAC are added to the network", self.InterceptLayerMagnification)
            print("Aggregate magnification for this layer", self.AggregateMagnification)
            print("Layermagnification", layermagn)
            matches = np.where(self.InterceptLayerMagnification == self.AggregateMagnification)[0]

            if matches.size > 0:
                converted_index = 3 - int(matches[0]) #backwards
                print("Converted_index", converted_index)
                self.resultedIndexes.append([i, converted_index])
            #print("resultedIndex where residual has to be passed to", self.resultedIndexes)

            IN, OUT = channellist
            if i > 0:
                IN = previous_Output_channels * previous_parallel_layer_count

            layer_multiplicator =  np.array(Sample_layerdict[layermagn])   #multiplicator for each layer, so that with every chosen kernel size the stride and pooling are scaled accordingly
            
            for parallel_layer in parallel_layers:

                if self.opt.residual == True:
                    output_size = int( self.AggregateMagnification)
                    print("residual output size for adaptive pool to pass onto the next layer", output_size)
                    self.Layers.append(nn.AdaptiveAvgPool2d((output_size,output_size)))
               
                if gaussian_noise == 1:
                    std = 0.001     #moderate disturbance
                    std_decay_rate = 0
                    self.Layers.append(  GaussianNoise(std, std_decay_rate) )

                kernel, stride, pool = layer_multiplicator * float(parallel_layer)
                kernel, stride, pool = int(kernel), int(stride), int(pool)
                #for the moment square but later can be implemented in x-y manner 
                Kx, Ky = kernel, kernel
                Sx, Sy = stride, stride
                Px,Py = 0,0     #no padding will be needed

                #if batchnorm is applied, then bias calc of conv layers is not needed -> performance gain
                self.Layers.append(nn.ConvTranspose2d(IN, OUT , kernel_size=(Kx, Ky), stride=(Sx, Sy), output_padding=(Px, Py), bias = (batchnorm_switch != 1) ))
                    
                output_size = int(self.AggregateMagnification)
                print("output size for adaptive pool", output_size)
                self.Layers.append(nn.AdaptiveAvgPool2d((output_size,output_size)))
                #self.Layers.append(nn.AvgPool2d((pool, pool), stride=(pool, pool)))         #old version of pooling caused blocky pixealted output

                # MaxPool(ReLU(x)) = ReLU(MaxPool(x)) holds for max pooling, but not exactly for the average pooling used here; the activation is applied after the pooling, because fewer values have to be activated -> proof of performance pending
                self.Layers.append(nn.LeakyReLU(inplace = True))

                if dropout[0] == 1:
                    #if dropout switch is 1, then add dropout to layers
                    p = dropout[1] #with percentage of ...
                    self.Layers.append(nn.Dropout2d(p=p))

                if batchnorm_switch == 1:
                    self.Layers.append( nn.BatchNorm2d(OUT, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) )

        print(" DECODER : resultedIndex where box counts has to be passed to", self.resultedIndexes)
        print("Decoder/Generator")
        print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
        print(f"Input Size: {self.opt.latent_dim}")
        print(f"output Size: {self.AggregateMagnification}")
        print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
        print(self.Layers)
        print("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
        print("--------INITIALIZING DECODER DONE----------------")

        
    def forward(self, x, BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16):

        layerindex = 0
        for i in range(self.LayerCount):        #iterate forwards 

            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = self.LayerDescription[i]            
            #print("layerindex",i, 'Layermagnification', layermagn)

            data_branch = [None] * len(parallel_layers) 
            passed_data = x  

            for  index, parallel_layer in enumerate(parallel_layers):
                #print("parallel layer index", index)

                if self.opt.residual == True:
                    residual = self.Layers[layerindex](x)
                    layerindex +=1 

                if gaussian_noise == 1:
                    #data_branch[index] = self.Layers[layerindex](data_branch[index])  #gaussian noise
                    data_branch[index] = self.Layers[layerindex](passed_data)
                    #print(self.Layers[layerindex])

                    layerindex +=1 
                    data_branch[index] = self.Layers[layerindex](data_branch[index])
                else:
                    data_branch[index] = self.Layers[layerindex](passed_data)      #convlayer

                #print(self.Layers[layerindex])
                layerindex +=1
                data_branch[index] = self.Layers[layerindex](data_branch[index])   # pool layer
                #print(self.Layers[layerindex])
                layerindex +=1
                data_branch[index] = self.Layers[layerindex](data_branch[index])   # activation layer
                #print(self.Layers[layerindex])
                layerindex+=1
                
                if dropout[0] == 1:
                    #p = dropout[1]
                    data_branch[index] = self.Layers[layerindex](data_branch[index]) #dropout layer
                    #print(self.Layers[layerindex])
                    layerindex +=1

                if batchnorm_switch == 1:
                    data_branch[index] = self.Layers[layerindex](data_branch[index])
                    #print(self.Layers[layerindex])
                    layerindex +=1

                # the first parallel branch replaces x; later branches are concatenated along the channel axis, since batch size, height and width match
                if index == 0:
                    x = data_branch[index]
                    #print("x.shape for intercept layer",x.shape)

                    #------------------------------------------------------
                    #SBC = [BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16]
                    #self.resultedIndexes = [[1,0],[2,1],[3,3]]  [layercountindex,bcrindex]
                    #print("SBC[0].shape", SBC[0].shape)
                    #print("SBC[0] tpye", type(SBC[0]))
                    #print("SBC[0] dtype", SBC[0].dtype)
                    
                    #IF the Layercount matches the resulted indexes, then the adding can happen
                    for layercountindex, bcrindex in self.resultedIndexes:
                        if i == layercountindex:
                            #print("i == layercountidex", i)
                            #print("self.resulted INdexes",self.resultedIndexes)
                            needed_idx = bcrindex
                            
                            #other way around cause from little to big
                            if needed_idx == 0:
                                BCR_LAC = BCR_LAK_map_2
                                #print("bcr 2")
                            elif needed_idx == 1:
                                #print("bcr 4")
                                BCR_LAC = BCR_LAK_map_4
                            elif needed_idx == 2:
                                #print("bcr 8")
                                BCR_LAC = BCR_LAK_map_8
                            elif needed_idx == 3:
                                #print("bcr 16")
                                BCR_LAC = BCR_LAK_map_16

                            # .cuda() returns a new tensor instead of moving it in place, so the result has to be assigned
                            BCR_LAC = BCR_LAC.to(x.device)


                            merged_channels = torch.add(BCR_LAC, x[:,:2,:,:] )
                            merged_channels = torch.div(merged_channels,2)
                            #print("merged_channels",merged_channels)
                            #print("  merged_channels.shape", merged_channels.shape)
                            #print("x[:,2:,:,:].shape", x[:,2:,:,:].shape)
                            x = torch.cat((merged_channels,x[:,2:,:,:]),1)
                    
                            #print("sleeping  BCR_LAC.shape", BCR_LAC.shape)
                else:
                    x = torch.cat((x,data_branch[index]),1)


                if self.opt.residual == True:
                    residual_channellenght = int(residual.shape[1])
                    output_channellength = int(x.shape[1])
                    # if channels in layer before are more than now, just trash the last channels from past layer
                    if residual_channellenght > output_channellength:
                        residual_channellenght = output_channellength

                    passed_residual = torch.add(residual[:,:residual_channellenght,:,:], x[:,:residual_channellenght,:,:] )
                    passed_residual = torch.div(passed_residual,2)
                    #print("passed_residual",passed_residual)
                    #print("  passed_residual.shape", passed_residual.shape)
                    #print("x[:,2:,:,:].shape", x[:,2:,:,:].shape)
                    x = torch.cat(( passed_residual  , x[:,residual_channellenght:,:,:]),1)

        return x



##### 3.4.5.4 Discriminator Network

The discriminator network’s task is to differentiate between real and fake latent distributions. For a fake sample, an image is fed forward through the encoder network into the latent space, and the resulting latent variable is passed to the discriminator, which should predict 0. A real sample is drawn directly from a Gaussian normal distribution and fed through the discriminator, similar to the reparameterization trick; here the discriminator should predict 1. The encoder's task is to fool the discriminator into accepting its output as a valid latent distribution, while the discriminator tries to distinguish the encoder output from a genuinely sampled Gaussian distribution.

The initialization and forward methods are very similar to those of the encoder and decoder networks. The differences are that the discriminator only needs the latent space as input and outputs a single value between 0 and 1, produced by a sigmoid activation function in its final layer. This output describes the validity of the latent space: values near 0 indicate a fake latent variable, values near 1 a real one.
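
The arithmetic of the final three layers can be illustrated without PyTorch. The following is a minimal sketch, assuming a feature map stored as nested lists `[channel][row][column]`; the function name `validity_head` and the weights are hypothetical and only mirror the steps of a 1x1 convolution, global average pooling and a sigmoid.

```python
import math

def validity_head(feature_map, weights, bias=0.0):
    """Hypothetical stand-in for a 1x1 conv -> global average pool -> sigmoid head."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    # 1x1 convolution: a weighted sum over the channels at every pixel
    reduced = [
        [sum(weights[c] * feature_map[c][y][x] for c in range(channels)) + bias
         for x in range(width)]
        for y in range(height)
    ]
    # global average pooling: collapse height x width to a single value
    pooled = sum(sum(row) for row in reduced) / (height * width)
    # sigmoid squashes the score into the range 0...1
    return 1.0 / (1.0 + math.exp(-pooled))

zeros = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
print(validity_head(zeros, weights=[0.5, -0.2, 0.1]))  # 0.5
```

An all-zero feature map yields exactly 0.5, the point at which the head is undecided between fake and real.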


In [None]:

class Discriminator(nn.Module):
    def __init__(self,Parameter):
        super(Discriminator, self).__init__()

        print("--------------INITIALIZING Discriminator----------------")
        # -----------------------------------------------------------------------------
        self.input_shape = Parameter['input_shape']
        self.LayerDescription = Parameter['LayerDescription']
        self.LayerCount = len(self.LayerDescription)
        self.Layers = nn.ModuleList()
        self.opt = Parameter['opt']
        self.device = Parameter['device']
        self.AggregateMagnification = float(self.opt.latent_dim)   # use the options passed in via Parameter, not a global

        for i in range(self.LayerCount):        #iterate over layers
            # if not the first layer...
            if i > 0:
                #The input channels of a layer must equal the output channels of the previous layer
                #multiplied by its number of parallel branches, because the branch outputs are concatenated
                previous_Output_channels = OUT
                #print("previous_Output_channels",previous_Output_channels)
                previous_parallel_layer_count = len(parallel_layers)
                #print("previous_parallel_layer_count",previous_parallel_layer_count)

            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch  = self.LayerDescription[i]
            self.AggregateMagnification = self.AggregateMagnification*float(layermagn)
            
            #print("AggregateMagnification", self.AggregateMagnification)
            #print("layermag", layermagn)

            IN, OUT = channellist
            if i > 0:
                IN = previous_Output_channels * previous_parallel_layer_count

            layer_multiplicator =  np.array(Sample_layerdict[layermagn])   #multiplicator for each layer, so that with every chosen kernelsize the stride and pooling is scaled accordingly
            
            for parallel_layer in parallel_layers:
            
                if self.opt.residual == True:
                    #output_size = int(float(opt.img_size[0]) * self.AggregateMagnification)
                    output_size = int( self.AggregateMagnification)
                    print("residual output size for adaptive pool to pass onto the next layer", output_size)
                    self.Layers.append(nn.AdaptiveAvgPool2d((output_size,output_size)))
            
                if gaussian_noise == 1:
                    std = 0.001     #moderate disturbance
                    std_decay_rate = 0
                    self.Layers.append(  GaussianNoise(std, std_decay_rate) )

                kernel, stride, pool = layer_multiplicator * float(parallel_layer)
                kernel, stride, pool = int(kernel), int(stride), int(pool)
                #for the moment square but later can be implemented in x-y manner 
                Kx, Ky = kernel, kernel
                Sx, Sy = stride, stride
                Px,Py = 0,0     #no padding will be needed
                if batchnorm_switch == 1:
                    #if batchnorm is applied, then bias calc of conv layers is not needed -> performance gain
                    self.Layers.append(nn.ConvTranspose2d(IN, OUT , kernel_size=(Kx, Ky), stride=(Sx, Sy), output_padding=(Px, Py), bias = False )) #,output_padding = ()     #Attention ENhancer IN OUT Switched
                else:
                    self.Layers.append(nn.ConvTranspose2d(IN, OUT , kernel_size=(Kx, Ky), stride=(Sx, Sy), output_padding=(Px, Py) )) #,output_padding = ()     #Attention ENhancer IN OUT Switched

                output_size = int(self.AggregateMagnification)
                print("output size for adaptive pool", output_size)
                self.Layers.append(nn.AdaptiveAvgPool2d((output_size,output_size)))

                #self.Layers.append(nn.AvgPool2d((pool, pool), stride=(pool, pool)))         #old version of pooling caused blocky looking pictures
                # The activation is applied after pooling so that fewer elements have to be activated (performance gain not yet measured).
                # MaxPool(ReLU(x)) = ReLU(MaxPool(x)) holds for max pooling; with the average pooling used here the order slightly changes the result.
                self.Layers.append(nn.LeakyReLU(inplace = True))

                if dropout[0] == 1:
                    #if dropout switch is 1, then add dropout to layers
                    p = dropout[1] #with percentage of ...
                    self.Layers.append(nn.Dropout2d(p=p))

                if batchnorm_switch == 1:
                    self.Layers.append( nn.BatchNorm2d(OUT, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) )
            
        print("self.input_shape",self.input_shape)

        previous_Output_channels = OUT
        print("previous_Output_channels",previous_Output_channels)
        previous_parallel_layer_count = len(parallel_layers)
        print("previous_parallel_layer_count",previous_parallel_layer_count)
        IN = previous_Output_channels * previous_parallel_layer_count

        print("Discriminator conv pass through workaround")
        # 1x1 conv layer channel reduction
        self.Layers.append(nn.Conv2d(IN,1, kernel_size=(1, 1), stride=(1, 1), padding=(0, 0)) )      #1x1 channel reduction layer
        #To bring down the HxW to 1 for validity
        self.Layers.append(nn.AdaptiveAvgPool2d((1,1)))
        #Sigmoid activation for value range = 0...1
        self.Layers.append( nn.Sigmoid() )

        print("DISCRIMINATOR STRUCTURE")
        print(self.Layers)
        print("--------------INITIALIZING Discriminator DONE----------------")


    def forward(self, z):       
        z = z.view(-1, 1, self.opt.latent_dim, self.opt.latent_dim_x)   # infer the batch size so a smaller final batch also works

        layerindex = 0

        for i in range(self.LayerCount):
            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = self.LayerDescription[i]
                        
            data_branch = [None] * len(parallel_layers)      # one output slot per parallel (inception-style) branch
            
            passed_data = z  # every branch of this layer receives the same input tensor

            for  index, parallel_layer in enumerate(parallel_layers):

                #print("parallel layer index", index)
                if self.opt.residual == True:
                    residual = self.Layers[layerindex](z)
                    layerindex +=1 

                if gaussian_noise == 1:
                    data_branch[index] = self.Layers[layerindex](passed_data)
                    #print(self.Layers[layerindex])

                    layerindex +=1 
                    data_branch[index] = self.Layers[layerindex](data_branch[index])

                else:
                    data_branch[index] = self.Layers[layerindex](passed_data)      #convlayer

                #print(self.Layers[layerindex])
                layerindex +=1

                data_branch[index] = self.Layers[layerindex](data_branch[index])   # pool layer
                #print(self.Layers[layerindex])
                layerindex +=1
                data_branch[index] = self.Layers[layerindex](data_branch[index])   # activation layer
                #print(self.Layers[layerindex])
                layerindex+=1

                if dropout[0] == 1:
                    #p = dropout[1]
                    data_branch[index] = self.Layers[layerindex](data_branch[index]) #dropout layer
                    #print(self.Layers[layerindex])
                    layerindex +=1

                if batchnorm_switch == 1:
                    data_branch[index] = self.Layers[layerindex](data_branch[index])
                    #print(self.Layers[layerindex])
                    layerindex +=1

                # the first parallel branch replaces z; later branches are concatenated along the channel axis, since batch size, height and width match
                if index == 0:
                    z = data_branch[index]
                    #print("x dtype", x.dtype)
                    #print("x.shape for intercept layer",x.shape)
                    #------------------------------------------------------
                else:
                    
                    z = torch.cat((z,data_branch[index]),1)
                #print(x.shape)

                
                if self.opt.residual == True:
                    residual_channellenght = int(residual.shape[1])
                    output_channellength = int(z.shape[1])
                    # if channels in layer before are more than now, just trash the last channels from past layer
                    if residual_channellenght > output_channellength:
                        residual_channellenght = output_channellength

                    passed_residual = torch.add(residual[:,:residual_channellenght,:,:], z[:,:residual_channellenght,:,:] )
                    passed_residual = torch.div(passed_residual,2)
                    #print("passed_residual",passed_residual)
                    #print("  passed_residual.shape", passed_residual.shape)
                    #print("z[:,2:,:,:].shape", z[:,2:,:,:].shape)
                    z = torch.cat(( passed_residual  , z[:,residual_channellenght:,:,:]),1)
                

        # LayerCount only covers the inception layers; the last three entries of self.Layers
        # (1x1 conv, adaptive average pooling, sigmoid) are addressed with negative indices
        i= -3
        #print("#################################################################")
        #print("adding conv1x1 layer")
        #print("Layerindex",i)
        #print("z.shape", z.shape)
        z = self.Layers[i](z)
        #print("z.shape", z.shape)
        i+=1
        #print("adding ADAPTIVE AVERAGE POOLING layer")
        #print("Layerindex",i)
        #print("z.shape", z.shape)
        z = self.Layers[i](z)
        #print("z.shape", z.shape)
        i+=1
        #print("adding SIGMOID layer")
        #print("Layerindex",i)
        #print("z.shape", z.shape)
        z = self.Layers[i](z)
        #print("z.shape", z.shape)

        validity = z
        validity = validity.view(-1, 1)    # infer the batch size so a smaller final batch also works
        #print("discriminator validity shape", validity.shape)


        return validity


##### 3.4.5.5 Loss functions

The main function of the encoder and decoder networks is to compress the image into a space smaller than its pixel space and to rebuild the original image from that compressed latent space. To achieve this, a pixelwise loss function called L1-loss is calculated between the original image and the image regenerated by the decoder network. The L1-loss measures the pixelwise mean absolute error of two arrays of the same size.
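
As a worked example of the mean absolute error, here is a minimal sketch in plain Python on nested lists. The function name `l1_loss` is hypothetical; PyTorch's `torch.nn.L1Loss` with its default mean reduction computes the same quantity on tensors.

```python
def l1_loss(original, reconstruction):
    """Mean absolute error over two equally sized 2D images (nested lists)."""
    total, count = 0.0, 0
    for row_a, row_b in zip(original, reconstruction):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

original = [[0.0, 1.0], [0.5, 0.5]]
reconstruction = [[0.0, 0.5], [1.0, 0.5]]
print(l1_loss(original, reconstruction))  # 0.25
```

Two of the four pixels differ by 0.5, so the summed error of 1.0 is averaged over four pixels.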

The discriminator has to decide whether its input is real or fake, which is a binary decision. For that reason, binary cross entropy is chosen as the adversarial loss function.
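
The behaviour of binary cross entropy can be sketched in plain Python. The function name `binary_cross_entropy` and the clamping constant `eps` are assumptions made for this illustration; `torch.nn.BCELoss` with its default mean reduction offers the same loss for tensors.

```python
import math

def binary_cross_entropy(predictions, targets, eps=1e-12):
    """Mean binary cross entropy between predicted validities and 0/1 targets."""
    total = 0.0
    for p, t in zip(predictions, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(predictions)

# discriminator view: sampled latents are real (1), encoder outputs are fake (0)
undecided = binary_cross_entropy([0.5, 0.5], [1.0, 0.0])
confident = binary_cross_entropy([0.9, 0.1], [1.0, 0.0])
print(undecided > confident)  # True
```

An undecided discriminator that always answers 0.5 incurs a loss of ln(2), while confident and correct predictions drive the loss towards zero.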



##### 3.4.5.6 Inpainting class

The Inpainting class adds core functionality to the generative adversarial network by altering the original input and the auxiliary data in such a way that the artificial neural networks have to learn to repair the disrupted data. 

Dark scenery, poor equipment or lossy transmission add noise to images, resulting in a grainy, low-contrast picture. By adding artificial noise to the input data, an encoder-decoder chain can be trained with noise reduction in mind. This is done by applying Gaussian noise to the input image and to its box counts and lacunarities. 
Of course there are many types and variations of noise, often generated by the equipment itself or by electromagnetic interference, but Gaussian noise is sufficient for demonstration purposes.
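
The noise step can be sketched in plain Python as follows. The helper `add_gaussian_noise` is hypothetical; the notebook's own `GaussianNoise` module is defined elsewhere and may behave differently, for example through its decay rate.

```python
import random

def add_gaussian_noise(image, std, seed=None):
    """Return a copy of a 2D image with zero-mean Gaussian noise of the given std added."""
    rng = random.Random(seed)  # a private generator keeps the example reproducible
    return [[pixel + rng.gauss(0.0, std) for pixel in row] for row in image]

clean = [[0.5] * 4 for _ in range(4)]
noisy = add_gaussian_noise(clean, std=0.1, seed=0)
unchanged = add_gaussian_noise(clean, std=0.0, seed=0)
print(unchanged == clean)  # True
```

A standard deviation of zero leaves the image untouched, which makes `std` a convenient single knob for the strength of the disturbance.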

Further functionality is already implemented, but is not the subject of this work. 
For example, when the input image is pixelated by mean pooling followed by an upsampling operation, the network can be trained to reverse this degradation, which is known as neural upsampling. This way, low-resolution images can be enhanced in a visually more natural way than by conventional upsampling alone.
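
The pixelation used to prepare such training inputs can be sketched in plain Python. The function `pixelate` is a hypothetical illustration that assumes the image height and width are divisible by the factor; it mirrors mean pooling followed by nearest-neighbour upsampling.

```python
def pixelate(image, factor):
    """Mean-pool a 2D image by `factor`, then upsample with nearest neighbour."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, factor):
        for left in range(0, width, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            mean = sum(block) / len(block)
            # nearest-neighbour upsampling: every pixel of the block gets the block mean
            for y in range(top, top + factor):
                for x in range(left, left + factor):
                    out[y][x] = mean
    return out

image = [[0.0, 2.0, 4.0, 4.0],
         [2.0, 4.0, 4.0, 4.0],
         [1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 8.0]]
print(pixelate(image, 2))
# [[2.0, 2.0, 4.0, 4.0], [2.0, 2.0, 4.0, 4.0], [1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0]]
```

The output keeps the original resolution but carries only one value per block, which is the information the network has to restore.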

Neural inpainting is the field of image augmentation in which missing content is filled in by artificial neural networks. The image content is masked at a region of interest and the filling is computed by the networks. This enables image manipulation such as changing the aspect ratio of content, for example by filling the letterbox bars of a 21:9 movie to obtain 16:9 content without cropping the original input. 
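
Masking a region of interest can be sketched in plain Python. The helper `apply_rect_mask` and its interpretation of the centre `(x, y)` and size `(dx, dy)` are assumptions for illustration; the notebook's `mask_data` module is defined elsewhere and may treat these parameters differently.

```python
def apply_rect_mask(image, center, size, fill=0.0):
    """Blank a rectangle of `size` = (dx, dy) centred at `center` = (x, y)."""
    cx, cy = center
    dx, dy = size
    left, right = int(cx - dx / 2), int(cx + dx / 2)
    top, bottom = int(cy - dy / 2), int(cy + dy / 2)
    return [
        [fill if (left <= x < right and top <= y < bottom) else pixel
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]

image = [[1.0] * 8 for _ in range(8)]
masked = apply_rect_mask(image, center=(4, 4), size=(4, 2))
print(sum(row.count(0.0) for row in masked))  # 8
```

For an auxiliary map at half the resolution, the same region would be described by halving both the centre and the size.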



In [None]:
class Inpainting(nn.Module):
    def __init__(self,Parameter):
        super(Inpainting, self).__init__()
        '''
        The Inpainting module alters the incoming image and its BCR/LAK maps in the same way, so the network
        cannot cheat by reading unaltered BCR/LAK data, which would not be available when the box counts
        are computed from a disturbed image.
        '''

        opt = Parameter['opt']
        ################### ###########################################################################################
        self.Layers = nn.ModuleList()

        superresolution = Parameter['superresolution']
        self.superres = superresolution[0]
        self.magnification = superresolution[1]
        
        if self.superres == True:
            # One AvgPool2d + UpsamplingNearest2d pair each for the original image and the BCR/LAK maps 2, 4, 8 and 16.
            # Pool kernel and stride both equal the magnification, e.g. kernel 4 and stride 4 to reduce by a factor of 4.
            for _ in range(5):
                self.Layers.append(nn.AvgPool2d((self.magnification, self.magnification), stride=(self.magnification, self.magnification)))
                self.Layers.append(nn.UpsamplingNearest2d(scale_factor=self.magnification))


        #For handling noisy images, the image can be overlaid with noise controlled by the noise parameter
        noise = Parameter['noise']
        
        self.noisebool = noise[0]
        self.std = noise[1]
        self.std_decay_rate = noise[2]
        if self.noisebool == True:
            self.Layers.append(  GaussianNoise(self.std, self.std_decay_rate) )
            #BCR/LAK 2
            self.Layers.append(GaussianNoise(self.std, self.std_decay_rate) )
            #BCR/LAK 4
            self.Layers.append(GaussianNoise(self.std, self.std_decay_rate) )
            #BCR/LAK 8
            self.Layers.append(GaussianNoise(self.std, self.std_decay_rate) )
            #BCR/LAK 16
            self.Layers.append(GaussianNoise(self.std, self.std_decay_rate) )

        
        ##############################################################################################################
        # taken from https://www.codefull.net/2020/03/masked-tensor-operations-in-pytorch/
        # to generate a mask in pytorch  source https://stackoverflow.com/questions/64764937/creating-a-pytorch-tensor-binary-mask-using-specific-values
  
        #For handling images with blacked-out areas, the image can be overlaid with random or specific masks controlled by the mask parameter
        mask = Parameter['mask']
        #True/False, (x,y), (dx,dy)
        self.maskbool, self.maskmean, self.maskdimension = mask

        if self.maskbool is None:   # None requests a randomly placed, randomly sized mask
            self.maskbool = True
            self.randdommask = True
            leftborder = int(opt.img_size[0] * 1/4)
            maxmasksize = leftborder
            rightborder = int(opt.img_size[0] * 3/4)
            self.maskmean = (torch.randint(leftborder,rightborder,(1,),device=torch.device(opt.device))[0]  , torch.randint(leftborder,rightborder,(1,),device=torch.device(opt.device))[0]  )
            self.maskdimension = (torch.randint(1,maxmasksize,(1,),device=torch.device(opt.device))[0]  , torch.randint(1,maxmasksize,(1,),device=torch.device(opt.device))[0]  )

        ori_maskmean, ori_maskdimension = self.maskmean, self.maskdimension

        if self.maskbool == True:
            self.Layers.append(mask_data( opt, self.maskmean, self.maskdimension, custom_mask = None ))
            ####BCRLAK2
            self.maskmean = self.maskmean[0]/2, self.maskmean[1]/2
            self.maskdimension = self.maskdimension[0]/2, self.maskdimension[1]/2
            self.Layers.append(mask_data(opt, self.maskmean, self.maskdimension, custom_mask = None ))
            ####BCRLAK4
            self.maskmean = self.maskmean[0]/2, self.maskmean[1]/2
            self.maskdimension = self.maskdimension[0]/2, self.maskdimension[1]/2
            self.Layers.append(mask_data(opt, self.maskmean, self.maskdimension, custom_mask = None ))
            ####BCRLAK8
            self.maskmean = self.maskmean[0]/2, self.maskmean[1]/2
            self.maskdimension = self.maskdimension[0]/2, self.maskdimension[1]/2
            self.Layers.append(mask_data(opt, self.maskmean, self.maskdimension, custom_mask = None ))
            ####BCRLAK16
            self.maskmean = self.maskmean[0]/2, self.maskmean[1]/2
            self.maskdimension = self.maskdimension[0]/2, self.maskdimension[1]/2
            self.Layers.append(mask_data(opt, self.maskmean, self.maskdimension, custom_mask = None ))
            #resetting maskmean & dimension
            self.maskmean, self.maskdimension = ori_maskmean, ori_maskdimension

        ##############################################################################################################
 

        print("MaskingStructure")
        print(self.Layers)


        ##############################################################################################################

    def forward(self,x, BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16):
        # The layers were appended in __init__ in the order: image, BCR/LAK 2, 4, 8, 16.
        # Each enabled stage is applied exactly once per call.
        data = [x, BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16]
        layerindex = 0

        if self.superres == True:
            # pixelate: average pooling followed by nearest-neighbour upsampling, one layer pair per tensor
            for k in range(len(data)):
                data[k] = self.Layers[layerindex](data[k])
                layerindex +=1
                data[k] = self.Layers[layerindex](data[k])
                layerindex +=1

        if self.noisebool == True:
            # add Gaussian noise, one layer per tensor
            for k in range(len(data)):
                data[k] = self.Layers[layerindex](data[k])
                layerindex +=1

        if self.maskbool == True:
            # mask the image and, scaled down accordingly, each BCR/LAK map
            for k in range(len(data)):
                data[k] = self.Layers[layerindex](data[k])
                layerindex +=1

        x, BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16 = data
        return x, BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16


##### 3.4.5.7 Train-/Test-/Validation-worker

The “FractalGAN_Worker” function initiates and handles the training, testing and validation loops.

If the model is to be trained, all candidate latent dimension sizes are calculated; they have to be smaller than the input image to ensure that the data is actually compressed. Afterwards the hyperparameter space is either defined for the search or fixed to the user’s initial input, and the ‘DataHandler’ from the ‘Loader’ module defines the dataset for the training loop. All necessary parameters are packed into a tuple and passed on to the “begin_training” function, which executes the training loop.
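
The constraint on the latent dimension can be sketched as follows. The helper `candidate_latent_dims` is hypothetical and implements the rule exactly as stated in the text, keeping only powers of two strictly below the image edge length; the worker's own loop may treat the boundary value differently.

```python
def candidate_latent_dims(image_size, base=2):
    """Powers of `base` that stay strictly below the image edge length."""
    dims = []
    value = base
    while value < image_size:
        dims.append(value)
        value *= base
    return dims

print(candidate_latent_dims(64))  # [2, 4, 8, 16, 32]
```

Such a list can then serve as the set of choices for the latent dimension in the hyperparameter search.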

If already trained models are to be validated, all saved models are retrieved from the project’s folder and sorted by their loss value, so that validation starts with the best-performing models and saves time. The testing dataset is initialized and the encoder, decoder and discriminator networks are loaded into RAM to be executed by the CPU. A GPU isn’t necessary to execute the networks, because inference needs far less parallel processing power than training with backpropagation. 
Then the inpainting class is initialized with the desired alterations to disturb the input data with noise, a mask or downsampling. Note that the networks must have been trained with the same options that are used for testing; otherwise the alterations merely disturb the input at execution time. The networks are then fed with the chosen test dataset and the results are displayed on screen. Finally, the user decides whether to keep the model or to delete it if its output is judged to be poor.
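
The sorting of saved models by loss mentioned above can be sketched as follows, assuming the loss is encoded as a `Loss<value>---` prefix in the file name, as in the model names that appear in this notebook. The helper names and the shortened file names are hypothetical.

```python
import re

def loss_from_name(filename):
    """Extract the loss from names such as 'Loss0.017---n_epochs_115_....model'."""
    match = re.match(r"Loss(\d+(?:\.\d+)?)---", filename)
    if match is None:
        raise ValueError(f"no loss prefix in {filename!r}")
    return float(match.group(1))

def sort_models_by_loss(filenames):
    """Return the file names with the best (lowest-loss) model first."""
    return sorted(filenames, key=loss_from_name)

names = [
    "Loss0.529---n_epochs_90_batch-size_512.model",
    "Loss0.014---n_epochs_80_batch-size_128.model",
    "Loss0.019---n_epochs_70_batch-size_2.model",
]
print(sort_models_by_loss(names)[0])  # Loss0.014---n_epochs_80_batch-size_128.model
```

Parsing the value as a float avoids the pitfalls of sorting the file names alphabetically.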

If the worker function is executed in “test” mode, a specific model can be chosen; the encoder, decoder and discriminator networks are loaded and the inpainting class is set up according to the user’s intentions. The test dataset is then fed to the GAN and the results are displayed. 

The specific training and testing loops are described further in chapter 4. 





In [None]:
def FractalGAN_Worker(opt):
    global all_ness_params
    print("Placeholder for THE GAN submodule")
    import Loader       #loads the dataset
    device = FractalGAN.get_device()
    print("Chosen Device is",device)
    #device = "cpu"
    ########################################################################################
    #####################                   CNN_BoxCount               #####################
    ########################################################################################
    print("Convolutional Neural Network for spatial box counting is initialized")
    ModelnameList = None
    opt.verbosity = False

    #INIT#################################################
    DataHandler = Loader.DataHandler(opt)

    opt.Mode = 2
    if opt.Mode == 1:
        BoxCountEnc_model= CNN_Boxcount_encoder.CNN_BC_enc(opt)
        #newtrained model
        #Modelname = "Loss0.529---n_epochs_90_batch-size_512_learning-rate_0.063_beta-decay_0.928_0.664"
        #Modelname = "Loss0.017---n_epochs_115_batch-size_8_learning-rate_0.004_beta-decay_0.171_0.104"
        #Modelname = "Loss0.019---n_epochs_70_batch-size_2_learning-rate_0.007_beta-decay_0.391_0.487"
        #Modelname = "Loss0.014---n_epochs_80_batch-size_128_learning-rate_0.0_beta-decay_0.079_0.837"
        print("Choose the Model for flavoring GAN")
        root = Tk()
        root.filename =  filedialog.askopenfilename(initialdir = FileParentPath + "/models/SpacialBoxcountModels/" ,title = "Select file",filetypes = (("model files","*.model"),("all files","*.*")))
        print(root.filename)
        Modelname = root.filename[:-6]
        print("LOADING BoxcountEncoder Model named: \n" ,Modelname )
        NetParametersSaveplace = Modelname +".netparams"   # Modelname already contains the full path returned by the file dialog
        BoxCountNetParameters = pickle.load(open(NetParametersSaveplace, "rb"))
        #INIT BC ENC NETWORK STRUCTURE
        BoxcountEncoder = CNN_Boxcount_encoder.BoxCountEncoder(BoxCountNetParameters)

        Modelsaveplace = Modelname +".model"
        #LOAD WEIGHS & BIASES
        try:
            BoxcountEncoder.load_state_dict(torch.load(Modelsaveplace, map_location=device))
        except:
            BoxcountEncoder.load_state_dict(torch.jit.load(Modelsaveplace, map_location=device))
        print("BoxcountEncoder generated")

    elif opt.Mode ==2:
        print("CPU Mode chosen for Boxcount Encoding")
        BoxCountEnc_model = None
        BoxcountEncoder = None

    #TRAINTESTORVALIDATE

    if opt.TrainOrTest.lower() ==  "train":
        if opt.hyperopt == "on":
            #ContainedLatentDims = [2,4,8,16,32,64]
            ContainedLatentDims = [2]
            while True:
                ContainedLatentDims.append(ContainedLatentDims[0]*ContainedLatentDims[-1])
                if ContainedLatentDims[-1] >= opt.img_size[0]:
                    break

            HyperParameterspace = {
                'n_epochs':hp.choice('opt.n_epochs', range(1,3) ),
                'lr':hp.uniform('opt.lr', 0.0001 , 0.01 ), 
                'b1':hp.uniform('opt.b1', 0.8 , 1.0 ),
                'b2':hp.uniform('opt.b2', 0.8 , 1.0 ),
                'latent_dim':hp.choice('opt.latent_dim', ContainedLatentDims ),
            }     

        elif opt.hyperopt == "off":
            HyperParameterspace = {
                'n_epochs':opt.n_epochs,
                'lr':opt.lr, 
                'b1':opt.b1,
                'b2':opt.b2,
                'latent_dim':opt.latent_dim,
            }    
    

        train_test_switch = opt.TrainOrTest.lower()
        DataHandler = Loader.DataHandler(opt)
        Dataset, DataLoader = DataHandler.define_dataset(train_test_switch,opt.ProjectName)
        previous_Best_Loss = None
        opt.first_time = True


        all_ness_params = opt, Dataset, DataLoader, BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss  , HyperParameterspace
        happend = 0
        '''
        while True:
            try:
                FractalGAN.begin_training(all_ness_params)
            except:
                PrintException()
                print("Sleeping 1m")
                time.sleep(60)
                continue
        '''
        FractalGAN.begin_training(all_ness_params)


    elif opt.TrainOrTest.lower() == "val":
        import pathlib
        #path = str(pathlib.Path(__file__).parent.absolute()) # not working in jupyter
        path = os.getcwd()
        if sys.platform == "linux" or sys.platform == "darwin":
            path = path +"/models/GAN/" + opt.ProjectName+"/"
        elif sys.platform == "win32":
            path = path +"\\models\\GAN\\" + opt.ProjectName+"\\"
        Enc_netparams_file_list = []
        Dec_netparams_file_list = []
        Dis_netparams_file_list = []
        enc_dec_loss_list = []
        dis_loss_list = []

        encoder_dict = {}
        decoder_dict ={}
        discriminator_dict = {}
        unixtime_dict = {}

        for index, FILE in enumerate(os.listdir(path)):
            try:
                    
                if FILE[-6:] == ".model":
                    print(FILE)

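                    try:
                        unixtime, lossvalue, Network_type = FILE.split("_")
                    except ValueError:
                        # Older files carry no timestamp; use the loss prefix as the pairing key
                        # so that 'unixtime' is never undefined or left over from the previous file
                        lossvalue, Network_type = FILE.split("_")
                        unixtime = lossvalue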

                    lossvalue  = lossvalue.replace("Loss","")
                    lossvalue  = lossvalue.replace("---","")
                    lossvalue = float(lossvalue)
                    Network_type = Network_type.replace(".model","")

                    with open(path+FILE[:-6]+".netparams", "rb") as f:
                        NetParameters = pickle.load(f)
                        #print("Extracted NETPARAMETERS ARE ",NetParameters)
                    print("chosen unixtime", unixtime)
                    unixtime_dict[unixtime] = lossvalue  


                    if Network_type == "ENCODER":
                        encoder_dict[unixtime] = [FILE,NetParameters,lossvalue]
        
                    elif Network_type == "DECODER":
                        decoder_dict[unixtime] = [FILE,NetParameters,lossvalue]
                    
                    elif Network_type == "DISCRIMINATOR":
                        discriminator_dict[unixtime] = [FILE,NetParameters,lossvalue]
            except:
                PrintException()
                pass

        '''
        # JUST TO CREATE A CSV WITH UNIXTIMEDICT AND CORRESPONDING LOSSVALUE
        '''
        import csv
        now = int(time.time())
        #my_dictionary = {'values': 678, 'values2': 167, 'values6': 998}
        #input("Try to write dictionary")
        with open(f'VarAE loss over time_{now}.csv', 'w') as f:
            for idxs, key in enumerate(unixtime_dict.keys()):
                f.write("%s, %s, %s\n" % (idxs, key, unixtime_dict[key]))
        #input("Dict written... please check")

        ###################################################################################
        # File list created.... now validation begins
        ###################################################################################
        print("Model list created now validation")
        print(unixtime_dict)

        train_test_switch = "test"
        DataHandler = Loader.DataHandler(opt)
        Dataset, DataLoader = DataHandler.define_dataset(train_test_switch,opt.ProjectName)
        #device = FractalGAN.get_device()
        device = 'cpu'
        #sorteddict = sorted(unixtime_dict.items(), key = lambda x:x[-1]) #sort the retreived models by their loss value [1][-1] = values, last value in list
        #sorteddict = dict(sorted(unixtime_dict.items(), key = lambda x:x, reverse = True)) #sort the retreived models by their loss value [1][-1] = values, last value in list
        import operator
        #sort the received models by increasing loss value, so best comes first
        sorteddict = dict(sorted(unixtime_dict.items(), key = operator.itemgetter(1)))

        for timestamp, value in sorteddict.items():
            try:
                print("Now unixtime-timestamp  "+ str(timestamp)+"  is used")
                print(f"Value of sorteddict is {value}")
                File , NetParameters, lossvalue = encoder_dict[timestamp]
                print(f"Loss is {lossvalue} and should be sorted from 0 to 1")
                encoder_path = path + File

                File , NetParameters, lossvalue = decoder_dict[timestamp]
                decoder_path = path + File

                Encodername = encoder_path[:-6]
                Decodername = decoder_path[:-6]

                if opt.autoencoder == "off":    
                    File , NetParameters, lossvalue = discriminator_dict[timestamp]
                    discriminator_path = path + File
                    Discriminatorname = discriminator_path[:-6]
                    #Modelname, EncDeDis = Modelname.split("_")

                ####################################
                #load ENCODER
                ####################################

                NetParametersSaveplace = Encodername +".netparams"
                EncoderNetParameters = pickle.load(open(NetParametersSaveplace, "rb"))
                EncoderNetParameters['device'] = device
                print("EncoderNetParameters",EncoderNetParameters)
                opt = EncoderNetParameters['opt']
                
                print("opt.latent", opt.latent_dim)

                #saveplace = Encodername +".model"
                saveplace = encoder_path
                encoder = FractalGAN.Encoder(EncoderNetParameters)

                try:
                    encoder.load_state_dict(torch.load(saveplace, map_location=device))
                except:
                    encoder.load_state_dict(torch.jit.load(saveplace, map_location=device))

                encoder.eval()   #to disable backpropagation, so don't adjust any weights and biases
                FractalGAN.count_parameters(encoder)


                ####################################
                #load DECODER
                ####################################
                NetParametersSaveplace =Decodername +".netparams"
                DecoderNetParameters = pickle.load(open(NetParametersSaveplace, "rb"))
                DecoderNetParameters['device'] = device
                #saveplace = Decodername +".model"
                saveplace = decoder_path
                decoder = FractalGAN.Decoder(DecoderNetParameters)

                try:
                    decoder.load_state_dict(torch.load(saveplace, map_location=device))
                except:
                    decoder.load_state_dict(torch.jit.load(saveplace, map_location=device))
                
                FractalGAN.count_parameters(decoder)

                decoder.eval()   #to disable backpropagation, so don't adjust any weights and biases


                if opt.autoencoder == "off":
                        
                    ####################################
                    #load Discriminator
                    ####################################
                    NetParametersSaveplace =Discriminatorname +".netparams"
                    DiscriminatorNetParameters = pickle.load(open(NetParametersSaveplace, "rb"))
                    DiscriminatorNetParameters['device'] = device
                    saveplace = Discriminatorname +".model"

                    discriminator = FractalGAN.Discriminator(DiscriminatorNetParameters)

                    try:
                        discriminator.load_state_dict(torch.load(saveplace, map_location=device))
                    except:
                        discriminator.load_state_dict(torch.jit.load(saveplace, map_location=device))
                    
                    FractalGAN.count_parameters(discriminator)

                    discriminator.eval()   #to disable backpropagation, so don't adjust any weights and biases

                else:
                    discriminator = None
                    DiscriminatorNetParameters = None


                previous_Best_Loss = None
                HyperParameterspace = None

                ########################################################################################
                #####################                   Inpainting Layers          #####################
                ########################################################################################  
                print("Generating Masking Layers")
                print("image size is", opt.img_size)
                    
                opt.noisebool = True
                #opt.std = 0.001        #light disturbance
                opt.std = 0.01      #moderate disturbance
                #opt.std = 0.1      #hard Disturbance

                opt.std_decay_rate = 0 

                # if maskbool is None, Random masking is applied
                #opt.maskbool = None
                opt.maskbool = False
                #maskmean       x                       Y
                opt.maskmean = opt.img_size[1]/2 , opt.img_size[0]/2    #just the center for exploring  
                #               =          x                         Y                    
                opt.maskdimension =  int(opt.img_size[1]/8) , int(opt.img_size[0]/8)

                opt.LetterboxBool = False
                opt.LetterboxHeight = 30

                opt.PillarboxBool = False
                opt.PillarboxWidth = 10
                opt.device = 'cpu'
                opt.InpaintingParameters = {
                    'opt': opt,
                    'superresolution': (opt.superres, 2),
                    'noise': (opt.noisebool, opt.std, opt.std_decay_rate),
                    'mask': (opt.maskbool, opt.maskmean , opt.maskdimension),
                    'Letterbox': (opt.LetterboxBool, opt.LetterboxHeight),
                    'Pillarbox': (opt.PillarboxBool, opt.PillarboxWidth),

                }

                #inpainting = FractalGAN.Inpainting(opt.InpaintingParameters)

                inpainting = None

                #opt.autoencoder = autoencoder

                all_ness_params = opt, Dataset, DataLoader, inpainting, opt.InpaintingParameters ,encoder,EncoderNetParameters,  decoder, DecoderNetParameters,  discriminator , DiscriminatorNetParameters, BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss  , HyperParameterspace 
                
                FractalGAN.TestGAN(all_ness_params)
                
                delete = input("Do you want to delete the current model or just continue to next model? (y/N)")
                if delete.lower() == "y":
                    try:
                        os.remove(encoder_path)
                    except:
                        PrintException()
                        pass
                    try:
                        os.remove(encoder_path[:-6]+".netparams")
                    except:
                        PrintException()
                        pass                    
                    try:
                        os.remove(decoder_path)
                    except:
                        PrintException()
                        pass                    
                    try:
                        os.remove(decoder_path[:-6]+".netparams")
                    except:
                        PrintException()
                        pass

                    if opt.autoencoder == "off":    
                        try:
                            os.remove(discriminator_path)
                            os.remove(discriminator_path[:-6]+".netparams")                
                        except:
                            pass
                elif delete.lower() == "n":
                    continue
            
            except:
                print("Some error happened! What do you want to do?")
                PrintException()
                delete = input("Do you want to delete the model (y/N)")
            
                if delete.lower() == "y":
                    try:
                        os.remove(encoder_path)
                    except:
                        PrintException()
                        pass
                    try:
                        os.remove(encoder_path[:-6]+".netparams")
                    except:
                        PrintException()
                        pass                    
                    try:
                        os.remove(decoder_path)
                    except:
                        PrintException()
                        pass                    
                    try:
                        os.remove(decoder_path[:-6]+".netparams")
                    except:
                        PrintException()
                        pass

                    if opt.autoencoder == "off":  
                        try:
                            os.remove(discriminator_path)
                            os.remove(discriminator_path[:-6]+".netparams")
                        except:
                            pass
                    
                continue
            

    elif opt.TrainOrTest.lower() == "test":
        train_test_switch = opt.TrainOrTest.lower()
        DataHandler = Loader.DataHandler(opt)
        Dataset, DataLoader = DataHandler.define_dataset(train_test_switch,opt.ProjectName)

        '''
        #LOADING CHOSEN MODEL
        NetParametersSaveplace =FileParentPath+ "/models/"+ "SpacialBoxcountModels/"+ Modelname +".netparams"
        BoxCountNetParameters = pickle.load(open(NetParametersSaveplace, "rb"))

        saveplace = FileParentPath+ "/models/"+ "SpacialBoxcountModels/"+Modelname +".model"
        
        __, parameter = Modelname.split('---')  #extracting parameter to generate option object
        __, __, n_epochs, __,   batch_size, __, learning_rate, __ , betadecay1, betadecay2 = parameter.split('_')
        
        #should not be nessecary, except the changing batchsize #self, n_epochs, batch_size, img_size, channels, learning_rate, b1, b2
        #opt = OptionObject(int(n_epochs), int(batch_size), opt.img_size, 1 , float(learning_rate), float(betadecay1), float(betadecay2))
        
        #device = "cuda"
        #device = "cpu"
        
        #load and init the BCencoder
        BoxCountEncoder = BoxCountEncoder(BoxCountNetParameters)
        try:
            BoxCountEncoder.load_state_dict(torch.load(saveplace, map_location=device))
        except:
            BoxCountEncoder.load_state_dict(torch.jit.load(saveplace, map_location=device))

        BoxCountEncoder.eval()   #to disable backpropagation, so don't adjust any weights and biases
        '''

        #device = FractalGAN.get_device()
        device = "cpu"
        
        
        print("Choose the Model for loading and testing already trained GAN")
        root = Tk()

        if sys.platform == "linux" or sys.platform == "darwin":
            initialdir = FileParentPath + "/models/GAN/" + opt.ProjectName + "/"
        elif sys.platform == "win32":
            initialdir = FileParentPath + "\\models\\GAN\\" + opt.ProjectName + "\\"

        root.filename =  filedialog.askopenfilename(initialdir = initialdir ,title = "Select file",filetypes = (("model files","*.model"),("all files","*.*")))
        print(root.filename)
        Modelname = root.filename[:-6]
        # rsplit keeps underscores inside the directory part of the full path intact
        Unixtime, Modelname, EncDeDis = Modelname.rsplit("_", 2)

        Encodername = Unixtime +"_"+ Modelname+"_ENCODER"
        Decodername = Unixtime +"_"+Modelname + "_DECODER"
        '''
        print("Choose the CORRESPONDING Model for loading the DISCRIMINATOR")
        root = Tk()
        root.filename =  filedialog.askopenfilename(initialdir = initialdir ,title = "Select file",filetypes = (("model files","*.model"),("all files","*.*")))
        print(root.filename)
        Discriminatorname = root.filename[:-6]
        #Modelname, EncDeDis = Modelname.split("_")
        '''


        ####################################
        #load ENCODER
        ####################################
        NetParametersSaveplace = Encodername +".netparams"

        EncoderNetParameters = pickle.load(open(NetParametersSaveplace, "rb"))
        EncoderNetParameters['device'] = device
        print("EncoderNetParameters",EncoderNetParameters)

        opt = EncoderNetParameters['opt']
        print("opt.latent", opt.latent_dim)
        saveplace = Encodername +".model"

        encoder = FractalGAN.Encoder(EncoderNetParameters)

        try:
            encoder.load_state_dict(torch.load(saveplace, map_location=device))
        except:
            encoder.load_state_dict(torch.jit.load(saveplace, map_location=device))

        encoder.eval()   #to disable backpropagation, so don't adjust any weights and biases



        ####################################
        #load DECODER
        ####################################
        NetParametersSaveplace =Decodername +".netparams"
        DecoderNetParameters = pickle.load(open(NetParametersSaveplace, "rb"))
        DecoderNetParameters['device'] = device
        saveplace = Decodername +".model"

        decoder = FractalGAN.Decoder(DecoderNetParameters)

        try:
            decoder.load_state_dict(torch.load(saveplace, map_location=device))
        except:
            decoder.load_state_dict(torch.jit.load(saveplace, map_location=device))

        decoder.eval()   #to disable backpropagation, so don't adjust any weights and biases



        if opt.autoencoder == "off":
            print("Choose the CORRESPONDING Model for loading the DISCRIMINATOR")
            root = Tk()
            root.filename =  filedialog.askopenfilename(initialdir = initialdir ,title = "Select file",filetypes = (("model files","*.model"),("all files","*.*")))
            print(root.filename)
            Discriminatorname = root.filename[:-6]
            #Modelname, EncDeDis = Modelname.split("_")

            ####################################
            #load Discriminator
            ####################################
            NetParametersSaveplace =Discriminatorname +".netparams"
            DiscriminatorNetParameters = pickle.load(open(NetParametersSaveplace, "rb"))
            DiscriminatorNetParameters['device'] = device
            #DiscriminatorNetParameters['opt'] = opt

            saveplace = Discriminatorname +".model"

            discriminator = FractalGAN.Discriminator(DiscriminatorNetParameters)

            try:
                discriminator.load_state_dict(torch.load(saveplace, map_location=device))
            except:
                discriminator.load_state_dict(torch.jit.load(saveplace, map_location=device))

            discriminator.eval()   #to disable backpropagation, so don't adjust any weights and biases
        
        else:
            DiscriminatorNetParameters = None
            discriminator = None



        previous_Best_Loss = None
        HyperParameterspace = None



        ########################################################################################
        #####################                   Inpainting Layers          #####################
        ########################################################################################  
        print("Generating Masking Layers")
        print("image size is", opt.img_size)
            
        opt.noisebool = True
        #opt.std = 0.001        #light disturbance
        opt.std = 0.02      #moderate disturbance
        #opt.std = 0.1      #hard Disturbance

        opt.std_decay_rate = 0 

        # if maskbool is None, Random masking is applied
        #opt.maskbool = None
        opt.maskbool = False
        #maskmean       x                       Y
        opt.maskmean = opt.img_size[1]/2 , opt.img_size[0]/2    #just the center for exploring  
        #               =          x                         Y                    
        opt.maskdimension =  int(opt.img_size[1]/8) , int(opt.img_size[0]/8)

        opt.LetterboxBool = False
        opt.LetterboxHeight = 30

        opt.PillarboxBool = False
        opt.PillarboxWidth = 10
        #opt.device = 'cpu'
        InpaintingParameters = {
            'opt': opt,
            'superresolution': (opt.superres, 2),
            'noise': (opt.noisebool, opt.std, opt.std_decay_rate),
            'mask': (opt.maskbool, opt.maskmean , opt.maskdimension),
            'Letterbox': (opt.LetterboxBool, opt.LetterboxHeight),
            'Pillarbox': (opt.PillarboxBool, opt.PillarboxWidth),

        }


        inpainting = None
        opt.InpaintingParameters = None
        #opt.autoencoder = autoencoder
        opt.device = device

        all_ness_params = opt, Dataset, DataLoader, inpainting, InpaintingParameters ,encoder,EncoderNetParameters,  decoder, DecoderNetParameters,  discriminator , DiscriminatorNetParameters, BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss  , HyperParameterspace 

        FractalGAN.TestGAN(all_ness_params)


        

### 3.4.6 Evolutionary network generator


Because neural network architectures offer many design options, a generative evolutionary approach was chosen to search among them.
The generative aspect comes from the fact that the encoder, decoder and discriminator network classes are written so that their architecture can be adjusted through the network layer description introduced in chapter 3.4.5.1. This layer description is a Python list of elements with different data types. For each layer block, an element specifies whether Gaussian noise is used, which layer magnification from input to output is chosen, and how many and which parallel inception layers are used. The number of input and output channels also has to be defined for each layer. In addition, dropout can be activated per layer, together with the fraction of nodes that is turned off. Finally, batch normalization can be turned on or off for each individual layer.

Every basic evolutionary algorithm consists of five steps:
1. Population initialization through randomly generated specimens
2. Evaluation of the specimens through a fitness function
3. Selection of the best-performing specimens for passing on their “genes”
4. Recombination of the selected specimens, forming a new generation
5. Random mutation of the recombined specimens

After step 5, the algorithm continues at step 2 for as long as desired.
Problems with many variables are hard to optimize with classic optimization strategies. In certain cases, random hyperparameter search performs better than other strategies such as grid search, as shown in [33]. Evolutionary algorithms can circumvent this problem and, when executed properly, outperform a random search. In nature, many different variables and dependencies shape the formation and continuation of life. Living creatures compete with one another and only pass on their genes if they are fit enough to survive.
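
The five steps listed above can be sketched as a short loop. This is only a toy illustration on a one-dimensional problem and not the FractalGAN implementation: the function name `evolve`, its parameters and the elite fraction of 20 % are assumptions made for this example.

```python
import random

def evolve(fitness, init, crossover, mutate,
           pop_size=20, generations=30, elite_frac=0.2, seed=0):
    """Toy evolutionary loop (hypothetical helper); lower fitness is better."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]            # step 1: initialization
    for _ in range(generations):
        ranked = sorted(population, key=fitness)                 # step 2: evaluation
        n_elite = max(2, int(elite_frac * pop_size))
        parents = ranked[:n_elite]                               # step 3: selection
        children = []
        while len(children) < pop_size - n_elite:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)                         # step 4: recombination
            children.append(mutate(child, rng))                  # step 5: mutation
        population = parents + children                          # parents survive unchanged
    return min(population, key=fitness)

# Toy problem: find x that minimizes (x - 3)^2
best = evolve(
    fitness=lambda x: (x - 3.0) ** 2,
    init=lambda rng: rng.uniform(-10.0, 10.0),
    crossover=lambda a, b, rng: (a + b) / 2.0,
    mutate=lambda x, rng: x + rng.gauss(0.0, 0.1),
)
print(best)
```

Because the selected parents are carried over unchanged, the best fitness in this sketch can never get worse from one generation to the next.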


#### 3.4.6.1 Initialization

The network generator is initialized by instantiating its class with a parameter dictionary. The dictionary contains the image size, the latent dimension, the maximum length of the network, the maximum number of parallel layers and the options class for additional information.
The network generator only produces the layer descriptions of the generated network architectures.
It has to make sure that each layer description is correct and suitable for the given task. The required magnification can therefore be calculated from the image size and the size of the latent dimension. A series of magnifications is called a magnification train, and all valid magnification trains are calculated within the initialization function. The function generates all permutations of the allowed layer magnifications up to a set maximum length. Every combination in the permutation list is checked against the magnification requirements of the encoder, decoder or discriminator network. Since the encoder network uses the reparameterization step, the aggregate magnification of a combination may be greater than or equal to the required magnification. The discriminator network likewise pools the final layer array to a single value representing the likelihood of a real or fake data input. Only for the decoder does the aggregate magnification of the chosen magnification train have to equal the required magnification exactly. If these conditions are met, the combination is appended to the list of valid combinations, which is returned at the end of the function.
Afterwards, all parallel layer possibilities have to be initialized. Permutations are used again to generate a list of parallel layer possibilities, limited by the maximum number of parallel layers.
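
The check for valid magnification trains can be illustrated with a minimal sketch. It is not the code of the network generator: the function name `valid_magnification_trains` and its `exact` flag are hypothetical, and `itertools.product` is used here in place of the permutations of a repeated list, purely to keep the example short.

```python
from itertools import product
from math import isclose, prod

def valid_magnification_trains(allowed, target, max_len, exact=True):
    """Return all trains (tuples of layer magnifications) that reach the target.

    exact=True  : the aggregate magnification must equal the target (decoder case)
    exact=False : the aggregate magnification may also exceed the target
                  (assumed here for the encoder and discriminator case)
    """
    valid = []
    for length in range(1, max_len + 1):
        for train in product(allowed, repeat=length):
            total = prod(train)                      # aggregate magnification of the train
            if isclose(total, target) or (not exact and total > target):
                valid.append(train)
    return valid

# Decoder example: latent dimension 8 to image size 256 needs a magnification of 32
decoder_trains = valid_magnification_trains([1.0, 2.0, 4.0, 8.0], 256 / 8, max_len=3)
print(len(decoder_trains), decoder_trains[:3])
```

The number of candidate trains grows exponentially with the maximum length, which is one reason to keep that limit small.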


#### 3.4.6.2 Random network architecture search

To begin the population initialization with a random network search, the function for generating random networks is called with all valid magnification trains, all valid parallel layer trains and the network type. Inside it, a magnification train is chosen, and a channel train is picked at random from all generated channel trains. After all necessary variables have been chosen, the function ‘generate_random_layer’ is executed to assemble a layer list. The list is extended layer by layer until the last layer of the network is reached, forming the layer description used to initialize the desired artificial neural network architecture.
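
As a rough illustration of this assembly, the sketch below builds a random layer list from a chosen magnification train. The dictionary-based layer format, the function name `random_layer_description` and the starting channel count of 1 are assumptions for this example. The real layer description from chapter 3.4.5.1 contains more fields (noise, parallel layers, dropout, batch normalization).

```python
import random

def random_layer_description(magnification_trains, channel_choices, rng=None):
    """Assemble a simplified, hypothetical layer description layer by layer."""
    rng = rng or random.Random()
    train = rng.choice(magnification_trains)         # pick one valid magnification train
    layers = []
    in_channels = 1                                  # assumed: single-channel input image
    for magnification in train:
        out_channels = rng.choice(channel_choices)   # random channel count per layer
        layers.append({
            "magnification": magnification,
            "in_channels": in_channels,
            "out_channels": out_channels,
        })
        in_channels = out_channels                   # next layer consumes this output
    return layers

layers = random_layer_description([(2.0, 2.0), (4.0,)], [2, 4, 8], rng=random.Random(1))
print(layers)
```

Chaining `in_channels` to the previous `out_channels` keeps the channel train consistent by construction, so no repair step is needed in this simplified case.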


#### 3.4.6.3 Evaluation with fitness function

This architecture is passed on to the training routine, evaluated by the loss function and saved for further processing. The random network architecture search is used as long as the current training trial is below the chosen population initialization threshold. The chosen loss function is a pixel-wise mean squared error between the original input image and the generated output image. The lower the loss value, the better the reconstruction of the original input image.
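
For reference, the pixel-wise mean squared error can be written as follows, assuming a single-channel image of height $H$ and width $W$, with $x$ the original input image and $\hat{x}$ the generated output image (the symbols are chosen here for illustration only):

$$
\mathrm{MSE}(x, \hat{x}) = \frac{1}{H\,W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( x_{ij} - \hat{x}_{ij} \right)^{2}
$$

A perfect reconstruction gives a value of 0, and for images with several channels the sum would additionally run over the channel index.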

#### 3.4.6.4 Evolutionary network architecture search

The evolutionary network architecture search is initialized by calling the network generator’s ‘init_mating’ function. The function loads all saved random search network architectures together with their loss values and stores them in a list. The ‘generate_children_from_parents’ function takes these lists and, if requested by the user, deletes all poorly performing models whose loss value lies above the mean of all loss values. It saves the well-performing network architectures in a dictionary that is searchable by the keys ‘latent dimension’ and ‘loss value’. The latent dimension is necessary because the generated encoder and decoder networks share the same latent dimension, so combining network architectures with different latent dimensions would result in an error. In addition, the ratio between the latent dimension and the image input size describes the compression factor and can be used to balance accuracy against transmission bandwidth between the encoder and decoder network.
These network architecture dictionaries are sorted by their loss value and divided into two lists: one with the best-performing 20 % of the network architectures and one with all remaining architectures.
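
The filtering and splitting described above might look roughly like the following sketch. The record format `(latent_dim, loss, architecture)`, the function name `split_population` and the choice to split separately for each latent dimension are assumptions made for illustration.

```python
import statistics
from collections import defaultdict

def split_population(records, best_frac=0.2, drop_above_mean=True):
    """Group records by latent dimension and split each group into (best, rest).

    records: list of (latent_dim, loss, architecture) tuples (hypothetical format).
    """
    if drop_above_mean:
        mean_loss = statistics.mean(loss for _, loss, _ in records)
        records = [r for r in records if r[1] <= mean_loss]   # discard above-mean losses
    by_latent = defaultdict(list)
    for latent_dim, loss, arch in records:
        by_latent[latent_dim].append((loss, arch))
    split = {}
    for latent_dim, entries in by_latent.items():
        entries.sort(key=lambda entry: entry[0])              # lowest loss first
        n_best = max(1, round(best_frac * len(entries)))
        split[latent_dim] = (entries[:n_best], entries[n_best:])
    return split

records = [(8, 0.1, "a"), (8, 0.2, "b"), (8, 0.3, "c"),
           (8, 0.4, "d"), (8, 0.5, "e"), (8, 5.0, "f")]
best, rest = split_population(records)[8]
print(best, rest)
```

In this example the outlier with loss 5.0 lies above the mean and is dropped before the remaining five records are split.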

To create an evolutionarily generated network architecture, the first parent is taken from the list of best network architectures and the second parent from the list of the remaining architectures, so one of the best-performing models is always a parent. It is assumed, without evidence, that variations of the best-performing architectures perform similarly to, and probably better than, a purely random parent pair. The layer descriptions of these best-performing architectures serve as the base for crossover. A for loop iterates through the layers of parents one and two and randomly chooses one or the other layer as the child's layer. When the last layer is reached, the magnification and channel trains are checked and corrected if they are invalid. This crossover step is repeated until 80 % of the new generation limit is reached.
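
A minimal sketch of the layer-wise crossover is given below. The function name `crossover` is hypothetical, layers are treated as opaque objects, and the child is limited to the length of the shorter parent. The subsequent check and repair of the magnification and channel trains is deliberately left out.

```python
import random

def crossover(parent_one, parent_two, rng=None):
    """Pick each child layer at random from the two parents' layers at that position."""
    rng = rng or random.Random()
    # zip stops at the shorter parent (an assumption of this sketch)
    return [rng.choice(pair) for pair in zip(parent_one, parent_two)]

parent_one = ["a1", "a2", "a3"]   # stand-ins for real layer descriptions
parent_two = ["b1", "b2", "b3"]
child = crossover(parent_one, parent_two, rng=random.Random(0))
print(child)
```

Each position of the child comes from exactly one of the two parents, so layers are inherited whole rather than blended.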

The last 20 % of the evolutionarily generated network architecture list is filled with mutated versions of the best-performing models. This is done by iterating over the list of best network architectures and randomly choosing new sublayers. Integer values are chosen completely at random, whereas float values such as the dropout ratio are altered with random values from a Gaussian distribution to keep the new values in the vicinity of the old ones. This procedure was chosen to mutate already good models only slightly and thereby search a finer region of the parameter space.
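
The two mutation rules (uniform resampling for integers, a small Gaussian step for floats) can be sketched as follows. The dictionary layer format, the function name `mutate_layer` and the standard deviation of 0.05 are assumptions for this example.

```python
import random

def mutate_layer(layer, channel_choices, sigma=0.05, rng=None):
    """Return a mutated copy of a simplified, hypothetical layer dictionary."""
    rng = rng or random.Random()
    mutated = dict(layer)                                    # leave the parent untouched
    mutated["out_channels"] = rng.choice(channel_choices)    # integer: resampled at random
    noisy = layer["dropout"] + rng.gauss(0.0, sigma)         # float: small Gaussian step
    mutated["dropout"] = min(max(noisy, 0.0), 1.0)           # clamp to a valid ratio
    return mutated

layer = {"out_channels": 8, "dropout": 0.3}
mutated = mutate_layer(layer, [2, 4, 8, 16], rng=random.Random(0))
print(mutated)
```

A smaller `sigma` keeps the mutated dropout ratio closer to its parent value, which corresponds to a finer local search.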

After new layer descriptions have been generated from the existing ones, a new generation of network architectures can be trained, evaluated and saved for later use. A layer description is retrieved with the ‘get_child_arch’ function.

This kind of parameter search aims to converge faster on a given problem than a pure random search over many variables. The evolutionary neural network generator is interruptible and resumes without starting from scratch, while learning from past trials. Because poorly performing models are deleted automatically, and because the user can delete models in validation mode that perform poorly despite a low loss value, it is believed that each new generation of network architectures is better suited to the chosen task.


In [None]:

import numpy as np
import linecache
import sys
import os
import pathlib              #Import pathlib to create a link to the directory where the file is at.
import pickle
import random
import time
from collections import defaultdict
import statistics
from itertools import permutations

def PrintException():
    exc_type, exc_obj, tb = sys.exc_info()
    f = tb.tb_frame
    lineno = tb.tb_lineno
    filename = f.f_code.co_filename
    linecache.checkcache(filename)
    line = linecache.getline(filename, lineno, f.f_globals)
    print('EXCEPTION IN ({}, LINE {} "{}"): {}'.format(filename, lineno, line.strip(), exc_obj))


class Network_Generator():
    #An evolutionary network generator for FractalGAN

    def __init__(self,Parameter):
        super(Network_Generator, self).__init__()
        self.Sample_layerdict ={'0.125': [1,1,8], '0.25':[1,1,4], '0.5':[1,1,2], '1':[1,1,1], '2':[2,2,1], '4':[4,4,1], '8':[8,8,1]}
        self.layer_magnifications = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, ]
        if Parameter is None:
            self.img_size = 256
            self.latent_dim = 8
        else:    
            self.img_size = Parameter['img_size']
            self.latent_dim = Parameter['latent_dim']
        self.encoder_magnification = self.latent_dim / self.img_size
        self.decoder_magnification = self.img_size / self.latent_dim
        self.discriminator_magnification = 1.0/ self.latent_dim
        self.init_mag = 1.0
        self.parallel_multiplicator = [1,2,4,8]
        self.Channel_possibilitys = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32] #,32
        self.Max_Lenght = Parameter['Max_Lenght'] # describes the max length of the network
        self.Max_parallel_layers = Parameter['Max_parallel_layers']
        self.opt = Parameter['opt']
        self.EnDeDis = None
        print("init finished")
    

    def init_all_magnification_trains(self, EnDeDis):
        if EnDeDis.lower() == "encoder":
            #constrain the allowed layer magnifications
            allowed_layermags = self.layer_magnifications[:-2] + self.layer_magnifications[:-2] + self.layer_magnifications[:-2] #only 0.125 to 2.0 for encoding/compressing aspect
            print(f"encoder_magnification is {self.encoder_magnification}")
            print(f"allowed_layermags: {allowed_layermags}")

        elif EnDeDis.lower() == "decoder":
            #so that the decoder doesn't see something like 0.5; 2; 0.5; 2; to infinity
            allowed_layermags = self.layer_magnifications[2:] +   self.layer_magnifications[2:] +self.layer_magnifications[2:]   #[2:] keeps 0.5 to 8.0 for generation aspect
            print(f"decoder_magnification: {self.decoder_magnification}")
            print(f"allowed_layermags: {allowed_layermags}")

        elif EnDeDis.lower() == "discriminator":
            #so that the discriminator doesn't see something like 0.5; 2; 0.5; 2; to infinity
            allowed_layermags = self.layer_magnifications[:4] + self.layer_magnifications[:4] + self.layer_magnifications[:4] #only 0.125 to 1.0
            print(f"discriminator_magnification:  {self.discriminator_magnification}")
            print(f"allowed_layermags: {allowed_layermags}")

            
        Valid_combinations = []
        #Permutation list based on  https://www.geeksforgeeks.org/python-itertools-permutations/

        for current_length in range(1,self.Max_Lenght,1): 
            #permutations = list(permutations(allowed_layermags,self.Max_Lenght))
            permut = list(permutations(allowed_layermags,int(current_length)))
            for i, combo in enumerate(permut):
                aggregate_mag = 1.0
                for element in combo:
                    aggregate_mag = aggregate_mag * element
                
                if EnDeDis == "encoder" :
                    placing_condition =  aggregate_mag >= self.encoder_magnification
                elif EnDeDis == "discriminator":
                    placing_condition =  aggregate_mag >= self.discriminator_magnification
                elif EnDeDis == "decoder":
                    placing_condition = aggregate_mag == self.decoder_magnification
                
                if placing_condition:
                    Valid_combinations.append(combo)

        return Valid_combinations 


    def init_all_parallel_layers(self):
        Valid_combinations = []
        #Permutation list based on  https://www.geeksforgeeks.org/python-itertools-permutations/

        for possible_length in range(1,self.Max_parallel_layers+1):
            #print("possible lenght", possible_length)
            permutation_list = list(permutations(self.parallel_multiplicator,possible_length))
            
            for i, combo in enumerate(permutation_list):
                    Valid_combinations.append(combo)
            
        return Valid_combinations


    def init_all_Channel_train(self, net_lenght):
        Valid_combinations = []
        #Permutation list based on  https://www.geeksforgeeks.org/python-itertools-permutations/
        assert net_lenght <= len(self.Channel_possibilitys)
        permutation_list = list(permutations(self.Channel_possibilitys,net_lenght)) # C
        #print("Permutation list is", permutation_list)
        
        for i, combo in enumerate(permutation_list):
                Valid_combinations.append(combo)
        
        return Valid_combinations        


    def generate_random_Net(self,EnDeDis,No_latent_spaces,Valid_mag_train,Valid_parallel_layer_train ):

        def generate_random_layer(i, last_layer_index,EnDeDis, IN, OUT,No_latent_spaces ):
            layerlist = []
            parallel_layer = []
            #print("lastlayerindex is",last_layer_index)
            #print("Layer:",i, "__IN:", IN, "__OUT:", OUT)

            if i == 0:
                #First layer, so first channel has to be 1 and output has to be more than 2 in case that the bcr/lac gets added
                if EnDeDis == "encoder":
                    IN = 1
                    if OUT <= 1:
                        OUT = 2

                elif EnDeDis == "decoder" or EnDeDis == "discriminator":
                    IN = No_latent_spaces

            elif i == last_layer_index and EnDeDis == "decoder":
                #Last Layer, so if decoder outputlayer has to be singular output and no parallel layers
                OUT = 1  #Just add singular ouput layer
                parallel_layer = [1] #and end in  1 channel output
            

            if parallel_layer == [1]: 
                #if parallel_layer is already chosen, then dont choose parallel layers
                pass
            else:
                #else
                parallel_layer =  list(Valid_parallel_layer_train[random.randrange(0,len(Valid_parallel_layer_train),1)])

            #layerelements=   gaussian Noise,         magnification           paralell layers   channels    Dropout/               dropout pct                 Batch norm
            layerlist = [ random.randint(0, 1 ), str(chosen_mag_train[i]), parallel_layer ,  [IN,OUT], [random.randint(0, 1 ), random.uniform(0.001, 0.3 ) ]   , random.randint(0, 1) ]
            #print(f"Generated random Layer is {layerlist}")

            return layerlist

        print(f"Number of all valid magnification trains are: {len(Valid_mag_train)}")
        chosen_mag_train = Valid_mag_train[random.randrange(0,len(Valid_mag_train),1)] #low high, step
        print(f"chosen_mag_train is {chosen_mag_train}")

        last_layer_index = len(chosen_mag_train) -1
        #print("Last layer index is", last_layer_index)
        Channel_train = self.init_all_Channel_train( last_layer_index+2)
        try:
            chosen_channel_train = Channel_train[random.randrange(len(Channel_train))] #randrange excludes the upper bound, randint would allow an out-of-range index
            print("chosen Channel train is", chosen_channel_train)
        except:
            PrintException()
            print("assume length is 0 so just one element")
            chosen_channel_train = list(Channel_train)
            print(f"chosen Channel train is: {chosen_channel_train}")

        LayerDescription = []
        for i, layer in enumerate(chosen_mag_train):
            # Output channels have to be the input channels of the next layer
            IN = chosen_channel_train[i]
            OUT = chosen_channel_train[i+1]
            layerlist = generate_random_layer(i,last_layer_index,EnDeDis, IN, OUT,No_latent_spaces)
            LayerDescription.append(layerlist)

        return LayerDescription


    def init_mating(self,opt):
        path = str(pathlib.Path(__file__).parent.absolute())        
        #path = path + "/models/GAN/" + opt.ProjectName +"/"
        
        if sys.platform == "linux" or sys.platform == "darwin":
            path = path + "/models/GAN/" + opt.ProjectName +"/"
        elif sys.platform == "win32":
            path = path + "\\models\\GAN\\" + opt.ProjectName +"\\"
            
        print("Path is "+ path)
        self.Enc_netparams_file_list = []
        self.Dec_netparams_file_list = []
        self.Dis_netparams_file_list = []
        self.enc_dec_loss_list = []
        self.dis_loss_list = []

    
        #cause pickle.load(f) returns fail when netparams are loaded in cpu mode, when trained on gpu
        # taken from https://github.com/pytorch/pytorch/issues/16797
        import io
        import torch  #needed by CPU_Unpickler below, which calls torch.load
        class CPU_Unpickler(pickle.Unpickler):
            def find_class(self, module, name):
                if module == 'torch.storage' and name == '_load_from_bytes':
                    return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
                else: return super().find_class(module, name)

        filecounter = 0
        n_encoder = 0
        n_decoder = 0 
        n_discriminator = 0
        for FILE in os.listdir(path):
            filecounter +=1
            #print(FILE)
            if FILE[-10:] == ".netparams":
                try:
                    unixtime, lossvalue, Network_type = FILE.split("_")
                except ValueError:  #older file names carry no unixtime prefix
                    lossvalue, Network_type = FILE.split("_")

                lossvalue  = lossvalue.replace("Loss","")
                lossvalue  = lossvalue.replace("---","")
                lossvalue = float(lossvalue)
                Network_type = Network_type.replace(".netparams","")

                with open(path+FILE, "rb") as f:

                    if self.opt.device == "cpu":
                        #print("load netparams into cpu")
                        NetParameters = CPU_Unpickler(f).load()
                    else:
                        #print("normal pickle load used")
                        NetParameters = pickle.load(f)
                    #print("NETPARAMETERS ARE " + NetParameters)

                if Network_type.lower() == "encoder":
                    n_encoder +=1
                    self.Enc_netparams_file_list.append([FILE, NetParameters, lossvalue])
                    self.enc_dec_loss_list.append(lossvalue)
                
                elif Network_type.lower() == "decoder":
                    n_decoder += 1
                    self.Dec_netparams_file_list.append([FILE, NetParameters, lossvalue])
                    self.enc_dec_loss_list.append(lossvalue)

                elif Network_type.lower() == "discriminator":
                    n_discriminator +=1
                    self.Dis_netparams_file_list.append([FILE, NetParameters, lossvalue])
                    self.dis_loss_list.append(lossvalue)

        print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
        print(f"Number of Files: {filecounter},   encoder: {n_encoder},  decoder: {n_decoder}, discriminator: {n_discriminator},")
        print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
        
        # Calcing mean for Survivor selection
        #print(f"ENC_DEC mean lossLIST {self.enc_dec_loss_list}")

        self.enc_dec_mean_loss =  np.nanmean(self.enc_dec_loss_list, axis = 0)
        print(f"ENC_DEC mean loss is {self.enc_dec_mean_loss}")
        if opt.autoencoder == "off":
            self.dis_mean_loss = np.mean(self.dis_loss_list)

        print("Generated Parents Network architecture")
        print(f"Number of Encoder Models are {len(self.Enc_netparams_file_list)}")
        print(f"Number of Decoder Models are {len(self.Dec_netparams_file_list)}")
        assert len(self.Enc_netparams_file_list) > 0 and len(self.Dec_netparams_file_list) > 0 , "No Encoder/Decoder in Parent list"
        if opt.autoencoder == "off":
            print(f"Number of Discriminator Models are {len(self.Dis_netparams_file_list)}")    
            assert len(self.Dis_netparams_file_list) > 0 ,"No discriminator models found"

        return self



    def generate_children_from_parents(self, EnDeDis, Generation_limit,opt ):
        #NetParameters = {'LayerDescription': LayerDescription, 'input_shape': input_shape, 'SpacialBoxcounting':BoxcountEncoder, 'magnification':magnification, 'opt':opt, 'device': device}
        FileParentPath = str(pathlib.Path(__file__).parent.absolute())
        del_bad_models = input("Do you want to delete bad models automatically?  (y/N)")

        ################################################################
        #########     SURVIVOR SELECTION & ARCH COMPATIBILITY CHECK
        ################################################################
        #If a network arch performs worse than the mean loss (i.e. its loss is higher), then it will not survive.
        
        # Create dict with 2 keys, so that every Network is searchable by loss-value and latent dim size
        print(f"ENC_DEC mean loss is {self.enc_dec_mean_loss}")
        print(f"Generating {EnDeDis} models")

        if EnDeDis == "encoder":
            encoder_parents_dict = defaultdict(dict)
            enc_index = 0
            for element in self.Enc_netparams_file_list:
                filename, Netparameters, lossvalue = element
                #print("Filename: "+filename)
                encoder_opt = Netparameters['opt']
                print(f"loss for this model is {lossvalue}")
                if lossvalue < self.enc_dec_mean_loss:
                    print(" ENCODER network worthy of propagating")
                    encoder_parents_dict[str(encoder_opt.latent_dim)][str(lossvalue)] = Netparameters
                    enc_index += 1

                else:
                    print("Potential Parent is not performing enough")
                    if del_bad_models == "" or del_bad_models.lower() == "n":
                        pass
                    else:
                        try:
                            print("Removing Encoder Model")
                            deletepath = f"{FileParentPath}/models/GAN/{encoder_opt.ProjectName}/{filename}"
                            os.remove(deletepath)
                            modelpath =  f"{FileParentPath}/models/GAN/{encoder_opt.ProjectName}/{filename[:-10]}.model"
                            os.remove(modelpath)
                            print(f"Deleted Netparams and model data for {filename[:-10]}")               
                        except:
                            print("Removing Encoder Model in windows mode")
                            deletepath = f"{FileParentPath}\\models\\GAN\\{encoder_opt.ProjectName}\\{filename}"
                            os.remove(deletepath)
                            modelpath =  f"{FileParentPath}\\models\\GAN\\{encoder_opt.ProjectName}\\{filename[:-10]}.model"
                            os.remove(modelpath)
                            print(f"Deleted Netparams and model data for {filename[:-10]}")  
                            
        elif EnDeDis == "decoder":
            decoder_parents_dict = defaultdict(dict)
            dec_index = 0
            for element in self.Dec_netparams_file_list:
                filename, Netparameters, lossvalue = element
                decoder_opt = Netparameters['opt']
                print(f"loss for this model is {lossvalue}")
                if lossvalue < self.enc_dec_mean_loss:
                    print(" DECODER network worthy of propagating")
                    print(f"latent dim is {decoder_opt.latent_dim}")
                    decoder_parents_dict[str(decoder_opt.latent_dim)][str(lossvalue)] = Netparameters
                    dec_index += 1
                else:
                    print("Potential Parent is not performing enough")       
                    if del_bad_models == "" or del_bad_models.lower() == "n":
                        pass
                    else:
                        try:
                            deletepath = f"{FileParentPath}/models/GAN/{decoder_opt.ProjectName}/{filename}"
                            os.remove(deletepath)
                            modelpath =  f"{FileParentPath}/models/GAN/{decoder_opt.ProjectName}/{filename[:-10]}.model"
                            os.remove(modelpath)
                            print(f"Deleted Netparams and model data for {filename[:-10]}")
                        except:
                            deletepath = f"{FileParentPath}\\models\\GAN\\{decoder_opt.ProjectName}\\{filename}"
                            os.remove(deletepath)
                            modelpath =  f"{FileParentPath}\\models\\GAN\\{decoder_opt.ProjectName}\\{filename[:-10]}.model"
                            os.remove(modelpath)
                            print(f"Deleted Netparams and model data for {filename[:-10]}")

        elif EnDeDis == "discriminator":
            #Discriminator_parents_list = []
            discriminator_parents_dict = defaultdict(dict)
            dis_index = 0
            for element in self.Dis_netparams_file_list:
                filename, Netparameters, lossvalue = element
                dis_opt = Netparameters['opt']

                if lossvalue < self.dis_mean_loss:
                    print(" discriminator network worthy of propagating")
                    #Discriminator_parents_list.append([Netparameters,lossvalue])
                    print(f"latent dim is {dis_opt.latent_dim}")
                    discriminator_parents_dict[str(dis_opt.latent_dim)][str(lossvalue)] =  Netparameters
                    dis_index += 1

                else:
                    print("Potential Parent is not performing enough")       
                    if del_bad_models == "" or del_bad_models.lower() == "n":
                        pass
                    else:
                        try:
                            deletepath = f"{FileParentPath}/models/GAN/{dis_opt.ProjectName}/{filename}"
                            os.remove(deletepath)
                            modelpath =  f"{FileParentPath}/models/GAN/{dis_opt.ProjectName}/{filename[:-10]}.model"
                            os.remove(modelpath)
                            print(f"Deleted Netparams and model data for {filename[:-10]}")
                        except:
                            deletepath = f"{FileParentPath}\\models\\GAN\\{dis_opt.ProjectName}\\{filename}"
                            os.remove(deletepath)
                            modelpath =  f"{FileParentPath}\\models\\GAN\\{dis_opt.ProjectName}\\{filename[:-10]}.model"
                            os.remove(modelpath)
                            print(f"Deleted Netparams and model data for {filename[:-10]}")

        ################################################################
        #########     CROSSOVER & Sibling Mutation
        ################################################################
        '''  
        LayerDescription = [     gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch 
                                    [1, '1', [1]          , [1,4]  , [1, 0.112], 0 ],
                                    [1, '2', [1,2,4,8]    , [4,8]  , [1, 0.112], 1 ],
                                    [0, '2', [1,2,16]     , [8,16] , [1, 0.112], 1 ],
                                    [0, '4', [1,2,4,8,16] , [16,32], [1, 0.112], 1 ],
                                    [1, '1', [1,2,4,8,16] , [32,1] , [1, 0.112], 0 ],
                                ]         
        '''

        #Models are grouped by their latent dimension, to avoid layer mag fail
        chosen_latent_spaces_ori = ['2', '4', '8', '16', '32', '64', '128']
        # make sure, that latent spaces are always tinier than input, else there is no compression of data 
        chosen_latent_spaces = [x for x in chosen_latent_spaces_ori if int(x) < opt.img_size[0]]

        # The philosophy here is to get the best performing models and let them live on until they are too bad
        # and to mate them with the other survivors, divided by the population_ratio
        population_ratio = 0.2 # the best 20% will survive/mate with the other

        if EnDeDis == "encoder":  
            self.Encoder_Netparameter_dict = defaultdict(dict)
            encoder_input= opt.channels
            encoder_output = opt.No_latent_spaces +1

        elif EnDeDis == "decoder":
            self.Decoder_Netparameter_dict = defaultdict(dict)
            decoder_input=  opt.No_latent_spaces +1
            decoder_output = opt.channels

        elif EnDeDis == "discriminator":
            self.Discriminator_Netparameter_dict = defaultdict(dict)
            discriminator_input = opt.No_latent_spaces
            discriminator_output = 1 #true or false


        for latent_dimension in chosen_latent_spaces:
            print(f"Processing latent dimension of {latent_dimension}")
            try:
                Best_Network_Arch_list = []
                Other_Network_Arch_list = []
        
                if EnDeDis == "encoder":      
                    subdict = encoder_parents_dict[latent_dimension]
                    self.encoder_magnification = int(latent_dimension) / self.img_size
                    magnification = self.encoder_magnification
                    
                elif EnDeDis == "decoder":
                    subdict = decoder_parents_dict[latent_dimension]
                    self.decoder_magnification = self.img_size / int(latent_dimension)
                    magnification = self.decoder_magnification

                elif EnDeDis == "discriminator":
                    subdict = discriminator_parents_dict[latent_dimension]
                    self.discriminator_magnification = 1.0/ int(latent_dimension )
                    magnification = self.discriminator_magnification
                
                sorted_loss_values = sorted(subdict, key=float)  #keys are loss values stored as strings, so sort them numerically
                
                if len(sorted_loss_values) == 0:
                    print("Continue with next latent dimension, because no models found for latent dimension "+str(latent_dimension))
                    continue   

                print(f"sorted_loss_values: {sorted_loss_values}")
                print(f"Number models found {len(sorted_loss_values)}")
                population_ratio_index = int(float(len(sorted_loss_values))*population_ratio)
                print(f"population_ratio_index {population_ratio_index}")
                for index, lossvalue in enumerate(sorted_loss_values):
                    #spits out the network architecture with best loss(0.) to worst loss(>0.5)
                    #print("index, lossvalue, population_ratio_index "+ str(index) +" "+  str(lossvalue)+" "+  str(population_ratio_index))
                    Parent_Netparameters = subdict[lossvalue]
                    if index <= population_ratio_index:
                        Best_Network_Arch_list.append(Parent_Netparameters)
                        Other_Network_Arch_list.append(Parent_Netparameters)
                    else:
                        Other_Network_Arch_list.append(Parent_Netparameters)

                print(f" Lenght of best networks is {len(Best_Network_Arch_list)} and Lenght of all other nets  are {len(Other_Network_Arch_list)}")    
                
                for model_index, new_model in enumerate(range(Generation_limit)):
                    Parent1 = Best_Network_Arch_list[random.randint(0,len(Best_Network_Arch_list)-1)]
                    Parent2 = Other_Network_Arch_list[random.randint(0,len(Other_Network_Arch_list)-1)]
                    
                    #Because some old models were saved with a typo in the key, load the misspelled key and store it under the correct one
                    try:
                        Parent1['LayerDescription'] = Parent1['LayerDiscription']
                    except KeyError:
                        pass

                    try:
                        Parent2['LayerDescription'] = Parent2['LayerDiscription']
                    except KeyError:
                        pass
                        
                    #print(f"Parent1['LayerDescription'] {Parent1['LayerDescription']}")
                    #print(f"Parent2['LayerDescription'] {Parent2['LayerDescription']}")
                    Child = []
                    channel_input = 1
                    channel_output = 1
                    last_layer_index = len(Parent1['LayerDescription'])-1
                    AggregateMagnification = 1.0    #init magnification

                    for layer_index, layer in enumerate(Parent1['LayerDescription']):
                        #print("layer_index", layer_index)
                        #print("layer", layer)        

                        #if netgenparameters of parent1 are longer than those of the 2nd one, then just take the layerdescription from parent 1
                        #print("len(Parent2['LayerDescription'])", len(Parent2['LayerDescription']))
                        if layer_index < len(Parent2['LayerDescription']):
                            True_OR_False = random.randint(0,1) #random true false value
                        else:
                            True_OR_False = 1                        
                        #print("TrueOrFalse", True_OR_False)

                        if True_OR_False == 1:
                            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = Parent1['LayerDescription'][layer_index]
                            if layer_index == 0:
                                channel_output = channellist[-1]  #remember the first layer's output so the next layer can chain onto it
                            else:
                                channel_input = channel_output  #because the channel input has to be the channel output of layer before
                                channellist = [channel_input,channellist[-1]]
                                channel_output = channellist[-1]

                            if self.opt.batch_size == 1:
                                #cause no batch norm possible, when just having batchsize of 1
                                batchnorm_switch = 0

                            Child.append([gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch])

                        else:
                            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = Parent2['LayerDescription'][layer_index]        #layer_from_Parent2
                            if layer_index == 0:
                                channel_output = channellist[-1]  #remember the first layer's output so the next layer can chain onto it
                            else:
                                channel_input = channel_output
                                channellist = [channel_input,channellist[-1]]
                                channel_output = channellist[-1]
                            
                            if self.opt.batch_size == 1:
                                #cause no batchnorm, when just having batchsize of 1
                                batchnorm_switch = 0

                            Child.append([gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch])

                        layermagn = float(layermagn)  #layermag is right after gaussian noise
                        AggregateMagnification = AggregateMagnification * layermagn 
                        
                        if layer_index == last_layer_index:
                            #print("last layer reached, checking for correct channel output and magnification")
                            ######
                            ## SANITY CHECK 
                            #  Check the required input and output for each network type
                            #  encoder in = 1 and out = opt.No_latent_spaces
                            #  decoder in = opt.No_latent_spaces and out = grey = 1
                            #  discriminator in = opt.No_latent_spaces and out = 1 (t/f)
                            #
                            #   Check magnifications 
                            #   the aggregate magnification has to match the target magnification of the network type
                            #   else: adjust the last layer so that the aggregate reaches the necessary magnification
                            #####
                            lastlayer = Child[-1]

                            gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = lastlayer
                            
                            if self.opt.batch_size == 1:
                                batchnorm_switch = 0 #cause no batchnorm, when just having batchsize of 1

                            if EnDeDis == "encoder":  
                                #checking channels
                                channellist = [channellist[0],encoder_output]
                                print("checking magnification")
                                print(f"AggregateMagnification {AggregateMagnification}" )
                                print(f"self.encoder_magnification {self.encoder_magnification}" )
                                if AggregateMagnification == self.encoder_magnification:
                                    print("Correct magnification... pass on")
                                    pass
                                else:

                                    print("Not Correct mag, recalcing correct magnification")

                                    oldlayermagn = layermagn
                                    layermagn =  self.encoder_magnification / AggregateMagnification
                                    layermagn = str(float(layermagn) * float(oldlayermagn))
                                    print(f"oldlayermagn {oldlayermagn}" )
                                    print(f"newlayermagn {layermagn}" )

                            elif EnDeDis == "decoder":
                                #checking channels
                                channellist = [channellist[0],decoder_output]
                                #checking magnification

                                #print("AggregateMagnification", AggregateMagnification)
                                #print("self.decoder_magnification", self.decoder_magnification)

                                if AggregateMagnification == self.decoder_magnification:
                                    #print("Correct magnification... pass on")
                                    pass
                                else:
                                    #print("Not Correct mag, recalcing correct magnification")
                                    oldlayermagn = layermagn
                                    layermagn =   self.decoder_magnification / AggregateMagnification
                                    layermagn = str(float(layermagn) * float(oldlayermagn))
                                    #print("oldlayermagn", oldlayermagn)
                                    #print("newlayermagn", layermagn)
                            
                            elif EnDeDis == "discriminator":
                                #checking channels
                                channellist = [channellist[0],discriminator_output]
                                #checking the magnification is not necessary, because adaptive average pooling takes the last output and pools it to a single value ranging from 0 to 1

                            Child[last_layer_index]= gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch

                    print(f"Child Layer Description: {Child}")
                    
                    if EnDeDis == "encoder":  
                        self.Encoder_Netparameter_dict[latent_dimension][str(model_index)] = Child
                        print(f"Appending Child to Encoder Netparameter dict with latent_dim={latent_dimension} and model index {model_index} with Layerdiscription:")
                        print(Child)
                        #print(self.Encoder_Netparameter_dict)
                    elif EnDeDis == "decoder":
                        self.Decoder_Netparameter_dict[latent_dimension][str(model_index)] = Child
                        print(f"Appending Child to Decoder Netparameter dict with latent_dim={latent_dimension} and model index {model_index} with Layerdiscription:")
                        print(Child)                        
                        #print(self.Decoder_Netparameter_dict)
                    elif EnDeDis == "discriminator":
                        #print(self.Discriminator_Netparameter_dict)
                        self.Discriminator_Netparameter_dict[latent_dimension][str(model_index)] = Child
                    
                    last_model_index = model_index
                    #if Sum models are exceeding 80% of generation, break, so best old models can survive mutated
                    if model_index > int(0.8*Generation_limit):
                        print("Generation limit reached... Breaking")
                        break


                if EnDeDis == "encoder":  
                    print(f"MATING DONE, length of new ENCODER modellist with {latent_dimension} is  {len(self.Encoder_Netparameter_dict[latent_dimension])}")
                    print(f"TRY TO READ ONE LAYERDESCRIPTION {self.Encoder_Netparameter_dict[latent_dimension]['0']}")
                    #print(f"Encoder Netparameter dict {self.Encoder_Netparameter_dict}")
                elif EnDeDis == "decoder":
                    print(f"MATING DONE, length of new DECODER modellist with {latent_dimension} is  {len(self.Decoder_Netparameter_dict[latent_dimension])}")
                    #print(f"Decoder Netparameter dict {self.Decoder_Netparameter_dict}")

                elif EnDeDis == "discriminator":
                    print(f"MATING DONE, length of new DISCRIMINATOR modellist with {latent_dimension} is  {len(self.Discriminator_Netparameter_dict[latent_dimension])}")
                    #print(f"DISCRIMINATOR Netparameter dict {self.Discriminator_Netparameter_dict}")
                

                #####################################
                #       best models survive, mutate and reinitialize
                #####################################
                print("Proceed with best models' survival, mutation and reinitialization")

                parallel_layer_possibility = self.init_all_parallel_layers()
                #print("parallel layer possibilitys", parallel_layer_possibility)

                for best_model in Best_Network_Arch_list:
                    last_model_index +=1
                    mutated_model = []
                    opt= best_model['opt']
                    latent_dimension  = opt.latent_dim
                    for layer in best_model['LayerDescription']:
                        
                        gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch = layer
                        if self.opt.batch_size == 1:
                            batchnorm_switch = 0 #cause no batchnorm, when just having batchsize of 1

                        ##mutation occurs on gaussian noise, parrallel_layers, dropout/pct, batchnorm
                        gaussian_noise = random.randint(0,1)
                        parallel_layers = list(parallel_layer_possibility[random.randint(0,len(parallel_layer_possibility)-1)])
                        #print("chosen parrallel layers", parallel_layers)
                        
                        #the best models likely already have a good dropout rate, so mutate closely around the old rate
                        #dropout is [switch, rate]; the rate at index 1 is the mean of the mutation
                        mean, standard_dev = dropout[1], 0.001
                        new_dropout = [random.randint(0,1), float(np.random.normal(mean,standard_dev))]
                        #only accept the mutated dropout if its rate is a valid probability, else keep the old value
                        if 0.0 < new_dropout[1] < 1.0:
                            dropout = new_dropout
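                        #mutate the batchnorm switch that is stored in the layer, but keep it off for a batch size of 1
                        batchnorm_switch = 0 if self.opt.batch_size == 1 else random.randint(0,1)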
                        
                        mutated_model.append([gaussian_noise , layermagn, parallel_layers , channellist, dropout ,batchnorm_switch])
                    
                    print("Last_model_index"+ str(last_model_index))
                    if EnDeDis == "encoder":  
                        self.Encoder_Netparameter_dict[latent_dimension][str(last_model_index)] = mutated_model
                        print(f"MUTATE DONE, length of new modellist is  {len(self.Encoder_Netparameter_dict[latent_dimension])}")
                    elif EnDeDis == "decoder":
                        self.Decoder_Netparameter_dict[latent_dimension][str(last_model_index)] = mutated_model
                        print(f"MUTATE DONE, length of new modellist is  {len(self.Decoder_Netparameter_dict[latent_dimension])}")
                    elif EnDeDis == "discriminator":
                        self.Discriminator_Netparameter_dict[latent_dimension][str(last_model_index)] = mutated_model
                        print(f"MUTATE DONE, length of new modellist is {len(self.Discriminator_Netparameter_dict[latent_dimension])}")

                if EnDeDis == "encoder":  
                    return_dict = self.Encoder_Netparameter_dict
                    
                elif EnDeDis == "decoder":
                    return_dict = self.Decoder_Netparameter_dict
                    
                elif EnDeDis == "discriminator":
                    return_dict = self.Discriminator_Netparameter_dict


                if EnDeDis == "encoder":  
                    print(f"Mating and mutating DONE, length of new ENCODER modellist with {latent_dimension} is  {len(self.Encoder_Netparameter_dict[latent_dimension])}")
                    print(f"TRY TO READ ONE LAYERDESCRIPTION {self.Encoder_Netparameter_dict[str(latent_dimension)]['0']}")
                    #print(f"Encoder Netparameter dict {self.Encoder_Netparameter_dict}")
                elif EnDeDis == "decoder":
                    print(f"Mating and mutating DONE, length of new DECODER modellist with {latent_dimension} is  {len(self.Decoder_Netparameter_dict[latent_dimension])}")
                    #print(f"Decoder Netparameter dict {self.Decoder_Netparameter_dict}")

                elif EnDeDis == "discriminator":
                    print(f" MODEL SURVIVAL AND MATING DONE, length of new DISCRIMINATOR modellist with {latent_dimension} is  {len(self.Discriminator_Netparameter_dict[latent_dimension])}")


            except:
                PrintException()
                input("fail in assembling new models")

        return return_dict

    def get_child_arch(self, EnDeDis , latent_dimension ,Childlist_index, Netparameter_dict):
        LayerDescription = Netparameter_dict[latent_dimension][Childlist_index]

        return LayerDescription


## 4 Training, Testing and Validation

Training, testing and validation are started from the following main loop. Before training can begin, a dataset has to be created. 
Training then first searches randomly for network architectures until the minimum population is reached, after which it proceeds with the evolutionary approach.
Since this training process produces many candidate models, the models have to be tested and validated before one is chosen.


In [None]:
#MAIN

#Main program
if __name__ == '__main__':
    #torch.multiprocessing.set_start_method('spawn') # solution to start training/testing in an own process !!!!

    print("The Main Functions are... \n mkdata   to generate a dataset from pictures and...  \n gan to train/test/validate the proposed Generative Adversarial Network ")
    MainFunction = input(MainFunctionDiscription)


    if MainFunction.lower() in ("mkdataset", "mkdata"):
        print("STARTING to generate dataset!")
        opt.ProjectName = input("What's the name of your project?")
        print("Please choose the folder you want the program to work with")
        opt.DataFolder = askdirectory()
        #This program aims to take any given data as an input and pack it into a multi-parameter matrix representation to calculate the spatial box count distribution,
        #so that characterization/categorization/data search/data generation from it is possible
        DataHandler = Loader.DataHandler(opt)
        DataHandler = Loader.choose_DataFormat(DataHandler)   
        #DataHandler.precision = input("Set Precision for balancing train/test dataset: 0: No Balancing/50/50split     1...3...5:Coarse Balaning (many train/few test)      5...7...9: Fine balancing (few train/many test)   ")         
        #DataHandler = Loader.make_train_test_Datasets(DataHandler,opt)    #AND CALCS BOXCOUNTS on the run
        DataHandler = Loader.make_train_test_Datasets_multicore(DataHandler,opt)    #AND CALCS BOXCOUNTS on the run  

    if MainFunction.lower() in ("fractalgan", "gan"):
        print("STARTING FRACTALGAN !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
        opt.TrainOrTest = input("Do you want to train or test the model(train/test/val): ")
        opt.ProjectName = input("What's the name of your project?")
        try:
            #os.path.join picks the right separator per platform; the trailing "" keeps the trailing separator
            modelpath = os.path.join(FileParentPath, "models", "GAN", opt.ProjectName, "")
            print(f"set modelpath to {modelpath}")
            #create missing parent folders too and do not fail if the folder already exists
            os.makedirs(modelpath, exist_ok=True)
        except Exception:
            PrintException()

        FractalGAN_Worker(opt)


### 4.1 Dataset generation

A dataset is generated by the self-written ‘loader’ module. It creates a ‘datahandler’ object that holds all necessary information about the data and the dataset. After executing the main loop and choosing to make a new dataset, the project name has to be specified, which in this case is "mnist". The source folder is then selected in the file dialog that opens; choosing a folder with many thousands of files can take a moment. In the future, this algorithm should be able to handle more than just pictures, but at this point the data type “pictures” has to be chosen. If a dataset with the same project name already exists, the user then decides whether the old one is deleted. 
The generated dataset is a custom PyTorch dataset with multiple labels. The labels are the box count ratios and lacunarities for multiple box sizes, derived from [20], and are calculated for each image. The image and label arrays are normalized to the range 0 to 1 and split into a training and a testing dataset. The testing dataset is necessary to detect memorization of the training dataset by evaluating the model's performance on previously unseen data.
Each CPU core processes an individual file, so the full potential of a modern multi-core CPU is utilized. 
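
As a rough illustration of such a label, the following sketch computes a box count ratio for several box sizes. It is a minimal pure-Python sketch under assumptions, not the ‘loader’ module's implementation: the function name `box_count_ratio`, the occupancy threshold and the definition of the ratio as occupied boxes divided by all boxes are hypothetical choices made for this example.

```python
def box_count_ratio(image, box_size, threshold=0.0):
    """Fraction of box_size x box_size tiles holding at least one pixel above threshold.

    `image` is a list of rows (lists of floats). Rows and columns that do not
    fill a complete tile are ignored, which is an assumption of this sketch.
    """
    height = len(image) - len(image) % box_size
    width = len(image[0]) - len(image[0]) % box_size
    occupied = 0
    total = 0
    for top in range(0, height, box_size):
        for left in range(0, width, box_size):
            total += 1
            if any(image[y][x] > threshold
                   for y in range(top, top + box_size)
                   for x in range(left, left + box_size)):
                occupied += 1
    return occupied / total


# An 8x8 test image with two bright pixels.
image = [[0.0] * 8 for _ in range(8)]
image[0][0] = 1.0
image[5][6] = 1.0

# One ratio per box size; together they form a small multi-scale label vector.
ratios = [box_count_ratio(image, size) for size in (2, 4, 8)]
print(ratios)
```

Because each ratio is a fraction of tiles, the resulting label vector already lies between 0 and 1, which matches the normalized data range described above.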


### 4.2 Training

By executing the main loop and passing “gan” and “train” to the input prompts, training begins by starting the “FractalGAN_Worker”. There, the ‘datahandler’ from the ‘loader’ module selects the previously built dataset to work with. The worker calls the ‘begin_training’ function, which initiates the hyperparameter optimization handled by the hyperopt module. It chooses the learning rate, the decay rates of the optimizer's first and second moment estimates, and the size of the latent dimension. The hyperparameter optimization saves its results in a trials object, so a previous training session is resumed if a trials object is found; otherwise a new trials object is generated. Training begins immediately after all necessary parameters have been passed to the initialization loop. There, random network architectures are generated as long as the current number of trials is lower than the minimum population of, for example, 100. Once the minimum population of network architectures is reached, evolutionary network generation is executed as described in chapter 3.4.6.4. 
The inpainting class is also set up and has to be configured according to the user's intent. By setting the “noise bool” variable to “True” and the standard deviation of the gaussian noise to '0.01', the incoming images and their auxiliary data are corrupted by gaussian noise, and the networks have to remove this noise from the analyzed pictures. Since this is the main function demonstrated by this work, the upscaling and masking features have to be set to ‘False’.
When the encoder, decoder, discriminator and inpainting class have been initialized successfully, training begins by executing the ‘TrainGAN’ function. This function sets up all the necessary losses and sends them, together with the encoder, decoder, discriminator and inpainting networks, to the GPU’s memory, if possible. 
Then the Adam optimizers are initialized, which adjust the weights and biases of the networks. There is one optimizer for the encoder and decoder networks together, also known as the autoencoder, one optimizer for the discriminator network, and one optimizer just for the encoder network, which is used when an adversarial autoencoder is trained. 

After the initialization procedure the main training loop can begin by iterating over the training dataset. The hyperparameters and many other options can be changed while training by changing the specified values within the ‘config.py’ file, which is located in the working directory of the code. 

The images and the fractal arrays, which consist of spatial box count and lacunarity arrays, are cast into tensors and sent to the GPU as well. 
The adversarial ground truths, which mark a sample as valid or fake, are created in memory: the valid tensor is set to one and the fake tensor to zero. These ground truths are needed to calculate the adversarial loss, which is the binary cross-entropy between the value predicted by the discriminator network and the valid or fake tensor. The closer the discriminator's prediction for a valid sample is to one, the closer this loss is to zero.
Before the data can be passed through the encoder and decoder networks, the inpainting network has to alter the input data by adding noise.
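
The behaviour of the adversarial loss against the valid and fake ground truths can be checked with a few numbers. The sketch below evaluates the binary cross-entropy formula for single scalar predictions in plain Python; the helper name `binary_cross_entropy` and the clamping constant `eps` are assumptions of this example, and the training code itself uses `torch.nn.BCELoss` on tensors instead.

```python
import math


def binary_cross_entropy(prediction, target, eps=1e-12):
    """Binary cross-entropy for one prediction in (0, 1) and one target in {0, 1}."""
    # Clamp the prediction so that log() never receives exactly 0 (assumption of this sketch).
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


valid, fake = 1.0, 0.0  # adversarial ground truths

loss_confident_right = binary_cross_entropy(0.99, valid)  # near zero
loss_unsure = binary_cross_entropy(0.5, valid)            # equals ln(2)
loss_confident_wrong = binary_cross_entropy(0.99, fake)   # large penalty

print(loss_confident_right, loss_unsure, loss_confident_wrong)
```

The loss is not a plain difference: it grows logarithmically, so a confident wrong prediction is penalized far more strongly than an undecided one.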

Now the training of the encoder and decoder network begins by passing the data through the encoder network resulting in the gaussian distributed latent variable. This encoded data is passed to the decoder network to reconstruct the original image without the added noise.
The pixel wise loss is calculated resulting in the so-called reconstruction error. Then backpropagation happens through the decoder and encoder network and the weights and biases of these networks are adjusted by the autoencoder optimizer.
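
To make the reconstruction error concrete, the following sketch corrupts a tiny "image" with gaussian noise and measures the pixel-wise L1 distance to the clean target. It is a plain-Python illustration under assumptions: the helper `l1_loss`, the six-pixel image and the clamping to the range 0 to 1 are hypothetical, while the training code uses `torch.nn.L1Loss` on tensors.

```python
import random


def l1_loss(prediction, target):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(p - t) for p, t in zip(prediction, target)) / len(target)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5]

# Corrupt the input with gaussian noise (std = 0.01); clamping to [0, 1] is an assumption.
noisy = [min(1.0, max(0.0, pixel + rng.gauss(0.0, 0.01))) for pixel in clean]

loss_perfect = l1_loss(clean, clean)    # a reconstruction that removed all noise
loss_untouched = l1_loss(noisy, clean)  # a reconstruction that merely copies the noisy input

print(loss_perfect, loss_untouched)
```

The target of the loss is the clean image, not the corrupted input, so the autoencoder is only rewarded when it actually removes the noise.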

After the data has passed through the encoder and decoder networks, the discriminator is fed with the encoded latent variable from the encoder network to predict the validity of the encoded image. To get the undisturbed latent variable from the encoder, the encoder is set to evaluation mode, which disables dropout; its weights and biases stay unchanged in this step because only the discriminator is optimized. The encoder network acts as the generator for the discriminator network and produces the fake latent distribution. The desired, or real, distribution is a sampled gaussian distribution. The discriminator network tries to differentiate between the real and the fake distribution. A real and a fake loss are calculated from its predictions and used to optimize the discriminator network. 
Now the encoder network is switched back from evaluation mode to training mode, and the data passes through it and the discriminator network once more. The generator loss is the binary cross-entropy between the discriminator's prediction and the valid tensor, and it is backpropagated. The generator optimizer adjusts the encoder's weights so that the encoder output fits the given gaussian distribution and fools the discriminator. If this succeeds, the encoder maps the input data to the desired distribution and the discriminator becomes more likely to classify the encoder's fake distribution as a real one. This last step ensures that the latent space distribution has a specified form, enabling latent space analysis as shown in [29].

To control the training loop and decide when to stop training, a trailing loss is implemented, which is initialized with a high value and decays with each processed batch of images. The trailing loss is compared to the summed loss of the encoder, decoder and discriminator, and if the trailing loss falls below the summed loss, the training loop breaks. 
The trailing loss decays faster in the population initialization phase; in the evolutionary phase it decays more slowly when the model loss is low. 
This was chosen because, during population initialization, the hyperparameter optimization is still guessing values randomly; capping the training time for each model allows many models to be trained, which yields many samples for suggesting the optimal hyperparameters.
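
The stopping rule can be sketched in a few lines. This is a minimal illustration under assumptions, not the logic of ‘TrainGAN’ itself: the function name `batches_until_stop`, the multiplicative decay and the decay factors 0.8 and 0.95 are hypothetical, and only the starting value of 10.0 mirrors the `TrailingLoss` initialization in the training code.

```python
def batches_until_stop(summed_losses, trailing_loss=10.0, decay=0.9):
    """Return the batch index at which training stops, or None if it never stops.

    The trailing loss shrinks by the factor `decay` after every batch; training
    stops as soon as it drops below the summed model loss of that batch.
    """
    for batch_index, summed_loss in enumerate(summed_losses):
        trailing_loss *= decay
        if trailing_loss < summed_loss:
            return batch_index
    return None


# A model whose summed loss improves quickly and then plateaus at 2.5.
losses = [5.0, 4.0, 3.0] + [2.5] * 40

fast_stop = batches_until_stop(losses, decay=0.8)   # population initialization: quick cut-off
slow_stop = batches_until_stop(losses, decay=0.95)  # evolutionary phase: more patience

print(fast_stop, slow_stop)
```

A faster decay ends training of a plateauing model after only a handful of batches, which is what allows many architectures to be sampled cheaply during population initialization.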

Training can continue until the models reach the accuracy the user requires.

In [None]:
def init_GAN(all_ness_params):
    global opt

    opt, Dataset, DataLoader, BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss  , HyperParameterspace  =  all_ness_params 
    if opt.autoencoder == "off":
        pass

    ################################################################################################
    ###  EVOLUTIONARY/RANDOM NETWORK GENERATOR
    ################################################################################################

    NetGenParameters = {'img_size':opt.img_size[0],  'latent_dim':opt.latent_dim, 'Max_Lenght': opt.Max_Lenght, 'Max_parallel_layers': opt.Max_parallel_layers, 'opt':opt }
    netgen = Network_Generator(NetGenParameters)
    opt.latent_dim_x = int((opt.img_size[1]/ opt.img_size[0]) * opt.latent_dim)
    print("opt.latent_dim_x is  ",opt.latent_dim_x)

    ########################################################################################
    #####################                   Inpainting Layers          #####################  
    print("Generating Masking Layers")
    print("image size is", opt.img_size)
        
    opt.noisebool = True
    opt.std = 0.01      #hard disturbance
    #opt.std = 0.001     #moderate disturbance
    #opt.std = 0.0001    #light disturbance
    opt.std_decay_rate = 0 

    # if maskbool is None, Random masking is applied
    opt.maskbool = False
    #maskmean       x                       Y
    opt.maskmean = opt.img_size[1]/2 , opt.img_size[0]/2    #just the center for exploring  
    #maskmean = 50, 33
    #               x. Y                    
    opt.maskdimension = 50,25

    opt.LetterboxBool = False
    opt.LetterboxHeight = 30

    opt.PillarboxBool = False
    opt.PillarboxWidth = 10
    
    InpaintingParameters = {
        'opt': opt,
        'superresolution': (opt.superres, 2),
        'noise': (opt.noisebool, opt.std, opt.std_decay_rate),
        'mask': (opt.maskbool, opt.maskmean , opt.maskdimension),
        'Letterbox': (opt.LetterboxBool, opt.LetterboxHeight),
        'Pillarbox': (opt.PillarboxBool, opt.PillarboxWidth),

    }

    inpainting = Inpainting(InpaintingParameters)



    ########################################################################################
    #####################                   ENCODER                    #####################
    ########################################################################################    

    ########################################################################################    
    #       EVOLUTIONAL GENERATION OF ENCODER NETWORK
    ########################################################################################    
    #opt.Population_Init_Threshold = 200
    #Generation_limit = 100  #after x Child Models new mating begins to crossover the last generation with the recent trained ones.
    Generation_limit = opt.Population_Init_Threshold *2
    print(f"Trials at {opt.max_trials} and chosen Population Init threshold is {opt.Population_Init_Threshold}")
    #First time the Population init threshold is reached, mating will be initialized, so every model & Fitness is extracted from the previously calced models
    if opt.max_trials >= opt.Population_Init_Threshold and opt.first_time == True:
        print("INIT MATING ENCODER NOW")
        time.sleep(1)
        netgen.init_mating(opt)
        Encoder_Netparameter_dict = netgen.generate_children_from_parents("encoder",Generation_limit,opt)
        opt.Enc_Child_index = 0
        #first_time = False juist after discriminator
        #input("CHECK IF ENCODER ARE INITATED")
    
    elif opt.max_trials <= opt.Population_Init_Threshold:     #Population Init
        Valid_mag_train = netgen.init_all_magnification_trains('encoder')
        #print("All valid Valid_mag_train combos are")
        #print(Valid_mag_train)
        Valid_parallel_layer_train = netgen.init_all_parallel_layers()
        #print("All valid Valid_parallel_layer_train combos are",Valid_parallel_layer_train)
        LayerDescription = netgen.generate_random_Net("encoder",opt.No_latent_spaces, Valid_mag_train,Valid_parallel_layer_train )

    if opt.max_trials >= opt.Population_Init_Threshold:   # Crossover
        chosen_latent_spaces_ori = ['2', '4', '8', '16', '32', '64', '128']
        # make sure, that latent spaces are always tinier than input, else there is no compression 
        latent_dimensions = [x for x in chosen_latent_spaces_ori if int(x) < opt.img_size[0]]

        for latent_dimension in latent_dimensions:
            print(f"For {latent_dimension} there are,{len(Encoder_Netparameter_dict[latent_dimension])} encoder net entrys")
        
        if opt.first_time_choosing_latent_size == True:
            while True:         
                try:
                    opt.latent_dim = int(input(f"Choose latent dimension from {latent_dimensions}"))
                    opt.latent_dim_x = int((opt.img_size[1]/ opt.img_size[0]) * opt.latent_dim)

                    opt.first_time_choosing_latent_size = False
                    break
                except:
                    PrintException()
                    continue

        found = False  
        
        while found == False:    
            try:
                print(f"Try to get a child arch with for ENCODER with latent dim of{opt.latent_dim} and Index {opt.Enc_Child_index} with {len(Encoder_Netparameter_dict[str(opt.latent_dim)])}")
                LayerDescription = netgen.get_child_arch("encoder",str(opt.latent_dim), str(opt.Enc_Child_index),Encoder_Netparameter_dict)
                print("Entry found. continue with NET INIT")
                chosen_latent_dim = int(opt.latent_dim)
                print(f"Chosen latent dimension is {chosen_latent_dim}")
                found = True
                break
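            except Exception:
                PrintException()
                opt.Enc_Child_index += 1
                #compare against the dict of the chosen latent dimension, not the leftover loop variable from above
                if opt.Enc_Child_index >= len(Encoder_Netparameter_dict.get(str(opt.latent_dim), {})):
                    print("End of Encoder Netparameter dict reached, exit")
                    break
                continue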
    
    print("ENCODER NETWORK LAYERDISCRIPTION")
    print(LayerDescription) 

    try:
        start_dim = chosen_latent_dim #evogen
        print(f"Overwrote latent dimension with {chosen_latent_dim}")
    except NameError:
        #chosen_latent_dim only exists after the crossover branch ran
        start_dim = opt.latent_dim #randomgen

    end_dim = opt.img_size[0]   
    magnification = end_dim / start_dim  #e.g. 128 / 32 = 4 per image side, i.e. (128*128)/(32*32) = 16x in pixel count

    input_shape = (opt.batch_size,1, opt.img_size[0],opt.img_size[1])
    NetParameters = {'LayerDescription': LayerDescription, 'input_shape': input_shape, 'SpacialBoxcounting':BoxcountEncoder, 'magnification':magnification, 'opt':opt, 'device': device}

    #Init Encoder
    encoder = Encoder(NetParameters)
    count_parameters(encoder)
    
    EncoderNetParameters = NetParameters
    #del EncoderNetParameters['opt']['Encoder_Netparameter_dict']
    #print(EncoderNetParameters)
    ########################################################################################
    #####################               DECODER/GENERATOR              #####################
    ########################################################################################   

    try:
        start_dim = chosen_latent_dim #evogen
    except NameError:
        start_dim = opt.latent_dim #randomgen

    end_dim = opt.img_size[0]
    magnification = end_dim / start_dim  #e.g. 128 / 32 = 4 per image side, i.e. (128*128)/(32*32) = 16x in pixel count


    ########################################################################################   
    #                       EVOLUTIONAL GENERATION OF DECODER NETWORK
    ########################################################################################   
    #First time the Population init threshold is reached, mating will be initialized, so every model & Fitness is extracted from the previously calced models
    if opt.max_trials >= opt.Population_Init_Threshold and opt.first_time == True:
        print("INIT MATING DECODER NETORKS NOW")
        Decoder_Netparameter_dict = netgen.generate_children_from_parents("decoder",Generation_limit,opt)
        opt.Dec_Child_index = 0
        if opt.autoencoder == "on":
            opt.first_time = False

    elif opt.max_trials <= opt.Population_Init_Threshold:     #Population Init
        Valid_mag_train = netgen.init_all_magnification_trains('decoder')
        #print(f"All valid Valid_mag_train combos are {Valid_mag_train}")
        Valid_parallel_layer_train = netgen.init_all_parallel_layers()
        #print(f"All valid Valid_parallel_layer_train combos are {Valid_parallel_layer_train}")
        LayerDescription = netgen.generate_random_Net("decoder",opt.No_latent_spaces, Valid_mag_train,Valid_parallel_layer_train )
        if opt.autoencoder == "on":
            opt.Population_Init_Threshold += Generation_limit

    if opt.max_trials >= opt.Population_Init_Threshold:   # Crossover
        print(f"Number of entries in decoder netparameter dict with latent dim of {chosen_latent_dim}: {len(Decoder_Netparameter_dict[str(chosen_latent_dim)])}")
        print(Decoder_Netparameter_dict[str(chosen_latent_dim)])
        while True:
            try:
                print(f"Try to get ChildArch from Generated Dict with latent_dim={chosen_latent_dim} and model index {opt.Dec_Child_index} with Layerdiscription:")
                LayerDescription = netgen.get_child_arch("decoder",str(chosen_latent_dim), str(opt.Dec_Child_index), Decoder_Netparameter_dict)
                print("Entry found. continue with NET INIT")
                break
            except:
                opt.Dec_Child_index += 1
                print("THIS IS THE WHOLE DECODER NETPARAMETER DICT")
                print(Decoder_Netparameter_dict[str(chosen_latent_dim)])
                input(f"no model found with searched key, press key to continue")
                PrintException()
                if opt.Dec_Child_index >= Generation_limit:
                    print("no model found")
                    break

    print("DECODER NETWORK LAYERDISCRIPTION")
    print(LayerDescription) 


    input_shape = (opt.batch_size,1, opt.img_size[0],opt.img_size[1])
    NetParameters = {'LayerDescription': LayerDescription, 'input_shape': input_shape,'magnification': magnification, 'opt':opt, 'No_latent_spaces': opt.No_latent_spaces}

    # Initialize Decoder
    decoder = Decoder(NetParameters)
    count_parameters(decoder)
    DecoderNetParameters = NetParameters


    ########################################################################################
    #####################            DISCRIMINATOR                  #####################
    ########################################################################################   
    if opt.autoencoder == "off":
        ########################################################################################   
        #       EVOLUTIONAL GENERATION OF DISCRIMINATOR NETWORK
        ########################################################################################   
        #First time the Population init threshold is reached, mating will be initialized, so every model & Fitness is extracted from the previously calced models
        if opt.max_trials >= opt.Population_Init_Threshold and opt.first_time == True:
            print("INIT MATING NOW")
            Discriminator_Netparameter_dict = netgen.generate_children_from_parents("discriminator",Generation_limit,opt)
            opt.Dis_Child_index = 0
            #this block only runs with opt.autoencoder == "off", so no further check is needed
            opt.first_time = False
            opt.Population_Init_Threshold += Generation_limit
        elif opt.max_trials <= opt.Population_Init_Threshold:     #Population Init
            Valid_mag_train = netgen.init_all_magnification_trains('discriminator')
            #print(f"All valid Valid_mag_train combos are {Valid_mag_train}")
            Valid_parallel_layer_train = netgen.init_all_parallel_layers()
            #print(f"All valid Valid_parallel_layer_train combos are {Valid_parallel_layer_train}")
            LayerDescription = netgen.generate_random_Net("discriminator",opt.No_latent_spaces, Valid_mag_train,Valid_parallel_layer_train )

        if opt.max_trials >= opt.Population_Init_Threshold:   # Crossover
            while True:
                try:
                    LayerDescription = netgen.get_child_arch("discriminator",str(chosen_latent_dim), str(opt.Dis_Child_index),Discriminator_Netparameter_dict)
                    print("Entry found. continue with NET INIT")
                    break
                except:
                    PrintException()
                    opt.Dis_Child_index += 1
                    if opt.Dis_Child_index >= Generation_limit:
                        print("no model found")
                        break

        print("DISCRIMINATOR CHILD LAYERDISCRIPTION", LayerDescription)
        
        input_shape = (opt.batch_size, opt.latent_dim**2)
        NetParameters = {'LayerDescription': LayerDescription, 'input_shape': input_shape, 'No_latent_spaces':opt.No_latent_spaces, 'opt':opt,  'device': device}

        #INIT DIS NETWORK
        discriminator = Discriminator(NetParameters)
        count_parameters(discriminator)
        DiscriminatorNetParameters = NetParameters

    else:
        discriminator = None
        DiscriminatorNetParameters = None


    #HYPERPARMS INIT END<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    return opt, inpainting, InpaintingParameters, encoder,EncoderNetParameters,  decoder, DecoderNetParameters, discriminator, DiscriminatorNetParameters




# -------------------------------------------------------------------------------------
#  Training loop
# -------------------------------------------------------------------------------------
def TrainGAN(opt,trainDataloader, inpainting ,  encoder, decoder, discriminator , BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss):
    
    ########################################################################################
    #####################               LOSSES                         #####################
    if opt.autoencoder == "off":
        adversarial_loss = torch.nn.BCELoss() # Use binary cross-entropy loss
    pixelwise_loss = torch.nn.L1Loss()

    ########################################################################################
    #####################               Send to device                 #####################   

    if device == "cuda":
        try:
            BoxcountEncoder.cuda()
        except:
            pass
        try:
            inpainting.cuda()
        except:
            PrintException()
            pass
        encoder.cuda()
        decoder.cuda()
        if opt.autoencoder == "off":
            discriminator.cuda()
            adversarial_loss.cuda()
        pixelwise_loss.cuda()

    ########################################################################################
    #####################               Optimizers                     #####################
    optimizer_AE = torch.optim.Adam(itertools.chain(encoder.parameters(), decoder.parameters()), lr=opt.lr, betas=(opt.b1, opt.b2))
    if opt.autoencoder == "off":
        optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))
        optimizer_G = torch.optim.Adam(encoder.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))
    Tensor = torch.cuda.FloatTensor if device == "cuda" else torch.FloatTensor

    Loss_Now = 100000.0 #Starting loss just to ensure, that first model is best after 1st training
    Dis_Loss = 100000.0
    TrailingLoss = 10.0
    LossLastRound = 100.00
    opt.max_happens = 500
    Happend_counter = 1
    config_counter = 0
    opt.breaker = False
    import config
    dataloaderlenght = len(trainDataloader)

    for epoch in range(opt.n_epochs):
        printcounter = 0
        for i, BatchToBePredicted in enumerate(trainDataloader):
            start = time.perf_counter()
            display = False
            config_counter += 1
            if config_counter >= 50:
                config = reload(config) #to alter the configs on the fly while training/testing and importing here to always import changes made to the config.py file
                config_file = config.config_file()
                opt = config_file.set_opt_parameters(config_file.ON_OFF_Switch,opt)
                config_counter = 0

            if opt.breaker:
                #stop training if the user sets opt.breaker = True in config.py
                break

            
            imgs, BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16 = BatchToBePredicted  #unpacking images and fractal arrays

            #convert to float tensors; .type(Tensor) already casts the maps to float and places them on the device (ram/vram)
            real_labels_2 = Variable(BCR_LAK_map_2.type(Tensor))
            real_labels_2.to(device)

            real_labels_4 = Variable(BCR_LAK_map_4.type(Tensor))
            real_labels_4.to(device)

            real_labels_8 = Variable(BCR_LAK_map_8.type(Tensor))
            real_labels_8.to(device)

            real_labels_16 = Variable(BCR_LAK_map_16.type(Tensor))
            real_labels_16.to(device)

            # Configure input
            real_imgs = Variable(imgs.type(Tensor))
            real_imgs.to(device)

            #Data on which the pixel loss is calculated, because real_imgs are altered by the inpainting module
            ori_imgs = Variable(imgs.type(Tensor))
            ori_imgs.to(device)

            # Adversarial ground truths
            valid = Variable(Tensor(imgs.shape[0], 1).fill_(1.0), requires_grad=False)
            valid.to(device)
            fake = Variable(Tensor(imgs.shape[0], 1).fill_(0.0), requires_grad=False)
            fake.to(device)

            # -----------------
            #  Train Generator
            # -----------------
            optimizer_AE.zero_grad()

            #when random mask mode is enabled and the UpdateMaskEvery limit is reached, choose a new random mask
            if printcounter == opt.UpdateMaskEvery and inpainting.randdommask == True:
                # -----------------
                #  UPDATE INPAINTING
                # -----------------

                opt.maskmean = (torch.randint(50,150,(1,),device=torch.device(opt.device))[0]  , torch.randint(50,150,(1,),device=torch.device(opt.device))[0]  )
                opt.maskdimension = (torch.randint(1,50,(1,),device=torch.device(opt.device))[0]  , torch.randint(1,50,(1,),device=torch.device(opt.device))[0]  )
                print("Reinit inpainting with  " ,opt.maskmean, opt.maskdimension)
                #ReInitilization of Inpainting layer
                InpaintingParameters = {
                    'opt': opt,
                    'superresolution': (opt.superres, 2),
                    'noise': (opt.noisebool, opt.std, opt.std_decay_rate),
                    'mask': (opt.maskbool, opt.maskmean , opt.maskdimension),
                    'Letterbox': (opt.LetterboxBool, opt.LetterboxHeight),
                    'Pillarbox': (opt.PillarboxBool, opt.PillarboxWidth),
                }
                inpainting = Inpainting(InpaintingParameters)
                printcounter = 0 #Reset Printcounter

            batches_done = epoch * dataloaderlenght + i

            if batches_done % opt.sample_interval == 0:
                input_imgs = real_imgs[:,0,:,:].detach().cpu().numpy() 

            real_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16 = inpainting(real_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16)
            printcounter +=1
            
            start_autoencoder = time.perf_counter()
            encoded_imgs = encoder(real_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16)
            end_encoder = time.perf_counter()

            decoded_imgs = decoder(encoded_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16 )
            end_autoencoder = time.perf_counter()

            # ---------------------
            #  Train Discriminator
            # ---------------------
            if opt.autoencoder == "off":
                start_discriminator = time.perf_counter()
                
                predicted = discriminator(encoded_imgs)
                ########################################################################################
                #####################     Calculate Losses and Backpropagation     #####################  
                # Loss measures generator's ability to fool the discriminator
                discr_loss = 0.001 * adversarial_loss(predicted, valid) 
                Pixelloss =  0.999 * pixelwise_loss(decoded_imgs, ori_imgs)  #compare against ori_imgs, because real_imgs were altered by the inpainting module
                AE_loss = discr_loss + Pixelloss
                AE_loss[torch.isnan(AE_loss)] = 1.0
                AE_loss.backward()
                AutoEncoder_loss = float(AE_loss.item())
                optimizer_AE.step()

                encoder.eval()  # to disable dropout and fix the encoder network
                optimizer_D.zero_grad()

                # Sample noise as fake latent variable
                z = Variable(Tensor(np.random.normal(0, 1, (imgs.shape[0], opt.latent_dim*opt.latent_dim_x ))))
                z_fake = encoder(real_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16)
                # Measure discriminator's ability to classify real from generated samples
                real_loss = adversarial_loss(discriminator(z), valid)
                fake_loss = adversarial_loss(discriminator(z_fake), fake)
                
                d_loss = 0.5 * (real_loss + fake_loss)
                d_loss.backward()
                optimizer_D.step()

                encoder.train()
                optimizer_G.zero_grad()
                #New fake latent space with backprop of generator(encoder) to match the gaussian distribution
                z_fake = encoder(real_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16)
                g_loss = adversarial_loss(discriminator(z_fake),valid)
                g_loss.backward()
                optimizer_G.step()
                gen_loss = float(g_loss.item())

                if float(d_loss.item()) == 50.0:
                    print("d_loss is 50 and AE_loss should be nan...sleep 2 and break")
                    time.sleep(2)
                    Loss_Now = 100000.0 -float(i)
                    break
                else:
                    Loss_Now = float(AE_loss.item()) + float(d_loss.item())

                Dis_Loss = float(d_loss.item())
                
                LossThisRound = Dis_Loss + AutoEncoder_loss

                if round(LossThisRound,5) == round(LossLastRound,5):
                    Happend_counter +=1
                    if Happend_counter >=opt.max_happens:
                        #if the loss stayed constant opt.max_happens times, then break
                        print("Non Decreasing/Constant Discriminator Loss happend ", Happend_counter)
                        print("Last trailing Loss: "+ str(TrailingLoss)+ " Constant Discriminator loss happend "+str(Happend_counter)+"/"+str(opt.max_happens))
                        break
                    else:
                        #print("Loss Converging")
                        pass

                print(
                    "[Epoch %d/%d] [Batch %d/%d] [D loss: %f] [AE loss: %f] [G loss: %f]"
                    % (epoch, opt.n_epochs, i, dataloaderlenght, Dis_Loss, AutoEncoder_loss,gen_loss)
                )

            else:
                ########################################################################################
                #####################     Calculate Losses and Backpropagation     ##################### 
                AE_loss = pixelwise_loss(decoded_imgs, ori_imgs)  #compare against ori_imgs, because real_imgs were altered by inpainting class
                AE_loss[torch.isnan(AE_loss)] = 1.0
                AE_loss.backward()
                optimizer_AE.step()
                AutoEncoder_loss = float(AE_loss.item())                
                LossThisRound = AutoEncoder_loss
                print(
                    "[Epoch %d/%d] [Batch %d/%d]  [AE loss: %f]"
                    % (epoch, opt.n_epochs, i, dataloaderlenght, AutoEncoder_loss))

                print("Last trailing Loss: "+ str(TrailingLoss))
                        
            if opt.CurrentTrial <= opt.Population_Init_Threshold:
                TrailingLoss = np.multiply(TrailingLoss, 0.999)
            elif  opt.CurrentTrial <= opt.Population_Init_Threshold + 20:
                TrailingLoss = np.multiply(TrailingLoss, 0.99999)  
            elif  opt.CurrentTrial <= opt.Population_Init_Threshold + 50:
                TrailingLoss = np.multiply(TrailingLoss, 0.999999) 
            else:    
                scaling = 0.001
                multiplicator = 1.0 - (AutoEncoder_loss * scaling)
                TrailingLoss = np.multiply(TrailingLoss, multiplicator)  
        
            end = time.perf_counter()
            
            print(f"Lapse time:{round(end-start,6)}s   Encoder time: {round(end_encoder - start_autoencoder,6)}s    Generator/Decoder time: {round(end_autoencoder - end_encoder,6)} ")

            LossLastRound = LossThisRound
            if LossThisRound >= TrailingLoss:
                #If the loss this round is higher than the decaying trailing loss, then break training, because the model isn't going anywhere 
                print("Breaking, because the model doesn't converge anymore, but please check anyway")
                time.sleep(1)
                break

    ### SAVE MODEL IF its better than before
    if previous_Best_Loss is None:
        try:
            previous_Best_Loss = AutoEncoder_loss + Dis_Loss
        except:
            previous_Best_Loss = AutoEncoder_loss 

    try:
        Loss_Now = AutoEncoder_loss + Dis_Loss
    except:
        Loss_Now = AutoEncoder_loss

    print("Best loss so far  :", previous_Best_Loss)
    print("loss of this model:", Loss_Now)

    return Loss_Now, AutoEncoder_loss, previous_Best_Loss, Dis_Loss, encoder, decoder, discriminator

#Function to pass optimizing hyperparameters to TrainGAN function, for evaluating fitness and saving conditions
def TrainGAN_with(all_ness_params):
    global opt
    opt, Dataset, trainDataloader, BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss  , HyperParameterspace  =  all_ness_params 

    print("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<HYPERPARAMETERS>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
    opt.n_epochs =  HyperParameterspace['n_epochs']
    opt.lr =  HyperParameterspace['lr']
    opt.b1 =  HyperParameterspace['b1']
    opt.b2 =  HyperParameterspace['b2']

    if opt.max_trials <= opt.Population_Init_Threshold :
        print("Altering latent dim through hyperopt. Should only happen, when in pop init phase")
        opt.latent_dim =  HyperParameterspace['latent_dim']
    
    opt.latent_dim_x = int((opt.img_size[1]/ opt.img_size[0]) * opt.latent_dim)
    
    print("opt.lr", opt.lr)
    print("opt.b1",opt.b1)
    print("opt.b2", opt.b2)
    print("opt.latent_dim", opt.latent_dim)
    print("opt.latent_dim_x is  ",opt.latent_dim_x)
    print("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<HYPERPARAMETERS>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
    
    Not_Working = 0
    
    opt, inpainting, InpaintingParameters ,encoder,EncoderNetParameters,  decoder, DecoderNetParameters, discriminator, DiscriminatorNetParameters  = init_GAN(all_ness_params)
    
    while True:
        try:
            Loss_Now, AE_loss, previous_Best_Loss,Dis_Loss,  encoder, decoder, discriminator = TrainGAN(opt,trainDataloader,inpainting , encoder, decoder, discriminator , BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss)
            break
        except:
            PrintException()
            print("Model could not be initialized, sleep 1s and try again with next child arch")
            print("Child_index "+ str(opt.Enc_Child_index ))
            opt.Enc_Child_index += 1
            opt.Dec_Child_index += 1
            opt.Dis_Child_index += 1
            try:    
                print("Clear up memory by deleting the models from vram")
                del encoder
                del decoder
                if opt.autoencoder == "off":
                    del discriminator
                del inpainting
            except:
                pass

            torch.cuda.empty_cache()

            opt, inpainting, InpaintingParameters ,encoder,EncoderNetParameters,  decoder, DecoderNetParameters, discriminator, DiscriminatorNetParameters  = init_GAN(all_ness_params)
            
            Not_Working += 1
            if Not_Working >= 1000:
                PrintException()
                raise Exception("INIT or Training not possible")

            continue
        
    # save all models, when better model was found with LOWER LOSS 
    if Loss_Now <= previous_Best_Loss :
        saveplace = FileParentPath
        saveplace +="/models/"
        saveplace +="/GAN/"
        saveplace += opt.ProjectName +"/"
        saveplace += str(int(time.time())) + "_"        #to append something unique to filename preventing overwriting

        print(EncoderNetParameters)
        print("!!!!!!!!!!!!!!!!!!!!!!!!!!")
        print(EncoderNetParameters['LayerDescription'])
        #print(len(EncoderNetParameters['LayerDescription']))

        #ENCODER########################################
        NetParametersSaveplace = saveplace+"Loss" + str(round(AE_loss,6)) +"---_ENCODER" +".netparams"
        with open(NetParametersSaveplace, "wb") as f:
            pickle.dump(EncoderNetParameters, f)
        
        stateDictSaveplace = saveplace+"Loss" + str(round(AE_loss,6)) +"---_ENCODER" +".model"
        torch.save(encoder.state_dict(), stateDictSaveplace)

        #DECODER########################################
        NetParametersSaveplace = saveplace+"Loss" + str(round(AE_loss,6)) +"---_DECODER" +".netparams"
        with open(NetParametersSaveplace, "wb") as f:
            pickle.dump(DecoderNetParameters, f)
        
        stateDictSaveplace = saveplace + "Loss" + str(round(AE_loss,6)) +"---_DECODER.model"
        torch.save(decoder.state_dict(), stateDictSaveplace)

        if opt.autoencoder == "off":
            #Discriminator#####################################
            NetParametersSaveplace =  saveplace+"Loss" + str(round(Dis_Loss,6)) +"---_DISCRIMINATOR" +".netparams"
            with open(NetParametersSaveplace, "wb") as f:
                pickle.dump(DiscriminatorNetParameters, f)
            
            stateDictSaveplace =   saveplace +"Loss" + str(round(Dis_Loss,6)) +"---_DISCRIMINATOR.model"
            torch.save(discriminator.state_dict(), stateDictSaveplace)

        print("Model Saved")
        previous_Best_Loss = Loss_Now

    else:
        print("Loss was higher/worse than previous best model")

    return {'loss': Loss_Now, 'status': STATUS_OK}


#necessary function to load a previously saved hyperparameter optimization trials object and to initiate the training procedure
def begin_training(all_ness_params):
    opt, Dataset, DataLoader, BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss  , HyperParameterspace  =  all_ness_params 
    opt.Enc_Child_index = 0
    opt.Dec_Child_index = 0
    opt.Dis_Child_index = 0
    opt.first_time_choosing_latent_size = True

    #Source: https://github.com/hyperopt/hyperopt/issues/267
    #To save trials object to pick up where you left
    def run_trials(Modelname,all_ness_params):
        #ATTENTION: If you want to begin training anew, then you have to delete the .hyperopt file
        TrialsSaveplace = FileParentPath
        TrialsSaveplace +=  "/Hyperoptimization/"+ str(Modelname) +".hyperopt" 
        trials_step = 1  # how many additional trials to do after loading saved trials. 1 = save after iteration
        max_trials = 5  # initial max_trials. put something small to not have to wait
        opt.max_trials = max_trials
        try:  # try to load an already saved trials object, and increase the max
            with open(TrialsSaveplace, "rb") as f:
                trials = pickle.load(f)
            print("Found saved Trials! Loading...")
            max_trials = len(trials.trials) + trials_step
            opt.max_trials = max_trials
            print("Rerunning from {} trials to {} (+{}) trials".format(len(trials.trials), max_trials, trials_step))
        except:  # create a new trials object and start searching
            trials = Trials()

        lowest_loss = fmin(TrainGAN_with, all_ness_params, algo=tpe.suggest, max_evals=max_trials, trials=trials)
        print("Lowest achieved loss so far:", lowest_loss)
    
        # save the trials object
        with open(TrialsSaveplace, "wb") as f:
            pickle.dump(trials, f)

    # loop indefinitely and stop whenever you like by setting MaxTrys
    MaxTrys = 10000
    #initial_population = 50  #Initial population of 250 trys for every length and depth in network
    #Interrupt Trials
    Interrupt_trials_index = 0
    #Both parallel layers and max length have to be at least 2 for permutations 
    import config

    for TotalTrials in range(MaxTrys):
        opt.CurrentTrial = TotalTrials
        config = reload(config) #to alter the configs on the fly while training/testing and importing here to always import changes made to the config.py file
        config_file = config.config_file()
        ON_OFF_Switch = config_file.ON_OFF_Switch
        #print("Onn OF Switch is ",ON_OFF_Switch)
        opt = config_file.set_opt_parameters(ON_OFF_Switch,opt)
        HyperParameterspace = config_file.set_Hyperparameterspace(config_file.ON_OFF_Switch_Hyperparams ,HyperParameterspace)
        all_ness_params = opt, Dataset, DataLoader, BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss , HyperParameterspace  
        run_trials( opt.ProjectName,all_ness_params)    



### 4.2 Testing

The testing procedure is very similar to the training loop. The main differences are that no optimizers adjust the trained networks, the networks run in evaluation mode, and no backpropagation takes place. In addition, a different dataset is used to ensure that the neural networks did not simply memorize the training dataset. If the testing performance is still almost as good as during training, the networks have learned to generalize to the underlying problem.
The test data is passed through the inpainting, encoder and decoder networks, and an output is generated. 

To access the testing loop, the main loop is executed and “gan”, “test” and “mnist” are entered at the terminal prompts. A file dialog window then opens to select a specific encoder model, while the corresponding decoder model is chosen automatically. The corresponding discriminator network also has to be chosen by the user and is identifiable by the same Unix timestamp within the filename.
To let the user verify the intended functionality, a window is displayed showing the input image, the disturbed image and the reconstructed image. The user can either show the next batch or exit the testing of the current model.
If the reconstructed image looks like the input image without any noise, the goal of denoising the data is accomplished. 
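
The file pairing described above can be sketched in a few lines. This is a minimal sketch, not the notebook's own loader: the helper name `find_companions` and the regular expression are hypothetical, and the file name layout `<timestamp>_Loss<loss>---_<ROLE>.model` is assumed from the saving code in `TrainGAN_with`. Because the decoder file is saved with the same timestamp and the same loss value as the encoder, its name can be derived directly, whereas the discriminator file carries its own loss value and can only be matched by the timestamp.

```python
import re
from pathlib import Path

# Assumed file name layout, based on the saving code in TrainGAN_with:
#   <unix timestamp>_Loss<loss>---_<ROLE>.model
MODEL_NAME = re.compile(r"^(?P<stamp>\d+)_Loss(?P<loss>.+?)---_(?P<role>[A-Z]+)\.model$")


def find_companions(encoder_path, candidates):
    """Return (decoder_path, discriminator_path) for an encoder model file.

    `candidates` is an iterable of paths from the same model folder.
    A companion that is not among the candidates is returned as None.
    """
    encoder_path = Path(encoder_path)
    candidates = [Path(c) for c in candidates]

    match = MODEL_NAME.match(encoder_path.name)
    if match is None or match["role"] != "ENCODER":
        raise ValueError(f"not an encoder model file: {encoder_path.name}")

    # Decoder: same timestamp and same loss value, only the role differs.
    decoder_name = encoder_path.name.replace("---_ENCODER", "---_DECODER")
    decoder_path = next((c for c in candidates if c.name == decoder_name), None)

    # Discriminator: its own loss value, so only the timestamp can be matched.
    discriminator_path = None
    for candidate in candidates:
        other = MODEL_NAME.match(candidate.name)
        if other and other["role"] == "DISCRIMINATOR" and other["stamp"] == match["stamp"]:
            discriminator_path = candidate
            break

    return decoder_path, discriminator_path
```

Matching on the timestamp alone is enough here because the timestamp is prepended to all files saved for one model; a folder listing (for example from `Path.iterdir()`) can be passed as `candidates`.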



In [None]:

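@torch.no_grad()  #no gradients are required during testing; the decorator disables them for the whole function
def TestGAN(all_ness_params):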

    opt, Dataset, testDataloader, inpainting, InpaintingParameters ,encoder,EncoderNetParameters,  decoder, DecoderNetParameters,  discriminator , DiscriminatorNetParameters, BoxCountEnc_model,BoxcountEncoder,device, previous_Best_Loss  , HyperParameterspace = all_ness_params

    inpainting = Inpainting(InpaintingParameters) #initialize inpainting class

    ########################################################################################
    #####################               LOSSES                         #####################   

    if opt.autoencoder == "off":
        adversarial_loss = torch.nn.BCELoss()  # Use binary cross-entropy loss

    pixelwise_loss = torch.nn.L1Loss()   # mean absolute error (L1) pixel-wise loss

    ########################################################################################
    #####################               Send to device                 #####################
    print(f"Chosen Testing device == {opt.device}")
    
    if opt.device == "cuda":
        try:
            BoxcountEncoder.cuda()
        except:
            pass
        try:
            inpainting.cuda()
        except:
            PrintException()
            pass

        encoder.cuda()
        decoder.cuda()
        if opt.autoencoder == "off":
            discriminator.cuda()
            adversarial_loss.cuda()
        pixelwise_loss.cuda()

    Tensor = torch.cuda.FloatTensor if opt.device == "cuda" else torch.FloatTensor
    config_counter = 0
    import config  #local import is required, because config is reassigned by reload() inside the loop

    for i, BatchToBePredicted in enumerate(testDataloader):
        #note: a bare torch.no_grad() call has no effect; it only disables gradients when used as a context manager or decorator
        
        #reload config every time while testing, cause changes to the config.py file should apply to the next batch instantly
        config_counter += 1
        if config_counter >= 1:
            config = reload(config) #to alter the configs on the fly while training/testing and importing here to always import changes made to the config.py file
            config_file = config.config_file()
            opt = config_file.set_opt_parameters(config_file.ON_OFF_Switch,opt)
            config_counter = 0


        #USE LABELS FROM CPU BOXCOUNT
        imgs, BCR_LAK_map_2, BCR_LAK_map_4, BCR_LAK_map_8, BCR_LAK_map_16 = BatchToBePredicted
        #no separate .float() call is needed: .type(Tensor) below casts each map to a float tensor

        real_labels_2 = Variable(BCR_LAK_map_2.type(Tensor))
        real_labels_2.to(opt.device)

        real_labels_4 = Variable(BCR_LAK_map_4.type(Tensor))
        real_labels_4.to(opt.device)

        real_labels_8 = Variable(BCR_LAK_map_8.type(Tensor))
        real_labels_8.to(opt.device)

        real_labels_16 = Variable(BCR_LAK_map_16.type(Tensor))
        real_labels_16.to(opt.device)

        # Adversarial ground truths
        valid = Variable(Tensor(imgs.shape[0], 1).fill_(1.0), requires_grad=False)
        fake = Variable(Tensor(imgs.shape[0], 1).fill_(0.0), requires_grad=False)
        valid.to(opt.device)
        fake.to(opt.device)

        # Configure input
        real_imgs = Variable(imgs.type(Tensor))
        real_imgs.to(opt.device)
        ori_imgs = Variable(imgs.type(Tensor))
        ori_imgs.to(opt.device)
        input_imgs = real_imgs[:,0,:,:].detach().cpu().numpy() #batch, 1, y,x

        # -----------------
        #  Test Generator
        # -----------------
        real_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16 = inpainting(real_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16)
        inpainted_imgs = real_imgs.detach().cpu().numpy() # for displaying purposes
        
        encoded_imgs = encoder(real_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16)

        decoded_imgs = decoder(encoded_imgs, real_labels_2, real_labels_4, real_labels_8, real_labels_16 )
        
        ###########################################################
        ###  SHOW GENERATED PICTURES
        ###########################################################

        ArrayList = np.array([])
        Namelist = []
        inpainted_imgs = inpainted_imgs[:,0,:,:]        
        gen_imgs = decoded_imgs[:,0,:,:].detach().cpu().numpy()
        #use j for the image index so the batch index i of the outer loop is not overwritten
        for j in range(opt.batch_size):
            if j == 0:
                Namelist.append("Input")
                ArrayList = np.array([input_imgs[j,:,:]])
            else:
                ArrayList = np.append(ArrayList,[input_imgs[j,:,:]],axis=0)
                Namelist.append("")

        for j in range(opt.batch_size):
            ArrayList = np.append(ArrayList,[inpainted_imgs[j,:,:]],axis=0)
            if j == 0:
                Namelist.append("Noise/Mask")
            else:
                Namelist.append("")


        for j in range(opt.batch_size):
            try:
                ArrayList = np.append(ArrayList,[gen_imgs[j,:,:]],axis=0)
            except:
                pass
            if j == 0:
                Namelist.append("Output")
            else:
                Namelist.append("")

        showNPArrayImageArray(ArrayList, Namelist, opt, False)

        if opt.autoencoder == "off":
            # Loss measures generator's ability to fool the discriminator
            AE_loss = 0.001 * adversarial_loss(discriminator(encoded_imgs), valid) + 0.999 * pixelwise_loss(decoded_imgs, ori_imgs)
            AE_loss[torch.isnan(AE_loss)] = 1.0

            # ---------------------
            #  TEST Discriminator
            # ---------------------
            # Sample noise as discriminator ground truth
            z = Variable(Tensor(np.random.normal(0, 1, (imgs.shape[0], opt.latent_dim**2))))
            
            # Measure discriminator's ability to classify real from generated samples
            real_loss = adversarial_loss(discriminator(z), valid)
            fake_loss = adversarial_loss(discriminator(encoded_imgs.detach()), fake)

            d_loss = 0.5 * (real_loss + fake_loss)

            print("[Batch %d/%d] [D loss: %f] [AE loss: %f] [G loss: %f]"
                % ( i, len(testDataloader), d_loss.item(), AE_loss.item(), fake_loss.item()))

            batches_done =  len(testDataloader) + i

        else:
            AE_loss = pixelwise_loss(decoded_imgs, ori_imgs)   
            AE_loss[torch.isnan(AE_loss)] = 1.0

            print(
                "[Batch %d/%d] [AE loss: %f]"
                % ( i, len(testDataloader),  AE_loss.item())
            )

        control = input("What do you want to do? Next Batch(n), Break(b): (n/B)")
        if control.lower() == "n":
            continue
        elif control.lower() == "b" or control.lower() == "" :
            break


### 4.3 Validation

Validation mode can be entered by entering “gan”, “val” and “mnist” at the terminal prompts while executing the main loop. Validating an individual model works exactly as in testing; the difference is that all trained models are tested, sorted by ascending loss, which is described in detail in chapter 3.4.5.7. With this strategy, the user should encounter useful models first, saving the hours it would take to check every trained model. At each generated validation window, the user can either show the next batch of test images to verify the network's functionality or stop testing the current model. After stopping, the model can be deleted if it behaves badly. Deleting models that behave badly despite a low loss value should help the evolutionary network generator avoid those network architectures. Models with high loss values can be deleted automatically during training when the evolutionary network generator is initialized. Continuing training after validation has removed all unnecessary models should therefore give the evolutionary network generation a higher chance of producing new, better-working network architectures.
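
Sorting by ascending loss can be illustrated with the file names themselves. The following is a hedged sketch rather than the implementation described in chapter 3.4.5.7: it assumes the file name layout `<timestamp>_Loss<loss>---_ENCODER.model` used when saving models, and the helper name `encoders_by_ascending_loss` is hypothetical.

```python
import re

# Assumed layout of saved encoder files: <timestamp>_Loss<loss>---_ENCODER.model
# The character class only accepts plain or scientific-notation numbers,
# so names such as "Lossnan" are skipped.
LOSS_IN_NAME = re.compile(r"_Loss(?P<loss>[-+0-9.eE]+?)---_ENCODER\.model$")


def encoders_by_ascending_loss(filenames):
    """Return the encoder model file names, lowest loss first."""
    ranked = []
    for name in filenames:
        match = LOSS_IN_NAME.search(name)
        if match is None:
            continue  # not an encoder model file
        try:
            loss = float(match["loss"])
        except ValueError:
            continue  # loss part is not a valid number
        ranked.append((loss, name))
    ranked.sort()
    return [name for _, name in ranked]
```

Only encoder files are ranked in this sketch, because the decoder shares the encoder's loss value in its name and can be looked up from it afterwards.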

# 5 Results and discussion

The results and the following discussion focus on three main aspects. First, the functionality of a working denoising autoencoder is shown. Second, the adversarial autoencoder is compared with the variational autoencoder, including which one to choose for which desired functionality.
Finally, the possible advantages of evolutionary network search are discussed by investigating the loss development over the training steps.

The training procedures for the variational and adversarial autoencoders were conducted similarly. The population initialization threshold was arbitrarily set to 120, so random network architectures were generated up to that number of trials. Afterwards, the first evolutionary network architecture search was started by crossover of the previously trained models, but without deleting the models whose loss was higher than the average loss. Every latent dimension was trained in this mode for 10 trials. After this, the standard deviation of the Gaussian noise was increased from the default value of 0.01 to 0.02, disturbing the image more than before.
Then the evolutionary network search was started again, this time deleting the bad models and net parameters at the start of the next training run, so that only good models could generate new models from old ones. Finally, the program was executed in validation mode, and all badly behaving models with a good loss value were deleted.

## 5.1 Denoising Autoencoder
![Working Denoising Autoencoder]( https://raw.githubusercontent.com/ollimacp/FractalGan/main/Working_Autoencoder.gif "Working Denoising Autoencoder")


Figure 10: Working denoising autoencoder showing input, altered and output images


As seen in figure 10, the functionality of a denoising autoencoder was achieved. The selected machine learning models were able to remove the added noise from the altered data and reconstructed the input images almost perfectly. 

![Bad Denoising Autoencoder]( https://raw.githubusercontent.com/ollimacp/FractalGan/main/blocky_pixelated_Autoencoder.gif "blocky_pixelated_Autoencoder")

Figure 11: Badly functioning denoising autoencoder showing input, altered and output images


As can be seen in figure 11, the input and the reconstructed image differ considerably. Depending on the model and its architecture, the output can be pixelated, or the noise is not removed properly. With the shown model in particular, the noise in the black regions of the original image is removed almost completely. Only an occasional noise speckle remains there, but most of the noise within the handwritten digits is retained.
Across the training runs, the loss ranged from 0.254 for the worst model to 0.019 for the best. Note that no model achieved a loss of zero, i.e. perfect accuracy. This is normal for machine learning models because of their stochastic nature. An accuracy of 100 % usually points to a human error during training or testing that the neural network is able to exploit.
If the user judges this error level to be good enough for the process, a model for image denoising can be deployed. Alternatively, data that was corrupted during radio transmission could be reconstructed, provided that uncorrupted data was available for training.
This only works when no auxiliary data is used in the decoder network, since the receiving side would need this data in an uncorrupted state to reproduce the compressed image. Training the model with a latent dimension size of 2 should urge the evolutionary network architecture search to utilize the auxiliary data, because a size of 2 is not enough to reproduce all pictures with their given handwriting style.
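The denoising objective discussed above can be illustrated with a minimal sketch. It assumes plain additive Gaussian noise and a pixel-wise mean squared error between the reconstruction and the clean input. The helper names `add_gaussian_noise` and `pixelwise_mse` are hypothetical and not taken from the notebook, and the noise model actually used for training may differ.

```python
import random


def add_gaussian_noise(image, sigma=0.3, seed=None):
    """Return a noisy copy of a 2D image whose pixel values lie in [0, 1].

    Assumption: additive Gaussian noise, clipped back to the valid range.
    """
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def pixelwise_mse(a, b):
    """Mean squared error over all pixels of two equally sized images."""
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


# Tiny stand-in for a handwritten stroke (illustrative data only).
clean = [[0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0]]
noisy = add_gaussian_noise(clean, sigma=0.3, seed=0)

# A denoising autoencoder receives `noisy` and is scored against `clean`.
# The error of the untouched noisy image is the baseline it has to beat.
print("baseline error of the noisy image:", pixelwise_mse(noisy, clean))
print("error of a perfect reconstruction:", pixelwise_mse(clean, clean))
```

A reconstruction is only useful if its error against the clean image is lower than this baseline, which is the sense in which the reported loss values should be read.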


![Denoising Fractal Autoencoder]( https://raw.githubusercontent.com/ollimacp/FractalGan/main/Fractal_Autoencoder.gif "Denoising Fractal Autoencoder")

Figure 12: Images reconstructed from an autoencoder, which uses spacial box count arrays

As can be seen in figure 12, the model with the higher compression ratio and a latent dimension size of 2 performed worse than the best performing model, which uses no auxiliary data and a latent dimension size of 16. This was foreseeable: with a higher compression ratio, and therefore a smaller latent dimensionality, the encoder and decoder models have to solve a more difficult problem.
Depending on the use case and the available transmission bandwidth, represented by the size of the latent dimension, the evolutionary network generator is able to use whatever architecture and auxiliary data it needs to perform its given task.
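To put the two latent sizes into perspective, the compression ratio can be worked out in a few lines. This sketch assumes the standard MNIST image size of 28 × 28 grayscale pixels and ignores any auxiliary data passed to the decoder. If the notebook rescales the images, the numbers change accordingly. The function name `compression_ratio` is hypothetical.

```python
def compression_ratio(input_shape, latent_size):
    """Number of input values per latent value."""
    n_inputs = 1
    for dim in input_shape:
        n_inputs *= dim
    return n_inputs / latent_size


MNIST_SHAPE = (28, 28)  # assumption: unscaled MNIST digits

for latent_size in (2, 16):
    ratio = compression_ratio(MNIST_SHAPE, latent_size)
    print(f"latent size {latent_size:2d}: {ratio:.0f} input pixels per latent value")
# latent size  2: 392 input pixels per latent value
# latent size 16: 49 input pixels per latent value
```

Under these assumptions the model with latent size 2 has to compress eight times harder than the model with latent size 16.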


The model files in the model folder range in size from a few hundred kilobytes up to 50 megabytes, with most of them taking a few megabytes. Despite the low disk usage, RAM and VRAM consumption can reach the gigabyte range when the neural networks are loaded and executed. Training complex models is therefore limited by the VRAM of the graphics card used.


## 5.2 Loss development with random and evolutionary network architecture search

A variational autoencoder was trained by setting the ‘opt.autoencoder’ variable to ‘on’. The loss values and their training steps were saved into a table, which is plotted in figure 13.
During random network search, from the initial training step until step 120, the loss ranged from a maximum of 0.254 down to a minimum of 0.18. Step 120 arbitrarily marks the minimum initial population.
Random network search is therefore also capable of producing a model that handles its given task. However, depending on chance is not a desirable property in real-world use, and the evolutionary network search offers some additional benefits.

![Loss over trials var AE]( https://raw.githubusercontent.com/ollimacp/FractalGan/main/Loss_over_trials.jpg "Loss_over_trials")

Figure 13: Loss over trials chart depicting the loss performance

Figure 13 shows the loss of the trained network architectures over the conducted training trials. To smooth the data, a linear trend line was fitted to show the general direction of the loss development. The slope of this trend line is -4.08E-5, which indicates a decreasing loss over the conducted trials. Moving averages over 3 and 5 trials were also drawn to smooth the data.
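The two smoothing steps used for figure 13 can be reproduced with a short sketch: an ordinary least-squares line over the trial index and a trailing moving average. The loss values below are made up for illustration and are not the values from the notebook's table. The function names are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept of `values` over their indices."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def moving_average(values, window):
    """Trailing moving average; the first `window - 1` trials have no value."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Illustrative loss values per trial (not measured data).
losses = [0.25, 0.21, 0.23, 0.19, 0.20, 0.18]

slope, intercept = linear_trend(losses)
print("trend slope per trial:", slope)  # negative slope = loss decreases
print("3-trial moving average:", moving_average(losses, 3))
print("5-trial moving average:", moving_average(losses, 5))
```

A negative slope corresponds to the decreasing loss described above, and the wider window gives the smoother but more delayed curve.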

After step 120, the evolutionary network architecture search was conducted without deleting the bad models, so these also took part in the network architecture crossover process up to training step 160.
After training step 160, the process was stopped and then continued with deletion of all models whose loss value was higher than the mean of all losses.
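The selection rule applied after step 160 can be sketched as follows. The population is represented here as a list of dictionaries with hypothetical `name` and `loss` keys. How the notebook actually stores its models and net parameters is not shown in this section.

```python
def select_survivors(population):
    """Keep only models whose loss does not exceed the mean loss."""
    mean_loss = sum(model["loss"] for model in population) / len(population)
    survivors = [model for model in population if model["loss"] <= mean_loss]
    return survivors, mean_loss


# Illustrative population (names and losses are made up).
population = [
    {"name": "model_a", "loss": 0.019},
    {"name": "model_b", "loss": 0.040},
    {"name": "model_c", "loss": 0.254},
    {"name": "model_d", "loss": 0.180},
]

survivors, mean_loss = select_survivors(population)
print("mean loss:", mean_loss)
print("kept for crossover:", [model["name"] for model in survivors])
```

Because the mean is sensitive to a few very bad models, this rule can keep more than half of the population. A median threshold would be a stricter alternative.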

As can be seen in figure 13, even random network architecture search can find well-performing models. If the task is relatively simple, as with reproducing handwritten digits, the network architecture has a less significant impact on the model's performance and random network search is a viable method. The loss volatility of the models in random network architecture search is higher than in the evolutionary approach after trial 120.
The moving averages also indicate more stability in the evolutionary search, as seen by the less volatile swings of the 3-trial moving average after trial 120.

Fast optimization through genetic evolutionary network search is not really possible, because genetic algorithms tend to need many more trials and generations of network architectures to reach even better loss values. Since the quality of the trained autoencoder, as seen in figure 10, was acceptable, no further training was conducted. A harder problem, such as analyzing and generating human faces, could benefit more from the evolutionary approach.

One advantage of evolutionary network architecture search is that it is able to learn from past experience. In contrast to random search, it does not have to rely on pure chance and almost always creates a well-performing network architecture, because it recombines variants of model architectures that already perform well.
Another advantage is the scalability of this kind of network search. Training could be distributed to many computation devices, even a decentralized network of computers, where each device trains a network on its own and sends the trained network and its net parameters back to the host, which generates the next generation of network architectures. In this way many computation devices could work on the same task without starting from scratch every time.
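The idea of generating new architectures from good ones can be illustrated with a generic uniform crossover. This is a textbook technique used here as a stand-in. It is not the crossover routine of this notebook, and the hyperparameter names in the example are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each hyperparameter is copied from one parent."""
    return {key: rng.choice((parent_a[key], parent_b[key])) for key in parent_a}


def next_generation(parents, n_children, seed=None):
    """Create children from randomly paired surviving parents."""
    rng = random.Random(seed)
    children = []
    for _ in range(n_children):
        parent_a, parent_b = rng.sample(parents, 2)
        children.append(crossover(parent_a, parent_b, rng))
    return children


# Hypothetical architecture descriptions of three surviving models.
parents = [
    {"n_layers": 3, "n_channels": 16, "latent_dim": 16, "dropout": 0.1},
    {"n_layers": 5, "n_channels": 32, "latent_dim": 16, "dropout": 0.2},
    {"n_layers": 4, "n_channels": 64, "latent_dim": 2, "dropout": 0.0},
]

for child in next_generation(parents, n_children=4, seed=42):
    print(child)
```

In a distributed setting each worker would train one child and report its loss and parameters back to the host, which then runs the selection and crossover for the next generation.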

## 5.3  Adversarial autoencoders compared to variational autoencoders

While a discriminator can help the autoencoder's accuracy, it is not necessary for training a denoising model. The purpose of the adversarial autoencoder is to parameterize the latent space, so that a human can understand it and pass samples to the decoder network to generate realistic output. If only the denoising functionality is wanted, training a variational autoencoder is just as feasible as training its adversarial counterpart. The variational autoencoder was trained the same way, but with the “opt.autoencoder” variable set to “on”, which disables the discriminator network and the additional training step that fits the encoder network to the wanted Gaussian distribution. This saves a significant amount of VRAM and processing time because fewer instructions are executed.

Figure 14 shows the first few adversarial autoencoders trained in the same manner as the variational autoencoder. Denoising of the input images was achieved, but the output was more pixelated and sometimes still contained noise.

![Adversarial_AE]( https://raw.githubusercontent.com/ollimacp/FractalGan/main/Adversarial_AE.gif "Adversarial_AE")

Figure 14: Output generated through adversarial autoencoders 

In conclusion, for this kind of task the adversarial discriminator did not help the network to denoise.

A specific advantage of the variational autoencoder could be exploited for the encrypted transmission of messages. The latent space of the variational autoencoder, as trained here, is not constrained to a certain distribution. After training a variational autoencoder, the decoder network is given to the receiving side. The message to be encrypted is processed by the encoder network, and the encoded latent representation is used as the encrypted message. The receiving side uses the decoder network to decrypt it. Because the latent space varies between trained models, this promotes security in transmitting data. If the encoding gets cracked, or the encoder or decoder model gets compromised, new ones can be trained and deployed. Even if the source code used to train these models is stolen, the encryption process is not necessarily disclosed. Only an attacker who has the source code, the data to be encrypted and one of the machine learning models could train the counterpart of the autoencoder to fit the other model and thus compromise the encryption.
A major disadvantage of this encryption and transmission method is that the message decrypted by the decoder model is not reproduced with 100 % accuracy. This kind of transmission is therefore only feasible if the decoded message remains interpretable despite errors. Encrypting and decrypting instructions for computers, for example, would very probably result in an error.
The security of this scheme would have to be tested rigorously before it could even be considered for transmitting sensitive information.




# 6 Conclusion & Outlook


## 6.1 Conclusion


This work shows that generative adversarial networks and autoencoder networks are powerful and versatile tools for many potential problems without needing labeled data. Custom convolutional neural networks inspired by inception and residual networks, paired with an evolutionary, generative network architecture search, offer a fast and viable methodology to train a neural network for any specified task. Through many selectable parameters, this algorithm can be adjusted to different network architectures such as adversarial and variational autoencoder networks. Using spacial fractal metrics can help the networks when encoding to a small latent space. It was shown that an autoencoder setup could be trained in a few hours to remove noise from images using the MNIST handwritten digits dataset.



## 6.2 Outlook

Much potential remains to be explored to turn the algorithm into a fully functional and deployable tool for users. To be less constrained by a GPU’s VRAM, future versions could build on the DeepSpeed and PyTorch Lightning modules, so that only parts of the network have to be loaded onto the graphics card while the rest of the neural net is offloaded into the RAM of the computer. This could enable multi-billion-parameter models with much greater capabilities than the current million-parameter models, which could process high-resolution color images rather than grayscale pixelated thumbnails.
To expand the algorithm's versatility, the inpainting module has to be checked and trained with other settings to enable neural inpainting, super-resolution and aspect ratio changes.
The loss functions for the reconstruction loss and adversarial loss leave much room for improvement, since this work only used simple loss functions such as binary cross entropy and pixel-wise mean squared error. For example, time and resource consumption could be included in the loss function to urge the optimization toward more efficient model architectures.
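One possible shape of such a resource-aware objective is sketched below. It is only an illustration of the idea and has not been evaluated in this work. The function name, the choice of training time and parameter count as cost terms, and the weights are all assumptions.

```python
def resource_aware_loss(reconstruction_loss, train_seconds, n_parameters,
                        time_weight=1e-4, size_weight=1e-9):
    """Reconstruction loss plus weighted penalties for time and model size.

    The weights are arbitrary placeholders and would have to be tuned.
    """
    return (reconstruction_loss
            + time_weight * train_seconds
            + size_weight * n_parameters)


# Two hypothetical candidates: a small fast model and a large slow one.
small = resource_aware_loss(0.020, train_seconds=600, n_parameters=2_000_000)
large = resource_aware_loss(0.019, train_seconds=3600, n_parameters=40_000_000)

print("small model score:", small)
print("large model score:", large)
print("preferred:", "small" if small < large else "large")
```

With these placeholder weights the slightly less accurate but much cheaper model is ranked first, which is the behavior the penalty terms are meant to produce in the architecture search.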
To enable further analysis features, such as the latent space exploration done in the original adversarial autoencoder work, another program operating mode has to be implemented. The possibility of choosing a latent space distribution other than the Gaussian normal distribution could also give the user some control to shape the latent space to better fit the input data.
The potential benefits and drawbacks of using fractal data arrays have to be observed and evaluated closely. The main benefit of fractal box counts and lacunarities is that they can be used to search, sort and compare data, forming the basis of a search engine that is not limited by the data type.
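How box counts could serve as a search key is sketched below with the classic global box counting on binary images. This is a simplification, not the spacial box counting implementation from [1], and the function names and the Euclidean distance used for ranking are assumptions.

```python
def box_counts(image, box_sizes=(1, 2, 4)):
    """Number of occupied boxes per box size for a binary 2D image."""
    height, width = len(image), len(image[0])
    counts = []
    for size in box_sizes:
        occupied = 0
        for top in range(0, height, size):
            for left in range(0, width, size):
                rows = range(top, min(top + size, height))
                cols = range(left, min(left + size, width))
                if any(image[y][x] for y in rows for x in cols):
                    occupied += 1
        counts.append(occupied)
    return counts


def signature_distance(a, b):
    """Euclidean distance between two box count signatures."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


diagonal = [[1 if x == y else 0 for x in range(4)] for y in range(4)]
filled = [[1] * 4 for _ in range(4)]
single = [[1 if (x, y) == (0, 0) else 0 for x in range(4)] for y in range(4)]

# Rank a small collection by similarity to a query image.
query = box_counts(diagonal)
collection = {"filled": filled, "single": single, "diagonal": diagonal}
ranked = sorted(collection,
                key=lambda name: signature_distance(query, box_counts(collection[name])))
print(query)   # [4, 2, 1]
print(ranked)  # ['diagonal', 'single', 'filled']
```

Since the signature only depends on which boxes are occupied, the same ranking scheme could in principle be applied to any data that can be mapped to a 2D array.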
To gain another level of artistic control, another program function has to be written that combines a latent variable with the fractal arrays from other pictures, in the hope that the combined output resembles something in between. In this way, when generating a human face for example, some parts of the fractal arrays could be altered by crude image modification, letting the neural networks generate a believable appearance as wanted.

In the far future, the combined functionality of the algorithm in this and previous works, together with the functions still to be embedded into the program, could form a versatile framework for data analysis, generation and manipulation that users can apply to miscellaneous problems and tasks.



## 7 Acknowledgments and Sources


### [1] Spacial box counting Ole Peters (2021): https://github.com/ollimacp/spacial-boxcounting-cpu-gpu
### [2] Time module documentation: https://docs.python.org/3/library/time.html
### [3] Numpy module documentation: https://numpy.org/doc/
### [4] Sys module documentation: https://docs.python.org/3/library/sys.html
### [5] Os module documentation: https://docs.python.org/3/library/os.html
### [6] Tqdm module documentation: https://tqdm.github.io/
### [7] Matplotlib module documentation: https://matplotlib.org/stable/
### [8] Python image library documentation: https://pillow.readthedocs.io/en/stable/
### [9] Hyperopt module documentation: http://hyperopt.github.io/hyperopt/
### [10] Tkinter module documentation: https://docs.python.org/3/library/tk.html
### [11] Pickle module documentation: https://docs.python.org/3/library/pickle.html
### [12] Multiprocessing module documentation: https://docs.python.org/3/library/multiprocessing.html
### [13] Pytorch module documentation: https://pytorch.org/docs/stable/index.html
### [14] Itertools module documentation: https://docs.python.org/3/library/itertools.html
### [15] Defaultdict class documentation: https://www.kite.com/python/docs/collections.defaultdict
### [16] Random module documentation: https://docs.python.org/3/library/random.html
### [17] Linecache module documentation: https://docs.python.org/3/library/linecache.html
### [18] Pathlib module documentation: https://docs.python.org/3/library/pathlib.html
### [19] Importlib module documentation: https://docs.python.org/3/library/importlib.html
### [20] Ole Peters (2021): https://github.com/ollimacp/spacial-boxcounting-cpu-gpu
### [21] Kaiming He et al., (2015): Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs.CV]
### [22] Christian Szegedy et al., (2014): Going Deeper with Convolutions. arXiv:1409.4842 [cs.CV]
### [23] Raphael Gontijo Lopes et al., (2019): Improving Robustness Without Sacrificing Accuracy with Patch Gaussian Augmentation. arXiv:1906.02611 [cs.LG]
### [24] Bing Xu et al., (2015): Empirical Evaluation of Rectified Activations in Convolutional Networks. arXiv:1505.00853v2 [cs.LG]
### [25] Santurkar, S., Tsipras, D., Ilyas, A., & Madry, A. (2018). How does batch normalization help optimization?. Advances in neural information processing systems, 31.
### [26] Boxcounting: https://en.wikipedia.org/wiki/Box_counting
### [27] Kingma and Welling, (2019): An Introduction to Variational Autoencoders. arXiv:1906.02691 [cs.LG]
### [28] Goodfellow et al., (2014): Generative Adversarial Networks. arXiv:1406.2661 [stat.ML]
### [29] Makhzani et al., (2016): Adversarial Autoencoders. arXiv:1511.05644 [cs.LG]
### [30] Stanley, K. O., & Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary computation, 10(2), 99-127.
### [31] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958
### [32] Kingma and Welling, (2014): Auto-Encoding Variational Bayes. arXiv:1312.6114 [stat.ML], Chapter 2.4
### [33] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. The Learning Workshop (Snowbird), 2011.



