# **Generative Adversarial Networks (GANs)**

Generative Adversarial Networks (GANs) are a class of artificial intelligence algorithms introduced by Ian Goodfellow and his colleagues in 2014. GANs consist of two neural networks, the generator and the discriminator, which are trained simultaneously through an adversarial process. The main idea behind GANs is to generate new data samples that are similar to a given dataset by training the generator to create realistic samples and the discriminator to distinguish between real and generated samples.
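
Formally, Goodfellow et al. describe this as a two-player minimax game. Here $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for a noise vector $z$ drawn from a prior $p_z$, and $p_{\text{data}}$ is the data distribution. The value function is:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

For a fixed generator with sample distribution $p_g$, the optimal discriminator is $D^*(x) = p_{\text{data}}(x) / \big(p_{\text{data}}(x) + p_g(x)\big)$. At the global optimum $p_g = p_{\text{data}}$, so $D^*(x) = 1/2$ everywhere.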

1. **Generator:**

The generator is responsible for creating new data samples. It takes random noise as input and transforms it into data that ideally resembles the real data distribution.
The generator is a neural network with one or more layers; for image data it commonly uses convolutional or transposed convolutional layers.
The generator aims to generate data points that are indistinguishable from real data by the discriminator.

2. **Discriminator:**

The discriminator evaluates input samples and classifies them as either real or generated. Its purpose is to distinguish between real data from the dataset and fake data produced by the generator.
The discriminator is essentially a binary classifier; for image data it typically uses convolutional layers.

The discriminator aims to correctly classify real and generated samples, pushing the generator to improve its output.

![](https://www.researchgate.net/profile/Christine-Dewi-2/publication/337787422/figure/fig2/AS:873238225764352@1585207625152/The-GAN-system-architecture-6.jpg)
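
To make the two roles concrete, here is a minimal sketch in plain Python (no deep learning framework assumed). A tiny fully connected generator maps a noise vector to a 2-D sample, and a discriminator maps a sample to a probability of being real. The layer sizes, the `dense`/`generator`/`discriminator` helpers and the random initialization are illustrative assumptions; image GANs would use the convolutional layers described above.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function, mapping any real score into (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def dense(x, weights, biases):
    """Fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def random_layer(n_in, n_out, rng):
    """Random Gaussian weights and zero biases (an illustrative initialization)."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def generator(z, params):
    """Noise vector z -> fake sample (one hidden ReLU layer, linear output)."""
    (w1, b1), (w2, b2) = params
    return dense(relu(dense(z, w1, b1)), w2, b2)

def discriminator(x, params):
    """Sample x -> estimated probability that x is real."""
    (w1, b1), (w2, b2) = params
    score = dense(relu(dense(x, w1, b1)), w2, b2)[0]
    return sigmoid(score)

NOISE_DIM, DATA_DIM, HIDDEN = 4, 2, 8   # hypothetical sizes
rng = random.Random(0)
g_params = (random_layer(NOISE_DIM, HIDDEN, rng), random_layer(HIDDEN, DATA_DIM, rng))
d_params = (random_layer(DATA_DIM, HIDDEN, rng), random_layer(HIDDEN, 1, rng))

z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]   # random noise input
fake = generator(z, g_params)                          # a 2-D "generated" sample
p_real = discriminator(fake, d_params)                 # D's belief that the sample is real
print(fake, p_real)
```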

## **Adversarial Training Process:**

1. **Initialization:**

*  The generator and discriminator are initialized with random weights.
2. **Training Iterations:**

*  During each iteration, the generator creates fake samples, and the discriminator evaluates both real and fake samples.
*  The discriminator is trained to correctly classify real and fake samples, while the generator is trained to produce samples that can fool the discriminator.
*  The back-and-forth process of training the generator and discriminator is repeated iteratively.
3. **Feedback Loop:**

*  As training progresses, the generator becomes better at creating realistic samples, and the discriminator becomes more accurate at distinguishing real from generated samples.
*  Each network's improvement raises the bar for the other. Ideally this feedback loop continues until the generated samples are indistinguishable from real data and the discriminator can do no better than guessing (outputting 1/2); in practice, training may oscillate instead of converging.
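
The alternating procedure can be sketched end to end on a deliberately tiny problem. This is a minimal, framework-free sketch under stated assumptions: real data are 1-D samples from N(4, 1), the generator is the affine map G(z) = a·z + b, the discriminator is a logistic classifier D(x) = sigmoid(w·x + c), and gradients are written out by hand instead of using autograd. `train_toy_gan` and all hyperparameters are hypothetical, not taken from any paper.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(steps=500, batch=64, lr=0.05, seed=0):
    """Alternating GAN updates on a 1-D toy problem (all choices are illustrative).

    Real data ~ N(4, 1); generator G(z) = a*z + b; discriminator D(x) = sigmoid(w*x + c).
    Returns the final parameters and the history of the generator offset b.
    """
    rng = random.Random(seed)
    a, b = rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)   # random generator init
    w, c = 0.0, 0.0   # D starts uninformative: D(x) = 0.5 for every x
    b_history = [b]
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]

        # 1) Discriminator step: gradient descent on -log D(real) - log(1 - D(fake)).
        d_real = [sigmoid(w * x + c) for x in real]
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_w = (sum(-(1 - d) * x for d, x in zip(d_real, real))
                  + sum(d * x for d, x in zip(d_fake, fake))) / batch
        grad_c = (sum(-(1 - d) for d in d_real) + sum(d_fake)) / batch
        w -= lr * grad_w
        c -= lr * grad_c

        # 2) Generator step: non-saturating loss -log D(G(z)), using the updated D.
        d_fake = [sigmoid(w * (a * zi + b) + c) for zi in z]
        grad_a = sum(-(1 - d) * w * zi for d, zi in zip(d_fake, z)) / batch
        grad_b = sum(-(1 - d) * w for d in d_fake) / batch
        a -= lr * grad_a
        b -= lr * grad_b
        b_history.append(b)
    return (a, b), (w, c), b_history

_, _, history = train_toy_gan()
print(f"generator offset b: {history[0]:.2f} -> {history[-1]:.2f} (real mean is 4)")
```

Because the discriminator first learns that larger values look "real", the generator's offset b starts moving upward toward the real mean of 4. The updates are adversarial, so b may oscillate around that value rather than settle exactly.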

### **Loss Functions:**

1. **Generator Loss:**

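*  The generator aims to make the discriminator misclassify generated samples as real. In the original minimax formulation it minimizes $\log(1 - D(G(z)))$. In practice, the non-saturating form is usually preferred: it minimizes $-\log D(G(z))$, the negative log probability that the discriminator makes a mistake on a generated sample, which gives stronger gradients early in training.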
2. **Discriminator Loss:**

*  The discriminator loss is the sum of the binary cross-entropy losses on real and generated samples, $-\log D(x) - \log(1 - D(G(z)))$, averaged over a batch. Minimizing it pushes the discriminator to classify real samples as real and generated samples as fake.
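
Both losses can be written directly in terms of the discriminator's outputs. The following plain-Python sketch (function names are illustrative, not from any library) computes them from lists of probabilities D(x) and D(G(z)); a small epsilon guards against log(0):

```python
import math

EPS = 1e-12  # guards against log(0)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy summed over the real and generated batches.

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = -sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss_minimax(d_fake):
    """Original minimax form: minimize log(1 - D(G(z)))."""
    return sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)

def generator_loss_nonsaturating(d_fake):
    """Non-saturating form: minimize -log D(G(z)), i.e. -log P(D is fooled)."""
    return -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)

# A discriminator that outputs 0.5 everywhere cannot tell real from fake.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))  # 2*log(2), about 1.386
print(generator_loss_nonsaturating([0.5, 0.5]))    # log(2), about 0.693
```

When the discriminator confidently rejects generated samples (D(G(z)) close to 0), log(1 - D(G(z))) is nearly flat, while -log D(G(z)) still changes steeply; this is the practical reason for the non-saturating form.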

---

# **Deep Convolutional GAN (DCGAN)**

**Deep Convolutional GAN (DCGAN)** was proposed by Radford, Metz, and Chintala (2015), researchers from indico Research and Facebook AI Research. It has become a standard baseline for convolutional image generation. The paper focused on making GAN training stable and proposed a set of architectural guidelines for convolutional GANs on image data. In this article, we will use a DCGAN on the Fashion-MNIST dataset to generate images of clothing.

**Need for DCGANs:**

DCGANs were introduced to make GAN training more stable, which helps reduce failure modes such as mode collapse. Mode collapse occurs when the generator becomes biased towards a few outputs and cannot produce the full variety present in the dataset. For example, on the MNIST digits dataset (digits 0 to 9) we want the generator to produce every digit, but a collapsed generator may produce only two or three of them. The discriminator then also becomes optimized for those particular digits only, and training gets stuck in this state. DCGAN's architectural guidelines make this less likely, although they do not eliminate it entirely.
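
One simple way to notice mode collapse on a labeled dataset like MNIST is to classify a batch of generated images and check how many of the ten digit classes actually appear. The sketch below assumes you already have such predicted labels; the `healthy` and `collapsed` lists are hypothetical illustrations, not real results. It computes class coverage and the entropy of the label histogram:

```python
from collections import Counter
import math

def mode_coverage(predicted_labels, num_classes):
    """Fraction of classes that appear at least once among generated samples."""
    return len(set(predicted_labels)) / num_classes

def label_entropy(predicted_labels):
    """Shannon entropy (nats) of the predicted-label histogram; log(num_classes) is the maximum."""
    counts = Counter(predicted_labels)
    n = len(predicted_labels)
    return -sum((k / n) * math.log(k / n) for k in counts.values())

# Hypothetical labels a pre-trained MNIST classifier assigned to 10 generated images.
healthy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
collapsed = [1, 7, 1, 1, 7, 1, 7, 1, 1, 7]

print(mode_coverage(healthy, 10), mode_coverage(collapsed, 10))  # 1.0 0.2
print(label_entropy(healthy), label_entropy(collapsed))
```

Low coverage or entropy far below log(10) is a hint, not proof, of collapse; it also depends on the classifier being accurate on generated images.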

**Architecture:**

![picture](https://jhui.github.io/assets/gm/gm.png)

The DCGAN generator takes a 100-dimensional noise vector as input (sampled from a uniform distribution in the original paper). It first projects and reshapes this vector to 4x4x1024, then applies four fractionally-strided (transposed) convolutions with a stride of 1/2; each one doubles the spatial size while reducing the number of channels (1024 → 512 → 256 → 128 → 3). The generated output has dimensions (64, 64, 3). The paper also proposed architectural changes for the generator, such as removing fully connected hidden layers and using Batch Normalization, which helps stabilize training. The authors use the ReLU activation in all generator layers except the output layer, which uses Tanh. We will implement a generator that follows similar guidelines but not exactly the same architecture.
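
The shape arithmetic above can be checked with the output-size formula for a transposed convolution, out = (in - 1)·stride - 2·padding + kernel. This sketch assumes kernel size 4, stride 2 and padding 1, a configuration common in DCGAN implementations but not necessarily the paper's exact one; with these values each layer exactly doubles the spatial size. The helper names are hypothetical.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Spatial output size of a transposed convolution (no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def trace_generator(start=4, channels=(1024, 512, 256, 128, 3)):
    """Follow the DCGAN generator's tensor shapes from the projected 4x4x1024 block."""
    size = start
    shapes = [(size, size, channels[0])]
    for out_ch in channels[1:]:
        size = conv_transpose_out(size)   # each layer doubles height and width
        shapes.append((size, size, out_ch))
    return shapes

for shape in trace_generator():
    print(shape)
# prints (4, 4, 1024), (8, 8, 512), (16, 16, 256), (32, 32, 128), (64, 64, 3), one per line
```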

The discriminator's role is to determine whether an image comes from the real dataset or from the generator. It can be designed much like a convolutional neural network for image classification, but the paper suggests some changes: pooling layers are replaced by strided convolutions, fully connected hidden layers are removed, and LeakyReLU is used as the activation function in every layer. The input of the discriminator is a single image, either from the dataset or from the generator, and the output is a single sigmoid score giving the probability that the image is real.
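
The discriminator mirrors the generator: each stride-2 convolution halves the resolution, so no pooling layers are needed. This sketch traces the spatial sizes using the standard convolution formula, out = floor((in + 2·padding - kernel) / stride) + 1, assuming kernel 4, stride 2 and padding 1. The final 4x4 convolution down to a single value is an assumed implementation detail (the paper flattens the last convolutional layer into a single sigmoid output). The LeakyReLU slope of 0.2 is the value reported in the DCGAN paper; the helper names are hypothetical.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a standard (strided) convolution, no dilation."""
    return (size + 2 * padding - kernel) // stride + 1

def leaky_relu(x, slope=0.2):
    """LeakyReLU: identity for positive inputs, small slope for negative ones."""
    return x if x > 0 else slope * x

def trace_discriminator(size=64, n_layers=4):
    sizes = [size]
    for _ in range(n_layers):
        size = conv_out(size)   # stride-2 convolution halves the resolution
        sizes.append(size)
    # Assumed final layer: a 4x4 convolution without padding collapses 4x4 to one score.
    sizes.append(conv_out(size, kernel=4, stride=1, padding=0))
    return sizes

print(trace_discriminator())   # [64, 32, 16, 8, 4, 1]
print(leaky_relu(-1.0))        # -0.2
```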


# **Wasserstein Generative Adversarial Networks (WGANs)**

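WGAN's architecture uses deep neural networks for both the generator and the discriminator, which WGAN calls the critic because it outputs an unbounded score rather than a probability. The key differences from a standard GAN are the loss function and the way the critic is constrained: the original WGAN clips the critic's weights, while the later WGAN-GP variant replaces clipping with a gradient penalty. WGANs were introduced to address training instability and mode collapse. The loss is based on the Wasserstein distance (also called the Earth Mover's, or EM, distance), which provides a more meaningful and smoother measure of distance between distributions.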
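
For reference, the Wasserstein-1 (EM) distance between the real distribution $\mathbb{P}_r$ and the generator's distribution $\mathbb{P}_g$, together with the Kantorovich-Rubinstein dual form that WGAN relies on, is:

$$
W(\mathbb{P}_r, \mathbb{P}_g) = \inf_{\gamma \in \Pi(\mathbb{P}_r, \mathbb{P}_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\lVert x - y \rVert\big] = \sup_{\lVert f \rVert_L \le 1} \Big(\mathbb{E}_{x \sim \mathbb{P}_r}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_g}[f(x)]\Big)
$$

Here $\Pi(\mathbb{P}_r, \mathbb{P}_g)$ is the set of joint distributions (couplings) whose marginals are $\mathbb{P}_r$ and $\mathbb{P}_g$, and the supremum runs over 1-Lipschitz functions $f$. WGAN approximates $f$ with a critic network $f_w$; since weight clipping only guarantees that $f_w$ is $K$-Lipschitz for some $K$, the critic estimates $W$ up to a constant factor.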

**Wasserstein GAN Algorithm**

The algorithm is stated as follows:

![](https://media.geeksforgeeks.org/wp-content/uploads/20231214151915/Screenshot-from-2023-12-14-15-19-01.png)

* The critic f approximates the function that attains the maximum in the Kantorovich-Rubinstein duality. To approximate it, a neural network f_w is trained with weights w lying in a compact space W, and gradients are then backpropagated as in a typical GAN.
* To keep the parameters w in a compact space, the weights are clamped to a fixed box (for example [-0.01, 0.01]) after each update. The authors admit that weight clipping is a clearly terrible way to enforce the required Lipschitz constraint, but it is simple and worked well in their experiments. Because the EM distance is continuous and differentiable almost everywhere, the critic can (and should) be trained to optimality.
* With the JS divergence, a better discriminator saturates and its gradients vanish. In WGAN, constraining the weights limits the critic to grow at most linearly in different parts of the space, so even an optimal critic still gives the generator useful gradients.
* Mode collapse arises because the optimal generator for a fixed discriminator is a sum of deltas on the points to which the discriminator assigns the highest values. The authors argue that training the critic to optimality before each generator update prevents this kind of collapse.
* Because the critic is trained in the inner loop before each generator update, the loss at that point is an estimate of the EM distance (up to a constant factor set by the clipping). Unlike earlier GAN losses, this estimate correlates with the visual quality of the generated samples.
* This makes it very convenient to identify failure modes and learn which models perform better than others without having to look at the generated samples.
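
The critic and generator updates described above can be sketched on a toy 1-D problem. This is a minimal illustration under loudly stated assumptions: the critic is a single linear weight, f_w(x) = w·x; the generator only learns an offset b; samples are fixed lists instead of minibatches; and plain SGD replaces the RMSProp optimizer used in the paper. Only the clipping threshold c = 0.01 and n_critic = 5 follow the paper's default settings; `train_toy_wgan` and both learning rates are hypothetical.

```python
C = 0.01           # clipping threshold: the paper's default c
N_CRITIC = 5       # critic updates per generator update: the paper's default
LR_CRITIC = 0.01   # assumption: plain SGD here instead of the paper's RMSProp
LR_GEN = 5.0       # assumption: large step, since a clipped critic gives gradients of size ~C

def mean(xs):
    return sum(xs) / len(xs)

def clip(w, c=C):
    """Weight clipping: clamp a critic weight into the box [-c, c]."""
    return max(-c, min(c, w))

def train_toy_wgan(gen_steps=20):
    """Toy WGAN with a linear critic f_w(x) = w * x and generator G(z) = z + b (hypothetical)."""
    real = [3.0, 4.0, 5.0]   # fixed "real" samples with mean 4
    z = [-1.0, 0.0, 1.0]     # fixed noise with mean 0
    w, b = 0.0, 0.0
    estimates = []
    for _ in range(gen_steps):
        fake = [zi + b for zi in z]
        # Inner loop: train the critic toward optimality before each generator update.
        for _ in range(N_CRITIC):
            # Critic loss = mean f(fake) - mean f(real); its derivative w.r.t. w:
            grad_w = mean(fake) - mean(real)
            w = clip(w - LR_CRITIC * grad_w)
        # The negated critic loss is the (scaled) EM-distance estimate.
        estimates.append(mean([w * x for x in real]) - mean([w * x for x in fake]))
        # Generator loss = -mean f(G(z)); its derivative w.r.t. b is -w.
        b -= LR_GEN * (-w)
    return b, estimates

b, estimates = train_toy_wgan()
print(f"b = {b:.2f}, first estimate = {estimates[0]:.4f}, last estimate = {estimates[-1]:.4f}")
```

Because this clipped linear critic is C-Lipschitz, each estimate equals C times the EM distance 4 - b between the two shifted sample sets. The estimate shrinks as the generator improves, and it illustrates why a weight-clipped critic recovers the Wasserstein distance only up to a constant factor.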


**Benefits of WGAN algorithm over GAN**

* WGAN training is more stable because the Wasserstein distance is continuous and differentiable almost everywhere, so it provides usable gradients for gradient descent.
* It allows the critic to be trained to optimality.
* The authors report no evidence of mode collapse in their WGAN experiments.
* It is less prone to vanishing gradients, because the critic does not saturate the way a standard discriminator can.
* WGANs are more robust to the choice of network architecture: the paper reports stable training even for a DCGAN generator without batch normalization and for an MLP generator, settings in which a standard GAN performed poorly.
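
The first benefit shows up on a classic toy example, reduced here to histograms. When the real and generated distributions are disjoint point masses, the JS divergence stays at log 2 no matter how far apart they are, so it gives the generator no useful signal, while the EM distance grows with the gap. The helper functions are illustrative assumptions; `wasserstein_1d` only handles equal-size 1-D samples, where sorting both samples gives the optimal matching.

```python
import math

def wasserstein_1d(a, b):
    """EM distance between two equal-size 1-D empirical samples: mean |sorted(a) - sorted(b)|."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def js_divergence(p, q):
    """Jensen-Shannon divergence (nats) between two histograms on the same bins."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(u, v):
        return sum(ui * math.log(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Real data: a point mass at bin 0. "Generator": a point mass at bin theta.
for theta in (1, 3, 5):
    p = [1.0 if i == 0 else 0.0 for i in range(6)]
    q = [1.0 if i == theta else 0.0 for i in range(6)]
    print(theta, wasserstein_1d([0.0], [float(theta)]), round(js_divergence(p, q), 4))
# W1 is 1.0, 3.0, 5.0 while JS stays at log(2), about 0.6931
```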

---

# **Applications of GANs**

Image editing using pre-trained GAN models, data augmentation, and image inpainting are powerful techniques that leverage the capabilities of deep learning for various tasks in computer vision. Let's explore each of these concepts:

1. **Image Editing with Style Transfer:**

  * Style Transfer with GANs: Style transfer involves combining the content of one image with the artistic style of another. GANs, especially those designed for style transfer (e.g., ArtGANs), can be used to achieve visually appealing results. These models learn to separate and manipulate content and style features in images.

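  * Implementation: You can use pre-trained models for style transfer or train your own. Popular choices include CycleGAN, which learns unpaired image-to-image translation (for example, photos to paintings), and models based on AdaIN (Adaptive Instance Normalization). AdaIN-based style transfer networks are feed-forward rather than adversarial, but AdaIN is also used inside GAN generators such as StyleGAN. These models can be fine-tuned or applied directly for various style transfer applications.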

2. **Data Augmentation for Increasing Dataset Size and Diversity:**

  * Purpose: Data augmentation is crucial for training robust and generalizable deep learning models. It involves applying various transformations to the existing dataset to increase its size and diversity, helping the model to learn invariant features.

  * Techniques: Common data augmentation techniques include rotation, flipping, scaling, translation, and changes in brightness and contrast. Augmenting the dataset can prevent overfitting and enhance the model's ability to handle variations in input data.

  * Implementation: Popular deep learning frameworks, such as TensorFlow and PyTorch, provide libraries for integrating data augmentation seamlessly into the training pipeline.

3. **Image Inpainting for Filling in Missing Image Parts:**

  * Purpose: Image inpainting is used to fill in missing or damaged parts of an image. Deep learning models, including GANs, have shown remarkable results in generating realistic content to complete the missing regions.

  * Techniques: GANs designed for image inpainting typically consist of a generator that fills in missing parts and a discriminator that evaluates the realism of the completed image. Contextual information surrounding the missing regions guides the generation process.

  * Implementation: Pre-trained models like DeepFill v1 and v2, or GANs trained specifically for inpainting, can be employed for various inpainting tasks. These models can be fine-tuned on specific datasets if necessary.
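
A detail worth making explicit for inpainting is that the generator's output is usually trusted only inside the hole: the final image keeps the known pixels and takes generated values only where the mask marks missing content. The sketch below shows that compositing step on plain nested lists. The function name, the mask convention (1 = missing, 0 = known, fractional values blend), and the toy 3x3 values are assumptions for illustration; `generated` stands in for the output of an inpainting model such as those listed above.

```python
def composite_inpainting(image, generated, mask):
    """Blend a generator's output into an image: out = mask * generated + (1 - mask) * image.

    All three arguments are 2-D lists of the same shape (grayscale, for simplicity).
    mask values: 1 marks a missing pixel to fill, 0 marks a known pixel to keep.
    """
    return [
        [m * g + (1 - m) * x for x, g, m in zip(img_row, gen_row, mask_row)]
        for img_row, gen_row, mask_row in zip(image, generated, mask)
    ]

image = [[10, 20, 30], [40, 0, 60], [70, 80, 90]]        # center pixel is missing
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
generated = [[11, 19, 31], [41, 50, 59], [69, 81, 91]]   # hypothetical generator output

print(composite_inpainting(image, generated, mask))
# [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
```

Only the center value comes from the generator; every known pixel is copied unchanged from the input image.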

**Considerations:**

  * Training and Fine-Tuning: Fine-tuning pre-trained models on specific tasks or datasets may be necessary for optimal performance in image editing tasks.

  * Hardware Requirements: Image editing with GANs, especially high-resolution tasks, may require significant computational resources. GPU acceleration is often recommended for efficient training and inference.

  * Evaluation: Assessing the quality of generated and edited images is crucial. Perceptual similarity metrics, together with visual inspection, are commonly used for evaluation.

These techniques collectively offer a powerful suite of tools for enhancing and manipulating images using deep learning, providing solutions for creative tasks and addressing challenges such as limited data and missing information.