## When Should You Use Data Augmentation?   

1. To prevent models from overfitting.
2. To compensate for an initial training set that is too small.
3. To improve model accuracy.
4. To reduce the operational cost of labeling and cleaning the raw dataset.


### Limitations of Data Augmentation

1. The biases in the original dataset persist in the augmented data.
2. Quality assurance for data augmentation is expensive. 
3. Advanced applications require research and development to build. For example, generating high-resolution images using GANs can be challenging.
4. Finding an effective data augmentation approach can be challenging. 

# Examples

## Text Data Augmentation 
1. Word or sentence shuffling: randomly change the position of a word or sentence.
2. Word replacement: replace words with synonyms.
3. Syntax-tree manipulation: paraphrase the sentence using the same words.
4. Random word insertion: insert words at random.
5. Random word deletion: delete words at random.
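
Four of the techniques above (shuffling, synonym replacement, random insertion, and random deletion) can be sketched with the Python standard library alone. This is a minimal illustration, not a production pipeline: the function names, the toy `SYNONYMS` table, and the default probability `p=0.2` are all assumptions made for the example, and sentences are treated as plain lists of whitespace-separated words. Syntax-tree manipulation is left out because it needs a parser.

```python
import random

# Toy synonym table, purely illustrative. A real pipeline would draw
# synonyms from a thesaurus or from word embeddings.
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "cheerful"]}


def shuffle_words(words, rng):
    """Return a copy of `words` in random order."""
    shuffled = words[:]
    rng.shuffle(shuffled)
    return shuffled


def replace_with_synonyms(words, rng, synonyms=SYNONYMS):
    """Swap every word that has a known synonym for a randomly chosen one."""
    return [rng.choice(synonyms[w]) if w in synonyms else w for w in words]


def random_insertion(words, rng, vocabulary, n=1):
    """Insert `n` words drawn from `vocabulary` at random positions."""
    out = words[:]
    for _ in range(n):
        out.insert(rng.randrange(len(out) + 1), rng.choice(vocabulary))
    return out


def random_deletion(words, rng, p=0.2):
    """Drop each word with probability `p`, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "the quick dog looks happy today".split()
    print(shuffle_words(sentence, rng))
    print(replace_with_synonyms(sentence, rng))
    print(random_insertion(sentence, rng, vocabulary=["very", "really"]))
    print(random_deletion(sentence, rng))
```

Passing an explicit `random.Random` instance keeps each augmentation reproducible, which helps when you need to trace a drop in model quality back to a specific transformation.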


## Image Augmentation 
1. Geometric transformations: randomly flip, crop, rotate, stretch, and zoom images. Be careful when applying multiple transformations to the same image, as this can reduce model performance.
2. Color space transformations: randomly change RGB color channels, contrast, and brightness.
3. Kernel filters: randomly change the sharpness or blurring of the image. 
4. Random erasing: erase a randomly chosen region of the original image.
5. Mixing images: blending and mixing multiple images. 
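
The image techniques above can be illustrated on a tiny grayscale image. The sketch below assumes an image is a nested list of pixel values in the range 0 to 255 so that it stays dependency-free; in practice you would more likely use an array or image library. All function names and parameters here are hypothetical and chosen only for the example. Kernel filters, rotation, and zoom are omitted for brevity.

```python
import random


def horizontal_flip(image):
    """Geometric: mirror the image left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Geometric: cut a crop_h x crop_w window at a random location."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def adjust_brightness(image, delta):
    """Color space: add `delta` to every pixel and clamp to [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def random_erase(image, erase_h, erase_w, rng, fill=0):
    """Random erasing: overwrite a random rectangle with `fill`."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)
    out = [row[:] for row in image]  # copy so the input is untouched
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            out[r][c] = fill
    return out


def mix_images(image_a, image_b, alpha=0.5):
    """Mixing: blend two same-sized images as alpha*a + (1 - alpha)*b."""
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[10 * (r * 4 + c) for c in range(4)] for r in range(4)]
    print(horizontal_flip(image))
    print(random_crop(image, 2, 2, rng))
    print(adjust_brightness(image, 120))
    print(random_erase(image, 2, 2, rng))
    print(mix_images(image, horizontal_flip(image)))
```

Each function returns a new image and leaves its input unchanged, so you can decide per sample which transformations to combine rather than stacking all of them.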

## Audio Data Augmentation 
1. Noise injection: add Gaussian or other random noise to the audio to improve model performance.
2. Shifting: shift the audio left (fast forward) or right by a random number of seconds.
3. Changing the speed: stretch the time series by a fixed rate.
4. Changing the pitch: randomly change the pitch of the audio. 
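
As a rough illustration of the first three audio techniques, the sketch below treats a clip as a plain list of float samples. The function names, the default `sigma`, and the zero-padding behavior are assumptions made for this example. Note that the speed change here is naive resampling by linear interpolation, which also alters pitch as a side effect; pitch-preserving time stretching and true pitch shifting need more involved signal processing and are usually taken from an audio library.

```python
import random


def add_gaussian_noise(signal, rng, sigma=0.01):
    """Noise injection: add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def shift(signal, offset):
    """Shift by `offset` samples, padding with silence.

    Positive offsets move the audio right (delay); negative offsets
    move it left (fast forward). The output length is unchanged.
    """
    if offset >= 0:
        return ([0.0] * offset + signal)[:len(signal)]
    k = -offset
    return (signal + [0.0] * k)[k:]


def random_shift(signal, rng, max_offset):
    """Shift by a random number of samples in [-max_offset, max_offset]."""
    return shift(signal, rng.randint(-max_offset, max_offset))


def change_speed(signal, rate):
    """Resample by linear interpolation at a fixed rate.

    rate > 1 shortens the clip (faster); rate < 1 lengthens it (slower).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not signal:
        return []
    n_out = max(1, int(len(signal) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate                      # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    clip = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    print(add_gaussian_noise(clip, rng))
    print(random_shift(clip, rng, max_offset=3))
    print(change_speed(clip, 2.0))
```

To shift by seconds rather than samples, multiply the number of seconds by the sample rate of the recording and round to an integer offset.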

## Advanced Techniques 
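1. Generative adversarial networks (GANs): used to generate new data points or images. Once trained, a GAN produces synthetic samples from random noise instead of transforming an existing example, although training the GAN itself still requires real data.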

# Applications

1. **Healthcare**: For example, in pneumonia classification, you can use random cropping, zooming, stretching, and color space transformations to improve model performance. However, some augmentations can hurt results rather than help. For example, random rotation and reflection along the x-axis are not recommended for X-ray imaging datasets.

2. **Natural Language Processing**: Text data augmentation is generally used when high-quality data is limited and improving the performance metric takes priority. You can apply synonym replacement, word-embedding-based replacement, character swaps, and random insertion and deletion. These techniques are also valuable for low-resource languages.

