# Handling Class Imbalance in Image Classification

Handling class imbalance in image classification involves techniques to ensure that the model doesn't become biased toward the majority class. Here are common approaches:

**1. Data-Level Techniques**

   - **Oversampling the Minority Class**: Duplicate or augment images from the minority classes to increase their representation. Data augmentation techniques (e.g., rotation, cropping, flipping) can help create diverse samples for minority classes without introducing exact duplicates.
   - **Undersampling the Majority Class**: Randomly reduce the number of samples in majority classes to match the minority class size. This is more feasible with larger datasets, though it risks losing important information.
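
Random oversampling can be sketched in a few lines. This is a minimal illustration, assuming the dataset is a list of `(image_path, label)` pairs; the function name `oversample_to_balance` and the file names are hypothetical. In practice the duplicated minority samples would also be passed through augmentation so that they are not exact copies.

```python
import random
from collections import Counter


def oversample_to_balance(samples, seed=0):
    """Randomly duplicate minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for item in samples:
        by_class.setdefault(item[1], []).append(item)
    target = max(len(items) for items in by_class.values())

    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy dataset: 6 "cat" images and 2 "dog" images.
data = [(f"cat_{i}.jpg", "cat") for i in range(6)] + [
    (f"dog_{i}.jpg", "dog") for i in range(2)
]
balanced = oversample_to_balance(data)
print(Counter(label for _, label in balanced))
```

Undersampling is the mirror image: instead of drawing extra minority samples, each class would be randomly reduced to the size of the smallest class.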

**2. Algorithm-Level Techniques**

   - **Class Weights Adjustment**: Many deep learning frameworks allow specifying a weight for each class in the loss function. This penalizes misclassifications of the minority class more than the majority class, encouraging the model to pay more attention to the minority class.
   - **Focal Loss**: Focal loss is designed for class imbalance by dynamically scaling the loss for hard-to-classify examples, typically from minority classes. It modifies the cross-entropy loss by adding a scaling factor that reduces the loss for well-classified examples and focuses on hard examples.

   $$ 
   \text{Focal Loss} = -\alpha (1 - p_t)^\gamma \log(p_t)
   $$

   where \( p_t \) is the predicted probability for the true class, \( \alpha \) is a balancing factor for class imbalance, and \( \gamma \) controls the focus on hard examples.
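
Both ideas can be sketched for a single example. The inverse-frequency weighting scheme and the function names below are assumptions chosen for illustration; `alpha=0.25` and `gamma=2.0` are commonly used starting values, not settings taken from this document.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count), so that
    rarer classes receive larger weights."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {c: total / (num_classes * n) for c, n in class_counts.items()}


def focal_loss(p_t, alpha=0.25, gamma=2.0):
    """Focal loss for one example, where p_t is the predicted
    probability of the true class."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


weights = inverse_frequency_weights({"majority": 900, "minority": 100})
print(weights)

# With alpha=1 and gamma=0 the focal loss reduces to cross-entropy.
for p in (0.9, 0.5, 0.1):
    ce = focal_loss(p, alpha=1.0, gamma=0.0)
    fl = focal_loss(p, alpha=1.0, gamma=2.0)
    print(f"p_t={p}: cross-entropy={ce:.4f}, focal={fl:.4f}")
```

The printed comparison shows the effect of the \( (1 - p_t)^\gamma \) factor: the loss of a well-classified example (\( p_t = 0.9 \)) is scaled down by a factor of 100, while a hard example (\( p_t = 0.1 \)) keeps most of its loss.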

**3. Hybrid and Advanced Techniques**

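   - **Two-Stage Training**: Train the model first on the original, imbalanced data, then fine-tune it on class-balanced data (for example, a balanced subset or class-balanced sampling). The first stage learns general features from all available images, and the second stage increases sensitivity to minority classes. Fine-tuning on the minority class alone is also possible, but it risks degrading performance on the majority classes.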
   - **Synthetic Data Generation**: Use techniques like **Generative Adversarial Networks (GANs)** to generate synthetic images for the minority class. GANs can create realistic, diverse images that augment the dataset.
   - **Self-Supervised Learning**: In self-supervised learning, the model learns from unlabeled data, which can later be fine-tuned on a smaller, balanced labeled dataset, improving minority class recognition.

**4. Evaluation Adjustments**

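   - **Metrics Beyond Accuracy**: Use metrics like precision, recall, F1-score, or area under the ROC curve (AUC) to get a more balanced view of performance on imbalanced data. Accuracy can be misleading: a model that always predicts the majority class reaches 95% accuracy on a 95:5 split while never detecting the minority class. Under severe imbalance, the area under the precision-recall curve is often more informative than ROC AUC.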
   - **Confusion Matrix Analysis**: Reviewing the confusion matrix helps identify if the model is biased toward majority classes, guiding further balancing efforts.
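
The gap between accuracy and per-class metrics is easy to demonstrate. The sketch below uses made-up labels and a hypothetical helper `per_class_metrics`; it assumes a binary problem in which class `1` is the minority.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = per_class_metrics(y_true, y_pred, positive=1)
print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}, F1={f1:.2f}")
```

Accuracy looks strong here, yet minority-class recall and F1 are both zero, which is exactly the bias a confusion matrix would reveal.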

Each technique can be combined depending on the severity of imbalance, dataset size, and model complexity, but balancing data effectively often requires experimenting with several methods.


While both data augmentation and oversampling aim to improve model performance, they address different challenges in machine learning. Data augmentation enhances dataset diversity, whereas oversampling focuses on correcting class imbalance.

# Data Augmentation Techniques in Convolutional Neural Networks (CNNs)

Data augmentation is a crucial technique used to artificially expand the size of a training dataset by applying various transformations to the original data. This helps improve the generalization of CNNs and reduces overfitting. Here are some common data augmentation techniques:

**1. Geometric Transformations**
- **Rotation**: Rotate images by a certain angle.
  - Example: Rotate by 15, 30, or 45 degrees.
  
- **Translation**: Shift images along the x or y axis.
  - Example: Shift images by a few pixels left, right, up, or down.

- **Scaling**: Zoom in or out on images.
  - Example: Scale images to 90% or 110% of their original size.

- **Flipping**: Flip images horizontally or vertically.
  - Example: Horizontal flips are common for many tasks.
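
To make the geometric transformations concrete, here is a minimal sketch that treats a grayscale image as a nested list of pixel values. The function names are illustrative, and the rotation is limited to 90 degrees; arbitrary angles such as 15 or 30 degrees need interpolation and are better left to an image library.

```python
def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift the image dx pixels right and dy pixels down,
    filling the uncovered area with a constant value."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(horizontal_flip(image))        # [[3, 2, 1], [6, 5, 4]]
print(translate(image, dx=1, dy=0))  # [[0, 1, 2], [0, 4, 5]]
print(rotate_90(image))              # [[4, 1], [5, 2], [6, 3]]
```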

**2. Color Space Transformations**
- **Brightness Adjustment**: Change the brightness of images.
  - Example: Increase or decrease brightness by a fixed factor.

- **Contrast Adjustment**: Modify the contrast of images.
  - Example: Enhance or reduce the contrast of images.

- **Saturation Adjustment**: Alter the saturation levels of images.
  - Example: Make images more or less colorful.

- **Hue Adjustment**: Shift the hue of colors in images.
  - Example: Change colors to see how the model reacts to different color variations.
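
A rough sketch of these adjustments, assuming pixel intensities in the range [0, 1]. Brightness and contrast are shown on a grayscale image stored as a nested list, and the hue shift on a single RGB pixel using the standard-library `colorsys` module; all function names are illustrative.

```python
import colorsys


def clamp(value, low=0.0, high=1.0):
    """Keep a pixel value inside the valid intensity range."""
    return max(low, min(high, value))


def adjust_brightness(image, factor):
    """Scale every pixel by factor, clamping the result to [0, 1]."""
    return [[clamp(p * factor) for p in row] for row in image]


def adjust_contrast(image, factor):
    """Move every pixel away from (factor > 1) or toward (factor < 1)
    the mean intensity of the image."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clamp(mean + factor * (p - mean)) for p in row] for row in image]


def shift_hue(rgb, delta):
    """Rotate the hue of one RGB pixel by delta (a fraction of a full turn)."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)


image = [
    [0.2, 0.4],
    [0.6, 0.8],
]
print(adjust_brightness(image, 1.5))
print(adjust_contrast(image, 0.5))
print(shift_hue((1.0, 0.0, 0.0), 0.5))  # red shifted half a turn toward cyan
```

Saturation can be adjusted the same way as hue, by scaling the `s` component before converting back to RGB.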

**3. Noise Injection**
- **Gaussian Noise**: Add random noise to images to make them more robust.
  - Example: Add small Gaussian noise to pixel values.

- **Salt-and-Pepper Noise**: Introduce random white and black pixels.
  - Example: Randomly set a percentage of pixels to maximum or minimum values.
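
Both kinds of noise can be sketched with the standard library, again assuming a grayscale image stored as a nested list with intensities in [0, 1]. The default values for `sigma` and `amount` are arbitrary choices for illustration.

```python
import random


def add_gaussian_noise(image, sigma=0.05, seed=0):
    """Add zero-mean Gaussian noise to every pixel, clamped to [0, 1]."""
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in image
    ]


def add_salt_and_pepper(image, amount=0.1, seed=0):
    """Set roughly `amount` of the pixels to 0.0 (pepper) or 1.0 (salt)."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0.0, 1.0))
    return noisy


image = [[0.5] * 4 for _ in range(4)]
print(add_gaussian_noise(image))
print(add_salt_and_pepper(image, amount=0.25))
```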

**4. Random Erasing**
- **Random Erasing**: Randomly remove sections of an image to make the model learn to focus on different features.
  - Example: Select a random rectangle in the image and set it to a constant value or noise.

**5. Elastic Transformations**
- **Elastic Deformations**: Apply random elastic deformations to images.
  - Example: Distort images to create variations while preserving overall structure.

**6. Cutout**
- **Cutout**: Randomly mask out square regions in images.
  - Example: Set square patches in an image to zero or the mean pixel value.
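
A minimal sketch of Cutout on a grayscale image stored as a nested list. For simplicity it keeps the masked square fully inside the image and assumes `size` does not exceed the image dimensions; random erasing with a constant fill follows the same pattern, with a rectangle instead of a square.

```python
import random


def cutout(image, size, fill=0.0, seed=None):
    """Return a copy of the image with one size x size square,
    placed at a random position, replaced by `fill`."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    masked = [row[:] for row in image]
    for y in range(top, top + size):
        for x in range(left, left + size):
            masked[y][x] = fill
    return masked


image = [[1.0] * 5 for _ in range(5)]
for row in cutout(image, size=2, seed=1):
    print(row)
```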

**7. Mixup**
- **Mixup**: Create new training examples by mixing two images and their corresponding labels.
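  - Example: For images A and B with labels \(y_A\) and \(y_B\), create a new image and a new label
  $$
  \text{Image}_{new} = \lambda \cdot \text{Image}_A + (1 - \lambda) \cdot \text{Image}_B, \qquad y_{new} = \lambda \cdot y_A + (1 - \lambda) \cdot y_B
  $$ 
  where \( \lambda \) is a random value between 0 and 1 (commonly drawn from a Beta distribution) and the labels are one-hot vectors.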
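
The mixing step can be sketched as follows, assuming grayscale images stored as nested lists and one-hot label vectors. Drawing \( \lambda \) from a Beta distribution with parameter 0.2 is an assumption made for this example, not a value from the text.

```python
import random


def mixup(image_a, label_a, image_b, label_b, lam):
    """Blend two images and their one-hot labels with weight lam."""
    image = [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return image, label


image_a = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical image of class 0
image_b = [[0.0, 0.0], [0.0, 0.0]]  # hypothetical image of class 1
label_a, label_b = [1.0, 0.0], [0.0, 1.0]

rng = random.Random(0)
lam = rng.betavariate(0.2, 0.2)  # assumed Beta(0.2, 0.2) prior on lambda
mixed_image, mixed_label = mixup(image_a, label_a, image_b, label_b, lam)
print(lam, mixed_label)
```

The mixed label is a soft target, so the training loss has to accept probability vectors rather than integer class indices.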

**8. Random Cropping**
- **Random Cropping**: Randomly crop images to create variations in scale and aspect ratio.
  - Example: Crop a random section of the original image.
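
A random crop only needs a random top-left corner. The sketch below assumes a nested-list image and a crop size no larger than the image; in a typical pipeline the crop would then be resized back to the network's input size.

```python
import random


def random_crop(image, crop_height, crop_width, seed=None):
    """Return a crop_height x crop_width window at a random position."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_height)
    left = rng.randint(0, width - crop_width)
    return [row[left:left + crop_width] for row in image[top:top + crop_height]]


# Each pixel encodes its own position as 10 * row + column.
image = [[10 * y + x for x in range(6)] for y in range(5)]
for row in random_crop(image, crop_height=3, crop_width=4, seed=0):
    print(row)
```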

**Conclusion**
Data augmentation helps increase the diversity of the training dataset, making CNNs more robust and improving their performance on unseen data. Many deep learning frameworks (like TensorFlow and PyTorch) provide built-in support for these augmentation techniques.


# One Shot Learning

**One Shot Learning** is a machine learning approach that enables a recognition system to identify or classify objects from a single example image. This is particularly challenging in face recognition, because deep learning models traditionally need large datasets to perform well.

**Definition**
- **One Shot Learning**: A recognition system can recognize a person by learning from just one image.

**Challenges**
Historically, deep learning has not performed well when the amount of training data is small. One Shot Learning addresses this challenge by learning a **similarity function** rather than traditional classification.

**Similarity Function**
To evaluate the similarity between two images, we define a function \( d \):
$$
d(\text{img1}, \text{img2}) = \text{degree of difference between img1 and img2}
$$
Where:
- **img1** and **img2** are the images being compared.
- **d** outputs a value representing how different the images are: small for similar images, large for dissimilar ones.

**Key Points:**
- A lower value of \( d \) indicates that the images are likely of the same person (i.e., faces are similar).
- We introduce a threshold \( T \) to make a decision:
$$
\text{If } d(\text{img1}, \text{img2}) \leq T \text{, then the faces are considered the same.}
$$
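
The decision rule can be sketched as follows. The sketch assumes each image has already been converted into an embedding vector by some encoder network (not shown), and it uses Euclidean distance for \( d \); the embedding values and the threshold are made up for illustration.

```python
import math


def d(embedding_1, embedding_2):
    """Degree of difference: Euclidean distance between two embeddings."""
    return math.dist(embedding_1, embedding_2)


def same_person(embedding_1, embedding_2, threshold):
    """Apply the rule d(img1, img2) <= T."""
    return d(embedding_1, embedding_2) <= threshold


# Hypothetical embeddings: one stored reference image and two new photos.
reference = [0.10, 0.80, 0.30]
same_face = [0.12, 0.79, 0.33]
other_face = [0.90, 0.10, 0.50]
T = 0.5  # assumed threshold; in practice tuned on validation pairs

print(same_person(reference, same_face, T))
print(same_person(reference, other_face, T))
```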

**Advantages of One Shot Learning**
- **Efficiency**: It allows for effective recognition with minimal training data, which is crucial in scenarios where data collection is limited.
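- **Robustness**: Because the system compares images instead of classifying into a fixed set of identities, the similarity function can generalize to new inputs: a new person can be recognized from a single stored reference image, without retraining the model.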

**Conclusion**
One Shot Learning provides a solution to the challenge of recognizing individuals from very limited data. By focusing on learning a similarity function, it allows for effective face recognition even with just a single example image.


# Triplet Loss

Triplet loss is a loss function commonly used in deep learning, particularly in tasks involving similarity learning, such as face recognition and image retrieval. It aims to ensure that the distance between an anchor sample and a positive sample (similar) is smaller than the distance between the anchor sample and a negative sample (dissimilar) by a predefined margin. 

**Definition**
- Given three inputs: an anchor $x_a$, a positive sample $x_p$ (similar to the anchor), and a negative sample $x_n$ (dissimilar to the anchor), the triplet loss can be defined as:

$$
L(x_a, x_p, x_n) = \max(0, d(x_a, x_p) - d(x_a, x_n) + \alpha)
$$

where:
-  $d(x_i, x_j)$ is a distance metric (e.g., Euclidean distance) between samples $x_i$ and $x_j$,
-  $\alpha$ is the margin that is enforced between positive and negative pairs.
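
A worked sketch of the formula for a single triplet, using Euclidean distance for $d$. The embedding values and the margin of 0.5 are assumptions chosen so that the arithmetic is easy to follow.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # distance 1.0 from the anchor
easy_negative = [0.0, 3.0]  # distance 3.0 from the anchor
hard_negative = [0.0, 1.2]  # distance 1.2 from the anchor

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

The easy negative is already farther away than the positive by more than the margin, so its loss is zero and it contributes no gradient. The hard negative violates the margin and yields a loss of about 0.3, which is why triplet selection matters so much during training.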

**Importance for CNNs**
- **Learning Discriminative Features**: Triplet loss helps CNNs learn embeddings in which samples of different classes are well separated while samples of the same class are pulled closer together in the feature space. This is particularly useful in applications where distinguishing between classes is challenging.
- **Robustness to Variations**: It provides a robust mechanism for the model to learn invariant features despite variations in pose, lighting, or other conditions, making it suitable for real-world applications.

**Applications of Triplet Loss**
1. **Face Recognition**: In face recognition systems, triplet loss can be used to ensure that images of the same person are close in the embedding space, while images of different people are far apart.
2. **Image Retrieval**: For systems that retrieve images based on similarity, triplet loss helps improve the ranking of images based on user queries.
3. **Object Tracking**: In object tracking, triplet loss can help to distinguish the target object from background clutter or other objects.
4. **Speaker Verification**: In audio processing, triplet loss can be applied to ensure that recordings of the same speaker are closer together than recordings from different speakers.

By training a CNN with triplet loss, the model learns embeddings in which distance reflects class membership, which can improve accuracy and robustness when distinguishing between classes.



# EDA Questions for CV (Image Data)

1. What are the common dimensions of the images (width, height)?
2. How does the aspect ratio vary across the dataset?
3. What is the color distribution across images?
4. Are there differences in brightness or contrast among the images?
5. What is the edge distribution in the images (sharp vs. smooth regions)?
6. What common textures or patterns are present in different image categories?
7. Is there a class imbalance in the number of images per category?
8. What are the most common objects detected in the images?
9. Are there patterns in metadata, such as capture date, location, or resolution?
10. Do images have similarities in background, lighting, or occlusion within classes? 
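
Several of these questions (image dimensions, aspect ratio, class balance) can be answered from metadata alone. The sketch below assumes the metadata has already been collected into `(width, height, label)` records; the records and the helper name `summarize_dataset` are hypothetical.

```python
from collections import Counter


def summarize_dataset(records):
    """Summarize image sizes, aspect ratios, and class balance
    from (width, height, label) records."""
    sizes = Counter((w, h) for w, h, _ in records)
    ratios = [w / h for w, h, _ in records]
    classes = Counter(label for _, _, label in records)
    return {
        "most_common_size": sizes.most_common(1)[0][0],
        "aspect_ratio_range": (min(ratios), max(ratios)),
        "class_counts": dict(classes),
        "imbalance_ratio": max(classes.values()) / min(classes.values()),
    }


# Hypothetical metadata for four images.
records = [
    (640, 480, "cat"),
    (640, 480, "cat"),
    (640, 480, "cat"),
    (480, 640, "dog"),
]
for key, value in summarize_dataset(records).items():
    print(f"{key}: {value}")
```

Pixel-level questions such as color distribution, brightness, or edge statistics require loading the images themselves and are not covered by this sketch.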
