## Categorical Focal Loss
In this assignment we will implement a categorical focal loss function with "L1" and "L2" regularization for multi-class classification problems.\
Focal loss has several applications in problems with imbalanced datasets, such as object detection.
You can learn more about this loss function here:
https://medium.com/swlh/focal-loss-what-why-and-how-df6735f26616

In [6]:
import numpy
import tensorflow as tf
import tensorflow_datasets as tfds
print("NumPy version:", numpy.__version__)
print("TensorFlow version:", tf.__version__)
print("Available GPUs:", tf.config.list_physical_devices('GPU'))


NumPy version: 2.1.3
TensorFlow version: 2.19.0
Available GPUs: []


***

**Focal Loss** is a well-known loss function designed for **imbalanced classification problems**, especially in fields like **object detection** (for example, in the RetinaNet model). The sections below break it down simply and precisely:

***

## ✅ Main Issue: Class Imbalance

In many problems (like cancer detection or object recognition in images), positive class samples are much fewer than negative class samples. For example:

- 99% of the data is “non-cancer”
- 1% of the data is “cancer”

A model that always predicts “non-cancer” will still have 99% accuracy—but is completely useless in practice.
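
The accuracy trap described above can be checked with a few lines of plain Python. This is a minimal sketch on a hypothetical toy dataset of 1000 samples (the labels and the "always negative" predictor are made up for illustration); it reports accuracy next to recall on the positive class.

```python
# Hypothetical toy dataset matching the 99% / 1% split above.
labels = [1] * 10 + [0] * 990          # 1 = cancer, 0 = non-cancer
predictions = [0] * len(labels)        # a "model" that always predicts non-cancer

# Accuracy: fraction of all samples predicted correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the positive class: fraction of cancer cases that were found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```

Accuracy comes out high while recall on the minority class is zero, which is why accuracy alone is a poor guide on imbalanced data.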

***

## ❗ What’s the problem with Binary Cross-Entropy?

The BCE loss function does not down-weight easy samples (which the model already predicts correctly with high confidence) relative to hard samples (which the model predicts incorrectly): every sample contributes its full loss to the total.  
In an imbalanced dataset, there are many easy samples → their small losses add up and dominate the total → the minority class gets ignored.
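
To make the "many easy samples" effect concrete, here is a rough sketch with hypothetical numbers: 990 easy samples whose true class gets probability 0.95, and 10 hard samples whose true class gets probability 0.10. The counts and probabilities are assumptions chosen only for illustration.

```python
import math

# Hypothetical batch: 990 easy samples (p_t = 0.95) and 10 hard ones (p_t = 0.10).
easy_count, easy_p_t = 990, 0.95
hard_count, hard_p_t = 10, 0.10

# Cross-entropy for one sample is -log(p_t); sum it over each group.
easy_total = easy_count * -math.log(easy_p_t)
hard_total = hard_count * -math.log(hard_p_t)
easy_share = easy_total / (easy_total + hard_total)

print(f"easy: {easy_total:.1f}, hard: {hard_total:.1f}, easy share: {easy_share:.0%}")
```

Each easy sample has a small loss, yet together the easy samples account for roughly two thirds of the total in this setup, so they also dominate the gradient.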

***

## ✅ Solution: Focal Loss

### 📌 Focal Loss Formula (binary):

For a single sample:

$$
FL(p_t) = - \alpha_t \cdot (1 - p_t)^\gamma \cdot \log(p_t)
$$

#### Parameter Explanations:

- $$ p_t $$: The predicted probability for the true class (if y=1 → p, if y=0 → 1−p)
- $$ \alpha_t $$: Weight for each class (to address class imbalance)
- $$ \gamma $$: **Focusing parameter** – controls how much the loss focuses on hard samples
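
The formula and its three parameters can be sketched in plain Python as follows. The function name `binary_focal_loss`, the default values, and the clipping constant `eps` are assumptions made for this illustration, not part of the assignment.

```python
import math

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one sample; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)             # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class weight for the true class
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(p=0.9, y=1)   # confident and correct
hard = binary_focal_loss(p=0.1, y=1)   # confident and wrong
print(easy, hard)
```

The hard sample ends up with a loss several orders of magnitude larger than the easy one, which is the behaviour the formula is designed to produce.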

***

## ✅ Role of γ (gamma)

- If $$ \gamma = 0 $$, focal loss is just ordinary cross-entropy.
- The larger $$ \gamma $$ is, the less the loss is affected by **easy samples** (those the model already predicts well).
- This forces the model to focus on **hard samples**.
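
A small sketch of how `gamma` changes the balance between an easy sample (true-class probability 0.9) and a hard one (0.1). The helper names `focal` and `cross_entropy` and the probabilities are assumptions for illustration; the class weight is fixed at 1 so that only the focusing effect is visible.

```python
import math

def cross_entropy(p_t):
    return -math.log(p_t)

def focal(p_t, gamma, alpha_t=1.0):
    # (1 - p_t) ** gamma is the modulating factor that shrinks easy-sample losses.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

for gamma in (0.0, 1.0, 2.0, 5.0):
    easy = focal(0.9, gamma)
    hard = focal(0.1, gamma)
    print(f"gamma={gamma}: easy={easy:.6f}, hard={hard:.6f}, hard/easy={hard / easy:.1f}")
```

With `gamma = 0` the values match plain cross-entropy; as `gamma` grows, the easy-sample loss shrinks much faster than the hard-sample loss, so the hard/easy ratio increases.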

***

## ✅ Role of α (alpha)

- A number between 0 and 1 that helps emphasize the minority class.
- For example, if class 1 is the minority, you might set $$ \alpha = 0.75 $$ for class 1 and $$ 1 - \alpha = 0.25 $$ for class 0.
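
The weighting in the example above can be sketched like this, with `gamma` left out so that only the effect of `alpha` is visible. The helper name `alpha_weighted_ce` and the probability 0.6 are assumptions for illustration.

```python
import math

def alpha_weighted_ce(p_t, y, alpha=0.75):
    # Class 1 (minority) gets weight alpha, class 0 gets 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * math.log(p_t)

# Same true-class probability for both samples; only the class differs.
minority = alpha_weighted_ce(p_t=0.6, y=1)
majority = alpha_weighted_ce(p_t=0.6, y=0)
print(minority / majority)
```

For equally confident predictions, the minority-class sample counts about three times as much as the majority-class sample (0.75 / 0.25).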

***

## ✳️ Simple Example

Suppose the model predicts 0.95 probability for the correct class (i.e., it’s an easy example):

- With **cross-entropy**, you get a low loss, but it still enters the total at full weight
- With focal loss and $$ \gamma = 2 $$, the loss is scaled by $$ (1 - 0.95)^2 = 0.0025 $$ and becomes much smaller, so the model pays less attention to this easy sample
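
The numbers behind this example can be worked out directly. This sketch assumes a class weight of 1 so that only the modulating factor matters.

```python
import math

p_t = 0.95      # predicted probability of the correct class
gamma = 2.0

ce = -math.log(p_t)                        # ordinary cross-entropy
modulating_factor = (1.0 - p_t) ** gamma   # (1 - p_t)^gamma
fl = modulating_factor * ce                # focal loss with alpha_t = 1

print(f"CE = {ce:.4f}, factor = {modulating_factor:.4f}, FL = {fl:.6f}")
# prints: CE = 0.0513, factor = 0.0025, FL = 0.000128
```

The focal loss for this sample is about 400 times smaller than its cross-entropy.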

***

## 📈 Common Applications of Focal Loss:

- Object Detection (RetinaNet)
- Imbalanced Binary Classification (e.g., fraud detection, cancer classification)
- Face verification, anomaly detection

***

## ✅ Simple Summary

| Case    | Explanation                                                                    |
|---------|--------------------------------------------------------------------------------|
| Issue   | Data imbalance makes the model focus on learning easy/majority samples          |
| Solution| Focal loss **down-weights** easy samples                                       |
| γ (gamma) | Controls the focus on hard samples                                           |
| α (alpha) | Balances classes in case of imbalance                                        |

***



The focal loss and the regularization terms used in this assignment are listed below, one equation per block.

Focal Loss Formula:

$$
FL(y_{true}, y_{pred}) = - \alpha \cdot y_{true} \cdot (1 - y_{pred})^{\gamma} \cdot \log(y_{pred})
$$

$$
l1(y_{true}, y_{pred}) = \sum |y_{pred}|
$$

$$
l2(y_{true}, y_{pred}) = \sum (y_{pred})^2
$$

$$
\mathrm{total\_loss} = FL + l1_w \cdot l1 + l2_w \cdot l2
$$
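
Before moving to TensorFlow, the four equations can be checked by hand on a tiny batch. The following is a plain-Python sketch, not the assignment solution: the function name `total_loss`, the clipping constant `eps`, and the sample values are assumptions. It sums the focal term over classes, averages over the batch, and sums the L1 and L2 terms over all entries, which is the same reduction the Keras class in the next cells uses.

```python
import math

def total_loss(y_true, y_pred, alpha=0.25, gamma=2.0, l1_w=0.0, l2_w=0.0, eps=1e-7):
    per_sample = []
    l1 = 0.0
    l2 = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        sample_loss = 0.0
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
            sample_loss += -alpha * t * (1.0 - p) ** gamma * math.log(p)
            l1 += abs(p)                      # L1 term over all entries
            l2 += p ** 2                      # L2 term over all entries
        per_sample.append(sample_loss)        # focal loss summed over classes
    fl = sum(per_sample) / len(per_sample)    # mean over the batch
    return fl + l1_w * l1 + l2_w * l2

# Two one-hot labels and two hypothetical softmax outputs.
y_true = [[1.0, 0.0], [0.0, 1.0]]
y_pred = [[0.9, 0.1], [0.4, 0.6]]
print(total_loss(y_true, y_pred))
print(total_loss(y_true, y_pred, l1_w=0.01, l2_w=0.01))
```

One thing this makes visible: when each row of `y_pred` sums to 1 (as softmax outputs do), the L1 term equals the number of samples regardless of the predictions, so only the L2 term actually varies.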


In [7]:
import tensorflow as tf

class CategoricalFocalLoss(tf.keras.losses.Loss):
    def __init__(self, alpha=0.25, gamma=2.0, l1=0.0, l2=0.0, **kwargs):
        """
        Parameters:
         - alpha: class balancing factor (usually for imbalanced classes)
         - gamma: focusing parameter for hard samples
         - l1: L1 regularization weight on the predictions (y_pred)
         - l2: L2 regularization weight on the predictions (y_pred)
        """
        super(CategoricalFocalLoss, self).__init__(**kwargs)
        self.alpha = alpha          # class weighting
        self.gamma = gamma          # focus on hard samples
        self.l1_weight = l1         # L1 coefficient
        self.l2_weight = l2         # L2 coefficient

    def call(self, y_true, y_pred):
        # Clip predictions to avoid log(0)
        y_pred = tf.clip_by_value(y_pred, clip_value_min=1e-7, clip_value_max=1 - 1e-7)
        focal = -self.alpha * y_true * tf.pow(1 - y_pred, self.gamma) * tf.math.log(y_pred)
        focal_class = tf.reduce_sum(focal, axis=1)        # sum over classes
        focal_class_batch = tf.reduce_mean(focal_class)   # mean over the batch

        l1 = tf.math.reduce_sum(tf.math.abs(y_pred))
        l2 = tf.math.reduce_sum(tf.math.square(y_pred))
        return focal_class_batch + l1 * self.l1_weight + l2 * self.l2_weight

In [None]:
class CategoricalFocalLoss(tf.keras.losses.Loss):
    def __init__(self, alpha=0.25, gamma=2.0, l1=0.0, l2=0.0, **kwargs):
        """
        Parameters:
         - alpha: Class balancing factor (usually for imbalanced classes)
         - gamma: Focusing parameter for hard samples
         - l1: L1 regularization weight on predictions (y_pred)
         - l2: L2 regularization weight on predictions (y_pred)
        """
        super(CategoricalFocalLoss, self).__init__(**kwargs)
        self.alpha = alpha           # Class weighting
        self.gamma = gamma           # Focus on hard samples
        self.l1_weight = l1          # L1 coefficient
        self.l2_weight = l2          # L2 coefficient

    def call(self, y_true, y_pred):
        """
        Computes the final loss, including:
          1. Multiclass focal loss
          2. L1 regularization on y_pred
          3. L2 regularization on y_pred
        """
        # 1. Prevent log(0)
        #    To avoid log(0) and prevent NaNs
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)

        # 2. Compute focal loss term:
        #    -y_true * (1 - y_pred)^gamma * log(y_pred)
        #    Then multiply by alpha and sum across classes
        #    Shape: [batch_size, num_classes]
        focal_term = - self.alpha * y_true * tf.pow(1.0 - y_pred, self.gamma) * tf.math.log(y_pred)
        # Sum across the class axis → vector of [batch_size]
        focal_loss_per_sample = tf.reduce_sum(focal_term, axis=1)
        # Mean over the whole batch
        focal_loss = tf.reduce_mean(focal_loss_per_sample)

        # 3. L1 regularization:
        #    Sum of absolute values of y_pred over entire batch and all classes
        #    L1 = sum(|y_pred|)
        l1_term = tf.reduce_sum(tf.abs(y_pred))

        # 4. L2 regularization:
        #    Sum of squares of y_pred over entire batch and all classes
        #    L2 = sum((y_pred)^2)
        l2_term = tf.reduce_sum(tf.square(y_pred))

        # 5. Combine all components:
        #    total_loss = focal_loss + l1_weight * L1 + l2_weight * L2
        total_loss = (
            focal_loss
            + self.l1_weight * l1_term
            + self.l2_weight * l2_term
        )

        return total_loss


In [9]:

def build_model(dense_units, input_shape=(224, 224) + (3,)):
  model = tf.keras.models.Sequential([
      tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=input_shape),
      tf.keras.layers.MaxPooling2D(2, 2),
      tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
      tf.keras.layers.MaxPooling2D(2, 2),
      tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
      tf.keras.layers.MaxPooling2D(2, 2),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(dense_units, activation='relu'),
      tf.keras.layers.Dense(2, activation='softmax')
  ])
  return model

In [None]:
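import os
import tensorflow as tf
import tensorflow_datasets as tfds

# 1) Skip the TFDS GCS bucket (note: this is a private attribute and may change
#    between TFDS versions); download=True below fetches the dataset if it is
#    not already present in data_dir
tfds.core.utils.gcs_utils._is_gcs_disabled = True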
data_dir = os.path.normpath('C:/Users/AERO/tensorflow_datasets')
dataset = tfds.load('cats_vs_dogs', split='train', data_dir=data_dir, download=True)


# 2) Preprocessing
def preprocess(features):
    img = tf.image.resize(features['image'], (224, 224))
    img = tf.cast(img, tf.float32) / 255.
    label = tf.one_hot(features['label'], 2)
    return img, tf.cast(label, tf.float32)

dataset = dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
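# Cache before shuffling so that each epoch gets a fresh shuffle order
# (caching after shuffle would replay the first epoch's order every time)
dataset = dataset.cache().shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)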

# 3) Build, compile, fit
model = build_model(dense_units=256)  
model.compile(
    optimizer='adam',
    loss=CategoricalFocalLoss(),
    metrics=['accuracy']
)

model.fit(dataset, epochs=10)


In [3]:
import tensorflow as tf
print("TensorFlow Version:", tf.__version__)
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print("TensorFlow is using GPU:", tf.test.is_gpu_available())

TensorFlow Version: 2.10.0
Num GPUs Available: 1
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
TensorFlow is using GPU: True


In [1]:
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print("✅ GPU detected:", gpus)
else:
    print("❌ No GPU found")


✅ GPU detected: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [2]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("Available GPUs:", tf.config.list_physical_devices('GPU'))


TensorFlow version: 2.10.0
Available GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
