<div style="text-align: center;">
    <a href="https://www.dataia.eu/">
        <img border="0" src="https://github.com/ramp-kits/template-kit/raw/main/img/DATAIA-h.png" width="90%"></a>
</div>

# Template Kit for RAMP challenge

<i> Thomas Moreau (Inria) </i>

## Introduction

Describe the challenge, in particular:

- Where the data comes from?
- What is the task this challenge aims to solve?
- Why does it matter?

# Exploratory data analysis

The goal of this section is to show what's in the data, and how to play with it.
This is the first set in any data science project, and here, you should give a sense of the data the participants will be working with.

You can first load and describe the data, and then show some interesting properties of it.

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data

import problem
X, y = problem.get_train_data()

  from .autonotebook import tqdm as notebook_tqdm
!!!!!!!!!!!!megablocks not available, using torch.matmul instead
<All keys matched successfully>


In [2]:
X_test, y_test = problem.get_test_data()

<All keys matched successfully>


In [3]:
#convert to tensors
import torch
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)

In [4]:
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset

dataset = TensorDataset(X, y)

# Create DataLoader
batch_size = 64
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)


In [5]:
from submissions.starting_kit import CondImageGenerator
model = CondImageGenerator.ConditionalVAE(input_channels=3, hidden_dim_enc=128, hidden_dim_dec=128,
                 latent_dim=128, n_layers_enc=4, n_layers_dec=4,
                 condition_dim=768, image_size=128, cond_new_dim=768, device='cuda')
model.fit(dataloader)

Epoch 1/1: 100%|██████████| 175/175 [00:10<00:00, 16.12it/s]

Loss: 0.2406





In [6]:
dataset_test = TensorDataset(X_test, y_test)

# Create DataLoader
batch_size = 64
dataloader_test = DataLoader(dataset_test, batch_size=batch_size, shuffle=False)

In [7]:
result = model.predict(dataloader_test)

100%|██████████| 26/26 [00:00<00:00, 63.04it/s]


In [10]:
from ramp_custom.clip_score import CLIPScore
score_type = CLIPScore()
score = score_type(X_test, result)
print(score)

284.28366


In [None]:
!ramp-test --submission submissions/starting_kit --quick-test

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "c:\Users\Akshita Kumar\capstone\Scripts\ramp-test.exe\__main__.py", line 7, in <module>
  File "C:\Users\Akshita Kumar\capstone\Lib\site-packages\rampwf\utils\cli\testing.py", line 117, in start
    main()
  File "C:\Users\Akshita Kumar\capstone\Lib\site-packages\click\core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Akshita Kumar\capstone\Lib\site-packages\click\core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\Akshita Kumar\capstone\Lib\site-packages\click\core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Akshita Kumar\capstone\Lib\site-packages\click\core.py", line 788, in invoke
    return __callback(*args, **kwargs

# Challenge evaluation

A particularly important point in a challenge is to describe how it is evaluated. This is the section where you should describe the metric that will be used to evaluate the participants' submissions, as well as your evaluation strategy, in particular if there is some complexity in the way the data should be split to ensure valid results.

In [None]:
import numpy as np
from numpy.linalg import norm
from rampwf.score_types.base import BaseScoreType

class CLIPScore(BaseScoreType):
    """
    CLIP Score for text-conditioned image generation.
    
    This metric computes the cosine similarity between the image and text embeddings.
    A higher similarity indicates a better match.
    """
    is_lower_the_better = False  # Higher scores are better
    minimum = -1.0  # Cosine similarity ranges from -1 to 1
    maximum = 1.0
    
    def __init__(self, name='clip score', precision=2):
        self.name = name
        self.precision = precision

    def __call__(self, text_embeddings, image_embeddings):
        """
        Compute the CLIP score between text embeddings and image embeddings.

        Parameters
        ----------
        text_embeddings : numpy array
            Text embeddings extracted from CLIP. Shape: (n_samples, embedding_dim).
        image_embeddings : numpy array
            Image embeddings extracted from CLIP. Shape: (n_samples, embedding_dim).

        Returns
        -------
        float
            The averaged CLIP score (cosine similarity between text and image embeddings).
        """
        #if text_embeddings.shape != image_embeddings.shape:
         #   raise ValueError("Text and image embeddings must have the same shape.")
         # Ensure text and image embeddings have the same shape
        n_samples, channels, height, width = image_embeddings.shape
        image_embeddings_flat = image_embeddings.reshape(n_samples, -1)  # Flatten to (n_samples, channels * height * width)

        # Normalize both embeddings to unit vectors
        text_norm = text_embeddings / (norm(text_embeddings, axis=1, keepdims=True) + 1e-8)
        text_norm = text_norm.numpy()
        image_norm = image_embeddings_flat / (norm(image_embeddings_flat, axis=1, keepdims=True) + 1e-8)
        image_norm = image_norm

        # Compute pairwise cosine similarity
        cos_sim = np.sum(text_norm * image_norm, axis=1)

        # Average over all samples
        clip_score = np.mean(cos_sim)
        print(f"CLIP score: {clip_score:.{self.precision}f}")
        return clip_score

AttributeError: module 'clip' has no attribute 'load'

In [7]:
dataset_test = TensorDataset(X_test, y_test)

# Create DataLoader
batch_size = 64
dataloader_test = DataLoader(dataset_test, batch_size=batch_size, shuffle=False)

In [8]:
result = model.predict(dataloader_test)

100%|██████████| 26/26 [00:00<00:00, 33.23it/s]


In [75]:
score_type = CLIPScore()
score = score_type(X_test, result)
print(score)

  text_norm = text_embeddings / (norm(text_embeddings, axis=1, keepdims=True) + 1e-8)


ValueError: operands could not be broadcast together with shapes (1655,768) (1655,49152) 

In [34]:
result.shape

(1655, 3, 128, 128)

In [36]:
X_test.shape

torch.Size([1655, 768])

In [42]:
score_type = CLIPScore()
score = score_type(X_test, result)
print(score)

  text_norm = text_embeddings / (norm(text_embeddings, axis=1, keepdims=True) + 1e-8)


ValueError: operands could not be broadcast together with shapes (1655,768) (1655,3,128,128) 

In [87]:
import numpy as np
from scipy.linalg import sqrtm
from rampwf.score_types.base import BaseScoreType

class FID(BaseScoreType):
    """
    Fréchet Inception Distance (FID) for image generation.
    
    This metric computes the difference between the statistics of the
    generated images and the real images. Lower values indicate that the
    generated images are closer to the real ones.
    
    Note: This implementation flattens images and computes covariances
    directly. For more robust evaluations, it is common to extract features
    using a pre-trained network such as Inception.
    """
    is_lower_the_better = True
    minimum = 0.0
    maximum = float('inf')
    
    def __init__(self, name='FID', precision=2):
        self.name = name
        self.precision = precision

    def __call__(self, y_true, y_pred):
        """
        Compute the FID between ground truth images and generated images.
        
        Parameters
        ----------
        y_true : numpy array
            Ground truth images. Shape: (n_samples, height, width, channels).
        y_pred : numpy array
            Generated images. Shape: (n_samples, height, width, channels).
        
        Returns
        -------
        float
            The computed FID score.
        """
        # Flatten images to vectors
        y_true_flat = y_true.reshape(y_true.shape[0], -1)
        y_pred_flat = y_pred.reshape(y_pred.shape[0], -1)
        
        # Compute mean vectors
        mu_true = np.mean(y_true_flat, axis=0)
        mu_pred = np.mean(y_pred_flat, axis=0)
        
        # Compute covariance matrices
        sigma_true = np.cov(y_true_flat, rowvar=False)
        sigma_pred = np.cov(y_pred_flat, rowvar=False)
        
        # Compute squared difference between means
        diff = mu_true - mu_pred
        diff_squared = np.sum(diff**2)
        
        # Compute the square root of the product of covariance matrices
        covmean = sqrtm(sigma_true.dot(sigma_pred))
        if np.iscomplexobj(covmean):
            covmean = covmean.real
        
        fid = diff_squared + np.trace(sigma_true + sigma_pred - 2 * covmean)
        return fid


In [89]:
result.shape

(1655, 3, 128, 128)

In [None]:
FID_score = FID()
FID_score(y_test.numpy(), result)

# Submission format

Here, you should describe the submission format. This is the format the participants should follow to submit their predictions on the RAMP plateform.

This section also show how to use the `ramp-workflow` library to test the submission locally.

## The pipeline workflow

The input data are stored in a dataframe. To go from a dataframe to a numpy array we will use a scikit-learn column transformer. The first example we will write will just consist in selecting a subset of columns we want to work with.

In [2]:
# %load submissions/starting_kit/estimator.py

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


def get_estimator():
    pipe = make_pipeline(
        StandardScaler(),
        LogisticRegression()
    )

    return pipe


## Testing using a scikit-learn pipeline

In [None]:
from sklearn.model_selection import cross_val_score

scores = cross_val_score(get_estimator(), X_df, y, cv=5, scoring='accuracy')
print(scores)

[0.97222222 0.96527778 0.97212544 0.95121951 0.96167247]


## Submission

To submit your code, you can refer to the [online documentation](https://paris-saclay-cds.github.io/ramp-docs/ramp-workflow/stable/using_kits.html).