# ECHO: Quickstart 

This notebook walks you through loading **ECHO**.

## Loading the image_to_image split:
### 1) Download images
First, run the helper script to fetch images from Twitter/X. This will save files under `data/image_to_image/`:

```bash
python download_script.py
```

**IMPORTANT**: The downloaded images are subject to Twitter/X’s Terms & Policies. You are responsible for ensuring your own compliance and for any actions you take with this data.
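
As a quick sanity check after the download, you may want to count how many files actually landed on disk. The snippet below is a minimal sketch, not part of the ECHO tooling: `count_downloaded_files` is a hypothetical helper, and it only assumes the default output folder named above.

```python
from pathlib import Path


def count_downloaded_files(folder="data/image_to_image"):
    """Return the number of regular files found under `folder`, recursively.

    Hypothetical helper for illustration; returns 0 if the folder is missing.
    """
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(1 for path in root.rglob("*") if path.is_file())


# Prints 0 if the download has not been run yet.
print(count_downloaded_files())
```

A count that is lower than you expect usually means some posts were no longer reachable when the script ran.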

### 2) A note on content drift

Since images are fetched from the public web, individual files may change or disappear over time. To make experiments reproducible, we released CLIP embeddings for the reference images. You can verify your local copies against the references with:

```bash
python verify_images.py --folder "data/image_to_image" --embeddings "reference_embeddings.json"
```

This prints a summary of the verification results.
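
To give an intuition for what an embedding-based check can look like, here is a minimal sketch that compares local embeddings against reference embeddings by cosine similarity. It does not reproduce `verify_images.py`: the dict-of-vectors layout, the `find_mismatches` helper, and the `0.95` threshold are all assumptions made for illustration, and the layout of `reference_embeddings.json` may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_mismatches(local, reference, threshold=0.95):
    """Return keys that are missing locally or fall below the similarity threshold.

    `local` and `reference` map an image key to an embedding vector
    (assumed layout; the threshold is an arbitrary illustrative value).
    """
    mismatched = []
    for key, ref_vec in reference.items():
        vec = local.get(key)
        if vec is None or cosine_similarity(vec, ref_vec) < threshold:
            mismatched.append(key)
    return mismatched


# Toy data: "a" matches, "b" has drifted, "c" is missing locally.
reference = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
local = {"a": [2.0, 0.0], "b": [1.0, 0.0]}
print(find_mismatches(local, reference))
```

Cosine similarity ignores vector length, so a rescaled copy of the same embedding still counts as a match.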

If you have problems downloading the data, please **contact us** at [echo-bench@googlegroups.com](mailto:echo-bench@googlegroups.com).

```python
from torchvision import transforms
from torch.utils.data import DataLoader
from echo_bench_i2i import EchoBenchHFImageToImage, echo_bench_i2i_collate
from datasets import load_dataset

# load the image-to-image split
ds = EchoBenchHFImageToImage(
    repo_id="echo-bench/echo2025",
    name="image_to_image",
    split="test",
)

print(len(ds), ds[0]["id"], len(ds[0]["input_images"]))

loader = DataLoader(ds, batch_size=4, shuffle=False, collate_fn=echo_bench_i2i_collate)
batch = next(iter(loader))
print(batch["id"], len(batch["input_images"][0]))
```


```text
710 ivxXSVAkagWfT6HvE3F6aR 1
['ivxXSVAkagWfT6HvE3F6aR', '6aW6H3s3HEuBP6MB55FoH4', 'HEx2VuyZ4brzkAo8LxNUBm', '5y8E8Jwez26MRjyngqJcc2'] 1
```
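
The printed batch shows `batch["id"]` as a plain list of strings and `batch["input_images"][0]` as a sequence with its own length. That is consistent with a collate function that groups samples into lists instead of stacking them into tensors. The sketch below illustrates that general pattern only. `collate_as_lists` is a hypothetical stand-in, not the actual implementation of `echo_bench_i2i_collate`, and the toy samples use placeholder strings in place of images.

```python
def collate_as_lists(samples):
    """Group a list of sample dicts into one dict of lists, key by key.

    Hypothetical illustration of a list-preserving collate function;
    per-sample values (e.g. variable-length image lists) are left untouched.
    """
    if not samples:
        return {}
    keys = samples[0].keys()
    return {key: [sample[key] for sample in samples] for key in keys}


# Toy samples with a different number of input images each.
samples = [
    {"id": "a", "input_images": ["img0"]},
    {"id": "b", "input_images": ["img1", "img2"]},
]
batch = collate_as_lists(samples)
print(batch["id"], [len(images) for images in batch["input_images"]])
```

Keeping lists avoids the equal-size requirement that stacking imposes, which matters when samples can carry a different number of input images.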


## Loading the text_to_image split:
This split can be loaded directly from the Hugging Face Hub:

```python
from datasets import load_dataset

# load the text-to-image split
ds_text_to_image = load_dataset(
    "echo-bench/echo2025",
    name="text_to_image",
    split="test",
)
ds_text_to_image
```


```text
Dataset({
    features: ['id', 'prompt', 'prompt_modified', 'prompt_fill_blank', 'input_images', 'output_images', 'community_feedback', 'has_nsfw'],
    num_rows: 848
})
```
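
Since the split exposes a `has_nsfw` column, you may want to exclude flagged examples before running a model. The following is a minimal sketch that assumes `has_nsfw` is a boolean-like flag (its exact type is not documented here); `is_safe` is a hypothetical helper, and the toy rows stand in for real examples so the snippet runs offline.

```python
def is_safe(example):
    """Return True when the example is not flagged as NSFW.

    Assumes `has_nsfw` is boolean-like; adjust if the column uses another encoding.
    """
    return not example["has_nsfw"]


# With the dataset loaded above, the same predicate can be passed to
# `Dataset.filter`:  ds_safe = ds_text_to_image.filter(is_safe)

# Toy rows (placeholder values) to show the behaviour without downloading anything.
toy_rows = [
    {"id": "a", "has_nsfw": False},
    {"id": "b", "has_nsfw": True},
    {"id": "c", "has_nsfw": False},
]
safe_rows = [row for row in toy_rows if is_safe(row)]
print([row["id"] for row in safe_rows])
```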

## Loading the analysis split:
This split can also be loaded directly from the Hugging Face Hub:

```python
from datasets import load_dataset

# load the analysis split
ds_analysis = load_dataset(
    "echo-bench/echo2025",
    name="analysis",
    split="test",
)
ds_analysis
```

```text
Dataset({
    features: ['id', 'prompt', 'prompt_modified', 'quality', 'community_feedback'],
    num_rows: 29336
})
```
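
A natural first look at this split is the distribution of the `quality` column. The sketch below tallies values with `collections.Counter`. It assumes only that `quality` values are hashable (strings or numbers); `count_by_field` is a hypothetical helper, and the labels in the toy rows are placeholders rather than real values from the dataset.

```python
from collections import Counter


def count_by_field(rows, field="quality"):
    """Count how often each value of `field` occurs across an iterable of dict rows."""
    return Counter(row[field] for row in rows)


# With the dataset loaded above, this would be: count_by_field(ds_analysis)

# Toy rows with placeholder labels, so the snippet runs without the dataset.
toy_rows = [
    {"id": "a", "quality": "high"},
    {"id": "b", "quality": "low"},
    {"id": "c", "quality": "high"},
]
counts = count_by_field(toy_rows)
print(counts.most_common())
```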