# Exploring Image and Text-to-Image Embeddings

<a target="_blank" href="https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/workshop_resources/ws4-embeddings/multimodal_on_radio.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

If something doesn't work, you can [report a problem](https://github.com/impresso/impresso-datalab-notebooks/blob/main/reporting-problems.md).

## What is this notebook about?

This notebook demonstrates how to explore historical image collections using Impresso’s text-to-image and image-only embeddings, from keyword search to visual similarity retrieval.

In the **first section**, we begin with Open-CLIP to perform text-to-image search: starting from keyphrases, we retrieve relevant images, then embed those same keywords to compare them directly with image embeddings, and finally refine the queries to explore more nuanced results. 

In the **second section**, we work with DINOv2 image-only embeddings to identify visual similarities within the collection. Given a single reference image, we search for visually related items and interpret what features the model captures.

We will explore **how radio is represented both in images and in radio programs**. This lets us examine visual and textual elements with both types of embeddings.

## What will you learn?

- Perform keyword-based image retrieval using image captions and Open-CLIP, and convert a text query into an embedding for text-to-image similarity search;
- Use DINOv2 to search for visual similarities directly from a reference image;
- Compare how multimodal embeddings and visual-only embeddings support different research strategies.

## Useful resources

- [Impresso Python Library](https://impresso.github.io/impresso-py/)
- [Impresso Hugging Face](https://ipyleaflet.readthedocs.io/en/latest/index.html)

## Prerequisites

Run the following cells to install the required package and to connect to the Impresso API:

> If you are working with Google Colab, you may need to restart the kernel. Go to *Runtime* and select *Restart session*. 

In [None]:
# Impresso Python package with embeddings search feature

!pip install --force-reinstall git+https://github.com/impresso/impresso-py.git@embeddings-search

In [None]:
# Connecting to Impresso API

from impresso import connect
impresso = connect('https://dev.impresso-project.ch/public-api/v1')

> Throughout this notebook, we will want to go back and forth between the notebook and the Impresso App. A small helper function that constructs links makes this much easier.

In [None]:
# We cannot display the images here, but we can open them in the Impresso web app

def img_webapp_url(uid, issue_mode=True):
  mode = "issue" if issue_mode else "search/images"
  pre, suf = uid.split('-a-')
  suffix = f"{pre}-a/view?articleId={suf}" if issue_mode else uid
  return f'https://dev.impresso-project.ch/app/{mode}/{suffix}'
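
To see what the helper produces, the sketch below repeats its definition and calls it on a UID that is quoted later in this notebook. It shows both modes: the issue view of a content item and the image search view.

```python
# Same helper as in the cell above, repeated so this example runs on its own
def img_webapp_url(uid, issue_mode=True):
    mode = "issue" if issue_mode else "search/images"
    pre, suf = uid.split('-a-')
    suffix = f"{pre}-a/view?articleId={suf}" if issue_mode else uid
    return f'https://dev.impresso-project.ch/app/{mode}/{suffix}'

sample_uid = 'EXP-1960-03-31-a-i0096'

# Link to the issue page, focused on the item
issue_link = img_webapp_url(sample_uid)
# Link to the image search view
image_link = img_webapp_url(sample_uid, issue_mode=False)

print(issue_link)
print(image_link)
```

Note that the helper splits on `-a-`, so it assumes the edition letter in the UID is `a`. A UID with a different edition letter would raise a `ValueError` when unpacking.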

# Text-to-Image embeddings with Open-CLIP

First, we explore how the system retrieves images for simple keywords or short phrases related to radio, so we can get an initial sense of the most similar results.

## 1. Keyword search on image captions

In [None]:
kw_radio = 'radio'

result = impresso.images.find(term=kw_radio)
result

## 2. Text-to-image similarity search with Open-CLIP

Next, we embed the same keyword with Open-CLIP and use this embedding to search through the Open-CLIP image embeddings, enabling text-to-image similarity retrieval.

In [None]:
kw_embedding = impresso.tools.embed_text(text=kw_radio, target='multimodal')
kw_embedding

> Having inspected the generated embedding, one might wonder what these strange characters and numbers mean: `openclip-768:1zuRvAvlzLvJJxw9tRKdO5yyaryrbIK9wuLOvDO12bz...`
The embedding does not look like a vector of numbers for a simple reason: **it is encoded in a data-efficient format**.
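
To give an intuition for such a compact format, here is a minimal sketch that packs a small vector of floats into bytes and then into a text-safe string. It assumes a `label:payload` layout with 32-bit little-endian floats encoded in base64; this is an illustrative guess and not a confirmed description of the Impresso format. The `toy-4` label and the vector are made up.

```python
import base64
import struct

def encode_vector(values, label):
    """Pack floats as little-endian float32 and wrap them in base64 (assumed layout)."""
    payload = struct.pack(f"<{len(values)}f", *values)
    return f"{label}:{base64.b64encode(payload).decode('ascii')}"

def decode_vector(encoded):
    """Reverse of encode_vector: split off the label and unpack the floats."""
    label, payload = encoded.split(":", 1)
    raw = base64.b64decode(payload)
    values = struct.unpack(f"<{len(raw) // 4}f", raw)
    return label, list(values)

# Toy vector; these values are exactly representable as float32
toy_vector = [0.5, -1.0, 0.25, 2.0]

encoded = encode_vector(toy_vector, "toy-4")
label, decoded = decode_vector(encoded)

print(encoded)
print(label, decoded)
```

A string like this is much shorter than the same numbers written out as decimal text, which matters when a vector has hundreds of dimensions.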

In [None]:
# Searching for images similar to the embedding

results = impresso.images.find(
  embedding=kw_embedding,
  limit=6
)
results
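
Conceptually, a similarity search ranks stored image vectors by how close they are to the query vector. The sketch below illustrates this with cosine similarity on three-dimensional toy vectors. The metric actually used by the Impresso backend is not stated in this notebook, so cosine similarity is an assumption here, and the identifiers and vectors are invented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, items, k):
    """Return the k (uid, score) pairs most similar to the query, best first."""
    scored = [(uid, cosine_similarity(query, vec)) for uid, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "image embeddings" with made-up identifiers
toy_images = {
    "img-radio-set": [0.9, 0.1, 0.0],
    "img-radio-header": [0.7, 0.6, 0.1],
    "img-landscape": [0.0, 0.2, 0.9],
}
# Toy "text embedding" of a query
toy_query = [1.0, 0.2, 0.0]

ranking = top_k(toy_query, toy_images, k=2)
for uid, score in ranking:
    print(f"{uid}: {score:.3f}")
```

Because text and images share one embedding space in a multimodal model, the same ranking logic applies whether the query vector comes from a phrase or from another image.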

In [None]:
results.df[['contentItemUid', 'imageTypes.visualContentType']].index

In [None]:
import numpy as np
import pandas as pd

# Print the URLs for the first 5 images

for uid, r in results.df.head(5).iterrows():
  print(f"Result {uid} - link to image CI {r.previewUrl} - type {r['imageTypes.visualContentType']}")
  if str(r.contentItemUid)!='nan':
    print(f"       {r.contentItemUid} - link to corresponding CI {img_webapp_url(r.contentItemUid)}")
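
The same printing loop comes back several times in this notebook. One possible way to factor it out is sketched below. The names `is_missing` and `format_result_lines` are hypothetical helpers and not part of the Impresso library, and the rows are toy data with made-up identifiers.

```python
import math

def is_missing(value):
    """True for None and for float NaN, two ways a missing UID may show up."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def format_result_lines(rows, link_builder, limit=5):
    """Build printable lines from (uid, row) pairs; row must support .get()."""
    lines = []
    for position, (uid, row) in enumerate(rows):
        if position >= limit:
            break
        lines.append(f"Result {uid} - type {row.get('imageTypes.visualContentType')}")
        parent = row.get('contentItemUid')
        if not is_missing(parent):
            lines.append(f"       {parent} - link to corresponding CI {link_builder(parent)}")
    return lines

# Toy rows: the second one has no parent content item
toy_rows = [
    ("toy-image-1", {"imageTypes.visualContentType": "Object", "contentItemUid": "toy-article-1"}),
    ("toy-image-2", {"imageTypes.visualContentType": "Object", "contentItemUid": float("nan")}),
]

lines = format_result_lines(toy_rows, link_builder=lambda uid: f"link-to-{uid}")
print("\n".join(lines))
```

With the objects from this notebook, `rows` could be `results.df.iterrows()` and `link_builder` could be `img_webapp_url`, since pandas rows also offer `.get()`.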

> The retrieved images feature either illustrations of radios (physical radio sets) or illustrated headers of radio sections.
We can try to filter by image type, such as `Object`, `Non-Figurative Visual Content` and `Ornament or Illustrated Title`.


In [None]:
# filter results based on the image content type
object_results = impresso.images.find(
  content_type="Object",
  embedding=kw_embedding,
  limit=5
)

# Print the URLs for the first 5 images
print("Results for images of type Object")
for uid, r in object_results.df.head(5).iterrows():
  print(f"Result {uid} - link to image CI {img_webapp_url(uid, issue_mode=False)}")
  if str(r.contentItemUid)!='nan':
    print(f"       {r.contentItemUid} - link to corresponding CI {img_webapp_url(r.contentItemUid)}")


> We recover several elements from the previous search, though not the image `EXP-1960-03-31-a-i0096`, which is now replaced by others.
We also observe that the image found in `EXP-2009-01-06-a-i0096` and `IMP-2009-01-06-a-i0080` was reused by the editors about a year later in `IMP-2010-02-02-a-i0120`, going by the dates in the identifiers.
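
Comparing two result lists by eye gets tedious quickly. A minimal sketch of a systematic comparison is given below: it reports which UIDs are kept, dropped, or new between two searches. The function name `compare_results` and the identifiers are hypothetical placeholders, not actual search results.

```python
def compare_results(before, after):
    """Compare two ordered lists of UIDs, keeping the original order in each group."""
    before_set, after_set = set(before), set(after)
    return {
        "kept": [uid for uid in before if uid in after_set],
        "dropped": [uid for uid in before if uid not in after_set],
        "new": [uid for uid in after if uid not in before_set],
    }

# Made-up identifiers standing in for two result lists
unfiltered = ["toy-001", "toy-002", "toy-003", "toy-004"]
filtered = ["toy-002", "toy-004", "toy-005"]

diff = compare_results(unfiltered, filtered)
for group, uids in diff.items():
    print(f"{group}: {uids}")
```

With the notebook's data, the two lists could be taken from the indexes of `results.df` and `object_results.df`.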

In [None]:
# filter results based on the image content type
non_fig_results = impresso.images.find(
  content_type="Non-Figurative Visual Content",
  embedding=kw_embedding,
  limit=5
)

# Print the URLs for the first 5 images
print("Results for images of type Non-Figurative Visual Content")
for uid, r in non_fig_results.df.head(5).iterrows():
  print(f"Result {uid} - link to image CI {img_webapp_url(uid, issue_mode=False)}")
  if str(r.contentItemUid)!='nan':
    print(f"       {r.contentItemUid} - link to corresponding CI {img_webapp_url(r.contentItemUid)}")

> There are far fewer results of this type, as they are generally rarer in the data. However, in both cases the model identifies the Radio section logo, likely because it also contains text.

## 3. Complex search queries with embeddings

Next, we refine our search by embedding a more **descriptive query** that targets the radio program section of a newspaper.

In [None]:
program_query = "Weekly radio program"

# now embed the keyword prompt using the open-clip model
pgm_embedding = impresso.tools.embed_text(text=program_query, target='multimodal')
pgm_embedding


# look at images similar to the embedding
pgm_results = impresso.images.find(
  embedding=pgm_embedding,
  limit=6
)

for uid, r in pgm_results.df.head(5).iterrows():
  print(f"Result {uid} - link to image CI {img_webapp_url(uid, issue_mode=False)} - type {r['imageTypes.visualContentType']}")
  if str(r.contentItemUid)!='nan':
    print(f"       {r.contentItemUid} - link to corresponding CI {img_webapp_url(r.contentItemUid)}")

> We’ve successfully retrieved the illustrated section titles!
This query captures many of the radio program pages from L’Impartial in the late 1930s and early 1940s.
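
To check observations about newspapers and periods, the image UIDs themselves can be parsed. The sketch below assumes the layout `<newspaper>-<YYYY>-<MM>-<DD>-<edition>-<item>`, which is inferred from the UIDs quoted in this notebook and not taken from a documented specification. The function name `parse_image_uid` is hypothetical.

```python
import re
from collections import Counter
from datetime import date

# Assumed UID layout, inferred from the examples in this notebook
UID_PATTERN = re.compile(
    r"^(?P<newspaper>.+)-(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"-(?P<edition>[a-z])-(?P<item>i\d+)$"
)

def parse_image_uid(uid):
    """Split an image UID into newspaper, publication date, edition and item."""
    match = UID_PATTERN.match(uid)
    if match is None:
        raise ValueError(f"Unexpected UID layout: {uid}")
    return {
        "newspaper": match["newspaper"],
        "date": date(int(match["year"]), int(match["month"]), int(match["day"])),
        "edition": match["edition"],
        "item": match["item"],
    }

# UIDs quoted elsewhere in this notebook
uids = [
    "EXP-1960-03-31-a-i0096",
    "EXP-2009-01-06-a-i0096",
    "IMP-2009-01-06-a-i0080",
    "IMP-2010-02-02-a-i0120",
]

# Count images per decade
decades = Counter(parse_image_uid(uid)["date"].year // 10 * 10 for uid in uids)
for decade, count in sorted(decades.items()):
    print(f"{decade}s: {count}")
```

Applied to the index of a results dataframe, a tally like this gives a quick view of how the hits are distributed over time.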

> Since **CLIP is multilingual**, we can try the same search using a query in French.

In [None]:
program_query_fr = "Programme Radio de la semaine"

# now embed the keyword prompt using the open-clip model
pgm_fr_embedding = impresso.tools.embed_text(text=program_query_fr, target='multimodal')
pgm_fr_embedding


# look at images similar to the embedding
pgm_fr_results = impresso.images.find(
  embedding=pgm_fr_embedding,
  limit=6
)

for uid, r in pgm_fr_results.df.head(5).iterrows():
  print(f"Result {uid} - link to image CI {img_webapp_url(uid, issue_mode=False)} - type {r['imageTypes.visualContentType']}")
  if str(r.contentItemUid)!='nan':
    print(f"       {r.contentItemUid} - link to corresponding CI {img_webapp_url(r.contentItemUid)}")

> The results are very similar: most are program pages, but this time they are more recent and often list TV programs (note that the Swiss national radio and TV share the same name).
Now let’s see if we can go further and **retrieve actual images of radio stations**, ideally with people listening to the radio.


In [None]:
radio_query_fr = "Personnes écoutant la radio à côté du poste de radio."

# now embed the keyword prompt using the open-clip model
radio_fr_embedding = impresso.tools.embed_text(text=radio_query_fr, target='multimodal')
radio_fr_embedding


# look at images similar to the embedding
radio_fr_results = impresso.images.find(
  embedding=radio_fr_embedding,
  limit=6
)

for uid, r in radio_fr_results.df.head(5).iterrows():

  print(f"Result {uid} - link to image CI {r.previewUrl} - type {r['imageTypes.visualContentType']}")
  if 'contentItemUid' in r and str(r.contentItemUid)!='nan':
    print(f"       {r.contentItemUid} - link to corresponding CI {img_webapp_url(r.contentItemUid)}")

> We retrieve more images of actual radio stations, often with people present. It can be useful to compare this with a similar sentence in English, or to refine the query to explicitly require a human figure in the scene.
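
One way to judge how often people appear in a result set is to tally the visual content types. The sketch below does this on a toy list. The type labels are ones that appear in this notebook, but the list itself is made up and does not reproduce real results.

```python
from collections import Counter

def tally_types(content_types):
    """Count content types, grouping missing or non-text values under 'unknown'."""
    return Counter(t if isinstance(t, str) else "unknown" for t in content_types)

# Toy list standing in for a column of visual content types
toy_types = [
    "Human representations - Scene",
    "Object",
    "Human representations - Scene",
    None,
]

counts = tally_types(toy_types)
for label, n in counts.most_common():
    print(f"{n} x {label}")
```

With the notebook's results, the input could be the column `radio_fr_results.df['imageTypes.visualContentType']`.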

In [None]:
radio_query_en = "People listening to a radio monitor."

# now embed the keyword prompt using the open-clip model
radio_en_embedding = impresso.tools.embed_text(text=radio_query_en, target='multimodal')
radio_en_embedding


# look at images similar to the embedding
radio_en_results = impresso.images.find(
  embedding=radio_en_embedding,
  limit=6
)

for uid, r in radio_en_results.df.head(5).iterrows():

  print(f"Result {uid} - link to image CI {r.previewUrl} - type {r['imageTypes.visualContentType']}")
  if 'contentItemUid' in r and str(r.contentItemUid)!='nan':
    print(f"       {r.contentItemUid} - link to corresponding CI {img_webapp_url(r.contentItemUid)}")

> Using an English query seems to have done the trick here: as you can see, most images are of a human representation type!
**Don't hesitate to explore further with different queries, both simpler and more complex, in various languages, and with the help of the image type filter** to specify more precisely what is of interest!
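
For this kind of exploration, it can help to run several queries in one go. The sketch below shows one possible pattern: a small function that takes the embedding step and the search step as arguments. The name `run_queries` is hypothetical, and the two `fake_*` functions are stand-ins so that the example runs without the API.

```python
def run_queries(queries, embed, search):
    """Embed each query text, run the search, and collect results per label."""
    outcomes = {}
    for label, text in queries.items():
        outcomes[label] = search(embed(text))
    return outcomes

# Stand-ins for the real embedding and search calls
def fake_embed(text):
    return f"fake-embedding:{text.lower()}"

def fake_search(embedding):
    return [f"hit-for-{embedding}"]

# Query texts taken from this notebook
queries = {
    "fr": "Programme Radio de la semaine",
    "en": "Weekly radio program",
}

demo = run_queries(queries, fake_embed, fake_search)
for label, hits in demo.items():
    print(label, hits)
```

In the notebook itself, `embed` could be `lambda q: impresso.tools.embed_text(text=q, target='multimodal')` and `search` could be `lambda e: impresso.images.find(embedding=e, limit=6)`, both of which mirror the calls used above.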

# Image-only embeddings with DINOv2
Let's dive deeper into **image-to-image embeddings** and search for images that match ones of particular interest to us.

## 1. Searching for similar images with a reference image from the collection
Suppose we are interested in studying the **spread of new technologies in the 1980s**: in that case, the image `EXP-1983-08-31-a-i0208` from content item `EXP-1983-08-31-a-i0195` could serve as a useful reference to discover similar articles.


In [None]:
example_image_id = 'EXP-1983-08-31-a-i0208'

example_embedding = impresso.images.get_embeddings(example_image_id)
example_embedding[1]
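
The cell above selects an embedding by position. Since the encoded embedding shown earlier starts with a model label (`openclip-768:`), one could select by label instead. The sketch below assumes every returned embedding follows the same `label:payload` layout. The `dinov2-1024` label and the payloads are made up for illustration and should be checked against the actual output of `get_embeddings`; `pick_embedding` is a hypothetical helper.

```python
def pick_embedding(embeddings, label_prefix):
    """Return the first encoded embedding whose label starts with label_prefix."""
    for encoded in embeddings:
        label, _, _ = encoded.partition(":")
        if label.startswith(label_prefix):
            return encoded
    raise LookupError(f"No embedding with label prefix {label_prefix!r}")

# Toy list with invented labels and payloads
toy_embeddings = [
    "openclip-768:AAAA",
    "dinov2-1024:BBBB",
]

image_only = pick_embedding(toy_embeddings, "dinov2")
multimodal = pick_embedding(toy_embeddings, "openclip")
print(image_only)
print(multimodal)
```

Selecting by label makes the intent explicit and avoids silently picking the wrong model if the order of the returned embeddings changes.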

In [None]:
dino_results = impresso.images.find(
  embedding=example_embedding[1],
  limit=6
)

for uid, r in dino_results.df.head(5).iterrows():

  print(f"Result {uid} - link to image CI {r.previewUrl} - type {r['imageTypes.visualContentType']}")
  if 'contentItemUid' in r and str(r.contentItemUid)!='nan':
    print(f"       {r.contentItemUid} - link to corresponding CI {img_webapp_url(r.contentItemUid)}")

## 2. Searching for similar images with an external URL

In [None]:
# Embedding an image from a URL

image_url = 'https://gallica.bnf.fr/iiif/ark:/12148/bpt6k6069079/f2/775,369,1303,887/max/0/default.jpg'
embedding = impresso.tools.embed_image(image=image_url, target="image")
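
The URL above follows the IIIF Image API pattern, in which the last path segments are `region/size/rotation/quality.format`. The sketch below parses these segments to show that this URL requests a cropped region of a page and not the full scan. The function name `parse_iiif_image_url` is a hypothetical helper, and it only handles the pixel form of the region (`x,y,w,h`).

```python
def parse_iiif_image_url(url):
    """Split an IIIF Image API URL into its request parameters."""
    base, region, size, rotation, filename = url.rsplit("/", 4)
    quality, _, fmt = filename.partition(".")
    parsed = {
        "base": base,
        "region": region,
        "size": size,
        "rotation": rotation,
        "quality": quality,
        "format": fmt,
    }
    # Pixel regions are written as x,y,w,h; other forms are left as text
    parts = region.split(",")
    if len(parts) == 4 and all(p.isdigit() for p in parts):
        x, y, w, h = (int(p) for p in parts)
        parsed["region_box"] = {"x": x, "y": y, "width": w, "height": h}
    return parsed

image_url = 'https://gallica.bnf.fr/iiif/ark:/12148/bpt6k6069079/f2/775,369,1303,887/max/0/default.jpg'
info = parse_iiif_image_url(image_url)

print(info["base"])
print(info["region_box"])
print(info["size"], info["rotation"], info["quality"], info["format"])
```

Changing the region values in the URL is a simple way to embed a different detail of the same page.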


In [None]:
# Searching similar images from the embedded image URL

results = impresso.images.find(
  embedding=embedding,
  limit=5
)

results

## 3. Search for images with an embedded image

In [None]:
# Retrieve the embeddings of an Impresso image to find similar items

image_id = 'tageblatt-1937-07-10-a-i0080'
embedding = impresso.images.get_embeddings(image_id)
embedding

In [None]:
from impresso import DateRange  # needed for the date_range filter below

results = impresso.images.find(
  content_type="Map - Geopolitical",
  embedding=embedding[0],
  date_range=DateRange("1800-01-01", "1960-01-01"),
  limit=5
)

results

## 4. Combining image-to-image and text-to-image search with complex filters

In [None]:
# Search for images combining keyword and content type

result = impresso.images.find(
    term="Palästina",
    content_type="Map - Geopolitical")
result

In [None]:
# Search for images combining keyword and date-range filter

result = impresso.images.find(
    term="Jerusalem", 
    date_range=DateRange("1900-01-01", "1960-01-01"))

result

In [None]:
# Search for images combining keyword, date-range, and content type filters

from impresso import OR

result = impresso.images.find(term="Jerusalem",
                             date_range=DateRange("1900-01-01", "1960-01-01"),
                             content_type=OR('Scenery or Landscape','Human representations - Scene'))
result

# Conclusion

In this notebook, we explored how Impresso's models - Open-CLIP for **text-to-image search** and DINOv2 for **image-to-image similarity** - can be used to navigate historical visual collections. 
Starting from simple queries and moving to more descriptive ones, we saw how Open-CLIP retrieves radio programs and illustrated section titles across languages, before turning to DINOv2 to find visually similar images from a single reference example.

Together, these approaches show **how multimodal and visual embeddings can help us move beyond keyword search**.

---
## Project and License info

### Notebook credits [CreditLogo.png](https://credit.niso.org/)

**Writing - Original draft:**  Roman Kalyakin. **Conceptualization:** Marten Düring. **Software:** Roman Kalyakin. **Writing - Review & Editing**: Pauline Conti, Cao Vy. **Validation:** Maud Ehrmann, Kirill Veprikov. **Datalab editorial board:** Caio Mello (Managing), Pauline Conti, Emanuela Boros, Marten Düring, Juri Opitz, Martin Grandjean, Estelle Bunout, Cao Vy. **Data curation & Formal analysis:** Maud Ehrmann, Emanuela Boros, Pauline Conti, Simon Clematide, Juri Opitz, Andrianos Michail. **Methodology:** Roman Kalyakin. **Supervision:** Marten Düring. **Funding acquisition:** Maud Ehrmann, Simon Clematide, Marten Düring, Raphaëlle Ruppen Coutaz.

<br><a target="_blank" href="https://creativecommons.org/licenses/by/4.0/">
  <img src="https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by.png"  width="100" alt="CC BY 4.0 license"/>
</a> 

This notebook is published under [CC BY 4.0 License](https://creativecommons.org/licenses/by/4.0/)

For feedback on this notebook, please send an email to info@impresso-project.ch

### Impresso project

[Impresso - Media Monitoring of the Past](https://impresso-project.ch) is an interdisciplinary research project that aims to develop and consolidate tools for processing and exploring large collections of media archives across modalities, time, languages and national borders. The first project (2017-2021) was funded by the Swiss National Science Foundation under grant No. [CRSII5_173719](http://p3.snf.ch/project-173719) and the second project (2023-2027) by the SNSF under grant No. [CRSII5_213585](https://data.snf.ch/grants/grant/213585) and the Luxembourg National Research Fund under grant No. 17498891.
<br></br>
### License

All Impresso code is published open source under the [GNU Affero General Public License](https://github.com/impresso/impresso-pyindexation/blob/master/LICENSE) v3 or later.


---

<p align="center">
  <img src="https://github.com/impresso/impresso.github.io/blob/master/assets/images/3x1--Yellow-Impresso-Black-on-White--transparent.png?raw=true" width="350" alt="Impresso Project Logo"/>
</p>
