<a target="_blank" href="https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/workshop_resources/ws4-embeddings/MultiModal.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
!pip install -qqq git+https://github.com/impresso/impresso-py.git@embeddings-search

# Mapping Holy Lands?

The visual representation of Israel/Palestine through the centuries.
An exercise in multimodality.

This notebook contains a few examples for multimodal and image search focussing on the visual representation of Israel and Palestine in the historical press.

The main aim of this notebook is to showcase the API functionality of the Impresso datalab.

Key questions:
- How can we find maps (or other types of visualisation) of these contested (and historically fluctuating) places, names and borders?
- How can we combine image and text search?
- How can we analyse the results at scale, using a multimodal approach?

In [None]:
# restart the kernel so the freshly installed package is picked up
# (this kills the current process, so the cell will appear to crash)
import os
os.kill(os.getpid(), 9)

In [None]:
from impresso import connect

impresso = connect('https://dev.impresso-project.ch/public-api/v1')

In [None]:
# use an image URL from Wikimedia Commons as the starting point
image_url = 'https://upload.wikimedia.org/wikipedia/commons/e/e6/Dioecesis_Orientis_400_AD.png'
embedding = impresso.tools.embed_image(image=image_url, target="image")
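
Embedding search ranks stored images by how close their vectors are to the query vector. The metric and index used by the Impresso service are not described in this notebook, so the sketch below assumes cosine similarity, a common choice, and uses invented three-dimensional toy vectors. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical and not part of `impresso-py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query, vec), cid) for cid, vec in candidates.items()]
    return [cid for _, cid in sorted(scored, reverse=True)]


# Toy vectors, invented for illustration only (real embeddings are much longer).
toy_query = [1.0, 0.0, 0.0]
toy_candidates = {
    "map_a": [0.9, 0.1, 0.0],
    "photo_b": [0.0, 1.0, 0.0],
    "map_c": [0.5, 0.5, 0.0],
}
ranking = rank_by_similarity(toy_query, toy_candidates)
print(ranking)
```

In this toy setup the candidates pointing in nearly the same direction as the query come first, which is the intuition behind passing `embedding=` to `impresso.images.find`.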


In [None]:
# look at images similar to the embedding
results = impresso.images.find(
  embedding=embedding,
  limit=5
)

In [None]:
results

In [None]:
# the images cannot be displayed here, but we can access them in the Impresso webapp;
# to do so, we build a link from each content item UID
def img_webapp_url(uid):
  pre, suf = uid.split('-a-')
  return f'https://dev.impresso-project.ch/app/issue/{pre}-a/view?articleId={suf}'
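
The helper above assumes that every UID contains the separator `-a-` exactly once. As a minimal sketch, a stricter variant could report malformed UIDs with an explicit message. The name `img_webapp_url_checked` and the constant `WEBAPP_BASE` are hypothetical; the URL pattern is copied from the cell above, and the example UID is the one used later in this notebook.

```python
# Same host and path pattern as in img_webapp_url above.
WEBAPP_BASE = "https://dev.impresso-project.ch/app"


def img_webapp_url_checked(uid):
    """Hypothetical stricter variant of img_webapp_url.

    Splits the UID at the first '-a-' and raises ValueError with a
    readable message if either part is missing.
    """
    pre, sep, suf = uid.partition("-a-")
    if not sep or not pre or not suf:
        raise ValueError(f"unexpected content item UID: {uid!r}")
    return f"{WEBAPP_BASE}/issue/{pre}-a/view?articleId={suf}"


example_url = img_webapp_url_checked("tageblatt-1937-07-10-a-i0080")
print(example_url)
```

The part before `-a-` identifies the issue and the part after it identifies the item within that issue, which is why the two halves end up in different places in the URL.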

In [None]:
# add Impresso webapp link
df = results.df.copy()
df['webapp_url'] = df.contentItemUid.apply(img_webapp_url)

In [None]:
df.webapp_url.values

In [None]:
# filter results based on the image content type
results = impresso.images.find(
  content_type="Map - Geopolitical",
  embedding=embedding,
  limit=5
)

In [None]:
# inspect in Impresso webapp
df = results.df.copy()
df['webapp_url'] = df.contentItemUid.apply(img_webapp_url)
df['webapp_url'].values

In [None]:
# filter with a date range
from impresso import DateRange

results = impresso.images.find(
  content_type="Map - Geopolitical",
  embedding=embedding,
  date_range=DateRange("1800-01-01", "1960-01-01"),
  limit=5
)

results


In [None]:
# use a text prompt to find geopolitical maps
result = impresso.images.find(term="Palästina",content_type="Map - Geopolitical")
result

In [None]:
# get the embeddings of an image that is already in the collection
image_id = 'tageblatt-1937-07-10-a-i0080'
embedding = impresso.images.get_embeddings(image_id)
embedding

In [None]:
# use the first embedding of that image to find similar items
results = impresso.images.find(
  content_type="Map - Geopolitical",
  embedding=embedding[0],
  date_range=DateRange("1800-01-01", "1960-01-01"),
  limit=5
)

In [None]:
results

In [None]:
# combined text-to-image search with a date range
result = impresso.images.find(term="Jerusalem", date_range=DateRange("1900-01-01", "1960-01-01"))
result

In [None]:
# inspect the visual content types returned by the text search above
result.df['imageTypes.visualContent']
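
To see at a glance which kinds of images dominate a result set, the labels can be tallied. The sketch below uses a made-up list of labels rather than real search output. It assumes that the `imageTypes.visualContent` column inspected above holds one label per row, which this notebook does not confirm.

```python
from collections import Counter

# Made-up sample of visual content labels; not real search output.
sample_labels = [
    "Map - Geopolitical",
    "Scenery or Landscape",
    "Map - Geopolitical",
    "Human representations - Scene",
    "Map - Geopolitical",
]

# Count how often each label occurs and list the most frequent first.
label_counts = Counter(sample_labels)
for label, count in label_counts.most_common():
    print(f"{count:>3}  {label}")
```

On a real pandas column the same tally is available through `Series.value_counts()`.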

In [None]:
# look for other kinds of representation by combining content types with OR
from impresso import OR

result = impresso.images.find(
  term="Jerusalem",
  date_range=DateRange("1900-01-01", "1960-01-01"),
  content_type=OR('Scenery or Landscape', 'Human representations - Scene')
)
result

# Fin.