# Impresso API SETUP

In [2]:
!pip install git+https://github.com/impresso/impresso-py.git@embeddings-search

Collecting git+https://github.com/impresso/impresso-py.git@embeddings-search
  Cloning https://github.com/impresso/impresso-py.git (to revision embeddings-search) to /tmp/pip-req-build-xphgqihk
  Running command git clone --filter=blob:none --quiet https://github.com/impresso/impresso-py.git /tmp/pip-req-build-xphgqihk
  Running command git checkout -b embeddings-search --track origin/embeddings-search
  Switched to a new branch 'embeddings-search'
  Branch 'embeddings-search' set up to track remote branch 'embeddings-search' from 'origin'.
  Resolved https://github.com/impresso/impresso-py.git to commit a5fd1a1fbb4b130b3b96d7483e92e6eadf763f71
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting attrs<24.0.0,>=23.2.0 (from impresso==0.9.13)
  Downloading attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting httpx<0.28.0,>=0.27.0 (from impresso==0.9.13)


In [3]:
from impresso import connect

impresso_session = connect('https://dev.impresso-project.ch/public-api/v1')


Click on the following link to access the login page: https://dev.impresso-project.ch/datalab/token
 - 🔤 Enter your email/password on this page.
 - 🔑 Once logged in, a secret token will be generated for you.
 - 📋 Copy this token and paste it into the input field below. Then press "Enter". 👇🏼.

🔑 Enter your token: ··········
🎉 You are now connected to the Impresso API!  🎉
🔗 Using API: https://dev.impresso-project.ch/public-api/v1


In [4]:
# ---embed and search---

question_q1 = "What happened in Zurich in the year 1950?"

question_q2 = "How many water fountains are there in Zurich?"

embedding_q1 = impresso_session.tools.embed_text(text=question_q1, target="text")

embedding_q2 = impresso_session.tools.embed_text(text=question_q2, target="text")

# RETRIEVAL PREPARATION

In [5]:
# Use the embedding to search Impresso
search_result_q1 = impresso_session.search.find(embedding=embedding_q1, limit=20)

# Get the IDs of the search results
result_uids_q1 = [r["uid"] for r in search_result_q1.raw["data"]]

# Fetch the full content item for each ID
result_articles_q1 = [impresso_session.content_items.get(uid) for uid in result_uids_q1]

In [6]:
# Use the embedding to search Impresso
search_result_q2 = impresso_session.search.find(embedding=embedding_q2, limit=20)

# Get the IDs of the search results
result_uids_q2 = [r["uid"] for r in search_result_q2.raw["data"]]

# Fetch the full content item for each ID
result_articles_q2 = [impresso_session.content_items.get(uid) for uid in result_uids_q2]
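
The two retrieval cells above repeat the same three steps (search, collect UIDs, fetch items). A minimal sketch of a shared helper follows. `retrieve_articles` and `fake_session` are hypothetical names introduced here, and the stand-in session only mimics the `search.find` and `content_items.get` calls exactly as this notebook uses them, so the block runs without a token.

```python
from types import SimpleNamespace


def retrieve_articles(session, embedding, limit=20):
    """Search by embedding, then fetch each hit as a full content item."""
    search_result = session.search.find(embedding=embedding, limit=limit)
    uids = [hit["uid"] for hit in search_result.raw["data"]]
    return [session.content_items.get(uid) for uid in uids]


# Stand-in session (an assumption for illustration): no network, no token.
_fake_hits = SimpleNamespace(raw={"data": [{"uid": "a-1"}, {"uid": "b-2"}]})
fake_session = SimpleNamespace(
    search=SimpleNamespace(find=lambda embedding, limit: _fake_hits),
    content_items=SimpleNamespace(get=lambda uid: SimpleNamespace(raw={"uid": uid})),
)

articles = retrieve_articles(fake_session, embedding="<embedding>", limit=2)
print([a.raw["uid"] for a in articles])
```

With the real session, the call would presumably be `retrieve_articles(impresso_session, embedding_q1)`.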

In [7]:
def format_and_display_articles(articles, fields=("title", "publicationDate", "languageCode", "transcript"), transcript_line_characters=100, max_transcript_characters=1500, return_string=False):
  """
  Formats a list of article objects into a string representation and optionally prints it.
  Includes the transcript length and allows limiting the printed transcript length.

  Args:
    articles: A list of article objects (with a .raw attribute containing a dictionary).
    fields: A sequence of strings naming the fields to include for each article.
    transcript_line_characters: The maximum number of characters per line for the transcript when printed.
    max_transcript_characters: The maximum number of characters to print for the transcript.
                               If None, the entire transcript is printed.
    return_string: If True, the function returns a single string containing the formatted
                   article information. If False, it prints the information.

  Returns:
    The formatted string if return_string is True, otherwise None.
  """
  formatted_output = []
  for i, article in enumerate(articles):
    article_output = [f"--- Article {i+1} ---"]
    for field in fields:
      if field == "title":
        title = article.raw.get(field)
        article_output.append(f"Title: {title if title is not None else 'No Title Found'}")
      elif field == "publicationDate":
        date = article.raw.get(field)
        if date is not None:
          article_output.append(f"Publicationdate: {date.split('T')[0]}")
      elif field == "languageCode":
        lang = article.raw.get(field)
        if lang is not None:
            article_output.append(f"Languagecode: {lang}")
      elif field == "transcript":
        transcript = article.raw.get(field)
        if transcript is not None:
            transcript_length = len(transcript)
            article_output.append(f"Transcript Length: {transcript_length} characters")
            article_output.append(f"{field.capitalize()}:")
            transcript_to_print = transcript[:max_transcript_characters] if max_transcript_characters is not None else transcript
            if max_transcript_characters is not None and len(transcript) > max_transcript_characters:
                transcript_to_print += "... [truncated]"
            # Split transcript into chunks for better readability
            for j in range(0, len(transcript_to_print), transcript_line_characters):
              article_output.append(transcript_to_print[j:j+transcript_line_characters])

    article_output.append("-" * (len(f"--- Article {i+1} ---")))
    formatted_output.append("\n".join(article_output))

  if return_string:
    return "\n\n".join(formatted_output)
  else:
    print("\n\n".join(formatted_output))
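
The transcript handling in `format_and_display_articles` first truncates and then cuts the text into fixed-width slices, which is why lines in the output below break in the middle of words. A minimal standalone sketch of that step is shown here; `wrap_fixed_width` is a hypothetical name, not part of the notebook.

```python
def wrap_fixed_width(text, line_characters=100, max_characters=1500):
    """Truncate text to max_characters, then cut it into fixed-width lines."""
    truncated = text if max_characters is None else text[:max_characters]
    if max_characters is not None and len(text) > max_characters:
        # Mark the cut so readers know the transcript continues.
        truncated += "... [truncated]"
    return [
        truncated[i:i + line_characters]
        for i in range(0, len(truncated), line_characters)
    ]


lines = wrap_fixed_width("abcdefghij" * 3, line_characters=8, max_characters=20)
for line in lines:
    print(line)
```

If breaking on whitespace is preferred, the standard library's `textwrap.wrap` could be swapped in, at the cost of lines that are no longer all the same width.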

In [8]:
format_and_display_articles(result_articles_q1)

--- Article 1 ---
Title: EN SUISSE : A ZURICH, DES ÉMEUTIERS FONT METTRE EN LIBERTÉ UN PRISONNI[...]
Publicationdate: 1919-06-16
Languagecode: fr
Transcript Length: 1669 characters
Transcript:
ZURICH, -15 juin. — Des troubles assez sérieux ont éclaté à Zurich, où, comme on le sait, existe une
 minorité socialiste extrémiste des plus actives. Une grande manifestation avait été organisée, vend
redi soir, pair l'Union ouvrière zurichoise, pour honorer la mémoire de Posa Luxembourg. Le préfet d
e police de Zurich, le socialiste Greber, sur la promesse que la manifestation se déroulerait dans l
e calme, avait donné des instructions pour que la polioe.se tînt éloignée de la réunion. Celle-ci co
mmença bien; mais un orateur vint annoncer que le délégué socialiste suisse. Conrad Wyss venait, en 
rentrant d'Allemagne, d'être arrêté par les autorités fédérales, pour avoir introduit des brochures 
et des tracts de propagande. Aussi tôt la foule se porta vers la prison et réclama la mi.se en liber

In [9]:
format_and_display_articles(result_articles_q2)

--- Article 1 ---
Title: isi Siiiss®
Publicationdate: 1928-07-19
Languagecode: fr
Transcript Length: 2732 characters
Transcript:
_isi _Siiiss _® Ne gaspillons pas l'eau ! ZURICH, 19. — Le Service des eaux de la ville de Zurich vi
ent de publier un avis concernant la consommation de l'eau, avis motivé par le -fait qu'en ces temps
 de chaleur excessive, on a coutume de mettre le lait et d'autres aliments dans l'eau courante pour 
les rafraîchir, d'arroser les jardins par trop copieusement, bref, de gaspiller l'eau. Ces jours der
niers, la consommation d'eau a dépassé 100, 000 m 3, ce qui représente une moyenne de 425 litres par
 personne et par jour, tandis que jusqu'ici, la consommation la plus forte n'avait pas dépassé 83 mi
lle m 3. Zurich possède, il est vrai, un réservoir inépuisable — en l'espèce son lac — mais il n'en 
reste pas moins que les installations de filtrage et de transport ont des capacités limitées. Samedi
 dernier, la consommation a même atteint 103, 000 m 3, ce qui e

# Reranking Utilities

In [10]:
from scipy.stats import spearmanr

def calculate_overlap_between_rank_and_reranking(original_articles, reranked_articles):
  """
  Calculates and prints ranking metrics between the original and reranked article lists.

  Args:
    original_articles: The original list of article objects.
    reranked_articles: The reranked list of article objects.
  """
  # Get the UIDs of the articles in the original and reranked lists
  original_uids = [article.raw["uid"] for article in original_articles]
  reranked_uids = [article.raw["uid"] for article in reranked_articles]

  # Create ranking lists based on UIDs
  original_ranking = [original_uids.index(uid) for uid in reranked_uids]
  reranked_ranking = list(range(len(reranked_uids)))

  # Calculate Spearman correlation
  spearman_corr, _ = spearmanr(original_ranking, reranked_ranking)

  print(f"Spearman correlation between original and reranked rankings: {spearman_corr:.4f}")

  # Calculate overlap in top 3
  original_top_3 = set(original_uids[:3])
  reranked_top_3 = set(reranked_uids[:3])
  overlap_3 = len(original_top_3.intersection(reranked_top_3))
  print(f"Overlap in top 3 results: {overlap_3}")

  # Calculate overlap in top 5 and top 10
  original_top_5 = set(original_uids[:5])
  reranked_top_5 = set(reranked_uids[:5])
  overlap_5 = len(original_top_5.intersection(reranked_top_5))
  print(f"Overlap in top 5 results: {overlap_5}")

  original_top_10 = set(original_uids[:10])
  reranked_top_10 = set(reranked_uids[:10])
  overlap_10 = len(original_top_10.intersection(reranked_top_10))
  print(f"Overlap in top 10 results: {overlap_10}")
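
To make the two metrics concrete, here is a dependency-free sketch on toy UID lists. It assumes both lists contain the same items with no ties, in which case Spearman's rho reduces to the closed form `1 - 6 * sum(d^2) / (n * (n^2 - 1))`. The function names are hypothetical.

```python
def spearman_without_ties(original_uids, reranked_uids):
    """Spearman's rho for two orderings of the same items (no ties)."""
    n = len(original_uids)
    original_position = {uid: rank for rank, uid in enumerate(original_uids)}
    squared_diffs = sum(
        (original_position[uid] - new_rank) ** 2
        for new_rank, uid in enumerate(reranked_uids)
    )
    return 1 - 6 * squared_diffs / (n * (n ** 2 - 1))


def top_k_overlap(original_uids, reranked_uids, k):
    """Number of items shared by the first k entries of both lists."""
    return len(set(original_uids[:k]) & set(reranked_uids[:k]))


original = ["a", "b", "c", "d", "e"]
reversed_order = original[::-1]

print(spearman_without_ties(original, original))        # 1.0
print(spearman_without_ties(original, reversed_order))  # -1.0
print(top_k_overlap(original, reversed_order, k=3))     # 1
```

A rho near 1 means the reranker mostly kept the original order, a value near -1 means it roughly inverted it, and a value near 0 means the two orders are unrelated.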

In [11]:
from sentence_transformers import SentenceTransformer
import numpy as np

def rerank_articles_by_other_embedding_model(original_articles, question, embedding_model):
  """
  Reranks a list of articles based on their relevance to a given question using a different embedding model.
  Also prints ranking metrics comparing the original and reranked order.

  Args:
    original_articles: A list of article objects (with a .raw attribute containing a dictionary).
    question: The question string.
    embedding_model: The SentenceTransformer model to use for generating embeddings.

  Returns:
    A list of article objects, sorted by relevance to the question in descending order.
  """
  # Combine title and transcript for article text; missing fields become empty strings
  article_texts = [(article.raw.get("title") or "") + " " + (article.raw.get("transcript") or "") for article in original_articles]

  # Generate embeddings for articles and question
  article_embeddings = embedding_model.encode(article_texts)
  question_embedding = embedding_model.encode(question)

  # Calculate cosine similarity using numpy
  similarity_scores = np.dot(article_embeddings, question_embedding.reshape(-1, 1)).flatten() / (np.linalg.norm(article_embeddings, axis=1) * np.linalg.norm(question_embedding))

  # Pair articles with their scores and sort
  scored_articles = sorted(zip(original_articles, similarity_scores), key=lambda x: x[1], reverse=True)

  # Return only the sorted articles
  reranked_articles = [article for article, score in scored_articles]

  calculate_overlap_between_rank_and_reranking(original_articles=original_articles, reranked_articles=reranked_articles)
  return reranked_articles

AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'
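
The score used in `rerank_articles_by_other_embedding_model` is the cosine similarity between each article embedding and the question embedding. The sketch below illustrates the same ranking idea on toy two-dimensional vectors using only the standard library; the vectors and names are made up for illustration and do not come from either model.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 2-dimensional "embeddings"; real models return far more dimensions.
question_vector = [1.0, 0.0]
article_vectors = {
    "orthogonal": [0.0, 2.0],
    "same_direction": [3.0, 0.0],
    "diagonal": [1.0, 1.0],
}

# Sort article names by similarity to the question, most similar first.
ranked = sorted(
    article_vectors,
    key=lambda name: cosine_similarity(article_vectors[name], question_vector),
    reverse=True,
)
print(ranked)  # ['same_direction', 'diagonal', 'orthogonal']
```

Because cosine similarity ignores vector length, `same_direction` scores 1.0 even though it is three times longer than the question vector.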

In [12]:
embedding_model_denoising = SentenceTransformer('impresso-project/halloween_workshop_ocr_robust_preview', trust_remote_code=True)
embedding_model_lux = SentenceTransformer('impresso-project/halloween_workshop_ocr_robust_with_lux_preview', trust_remote_code=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


A new version of the following files was downloaded from https://huggingface.co/impresso-project/halloween_workshop_ocr_robust_preview:
- configuration.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/impresso-project/halloween_workshop_ocr_robust_preview:
- modeling.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/impresso-project/halloween_workshop_ocr_robust_with_lux_preview:
- configuration.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/impresso-project/halloween_workshop_ocr_robust_with_lux_preview:
- modeling.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


# Retrieval Augmented Generation

In [13]:
# 1. Install the 'openai' library
!pip install openai -q

In [14]:
def generate_interface_links(articles, url_format_string):
  """
  Generates a formatted string of interface links using the 'uid' of each article.

  Args:
    articles: A list of article objects (with a .raw attribute containing a dictionary).
    url_format_string: A string representing the URL format, expecting a '{guid}' placeholder
                       that is filled with the article's 'uid'.

  Returns:
    A single string containing all the formatted URLs, each on a new line.
  """
  interface_links = []
  for article in articles:
    guid = article.raw.get("uid")
    if guid:
      formatted_url = url_format_string.format(guid=guid)
      interface_links.append(formatted_url)
  return "\n".join(interface_links)
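
As a small illustration of the substitution that `generate_interface_links` performs, the sketch below fills the `{guid}` placeholder for a few made-up UIDs and skips missing ones. The URL pattern is the one used later in this notebook; the UIDs are hypothetical.

```python
url_format_string = "https://dev.impresso-project.ch/app/article/{guid}"

# Hypothetical UIDs, used only to show the substitution; None is skipped.
example_uids = ["example-uid-1", None, "example-uid-2"]

links = [url_format_string.format(guid=uid) for uid in example_uids if uid]
print("\n".join(links))
```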

## RERANKING

### Q1

In [15]:
reranked_articles_q1 = rerank_articles_by_other_embedding_model(result_articles_q1, question_q1, embedding_model=embedding_model_denoising)

Spearman correlation between original and reranked rankings: -0.4556
Overlap in top 3 results: 0
Overlap in top 5 results: 1
Overlap in top 10 results: 3


In [16]:
format_and_display_articles(result_articles_q1)

--- Article 1 ---
Title: EN SUISSE : A ZURICH, DES ÉMEUTIERS FONT METTRE EN LIBERTÉ UN PRISONNI[...]
Publicationdate: 1919-06-16
Languagecode: fr
Transcript Length: 1669 characters
Transcript:
ZURICH, -15 juin. — Des troubles assez sérieux ont éclaté à Zurich, où, comme on le sait, existe une
 minorité socialiste extrémiste des plus actives. Une grande manifestation avait été organisée, vend
redi soir, pair l'Union ouvrière zurichoise, pour honorer la mémoire de Posa Luxembourg. Le préfet d
e police de Zurich, le socialiste Greber, sur la promesse que la manifestation se déroulerait dans l
e calme, avait donné des instructions pour que la polioe.se tînt éloignée de la réunion. Celle-ci co
mmença bien; mais un orateur vint annoncer que le délégué socialiste suisse. Conrad Wyss venait, en 
rentrant d'Allemagne, d'être arrêté par les autorités fédérales, pour avoir introduit des brochures 
et des tracts de propagande. Aussi tôt la foule se porta vers la prison et réclama la mi.se en liber

In [17]:
format_and_display_articles(reranked_articles_q1)

--- Article 1 ---
Title: d octobre 1950 BULLETIN DE BOURSE
Publicationdate: 1950-10-05
Languagecode: fr
Transcript Length: 2103 characters
Transcript:
d octobre 1950 BULLETIN DE BOURSE u E Zurich :. Cour _.', Obligations A 5 VA% Féd. 42 / ms 101-25 d'
«» 3%% Féd. 43 / av. 10675 d 1 t) 6-8° 3%% Féd. 44 / mal 1 ( wfid 10 < _"° 3% Fédéral 49.. _™ _' 9° 
104. 75 3% C. F. F. 38.. 1 t) 3 _- 65 m 50 Actions Swissair.... 220 215 d B. Com. de Bâle 263 265 Ba
nque Fédérale 178 179 Union B. Suisses 900 902 Société B. Suisse 786 786 Crédit Suisse.. 797 796 d C
onli Linoléum. 216 218 Electro Watt.. 715 717 interhandel... 668 667 Motor Colombus. 509 510 S. A. E
. G. Sér. 1 66 66 Indelec.... 267 274 Italo-Suisse prior. 83 83 Réassurances.. 5650 5710 Winterthour
 Ace. 5790 5800 Zurich Assuranc. 7900 d 7990 o Aar-Tessin i.. 1180 1180 Zurich : Cou. ra du Actions 
4 5 Saurer ¦ _<,, 880 890 Aluminium B ¦ s 2075 2098 Bally... a a -, 725 725 Brown-Boveri.. 925 930 F
. Mot. Suisse C. 1360 d 1360 Fischer.,, «

### Q2

In [18]:
reranked_articles_q2 = rerank_articles_by_other_embedding_model(result_articles_q2, question_q2, embedding_model=embedding_model_denoising)

Spearman correlation between original and reranked rankings: -0.0271
Overlap in top 3 results: 0
Overlap in top 5 results: 1
Overlap in top 10 results: 6


In [19]:
format_and_display_articles(result_articles_q2)

--- Article 1 ---
Title: isi Siiiss®
Publicationdate: 1928-07-19
Languagecode: fr
Transcript Length: 2732 characters
Transcript:
_isi _Siiiss _® Ne gaspillons pas l'eau ! ZURICH, 19. — Le Service des eaux de la ville de Zurich vi
ent de publier un avis concernant la consommation de l'eau, avis motivé par le -fait qu'en ces temps
 de chaleur excessive, on a coutume de mettre le lait et d'autres aliments dans l'eau courante pour 
les rafraîchir, d'arroser les jardins par trop copieusement, bref, de gaspiller l'eau. Ces jours der
niers, la consommation d'eau a dépassé 100, 000 m 3, ce qui représente une moyenne de 425 litres par
 personne et par jour, tandis que jusqu'ici, la consommation la plus forte n'avait pas dépassé 83 mi
lle m 3. Zurich possède, il est vrai, un réservoir inépuisable — en l'espèce son lac — mais il n'en 
reste pas moins que les installations de filtrage et de transport ont des capacités limitées. Samedi
 dernier, la consommation a même atteint 103, 000 m 3, ce qui e

In [20]:
format_and_display_articles(reranked_articles_q2)

--- Article 1 ---
Title: CONGRÈS À ZURICH La Suisse, château d'eau de l'Europe
Publicationdate: 1982-09-07
Languagecode: fr
Transcript Length: 1809 characters
Transcript:
CONGRÈS À ZURICH La Suisse, château d'eau de l'Europe Zurich, 6 (ATS).-Le 14 e congrès mondial de l'
Association des distributions d'eau (AIDE) s'est ouvert lundi à Zurich. Jusqu'au 10 septembre, plus 
de deux mille délégués de toutes les régions du monde débattront de problèmes liés au thème général 
« sans eau pas de vie ». Lors de leurs allocutions d'ouverture, le conseiller fédéral Hans Hiirliman
n et le président de la ville de Zurich Thomas Wagner ont relevé que la Suisse peut être considérée 
comme le château d'eau de l'Europe. Les problèmes qui se posent chez nous ne touchent pas à la quant
ité mais à la qualité. Selon M. Hùrlimann, l'exiguïté de notre pays, sa forte densité démographique 
et ses intenses activités économiques sont également sources de problèmes. Une statistique de la Soc
iété suisse de l'indu

### Q2 - LUX

In [21]:
reranked_lux_articles_q2 = rerank_articles_by_other_embedding_model(result_articles_q2, question_q2, embedding_model=embedding_model_lux)

Spearman correlation between original and reranked rankings: 0.1835
Overlap in top 3 results: 0
Overlap in top 5 results: 2
Overlap in top 10 results: 6


In [22]:
format_and_display_articles(reranked_lux_articles_q2)

--- Article 1 ---
Title: engpasse
Publicationdate: 1976-07-07
Languagecode: de
Transcript Length: 967 characters
Transcript:
engpasse bps. Die Wasserversorgung in einigen Zürcher Quartieren ist durch den schlagartig gestiegen
en Verbrauch an der Grenze ihrer Leistungsfähigkeit angelangt. Wie die Abteilung Wasserversorgung de
r Industriellen Betriebe der Stadt ZUrich mitteilte, betrifft dies in erster Linie die peripher gele
genen Versorgungsgebicte Seebach, Höngg, Waidberg, Friesenberg, Leimbach und Witikon. Wohl gehört zu
rzeit die Wasserversorgung der Stadt Zürich zu den modernsten und leistungsfähigsten ihrer Art in Eu
ropa. Doch die endgültige Sicherstellung der Wasserlieferungen für das gesamte Stadtgebiet wird erst
 ab etwa 1982 bewerkstelligt sein, da sich momentan verschiedene in der Gemeindeabstimmung vom 4. Mä
rz 1973 bewilligte Projekte noch im Stadium der Realisierung befinden. So steht beispielsweise das G
rundwasserwerk Hardhof im Neubau. Darüberhinaus werden verschiedene

## USE CASE 1 - AI Assisted Source Finding

### Q1

In [23]:
format_and_display_articles(result_articles_q1[:5])
RAG_input_context_q1 = format_and_display_articles(result_articles_q1[:5], return_string=True)

url_format_string = "https://dev.impresso-project.ch/app/article/{guid}"
interface_links_string_q1 = generate_interface_links(result_articles_q1[:5], url_format_string)

--- Article 1 ---
Title: EN SUISSE : A ZURICH, DES ÉMEUTIERS FONT METTRE EN LIBERTÉ UN PRISONNI[...]
Publicationdate: 1919-06-16
Languagecode: fr
Transcript Length: 1669 characters
Transcript:
ZURICH, -15 juin. — Des troubles assez sérieux ont éclaté à Zurich, où, comme on le sait, existe une
 minorité socialiste extrémiste des plus actives. Une grande manifestation avait été organisée, vend
redi soir, pair l'Union ouvrière zurichoise, pour honorer la mémoire de Posa Luxembourg. Le préfet d
e police de Zurich, le socialiste Greber, sur la promesse que la manifestation se déroulerait dans l
e calme, avait donné des instructions pour que la polioe.se tînt éloignée de la réunion. Celle-ci co
mmença bien; mais un orateur vint annoncer que le délégué socialiste suisse. Conrad Wyss venait, en 
rentrant d'Allemagne, d'être arrêté par les autorités fédérales, pour avoir introduit des brochures 
et des tracts de propagande. Aussi tôt la foule se porta vers la prison et réclama la mi.se en liber

### Q2

In [24]:
format_and_display_articles(result_articles_q2[:5])
RAG_input_context_q2 = format_and_display_articles(result_articles_q2[:5], return_string=True)

url_format_string = "https://dev.impresso-project.ch/app/article/{guid}"
interface_links_string_q2 = generate_interface_links(result_articles_q2[:5], url_format_string)

--- Article 1 ---
Title: isi Siiiss®
Publicationdate: 1928-07-19
Languagecode: fr
Transcript Length: 2732 characters
Transcript:
_isi _Siiiss _® Ne gaspillons pas l'eau ! ZURICH, 19. — Le Service des eaux de la ville de Zurich vi
ent de publier un avis concernant la consommation de l'eau, avis motivé par le -fait qu'en ces temps
 de chaleur excessive, on a coutume de mettre le lait et d'autres aliments dans l'eau courante pour 
les rafraîchir, d'arroser les jardins par trop copieusement, bref, de gaspiller l'eau. Ces jours der
niers, la consommation d'eau a dépassé 100, 000 m 3, ce qui représente une moyenne de 425 litres par
 personne et par jour, tandis que jusqu'ici, la consommation la plus forte n'avait pas dépassé 83 mi
lle m 3. Zurich possède, il est vrai, un réservoir inépuisable — en l'espèce son lac — mais il n'en 
reste pas moins que les installations de filtrage et de transport ont des capacités limitées. Samedi
 dernier, la consommation a même atteint 103, 000 m 3, ce qui e

### Q2 Reranked - Denoised Only

In [25]:
format_and_display_articles(reranked_articles_q2[:5])
RAG_input_context_q2_reranked = format_and_display_articles(reranked_articles_q2[:5], return_string=True)

url_format_string = "https://dev.impresso-project.ch/app/article/{guid}"
interface_links_string_q2_reranked = generate_interface_links(reranked_articles_q2[:5], url_format_string)

--- Article 1 ---
Title: CONGRÈS À ZURICH La Suisse, château d'eau de l'Europe
Publicationdate: 1982-09-07
Languagecode: fr
Transcript Length: 1809 characters
Transcript:
CONGRÈS À ZURICH La Suisse, château d'eau de l'Europe Zurich, 6 (ATS).-Le 14 e congrès mondial de l'
Association des distributions d'eau (AIDE) s'est ouvert lundi à Zurich. Jusqu'au 10 septembre, plus 
de deux mille délégués de toutes les régions du monde débattront de problèmes liés au thème général 
« sans eau pas de vie ». Lors de leurs allocutions d'ouverture, le conseiller fédéral Hans Hiirliman
n et le président de la ville de Zurich Thomas Wagner ont relevé que la Suisse peut être considérée 
comme le château d'eau de l'Europe. Les problèmes qui se posent chez nous ne touchent pas à la quant
ité mais à la qualité. Selon M. Hùrlimann, l'exiguïté de notre pays, sa forte densité démographique 
et ses intenses activités économiques sont également sources de problèmes. Une statistique de la Soc
iété suisse de l'indu

### Q2 Reranked - Denoised and LUX

In [26]:
format_and_display_articles(reranked_lux_articles_q2[:5])
RAG_input_context_q2_reranked_lux = format_and_display_articles(reranked_lux_articles_q2[:5], return_string=True)

url_format_string = "https://dev.impresso-project.ch/app/article/{guid}"
interface_links_string_q2_reranked_lux = generate_interface_links(reranked_lux_articles_q2[:5], url_format_string)

--- Article 1 ---
Title: engpasse
Publicationdate: 1976-07-07
Languagecode: de
Transcript Length: 967 characters
Transcript:
engpasse bps. Die Wasserversorgung in einigen Zürcher Quartieren ist durch den schlagartig gestiegen
en Verbrauch an der Grenze ihrer Leistungsfähigkeit angelangt. Wie die Abteilung Wasserversorgung de
r Industriellen Betriebe der Stadt ZUrich mitteilte, betrifft dies in erster Linie die peripher gele
genen Versorgungsgebicte Seebach, Höngg, Waidberg, Friesenberg, Leimbach und Witikon. Wohl gehört zu
rzeit die Wasserversorgung der Stadt Zürich zu den modernsten und leistungsfähigsten ihrer Art in Eu
ropa. Doch die endgültige Sicherstellung der Wasserlieferungen für das gesamte Stadtgebiet wird erst
 ab etwa 1982 bewerkstelligt sein, da sich momentan verschiedene in der Gemeindeabstimmung vom 4. Mä
rz 1973 bewilligte Projekte noch im Stadium der Realisierung befinden. So steht beispielsweise das G
rundwasserwerk Hardhof im Neubau. Darüberhinaus werden verschiedene

### OPENROUTER VERSION

In [27]:
import openai
import getpass
import os

# 2. Securely get your API key
# When you run this, a password-style box will appear.
# Paste your OpenRouter API key there and press Enter.
if "OPENROUTER_API_KEY" not in os.environ:
  os.environ["OPENROUTER_API_KEY"] = getpass.getpass("Enter your OpenRouter API Key: ")

# 3. Configure the OpenAI client to use OpenRouter
client = openai.OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=os.environ["OPENROUTER_API_KEY"],
)

# 4. Set up your request
model_id = "google/gemma-3-27b-it:free"
core_input = "Summarize how the following articles that might contain OCR errors and relate these articles to the question of the user. \n"

Enter your OpenRouter API Key: ··········
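
The four request cells below differ only in the question, the context string, and the links they print. A minimal sketch of two helpers that capture the shared part follows. `build_prompt` and `ask_model` are hypothetical names; `ask_model` uses the same `client.chat.completions.create` call as the cells below and is not executed here.

```python
def build_prompt(core_input, question, context):
    """Concatenate instruction, question and retrieved context."""
    return (
        core_input
        + " Question: " + question
        + "\n Retrieved Context: " + context + "\n"
    )


def ask_model(client, model_id, prompt):
    """Send a single-turn chat request and return the first choice's text."""
    completion = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


# Placeholder strings, only to show the resulting prompt layout.
prompt = build_prompt("Summarize.\n", "What happened?", "--- Article 1 ---")
print(prompt)
```

With the objects defined in this notebook, a call might look like `ask_model(client, model_id, build_prompt(core_input, question_q1, RAG_input_context_q1))`.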


#### Q1

In [28]:
input_with_context = core_input + " Question: " + question_q1 + "\n Retrieved Context: " + RAG_input_context_q1 + "\n"

print(f"Sending request to model: {model_id}...")

try:
  completion = client.chat.completions.create(
    model=model_id,
    messages=[
      {
        "role": "user",
        "content": input_with_context
      },
    ],
  )

  # 6. Print the model's response
  response_text = completion.choices[0].message.content
  print("\n--- Core input of task ---")
  print(core_input)
  print("\n--- Question ---")
  print(question_q1)
  print("\n--- Model Response ---")
  print(response_text)
  print("----------------------")
  print("\n--- Source Article Links ---")
  print(interface_links_string_q1)

except openai.AuthenticationError:
  print("AuthenticationError: Invalid API key. Please check your OpenRouter API key.")
except Exception as e:
  print(f"An error occurred: {e}")

Sending request to model: google/gemma-3-27b-it:free...

--- Core input of task ---
Summarize how the following articles that might contain OCR errors and relate these articles to the question of the user. 


--- Question ---
What happened in Zurich in the year 1950?

--- Model Response ---


## Summary of Articles & Relation to Your Question (What happened in Zurich in 1950?)

Here's a breakdown of each article and how it relates to your question about events in Zurich in 1950:

**1. Article 1 (1919-06-16):** This article details riots in Zurich in *1919*, related to a socialist demonstration honoring Rosa Luxembourg. It's **not relevant** to your 1950 question.

**2. Article 2 (1953-10-26):** This is a "looking back" piece, mentioning events from 50 years prior (around 1903) and other anniversaries (40, 30, 20, and 10 years before 1953). It doesn’t specifically focus on 1950. It only briefly mentions events spanning decades and isn’t useful. **Not relevant** to your question.

**3. A

#### Q2

In [31]:
input_with_context = core_input + " Question: " + question_q2 + "\n Retrieved Context: " + RAG_input_context_q2 + "\n"

print(f"Sending request to model: {model_id}...")

try:
  completion = client.chat.completions.create(
    model=model_id,
    messages=[
      {
        "role": "user",
        "content": input_with_context
      },
    ],
  )

  # 6. Print the model's response
  response_text = completion.choices[0].message.content
  print("\n--- Core input of task ---")
  print(core_input)
  print("\n--- Question ---")
  print(question_q2)
  print("\n--- Model Response ---")
  print(response_text)
  print("----------------------")
  print("\n--- Source Article Links ---")
  print(interface_links_string_q2)

except openai.AuthenticationError:
  print("AuthenticationError: Invalid API key. Please check your OpenRouter API key.")
except Exception as e:
  print(f"An error occurred: {e}")

Sending request to model: google/gemma-3-27b-it:free...

--- Core input of task ---
Summarize how the following articles that might contain OCR errors and relate these articles to the question of the user. 


--- Question ---
How many water fountains are there in Zurich?

--- Model Response ---
Okay, here's a summary of the articles and how they relate to the question "How many water fountains are there in Zurich?".  I'll also note potential OCR issues.

**Summary of Articles:**

* **Article 1 (1928-07-19):** This article discusses water *consumption* in Zurich during a heatwave. It notes that consumption exceeded 100,000 m³ and 425 liters per person per day. It mentions Zurich has the lake as a reservoir but filtration/transport capacity is limited. It's about overall water usage, not fountains specifically.  OCR issues: "isi Siiiss" likely should be "ici Suisse".
* **Article 2 (1962-08-04):** **This is the most relevant article.** It states that Zurich had **26 public fountains** con

#### Q2 Reranked Denoising

In [32]:
input_with_context = core_input + " Question: " + question_q2 + "\n Retrieved Context: " + RAG_input_context_q2_reranked + "\n"

print(f"Sending request to model: {model_id}...")

try:
  completion = client.chat.completions.create(
    model=model_id,
    messages=[
      {
        "role": "user",
        "content": input_with_context
      },
    ],
  )

  # 6. Print the model's response
  response_text = completion.choices[0].message.content
  print("\n--- Core input of task ---")
  print(core_input)
  print("\n--- Question ---")
  print(question_q2)
  print("\n--- Model Response ---")
  print(response_text)
  print("----------------------")
  print("\n--- Source Article Links ---")
  print(interface_links_string_q2_reranked)

except openai.AuthenticationError:
  print("AuthenticationError: Invalid API key. Please check your OpenRouter API key.")
except Exception as e:
  print(f"An error occurred: {e}")

Sending request to model: google/gemma-3-27b-it:free...

--- Core input of task ---
Summarize how the following articles that might contain OCR errors and relate these articles to the question of the user. 


--- Question ---
How many water fountains are there in Zurich?

--- Model Response ---


Here's a summary of the articles and their relevance to the question "How many water fountains are there in Zurich?":

**Summary of Articles:**

* **Article 1 (1982):** Discusses a water distribution congress in Zurich and Switzerland's position as a "water castle" of Europe. It mentions water consumption statistics (248 liters/person/day) and investment in pollution control.  It doesn't mention fountains.
* **Article 2 (1914):** Focuses on the number of alcohol vendors (pubs/shops) in the canton and city of Zurich, aiming to regulate alcohol consumption. Completely irrelevant to water fountains.
* **Article 3 (1976):** Reports on water supply limitations in some Zurich districts due to increa

#### Q2 Reranked Denoising Lux

In [34]:
input_with_context = core_input + " Question: " + question_q2 + "\n Retrieved Context: " + RAG_input_context_q2_reranked_lux + "\n"

print(f"Sending request to model: {model_id}...")

try:
  completion = client.chat.completions.create(
    model=model_id,
    messages=[
      {
        "role": "user",
        "content": input_with_context
      },
    ],
  )

  # 6. Print the model's response
  response_text = completion.choices[0].message.content
  print("\n--- Core input of task ---")
  print(core_input)
  print("\n--- Question ---")
  print(question_q2)
  print("\n--- Model Response ---")
  print(response_text)
  print("----------------------")
  print("\n--- Source Article Links ---")
  print(interface_links_string_q2_reranked_lux)

except openai.AuthenticationError:
  print("AuthenticationError: Invalid API key. Please check your OpenRouter API key.")
except Exception as e:
  print(f"An error occurred: {e}")

Sending request to model: google/gemma-3-27b-it:free...

--- Core input of task ---
Summarize how the following articles that might contain OCR errors and relate these articles to the question of the user. 


--- Question ---
How many water fountains are there in Zurich?

--- Model Response ---


Okay, here's a summary of the articles and how they relate to the question "How many water fountains are there in Zurich?":

**Summary of Articles:**

* **Article 1 (engpasse - 1976):**  Discusses water supply issues in Zurich's outer districts due to increased consumption. It mentions upgrades to the water infrastructure (groundwater works, pumping stations, reservoirs, pipelines) but **doesn't mention the number of fountains.**
* **Article 2 (CONGRÈS À ZURICH - 1982):** Reports on a water distribution congress held in Zurich. Focuses on Switzerland as a "water castle" of Europe, water consumption statistics (580 million cubic meters nationally in 1981, 248 liters/person/day), and pollution c