In [1]:
%load_ext jupyter_black
%load_ext autoreload
%autoreload 2

In [2]:
import sys

sys.path.append("..")

In [3]:
import json

from langchain.embeddings import HuggingFaceEmbeddings
from tqdm.notebook import tqdm

from scripts.scraper import BasicScrapper
from scripts.rag import RAG

# Create database

In [4]:
scrapper = BasicScrapper()

polands_presidents_pages = [
    "https://en.wikipedia.org/wiki/President_of_Poland",
    "https://en.wikipedia.org/wiki/Gabriel_Narutowicz",
    "https://en.wikipedia.org/wiki/Stanis%C5%82aw_Wojciechowski",
    "https://en.wikipedia.org/wiki/Ignacy_Mo%C5%9Bcicki",
    "https://pl.wikipedia.org/wiki/Wojciech_Jaruzelski",
    "https://pl.wikipedia.org/wiki/Lech_Wa%C5%82%C4%99sa",
    "https://pl.wikipedia.org/wiki/Aleksander_Kwa%C5%9Bniewski",
    "https://pl.wikipedia.org/wiki/Lech_Kaczy%C5%84ski",
    "https://pl.wikipedia.org/wiki/Bronis%C5%82aw_Komorowski",
    "https://en.wikipedia.org/wiki/Andrzej_Duda",
]

# Scrape all pages and save them to data/raw/polish_presidents
for page in tqdm(polands_presidents_pages):
    scraped_page = scrapper.crawl(page)

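    # Write as UTF-8 explicitly so non-ASCII titles and text behave the same on every platform
    with open(f'../data/raw/polish_presidents/{scraped_page["title"]}.json', "w", encoding="utf-8") as f: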
        json.dump(scraped_page, f)
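
The loop above uses the page title directly as a file name and assumes the output folder already exists. A minimal sketch of a more defensive save step follows. The helper names `safe_filename` and `save_page` are hypothetical and not part of `scripts.scraper`; the only thing taken from the notebook is that a scraped page is a dict with a "title" key.

```python
import json
import re
from pathlib import Path


def safe_filename(title: str) -> str:
    """Replace characters that are unsafe in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned or "untitled"


def save_page(scraped_page: dict, out_dir: str) -> Path:
    """Write one scraped page to <out_dir>/<safe title>.json and return the path."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = target_dir / f"{safe_filename(scraped_page['title'])}.json"
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters such as "ł" readable in the file
        json.dump(scraped_page, f, ensure_ascii=False)
    return path
```

With this helper, the body of the loop could become `save_page(scrapper.crawl(page), "../data/raw/polish_presidents")`.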


# Add knowledge sources to the RAG

In [5]:
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/gtr-t5-base")
embedding.client.similarity_fn_name = "cosine"

Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
    https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
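
The cell above sets the similarity function name to "cosine". How the `RAG` class uses that setting is not shown in this notebook, so the following is only a sketch of what cosine similarity computes and how it could rank stored vectors against a query vector. The functions `cosine_similarity` and `top_k` are illustrative helpers written in plain Python, not part of LangChain or sentence-transformers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

Because cosine similarity ignores vector length, two embeddings that point in the same direction score 1.0 regardless of their magnitudes.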



In [7]:
rag = RAG(
    embedding_model=embedding,
    index_name="presidents2",
    data_path="../data/raw/polish_presidents",
)
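
The internals of `RAG` are not shown here. As a rough sketch of what an ingestion step over `data_path` might look like, the code below reads the saved JSON files and splits text into overlapping word windows before embedding. The helpers `load_pages` and `chunk_words`, the window sizes, and the idea that each page stores its body under a "text" key are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_pages(data_path):
    """Read every *.json file in data_path into a list of dicts."""
    pages = []
    for path in sorted(Path(data_path).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            pages.append(json.load(f))
    return pages


def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into windows of chunk_size words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

A possible use, under the "text" key assumption, would be `[c for page in load_pages("../data/raw/polish_presidents") for c in chunk_words(page["text"])]`. The overlap keeps sentences that straddle a window boundary retrievable from at least one chunk.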

In [8]:
print(rag.ask("Who was the first president of Poland and when he/she was born?"))

Gabriel Narutowicz was the first president of Poland. He was born on March 29, 1865, in Telšiai, Lithuania, which was then part of the Russian Empire. He died on December 16, 1922, in Warsaw, Poland, after being assassinated five days after assuming office.

Narutowicz was a renowned engineer and politically independent. He served as the Minister of Public Works from 1920 to 1922 and briefly as Minister of Foreign Affairs in 1922. He was elected president by the National Assembly (the Sejm and the Senate) under the terms of the 1921 March Constitution.

Narutowicz's assassination was a major political crisis in Poland. It was widely believed to have been the work of right-wing extremists who opposed his election. His death led to a period of political instability in Poland.


In [9]:
print(rag.ask("List all polish presidents and the years of their presidency."))

Here is a list of all Polish presidents and the years of their presidency:

* Gabriel Narutowicz (1922)
* Stanisław Wojciechowski (1922–1926)
* Ignacy Mościcki (1926–1939)
* Władysław Raczkiewicz (1939–1947)
* August Zaleski (1947–1952)
* Bolesław Bierut (1947–1952)
* Aleksander Zawadzki (1952–1964)
* Edward Ochab (1964–1968)
* Józef Cyrankiewicz (1968–1970)
* Henryk Jabłoński (1970–1985)
* Wojciech Jaruzelski (1985–1990)
* Lech Wałęsa (1990–1995)
* Aleksander Kwaśniewski (1995–2005)
* Lech Kaczyński (2005–2010)
* Bronisław Komorowski (2010–2015)
* Andrzej Duda (2015–present)

Please note that this list only includes the presidents of the Second Polish Republic (1918–1939) and the Third Polish Republic (1989–present). There were also presidents of the First Polish Republic (1795–1797) and the Polish People's Republic (1947–1989), but they are not included in this list.
