In [1]:
%load_ext autoreload
%autoreload 2

%load_ext jupyter_black
%load_ext dotenv
%dotenv

In [2]:
import sys
import os

sys.path.append("..")

In [3]:
import json

from langchain_community.embeddings import HuggingFaceEmbeddings
from tqdm.notebook import tqdm

from scripts.scraper import BasicScrapper
from scripts.rag import RAG

# Create database

In [4]:
scrapper = BasicScrapper()

polands_presidents_pages = [
    "https://en.wikipedia.org/wiki/President_of_Poland",
    "https://en.wikipedia.org/wiki/Gabriel_Narutowicz",
    "https://en.wikipedia.org/wiki/Stanis%C5%82aw_Wojciechowski",
    "https://en.wikipedia.org/wiki/Ignacy_Mo%C5%9Bcicki",
    "https://pl.wikipedia.org/wiki/Wojciech_Jaruzelski",
    "https://pl.wikipedia.org/wiki/Lech_Wa%C5%82%C4%99sa",
    "https://pl.wikipedia.org/wiki/Aleksander_Kwa%C5%9Bniewski",
    "https://pl.wikipedia.org/wiki/Lech_Kaczy%C5%84ski",
    "https://pl.wikipedia.org/wiki/Bronis%C5%82aw_Komorowski",
    "https://en.wikipedia.org/wiki/Andrzej_Duda",
]

# Scrape all pages and save them to data/raw/polish_presidents
for page in tqdm(polands_presidents_pages):
    scraped_page = scrapper.crawl(page)

    with open(f'../data/raw/polish_presidents/{scraped_page["title"]}.json', "w") as f:
        json.dump(scraped_page, f)

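The loop above uses each page title directly as a file name, which can fail for titles containing path separators or other characters that some filesystems reject. A minimal sketch of one way to guard against that follows; `safe_filename` is a hypothetical helper introduced here for illustration and is not part of `scripts.scraper`.

```python
import re
import unicodedata


def safe_filename(title: str, max_length: int = 100) -> str:
    """Hypothetical helper: turn a page title into a filesystem-friendly name."""
    # Normalize Unicode so visually identical titles map to the same name.
    normalized = unicodedata.normalize("NFC", title)
    # Collapse path separators, reserved characters and whitespace into "_".
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", normalized)
    # Drop leading/trailing underscores and dots, then cap the length.
    cleaned = cleaned.strip("_.")[:max_length]
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "untitled"
```

Under this assumption, the `open(...)` call could build its path from `safe_filename(scraped_page["title"])` instead of the raw title. Polish diacritics are kept as-is, since they are valid in file names on common filesystems.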

# Add knowledge sources to the RAG

In [7]:
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/gtr-t5-base")
embedding.client.similarity_fn_name = "cosine"
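The cell above selects cosine similarity as the comparison function for embeddings. For reference, a minimal pure-Python sketch of that measure is shown here; `cosine_similarity` is an illustrative function written for this note, not the implementation used inside the embedding library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norms of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

The score depends only on direction, not magnitude: parallel vectors score 1, orthogonal vectors 0, and opposite vectors -1.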



In [8]:
rag = RAG(
    index_name="presidents",
    embedding_model=embedding,
    pinecone_api_key=os.getenv("PINECONE_API_KEY"),
    google_api_key=os.getenv("GOOGLE_API_KEY"),
)

In [9]:
# Depending on the size of the data, this might take a while
rag.prepare_vector_store("../data/raw/polish_presidents")
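The internals of `RAG.prepare_vector_store` are not shown in this notebook. A common step before embedding documents into a vector store is to split each one into overlapping chunks, and the sketch below illustrates that idea only. The `chunk_text` function, its character-based windows, and its default sizes are assumptions made for illustration, not the behavior of the `RAG` class.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Hypothetical helper: split text into overlapping character windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Each new window starts `step` characters after the previous one,
    # so consecutive chunks share `overlap` characters.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary present in both neighbors, which may help retrieval at the cost of storing some text twice.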

In [10]:
print(rag.ask("Who was the first president of Poland and when was he/she born?"))

Gabriel Narutowicz was the first president of Poland. He was born on March 29, 1865, in Telšiai, Lithuania, then part of the Russian Empire. He was a professor of hydroelectric engineering and politician who served as the first President of Poland from 11 December 1922 until his assassination on 16 December, five days after assuming office. He previously served as the Minister of Public Works from 1920 to 1922 and briefly as Minister of Foreign Affairs in 1922. A renowned engineer and politically independent, Narutowicz was the first non-Catholic to be elected president of Poland. His assassination was a major political crisis in Poland and led to a period of instability.


In [11]:
print(rag.ask("List all Polish presidents and the years of their presidency."))

Here is a list of all Polish presidents and the years of their presidency:

* Gabriel Narutowicz (1922)
* Stanisław Wojciechowski (1922–1926)
* Ignacy Mościcki (1926–1939)
* Władysław Raczkiewicz (1939–1947)
* August Zaleski (1947–1952)
* Bolesław Bierut (1952–1954)
* Edward Ochab (1954–1964)
* Aleksander Zawadzki (1964–1972)
* Stanisław Gierek (1972–1980)
* Wojciech Jaruzelski (1985–1990)
* Lech Wałęsa (1990–1995)
* Aleksander Kwaśniewski (1995–2005)
* Lech Kaczyński (2005–2010)
* Bronisław Komorowski (2010–2015)
* Andrzej Duda (2015–present)

Please note that this list only includes the presidents of the Second Polish Republic (1918–1939) and the Third Polish Republic (1989–present). There were also presidents of the Polish People's Republic (1947–1989), but they were not elected by the people, but rather by the Sejm, the Polish parliament.
