<a href="https://colab.research.google.com/github/kaljuvee/ipogate/blob/main/notebooks/ipogate_pdf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
! pip install langchain openai chromadb pymupdf tiktoken
from IPython.display import clear_output
clear_output()
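
The next cell splits the loaded PDF pages with `chunk_size=512` and `chunk_overlap=10`. As a rough illustration of what those two numbers mean, here is a simplified sliding-window splitter in plain Python. It is only a sketch: `split_with_overlap` is a hypothetical helper written for this note, and it is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraph and line boundaries before falling back to smaller pieces).

```python
def split_with_overlap(text: str, chunk_size: int = 512, chunk_overlap: int = 10) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    (Hypothetical helper for illustration, not part of langchain.)
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 1200 characters of dummy text stand in for one page of the PDF
sample = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])             # [512, 512, 196]
print(chunks[0][-10:] == chunks[1][:10])    # True: 10 shared characters
```

A small overlap such as 10 characters keeps storage low, but a sentence or table row that straddles a chunk boundary is then only partly visible to the retriever in either chunk.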

In [2]:
import os
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

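# Use the key from the environment if present, otherwise prompt for it
# instead of hardcoding it in the notebook
from getpass import getpass
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")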

persist_directory = "./storage"
pdf_path = 'arcovara-ar.pdf'

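# Load the PDF; PyMuPDFLoader returns one Document per page
loader = PyMuPDFLoader(pdf_path)
documents = loader.load()

# Split the pages into chunks of up to 512 characters with a 10-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=10)
texts = text_splitter.split_documents(documents)

# Embed the chunks and store them in a Chroma collection persisted on disk
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embeddings,
                                 persist_directory=persist_directory)
vectordb.persist()

# Retrieve the 3 most similar chunks for each query
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
llm = ChatOpenAI(model_name='gpt-4')

# "stuff" places all retrieved chunks into a single prompt for the model
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)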

# Simple question loop: type "exit" to stop
while True:
    user_input = input("Enter a query: ")
    if user_input == "exit":
        break

    query = f"###Prompt {user_input}"
    try:
        llm_response = qa(query)
        print(llm_response["result"])
    except Exception as err:
        print("Exception occurred. Please try again:", err)

Enter a query: Can you give me a list of shareholders in this document?
The document lists the following shareholders:

1. Alarmo Kapital OÜ
2. FIREBIRD REPUBLICS FUND LTD
Enter a query: Can you give me the full list and also their shareholding percentages?
Sure, here is the list of shareholders and their shareholding percentages as of 31 December 2022:

1. Alarmo Kapital OÜ: 6,438,531 shares (62.0%)
2. FIREBIRD REPUBLICS FUND LTD: 337,057 shares (3.2%)
3. HM Investeeringud OÜ: 230,505 shares (2.2%)
4. Marko Teimann: 188,174 shares (1.8%)
5. FIREBIRD AVRORA FUND, LTD: 180,343 shares (1.7%)
6. Aia Tänav OÜ: 160,960 shares (1.5%)
7. K VARA OÜ: 150,901 shares (1.5%)
8. FIREBIRD FUND L.P: 150,901 shares (1.5%)
9. Rafiko OÜ: 133,948 shares (1.3%)
10. SANDER KARU: 70,606 shares (0.7%)
11. Other shareholders: 2,428,733 shares (23.4%)

Total: 10,388,367 shares (100.0%)
Enter a query: Can you give me a list of business risks from this document?
The document only mentions one business risk - str

KeyboardInterrupt: ignored
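
Because the model assembles its answer from only a few retrieved chunks, it is worth checking a numeric answer for internal consistency before trusting it. The sketch below does that for the shareholder table in the second answer above. The figures are copied verbatim from that answer and have not been compared with the PDF itself; the variable names are introduced here for illustration only.

```python
# (name, shares, percent) exactly as printed in the model's second answer
reported = [
    ("Alarmo Kapital OÜ", 6_438_531, 62.0),
    ("FIREBIRD REPUBLICS FUND LTD", 337_057, 3.2),
    ("HM Investeeringud OÜ", 230_505, 2.2),
    ("Marko Teimann", 188_174, 1.8),
    ("FIREBIRD AVRORA FUND, LTD", 180_343, 1.7),
    ("Aia Tänav OÜ", 160_960, 1.5),
    ("K VARA OÜ", 150_901, 1.5),
    ("FIREBIRD FUND L.P", 150_901, 1.5),
    ("Rafiko OÜ", 133_948, 1.3),
    ("SANDER KARU", 70_606, 0.7),
    ("Other shareholders", 2_428_733, 23.4),
]
stated_total = 10_388_367  # the "Total" line of the same answer

# Do the rows add up to the stated total and to 100%?
listed_total = sum(shares for _, shares, _ in reported)
percent_total = round(sum(pct for _, _, pct in reported), 1)

# Does each percentage match its share count relative to the stated total?
mismatches = [
    name
    for name, shares, pct in reported
    if round(100 * shares / stated_total, 1) != pct
]

print(f"sum of listed shares: {listed_total:,}")
print(f"stated total:         {stated_total:,}")
print(f"difference:           {listed_total - stated_total:,}")
print(f"sum of percentages:   {percent_total}")
print(f"rows whose percentage disagrees with the stated total: {mismatches}")
```

The rows sum to 10,470,659 shares, which is 82,292 more than the stated total, and the percentages sum to 100.8, so the answer is internally inconsistent even though each individual percentage agrees with the stated total. The check cannot say which figure is wrong; that requires going back to the shareholder table in the source PDF.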