In [2]:
import os
os.chdir("../")

from src.chroma_store import initialize_vectorstore, load_documents_from_dir
from src.chunking_strategies import chunk_by_semantic, chunk_by_recursive_split

In [3]:
# Load the documents from the data directory.
documents = load_documents_from_dir("data/content")

chunks = chunk_by_recursive_split(documents, chunk_size=400)
# chunks = chunk_by_semantic(documents)

vectorstore = initialize_vectorstore(chunks)

--INFO-- Loading documents from data/content


2024-07-04 09:17:33 - src.chroma_store - INFO - Clearing out the chroma database.
2024-07-04 09:17:33 - src.chroma_store - INFO - Creating a new chroma database.


--INFO-- Loaded 1 documents
Split 1 documents into 51 chunks.
Time Tracking: Advisor shall provide the Company with a written report, in a format acceptable by the Company, setting forth the number of hours in which he provided the Services, on a daily basis, as well as an aggregated monthly report at the last day of each calendar month.
{'source': 'data/content/Robinson Advisory.docx', 'start_index': 2311}


2024-07-04 09:17:35 - chromadb.telemetry.product.posthog - INFO - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2024-07-04 09:17:37 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
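`chunk_by_recursive_split` lives in the project's `src.chunking_strategies` module, and its body isn't shown here. The `start_index` key in the chunk metadata suggests it likely wraps LangChain's `RecursiveCharacterTextSplitter`. The sketch below shows the recursive idea in plain Python under that assumption. It tries the coarsest separator first (blank lines) and falls back to finer ones (newline, space, then a hard character cut) only for pieces that are still longer than `chunk_size`. `recursive_split` is a hypothetical name, and the sketch leaves out chunk overlap and start-index tracking.

```python
def recursive_split(text, chunk_size=400, separators=("\n\n", "\n", " ", "")):
    """Hypothetical sketch: split text into chunks of at most chunk_size chars,
    preferring the coarsest separator that keeps pieces small enough."""
    if not text:
        return []
    if len(text) <= chunk_size:
        return [text]
    sep, finer = separators[0], separators[1:] or ("",)
    # The empty separator is the last resort: a hard cut every chunk_size chars.
    if sep == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Greedily merge neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: recurse with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Because the merge step packs pieces greedily, chunks tend to end on paragraph or line boundaries when those exist. This is why chunks such as the "Time Tracking" clause above come out as whole clauses.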


### Initialize retriever

In [4]:
similarity_threshold = 0.65
similarity_count = 20
retriever = vectorstore.as_retriever(search_type="similarity_score_threshold",
                                      search_kwargs={'score_threshold': similarity_threshold,
                                                      "k": similarity_count})

### Retrieval of documents

In [5]:
from src.rag_pipeline import create_rank_fusion_chain, generate_answer
from src.utils import format_tuple_docs_to_text, format_docs_to_text
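`create_rank_fusion_chain` comes from the project's `src.rag_pipeline`, and its implementation isn't shown. Rank-fusion retrieval usually merges the result lists from several query variants with Reciprocal Rank Fusion (RRF). Each document's score is the sum of 1 / (k + rank) over every list it appears in, and k = 60 is the commonly used constant. Assuming the chain follows that pattern, the fusion step could look like the sketch below. `reciprocal_rank_fusion` is a hypothetical helper that works on document ids rather than LangChain documents.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Hypothetical sketch of RRF over several ranked lists of doc ids.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns (doc_id, score) tuples, best first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Three query variants retrieved overlapping chunks in different orders.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"], ["b", "a"]])
print([doc_id for doc_id, _ in fused])
```

Returning `(doc, score)` pairs would explain why the commented-out code path below uses `format_tuple_docs_to_text`, but that is an inference from the helper's name.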

In [6]:
question = "Who are the parties to the Agreement and what are their defined names?"

In [7]:
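from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)
# Rank-fusion retrieval chain; built here but unused unless the lines below are uncommented.
retrieval_chain = create_rank_fusion_chain(question, llm, retriever)

# Alternative: retrieve with rank fusion instead of plain similarity search.
# docs = retrieval_chain.invoke({"question": question})
# context_text = format_tuple_docs_to_text(docs)

# Plain threshold-filtered similarity search.
# `invoke` replaces the deprecated `get_relevant_documents`.
docs = retriever.invoke(question)
context_text = format_docs_to_text(docs)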


print(context_text)

2024-07-04 09:17:57 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Entire Agreement; No Waiver or Assignment: This Agreement together with the Exhibits, which are attached hereto and incorporated herein, set forth the entire Agreement between the parties and shall supersede all previous communications and agreements between the parties, either oral or written. This Agreement may be modified only by a written amendment executed by both parties. This Agreement may

------------

Term: The term of this Agreement shall commence on the Effective Date and shall continue until terminated in accordance with the provisions herein (the "Term").

------------

Governing Law and Jurisdiction:  This Agreement shall be governed by the laws of the State of Israel, without giving effect to the rules respecting conflicts of laws. The parties consent to the exclusive jurisdiction and venue of Tel Aviv courts for any lawsuit filed arising from or relating to this Agreement.

------------

Whereas,	Advisor has expertise and/or knowledge and/or relationships, which are re
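The printed context shows that `format_docs_to_text` joins each chunk's `page_content` with a `------------` line set between blank lines. A hypothetical equivalent is sketched below. `join_docs`, `join_scored_docs`, and the `Doc` stand-in for LangChain's `Document` are illustrative names. The `(document, score)` input of the second helper is only inferred from the name `format_tuple_docs_to_text`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Minimal stand-in for LangChain's Document: text plus metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


# Separator matching the printed context above.
SEPARATOR = "\n\n------------\n\n"


def join_docs(docs):
    """Concatenate document texts into a single context string."""
    return SEPARATOR.join(d.page_content for d in docs)


def join_scored_docs(doc_score_pairs):
    """Same, for (document, score) tuples; the scores are discarded."""
    return SEPARATOR.join(d.page_content for d, _score in doc_score_pairs)


print(join_docs([Doc("Term: ..."), Doc("Governing Law: ...")]))
```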

In [8]:
print(chunks[0].page_content)

- 2-

ADVISORY SERVICES AGREEMENT

This Advisory Services Agreement is entered into as of June 15th, 2023 (the “Effective Date”), by and between Cloud Investments Ltd., ID 51-426526-3, an Israeli company (the "Company"), and Mr. Jack Robinson, Passport Number 780055578, residing at 1 Rabin st, Tel Aviv, Israel, Email: jackrobinson@gmail.com ("Advisor").


#### Reranking with Cohere

In [9]:
# Helper function for printing docs


def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

In [10]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

# Rerank the threshold-filtered candidates from `retriever` with Cohere's rerank model.
# (The unused `Cohere` LLM from the original cell is dropped; only the reranker is needed.)
compressor = CohereRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(question)
pretty_print_docs(compressed_docs)

2024-07-04 09:18:06 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-04 09:18:07 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"


Document 1:

By: ________________________		By:________________________

Name:	Silvan Joseph				Name:	Jack Robinson		

Title: CEO					



Confidentiality, None Compete and IP Ownership Undertaking

Appendix A to Advisory Service Agreement as of June 15th, 2023
----------------------------------------------------------------------------------------------------
Document 2:

Whereas,	Advisor has expertise and/or knowledge and/or relationships, which are relevant to the Company’s business and the Company has asked Advisor to provide it with certain Advisory services, as described in this Agreement; and

Whereas, 	Advisor has agreed to provide the Company with such services, subject to the terms set forth in this Agreement.

NOW THEREFORE THE PARTIES AGREE AS FOLLOWS:
----------------------------------------------------------------------------------------------------
Document 3:

Notices: Notices under this Agreement shall be delivered to the party’s email address as follows: Company: info@c
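`ContextualCompressionRetriever` first runs the base retriever and then passes the candidates to the compressor. `CohereRerank` scores each candidate against the query and keeps the highest-scoring ones: 3 by default via `top_n`, which matches the three documents printed above. The sketch below mirrors that retrieve-then-rerank flow, with a toy term-overlap score standing in for Cohere's model. `overlap_score` and `rerank` are hypothetical, and the toy scorer will not reproduce Cohere's ranking.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, passage):
    # Toy relevance: fraction of query terms that also appear in the passage.
    # A stand-in for a learned reranker such as Cohere Rerank, not equivalent to it.
    q = tokenize(query)
    return len(q & tokenize(passage)) / len(q) if q else 0.0


def rerank(query, passages, top_n=3, score_fn=overlap_score):
    """Score every candidate against the query and keep the top_n, best first."""
    scored = [(p, score_fn(query, p)) for p in passages]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


candidates = [
    "The parties to the Agreement are the Company and the Advisor.",
    "Governing law is Israel.",
    "Term begins on the Effective Date.",
]
for passage, score in rerank("Who are the parties to the Agreement?", candidates):
    print(f"{score:.2f}  {passage}")
```

Reranking pays off when the first-stage retriever is tuned for recall: the cheap vector search casts a wide net, and a more expensive relevance model orders the short list.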

#### Answer generation

In [None]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

answer = generate_answer(question, context_text, llm=llm)
answer


You are an experienced Legal Assistant who analyzes legal documents. Your expertise includes extracting facts and integrating information from multiple sources to provide well-supported answers. 

Guidelines:

1. Derive your answer strictly from the provided context. Do not introduce any new information.

2. Ensure complete contextuality: Address all aspects of the query, linking back to specific details in the context.

3. Avoid phrases like "In the context provided" or "According to my knowledge."

4. Be concise and to the point.

5. Write in a professional and legally appropriate manner.

Previous Q & A examples include:


Given the guidelines and examples, please answer the question based on the following context.
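`generate_answer` is defined in `src.rag_pipeline`. The printed output suggests it fills a prompt template with a role description, five guidelines, an optional block of prior Q&A examples (empty in this run), and the retrieved context before calling the LLM. A hypothetical `build_prompt` that assembles a template of that shape is sketched below. The `Context:`/`Question:` layout at the end is an assumption, because the printed output stops before those parts.

```python
ROLE = (
    "You are an experienced Legal Assistant who analyzes legal documents. "
    "Your expertise includes extracting facts and integrating information "
    "from multiple sources to provide well-supported answers."
)

GUIDELINES = [
    "Derive your answer strictly from the provided context. Do not introduce any new information.",
    "Ensure complete contextuality: Address all aspects of the query, linking back to specific details in the context.",
    'Avoid phrases like "In the context provided" or "According to my knowledge."',
    "Be concise and to the point.",
    "Write in a professional and legally appropriate manner.",
]


def build_prompt(question, context, examples=()):
    """Hypothetical sketch of the answer prompt; `examples` holds (question, answer) pairs."""
    guideline_text = "\n\n".join(f"{i}. {g}" for i, g in enumerate(GUIDELINES, start=1))
    example_text = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        f"{ROLE}\n\nGuidelines:\n\n{guideline_text}\n\n"
        f"Previous Q & A examples include:\n\n{example_text}\n\n"
        "Given the guidelines and examples, please answer the question "
        "based on the following context.\n\n"
        # Assumed layout for the parts not visible in the printed output.
        f"Context:\n{context}\n\nQuestion: {question}"
    )


print(build_prompt("Who are the parties to the Agreement?", "<retrieved chunks>"))
```

Keeping the guidelines in a list makes it easy to add or reorder rules without touching the template string.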