In [1]:
import os
import sys
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
os.chdir(r"C:\Users\TempAccess\Documents\Dhruv\RAG")
print(os.getcwd())

C:\Users\TempAccess\Documents\Dhruv\RAG


![alt text](proposition_chunking.svg)

In [3]:
from helper_function_openai import (
    Document,
    OpenAIEmbedder,
    FAISSVectorStore,
    OpenAIChat,
    chunk_text,
)

In [8]:
from typing import List, Dict, Any, Optional, Tuple
import json

In [9]:
sample_content = """Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should maintain their unique management style rather than adopting traditional corporate practices as their companies grow.
Conventional Wisdom vs. Founder Mode
The essay argues that the traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.
This approach, suitable for established companies, can be detrimental to startups where the founder's vision and direct involvement are crucial. "Founder Mode" is presented as an emerging paradigm that is not yet fully understood or documented, contrasting with the conventional "manager mode" often advised by business schools and professional managers.
Unique Founder Abilities
Founders possess unique insights and abilities that professional managers do not, primarily because they have a deep understanding of their company's vision and culture.
Graham suggests that founders should leverage these strengths rather than conform to traditional managerial practices. "Founder Mode" is an emerging paradigm that is not yet fully understood or documented, with Graham hoping that over time, it will become as well-understood as the traditional manager mode, allowing founders to maintain their unique approach even as their companies scale.
Challenges of Scaling Startups
As startups grow, there is a common belief that they must transition to a more structured managerial approach. However, many founders have found this transition problematic, as it often leads to a loss of the innovative and agile spirit that drove the startup's initial success.
Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes. He eventually found success by adopting a different approach, influenced by how Steve Jobs managed Apple.
Steve Jobs' Management Style
Steve Jobs' management approach at Apple served as inspiration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regardless of their position on the organizational chart
. This unconventional method allowed Jobs to maintain a startup-like environment even as Apple grew, fostering innovation and direct communication across hierarchical levels. Such practices emphasize the importance of founders staying deeply involved in their companies' operations, challenging the traditional notion of delegating responsibilities to professional managers as companies scale.
"""

print(f"Document length: {len(sample_content)} chars")

Document length: 2649 chars


## Chunking

In [14]:
chunks = chunk_text(sample_content, chunk_size=800, chunk_overlap=200)

# Create Document objects with metadata + chunk_id
doc_splits = []
for i, chunk in enumerate(chunks):
    doc_splits.append(Document(
        content=chunk,
        metadata={
            "title": "Paul Graham's Founder Mode Essay",
            "source": "https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ",
            "chunk_id": i + 1
        }
    ))

print(f"Created {len(doc_splits)} chunks")
for d in doc_splits:
    print(f"  Chunk {d.metadata['chunk_id']}: {len(d.content)} chars — {d.content[:80]}...")

Created 5 chunks
  Chunk 1: 456 chars — Paul Graham's essay "Founder Mode," published in September 2024, challenges conv...
  Chunk 2: 749 chars — ies grow.
Conventional Wisdom vs. Founder Mode
The essay argues that the traditi...
  Chunk 3: 900 chars — gers.
Unique Founder Abilities
Founders possess unique insights and abilities th...
  Chunk 4: 745 chars — structured managerial approach. However, many founders have found this transitio...
  Chunk 5: 593 chars — piration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was J...


### Generate Propositions

In [15]:
llm = OpenAIChat(
    model_name="gpt-4o-mini",
    temperature=0.0,
    max_tokens=3000
)

llm

<helper_function_openai.OpenAIChat at 0x1e7cb3df0e0>

In [16]:
def generate_propositions(chunk_text:str) -> List[str]:
    """
    Break a text chunk into atomic, factual, self-contained propositions.
    
    Replaces: ChatGroq + with_structured_output(GeneratePropositions) + FewShotChatMessagePromptTemplate
    Uses: OpenAI JSON mode with few-shot examples inline
    
    Returns:
        List of proposition strings
    """

    messages = [
        {
            "role": "system",
            "content": (
                "Please break down the following text into simple, self-contained propositions. "
                "Ensure that each proposition meets the following criteria:\n\n"
                "1. Express a Single Fact: Each proposition should state one specific fact or claim.\n"
                "2. Be Understandable Without Context: The proposition should be self-contained.\n"
                "3. Use Full Names, Not Pronouns: Avoid pronouns or ambiguous references; use full entity names.\n"
                "4. Include Relevant Dates/Qualifiers: If applicable, include necessary dates, times, and qualifiers.\n"
                "5. Contain One Subject-Predicate Relationship: Focus on a single subject and its action/attribute.\n\n"
                "Respond with JSON: {\"propositions\": [\"prop1\", \"prop2\", ...]}"
            )
        },
        {
            "role": "user",
            "content": "In 1969, Neil Armstrong became the first person to walk on the Moon during the Apollo 11 mission."
        },
        {
            "role": "assistant",
            "content": json.dumps({"propositions": [
                "Neil Armstrong was an astronaut.",
                "Neil Armstrong walked on the Moon in 1969.",
                "Neil Armstrong was the first person to walk on the Moon.",
                "Neil Armstrong walked on the Moon during the Apollo 11 mission.",
                "The Apollo 11 mission occurred in 1969."
            ]})
        },
        {
            "role": "user",
            "content": chunk_text
        }
    ]
    
    result = llm.chat_json(messages)
    return result.get("propositions", [])

In [19]:
propositions = []

for doc in doc_splits:
    chunk_id = doc.metadata["chunk_id"]
    props = generate_propositions(doc.content)
    
    print(f"\n Chunk {chunk_id}: generated {len(props)} Propositions")
    
    for prop in props:
        propositions.append(Document(
            content=prop,
            metadata={
                "title": doc.metadata["title"],
                "source": doc.metadata["source"],
                "chunk_id": chunk_id
            }
        ))
    
print(f"\n Total Propositions: {len(propositions)}")

print("\nSample Propositions:")
for p in propositions[:5]:
    print(f"\n [{p.metadata['chunk_id']}]: {p.content}")


 Chunk 1: generated 8 Propositions

 Chunk 2: generated 11 Propositions

 Chunk 3: generated 12 Propositions

 Chunk 4: generated 10 Propositions

 Chunk 5: generated 9 Propositions

 Total Propositions: 50

Sample Propositions:

 [1]: Paul Graham is an author.

 [1]: Paul Graham published the essay 'Founder Mode' in September 2024.

 [1]: The essay 'Founder Mode' challenges conventional wisdom about scaling startups.

 [1]: The essay 'Founder Mode' argues that founders should maintain their unique management style.

 [1]: The essay 'Founder Mode' suggests that founders should not adopt traditional corporate practices as their companies grow.


In [42]:
propositions

[Document(content='Paul Graham is an author.', metadata={'title': "Paul Graham's Founder Mode Essay", 'source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 1}, embedding=None),
 Document(content="Paul Graham published the essay 'Founder Mode' in September 2024.", metadata={'title': "Paul Graham's Founder Mode Essay", 'source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 1}, embedding=[-0.018395042046904564, 0.03741077706217766, 0.010932636447250843, -0.007899711839854717, 0.016194408759474754, 0.0013057447504252195, -0.0019149747677147388, 0.06923533231019974, -0.0054169450886547565, 0.004849153105169535, 0.04542334005236626, -0.020891916006803513, 0.0483575202524662, -0.02388252131640911, -0.0054874783381819725, 0.0027384490240365267, -0.03927285224199295, -0.04502835497260094, 0.005674391053617001, 0.023134870454669, -0.0301881842315197, 0.018987521529197693, -0.02530729025

# Quality Check

In [20]:
def evaluate_propositions(proposition:str, original_text: str) -> Dict[str,int]:
    """
    Grade a proposition on accuracy, clarity, completeness, and conciseness (1-10 each).
    
    Replaces: ChatGroq + with_structured_output(GradePropositions)
    
    Returns:
        Dict with scores: {"accuracy": int, "clarity": int, "completeness": int, "conciseness": int}
    """

    messages = [
        {
            "role": "system",
            "content": (
                "You evaluate propositions extracted from documents. Rate each on a 1-10 scale:\n"
                "- accuracy: How well the proposition reflects the original text\n"
                "- clarity: How easy it is to understand without additional context\n"
                "- completeness: Whether it includes necessary details (dates, qualifiers)\n"
                "- conciseness: Whether it is concise without losing important information\n\n"
                "Example:\n"
                'Docs: "In 1969, Neil Armstrong became the first person to walk on the Moon during Apollo 11."\n'
                'Proposition: "Neil Armstrong walked on the Moon in 1969."\n'
                'Evaluation: {"accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10}\n\n'
                'Respond with JSON: {"accuracy": N, "clarity": N, "completeness": N, "conciseness": N}'
            )
        },
        {
            "role": "user",
            "content": f'Proposition: "{proposition}"\nOriginal Text: "{original_text}"'
        }
    ]
    
    return llm.chat_json(messages)

In [21]:
def passes_quality_check(scores:Dict[str, int], threshold:int=7) -> bool:
    """Check if all scores meet the threshold."""
    for category in ["accuracy", "clarity", "completeness", "conciseness"]:
        if scores.get(category,0) < threshold:
            return False
    return True

In [24]:
evaluated_propositions = []

for idx, prop_doc in enumerate(propositions):

    chunk_id = prop_doc.metadata["chunk_id"]
    original_text = doc_splits[chunk_id - 1].content

    scores = evaluate_propositions(prop_doc.content, original_text)

    if passes_quality_check(scores):
        evaluated_propositions.append(prop_doc)
    else:
        print(f"   X [{idx+1}] FAIL - {prop_doc.content}")
        print(f"   Scores: {scores}")
        
print(f"\n✓ {len(evaluated_propositions)}/{len(propositions)} propositions passed quality check")

   X [1] FAIL - Paul Graham is an author.
   Scores: {'accuracy': 5, 'clarity': 8, 'completeness': 3, 'conciseness': 9}
   X [14] FAIL - Founder Mode is an emerging paradigm.
   Scores: {'accuracy': 8, 'clarity': 7, 'completeness': 6, 'conciseness': 9}
   X [24] FAIL - Founder Mode is an emerging paradigm.
   Scores: {'accuracy': 8, 'clarity': 7, 'completeness': 6, 'conciseness': 9}
   X [42] FAIL - Brian Chesky is the co-founder of Airbnb.
   Scores: {'accuracy': 1, 'clarity': 5, 'completeness': 1, 'conciseness': 5}
   X [43] FAIL - Brian Chesky implemented a practice called 'Founder Mode' at Airbnb.
   Scores: {'accuracy': 5, 'clarity': 6, 'completeness': 4, 'conciseness': 7}

✓ 45/50 propositions passed quality check


# Build Vector Stores

In [28]:
embedder = OpenAIEmbedder(model="text-embedding-3-small")


def build_vector_store(documents: List[Document]) -> FAISSVectorStore:
    """Embed documents and build a FAISS vector store."""
    docs_with_embeddings = embedder.embed_documents(documents=documents)
    store = FAISSVectorStore(dimension=embedder.dimension)
    store.add_documents(docs_with_embeddings)
    return store

In [29]:
def search(store: FAISSVectorStore, query:str, k:int=4):
    """search a vector store and return results."""
    query_embedding = embedder.embed_text(query)
    return store.search(query_embedding, k = k)


In [30]:
vs_propositions = build_vector_store(evaluated_propositions)
print(f"Proposition Store: {vs_propositions.index.ntotal} Documents.")

vs_larger = build_vector_store(doc_splits)
print(f"Larger Chunk Store: {vs_larger.index.ntotal} Documents.")

Proposition Store: 45 Documents.
Larger Chunk Store: 5 Documents.


# Retrieval & Comparison

In [31]:
def compare_retrieval(query: str, k:int=4):
    """
    Run the same query against both vector stores and print side-by-side results.
    """
    print(f"\n{'='*80}")
    print(f"Query: {query}")
    print(f"{'='*80}")

    # Proposition retrieval
    print(f"\n--- Proposition-Based Retrieval ---")
    res_prop = search(vs_propositions, query, k=k)
    for i, r in enumerate(res_prop):
        print(f"  {i+1}) [Score: {r.score:.4f}] {r.document.content} — Chunk {r.document.metadata['chunk_id']}")
    
    # Larger chunk retrieval
    print(f"\n--- Larger Chunk Retrieval ---")
    res_larger = search(vs_larger, query, k=min(k, len(doc_splits)))
    for i, r in enumerate(res_larger):
        content_preview = r.document.content[:150].replace('\n', ' ')
        print(f"  {i+1}) [Score: {r.score:.4f}] {content_preview}... — Chunk {r.document.metadata['chunk_id']}")

In [32]:
compare_retrieval("Who's management approach served as inspiration for Brian Chesky's \"Founder Mode\" at Airbnb?")


Query: Who's management approach served as inspiration for Brian Chesky's "Founder Mode" at Airbnb?

--- Proposition-Based Retrieval ---
  1) [Score: 0.7708] Brian Chesky was advised to run Airbnb in a traditional managerial style. — Chunk 4
  2) [Score: 0.7485] Brian Chesky found success by adopting a different management approach. — Chunk 4
  3) [Score: 0.7286] Brian Chesky's management approach was influenced by Steve Jobs' management style at Apple. — Chunk 4
  4) [Score: 0.7225] Steve Jobs had a management approach at Apple that inspired Brian Chesky. — Chunk 4

--- Larger Chunk Retrieval ---
  1) [Score: 0.7886] structured managerial approach. However, many founders have found this transition problematic, as it often leads to a loss of the innovative and agile... — Chunk 4
  2) [Score: 0.7542] piration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regard... — Chunk 5
  3) [Score: 0.5351] ger

In [33]:
compare_retrieval('What is the essay "Founder Mode" about?')


Query: What is the essay "Founder Mode" about?

--- Proposition-Based Retrieval ---
  1) [Score: 0.7955] The essay 'Founder Mode' argues that founders should maintain their unique management style. — Chunk 1
  2) [Score: 0.7837] The essay 'Founder Mode' challenges conventional wisdom about scaling startups. — Chunk 1
  3) [Score: 0.7625] The essay 'Founder Mode' suggests that founders should not adopt traditional corporate practices as their companies grow. — Chunk 1
  4) [Score: 0.7469] The essay 'Founder Mode' claims that traditional advice often fails when applied to startups. — Chunk 1

--- Larger Chunk Retrieval ---
  1) [Score: 0.6920] ies grow. Conventional Wisdom vs. Founder Mode The essay argues that the traditional advice given to growing companies—hiring good people and giving t... — Chunk 2
  2) [Score: 0.6894] Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should m... — Chunk 1


In [34]:
compare_retrieval("Who is the co-founder of Airbnb?")


Query: Who is the co-founder of Airbnb?

--- Proposition-Based Retrieval ---
  1) [Score: 0.7090] Brian Chesky is the co-founder of Airbnb. — Chunk 4
  2) [Score: 0.5118] Brian Chesky was advised to run Airbnb in a traditional managerial style. — Chunk 4
  3) [Score: 0.4611] The traditional managerial style led to poor outcomes for Airbnb. — Chunk 4
  4) [Score: 0.3719] Brian Chesky found success by adopting a different management approach. — Chunk 4

--- Larger Chunk Retrieval ---
  1) [Score: 0.4724] structured managerial approach. However, many founders have found this transition problematic, as it often leads to a loss of the innovative and agile... — Chunk 4
  2) [Score: 0.4555] piration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regard... — Chunk 5
  3) [Score: 0.2309] gers. Unique Founder Abilities Founders possess unique insights and abilities that professional managers do not, primaril

In [35]:
compare_retrieval('When was the essay "Founder Mode" published?')


Query: When was the essay "Founder Mode" published?

--- Proposition-Based Retrieval ---
  1) [Score: 0.7796] Paul Graham published the essay 'Founder Mode' in September 2024. — Chunk 1
  2) [Score: 0.7260] The essay 'Founder Mode' challenges conventional wisdom about scaling startups. — Chunk 1
  3) [Score: 0.6997] The essay 'Founder Mode' argues that founders should maintain their unique management style. — Chunk 1
  4) [Score: 0.6866] The essay 'Founder Mode' suggests that founders should not adopt traditional corporate practices as their companies grow. — Chunk 1

--- Larger Chunk Retrieval ---
  1) [Score: 0.6448] Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should m... — Chunk 1
  2) [Score: 0.6000] ies grow. Conventional Wisdom vs. Founder Mode The essay argues that the traditional advice given to growing companies—hiring good people and giving t... — Chunk 2
  3) [Score: 0.4989] pi

## 8. Comparison Summary

| **Aspect** | **Proposition-Based Retrieval** | **Larger Chunk Retrieval** |
|---|---|---|
| **Precision** | High: focused, direct answers | Medium: more context, may include noise |
| **Clarity** | High: concise, no extra details | Medium: comprehensive but can overwhelm |
| **Context** | Low: may lack surrounding context | High: preserves narrative flow |
| **Comprehensiveness** | Low: may omit broader details | High: complete view |
| **Best for** | Quick factual queries | Complex queries needing depth |
| **Efficiency** | High: fast, targeted | Medium: more content to process |

In [36]:
def compare_retrieval_with_llm_response(query: str, k:int=4):
    """
    Run the same query against both vector stores and print side-by-side results.
    """
    print(f"\n{'='*80}")
    print(f"Query: {query}")
    print(f"{'='*80}")

    # Proposition retrieval
    print(f"\n--- Proposition-Based Retrieval ---")
    
    res_prop = search(vs_propositions, query, k=k)
    curr_context = []

    for i, r in enumerate(res_prop):
        print(f"  {i+1}) [Score: {r.score:.4f}] {r.document.content} — Chunk {r.document.metadata['chunk_id']}")
        curr_context.append(r.document.content)

    answer = llm.chat_with_context(query=query, context=curr_context)
    print(f"\nAnswer: {answer}")

    # Larger chunk retrieval
    print(f"\n--- Larger Chunk Retrieval ---")
    
    res_larger = search(vs_larger, query, k=min(k, len(doc_splits)))
    curr_context = []
    
    for i, r in enumerate(res_larger):
        content_preview = r.document.content[:150].replace('\n', ' ')
        print(f"  {i+1}) [Score: {r.score:.4f}] {content_preview}... — Chunk {r.document.metadata['chunk_id']}")
        curr_context.append(r.document.content)

    answer = llm.chat_with_context(query=query, context=curr_context)
    print(f"\nAnswer: {answer}")

In [38]:
compare_retrieval_with_llm_response("Who's management approach served as inspiration for Brian Chesky's \"Founder Mode\" at Airbnb?")


Query: Who's management approach served as inspiration for Brian Chesky's "Founder Mode" at Airbnb?

--- Proposition-Based Retrieval ---
  1) [Score: 0.7708] Brian Chesky was advised to run Airbnb in a traditional managerial style. — Chunk 4
  2) [Score: 0.7485] Brian Chesky found success by adopting a different management approach. — Chunk 4
  3) [Score: 0.7286] Brian Chesky's management approach was influenced by Steve Jobs' management style at Apple. — Chunk 4
  4) [Score: 0.7225] Steve Jobs had a management approach at Apple that inspired Brian Chesky. — Chunk 4

Answer: Steve Jobs' management approach at Apple served as inspiration for Brian Chesky's "Founder Mode" at Airbnb.

--- Larger Chunk Retrieval ---
  1) [Score: 0.7886] structured managerial approach. However, many founders have found this transition problematic, as it often leads to a loss of the innovative and agile... — Chunk 4
  2) [Score: 0.7542] piration for Brian Chesky's "Founder Mode" at Airbnb. One notable pract

In [39]:
compare_retrieval_with_llm_response('What is the essay "Founder Mode" about?')


Query: What is the essay "Founder Mode" about?

--- Proposition-Based Retrieval ---
  1) [Score: 0.7955] The essay 'Founder Mode' argues that founders should maintain their unique management style. — Chunk 1
  2) [Score: 0.7837] The essay 'Founder Mode' challenges conventional wisdom about scaling startups. — Chunk 1
  3) [Score: 0.7625] The essay 'Founder Mode' suggests that founders should not adopt traditional corporate practices as their companies grow. — Chunk 1
  4) [Score: 0.7469] The essay 'Founder Mode' claims that traditional advice often fails when applied to startups. — Chunk 1

Answer: The essay "Founder Mode" argues that founders should maintain their unique management style and challenges conventional wisdom about scaling startups. It suggests that founders should not adopt traditional corporate practices as their companies grow, as traditional advice often fails when applied to startups.

--- Larger Chunk Retrieval ---
  1) [Score: 0.6920] ies grow. Conventional Wisdom

In [40]:
compare_retrieval_with_llm_response("Who is the co-founder of Airbnb?")


Query: Who is the co-founder of Airbnb?

--- Proposition-Based Retrieval ---
  1) [Score: 0.7090] Brian Chesky is the co-founder of Airbnb. — Chunk 4
  2) [Score: 0.5118] Brian Chesky was advised to run Airbnb in a traditional managerial style. — Chunk 4
  3) [Score: 0.4611] The traditional managerial style led to poor outcomes for Airbnb. — Chunk 4
  4) [Score: 0.3719] Brian Chesky found success by adopting a different management approach. — Chunk 4

Answer: Brian Chesky is the co-founder of Airbnb.

--- Larger Chunk Retrieval ---
  1) [Score: 0.4724] structured managerial approach. However, many founders have found this transition problematic, as it often leads to a loss of the innovative and agile... — Chunk 4
  2) [Score: 0.4555] piration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regard... — Chunk 5
  3) [Score: 0.2309] gers. Unique Founder Abilities Founders possess unique insights and ab

In [41]:
compare_retrieval_with_llm_response('When was the essay "Founder Mode" published?')


Query: When was the essay "Founder Mode" published?

--- Proposition-Based Retrieval ---
  1) [Score: 0.7796] Paul Graham published the essay 'Founder Mode' in September 2024. — Chunk 1
  2) [Score: 0.7260] The essay 'Founder Mode' challenges conventional wisdom about scaling startups. — Chunk 1
  3) [Score: 0.6997] The essay 'Founder Mode' argues that founders should maintain their unique management style. — Chunk 1
  4) [Score: 0.6866] The essay 'Founder Mode' suggests that founders should not adopt traditional corporate practices as their companies grow. — Chunk 1

Answer: The essay "Founder Mode" was published in September 2024.

--- Larger Chunk Retrieval ---
  1) [Score: 0.6448] Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should m... — Chunk 1
  2) [Score: 0.6000] ies grow. Conventional Wisdom vs. Founder Mode The essay argues that the traditional advice given to growing companies—h