# Propositions Chunking

# Overview

This notebook implements **Proposition Chunking**: it splits the input text into atomic propositions that are factual, self-contained, and concise, then encodes those propositions into a vector store so they can be retrieved later.

---

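## Key Components

1. **Document Chunking:** Splits the document into pieces of a size suitable for analysis.
2. **Proposition Generation:** Uses an LLM to break each document chunk into factual, self-contained propositions.
3. **Proposition Quality Check:** Evaluates the generated propositions for accuracy, clarity, completeness, and conciseness.
4. **Embedding and Vector Store:** Embeds both the propositions and the larger document chunks into vector stores for efficient retrieval.
5. **Retrieval and Comparison:** Tests the retrieval system with queries of different sizes and compares the results of the proposition-based retriever with those of the larger-chunk retriever.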

---

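## Motivation

Proposition chunking splits a text document into concise, factual propositions, which enables finer-grained and more precise information retrieval. Splitting at the proposition level gives tighter control over what is returned for a specific query, which is especially useful when extracting knowledge from complex or lengthy texts. The goal is to assess how effective fine-grained retrieval is by comparing small proposition chunks against larger document chunks.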

---

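## Method Details

1. **Loading environment variables:** Loads environment variables, such as the API key of the LLM service, so that the system can access the resources it needs.

2. **Document Chunking:**
   - Uses `RecursiveCharacterTextSplitter` to split the input document into smaller pieces that the LLM can process.

3. **Proposition Generation:**
   - Uses an LLM such as `llama-3.1-70b-versatile` to generate propositions from each document chunk.
   - The output is structured as a list of factual propositions that are self-contained and understandable without context.

4. **Quality Check:**
   - A second LLM instance (the same model in this notebook) rates each generated proposition for accuracy, clarity, completeness, and conciseness, and only propositions that meet the quality threshold are kept.

5. **Embedding Propositions:**
   - The propositions that pass the quality check are embedded into a vector store. The notebook sets up `OllamaEmbeddings` first, and the embedding cell later switches to `OpenAIEmbeddings`. Given a query, propositions can then be retrieved by similarity.

6. **Retrieval and Comparison:**
   - Two retrieval systems are built, one over the propositions and one over the larger document chunks, and their results are compared across several queries.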

---

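## Advantages

- **Granularity:** Splitting the document into small factual propositions enables fine-grained retrieval, making it easier to extract precise answers from large or complex documents.
- **Quality assurance:** The quality-check LLM ensures that the generated propositions reach a set level of accuracy and clarity, which makes the retrieved information more reliable.
- **Flexible retrieval:** Comparing proposition-based retrieval with larger-chunk retrieval makes it possible to evaluate the trade-off between the granularity of the results and the breadth of context.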

---

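## Implementation

1. **Proposition Generation:** An LLM with a custom prompt generates factual propositions from the document chunks.
2. **Quality Checking:** The generated propositions are evaluated for accuracy, clarity, completeness, and conciseness.
3. **Vector Store Integration:** The propositions are stored in a FAISS vector store, which allows fast, efficient similarity-based retrieval.
4. **Query Testing:** Several test queries are run against both vector stores (proposition-based and larger-chunk-based) to compare retrieval performance.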

---

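## Summary

This notebook shows how to use an LLM to split a document into self-contained propositions, filter them with a quality check, and embed them in a vector store so that retrieval returns the most relevant information. Comparing proposition-based retrieval with retrieval over larger document chunks shows which approach gives more accurate or more useful results for different types of queries. The approach underlines the importance of generating high-quality propositions and of extracting precise information from complex documents.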


In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

os.environ['GROQ_API_KEY'] = os.getenv('GROQ_API_KEY') 
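
If `GROQ_API_KEY` is not set, `os.getenv` returns `None` and the assignment to `os.environ` raises a `TypeError`. A minimal sketch of a more explicit check, assuming a hypothetical helper named `require_env` that is not part of the notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file.")
    return value


# Demo with a placeholder variable so the sketch runs without real credentials.
os.environ["DEMO_API_KEY"] = "placeholder-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called with `'GROQ_API_KEY'` right after `load_dotenv()`.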

### Test Document

In [3]:
sample_content = """Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should maintain their unique management style rather than adopting traditional corporate practices as their companies grow.
Conventional Wisdom vs. Founder Mode
The essay argues that the traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.
This approach, suitable for established companies, can be detrimental to startups where the founder's vision and direct involvement are crucial. "Founder Mode" is presented as an emerging paradigm that is not yet fully understood or documented, contrasting with the conventional "manager mode" often advised by business schools and professional managers.
Unique Founder Abilities
Founders possess unique insights and abilities that professional managers do not, primarily because they have a deep understanding of their company's vision and culture.
Graham suggests that founders should leverage these strengths rather than conform to traditional managerial practices. "Founder Mode" is an emerging paradigm that is not yet fully understood or documented, with Graham hoping that over time, it will become as well-understood as the traditional manager mode, allowing founders to maintain their unique approach even as their companies scale.
Challenges of Scaling Startups
As startups grow, there is a common belief that they must transition to a more structured managerial approach. However, many founders have found this transition problematic, as it often leads to a loss of the innovative and agile spirit that drove the startup's initial success.
Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes. He eventually found success by adopting a different approach, influenced by how Steve Jobs managed Apple.
Steve Jobs' Management Style
Steve Jobs' management approach at Apple served as inspiration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regardless of their position on the organizational chart
. This unconventional method allowed Jobs to maintain a startup-like environment even as Apple grew, fostering innovation and direct communication across hierarchical levels. Such practices emphasize the importance of founders staying deeply involved in their companies' operations, challenging the traditional notion of delegating responsibilities to professional managers as companies scale.
"""

### Chunking

In [26]:
### Build Index
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings

embedding_model = OllamaEmbeddings(model='nomic-embed-text:v1.5', show_progress=True)

# docs
docs_list = [Document(page_content=sample_content, metadata={"Title": "Paul Graham's Founder Mode Essay", "Source": "https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ"})]

# Split into chunks of 200 tokens with an overlap of 50 tokens
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=200, chunk_overlap=50
)

doc_splits = text_splitter.split_documents(docs_list)
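
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of fixed-size windows with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter recursively tries separators such as paragraph and line breaks and, in this notebook, measures length with a tokenizer). Whitespace-separated words stand in for tokens, and `sliding_windows` is a hypothetical helper.

```python
from typing import List


def sliding_windows(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Cut tokens into windows of at most chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return windows


words = [f"w{i}" for i in range(10)]
for window in sliding_windows(words, chunk_size=4, chunk_overlap=2):
    print(window)
```

Each window repeats the last two items of the previous one, which is why facts near a chunk boundary can show up in the propositions of two adjacent chunks.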

In [10]:
docs_list

[Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ'}, page_content='Paul Graham\'s essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should maintain their unique management style rather than adopting traditional corporate practices as their companies grow.\nConventional Wisdom vs. Founder Mode\nThe essay argues that the traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.\nThis approach, suitable for established companies, can be detrimental to startups where the founder\'s vision and direct involvement are crucial. "Founder Mode" is presented as an emerging paradigm that is not yet fully understood or documented, contrasting with the conventional "manager mode" often advised by business schools and professional managers.\nUniq

In [7]:
# Assign a 1-based chunk ID to each chunk and store it in the metadata
for i, doc in enumerate(doc_splits):
    doc.metadata['chunk_id'] = i + 1

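### Generate Propositions
- Use few-shot prompting to instruct the LLM to generate factual statements from the document under the following conditions:
1. Each proposition states a single fact, can be understood without context, and uses full names instead of pronouns.
2. Where applicable, each proposition includes the dates, times, and qualifiers needed to pin the fact down, and focuses on one subject and its action or attribute, without conjunctions or multiple clauses.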


In [8]:
from typing import List
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from pydantic import BaseModel, Field
from langchain_groq import ChatGroq

# Data model
class GeneratePropositions(BaseModel):
    """List of all the propositions in a given document"""

    propositions: List[str] = Field(
        description="List of propositions (factual, self-contained, and concise information)"
        # Only factual, self-contained, and concise statements go in this list.
    )


# LLM with function call
llm = ChatGroq(model="llama-3.1-70b-versatile", temperature=0)
structured_llm= llm.with_structured_output(GeneratePropositions)

# Few-shot prompting: more examples can be added to improve the results
proposition_examples = [
    {"document": 
        "In 1969, Neil Armstrong became the first person to walk on the Moon during the Apollo 11 mission.", 
     "propositions": 
        "['Neil Armstrong was an astronaut.', 'Neil Armstrong walked on the Moon in 1969.', 'Neil Armstrong was the first person to walk on the Moon.', 'Neil Armstrong walked on the Moon during the Apollo 11 mission.', 'The Apollo 11 mission occurred in 1969.']"
    },
]

example_proposition_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{document}"),  # the human message supplies the document
        ("ai", "{propositions}"),  # the AI message lists the extracted propositions
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_proposition_prompt,  # message format used for each example
    examples=proposition_examples,  # the few-shot examples
)

# Prompt
# One fact per proposition, understandable without context, full names instead of pronouns,
# dates, times, and qualifiers where applicable, and a single subject with its action or attribute
system = """Please break down the following text into simple, self-contained propositions. Ensure that each proposition meets the following criteria:

    1. Express a Single Fact: Each proposition should state one specific fact or claim.
    2. Be Understandable Without Context: The proposition should be self-contained, meaning it can be understood without needing additional context.
    3. Use Full Names, Not Pronouns: Avoid pronouns or ambiguous references; use full entity names.
    4. Include Relevant Dates/Qualifiers: If applicable, include necessary dates, times, and qualifiers to make the fact precise.
    5. Contain One Subject-Predicate Relationship: Focus on a single subject and its corresponding action or attribute, without conjunctions or multiple clauses."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        few_shot_prompt,
        ("human", "{document}"),
    ]
)

proposition_generator = prompt | structured_llm




In [9]:
propositions = [] # Store all the propositions from the document

for i in range(len(doc_splits)):
    response = proposition_generator.invoke({"document": doc_splits[i].page_content}) # Creating proposition
    for proposition in response.propositions: # each chunk yields several propositions
        propositions.append(Document(page_content=proposition, metadata={"Title": "Paul Graham's Founder Mode Essay", "Source": "https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ", "chunk_id": i+1}))

In [16]:
len(propositions) # 34 propositions were generated
propositions[0]

Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 1}, page_content="Paul Graham published an essay called 'Founder Mode' in September 2024.")
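
Because every proposition carries the `chunk_id` of the chunk it came from, a proposition can be traced back to its wider context (the quality check below relies on this). A minimal sketch using plain dictionaries instead of LangChain `Document` objects; `parent_chunk` and the placeholder chunk texts are illustrative assumptions.

```python
from typing import Dict, List

# Stand-ins for doc_splits: chunk_id starts at 1, as in the notebook.
chunks: List[Dict] = [
    {"chunk_id": 1, "text": "(text of chunk 1)"},
    {"chunk_id": 2, "text": "(text of chunk 2)"},
    {"chunk_id": 3, "text": "(text of chunk 3)"},
]

# Stand-ins for two generated propositions.
props: List[Dict] = [
    {"text": "Paul Graham published an essay called 'Founder Mode' in September 2024.", "chunk_id": 1},
    {"text": "Brian Chesky is the co-founder of Airbnb.", "chunk_id": 3},
]


def parent_chunk(proposition: Dict, all_chunks: List[Dict]) -> Dict:
    """Look up the source chunk of a proposition via its 1-based chunk_id."""
    return all_chunks[proposition["chunk_id"] - 1]


for p in props:
    print(p["text"], "-> chunk", parent_chunk(p, chunks)["chunk_id"])
```

The `- 1` converts the 1-based `chunk_id` into a 0-based list index.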

In [19]:
response = proposition_generator.invoke({"document": doc_splits[0].page_content})
response.propositions

["Paul Graham published an essay called 'Founder Mode' in September 2024.",
 "Paul Graham's essay 'Founder Mode' challenges conventional wisdom about scaling startups.",
 'Founders should maintain their unique management style as their companies grow.',
 'Traditional corporate practices can be detrimental to startups.',
 'Hiring good people and giving them autonomy often fails when applied to startups.',
 "The founder's vision and direct involvement are crucial in startups.",
 'Founder Mode is an emerging paradigm.',
 'Founder Mode is not yet fully understood or documented.',
 "Founder Mode contrasts with the conventional 'manager mode'.",
 'Founders possess unique insights and abilities.',
 'Professional managers do not possess the same insights and abilities as founders.',
 "Founders have a deep understanding of their company's vision and culture."]

### Quality Check

In [20]:
# The LLM rates accuracy, clarity, completeness, and conciseness.
class GradePropositions(BaseModel):
    """Grade a given proposition on accuracy, clarity, completeness, and conciseness"""

    accuracy: int = Field(
        description="Rate from 1-10 based on how well the proposition reflects the original text."
    ) # how faithfully the proposition reflects the original text

    clarity: int = Field(
        description="Rate from 1-10 based on how easy it is to understand the proposition without additional context."
    ) # how well it can be understood without additional context

    completeness: int = Field(
        description="Rate from 1-10 based on whether the proposition includes necessary details (e.g., dates, qualifiers)."
    ) # whether the necessary details are included

    conciseness: int = Field(
        description="Rate from 1-10 based on whether the proposition is concise without losing important information."
    ) # whether it is concise without losing important information

# LLM with function call
llm = ChatGroq(model="llama-3.1-70b-versatile", temperature=0)
structured_llm= llm.with_structured_output(GradePropositions)

# Prompt
# => explains what to evaluate
# => shows how to score through a few-shot example
evaluation_prompt_template = """
Please evaluate the following proposition based on the criteria below:
- **Accuracy**: Rate from 1-10 based on how well the proposition reflects the original text.
- **Clarity**: Rate from 1-10 based on how easy it is to understand the proposition without additional context.
- **Completeness**: Rate from 1-10 based on whether the proposition includes necessary details (e.g., dates, qualifiers).
- **Conciseness**: Rate from 1-10 based on whether the proposition is concise without losing important information.

Example:
Docs: In 1969, Neil Armstrong became the first person to walk on the Moon during the Apollo 11 mission.

Proposition_1: Neil Armstrong was an astronaut.
Evaluation_1: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_2: Neil Armstrong walked on the Moon in 1969.
Evaluation_2: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_3: Neil Armstrong was the first person to walk on the Moon.
Evaluation_3: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_4: Neil Armstrong walked on the Moon during the Apollo 11 mission.
Evaluation_4: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_5: The Apollo 11 mission occurred in 1969.
Evaluation_5: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Format:
Proposition: "{proposition}"
Original Text: "{original_text}"
"""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", evaluation_prompt_template),
        ("human", "{proposition}, {original_text}"),
    ]
)

proposition_evaluator = prompt | structured_llm

In [21]:
# Define evaluation categories and thresholds
evaluation_categories = ["accuracy", "clarity", "completeness", "conciseness"]
thresholds = {"accuracy": 7, "clarity": 7, "completeness": 7, "conciseness": 7}

# Evaluate one proposition against the chunk it was generated from
def evaluate_proposition(proposition, original_text):
    response = proposition_evaluator.invoke({"proposition": proposition, "original_text": original_text})

    # Collect the scores in a dictionary keyed by evaluation category
    scores = {category: getattr(response, category) for category in evaluation_categories}
    return scores

# Return True only if every score is at least its threshold (7 here)
def passes_quality_check(scores):
    for category, score in scores.items():
        if score < thresholds[category]:
            return False
    return True

evaluated_propositions = [] # Store all the propositions from the document

# Keep the propositions that meet every threshold and print the ones that do not
for idx, proposition in enumerate(propositions):
    scores = evaluate_proposition(proposition.page_content, doc_splits[proposition.metadata['chunk_id'] - 1].page_content)
    if passes_quality_check(scores):
        # Proposition passes quality check, keep it
        evaluated_propositions.append(proposition)
    else:
        # Proposition fails, discard or flag for further review
        print(f"{idx+1}) Proposition: {proposition.page_content} \n Scores: {scores}")
        print("Fail")

19) Proposition: Startups often transition to a more structured managerial approach as they grow. 
 Scores: {'accuracy': 8, 'clarity': 9, 'completeness': 6, 'conciseness': 8}
Fail
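
The rejected proposition above shows how the filter behaves: a single score below its threshold is enough to fail. The sketch below reproduces that decision for the same scores without calling the LLM; `failing_categories` is a hypothetical helper added for illustration.

```python
from typing import Dict, List

thresholds = {"accuracy": 7, "clarity": 7, "completeness": 7, "conciseness": 7}


def failing_categories(scores: Dict[str, int], limits: Dict[str, int]) -> List[str]:
    """Return the categories whose score is below the threshold."""
    return [name for name, score in scores.items() if score < limits[name]]


# Scores reported for proposition 19 in the run above.
scores_19 = {"accuracy": 8, "clarity": 9, "completeness": 6, "conciseness": 8}
print(failing_categories(scores_19, thresholds))  # ['completeness']
```

Reporting which category failed, rather than only pass or fail, makes it easier to decide whether to discard a proposition or regenerate it with the missing detail.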


### Embedding propositions in a vectorstore
- Only the propositions that passed the thresholds are embedded and stored in the vector DB.

In [28]:
from langchain_openai import OpenAIEmbeddings

# Use the OpenAI embedding model
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY') # For the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")




In [29]:
# Add to vectorstore

vectorstore_propositions = FAISS.from_documents(evaluated_propositions, embedding_model)
retriever_propositions = vectorstore_propositions.as_retriever(
                search_type="similarity",
                search_kwargs={'k': 4}, # number of documents to retrieve
            )

In [30]:
query = "Whose management approach served as inspiration for Brian Chesky's \"Founder Mode\" at Airbnb?"
res_proposition = retriever_propositions.invoke(query)

In [31]:
res_proposition # the 4 quality-checked propositions most similar to the query
# All four results were derived from chunk 3.

[Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 3}, page_content="Steve Jobs' management approach at Apple inspired Brian Chesky's approach at Airbnb."),
 Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 3}, page_content='Brian Chesky is the co-founder of Airbnb.'),
 Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 3}, page_content='Brian Chesky was advised to run Airbnb in a traditional managerial style.'),
 Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 3}, page_content='Brian Chesky adopt

In [32]:
for i, r in enumerate(res_proposition):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

1) Content: Steve Jobs' management approach at Apple inspired Brian Chesky's approach at Airbnb. --- Chunk_id: 3
2) Content: Brian Chesky is the co-founder of Airbnb. --- Chunk_id: 3
3) Content: Brian Chesky was advised to run Airbnb in a traditional managerial style. --- Chunk_id: 3
4) Content: Brian Chesky adopted a different approach to running Airbnb. --- Chunk_id: 3
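
Conceptually, `search_kwargs={'k': 4}` asks the store for the four stored vectors closest to the query embedding. The sketch below illustrates the idea with a brute-force search over made-up 2-dimensional vectors. It assumes squared Euclidean distance as the metric, so check the distance strategy of your vector store before relying on this. `top_k` and the toy vectors are hypothetical, and real embeddings have hundreds or thousands of dimensions.

```python
from typing import List, Tuple


def top_k(query: List[float], vectors: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, squared L2 distance) for the k vectors closest to the query."""
    distances = [
        (i, sum((q - v) ** 2 for q, v in zip(query, vec)))
        for i, vec in enumerate(vectors)
    ]
    return sorted(distances, key=lambda pair: pair[1])[:k]


# Toy "embeddings" for three propositions and one query.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
query_vector = [1.0, 0.0]
print(top_k(query_vector, vectors, k=2))
```

The vector identical to the query ranks first, followed by the one pointing in a similar direction. FAISS answers the same question with an index instead of a full scan.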


### Comparing performance with a larger chunk size
- Instead of the fine-grained propositions, retrieve from the three original chunks to see what a larger chunk size returns.

In [33]:
# Build a second vector store from the original, larger chunks
vectorstore_larger = FAISS.from_documents(doc_splits, embedding_model)
retriever_larger = vectorstore_larger.as_retriever(
                search_type="similarity",
                search_kwargs={'k': 4}, # number of documents to retrieve
            )

In [36]:
res_larger = retriever_larger.invoke(query)
res_larger

[Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ'}, page_content='Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes. He eventually found success by adopting a different approach, influenced by how Steve Jobs managed Apple.\nSteve Jobs\' Management Style\nSteve Jobs\' management approach at Apple served as inspiration for Brian Chesky\'s "Founder Mode" at Airbnb. One notable practice was Jobs\' annual retreat for the 100 most important people at Apple, regardless of their position on the organizational chart\n. This unconventional method allowed Jobs to maintain a startup-like environment even as Apple grew, fostering innovation and direct communication across hierarchical levels. Such practices emphasize the importance of founders staying deeply involved in their companie

In [37]:
for i, r in enumerate(res_larger):
    print(f"{i+1}) Content: {r.page_content}")
    

1) Content: Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes. He eventually found success by adopting a different approach, influenced by how Steve Jobs managed Apple.
Steve Jobs' Management Style
Steve Jobs' management approach at Apple served as inspiration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regardless of their position on the organizational chart
. This unconventional method allowed Jobs to maintain a startup-like environment even as Apple grew, fostering innovation and direct communication across hierarchical levels. Such practices emphasize the importance of founders staying deeply involved in their companies' operations, challenging the traditional notion of delegating responsibilities to professional managers as companies scale.
2) Content: Paul Graham's essay "Founder Mode,"

#### Conclusion
- The drawback of the larger chunks is that retrieval returns far too much information.
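
One rough way to quantify this is to compare how much text each retriever returns for the same query. The sketch below uses two strings copied from the outputs above (the chunk is shortened to its first two sentences) and a hypothetical `word_count` helper. It measures length only, not relevance.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


proposition_hit = "Steve Jobs' management approach at Apple inspired Brian Chesky's approach at Airbnb."
chunk_hit = (
    "Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run "
    "the company in a traditional managerial style, which led to poor outcomes. He eventually "
    "found success by adopting a different approach, influenced by how Steve Jobs managed Apple."
)

print("proposition:", word_count(proposition_hit), "words")
print("chunk excerpt:", word_count(chunk_hit), "words")
```

Even this shortened excerpt is several times longer than the proposition that answers the query directly.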

### Testing

#### Test - 1

In [15]:
test_query_1 = "what is the essay \"Founder Mode\" about?"
res_proposition = retriever_propositions.invoke(test_query_1)
res_larger = retriever_larger.invoke(test_query_1)



In [16]:
for i, r in enumerate(res_proposition):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

1) Content: Founder Mode is an emerging paradigm that is not yet fully understood or documented. --- Chunk_id: 2
2) Content: Founder Mode is not yet fully understood or documented. --- Chunk_id: 1
3) Content: Founder Mode is an emerging paradigm. --- Chunk_id: 1
4) Content: Paul Graham's essay 'Founder Mode' challenges conventional wisdom about scaling startups. --- Chunk_id: 1


In [17]:
for i, r in enumerate(res_larger):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

1) Content: Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should maintain their unique management style rather than adopting traditional corporate practices as their companies grow.
Conventional Wisdom vs. Founder Mode
The essay argues that the traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.
This approach, suitable for established companies, can be detrimental to startups where the founder's vision and direct involvement are crucial. "Founder Mode" is presented as an emerging paradigm that is not yet fully understood or documented, contrasting with the conventional "manager mode" often advised by business schools and professional managers.
Unique Founder Abilities
Founders possess unique insights and abilities that professional managers do not, primarily because they have a deep understanding of their company's v

#### Test - 2

In [18]:
test_query_2 = "who is the co-founder of Airbnb?"
res_proposition = retriever_propositions.invoke(test_query_2)
res_larger = retriever_larger.invoke(test_query_2)



In [19]:
for i, r in enumerate(res_proposition):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

1) Content: Brian Chesky is a co-founder of Airbnb. --- Chunk_id: 3
2) Content: Brian Chesky adopted a different approach to running Airbnb. --- Chunk_id: 3
3) Content: Brian Chesky was advised to run Airbnb in a traditional managerial style. --- Chunk_id: 3
4) Content: Running Airbnb in a traditional managerial style led to poor outcomes. --- Chunk_id: 3


In [20]:
for i, r in enumerate(res_larger):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

1) Content: Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes. He eventually found success by adopting a different approach, influenced by how Steve Jobs managed Apple.
Steve Jobs' Management Style
Steve Jobs' management approach at Apple served as inspiration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regardless of their position on the organizational chart
. This unconventional method allowed Jobs to maintain a startup-like environment even as Apple grew, fostering innovation and direct communication across hierarchical levels. Such practices emphasize the importance of founders staying deeply involved in their companies' operations, challenging the traditional notion of delegating responsibilities to professional managers as companies scale. --- Chunk_id: 3
2) Content: Paul Graham's essay

#### Test - 3

In [21]:
test_query_3 = "when was the essay \"founder mode\" published?"
res_proposition = retriever_propositions.invoke(test_query_3)
res_larger = retriever_larger.invoke(test_query_3)



In [22]:
for i, r in enumerate(res_proposition):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

1) Content: Paul Graham published an essay called 'Founder Mode' in September 2024. --- Chunk_id: 1
2) Content: Founder Mode is an emerging paradigm. --- Chunk_id: 1
3) Content: Founder Mode is an emerging paradigm that is not yet fully understood or documented. --- Chunk_id: 2
4) Content: Founder Mode is not yet fully understood or documented. --- Chunk_id: 1


In [23]:
for i, r in enumerate(res_larger):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

1) Content: Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should maintain their unique management style rather than adopting traditional corporate practices as their companies grow.
Conventional Wisdom vs. Founder Mode
The essay argues that the traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.
This approach, suitable for established companies, can be detrimental to startups where the founder's vision and direct involvement are crucial. "Founder Mode" is presented as an emerging paradigm that is not yet fully understood or documented, contrasting with the conventional "manager mode" often advised by business schools and professional managers.
Unique Founder Abilities
Founders possess unique insights and abilities that professional managers do not, primarily because they have a deep understanding of their company's v

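### Comparison

| **Aspect**               | **Proposition-based retrieval**                                     | **Simple chunk retrieval**                                                  |
|--------------------------|---------------------------------------------------------------------|-----------------------------------------------------------------------------|
| **Answer accuracy**      | High: provides focused, direct answers.                             | Medium: provides more context but may include irrelevant information.       |
| **Clarity and brevity**  | High: clear and concise, avoids unnecessary detail.                 | Medium: more comprehensive, but the extra information can be a burden.      |
| **Contextual richness**  | Low: focuses on specific propositions, so context may be lacking.   | High: provides additional context and detail.                               |
| **Comprehensiveness**    | Low: may omit broader context or supplementary information.         | High: provides broad, comprehensive information.                            |
| **Narrative flow**       | Medium: can be fragmented or disjointed.                            | High: preserves the logical flow and coherence of the original document.    |
| **Information overload** | Low: unlikely to overwhelm the user with excess information.        | High: risk of overwhelming the user with excess information.                |
| **Suitable use cases**   | Best for quick, factual queries.                                    | Suited to complex queries that require deeper understanding.                |
| **Efficiency**           | High: fast, targeted responses.                                     | Medium: may take more effort to sift through the extra content.             |
| **Specificity**          | High: precise, targeted responses.                                  | Medium: the broader context can make answers less targeted.                 |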
