# SummarizeChain
- 문서 집합에 대한 요약을 수행할 때 유용한 클래스
- 요약하는 chain을 구축할 때 문서를 어떻게 LLM에 넣을지에 대한 여러 방법을 지원
    1. Stuff: 단순히 모든 문서를 단일 프롬프트에 삽입하는 방식. 가장 단순함
    1. Map-Reduce: 각 문서를 개별적으로 요약한 다음, 문서의 요약본들을 최종본으로 요약하여 합치는 방식
    1. Refine: 

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

# Stuff
- `chain_type="stuff"`로 설정하며, `StuffDocumentsChain`을 사용하게 된다.
- `StuffDocumentsChain`은 사용자 정의 프롬프트를 사용하지 못하기 때문에 `load_summarize_chain`이 그런 면에서 더 좋다

In [2]:
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.chat_models import ChatOllama
from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")

docs = loader.load()

callbacks = [StreamingStdOutCallbackHandler()]
llm = ChatOllama(model="gemma:7b", temperature=0, streaming=True, callbacks=callbacks, max_new_tokens=2048)

chain = load_summarize_chain(llm, chain_type="stuff")

In [3]:
chain.input_schema.schema()

{'title': 'CombineDocumentsInput',
 'type': 'object',
 'properties': {'input_documents': {'title': 'Input Documents',
   'type': 'array',
   'items': {'$ref': '#/definitions/Document'}}},
 'definitions': {'Document': {'title': 'Document',
   'description': 'Class for storing a piece of text and associated metadata.',
   'type': 'object',
   'properties': {'page_content': {'title': 'Page Content', 'type': 'string'},
    'metadata': {'title': 'Metadata', 'type': 'object'},
    'type': {'title': 'Type',
     'default': 'Document',
     'enum': ['Document'],
     'type': 'string'}},
   'required': ['page_content']}}}

In [4]:
chain.invoke({"input_documents": docs})

## Summary of Conversation

The provided conversation revolves around the limitations of Large Language Models (LLMs) when used to create autonomous agents. The user poses specific challenges faced by LLMs in:

* **Context length:** Limited context window hinders the inclusion of relevant information and learning from past experiences.
* **Long-term planning:** Difficulty in adjusting plans when encountering unexpected errors.
* **Natural language interface:** Reliability issues with model outputs leading to parsing challenges.


## Key Takeaways

- LLMs

The provided text suggests that the conversation is incomplete and requires further context.

{'input_documents': [Document(page_content='\n\n\n\n\n\nLLM Powered Autonomous Agents | Lil\'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil\'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPosts\n\n\n\n\nArchive\n\n\n\n\nSearch\n\n\n\n\nTags\n\n\n\n\nFAQ\n\n\n\n\nemojisearch.app\n\n\n\n\n\n\n\n\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\n\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS)\n\n\nComponent Three: Tool Use\n\nCase Studies\n\nScientific Discovery Agent\n\nGenerative Agents Simulation\n\nProof-of-Concept Examples\n\n\nChallenges\n\nCitation\n\nReferences\n\n\n\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as Auto

# Map Reduce
- 먼저 각 문서를 요약하기 하고, `ReduceDocumentsChain`을 사용하여 요약들을 최종 요약으로 결합한다.
![map_reduce](./image/map_reduce.png)

In [9]:
from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.llm import LLMChain
from langchain import hub

map_prompt = hub.pull("teddynote/map-prompt")
print(map_prompt)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

input_variables=['docs'] metadata={'lc_hub_owner': 'teddynote', 'lc_hub_repo': 'map-prompt', 'lc_hub_commit_hash': '5325a713fc858810667d1d1dde32ccc2e93433b8706831560e75360b4993e95f'} template='You are a helpful expert journalist in extracting the main themes from a GIVEN DOCUMENTS below.\nPlease provide a comprehensive summary of the GIVEN DOCUMENTS in numbered list format. \nThe summary should cover all the key points and main ideas presented in the original text, while also condensing the information into a concise and easy-to-understand format. \nPlease ensure that the summary includes relevant details and examples that support the main ideas, while avoiding any unnecessary information or repetition. \nThe length of the summary should be appropriate for the length and complexity of the original text, providing a clear and accurate overview without omitting any important information.\n\nGIVEN DOCUMENTS:\n{docs}\n\nFORMAT:\n1. main theme 1\n2. main theme 2\n3. main theme 3\n...\n\nCAU

In [11]:
reduce_prompt = hub.pull("teddynote/reduce-prompt-korean")
print(reduce_prompt)
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

input_variables=['doc_summaries'] metadata={'lc_hub_owner': 'teddynote', 'lc_hub_repo': 'reduce-prompt-korean', 'lc_hub_commit_hash': '01613c7c2988c1e28d025507398b6c4aa4484e4450186e377b8e578bd22077ab'} template='You are a helpful expert in summary writing.\nYou are given numbered lists of summaries.\nExtract top 10 most important insights from the summaries.\nThen, write a summary of the insights in KOREAN.\n\nLIST OF SUMMARIES:\n{doc_summaries}\n\nHelpful Answer:\n'


In [12]:
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="doc_summaries"
)

# 매핑된 문서들을 결합하고 반복적으로 축소
reduce_documents_chain = ReduceDocumentsChain(
    # 최종적으로 호출되는 체인입니다.
    combine_documents_chain=combine_documents_chain,
    # `StuffDocumentsChain`의 컨텍스트를 초과하는 문서들을 처리
    collapse_documents_chain=combine_documents_chain,
    # 문서들을 그룹화할 최대 토큰 수.
    token_max=4096,
)

In [15]:
# 문서들을 매핑하여 체인을 거친 후 결과를 결합하는 과정
map_reduce_chain = MapReduceDocumentsChain(
    # 매핑 체인
    llm_chain=map_chain,
    # 리듀스 체인
    reduce_documents_chain=reduce_documents_chain,
    # llm_chain에서 문서들을 넣을 변수 이름
    document_variable_name="docs",
    # 매핑 단계의 결과를 출력에 포함시킴
    return_intermediate_steps=False,
)

# 문자를 기준으로 텍스트를 분할하는 객체 생성
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50,
    separators=["\n\n", "\n", "(?<=\. )", " ", ""],
    length_function=len,
)

# 문서들을 분할
split_docs = text_splitter.split_documents(docs)
print(map_reduce_chain.input_schema.schema())
print(split_docs)

{'title': 'CombineDocumentsInput', 'type': 'object', 'properties': {'input_documents': {'title': 'Input Documents', 'type': 'array', 'items': {'$ref': '#/definitions/Document'}}}, 'definitions': {'Document': {'title': 'Document', 'description': 'Class for storing a piece of text and associated metadata.', 'type': 'object', 'properties': {'page_content': {'title': 'Page Content', 'type': 'string'}, 'metadata': {'title': 'Metadata', 'type': 'object'}, 'type': {'title': 'Type', 'default': 'Document', 'enum': ['Document'], 'type': 'string'}}, 'required': ['page_content']}}}
[Document(page_content="LLM Powered Autonomous Agents | Lil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPosts\n\n\n\n\nArchive\n\n\n\n\nSearch\n\n\n\n\nTags\n\n\n\n\nFAQ\n\n\n\n\nemojisearch.app\n\n\n\n\n\n\n\n\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian 

In [16]:
answer = map_reduce_chain.invoke({"input_documents": split_docs})

## Main Themes of "LLM Powered Autonomous Agents":

1. **Autonomous Agent Architecture:**
    - The paper proposes an autonomous agent system powered by Large Language Models (LLMs) for scientific discovery and creative tasks.
    - The system consists of three components: planning, memory, and tool use.


2. **Planning and Task Decomposition:**
    - The planning component decomposes complex tasks into smaller steps.
    - This allows the agent to focus on completing specific actions efficiently.


3. **Memory and Representation:**
    - The memory component utilizes various data structures like Maximum Inner Product Search (MIPS) for efficient representation and retrieval of relevant information.
    - This enables the agent to learn from past experiences and apply them to future situations.


4. **Tool Use and Creativity:**
    - The paper showcases two case studies: Scientific Discovery Agent and Generative Agents Simulation.
    - These examples demonstrate the agent's ability to 

  from .autonotebook import tqdm as notebook_tqdm
Token indices sequence length is longer than the specified maximum sequence length for this model (10913 > 1024). Running this sequence through the model will result in indexing errors


## Summary of Key Concepts:

**Memory and Learning:**

* Short-term memory (STM) stores information for immediate tasks.
* Long-term memory (LTM) retains information for extended periods.
* Sensory memory captures and embeds sensory information.
* STM facilitates short-term learning, while LTM serves as an external vector store for long-term recall.


**Approximate Nearest Neighbors (ANN):**

* Efficiently retrieves information from large datasets.
* Uses hashing functions or tree-based structures to find similar data points.
* Approximate methods sacrifice accuracy for speed.


**Hierarchical Approximate Nearest Neighbors (HNSW):**

* Hierarchical structure speeds up search in high-dimensional spaces.
* Search algorithm progressively narrows the search space.
* Inspired by the “six degrees of separation” phenomenon.


**Clustering in High-Dimensional Space:**

* FAISS and ScaNN utilize quantization to cluster data points in high-dimensional spaces.
* Anisotropic quantization preserves

In [17]:
print(answer)

{'input_documents': [Document(page_content="LLM Powered Autonomous Agents | Lil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPosts\n\n\n\n\nArchive\n\n\n\n\nSearch\n\n\n\n\nTags\n\n\n\n\nFAQ\n\n\n\n\nemojisearch.app\n\n\n\n\n\n\n\n\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\n\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS)\n\n\nComponent Three: Tool Use\n\nCase Studies\n\nScientific Discovery Agent\n\nGenerative Agents Simulation\n\nProof-of-Concept Examples\n\n\nChallenges\n\nCitation\n\nReferences", metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agent

# Refine
- 입력된 문서들을 순회하면서 답변을 지속적으로 업데이트하며 최종 답변을 생성

In [18]:
# llm을 사용하여 'refine' 유형의 요약 체인을 로드합니다.
chain = load_summarize_chain(llm, chain_type="refine")
# split_docs를 처리하기 위해 체인을 실행합니다.
chain.run(split_docs)

  warn_deprecated(


**LLM Powered Autonomous Agents**

This paper explores the development of autonomous agents powered by large language models (LLMs). These agents possess capabilities for planning, memory storage, and tool utilization.

**Key Highlights:**

- **Planning:** LLMs enable task decomposition and self-reflection, allowing agents to plan and execute actions efficiently.


- **Memory:** The paper discusses different memory types and utilizes Maximum Inner Product Search (MIPS) for efficient retrieval.


- **Tool Use:** LLMs empower agents to utilize tools in various scenarios.


- **Case Studies:** Examples of scientific discovery and generative agents are presented, showcasing the practical applications of these LLMs.


- **Challenges:** The paper outlines challenges associated with the development of autonomous agents, including ethical considerations and computational complexity.## LLM Powered Autonomous Agents

This paper explores the burgeoning field of autonomous agents powered by large 

"## Refined Summary:\n\nLarge language models (LLMs) empower autonomous agents by leveraging planning, memory utilization, and tool augmentation. By balancing immediate action planning with long-term goal optimization, LLMs tackle diverse tasks across scientific discovery and other fields. Effective utilization of external information and memory by agents is crucial for successful implementation.\n\n**Ethical considerations and limitations** must be carefully addressed to mitigate potential misuse. Concrete examples of successful long-term applications of LLMs in various environments would strengthen the paper's arguments.\n\nThe development of a Super Mario game in Python provides a valuable case study for exploring the practical applications of LLMs in interactive environments. This case study showcases the capabilities of LLMs for long-term goal optimization, memory utilization, and action selection in dynamic contexts.\n\n**Challenges remain in utilizing LLMs effectively:**\n\n* **

In [19]:
from langchain.prompts import PromptTemplate

prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary"
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in Italian"
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate.from_template(refine_template)
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
)
result = chain({"input_documents": split_docs}, return_only_outputs=True)

  warn_deprecated(


## LLM Powered Autonomous Agents Summary

This article explores the development of autonomous agents powered by large language models (LLMs). These agents can autonomously perform tasks like scientific discovery and creative simulations.

**Key features:**

* **Planning:** LLMs decompose complex tasks into manageable steps and self-reflect on their progress.
* **Memory:** LLMs utilize various memory types to store and retrieve relevant information efficiently.
* **Tool Use:** LLMs can leverage external tools and resources to enhance their capabilities.

**Applications:**

* **Scientific Discovery:** LLMs can autonomously explore scientific concepts and generate hypotheses.
* **Generative Agents Simulation:** LLMs can simulate diverse scenarios and environments, fostering creativity and innovation.

**Challenges:**

* The article lacks specific details regarding the challenges associated with LLM powered autonomous agents.

**Overall:**

This article highlights the potential of LLMs to 

# AnalyzeDocumentChain
- 긴 문서를 분할하는 splitter와 하나의 요약으로 합치는 chain을 단일 chain으로 묶을 수 있다.

In [21]:
from langchain.chains import AnalyzeDocumentChain

# AnalyzeDocumentChain 인스턴스를 생성합니다. 이때, combine_docs_chain과 text_splitter를 인자로 전달합니다.
summarize_document_chain = AnalyzeDocumentChain(
    combine_docs_chain=chain, text_splitter=text_splitter
)
print(summarize_document_chain.input_schem.schema())
# 첫 번째 문서의 페이지 내용을 사용하여 문서 요약 프로세스를 실행합니다.
summarized_result = summarize_document_chain.invoke(
    {"input_document": docs[0].page_content}
)

## LLM Powered Autonomous Agents Summary

This article explores the development of autonomous agents powered by large language models (LLMs). These agents can autonomously perform tasks like scientific discovery and creative simulations.

**Key features:**

* **Planning:** LLMs decompose complex tasks into manageable steps and self-reflect on their progress.
* **Memory:** LLMs utilize various memory types to store and retrieve relevant information efficiently.
* **Tool Use:** LLMs can leverage external tools and resources to enhance their capabilities.

**Applications:**

* **Scientific Discovery:** LLMs can autonomously explore scientific concepts and generate hypotheses.
* **Generative Agents Simulation:** LLMs can simulate diverse scenarios and environments, fostering creativity and innovation.

**Challenges:**

* The article lacks specific details regarding the challenges associated with LLM powered autonomous agents.

**Overall:**

This article highlights the potential of LLMs to 