# STORM

[STORM](https://arxiv.org/abs/2402.14207) is a research assistant by Shao, et. al that extends the idea of "outline-driven RAG" for richer article generation.

It is tasked with generating Wikipedia-like ariticles on a user-provided topic.  It has  a few main stages:

1. Survey related subjects
2. Identify perspectives
3. "Expert Interviews" (between the writer and an agent role-playing as a perspective)
4. Refine article 

The expert interviews stage ocurrs between the article writer and each role-playing agent and itself is a loop, where the "expert" is able to query external knowledge and respond to pointed questions.

Couple hyperparameters to restrict the infinite research breadth:

N: Number of perspectives to survey / use (2->3)
M: Max number of conversation turns in step (3)

The paper uses DSPY and few-shot examples to adapt but we'll just use functioncalling here.

In [None]:
# %pip install langchain_community langchain_openai langgraph wikipedia tavily-python scikit-learn

In [1]:
from langchain_openai import ChatOpenAI

fast_llm = ChatOpenAI(model="gpt-3.5-turbo")
long_context_llm = ChatOpenAI(model="gpt-4-turbo-preview")

In [2]:
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List, Optional
from langchain_core.prompts import ChatPromptTemplate

direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a Wikipedia writer. Write an outline for a Wikipedia page about a user-provided topic. Be comprehensive and specific.",
        ),
        ("user", "{topic}"),
    ]
)


class Subsection(BaseModel):
    subsection_title: str = Field(..., title="Title of the subsection")
    description: str = Field(..., title="Content of the subsection")

    @property
    def as_str(self) -> str:
        return f"### {self.subsection_title}\n\n{self.description}".strip()


class Section(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    description: str = Field(..., title="Content of the section")
    subsections: Optional[List[Subsection]] = Field(
        default=None,
        title="Titles and descriptions for each subsection of the Wikipedia page.",
    )

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            f"### {subsection.subsection_title}\n\n{subsection.description}"
            for subsection in self.subsections or []
        )
        return f"## {self.section_title}\n\n{self.description}\n\n{subsections}".strip()


class Outline(BaseModel):
    page_title: str = Field(..., title="Title of the Wikipedia page")
    sections: List[Section] = Field(
        default_factory=list,
        title="Titles and descriptions for each section of the Wikipedia page.",
    )

    @property
    def as_str(self) -> str:
        sections = "\n\n".join(section.as_str for section in self.sections)
        return f"# {self.page_title}\n\n{sections}".strip()


generate_outline_direct = direct_gen_outline_prompt | fast_llm.with_structured_output(
    Outline
)

  warn_beta(


In [3]:
example_topic = "Impact of million-plus token context window language models on RAG"

initial_outline = generate_outline_direct.invoke({"topic": example_topic})

print(initial_outline.as_str)

# Impact of million-plus token context window language models on RAG

## Introduction

Overview of million-plus token context window language models and the RAG (Retrieval-Augmented Generation) architecture.

## Benefits of Million-Plus Token Context Window Language Models

Discuss the advantages of using million-plus token context window language models in natural language processing tasks.

## Challenges of Million-Plus Token Context Window Language Models

Explore the limitations and obstacles associated with million-plus token context window language models.

## Integration of Million-Plus Token Models with RAG

Examine how million-plus token context window language models can be integrated with the RAG architecture for improved performance.

## Applications of RAG with Million-Plus Token Models

Highlight the potential applications and use cases of combining RAG with million-plus token context window language models.


## Expand Topics

While language models do store some Wikipedia-like knowledge in their parameters, you will get better results by incorporating relevant and recent information using a search engine.

We will start our search by generating a list of related topics.

In [4]:
gen_related_topics_prompt = ChatPromptTemplate.from_template(
    """I'm writing a Wikipedia page for a topic mentioned below. Please identify and recommend some Wikipedia pages on closely related subjects. I'm looking for examples that provide insights into interesting aspects commonly associated with this topic, or examples that help me understand the typical content and structure included in Wikipedia pages for similar topics.

Please list the as many subjects and urls as you can.

Topic of interest: {topic}
"""
)


class RelatedSubjects(BaseModel):
    topics: List[str] = Field(
        description="Comprehensive list of related subjects as background research.",
    )


expand_chain = gen_related_topics_prompt | fast_llm.with_structured_output(
    RelatedSubjects
)

In [5]:
related_subjects = await expand_chain.ainvoke({"topic": example_topic})
related_subjects

RelatedSubjects(topics=['million-plus token context window language models', 'Retriever-Reader-Generator (RAG) model', 'Impact of RAG on language understanding'])

In [6]:
class Editor(BaseModel):
    affiliation: str = Field(
        description="Primary affiliation of the editor.",
    )
    name: str = Field(
        description="Name of the editor.",
    )
    role: str = Field(
        description="Role of the editor in the context of the topic.",
    )
    description: str = Field(
        description="Description of the editor's focus, concerns, and motives.",
    )

    @property
    def persona(self) -> str:
        return f"Name: {self.name}\nRole: {self.role}\nAffiliation: {self.affiliation}\nDescription: {self.description}\n"


class Perspectives(BaseModel):
    editors: List[Editor] = Field(
        description="Comprehensive list of editors with their roles and affiliations.",
        # Add a pydantic validation/restriction to be at most M editors
    )


gen_perspectives_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You need to select a diverse (and distinct) group of Wikipedia editors who will work together to create a comprehensive article on the topic. Each of them represents a different perspective, role, or affiliation related to this topic.\
    You can use other Wikipedia pages of related topics for inspiration. For each editor, add a description of what they will focus on.

    Wiki page outlines of related topics for inspiration:
    {examples}""",
        ),
        ("user", "Topic of interest: {topic}"),
    ]
)

gen_perspectives_chain = gen_perspectives_prompt | ChatOpenAI(
    model="gpt-3.5-turbo"
).with_structured_output(Perspectives)

In [7]:
from langchain_community.retrievers import WikipediaRetriever
from langchain_core.runnables import RunnableLambda, chain as as_runnable

wikipedia_retriever = WikipediaRetriever(load_all_available_meta=True, top_k_results=1)


def format_doc(doc, max_length=1000):
    related = "- ".join(doc.metadata["categories"])
    return f"### {doc.metadata['title']}\n\nSummary: {doc.page_content}\n\nRelated\n{related}"[
        :max_length
    ]


def format_docs(docs):
    return "\n\n".join(format_doc(doc) for doc in docs)


@as_runnable
async def survey_subjects(topic: str):
    related_subjects = await expand_chain.ainvoke({"topic": topic})
    retrieved_docs = await wikipedia_retriever.abatch(
        related_subjects.topics, return_exceptions=True
    )
    all_docs = []
    for docs in retrieved_docs:
        if isinstance(docs, BaseException):
            continue
        all_docs.extend(docs)
    formatted = format_docs(all_docs)
    return await gen_perspectives_chain.ainvoke({"examples": formatted, "topic": topic})

In [8]:
perspectives = await survey_subjects.ainvoke(example_topic)



  lis = BeautifulSoup(html).find_all('li')


In [9]:
perspectives.dict()

{'editors': [{'affiliation': 'Research Institution',
   'name': 'Dr. Researcher',
   'role': 'Researcher',
   'description': 'Dr. Researcher will focus on analyzing the impact of million-plus token context window language models on the RAG (Retrieval-Augmented Generation) framework, specifically looking at the efficiency, effectiveness, and potential challenges that arise from integrating such large language models into the RAG framework.'},
  {'affiliation': 'Tech Company',
   'name': 'AI Engineer',
   'role': 'AI Engineer',
   'description': 'AI Engineer will provide insights into the technical aspects of implementing million-plus token context window language models within the RAG framework. They will focus on the practical challenges, optimizations, and enhancements needed to leverage these models effectively in the RAG framework.'},
  {'affiliation': 'Academic Institution',
   'name': 'Prof. Linguist',
   'role': 'Linguist',
   'description': 'Prof. Linguist will examine the lingu

## Expert Dialog

Now the true fun begins, the wikipedia writer will "talk" with expert agents primed to role-play using the perspectives presented above.

In [10]:
from langgraph.graph import StateGraph, END
from typing_extensions import TypedDict
from langchain_core.messages import AnyMessage
from typing import Annotated, Sequence


def add_messages(left, right):
    if not isinstance(left, list):
        left = [left]
    if not isinstance(right, list):
        right = [right]
    return left + right


def update_references(references, new_references):
    if not references:
        references = {}
    references.update(new_references)
    return references


def update_editor(editor, new_editor):
    # Can only set at the outset
    if not editor:
        return new_editor
    return editor


class InterviewState(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]
    references: Annotated[Optional[dict], update_references]
    editor: Annotated[Optional[Editor], update_editor]

In [11]:
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, ToolMessage


gen_qn_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an experienced Wikipedia writer and want to edit a specific page. \
Besides your identity as a Wikipedia writer, you have a specific focus when researching the topic. \
Now, you are chatting with an expert to get information. Ask good questions to get more useful information.

When you have no more questions to ask, say "Thank you so much for your help!" to end the conversation.\
Please only ask one question at a time and don't ask what you have asked before.\
Your questions should be related to the topic you want to write.
Be comprehensive and curious, gaining as much unique insight from the expert as possible.\

Stay true to your specific perspective:

{persona}""",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)


def tag_with_name(ai_message: AIMessage, name: str):
    ai_message.name = name
    return ai_message


def swap_roles(state: InterviewState, name: str):
    converted = []
    for message in state["messages"]:
        if isinstance(message, AIMessage) and message.name != name:
            message = HumanMessage(**message.dict(exclude={"type"}))
        converted.append(message)
    return {"messages": converted}


@as_runnable
async def generate_question(state: InterviewState):
    editor = state["editor"]
    gn_chain = (
        RunnableLambda(swap_roles).bind(name=editor.name)
        | gen_qn_prompt.partial(persona=editor.persona)
        | fast_llm
        | RunnableLambda(tag_with_name).bind(name=editor.name)
    )
    result = await gn_chain.ainvoke(state)
    return {"messages": [result]}

In [12]:
messages = [
    HumanMessage(f"So you said you were writing an article on {example_topic}?")
]
question = await generate_question.ainvoke(
    {
        "editor": perspectives.editors[0],
        "messages": messages,
    }
)

question["messages"][0].content

"Yes, that's correct. I am researching the impact of million-plus token context window language models on the RAG (Retrieval-Augmented Generation) framework. I am particularly interested in understanding how these large language models affect the efficiency, effectiveness, and any potential challenges that may arise when integrating them into the RAG framework. Is there any specific aspect of this topic that you would like to know more about?"

In [13]:
class Queries(BaseModel):
    queries: List[str] = Field(
        description="Comprehensive list of search engine queries to answer the user's questions.",
    )


gen_queries_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful research assistant. Query the search engine to answer the user's questions.",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)
gen_queries_chain = gen_queries_prompt | ChatOpenAI(
    model="gpt-3.5-turbo"
).with_structured_output(Queries, include_raw=True)

In [14]:
queries = await gen_queries_chain.ainvoke(
    {"messages": [HumanMessage(content=question["messages"][0].content)]}
)
queries["parsed"].queries

['Impact of million-plus token context window language models on RAG framework efficiency',
 'Effectiveness of large language models in RAG framework',
 'Challenges of integrating million-plus token context window language models into RAG framework']

In [15]:
class AnswerWithCitations(BaseModel):
    answer: str = Field(
        description="Comprehensive answer to the user's question with citations.",
    )
    cited_urls: List[str] = Field(
        description="List of urls cited in the answer.",
    )

    @property
    def as_str(self) -> str:
        return f"{self.answer}\n\nCitations:\n\n" + "\n".join(
            f"[{i+1}]: {url}" for i, url in enumerate(self.cited_urls)
        )


gen_answer_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert who can use information effectively. You are chatting with a Wikipedia writer who wants\
 to write a Wikipedia page on the topic you know. You have gathered the related information and will now use the information to form a response.

Make your response as informative as possible and make sure every sentence is supported by the gathered information.
Each response must be backed up by a citation from a reliable source, formatted as a footnote, reproducing the URLS after your response.""",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)

gen_answer_chain = gen_answer_prompt | fast_llm.with_structured_output(
    AnswerWithCitations, include_raw=True
)

#### Reference Store

The research process uncovers a large number of reference documents that we may want to query during the final article-writing process.
Here, we will createa multi-vector retriever and store all the searched documents inline.

In [30]:
from langchain_community.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper

search_engine = DuckDuckGoSearchAPIWrapper()
from langchain_core.tools import tool


# TODO: remove when i get my api limit bumped
@tool
async def search_engine(query: str):
    """Search engine to the internet."""
    results = DuckDuckGoSearchAPIWrapper()._ddgs_text(query)
    return [{"content": r["body"], "url": r["href"]} for r in results]

In [31]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.runnables import RunnableConfig
import json

# search_engine = TavilySearchResults(max_results=4)


async def gen_answer(
    state: InterviewState,
    config: RunnableConfig | None = None,
    name: str = "Subject Matter Expert",
    max_str_len: int = 15000,
):
    swapped_state = swap_roles(state, name)  # Convert all other AI messages
    queries = await gen_queries_chain.ainvoke(swapped_state)
    query_results = await search_engine.abatch(
        queries["parsed"].queries, config, return_exceptions=True
    )
    successful_results = [
        res for res in query_results if not isinstance(res, Exception)
    ]
    all_query_results = {
        res["url"]: res["content"] for results in successful_results for res in results
    }
    # We could be more precise about handling max token length if we wanted to here
    dumped = json.dumps(all_query_results)[:max_str_len]
    ai_message: AIMessage = queries["raw"]
    tool_call = queries["raw"].additional_kwargs["tool_calls"][0]
    tool_id = tool_call["id"]
    tool_message = ToolMessage(tool_call_id=tool_id, content=dumped)
    swapped_state["messages"].extend([ai_message, tool_message])
    # Only update the shared state with the final answer to avoid
    # polluting the dialogue history with intermediate messages
    generated = await gen_answer_chain.ainvoke(swapped_state)
    cited_urls = set(generated["parsed"].cited_urls)
    # Save the retrieved information to a the shared state for future reference
    cited_references = {k: v for k, v in all_query_results.items() if k in cited_urls}
    formatted_message = AIMessage(name=name, content=generated["parsed"].as_str)
    return {"messages": [formatted_message], "references": cited_references}

In [32]:
example_answer = await gen_answer(
    {"messages": [HumanMessage(content=question["messages"][0].content)]}
)
example_answer["messages"][-1].content

'Large language models with million-plus token context windows, such as Gemini 1.5, have generated discussions in the AI community regarding their impact on the Retrieval-Augmented Generation (RAG) framework. These models are believed to potentially have a negative effect on RAG^(1). The RAG framework typically involves components like Milvus as the vector database, LangChain as the orchestrator, and large language models like GTE-Large for text generation^(2). While large context windows are desirable in language models, the high fine-tuning costs, scarcity of long texts, and challenges like catastrophic values introduced by new token positions limit the current extended context windows to around 128k tokens^(3). Recent advancements, like LongRoPE, have extended the context window of pre-trained large language models significantly to 2048k tokens^(3). The integration of retrieval mechanisms with long context language models, such as GPT-3.5-Turbo-16k and Llama2-7B-chat-4k, has been ex

In [33]:
max_num_turns = 5


def route_messages(state: InterviewState, name: str = "Subject Matter Expert"):
    messages = state["messages"]
    num_responses = len(
        [m for m in messages if isinstance(m, AIMessage) and m.name == name]
    )
    if num_responses >= max_num_turns:
        return END
    last_question = messages[-2]
    if last_question.content.endswith("Thank you so much for your help!"):
        return END
    return "ask_question"


builder = StateGraph(InterviewState)

builder.add_node("ask_question", generate_question)
builder.add_node("answer_question", gen_answer)
builder.add_conditional_edges("answer_question", route_messages)
builder.add_edge("ask_question", "answer_question")

builder.set_entry_point("ask_question")
interview_graph = builder.compile().with_config(run_name="Conduct Interviews")

In [34]:
final_step = None

initial_state = {
    "editor": perspectives.editors[0],
    "messages": [
        AIMessage(
            content=f"So you said you were writing an article on {example_topic}?",
            name="Subject Matter Expert",
        )
    ],
}
async for step in interview_graph.astream(initial_state):
    name = next(iter(step))
    print(name)
    print("-- ", str(step[name]["messages"])[:300])
    if END in step:
        final_step = step

ask_question
--  [AIMessage(content="Yes, that's correct. I am focusing on analyzing the impact of million-plus token context window language models on the RAG (Retrieval-Augmented Generation) framework. This involves looking at how these large language models affect the efficiency, effectiveness, and potential chal
answer_question
--  [AIMessage(content='The impact of million-plus token context window language models on the RAG (Retrieval-Augmented Generation) framework has been a topic of discussion in the AI community. For example, the introduction of Gemini 1.5, with a 1 million token context window, has raised concerns about 
ask_question
--  [AIMessage(content='Thank you for providing such detailed information and relevant citations. Could you elaborate on the specific challenges researchers have encountered when integrating million-plus token context window language models into the RAG framework, and how they have attempted to address 
answer_question
--  [AIMessage(content="Int

In [35]:
final_state = next(iter(final_step.values()))

## Refine Outline

Now that we have all this cool stuff, let's distill it into a refined outline.

In [36]:
refine_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a Wikipedia writer. You have gathered information from experts and search engines. Now, you are refining the outline of the Wikipedia page.\
You need to make sure that the outline is comprehensive and specific.\
Topic you are writing about: {topic}\
Old outline: {old_outline}""",
        ),
        (
            "user",
            "Refine the outline based on your conversations with subject-matter experts:\n\nConversations:\n\n{conversations}\n\nWrite the refined Wikipedia outline:",
        ),
    ]
)

# Using turbo preview since the context can get quite long
refine_outline_chain = refine_outline_prompt | long_context_llm.with_structured_output(
    Outline
)

In [37]:
refined_outline = refine_outline_chain.invoke(
    {
        "topic": example_topic,
        "old_outline": initial_outline.as_str,
        "conversations": "\n\n".join(
            f"### {m.name}\n\n{m.content}" for m in final_state["messages"]
        ),
    }
)

In [38]:
print(refined_outline.as_str)

# Impact of Million-Plus Token Context Window Language Models on RAG

## Introduction

An overview of the development and significance of million-plus token context window language models and the concept of Retrieval-Augmented Generation (RAG).

## The Evolution of Large Context Windows in Language Models

A historical perspective on the growth of context window sizes in language models, including key milestones such as Gemini 1.5, Mixtral, GPT-3.5-Turbo-16k, Llama2-7B-chat-4k, and LongRoPE.

### Key Milestones

Discussion of significant advancements and models that have shaped the current landscape of large context window language models.

### Challenges Overcome

Examination of the technical and theoretical hurdles encountered in expanding the context window sizes of language models.

## Integration Challenges with RAG

Detailed analysis of the specific challenges faced when integrating million-plus token context window language models into the RAG framework, such as high fine-tuning

## Generate Article

Now it's time to generate the full article. We will divide-and-conquer, so that each section can be tackled by an individual llm.

In [39]:
class SubSection(BaseModel):
    subsection_title: str = Field(..., title="Title of the subsection")
    content: str = Field(
        ...,
        title="Full content of the subsection. Include [#] citations to the cited sources where relevant.",
    )

    @property
    def as_str(self) -> str:
        return f"### {self.subsection_title}\n\n{self.content}".strip()


class WikiSection(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    content: str = Field(..., title="Full content of the section")
    subsections: Optional[List[Subsection]] = Field(
        default=None,
        title="Titles and descriptions for each subsection of the Wikipedia page.",
    )
    citations: List[str] = Field(default_factory=list)

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            subsection.as_str for subsection in self.subsections or []
        )
        citations = "\n".join([f" [{i}] {cit}" for i, cit in enumerate(self.citations)])
        return (
            f"## {self.section_title}\n\n{self.content}\n\n{subsections}".strip()
            + f"\n\n{citations}".strip()
        )

In [40]:
from langchain_core.documents import Document

from langchain_community.vectorstores import SKLearnVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
reference_docs = [
    Document(page_content=v, metadata={"source": k})
    for k, v in final_state["references"].items()
]
# This really doesn't need to be a vectorstore.
# could just be a numpy matrix
vectorstore = SKLearnVectorStore.from_documents(
    reference_docs,
    embedding=embeddings,
)
retriever = vectorstore.as_retriever(k=10)

In [41]:
retriever.invoke("What's a long context LLM anyway?")

[Document(page_content='Large Language Models (LLMs) have achieved remarkable success across various tasks. However, they often grapple with a limited context window size due to the high costs of fine-tuning, scarcity of lengthy texts, and the introduction of catastrophic values by new token positions. To address this issue, in a new paper LongRoPE: Extending LLM Context Window', metadata={'id': '20dbbce3-ae12-4a05-94e9-df4a97676098', 'source': 'https://syncedreview.com/2024/02/25/microsofts-longrope-breaks-the-limit-of-context-window-of-llms-extents-it-to-2-million-tokens/'}),
 Document(page_content='Large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 20

In [50]:
section_writer_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert Wikipedia writer. Complete your assigned WikiSection from the following outline:\n\n"
            "{outline}\n\nCite your sources, using the following references:\n\n<Documents>\n{docs}\n<Documents>",
        ),
        ("user", "Write the full WikiSection for the {section} section."),
    ]
)


async def retrieve(inputs: dict):
    docs = await retriever.ainvoke(inputs["topic"] + ": " + inputs["section"])
    formatted = "\n".join(
        [
            f'<Document href="{doc.metadata["source"]}"/>\n{doc.page_content}\n</Document>'
            for doc in docs
        ]
    )
    return {"docs": formatted, **inputs}


section_writer = (
    retrieve
    | section_writer_prompt
    | long_context_llm.with_structured_output(WikiSection)
)

In [52]:
section = await section_writer.ainvoke(
    {
        "outline": refined_outline.as_str,
        "section": refined_outline.sections[1].section_title,
        "topic": example_topic,
    }
)
print(section.as_str)

Impact of million-plus token context window language models on RAG The Evolution of Large Context Windows in Language Models
## The Evolution of Large Context Windows in Language Models

The evolution of large context windows in language models (LLMs) has been a critical factor in the advancement of natural language processing (NLP) technologies. Initially, LLMs were constrained by smaller context windows, limiting their understanding and generation capabilities. However, the demand for models capable of processing and integrating more extensive sequences of text has led to significant research and development efforts aimed at expanding these context windows.

Over time, this push for larger context windows has seen the emergence of several key milestones that have progressively increased the amount of text LLMs can consider when generating responses or analyses. These milestones include models like Gemini 1.5, Mixtral, GPT-3.5-Turbo-16k, Llama2-7B-chat-4k, and LongRoPE, each contribut

In [53]:
print(section.as_str)

## The Evolution of Large Context Windows in Language Models

The evolution of large context windows in language models (LLMs) has been a critical factor in the advancement of natural language processing (NLP) technologies. Initially, LLMs were constrained by smaller context windows, limiting their understanding and generation capabilities. However, the demand for models capable of processing and integrating more extensive sequences of text has led to significant research and development efforts aimed at expanding these context windows.

Over time, this push for larger context windows has seen the emergence of several key milestones that have progressively increased the amount of text LLMs can consider when generating responses or analyses. These milestones include models like Gemini 1.5, Mixtral, GPT-3.5-Turbo-16k, Llama2-7B-chat-4k, and LongRoPE, each contributing to the landscape of large context window LLMs in unique ways.

Despite the benefits, expanding the context window size br

### Generate final article

In [54]:
from langchain_core.output_parsers import StrOutputParser

writer_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert Wikipedia author. Write the complete wiki article on {topic} using the following section drafts:\n\n"
            "{draft}\n\nStrictly follow Wikipedia format guidelines.",
        ),
        (
            "user",
            'Write the complete Wiki article using markdown format. Organize citations using footnotes like "[1]", avoiding duplicates in the footer.',
        ),
    ]
)

writer = writer_prompt | long_context_llm | StrOutputParser()

In [55]:
for tok in writer.stream({"topic": example_topic, "draft": section.as_str}):
    print(tok, end="")

# Impact of Million-Plus Token Context Window Language Models on RAG

The development and implementation of million-plus token context window language models (LLMs) represent a significant milestone in the field of natural language processing (NLP). These models have dramatically enhanced the capabilities of retrieval-augmented generation (RAG) systems, enabling them to generate more accurate, contextually relevant, and nuanced text outputs. This article delves into the evolution of large context windows in LLMs, their impact on RAG, and the challenges faced along the way.

## Contents

1. [The Evolution of Large Context Windows in Language Models](#The-Evolution-of-Large-Context-Windows-in-Language-Models)
    1. [Key Milestones](#Key-Milestones)
    2. [Challenges Overcome](#Challenges-Overcome)
2. [Impact on Retrieval-Augmented Generation](#Impact-on-Retrieval-Augmented-Generation)
    1. [Enhanced Contextual Understanding](#Enhanced-Contextual-Understanding)
    2. [Improved Accura

## Final Flow

Now it's time to string everything together. We will have 3 main stages in sequence:
.
1. Generate the initial outline + perspectives
2. Batch converse with each perspective to expand the content for the article.
3. Refine the outline based on the conversations
4. Write the final wiki.

In [56]:
class ResearchState(TypedDict):
    topic: str
    outline: Outline
    editors: List[Editor]
    interview_results: List[InterviewState]
    # The final sections output
    sections: List[WikiSection]
    article: str

In [57]:
import asyncio


async def initialize_research(state: ResearchState):
    topic = state["topic"]
    coros = (
        generate_outline_direct.ainvoke({"topic": topic}),
        survey_subjects.ainvoke(topic),
    )
    results = await asyncio.gather(*coros)
    return {
        **state,
        "outline": results[0],
        "editors": results[1].editors,
    }


async def conduct_interviews(state: ResearchState):
    topic = state["topic"]
    initial_states = [
        {
            "editor": editor,
            "messages": [
                AIMessage(
                    content=f"So you said you were writing an article on {topic}?",
                    name="Subject Matter Expert",
                )
            ],
        }
        for editor in state["editors"]
    ]
    # We call in to the sub-graph here
    interview_results = await interview_graph.abatch(initial_states)

    return {
        **state,
        "interview_results": interview_results,
    }


def format_conversation(interview_state):
    messages = interview_state["messages"]
    convo = "\n".join(f"{m.name}: {m.content}" for m in final_state["messages"])
    return f'Conversation with {interview_state["editor"].name}\n\n' + convo


async def refine_outline(state: ResearchState):
    convos = "\n\n".join(
        [
            format_conversation(interview_state)
            for interview_state in state["interview_results"]
        ]
    )

    updated_outline = await refine_outline_chain.ainvoke(
        {
            "topic": state["topic"],
            "old_outline": state["outline"].as_str,
            "conversations": convos,
        }
    )
    return {**state, "outline": updated_outline}


async def index_references(state: ResearchState):
    all_docs = []
    for interview_state in state["interview_results"]:
        reference_docs = [
            Document(page_content=v, metadata={"source": k})
            for k, v in interview_state["references"].items()
        ]
        all_docs.extend(reference_docs)
    await vectorstore.aadd_documents(all_docs)
    return state


async def write_sections(state: ResearchState):
    outline = state["outline"]
    sections = await section_writer.abatch(
        [
            {
                "outline": refined_outline.as_str,
                "section": section.section_title,
                "topic": state["topic"],
            }
            for section in outline.sections
        ]
    )
    return {
        **state,
        "sections": sections,
    }


async def write_article(state: ResearchState):
    topic = state["topic"]
    sections = state["sections"]
    draft = "\n\n".join([section.as_str for section in sections])
    article = await writer.ainvoke({"topic": example_topic, "draft": draft})
    return {
        **state,
        "article": article,
    }

In [58]:
builder_of_storm = StateGraph(ResearchState)


nodes = [
    ("init_research", initialize_research),
    ("conduct_interviews", conduct_interviews),
    ("refine_outline", refine_outline),
    ("index_references", index_references),
    ("write_sections", write_sections),
    ("write_article", write_article),
]
for i in range(len(nodes)):
    name, node = nodes[i]
    builder_of_storm.add_node(name, node)
    if i > 0:
        builder_of_storm.add_edge(nodes[i - 1][0], name)

builder_of_storm.set_entry_point(nodes[0][0])
builder_of_storm.set_finish_point(nodes[-1][0])
storm = builder_of_storm.compile()

In [59]:
async for step in storm.astream(
    {
        "topic": "NVIDIA 2024 Q1 earnings report",
    }
):
    name = next(iter(step))
    print(name)
    print("-- ", str(step[name])[:300])
    if END in step:
        results = step

init_research
--  {'topic': 'NVIDIA 2024 Q1 earnings report', 'outline': Outline(page_title='NVIDIA 2024 Q1 Earnings Report', sections=[Section(section_title='Overview', description='Brief introduction to NVIDIA and an overview of the 2024 Q1 earnings report', subsections=None), Section(section_title='Financial Perfo
conduct_interviews
--  {'topic': 'NVIDIA 2024 Q1 earnings report', 'outline': Outline(page_title='NVIDIA 2024 Q1 Earnings Report', sections=[Section(section_title='Overview', description='Brief introduction to NVIDIA and an overview of the 2024 Q1 earnings report', subsections=None), Section(section_title='Financial Perfo
refine_outline
--  {'topic': 'NVIDIA 2024 Q1 earnings report', 'outline': Outline(page_title='Impact of Million-Plus Token Context Window Language Models on RAG', sections=[Section(section_title='Introduction', description='An overview of the article, including the significance of million-plus token context window lan
index_references
--  {'topic': 'NVIDI

In [62]:
article = results[END]["article"]

In [65]:
print(article)

# Impact of Million-Plus Token Context Window Language Models on RAG

The **Impact of Million-Plus Token Context Window Language Models on Retrieval-Augmented Generation (RAG)** reflects a significant advancement in the field of natural language processing (NLP) and artificial intelligence (AI). This development has broadened the capabilities of AI systems in understanding and generating human-like text by integrating large-scale language models with external knowledge sources to produce contextually rich responses.

## Contents

- [Introduction](#Introduction)
- [Background](#Background)
- [Impact of Million-Plus Token Context Window Language Models on RAG](#Impact-of-Million-Plus-Token-Context-Window-Language-Models-on-RAG)
  - [The Evolution of Large Context Windows in Language Models](#The-Evolution-of-Large-Context-Windows-in-Language-Models)
  - [Integration Challenges with RAG](#Integration-Challenges-with-RAG)
  - [Enhancing RAG with External Knowledge Sources](#Enhancing-RAG-w