# STORM

[STORM](https://arxiv.org/abs/2402.14207) is a research assistant by Shao, et. al that extends the idea of "outline-driven RAG" for richer article generation.

It is tasked with generating Wikipedia-like ariticles on a user-provided topic.  It has  a few main stages:

1. Survey related subjects
2. Identify perspectives
3. "Expert Interviews" (between the writer and an agent role-playing as a perspective)
4. Refine article 

The expert interviews stage ocurrs between the article writer and each role-playing agent and itself is a loop, where the "expert" is able to query external knowledge and respond to pointed questions.

Couple hyperparameters to restrict the infinite research breadth:

N: Number of perspectives to survey / use (2->3)
M: Max number of conversation turns in step (3)

The paper uses DSPY and few-shot examples to adapt but we'll just use functioncalling here.

In [None]:
# %pip install langchain_community langchain_openai langgraph wikipedia tavily-python scikit-learn

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List, Optional
from langchain_core.prompts import ChatPromptTemplate

direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a Wikipedia writer. Write an outline for a Wikipedia page about a user-provided topic. Be comprehensive and specific.",
        ),
        ("user", "{topic}"),
    ]
)


class Subsection(BaseModel):
    subsection_title: str = Field(..., title="Title of the subsection")
    description: str = Field(..., title="Content of the subsection")

    @property
    def as_str(self) -> str:
        return f"### {self.subsection_title}\n\n{self.description}".strip()


class Section(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    description: str = Field(..., title="Content of the section")
    subsections: Optional[List[Subsection]] = Field(
        default=None,
        title="Titles and descriptions for each subsection of the Wikipedia page.",
    )

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            f"### {subsection.subsection_title}\n\n{subsection.description}"
            for subsection in self.subsections  or []
        )
        return f"## {self.section_title}\n\n{self.description}\n\n{subsections}".strip()


class Outline(BaseModel):
    page_title: str = Field(..., title="Title of the Wikipedia page")
    sections: List[Section] = Field(
        default_factory=list, title="Titles and descriptions for each section of the Wikipedia page."
    )

    @property
    def as_str(self) -> str:
        sections = "\n\n".join(section.as_str for section in self.sections)
        return f"# {self.page_title}\n\n{sections}".strip()


generate_outline_direct = direct_gen_outline_prompt | ChatOpenAI(
    model="gpt-3.5-turbo"
).with_structured_output(Outline)

  warn_beta(


In [2]:
example_topic = "Impact of million-plus token context window language models on RAG"

initial_outline = generate_outline_direct.invoke({"topic": example_topic})

print(initial_outline.as_str)

# Impact of million-plus token context window language models on RAG

## Introduction

Overview of million-plus token context window language models and RAG (Retrieval-Augmented Generation).

## Million-Plus Token Context Window Language Models

Explanation of million-plus token context window language models, including architecture, training data, benefits, and challenges.

## RAG (Retrieval-Augmented Generation)

Explanation of RAG, its components, and how it integrates with million-plus token context window language models.

## Impact on RAG

Discuss the implications of using million-plus token context window language models with RAG, including improvements in performance, challenges, and future research directions.


## Expand Topics

While language models do store some Wikipedia-like knowledge in their parameters, you will get better results by incorporating relevant and recent information using a search engine.

We will start our search by generating a list of related topics.

In [3]:
gen_related_topics_prompt = ChatPromptTemplate.from_template(
    """I'm writing a Wikipedia page for a topic mentioned below. Please identify and recommend some Wikipedia pages on closely related subjects. I'm looking for examples that provide insights into interesting aspects commonly associated with this topic, or examples that help me understand the typical content and structure included in Wikipedia pages for similar topics.

Please list the as many subjects and urls as you can.

Topic of interest: {topic}
"""
)

class RelatedSubjects(BaseModel):
    topics: List[str] = Field(
        description="Comprehensive list of related subjects as background research.",
    )


expand_chain = (
   gen_related_topics_prompt 
    | ChatOpenAI(model="gpt-3.5-turbo").with_structured_output(RelatedSubjects)
)


In [4]:
related_subjects = await expand_chain.ainvoke({"topic": example_topic})
related_subjects

RelatedSubjects(topics=['Impact of million-plus token context window language models', 'RAG'])

In [5]:
class Editor(BaseModel):
    affiliation: str = Field(
        description="Primary affiliation of the editor.",
    )
    name: str = Field(
        description="Name of the editor.",
    )
    role: str = Field(
        description="Role of the editor in the context of the topic.",
    )
    description: str = Field(
        description="Description of the editor's focus, concerns, and motives.",
    )

    @property
    def persona(self) -> str:
        return f"Name: {self.name}\nRole: {self.role}\nAffiliation: {self.affiliation}\nDescription: {self.description}\n"


class Perspectives(BaseModel):
    editors: List[Editor] = Field(
        description="Comprehensive list of editors with their roles and affiliations.",
        # Add a pydantic validation/restriction to be at most M editors
    )


gen_perspectives_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You need to select a diverse (and distinct) group of Wikipedia editors who will work together to create a comprehensive article on the topic. Each of them represents a different perspective, role, or affiliation related to this topic.\
    You can use other Wikipedia pages of related topics for inspiration. For each editor, add a description of what they will focus on.

    Wiki page outlines of related topics for inspiration:
    {examples}""",
        ),
        ("user", "Topic of interest: {topic}"),
    ]
)

gen_perspectives_chain = gen_perspectives_prompt | ChatOpenAI(
    model="gpt-3.5-turbo"
).with_structured_output(Perspectives)

In [6]:
from langchain_community.retrievers import WikipediaRetriever
from langchain_core.runnables import RunnableLambda, chain as as_runnable

wikipedia_retriever = WikipediaRetriever(load_all_available_meta=True, top_k_results=1)


def format_doc(doc, max_length=1000):
    related = "- ".join(doc.metadata["categories"])
    return f"### {doc.metadata['title']}\n\nSummary: {doc.page_content}\n\nRelated\n{related}"[
        :max_length
    ]


def format_docs(docs):
    return "\n\n".join(format_doc(doc) for doc in docs)


@as_runnable
async def survey_subjects(topic: str):
    related_subjects = await expand_chain.ainvoke({"topic": topic})
    retrieved_docs = await wikipedia_retriever.abatch(related_subjects.topics, return_exceptions=True)
    all_docs = []
    for docs in retrieved_docs:
        if isinstance(docs, BaseException):
            continue
        all_docs.extend(docs)
    formatted = format_docs(all_docs)
    return await gen_perspectives_chain.ainvoke({"examples": formatted, "topic": topic})

In [7]:
perspectives = await survey_subjects.ainvoke(example_topic)

In [8]:
perspectives.dict()

{'editors': [{'affiliation': 'Language model research institute',
   'name': 'Alice',
   'role': 'Language model researcher',
   'description': 'Alice is a language model researcher focusing on the impact of million-plus token context window language models on RAG. She is interested in analyzing the advancements in language models and their implications on the RAG (Retrieve, Analyze, Generate) framework.'},
  {'affiliation': 'Vector database company',
   'name': 'Bob',
   'role': 'Vector database expert',
   'description': 'Bob is a vector database expert who will provide insights on how vector databases can support the storage and retrieval of large language models like the million-plus token context window language models used in RAG systems.'},
  {'affiliation': 'Natural language processing organization',
   'name': 'Charlie',
   'role': 'NLP specialist',
   'description': 'Charlie is an NLP specialist who will focus on the application of million-plus token context window language m

## Expert Dialog

Now the true fun begins, the wikipedia writer will "talk" with expert agents primed to role-play using the perspectives presented above.

In [9]:
from langgraph.graph import StateGraph, END
from typing_extensions import TypedDict
from langchain_core.messages import AnyMessage
from typing import Annotated, Sequence


def add_messages(left, right):
    if not isinstance(left, list):
        left = [left]
    if not isinstance(right, list):
        right = [right]
    return left + right


def update_references(references, new_references):
    if not references:
        references = {}
    references.update(new_references)
    return references


def update_editor(editor, new_editor):
    # Can only set at the outset
    if not editor:
        return new_editor
    return editor


class InterviewState(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]
    references: Annotated[Optional[dict], update_references]
    editor: Annotated[Optional[Editor], update_editor]

In [10]:
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, ToolMessage


gen_qn_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an experienced Wikipedia writer and want to edit a specific page. \
Besides your identity as a Wikipedia writer, you have a specific focus when researching the topic. \
Now, you are chatting with an expert to get information. Ask good questions to get more useful information.

When you have no more questions to ask, say "Thank you so much for your help!" to end the conversation.\
Please only ask one question at a time and don't ask what you have asked before.\
Your questions should be related to the topic you want to write.
Be comprehensive and curious, gaining as much unique insight from the expert as possible.\

Stay true to your specific perspective:

{persona}""",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)


def tag_with_name(ai_message: AIMessage, name: str):
    ai_message.name = name
    return ai_message


def swap_roles(state: InterviewState, name: str):
    converted = []
    for message in state["messages"]:
        if isinstance(message, AIMessage) and message.name != name:
            message = HumanMessage(**message.dict(exclude={"type"}))
        converted.append(message)
    return {"messages": converted}


@as_runnable
async def generate_question(state: InterviewState):
    editor = state["editor"]
    gn_chain = (
        RunnableLambda(swap_roles).bind(name=editor.name)
        | gen_qn_prompt.partial(persona=editor.persona)
        | ChatOpenAI(model="gpt-3.5-turbo")
        | RunnableLambda(tag_with_name).bind(name=editor.name)
    )
    result = await gn_chain.ainvoke(state)
    return {
        "messages": [result]
    }

In [11]:
messages = [
    HumanMessage(f"So you said you were writing an article on {example_topic}?")
]
question = await generate_question.ainvoke(
    {
        "editor": perspectives.editors[0],
        "messages": messages,
    }
)

question["messages"][0].content

"Yes, that's correct. I am focusing on the impact of million-plus token context window language models on the RAG framework. These language models have significantly expanded the scope of information that can be considered when retrieving, analyzing, and generating content. Is there a specific aspect of this topic that you would like to explore further?"

In [12]:
class Queries(BaseModel):
    queries: List[str] = Field(
        description="Comprehensive list of search engine queries to answer the user's questions.",
    )


gen_queries_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful research assistant. Query the search engine to answer the user's questions.",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)
gen_queries_chain = gen_queries_prompt | ChatOpenAI(
    model="gpt-3.5-turbo"
).with_structured_output(Queries, include_raw=True)

In [13]:
queries = await gen_queries_chain.ainvoke({"messages": [HumanMessage(content=question["messages"][0].content)]})
queries['parsed'].queries

['Impact of million-plus token context window language models on the RAG framework',
 'Benefits of using million-plus token context window language models in the RAG framework',
 'Challenges of integrating million-plus token context window language models with the RAG framework',
 'Comparison of different million-plus token context window language models in the RAG framework']

In [14]:
class AnswerWithCitations(BaseModel):
    answer: str = Field(
        description="Comprehensive answer to the user's question with citations.",
    )
    cited_urls: List[str] = Field(
        description="List of urls cited in the answer.",
    )

    @property
    def as_str(self) -> str:
        return f"{self.answer}\n\nCitations:\n\n" + "\n".join(f"[{i+1}]: {url}" for i, url in enumerate(self.cited_urls))

gen_answer_prompt = ChatPromptTemplate.from_messages(
    [("system", """You are an expert who can use information effectively. You are chatting with a Wikipedia writer who wants\
 to write a Wikipedia page on the topic you know. You have gathered the related information and will now use the information to form a response.

Make your response as informative as possible and make sure every sentence is supported by the gathered information.
Each response must be backed up by a citation from a reliable source, formatted as a footnote, reproducing the URLS after your response."""),
    MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)

gen_answer_chain = gen_answer_prompt | ChatOpenAI(model="gpt-3.5-turbo").with_structured_output(AnswerWithCitations, include_raw=True)

#### Reference Store

The research process uncovers a large number of reference documents that we may want to query during the final article-writing process.
Here, we will createa multi-vector retriever and store all the searched documents inline.

In [15]:
from langchain_community.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper
search_engine = DuckDuckGoSearchAPIWrapper()
from langchain_core.tools import tool
# TODO: remove when i get my api limit bumped
@tool
async def search_engine(query: str):
    """Search engine to the internet."""
    results = DuckDuckGoSearchAPIWrapper()._ddgs_text("beijing olympics")
    return [
        {"content": r["body"],
    "url": r["href"]} for r in results]


In [16]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.runnables import RunnableConfig
import json

# search_engine = TavilySearchResults(max_results=4)


async def gen_answer(
    state: InterviewState,
    config: RunnableConfig | None = None,
    name: str = "Subject Matter Expert",
    max_str_len: int = 15000,
):
    swapped_state = swap_roles(state, name)  # Convert all other AI messages
    queries = await gen_queries_chain.ainvoke(swapped_state)
    query_results = await search_engine.abatch(queries["parsed"].queries, config, return_exceptions=True)
    successful_results = [res for res in query_results if not isinstance(res, Exception)]
    all_query_results = {res["url"]: res["content"]  for results in successful_results for res in results}
    # We could be more precise about handling max token length if we wanted to here
    dumped = json.dumps(all_query_results)[:max_str_len]
    ai_message: AIMessage = queries["raw"]
    tool_call = queries["raw"].additional_kwargs["tool_calls"][0]
    tool_id = tool_call["id"]
    tool_message = ToolMessage(
        tool_call_id=tool_id,
        content=dumped
    )
    swapped_state["messages"].extend([ai_message, tool_message])
    # Only update the shared state with the final answer to avoid
    # polluting the dialogue history with intermediate messages
    generated = await gen_answer_chain.ainvoke(swapped_state)
    cited_urls = set(generated["parsed"].cited_urls)
    # Save the retrieved information to a the shared state for future reference
    cited_references = {k: v for k, v in all_query_results.items() if k in cited_urls}
    formatted_message = AIMessage(name=name, content=generated["parsed"].as_str)
    return {
        "messages": [formatted_message],
        "references": cited_references
    }

In [17]:
example_answer = await gen_answer(
    {"messages": [HumanMessage(content=question["messages"][0].content)]}
)
example_answer['messages'][-1].content

'Million-plus token context window language models have revolutionized the RAG (Retrieval-Augmented Generation) framework by significantly expanding the scope of information considered during content retrieval, analysis, and generation. These advanced language models, with their extensive context windows, allow for more nuanced understanding of text and context, leading to more accurate and contextually relevant content generation within the RAG framework.\n\nCitations:\n\n[1]: https://www.teamusa.com/olympic-games-beijing-2022\n[2]: https://apnews.com/article/beijing-olympics-nhl-d32ef9ddd57b6be68f3ae5b55b47c3c4\n[3]: https://www.britannica.com/topic/2008-Beijing-Olympic-Games-1702245\n[4]: https://olympics.com/ioc/news/final-report-highlights-legacy-of-olympic-winter-games-beijing-2022'

In [18]:
max_num_turns = 5


def route_messages(state: InterviewState, name: str = "Subject Matter Expert"):
    messages = state["messages"]
    num_responses = len(
        [m for m in messages if isinstance(m, AIMessage) and m.name == name]
    )
    if num_responses >= max_num_turns:
        return END
    last_question = messages[-2]
    if last_question.content.endswith("Thank you so much for your help!"):
        return END
    return "ask_question"

builder = StateGraph(InterviewState)

builder.add_node("ask_question", generate_question)
builder.add_node("answer_question", gen_answer)
builder.add_conditional_edges("answer_question", route_messages)
builder.add_edge("ask_question", "answer_question")

builder.set_entry_point("ask_question")
interview_graph = builder.compile().with_config(run_name="Conduct Interviews")

In [None]:
final_step = None

initial_state = {
    "editor": perspectives.editors[0],
    "messages": [AIMessage(content=f"So you said you were writing an article on {example_topic}?", name="Subject Matter Expert")]
}
async for step in interview_graph.astream(initial_state):
    name = next(iter(step))
    print(name)
    print("-- ", str(step[name]['messages'])[:300])
    if END in step:
        final_step = step

ask_question
--  [AIMessage(content="Yes, that's correct. I am interested in understanding how million-plus token context window language models are influencing the RAG (Retrieve, Analyze, Generate) framework. Do you have insights on how these advanced language models are changing the way information is retrieved, a
answer_question
--  [AIMessage(content='Million-plus token context window language models have significantly impacted the RAG (Retrieve, Analyze, Generate) framework in natural language processing tasks. These advanced language models, such as GPT-3 with 175 billion parameters, have revolutionized the way information i


In [None]:
final_state = next(iter(final_step.values()))

## Refine Outline

Now that we have all this cool stuff, let's distill it into a refined outline.

In [None]:
refine_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a Wikipedia writer. You have gathered information from experts and search engines. Now, you are refining the outline of the Wikipedia page.\
You need to make sure that the outline is comprehensive and specific.\
Topic you are writing about: {topic}\
Old outline: {old_outline}""",
        ),
        ("user", "Refine the outline based on your conversations with subject-matter experts:\n\nConversations:\n\n{conversations}\n\nWrite the refined Wikipedia outline:"),
    ]
)

# Using turbo preview since the context can get quite long
refine_outline_chain = refine_outline_prompt | ChatOpenAI(model="gpt-4-turbo-preview").with_structured_output(Outline)


In [None]:
refined_outline = refine_outline_chain.invoke(
    {
        "topic": example_topic,
        "old_outline": initial_outline.as_str,
        "conversations": "\n\n".join(
            f"### {m.name}\n\n{m.content}" for m in final_state["messages"]
        ),
    }
)

In [None]:
print(refined_outline.as_str)

## Generate Article

Now it's time to generate the full article. We will divide-and-conquer, so that each section can be tackled by an individual llm.

In [None]:
class SubSection(BaseModel):
    subsection_title: str = Field(..., title="Title of the subsection")
    content: str = Field(..., title="Full content of the subsection. Include [#] citations to the cited sources where relevant.")

    @property
    def as_str(self) -> str:
        
        return f"### {self.subsection_title}\n\n{self.content}".strip()


class WikiSection(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    content: str = Field(..., title="Full content of the section")
    subsections: Optional[List[Subsection]] = Field(
        default=None,
        title="Titles and descriptions for each subsection of the Wikipedia page.",
    )
    citations: List[str] = Field(default_factory=list)

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            subsection.as_str
            for subsection in self.subsections  or []
        )
        citations = "\n".join([f" [{i}] {cit}" for i, cit in enumerate(self.citations)])
        return f"## {self.section_title}\n\n{self.content}\n\n{subsections}".strip() + f"\n\n{citations}".strip()

In [None]:
from langchain_core.documents import Document

from langchain_community.vectorstores import SKLearnVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
reference_docs = [Document(page_content=v, metadata={"source": k}) for k, v in final_state["references"].items()]
# This really doesn't need to be a vectorstore.
# could just be a numpy matrix
vectorstore = SKLearnVectorStore.from_documents(
    reference_docs,
    embedding=embeddings,
)
retriever = vectorstore.as_retriever(k=10)

In [None]:
section_writer_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert Wikipedia writer. Complete your assigned WikiSection from the following outline:\n\n"
         "{outline}\n\nCite your sources, using the following references:\n\n<Documents>\n{docs}\n<Documents>"),
        ("user", "Write the full WikiSection for the {section} section.")
    ]
)


async def retrieve(inputs: dict):
    docs = await retriever.ainvoke(inputs['topic'] + ": " + inputs["section"])
    formatted = "\n".join([f'<Document href="{doc.metadata["source"]}"/>\n{doc.page_content}\n</Document>' for doc in docs])
    return {
        "docs": formatted,
        **inputs
    }

section_writer = (
    retrieve
    | section_writer_prompt
    | ChatOpenAI(model="gpt-4-turbo-preview").with_structured_output(WikiSection)
)

In [None]:
section = await section_writer.abatch(
    {"outline": refined_outline.as_str,
     "section" : refined_outline.sections[1].section_title,
     "topic": example_topic,
}
)
print(section.as_str)

In [None]:
print(section.as_str)

### Generate final article

In [None]:
from langchain_core.output_parsers import StrOutputParser
writer_prompt = ChatPromptTemplate.from_messages(
     [
        ("system", "You are an expert Wikipedia author. Write the complete wiki article on {topic} using the following section drafts:\n\n"
         "{draft}\n\nStrictly follow Wikipedia format guidelines."),
        ("user", 'Write the complete Wiki article using markdown format. Organize citations using footnotes like "[1]", avoiding duplicates in the footer.')
    ]
)

writer = writer_prompt | ChatOpenAI(model="gpt-4-turbo-preview") | StrOutputParser()

In [None]:
for tok in writer.stream({"topic": example_topic, "draft": section.as_str}):
    print(tok, end="")

## Final Flow

Now it's time to string everything together. We will have 3 main stages in sequence:
.
1. Generate the initial outline + perspectives
2. Batch converse with each perspective to expand the content for the article.
3. Refine the outline based on the conversations
4. Write the final wiki.

In [None]:
class ResearchState(TypedDict):
    topic: str
    outline: Outline
    editors: List[Editor]
    interview_results: List[InterviewState]
    # The final sections output
    sections: List[WikiSection]
    article: str

In [None]:
import asyncio

async def initialize_research(state: ResearchState):
    topic = state["topic"]
    coros = (
        generate_outline_direct.ainvoke({"topic": topic}),
        survey_subjects.ainvoke(topic)
    )
    results = await asyncio.gather(*coros)
    return {
        **state,
        "outline": results[0],
        "editors": results[1].editors,
    }

async def conduct_interviews(state: ResearchState):
    topic = state["topic"]
    initial_states = [{
        "editor": editor,
        "messages": [AIMessage(content=f"So you said you were writing an article on {topic}?", name="Subject Matter Expert")]
    } for editor in state["editors"]]
    # We call in to the sub-graph here
    interview_results = await interview_graph.abatch(initial_states)
        
    return {
        **state,
        "interview_results": interview_results,
    }


def format_conversation(interview_state):
    messages = interview_state["messages"]
    convo = "\n".join(
        f"{m.name}: {m.content}" for m in final_state["messages"]
    )
    return f'Conversation with {interview_state["editor"].name}\n\n' + convo


async def refine_outline(state: ResearchState):
    convos = "\n\n".join([format_conversation(interview_state) for interview_state in state["interview_results"]])
    
    updated_outline = await refine_outline_chain.ainvoke(
            {
            "topic": state["topic"],
            "old_outline": state["outline"].as_str,
            "conversations": convos,
        }
    )
    return {
        **state,
        "outline": updated_outline
    }

async def index_references(state: ResearchState):
    all_docs = []
    for interview_state in state["interview_results"]:
        reference_docs = [Document(page_content=v, metadata={"source": k}) for k, v in interview_state["references"].items()]
        all_docs.extend(reference_docs)
    await vectorstore.aadd_documents(all_docs)
    return state

async def write_sections(state: ResearchState):
    outline = state["outline"]
    sections = await section_writer.abatch(
        [
            {
                "outline": refined_outline.as_str,
                "section": section.section_title,
                "topic": state["topic"],
            }
            for section in outline.sections
        ]
    )
    return {
        **state,
        "sections": sections,
    }

async def write_article(state: ResearchState):
    topic = state["topic"]
    sections = state["sections"]
    draft = "\n\n".join([section.as_str for section in sections])
    article = writer.ainvoke({"topic": example_topic, "draft": draft})
    return {
        **state,
        "article": article,
    }

In [None]:
builder_of_storm = StateGraph(ResearchState)


nodes = [
    ("init_research", initialize_research),
    ("conduct_interviews", conduct_interviews),
    ("refine_outline", refine_outline),
    ("index_references", index_references),
    ("write_sections", write_sections),
    ("write_article", write_article),
]
for i in range(len(nodes)):
    name, node = nodes[i]
    builder_of_storm.add_node(name, node)
    if i > 0:
        builder_of_storm.add_edge(nodes[i-1][0], name)

builder_of_storm.set_entry_point(nodes[0][0])
builder_of_storm.set_finish_point(nodes[-1][0])
storm = builder_of_storm.compile()

In [None]:
async for step in storm.astream({
        "topic": "NVIDIA 2024 Q1 earnings report",
    }):
    name = next(iter(step))
    print(name)
    print("-- ", str(step[name])[:300])
    if END in step:
        results = step