<br>
<a href="https://www.nvidia.com/en-us/training/">
    <div style="width: 55%; background-color: white; margin-top: 50px;">
    <img src="https://dli-lms.s3.amazonaws.com/assets/general/nvidia-logo.png"
         width="400"
         height="186"
         style="margin: 0px -25px -5px; width: 300px"/>
</a>
<h1 style="line-height: 1.4;"><font color="#76b900"><b>Building Agentic AI Applications with LLMs</b></font></h1>
<h2><b>Assessment:</b> Creating A Basic Researching Agent</h2>
<br>

**Welcome To The Assessment!** We hope you're ready to apply some of the skills you've learned so far to build something you've probably seen floating around: a "researching" chatbot. The overall idea should be pretty familiar:

- **The chatbot should look at your question and search the internet for relevant resources.**
- **Based on those resources, the chatbot should make an educated guess at the answer.**

This pattern is often implemented in conjunction with LLM interfaces like ChatGPT and Perplexity, and various open-source efforts have cropped up to simplify the process. That said, these systems usually don't rely on models as small as 8B, since such models are finicky to route properly. As such, we will only be testing your ability to implement the following primitives:
- **A structured output interface to produce a parseable list.**
- **A function to search for web snippets and filter out the most relevant results.**
- **A mechanism for accumulating messages outside of the user's direct control.**
- **Some basic prompt engineering artifacts.**

Of note, there are many extensions you can probably imagine at this point. Perhaps a requerying mechanism somewhere? Or maybe the user or the agent could critique and remove entries from the history? Long-term memory does sound appealing, after all. However, we will focus only on the simple required features, for two key reasons:
- **First, we really don't want to force you to do more engineering than you have to.** Frameworks like LangGraph have many levers and introduce new primitives quickly in an attempt to simplify their interfaces, so anything we overengineer now may be superseded by simpler off-the-shelf options by the time you read this.
- **Second, our Llama-3.1-8B model inherently makes this more challenging due to its limitations.** This level of challenge is important to understand and work with, since working within it better equips you to decompose harder challenges and leverage your tools to their fullest as you scale up. That said, a multi-turn, long-term-memory research agent built on Llama-8B is quite tedious at the moment, since many of the streamlined interfaces assume a stronger model.

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_nvidia import ChatNVIDIA

llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct", base_url="http://llm_client:9000/v1")

<hr><br>

## **Part 1:** Define The Planner

For the initial system, please make a minimum-viable "supervisor"-style component that tries to delegate tasks. This is a very loose definition, so a module that simply generates a list of tasks technically qualifies. So let's start with that!
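Whatever schema you choose, the planner's structured output ultimately has to be a parseable list. As a minimal, framework-free sketch of that contract, assuming the model emits JSON shaped like `{"steps": [...]}` (the `"steps"` key the streaming loop in the next cell reads), the hypothetical `PlanSketch` dataclass and `parse_plan` helper below validate and clean such output. In the notebook you would more likely express the same constraints on the Pydantic `Plan` model; the `max_steps` cap is an illustrative choice, not a course requirement.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PlanSketch:
    # Hypothetical stand-in for the Pydantic `Plan` model: an ordered list of steps.
    steps: List[str]


def parse_plan(raw: str, max_steps: int = 5) -> PlanSketch:
    """Parse planner output like '{"steps": ["...", "..."]}' into a PlanSketch."""
    data = json.loads(raw)
    steps = data.get("steps") if isinstance(data, dict) else None
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError("expected a JSON object with a list of strings under 'steps'")
    # Drop blank entries and cap the length so a small model cannot ramble on.
    cleaned = [s.strip() for s in steps if s.strip()]
    return PlanSketch(steps=cleaned[:max_steps])


raw_output = '{"steps": ["Find the LangGraph docs", "  ", "Summarize the core concepts"]}'
print(parse_plan(raw_output).steps)
# ['Find the LangGraph docs', 'Summarize the core concepts']
```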

In [None]:
from pydantic import BaseModel, Field
from functools import partial
from typing import List

from course_utils import SCHEMA_HINT

##################################################################
## TODO: Create an LLM client with the sole intention of generating a plan.

class Plan(BaseModel):
    ## TODO: Define a variable of choice, including useful prompt engineering/restrictions
    pass

planning_prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You are a master planner system who charts out a plan for how to solve a problem."
        ## TODO: Perform some more prompt engineering. Maybe consider including the schema_hint
    )),
    ("placeholder", "{messages}"),
])

## TODO: Construct the necessary components to create the chain
planning_chain = None

input_msgs = {"messages": [("user", "Can you help me learn more about LangGraph?")]}

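## For convenience, we have defined a streaming loop that prints each plan step as it is generated
step_buffer = []
for chunk in planning_chain.stream(input_msgs):
    steps = chunk.get("steps") if isinstance(chunk, dict) else None
    if not steps:
        continue  ## early partial chunks may not contain any steps yet
    if len(steps) > len(step_buffer):
        if step_buffer:
            ## Finish the previous step (it may have gained text in this chunk), then end its line
            prev = steps[len(step_buffer) - 1]
            print(prev[len(step_buffer[-1]):], flush=True)
            step_buffer[-1] = prev
        step_buffer += [""]
        print(" - ", end='', flush=True)
    ## Print only the newly generated suffix of the current step
    new_text = steps[-1][len(step_buffer[-1]):]
    step_buffer[-1] = steps[-1]
    print(new_text, end="", flush=True)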

<br>

To help modularize this process for later, feel free to use this generator wrapper. It performs the same process, but yields each completed step to its caller for processing:

In [None]:
def generate_thoughts(input_msgs, config=None):
    step_buffer = [""]
    for chunk in planning_chain.stream(input_msgs, config=config):
        steps = chunk.get("steps") if isinstance(chunk, dict) else None
        if not steps:
            continue  ## early partial chunks may not contain any steps yet
        if len(steps) > len(step_buffer):
            ## The list grew, so the previous step is complete; take its final text from this chunk
            yield steps[len(step_buffer) - 1]
            step_buffer += [""]
        step_buffer[-1] = steps[-1]
    yield step_buffer[-1]
    print("FINISHED", flush=True)

from time import sleep

for thought in generate_thoughts(input_msgs):
    
    print("-", thought)
    
    ## Example Use-Case: Slowing down the generation
    # for token in thought:
    #     print(token, end="", flush=True)
    #     sleep(0.02)
    # print(flush=True)
    # sleep(0.1)
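Because the parser streams cumulative snapshots of a partially parsed JSON object, a step is only known to be finished once a later step appears or the stream ends. The stdlib sketch below isolates that logic from the LLM by feeding a hand-written list of snapshots into a hypothetical `completed_steps` generator. Unlike `generate_thoughts`, it also copes with a snapshot that adds more than one step at once; the snapshot contents are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def completed_steps(snapshots: Iterable[Dict[str, List[str]]]) -> Iterator[str]:
    """Yield each plan step exactly once, as soon as it can no longer change."""
    emitted = 0               # number of steps already yielded
    latest: List[str] = []    # most recent non-empty snapshot of the steps list
    for snap in snapshots:
        steps = snap.get("steps") or []
        # Every step except the last one in a snapshot is final.
        while emitted < len(steps) - 1:
            yield steps[emitted]
            emitted += 1
        if steps:
            latest = steps
    # Once the stream ends, the trailing step is final as well.
    while emitted < len(latest):
        yield latest[emitted]
        emitted += 1


# Hand-written snapshots mimicking what a streaming JSON parser might emit.
fake_stream = [
    {},
    {"steps": []},
    {"steps": ["Find"]},
    {"steps": ["Find the docs"]},
    {"steps": ["Find the docs", "Read"]},
    {"steps": ["Find the docs", "Read the tutorial"]},
]
for step in completed_steps(fake_stream):
    print("-", step)
```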

<hr><br>

## **Part 2:** Define The Retrieval Sub-Process Mechanism

Now that we have a list of steps to consider, let's use them as the basis for searching the internet. Implement a search mechanism of your choice, and try to parallelize or batch the process if possible.

- Feel free to implement `search_internet` and `retrieve_via_query` in a manner consistent with the warm-up (`DDGS` + `NVIDIARerank`), or write up your own scheme that you think would be interesting. One option is a loop (agent-as-a-tool?) where you search, expand context, filter, and search again: conceptually simple, but more involved to implement.
- You may use the `tools` format if you want, but it is not necessary. Do whatever you find interesting.
- Our solutions used `RunnableLambda(...).batch` at some point. Some solutions may also leverage `RunnableParallel`. Either may be useful, but neither is required.
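One way to batch the searches while avoiding duplicate work is to de-duplicate the step queries, fan them out across a thread pool, and memoize the search function with `functools.cache`. The sketch below swaps in `concurrent.futures.ThreadPoolExecutor` for `RunnableLambda(...).batch` so it runs without network access. `fake_search` and `batch_search` are hypothetical names, `fake_search` stands in for a real backend such as `DDGS`, and its `title`/`body` snippet dictionaries are likewise illustrative.

```python
import functools
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Snippet = Dict[str, str]
CALLS: List[str] = []  # records which queries actually reached the (fake) backend


@functools.cache  # repeated queries are answered from the cache instead of re-searching
def fake_search(query: str) -> Tuple[Snippet, ...]:
    """Hypothetical stand-in for a real web search call."""
    CALLS.append(query)
    # Return a tuple so callers cannot append to the cached container.
    return tuple(
        {"title": f"Result {i} for {query}", "body": f"Snippet about {query}"}
        for i in range(2)
    )


def batch_search(steps: List[str], max_workers: int = 4) -> List[Tuple[Snippet, ...]]:
    # De-duplicate first so identical steps are searched once per batch,
    # since the cache alone would not stop two threads racing on a new query.
    unique = list(dict.fromkeys(steps))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        by_query = dict(zip(unique, pool.map(fake_search, unique)))
    # One result set per input step, in the original order.
    return [by_query[step] for step in steps]


steps = ["LangGraph overview", "LangGraph memory", "LangGraph overview"]
retrievals = batch_search(steps)
print(len(retrievals), len(CALLS))  # 3 2
batch_search(steps)                 # served entirely from the cache
print(len(CALLS))                   # 2
```

To use this pattern for real, you could replace the body of `fake_search` with an actual search call and keep `batch_search` unchanged. The explicit de-duplication matters because `functools.cache` does not lock while a call is in flight, so two threads that request the same new query at the same time could both hit the backend.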

In [None]:
# from langchain_core.runnables import RunnableLambda
# from ddgs import DDGS
# import functools

####################################################################
## TODO: Implement a "step researcher" mechanism of choice
## We incorporated a 2-step process similar to the example notebook.

# @functools.cache  # <- useful for caching duplicate results
# def search_internet(final_query: str): 
#     ## OPTIONAL: We ended up defining this method
#     pass 
     
def research_options(steps):
    return [] ## TODO

search_retrievals = research_options(step_buffer)
# search_retrievals

In [None]:
# from langchain_nvidia import NVIDIARerank
# from langchain_core.documents import Document

## Optional Scaffold
def retrieve_via_query(context_rets, query: str, k=5):
    return [] ## TODO

filtered_results = [retrieve_via_query(search_retrievals, step) for step in step_buffer]
# filtered_results
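After searching, `retrieve_via_query` is meant to narrow the pooled snippets down to the few most relevant to each step. The sketch below shows the shape of that filter with a toy word-overlap (Jaccard) score standing in for `NVIDIARerank`. The `top_k_snippets` name and the `title`/`body` snippet dictionaries are assumptions for illustration, and a real reranker will judge relevance far better than word overlap.

```python
import re
from typing import Dict, List, Sequence

Snippet = Dict[str, str]


def _tokens(text: str) -> set:
    # Lower-cased alphanumeric words; crude, but enough to illustrate scoring.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_snippets(context_rets: Sequence[Sequence[Snippet]], query: str, k: int = 5) -> List[Snippet]:
    """Pool every search result, drop duplicates, and keep the k best matches for `query`."""
    pool: List[Snippet] = []
    seen = set()
    for results in context_rets:
        for snippet in results:
            text = f"{snippet.get('title', '')} {snippet.get('body', '')}"
            if text not in seen:  # several searches often return the same page
                seen.add(text)
                pool.append(snippet)

    query_words = _tokens(query)

    def score(snippet: Snippet) -> float:
        # Jaccard overlap between query words and snippet words (toy reranker).
        words = _tokens(f"{snippet.get('title', '')} {snippet.get('body', '')}")
        union = query_words | words
        return len(query_words & words) / len(union) if union else 0.0

    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(pool, key=score, reverse=True)[:k]


demo_retrievals = [
    [{"title": "LangGraph memory", "body": "How checkpointers persist state"}],
    [
        {"title": "Cooking pasta", "body": "Boil water first"},
        {"title": "LangGraph memory", "body": "How checkpointers persist state"},
    ],
]
best = top_k_snippets(demo_retrievals, "LangGraph memory checkpointers", k=1)
print(best[0]["title"])  # LangGraph memory
```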

<hr><br>

## **Part 3:** Creating The Research Pipeline

Now that we have a minimum-viable supervisor/subordinate system, let's orchestrate its components in an interesting way. Feel free to come up with your own mechanism for "reasoning" about the question and "researching" the results. If you don't see a straightforward way to make it work, a default set of prompts is offered below (possibly the ones we used).

In [None]:
## TODO: Define the structured prompt template. Doesn't have to be this!
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", 
     "You are an agent. Please help the user out! Questions will be paired with relevant context."
     " At the end, output the most relevant sources for your outputs, being specific."
    ),
    ("placeholder", "{messages}"),
])

intermediate_prompt = "I can help you look into it. Here's the retrieval: {action} -> {result}" 
final_question = "Great! Now use this information to solve the original question: {question}"

In [None]:
question = "Can you help me learn more about LangGraph?"
# question = "Can you help me learn more about LangGraph? Specifically, can you tell me about Memory Management?"
# question = "Can you help me learn more about LangGraph? Specifically, can you tell me about Pregel?"
# question = "Can you help me learn more about LangGraph? Specifically, can you tell me about subgraphs?"
# question = "Can you help me learn more about LangGraph? Specifically, can you tell me about full-duplex communication?"
# question = "Can you help me learn more about LangGraph? Specifically, can you tell me about productionalization?"
## TODO: Try your own highly-specialized questions that shouldn't be answerable from priors alone. 

input_msgs = {"messages": [("user", question)]}

#########################################################################
## TODO: Organize a system to reason about your question progressively.
## Feel free to use LangChain or LangGraph. Make sure to wind up with
## a mechanism that remembers the reasoning steps for your system.

sequence_of_actions = [thought for thought in generate_thoughts(input_msgs)]
## ...

## HINT: We ended up with a for-loop that accumulated intermediate "question-answer" pairs
## You may also consider a map-reduce-style approach to operate on each step independently.

# for action, result in zip(sequence_of_actions, filtered_results):  ## <- possible start-point
#     pass

input_msgs["messages"] += []

# ## HINT: If you wind up with a chain, this may be easy to work with...
# print("*"*64)
# for token in chain.stream(input_msgs):
#     print(token, end="", flush=True)  ## newlines inside tokens are printed as-is
# print(flush=True)
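The hint above describes a for-loop that accumulates intermediate question-answer pairs. A minimal sketch of that idea builds both the message list and a parallel trace you could reuse in Part 4. It assumes the retrieval results have already been rendered to strings and that each retrieval is injected as an `assistant` turn (the role choice is an assumption). `build_research_messages` is a hypothetical helper whose default templates mirror `intermediate_prompt` and `final_question`.

```python
from typing import List, Sequence, Tuple

Message = Tuple[str, str]  # (role, content), a tuple format ChatPromptTemplate placeholders accept


def build_research_messages(
    question: str,
    actions: Sequence[str],
    results: Sequence[str],
    step_template: str = "I can help you look into it. Here's the retrieval: {action} -> {result}",
    final_template: str = "Great! Now use this information to solve the original question: {question}",
) -> Tuple[List[Message], List[dict]]:
    messages: List[Message] = [("user", question)]
    trace: List[dict] = []
    for action, result in zip(actions, results):
        # Each retrieval becomes an assistant turn the user never typed.
        messages.append(("assistant", step_template.format(action=action, result=result)))
        trace.append({"action": action, "result": result})
    # Close with a user turn that points the model back at the original question.
    messages.append(("user", final_template.format(question=question)))
    return messages, trace


msgs, trace = build_research_messages(
    "Can you help me learn more about LangGraph?",
    ["Look up LangGraph docs"],
    ["<filtered snippets for step 1>"],
)
for role, content in msgs:
    print(f"{role}: {content}")
```

You could then set `input_msgs["messages"] = msgs` and stream the result through a chain built from `agent_prompt` and `llm`; the returned `trace` is already a list, which is one of the trace formats Part 4 accepts.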

<hr><br>

## **Part 4:** Accumulating Your Reasoning Traces

Depending on the structure of your system, this last requirement may be trivial or may take a bit of extra effort. Please aggregate the answers to 8 diverse and reasonable questions, along with the accumulated trace for each (i.e., the "reasoning", rendered in a human-readable format).

This output will be evaluated by an LLM to assess whether the response seems to exhibit reasonable behavior (reasoning makes sense, final output addresses question, sources are cited, etc).

In [None]:
## TODO: Aggregate 8 question-trace-answer triples. 
# [ 
#   {"question": str, "trace": list or dict or str, "answer": str}, 
#   ...
# ]

submission = [{}]
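Before sending anything to the runner, a quick local shape check can catch an empty or malformed `submission`. The hypothetical `check_submission` helper below only verifies the structure described in the comment above (8 entries, each with `question`, `trace`, and `answer`) plus a duplicate-question check; it says nothing about answer quality, which the LLM-based evaluator judges separately.

```python
from typing import Any, List


def check_submission(submission: Any, expected: int = 8) -> List[str]:
    """Return a list of shape problems; an empty list means the structure looks right."""
    if not isinstance(submission, list):
        return ["submission should be a list"]
    problems: List[str] = []
    if len(submission) != expected:
        problems.append(f"expected {expected} entries, found {len(submission)}")
    for i, entry in enumerate(submission):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not a dict")
            continue
        for key in ("question", "trace", "answer"):
            if key not in entry:
                problems.append(f"entry {i} is missing '{key}'")
        for key in ("question", "answer"):
            value = entry.get(key)
            if key in entry and not (isinstance(value, str) and value.strip()):
                problems.append(f"entry {i}: '{key}' should be a non-empty string")
        if "trace" in entry and not isinstance(entry["trace"], (list, dict, str)):
            problems.append(f"entry {i}: 'trace' should be a list, dict, or str")
    # "Diverse" questions at minimum means no exact repeats.
    questions = [str(e["question"]) for e in submission if isinstance(e, dict) and "question" in e]
    if len(set(questions)) < len(questions):
        problems.append("some questions are duplicated")
    return problems


print(check_submission([{}]))
```

In the notebook, `check_submission(submission)` should return an empty list once all 8 triples are in place.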

<hr>
<br>

## **Part 5:** Running The Assessment

To assess your submission, run the following cell, which sends your results to the assessment runner and prints its response.

**Follow the instructions and make sure it all passes.**

In [None]:
import requests

## Send the submission over to the assessment runner
response = requests.post(
    "http://docker_router:8070/run_assessment", 
    json={"submission": globals().get("submission", {})},
)

response.raise_for_status()

try:
    result = response.json()
    print(result.get("result"))
    if result.get("messages"):
        print("MESSAGES:", "\n  - ".join([""] + result.get("messages")))
    if result.get("exceptions"):
        print("EXCEPTIONS:", "\n[!] ".join([""] + [str(v) for v in result.get("exceptions")]))
except Exception:
    print("Failed To Process Assessment Response")
    print(response.__dict__)

<br>

If you passed the assessment, please return to the course page (shown below) and click the **"ASSESS TASK"** button, which will generate your certificate for the course.

<img src="./images/assess_task.png" style="width: 800px;">

<hr>
<br>

## **Part 6:** Wrapping Up

### <font color="#76b900">**Congratulations On Completing The Course!!**</font>

Before concluding the course, we highly recommend downloading the course material for later reference, and checking over the **"Next Steps"** and **Feedback** sections of the course. **We appreciate you taking the time to go through the course, and look forward to seeing you again for the next courses in the series!**
