<p> <center> <a href="../start_here.ipynb.ipynb">Home Page</a> </center> </p>

<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="rag_nim_endpoints.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 34%; text-align: center;">
        <a href="rag_nim_endpoints.ipynb">1</a>
        <a >2</a>
        <a href="nim_lora_adapter.ipynb">3</a>
        <a href="nim_lora_adapter.ipynb">4</a>
        <a href="nim_lora_adapter.ipynb">5</a>
        <!-- <a href="challenge.ipynb">4</a> -->
    </span>
    <span style="float: left; width: 33%; text-align: right;"><a href="nim_lora_adapter.ipynb">Next Notebook</a></span>
</div>

## Getting Started

NIMs are accessible through easy-to-use open APIs in the [NVIDIA API Catalog](https://build.nvidia.com/explore/discover), a platform that hosts a wide range of microservices online. To start using NIMs, you need an `NVIDIA API Key`, which requires registration. You can register by clicking the login button and entering your email address, as shown in the screenshot below, and then following the remaining steps. Alternatively, you can generate the API Key through the [NVIDIA NGC](https://ngc.nvidia.com/signin) registration (*click on your account name -> setup -> Generate Personal Key*). After completing the process, save your API Key somewhere you can retrieve it later. A valid API Key starts with `nvapi-` followed by 64 other characters, which may include the underscore `_`.
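The key format described above can be checked before any request is sent. The following is a minimal sketch, not an official validator: the helper name `looks_like_nvidia_api_key` is hypothetical, and the allowed character set (letters, digits, underscore, hyphen) is an assumption, since the text only states the `nvapi-` prefix and the length of 64.

```python
import re

# Assumption: the 64 characters after the prefix are letters, digits,
# underscores, or hyphens. Only the prefix and the length come from the text.
NVAPI_KEY_PATTERN = re.compile(r"nvapi-[A-Za-z0-9_-]{64}")


def looks_like_nvidia_api_key(key: str) -> bool:
    """Return True if the key has the expected prefix and length."""
    return NVAPI_KEY_PATTERN.fullmatch(key) is not None


well_formed = looks_like_nvidia_api_key("nvapi-" + "a" * 64)  # placeholder, not a real key
too_short = looks_like_nvidia_api_key("nvapi-too-short")
print(well_formed, too_short)
```

A check like this only catches copy-and-paste mistakes; whether a key is actually accepted can only be confirmed by sending a request.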

If you already have an account, follow these steps to get your NVIDIA API Key:

- Log in to your account [here](https://build.nvidia.com/explore/discover).
- Click on your model of choice.
- Under Input, select the Python tab, click Get API Key, and then click Generate Key.
- Copy the generated key and save it as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.

<div style="text-align: center;">
  <!--<img src="imgs/builder_catalog.jpg" style="width: 800px; height: auto;">-->
  <img src="../images/nim-catalog.png" style="width: 900px; height: auto;">
</div>

## Setting up NVIDIA API Key

Because we want to access NIM outside the NGC environment, it is important to test our API Key by setting it as an environment variable and using it to send requests to NIM models. Please run the two cells below to test the process.

In [1]:
import os
import getpass

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
    os.environ["NGC_API_KEY"] = nvapi_key

Enter your NVIDIA API key:  ········
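To test the key independently of LangChain, a request can be built with the standard library alone. This is a sketch under stated assumptions: the endpoint URL is assumed to be the hosted OpenAI-compatible chat completions route, `build_chat_request` is a hypothetical helper, and the model name is the one used later in this notebook. The request is only constructed here; the sending step is left commented out because it needs network access and a valid key.

```python
import json
import os
import urllib.request

# Assumption: hosted, OpenAI-compatible chat completions endpoint.
NIM_CHAT_URL = "https://integrate.api.nvidia.com/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a small chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }
    return urllib.request.Request(
        NIM_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    os.environ.get("NVIDIA_API_KEY", "nvapi-placeholder"),  # falls back to a dummy value
    "nvidia/llama-3.3-nemotron-super-49b-v1",
    "Say hello.",
)
print(request.full_url, request.get_method())

# Uncomment to send the request (needs network access and a valid key):
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

If the key is rejected, `urlopen` raises `urllib.error.HTTPError`, which makes an invalid key visible before the rest of the notebook runs.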


In [2]:
from typing import Annotated
from langchain.chat_models import init_chat_model
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from pydantic import BaseModel
from typing import Literal



In [3]:
class State(TypedDict):
    messages: Annotated[list, add_messages]

In [4]:
model_id = 'nvidia/llama-3.3-nemotron-super-49b-v1'

In [5]:
graph_builder = StateGraph(State)

In [6]:
llm = init_chat_model(model=model_id,model_provider="nvidia")

In [7]:
def chatbot(state: State):
    return {"messages": [llm.invoke(state["messages"])]}

- `add_node`: the first argument is the unique node name; the second is the function or object that is called whenever the node is reached.
- `add_edge`: the first argument is the start node of the edge; the second is the end node.
- The graph must be compiled with `compile()` before it can be used.

In [None]:
graph_builder.add_node("chatbot", chatbot)
graph_builder.add_edge(START, "chatbot")
graph = graph_builder.compile()

<langgraph.graph.state.StateGraph at 0x1126f9bb0>
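The roles of node names, edges, and state updates can be illustrated without any dependencies. The sketch below is a toy stand-in, not LangGraph's implementation: `ToyGraph`, its `run` method, the `shout` node, and this local `START` sentinel are all hypothetical, and appending each node's returned messages to the state is a simplification of what the `add_messages` annotation is assumed to do.

```python
from typing import Callable, Dict

START = "__start__"  # hypothetical sentinel for this toy, playing the role of an entry point


class ToyGraph:
    """Toy state graph: named nodes, each with at most one outgoing edge."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[dict], dict]] = {}
        self.edges: Dict[str, str] = {}

    def add_node(self, name: str, fn: Callable[[dict], dict]) -> None:
        # Node names must be unique, as in the description above.
        if name in self.nodes:
            raise ValueError(f"duplicate node name: {name}")
        self.nodes[name] = fn

    def add_edge(self, start: str, end: str) -> None:
        # First argument is where the edge starts, second is where it ends.
        self.edges[start] = end

    def run(self, state: dict) -> dict:
        state = {"messages": list(state["messages"])}
        current = self.edges.get(START)
        while current is not None:
            update = self.nodes[current](state)
            # Append the node's messages instead of overwriting the list.
            state["messages"] = state["messages"] + update["messages"]
            current = self.edges.get(current)
        return state


def shout(state: dict) -> dict:
    """Example node: returns only the new message, not the whole state."""
    return {"messages": [state["messages"][-1].upper()]}


toy = ToyGraph()
toy.add_node("shout", shout)
toy.add_edge(START, "shout")
final_state = toy.run({"messages": ["hello"]})
print(final_state["messages"])  # ['hello', 'HELLO']
```

The node returns a partial update rather than the full state, which mirrors how `chatbot` above returns only the new message.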

`graph.stream` streams output from the graph as it runs, so the application or user does not have to wait for the entire run to finish before seeing results.
With the default settings used here, each event carries the state update from a node once that node finishes.
Streaming the response token by token, which improves the user experience in a chatbot, requires passing `stream_mode="messages"`.

In [11]:
def stream_graph_updates(user_input: str):
    for event in graph.stream({"messages": [{"role": "user", "content": user_input}]}):
        for value in event.values():
            print("Assistant:", value["messages"][-1].content)
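The nested loop above makes more sense once the shape of an event is visible. The sketch below assumes the default streaming mode, in which each event is a dict keyed by node name whose value is that node's state update. `fake_stream` is a hypothetical stand-in that yields one such event without calling a model, and the reply text is a placeholder.

```python
from types import SimpleNamespace


def fake_stream(user_input: str):
    """Hypothetical stand-in for graph.stream: yields one update per node."""
    reply = SimpleNamespace(content=f"(placeholder reply to: {user_input})")
    yield {"chatbot": {"messages": [reply]}}


collected = []
for event in fake_stream("what is langgraph?"):
    # Each event maps a node name to the update that node returned.
    for node_name, value in event.items():
        text = value["messages"][-1].content
        collected.append((node_name, text))
        print(f"[{node_name}] Assistant: {text}")
```

Iterating over `event.items()` instead of `event.values()` also exposes the node name, which helps once a graph has more than one node.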

In [12]:
stream_graph_updates("what is langgraph?")

Assistant: LangGraph is a relatively new concept that has emerged with the rapid advancements in Natural Language Processing (NLP) and graph-based data structures. Since it's a somewhat niche and evolving area, the definition can vary slightly depending on the context (e.g., research paper, specific application, or educational material). However, I'll provide a comprehensive overview based on the general understanding and known applications of LangGraph as of my last update:

### **Definition of LangGraph**

**LangGraph** can refer to one of two closely related concepts, depending on the emphasis:

1. **As a Data Structure/Model**:
   - **LangGraph** might describe a graph-based data structure designed to represent and encode linguistic information. In this sense, it's a graph where nodes (vertices) and edges are utilized to model various aspects of language, such as:
     - **Nodes**: Could represent words, entities, parts of speech, semantic roles, or even higher-level concepts (like

In [13]:
class UserIntent(BaseModel):
    """The user's current intent in the conversation"""
    intent: Literal["naruto", "bleach"]

Applications need LLM output to be parseable, and the most common format is JSON.
Different providers enable this in different ways.
For NVIDIA NIMs, it is done through `extra_body={"nvext": {"guided_json": json_schema}}`.
The cells below use LangChain's `with_structured_output` with the `UserIntent` model defined above to get a parsed result.
Refer to the following link for more information:
https://docs.nvidia.com/nim/large-language-models/latest/structured-generation.html
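To make the `guided_json` form concrete, the sketch below builds the `extra_body` value by hand and parses a reply. It rests on assumptions: the schema is a hand-written approximation of the `UserIntent` model, `parse_intent` is a hypothetical helper, and `sample_reply` is an invented model output rather than a real response.

```python
import json

# Hand-written JSON schema approximating the UserIntent model.
user_intent_schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": ["naruto", "bleach"]},
    },
    "required": ["intent"],
}

# Shape taken from the text above: the schema sits under nvext.guided_json.
extra_body = {"nvext": {"guided_json": user_intent_schema}}


def parse_intent(raw: str) -> str:
    """Parse a JSON reply and check it against the allowed intents."""
    data = json.loads(raw)
    intent = data["intent"]
    allowed = user_intent_schema["properties"]["intent"]["enum"]
    if intent not in allowed:
        raise ValueError(f"unexpected intent: {intent!r}")
    return intent


sample_reply = '{"intent": "naruto"}'  # hypothetical model output
parsed_intent = parse_intent(sample_reply)
print(parsed_intent)  # naruto
```

Checking the parsed value on the application side is still worthwhile, since it guards against a schema and a model definition drifting apart.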

In [14]:
llm_structured = init_chat_model(model=model_id, model_provider="nvidia").with_structured_output(
    UserIntent, strict=True
)

In [15]:
res = llm_structured.invoke([
    {'role':'system','content':'You are an anime encyclopedia. Classify if the user is asking a question on naruto or bleach.'},
    {'role':'user','content':'who is sasuke?'}
])

In [16]:
res

UserIntent(intent='naruto')