# Using NVIDIA's API Playground Connector

This notebook will guide you through understanding the basic usage of the `NvidiaAIPlayground` connector.

With this connector, you'll be able to connect to and generate from compatible models available at the NVIDIA [API Catalog](https://build.nvidia.com/explore/discover), such as:

- Google's [gemma-7b](https://build.nvidia.com/google/gemma-7b)
- Mistral AI's [mistral-7b-instruct-v0.2](https://build.nvidia.com/mistralai/mistral-7b-instruct-v2)
- And more!

We'll begin by ensuring `llama-index` and associated packages are installed.

In [10]:
!pip install llama-index
!pip install llama-index-core
!pip install llama-index-embeddings-openai



## API Keys and Boilerplate

In the next cell we'll run some boilerplate so that the examples execute smoothly in a notebook environment.

We'll also provide our API keys. 

> NOTE: You can create your NVIDIA API key using the `Get API Key` button in the code example window.

In [1]:
# Running async code (e.g. `await llm.acomplete(...)`) in a notebook requires nest_asyncio
import nest_asyncio
nest_asyncio.apply()

import os

# Using OpenAI API for embeddings
os.environ["OPENAI_API_KEY"] = "sk-"

# Using NVIDIA API Playground API Key for LLM
os.environ["NVIDIA_AI_PLAYGROUND_API_KEY"] = "nvapi-"
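
Hardcoding keys in a cell makes them easy to leak when the notebook is shared. As a minimal sketch of one alternative, the helper below prompts for a key only when the environment variable is missing. The name `ensure_api_key` is hypothetical (it is not part of LlamaIndex), and the `prompt` and `environ` parameters exist only so the behaviour can be exercised without touching the real environment.

```python
import os
from getpass import getpass


def ensure_api_key(name, prompt=getpass, environ=os.environ):
    """Return the key stored under `name`, prompting only when it is unset or empty.

    Hypothetical helper for illustration; not part of LlamaIndex.
    """
    value = environ.get(name, "").strip()
    if not value:
        # getpass hides the typed key so it does not end up in the cell output
        value = prompt(f"Enter {name}: ").strip()
        environ[name] = value
    return value
```

In the notebook this could be called as `ensure_api_key("NVIDIA_AI_PLAYGROUND_API_KEY")` in place of the hardcoded assignment above.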

## Loading Playground LLM

Now we can load our `NvidiaAIPlayground` LLM by passing in the model name, as found in the code example on `build.nvidia.com`.

> NOTE: The default model will be `playground_nemotron_steerlm_8b`.

In [25]:
from llama_index.llms.nvidia_ai_playground import NvidiaAIPlayground
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings

llm = NvidiaAIPlayground(model="playground_nemotron_steerlm_8b")

Settings.llm = llm

We can check which model our `llm` object is currently associated with via the `.model` attribute.

In [26]:
llm.model

'playground_nemotron_steerlm_8b'

## Loading API Catalog LLM

We can also load models using their API Catalog address.

Let's use `gemma-7b` as an example!

1. Navigate to the [model page](https://build.nvidia.com/google/gemma-7b)
2. Find the address in the `model` parameter (e.g. `"google/gemma-7b"`)
3. Set the `model` parameter to the same name for `NvidiaAIPlayground`

Let's see this in the code.

In [28]:
llm = NvidiaAIPlayground(model="google/gemma-7b")

Let's confirm we've associated our `NvidiaAIPlayground` LLM with the correct model!

In [29]:
llm.model

'google/gemma-7b'

## Basic Functionality

Now we can explore the different ways you can use the connector within the LlamaIndex ecosystem!

Before we begin, let's set up a list of `ChatMessage` objects, which is the expected input for some of the methods.

In [18]:
from llama_index.core.llms import ChatMessage, MessageRole

chat_messages = [
    ChatMessage(
        role=MessageRole.SYSTEM,
        content=(
            "You are a helpful assistant."
        )
    ),
    ChatMessage(
        role=MessageRole.USER,
        content=(
            "How do I get to Paris from London?"
        )
    ),
]

We'll follow the same basic pattern for each example: 

1. We'll point our endpoint to a model hosted on `build.nvidia.com`.
2. We'll examine how to use the endpoint to achieve the desired task!

### Complete: `.complete()`

We can use `.complete()`/`.acomplete()` (which takes a string) to prompt a response from the selected model.

Let's use our default model for this task.

In [47]:
completion_llm = NvidiaAIPlayground()

We can verify this is the expected default by checking the `.model` attribute.

In [48]:
completion_llm.model

'playground_nemotron_steerlm_8b'

Let's call `.complete()` on our model with a string, in this case `"Hello!"`, and observe the response.

In [49]:
completion_llm.complete("Hello!")

CompletionResponse(text='Hello! I am NV Assistant, a language model developed by NVIDIA, designed to answer any questions or help with a variety of tasks. I am continually learning from interactions with users, so I would be happy to assist you with whatever you need.\n\nHere are a few examples of the types of tasks and questions I can help with:\n\n    Personal assistance:\n        - What is the weather like today?\n        - What is the current time in my city?\n        - What is the best way to cook a chicken breast?\n\n    Information retrieval:\n        - What is the capital of France?\n        - How many countries are in the European Union?\n        - Who was the president of the United States in 2009?\n\n    Helpful tools and tips:\n        - How do I create a Google account?\n        - How do I install software on my computer?\n        - How do I troubleshoot a Wi-Fi connection?\n\n    General conversation:\n        - Tell me a joke.\n        - What is your favorite book or mov

As is standard in LlamaIndex, we get a `CompletionResponse` back.

#### Async Complete: `.acomplete()`

There is also an async implementation which can be leveraged in the same way!

In [51]:
await completion_llm.acomplete("Hello!")

CompletionResponse(text='Hello! I am NV Assistant, a language model developed by NVIDIA, designed to answer any questions or help with a variety of tasks. I am continually learning from interactions with users, so I would be happy to assist you with whatever you need.\n\nHere are a few examples of the types of tasks and questions I can help with:\n\n    Personal assistance: I can answer questions, provide information, and perform tasks to help you with daily activities, such as scheduling appointments, making travel arrangements, or providing weather forecasts.\n\n    Technical support: I can provide information and troubleshooting tips for a variety of technical issues, such as computer or software problems, internet connectivity issues, or hardware problems.\n\n    Customer service: I can assist with customer service inquiries, such as placing orders, tracking shipments, or resolving billing issues.\n\n    Writing assistance: I can provide suggestions for improving the clarity and ef
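
The main benefit of the async variant is that several requests can be in flight at once. The sketch below illustrates the pattern with `asyncio.gather`. To keep it runnable without an API key it uses `fake_acomplete`, a hypothetical stand-in that simply echoes its prompt, and `complete_many` is likewise an illustrative name rather than a LlamaIndex function.

```python
import asyncio


async def fake_acomplete(prompt: str) -> str:
    """Hypothetical stand-in for an async completion call; it only echoes the prompt."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"echo: {prompt}"


async def complete_many(prompts):
    """Start every request at once and wait for all of them.

    asyncio.gather returns results in the same order as the prompts.
    """
    return await asyncio.gather(*(fake_acomplete(p) for p in prompts))


answers = asyncio.run(complete_many(["Hello!", "Goodbye!"]))
print(answers)
```

With the real connector, the same shape should apply by calling `completion_llm.acomplete(p)` inside `complete_many`. In a notebook you would typically write `await complete_many(...)` instead of `asyncio.run(...)`.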

### Chat: `.chat()`

Now we can try the same thing using the `.chat()` method. This method expects a list of chat messages - so we'll use the one we created above.

We'll use the `playground_mixtral_8x7b` model for the example.

In [52]:
chat_llm = NvidiaAIPlayground(model="playground_mixtral_8x7b")

All we need to do now is call `.chat()` on our list of `ChatMessage` objects and observe the response.

In [53]:
chat_llm.chat(chat_messages)

ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='There are several ways to get from London to Paris:\n\n1. **By Eurostar Train**: This is the fastest and most convenient way to travel between the two cities. The Eurostar train departs from St Pancras International Station in London and arrives at Gare du Nord in Paris. The journey takes approximately 2 hours and 15 minutes.\n\n2. **By Plane**: There are numerous flights between London and Paris every day. The flight duration is about 1 hour, but you need to add time for getting to and from the airports, security checks, and baggage claim. The major airports in London are Heathrow, Gatwick, Stansted, and Luton, and in Paris, they are Charles de Gaulle and Orly.\n\n3. **By Bus**: This is the cheapest but slowest option. Buses depart from Victoria Coach Station in London and arrive at Gallieni Porte de Bagnolet in Paris. The journey can take up to 8 hours, depending on traffic.\n\n4. **By Car**: If you 

As expected, we receive a `ChatResponse` in response.

#### Async Chat: `.achat()`

We also have an async implementation of the `.chat()` method which can be called in the following way.

In [54]:
await chat_llm.achat(chat_messages)

ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='There are several ways to get from London to Paris:\n\n1. **By Eurostar Train**: This is the fastest and most convenient way to travel between the two cities. The Eurostar train departs from St Pancras International Station in London and arrives at Gare du Nord in Paris. The journey takes approximately 2 hours and 15 minutes.\n\n2. **By Plane**: There are numerous flights between London and Paris every day. The flight duration is about 1 hour, but you need to add time for getting to and from the airports, security checks, and baggage claim. The major airports in London are Heathrow, Gatwick, Stansted, and Luton, and in Paris, they are Charles de Gaulle and Orly.\n\n3. **By Bus**: This is the cheapest but slowest option. Buses depart from Victoria Coach Station in London and arrive at Gallieni Porte de Bagnolet in Paris. The journey can take up to 8 hours, depending on traffic.\n\n4. **By Car**: If you 

### Stream: `.stream_chat()`

We can also use the models found on `build.nvidia.com` for streaming use-cases!

Let's select another model and observe this behaviour. We'll use Google's `gemma-7b` model for this task.

In [55]:
stream_llm = NvidiaAIPlayground(model="google/gemma-7b")

Let's call our model with `.stream_chat()`, which again expects a list of `ChatMessage` objects, and capture the response.

In [56]:
streamed_response = stream_llm.stream_chat(chat_messages)

In [57]:
streamed_response

<generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen at 0x7989c61a0260>

As we can see, the response is a generator that yields the streamed response piece by piece.

Let's take a look at the final response once the generation is complete.

In [45]:
last_element = None
for last_element in streamed_response:
    pass

print(last_element)

assistant: Sure, here's how you can get to Paris from London:

**By Train:**

* The most convenient way to travel between London and Paris is by train. There are several direct train routes available between the two cities, operated by Eurostar, National Express, and Thalys.
* The journey takes approximately 2 hours and 15 minutes, and the cost varies depending on the time of travel and the operator you choose.
* To book your train tickets, you can visit the official website of the train operator you choose.

**By Ferry:**

* You can also travel between London and Paris by ferry. There are several ferry companies that offer regular services between the two cities, including Brittany Ferries, DFDS, and Eurotunnel.
* The journey takes approximately 2-3 hours, and the cost varies depending on the ferry company and the time of travel.
* To book your ferry tickets, you can visit the official website of the ferry company you choose.

**By Car:**

* If you have your own car, you can drive fro
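
The loop above discards everything except the final item. In a streaming use case you would more often print text as it arrives. Here is a minimal sketch of that pattern, assuming each streamed item exposes the newly generated text in a `delta` attribute (as LlamaIndex `ChatResponse` objects do). `Chunk` and `fake_stream` are hypothetical stand-ins so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed item: cumulative text plus the newest piece."""
    text: str
    delta: str


def fake_stream(pieces: Iterable[str]) -> Iterator[Chunk]:
    """Yield cumulative chunks, so the last item holds the full response."""
    so_far = ""
    for piece in pieces:
        so_far += piece
        yield Chunk(text=so_far, delta=piece)


def collect_deltas(stream: Iterable[Chunk]) -> str:
    """Print each delta as soon as it arrives and return the concatenated text."""
    parts = []
    for chunk in stream:
        print(chunk.delta, end="", flush=True)
        parts.append(chunk.delta)
    print()
    return "".join(parts)


full_text = collect_deltas(fake_stream(["Take ", "the ", "Eurostar."]))
```

Note that a generator can only be consumed once: after a loop like this finishes, iterating over the same object again yields nothing, so call `.stream_chat()` again if you need a fresh stream.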

#### Async Stream: `.astream_chat()`

We have the equivalent async method for streaming as well, which can be used in a similar way to the sync implementation.

In [58]:
streamed_response = await stream_llm.astream_chat(chat_messages)

In [59]:
streamed_response

<async_generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_async_llm_chat.<locals>.wrapped_gen at 0x7989c61a3cd0>

In [65]:
last_element = None
async for last_element in streamed_response:
    pass

print(last_element)

assistant: Sure, here's how you can get to Paris from London:

**By Train:**

* The most convenient way to travel between London and Paris is by train. There are several direct train routes available, operated by Eurostar, National Express, and Thalys. The journey takes around 2 hours and costs between £20-50.
* To get to the train station, you can take a tube or taxi to London St Pancras International station.

**By Ferry:**

* You can also travel to Paris by ferry, which takes around 2 hours and costs between £20-40. Ferries depart from London's Tower Bridge and Canary Wharf.
* To get to the ferry terminal, you can take a tube or taxi to the terminal.

**By Car:**

* If you prefer driving, you can take a road trip to Paris, which takes around 2 hours and costs between £20-40 for tolls and parking.
* To get to Paris by car, you can take the M25 motorway.

**Additional Tips:**

* It is recommended to book your train tickets in advance, especially during peak season.
* You can find the 

## Streaming Query Engine Responses

Let's look at a slightly more involved example using a query engine!

We'll start by loading some data (we'll be using the [Hitchhiker's Guide to the Galaxy](https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt)).

### Loading Data

Let's first create a directory where our data can live.

In [6]:
!mkdir -p 'data/hhgttg'

We'll download our data from the above source.

In [22]:
!wget 'https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt' -O 'data/hhgttg/hhgttg.txt'

--2024-04-01 14:39:38--  https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt
Resolving web.eecs.utk.edu (web.eecs.utk.edu)... 160.36.127.165
Connecting to web.eecs.utk.edu (web.eecs.utk.edu)|160.36.127.165|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1534289 (1.5M) [text/plain]
Saving to: ‘data/hhgttg/hhgttg.txt’


2024-04-01 14:39:39 (6.75 MB/s) - ‘data/hhgttg/hhgttg.txt’ saved [1534289/1534289]



We'll need an embedding model for this step! We'll use OpenAI's `text-embedding-3-small` model and save it in our `Settings`.

In [68]:
from llama_index.embeddings.openai import OpenAIEmbedding

openai_embedding = OpenAIEmbedding(model="text-embedding-3-small")

Settings.embed_model = openai_embedding

Now we can load our document and create an index that uses the `OpenAIEmbedding` model configured above.

In [75]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data/hhgttg").load_data()
index = VectorStoreIndex.from_documents(documents)
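
Conceptually, a vector index stores an embedding for each chunk of text and, at query time, ranks chunks by how similar their embeddings are to the query's. The toy sketch below shows that idea with cosine similarity over hand-written three-dimensional vectors. It is an illustration under those assumptions, not LlamaIndex's actual implementation. The `cosine` and `top_k` helpers and the sample vectors are all made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up (text, embedding) pairs; real embeddings have hundreds of dimensions or more.
chunks = [
    ("The answer is forty-two.", [0.9, 0.1, 0.0]),
    ("Vogons write terrible poetry.", [0.1, 0.9, 0.1]),
    ("Always know where your towel is.", [0.0, 0.2, 0.9]),
]

best = top_k([1.0, 0.0, 0.0], chunks, k=1)
print(best)
```

The retrieved chunks are what the query engine passes to the LLM as context when it answers a question.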

Now we can create a simple query engine and set our `streaming` parameter to `True`.

In [76]:
streaming_qe = index.as_query_engine(streaming=True)

Let's send a query to our query engine, and then stream the response.

In [77]:
streaming_response = streaming_qe.query(
    "What is the significance of the number 42?",
)

In [78]:
streaming_response.print_response_stream()

The significance of the number 42 is a central theme in "The Hitchhiker's Guide to the Galaxy" by Douglas Adams. The book is a comedic science fiction satire that follows the adventures of two intergalactic travelers, Arthur Dent and Ford Prefect, as they try to escape the destruction of Earth and uncover the true meaning of the number 42.

Throughout the book, the number 42 is presented as the ultimate answer to the ultimate question of life, the universe, and everything. The question itself is never explicitly stated, but it is implied to be a deeply profound and existential one that has been sought after by philosophers, scientists, and thinkers throughout history.

The idea of the number 42 as the ultimate answer is a playful jab at the idea of seeking ultimate knowledge and understanding, which is often seen as an impossible task. The number 42 is also a reference to the famous "42" answer in the "The Hitchhiker's Guide to the Galaxy" by Douglas Adams, which is a comedic science f