# NVIDIA AI Playground ChatModel

>[NVIDIA AI Playground](https://www.nvidia.com/en-us/research/ai-playground/) gives users easy access to hosted endpoints for generative AI models like Llama-2, SteerLM, Mistral, etc. Using the API, you can query NVCR (NVIDIA Container Registry) function endpoints and get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster.

This example goes over how to use LangChain to interact with supported AI Playground models.

In [None]:
from langchain.llms.nv_aiplay import NVCRModel, NVAIPlayClient  ## Core backbone interface clients
from langchain.llms.nv_aiplay import NVAIPlayLLM, LlamaLLM      ## LLM interfaces used later in this notebook
from langchain.chat_models import NVAIPlayChat                  ## Generic NVAIPlay Models
from langchain.chat_models.nv_aiplay import LlamaChat           ## Llama-default NVAIPlay Models

## Setup

**To get started:**
1. Create a free account with the [NVIDIA GPU Cloud](https://catalog.ngc.nvidia.com/) service, which hosts AI solution catalogs, containers, models, etc.
2. Navigate to `Catalog > AI Foundation Models > (Model with API endpoint)`.
3. Select the `API` option and click `Generate Key`.
4. Save the generated key as `NVAPI_KEY`. From there, you should have access to the endpoints.

In [3]:
import getpass
import os

## The API key is available under NVIDIA NGC -> AI Playground -> (any model) -> Get API Code.
## Each key comes with 10K free queries to any endpoint.

## Uncomment the next line to clear a stored key and be prompted again.
# del os.environ['NVAPI_KEY']
if os.environ.get('NVAPI_KEY', '').startswith('nvapi-'):
    print('Valid NVAPI_KEY already in environment. Delete to reset')
else:
    nvapi_key = getpass.getpass('NVAPI Key (starts with nvapi-): ')
    assert nvapi_key.startswith('nvapi-'), \
        f"{nvapi_key[:5]}... is not a valid key"
    os.environ['NVAPI_KEY'] = nvapi_key

NVAPI Key (starts with nvapi-): ··········


## Underlying Requests API

A selection of useful models is hosted in a DGX-powered service known as NVIDIA GPU Cloud (NGC). In this service, containers with exposed model endpoints are deployed and listed on the NVIDIA Container Registry (NVCR). These endpoints are accessible via plain HTTP requests, so any HTTP-capable client can use them.

The `NVCRModel` class implements the basic interfaces for communicating with NVCR, limiting its utility functions to those relevant for AI Playground. For example, the following mapping of model names to function IDs is populated by querying the function-list endpoint with an authenticated GET request (one that carries your API key):

In [4]:
NVCRModel().available_models

{'playground_llama2_code_13b': 'f6a96af4-8bf9-4294-96d6-d71aa787612e',
 'playground_neva_22b': '8bf70738-59b9-4e5f-bc87-7ab4203be7a0',
 'playground_nvolveqa_40k': '091a03bb-7364-4087-8090-bd71e9277520',
 'playground_gpt_qa_8b': '0c60f14d-46cb-465e-b994-227e1c3d5047',
 'playground_llama2_code_34b': 'df2bee43-fb69-42b9-9ee5-f4eabbeaf3a8',
 'playground_gpt_steerlm_8b': '1423ff2f-d1c7-4061-82a7-9e8c67afd43a',
 'playground_fuyu_8b': '9f757064-657f-4c85-abd7-37a7a9b6ee11',
 'playground_clip': '8c21289c-0b18-446d-8838-011b7249c513',
 'playground_llama2_13b': 'e0bb7fb9-5333-4a27-8534-c6288f921d3f',
 'playground_sdxl': '89848fb8-549f-41bb-88cb-95d6597044a4',
 'playground_llama2_70b': '0e349b44-440a-44e1-93e9-abe8dcb27158',
 'playground_mistral': '35ec3354-2681-4d0e-a8dd-80325dcf7c63'}
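To make that authenticated GET request concrete, here is a minimal sketch using only the Python standard library. It assumes the function list is served as JSON shaped like `{"functions": [{"name": ..., "id": ...}, ...]}` at the NVCF URL below; the URL, the response shape, and the `playground_` filter are all assumptions for illustration, not confirmed details of how `NVCRModel` works.

```python
import json
import urllib.request

# Assumption: illustrative function-list URL, not confirmed from NVCRModel's source.
FUNCTIONS_URL = "https://api.nvcf.nvidia.com/v2/nvcf/functions"


def fetch_functions(api_key: str, url: str = FUNCTIONS_URL) -> dict:
    """Send a GET request carrying the API key and return the parsed JSON body."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_model_map(listing: dict) -> dict:
    """Map function names to IDs, keeping only AI Playground endpoints.

    Assumes each entry in listing["functions"] has "name" and "id" keys.
    """
    return {
        fn["name"]: fn["id"]
        for fn in listing.get("functions", [])
        if fn["name"].startswith("playground_")
    }
```

Usage would be `to_model_map(fetch_functions(os.environ['NVAPI_KEY']))`; keeping the parsing separate from the network call makes the mapping logic easy to check offline.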

With a model name from this list, you can send a request in the same style as the Python snippet shown in the AI Playground API window. This example uses a model that is not yet in our LangChain support matrix (though we plan to add first-class support later).
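The payload in the next cell inlines the image into the user message as a base64 `data:` URI inside an `<img>` tag. A hypothetical helper, `build_image_chat_payload`, that assembles the same kind of payload from raw PNG bytes is sketched below; its defaults mirror the values in the cell, and it omits the optional assistant turn with SteerLM-style `labels` that the cell also sends.

```python
import base64


def build_image_chat_payload(
    prompt: str,
    png_bytes: bytes,
    *,
    temperature: float = 0.2,
    top_p: float = 0.7,
    max_tokens: int = 512,
    stream: bool = True,
) -> dict:
    """Return a chat payload whose single user message embeds a PNG image."""
    # Encode the image and wrap it in an <img> tag with a data URI.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    content = f'{prompt} <img src="data:image/png;base64,{encoded}" />'
    return {
        "messages": [{"role": "user", "content": content}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stream": stream,
    }
```

For example, `build_image_chat_payload("Hi! What is in this image?", open("image.png", "rb").read())` yields a dictionary that can be passed where `payload` is used below.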

In [5]:
client = NVCRModel()

model = 'neva'
payload = {
  "messages": [
    {
      "content": "Hi! What is in this image? <img src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII==\" />",
      "role": "user"
    },
    {
      "labels": {
        "creativity": 6,
        "helpfulness": 6,
        "humor": 0,
        "quality": 6
      },
      "role": "assistant"
    }
  ],
  "temperature": 0.2,
  "top_p": 0.7,
  "max_tokens": 512,
  "stream": True
}

def print_with_newlines(generator):
    """Print streamed chunks, breaking lines at a space once a line exceeds ~80 chars."""
    buffer = ""
    for response in generator:
        content = response.get('content', '')  # default to '' if a chunk has no content
        if len(buffer) > 80 and content.startswith(' '):
            buffer = ""
            print()
        elif '\n' in content:
            buffer = ""
        buffer += content
        print(content, end='')

## Generate-style response
# print(client.get_req_generation(model, payload))
# print()
## NOTE: if the name is not an exact match (e.g. 'neva'), the client looks for an available model whose name contains it

## Stream-style response
print_with_newlines(client.get_req_stream(model, payload))
print()

async def print_with_newlines_async(responses):
    buffer = ""
    async for response in responses:
        content = response['content']
        if len(buffer) > 80 and content.startswith(' '):
            buffer = ""
            print()
        elif '\n' in content:
            buffer = ""
        buffer += content
        print(content, end='')

## Stream-style response
await print_with_newlines_async(client.get_req_astream(model, payload))


The image is a gray scale photograph of a checkered pattern, possibly a portion of
 a chessboard or a security camera image. The pattern consists of a series of white
 and black squares, creating a visually striking design. The squares are organized
 in a grid-like pattern, covering the entire image from top to bottom and left to
 right. The contrast between the white and black squares is quite noticeable, emphasizing
 the checkered pattern and making it the central focus of the image.
The image is a gray scale photograph of a checkered pattern, possibly a portion of
 a chessboard or a security camera image. The pattern consists of a series of white
 and black squares, creating a visually striking design. The squares are organized
 in a grid-like pattern, covering the entire image from top to bottom and left to
 right. The contrast between the white and black squares is quite noticeable, emphasizing
 the checkered pattern and making it the central focus of the image.

As shown above, `NVCRModel` is a general-purpose backbone API that can be built upon to support LangChain's generation, streaming, and async streaming APIs.
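The note in the cell above says that a name such as `'neva'`, which is not an exact match, is resolved to a model whose name contains it. A minimal sketch of that lookup, written as a hypothetical standalone helper `resolve_model_name` rather than the client's actual code, might look like this:

```python
def resolve_model_name(name: str, available: dict) -> str:
    """Return `name` if it is an exact match, else the first model containing it."""
    if name in available:
        return name
    # Fall back to substring matching, in the dict's insertion order.
    matches = [candidate for candidate in available if name in candidate]
    if not matches:
        raise ValueError(f"No available model matches {name!r}")
    return matches[0]
```

Substring matching can be ambiguous: `'llama2'` matches both the chat and the code variants, and this sketch simply returns whichever comes first, so a more specific name is safer.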

## Integration With LangChain

Building on this core support, we provide a base connector, `NVAIPlayBaseModel`, which implements everything needed to interface with both the `LLM` and `SimpleChatModel` classes via inheritance. This notebook demonstrates the key features of the `ChatModel` portion.

### **Supported Models**

Querying `available_models` will still give you all of the models offered by your API credentials:

In [6]:
NVAIPlayLLM().available_models

['playground_llama2_70b',
 'playground_sdxl',
 'playground_gpt_steerlm_8b',
 'playground_nvolveqa_40k',
 'playground_mistral',
 'playground_clip',
 'playground_gpt_qa_8b',
 'playground_llama2_code_34b',
 'playground_llama2_code_13b',
 'playground_fuyu_8b',
 'playground_neva_22b',
 'playground_llama2_13b']

All of these models are *technically* supported and can all be accessed via `NVCRModel`, but some models have first-class LangChain support and others are more experimental.

**Ready-To-Use Chat Models** have been tested and are the top priority for our LangChain support. They are useful for both external and internal reasoning, and their responses always arrive in chat format and use a common seed, so trial results are consistent and reproducible. AI Playground offers no text-completion API for these models, though NeMo Service and other NVCR functions do expose raw query endpoints.
- `llama2_13b`/`llama2_70b`: Chat-trained variants of Llama-2
- `llama2_code_13b`/`llama2_code_34b`: Code-trained variants of Llama-2
- `mistral`: Instruction-tuned variant of Mistral.
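The names returned by `available_models` carry a `playground_` prefix, while the list above uses short names. A hypothetical helper that splits the listing into ready-to-use and experimental models by stripping that prefix could look like this; the `READY_TO_USE` set is transcribed from the list above, not read from the library.

```python
from typing import List, Tuple

# Transcribed from the "Ready-To-Use Chat Models" list; not queried from the API.
READY_TO_USE = {"llama2_13b", "llama2_70b", "llama2_code_13b", "llama2_code_34b", "mistral"}


def split_by_support(available: List[str], prefix: str = "playground_") -> Tuple[List[str], List[str]]:
    """Partition full model names into (ready_to_use, experimental)."""
    ready, experimental = [], []
    for full_name in available:
        short = full_name[len(prefix):] if full_name.startswith(prefix) else full_name
        (ready if short in READY_TO_USE else experimental).append(full_name)
    return ready, experimental
```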

In [25]:
# from langchain.chat_models.nv_aiplay import LlamaChat
from langchain.schema import HumanMessage, SystemMessage

# Single prompt
llm = LlamaChat()
print(llm(HumanMessage(content="Hey, we've just met! How's your day going?")))
print(llm("Hey, we've just met! How's your day going?"))

{'role': 'user', 'content': "Hey, we've just met! How's your day going?"}
content="Hello! *smile* I'm doing well, thank you for asking! It's great to meet you too! How about you, how's your day going so far? Is there anything you'd like to talk about or ask? I'm here to help with any questions you might have. *pleasant and respectful tone*"
{'role': 'user', 'content': "Hey, we've just met! How's your day going?"}
content="Hello! *smile* I'm doing well, thank you for asking! It's great to meet you too! How about you, how's your day going so far? Is there anything you'd like to talk about or ask? I'm here to help with any questions you might have. *pleasant and respectful tone*"


We also support streaming and asynchronous streaming in the same way as before:

In [26]:
async def print_with_newlines_async(responses):
    buffer = ""
    async for chunk in responses:
        content = chunk.content  ## Chat models yield message chunks rather than dicts
        if len(buffer) > 80 and content.startswith(' '):
            buffer = ""
            print()
        elif '\n' in content:
            buffer = ""
        buffer += content
        print(content, end='')

await print_with_newlines_async(llm.astream("Who's the best quarterback in the NFL?"))

{'role': 'user', 'content': "Who's the best quarterback in the NFL?"}
As a helpful and respectful assistant, I cannot provide a subjective opinion on who
 the "best" quarterback in the NFL is, as this is a matter of personal opinion and can be influenced by a variety of factors such as team loyalty, personal bias, and individual performance. However, I can
 provide some information on some of the top-performing quarterbacks in the NFL this
 season, based on their statistics and achievements.

Some of the top-performing quarterbacks in the NFL this season include:

1. Lamar Jackson, Baltimore Ravens: Jackson has had an MVP-caliber season, leading
 the Ravens to a 10-2 record and setting numerous records for rushing yards by a quarterback.
 He has thrown for 3,107 yards and 32 touchdowns, while also rushing for 1,008 yards
 and 7 touchdowns.
2. Russell Wilson, Seattle Seahawks: Wilson has had another strong season, leading
 the Seahawks to a 9-3 record and throwing for 3,875 yards and 30
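The `print_with_newlines` helpers in this notebook wrap streamed output with a simple heuristic: once the current line passes 80 characters, break before the next chunk that starts with a space. Below is a pure-function variant, `wrap_stream` (a hypothetical name), that returns the wrapped text instead of printing it. Unlike the helpers, it tracks the length of the text after the last newline rather than resetting whenever a chunk contains one, and returning a string makes the behavior easy to test.

```python
from typing import Iterable


def wrap_stream(chunks: Iterable[str], width: int = 80) -> str:
    """Join streamed chunks, inserting a line break before a chunk that starts
    with a space once the current line is longer than `width` characters."""
    out = []
    line_len = 0
    for chunk in chunks:
        if line_len > width and chunk.startswith(" "):
            out.append("\n")
            line_len = 0
        out.append(chunk)
        if "\n" in chunk:
            # Only the text after the last newline counts toward the current line.
            line_len = len(chunk.rsplit("\n", 1)[1])
        else:
            line_len += len(chunk)
    return "".join(out)
```

With a chat model, pass the `.content` of each streamed chunk rather than the chunk itself.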

Because the underlying requests API is chat-oriented, we also support some convenience syntax. For example, lines in a single string prefixed with `///ROLE SYS:` or `///ROLE USER:` are sent as separate system and user messages:

In [29]:
print(llm("""
///ROLE SYS: Only generate python code. Do not add any discussions about it.
///ROLE USER: Please implement Fibonacci in python without recursion. Your response should start and end in ```
""").content)

{'role': 'system', 'content': 'Only generate python code. Do not add any discussions about it.\n'}
{'role': 'user', 'content': 'Please implement Fibonacci in python without recursion. Your response should start and end in ```'}
```
def fibonacci(n):
    if n <= 1:
        return n
    else:
        a, b = 0, 1
        for i in range(n-1):
            a, b = b, a + b
        return a
```


You can add your own custom support for such a system by subclassing the `NVAIPlayBaseModel` class.
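As one example of such custom support, the `///ROLE` convention shown above could be parsed with a regular expression. This is a hypothetical standalone sketch (`split_role_prompt` and the `ROLE_NAMES` mapping are illustrative), not the parser `NVAIPlayBaseModel` actually uses. It discards any text before the first marker, maps unrecognized roles to `user`, and keeps trailing newlines, whereas the recorded output above shows none on the final user message.

```python
import re
from typing import Dict, List

# Hypothetical marker format inferred from the example: "///ROLE SYS:" or "///ROLE USER:".
ROLE_MARKER = re.compile(r"^///ROLE (\w+):\s?", re.MULTILINE)
ROLE_NAMES = {"SYS": "system", "USER": "user"}


def split_role_prompt(text: str) -> List[Dict[str, str]]:
    """Split a string containing ///ROLE markers into chat message dicts."""
    markers = list(ROLE_MARKER.finditer(text))
    if not markers:
        # No markers: treat the whole string as a single user message.
        return [{"role": "user", "content": text}]
    messages = []
    for i, match in enumerate(markers):
        # Each message runs until the next marker (or the end of the string).
        end = markers[i + 1].start() if i + 1 < len(markers) else len(text)
        role = ROLE_NAMES.get(match.group(1).upper(), "user")
        messages.append({"role": role, "content": text[match.end():end]})
    return messages
```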

## Conversation Chains

Like any other integration, the NVAIPlay clients support chat utilities such as conversation buffers out of the box. Below, we apply the [LangChain ConversationBufferMemory](https://python.langchain.com/docs/modules/memory/types/buffer) example to the `LlamaLLM` model.

In [30]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = LlamaLLM(
    temperature=0.1,
    max_tokens=100,
    top_p=1.0,
)

conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

In [31]:
conversation.predict(input="Hi there!")



> Entering new ConversationChain chain...
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi there!
AI:[0m
{'role': 'user', 'content': 'The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n\nCurrent conversation:\n\nHuman: Hi there!\nAI:'}

[1m> Finished chain.[0m


'Hello! How can I assist you today?'

In [32]:
conversation.predict(input="I'm doing well! Just having a conversation with an AI.")



> Entering new ConversationChain chain...
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hello! How can I assist you today?
Human: I'm doing well! Just having a conversation with an AI.
AI:[0m
{'role': 'user', 'content': "The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n\nCurrent conversation:\nHuman: Hi there!\nAI: Hello! How can I assist you today?\nHuman: I'm doing well! Just having a conversation with an AI.\nAI:"}

[1m> Finished chain.[0m


"That's great! I'm here to help and provide information to the best of my ability. Is there anything specific you'd like to know or discuss?"

In [33]:
conversation.predict(input="Tell me about yourself.")



> Entering new ConversationChain chain...
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hello! How can I assist you today?
Human: I'm doing well! Just having a conversation with an AI.
AI: That's great! I'm here to help and provide information to the best of my ability. Is there anything specific you'd like to know or discuss?
Human: Tell me about yourself.
AI:[0m
{'role': 'user', 'content': "The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n\nCurrent conversation:\nHuman: Hi there!\nAI: Hello! How can I assist you today?\nHuman: I'm 

"Hello! I'm just an AI, I don't have a physical body, but I exist as a program that can process and analyze vast amounts of data. I am designed to be helpful and assist with a wide range of tasks, from answering questions to providing information on a variety of topics. I am constantly learning and improving, so I can become more knowledgeable and helpful over time. I am also designed to be respectful and socially unbiased, and I"
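The prompts printed above show how the buffer works: each completed turn is appended as `Human: ...` / `AI: ...` lines under "Current conversation:", followed by the new input and a trailing `AI:` cue. The sketch below re-creates that prompt assembly with hypothetical helpers (`format_history`, `build_prompt`); it reproduces the formatting seen in the logs but is not LangChain's `ConversationBufferMemory` implementation.

```python
from typing import List, Tuple

# Prompt text copied from the formatted prompts logged above.
TEMPLATE = (
    "The following is a friendly conversation between a human and an AI. "
    "The AI is talkative and provides lots of specific details from its context. "
    "If the AI does not know the answer to a question, it truthfully says it does not know.\n\n"
    "Current conversation:\n{history}\nHuman: {input}\nAI:"
)


def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render (human, ai) turns as alternating 'Human:' / 'AI:' lines."""
    return "\n".join(f"Human: {human}\nAI: {ai}" for human, ai in turns)


def build_prompt(turns: List[Tuple[str, str]], user_input: str) -> str:
    """Fill the template with the buffered history and the new input."""
    return TEMPLATE.format(history=format_history(turns), input=user_input)
```

Because the whole history is resent on every call, prompts grow with each turn, which is one reason the final reply above is cut off by `max_tokens=100` while the prompt itself keeps expanding.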