# NVIDIA NIMs

The `llama-index-llms-nvidia-text-completion` package contains LlamaIndex integrations for building applications with models on
NVIDIA NIM inference microservices that support the /completion API.

NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, 
NIMs can be exported from NVIDIA's API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud,
giving enterprises ownership and full control of their IP and AI application.

# NVIDIA's LLM Text Completion API

With this connector, you can connect to the following models:

- bigcode/starcoder2-7b
- bigcode/starcoder2-15b

## Installation

In [None]:
!pip install --force-reinstall llama-index-llms-nvidia-text-completion

## Setup

**To get started:**

1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.

2. Click on your model of choice.

3. Under Input, select the Python tab and click `Get API Key`. Then click `Generate Key`.

4. Copy the generated key and save it as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.

In [None]:
import getpass
import os

# del os.environ['NVIDIA_API_KEY']  ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith(
        "nvapi-"
    ), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

In [None]:
# Running async code in a notebook requires the use of nest_asyncio
import nest_asyncio

nest_asyncio.apply()

## Working with NVIDIA API Catalog

In [None]:
from llama_index.llms.nvidia_text_completion import NVIDIATextCompletion
from llama_index.core.llms import ChatMessage, MessageRole

llm = NVIDIATextCompletion(model="bigcode/starcoder2-15b")

In [None]:
llm.available_models

In [None]:
llm.model

## Working with NVIDIA NIMs

In addition to connecting to hosted [NVIDIA NIMs](https://ai.nvidia.com), this connector can be used to connect to local microservice instances. This helps you take your applications local when necessary.

For instructions on how to set up local microservice instances, see https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/

In [None]:
from llama_index.llms.nvidia_text_completion import NVIDIATextCompletion

# connect to a text completion NIM running at localhost:8080
llm = NVIDIATextCompletion(base_url="http://localhost:8080/v1")
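
If the endpoint differs between environments, one option is to read it from an environment variable rather than hard-coding it. The following is a minimal sketch, not part of the package: the `NIM_BASE_URL` variable and the `resolve_base_url` helper are hypothetical names chosen for illustration, and the fallback is the local address used above.

```python
import os

# Fallback address, matching the local NIM example in this notebook.
DEFAULT_BASE_URL = "http://localhost:8080/v1"


def resolve_base_url(env=None):
    """Return the NIM base URL from NIM_BASE_URL (hypothetical), or the default.

    `env` is any mapping of environment variables; it defaults to os.environ.
    A trailing slash is dropped so the result has a single consistent form.
    """
    env = os.environ if env is None else env
    url = env.get("NIM_BASE_URL", "").strip()
    return (url or DEFAULT_BASE_URL).rstrip("/")


# Usage with the connector (requires a running NIM):
# llm = NVIDIATextCompletion(base_url=resolve_base_url())
```

Passing the mapping explicitly keeps the helper easy to check without touching the real environment.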

### Complete: `.complete()`

We can use `.complete()`/`.acomplete()` (which takes a string) to prompt a response from the selected model.

Let's use our default model for this task.

In [None]:
print(llm.complete("# Function that does quicksort:"))

As expected with LlamaIndex, we get a `CompletionResponse` back.
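
The generated string is available on the response's `text` attribute. Because a completion model continues the prompt, it can be convenient to join the two. The sketch below assumes the service returns only the continuation and not the prompt itself; `full_source` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in for a real `CompletionResponse` so the example runs without a NIM.

```python
from types import SimpleNamespace


def full_source(prompt, response):
    """Concatenate the prompt with the generated continuation in response.text."""
    return prompt + response.text


# Stand-in for the object returned by llm.complete(); only `.text` is used.
fake_response = SimpleNamespace(text="\ndef quicksort(xs):\n    ...")

snippet = full_source("# Function that does quicksort:", fake_response)
print(snippet)
```

With a live model, `full_source(prompt, llm.complete(prompt))` would be used the same way.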

#### Async Complete: `.acomplete()`

There is also an async implementation which can be leveraged in the same way!

In [None]:
await llm.acomplete("# Function that does quicksort:")