# NVIDIA NIMs
The langchain-nvidia-ai-endpoints package contains LangChain integrations for building applications with models served by NVIDIA NIM inference microservices. NIM supports models across domains such as chat, embedding, and re-ranking, from the community as well as from NVIDIA. These models are optimized by NVIDIA for performance on NVIDIA accelerated infrastructure and are deployed as NIMs: easy-to-use, prebuilt containers that can be deployed anywhere on NVIDIA accelerated infrastructure with a single command.

- https://python.langchain.com/v0.2/docs/integrations/chat/nvidia_ai_endpoints/
- https://build.nvidia.com/explore/discover

- https://pypi.org/project/langchain-nvidia-ai-endpoints/


# Phi-3-Small 128K

Phi-3-Small is a lightweight, state-of-the-art open model built upon the datasets used for Phi-2 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family, and the small version comes in two variants, 8K and 128K, named for the context length (in tokens) each can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. This model is ready for commercial and research use.

In [None]:
# ! python -m pip install -r requirements.txt --user --quiet

In [None]:
from dotenv import load_dotenv

load_dotenv(".env")

In [None]:
import getpass
import os

## API Key can be found by going to NVIDIA NGC -> AI Foundation Models -> (some model) -> Get API Code or similar.
## 10K free queries to any endpoint (which is a lot actually).

# del os.environ['NVIDIA_API_KEY']  ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
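
When debugging key handling like the check above, it can help to print a masked form of the key rather than the key itself. The following is a minimal sketch, not part of the NVIDIA package: `mask_api_key` and its `visible` parameter are hypothetical names introduced here for illustration, and the only assumption about the key format is the `nvapi-` prefix already used above.

```python
def mask_api_key(key: str, visible: int = 4) -> str:
    """Return a log-safe form of a key: the 'nvapi-' prefix, asterisks, and the last few characters."""
    prefix = "nvapi-"
    if not key.startswith(prefix):
        raise ValueError("expected a key starting with 'nvapi-'")
    secret = key[len(prefix):]
    if len(secret) <= visible:
        # Too short to reveal anything safely: hide the whole secret part.
        return prefix + "*" * len(secret)
    return prefix + "*" * (len(secret) - visible) + secret[-visible:]


# Example with a made-up key (not a real credential).
print(mask_api_key("nvapi-abcdefgh"))
```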

In [None]:
nvapi_key = os.getenv("NVIDIA_API_KEY")


In [None]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# List the IDs of the available models that report a model type.
llm = ChatNVIDIA(model="meta/llama3-70b-instruct", max_tokens=419)
[model.id for model in llm.available_models if model.model_type]

In [None]:
# Test run: check that you can generate a response successfully.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="microsoft/phi-3-small-128k-instruct", nvidia_api_key=nvapi_key, max_tokens=1024)

result = llm.invoke("Write a ballad about LangChain.")
print(result.content)

In [None]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-mistral-7b-v2", model_type="passage")

# Alternatively, use another embedding model; model_type selects the query or passage type
# embedder = NVIDIAEmbeddings(model="ai-embed-qa-4", model_type="passage")

In [None]:
embedder.available_models

In [None]:
import os

# Read the text files in ./data/ and collect their non-empty lines for the vector store.
ps = os.listdir("./data/")
data = []
sources = []
for p in ps:
    if p.endswith(".txt"):
        path2file = "./data/" + p
        print(path2file)
        with open(path2file, encoding="utf-8") as f:
            lines = f.readlines()
            for line in lines:
                # Skip lines that have at most one character left after removing newlines and spaces.
                text = line.replace("\n", "").replace(" ", "")
                if len(text) > 1:
                    data.append(line)
                    sources.append(path2file)
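
The loading step above can also be wrapped in a function so that it is easy to try on a small temporary directory. This is a sketch under stated assumptions: `load_text_lines` is a hypothetical helper name, it uses `pathlib` instead of `os.listdir`, and it visits files in sorted order, which may differ from the order `os.listdir` returns.

```python
from pathlib import Path


def load_text_lines(data_dir):
    """Collect lines from *.txt files, keeping each line's source path alongside it."""
    lines, sources = [], []
    for path in sorted(Path(data_dir).glob("*.txt")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                # Same filter as the cell above: drop lines with at most one
                # character left after removing newlines and spaces.
                if len(line.replace("\n", "").replace(" ", "")) > 1:
                    lines.append(line)
                    sources.append(str(path))
    return lines, sources
```

Calling `load_text_lines("./data/")` should yield the same kind of `data` and `sources` lists as the loop above, with one source entry per kept line.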

In [None]:
documents=[d for d in data if d != '\n']
len(data), len(documents), data[0]

In [None]:
documents[:10]

In [None]:
import time

print("Single Document Embedding: ")
s = time.perf_counter()
q_embedding = embedder.embed_documents([documents[0]])
elapsed = time.perf_counter() - s
print("\033[1m" + f"Executed in {elapsed:0.2f} seconds." + "\033[0m")
# Shape is (number of documents, embedding dimension).
print("Shape:", (len(q_embedding), len(q_embedding[0])))

print("\nBatch Document Embedding: ")
s = time.perf_counter()
d_embeddings = embedder.embed_documents(documents[:10])
elapsed = time.perf_counter() - s
print("\033[1m" + f"Executed in {elapsed:0.2f} seconds." + "\033[0m")
print("Shape:", (len(d_embeddings), len(d_embeddings[0])))

In [None]:
# Create a vector store from the documents and save it to disk.
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.text_splitter import CharacterTextSplitter

# Split each document on spaces into chunks of roughly 400 characters.
text_splitter = CharacterTextSplitter(chunk_size=400, separator=" ")
docs = []
metadatas = []

for i, d in enumerate(documents):
    splits = text_splitter.split_text(d)
    #print(len(splits))
    docs.extend(splits)
    metadatas.extend([{"source": sources[i]}] * len(splits))

store = FAISS.from_texts(docs, embedder, metadatas=metadatas)
store.save_local('./data/nv_embedding')

# You only need to do this once; later runs can restore the saved vector store from disk.
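
To make the chunking step more concrete, here is a simplified sketch of splitting on spaces and pairing every chunk with its source. It is not the `CharacterTextSplitter` implementation: `split_on_spaces` is a hypothetical helper that greedily packs words up to `chunk_size` characters and ignores the chunk overlap that the real splitter also applies. The sample texts and file names are made up.

```python
def split_on_spaces(text: str, chunk_size: int = 400) -> list:
    """Greedily pack space-separated words into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for word in text.split(" "):
        if not word:
            continue  # skip empty pieces produced by repeated spaces
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            # A single word longer than chunk_size still becomes its own chunk.
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Toy inputs standing in for the notebook's documents and sources.
sample_texts = ["aa bb cc dd", "ee"]
sample_sources = ["a.txt", "b.txt"]

sample_chunks, sample_metadatas = [], []
for text, source in zip(sample_texts, sample_sources):
    splits = split_on_spaces(text, chunk_size=5)
    sample_chunks.extend(splits)
    # One metadata dict per chunk, so chunks and metadata stay aligned by index.
    sample_metadatas.extend({"source": source} for _ in splits)

print(sample_chunks)
print(sample_metadatas)
```

The key invariant is that the chunk list and the metadata list have the same length, which is what `FAISS.from_texts` relies on when it is given `metadatas`.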

In [None]:
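# Load the vector store back from disk.
# allow_dangerous_deserialization=True is required because part of the saved index is pickled;
# only load files that you created yourself.

store = FAISS.load_local("./data/nv_embedding", embedder, allow_dangerous_deserialization=True)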
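
Conceptually, querying the loaded store means embedding the question and finding the stored vectors closest to it. The sketch below shows that idea with a brute-force search in plain Python. It is an illustration only: `top_k_by_l2` is a hypothetical helper, the vectors are toy values, and the use of Euclidean distance and `k=4` are assumptions here rather than confirmed settings of the store above. FAISS does the same kind of lookup with optimized index structures.

```python
import math


def top_k_by_l2(query, vectors, k=4):
    """Return the indices of the k vectors closest to query under Euclidean distance."""
    distances = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(distances)[:k]]


# Toy 2-dimensional "embeddings"; real embeddings have many more dimensions.
toy_vectors = [[1.0, 0.0], [0.0, 0.1], [5.0, 5.0]]
print(top_k_by_l2([0.0, 0.0], toy_vectors, k=2))
```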

In [None]:
retriever = store.as_retriever()

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer solely based on the following context:\n<Documents>\n{context}\n</Documents>",
        ),
        ("user", "{question}"),
    ]
)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain.invoke("Tell me about Great Gatsby.")
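
In the chain above, the retriever's output is placed into the `{context}` slot as a list of document objects. A common refinement is to join the retrieved texts into one string first. The sketch below shows such a formatter on stand-in objects: `Doc` is a hypothetical dataclass that only mirrors the `page_content` and `metadata` attributes of a retrieved document, and `format_docs` is an illustrative name, not something defined in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (hypothetical; mirrors page_content and metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join retrieved documents into one context string, tagging each with its source."""
    return "\n\n".join(
        f"[{d.metadata.get('source', 'unknown')}] {d.page_content.strip()}" for d in docs
    )


print(format_docs([Doc("hello\n", {"source": "a.txt"}), Doc("world")]))
```

Assuming the usual LangChain behavior of coercing a plain function when it is piped after a runnable, the chain's first step could then be written as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`.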

In [None]:
chain.invoke("Can you explain the role of Jordan to me?")

In [None]:
chain.invoke("Who is Rich Draves?")