# NVIDIA NIMs
The langchain-nvidia-ai-endpoints package contains LangChain integrations for building applications with models served by the NVIDIA NIM inference microservice. NIM supports models across domains such as chat, embedding, and re-ranking, from the community as well as from NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA accelerated infrastructure and are deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere on NVIDIA accelerated infrastructure with a single command.

- https://python.langchain.com/v0.2/docs/integrations/chat/nvidia_ai_endpoints/
- https://build.nvidia.com/explore/discover

https://pypi.org/project/langchain-nvidia-ai-endpoints/


# Phi-3 Small 128K

Phi-3-Small is a lightweight, state-of-the-art open model built upon the datasets used for Phi-2 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family, and the Small version comes in two variants, 8K and 128K, named for the context length (in tokens) each can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures. This model is ready for commercial and research use.

In [12]:
# ! python -m pip install -r requirements.txt --user --quiet

In [1]:
from dotenv import load_dotenv

load_dotenv(".env")

True

In [2]:
import getpass
import os

## API Key can be found by going to NVIDIA NGC -> AI Foundation Models -> (some model) -> Get API Code or similar.
## 10K free queries to any endpoint (which is a lot actually).

# del os.environ['NVIDIA_API_KEY']  ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

Valid NVIDIA_API_KEY already in environment. Delete to reset


In [3]:
nvapi_key = os.getenv("NVIDIA_API_KEY")


In [42]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
llm = ChatNVIDIA(model="meta/llama3-70b-instruct", max_tokens=419)
[model.id for model in llm.available_models if model.model_type]

['01-ai/yi-large',
 'liuhaotian/llava-v1.6-mistral-7b',
 'microsoft/kosmos-2',
 'microsoft/phi-3-vision-128k-instruct',
 'google/recurrentgemma-2b',
 'google/codegemma-7b',
 'writer/palmyra-med-70b-32k',
 'writer/palmyra-med-70b',
 'microsoft/phi-3-small-128k-instruct',
 'google/deplot',
 'ibm/granite-34b-code-instruct',
 'nv-mistralai/mistral-nemo-12b-instruct',
 'seallms/seallm-7b-v2.5',
 'meta/llama3-70b-instruct',
 'mediatek/breeze-7b-instruct',
 'deepseek-ai/deepseek-coder-6.7b-instruct',
 'meta/llama-3.1-8b-instruct',
 'nvidia/llama3-chatqa-1.5-8b',
 'meta/codellama-70b',
 'mistralai/mixtral-8x22b-instruct-v0.1',
 'meta/llama2-70b',
 'upstage/solar-10.7b-instruct',
 'mistralai/mixtral-8x7b-instruct-v0.1',
 'google/gemma-2-27b-it',
 'databricks/dbrx-instruct',
 'aisingapore/sea-lion-7b-instruct',
 'meta/llama3-8b-instruct',
 'microsoft/phi-3-medium-128k-instruct',
 'google/gemma-2-9b-it',
 'meta/llama-3.1-70b-instruct',
 'snowflake/arctic',
 'ibm/granite-8b-code-instruct',
 'googl

In [6]:
# Test run: check that you can generate a response successfully.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="microsoft/phi-3-small-128k-instruct", nvidia_api_key=nvapi_key, max_tokens=1024)

result = llm.invoke("Write a ballad about LangChain.")
print(result.content)

 (Verse 1)
  In the realm of code and thought,
  A chain of Lang was wrought,
  With links of language, strong and tight,
  It rose to wondrous height.

  A bard of bytes, a weaver of words,
  Its tapestry of text unfurled,
  From the depths of data's sea,
  LangChain emerged to be.

  (Chorus)
  Oh, sing the song of LangChain's might,
  A beacon in the digital night,
  Where algorithms dance and play,
  And knowledge blooms in endless array.

  (Verse 2)
  With each link forged in logic's fire,
  It grew in wisdom, higher and higher,
  A symphony of syntax spun,
  A masterpiece of the mind's work done.

  It spoke in tongues of old and new,
  A polyglot of every hue,
  And through the labyrinth of language,
  It found the path to truth's true range.

  (Chorus)
  Oh, sing the song of LangChain's might,
  A beacon in the digital night,
  Where algorithms dance and play,
  And knowledge blooms in endless array.

  (Verse 3)
  The bards of bytes, they tell the tale,
  Of how LangChain se

In [24]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-mistral-7b-v2", model_type="passage")

# Alternatively, use a different embedding model, referenced here by its alias:
# embedder = NVIDIAEmbeddings(model="ai-embed-qa-4", model_type="passage")

In [44]:
embedder.available_models

[Model(id='NV-Embed-QA', model_type='embedding', client='NVIDIAEmbeddings', endpoint='https://ai.api.nvidia.com/v1/retrieval/nvidia/embeddings', aliases=['ai-embed-qa-4', 'playground_nvolveqa_40k', 'nvolveqa_40k'], supports_tools=False, base_model=None),
 Model(id='nvidia/nv-embedqa-mistral-7b-v2', model_type='embedding', client='NVIDIAEmbeddings', endpoint=None, aliases=None, supports_tools=False, base_model=None),
 Model(id='nvidia/nv-embed-v1', model_type='embedding', client='NVIDIAEmbeddings', endpoint=None, aliases=['ai-nv-embed-v1'], supports_tools=False, base_model=None),
 Model(id='snowflake/arctic-embed-l', model_type='embedding', client='NVIDIAEmbeddings', endpoint=None, aliases=['ai-arctic-embed-l'], supports_tools=False, base_model=None),
 Model(id='nvidia/nv-embedqa-e5-v5', model_type='embedding', client='NVIDIAEmbeddings', endpoint=None, aliases=None, supports_tools=False, base_model=None)]

In [31]:
import os

# Read every .txt file in ./data/ and collect its non-blank lines,
# remembering which file each line came from.
ps = os.listdir("./data/")
data = []
sources = []
for p in ps:
    if p.endswith(".txt"):
        path2file = "./data/" + p
        print(path2file)
        with open(path2file, encoding="utf-8") as f:
            for line in f:
                # Skip lines with at most one character left once newlines and spaces are removed.
                text = line.replace("\n", "").replace(" ", "")
                if len(text) > 1:
                    data.append(line)
                    sources.append(path2file)

./data/book.txt
./data/worked.txt
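
The loading step can also be wrapped in a small function so it is easy to try on other folders. The following is a minimal sketch, not the notebook's own code: `load_lines` is a hypothetical helper name, it sorts the files for a stable order, and it stores each source as a `pathlib` path string rather than the `"./data/" + p` form used in the cell. The demo writes to a temporary folder so it runs without the real data.

```python
import tempfile
from pathlib import Path


def load_lines(data_dir: str) -> tuple[list[str], list[str]]:
    """Return (lines, sources) for all non-blank lines of the .txt files in data_dir."""
    lines, sources = [], []
    for path in sorted(Path(data_dir).glob("*.txt")):
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Same filter as the cell: keep a line only if more than one
                # character remains after removing newlines and spaces.
                if len(line.replace("\n", "").replace(" ", "")) > 1:
                    lines.append(line)
                    sources.append(str(path))
    return lines, sources


# Demo on throwaway files (assumed sample content, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample.txt").write_text("First line\n\n  \nSecond line\n", encoding="utf-8")
    Path(tmp, "ignored.md").write_text("not read\n", encoding="utf-8")
    demo_lines, demo_sources = load_lines(tmp)

print(demo_lines)
```

Keeping `lines` and `sources` the same length is what later lets each chunk be tagged with the file it came from.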


In [32]:
documents = [d for d in data if d != '\n']
len(data), len(documents), data[0]

(6138, 6138, 'Project Gutenberg eBook of The Great Gatsby\n')

In [33]:
documents[:10]

['Project Gutenberg eBook of The Great Gatsby\n',
 'This ebook is for the use of anyone anywhere in the United States and\n',
 'most other parts of the world at no cost and with almost no restrictions\n',
 'whatsoever. You may copy it, give it away or re-use it under the terms\n',
 'of the Project Gutenberg License included with this ebook or online\n',
 'at www.gutenberg.org. If you are not located in the United States,\n',
 'you will have to check the laws of the country where you are located\n',
 'before using this eBook.\n',
 'Title: The Great Gatsby\n',
 'Author: F. Scott Fitzgerald\n']

In [34]:
import time

print("Single Document Embedding: ")
s = time.perf_counter()
q_embedding  = embedder.embed_documents([documents[0]])
elapsed = time.perf_counter() - s
print("\033[1m" + f"Executed in {elapsed:0.2f} seconds." + "\033[0m")
print("Shape:", (len(q_embedding),))

print("\nBatch Document Embedding: ")
s = time.perf_counter()
d_embeddings = embedder.embed_documents(documents[:10])
elapsed = time.perf_counter() - s
print("\033[1m" + f"Executed in {elapsed:0.2f} seconds." + "\033[0m")
print("Shape:",len(d_embeddings[0]))

Single Document Embedding: 
Executed in 2.07 seconds.
Shape: (1,)

Batch Document Embedding: 
Executed in 4.69 seconds.
Shape: 4096


In [35]:
# Here we create a vector store from the documents and save it to disk.
from operator import itemgetter
from langchain.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.text_splitter import CharacterTextSplitter
from langchain_nvidia_ai_endpoints import ChatNVIDIA
import faiss
# Split each line into chunks of at most 400 characters, breaking on spaces.
text_splitter = CharacterTextSplitter(chunk_size=400, separator=" ")
docs = []
metadatas = []

for i, d in enumerate(documents):
    splits = text_splitter.split_text(d)
    #print(len(splits))
    docs.extend(splits)
    metadatas.extend([{"source": sources[i]}] * len(splits))

store = FAISS.from_texts(docs, embedder, metadatas=metadatas)
store.save_local('./data/nv_embedding')

# This only needs to be done once; later runs can restore the saved vector store from disk.
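
The cell above relies on `docs` and `metadatas` staying aligned: entry `k` of `metadatas` must describe chunk `k` of `docs`. The sketch below illustrates that bookkeeping in plain Python. `split_on_spaces` is a hypothetical greedy splitter written for this example and is not claimed to match `CharacterTextSplitter` exactly. It also creates one metadata dict per chunk, whereas `[{"source": sources[i]}] * len(splits)` repeats a reference to a single dict.

```python
def split_on_spaces(text: str, chunk_size: int) -> list[str]:
    """Greedily pack space-separated words into chunks of at most chunk_size
    characters (a single word longer than chunk_size becomes its own chunk)."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def chunk_with_sources(lines, sources, chunk_size=400):
    """Return (chunks, metadatas) where metadatas[k] describes chunks[k]."""
    chunks, metadatas = [], []
    for line, source in zip(lines, sources):
        for piece in split_on_spaces(line, chunk_size):
            chunks.append(piece)
            metadatas.append({"source": source})  # a fresh dict per chunk
    return chunks, metadatas


# Toy input with a tiny chunk size so the splitting is visible.
demo_chunks, demo_meta = chunk_with_sources(
    ["a bb ccc dddd", "short"], ["one.txt", "two.txt"], chunk_size=4
)
print(list(zip(demo_chunks, demo_meta)))
```

Separate dicts matter only if metadata is modified later; for read-only use the two approaches behave the same.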

In [37]:
# Load the vector store back from disk.

store = FAISS.load_local("./data/nv_embedding", embedder, allow_dangerous_deserialization=True)
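
Conceptually, querying the store means embedding the question and returning the stored chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force search over toy 2-dimensional vectors. It assumes Euclidean (L2) distance and `k=4` results, which are assumed here to match the defaults of the FAISS store and its retriever; the function names are hypothetical and this is not how FAISS is implemented internally.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec, nearest first."""
    ranked = sorted(
        enumerate(doc_vecs),
        key=lambda pair: l2_distance(query_vec, pair[1]),
    )
    return [index for index, _ in ranked[:k]]


# Toy vectors standing in for the 4096-dimensional embeddings.
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
nearest = top_k([0.9, 0.9], toy_vectors, k=2)
print(nearest)
```

A real index avoids recomputing every distance in Python, but the ranking it returns answers the same question.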

In [38]:
retriever = store.as_retriever()

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer solely based on the following context:\n<Documents>\n{context}\n</Documents>",
        ),
        ("user", "{question}"),
    ]
)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain.invoke("Tell me about Great Gatsby.")

' "The Great Gatsby" is a novel written by F. Scott Fitzgerald. The book explores themes of decadence, idealism, resistance to change, and social upheaval. It is set in the summer of 1922 and follows the mysterious millionaire Jay Gatsby as he pursues his lost love, Daisy Buchanan, who is now married to the wealthy but unfaithful Tom Buchanan.\n\n  In the provided context, there are four documents related to "The Great Gatsby." The first and second documents mention the title of the book. The third document contains a quote from the book, where a character corrects someone by saying, "Not Gatsby," and the fourth document also contains a quote from the book, where a character mentions that someone told them about Gatsby.\n\n  Overall, "The Great Gatsby" is considered a classic of American literature and is widely studied in schools and universities. The book has been adapted into several films and stage productions, and its themes and characters continue to resonate with readers today.'
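
In the chain above, `{context}` is filled with whatever the retriever returns, so the prompt likely receives the string form of the retrieved documents, metadata included. That may be why some answers talk about "documents" and their sources. A common remedy is to join only the text of each document before it reaches the prompt. The sketch below shows the idea with a hypothetical stand-in class `Doc`, which mimics the `page_content` and `metadata` attributes of a retrieved document so the example runs without any API calls.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join only the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    Doc("Title: The Great Gatsby", {"source": "./data/book.txt"}),
    Doc("Author: F. Scott Fitzgerald", {"source": "./data/book.txt"}),
]
context = format_docs(retrieved)
print(context)
```

With the real objects, the same function could be placed after the retriever, for example `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, since LangChain wraps plain functions used in a pipe as runnables.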

In [39]:
chain.invoke("Can you explain me the role of Jordan?")

" Based on the provided context, it appears that Jordan is a character in a story or book. From the snippets of text, we can gather some information about Jordan's interactions with another character. Here's a summary of the context:\n\n1. Jordan smiled.\n2. Another character called out to Jordan, asking them to come closer.\n3. The text cuts off, but it seems that the other character was addressing Jordan and possibly their husband.\n\nFrom this limited context, we can infer that Jordan is likely a significant character in the story, as they are being called upon by another character. However, without more information, it is difficult to determine the exact role of Jordan in the narrative."

In [40]:
chain.invoke("Who is Rich Draves?")

' Rich Draves is a friend of the narrator who, in 1988, granted permission to the narrator and another person to use something, although the specific nature of what they were allowed to use is not mentioned in the provided context.\n\n  The context is taken from two documents. The first document, with the source \'./data/worked.txt\', mentions that the narrator visited Rich Draves in a dissatisfied state in 1988 and that they got permission to use something. The second document, with the source \'./data/book.txt\', contains two lines of dialogue: "What is?" and "I\'ve heard of it." The connection between these documents and Rich Draves is not explicitly stated in the provided context.'