# NVIDIA AI Enterprise 

NVIDIA AI Enterprise is an end-to-end, cloud-native software platform that accelerates data science pipelines and streamlines development and deployment of production-grade co-pilots and other generative AI applications.  Easy-to-use microservices provide optimized model performance with enterprise-grade security, support, and stability to ensure a smooth transition from prototype to production for enterprises that run their businesses on AI.

# NVIDIA NIM

NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of accelerated inference microservices that allow organizations to run AI models on NVIDIA GPUs anywhere—in the cloud, data center, workstations, and PCs. Using industry-standard APIs, developers can deploy AI models with NIM using just a few lines of code. NIM containers seamlessly integrate with the Kubernetes (K8s) ecosystem, allowing efficient orchestration and management of containerized AI applications. Accelerate the development of your AI applications today with NIM.

https://docs.api.nvidia.com/nim/reference/llm-overview

https://venturebeat.com/ai/meta-unleashes-its-most-powerful-ai-model-llama-3-1-with-405b-parameters/


Llama 3.1 is multilingual and supports English, Portuguese, Spanish, Italian, German, French, Hindi, and Thai prompts. The smaller Llama 3 models also become multilingual with this release.

Llama 3.1’s context window has been expanded to 128,000 tokens, which means users can feed it about as much text as a nearly 400-page novel.
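The "nearly 400 pages" figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes roughly 0.75 English words per token and 250 words per printed page; both ratios are common rules of thumb, not numbers from the article, and real tokenizers and page layouts vary.

```python
# Rough page estimate for a 128,000-token context window.
# ASSUMPTIONS (not from the source): ~0.75 English words per token
# and ~250 words per printed page.
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 250

estimated_words = CONTEXT_TOKENS * WORDS_PER_TOKEN
estimated_pages = estimated_words / WORDS_PER_PAGE

print(f"{estimated_words:,.0f} words, about {estimated_pages:.0f} pages")
```

Under these assumptions the estimate lands just under 400 pages, which is consistent with the claim above.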

In [None]:
import os

from IPython.display import Image

# Display coT.png from the current working directory.
PATH = os.getcwd()
Image(filename=os.path.join(PATH, "coT.png"), width=600, height=800)

In [None]:
from dotenv import load_dotenv

load_dotenv(".env")

In [None]:
import os

from openai import OpenAI

# Read the API key loaded from .env and point the OpenAI client at the NVIDIA endpoint.
nvapi_key = os.getenv("NVIDIA_API_KEY")
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=nvapi_key,
)
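If `NVIDIA_API_KEY` is missing from `.env`, `os.getenv` returns `None` and the failure only surfaces later, when the client is created or the first request is sent. One possible guard is sketched below; `require_env` is a hypothetical helper introduced for illustration, not part of `python-dotenv` or the `openai` package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    Hypothetical helper for this notebook, not a library function.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; add it to .env")
    return value
```

It could be used as `nvapi_key = require_env("NVIDIA_API_KEY")` before constructing the client.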


In [None]:
Prompt = """Read the instruction below and provide an answer.
Always provide an answer. If you don't know how to calculate something, respond "I don't know".

INSTRUCTION:
Question:
I bought a car 20 years ago, and its cost was $100,000.
The car's annual depreciation is 5%.
Using the Percentage (Declining Balance) method, what is the value of the car now?

### RESPONSE:
"""
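To judge the model answers that follow, it helps to have a reference value. The sketch below assumes the usual reading of the declining balance method, where the value shrinks by the same percentage every year: value = cost × (1 − rate)^years. The variable names are illustrative.

```python
# Reference answer for the prompt, assuming a constant 5% declining balance:
# value = cost * (1 - rate) ** years
initial_cost = 100_000   # purchase price in dollars
annual_rate = 0.05       # 5% depreciation per year
years = 20

current_value = initial_cost * (1 - annual_rate) ** years

print(f"Value after {years} years: ${current_value:,.2f}")
```

With these inputs the car is worth about $35,848.59, so a response far from that figure points to an arithmetic or method error.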

# meta/llama-3.1-405b-instruct

https://docs.api.nvidia.com/nim/reference/meta-llama-3_1-405b

In [None]:
completion = client.chat.completions.create(
  model="meta/llama-3.1-405b-instruct",
  messages=[{"role":"user","content":Prompt}],
  temperature=0.5,
  top_p=0.7,
  max_tokens=2048,
  stream=True
)

for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")

# microsoft/phi-3-mini-4k-instruct

https://docs.api.nvidia.com/nim/reference/microsoft-phi-3-mini-4k

In [None]:
completion = client.chat.completions.create(
  model="microsoft/phi-3-mini-4k-instruct",
  messages=[{"role":"user","content":Prompt}],
  temperature=0.5,
  top_p=0.7,
  max_tokens=1024,
  stream=True
)

for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")

# mistralai/mistral-7b-instruct-v0.3

https://docs.api.nvidia.com/nim/reference/mistralai-mistral-7b-instruct-v03

In [None]:
completion = client.chat.completions.create(
  model="mistralai/mistral-7b-instruct-v0.3",
  messages=[{"role":"user","content":Prompt}],
  temperature=0.5,
  top_p=0.7,
  max_tokens=1024,
  stream=True
)

for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")

# google/gemma-2-9b-it

https://docs.api.nvidia.com/nim/reference/google-gemma-2-9b-it

In [None]:
completion = client.chat.completions.create(
  model="google/gemma-2-9b-it",
  messages=[{"role":"user","content":Prompt}],
  temperature=0.2,
  top_p=0.7,
  max_tokens=1024,
  stream=True
)

for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")
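Every cell above repeats the same streaming loop and only prints the text, which makes it awkward to compare answers across models afterwards. A minimal sketch of a helper that gathers the streamed pieces into one string is shown below; `collect_stream` is a hypothetical name, and the only assumption about the chunks is the shape used above (`choices[0].delta.content`).

```python
def collect_stream(chunks):
    """Join the text deltas of a streamed chat completion into one string.

    Hypothetical helper: expects chunks shaped like the ones iterated above.
    """
    parts = []
    for chunk in chunks:
        # Skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        # The delta content can be None, for example on the final chunk.
        if text is not None:
            parts.append(text)
    return "".join(parts)
```

For example, `answer = collect_stream(completion)` would replace the `for` loop in any of the cells above and leave the full response in `answer` for later comparison.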