# Azure AI Search with NVIDIA NIM and LlamaIndex Integration

In this notebook, we show how to use NVIDIA's AI models and LlamaIndex to build a Retrieval-Augmented Generation (RAG) system. We will use NVIDIA's LLMs and embeddings, integrate them with Azure AI Search as the vector store, and run RAG to improve search quality and efficiency.

## Benefits
- **Scalability**: Use NVIDIA's large language models and Azure AI Search for scalable, efficient retrieval.
- **Cost efficiency**: Optimize search and retrieval with an efficient vector store and hybrid search techniques.
- **High performance**: Combine powerful LLMs with vector search for faster and more accurate answers.
- **Quality**: Maintain high search quality by grounding LLM answers in relevant retrieved documents.

## Prerequisites
- 🐍 Python 3.9 or later
- 🔗 [Azure AI Search Service](https://learn.microsoft.com/azure/search/)
- 🔗 NVIDIA API key for access to NVIDIA's LLMs and embeddings through NVIDIA NIM microservices

## Features Covered
- ✅ NVIDIA LLM integration (we will use [Phi-3.5-MOE](https://build.nvidia.com/microsoft/phi-3_5-moe))
- ✅ NVIDIA embeddings (we will use [nv-embedqa-e5-v5](https://build.nvidia.com/nvidia/nv-embedqa-e5-v5))
- ✅ Advanced retrieval modes in Azure AI Search
- ✅ Document indexing with LlamaIndex
- ✅ RAG using Azure AI Search and LlamaIndex with NVIDIA LLMs

Let's get started!


In [None]:
!pip install azure-search-documents==11.5.1
!pip install --upgrade llama-index
!pip install --upgrade llama-index-core
!pip install --upgrade llama-index-readers-file
!pip install --upgrade llama-index-llms-nvidia
!pip install --upgrade llama-index-embeddings-nvidia
!pip install --upgrade llama-index-postprocessor-nvidia-rerank
!pip install --upgrade llama-index-vector-stores-azureaisearch
!pip install python-dotenv

## Installation and Requirements
Create a Python environment using Python version >3.10.

## Getting Started!


To get started, you need an `NVIDIA_API_KEY` to use the NVIDIA AI Foundation models:  
1) Create a free account with [NVIDIA](https://build.nvidia.com/explore/discover).  
2) Click on the model you want to use.  
3) Under Input, select the Python tab, then click **Get API Key** and then **Generate Key**.  
4) Copy the generated key and save it as `NVIDIA_API_KEY`. You should then have access to the endpoints.  


In [3]:
import getpass
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key
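
The same kind of check can be applied to the other settings this notebook reads later. The following is a minimal sketch, assuming the three environment variable names used in this notebook (`NVIDIA_API_KEY`, `AZURE_SEARCH_ADMIN_KEY`, `AZURE_SEARCH_SERVICE_ENDPOINT`). The helper `missing_settings` is a hypothetical name introduced here for illustration and is not part of any library.

```python
import os

# Names of the environment variables this notebook reads.
REQUIRED_SETTINGS = (
    "NVIDIA_API_KEY",
    "AZURE_SEARCH_ADMIN_KEY",
    "AZURE_SEARCH_SERVICE_ENDPOINT",
)


def missing_settings(env=None):
    """Return the names of required settings that are unset or blank.

    Hypothetical helper: pass a dict to check it instead of os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


# Report which values the notebook will have to prompt for.
for name in missing_settings():
    print(f"{name} is not set; the notebook will prompt for it.")
```

Running this before the cells below makes it clear up front which values will be requested interactively through `getpass`.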


## RAG Example Using an LLM and Embeddings
### 1) Initialize the LLM
`llama-index-llms-nvidia`, also known as NVIDIA's LLM connector, lets you connect to and generate from compatible models available in the NVIDIA API catalog. See here for a list of chat completion models: https://build.nvidia.com/search?term=Text-to-Text

Here we will use **microsoft/phi-3.5-moe-instruct**.


In [75]:
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA

# Here we are using the microsoft/phi-3.5-moe-instruct model from the API Catalog
Settings.llm = NVIDIA(model="microsoft/phi-3.5-moe-instruct", api_key=os.getenv("NVIDIA_API_KEY"))

### 2) Initialize the Embedding Model
`llama-index-embeddings-nvidia`, also known as NVIDIA's embeddings connector, lets you connect to compatible embedding models available in the NVIDIA API catalog and generate embeddings with them. We selected `nvidia/nv-embedqa-e5-v5` as the embedding model. You can find a list of text embedding models here: https://build.nvidia.com/nim?filters=usecase%3Ausecase_text_to_embedding%2Cusecase%3Ausecase_image_to_embedding


In [6]:
from llama_index.embeddings.nvidia import NVIDIAEmbedding

Settings.embed_model = NVIDIAEmbedding(model="nvidia/nv-embedqa-e5-v5", api_key=os.getenv("NVIDIA_API_KEY"))
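
An embedding model maps text to a fixed-length vector, and vector search ranks documents by how close their vectors are to the query vector. As a minimal sketch of that comparison, the following computes cosine similarity in plain Python. Cosine similarity is one common metric, and the metric an index actually uses depends on its configuration. The three-dimensional vectors are made-up toy values (the vector store below is configured for 1024 dimensions), and `cosine_similarity` is an illustrative helper, not part of LlamaIndex or the NVIDIA connector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (hypothetical values).
query_vec = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # same direction as the query
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query_vec, doc_close))
print(cosine_similarity(query_vec, doc_far))
```

A vector pointing in the same direction as the query scores close to 1 regardless of its length, while an orthogonal vector scores 0.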

### 3) Create an Azure AI Search Vector Store


In [76]:
import logging
import sys
import os
import getpass
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from IPython.display import Markdown, display
from llama_index.vector_stores.azureaisearch import AzureAISearchVectorStore, IndexManagement


search_service_api_key = os.getenv('AZURE_SEARCH_ADMIN_KEY') or getpass.getpass('Enter your Azure Search API key: ')
search_service_endpoint = os.getenv('AZURE_SEARCH_SERVICE_ENDPOINT') or getpass.getpass('Enter your Azure Search service endpoint: ')
search_service_api_version = "2024-07-01"
credential = AzureKeyCredential(search_service_api_key)

# Index name to use
index_name = "llamaindex-nvidia-azureaisearch-demo"

# Use index client to demonstrate creating an index
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)

# Use search client to demonstrate using existing index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)

In [None]:
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1024, # dimensionality for nv-embedqa-e5-v5 model
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
    # compression_type="binary" # Option to use "scalar" or "binary". NOTE: compression is only supported for HNSW
)
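
The `vector_algorithm_type="exhaustiveKnn"` setting above selects brute-force nearest-neighbor search: the query vector is scored against every stored vector. This is exact but scales linearly with the number of vectors, whereas HNSW uses a graph index to trade some recall for speed. Below is a minimal sketch of the exhaustive approach, using toy vectors and a hypothetical `exhaustive_knn` helper. It illustrates the idea only and does not describe how Azure AI Search is implemented internally.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exhaustive_knn(query, vectors, k):
    """Score every stored vector against the query and return the top k.

    `vectors` maps a document id to its vector. Scores use the dot product,
    which equals cosine similarity when all vectors are unit length.
    """
    scored = ((dot(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Toy unit-length vectors (hypothetical data).
store = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.6, 0.8],
}
top = exhaustive_knn([1.0, 0.0], store, k=2)
print(top)
```

Because every vector is visited, the result is the true top `k`; the cost is one score computation per stored vector per query.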

### 4) Load, Chunk, and Index Documents


In [20]:
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.text_splitter import TokenTextSplitter

# Configure the text splitter (the nv-embedqa-e5-v5 model accepts at most 512 tokens per input)
text_splitter = TokenTextSplitter(separator=" ", chunk_size=500, chunk_overlap=10)

# Load documents
documents = SimpleDirectoryReader(
    input_files=["data/txt/state_of_the_union.txt"]
).load_data()
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index with text splitter
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    storage_context=storage_context,
)
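
The splitter above cuts each document into windows of at most 500 tokens, with 10 tokens shared between neighboring chunks so that text crossing a boundary appears in both. The sliding-window idea can be sketched as follows. This is a simplified illustration that treats whitespace-separated words as tokens, which is an assumption: `TokenTextSplitter` counts tokens with a real tokenizer, so its chunk boundaries will differ. `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of `chunk_size` that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Twelve placeholder "tokens", small sizes so the overlap is easy to see.
words = [f"w{i}" for i in range(12)]
chunks = chunk_tokens(words, chunk_size=5, chunk_overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

Each chunk starts `chunk_size - chunk_overlap` tokens after the previous one, so the last `chunk_overlap` tokens of a chunk are repeated at the start of the next.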

### 5) Create a Query Engine to Ask Questions About Your Data

Here is a query that uses pure vector search in Azure AI Search and has our LLM (Phi-3.5-MOE) generate an answer grounded in the retrieved documents.


In [69]:
query_engine = index.as_query_engine()
response = query_engine.query("Who did the speaker mention as being present in the chamber?")
display(Markdown(f"{response}"))

 The speaker mentioned the Ukrainian Ambassador to the United States, along with other members of Congress, the Cabinet, and various officials such as the Vice President, the First Lady, and the Second Gentleman, as being present in the chamber.

Here is a query that uses hybrid search in Azure AI Search.


In [70]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from IPython.display import Markdown, display
from llama_index.core.schema import MetadataMode

# Initialize hybrid retriever and query engine
hybrid_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.HYBRID)
hybrid_query_engine = RetrieverQueryEngine(retriever=hybrid_retriever)

# Query execution
query = "What were the exact economic consequences mentioned in relation to Russia's stock market?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))

 The Russian stock market experienced a significant drop, losing 40% of its value. Additionally, trading had to be suspended due to the ongoing situation.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. 

I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.  

We countered Russia’s lies with truth.   

And now that he has acted the free world is holding him accountable. 

Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. 

We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. 

Together with our allies –we are right now enforcing powerful economic sanctions. 

We are cutting off Russia‚Äôs largest banks from the international financial syste

#### Hybrid Search: Analysis of the LLM Response
The LLM's answer accurately conveys the key economic consequences mentioned in the source text concerning Russia's stock market. Specifically, it states that the Russian stock market dropped significantly, losing 40% of its value, and that trading was suspended because of the ongoing situation. This matches the information in the source, indicating that the LLM correctly identified and summarized the relevant details about the stock market impact of Russia's actions and the sanctions imposed.

#### Source Nodes Commentary
The source nodes give a detailed account of the economic consequences Russia faced because of international sanctions. The text highlights that the Russian stock market lost 40% of its value and that trading was suspended. It also mentions other economic consequences, such as the devaluation of the ruble and the broader isolation of Russia's economy. The LLM's answer summarizes the key points from these nodes, focusing on the stock market impact as the query requested.
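
Hybrid mode runs a keyword (full-text) query and a vector query side by side and merges the two ranked lists. Azure AI Search documents Reciprocal Rank Fusion (RRF) as its merging method. The following is a minimal sketch of RRF itself, not of the service's internals. The document ids are made up, `reciprocal_rank_fusion` is a hypothetical helper, and the constant `k=60` is a value commonly used with RRF, taken here as an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list.

    Each id receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); ids are returned by descending total score,
    with ties broken alphabetically for a deterministic result.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc3", "doc1", "doc7"]  # hypothetical full-text ranking
vector_hits = ["doc1", "doc5", "doc3"]   # hypothetical vector ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists accumulate two contributions and rise to the top, which is why hybrid search can surface passages that either method alone would rank lower.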


Now let's look at a query where hybrid search does not produce a well-grounded answer:


In [71]:
# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it does mention that the events discussed are happening in the current era and that the actions taken are in response to Putin's aggression. For the precise date, one would need to refer to external sources or historical records.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.  

Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west.  

For that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. 

As I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.  

And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  

Putin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. 

And a proud Ukrainian people, who have known 30 years  of indepen

### Hybrid Search: Analysis of the LLM Response
The LLM's answer in the hybrid search example indicates that the provided context does not state the exact date of Russia's invasion of Ukraine. This suggests that the LLM relies on the information available in the source documents while acknowledging that the text lacks precise details.

The answer is correct in recognizing that the context mentions events related to Russia's aggression but does not give a specific invasion date. This shows the LLM's ability to understand the provided information while recognizing gaps in the content. The LLM directs the user to external sources or historical records for the exact date, showing a degree of caution when information is incomplete.

### Source Nodes Analysis
The source nodes in the hybrid search example contain excerpts from the speech discussing the U.S. response to Russia's actions in Ukraine. These nodes emphasize the broader geopolitical impact and the steps the U.S. and its allies took in response to the invasion, but they do not mention the exact invasion date. This is consistent with the LLM's answer, which correctly recognizes that the context lacks information about the precise date.


In [72]:
# Initialize semantic hybrid retriever (hybrid search plus semantic reranking) and query engine
semantic_reranker_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID)
semantic_reranker_query_engine = RetrieverQueryEngine(retriever=semantic_reranker_retriever)

# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = semantic_reranker_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it mentions that the event occurred six days before the speech was given. To determine the precise date, one would need to know the date of the speech.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world

### Hybrid with Reranking: Analysis of the LLM Response
In the hybrid with reranking example, the LLM's answer provides additional context by noting that the event occurred six days before the speech was given. This shows that the LLM can infer the invasion date from the timing of the speech, although it still needs the exact date of the speech to be precise.

This answer demonstrates an improved ability to use contextual clues to give a more informative response. It highlights the advantage of reranking: more relevant passages are prioritized in the retrieved context, so the LLM can give a closer approximation of the requested detail (here, the invasion date).

### Source Nodes Analysis
The source nodes in this example include references to the timing of Russia's invasion, specifically that it took place six days before the speech. Although the exact date is still not stated explicitly, the nodes provide temporal context that lets the LLM give a more nuanced answer. This shows how reranking can improve the LLM's ability to extract and infer information from the provided context, resulting in a more accurate and informative answer.
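
The semantic hybrid mode adds a second stage: the candidates returned by hybrid retrieval are re-scored by a reranker and reordered before they reach the LLM. The two-stage pattern can be sketched as below. The scorer here is a deliberately crude word-overlap count standing in for a learned reranking model, and all names and passages are hypothetical. Azure AI Search's semantic ranker uses language models, not this heuristic.

```python
def overlap_score(query, passage):
    """Toy relevance score: number of distinct query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words)


def rerank(query, candidates, score_fn, top_n):
    """Re-order first-stage candidates by `score_fn` and keep the best `top_n`."""
    # sorted() is stable, so tied candidates keep their first-stage order.
    ranked = sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)
    return ranked[:top_n]


# Hypothetical first-stage results, in the order the retriever returned them.
first_stage = [
    "our forces will defend nato allies",
    "six days ago russia invaded ukraine",
    "sanctions hit the russia stock market",
]
best = rerank("when did russia invade ukraine", first_stage, overlap_score, top_n=2)
print(best)
```

The first stage optimizes for recall over the whole index, while the second stage spends more effort on a small candidate set, which is how a passage such as the "six days ago" excerpt can move to the top.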


**Note:**
In this notebook we used NVIDIA NIM microservices from the NVIDIA API catalog. The APIs used above are `NVIDIA` (LLMs), `NVIDIAEmbedding`, and [Azure AI Search Semantic Hybrid Retrieval (built-in reranking)](https://learn.microsoft.com/azure/search/semantic-search-overview). The NVIDIA connectors also support self-hosted NIM microservices.

**Example:**
```python
NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
```



---

**Disclaimer**:  
This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
