# Azure AI Search with NVIDIA NIM and LlamaIndex Integration

In this notebook, we show how to use NVIDIA's AI models and LlamaIndex to build a retrieval-augmented generation (RAG) pipeline. We will use NVIDIA's large language models (LLMs) and embeddings, integrate them with Azure AI Search as the vector store, and run RAG to improve search quality and efficiency.

## Benefits
- **Scalability**: Use NVIDIA's large language models and Azure AI Search for scalable and efficient retrieval.
- **Cost efficiency**: Optimize search and retrieval with efficient vector storage and hybrid search techniques.
- **High performance**: Combine powerful LLMs with vectorized search for faster and more accurate responses.
- **Quality**: Maintain high search quality by grounding LLM responses in relevant retrieved documents.

## Prerequisites
- 🐍 Python 3.9 or higher
- 🔗 [Azure AI Search Service](https://learn.microsoft.com/azure/search/)
- 🔗 An NVIDIA API key for access to NVIDIA's LLMs and embeddings through NVIDIA NIM microservices

## Features Covered
- ✅ NVIDIA LLM integration (we will use [Phi-3.5-MOE](https://build.nvidia.com/microsoft/phi-3_5-moe))
- ✅ NVIDIA embeddings (we will use [nv-embedqa-e5-v5](https://build.nvidia.com/nvidia/nv-embedqa-e5-v5))
- ✅ Advanced retrieval modes in Azure AI Search
- ✅ Document indexing with LlamaIndex
- ✅ RAG using Azure AI Search and LlamaIndex with NVIDIA LLMs

Let's get started!


In [None]:
!pip install azure-search-documents==11.5.1
!pip install --upgrade llama-index
!pip install --upgrade llama-index-core
!pip install --upgrade llama-index-readers-file
!pip install --upgrade llama-index-llms-nvidia
!pip install --upgrade llama-index-embeddings-nvidia
!pip install --upgrade llama-index-postprocessor-nvidia-rerank
!pip install --upgrade llama-index-vector-stores-azureaisearch
!pip install python-dotenv

## Installation and Requirements
Create a Python environment using a Python version >3.10.

## Getting Started!


To get started, you need an `NVIDIA_API_KEY` to use the NVIDIA AI Foundation models:
1) Create a free account with [NVIDIA](https://build.nvidia.com/explore/discover).
2) Click on the model you want to use.
3) Under Input, select the Python tab, click **Get API Key**, and then click **Generate Key**.
4) Copy the generated key and save it as `NVIDIA_API_KEY`. From then on, you should have access to the endpoints.


In [3]:
import getpass
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key


## RAG Example Using an LLM and Embeddings
### 1) Initialize the LLM
`llama-index-llms-nvidia`, also known as the NVIDIA LLM connector, lets you connect to and generate from compatible models available in the NVIDIA API catalog. See the list of chat completion models here: https://build.nvidia.com/search?term=Text-to-Text

Here we will use **microsoft/phi-3.5-moe-instruct**.


In [75]:
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA

# Here we are using the microsoft/phi-3.5-moe-instruct model from the API Catalog
Settings.llm = NVIDIA(model="microsoft/phi-3.5-moe-instruct", api_key=os.getenv("NVIDIA_API_KEY"))

### 2) Initialize the Embedding
`llama-index-embeddings-nvidia`, also known as the NVIDIA embeddings connector, lets you connect to and generate embeddings from compatible models available in the NVIDIA API catalog. We selected `nvidia/nv-embedqa-e5-v5` as the embedding model. See the list of text embedding models here: https://build.nvidia.com/nim?filters=usecase%3Ausecase_text_to_embedding%2Cusecase%3Ausecase_image_to_embedding


In [6]:
from llama_index.embeddings.nvidia import NVIDIAEmbedding

Settings.embed_model = NVIDIAEmbedding(model="nvidia/nv-embedqa-e5-v5", api_key=os.getenv("NVIDIA_API_KEY"))

### 3) Create an Azure AI Search Vector Store


In [76]:
import logging
import sys
import os
import getpass
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from IPython.display import Markdown, display
from llama_index.vector_stores.azureaisearch import AzureAISearchVectorStore, IndexManagement


search_service_api_key = os.getenv('AZURE_SEARCH_ADMIN_KEY') or getpass.getpass('Enter your Azure Search API key: ')
search_service_endpoint = os.getenv('AZURE_SEARCH_SERVICE_ENDPOINT') or getpass.getpass('Enter your Azure Search service endpoint: ')
search_service_api_version = "2024-07-01"
credential = AzureKeyCredential(search_service_api_key)

# Index name to use
index_name = "llamaindex-nvidia-azureaisearch-demo"

# Use index client to demonstrate creating an index
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)

# Use search client to demonstrate using existing index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)

In [None]:
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1024, # dimensionality for nv-embedqa-e5-v5 model
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
    # compression_type="binary" # Option to use "scalar" or "binary". NOTE: compression is only supported for HNSW
)
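To make the `vector_algorithm_type="exhaustiveKnn"` setting above more concrete, here is a minimal pure-Python sketch of what an exhaustive (brute-force) nearest-neighbor search does: it scores the query against every stored vector and keeps the best `k`. This is an illustration only, not how Azure AI Search is implemented. The function names, the toy 3-dimensional vectors, and the choice of cosine similarity as the metric are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exhaustive_knn(query, vectors, k=2):
    """Score the query against every stored vector; return the top k (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; the index above stores 1024-dimensional embeddings.
toy_vectors = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}
top_matches = exhaustive_knn([1.0, 0.0, 0.0], toy_vectors, k=2)
print(top_matches)
```

Because every vector is compared, the result is exact, but the cost grows linearly with the number of stored chunks. Approximate algorithms such as HNSW trade some recall for speed on larger indexes.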

In [20]:
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.text_splitter import TokenTextSplitter

# Configure text splitter (nv-embedqa-e5-v5 model has a limit of 512 tokens per input size)
text_splitter = TokenTextSplitter(separator=" ", chunk_size=500, chunk_overlap=10)

# Load documents
documents = SimpleDirectoryReader(
    input_files=["data/txt/state_of_the_union.txt"]
).load_data()
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index with text splitter
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    storage_context=storage_context,
)
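The `chunk_size` and `chunk_overlap` parameters used above can be illustrated with a small sliding-window sketch. Note the assumptions: `split_with_overlap` is a hypothetical helper written for this example, and it treats whitespace-separated words as tokens, whereas `TokenTextSplitter` counts tokens with a real tokenizer. Only the windowing idea carries over.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Hypothetical toy input: 12 "tokens", windows of 5 with 1 token shared.
words = [f"w{i}" for i in range(12)]
chunks = split_with_overlap(words, chunk_size=5, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Keeping `chunk_size` at 500 leaves a small margin under the 512-token input limit noted in the comment above, and the overlap helps a sentence that straddles a boundary appear in both neighboring chunks.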

### 5) Create a Query Engine to Ask Questions over Your Data

Here is a query that uses pure vector search in Azure AI Search to retrieve context and grounds the response from our LLM (Phi-3.5-MOE) in that context.


In [69]:
query_engine = index.as_query_engine()
response = query_engine.query("Who did the speaker mention as being present in the chamber?")
display(Markdown(f"{response}"))

 The speaker mentioned the Ukrainian Ambassador to the United States, along with other members of Congress, the Cabinet, and various officials such as the Vice President, the First Lady, and the Second Gentleman, as being present in the chamber.

Here is a query using hybrid search in Azure AI Search.


In [70]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from IPython.display import Markdown, display
from llama_index.core.schema import MetadataMode

# Initialize hybrid retriever and query engine
hybrid_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.HYBRID)
hybrid_query_engine = RetrieverQueryEngine(retriever=hybrid_retriever)

# Query execution
query = "What were the exact economic consequences mentioned in relation to Russia's stock market?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))

 The Russian stock market experienced a significant drop, losing 40% of its value. Additionally, trading had to be suspended due to the ongoing situation.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. 

I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.  

We countered Russia’s lies with truth.   

And now that he has acted the free world is holding him accountable. 

Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. 

We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. 

Together with our allies –we are right now enforcing powerful economic sanctions. 

We are cutting off Russia’s largest banks from the international financial system.  



#### Vector Search Analysis
The LLM response accurately captures the key economic consequences mentioned in the source text regarding Russia's stock market. Specifically, it states that the Russian stock market dropped significantly, losing 40% of its value, and that trading was suspended because of the situation. The response aligns well with the information in the source, indicating that the LLM correctly identified and summarized the relevant details about the stock market impact resulting from Russia's actions and the sanctions imposed.

#### Source Nodes Commentary
The source nodes give a detailed account of the economic consequences Russia faced as a result of international sanctions. The text notes that the Russian stock market lost 40% of its value and that trading was suspended. It also mentions other economic repercussions, such as the devaluation of the ruble and the broader isolation of Russia's economy. The LLM response effectively distilled the key points from these nodes, focusing on the stock market impact as the query requested.
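Hybrid search combines a keyword ranking and a vector ranking into a single result list. Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method it uses for this merge. The sketch below shows the general RRF idea in plain Python. It is not the service's implementation: the function name, the chunk ids, and the constant `k=60` (a commonly cited default) are assumptions made for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings from a keyword query and a vector query.
keyword_ranking = ["chunk-3", "chunk-1", "chunk-7"]
vector_ranking = ["chunk-1", "chunk-5", "chunk-3"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Chunks that rank well in both lists rise to the top, while chunks found by only one method still remain in the fused results.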


Now let's look at a query where hybrid search does not produce a well-grounded answer:


In [71]:
# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it does mention that the events discussed are happening in the current era and that the actions taken are in response to Putin's aggression. For the precise date, one would need to refer to external sources or historical records.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.  

Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west.  

For that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. 

As I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.  

And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  

Putin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. 

And a proud Ukrainian people, who have known 30 years  of independence,

### Hybrid Search: LLM Response Analysis
In the hybrid search example, the LLM response indicates that the provided context does not contain the exact date of Russia's invasion of Ukraine. This shows that the LLM relies on the information in the source documents while acknowledging that the precise detail is absent from the text.

The response is accurate in noting that the context mentions events related to Russia's aggression but does not give a specific invasion date. This demonstrates the LLM's ability to understand the information provided while recognizing gaps in the content. The LLM points the user to external sources or historical records for the precise date, which shows appropriate caution when information is incomplete.

### Source Nodes Analysis
The source nodes in the hybrid search example contain excerpts from a speech discussing the U.S. response to Russia's actions in Ukraine. These nodes emphasize the broader geopolitical impact and the steps taken by the U.S. and its allies in response to the invasion, but they do not mention a specific invasion date. This is consistent with the LLM response, which correctly states that the context lacks precise date information.


In [72]:
# Initialize hybrid retriever and query engine
semantic_reranker_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID)
semantic_reranker_query_engine = RetrieverQueryEngine(retriever=semantic_reranker_retriever)

# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = semantic_reranker_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it mentions that the event occurred six days before the speech was given. To determine the precise date, one would need to know the date of the speech.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. 

### Hybrid with Reranking: LLM Response Analysis
In the hybrid with reranking example, the LLM response provides additional context by noting that the event occurred six days before the speech was given. This shows that the LLM can infer the invasion date relative to the timing of the speech, although it still needs the exact date of the speech to be precise.

This response demonstrates an improved ability to use contextual clues to give a more informative answer. It highlights the advantage of reranking: more relevant passages are prioritized in the context passed to the LLM, so it can give a closer approximation of the requested detail (that is, the invasion date).

### Source Nodes Analysis
The source nodes in this example include references to the timing of Russia's invasion, specifically stating that it occurred six days before the speech. Although the exact date is still not stated explicitly, the nodes provide temporal context that lets the LLM give a more nuanced answer. Including this detail shows how reranking can improve the LLM's ability to extract and infer information from the given context, leading to a more accurate and informative response.


**Note:**
In this notebook, we used NVIDIA NIM microservices from the NVIDIA API Catalog.
The APIs used above are `NVIDIA` (LLMs), `NVIDIAEmbedding`, and [Azure AI Search Semantic Hybrid Retrieval (built-in reranking)](https://learn.microsoft.com/azure/search/semantic-search-overview). Note that these APIs can also work with self-hosted microservices.

**Example:**
```python
NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
```



---

**Disclaimer**:  
This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
