# Azure AI Search with NVIDIA NIM and LlamaIndex Integration

In this notebook, we show how to use NVIDIA AI models and LlamaIndex to build a Retrieval-Augmented Generation (RAG) pipeline. We use NVIDIA LLMs and embeddings, integrate them with Azure AI Search as the vector store, and run RAG to improve search quality and efficiency.

## Benefits
- **Scalability**: Use NVIDIA's large language models and Azure AI Search for scalable and efficient retrieval.
- **Cost efficiency**: Optimize search and retrieval with efficient vector storage and hybrid search techniques.
- **High performance**: Combine powerful LLMs with vectorized search for faster and more accurate responses.
- **Quality**: Maintain high search quality by grounding LLM responses in relevant retrieved documents.

## Prerequisites
- 🐍 Python 3.9 or higher
- 🔗 [Azure AI Search service](https://learn.microsoft.com/azure/search/)
- 🔗 NVIDIA API key for access to NVIDIA LLMs and embeddings through NVIDIA NIM microservices

## Features Covered
- ✅ NVIDIA LLM integration (we use [Phi-3.5-MOE](https://build.nvidia.com/microsoft/phi-3_5-moe))
- ✅ NVIDIA Embeddings (we use [nv-embedqa-e5-v5](https://build.nvidia.com/nvidia/nv-embedqa-e5-v5))
- ✅ Azure AI Search advanced retrieval modes
- ✅ Document indexing with LlamaIndex
- ✅ RAG using Azure AI Search and LlamaIndex with NVIDIA LLMs

Let's get started!


In [None]:
!pip install azure-search-documents==11.5.1
!pip install --upgrade llama-index
!pip install --upgrade llama-index-core
!pip install --upgrade llama-index-readers-file
!pip install --upgrade llama-index-llms-nvidia
!pip install --upgrade llama-index-embeddings-nvidia
!pip install --upgrade llama-index-postprocessor-nvidia-rerank
!pip install --upgrade llama-index-vector-stores-azureaisearch
!pip install python-dotenv

## Installation and Requirements
Create a Python environment using Python version >3.10.
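
Before running the rest of the notebook, it can help to confirm that the environment matches these requirements. The sketch below uses only the standard library; the package list is copied from the pip commands above, while the helper names `python_is_supported` and `installed_versions` are hypothetical, and reading the requirement as "3.10 or newer" is an assumption.

```python
import sys
from importlib import metadata

# Package names taken from the pip commands in this notebook.
REQUIRED_PACKAGES = [
    "azure-search-documents",
    "llama-index",
    "llama-index-core",
    "llama-index-readers-file",
    "llama-index-llms-nvidia",
    "llama-index-embeddings-nvidia",
    "llama-index-postprocessor-nvidia-rerank",
    "llama-index-vector-stores-azureaisearch",
    "python-dotenv",
]


def python_is_supported(min_version=(3, 10), current=None):
    """Return True if the interpreter version is at least `min_version`."""
    # `min_version` is an assumed threshold; adjust it to the version you target.
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(min_version)


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python OK:", python_is_supported())
for name, version in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be installed with the matching pip command from the cell above.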

## Getting Started!


To get started, you need an `NVIDIA_API_KEY` to use the NVIDIA AI Foundation models:
1) Create a free account with [NVIDIA](https://build.nvidia.com/explore/discover).
2) Click on your model of choice.
3) Under Input, select the Python tab, click **Get API Key**, and then click **Generate Key**.
4) Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints.


In [3]:
import getpass
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key
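
The cell above checks only that the key starts with `nvapi-`. As a small companion, the sketch below shows one way to run the same prefix check and to mask a key before printing it in logs or error messages. Both helper names are hypothetical, and the example key is a placeholder, not a working credential.

```python
def looks_like_nvidia_key(key):
    """Check the 'nvapi-' prefix only; this does not prove the key is valid."""
    prefix = "nvapi-"
    return key.startswith(prefix) and len(key) > len(prefix)


def mask_secret(secret, visible_prefix=6):
    """Keep the first `visible_prefix` characters and replace the rest with '*'."""
    if len(secret) <= visible_prefix:
        return "*" * len(secret)
    return secret[:visible_prefix] + "*" * (len(secret) - visible_prefix)


example_key = "nvapi-not-a-real-key"  # placeholder for illustration only
print(looks_like_nvidia_key(example_key))
print(mask_secret(example_key))
```

A prefix check catches copy-and-paste mistakes early; whether the key is actually accepted is only known once a request is sent to the endpoint.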


## RAG Example using LLM and Embedding
### 1) Initialize the LLM
`llama-index-llms-nvidia`, also known as the NVIDIA LLM connector, lets you connect to and generate from compatible models available in the NVIDIA API catalog. See here for a list of chat completion models: https://build.nvidia.com/search?term=Text-to-Text

Here we use **microsoft/phi-3.5-moe-instruct**


In [75]:
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA

# Here we are using the microsoft/phi-3.5-moe-instruct model from the API Catalog
Settings.llm = NVIDIA(model="microsoft/phi-3.5-moe-instruct", api_key=os.getenv("NVIDIA_API_KEY"))

### 2) Initialize the Embedding
`llama-index-embeddings-nvidia`, also known as the NVIDIA Embeddings connector, lets you connect to and generate embeddings from compatible models available in the NVIDIA API catalog. We selected `nvidia/nv-embedqa-e5-v5` as the embedding model. See here for a list of text embedding models: https://build.nvidia.com/nim?filters=usecase%3Ausecase_text_to_embedding%2Cusecase%3Ausecase_image_to_embedding


In [6]:
from llama_index.embeddings.nvidia import NVIDIAEmbedding

Settings.embed_model = NVIDIAEmbedding(model="nvidia/nv-embedqa-e5-v5", api_key=os.getenv("NVIDIA_API_KEY"))
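
An embedding model maps each piece of text to a vector, and vector search then compares those vectors. The sketch below illustrates one common comparison, cosine similarity, on tiny hand-written vectors. The three-dimensional vectors are made up for illustration (real embeddings from this model have far more dimensions), and the function is a plain-Python stand-in rather than what Azure AI Search runs internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for a query and two document chunks.
query_vec = [0.9, 0.1, 0.0]
chunk_about_sanctions = [0.8, 0.2, 0.1]
chunk_about_weather = [0.0, 0.1, 0.9]

print(cosine_similarity(query_vec, chunk_about_sanctions))
print(cosine_similarity(query_vec, chunk_about_weather))
```

The chunk whose vector points in nearly the same direction as the query scores higher, which is the intuition behind retrieving "similar" passages.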

### 3) Create an Azure AI Search Vector Store


In [76]:
import logging
import sys
import os
import getpass
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from IPython.display import Markdown, display
from llama_index.vector_stores.azureaisearch import AzureAISearchVectorStore, IndexManagement


search_service_api_key = os.getenv('AZURE_SEARCH_ADMIN_KEY') or getpass.getpass('Enter your Azure Search API key: ')
search_service_endpoint = os.getenv('AZURE_SEARCH_SERVICE_ENDPOINT') or getpass.getpass('Enter your Azure Search service endpoint: ')
search_service_api_version = "2024-07-01"
credential = AzureKeyCredential(search_service_api_key)

# Index name to use
index_name = "llamaindex-nvidia-azureaisearch-demo"

# Use index client to demonstrate creating an index
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)

# Use search client to demonstrate using existing index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)
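
Because the endpoint may be typed in by hand, a quick sanity check can catch a missing scheme or a stray trailing slash before any client call fails. This is a minimal sketch using only the standard library; `normalize_search_endpoint` is a hypothetical helper, and the service name in the example is a placeholder.

```python
from urllib.parse import urlparse


def normalize_search_endpoint(endpoint):
    """Trim whitespace and trailing slashes, and require an https URL with a host."""
    cleaned = endpoint.strip().rstrip("/")
    parsed = urlparse(cleaned)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Expected an https endpoint URL, got: {endpoint!r}")
    return cleaned


# Placeholder service name for illustration only.
print(normalize_search_endpoint("https://my-service.search.windows.net/"))
```

The check is deliberately loose: it does not verify that the service exists or that the key matches it.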

In [None]:
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1024, # dimensionality for nv-embedqa-e5-v5 model
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
    # compression_type="binary" # Option to use "scalar" or "binary". NOTE: compression is only supported for HNSW
)
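
The vector store above is configured with `vector_algorithm_type="exhaustiveKnn"`, which means every stored vector is compared with the query rather than a subset chosen by an approximate index. The sketch below shows that brute-force idea in plain Python on toy data. It is an illustration of the technique under assumed cosine scoring, not the service's implementation, and the function names are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exhaustive_knn(query_vector, stored_vectors, k=3):
    """Score every stored vector against the query and return the k best (id, score) pairs."""
    scored = (
        (doc_id, cosine(query_vector, vector))
        for doc_id, vector in stored_vectors.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy two-dimensional index; real entries would be 1024-dimensional embeddings.
toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
for doc_id, score in exhaustive_knn([1.0, 0.0], toy_index, k=2):
    print(doc_id, round(score, 4))
```

Scanning everything gives exact nearest neighbors at a cost that grows with the number of stored vectors, which is fine for a small demo index like this one.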

### 4) Load, Chunk, and Upload Documents


In [20]:
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.text_splitter import TokenTextSplitter

# Configure text splitter (nv-embedqa-e5-v5 model has a limit of 512 tokens per input size)
text_splitter = TokenTextSplitter(separator=" ", chunk_size=500, chunk_overlap=10)

# Load documents
documents = SimpleDirectoryReader(
    input_files=["data/txt/state_of_the_union.txt"]
).load_data()
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index with text splitter
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    storage_context=storage_context,
)
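
The splitter above cuts the document into windows of at most 500 tokens with a 10-token overlap so that each chunk stays under the embedding model's input limit. The sketch below shows the sliding-window idea on whitespace-separated words. Treating words as tokens is a simplifying assumption (the real splitter counts tokenizer tokens), and `chunk_words` is a hypothetical helper, not the `TokenTextSplitter` implementation.

```python
def chunk_words(text, chunk_size=500, chunk_overlap=10):
    """Split text into word windows of `chunk_size`, each sharing `chunk_overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

The overlap repeats a few words at each boundary so that a sentence cut in two still appears with some of its context in both chunks.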

### 5) Create a Query Engine to Ask Questions Over Your Data

Here is a query that uses pure vector search against Azure AI Search and grounds the response with our LLM (Phi-3.5-MOE)


In [69]:
query_engine = index.as_query_engine()
response = query_engine.query("Who did the speaker mention as being present in the chamber?")
display(Markdown(f"{response}"))

 The speaker mentioned the Ukrainian Ambassador to the United States, along with other members of Congress, the Cabinet, and various officials such as the Vice President, the First Lady, and the Second Gentleman, as being present in the chamber.
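
Conceptually, the query engine retrieves the most relevant chunks and hands them to the LLM together with the question, which is what "grounding" means here. The sketch below assembles such a prompt by hand. The template wording and the `build_grounded_prompt` helper are hypothetical and are not LlamaIndex's built-in prompt; the chunk texts are shortened stand-ins.

```python
def build_grounded_prompt(question, retrieved_chunks):
    """Assemble a prompt that asks the model to answer only from the retrieved chunks."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "Who did the speaker mention as being present in the chamber?",
    ["Madam Speaker, Madam Vice President, our First Lady and Second Gentleman."],
)
print(prompt)
```

Numbering the chunks makes it easier to trace an answer back to the passage that supports it.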

Here is a query that uses hybrid search in Azure AI Search.


In [70]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from IPython.display import Markdown, display
from llama_index.core.schema import MetadataMode

# Initialize hybrid retriever and query engine
hybrid_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.HYBRID)
hybrid_query_engine = RetrieverQueryEngine(retriever=hybrid_retriever)

# Query execution
query = "What were the exact economic consequences mentioned in relation to Russia's stock market?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))

 The Russian stock market experienced a significant drop, losing 40% of its value. Additionally, trading had to be suspended due to the ongoing situation.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. 

I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.  

We countered Russia’s lies with truth.   

And now that he has acted the free world is holding him accountable. 

Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. 

We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. 

Together with our allies –we are right now enforcing powerful economic sanctions. 

We are cutting off Russia’s largest banks from the international financial syste

#### Hybrid Search Analysis
The LLM response accurately captures the key economic consequences mentioned in the source text regarding Russia's stock market. Specifically, it states that the Russian stock market dropped significantly, losing 40% of its value, and that trading was suspended due to the ongoing situation. This response aligns well with the information in the source, indicating that the LLM correctly identified and summarized the relevant details about the stock market impact resulting from Russia's actions and the sanctions imposed.

#### Source Nodes Commentary
The source nodes give a detailed account of the economic consequences Russia faced due to international sanctions. The text highlights that the Russian stock market lost 40% of its value and that trading was suspended. It also mentions other economic repercussions, such as the devaluation of the ruble and the broader isolation of Russia's economy. The LLM response distilled the critical points from these nodes, focusing on the stock market impact as the query requested.
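
Hybrid search runs a keyword query and a vector query and then merges the two ranked lists. Azure AI Search documents Reciprocal Rank Fusion (RRF) for this merge. The sketch below shows the RRF formula on made-up document ids; the constant `k=60` is a commonly used value and an assumption here, and the function is an illustration rather than the service's code.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the output deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for the keyword and vector rankings.
keyword_ranking = ["d1", "d2", "d3"]
vector_ranking = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))
```

Documents that rank well in both lists rise to the top, which is why hybrid search can surface passages that match both the wording and the meaning of a query.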


Now let's look at a query where hybrid search does not give a well-grounded answer:


In [71]:
# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it does mention that the events discussed are happening in the current era and that the actions taken are in response to Putin's aggression. For the precise date, one would need to refer to external sources or historical records.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.  

Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west.  

For that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. 

As I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.  

And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  

Putin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. 

And a proud Ukrainian people, who have known 30 years  of indepen

#### Hybrid Search: LLM Response Analysis
The LLM response in the hybrid search example indicates that the provided context does not specify the exact date of Russia's invasion of Ukraine. The LLM draws on the information in the source documents but acknowledges that the text lacks precise details. The response correctly identifies that the context mentions events related to Russia's aggression but does not pinpoint the invasion date. This shows the LLM's ability to understand the provided information while recognizing gaps in the content. The LLM directs the user to external sources or historical records for the exact date, showing caution when information is incomplete.

#### Hybrid Search: Source Nodes Analysis
The source nodes in the hybrid search example contain excerpts from a speech discussing the U.S. response to Russia's actions in Ukraine. These nodes emphasize the broader geopolitical impact and the steps taken by the U.S. and its allies in response to the invasion, but they do not mention the specific invasion date. This is consistent with the LLM response, which correctly identifies that the context lacks exact date information.


In [72]:
# Initialize semantic hybrid (hybrid + reranking) retriever and query engine
semantic_reranker_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID)
semantic_reranker_query_engine = RetrieverQueryEngine(retriever=semantic_reranker_retriever)

# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = semantic_reranker_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it mentions that the event occurred six days before the speech was given. To determine the precise date, one would need to know the date of the speech.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world

#### Hybrid w/Reranking: LLM Response Analysis
In the Hybrid w/Reranking example, the LLM response provides additional context by noting that the event occurred six days before the speech was given. The LLM could therefore infer the invasion date from the timing of the speech, although it would still need the exact date of the speech to be precise.

This response shows an improved ability to use context clues to give a more informative answer. It highlights the advantage of reranking: the LLM receives more relevant passages and can give a closer approximation of the requested detail (i.e., the invasion date).

#### Hybrid w/Reranking: Source Nodes Analysis
In this example, the source nodes include references to the timing of Russia's invasion, specifically mentioning that it occurred six days before the speech. Although the exact date is still not stated explicitly, the nodes provide temporal context that allows the LLM to give a more nuanced response. Including this detail shows how reranking can improve the LLM's ability to extract and infer information from the provided context, resulting in a more accurate and informative response.
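
Reranking takes the candidates returned by the first retrieval pass and reorders them with a second, more precise relevance score. The sketch below shows only that reorder step. The word-overlap scorer is a deliberately crude, hypothetical stand-in: the Azure AI Search semantic ranker is a learned model, and nothing here reproduces it.

```python
def term_overlap_score(query, passage):
    """Toy relevance score: fraction of query words that also occur in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(query, passages, score_fn=term_overlap_score, top_n=None):
    """Reorder first-pass candidates by `score_fn`, best first, keeping ties in their original order."""
    ranked = sorted(passages, key=lambda passage: score_fn(query, passage), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Shortened, lowercased stand-ins for first-pass results.
candidates = [
    "sanctions on banks",
    "six days ago russia invaded",
    "nato allies",
]
print(rerank("when russia invaded", candidates, top_n=1))
```

Swapping in a stronger scorer changes which passages reach the LLM, which is the effect seen in the semantic hybrid query above.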


**Note:**
In this notebook, we used NVIDIA NIM microservices from the NVIDIA API Catalog.
The APIs used above are `NVIDIA` (LLMs), `NVIDIAEmbedding`, and [Azure AI Search semantic hybrid retrieval (built-in reranking)](https://learn.microsoft.com/azure/search/semantic-search-overview). Note that the NVIDIA APIs above can also support self-hosted microservices.

**Example:**
```python
NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
```


