# Azure AI Search with NVIDIA NIM and LlamaIndex Integration

In this notebook, we demonstrate how to use NVIDIA's AI models and LlamaIndex to build a Retrieval-Augmented Generation (RAG) pipeline. We use NVIDIA's LLMs and embeddings, integrate them with Azure AI Search as the vector store, and run RAG to improve search quality and efficiency.

## Benefits
- **Scalability**: Use NVIDIA's large language models and Azure AI Search for scalable, efficient retrieval.
- **Cost efficiency**: Optimize search and retrieval with efficient vector storage and hybrid search techniques.
- **High performance**: Combine powerful LLMs with vector search for faster and more accurate answers.
- **Quality**: Maintain high search quality by grounding LLM answers in relevant retrieved documents.

## Prerequisites
- 🐍 Python 3.9 or later
- 🔗 [Azure AI Search Service](https://learn.microsoft.com/azure/search/)
- 🔗 NVIDIA API key for access to NVIDIA's LLMs and embeddings via NVIDIA NIM microservices

## Features covered
- ✅ NVIDIA LLM integration (we use [Phi-3.5-MOE](https://build.nvidia.com/microsoft/phi-3_5-moe))
- ✅ NVIDIA embeddings (we use [nv-embedqa-e5-v5](https://build.nvidia.com/nvidia/nv-embedqa-e5-v5))
- ✅ Advanced retrieval modes in Azure AI Search
- ✅ Document indexing with LlamaIndex
- ✅ RAG using Azure AI Search and LlamaIndex with NVIDIA LLMs

Let's get started!


In [None]:
!pip install azure-search-documents==11.5.1
!pip install --upgrade llama-index
!pip install --upgrade llama-index-core
!pip install --upgrade llama-index-readers-file
!pip install --upgrade llama-index-llms-nvidia
!pip install --upgrade llama-index-embeddings-nvidia
!pip install --upgrade llama-index-postprocessor-nvidia-rerank
!pip install --upgrade llama-index-vector-stores-azureaisearch
!pip install python-dotenv

## Installation and requirements
Create a Python environment with a Python version later than 3.10.

## Getting started!


To get started, you need an `NVIDIA_API_KEY` to use the NVIDIA AI Foundation models:  
1) Create a free account with [NVIDIA](https://build.nvidia.com/explore/discover).  
2) Click the model you want to use.  
3) Under Input, select the Python tab, click **Get API Key**, and then click **Generate Key**.  
4) Copy the generated key and save it as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.  


In [3]:
import getpass
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key


## RAG Example with LLM and Embedding
### 1) Initialize the LLM
`llama-index-llms-nvidia`, also known as NVIDIA's LLM connector, lets you connect to and generate from compatible models available in the NVIDIA API catalog. See here for a list of chat completion models: https://build.nvidia.com/search?term=Text-to-Text

Here we use **Phi-3.5-MoE** (`microsoft/phi-3.5-moe-instruct`).


In [75]:
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA

# Here we use the microsoft/phi-3.5-moe-instruct model from the API catalog
Settings.llm = NVIDIA(model="microsoft/phi-3.5-moe-instruct", api_key=os.getenv("NVIDIA_API_KEY"))

### 2) Initialize the Embedding
`llama-index-embeddings-nvidia`, also known as NVIDIA's embeddings connector, lets you connect to and generate embeddings from compatible models available in the NVIDIA API catalog. We chose `nvidia/nv-embedqa-e5-v5` as the embedding model. See here for a list of text embedding models: https://build.nvidia.com/nim?filters=usecase%3Ausecase_text_to_embedding%2Cusecase%3Ausecase_image_to_embedding


In [6]:
from llama_index.embeddings.nvidia import NVIDIAEmbedding

Settings.embed_model = NVIDIAEmbedding(model="nvidia/nv-embedqa-e5-v5", api_key=os.getenv("NVIDIA_API_KEY"))

### 3) Create an Azure AI Search Vector Store


In [76]:
import logging
import sys
import os
import getpass
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from IPython.display import Markdown, display
from llama_index.vector_stores.azureaisearch import AzureAISearchVectorStore, IndexManagement


search_service_api_key = os.getenv('AZURE_SEARCH_ADMIN_KEY') or getpass.getpass('Enter your Azure Search API key: ')
search_service_endpoint = os.getenv('AZURE_SEARCH_SERVICE_ENDPOINT') or getpass.getpass('Enter your Azure Search service endpoint: ')
search_service_api_version = "2024-07-01"
credential = AzureKeyCredential(search_service_api_key)

# Index name to use
index_name = "llamaindex-nvidia-azureaisearch-demo"

# Use index client to demonstrate creating an index
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)

# Use search client to demonstrate using existing index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)

In [None]:
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1024, # dimensionality for nv-embedqa-e5-v5 model
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
    # compression_type="binary" # Option to use "scalar" or "binary". NOTE: compression is only supported for HNSW
)
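The `vector_algorithm_type="exhaustiveKnn"` setting above asks the service to compare the query vector against every stored vector rather than use an approximate index such as HNSW. The following is a minimal sketch of that idea in plain Python, not the service's implementation. The function names, the toy 3-dimensional vectors, and the use of cosine similarity are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exhaustive_knn(query, vectors, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data (hypothetical): real embeddings from nv-embedqa-e5-v5 have 1024 dimensions.
toy_vectors = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
top_matches = exhaustive_knn([1.0, 0.0, 0.0], toy_vectors, k=2)
print(top_matches)
```

A brute-force scan like this is exact but scales linearly with the number of stored vectors, which is why approximate algorithms are usually preferred for large indexes.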

### 4) Load, chunk, and upload documents


In [20]:
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.text_splitter import TokenTextSplitter

# Configure text splitter (nv-embedqa-e5-v5 model has a limit of 512 tokens per input size)
text_splitter = TokenTextSplitter(separator=" ", chunk_size=500, chunk_overlap=10)

# Load documents
documents = SimpleDirectoryReader(
    input_files=["data/txt/state_of_the_union.txt"]
).load_data()
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index with text splitter
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    storage_context=storage_context,
)
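The splitter above keeps each chunk under the embedding model's 512-token input limit and repeats a few tokens between neighbouring chunks so that sentences cut at a boundary still appear in context. As a rough sketch of that sliding-window logic (not `TokenTextSplitter`'s actual implementation, which uses a real tokenizer), the hypothetical helper below works on a pre-split list of tokens.

```python
def chunk_tokens(tokens, chunk_size=500, chunk_overlap=10):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Toy input (hypothetical): 1200 placeholder tokens instead of a real document.
tokens = [f"w{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print([len(chunk) for chunk in chunks])
```

With these settings each new chunk starts 490 tokens after the previous one, so the last 10 tokens of one chunk are the first 10 of the next.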

### 5) Create a query engine to ask questions about your data

Here is a query that uses pure vector search in Azure AI Search and grounds the answer from our LLM (Phi-3.5-MOE) in the retrieved text.


In [69]:
query_engine = index.as_query_engine()
response = query_engine.query("Who did the speaker mention as being present in the chamber?")
display(Markdown(f"{response}"))

 The speaker mentioned the Ukrainian Ambassador to the United States, along with other members of Congress, the Cabinet, and various officials such as the Vice President, the First Lady, and the Second Gentleman, as being present in the chamber.

Here is a query that uses hybrid search in Azure AI Search.


In [70]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from IPython.display import Markdown, display
from llama_index.core.schema import MetadataMode

# Initialize hybrid retriever and query engine
hybrid_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.HYBRID)
hybrid_query_engine = RetrieverQueryEngine(retriever=hybrid_retriever)

# Query execution
query = "What were the exact economic consequences mentioned in relation to Russia's stock market?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))

 The Russian stock market experienced a significant drop, losing 40% of its value. Additionally, trading had to be suspended due to the ongoing situation.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. 

I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.  

We countered Russia’s lies with truth.   

And now that he has acted the free world is holding him accountable. 

Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. 

We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. 

Together with our allies –we are right now enforcing powerful economic sanctions. 

We are cutting off Russia’s largest banks from the international financial system.  



#### Hybrid search analysis
The LLM answer accurately captures the main economic consequences for the Russian stock market mentioned in the source text. It states specifically that the Russian stock market dropped sharply, losing 40% of its value, and that trading was suspended because of the ongoing situation. The answer agrees with the information in the source, which indicates that the LLM correctly identified and summarized the relevant details about the impact on the stock market resulting from Russia's actions and the sanctions imposed.

#### Comments on the source nodes
The source nodes describe in detail the economic consequences Russia faced because of international sanctions. The text states that the Russian stock market lost 40% of its value and that trading was suspended. It also mentions other economic effects, such as the devaluation of the ruble and the broader isolation of Russia's economy. The LLM answer distilled the key points from these nodes and focused on the stock market impact, as the query requested.
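Hybrid search runs a keyword query and a vector query side by side and then merges the two ranked lists. Azure AI Search documents Reciprocal Rank Fusion (RRF) as its merging method. The sketch below shows the general RRF formula only. The chunk IDs are made up, and `k=60` is a commonly cited constant rather than a value taken from this notebook.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document earns 1 / (k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the keyword and vector sides of a hybrid query.
keyword_ranking = ["chunk-3", "chunk-1", "chunk-7"]
vector_ranking = ["chunk-1", "chunk-5", "chunk-3"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear near the top of both lists, such as `chunk-1` here, rise above documents that only one retrieval method found.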


Let's look at a query where hybrid search does not yield a well-grounded answer:


In [71]:
# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it does mention that the events discussed are happening in the current era and that the actions taken are in response to Putin's aggression. For the precise date, one would need to refer to external sources or historical records.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.  

Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west.  

For that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. 

As I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.  

And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  

Putin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. 

And a proud Ukrainian people, who have known 30 years  of independence,

### Hybrid Search: LLM response analysis
The LLM response in the hybrid search example states that the provided context does not specify the exact date of Russia's invasion of Ukraine. This suggests that the LLM relies on the information available in the source documents and acknowledges that the text lacks the precise detail.

The response correctly identifies that the context mentions events related to Russia's aggression but does not give the specific invasion date. This shows that the LLM can interpret the retrieved information while recognizing gaps in it. The LLM directs the user to external sources or historical records for the exact date, which demonstrates appropriate caution when the information is incomplete.

### Source node analysis
The source nodes in the hybrid search example contain excerpts from a speech discussing the United States' response to Russia's actions in Ukraine. These nodes highlight the broader geopolitical impact and the steps the United States and its allies took in response to the invasion, but they do not mention the specific invasion date. This matches the LLM response, which correctly notes that the context lacks the exact date.


In [72]:
# Initialize semantic hybrid (hybrid + reranking) retriever and query engine
semantic_reranker_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID)
semantic_reranker_query_engine = RetrieverQueryEngine(retriever=semantic_reranker_retriever)

# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = semantic_reranker_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it mentions that the event occurred six days before the speech was given. To determine the precise date, one would need to know the date of the speech.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. 

### Hybrid w/Reranking: LLM response analysis
In the Hybrid w/Reranking example, the LLM response adds context by noting that the event took place six days before the speech was given. This means the LLM can derive the invasion date from the timing of the speech, although it still needs the exact date of the speech to be fully accurate.

This response makes better use of contextual clues to give a more informative answer. It illustrates the benefit of reranking: the reranker places more relevant passages at the top of the retrieved context, so the LLM can give a closer approximation of the requested detail (that is, the date of the invasion).

### Source node analysis
The source nodes in this example include a reference to the timing of Russia's invasion, specifically that it happened six days before the speech. Although the exact date is still not stated explicitly, the nodes provide temporal context that lets the LLM give a more nuanced answer. The inclusion of this detail shows how reranking can improve the LLM's ability to extract and infer information from the given context, resulting in a more precise and informative answer.
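The reranking step discussed above can be pictured as a second pass over the candidates that hybrid retrieval returned: each candidate is rescored against the query and the list is reordered. The sketch below is illustrative only. The `overlap_score` function is a deliberately crude stand-in for the learned semantic ranker in Azure AI Search, and the passages are toy strings rather than quotes from the speech.

```python
def overlap_score(query, passage):
    """Toy relevance score: fraction of query terms that also occur in the passage."""
    query_terms = set(query.lower().split())
    passage_terms = set(passage.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & passage_terms) / len(query_terms)


def rerank(query, candidates, scorer=overlap_score, top_n=2):
    """Rescore first-stage candidates and return the top_n by the new score."""
    scored = [(passage, scorer(query, passage)) for passage in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical first-stage results, listed in their original retrieval order.
candidates = [
    "our forces will defend nato allies",
    "six days ago russia invaded ukraine",
    "we are enforcing economic sanctions",
]
reranked = rerank("when did russia invade ukraine", candidates)
print(reranked[0])
```

Here the second candidate moves to the top after rescoring, which mirrors how the semantic hybrid mode surfaced the passage containing the "six days ago" clue.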


**Note:**
In this notebook, we used NVIDIA NIM microservices from the NVIDIA API catalog.  
The APIs used above are `NVIDIA` (LLMs), `NVIDIAEmbedding`, and [Azure AI Search Semantic Hybrid Retrieval (built-in reranking)](https://learn.microsoft.com/azure/search/semantic-search-overview). Note that the NVIDIA APIs above can also work with self-hosted microservices.

**Example:**
```python
NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
```



---

**Disclaimer**:  
This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
