# Azure AI Search with NVIDIA NIM and LlamaIndex Integration

In this notebook, we show how to use NVIDIA's AI models with LlamaIndex to build a Retrieval-Augmented Generation (RAG) pipeline. We integrate NVIDIA's LLMs and embeddings with Azure AI Search as the vector store, then run RAG to improve search quality and efficiency.

## Benefits
- **Scalability**: Use NVIDIA's large language models and Azure AI Search for scalable, efficient retrieval.
- **Cost Efficiency**: Optimize search and retrieval with efficient vector storage and hybrid search techniques.
- **High Performance**: Combine powerful LLMs with vectorized search for faster and more accurate responses.
- **Quality**: Maintain search quality by grounding LLM responses in relevant retrieved documents.

## Prerequisites
- 🐍 Python 3.9 or later
- 🔗 [Azure AI Search service](https://learn.microsoft.com/azure/search/)
- 🔗 An NVIDIA API key for access to NVIDIA's LLMs and Embeddings through NVIDIA NIM microservices

## Features Covered
- ✅ NVIDIA LLM integration (we will use [Phi-3.5-MOE](https://build.nvidia.com/microsoft/phi-3_5-moe))
- ✅ NVIDIA Embeddings (we will use [nv-embedqa-e5-v5](https://build.nvidia.com/nvidia/nv-embedqa-e5-v5))
- ✅ Azure AI Search retrieval modes (vector, hybrid, and semantic hybrid)
- ✅ Document indexing with LlamaIndex
- ✅ RAG using Azure AI Search and LlamaIndex with NVIDIA LLMs

Let's get started!


In [None]:
!pip install azure-search-documents==11.5.1
!pip install --upgrade llama-index
!pip install --upgrade llama-index-core
!pip install --upgrade llama-index-readers-file
!pip install --upgrade llama-index-llms-nvidia
!pip install --upgrade llama-index-embeddings-nvidia
!pip install --upgrade llama-index-postprocessor-nvidia-rerank
!pip install --upgrade llama-index-vector-stores-azureaisearch
!pip install python-dotenv

## Installation and Requirements  
Create a Python environment using a Python version newer than 3.10 (this also satisfies the 3.9 minimum listed under Prerequisites).  

## Getting Started!  


To get started, you need an `NVIDIA_API_KEY` to use the NVIDIA AI Foundation models:  
1) Create a free account with [NVIDIA](https://build.nvidia.com/explore/discover).  
2) Click on the model of your choice.  
3) Under Input, select the Python tab, click **Get API Key**, and then click **Generate Key**.  
4) Copy the generated key and save it as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.  


In [3]:
import getpass
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key
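
The cells in this notebook read three environment variables: `NVIDIA_API_KEY`, `AZURE_SEARCH_ADMIN_KEY`, and `AZURE_SEARCH_SERVICE_ENDPOINT`. As a minimal sketch (the helper name `missing_settings` is hypothetical and not part of any library), you could check up front which of them are unset, so you know which prompts to expect:

```python
import os

# Variable names taken from the cells in this notebook.
REQUIRED_VARS = (
    "NVIDIA_API_KEY",
    "AZURE_SEARCH_ADMIN_KEY",
    "AZURE_SEARCH_SERVICE_ENDPOINT",
)


def missing_settings(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report which values the notebook will have to prompt for.
missing = missing_settings()
if missing:
    print("Not set, you will be prompted for:", ", ".join(missing))
else:
    print("All settings found in the environment.")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without touching your real environment.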


## RAG Example Using an LLM and Embedding
### 1) Initialize the LLM
`llama-index-llms-nvidia`, also known as the NVIDIA LLM connector, lets you connect to and generate from models available in the NVIDIA API catalog. See here for a list of chat completion models: https://build.nvidia.com/search?term=Text-to-Text

Here we will use **microsoft/phi-3.5-moe-instruct**


In [75]:
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA

# Here we are using the microsoft/phi-3.5-moe-instruct model from the API Catalog
Settings.llm = NVIDIA(model="microsoft/phi-3.5-moe-instruct", api_key=os.getenv("NVIDIA_API_KEY"))

### 2) Initialize the Embedding  
`llama-index-embeddings-nvidia`, also known as the NVIDIA Embeddings connector, lets you connect to and generate embeddings from compatible models available in the NVIDIA API catalog. We chose `nvidia/nv-embedqa-e5-v5` as the embedding model. See here for a list of text embedding models: https://build.nvidia.com/nim?filters=usecase%3Ausecase_text_to_embedding%2Cusecase%3Ausecase_image_to_embedding


In [6]:
from llama_index.embeddings.nvidia import NVIDIAEmbedding

Settings.embed_model = NVIDIAEmbedding(model="nvidia/nv-embedqa-e5-v5", api_key=os.getenv("NVIDIA_API_KEY"))

In [76]:
import logging
import sys
import os
import getpass
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from IPython.display import Markdown, display
from llama_index.vector_stores.azureaisearch import AzureAISearchVectorStore, IndexManagement


search_service_api_key = os.getenv('AZURE_SEARCH_ADMIN_KEY') or getpass.getpass('Enter your Azure Search API key: ')
search_service_endpoint = os.getenv('AZURE_SEARCH_SERVICE_ENDPOINT') or getpass.getpass('Enter your Azure Search service endpoint: ')
search_service_api_version = "2024-07-01"
credential = AzureKeyCredential(search_service_api_key)

# Index name to use
index_name = "llamaindex-nvidia-azureaisearch-demo"

# Use index client to demonstrate creating an index
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)

# Use search client to demonstrate using existing index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)

In [None]:
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1024, # dimensionality for nv-embedqa-e5-v5 model
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
    # compression_type="binary" # Option to use "scalar" or "binary". NOTE: compression is only supported for HNSW
)

In [20]:
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.text_splitter import TokenTextSplitter

# Configure text splitter (nv-embedqa-e5-v5 model has a limit of 512 tokens per input size)
text_splitter = TokenTextSplitter(separator=" ", chunk_size=500, chunk_overlap=10)

# Load documents
documents = SimpleDirectoryReader(
    input_files=["data/txt/state_of_the_union.txt"]
).load_data()
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index with text splitter
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    storage_context=storage_context,
)
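
To make the `chunk_size` and `chunk_overlap` settings above concrete, here is a simplified sketch of overlapping-window splitting. It is not the `TokenTextSplitter` implementation: it assumes whitespace-separated words stand in for tokens, whereas the real splitter counts tokens with a tokenizer, and that tokenizer may differ from the embedding model's own (one reason to leave headroom below the 512-token limit). The function name `split_tokens` is hypothetical.

```python
def split_tokens(text, chunk_size, chunk_overlap):
    """Split whitespace tokens into windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Tiny example: windows of 4 tokens that share 1 token with their neighbor.
chunks = split_tokens("a b c d e f g h i j", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk repeats the last token of the previous one, which is the role `chunk_overlap=10` plays at a larger scale in the cell above.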

### 5) Create a Query Engine to ask questions over your data

Here is a query that uses pure vector search in Azure AI Search and grounds the answer with our LLM (Phi-3.5-MOE)


In [69]:
query_engine = index.as_query_engine()
response = query_engine.query("Who did the speaker mention as being present in the chamber?")
display(Markdown(f"{response}"))

 The speaker mentioned the Ukrainian Ambassador to the United States, along with other members of Congress, the Cabinet, and various officials such as the Vice President, the First Lady, and the Second Gentleman, as being present in the chamber.
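
The vector store above was configured with `vector_algorithm_type="exhaustiveKnn"`, which compares the query embedding against every stored embedding. The sketch below illustrates that idea on toy 2-dimensional vectors (the real embeddings have 1024 dimensions). It assumes cosine similarity as the metric, and the function names and chunk ids are hypothetical; the actual search runs inside the Azure AI Search service.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exhaustive_knn(query_vec, doc_vecs, k):
    """Score every document against the query and return the k best (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings keyed by hypothetical chunk ids.
docs = {"chunk-a": [1.0, 0.0], "chunk-b": [0.7, 0.7], "chunk-c": [0.0, 1.0]}
top = exhaustive_knn([1.0, 0.1], docs, k=2)
print([doc_id for doc_id, _ in top])
```

Exhaustive search is exact but scales linearly with the number of stored vectors, which is the trade-off against approximate algorithms such as HNSW.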

Here is a query that uses hybrid search in Azure AI Search.


In [70]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from IPython.display import Markdown, display
from llama_index.core.schema import MetadataMode

# Initialize hybrid retriever and query engine
hybrid_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.HYBRID)
hybrid_query_engine = RetrieverQueryEngine(retriever=hybrid_retriever)

# Query execution
query = "What were the exact economic consequences mentioned in relation to Russia's stock market?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))

 The Russian stock market experienced a significant drop, losing 40% of its value. Additionally, trading had to be suspended due to the ongoing situation.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. 

I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.  

We countered Russia’s lies with truth.   

And now that he has acted the free world is holding him accountable. 

Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. 

We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. 

Together with our allies –we are right now enforcing powerful economic sanctions. 

We are cutting off Russia’s largest banks from the international financial system.  



#### Hybrid Search Analysis
The LLM response accurately captures the main economic consequences mentioned in the source text regarding Russia's stock market. Specifically, it states that the Russian stock market dropped sharply, losing 40% of its value, and that trading was suspended because of the ongoing situation. This matches the information in the source, showing that the LLM identified and summarized the key details about the stock market impact of Russia's actions and the sanctions imposed.

#### Source Nodes Commentary
The source nodes describe the economic consequences Russia faced from international sanctions. The text states that the Russian stock market lost 40% of its value and that trading was suspended. It also mentions other economic effects, such as the falling value of the Ruble and the further isolation of Russia's economy. The LLM response summarized the key points from these sources, focusing on the stock market impact as the query asked.
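
Hybrid search runs a keyword query and a vector query, then merges the two ranked lists. Azure AI Search documents Reciprocal Rank Fusion (RRF) as the merging method. The sketch below shows the RRF formula on two made-up rankings. The chunk ids are hypothetical, and `k=60` is assumed here as a commonly used constant; the service performs this step internally.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from the keyword query and the vector query.
keyword_ranking = ["chunk-3", "chunk-1", "chunk-7"]
vector_ranking = ["chunk-1", "chunk-5", "chunk-3"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Chunks that appear in both lists accumulate two contributions, so they tend to rise above chunks found by only one of the two queries.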


Now, let's look at a query where Hybrid Search does not produce a well-grounded answer:


In [71]:
# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = hybrid_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it does mention that the events discussed are happening in the current era and that the actions taken are in response to Putin's aggression. For the precise date, one would need to refer to external sources or historical records.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.  

Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west.  

For that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. 

As I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.  

And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  

Putin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. 

And a proud Ukrainian people, who have known 30 years  of independence,

### Hybrid Search: LLM Response Analysis
The LLM response in the Hybrid Search example states that the provided context does not specify the exact date of Russia's invasion of Ukraine. This suggests that the LLM is relying on the information in the source documents while acknowledging that the text lacks that specific detail.

The response is correct in noting that the context mentions events related to Russia's aggression but does not give the specific invasion date. This shows the LLM's ability to interpret the information provided while recognizing gaps in the content. The LLM prompts the user to consult external sources or historical records for the exact date, showing appropriate caution when information is incomplete.

### Source Nodes Analysis
The source nodes in the Hybrid Search example contain excerpts from the speech that discuss the U.S. response to Russia's actions in Ukraine. They highlight the broader geopolitical impact and the steps taken by the United States and its allies in response to the invasion, but they do not mention the specific invasion date. This is consistent with the LLM response, which correctly notes that the context lacks the exact date.


In [72]:
# Initialize semantic hybrid (hybrid + semantic reranking) retriever and query engine
semantic_reranker_retriever = index.as_retriever(vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID)
semantic_reranker_query_engine = RetrieverQueryEngine(retriever=semantic_reranker_retriever)

# Query execution
query = "What was the precise date when Russia invaded Ukraine?"
response = semantic_reranker_query_engine.query(query)

# Display the response
display(Markdown(f"{response}"))
print("\n")

# Print the source nodes
print("Source Nodes:")
for node in response.source_nodes:
    print(node.get_content(metadata_mode=MetadataMode.LLM))


 The provided context does not specify the exact date of Russia's invasion of Ukraine. However, it mentions that the event occurred six days before the speech was given. To determine the precise date, one would need to know the date of the speech.



Source Nodes:
file_path: data\txt\state_of_the_union.txt

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. 

### Hybrid w/Reranking: LLM Response Analysis
In the Hybrid w/Reranking example, the LLM response adds context by noting that the event occurred six days before the speech was given. This shows that the LLM can infer the invasion date relative to the timing of the speech, although it still needs the actual date of the speech to be precise.

This response shows an improved ability to use contextual clues to give a more informative answer. It highlights the benefit of reranking: the LLM can access and prioritize the most relevant information and so get closer to the requested detail (that is, the invasion date).

### Source Nodes Analysis
The source nodes in this example include references to the timing of Russia's invasion, specifically that it happened six days before the speech. Although the exact date is still not stated explicitly, the nodes provide temporal context that lets the LLM give a better-informed answer. Including this detail shows how reranking can improve the LLM's ability to extract and interpret information from the provided context, leading to a more accurate and informative response.
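
Reranking is a second stage: a first-stage retriever returns candidates, then a more precise scorer re-orders them before they reach the LLM. The sketch below shows only that control flow. The `term_overlap` scorer is a hypothetical stand-in chosen so the example runs anywhere; the Azure AI Search semantic ranker uses a trained language model rather than word overlap. The candidate strings are toy text, not quotes from the speech.

```python
def rerank(query, candidates, scorer, top_n):
    """Re-score first-stage candidates with scorer and keep the best top_n."""
    rescored = [(text, scorer(query, text)) for text in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in rescored[:top_n]]


def term_overlap(query, text):
    """Hypothetical stand-in scorer: fraction of query words that appear in text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


# Toy first-stage results, in the order the retriever returned them.
candidates = [
    "troops are deployed to defend allied countries",
    "six days ago the invasion of ukraine began",
    "sanctions target the largest banks",
]
best = rerank("when did the invasion of ukraine begin", candidates, term_overlap, top_n=1)
print(best)
```

The point of the design is that the second-stage scorer only has to look at a handful of candidates, so it can afford to be much more expensive than the first-stage retrieval.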


**Note:**
In this notebook, we used NVIDIA NIM microservices from the NVIDIA API Catalog.  
The APIs used above are `NVIDIA (llms)`, `NVIDIAEmbedding`, and [Azure AI Search Semantic Hybrid Retrieval (built-in reranking)](https://learn.microsoft.com/azure/search/semantic-search-overview). Note that these APIs can also support self-hosted microservices.

**Example:**
```python
NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
```



---

**Disclaimer**:  
This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). Although we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
