## About this example
This sample shows how to create an index from local files or from an existing Azure AI Search index. The resulting MLIndex file is generated locally.

This sample is useful for developers and data scientists who want to build an index from their own data for use in the retrieval-augmented generation (RAG) pattern.

This sample shows how to:
- create an index locally
- register a local index to the cloud
- retrieve the index from the cloud
- consume an index in LangChain

## Before you begin

### Parameters

In [None]:
# names of indexes we will create
local_index_name = "local-test"

# project details
subscription_id: str = "<your-subscription-id>"
resource_group_name: str = "<your-resource-group>"
project_name: str = "<your-project-name>"

# model used for embedding
embedding_model_aoai: str = "text-embedding-ada-002"
deployment_name_aoai: str = "text-embedding-ada-002"
embedding_model_cohere: str = "cohere-embed-v3-multilingual" # or "cohere-embed-v3-english"

# connection details
ai_search_connection_name: str = "<your-ai-search-connection>"
aoai_connection_name: str = "<your-aoai-connection>"
# serverless_connection_name: str = "<your-serverless-connection>"

import os
# set credentials to your Azure OpenAI instance
os.environ["OPENAI_API_VERSION"] = "2023-07-01-preview"
os.environ["AZURE_OPENAI_KEY"] = "api-key"
os.environ["AZURE_OPENAI_ENDPOINT"] = "api-endpoint"

# set credentials to your Azure AI Search instance
os.environ["AZURE_AI_SEARCH_KEY"] = "<your-acs-key>"
os.environ["AZURE_AI_SEARCH_ENDPOINT"] = "<your-acs-endpoint>"
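
Because the cell above ships with placeholder values, a quick check before building the index can catch credentials that were never filled in. The following is a minimal sketch, not part of the sample's API: the helper `find_unset_env_vars` and the constant names are hypothetical, and the placeholder detection assumes the `api-key`, `api-endpoint`, and `<...>` conventions used in this notebook.

```python
import os

# Environment variables that the parameters cell above sets.
REQUIRED_ENV_VARS = [
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_AI_SEARCH_KEY",
    "AZURE_AI_SEARCH_ENDPOINT",
]

# Literal placeholder strings used in this sample (assumption: real values differ).
PLACEHOLDER_VALUES = {"api-key", "api-endpoint"}


def find_unset_env_vars(names, environ=None):
    """Return the names that are missing, empty, or still hold a placeholder."""
    environ = os.environ if environ is None else environ
    unset = []
    for name in names:
        value = environ.get(name, "").strip()
        is_placeholder = value in PLACEHOLDER_VALUES or (
            value.startswith("<") and value.endswith(">")
        )
        if not value or is_placeholder:
            unset.append(name)
    return unset


unset = find_unset_env_vars(REQUIRED_ENV_VARS)
if unset:
    print("Still unset or placeholder:", ", ".join(unset))
```

Passing a plain dictionary as `environ` makes the helper easy to try out without touching the real environment.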

### Connect to your project

To start, create a config file with your project details. This file can be used in this sample or in other samples to connect to your workspace. To find the required details, go to the Project Overview page in AI Studio.

In [None]:
import json
from pathlib import Path

config = {
    "subscription_id": subscription_id,
    "resource_group": resource_group_name,
    "project_name": project_name,
}

p = Path("config.json")

with p.open(mode="w") as file:
    file.write(json.dumps(config))
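
Since other samples may reuse this file, it can help to read it back and confirm that every required field is present before connecting. The sketch below is illustrative only: `load_project_config` and `REQUIRED_KEYS` are hypothetical names, and the key list simply mirrors the three keys written by the cell above.

```python
import json
from pathlib import Path

# Keys written to config.json by the cell above.
REQUIRED_KEYS = ("subscription_id", "resource_group", "project_name")


def load_project_config(path):
    """Load the config file and raise if a required key is missing or blank."""
    data = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"{path} is missing values for: {', '.join(missing)}")
    return data
```

For example, calling `load_project_config("config.json")` after running the cell above returns the same dictionary that was written, or raises a `ValueError` naming the blank fields.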

Initialize MLClient to interact with resources in your Azure AI Studio project.

Make sure you have connections for your embedding model and Azure AI Search in this workspace.

In [None]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

client=MLClient.from_config(DefaultAzureCredential(), path="./config.json")

### Retrieve connections to embedding model and AI Search
We will use an Azure OpenAI service to access the LLM and embedding model, and Azure AI Search to store the index. Retrieve the details of both connections from your project.

In [None]:
aoai_connection = client.connections.get(aoai_connection_name)
ai_search_connection = client.connections.get(ai_search_connection_name)
# serverless_connection = client.connections.get(serverless_connection_name)

print(f"aoai connection id is {aoai_connection.id}")
print(f"ai search connection id is {ai_search_connection.id}")

### Input types

You can build an index from local files or from an existing Azure AI Search index.

In [None]:
from promptflow.rag.config import AzureAISearchSource, LocalSource
# local files
input_source_local = LocalSource(input_data="product-info/")

# existing Azure AI Search index
# the field keys may differ for your index; check your MLIndex file
input_source = AzureAISearchSource(ai_search_index_name="<index-name>",
                                   ai_search_content_key="content",
                                   ai_search_embedding_key="contentVector",
                                   ai_search_title_key="title",
                                   ai_search_metadata_key="meta_json_string",
                                   ai_search_connection_id="<connection-id>"
                                 )

## Build index locally

### 1. With AOAI embedding model
To connect to your Azure OpenAI (AOAI) embedding model, either set your API key and endpoint in environment variables, or pass an optional ConnectionConfig if your workspace has a connection to the model deployment.

In [None]:
# ConnectionConfig is only needed if you uncomment connection_config below
from promptflow.rag.config import LocalSource, AzureAISearchConfig, EmbeddingsModelConfig, ConnectionConfig
from promptflow.rag import build_index

ai_search_index_path=build_index(
    name=local_index_name + "aoai",  # name of your index
    vector_store="azure_ai_search",  # the type of vector store
    embeddings_model_config=EmbeddingsModelConfig(
        model_name=embedding_model_aoai,
        deployment_name=deployment_name_aoai,
        # connection_config = ConnectionConfig(
        #     subscription = "<subscription>",
        #     resource_group = "<resource-group>",
        #     workspace = "<workspace>",
        #     connection_name = "<connection-name>"
        # ),
    ),
    input_source=LocalSource(input_data="product-info/"),  # the location of your file/folders
    index_config=AzureAISearchConfig(
        ai_search_index_name=local_index_name + "-aoai-store" # the name of the index store inside the azure ai search service
    ),
    tokens_per_chunk = 800, # Optional field - Maximum number of tokens per chunk
    token_overlap_across_chunks = 0, # Optional field - Number of tokens to overlap between chunks
)
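
To build intuition for the two optional chunking parameters, the sketch below splits a token list into fixed-size windows that share a configurable number of tokens. It is an illustration only and makes no claim about how `build_index` tokenizes or splits documents internally: the function `chunk_tokens` is hypothetical, and the usage note stands in for real tokens with whitespace-separated words.

```python
def chunk_tokens(tokens, tokens_per_chunk, token_overlap=0):
    """Split tokens into windows of at most tokens_per_chunk items.

    Consecutive windows share token_overlap items.
    """
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= token_overlap < tokens_per_chunk:
        raise ValueError("token_overlap must be in [0, tokens_per_chunk)")

    step = tokens_per_chunk - token_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + tokens_per_chunk])
        # Stop once a window has reached the end of the input.
        if start + tokens_per_chunk >= len(tokens):
            break
    return chunks
```

For instance, `chunk_tokens("a b c d e f g".split(), tokens_per_chunk=3, token_overlap=1)` yields windows that each repeat the last word of the previous one. With an overlap of 0, as in the cell above, the windows do not share any tokens.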

### 2. With Cohere embedding model
To use your Cohere embedding model, specify a ConnectionConfig for the model deployment you want to use.

In [None]:
from promptflow.rag.config import LocalSource, AzureAISearchConfig, EmbeddingsModelConfig, ConnectionConfig
from promptflow.rag import build_index

ai_search_index_path=build_index(
    name=local_index_name + "cohere",  # name of your index
    vector_store="azure_ai_search",  # the type of vector store
    embeddings_model_config=EmbeddingsModelConfig(
        model_name=embedding_model_cohere,
        connection_config=ConnectionConfig(
            subscription = "<subscription>",
            resource_group = "<resource-group>",
            workspace = "<workspace>",
            connection_name = "<connection-name>"
        )
    ),
    input_source=LocalSource(input_data="product-info/"),  # the location of your file/folders
    index_config=AzureAISearchConfig(
        ai_search_index_name=local_index_name + "cohere-store" # the name of the index store inside the azure ai search service
    ),
    tokens_per_chunk = 800, # Optional field - Maximum number of tokens per chunk
    token_overlap_across_chunks = 0, # Optional field - Number of tokens to overlap between chunks
)

### Register the index
Register the index so that it shows up in your AI Studio project. The variable ai_search_index_path holds the path returned by the most recent build_index call.

In [None]:
from azure.ai.ml.entities import Index
client.indexes.create_or_update(Index(name=local_index_name, path=ai_search_index_path, version="1"))

### Retrieve index from the cloud

In [None]:
# use the same name the index was registered under
ml_index = client.indexes.get(name=local_index_name, label="latest")

### Consume the index as a LangChain retriever

In [None]:
from promptflow.rag import get_langchain_retriever_from_index

# Get the Index
retriever=get_langchain_retriever_from_index(ml_index.path)
retriever.get_relevant_documents("which tent is the most waterproof?")
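
The retriever returns a list of documents, which can be hard to scan in raw form. The sketch below condenses each result into one line. It assumes each result exposes `page_content` and `metadata` attributes, as LangChain's `Document` does. The helper `summarize_documents` is hypothetical, and the `source` metadata key is an assumption that may not match the fields in your index.

```python
def summarize_documents(documents, max_chars=80):
    """Return one short line per retrieved document, in ranked order."""
    lines = []
    for rank, doc in enumerate(documents, start=1):
        # Collapse newlines and repeated spaces so each result fits on one line.
        text = " ".join(doc.page_content.split())
        if len(text) > max_chars:
            text = text[:max_chars].rstrip() + "..."
        # "source" is an assumed metadata key; fall back if it is absent.
        source = doc.metadata.get("source", "unknown")
        lines.append(f"{rank}. [{source}] {text}")
    return lines
```

To use it with the cell above, pass the list returned by `retriever.get_relevant_documents(...)` to `summarize_documents` and print each line.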