## About this example
This sample shows how to create an index from local files or from an existing Azure AI Search index. The resulting MLIndex file is generated locally.

This sample is useful for developers and data scientists who want to build an index over their own data for use in the retrieval-augmented generation (RAG) pattern.

## Before you begin

### Parameters

In [None]:
import os

# name prefix for the indexes we will create
local_index_name = "local-test"

# models used for embedding
embedding_model_aoai: str = "text-embedding-ada-002"
deployment_name_aoai: str = "text-embedding-ada-002"  # name of your Azure OpenAI deployment
embedding_model_cohere: str = "cohere-embed-v3-multilingual"  # or "cohere-embed-v3-english"

# set credentials for your Azure OpenAI instance
os.environ["OPENAI_API_VERSION"] = "2023-07-01-preview"
os.environ["AZURE_OPENAI_KEY"] = "<your-aoai-key>"
os.environ["AZURE_OPENAI_ENDPOINT"] = "<your-aoai-endpoint>"

# set credentials for your Azure AI Search instance
os.environ["AZURE_AI_SEARCH_KEY"] = "<your-ai-search-key>"
os.environ["AZURE_AI_SEARCH_ENDPOINT"] = "<your-ai-search-endpoint>"

### Input types

You can build an index from local files or from an existing Azure AI Search index.
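Before pointing `LocalSource` at a folder, a quick check that the folder exists and actually contains files can save you from a failed or empty build. The helper below is a minimal sketch using only the Python standard library; `list_input_files` is a hypothetical name, and it does not reproduce which file types `LocalSource` actually ingests.

```python
from pathlib import Path


def list_input_files(folder: str) -> list[Path]:
    """Return every regular file under `folder`, recursively, in sorted order.

    Raises FileNotFoundError if the folder does not exist, so a typo in the
    path fails fast instead of silently producing an empty index.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Input folder not found: {root}")
    return sorted(p for p in root.rglob("*") if p.is_file())


# Example: inspect the folder used with LocalSource in this sample.
# files = list_input_files("product-info/")
# print(f"{len(files)} files found")
```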

In [None]:
from promptflow.rag.resources import AzureAISearchSource, LocalSource
# local files
input_source_local = LocalSource(input_data="product-info/")

# existing Azure AI Search index
# field keys may differ for your index; check the field names in your index's MLIndex file
input_source = AzureAISearchSource(
    ai_search_index_name="<index-name>",
    ai_search_content_key="content",
    ai_search_embedding_key="contentVector",
    ai_search_title_key="title",
    ai_search_metadata_key="meta_json_string",
    ai_search_connection_id="<connection-id>",
)
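Because the field keys can differ from one index to another, it may help to compare the keys you pass to `AzureAISearchSource` against your index's field names before building. The sketch below is plain Python; `missing_fields` is a hypothetical helper, and the `index_fields` set is made-up sample data that you would replace with the field names from your own index definition.

```python
def missing_fields(field_mapping: dict[str, str], index_fields: set[str]) -> dict[str, str]:
    """Return the entries of `field_mapping` whose field name is not in the index.

    `field_mapping` maps an AzureAISearchSource argument name to the field name
    you plan to pass; `index_fields` is the set of field names in your index.
    """
    return {arg: field for arg, field in field_mapping.items() if field not in index_fields}


# Field names passed to AzureAISearchSource in this sample.
field_mapping = {
    "ai_search_content_key": "content",
    "ai_search_embedding_key": "contentVector",
    "ai_search_title_key": "title",
    "ai_search_metadata_key": "meta_json_string",
}

# Hypothetical schema: replace with the field names of your own index.
index_fields = {"id", "content", "contentVector", "title", "url"}

print(missing_fields(field_mapping, index_fields))
```

With the sample `index_fields` above, this prints `{'ai_search_metadata_key': 'meta_json_string'}`, flagging that the metadata field is absent from the hypothetical schema.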

## Build index locally

### 1. With AOAI embedding model
To connect to your Azure OpenAI (AOAI) embedding model, either set your API key and endpoint as environment variables (as in the Parameters cell), or, if your workspace has a connection to the model deployment, pass an optional `ConnectionConfig`.
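If you use environment variables, a pre-flight check can catch unset values or leftover `<...>` placeholders before `build_index` runs. This is a minimal sketch that assumes the five variable names set in the Parameters cell are the ones you need; `missing_env_vars` is a hypothetical helper, not part of `promptflow.rag`.

```python
import os

# Variable names taken from this sample's Parameters cell
# (assumed to be the ones your build needs).
REQUIRED_ENV_VARS = (
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_AI_SEARCH_KEY",
    "AZURE_AI_SEARCH_ENDPOINT",
)


def missing_env_vars(names: tuple[str, ...] = REQUIRED_ENV_VARS) -> list[str]:
    """Return the names that are unset, empty, or still hold a "<...>" placeholder."""
    missing = []
    for name in names:
        value = os.environ.get(name, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


missing = missing_env_vars()
if missing:
    print("Set these before calling build_index:", ", ".join(missing))
```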

In [None]:
from promptflow.rag.resources import LocalSource, AzureAISearchConfig, EmbeddingsModelConfig, ConnectionConfig
from promptflow.rag import build_index

ai_search_index_path = build_index(
    name=local_index_name + "-aoai",  # name of your index
    vector_store="azure_ai_search",  # the type of vector store
    embeddings_model_config=EmbeddingsModelConfig(
        model_name=embedding_model_aoai,
        deployment_name=deployment_name_aoai,
        # optional: use a workspace connection instead of environment variables
        # connection_config=ConnectionConfig(
        #     subscription="<subscription>",
        #     resource_group="<resource-group>",
        #     workspace="<workspace>",
        #     connection_name="<connection-name>",
        # ),
    ),
    input_source=LocalSource(input_data="product-info/"),  # the location of your files/folders
    index_config=AzureAISearchConfig(
        ai_search_index_name=local_index_name + "-aoai-store",  # the name of the index inside the Azure AI Search service
    ),
    tokens_per_chunk=800,  # optional: maximum number of tokens per chunk
    token_overlap_across_chunks=0,  # optional: number of tokens shared by consecutive chunks
)
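The two optional chunking parameters work together: conceptually, each chunk holds at most `tokens_per_chunk` tokens, and consecutive chunks repeat `token_overlap_across_chunks` tokens so that text cut at a boundary also appears at the start of the next chunk. The sketch below illustrates this sliding-window idea under two simplifying assumptions: whitespace splitting stands in for a real tokenizer, and windows are cut purely by token count. The library's own splitter may tokenize differently and respect sentence or document boundaries; `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list[str], tokens_per_chunk: int, overlap: int) -> list[list[str]]:
    """Split `tokens` into windows of at most `tokens_per_chunk` tokens,
    where consecutive windows share `overlap` tokens."""
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= overlap < tokens_per_chunk:
        raise ValueError("overlap must be in [0, tokens_per_chunk)")
    step = tokens_per_chunk - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + tokens_per_chunk])
        if start + tokens_per_chunk >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Whitespace splitting stands in for a real tokenizer here.
tokens = "a b c d e f g h i j".split()
print(chunk_tokens(tokens, tokens_per_chunk=4, overlap=0))
print(chunk_tokens(tokens, tokens_per_chunk=4, overlap=1))
```

The first call yields `[a b c d]`, `[e f g h]`, `[i j]`; with `overlap=1` each chunk starts with the last token of the previous one: `[a b c d]`, `[d e f g]`, `[g h i j]`. A larger overlap keeps more context across boundaries at the cost of more chunks and more embedding calls.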

### 2. With Cohere embedding model
To use a Cohere embedding model, you must pass a `ConnectionConfig` that points to the workspace connection for the model deployment you want to use.

In [None]:
from promptflow.rag.resources import LocalSource, AzureAISearchConfig, EmbeddingsModelConfig, ConnectionConfig
from promptflow.rag import build_index

ai_search_index_path = build_index(
    name=local_index_name + "-cohere",  # name of your index
    vector_store="azure_ai_search",  # the type of vector store
    embeddings_model_config=EmbeddingsModelConfig(
        model_name=embedding_model_cohere,
        # required for Cohere: the workspace connection to your model deployment
        connection_config=ConnectionConfig(
            subscription="<subscription>",
            resource_group="<resource-group>",
            workspace="<workspace>",
            connection_name="<connection-name>",
        ),
    ),
    input_source=LocalSource(input_data="product-info/"),  # the location of your files/folders
    index_config=AzureAISearchConfig(
        ai_search_index_name=local_index_name + "-cohere-store",  # the name of the index inside the Azure AI Search service
    ),
    tokens_per_chunk=800,  # optional: maximum number of tokens per chunk
    token_overlap_across_chunks=0,  # optional: number of tokens shared by consecutive chunks
)