# 03 - LangChain with vector search

In this lab, we introduce [LangChain](https://python.langchain.com/docs/get_started/introduction), a framework for developing applications powered by language models, and use it to ask questions about custom data indexed in Azure Cognitive Search.

LangChain supports Python and JavaScript/TypeScript. For this lab, we will use Python.

## Setup

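We'll use the `pip` tool to install a preview build of `azure-search-documents` (the Azure Cognitive Search client library) and `azure-identity`. The cells below also need the `langchain` Python package.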

In [None]:
%pip install --index-url=https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ azure-search-documents==11.4.0a20230509004
%pip install azure-identity

In [None]:
import os, json
from dotenv import load_dotenv

# Load environment variables
# API_KEY = "<YOUR API KEY>"
# RESOURCE_ENDPOINT = "<YOUR AZURE OPENAI ENDPOINT>" # For example https://<your azure open ai instance>.openai.azure.com/
# DEPLOYMENT_ID = "<YOUR DEPLOYMENT ID>" # For example "text-davinci-003"
load_dotenv()

# Set this to `azure`
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"
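
Later cells read several settings with `os.environ[...]`, so it can help to fail early when one is missing. The sketch below is one way to check; `missing_settings` is a hypothetical helper (not part of LangChain), and the list covers only the names this notebook reads directly. Your `.env` file will likely need others as well, such as the OpenAI key and endpoint.

```python
import os
from typing import Iterable, List, Mapping

# Names this notebook reads directly via os.environ[...] in later cells.
REQUIRED_SETTINGS = (
    "DEPLOYMENT_ID",
    "AZURE_COGNITIVE_SEARCH_ENDPOINT",
    "AZURE_COGNITIVE_SEARCH_ADMIN_KEY",
)


def missing_settings(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in `env` (hypothetical helper)."""
    return [name for name in names if not env.get(name)]


missing = missing_settings(REQUIRED_SETTINGS)
if missing:
    print("Missing settings:", ", ".join(missing))
```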

First, we load the data from the CSV file with a document loader.

In [None]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='../../extra/data/movies/movies.csv', source_column='original_title', encoding='utf-8', csv_args={'delimiter':',', 'fieldnames': ['id', 'original_language', 'original_title', 'popularity', 'release_date', 'vote_average', 'vote_count', 'genre', 'overview', 'revenue', 'runtime', 'tagline']})
data = loader.load()
data = data[1:200] # skip the header row (index 0) and keep at most 199 movies; widen the slice to load more
print('Loaded %s movies' % len(data))
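
Because `fieldnames` is passed explicitly, the CSV reader treats the file's header line as an ordinary row, which is why the slice above starts at index 1. The sketch below uses only the standard library to mimic, roughly, how each row becomes one document made of `column: value` lines. The sample data and the `row_to_text` helper are made up for illustration and are not LangChain's actual implementation.

```python
import csv
import io

# Tiny made-up sample standing in for movies.csv (first line is the header).
SAMPLE = (
    "id,original_title,overview\n"
    "1,Alien,A crew meets a hostile lifeform.\n"
    "2,Big,A boy wakes up as an adult.\n"
)
FIELDNAMES = ["id", "original_title", "overview"]


def row_to_text(row):
    """Join one CSV row into 'column: value' lines (hypothetical helper)."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# With explicit fieldnames, DictReader does not skip the header line.
rows = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDNAMES))
documents = [row_to_text(row) for row in rows]

print(documents[0])  # the header line, read as if it were data
print(documents[1])  # the first real movie
```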

We will use an OpenAI embedding model to turn each movie into a vector, and an Azure OpenAI deployment as the language model.

In [None]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

from langchain.llms import AzureOpenAI

llm = AzureOpenAI(
    deployment_name=os.environ["DEPLOYMENT_ID"],
    model_name="gpt-35-turbo",
)
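
An embedding model maps each text to a vector of numbers, and texts with similar meaning end up with vectors that point in similar directions. A minimal sketch of that comparison, using cosine similarity on tiny made-up vectors (real `text-embedding-ada-002` vectors have 1536 dimensions; the three-dimensional numbers below are invented for illustration):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy "embeddings"; a real model would produce these vectors.
toy_vectors = {
    "space horror film": [0.9, 0.1, 0.0],
    "sci-fi thriller": [0.8, 0.3, 0.1],
    "romantic comedy": [0.0, 0.2, 0.9],
}

query = toy_vectors["space horror film"]
# Rank every text by similarity to the query, most similar first.
ranked = sorted(
    toy_vectors,
    key=lambda name: cosine_similarity(query, toy_vectors[name]),
    reverse=True,
)
print(ranked)
```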

In [None]:
import openai
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.azuresearch import AzureSearch

model: str = "text-embedding-ada-002"
index_name: str = "langchain-vector-demo"

embeddings: OpenAIEmbeddings = OpenAIEmbeddings(model=model, chunk_size=1)
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_COGNITIVE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_COGNITIVE_SEARCH_ADMIN_KEY"],
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

vector_store.add_documents(documents=data)

In [None]:
# Perform a similarity (pure vector) search
docs = vector_store.similarity_search(
    query="What are the best 80s movies I should watch?",
    k=3,
    search_type="similarity",
)
print(docs[0].page_content)

# Perform a hybrid (keyword + vector) search
docs = vector_store.similarity_search(
    query="What are the best 80s movies I should watch?",
    k=3,
    search_type="hybrid",
)
print(docs[0].page_content)
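
A hybrid search runs a keyword query and a vector query, then merges the two ranked lists into one. Azure Cognitive Search documents Reciprocal Rank Fusion (RRF) for this merge. The sketch below is a simplified illustration of the idea, not the service's implementation: the constant `k=60` is a commonly cited default and is an assumption here, and the result lists are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document scores 1 / (k + rank) per list."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up results from the two searches, best match first.
keyword_hits = ["Top Gun", "Aliens", "Die Hard"]
vector_hits = ["Aliens", "Blade Runner", "Top Gun"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, even if neither search ranked them first.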

In [None]:
from langchain.retrievers import AzureCognitiveSearchRetriever

# os.environ["AZURE_COGNITIVE_SEARCH_SERVICE_NAME"] = "<YOUR_ACS_SERVICE_NAME>"
# os.environ["AZURE_COGNITIVE_SEARCH_INDEX_NAME"] = "<YOUR_ACS_INDEX_NAME>"
# os.environ["AZURE_COGNITIVE_SEARCH_API_KEY"] = "<YOUR_API_KEY>"

retriever = AzureCognitiveSearchRetriever(content_key="content")

retriever.get_relevant_documents("what is the best movie of all time")
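
To answer a question about custom data, the retrieved documents are typically passed to the language model as context. A minimal sketch of that step with plain strings: `build_prompt` and the prompt wording are hypothetical, and the `texts` list stands in for the `page_content` of the documents a retriever returns.

```python
def build_prompt(question, texts, max_chars=2000):
    """Combine retrieved texts and a question into one prompt (hypothetical helper)."""
    # Join the retrieved passages and cap the context length.
    context = "\n\n".join(texts)[:max_chars]
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up passages standing in for retrieved page_content values.
texts = [
    "original_title: Alien\noverview: A crew meets a hostile lifeform.",
    "original_title: Big\noverview: A boy wakes up as an adult.",
]

prompt = build_prompt("Which movie is about a spaceship crew?", texts)
print(prompt)
```

The resulting string could then be sent to the `llm` object created earlier; capping the context with `max_chars` is a crude stand-in for proper token counting.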