In [1]:
import os
from langchain_openai import AzureOpenAIEmbeddings

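os.environ["OPENAI_API_VERSION"] = "2024-02-01"
os.environ["AZURE_OPENAI_ENDPOINT"] = ""  # fill in your Azure OpenAI resource endpoint
os.environ["AZURE_OPENAI_API_KEY"] = ""   # fill in your Azure OpenAI API key

# azure_deployment is the name of the embedding-model deployment in your Azure OpenAI resource
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="embedding"
)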

In [2]:
from langchain_community.vectorstores import FAISS

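# load_local unpickles the saved docstore, so recent langchain_community versions
# require an explicit opt-in; only enable it for an index you created yourself.
vector = FAISS.load_local("./vector-db/", embeddings, allow_dangerous_deserialization=True)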
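Once loaded, the store is queried with `vector.as_retriever()`, which by default runs a similarity search and returns the top 4 documents. Assuming the index was built with LangChain's default Euclidean distance strategy, that search is an exact L2 nearest-neighbour lookup. The pure-Python sketch below illustrates the idea on made-up 2-D vectors; the document names and vectors are hypothetical, and FAISS itself does this in optimized native code over much higher-dimensional embeddings.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: list[float], index: dict[str, list[float]], k: int = 4) -> list[str]:
    """Exact (brute-force) nearest-neighbour search, like a flat L2 index."""
    ranked = sorted(index, key=lambda doc_id: l2_distance(query, index[doc_id]))
    return ranked[:k]


# Hypothetical toy "embeddings"; real ones have far more dimensions.
toy_index = {
    "online-endpoints.md": [0.9, 0.1],
    "scoring-script.md": [0.75, 0.35],
    "compute-clusters.md": [0.1, 0.9],
    "datastores.md": [0.2, 0.7],
}

print(top_k([0.85, 0.2], toy_index, k=2))  # ['online-endpoints.md', 'scoring-script.md']
```

Brute force is exact but scans every vector; FAISS also offers approximate index types that trade some recall for speed on large collections.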

In [3]:
from langchain_openai import AzureChatOpenAI
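from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

llm = AzureChatOpenAI(
    azure_deployment="gpt-4o"
)

# The persona lives in the system message; {context} is filled with the
# retrieved documents and {input} with the user's question.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant that can answer user questions about Azure Machine Learning. "
               "Answer the user's questions based on the below context:\n\n{context}"),
    ("user", "{input}"),
])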

document_chain = create_stuff_documents_chain(llm, prompt)
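`create_stuff_documents_chain` "stuffs" every retrieved document into the single `{context}` slot of the prompt. By default it renders each document as just its `page_content` and joins them with a blank line (`"\n\n"`). The plain-Python sketch below mimics that formatting step; `Doc` is a hypothetical stand-in for LangChain's `Document`, `SYSTEM_TEMPLATE` is a simplified copy of the system message, and the real chain would then send the filled messages to `llm`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


SYSTEM_TEMPLATE = "Answer the user's questions based on the below context:\n\n{context}"


def stuff_documents(docs: list[Doc], separator: str = "\n\n") -> str:
    """Join each document's text into one string, as the 'stuff' strategy does."""
    return separator.join(doc.page_content for doc in docs)


def build_messages(docs: list[Doc], user_input: str) -> list[tuple[str, str]]:
    """Fill the system and user templates the way the chat prompt would."""
    return [
        ("system", SYSTEM_TEMPLATE.format(context=stuff_documents(docs))),
        ("user", user_input),
    ]


docs = [
    Doc("Online endpoints need a scoring script."),
    Doc("The script defines init() and run()."),
]
for role, text in build_messages(docs, "How do I write score.py?"):
    print(f"[{role}] {text}")
```

Stuffing is the simplest strategy: one LLM call sees all documents at once, which works as long as the retrieved text fits in the model's context window.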

In [4]:
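from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

retriever = vector.as_retriever()

# Separate prompt that rewrites a follow-up into a standalone search query
# (used only when `chat_history` is passed; otherwise `input` is searched as-is).
rephrase_prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up information relevant to the conversation."),
])
retriever_chain = create_history_aware_retriever(llm, retriever, rephrase_prompt)
retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)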
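Cell 4 wires two pieces together. `create_history_aware_retriever` calls the LLM only when the input has a non-empty `chat_history`; otherwise it sends `input` straight to the retriever. `create_retrieval_chain` then adds two keys to the input dict, `context` (the retrieved documents) and `answer`, merging them over whatever the caller passed, so an incoming `context` key is replaced rather than combined. The plain-Python sketch below mirrors that data flow in simplified form; `fake_rephrase`, `fake_retrieve`, and `fake_answer` are hypothetical stubs for the LLM, the FAISS retriever, and `document_chain`.

```python
from typing import Callable


def history_aware_retrieve(inputs: dict, rephrase: Callable, retrieve: Callable) -> list:
    """Simplified branch from create_history_aware_retriever."""
    if not inputs.get("chat_history"):
        # No history: search with the user's question unchanged.
        return retrieve(inputs["input"])
    # History present: ask the LLM for a standalone query, then search with it.
    return retrieve(rephrase(inputs["chat_history"], inputs["input"]))


def retrieval_chain(inputs: dict, rephrase: Callable, retrieve: Callable, answer: Callable) -> dict:
    """Simplified create_retrieval_chain: new keys are merged *over* the inputs."""
    step1 = {**inputs, "context": history_aware_retrieve(inputs, rephrase, retrieve)}
    return {**step1, "answer": answer(step1)}


# Hypothetical stubs standing in for the LLM, the retriever, and document_chain.
def fake_rephrase(history: list, question: str) -> str:
    return f"{history[-1]} {question}"


def fake_retrieve(query: str) -> list:
    return [f"doc about {query}"]


def fake_answer(inputs: dict) -> str:
    return f"answer from {len(inputs['context'])} doc(s)"


out = retrieval_chain({"input": "scoring script?", "context": ["ignored"]},
                      fake_rephrase, fake_retrieve, fake_answer)
print(out["context"])  # ['doc about scoring script?']  (caller's "context" replaced)
print(out["answer"])   # answer from 1 doc(s)
```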

In [6]:
query = "How do I define the scoring code for an online endpoint?"

# Pass only "input": create_retrieval_chain fills {context} with the retrieved
# documents, overwriting any "context" key supplied here (so a SystemMessage
# passed that way never reaches the model). Put persona instructions in the
# system message of `prompt` instead.
output = retrieval_chain.invoke({"input": query})

print(output["answer"])

To define the scoring code for an online endpoint, you need to create an entry script (also known as a scoring script) that will handle the scoring requests. Here's a step-by-step guide on how to set this up:

1. **Create a Scoring Script**:
    - This script should accept requests, use the model to score the data, and return a response.
    - The script typically contains two main functions:
        - **init()**: This function is called once when the deployment is initialized. It is used to load the model and any other assets.
        - **run(input_data)**: This function is called for each request. It processes the input data and returns the result.

    Below is an example of a scoring script (`score.py`):

    ```python
    import json
    import joblib
    import numpy as np

    def init():
        global model
        # Load the model from file into a global object
        model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
        model = joblib.load(model_pat