# Calling Composition Of Experts (CoE) Models

This notebook demonstrates how to use the `use_CoE_model.py` script to call Composition Of Experts (CoE) models in three different ways:

1. Using SambaVerse to call CoE Model
2. Using SambaStudio to call CoE with Named Expert
3. Using SambaStudio to call CoE with Routing

Before we begin, make sure the `use_CoE_model.py` script is in the same directory as this notebook, since the code cells import from a module named `use_CoE_model`.

Let's get started!

## Example 1: Using SambaVerse to call CoE Model

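In this example, we'll use SambaVerse to call the CoE model. The call needs two things: the name of the expert to select and a SambaVerse API key, which the code reads from the `SAMBAVERSE_API_KEY` environment variable.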
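
Because the endpoint is built with `os.getenv("SAMBAVERSE_API_KEY")`, an unset variable silently becomes `None` and the failure only surfaces later. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and not part of `use_CoE_model`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; the notebook itself calls os.getenv directly.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return value


# Possible use before constructing the endpoint (commented out so the sketch
# runs without a real key):
# api_key = require_env("SAMBAVERSE_API_KEY")
```

Failing at this point keeps a missing key from being mistaken for a network or model error further down the chain.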

In [14]:
import os
import sys

current_dir = os.getcwd()
kit_dir = os.path.abspath(os.path.join(current_dir, ".."))
repo_dir = os.path.abspath(os.path.join(kit_dir, ".."))

sys.path.append(kit_dir)
sys.path.append(repo_dir)

In [15]:
from use_CoE_model import SambaNovaEmbeddingModel, SambaverseEndpoint, create_stuff_documents_chain, create_retrieval_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
import os
import yaml

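# Update the config.yaml file with the following:
# api: sambaverse
# llm:
#   sambaverse_model_name: "Mistral/Mistral-7B-Instruct-v0.2"
#   samabaverse_select_expert: "Mistral-7B-Instruct-v0.2"
# The llm section must also define temperature and max_tokens_to_generate,
# which are read when the language model is set up below.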

# Load the config.yaml
CONFIG_PATH = os.path.join(current_dir, "config.yaml")

with open(CONFIG_PATH, "r") as yaml_file:
    config = yaml.safe_load(yaml_file)
api_info = config["api"]
llm_info = config["llm"]


# Embedding models are available only on SambaStudio, not SambaVerse,
# so we create a local Hugging Face embeddings object here.
# The SambaStudio examples later use an embedding model hosted on SambaStudio.
embeddings = HuggingFaceEmbeddings()

# Load documents and split into chunks
loader = WebBaseLoader("https://docs.smith.langchain.com")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)

# Create a vector database using Chroma
vector = Chroma.from_documents(documents, embeddings, collection_name="sambaverse_coe_aisk")

# Define the prompt template
prompt = ChatPromptTemplate.from_template(
    """Answer the following question based only on the provided context:

    <context>
    {context}
    </context>

    Question: {input}"""
)

# Set up the language model
llm = SambaverseEndpoint(
    sambaverse_model_name=llm_info["sambaverse_model_name"],
    sambaverse_api_key=os.getenv("SAMBAVERSE_API_KEY"),
    model_kwargs={
        "do_sample": False,
        "max_tokens_to_generate": llm_info["max_tokens_to_generate"],
        "temperature": llm_info["temperature"],
        "process_prompt": True,
        "select_expert": llm_info["samabaverse_select_expert"],
    },
)

# Create the document chain and retrieval chain
document_chain = create_stuff_documents_chain(llm, prompt)
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

# Invoke the retrieval chain with the user query
user_query = "How can you use langsmith for testing"
response = retrieval_chain.invoke({"input": user_query})
print(response["answer"])

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:langchain_community.document_loaders.web_base:fake_useragent not found, using default user agent.To get a realistic header for requests, `pip install fake_useragent`.


LangSmith is primarily designed for building production-grade LLM applications. However, it does provide some capabilities that can be useful for testing purposes.

One way to use LangSmith for testing is by using its tracing capabilities. Tracing allows you to record and analyze the execution of your LLM application. This can be useful for identifying and debugging issues in your application.

Another way to use LangSmith for testing is by using its evaluation capabilities. Evaluation allows you to automatically grade the output of your LLM application against a set of predefined criteria. This can be useful for quickly identifying the performance of your application and for providing objective feedback to your development team.

In summary, while LangSmith is primarily designed for building production-grade LLM applications, it does provide some capabilities that can be useful for testing purposes. These capabilities include tracing and evaluation. By using these capabilities, you ca

## Example 2: Using SambaStudio to call CoE with Named Expert

In this example, we'll use SambaStudio to call the CoE model with a named expert.
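
The named-expert call reads `temperature`, `max_tokens_to_generate`, and `samabaverse_select_expert` from the `llm` section of `config.yaml`, and a missing key would raise a `KeyError` only when the endpoint is constructed. A minimal sketch of checking for absent keys up front follows; the helper `missing_llm_keys` and the sample values are hypothetical, used only for illustration.

```python
from typing import Any, Iterable, List, Mapping


def missing_llm_keys(llm_info: Mapping[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent from the llm config section."""
    return [key for key in required if key not in llm_info]


# Keys read by the named-expert example (spelled as in the notebook's config).
NAMED_EXPERT_KEYS = ("temperature", "max_tokens_to_generate", "samabaverse_select_expert")

# Illustrative config fragment; the temperature value is an assumption.
example_llm_info = {
    "temperature": 0.01,
    "samabaverse_select_expert": "Mistral-7B-Instruct-v0.2",
}

print(missing_llm_keys(example_llm_info, NAMED_EXPERT_KEYS))  # ['max_tokens_to_generate']
```

In the notebook, the same check could be applied to `llm_info` right after `config.yaml` is loaded.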

In [17]:
from use_CoE_model import SambaNovaEmbeddingModel, SambaNovaEndpoint, create_stuff_documents_chain, create_retrieval_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate

# Update the config.yaml file with the following:
# api: sambastudio
# llm:
#   samabaverse_select_expert: "Mistral-7B-Instruct-v0.2"

# Load the config.yaml
CONFIG_PATH = os.path.join(current_dir, "config.yaml")

with open(CONFIG_PATH, "r") as yaml_file:
    config = yaml.safe_load(yaml_file)
api_info = config["api"]
llm_info = config["llm"]

# Create a SambaNovaEmbeddingModel object
snsdk_model = SambaNovaEmbeddingModel()
embeddings = snsdk_model

# Load documents and split into chunks
loader = WebBaseLoader("https://docs.smith.langchain.com")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)

# Create a vector database using Chroma
vector = Chroma.from_documents(documents, embeddings, collection_name="sambastudio_coe_aisk")

# Define the prompt template
prompt = ChatPromptTemplate.from_template(
    """Answer the following question based only on the provided context:

    <context>
    {context}
    </context>

    Question: {input}"""
)

# Set up the language model
llm = SambaNovaEndpoint(
    model_kwargs={
        "do_sample": True,
        "temperature": llm_info["temperature"],
        "max_tokens_to_generate": llm_info["max_tokens_to_generate"],
        "select_expert": llm_info["samabaverse_select_expert"],
        "process_prompt": False,
    }
)

# Create the document chain and retrieval chain
document_chain = create_stuff_documents_chain(llm, prompt)
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

# Invoke the retrieval chain with the user query
user_query = "Tell me how I can use langsmith within applications"
response = retrieval_chain.invoke({"input": user_query})
print(response["answer"])

INFO:langchain_community.document_loaders.web_base:fake_useragent not found, using default user agent.To get a realistic header for requests, `pip install fake_useragent`.


?

Answer: Based on the provided context, LangSmith is a language processing tool that can be used within applications. However, the context does not provide specific information on how to integrate LangSmith into applications. For more detailed instructions, you may want to refer to LangSmith's official documentation or contact their support team for assistance.


## Example 3: Using SambaStudio to call CoE with Routing

In this example, we'll use SambaStudio to call the CoE model with routing. The script will automatically determine the appropriate expert based on the user query.

In [18]:
from use_CoE_model import SambaNovaEmbeddingModel, SambaNovaEndpoint, create_stuff_documents_chain, create_retrieval_chain, get_expert, get_expert_val
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate

# Update the config.yaml file with the following:
# api: sambastudio
# llm:
#   coe_routing: true
# Note: this cell does not reload config.yaml; it reuses llm_info from the previous cell.

# Create a SambaNovaEmbeddingModel object
snsdk_model = SambaNovaEmbeddingModel()
embeddings = snsdk_model

# Load documents and split into chunks
loader = WebBaseLoader("https://docs.smith.langchain.com")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)

# Create a vector database using Chroma
vector = Chroma.from_documents(documents, embeddings, collection_name="sambastudio_coe_aisk")

# Define the prompt template
prompt = ChatPromptTemplate.from_template(
    """Answer the following question based only on the provided context:

    <context>
    {context}
    </context>

    Question: {input}"""
)

user_query = "Tell me how I can use langsmith for testing"

# Get the expert by calling SambaStudio with a custom prompt workflow
expert_response = get_expert(user_query, use_requests=True)
print(f"Router expert response: {expert_response}")

# Extract the expert name from the response
expert = get_expert_val(expert_response)
print(f"Routing Named Expert: {expert}")

# Look up the model name based on the expert
named_expert = {
    "Finance expert": "finance-chat",
    "Math expert": "deepseek-llm-67b-chat",
    "Code expert": "deepseek-llm-67b-chat",
    "Medical expert": "medicine-chat",
    "Legal expert": "law-chat",
    "Generalist": "Mistral-7B-Instruct-v0.2",
}[expert]
print(f"Named expert Model Name: {named_expert}")

# Set up the language model
llm = SambaNovaEndpoint(
    model_kwargs={
        "do_sample": True,
        "temperature": llm_info["temperature"],
        "max_tokens_to_generate": llm_info["max_tokens_to_generate"],
        "select_expert": named_expert,
        "process_prompt": False,
    }
)

# Create the document chain and retrieval chain
document_chain = create_stuff_documents_chain(llm, prompt)
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

# Invoke the retrieval chain with the user query
response = retrieval_chain.invoke({"input": user_query})
print(f"Response: {response['answer']}")

INFO:langchain_community.document_loaders.web_base:fake_useragent not found, using default user agent.To get a realistic header for requests, `pip install fake_useragent`.


Router expert response: {'data': [{'stop_reason': 'end_of_text', 'completion': ' Based on the information provided, I would classify the message "Tell me how I can use langsmith for testing" as "Code Generation".\n\n<<detected category>>: Code Generation\n\nThe message is asking for information on how to use a tool called "langsmith" for testing, which is a programming concept related to writing and debugging code. Therefore, it falls under the category of "Code Generation".', 'total_tokens_count': 400.0, 'prompt': '{"conversation_id": "sambaverse-conversation-id", "messages": [{"message_id": 0, "role": "user", "content": "Tell me how I can use langsmith for testing"}], "prompt": "<s>[INST] \\n\\nA message can be classified as only one of the following categories: \'finance\',  \'economics\',  \'maths\',  \'code generation\', \'legal\', \'medical\', \'history\' or \'None of the above\'.  \\n\\nExamples for few of these categories are given below:\\n- \'code generation\': Write a python
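
The router's completion above contains a `<<detected category>>:` marker followed by the category name. The implementation of `get_expert_val()` is not shown in this notebook, so the following is only a sketch of how such a marker could be parsed with a regular expression; `extract_category` is a hypothetical name, not the script's function.

```python
import re
from typing import Optional

# Matches the marker line seen in the router completion and captures the rest of that line.
CATEGORY_PATTERN = re.compile(r"<<detected category>>:\s*(.+)")


def extract_category(completion: str) -> Optional[str]:
    """Return the category named after the marker, or None if the marker is absent."""
    match = CATEGORY_PATTERN.search(completion)
    if match is None:
        return None
    return match.group(1).strip()


sample_completion = (
    ' Based on the information provided, I would classify the message as "Code Generation".'
    "\n\n<<detected category>>: Code Generation\n\nThe message is asking for information."
)
print(extract_category(sample_completion))  # Code Generation
```

Returning `None` when the marker is missing lets the caller decide how to handle a router reply that does not follow the expected format.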

In each example, we walk through the following steps:

1. Update the `config.yaml` file with the appropriate API information and LLM parameters.
2. Create an embeddings object: a local `HuggingFaceEmbeddings` in Example 1, a `SambaNovaEmbeddingModel` in Examples 2 and 3.
3. Load documents from a URL and split them into chunks.
4. Create a vector database using Chroma.
5. Define the prompt template.
6. Set up the language model based on the example configuration.
7. Create the document chain and retrieval chain.
8. Invoke the retrieval chain with the user query.
9. Print the response.
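
Step 3 uses `chunk_size=1000` and `chunk_overlap=200`, meaning chunks of up to 1000 characters where neighbouring chunks share roughly 200 characters. The sketch below illustrates that idea with a plain fixed-width sliding window. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally, since that splitter prefers to break on natural boundaries such as paragraphs, lines, and words; the function name `sliding_chunks` is hypothetical.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-width chunks where consecutive chunks overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means text near a chunk boundary appears in both neighbouring chunks, so a retriever can still return a chunk that contains a sentence in full even when the split point falls close to it.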

For Example 3, we additionally:
- Call `get_expert()` to determine the appropriate expert based on the user query.
- Extract the expert name using `get_expert_val()`.
- Look up the model name based on the expert.
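
In Example 3 the model name is looked up by indexing a dictionary directly, so an expert label outside the table raises a `KeyError`. The sketch below shows one possible alternative that falls back to the generalist model. The fallback is an assumption for illustration, not the notebook's behaviour, and `resolve_model` is a hypothetical helper; the table itself is copied from the example.

```python
# Expert label -> model name, as listed in Example 3.
EXPERT_TO_MODEL = {
    "Finance expert": "finance-chat",
    "Math expert": "deepseek-llm-67b-chat",
    "Code expert": "deepseek-llm-67b-chat",
    "Medical expert": "medicine-chat",
    "Legal expert": "law-chat",
    "Generalist": "Mistral-7B-Instruct-v0.2",
}


def resolve_model(expert: str, default_expert: str = "Generalist") -> str:
    """Return the model for an expert label, using the default expert's model if unknown."""
    return EXPERT_TO_MODEL.get(expert, EXPERT_TO_MODEL[default_expert])


print(resolve_model("Legal expert"))    # law-chat
print(resolve_model("History expert"))  # Mistral-7B-Instruct-v0.2
```

Whether a silent fallback is preferable to an explicit error depends on the application; raising may be the better choice when a misrouted query should be noticed.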

Feel free to explore and experiment with different configurations and queries to see how the CoE models respond!
