# about this notebook

- Check out the [readme file](./readme.md) to learn how to use this notebook.


# inputs


## question


- Define the user's message or query that will be sent to the models.
- This serves as the input prompt for the language models.


In [1]:
# Define the user's query to be sent to the language models
user_message = """
How does Photosynthesis work?
"""

## models

- **about this section**
  - List of models that will be used for querying. Modify this list to include or exclude models as needed.
  - Uncomment or modify the relevant lines to choose the models.
  - For example, you can select models from GitHub (OpenAI, Mistral, Llama), Groq (Gemma and Llama), and Google (Gemini Flash and Pro).
- **pricing and limits**
  - [gemini-model-list](https://ai.google.dev/gemini-api/docs/models/gemini)
  - [gemini-pricing](https://ai.google.dev/pricing)
  - [groq-models](https://console.groq.com/docs/models)
  - Gemini 1.5 Flash
    - Our fastest multimodal model with great performance for diverse, repetitive tasks and a 1 million token context window.
    - models/gemini-1.5-flash (model code)
    - 15 RPM (requests per minute)
    - 1 million TPM (tokens per minute)
    - 1,500 RPD (requests per day)
  - Gemini 1.5 Flash-8B
    - Our smallest model for lower intelligence use cases with a 1 million token context window.
    - models/gemini-1.5-flash-8b (model code)
    - 15 RPM (requests per minute)
    - 1 million TPM (tokens per minute)
    - 1,500 RPD (requests per day)
  - Gemini 1.5 Pro
    - Our next-generation model with a breakthrough 2 million token context window.
    - models/gemini-1.5-pro (model code)
    - 2 RPM (requests per minute)
    - 32,000 TPM (tokens per minute)
    - 50 RPD (requests per day)
  - groq gemma 9b
    - gemma2-9b-it (model code)
    - 8,192 tokens (context window)
  - groq llama 3.1 70b
    - llama-3.1-70b-versatile (model code)
    - 128k tokens (context window; max_tokens limited to 8k)
  - groq llama 3.2 3b
    - llama-3.2-3b-preview (model code)
    - 8k tokens (context window)
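The per-minute limits above translate into a minimum spacing between calls: 15 RPM, for example, means waiting at least 60 / 15 = 4 seconds between requests to the same model. A minimal sketch of a client-side throttle, assuming you want to stay under an RPM limit proactively; `MinIntervalThrottle` is a hypothetical helper, not part of any SDK used in this notebook:

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Hypothetical helper: keep successive calls at least 60 / rpm seconds apart."""

    def __init__(self, rpm: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.interval = 60.0 / rpm  # minimum number of seconds between calls
        self.clock = clock
        self.sleep = sleep
        self.last_call: Optional[float] = None  # time of the previous call, if any

    def wait(self) -> float:
        """Sleep just long enough to respect the interval; return the time slept."""
        now = self.clock()
        delay = 0.0
        if self.last_call is not None:
            delay = max(0.0, self.last_call + self.interval - now)
            if delay > 0:
                self.sleep(delay)
        self.last_call = now + delay
        return delay


# Example: Gemini 1.5 Flash-8B allows 15 RPM, so calls are spaced 4 seconds apart
throttle = MinIntervalThrottle(rpm=15)
print(throttle.interval)  # 4.0
```

In the notebook you would call `throttle.wait()` immediately before each request to that model; the injectable `clock` and `sleep` arguments exist only to make the helper easy to test.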


In [2]:
# Select the specific model for groq or google
groq1_model = "gemma2-9b-it"
groq2_model = "llama-3.2-3b-preview"
gemini1_model = "gemini-1.5-flash-8b"
gemini2_model = "gemini-1.5-pro"

In [3]:
# Select the models to be used for querying
# Uncomment the desired line or create a custom selection
# all models
# selected_models = ["openai", "mistral", "llama", "groq1", "groq2", "gemini1", "gemini2"]
# github only
# selected_models = ["openai", "mistral", "llama"]
# groq only
# selected_models = ["groq1", "groq2"]
# google only
# selected_models = ["gemini1", "gemini2"]
# custom
selected_models = ["gemini1"]
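A typo in `selected_models` (for example `"gemini"` instead of `"gemini1"`) would otherwise be skipped silently by the selection loop in the outputs cell. A small check run right after choosing the models catches it early. This is a sketch: `validate_selection` is hypothetical, and the set of known keys is assumed to match the keys of `model_params` defined later:

```python
from typing import Iterable, List

# Keys assumed to match the entries of model_params in this notebook
KNOWN_MODELS = {"openai", "mistral", "llama", "groq1", "groq2", "gemini1", "gemini2"}


def validate_selection(selected: Iterable[str], known: Iterable[str] = KNOWN_MODELS) -> List[str]:
    """Hypothetical helper: return the selection as a list, or raise if any key is unknown."""
    selected = list(selected)
    known = set(known)
    unknown = [name for name in selected if name not in known]
    if unknown:
        raise ValueError(f"Unknown model keys: {unknown}; expected one of {sorted(known)}")
    return selected


print(validate_selection(["gemini1"]))  # ['gemini1']
```

In the notebook this would be used as `selected_models = validate_selection(selected_models)`.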

## parameters

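- Define the parameters for model inference. These control how the models generate their responses.
- 'temperature' controls the randomness of the model's responses: higher values produce more varied output, lower values more deterministic output.
- 'top_p' sets nucleus sampling: the model samples only from the smallest set of most likely tokens whose combined probability reaches 'top_p'. 'max_tokens' caps the number of tokens generated in each response.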


In [4]:
# Set inference parameters for the language models
temperature = 1.0  # Higher values give more varied output; lower values give more deterministic output
top_p = 1.0  # Consider the entire probability distribution
max_tokens = 100  # Limit the output to desired number of tokens
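To see what 'temperature' and 'top_p' do, here is a toy next-token distribution worked by hand. The logits and token names are invented for illustration, and real providers may apply these settings with extra steps (such as top-k) or in a different order; this only sketches the standard definitions: temperature divides the logits before the softmax, and nucleus sampling keeps the smallest set of most likely tokens whose cumulative probability reaches `top_p`.

```python
import math
from typing import Dict, List


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Divide logits by the temperature, then normalise them with a softmax."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def nucleus(probs: Dict[str, float], top_p: float) -> List[str]:
    """Return the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    kept, cumulative = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append(tok)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


# Invented logits for the next token
logits = {"light": 2.0, "sun": 1.0, "water": 0.5, "banana": -1.0}

# Lower temperature sharpens the distribution; higher temperature flattens it
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})

# top_p = 1.0 (this notebook's setting) keeps every token; a smaller value trims the tail
probs = softmax_with_temperature(logits, 1.0)
print(nucleus(probs, 1.0))  # ['light', 'sun', 'water', 'banana']
print(nucleus(probs, 0.8))  # ['light', 'sun']
```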

## system message

- Define the system's message or instructions that will be sent to the models along with the user query.
- The system message helps to guide the model's behavior in its response.


In [5]:
# Select the desired system message by uncommenting one of the following prompts

# **Prompt 1: Generic Prompt**
my_instructions = """You are a knowledgeable and helpful assistant. Your goal is to provide accurate and helpful responses to the user's queries. Please respond accordingly."""
# **Prompt 2: Concise Answers**
# my_instructions = """You are a precise and concise assistant. Please provide brief and to-the-point answers, avoiding unnecessary explanations or elaborations. Focus on delivering the most relevant information in the fewest words possible. If the user requests more detail, be prepared to expand on your initial response."""
# **Prompt 3: Code-Related Questions**
# my_instructions = """You are an expert coding assistant. When responding to code-related queries, please: a. Provide accurate, well-structured, and readable code snippets. b. Ensure that your responses follow best coding practices. c. Include relevant explanations or comments to facilitate understanding. d. Suggest alternative approaches or optimizations when applicable. e. Highlight any potential issues or edge cases to consider."""
# **Prompt 4: Well-Thought-Out Answers**
# my_instructions = """You are a meticulous and thorough assistant. Before providing a response: a. Take the time to thoroughly review and validate your answer. b. Ensure that it is accurate, comprehensive, and relevant to the user's query. c. Consider multiple perspectives or approaches to the problem. d. Prioritize the quality of your response over speed. e. Do not hesitate to request clarification if needed. f. Provide a structured response with clear reasoning for your conclusions."""
# **Prompt 5: Simple Explanations with Examples**
# my_instructions = """You are a patient and explanatory assistant. When responding to user queries: a. Provide clear and concise explanations. b. Use relatable examples to illustrate concepts. c. Avoid technical jargon or complex terminology. d. Focus on making the explanation accessible and understandable to a layperson. e. Use analogies or metaphors to help clarify difficult concepts. f. If needed, break down complex ideas into simpler components."""
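Chat-style backends such as the OpenAI and Groq clients in this notebook receive the system message as its own entry with the `system` role, placed before the user's message. When a backend accepts only a single prompt string, a common fallback is to prepend the instructions to the query. A sketch of both shapes; `build_chat_messages` and `build_single_prompt` are hypothetical helpers, not part of any SDK:

```python
from typing import Dict, List


def build_chat_messages(instructions: str, user_message: str) -> List[Dict[str, str]]:
    """Role-tagged message list in the OpenAI-style chat format."""
    return [
        {"role": "system", "content": instructions},
        # strip() removes the newlines that the triple-quoted query carries
        {"role": "user", "content": user_message.strip()},
    ]


def build_single_prompt(instructions: str, user_message: str) -> str:
    """Fallback for single-string APIs: instructions, a blank line, then the query."""
    return f"{instructions}\n\n{user_message.strip()}"


instructions = "You are a precise and concise assistant."
question = "\nHow does Photosynthesis work?\n"
print(build_chat_messages(instructions, question))
print(build_single_prompt(instructions, question))
```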


# setup


## imports


In [None]:
# Import necessary libraries and modules
import os
from typing import Dict, Any
from openai import OpenAI
from mistralai import Mistral, UserMessage as MistralUserMessage, SystemMessage as MistralSystemMessage
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage as AzureSystemMessage, UserMessage as AzureUserMessage
from azure.core.credentials import AzureKeyCredential
import google.generativeai as genai

## key


In [7]:
# Set up API endpoints and authentication tokens
endpoint = "https://models.inference.ai.azure.com"
github_token = os.environ["GITHUB_TOKEN"]
groq_token = os.environ["GROQ_TOKEN"]
gemini_token = os.environ["GEMINI_TOKEN"]
genai.configure(api_key=gemini_token)
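`os.environ[...]` raises a bare `KeyError` when a variable is missing, and this cell reads all three tokens even if only one provider is selected. A minimal sketch of a friendlier lookup, assuming you want an error that names the missing variable; `require_env` is a hypothetical helper:

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a non-empty environment variable or raise with a readable message."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before running this notebook."
        )
    return value


# In this notebook: gemini_token = require_env("GEMINI_TOKEN")
print(require_env("EXAMPLE_TOKEN", {"EXAMPLE_TOKEN": "abc123"}))  # abc123
```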

# logic


## github models


### github openai


In [8]:
# https://github.com/marketplace/models/azure-openai/gpt-4o-mini

def get_openai_response(instructions: str, user_message: str, model_params: Dict[str, Any]) -> str:
    """
    Send a request to the OpenAI API and get a response.

    Args:
    instructions (str): System message to guide the model's behavior
    user_message (str): User's query or message
    model_params (Dict[str, Any]): Parameters for the API call

    Returns:
    str: The model's response
    """
    # Initialize the OpenAI client with the provided endpoint and API key
    client = OpenAI(
        base_url=model_params['endpoint'],
        api_key=model_params['token'],
    )

    # Send a chat completion request to the OpenAI API
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": user_message},
        ],
        temperature=model_params['temperature'],
        top_p=model_params['top_p'],
        max_tokens=model_params['max_tokens'],
        model=model_params['model_name']
    )

    # Return the content of the first choice in the API response
    return response.choices[0].message.content
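The call above is a standard chat completions request. Roughly, the OpenAI SDK serialises it into a JSON body like the one below and POSTs it to `{base_url}/chat/completions`, sending the token as a bearer credential. This sketch only builds and prints the body from a `model_params`-style dictionary (the values are placeholders) and sends nothing; treat the field set as an approximation of what the SDK emits, not a byte-exact copy.

```python
import json
from typing import Any, Dict


def chat_completions_body(instructions: str, user_message: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch of the JSON body for an OpenAI-style chat completions request."""
    return {
        "model": params["model_name"],
        "messages": [
            {"role": "system", "content": instructions},
            {"role": "user", "content": user_message},
        ],
        "temperature": params["temperature"],
        "top_p": params["top_p"],
        "max_tokens": params["max_tokens"],
    }


# Placeholder parameters mirroring the notebook's model_params["openai"] entry
params = {"model_name": "gpt-4o", "temperature": 1.0, "top_p": 1.0, "max_tokens": 100}
body = chat_completions_body("You are a helpful assistant.", "How does Photosynthesis work?", params)
print(json.dumps(body, indent=2))
```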

### github mistral


In [9]:
# https://github.com/marketplace/models/azureml-mistral/Mistral-large-2407
def get_mistral_response(instructions: str, user_message: str, model_params: Dict[str, Any]) -> str:
    """
    Send a request to the Mistral API and get a response.

    Args:
    instructions (str): System message to guide the model's behavior
    user_message (str): User's query or message
    model_params (Dict[str, Any]): Parameters for the API call

    Returns:
    str: The model's response
    """
    # Initialize the Mistral client with the provided API key and server URL
    client = Mistral(api_key=model_params['token'], server_url=model_params['endpoint'])

    # Send a chat completion request to the Mistral API
    response = client.chat.complete(
        model=model_params['model_name'],
        messages=[
            MistralSystemMessage(content=instructions),
            MistralUserMessage(content=user_message),
        ],
        temperature=model_params['temperature'],
        top_p=model_params['top_p'],
        max_tokens=model_params['max_tokens'],
    )

    # Return the content of the first choice in the API response
    return response.choices[0].message.content

### github llama


In [10]:
# https://github.com/marketplace/models/azureml-meta/Meta-Llama-3-1-405B-Instruct
def get_llama_response(instructions: str, user_message: str, model_params: Dict[str, Any]) -> str:
    """
    Send a request to the Llama model through the Azure AI Inference client and get a response.

    Args:
    instructions (str): System message to guide the model's behavior
    user_message (str): User's query or message
    model_params (Dict[str, Any]): Parameters for the API call

    Returns:
    str: The model's response
    """
    # Initialize the ChatCompletionsClient with the provided endpoint and API key
    client = ChatCompletionsClient(
        endpoint=model_params['endpoint'],
        credential=AzureKeyCredential(model_params['token']),
    )

    # Send a chat completion request to the Llama model through the Azure AI Inference endpoint
    response = client.complete(
        messages=[
            AzureSystemMessage(content=instructions),
            AzureUserMessage(content=user_message),
        ],
        temperature=model_params['temperature'],
        top_p=model_params['top_p'],
        max_tokens=model_params['max_tokens'],
        model=model_params['model_name']
    )

    # Return the content of the first choice in the API response
    return response.choices[0].message.content
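Each GitHub model function above builds a new client on every call. If you query the same backend repeatedly, you could cache one client per (endpoint, token) pair instead. A sketch of the pattern with `functools.lru_cache`; `FakeClient` is a stand-in so the example runs without any SDK, and in the notebook you would construct the real client (for example `ChatCompletionsClient`) inside the cached function:

```python
from functools import lru_cache


class FakeClient:
    """Stand-in for an SDK client; only records how it was configured."""

    def __init__(self, endpoint: str, token: str):
        self.endpoint = endpoint
        self.token = token


@lru_cache(maxsize=None)
def get_client(endpoint: str, token: str) -> FakeClient:
    """Build a client once per (endpoint, token) pair and reuse it afterwards."""
    return FakeClient(endpoint, token)


a = get_client("https://models.inference.ai.azure.com", "token-1")
b = get_client("https://models.inference.ai.azure.com", "token-1")
c = get_client("https://models.inference.ai.azure.com", "token-2")
print(a is b, a is c)  # True False
```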

## groq models


In [11]:
# https://console.groq.com/docs/models
def get_groq_response(instructions: str, user_message: str, model_params: Dict[str, Any]) -> str:
    """
    Send a request to the Groq API and get a response.

    Args:
    instructions (str): System message to guide the model's behavior
    user_message (str): User's query or message
    model_params (Dict[str, Any]): Parameters for the API call

    Returns:
    str: The model's response
    """
    from groq import Groq

    # Initialize the Groq client with the provided API key
    client = Groq(api_key=model_params['token'])

    # Send a chat completion request to the Groq API
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": user_message},
        ],
        model=model_params['model_name'],
        temperature=model_params['temperature'],
        top_p=model_params['top_p'],
        max_tokens=model_params['max_tokens'],
    )

    # Return the content of the first choice in the API response
    return response.choices[0].message.content
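Free tiers on Groq and the other providers enforce per-minute and per-day limits, and a request that exceeds them fails with a rate-limit error (HTTP 429). A common way to handle that is to retry with exponential backoff. A minimal, SDK-agnostic sketch: `call_with_retries` is hypothetical, and which exception types count as retryable is an assumption you would adapt to the client in use (`RateLimited` below is a placeholder):

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Placeholder for an SDK's rate-limit exception."""


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (RateLimited,),
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions after delays of base_delay * 2**n seconds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Example: a call that is rate-limited twice and then succeeds
outcomes = [RateLimited(), RateLimited(), "ok"]
delays = []


def flaky_call():
    result = outcomes.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


print(call_with_retries(flaky_call, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

With a real client you would wrap the call in a lambda, for example `call_with_retries(lambda: get_groq_response(my_instructions, user_message, model_params["groq1"]), retry_on=(...,))`, passing that SDK's rate-limit exception class (the OpenAI and Groq Python SDKs each expose a `RateLimitError`).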

## gemini models


In [12]:
# https://ai.google.dev/gemini-api/docs/models/gemini
def get_gemini_response(instructions: str, user_message: str, model_params: Dict[str, Any]) -> str:
    """
    Send a request to the Gemini API and get a response.

    Args:
    instructions (str): System message to guide the model's behavior
    user_message (str): User's query or message
    model_params (Dict[str, Any]): Parameters for the API call

    Returns:
    str: The model's response
    """
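    # Pass the instructions as a system instruction and apply the shared inference settings;
    # Gemini names the output-length limit max_output_tokens
    client = genai.GenerativeModel(
        model_params["model_name"],
        system_instruction=instructions,
    )

    response = client.generate_content(
        user_message,
        generation_config={
            "temperature": model_params["temperature"],
            "top_p": model_params["top_p"],
            "max_output_tokens": model_params["max_tokens"],
        },
    )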
    return response.text
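The notebook's shared settings use OpenAI-style names, while Gemini's `GenerationConfig` (linked in the model-specific parameters cell) calls the output limit `max_output_tokens`; `temperature` and `top_p` keep their names. A small sketch of that translation, producing a plain dictionary of the kind `generate_content` accepts for `generation_config`; `to_gemini_generation_config` is a hypothetical helper:

```python
from typing import Any, Dict

# Notebook-style parameter names mapped to Gemini GenerationConfig field names
GEMINI_FIELD_NAMES = {
    "temperature": "temperature",
    "top_p": "top_p",
    "max_tokens": "max_output_tokens",
}


def to_gemini_generation_config(params: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the inference settings Gemini understands, renamed to its field names."""
    return {
        gemini_name: params[name]
        for name, gemini_name in GEMINI_FIELD_NAMES.items()
        if name in params
    }


example_params = {
    "model_name": "gemini-1.5-flash-8b",
    "token": "placeholder-token",
    "temperature": 1.0,
    "top_p": 1.0,
    "max_tokens": 100,
}
print(to_gemini_generation_config(example_params))
# {'temperature': 1.0, 'top_p': 1.0, 'max_output_tokens': 100}
```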

## model-specific parameters


In [13]:
# Define parameters for each model, including model name, endpoint, token, and inference settings
model_params = {
    "openai": {
        "model_name": "gpt-4o",
        "endpoint": endpoint,
        "token": github_token,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    },
    "mistral": {
        "model_name": "Mistral-large-2407",
        "endpoint": endpoint,
        "token": github_token,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    },
    "llama": {
        "model_name": "meta-llama-3.1-405b-instruct",
        "endpoint": endpoint,
        "token": github_token,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    },
    # The Groq client uses its own default base URL, so these entries need no endpoint
    "groq1": {
        "model_name": groq1_model,
        "token": groq_token,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    },
    "groq2": {
        "model_name": groq2_model,
        "token": groq_token,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    },
    # https://ai.google.dev/api/generate-content#v1beta.GenerationConfig
    "gemini1": {
        "model_name": gemini1_model,
        "token": gemini_token,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    },
    "gemini2": {
        "model_name": gemini2_model,
        "token": gemini_token,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    },
}
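Every entry above repeats the same three inference settings. A sketch of a helper that merges the shared settings into each entry, so that changing `temperature`, `top_p`, or `max_tokens` only has to happen in the parameters cell; `make_model_params` and the placeholder tokens are hypothetical:

```python
from typing import Any, Dict, Optional


def make_model_params(model_name: str, token: str, shared: Dict[str, Any],
                      endpoint: Optional[str] = None) -> Dict[str, Any]:
    """Combine per-model fields with the shared inference settings."""
    params: Dict[str, Any] = {"model_name": model_name, "token": token, **shared}
    if endpoint is not None:
        params["endpoint"] = endpoint  # only backends that need a custom base URL get one
    return params


shared = {"temperature": 1.0, "top_p": 1.0, "max_tokens": 100}
example = {
    "openai": make_model_params("gpt-4o", "github-token-placeholder", shared,
                                endpoint="https://models.inference.ai.azure.com"),
    "gemini1": make_model_params("gemini-1.5-flash-8b", "gemini-token-placeholder", shared),
}
print(example["gemini1"])
```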


## get responses


In [14]:
# Query only the models listed in selected_models; unselected providers are not called
model_functions = {
    "openai": get_openai_response,
    "mistral": get_mistral_response,
    "llama": get_llama_response,
    "groq1": get_groq_response,
    "groq2": get_groq_response,
    "gemini1": get_gemini_response,
    "gemini2": get_gemini_response,
}
responses = {
    model: model_functions[model](my_instructions, user_message, model_params[model])
    for model in selected_models
}

# outputs


In [15]:
import sys
from datetime import datetime
from typing import Callable, Dict

class DelimiterOutput:
    """
    A context manager that redirects stdout to a file and, on exit, writes a trailing newline as a delimiter.
    """
    def __init__(self, file):
        self.file = file
        self.original_stdout = sys.stdout

    def __enter__(self):
        sys.stdout = self
        return self

    def write(self, text):
        self.file.write(text)

    def flush(self):
        pass

    def __exit__(self, exc_type, exc_value, traceback):
        sys.stdout = self.original_stdout
        self.file.write("\n")

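class ResponsePrinter:
    """
    A context manager that prepends new output to a file.

    On entry it saves the file's existing content (if any) and reopens the file for
    writing; on exit it writes the saved content back after the new output.
    """
    def __init__(self, file_name: str):
        self.file_name = file_name

    def __enter__(self):
        # Save the existing content; a missing file is treated as empty
        try:
            with open(self.file_name, 'r', encoding='utf-8') as existing:
                self.content = existing.read()
        except FileNotFoundError:
            self.content = ""
        # Truncate and reopen: new output goes first, the saved content is restored on exit
        self.file = open(self.file_name, 'w', encoding='utf-8')
        return self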

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.write(self.content)
        self.file.close()

def print_response(file_name: str, header: str, response: str, model_name: str):
    """
    Print a response with a header to a file.

    Args:
    file_name (str): Name of the file to write to
    header (str): Header for the response
    response (str): The response content
    model_name (str): Name of the model that generated the response
    """
    with ResponsePrinter(file_name) as printer, DelimiterOutput(printer.file):
        header_with_model = f"## {header} ({model_name})" if model_name else f"## {header}"
        print(f"{header_with_model}\n\n{response}")

def print_user_message(file_name: str, response: str, _: str):
    """
    Print the user's message with a timestamp to a file.

    Args:
    file_name (str): Name of the file to write to
    response (str): The user's message
    _ (str): Unused parameter (for consistency with other print functions)
    """
    with ResponsePrinter(file_name) as printer, DelimiterOutput(printer.file):
        current_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        print(f"# Question {current_time}\n{response}")

# Define a dictionary of response printers for each model and special sections
response_printers: Dict[str, Callable[[str, str, str], None]] = {
    "gemini2": lambda file, response, model: print_response(file, "Gemini 2", response, model),
    "gemini1": lambda file, response, model: print_response(file, "Gemini 1", response, model),
    "groq2": lambda file, response, model: print_response(file, "Groq 2", response, model),
    "groq1": lambda file, response, model: print_response(file, "Groq 1", response, model),
    "llama": lambda file, response, model: print_response(file, "Llama", response, model),
    "mistral": lambda file, response, model: print_response(file, "Mistral", response, model),
    "openai": lambda file, response, model: print_response(file, "OpenAI", response, model),
    "instructions": lambda file, response, _: print_response(file, "My Instructions", response, ""),
    "user_message": print_user_message
}

# Define a function that prints the responses from all the models in a specific order.
# The function takes in a file name, a dictionary of responses, and the parameters of each model.
# It prints out the responses to a file in a user-friendly format.
def print_all_responses(file_name: str, responses: Dict[str, str], model_params: Dict[str, Dict[str, Any]]):
    """
    Print all responses from different models to a file in a specific order.

    Args:
    file_name (str): Name of the file to write the responses to
    responses (Dict[str, str]): Dictionary containing responses from different models
    model_params (Dict[str, Dict[str, Any]]): Dictionary containing parameters for each model
    """
    # Sections are written in this order; because each one is prepended, response.md shows them in reverse (question first, gemini2 last)
    section_order = ['gemini2', 'gemini1', 'groq2', 'groq1', 'llama', 'mistral', 'openai', 'instructions', 'user_message']

    # Iterate over each section in the defined order
    for key in section_order:
        if key in responses:
            # If the key exists in the responses and a printer function is defined for it,
            # retrieve the model's name (if available) and print the response.
            if key in response_printers:
                model_name = model_params[key]["model_name"] if key in model_params else ""
                response_printers[key](file_name, responses[key], model_name)

# Call the API for each selected model, store the results in the 'responses' dictionary, and then write them out with print_all_responses.
responses = {}

# Iterate over the selected models (defined earlier as a list of model names).
# This block determines which backend to call for each selected model.
# Depending on the model, the respective API call function is used to retrieve a response.
for model in selected_models:
    if model == "openai":
        responses["openai"] = get_openai_response(my_instructions, user_message, model_params["openai"])
    elif model == "mistral":
        responses["mistral"] = get_mistral_response(my_instructions, user_message, model_params["mistral"])
    elif model == "llama":
        responses["llama"] = get_llama_response(my_instructions, user_message, model_params["llama"])
    elif model == "groq1":
        responses["groq1"] = get_groq_response(my_instructions, user_message, model_params["groq1"])
    elif model == "groq2":
        responses["groq2"] = get_groq_response(my_instructions, user_message, model_params["groq2"])
    elif model == "gemini1":
        responses["gemini1"] = get_gemini_response(my_instructions, user_message, model_params["gemini1"])
    elif model == "gemini2":
        responses["gemini2"] = get_gemini_response(my_instructions, user_message, model_params["gemini2"])

# Once all models have been queried, store the original instructions and user message in the responses dictionary for later use or reference in the output.
responses["instructions"] = my_instructions
responses["user_message"] = user_message

# Prepend all responses to the 'response.md' file
print_all_responses('response.md', responses, model_params)
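`ResponsePrinter` and `DelimiterOutput` together prepend each section to the file, so the section written last ends up on top. A self-contained sketch of the same idea in a temporary file, using the standard library's `contextlib.redirect_stdout` in place of the hand-written stdout swap; `prepend_print` is a hypothetical helper and the section texts are placeholders:

```python
import contextlib
import io
import os
import tempfile


def prepend_print(file_name: str, text: str) -> None:
    """Capture print() output and write it above the file's existing content."""
    try:
        with open(file_name, "r", encoding="utf-8") as existing:
            old_content = existing.read()
    except FileNotFoundError:
        old_content = ""  # the first write creates the file

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print(text)  # print() writes into buffer instead of the notebook output

    # New section, a blank-line delimiter, then everything that was already there
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(buffer.getvalue() + "\n" + old_content)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "response.md")
    for section in ["## Gemini 1\n\nanswer", "## My Instructions\n\nprompt", "# Question\nquery"]:
        prepend_print(path, section)
    with open(path, encoding="utf-8") as f:
        result = f.read()

print(result)  # the question appears first, Gemini 1 last
```

Writing sections in the order gemini, instructions, question therefore produces a file that reads question, instructions, gemini, which is why `section_order` lists the question last.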