# Mixtcha Quickstart Guide

## Step 0: Sign up and get an API key at https://mixtcha.com

Then save the key in a `.env` file in the same folder as this notebook:

`MIXTCHA_API_KEY=sk-mix-1234`
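If you would rather create the `.env` file from Python, here is a minimal sketch that uses only the standard library. `save_key_to_env` is a hypothetical helper (not part of Mixtcha or python-dotenv): it writes `MIXTCHA_API_KEY=<key>`, replaces any earlier `MIXTCHA_API_KEY` line, and keeps every other line in the file.

```python
from pathlib import Path

def save_key_to_env(key: str, env_path: str = ".env", name: str = "MIXTCHA_API_KEY") -> Path:
    """Write NAME=key to a .env file, replacing any earlier NAME line and keeping the rest."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Keep every line whose key (the text before '=') is not the one we are setting
    kept = [line for line in lines if line.split("=", 1)[0].strip() != name]
    kept.append(f"{name}={key}")
    path.write_text("\n".join(kept) + "\n")
    return path

# Hypothetical usage; replace the placeholder with your real key:
# save_key_to_env("sk-mix-1234")
```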

In [None]:
%pip install openai
%pip install python-dotenv
%pip install pyyaml

## Step 1: Make sure your API key is saved in .env

In [4]:
import os
import dotenv
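dotenv.load_dotenv()  # You should have already written .env in Step 0

# Alternatively, paste your Mixtcha.com API key here. It must go into os.environ
# (not a plain Python variable) for the check below to pass.
#os.environ["MIXTCHA_API_KEY"] = "sk-mix-1234"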

# Assert that MIXTCHA_API_KEY is present in the environment
assert 'MIXTCHA_API_KEY' in os.environ, "MIXTCHA_API_KEY is not set in the environment. Please make sure it's in your .env file."

# if you really aren't sure, print it out
#print(os.getenv("MIXTCHA_API_KEY"))
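Printing the whole key leaves it in the notebook output. A safer check is to print only its first and last few characters. `masked` is a hypothetical helper written for this guide, not part of any library:

```python
import os

def masked(key: str, visible: int = 4) -> str:
    """Hide all but the first and last `visible` characters of a secret."""
    if len(key) <= 2 * visible:
        return "*" * len(key)  # too short to reveal anything safely
    return key[:visible] + "*" * (len(key) - 2 * visible) + key[-visible:]

# For the placeholder key "sk-mix-1234" this prints "sk-m***1234"
print(masked(os.getenv("MIXTCHA_API_KEY", "")))
```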

## Step 2: Directly setting a mixtcha configuration

To talk to the Mixtcha server, we point the OpenAI client at `base_url="https://api.mixtcha.com"` and authenticate with `MIXTCHA_API_KEY`.

The Mixtcha server uses the `model` parameter to configure the mixtcha. It accepts YAML, JSON, URLs, and single LLM model names. To start, let's set a mixtcha configuration directly.

This configuration calls `gpt-4o` and `claude-3.5-sonnet` in parallel, then has `claude-3.5-sonnet` synthesize a single final response.

In [32]:
# Set up the client, pointed at the Mixtcha server
import openai
import yaml

client = openai.OpenAI(
    api_key=os.getenv("MIXTCHA_API_KEY"),
    base_url="https://api.mixtcha.com"
)

# Define the mixtcha configuration
config = {
    "layers": [
        {
            "type": "parallel",
            "models": ["openai/gpt-4o", "anthropic/claude-3.5-sonnet"]
        },
        {
            "type": "aggregator",
            "model": "anthropic/claude-3.5-sonnet",
            "prompt": "Multiple answers were provided between <option> tags, but don't assume that I've seen them. Please synthesize them into a single, high-quality response. It is crucial to critically evaluate the information provided in these responses, recognizing that some of it may be incorrect. Think step-by-step before providing your final answer."
        }
    ],
    "messageMode": "inline",
    "delimiter": ["<option>", "</option>"]
}

# Convert the config to a YAML string; it is passed as the `model` argument
config_yaml = yaml.dump(config, default_flow_style=True)

response = client.chat.completions.create(
    model=config_yaml,
    messages=[
        {"role": "user", "content": "What do you know about creating a 'mixture of agents'? Explain it in a way that is easy to understand."}
    ]
)

print(response.choices[0].message.content)

Here's my synthesized, critically evaluated explanation of what a mixture of agents means:

A mixture of agents refers to a collaborative AI system where multiple specialized AI models or "agents" work together to solve complex tasks more effectively than any single agent could alone. Think of it like a skilled professional team where each member has distinct expertise and responsibilities.

Key aspects of agent mixtures:

1. Specialized Capabilities
- Each agent is designed or trained for specific types of tasks
- Agents complement each other's strengths and compensate for weaknesses
- Example: One agent might excel at language processing while another at image recognition

2. Coordinated Collaboration
- A management system or "meta-controller" orchestrates the agents
- It determines which agent(s) should handle specific aspects of a task
- Ensures efficient task distribution based on agent capabilities

3. Adaptive Performance
- The system can learn which combinations of agents work 
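Since the `model` parameter also accepts JSON, the same configuration could be serialized with `json.dumps` instead of YAML. A minimal sketch, assuming the server treats a JSON string the same way it treats the YAML one (the aggregator prompt is shortened here for brevity):

```python
import json

# The Step 2 configuration (aggregator prompt shortened for brevity)
config_dict = {
    "layers": [
        {"type": "parallel", "models": ["openai/gpt-4o", "anthropic/claude-3.5-sonnet"]},
        {
            "type": "aggregator",
            "model": "anthropic/claude-3.5-sonnet",
            "prompt": "Multiple answers were provided between <option> tags. "
                      "Please synthesize them into a single, high-quality response.",
        },
    ],
    "messageMode": "inline",
    "delimiter": ["<option>", "</option>"],
}

config_json = json.dumps(config_dict)
# JSON round-trips to the same dictionary, so nothing is lost in serialization
assert json.loads(config_json) == config_dict

# Assumed to behave like the YAML call above:
# response = client.chat.completions.create(model=config_json, messages=[...])
```

JSON avoids the PyYAML dependency; YAML is easier to read and edit by hand.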

## Step 3: What's going on under the hood?

The main benefit of mixtcha is that it makes it easy to leverage many LLMs while the client still sends a single query and gets back a single response. However, we can inspect the intermediate responses to see what is going on under the hood:

In [33]:
# The intermediate_responses field contains the raw responses from each layer
print("\nLayer 1 (parallel) - Raw responses from each model:")
for completion in response.intermediate_responses['layers'][0]['completions']:
    print(f"\n{completion['model']}:")
    print(completion['choices'][0]['message']['content'])

print("\nLayer 2 (aggregator) - Final synthesized response:")
print(response.choices[0].message.content)


Layer 1 (parallel) - Raw responses from each model:

openai/gpt-4o:
Creating a "mixture of agents" refers to combining different AI models or systems, known as "agents," to solve complex tasks more effectively than any single agent could manage on its own. Think of it like assembling a team where each member brings a unique skill set to the table, allowing the team to tackle a wider variety of problems and adapt to different situations. 

Here’s a simple breakdown:

1. **Diversity of Skills**: Each agent in the mixture might specialize in different areas. For example, one agent could be great at recognizing images, while another might excel at processing language.

2. **Collaboration**: These agents work together, sharing information and dividing tasks based on their strengths. Just like a soccer team has different players for different roles (goalkeeper, defender, striker), a mixture of agents allows for specialized roles.

3. **Decision Making**: Typically, there is a system that ma
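If you want the intermediate responses as data rather than printed text, they can be collected into per-layer lists. This sketch assumes only the structure indexed in the cell above (`layers` → `completions` → `model` and `choices[0].message.content`); `layer_outputs` is a hypothetical helper, not part of the Mixtcha API:

```python
def layer_outputs(intermediate: dict) -> list[list[tuple[str, str]]]:
    """Return, for each layer, the (model, content) pairs found in intermediate_responses."""
    result = []
    for layer in intermediate.get("layers", []):
        pairs = []
        for completion in layer.get("completions", []):
            content = completion["choices"][0]["message"]["content"]
            pairs.append((completion["model"], content))
        result.append(pairs)
    return result

# Usage with the response from Step 2:
# for i, pairs in enumerate(layer_outputs(response.intermediate_responses), start=1):
#     for model, text in pairs:
#         print(f"Layer {i} / {model}: {len(text)} characters")
```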

## Step 4: How much does Mixtcha cost?

The list of all available models and their prices is available at [https://mixtcha.com/models_list.yaml](https://mixtcha.com/models_list.yaml). Charges are rounded to the nearest cent, and the minimum charge for a mixtcha completion is $0.01. We can inspect our last call to see the pricing:

In [35]:
# Print costs from all layers
print("\nCosts by layer:")
total_raw_cost = 0

for layer_idx, layer in enumerate(response.intermediate_responses['layers']):
    print(f"\nLayer {layer_idx + 1} ({layer['layer_type']}):")
    for completion in layer['completions']:
        print(f"  {completion['model']}:")
        if 'usage' in completion:
            cost = completion['usage'].get('cost', 0)
            total_raw_cost += cost
            print(f"  Cost: ${cost:.6f}")

print(f"\nSum of raw costs: ${total_raw_cost:.6f}")
print(f"Final rounded cost: ${response.usage.cost:.2f}")


Costs by layer:

Layer 1 (parallel):
  openai/gpt-4o:
  Cost: $0.005711
  anthropic/claude-3.5-sonnet:
  Cost: $0.004829

Layer 2 (aggregator):
  anthropic/claude-3.5-sonnet:
  Cost: $0.009806

Sum of raw costs: $0.020345
Final rounded cost: $0.02
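The billing rule above (sum the raw costs, round to the nearest cent, charge at least $0.01) can be reproduced locally to estimate what a call will cost. This is a sketch under assumptions: `mixtcha_charge` is hypothetical, and the half-up tie-breaking and applying the minimum after rounding are guesses, since the guide does not specify them. `Decimal` is used to avoid floating-point rounding surprises.

```python
from decimal import Decimal, ROUND_HALF_UP

def mixtcha_charge(raw_costs: list[float]) -> Decimal:
    """Estimate the billed amount: sum raw costs, round to the cent, apply a $0.01 minimum."""
    # Convert through str so 0.005711 becomes exactly Decimal("0.005711")
    total = sum((Decimal(str(c)) for c in raw_costs), Decimal("0"))
    rounded = total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return max(rounded, Decimal("0.01"))

print(mixtcha_charge([0.005711, 0.004829, 0.009806]))  # → 0.02, matching the rounded cost above
```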


## Step 5: Create your own mixtchas and share them!

You can make your own mixtchas as YAML or JSON files and share them online. See the [type definitions](https://github.com/mixtcha/mixtcha/blob/main/reference/types.ts) for help making your own mixtchas. You can share them as a GitHub Gist, raw GitHub file, or any other method you have to host a publicly-available file on the internet.

In the example below, we load this same configuration from its URL on GitHub.

**Warning: Always inspect the configuration file before running a mixture from URL! Mixtcha configurations can potentially contain harmful content or prompt injection attacks.**
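One way to follow that warning is to download the file and review its layers, models, and prompts before passing the URL as `model`. A minimal sketch: `fetch_config_text` and `summarize_config` are hypothetical helpers, and the summary assumes the file uses the same fields as the Step 2 dictionary. Parse with `yaml.safe_load`, which only builds plain Python objects, rather than `yaml.load`.

```python
import urllib.request

def fetch_config_text(url: str) -> str:
    """Download a mixtcha configuration so it can be read before use."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")

def summarize_config(config: dict) -> list[str]:
    """List each layer's type, models, and prompts for a quick manual review."""
    lines = []
    for i, layer in enumerate(config.get("layers", []), start=1):
        # Parallel layers list "models"; aggregator layers have a single "model"
        models = layer.get("models") or [layer.get("model")]
        lines.append(f"Layer {i} ({layer.get('type')}): {', '.join(str(m) for m in models)}")
        for prompt in layer.get("systemPrompts", []):
            lines.append(f"  system prompt: {prompt}")
        if "prompt" in layer:
            lines.append(f"  prompt: {layer['prompt']}")
    return lines

# Usage (requires network access and pyyaml):
# import yaml
# text = fetch_config_text("https://github.com/mixtcha/mixtcha/raw/refs/heads/main/official-mixtchas/4oSo.yaml")
# print(text)  # read the raw file first
# print("\n".join(summarize_config(yaml.safe_load(text))))
```

The summary is a convenience, not a security check: prompt injection lives in the prompt text itself, so read the prompts.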

In [36]:
# The URL to the 4oSo mixtcha configuration
model_url = "https://github.com/mixtcha/mixtcha/raw/refs/heads/main/official-mixtchas/4oSo.yaml"

response = client.chat.completions.create(
    model=model_url,
    messages=[
        {"role": "user", "content": "What would it be like to make a 'neural network of LLMs'?"}
    ]
)

print(response.choices[0].message.content)

Creating a "neural network of LLMs" (large language models) is a conceptual idea that envisions linking multiple LLMs to function together in a structured system, similar to how neurons are interconnected in traditional neural networks. Here’s a synthesized explanation of how such a system might work, potential challenges, and benefits:

### Architecture and Design
1. **Hierarchical Structure:**
   - The system could have a hierarchical design where LLMs operate at various abstraction levels, with lower-level LLMs handling specific tasks and higher-level LLMs coordinating the output.
   - This mirrors how specialized regions in the human brain process information before integrating it into a coherent response.

2. **Parallel and Specialized Processing:**
   - Multiple LLMs could process different aspects of a task simultaneously, each specializing in distinct types of reasoning or domains.
   - Such specialization could optimize efficiency, as individual models can be tailored to excel

## Assorted Tricks

You can also call a single LLM directly, using the `<provider>/<model>` format (see the [model list](https://mixtcha.com/models_list.yaml)).

In [None]:
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "user", "content": "What would it be like to make a 'neural network of LLMs'?"}
    ]
)

print(response.choices[0].message.content)

If you don't define an aggregator, you will get the concatenated responses back:

In [37]:
# Define a simple parallel-only configuration
config = {
    "layers": [
        {
            "type": "parallel",
            "models": ["openai/gpt-4o-mini", "anthropic/claude-3.5-haiku"],
            "systemPrompts": ["Respond in Spanish", "Respond in French"]
        }
    ],
    "messageMode": "concat",
    "delimiter": ["<option>", "</option>"]
}

# Convert config to YAML string
config_yaml = yaml.dump(config)

response = client.chat.completions.create(
    model=config_yaml,
    messages=[
        {"role": "user", "content": "tell me a joke"}
    ]
)

print(response.choices[0].message.content)

<option>[openai/gpt-4o-mini] ¿Por qué los pájaros no usan Facebook? 

Porque ya tienen Twitter. 😄</option>

<option>[anthropic/claude-3.5-haiku] D'accord, voici une blague pour vous :

Pourquoi est-ce que les plongeurs plongent toujours en arrière et jamais en avant ?

Parce que sinon, ils tombent dans le bateau !

*Ba dum tss* 😄 C'est une blague un peu bête, mais j'espère qu'elle vous a fait sourire !</option>
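To work with the concatenated output programmatically, you can split it on the delimiters. This sketch assumes every block looks like the output above, `<option>[provider/model] text</option>`. That format is inferred from this single example, so treat it as an assumption. `split_options` is a hypothetical helper:

```python
import re

def split_options(text: str, delimiter=("<option>", "</option>")) -> list[tuple[str, str]]:
    """Split a concatenated mixtcha response into (model, content) pairs."""
    start, end = (re.escape(d) for d in delimiter)
    # [model] prefix, then everything up to the closing delimiter (may span lines)
    pattern = re.compile(start + r"\[([^\]]+)\]\s*(.*?)" + end, re.DOTALL)
    return [(model, body.strip()) for model, body in pattern.findall(text)]

# Usage: for model, text in split_options(response.choices[0].message.content): ...
```

Passing the same `delimiter` pair you put in the config keeps the parser in sync with it.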


We can have as many layers as we want, kind of like a neural network. (See the ["Mixture of Agents" paper](https://arxiv.org/abs/2406.04692) for additional inspiration.)
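Because a configuration is just a dictionary, deeper stacks can be generated instead of written by hand. A minimal sketch: `moa_config` is a hypothetical helper that repeats the same parallel layer `n_layers` times and adds one aggregator. How each layer consumes the previous layer's output is up to the Mixtcha server. Each extra layer adds more model calls, and so more cost.

```python
def moa_config(proposers: list[str], n_layers: int, aggregator: str, prompt: str) -> dict:
    """Build a config with `n_layers` parallel layers followed by one aggregator layer."""
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    # A fresh models list per layer, so editing one layer later does not change the others
    layers = [{"type": "parallel", "models": list(proposers)} for _ in range(n_layers)]
    layers.append({"type": "aggregator", "model": aggregator, "prompt": prompt})
    return {"layers": layers, "messageMode": "concat", "delimiter": ["<option>", "</option>"]}

deep_config = moa_config(
    ["openai/gpt-4o-mini", "anthropic/claude-3.5-haiku"],
    n_layers=3,
    aggregator="openai/gpt-4o-mini",
    prompt="Multiple answers were provided between <option> tags. "
           "Please synthesize them into a single, high-quality response.",
)
print(len(deep_config["layers"]))  # → 4 (three parallel layers plus the aggregator)

# To run it:
# response = client.chat.completions.create(model=yaml.dump(deep_config), messages=[...])
```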

In [38]:
# Define a multi-layer configuration
config = {
    "layers": [
        {
            "type": "parallel",
            "models": ["openai/gpt-4o-mini", "openai/gpt-4o-mini"],
            "systemPrompts": ["state the obvious", "only give non-obvious details"]
        },
        {
            "type": "parallel",
            "models": ["openai/gpt-4o-mini", "openai/gpt-4o-mini"],
            "systemPrompts": ["expand on everything discussed before", "give the counter-argument to everything discussed before"]
        },
        {
            "type": "aggregator",
            "model": "openai/gpt-4o-mini",
            "prompt": """Multiple answers were provided between <option> tags.
                 Please synthesize them into a single, high-quality response.
                 Think step-by-step about the different perspectives provided
                 and combine them into a comprehensive answer."""
        }
    ],
    "messageMode": "concat",
    "delimiter": ["<option>", "</option>"]
}

response = client.chat.completions.create(
    model=yaml.dump(config),
    messages=[
        {
            "role": "user",
            "content": "What are the key principles of effective debugging?"
        }
    ]
)

print("Final synthesized response:\n")
print(response.choices[0].message.content)

Final synthesized response:

Certainly! Below is a synthesized response that incorporates the various debugging principles along with counterpoints, creating a comprehensive approach to effective debugging.

---

### Comprehensive Guide to Effective Debugging

Debugging is an essential skill in software development that involves identifying, isolating, and resolving bugs or issues in code. While there are several principles that guide effective debugging, it's important to recognize that the context can influence their applicability. Here's a refined approach combining best practices and considerations.

#### Step 1: Understand the Problem
- **Gather Detailed Information**: Begin by acquiring comprehensive information about the issue. Understand the symptoms and differentiate them from the root causes. Utilize user reports or error messages to clarify the problem.
- **Beware of Over-Analysis**: While understanding the problem is critical, avoid "analysis paralysis." Sometimes, diving d

In [39]:
print("\nLayer-by-layer breakdown:")

for layer_idx, layer in enumerate(response.intermediate_responses['layers']):
    print(f"\nLayer {layer_idx + 1} ({layer['layer_type']}):")
    for completion in layer['completions']:
        print(f"\n  Model {completion['model']}:")
        print(f"  Response: {completion['choices'][0]['message']['content']}")
        if 'usage' in completion:
            cost = completion['usage'].get('cost', 0)
            print(f"  Cost: ${cost:.6f}")

print(f"Final rounded cost: ${response.usage.cost:.2f}")


Layer-by-layer breakdown:

Layer 1 (parallel):

  Model openai/gpt-4o-mini:
  Response: Effective debugging is crucial for identifying and resolving issues in software. Here are some key principles to keep in mind:

1. **Understand the Problem**: Clearly define the issue you're facing. Gather information about the symptoms, expected behavior, and error messages.

2. **Reproduce the Issue**: Ensure you can consistently reproduce the bug. This is vital for testing potential fixes.

3. **Check Recent Changes**: If the issue has appeared recently, look at the changes made to the codebase. Recent modifications are often the source of new bugs.

4. **Use a Debugger**: Utilize debugging tools to step through the code, inspect variables, and understand the program's flow.

5. **Add Logging**: Insert logging statements to track the execution flow and variable values at critical points in the application.

6. **Isolate the Problem**: Narrow down the part of the code where the issue occurs. This