# Arcee AI AFM-4.5B-Preview Demo Across Domains with DeepSeek-R1 Evaluation

This notebook demonstrates the capabilities of Arcee AI's [AFM-4.5B-Preview model](https://api.together.ai/models/arcee-ai/AFM-4.5B-Preview) across different domains and use cases. Each response from AFM-4.5B-Preview is evaluated by [DeepSeek-R1](https://api.together.ai/models/deepseek-ai/DeepSeek-R1) to provide an independent assessment of answer quality.

We'll explore how the AFM-4.5B-Preview model performs on challenging tasks including:

- Advanced Knowledge Questions
- Sophisticated Creative Writing
- Domain-Specific Applications
- Multi-turn Conversations

For each task, this approach provides both the model's response and an independent evaluation of its performance. Out of the box, AFM-4.5B-Preview can answer complex domain-specific questions, and it can easily be fine-tuned for even greater precision. Contact julien@arcee.ai for details.

The final model will be released on Hugging Face in the next few weeks, with a non-commercial license. In the meantime, the preview model is free to use on Together AI :)


## Install Required Packages


In [1]:
!pip install -U pip
!pip install -qU together ipywidgets




## Set Up API Key and Import Libraries


In [2]:
import os
import ipywidgets as widgets
from IPython.display import Markdown, display
from together import Together

# Initialize the Together client
api_key = os.environ.get("TOGETHER_API_KEY")

# Verify API key is available
if api_key is None:
    print("⚠️ TOGETHER_API_KEY environment variable not found. Please set it before proceeding.")
    print("You can set it with: import os; os.environ['TOGETHER_API_KEY'] = 'your_api_key_here'")
else:
    # Create a client instance
    client = Together(api_key=api_key)
    print("✅ API key loaded successfully and client initialized")


✅ API key loaded successfully and client initialized


## Model Queries with Real-time Streaming

The helper functions defined below provide:
- Real-time streaming of AFM-4.5B-Preview responses as they are generated
- Automatic evaluation of each response by DeepSeek-R1
- Clean presentation of questions, answers, and evaluations
- Error handling, with a non-streaming fallback if the stream returns no content
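The streaming-with-fallback logic in `stream_afm_response` (next cell) can be isolated as a pure function, which makes it easy to test without an API key. The sketch below is a simplified stand-in, not the notebook's code: `accumulate_stream` and `fake_chunk` are hypothetical names, and the chunks are faked with `SimpleNamespace` objects that only mimic the `chunk.choices[0].delta.content` path the notebook reads.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def accumulate_stream(chunks: Iterable, fallback: Callable[[], str]) -> str:
    """Join delta.content pieces from a chat-completion stream.

    Skips chunks with no choices or with None content, and calls
    `fallback` only if the stream yielded no content at all.
    """
    parts = []
    for chunk in chunks:
        choices = getattr(chunk, "choices", None)
        if not choices:
            continue
        delta = getattr(choices[0], "delta", None)
        content = getattr(delta, "content", None)
        if content is not None:
            parts.append(content)
    if not parts:
        # Stream produced no text: use the non-streaming fallback instead
        return fallback()
    return "".join(parts)


def fake_chunk(text):
    # Hypothetical stand-in exposing chunk.choices[0].delta.content
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


stream = [fake_chunk("Hello"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk(", world")]
print(accumulate_stream(stream, fallback=lambda: "fallback"))  # Hello, world
print(accumulate_stream([], fallback=lambda: "fallback"))      # fallback
```

Collecting pieces in a list and joining once avoids repeated string concatenation; the notebook's `+=` is fine at this scale and lets it refresh the widget after every chunk.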


In [3]:
def stream_afm_response(messages, max_tokens=1024, temperature=0.7):
    """
    Stream responses from AFM-4.5B-Preview and then get DeepSeek-R1's evaluation.
    
    Args:
        messages: List of message dictionaries with 'role' and 'content'
        max_tokens: Maximum number of tokens to generate
        temperature: Controls randomness (0.0-1.0)
        
    Returns:
        The AFM model's response as a string
    """
    # Get the question
    question = messages[-1]['content']
    
    # Display the question
    display(Markdown(f"""### Question
{question}"""))
    
    # Display AFM header
    display(Markdown("### AFM-4.5B-Preview Response"))
    
    # Create streaming output for AFM
    afm_output = widgets.HTML(value="")
    display(afm_output)
    
    # Stream the AFM response
    full_response = ""
    stream_success = False
    
    try:
        # Stream the response
        for chunk in client.chat.completions.create(
            model="arcee-ai/AFM-4.5B-Preview",
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
            stream=True,
        ):
            # Safely extract content from the chunk
            if hasattr(chunk, 'choices') and len(chunk.choices) > 0:
                if hasattr(chunk.choices[0], 'delta'):
                    content = chunk.choices[0].delta.content
                    if content is not None:
                        # Accumulate the full response
                        full_response += content
                        # Update the output with the current response
                        afm_output.value = f"""<div style="white-space: pre-wrap; font-family: monospace;">{full_response}</div>"""
                        stream_success = True
        
        # If the stream produced no content, fall back to a non-streaming call
        if not stream_success:
            afm_output.value = "<div>Streaming produced no content. Falling back to non-streaming API call.</div>"
            response = client.chat.completions.create(
                model="arcee-ai/AFM-4.5B-Preview",
                messages=messages,
                max_tokens=max_tokens,
                temperature=temperature,
            )
            full_response = response.choices[0].message.content
            afm_output.value = f"""<div style="white-space: pre-wrap; font-family: monospace;">{full_response}</div>"""
            
    except Exception as e:
        error_msg = f"Error with AFM-4.5B-Preview: {str(e)}"
        afm_output.value = f"""<div style="color: red;">{error_msg}</div>"""
        full_response = error_msg
    
    # Now get DeepSeek-R1's evaluation
    display(Markdown("### DeepSeek-R1's Opinion"))
    
    # Create output for DeepSeek
    deepseek_output = widgets.HTML(value="<div>Evaluating response quality...</div>")
    display(deepseek_output)
    
    try:
        # Create evaluation prompt
        evaluation_prompt = f"""Please provide a brief evaluation of the following answer to the question. 
Focus only on the answer quality, accuracy, and completeness. Be concise.

Question: {question}

Answer: {full_response}

Your brief evaluation (2-3 sentences):"""
        
        # Get DeepSeek evaluation
        deepseek_response = client.chat.completions.create(
            model="deepseek-ai/DeepSeek-R1",
            messages=[{"role": "user", "content": evaluation_prompt}],
            max_tokens=2048,
            temperature=0.3,
        )
        
        deepseek_evaluation = deepseek_response.choices[0].message.content
        deepseek_output.value = f"""<div style="background-color: #f0f0f0; padding: 10px; border-left: 4px solid #007bff;">{deepseek_evaluation}</div>"""
        
    except Exception as e:
        deepseek_output.value = f"""<div style="color: red;">Error getting evaluation: {str(e)}</div>"""
    
    # Return the AFM response for further use
    return full_response

def query_afm(prompt, max_tokens=1024, temperature=0.7):
    """
    Query the AFM-4.5B-Preview model and get DeepSeek-R1's evaluation.
    
    Args:
        prompt: The user's question or request
        max_tokens: Maximum number of tokens to generate
        temperature: Controls randomness (0.0-1.0)
        
    Returns:
        The AFM model's response
    """
    # Create a message list with the user's prompt
    messages = [{"role": "user", "content": prompt}]
    
    # Use the shared function to process and display results
    return stream_afm_response(messages, max_tokens, temperature)
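Two rendering details in `stream_afm_response` may be worth handling. First, model text is interpolated into HTML unescaped, so characters such as `<`, `>` and `&` (common in code or math answers) can be swallowed by the browser. Second, DeepSeek-R1 may return its reasoning before the final answer, delimited by a `</think>` tag, so the "brief evaluation" can arrive with a long preamble. The sketch below assumes that tag format (verify it against an actual response); `strip_reasoning` and `to_html_div` are hypothetical helpers, not part of the Together SDK.

```python
import html


def strip_reasoning(text: str) -> str:
    """Drop DeepSeek-R1-style reasoning, keeping only the final answer.

    Assumption: reasoning ends with a closing </think> tag (the opening
    <think> may or may not be present in the returned text).
    """
    if "</think>" in text:
        # Keep everything after the last closing tag
        text = text.rsplit("</think>", 1)[1]
    elif "<think>" in text:
        # Opened but never closed: reasoning was likely cut off by max_tokens
        text = text.split("<think>", 1)[0]
    return text.strip()


def to_html_div(text: str, style: str = "white-space: pre-wrap; font-family: monospace;") -> str:
    """Wrap text in a <div>, escaping it so <, > and & render literally."""
    return f'<div style="{style}">{html.escape(text)}</div>'


raw = "<think>\nThe answer covers both revolutions...\n</think>\n\nAccurate and well organized."
print(strip_reasoning(raw))  # Accurate and well organized.
print(to_html_div("if a < b & c > d"))
```

In the notebook, the evaluation could then be displayed with `deepseek_output.value = to_html_div(strip_reasoning(deepseek_evaluation))`. An empty result suggests the reasoning used up the `max_tokens=2048` budget before the final answer.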


## 1. Knowledge Questions


In [4]:
knowledge_questions = [
    "Compare and contrast the First and Second Industrial Revolutions, analyzing their distinct technological innovations and socioeconomic impacts across different regions of the world.",
    "Explain the biochemical mechanisms of the Calvin cycle in C4 photosynthesis and how they differ from CAM photosynthesis, including the evolutionary advantages of each pathway.",
    "Analyze the fundamental architectural differences between transformers and convolutional neural networks, and explain how these differences affect their performance across various NLP and computer vision tasks.",
    "Describe the quantum mechanical principles behind nuclear magnetic resonance and how they enable structural determination of complex organic molecules.",
    # Add your own questions here
]

# Run the first knowledge question with streaming
question = knowledge_questions[0]
answer = query_afm(question)


### Question
Compare and contrast the First and Second Industrial Revolutions, analyzing their distinct technological innovations and socioeconomic impacts across different regions of the world.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion


In [5]:
# Run the second knowledge question with streaming
question = knowledge_questions[1]
answer = query_afm(question)


### Question
Explain the biochemical mechanisms of the Calvin cycle in C4 photosynthesis and how they differ from CAM photosynthesis, including the evolutionary advantages of each pathway.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion


## 2. Creative Writing


In [12]:
creative_writing_prompts = [
    "Write a short story that begins at the end and moves backward in time, using the literary techniques of Jorge Luis Borges, incorporating themes of infinity, recursion, and paradox. The story should feature a librarian who discovers a book that predicts the manner of its reader's death. Write at least 1000 words.",
    "Write a narrative that seamlessly integrates three distinct literary styles: Victorian gothic, cyberpunk, and magical realism. The narrative should center around a character who discovers they are a fictional character aware of their own narrative constraints.",
    "Compose a sestina (a complex poetic form with six stanzas of six lines each, followed by a tercet, using the same six words at the end of each line in a specific pattern) about the nature of consciousness and artificial intelligence.",
    "Write a dialogue between two entities: one exists outside of time experiencing all moments simultaneously, while the other experiences time linearly but can remember multiple possible futures. Their conversation should explore the philosophical implications of determinism versus free will.",
    "Create a scene written in the second-person perspective that employs synesthesia (the blending of sensory experiences) throughout, where colors have sounds, words have tastes, and emotions have textures. The scene should take place in a setting where reality is gradually unraveling.",
]

# Run the first creative writing prompt with streaming
prompt = creative_writing_prompts[0]
response = query_afm(prompt, max_tokens=4096, temperature=0.8)  # Using slightly higher temperature for creativity


### Question
Write a short story that begins at the end and moves backward in time, using the literary techniques of Jorge Luis Borges, incorporating themes of infinity, recursion, and paradox. The story should feature a librarian who discovers a book that predicts the manner of its reader's death. Write at least 1000 words.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion


In [13]:
# Run the second creative writing prompt with streaming
prompt = creative_writing_prompts[1]
response = query_afm(prompt, temperature=0.8)


### Question
Write a narrative that seamlessly integrates three distinct literary styles: Victorian gothic, cyberpunk, and magical realism. The narrative should center around a character who discovers they are a fictional character aware of their own narrative constraints.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion

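The creative-writing cells pass `temperature=0.8` instead of the default 0.7. As a rough illustration of what that parameter does, the sketch below applies the standard temperature-scaled softmax, `p_i = exp(z_i / T) / sum_j exp(z_j / T)`, to made-up logits: lower `T` concentrates probability on the top token, higher `T` flattens the distribution. The logits are invented, and a hosted model's sampler may apply further steps (such as top-p filtering) that this ignores.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return probabilities p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.3, 0.7, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the top token's share shrinking as `T` grows, which is why a slightly higher temperature tends to produce more varied prose.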

## 3. Domain-Specific Applications


### 3.1 Healthcare


In [14]:
healthcare_questions = [
    "Compare and contrast the mechanisms of action between CRISPR-Cas9 and zinc finger nucleases for gene editing, including their specificity, efficiency, and potential off-target effects in therapeutic applications for monogenic disorders.",
    "Analyze the immunopathological mechanisms underlying cytokine release syndrome in CAR-T cell therapy and propose pharmacological interventions to mitigate this adverse effect while preserving therapeutic efficacy.",
    "Evaluate the potential applications of quantum computing in protein folding prediction and drug discovery, addressing both the theoretical advantages over classical computing methods and the practical challenges of implementation."
]

# Run a healthcare question with streaming
question = healthcare_questions[0]
answer = query_afm(question)


### Question
Compare and contrast the mechanisms of action between CRISPR-Cas9 and zinc finger nucleases for gene editing, including their specificity, efficiency, and potential off-target effects in therapeutic applications for monogenic disorders.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion


### 3.2 Finance


In [16]:
finance_questions = [
    "Evaluate the effectiveness of the Black-Scholes-Merton model in pricing exotic options during periods of extreme market volatility, and propose modifications that incorporate jump-diffusion processes and stochastic volatility.",
    "Compare and contrast the capital structure theories of Modigliani-Miller, Trade-off Theory, and Pecking Order Theory, analyzing their empirical validity across different industry sectors and macroeconomic environments.",
    "Analyze the implications of negative interest rate policies on banking system stability, wealth distribution, and currency valuations in the context of secular stagnation theory."
]

# Run a finance question with streaming
question = finance_questions[0]
answer = query_afm(question)


### Question
Evaluate the effectiveness of the Black-Scholes-Merton model in pricing exotic options during periods of extreme market volatility, and propose modifications that incorporate jump-diffusion processes and stochastic volatility.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion


### 3.3 Technology


In [19]:
tech_questions = [
    "Analyze the architectural trade-offs between transformer-based language models and recurrent neural networks with attention mechanisms for real-time natural language processing tasks on resource-constrained edge devices.",
    "Evaluate the theoretical and practical limitations of current formal verification methods for ensuring safety properties in autonomous systems that incorporate deep reinforcement learning, and propose novel approaches to address these challenges."
]

# Run a technology question with streaming
question = tech_questions[0]
answer = query_afm(question)


### Question
Analyze the architectural trade-offs between transformer-based language models and recurrent neural networks with attention mechanisms for real-time natural language processing tasks on resource-constrained edge devices.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion


### 3.4 Education


In [20]:
education_questions = [
    "Critically evaluate the neuroscientific evidence supporting spaced repetition and interleaving in knowledge retention, and design an optimal curriculum structure for teaching complex abstract mathematical concepts that incorporates these findings while accounting for individual cognitive differences.",
    "Analyze the methodological limitations in educational research on personalized learning environments, particularly regarding the confounding variables that affect the measurement of cognitive outcomes, and propose a research design that addresses these limitations.",
    "Develop a theoretical framework that integrates Vygotsky's sociocultural theory, cognitive load theory, and recent advances in educational neuroscience to optimize knowledge transfer in collaborative learning environments for complex problem-solving tasks."
]

# Run an education question with streaming
question = education_questions[0]
answer = query_afm(question)


### Question
Critically evaluate the neuroscientific evidence supporting spaced repetition and interleaving in knowledge retention, and design an optimal curriculum structure for teaching complex abstract mathematical concepts that incorporates these findings while accounting for individual cognitive differences.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion

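Each domain cell above runs only the first question in its list. A small loop can sweep all of them. In the sketch below, `run_all` is a hypothetical helper that takes the asking function as a parameter, so it can be tried with a stub; in the notebook you would pass `query_afm`, keeping in mind that every call also triggers a DeepSeek-R1 evaluation, so a full sweep costs two API calls per question.

```python
from typing import Callable, Dict, List


def run_all(question_sets: Dict[str, List[str]], ask: Callable[[str], str]) -> Dict[str, List[str]]:
    """Call `ask` on every question, grouped by domain, and collect the answers."""
    answers: Dict[str, List[str]] = {}
    for domain, questions in question_sets.items():
        answers[domain] = [ask(q) for q in questions]
    return answers


def echo(question: str) -> str:
    # Stub standing in for query_afm so the loop runs without API calls
    return f"answer to: {question[:20]}"


demo = {"finance": ["Q1 about options", "Q2 about capital structure"], "education": ["Q3 about spacing"]}
results = run_all(demo, echo)
print({domain: len(items) for domain, items in results.items()})  # {'finance': 2, 'education': 1}

# In the notebook (makes real API calls):
# all_answers = run_all({"healthcare": healthcare_questions, "finance": finance_questions,
#                        "technology": tech_questions, "education": education_questions}, query_afm)
```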

## 4. Multi-turn Conversation Example


In [21]:
def multi_turn_conversation(messages):
    """
    Conduct a multi-turn conversation with AFM-4.5B-Preview and get DeepSeek-R1's evaluation.
    
    Args:
        messages: List of message dictionaries with 'role' and 'content'
        
    Returns:
        The AFM model's response
    """
    # Display conversation context
    if len(messages) > 1:
        display(Markdown("### Conversation Context"))
        for msg in messages[:-1]:
            if msg["role"] == "user":
                display(Markdown(f"**User:** {msg['content']}"))
            else:
                # Truncate long assistant responses
                content = msg['content']
                if len(content) > 100:
                    content = content[:100] + "..."
                display(Markdown(f"**Assistant:** {content}"))
        display(Markdown("---"))
    
    # Use the shared function to process and display results
    return stream_afm_response(messages)

# Start a conversation about advanced theoretical physics
conversation = [
    {"role": "user", "content": "Explain the implications of the holographic principle for our understanding of quantum gravity, particularly regarding the black hole information paradox."}
]

# Get the first response
response = multi_turn_conversation(conversation)

# Add the response to the conversation
conversation.append({"role": "assistant", "content": response})


### Question
Explain the implications of the holographic principle for our understanding of quantum gravity, particularly regarding the black hole information paradox.

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion


In [22]:
# Continue the conversation with a follow-up question
conversation.append({"role": "user", "content": "What are its limitations in addressing quantum entanglement across the event horizon?"})

# Get the second response
response = multi_turn_conversation(conversation)

# Add the response to the conversation
conversation.append({"role": "assistant", "content": response})


### Conversation Context

**User:** Explain the implications of the holographic principle for our understanding of quantum gravity, particularly regarding the black hole information paradox.

**Assistant:** The holographic principle suggests that the information contained within a volume of space can be en...

---

### Question
What are its limitations in addressing quantum entanglement across the event horizon?

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion


In [23]:
# Ask a third question that builds on the conversation
conversation.append({"role": "user", "content": "Given the tensions between general relativity and quantum mechanics, how might the holographic principle and the firewall paradox lead to a reformulation of our understanding of spacetime as emergent rather than fundamental?"})

# Get the third response
response = multi_turn_conversation(conversation)


### Conversation Context

**User:** Explain the implications of the holographic principle for our understanding of quantum gravity, particularly regarding the black hole information paradox.

**Assistant:** The holographic principle suggests that the information contained within a volume of space can be en...

**User:** What are its limitations in addressing quantum entanglement across the event horizon?

**Assistant:** The holographic principle, while offering valuable insights into quantum gravity, faces limitations ...

---

### Question
Given the tensions between general relativity and quantum mechanics, how might the holographic principle and the firewall paradox lead to a reformulation of our understanding of spacetime as emergent rather than fundamental?

### AFM-4.5B-Preview Response


### DeepSeek-R1's Opinion

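Because `multi_turn_conversation` reuses `stream_afm_response`, the evaluation prompt includes only `messages[-1]['content']`. DeepSeek-R1 therefore sees a follow-up such as "What are its limitations...?" without knowing what "its" refers to. The sketch below builds an evaluation prompt that carries the earlier turns; `build_eval_prompt` and its `max_chars` truncation are hypothetical choices, not part of the notebook.

```python
def build_eval_prompt(messages, answer, max_chars=500):
    """Build a judge prompt that includes earlier turns as context.

    messages: chat history sent to AFM, ending with the current user turn.
    answer:   AFM's reply to that final turn.
    Earlier turns are truncated to `max_chars` characters to keep the prompt short.
    """
    history = []
    for msg in messages[:-1]:
        text = msg["content"]
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        history.append(f"{msg['role'].capitalize()}: {text}")
    context = "\n".join(history) if history else "(none)"
    return (
        "Please provide a brief evaluation of the following answer to the question.\n"
        "Focus only on the answer quality, accuracy, and completeness. Be concise.\n\n"
        f"Conversation so far:\n{context}\n\n"
        f"Question: {messages[-1]['content']}\n\n"
        f"Answer: {answer}\n\n"
        "Your brief evaluation (2-3 sentences):"
    )


turns = [
    {"role": "user", "content": "Explain the holographic principle."},
    {"role": "assistant", "content": "The holographic principle suggests..."},
    {"role": "user", "content": "What are its limitations?"},
]
print(build_eval_prompt(turns, "It struggles with..."))
```

To use it, `stream_afm_response` would need to pass the full `messages` list to the evaluation step instead of only `question`.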

## 5. Conclusion


This notebook demonstrated Arcee AI's AFM-4.5B-Preview model on knowledge questions, creative writing, domain-specific applications, and multi-turn conversation, with each response evaluated by DeepSeek-R1.


Pairing AFM-4.5B-Preview (which answers) with DeepSeek-R1 (which evaluates) yields each answer alongside an independent critique, offering insight into the AFM model's strengths and limitations across domains and complexity levels.
