# Groq + HuggingFace MCP: Real-Time Model Discovery with Text-to-Speech
**The Problem:** Finding the right model for your application usually means manually browsing lists, missing trending models, or relying on stale training data from large language models.

**The Solution:** Reviewing this tutorial to learn how to combine Groq's fast inference with HuggingFace's Model Context Protocol (MCP) server to discover, analyze, and get insights about recently published models and datasets available on HuggingFace in real-time.

## What is MCP and Why Does It Matter?
MCP is a standardized way for large language models to connect with external data sources and tools. Think of it as a universal adapter that lets your model:
- Access real-time data from APIs
- Extend model capabilities beyond training data
- Perform actions

You can view the HuggingFace MCP server [here](https://huggingface.co/settings/mcp) to learn more about the built-in tools you can access by equipping your model with the server as we'll see below.

## Prerequisites

In [None]:
# install dependencies
!pip install openai python-dotenv

## Setup and Configuration

In [2]:
#!/usr/bin/env python3
"""
Groq + HuggingFace MCP Demo
"""

import json
import os
import time
from datetime import datetime

from openai import OpenAI

# Set API Keys

Here we will set the API keys for the Groq and HuggingFace APIs via either environment variables or a userdata object in Google Colab.

- You can get your Groq API key from [here](https://console.groq.com/keys)
- You can get your HuggingFace token from [here](https://huggingface.co/settings/tokens)
  


In [None]:
try:
    from google.colab import userdata

    GROQ_API_KEY = userdata.get("GROQ_API_KEY")
    HF_TOKEN = userdata.get("HF_TOKEN")
except ImportError:
    from dotenv import load_dotenv

    load_dotenv()

    GROQ_API_KEY = os.getenv("GROQ_API_KEY")
    HF_TOKEN = os.getenv("HF_TOKEN")

# Check if API key is set
if not GROQ_API_KEY:
    print("Please set your Groq API key:")
elif not HF_TOKEN:
    print("Please set your HuggingFace token:")
else:
    print("Groq API key and HuggingFace token configured successfully!")

# Model configuration
MODEL = "openai/gpt-oss-120b"

Groq API key and HuggingFace token configured successfully!


## Core Function: HuggingFace Model Discovery with Groq's Responses API 

This function will use the Groq + HuggingFace MCP integration to discover trending models on HuggingFace and report the results and tool calls.

In [None]:
def discover_huggingface_models(query):
    """
    Discover trending models on HuggingFace using Groq + HuggingFace MCP

    This function demonstrates the speed and accuracy of combining:
    - Groq's fast LLM inference (500+ tokens/second)
    - HuggingFace MCP server for real-time model discovery
    """

    if not GROQ_API_KEY:
        print("Please set your Groq API key first!")
        return

    client = OpenAI(base_url="https://api.groq.com/api/openai/v1", api_key=GROQ_API_KEY)

    print(f"{query}")

    start_time = time.time()

    # Configure MCP tools using HuggingFace server
    tools = [
        {
            "type": "mcp",
            "server_url": "https://huggingface.co/mcp",
            "server_label": "huggingface",
            "require_approval": "never",
            "headers": {"Authorization": f"Bearer {HF_TOKEN}"},
        }
    ]

    # Call Groq with HuggingFace MCP integration using responses API
    response = client.responses.create(
        model=MODEL,
        input=query,
        tools=tools,
        stream=False,
        temperature=0.1,  # Low temperature for consistent tool calling
        top_p=0.4,  # Balanced top_p for focused but flexible responses
    )

    total_time = time.time() - start_time

    # Get response content from responses API format
    content = (
        response.output_text if hasattr(response, "output_text") else str(response)
    )

    print("=" * 80)
    print("HUGGINGFACE MODEL DISCOVERY RESULTS")
    print("=" * 80)
    print(content)
    print("=" * 80)

    # Show executed tools (MCP tool calls) for transparency
    executed_tools = []

    # Extract MCP calls from response if available
    if hasattr(response, "output") and response.output:
        for output_item in response.output:
            if hasattr(output_item, "type") and output_item.type == "mcp_call":
                executed_tools.append(
                    {
                        "type": "mcp",
                        "arguments": getattr(output_item, "arguments", "{}"),
                        "output": getattr(output_item, "output", ""),
                        "name": getattr(output_item, "name", ""),
                        "server_label": getattr(output_item, "server_label", ""),
                    }
                )

    if executed_tools:
        print(f"\nHUGGINGFACE MCP CALLS: Found {len(executed_tools)} tool calls:")
        print("-" * 50)

        for i, tool in enumerate(executed_tools, 1):
            print(f"\nTool Call #{i}")
            print(f"   Type: {tool['type']}")
            print(f"   Tool Name: {tool['name']}")
            print(f"   Server: {tool['server_label']}")
            try:
                if tool["arguments"]:
                    args = (
                        json.loads(tool["arguments"])
                        if isinstance(tool["arguments"], str)
                        else tool["arguments"]
                    )
                    print(f"   Arguments: {args}")

                # Print model results for transparency
                if tool["output"]:
                    output_data = (
                        json.loads(tool["output"])
                        if isinstance(tool["output"], str)
                        else tool["output"]
                    )
                    if isinstance(output_data, dict) and "models" in output_data:
                        print(f"   Models found: {len(output_data['models'])}")
                        for j, model in enumerate(
                            output_data["models"][:5], 1
                        ):  # Show top 5
                            model_name = model.get("id", model.get("name", "Unknown"))
                            print(f"      {j}. {model_name}")
                        if len(output_data["models"]) > 5:
                            print(
                                f"      ... and {len(output_data['models']) - 5} more models"
                            )
                    else:
                        print(f"   Output: {str(output_data)[:200]}...")
            except Exception as e:
                print(f"   Could not parse tool data: {e}")

    # Performance summary
    print(f"\nPERFORMANCE SUMMARY")
    print(f"   Total time: {total_time:.2f} seconds")
    print(f"   HuggingFace MCP calls: {len(executed_tools)}")

    print(
        f"\nNote: This speed comes from fast inference + HuggingFace MCP integration!"
    )

    return {
        "content": content,
        "total_time": total_time,
        "mcp_calls_performed": len(executed_tools),
        "timestamp": datetime.now().isoformat(),
    }

## Demo 1: Discover Top Trending Models on HuggingFace

Let's start by finding the top trending models on HuggingFace to see the speed and accuracy in action. 

**What to watch for:**
- **Speed**: Response time with Groq + HuggingFace MCP 
- **Real-time data**: Current trending models and popularity metrics  
- **Fresh information**: Model data that's more recent than traditional LLM training data
- **Tool transparency**: See exactly which HuggingFace MCP tools were called

In [11]:
# Let's discover the top trending models on HuggingFace
# This will demonstrate both speed and real-time model discovery

result_trending = discover_huggingface_models(
    "Find the top trending model on HuggingFace and tell me about it, use groq_play_tts to speak."
)


Find the top trending model on HuggingFace and tell me about it, use groq_play_tts to speak.
HUGGINGFACE MODEL DISCOVERY RESULTS
**Top‑Trending Hugging Face Model (as of now)**  
**Model:** `tencent/SRPO`  
**Task:** Text‑to‑Image generation  
**Library:** Diffusers (uses `safetensors` format)  
**Created:** 8 Sep 2025 | **Last updated:** 15 Sep 2025  
**Downloads:** ≈ 5.8 K | **Likes:** ≈ 829 | **Trending score:** 779  

### What it does
SRPO (Stable‑Diffusion‑style **S**uper‑Resolution **R**econstruction **P**rompt‑Optimized) is a state‑of‑the‑art text‑to‑image diffusion model released by Tencent. It builds on the latest research (see arXiv 2509.06942) and is optimized for:

* **High‑fidelity image synthesis** from natural language prompts.  
* **Fast inference** thanks to the efficient `safetensors` checkpoint format.  
* **Versatile style control** – works well for photorealistic, artistic, and stylized outputs.  

### Key features
| Feature | Details |
|---------|---------|
| **Mo

## Demo 2: Search for Specific Datasets



In [12]:
result_image_datasets = discover_huggingface_models(
    "Search for the latest image datasets"
)


Search for the latest image datasets
HUGGINGFACE MODEL DISCOVERY RESULTS
Here are the **most‑recent image‑focused datasets** that have just been added to the Hugging Face Hub (sorted by creation date, newest first). All links go directly to the dataset pages where you can view the README, download files, and see usage examples.

| # | Dataset (link) | Size / Category | Modality | Format | Brief description / typical use‑case | Created (UTC) |
|---|----------------|----------------|----------|--------|--------------------------------------|---------------|
| 1 | **[Korea‑MES/Token‑Upperbound‑Herems2.5‑fix‑800‑with‑images](https://hf.co/datasets/Korea-MES/Token-Upperbound-Herems2.5-fix-800-with-images)** | ~1 K – 10 K samples | image + text | Parquet | Token‑upperbound dataset that pairs images with token‑level annotations – useful for OCR, layout‑aware language models, and multimodal token‑level training. | 19 Sep 2025 |
| 2 | **[svjack/Eula_Lawrence_Images_MiniCPM_V4_5_Captioned](https

## Performance Analysis: Why This Combination Works Well

Let's analyze what just happened and why Groq + HuggingFace MCP is effective:


In [13]:
# Let's analyze the performance from our model discovery sessions
print("PERFORMANCE BREAKDOWN")
print("=" * 50)

results = []
if "result_trending" in locals() and result_trending:
    results.append(("Top Trending Models", result_trending))
if "result_cv_models" in locals() and result_image_datasets:
    results.append(("Image Datasets", result_image_datasets))


if results:
    for query_name, result in results:
        print(f"\n{query_name} Discovery:")
        print(f"   Time: {result['total_time']:.2f} seconds")
        print(f"   MCP Calls: {result['mcp_calls_performed']}")
        print(f"   Response length: {len(result['content'])} characters")

    # Calculate averages
    avg_time = sum(r[1]["total_time"] for r in results) / len(results)
    avg_mcp_calls = sum(r[1]["mcp_calls_performed"] for r in results) / len(results)

    print("\nAVERAGES:")
    print(f"   Average response time: {avg_time:.2f} seconds")
    print(f"   Average MCP calls per query: {avg_mcp_calls:.1f}")

else:
    print("No results to analyze yet. Run the model discovery cells above first!")


PERFORMANCE BREAKDOWN

Top Trending Models Discovery:
   Time: 3.29 seconds
   MCP Calls: 2
   Response length: 2336 characters

AVERAGES:
   Average response time: 3.29 seconds
   Average MCP calls per query: 2.0


You just witnessed MCP - where models are equipped with tools and autonomously decide when to execute functions during generation. The model doesn't just generate text; it actively calls APIs and interacts with external systems, as demonstrated with HuggingFace's MCP server that includes tools such as model search, dataset search, and more.

Speed matters because models make real-time decisions about tool usage, often triggering multiple function calls per response. Groq's fast inference makes tool-enabled models feel instantaneous.
The HuggingFace integration here is just one example. MCP works with any tools - databases, APIs, file systems. Ready to build AI features that act, not just talk? We are too.