Unified Enterprise Knowledge Search (Advanced track) by vigbav36 · Pull Request #4 · mindsdb/examples

vigbav36 · 2025-11-05T05:58:48Z

Area51 - Unified Enterprise Knowledge Search (Advanced Track)

A production-ready AI-powered search system that unifies Zendesk tickets, Jira issues, and Confluence documentation into a single semantic search interface. Built with MindsDB Knowledge Bases for intelligent, context-aware discovery across all your enterprise knowledge sources.

Quick Start: See QUICKSTART.md for immediate setup instructions.

Demo

▶️ Watch the Demo Video

Featured

Blog Post: Unifying Enterprise Knowledge Search with MindsDB
LinkedIn: Project Announcement

Problem Statement

Modern enterprises battle knowledge fragmentation every day. Critical information is scattered across three distinct platforms:

Zendesk: Customer support tickets, resolutions, and interaction history
Jira: Engineering issues, bug reports, and project tracking
Confluence: Technical documentation, SOPs, and internal guides

This fragmentation creates real operational costs. Support agents waste 20+ minutes per ticket switching between systems, trying to piece together context that exists but isn't discoverable. An engineer investigating a bug can't easily see customer impact. Documentation teams operate blind to actual support needs.

The result: slower resolutions, duplicated effort, and knowledge that never reaches the people who need it.

Area51 solves this by creating a unified semantic search layer that understands relationships and context across all three platforms simultaneously.

Architecture

Area51 integrates Zendesk, Jira, and Confluence through MindsDB Knowledge Bases, using hybrid semantic search to provide intelligent results across all sources.

Core Components

Data Integration Layer

Three MindsDB datasources connecting directly to Zendesk, Jira, and Confluence APIs
Automatic authentication, pagination, and rate limiting handled by MindsDB connectors
No custom ETL code required

Knowledge Base Layer

Three separate knowledge bases indexed with Azure OpenAI embeddings
Automatic chunking and embedding generation
pgvector storage for fast similarity search

Automation Layer

Scheduled MindsDB jobs refresh each knowledge base incrementally
Zendesk KB updates hourly
Jira KB updates every 30 minutes
Confluence KB updates every 6 hours
No separate orchestration infrastructure needed

Query Layer
Three agent interfaces providing different interaction models:

LangGraph Agent: Conversational, multi-turn reasoning for complex queries
MindsDB SQL Agent: High-performance direct queries for dashboards and automation
MCP Server: Integration with AI assistants like Claude Desktop

Evaluation Framework
Built-in MindsDB evaluation metrics track search performance continuously:

Hit rate at K (H@1, H@5, H@10)
Mean Reciprocal Rank (MRR)
Cumulative recall
Query latency

Knowledge Base Schema

Each knowledge base follows a consistent schema that combines semantic search with structured metadata filtering.

Zendesk Knowledge Base

Storage: pgvector with Azure OpenAI embeddings

Field	Description
`id`	Unique ticket identifier
`subject`	Ticket subject line
`description`	Ticket description content
`priority`	Ticket priority level
`status`	Current ticket status
`assignee_id`	Assigned user ID
`requester_id`	Requesting user ID
`brand_id`	Associated brand ID
`created_at`	Ticket creation timestamp
`updated_at`	Last update timestamp
`type`	Ticket type classification
`tags`	Associated tags
`url`	Ticket web URL
`organization_id`	Related organization ID
`group_id`	Support group ID
`due_at`	Due date timestamp
`custom_fields`	Custom field values
`satisfaction_rating.score`	Customer satisfaction rating

Jira Knowledge Base

Field	Description
`id`	Issue identifier
`key`	Issue key (e.g., PROJ-123)
`project_id`	Project identifier
`project_key`	Project key code
`project_name`	Project display name
`summary`	Issue summary title
`description`	Issue description
`priority`	Priority level
`creator`	Creator username
`assignee`	Assignee username
`status`	Current status

Confluence Knowledge Base

Field	Description
`id`	Page identifier
`status`	Page status
`title`	Page title
`spaceId`	Space identifier
`authorId`	Author user ID
`ownerId`	Owner user ID
`createdAt`	Creation timestamp
`body_storage_value`	Page content
`version_number`	Version number
`_links_webui`	Web UI link

SQL Examples

Semantic Search

Find relevant tickets using natural language understanding:

SELECT *
FROM confluence_kb
WHERE content = 'docker'
ORDER BY relevance DESC
LIMIT 5;

The semantic search understands that "docker" relates to "open source hosting" without requiring exact keyword matches.

Structured Filtering

Query specific metadata fields for precise results:

SELECT 
    *
FROM jira_kb
WHERE status = 'In Progress';

Hybrid Search

Combine semantic understanding with structured metadata filters:

SELECT *
FROM zendesk_kb
WHERE content = '2fa'
and hybrid_search=true
and hybrid_search_alpha=0.6
LIMIT 10;

This query semantically searches for 2fa-related content while simultaneously filtering by while using hybrid search functionality of mindsDb.

Scheduled Refresh Jobs

Keep knowledge bases current with automatic incremental updates:

CREATE JOB jira_refresh_job (
    insert into jira_kb
    select * from jira_datasource.issues;
) EVERY hour;

Evaluation Metrics

Monitor search quality continuously:

EVALUATE KNOWLEDGE_BASE confluence_kb
USING 
    test_table = 'pgvector_datasource.confluence_test_table',
    version = 'doc_id',
    generate_data = false,
    evaluate = true;

Returns comprehensive metrics including Hit@1, Hit@5, Hit@10, MRR, and recall scores.

Performance Metrics

Area51 achieves strong performance across all three knowledge bases with sub-second query times and high relevance.

Knowledge Base	Hit Rate	Recall Score	Avg Query Time
Confluence KB	94.0%	0.940	0.15s

Detailed Evaluation Results

The evaluation framework generates comprehensive metrics showing retrieval performance across different query types and confidence thresholds.

Comparative Analysis

Side-by-side comparison of knowledge base performance across multiple evaluation dimensions.

Metrics are generated using MindsDB's built-in evaluation framework, which calculates hit rates, cumulative recall, and mean reciprocal rank. The system auto-generates test questions from knowledge base content for continuous quality monitoring.

Setup

Prerequisites

Docker and Docker Compose
Python 3.10+
Azure OpenAI API key
Zendesk API credentials
Jira API credentials
Confluence API credentials

Installation

For detailed installation instructions, see QUICKSTART.md.

Usage

LangGraph Conversational Agent

The LangGraph agent provides a visual development interface and conversational interaction with your knowledge bases through an intelligent routing system.

Starting the LangGraph Server

Launch the LangGraph development server:

uv run langgraph dev

The server will start on http://127.0.0.1:2024 and automatically open LangChain Studio in your browser.

Server output:

LANGGRAPH

API: http://127.0.0.1:2024
Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
API Docs: http://127.0.0.1:2024/docs

Registering graph with id 'support_agent'
Server started in 2.32s
Opening Studio in your browser...

Graph Architecture

The support_agent graph uses intelligent routing to search across all knowledge bases:

Router - Analyzes the user's query and determines which systems to search
Search Nodes - Three parallel search operations:
- search_jira - Search Jira issues
- search_zendesk - Search Zendesk tickets
- search_confluence - Search Confluence documentation
Aggregate Results - Synthesizes findings from all searches into a unified response
Error Handling - Manages failures gracefully

Using LangGraph Studio

The Studio UI opens automatically showing the graph visualization
Click the "Chat" tab to interact with the agent
Type your query in the input field (e.g., "401 errors on payment API")
The agent routes your query to relevant knowledge bases
Returns synthesized results with context from Jira, Zendesk, and Confluence

Example Interaction

Query: "authentication failures on payment API"

Agent Response:

Found 15 similar tickets in Zendesk with 401 errors. Most common resolution 
is customers using sandbox keys in production environment. The "API 
Authentication Troubleshooting Guide" in Confluence has a dedicated section 
on this with debugging steps. See resolved ticket ZD-1247 for a similar case 
that was resolved in 5 minutes.

API Access

You can also interact with the agent programmatically via the REST API:

import requests

response = requests.post(
    'http://127.0.0.1:2024/runs',
    json={
        'assistant_id': 'support_agent',
        'input': {'messages': [{'role': 'user', 'content': '401 errors on API'}]}
    }
)

Debugging & Monitoring

Use the "Trace" button in Studio to inspect the execution flow through each node, making it easy to debug routing decisions and understand how queries are processed.

The Studio interface provides real-time visibility into:

Which knowledge bases are being queried
Response confidence scores
Error paths and fallback handling
Thread state and conversation context

MindsDB SQL Agent

The MindsDB SQL agent provides a high-performance interface for direct queries to your knowledge bases. It's automatically created during setup and accessible through both the MindsDB Studio UI and the SDK.

Using MindsDB Studio UI

Access the MindsDB Studio at http://localhost:47334
Navigate to the SQL Editor
Query the support_agent using natural language questions:

SELECT answer
FROM support_agent
WHERE question = 'Anything that the engineering team is working on related to 2fa on jira?';

The agent understands your question semantically and returns relevant results from all knowledge bases, automatically routing to Jira, Zendesk, or Confluence as needed.

Example Query

Query:

SELECT answer
FROM support_agent
WHERE question = 'Anything that the engineering team is working on related to 2fa on jira?';

Response:

Here is what the engineering team is working on related to 2FA (Two-Factor Authentication) in Jira:

[Summary of Jira issues returned in tabular format with Issue Key, Summary, Status, Project Name, etc.]

This issue is currently marked as "In Progress" and is part of the "Production Issues" project. 
If you need more details or want to see related issues, let me know!

Direct Knowledge Base Queries

You can also query knowledge bases directly using MindsDB SDK:

import mindsdb_sdk

server = mindsdb_sdk.connect('http://localhost:47334')
project = server.projects.list()[0]
kb = project.knowledge_bases['zendesk_kb']

# Execute semantic search
results = kb.search('payment errors', limit=10)

Semantic Search Queries

Query knowledge bases semantically using SQL:

SELECT 
    ticket_id,
    title,
    description,
    confidence_score
FROM zendesk_kb
WHERE text MATCHES 'authentication errors payment API'
ORDER BY confidence_score DESC
LIMIT 5;

MCP Server

The MCP (Model Context Protocol) server enables AI assistants like Cursor and Claude Desktop to search across all knowledge bases through natural conversation.

Starting the Server

Launch the FastMCP server with Server-Sent Events (SSE) transport:

uv run fastmcp run server.py --transport=sse

The server will start on http://127.0.0.1:5000/sse with the name "Support Agent Assistant".

Server output:

FastMCP 2.0
Server: Support Agent Assistant
Transport: sse
URL: http://127.0.0.1:5000/sse
FastMCP version: 2.12.5
MCP SDK version: 1.16.0

Available Tools

The MCP server exposes these tools for AI assistants:

search_zendesk - Search Zendesk tickets semantically
search_jira - Search Jira issues semantically
search_confluence - Search Confluence documentation
search_all - Unified search across all knowledge bases
evaluate_kb - Get evaluation metrics for a knowledge base

Configure Cursor

To use Area51's MCP server in Cursor:

Open Cursor settings (Cmd/Ctrl + ,)
Navigate to "Features" → "MCP Servers"
Add a new MCP server configuration:

{
  "mcpServers": {
    "area51-support-agent": {
      "command": "uv",
      "args": ["run", "fastmcp", "run", "/path/to/enterprise-search/server.py", "--transport=sse"],
      "env": {
        "MINDSDB_URL": "http://localhost:47334"
      }
    }
  }
}

Update the path to match your installation directory
Restart Cursor

Once configured, you can ask Cursor: "Search for authentication errors in Zendesk" and it will use the MCP server to query your knowledge bases.

Session Management

The server uses session IDs for maintaining conversation context. Each interaction creates a session that persists throughout the conversation. Session IDs are automatically generated and managed by the server.

Alternative: Claude Desktop Configuration

For Claude Desktop, configure in ~/.claude_desktop_config.json:

{
  "mcpServers": {
    "enterprise-search": {
      "command": "uv",
      "args": ["run", "fastmcp", "run", "server.py", "--transport=sse"]
    }
  }
}

Documentation

Full MCP configuration details: https://gofastmcp.com

Why MindsDB

Traditional RAG systems require stitching together multiple components:

Vector database (Pinecone/Weaviate/PgVector)
Custom ETL pipelines per source
Embedding service management
Job orchestration (Airflow/Kubernetes)
Custom evaluation framework

MindsDB consolidates this into a single SQL-native platform. Creating a knowledge base requires one CREATE KNOWLEDGE_BASE statement. Scheduling incremental updates is a CREATE JOB statement. Evaluation is a single EVALUATE command.

The result: a production-ready, multi-source RAG system built in days instead of months, with orders of magnitude less code to maintain.

Advanced Track Submission

Advanced Track Requirement	How the PR Qualifies	Evidence / Implementation
Functional application using MindsDB Knowledge Bases	Area51 is a fully functional enterprise semantic search system built on MindsDB KBs.	Demonstrated through working integrations with Zendesk, Jira, and Confluence Knowledge Bases, each using MindsDB for semantic and hybrid search.
Data sources connected and ingested via `INSERT INTO`	Data from Zendesk, Jira, and Confluence is ingested into individual KBs using MindsDB connectors and `INSERT INTO` SQL statements.	Explicitly described under Data Integration Layer and Knowledge Base Layer where data is indexed via Azure OpenAI embeddings.
Hybrid Search: semantic + metadata filtering	The system performs hybrid searches that combine semantic understanding with structured filters.	Example query: `SELECT * FROM zendesk_kb WHERE content='2fa' AND hybrid_search=true AND hybrid_search_alpha=0.6 LIMIT 10;`.
Jobs Integration (`CREATE JOB`) for auto-updates	Automatic incremental KB refresh using scheduled MindsDB jobs.	Example provided: `CREATE JOB jira_refresh_job (INSERT INTO jira_kb SELECT * FROM jira_datasource.issues;) EVERY hour;`.
Agent Integration	Includes multiple agent interfaces for intelligent querying.	Implemented via LangGraph Agent (conversational), MindsDB SQL Agent, and MCP Server for integration with AI assistants like Claude Desktop and Cursor.
Metadata Filtering	Combines structured filters with semantic queries for precise results.	Shown in SQL examples with `WHERE status='In Progress'` and hybrid filters using `LIKE` and `BETWEEN` operators.
Evaluation of Knowledge Bases (MRR, Hit@k, Relevancy)	Integrated evaluation framework using MindsDB’s `EVALUATE KNOWLEDGE_BASE` command.	Returns metrics including Hit@1, Hit@5, Hit@10, MRR, recall, and latency. Displayed with evaluation charts.
Demonstrated Hybrid Search and Semantic Results	Unified semantic search retrieves meaningful results across multiple KBs.	Shown through examples where queries like “authentication failures on payment API” retrieve relevant Zendesk, Jira, and Confluence results.

Authors

Built by the Area51 team.
@vaarshha @vigbav36 @prasnna-saravanan

Pavan1471 · 2025-11-10T13:35:52Z

👍

amannn17 · 2025-11-10T13:56:24Z

👍

Init enterprise search project

bb83a56

torrmal approved these changes Nov 6, 2025

View reviewed changes

torrmal merged commit 7f948f3 into main Nov 6, 2025

vigbav36 changed the title ~~Hacktoberfest 2025 - Unified Enterprise Knowledge Search~~ Hacktoberfest 2025 - Unified Enterprise Knowledge Search (Advanced track) Nov 10, 2025

vigbav36 changed the title ~~Hacktoberfest 2025 - Unified Enterprise Knowledge Search (Advanced track)~~ Unified Enterprise Knowledge Search (Advanced track) Nov 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unified Enterprise Knowledge Search (Advanced track)#4

Unified Enterprise Knowledge Search (Advanced track)#4
torrmal merged 1 commit intomainfrom
unified-enterprise-search

vigbav36 commented Nov 5, 2025 •

edited

Loading

Uh oh!

Pavan1471 commented Nov 10, 2025

Uh oh!

amannn17 commented Nov 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

vigbav36 commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Area51 - Unified Enterprise Knowledge Search (Advanced Track)

Demo

Featured

Problem Statement

Architecture

Core Components

Knowledge Base Schema

Zendesk Knowledge Base

Jira Knowledge Base

Confluence Knowledge Base

SQL Examples

Semantic Search

Structured Filtering

Hybrid Search

Scheduled Refresh Jobs

Evaluation Metrics

Performance Metrics

Detailed Evaluation Results

Comparative Analysis

Setup

Prerequisites

Installation

Usage

LangGraph Conversational Agent

MindsDB SQL Agent

MCP Server

Why MindsDB

Advanced Track Submission

Authors

Uh oh!

Pavan1471 commented Nov 10, 2025

Uh oh!

amannn17 commented Nov 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

vigbav36 commented Nov 5, 2025 •

edited

Loading