# Elasticsearch MCP Server for ChatGPT

This notebook demonstrates how to deploy an MCP (Model Context Protocol) server that connects ChatGPT to Elasticsearch, enabling natural language queries over internal GitHub issues and pull requests.

## What You'll Build
An MCP server that lets ChatGPT search and retrieve documents from your Elasticsearch index using natural language queries. Results come from a hybrid of semantic and keyword search.

## Steps
- **Install Dependencies**: Set up required Python packages (fastmcp, elasticsearch, pyngrok, pandas)
- **Configure Environment**: Set up Elasticsearch credentials and ngrok token
- **Initialize Elasticsearch**: Connect to your Elasticsearch cluster
- **Create Index**: Define mappings with semantic_text field for ELSER
- **Load Sample Data**: Import GitHub issues/PRs dataset
- **Ingest Documents**: Bulk index documents into Elasticsearch
- **Define MCP Tools**: Create search and fetch functions for ChatGPT
- **Deploy Server**: Start MCP server with ngrok tunnel
- **Connect to ChatGPT**: Get public URL for ChatGPT connector setup

## Prerequisites
- Elasticsearch cluster with ELSER model deployed
- Ngrok account with auth token
- Python 3.10+ (required by `fastmcp`)

## Install Dependencies

This cell installs all required Python packages: `fastmcp` for the MCP server framework, `elasticsearch` for connecting to Elasticsearch, `pyngrok` for creating a public tunnel, and `pandas` for data manipulation.

In [None]:
!pip install fastmcp elasticsearch pyngrok pandas -q
print("Dependencies installed")

Dependencies installed


## Import Libraries

Import the libraries needed to build and run the MCP server: FastMCP for the server framework, the Elasticsearch client for cluster access, and pyngrok for tunneling.

In [None]:
import os
import json
import logging
import threading
import time
import pandas as pd
from typing import Dict, List, Any
from getpass import getpass
from fastmcp import FastMCP
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from pyngrok import ngrok
from pyngrok.conf import PyngrokConfig

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print("Libraries imported successfully")

Libraries imported successfully


## Setup Configuration

Load required credentials from environment variables or prompt for manual input. You'll need:
- **ELASTICSEARCH_URL**: Your Elasticsearch cluster endpoint
- **ELASTICSEARCH_API_KEY**: API key with read/write access  
- **NGROK_TOKEN**: Free token from [ngrok.com](https://dashboard.ngrok.com/)
- **ELASTICSEARCH_INDEX**: Index name (defaults to 'github_internal')

In [None]:
os.environ["ELASTICSEARCH_URL"] = os.environ.get("ELASTICSEARCH_URL") or getpass("Enter your Elasticsearch URL: ")
os.environ["ELASTICSEARCH_API_KEY"] = os.environ.get("ELASTICSEARCH_API_KEY") or getpass("Enter your Elasticsearch API key: ")
os.environ["NGROK_TOKEN"] = os.environ.get("NGROK_TOKEN") or getpass("Enter your Ngrok Token: ")
# The index name is not a secret, so prompt with input() to keep the typed value visible
os.environ["ELASTICSEARCH_INDEX"] = os.environ.get("ELASTICSEARCH_INDEX") or input("Enter your Elasticsearch Index name (default: github_internal): ").strip() or "github_internal"

ELASTICSEARCH_URL = os.environ["ELASTICSEARCH_URL"]
ELASTICSEARCH_API_KEY = os.environ["ELASTICSEARCH_API_KEY"]
NGROK_TOKEN = os.environ["NGROK_TOKEN"]
INDEX_NAME = os.environ["ELASTICSEARCH_INDEX"]

print("Configuration loaded successfully")
print(f"Index name: {INDEX_NAME}")
print(f"Elasticsearch URL: {ELASTICSEARCH_URL[:30]}...")
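
The prompt-or-environment pattern above can be wrapped in a small helper so that only secrets are hidden and optional settings fall back to a default. This is a minimal sketch: `read_setting` and its parameters (`default`, `secret`, `prompt_fn`) are hypothetical names, not part of the notebook.

```python
import os
from getpass import getpass
from typing import Callable, Optional


def read_setting(
    name: str,
    default: Optional[str] = None,
    secret: bool = False,
    prompt_fn: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the environment value for `name`, prompting only when it is unset."""
    value = os.environ.get(name, "").strip()
    if value:
        return value
    # Hide typed input for secrets; echo it for non-sensitive values.
    ask = prompt_fn or (getpass if secret else input)
    suffix = f" (default: {default})" if default else ""
    value = ask(f"Enter {name}{suffix}: ").strip() or (default or "")
    if not value:
        raise ValueError(f"{name} is required")
    # Store the value so later cells can read it from the environment.
    os.environ[name] = value
    return value
```

With this helper, the index name could be read as `read_setting("ELASTICSEARCH_INDEX", default="github_internal")` and the API key as `read_setting("ELASTICSEARCH_API_KEY", secret=True)`.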

## Initialize Elasticsearch Client

Create an Elasticsearch client using your credentials and verify the connection by pinging the cluster. This ensures your credentials are valid before proceeding.

In [None]:
es_client = Elasticsearch(
    ELASTICSEARCH_URL,
    api_key=ELASTICSEARCH_API_KEY
)

if es_client.ping():
    print("Elasticsearch connection successful")
    cluster_info = es_client.info()
    print(f"Cluster: {cluster_info['cluster_name']}")
    print(f"Version: {cluster_info['version']['number']}")
else:
    print("ERROR: Could not connect to Elasticsearch")

## Create Index with Mappings

Create an Elasticsearch index with mappings for hybrid search. The key field is `text_semantic`, a `semantic_text` field that sends its content to the `.elser-2-elasticsearch` inference endpoint (ELSER) for semantic search. The remaining `text` and `keyword` fields support traditional keyword search.

In [None]:
try:
    es_client.indices.create(
        index=INDEX_NAME,
        body={
            "mappings": {
                "properties": {
                    "id": {"type": "keyword"},
                    "title": {"type": "text"},
                    "text": {"type": "text"},
                    "text_semantic": {
                        "type": "semantic_text",
                        "inference_id": ".elser-2-elasticsearch"
                    },
                    "url": {"type": "keyword"},
                    "type": {"type": "keyword"},
                    "status": {"type": "keyword"},
                    "priority": {"type": "keyword"},
                    "assignee": {"type": "keyword"},
                    "created_date": {"type": "date", "format": "iso8601"},
                    "resolved_date": {"type": "date", "format": "iso8601"},
                    "labels": {"type": "keyword"},
                    "related_pr": {"type": "keyword"}
                }
            }
        }
    )
    print(f"Index '{INDEX_NAME}' created successfully")
except Exception as e:
    if 'resource_already_exists_exception' in str(e):
        print(f"Index '{INDEX_NAME}' already exists")
    else:
        print(f"Error creating index: {e}")
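
Instead of matching on the exception message, the index can be checked for existence before it is created. The sketch below assumes the 8.x Python client, where `indices.exists` returns a truthy response when the index is present and `indices.create` accepts a `mappings` keyword. `ensure_index` is a hypothetical helper name.

```python
from typing import Any, Dict


def ensure_index(client: Any, index: str, mappings: Dict[str, Any]) -> bool:
    """Create `index` if it is missing. Returns True when a new index was created."""
    # indices.exists() issues a HEAD request; its response is truthy when the index exists.
    if client.indices.exists(index=index):
        return False
    client.indices.create(index=index, mappings=mappings)
    return True
```

Called as `ensure_index(es_client, INDEX_NAME, {"properties": {...}})`, the cell becomes safe to re-run. Other failures (for example a missing inference endpoint) still surface as exceptions rather than being printed and ignored.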

## Load Sample Dataset

Load the sample GitHub dataset containing 15 documents with issues, pull requests, and RFCs. The dataset includes realistic content with descriptions, comments, assignees, priorities, and relationships between issues and PRs.

In [None]:
file_path = 'github_internal_dataset.json'
df = pd.read_json(file_path)

documents = df.to_dict('records')
print(f"Loaded {len(documents)} documents from dataset")

df

Loaded 15 documents from dataset


| | id | title | text | url | type | status | priority | assignee | created_date | resolved_date | labels | related_pr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ISSUE-1712 | Migrate from Elasticsearch 7.x to 8.x | Description: Current Elasticsearch cluster run... | https://internal-git.techcorp.com/issues/1712 | issue | in_progress | medium | david_data | 2025-09-01 | | [infrastructure, elasticsearch, migration, upg... | PR-598 |
| 1 | RFC-038 | API Versioning Strategy and Deprecation Policy | Abstract: Establishes a formal API versioning ... | https://internal-git.techcorp.com/rfcs/038 | rfc | closed | medium | sarah_dev | 2025-09-03 | 2025-09-25 | [api, architecture, design, rfc] | |
| 2 | ISSUE-1834 | Add rate limiting per user endpoint | Description: Currently rate limiting is implem... | https://internal-git.techcorp.com/issues/1834 | issue | closed | medium | john_backend | 2025-09-05 | 2025-09-12 | [feature, api, redis, rate-limiting] | PR-543 |
| 3 | ISSUE-1756 | Implement OAuth2 support for external API inte... | Description: Product team requesting OAuth2 au... | https://internal-git.techcorp.com/issues/1756 | issue | open | high | sarah_dev | 2025-09-08 | | [feature, api, security, oauth] | |
| 4 | PR-543 | Implement per-user rate limiting with Redis | Description: Implements sliding window rate li... | https://internal-git.techcorp.com/pulls/543 | pull_request | closed | medium | john_backend | 2025-09-10 | 2025-09-12 | [feature, redis, rate-limiting] | |
| 5 | RFC-045 | Design Proposal: Microservices Migration Archi... | Abstract: This RFC proposes a phased approach ... | https://internal-git.techcorp.com/rfcs/045 | rfc | open | high | tech_lead_mike | 2025-09-14 | | [architecture, microservices, design, rfc] | |
| 6 | ISSUE-1847 | API Gateway returning 429 errors during peak h... | Description: Users are experiencing 429 rate l... | https://internal-git.techcorp.com/issues/1847 | issue | closed | critical | john_backend | 2025-09-15 | 2025-09-18 | [bug, api, production, performance] | PR-567 |
| 7 | PR-567 | Fix connection pool exhaustion in API middleware | Description: Implements exponential backoff an... | https://internal-git.techcorp.com/pulls/567 | pull_request | closed | critical | john_backend | 2025-09-16 | 2025-09-18 | [bug-fix, api, performance] | |
| 8 | ISSUE-1889 | SQL injection vulnerability in search endpoint | Description: Security audit identified SQL inj... | https://internal-git.techcorp.com/issues/1889 | issue | closed | critical | sarah_dev | 2025-09-18 | 2025-09-19 | [security, vulnerability, bug, sql] | PR-578 |
| 9 | PR-578 | Security hotfix: Patch SQL injection vulnerabi... | Description: CRITICAL SECURITY FIX for ISSUE-1... | https://internal-git.techcorp.com/pulls/578 | pull_request | closed | critical | sarah_dev | 2025-09-19 | 2025-09-19 | [security, hotfix, sql] | |
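
Before ingesting, it can help to confirm that every record carries the fields the search and fetch tools rely on. The following is a minimal sketch. The choice of `REQUIRED_FIELDS` and the helper name `find_incomplete` are assumptions for illustration, not part of the dataset's specification.

```python
from typing import Any, Dict, List, Tuple

# Assumed minimum: the fields returned by search plus the text that gets embedded.
REQUIRED_FIELDS = ("id", "title", "text", "url", "type")


def find_incomplete(docs: List[Dict[str, Any]]) -> List[Tuple[int, List[str]]]:
    """Return (position, missing_fields) for every record lacking a required value."""
    problems = []
    for position, doc in enumerate(docs):
        # A field counts as missing when it is absent, None, or an empty string.
        missing = [f for f in REQUIRED_FIELDS if doc.get(f) in (None, "")]
        if missing:
            problems.append((position, missing))
    return problems
```

Running `find_incomplete(documents)` on the loaded records should return an empty list if the dataset is complete. Optional fields such as `resolved_date` and `related_pr` are deliberately not checked, since open issues leave them empty.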


## Ingest Documents to Elasticsearch

Bulk index all documents into Elasticsearch. The code copies the `text` field to `text_semantic` for ELSER processing, then waits 15 seconds for semantic embeddings to be generated before verifying the document count.

In [None]:
def generate_actions():
    for doc in documents:
        doc['text_semantic'] = doc['text']
        yield {
            '_index': INDEX_NAME,
            '_source': doc
        }

try:
    success, errors = bulk(es_client, generate_actions())
    print(f"Successfully indexed {success} documents")

    if errors:
        print(f"Errors during indexing: {errors}")

    print("Waiting 15 seconds for ELSER to process documents...")
    time.sleep(15)

    count = es_client.count(index=INDEX_NAME)['count']
    print(f"Total documents in index: {count}")

except Exception as e:
    print(f"Error during bulk indexing: {str(e)}")
    print("If you see timeout errors, wait a few seconds and try again")
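
The actions above carry no `_id`, so Elasticsearch assigns a random one and re-running the cell indexes every document a second time. One way to make ingestion repeatable is to reuse each document's own `id` as the `_id`. This is a sketch under that assumption; `build_actions` is a hypothetical name, and it also copies each record instead of modifying `documents` in place.

```python
from typing import Any, Dict, Iterable, Iterator


def build_actions(docs: Iterable[Dict[str, Any]], index: str) -> Iterator[Dict[str, Any]]:
    """Yield bulk actions keyed by each document's own `id` field."""
    for doc in docs:
        # Copy so the caller's records are not modified in place.
        source = {**doc, "text_semantic": doc["text"]}
        yield {
            "_index": index,
            "_id": doc["id"],  # same id on a re-run overwrites instead of duplicating
            "_source": source,
        }
```

The generator plugs into the same helper call as before: `bulk(es_client, build_actions(documents, INDEX_NAME))`.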

## Define MCP Server

Define the MCP server with two tools that ChatGPT will use:
1. **search(query)**: Hybrid search combining semantic (ELSER) and keyword (BM25) search using RRF (Reciprocal Rank Fusion). Returns top 10 results with id, title, and url.
2. **fetch(id)**: Retrieves complete document details by ID, returning all fields including full text content and metadata.
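
To make the fusion step concrete, here is a small pure-Python illustration of the RRF formula, where each retriever contributes `1 / (rank_constant + rank)` for every document it returns. It is a sketch of the scoring idea only, not Elasticsearch's implementation. The two input rankings are made up for the example, and the tie-breaking rule is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], rank_constant: int = 60) -> List[str]:
    """Fuse ranked id lists: each list adds 1 / (rank_constant + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from the semantic and keyword retrievers.
semantic = ["ISSUE-1847", "PR-567", "ISSUE-1834"]
keyword = ["PR-567", "PR-543", "ISSUE-1847"]
print(rrf_fuse([semantic, keyword]))
# ['PR-567', 'ISSUE-1847', 'PR-543', 'ISSUE-1834']
```

`PR-567` wins because it ranks near the top of both lists (1/62 + 1/61), while documents found by only one retriever fall to the bottom. A larger `rank_constant` flattens the difference between adjacent ranks.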

In [None]:
server_instructions = """
This MCP server provides access to TechCorp's internal GitHub issues and pull requests.
Use search to find relevant issues/PRs, then fetch to get complete details.
"""

def create_server():
    mcp = FastMCP(
        name="Elasticsearch GitHub Issues MCP",
        instructions=server_instructions
    )

    @mcp.tool()
    async def search(query: str) -> Dict[str, List[Dict[str, Any]]]:
        """
        Search for internal issues and PRs using hybrid search.
        Returns list with id, title, and url.
        """
        if not query or not query.strip():
            return {"results": []}

        logger.info(f"Searching for: '{query}'")

        try:
            # Hybrid search using RRF: combines semantic (ELSER) + keyword (multi_match) results
            response = es_client.search(
                index=INDEX_NAME,
                size=10,
                source=["id", "title", "url", "type", "priority"],
                retriever={
                    "rrf": {
                        "retrievers": [
                            {
                                # Semantic retriever using ELSER embeddings
                                "standard": {
                                    "query": {
                                        "semantic": {
                                            "field": "text_semantic",
                                            "query": query
                                        }
                                    }
                                }
                            },
                            {
                                # Keyword retriever with fuzzy matching
                                "standard": {
                                    "query": {
                                        "multi_match": {
                                            "query": query,
                                            "fields": [
                                                "title^3",
                                                "text^2",
                                                "assignee^2",
                                                "type",
                                                "labels",
                                                "priority"
                                            ],
                                            "type": "best_fields",
                                            "fuzziness": "AUTO"
                                        }
                                    }
                                }
                            }
                        ],
                        "rank_window_size": 50,
                        "rank_constant": 60
                    }
                }
            )

            # Extract and format search results
            results = []
            if response and 'hits' in response:
                for hit in response['hits']['hits']:
                    source = hit['_source']
                    results.append({
                        "id": source.get('id', hit['_id']),
                        "title": source.get('title', 'Unknown'),
                        "url": source.get('url', '')
                    })

            logger.info(f"Found {len(results)} results")
            return {"results": results}

        except Exception as e:
            logger.error(f"Search error: {e}")
            raise ValueError(f"Search failed: {str(e)}")

    @mcp.tool()
    async def fetch(id: str) -> Dict[str, Any]:
        """
        Retrieve complete issue/PR details by ID.
        Returns id, title, text, url, and metadata.
        """
        if not id:
            raise ValueError("ID is required")

        logger.info(f"Fetching: {id}")

        try:
            # Query by ID to get full document
            response = es_client.search(
                index=INDEX_NAME,
                body={
                    "query": {
                        "term": {
                            "id": id
                        }
                    },
                    "size": 1
                }
            )

            if not response or not response['hits']['hits']:
                raise ValueError(f"Document with id '{id}' not found")

            hit = response['hits']['hits'][0]
            source = hit['_source']

            # Return all document fields
            result = {
                "id": source.get('id', id),
                "title": source.get('title', 'Unknown'),
                "text": source.get('text', ''),
                "url": source.get('url', ''),
                "type": source.get('type', ''),
                "status": source.get('status', ''),
                "priority": source.get('priority', ''),
                "assignee": source.get('assignee', ''),
                "created_date": source.get('created_date', ''),
                "resolved_date": source.get('resolved_date', ''),
                # labels is a list in the dataset, so default to an empty list
                "labels": source.get('labels', []),
                "related_pr": source.get('related_pr', '')
            }

            logger.info(f"Fetched: {result['title']}")
            return result

        except Exception as e:
            logger.error(f"Fetch error: {e}")
            raise ValueError(f"Failed to fetch '{id}': {str(e)}")

    return mcp

print("MCP server defined successfully")

MCP server defined successfully
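
The result-shaping step inside `search` can be pulled out into a pure function, which makes the response contract (`{"results": [{"id", "title", "url"}]}`) easy to check without a cluster. This is a sketch: `hits_to_results` is a hypothetical helper, and it takes a plain dict shaped like a search response.

```python
from typing import Any, Dict, List


def hits_to_results(response: Dict[str, Any]) -> Dict[str, List[Dict[str, str]]]:
    """Reduce a search response to the id/title/url triples returned by `search`."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        results.append({
            # Fall back to the Elasticsearch _id when the document has no id field.
            "id": source.get("id", hit.get("_id", "")),
            "title": source.get("title", "Unknown"),
            "url": source.get("url", ""),
        })
    return {"results": results}
```

An empty or malformed response yields `{"results": []}` rather than raising, matching what `search` returns for a blank query.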


## Start Ngrok Tunnel

Create a public HTTPS tunnel using ngrok to expose your local MCP server on port 8000. This allows ChatGPT to connect to your server from anywhere. Copy the displayed URL (ending in `/sse`) to use in ChatGPT's connector settings.

In [None]:
ngrok.set_auth_token(NGROK_TOKEN)

pyngrok_config = PyngrokConfig(region="us")
public_url = ngrok.connect(
    8000,
    "http",
    pyngrok_config=pyngrok_config,
    bind_tls=True
)

print("="*70)
print("MCP SERVER IS READY!")
print("="*70)
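# ngrok.connect() returns an NgrokTunnel object; its .public_url attribute holds the URL string
print(f"\nPublic URL (use in ChatGPT): {public_url.public_url}/sse")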
print("\nIMPORTANT: Copy the URL above (including /sse at the end)")
print("\nTo connect in ChatGPT:")
print("1. Go to Settings > Connectors")
print("2. Click 'Create' or 'Add Custom Connector'")
print("3. Paste the URL above")
print("4. Save and start using!")
print("\nKeep this notebook running while using the connector")
print("="*70)
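
`ngrok.connect` returns an `NgrokTunnel` object whose `public_url` attribute holds the URL string, so the connector address is best built from that attribute. Below is a small sketch for joining the base URL and the SSE path; `connector_url` is a hypothetical helper, not part of pyngrok.

```python
def connector_url(base_url: str, path: str = "/sse") -> str:
    """Join a tunnel base URL and the SSE path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

For the tunnel created above, this would be called with the tunnel's `public_url` attribute as `base_url`.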

## Run MCP Server

Start the MCP server in a background thread using SSE (Server-Sent Events) transport. The server runs on `0.0.0.0:8000` and stays alive to handle requests from ChatGPT via the ngrok tunnel. Keep this cell running while using the connector.

In [None]:
server = create_server()

print("Starting MCP server...")
print("Server is running. To stop: Runtime > Interrupt execution")
print()

def run_server():
    server.run(transport="sse", host="0.0.0.0", port=8000)

server_thread = threading.Thread(target=run_server, daemon=True)
server_thread.start()

print("Server started successfully!")
print("Your ngrok URL is ready to use in ChatGPT")
print("Keep this cell running...")
print()

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nServer stopped")

Starting MCP server...
Server is running. To stop: Runtime > Interrupt execution

Server started successfully!
Your ngrok URL is ready to use in ChatGPT
Keep this cell running...



INFO:     Started server process [47952]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:09-0300 lvl=info msg="join connections" obj=join id=2f547f1e02b9 l=127.0.0.1:8000 r=191.233.196.115:8612


INFO:     191.233.196.115:0 - "POST /sse HTTP/1.1" 405 Method Not Allowed


INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:10-0300 lvl=info msg="join connections" obj=join id=f157e39aac9d l=127.0.0.1:8000 r=191.233.196.120:47762


INFO:     191.233.196.120:0 - "GET /sse HTTP/1.1" 200 OK


INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:10-0300 lvl=info msg="join connections" obj=join id=5a9192136cfb l=127.0.0.1:8000 r=191.233.196.117:53796


INFO:     191.233.196.117:0 - "POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1" 202 Accepted
INFO:     191.233.196.117:0 - "POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1" 202 Accepted
INFO:     191.233.196.117:0 - "POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1" 202 Accepted


INFO:mcp.server.lowlevel.server:Processing request of type ListToolsRequest
INFO:pyngrok.process.ngrok:t=2025-11-13T11:47:43-0300 lvl=info msg="received stop request" obj=app stopReq="{err:<nil> restart:false}"



Server stopped
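
Because the server starts in a background thread, "Server started successfully!" is printed as soon as the thread is launched, which can be before Uvicorn is actually listening. The sketch below polls the port until it accepts connections. `wait_for_port` is a hypothetical helper built only on the standard library.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False
```

Calling `wait_for_port("127.0.0.1", 8000)` right after `server_thread.start()` and printing the success message only when it returns `True` would avoid announcing a server that failed to bind.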


## Cleanup (Optional)

Delete the Elasticsearch index to remove all demo data. 
**WARNING**: This permanently deletes all documents in the index. Only run this if you want to start fresh or clean up after the demo.

In [None]:
try:
    result = es_client.options(ignore_status=[400, 404]).indices.delete(index=INDEX_NAME)
    if result.get('acknowledged', False):
        print(f"Index '{INDEX_NAME}' deleted successfully")
    else:
        print(f"Error deleting index: {result}")
except Exception as e:
    print(f"Error: {e}")