[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/finance/01_Financial_Data_Integration_MCP.ipynb)

# Financial Data Integration (MCP) - Real-Time Market Data

## Overview

This notebook demonstrates **financial data integration with MCP servers**, covering **real-time data ingestion** and **multi-source financial knowledge graph (KG) construction**. The pipeline connects to Python/FastMCP servers to ingest market data, stock prices, and metrics into a financial knowledge graph.

### Key Features

- **MCP Integration**: Connects to Model Context Protocol (MCP) servers as a data source
- **Real-Time Data Ingestion**: Ingests live market data from MCP servers
- **Multi-Source Financial KG**: Builds comprehensive financial knowledge graphs from APIs
- **Real-Time Updates**: Demonstrates real-time data streaming and KG updates
- **API-Based Integration**: Connects to financial data APIs via MCP
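
To give a feel for what an MCP client exchanges with a server, the sketch below builds the JSON-RPC 2.0 requests for the `resources/list` and `resources/read` methods and pulls text out of a read result. It is a minimal illustration using only the standard library, not how `MCPIngestor` is implemented. The helper names `mcp_request` and `resource_texts` are hypothetical, the sample result uses made-up values, and a real session would first perform the protocol's `initialize` handshake over a transport.

```python
import json
from itertools import count

_ids = count(1)  # each request in a session gets its own id


def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict for an MCP method (hypothetical helper)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def resource_texts(read_result):
    """Collect the text payloads from a resources/read result (hypothetical helper)."""
    return [c["text"] for c in read_result.get("contents", []) if "text" in c]


list_req = mcp_request("resources/list")
read_req = mcp_request("resources/read", {"uri": "resource://market_data"})
print(json.dumps(read_req))

# Illustrative server result with made-up values
sample_result = {
    "contents": [
        {
            "uri": "resource://market_data",
            "mimeType": "text/plain",
            "text": "AAPL stock price: $150.25",
        }
    ]
}
print(resource_texts(sample_result))
```

Keeping message construction separate from transport makes it easy to inspect or log requests before sending them to a server.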

### Pipeline Architecture

1. **Phase 0**: Setup & Configuration
2. **Phase 1**: MCP Server Setup & Connection
3. **Phase 2**: Real-Time Market Data Ingestion
4. **Phase 3**: Financial Entity Extraction
5. **Phase 4**: Financial Knowledge Graph Construction
6. **Phase 5**: Real-Time KG Updates
7. **Phase 6**: Visualization & Export
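
One way to picture the real-time update phase is as an upsert: each new quote refreshes the stock's node and keeps the prior observation. The sketch below uses a plain dict-based graph rather than Semantica's graph backend, so the `upsert_quote` helper, the node layout, and the `LISTED_ON` edge label are all assumptions for illustration. Prices and timestamps are made up.

```python
from datetime import datetime, timezone


def upsert_quote(graph, symbol, price, market=None, as_of=None):
    """Insert or refresh a Stock node and its latest price (hypothetical helper)."""
    as_of = as_of or datetime.now(timezone.utc).isoformat()
    node = graph["nodes"].setdefault(symbol, {"type": "Stock", "history": []})
    if "price" in node:
        # Keep the previous observation so an update never discards data
        node["history"].append((node["as_of"], node["price"]))
    node["price"] = price
    node["as_of"] = as_of
    if market is not None:
        # A set makes repeated edge insertions idempotent
        graph["edges"].add((symbol, "LISTED_ON", market))
    return node


graph = {"nodes": {}, "edges": set()}
upsert_quote(graph, "AAPL", 150.25, market="NASDAQ", as_of="2024-01-02T15:00:00+00:00")
upsert_quote(graph, "AAPL", 151.10, market="NASDAQ", as_of="2024-01-02T15:01:00+00:00")

print(graph["nodes"]["AAPL"]["price"])
print(graph["nodes"]["AAPL"]["history"])
print(len(graph["edges"]))
```

Storing edges in a set means a stream that repeats the same listing fact does not create duplicate relationships.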

---

## Installation


In [None]:
%pip install -qU semantica networkx matplotlib plotly pandas groq


---

## Phase 0: Setup & Configuration


In [None]:
import os
from semantica.core import Semantica, ConfigManager
from semantica.ingest import MCPIngestor

# Reuse GROQ_API_KEY from the environment; replace the "your-key" placeholder before running
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-key")

config_dict = {
    "project_name": "Financial_Data_Integration_MCP",
    "extraction": {"provider": "groq", "model": "llama-3.1-8b-instant"},
    "knowledge_graph": {"backend": "networkx"}
}

config = ConfigManager().load_from_dict(config_dict)
core = Semantica(config=config)
print("Configured for financial data integration with MCP focus")


---

## Phase 1: Real Data Ingestion (Alpha Vantage API & MCP Server)

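Ingest financial data from the Alpha Vantage API. The MCP server path is shown as commented-out calls, and a local sample file serves as a fallback when no documents are ingested.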
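
The `GLOBAL_QUOTE` endpoint returns JSON, so it can help to turn a response into a flat record before it reaches the graph. The sketch below assumes the numbered field names Alpha Vantage uses for this endpoint (for example `05. price`); the `parse_global_quote` helper is hypothetical and the sample values are made up. Replies that carry a message instead of a quote, as rate-limit and demo-key notices typically do, yield `None`.

```python
import json

# Sample payload in the GLOBAL_QUOTE shape; the values are made up
sample_response = json.dumps({
    "Global Quote": {
        "01. symbol": "AAPL",
        "05. price": "150.2500",
        "06. volume": "50000000",
        "07. latest trading day": "2024-01-02",
        "10. change percent": "0.4500%",
    }
})


def parse_global_quote(raw):
    """Return a flat quote record, or None when the payload has no quote."""
    payload = json.loads(raw)
    quote = payload.get("Global Quote") or {}
    if not quote:
        # No quote present, e.g. the reply only contains a notice message
        return None
    return {
        "symbol": quote["01. symbol"],
        "price": float(quote["05. price"]),
        "volume": int(quote["06. volume"]),
        "trading_day": quote["07. latest trading day"],
        "change_percent": float(quote["10. change percent"].rstrip("%")),
    }


record = parse_global_quote(sample_response)
print(record)
print(parse_global_quote('{"Information": "demo key notice"}'))
```

Checking for a missing quote matters here because a notice reply is still valid JSON and would otherwise be ingested as if it were market data.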


In [None]:
from semantica.ingest import MCPIngestor, WebIngestor, FileIngestor
from semantica.seed import SeedDataManager
import os

os.makedirs("data", exist_ok=True)

documents = []

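# Option 1: Ingest from the Alpha Vantage API (real data source)
# Note: the public "demo" key is intended for Alpha Vantage's documented example queries,
# so set your own key to query AAPL. The ALPHA_VANTAGE_API_KEY variable name is an assumption.
alpha_vantage_key = os.getenv("ALPHA_VANTAGE_API_KEY", "demo")
alpha_vantage_api = f"https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=AAPL&apikey={alpha_vantage_key}"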

try:
    web_ingestor = WebIngestor()
    api_documents = web_ingestor.ingest(alpha_vantage_api, method="url")
    print(f"Ingested {len(api_documents)} documents from Alpha Vantage API")
    documents.extend(api_documents)
except Exception as e:
    print(f"API ingestion failed: {e}")

# Option 2: Ingest from an MCP server (left commented out; requires a running server)
# mcp_ingestor = MCPIngestor()
# mcp_ingestor.connect("financial_server", url="http://localhost:8000/mcp")
# resources = mcp_ingestor.list_available_resources("financial_server")
# mcp_data = mcp_ingestor.ingest_resources("financial_server", resource_uris=["resource://market_data"])

# Seed the graph with foundation market entities
seed_manager = SeedDataManager()
seed_data = [
    {"type": "Market", "text": "NASDAQ", "description": "Stock exchange"},
    {"type": "Market", "text": "NYSE", "description": "Stock exchange"}
]
seed_manager.load_seed_data(seed_data)
print(f"Loaded {len(seed_data)} seed data items")

# Fallback: Sample market data
if not documents:
    market_data = """
    AAPL stock price: $150.25, market cap: $2.4T, volume: 50M shares
    MSFT stock price: $380.50, market cap: $2.8T, volume: 30M shares
    GOOGL stock price: $140.75, market cap: $1.8T, volume: 25M shares
    """
    with open("data/market_data.txt", "w") as f:
        f.write(market_data)
    documents = FileIngestor().ingest("data/market_data.txt")
    print(f"Ingested {len(documents)} documents from sample data")


---

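## Phase 2: Text Normalization & Knowledge Graph Construction

Normalize the ingested financial text (prices, metrics, company names), then build the knowledge graph from the normalized documents.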
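
To make "normalizing financial numbers" concrete, here is a minimal standalone sketch that expands shorthand such as `$2.4T` or `50M` into exact values. It does not reproduce what `TextNormalizer` does internally. The helper names `to_decimal` and `parse_line` are hypothetical, and the line format is the one used in this notebook's sample market data.

```python
import re
from decimal import Decimal

MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
AMOUNT = re.compile(r"\$?(\d+(?:\.\d+)?)([KMBT])?")
LINE = re.compile(
    r"(?P<symbol>[A-Z]+) stock price: (?P<price>[^,\s]+), "
    r"market cap: (?P<cap>[^,\s]+), volume: (?P<volume>[^,\s]+) shares"
)


def to_decimal(token):
    """Convert '$2.4T' or '50M' to a Decimal; raise ValueError otherwise."""
    match = AMOUNT.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a recognized amount: {token!r}")
    number, suffix = match.groups()
    # Decimal avoids binary floating-point rounding on currency values
    return Decimal(number) * MULTIPLIERS.get(suffix, 1)


def parse_line(line):
    """Turn one sample-data line into a record, or None if it does not match."""
    match = LINE.search(line)
    if match is None:
        return None
    return {
        "symbol": match["symbol"],
        "price": to_decimal(match["price"]),
        "market_cap": to_decimal(match["cap"]),
        "volume": to_decimal(match["volume"]),
    }


record = parse_line("AAPL stock price: $150.25, market cap: $2.4T, volume: 50M shares")
print(record)
```

`Decimal` is used instead of `float` so that prices such as 150.25 stay exact when they are later compared or summed.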


In [None]:
from semantica.normalize import TextNormalizer

# Normalize financial data
normalizer = TextNormalizer()
normalized_documents = []
for doc in documents:
    normalized_text = normalizer.normalize(
        doc.content if hasattr(doc, 'content') else str(doc),
        clean_html=True,
        normalize_entities=True,
        normalize_numbers=True,  # Normalize financial numbers
        remove_extra_whitespace=True
    )
    normalized_documents.append(normalized_text)

print(f"Normalized {len(normalized_documents)} documents")

# Build financial knowledge graph
result = core.build_knowledge_base(
    sources=normalized_documents,
    custom_entity_types=["Company", "Stock", "Price", "Metric", "Market"],
    graph=True
)

kg = result["knowledge_graph"]
print(f"Built financial KG with {len(kg.get('entities', []))} entities")
print("Focus: MCP integration, real-time ingestion, multi-source financial KG")


In [None]:
from semantica.visualization import KGVisualizer

visualizer = KGVisualizer()
visualizer.visualize(kg, output_path="financial_data_kg.html")

print("Financial data integration (MCP) complete")
print("\n=== Pipeline Summary ===")
print(f"✓ Ingested {len(documents)} documents (Alpha Vantage API or sample fallback)")
print(f"✓ Loaded {len(seed_data)} seed data items")
print(f"✓ Normalized {len(normalized_documents)} documents")
print(f"✓ Built financial KG with {len(kg.get('entities', []))} entities")
print("✓ Emphasizes: MCP integration, real-time ingestion, API-based KG construction, seed data")
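

---

For the export side, a dict-style graph can be written to JSON with the standard library alone. This is a sketch under assumptions: the `entities` key matches how `kg` is read above, while the `relationships` key, the `export_kg` helper, and the stand-in graph contents are hypothetical. In the notebook you would pass `kg` instead of `demo_kg`.

```python
import json
import tempfile
from pathlib import Path


def export_kg(kg, path):
    """Write the entity and relationship lists of a dict-style KG to JSON."""
    payload = {
        "entities": list(kg.get("entities", [])),
        "relationships": list(kg.get("relationships", [])),
    }
    # default=str keeps the export from failing on non-JSON values such as dates
    Path(path).write_text(json.dumps(payload, indent=2, default=str), encoding="utf-8")
    return payload


# Stand-in graph with made-up contents
demo_kg = {
    "entities": [{"text": "AAPL", "type": "Stock"}, {"text": "NASDAQ", "type": "Market"}],
    "relationships": [{"source": "AAPL", "type": "LISTED_ON", "target": "NASDAQ"}],
}

out_path = Path(tempfile.gettempdir()) / "financial_data_kg.json"
exported = export_kg(demo_kg, out_path)
print(f"Exported {len(exported['entities'])} entities to {out_path}")
```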
