[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/finance/01_Financial_Data_Integration.ipynb)

# 📈 Financial Data Integration Pipeline

## Overview

This notebook shows how to use Python-based and FastMCP MCP servers as data sources for financial data ingestion. It connects to a financial-data MCP server by URL, ingests market data, stock prices, and financial metrics, and then builds a knowledge graph for financial analysis.

> [!IMPORTANT]
> This implementation supports ONLY Python-based MCP servers and FastMCP servers. Users can bring their own Python/FastMCP MCP servers via URL connections.


**Documentation**: [API Reference](https://semantica.readthedocs.io/use-cases/)

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install semantica[all]
```

### 🧩 Modules Used (20+)

- **Ingestion**: `FileIngestor`, `WebIngestor`, `FeedIngestor`, `StreamIngestor`, `DBIngestor`, `EmailIngestor`, `RepoIngestor`, `MCPIngestor`
- **Parsing**: `MCPParser`, `JSONParser`, `StructuredDataParser`
- **Extraction**: `NERExtractor`, `RelationExtractor`, `EventDetector`, `SemanticAnalyzer`
- **KG**: `GraphBuilder`, `TemporalGraphQuery`, `GraphAnalyzer`
- **Analytics**: `CentralityCalculator`, `CommunityDetector`, `ConnectivityAnalyzer`
- **Reasoning**: `Reasoner` (legacy), `RuleManager`, `ExplanationGenerator`
- **Export**: `JSONExporter`, `CSVExporter`, `RDFExporter`, `ReportGenerator`
- **Visualization**: `KGVisualizer`, `TemporalVisualizer`, `AnalyticsVisualizer`

### 🔄 Pipeline

**Connect to Financial MCP Server → Ingest Market Data via MCP → Parse MCP Responses → Extract Financial Entities → Build Financial KG → Analyze Trends → Generate Reports → Visualize**

---

## 🔌 Step 1: Connect to Financial Data MCP Server

Connect by URL to a Python/FastMCP MCP server that provides financial data. The server can expose resources (datasets, market data) and tools (queries, calculations).


In [None]:
!pip install semantica


In [None]:
from semantica.ingest import MCPIngestor, ingest_mcp
from semantica.parse import MCPParser, JSONParser, StructuredDataParser
from semantica.semantic_extract import NERExtractor, RelationExtractor, EventDetector, SemanticAnalyzer
from semantica.kg import GraphBuilder, TemporalGraphQuery, GraphAnalyzer
from semantica.kg import CentralityCalculator, CommunityDetector, ConnectivityAnalyzer
# Disabled: the reasoning classes are used only in Step 6.
# from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
from semantica.export import JSONExporter, CSVExporter, RDFExporter, ReportGenerator
from semantica.visualization import KGVisualizer, TemporalVisualizer, AnalyticsVisualizer
import json
from datetime import datetime, timedelta

# Initialize MCP ingestor
mcp_ingestor = MCPIngestor()

# Connect to financial data MCP server via URL
# Replace with your actual MCP server URL
# Example: http://localhost:8000/mcp or https://api.example.com/financial-mcp
financial_mcp_url = "http://localhost:8000/mcp"

# Connect to MCP server
mcp_ingestor.connect(
    "financial_server",
    url=financial_mcp_url,
    headers={"Authorization": "Bearer your_token"} if "api.example.com" in financial_mcp_url else {}
)

# List available resources (datasets, market data feeds)
resources = mcp_ingestor.list_available_resources("financial_server")
print(f"\n📊 Available Resources ({len(resources)}):")
for resource in resources[:5]:  # Show first 5
    print(f"  - {resource.uri}: {resource.name}")
    if resource.description:
        print(f"    {resource.description[:80]}...")

# List available tools (queries, calculations)
tools = mcp_ingestor.list_available_tools("financial_server")
print(f"\n🔧 Available Tools ({len(tools)}):")
for tool in tools[:5]:  # Show first 5
    print(f"  - {tool.name}: {tool.description or 'No description'}")
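
The connection call above decides whether to send an `Authorization` header with a substring test on the URL. As a minimal sketch of a stricter variant, the helper below compares the parsed hostname against an allow-list. `build_auth_headers` and `AUTH_HOSTS` are hypothetical names introduced for illustration and are not part of Semantica.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of hosts that expect a bearer token.
AUTH_HOSTS = {"api.example.com"}


def build_auth_headers(url: str, token: str, auth_hosts=AUTH_HOSTS) -> dict:
    """Return an Authorization header only when the URL's host requires one."""
    host = urlparse(url).hostname or ""
    if host in auth_hosts:
        return {"Authorization": f"Bearer {token}"}
    return {}


print(build_auth_headers("http://localhost:8000/mcp", "your_token"))
print(build_auth_headers("https://api.example.com/financial-mcp", "your_token"))
```

The returned dictionary could be passed as the `headers=` argument of `mcp_ingestor.connect`. Matching on the hostname avoids sending a token when the host name merely appears in a path or query string.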


## 📥 Step 2: Ingest Financial Data from MCP Server

Ingest financial data using both resource-based and tool-based methods from the MCP server.
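
The two ingestion methods map onto two request types in the Model Context Protocol, which is built on JSON-RPC 2.0: `resources/read` for resources and `tools/call` for tools. The sketch below only shows the shape of those request payloads. The helper names are hypothetical, and it is an assumption that `MCPIngestor` builds equivalent messages internally.

```python
import itertools
import json

_request_ids = itertools.count(1)  # each JSON-RPC request carries its own id


def jsonrpc_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def read_resource_request(uri: str) -> dict:
    """Request the contents of one MCP resource."""
    return jsonrpc_request("resources/read", {"uri": uri})


def call_tool_request(name: str, arguments: dict) -> dict:
    """Request the execution of one MCP tool with the given arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


resource_req = read_resource_request("resource://market_data/daily")
tool_req = call_tool_request("get_market_metrics", {"sector": "Technology"})
print(json.dumps(resource_req, indent=2))
print(json.dumps(tool_req, indent=2))
```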


In [None]:
# Initialize parsers
mcp_parser = MCPParser()
json_parser = JSONParser()
structured_parser = StructuredDataParser()

financial_data = []

# Method 1: Resource-based ingestion
# Ingest from MCP resources (pre-defined datasets)
# Both methods run inside try/except so that a missing or failing MCP server
# does not stop the cell and the sample-data fallback below can run.
try:
    # Example: Ingest market data resource
    resource_data = mcp_ingestor.ingest_resources(
        "financial_server",
        resource_uris=["resource://market_data/daily", "resource://market_data/stocks"]
    )

    for item in resource_data:
        financial_data.append(item)
        print(f"  Ingested resource: {item}")

    # Method 2: Tool-based ingestion
    # Call MCP tools to retrieve data dynamically
    # Example: Get stock prices for specific symbols
    stock_prices = mcp_ingestor.ingest_tool_output(
        "financial_server",
        tool_name="get_stock_prices",
        arguments={
            "symbols": ["AAPL", "MSFT", "GOOGL", "TSLA"],
            "date": datetime.now().isoformat()
        }
    )

    if stock_prices:
        financial_data.append(stock_prices)
        print(f"  Retrieved stock prices for {len(stock_prices) if isinstance(stock_prices, list) else 1} symbols")

    # Example: Get market metrics
    market_metrics = mcp_ingestor.ingest_tool_output(
        "financial_server",
        tool_name="get_market_metrics",
        arguments={"sector": "Technology"}
    )

    if market_metrics:
        financial_data.append(market_metrics)
        print("  Retrieved market metrics")
except Exception as exc:
    print(f"  MCP ingestion failed ({exc}); continuing with any data already collected")

# Sample financial data (if MCP server is not available)
if not financial_data:
    sample_data = {
        "stock_prices": [
            {
                "symbol": "AAPL",
                "company": "Apple Inc.",
                "price": 175.50,
                "change": 2.30,
                "change_percent": 1.33,
                "volume": 45000000,
                "timestamp": (datetime.now() - timedelta(hours=1)).isoformat(),
                "sector": "Technology"
            },
            {
                "symbol": "MSFT",
                "company": "Microsoft Corporation",
                "price": 380.25,
                "change": -1.50,
                "change_percent": -0.39,
                "volume": 28000000,
                "timestamp": (datetime.now() - timedelta(hours=1)).isoformat(),
                "sector": "Technology"
            },
            {
                "symbol": "GOOGL",
                "company": "Alphabet Inc.",
                "price": 142.80,
                "change": 3.20,
                "change_percent": 2.29,
                "volume": 32000000,
                "timestamp": (datetime.now() - timedelta(minutes=30)).isoformat(),
                "sector": "Technology"
            },
            {
                "symbol": "TSLA",
                "company": "Tesla Inc.",
                "price": 245.60,
                "change": 5.40,
                "change_percent": 2.25,
                "volume": 55000000,
                "timestamp": datetime.now().isoformat(),
                "sector": "Automotive"
            }
        ],
        "market_metrics": {
            "total_volume": 150000000,
            "market_cap": 15000000000000,
            "sectors": ["Technology", "Automotive"]
        }
    }
    financial_data.append(sample_data)
    print(f"  Loaded {len(sample_data['stock_prices'])} stock prices")

print(f"\n📊 Total financial data items ingested: {len(financial_data)}")
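
Each sample record carries both `change` and `change_percent`, so the two can be cross-checked before the data goes further down the pipeline. The sketch below assumes that `change_percent` is measured relative to the previous close (`price - change`); a real MCP server may define it differently. The function names are hypothetical.

```python
def expected_change_percent(price: float, change: float) -> float:
    """Percentage change relative to the previous close (price - change)."""
    previous_close = price - change
    if previous_close == 0:
        raise ValueError("previous close is zero; percentage change is undefined")
    return 100.0 * change / previous_close


def is_consistent(record: dict, tolerance: float = 0.01) -> bool:
    """True when the reported change_percent matches the recomputed value."""
    expected = expected_change_percent(record["price"], record["change"])
    return abs(expected - record["change_percent"]) <= tolerance


sample = [
    {"symbol": "AAPL", "price": 175.50, "change": 2.30, "change_percent": 1.33},
    {"symbol": "MSFT", "price": 380.25, "change": -1.50, "change_percent": -0.39},
]
for record in sample:
    recomputed = round(expected_change_percent(record["price"], record["change"]), 2)
    print(record["symbol"], recomputed, is_consistent(record))
```

For AAPL the previous close is 175.50 - 2.30 = 173.20, and 100 × 2.30 / 173.20 rounds to 1.33, which matches the sample value.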


## 📄 Step 3: Parse MCP Data

Parse the data received from MCP server responses (JSON, structured data).


In [None]:
parsed_financial_data = []

# Parse MCP responses
for data_item in financial_data:
    # Parse MCP response (handles JSON, text, binary)
    if isinstance(data_item, dict):
        # If it's already structured, use it directly
        parsed_item = data_item
    else:
        # Parse using MCP parser
        parsed_item = mcp_parser.parse_response(data_item, response_type="json")
    
    parsed_financial_data.append(parsed_item)
    print("  Parsed data item")

# Extract stock prices from parsed data
stock_prices = []
for item in parsed_financial_data:
    if isinstance(item, dict):
        if "stock_prices" in item:
            stock_prices.extend(item["stock_prices"])
        elif "symbol" in item:
            stock_prices.append(item)
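
The extraction loop above handles dictionaries only, while tool output may also arrive as a list or as raw JSON text. As a sketch under that assumption, the hypothetical helper below normalizes dictionaries, lists, JSON strings, and bytes into a flat list of stock records using only the standard library. It does not use `MCPParser`.

```python
import json


def flatten_stock_records(payload) -> list:
    """Collect stock records from a dict, a list, or a JSON string/bytes payload."""
    if isinstance(payload, (bytes, bytearray)):
        payload = payload.decode("utf-8")
    if isinstance(payload, str):
        try:
            payload = json.loads(payload)
        except ValueError:
            return []  # not JSON, so there is nothing to extract

    if isinstance(payload, list):
        records = []
        for element in payload:
            records.extend(flatten_stock_records(element))
        return records
    if isinstance(payload, dict):
        if "stock_prices" in payload:
            return flatten_stock_records(payload["stock_prices"])
        if "symbol" in payload:
            return [payload]
    return []  # anything else carries no stock records


mixed = [
    {"stock_prices": [{"symbol": "AAPL"}, {"symbol": "MSFT"}]},
    '[{"symbol": "GOOGL"}]',
    b'{"symbol": "TSLA"}',
    {"market_metrics": {"total_volume": 150000000}},
]
print([record["symbol"] for record in flatten_stock_records(mixed)])
```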


## ⛏️ Step 4: Extract Financial Entities and Relationships

Extract financial entities (companies, stocks, sectors) and relationships from MCP data.


In [None]:
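# Extractors are initialized here but not called below: the structured stock
# records are mapped to entities and relationships directly.
ner_extractor = NERExtractor()
relation_extractor = RelationExtractor()
event_detector = EventDetector()
semantic_analyzer = SemanticAnalyzer()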

financial_entities = []
financial_relationships = []

# Extract entities and relationships from stock prices
for stock in stock_prices:
    if isinstance(stock, dict):
        symbol = stock.get("symbol", "")
        company = stock.get("company", "")
        sector = stock.get("sector", "")
        
        # Stock entity
        financial_entities.append({
            "id": symbol,
            "type": "Stock",
            "name": symbol,
            "properties": {
                "price": stock.get("price", 0),
                "change": stock.get("change", 0),
                "change_percent": stock.get("change_percent", 0),
                "volume": stock.get("volume", 0),
                "timestamp": stock.get("timestamp", "")
            }
        })
        
        # Company entity
        if company:
            financial_entities.append({
                "id": company,
                "type": "Company",
                "name": company,
                "properties": {}
            })
            
            # Stock-Company relationship
            financial_relationships.append({
                "source": symbol,
                "target": company,
                "type": "ticker_for",
                "properties": {"timestamp": stock.get("timestamp", "")}
            })
        
        # Sector entity
        if sector:
            financial_entities.append({
                "id": sector,
                "type": "Sector",
                "name": sector,
                "properties": {}
            })
            
            # Company-Sector relationship
            if company:
                financial_relationships.append({
                    "source": company,
                    "target": sector,
                    "type": "belongs_to",
                    "properties": {}
                })

# Remove duplicates
seen_entities = set()
unique_entities = []
for entity in financial_entities:
    entity_key = (entity["id"], entity["type"])
    if entity_key not in seen_entities:
        seen_entities.add(entity_key)
        unique_entities.append(entity)

financial_entities = unique_entities

print(f"  - Stocks: {len([e for e in financial_entities if e['type'] == 'Stock'])}")
print(f"  - Companies: {len([e for e in financial_entities if e['type'] == 'Company'])}")
print(f"  - Sectors: {len([e for e in financial_entities if e['type'] == 'Sector'])}")
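
Before building the graph, it can help to confirm that every relationship points at an entity that actually exists. The sketch below counts entities per type and lists dangling relationships. It operates on the same dictionary shapes used in the cell above, and the function names are hypothetical.

```python
from collections import Counter


def summarize_entities(entities: list) -> Counter:
    """Count entities per type."""
    return Counter(entity["type"] for entity in entities)


def dangling_relationships(entities: list, relationships: list) -> list:
    """Return relationships whose source or target is not a known entity id."""
    known_ids = {entity["id"] for entity in entities}
    return [
        rel for rel in relationships
        if rel["source"] not in known_ids or rel["target"] not in known_ids
    ]


entities = [
    {"id": "AAPL", "type": "Stock", "name": "AAPL", "properties": {}},
    {"id": "Apple Inc.", "type": "Company", "name": "Apple Inc.", "properties": {}},
]
relationships = [
    {"source": "AAPL", "target": "Apple Inc.", "type": "ticker_for", "properties": {}},
    # The sector entity is deliberately missing to show a dangling edge.
    {"source": "Apple Inc.", "target": "Technology", "type": "belongs_to", "properties": {}},
]

print(dict(summarize_entities(entities)))
print([rel["type"] for rel in dangling_relationships(entities, relationships)])
```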


## 🕸️ Step 5: Build Financial Knowledge Graph

Build a knowledge graph from the extracted financial entities and relationships.


In [None]:
builder = GraphBuilder()
temporal_query = TemporalGraphQuery()
graph_analyzer = GraphAnalyzer()

# Build knowledge graph
financial_kg = builder.build(financial_entities, financial_relationships)

# Analyze graph structure
metrics = graph_analyzer.compute_metrics(financial_kg)
centrality_calculator = CentralityCalculator()
community_detector = CommunityDetector()
connectivity_analyzer = ConnectivityAnalyzer()

# Calculate graph metrics
centrality_result = centrality_calculator.calculate_degree_centrality(financial_kg)
centrality_scores = centrality_result.get('centrality', {})
communities = community_detector.detect_communities(financial_kg)
connectivity = connectivity_analyzer.analyze_connectivity(financial_kg)

print(f"  Entities: {len(financial_kg.get('entities', []))}")
print(f"  Relationships: {len(financial_kg.get('relationships', []))}")
print(f"  Graph density: {metrics.get('density', 0):.3f}")
print(f"  Communities detected: {len(communities)}")
print(f"  Central entities: {len([e for e, score in centrality_scores.items() if score > 0])}")
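
To make the printed metrics easier to interpret, the sketch below computes density and degree centrality by hand for a tiny graph, treating edges as undirected. This is an illustration of the textbook definitions only; it is an assumption that `GraphAnalyzer` and `CentralityCalculator` use the same conventions, and their normalization or handling of edge direction may differ.

```python
from collections import defaultdict


def undirected_density(node_count: int, edge_count: int) -> float:
    """Fraction of possible undirected edges that are present."""
    if node_count < 2:
        return 0.0
    return 2.0 * edge_count / (node_count * (node_count - 1))


def degree_centrality(entities: list, relationships: list) -> dict:
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    degree = defaultdict(int)
    for rel in relationships:
        degree[rel["source"]] += 1
        degree[rel["target"]] += 1
    n = len(entities)
    scale = 1.0 / (n - 1) if n > 1 else 0.0
    return {entity["id"]: degree[entity["id"]] * scale for entity in entities}


entities = [{"id": "AAPL"}, {"id": "Apple Inc."}, {"id": "Technology"}]
relationships = [
    {"source": "AAPL", "target": "Apple Inc."},
    {"source": "Apple Inc.", "target": "Technology"},
]

density = undirected_density(len(entities), len(relationships))
centrality = degree_centrality(entities, relationships)
print(f"density={density:.3f}")
print(centrality)
```

With three nodes and two edges, the density is 2 × 2 / (3 × 2) ≈ 0.667, and the company node, which touches both edges, has the highest centrality.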


## 📊 Step 6: Analyze Financial Trends

Analyze financial trends using temporal queries and pattern detection.
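
A time-range query reduces to keeping the records whose timestamp falls inside a window. The sketch below does this for plain dictionaries with the standard library. It assumes naive ISO 8601 timestamps such as those produced by `datetime.now().isoformat()`, it is not the implementation behind `TemporalGraphQuery`, and `in_time_range` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def in_time_range(records: list, start: datetime, end: datetime) -> list:
    """Keep records whose ISO 8601 'timestamp' falls within [start, end]."""
    selected = []
    for record in records:
        raw = record.get("timestamp")
        if not raw:
            continue  # records without a timestamp cannot be placed in time
        try:
            timestamp = datetime.fromisoformat(raw)
        except ValueError:
            continue  # skip timestamps that are not valid ISO 8601
        if start <= timestamp <= end:
            selected.append(record)
    return selected


now = datetime(2024, 1, 8, 12, 0, 0)  # fixed reference time so the example is reproducible
records = [
    {"symbol": "AAPL", "timestamp": (now - timedelta(hours=1)).isoformat()},
    {"symbol": "MSFT", "timestamp": (now - timedelta(days=10)).isoformat()},
    {"symbol": "GOOGL"},  # no timestamp, so it is skipped
]
recent = in_time_range(records, start=now - timedelta(days=7), end=now)
print([record["symbol"] for record in recent])
```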


In [None]:
# Temporal analysis
start_time = (datetime.now() - timedelta(days=7)).isoformat()
end_time = datetime.now().isoformat()

temporal_results = temporal_query.query_time_range(
    graph=financial_kg,
    query="Find stock price movements",
    start_time=start_time,
    end_time=end_time
)

# Inference engine for financial rules.
# The reasoning import is commented out in the setup cell, so it is attempted
# here instead; the rule-based step is skipped when the import is unavailable.
financial_insights = []
try:
    from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
except ImportError:
    InferenceEngine = None
    print("  Reasoning classes unavailable; skipping rule-based insights")

if InferenceEngine is not None:
    inference_engine = InferenceEngine()
    rule_manager = RuleManager()
    explanation_generator = ExplanationGenerator()

    # Financial analysis rules
    inference_engine.add_rule("IF change_percent > 2 AND volume > 40000000 THEN strong_momentum")
    inference_engine.add_rule("IF change_percent < -1 AND volume > 50000000 THEN selling_pressure")
    inference_engine.add_rule("IF change_percent > 0 AND sector == 'Technology' THEN tech_growth")

    # Add facts from stock data
    for stock in stock_prices:
        if isinstance(stock, dict):
            inference_engine.add_fact({
                "symbol": stock.get("symbol", ""),
                "change_percent": stock.get("change_percent", 0),
                "volume": stock.get("volume", 0),
                "sector": stock.get("sector", "")
            })

    # Generate insights
    financial_insights = inference_engine.forward_chain()

print(f"  Temporal entities: {len(temporal_results.get('entities', []))}")
print(f"  Financial insights: {len(financial_insights)}")

# Display insights
for insight in financial_insights[:3]:
    print(f"  - {insight}")
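
The three rules in the cell above can also be written as plain Python predicates, which makes it easy to see which sample stocks they match. This is a single-pass sketch with no chaining between rules; it does not reproduce the inference engine, and `RULES` and `evaluate_rules` are hypothetical names.

```python
# Each rule pairs a label with a predicate over one stock record.
RULES = [
    ("strong_momentum", lambda s: s["change_percent"] > 2 and s["volume"] > 40_000_000),
    ("selling_pressure", lambda s: s["change_percent"] < -1 and s["volume"] > 50_000_000),
    ("tech_growth", lambda s: s["change_percent"] > 0 and s["sector"] == "Technology"),
]


def evaluate_rules(stock: dict) -> list:
    """Return the labels of all rules whose condition holds for this stock."""
    return [label for label, predicate in RULES if predicate(stock)]


# Values copied from the sample data in Step 2.
sample_stocks = [
    {"symbol": "AAPL", "change_percent": 1.33, "volume": 45_000_000, "sector": "Technology"},
    {"symbol": "MSFT", "change_percent": -0.39, "volume": 28_000_000, "sector": "Technology"},
    {"symbol": "GOOGL", "change_percent": 2.29, "volume": 32_000_000, "sector": "Technology"},
    {"symbol": "TSLA", "change_percent": 2.25, "volume": 55_000_000, "sector": "Automotive"},
]

insights = {stock["symbol"]: evaluate_rules(stock) for stock in sample_stocks}
print(insights)
# {'AAPL': ['tech_growth'], 'MSFT': [], 'GOOGL': ['tech_growth'], 'TSLA': ['strong_momentum']}
```

Only TSLA clears both thresholds of the momentum rule, and no sample stock triggers `selling_pressure` because none falls by more than 1 percent.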


## 📤 Step 7: Export and Visualize

Export the financial knowledge graph and generate visualizations.
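
Entity dictionaries contain a nested `properties` field, which a flat CSV cannot hold directly. One option, sketched below with the standard library only, is to serialize that field as a JSON string per row. This is an assumption about a reasonable layout, not a description of what `CSVExporter` writes, and `export_entities_csv` is a hypothetical helper.

```python
import csv
import json
import os
import tempfile


def export_entities_csv(entities: list, path: str) -> None:
    """Write one row per entity; nested properties are stored as a JSON string."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "type", "name", "properties"])
        writer.writeheader()
        for entity in entities:
            writer.writerow({
                "id": entity["id"],
                "type": entity["type"],
                "name": entity["name"],
                "properties": json.dumps(entity.get("properties", {}), sort_keys=True),
            })


entities = [
    {"id": "AAPL", "type": "Stock", "name": "AAPL", "properties": {"price": 175.5}},
    {"id": "Apple Inc.", "type": "Company", "name": "Apple Inc.", "properties": {}},
]

# TemporaryDirectory removes the directory on exit, unlike tempfile.mkdtemp.
with tempfile.TemporaryDirectory() as export_dir:
    csv_path = os.path.join(export_dir, "financial_entities.csv")
    export_entities_csv(entities, csv_path)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

print(rows[0]["id"], json.loads(rows[0]["properties"]))
```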


In [None]:
import tempfile
import os

temp_dir = tempfile.mkdtemp()

json_exporter = JSONExporter()
csv_exporter = CSVExporter()
rdf_exporter = RDFExporter()
report_generator = ReportGenerator()

# Export knowledge graph
json_exporter.export_knowledge_graph(financial_kg, os.path.join(temp_dir, "financial_kg.json"))
csv_exporter.export_entities(financial_entities, os.path.join(temp_dir, "financial_entities.csv"))
rdf_exporter.export_knowledge_graph(financial_kg, os.path.join(temp_dir, "financial_kg.rdf"))

# Generate report
report_data = {
    "summary": f"Financial data integration from MCP server identified {len(financial_insights)} insights",
    "stocks_analyzed": len([e for e in financial_entities if e['type'] == 'Stock']),
    "companies": len([e for e in financial_entities if e['type'] == 'Company']),
    "sectors": len([e for e in financial_entities if e['type'] == 'Sector']),
    "insights": len(financial_insights)
}

report = report_generator.generate_report(report_data, format="markdown")

print(f"  JSON: {os.path.join(temp_dir, 'financial_kg.json')}")
print(f"  CSV: {os.path.join(temp_dir, 'financial_entities.csv')}")
print(f"  RDF: {os.path.join(temp_dir, 'financial_kg.rdf')}")

# Visualize
kg_visualizer = KGVisualizer()
temporal_visualizer = TemporalVisualizer()
analytics_visualizer = AnalyticsVisualizer()

kg_viz = kg_visualizer.visualize_network(financial_kg, output="interactive")
temporal_viz = temporal_visualizer.visualize_timeline(financial_kg, output="interactive")
analytics_viz = analytics_visualizer.visualize_analytics(financial_kg, output="interactive")


# Cleanup: Disconnect from MCP server
mcp_ingestor.disconnect("financial_server")
print("  Disconnected from MCP server")

print("📊 Total modules used: 20+")
