# GFQL Validation in Production

Production-ready patterns for GFQL validation in platform engineering and DevOps contexts.

## Target Audience
- Platform Engineers
- DevOps Teams
- Backend Developers
- System Architects

## What You'll Learn
- Plottable integration for validation
- Performance optimization and caching
- Testing and CI/CD integration
- Monitoring and observability
- API endpoint validation

In [None]:
# Production imports
import time
import json
import hashlib
from typing import Dict, List, Any, Optional, Tuple
from functools import lru_cache
import pandas as pd
import numpy as np
import graphistry

from graphistry.compute.validate import (
    validate_syntax,
    validate_schema,
    validate_query,
    extract_schema_from_plottable,
    ValidationIssue
)

print(f"PyGraphistry version: {graphistry.__version__}")
print("Production validation patterns loaded")

## Plottable Integration

Validate queries against a Plottable's node and edge DataFrames before running them, caching the result for each distinct query.

In [None]:
# Create production-like dataset
def create_production_data(num_nodes=10000, num_edges=50000):
    """Create realistic production dataset."""
    
    nodes_df = pd.DataFrame({
        'node_id': range(num_nodes),
        'entity_type': np.random.choice(['user', 'device', 'transaction', 'merchant'], num_nodes),
        'risk_score': np.random.uniform(0, 100, num_nodes),
        'created_at': pd.date_range('2024-01-01', periods=num_nodes, freq='1min'),
        'country': np.random.choice(['US', 'UK', 'CA', 'AU', 'JP'], num_nodes),
        'status': np.random.choice(['active', 'inactive', 'suspended'], num_nodes)
    })
    
    edges_df = pd.DataFrame({
        'source': np.random.choice(range(num_nodes), num_edges),
        'target': np.random.choice(range(num_nodes), num_edges),
        'edge_type': np.random.choice(['transacted', 'connected', 'authorized', 'flagged'], num_edges),
        'amount': np.random.uniform(10, 10000, num_edges),
        'timestamp': pd.date_range('2024-01-01', periods=num_edges, freq='30s')
    })
    
    return nodes_df, edges_df

# Create Plottable
nodes_df, edges_df = create_production_data(1000, 5000)
g = graphistry.nodes(nodes_df, 'node_id').edges(edges_df, 'source', 'target')

print(f"Created Plottable with {len(nodes_df)} nodes and {len(edges_df)} edges")
print(f"Node columns: {list(nodes_df.columns)}")
print(f"Edge columns: {list(edges_df.columns)}")

In [None]:
class PlottableValidator:
    """Production validator for Plottable objects."""
    
    def __init__(self, plottable):
        self.plottable = plottable
        self.schema = extract_schema_from_plottable(plottable)
        self._cache = {}
    
    def validate(self, query: List[Dict]) -> Tuple[bool, List[ValidationIssue]]:
        """Validate query against Plottable schema."""
        
        # Check cache
        query_hash = self._hash_query(query)
        if query_hash in self._cache:
            return self._cache[query_hash]
        
        # Validate
        issues = validate_query(
            query,
            nodes_df=self.plottable._nodes,
            edges_df=self.plottable._edges
        )
        
        result = (len(issues) == 0, issues)
        self._cache[query_hash] = result
        return result
    
    def _hash_query(self, query: List[Dict]) -> str:
        """Create hash of query for caching."""
        query_str = json.dumps(query, sort_keys=True)
        return hashlib.md5(query_str.encode()).hexdigest()
    
    def get_schema_info(self) -> Dict[str, Any]:
        """Get schema information for documentation."""
        return {
            "node_columns": list(self.schema.node_columns.keys()),
            "edge_columns": list(self.schema.edge_columns.keys()),
            "node_types": {k: str(v) for k, v in self.schema.node_columns.items()},
            "edge_types": {k: str(v) for k, v in self.schema.edge_columns.items()}
        }

# Test PlottableValidator
validator = PlottableValidator(g)

# Valid query
valid_query = [
    {"type": "n", "filter": {"entity_type": {"eq": "user"}}},
    {"type": "e_forward", "filter": {"amount": {"gt": 1000}}},
    {"type": "n", "filter": {"risk_score": {"gte": 80}}}
]

is_valid, issues = validator.validate(valid_query)
print(f"Query valid: {is_valid}")
print(f"Schema info: {json.dumps(validator.get_schema_info(), indent=2)[:300]}...")
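
The `_cache` in `PlottableValidator` is keyed by the query hash alone, so a cached result could go stale if the same key space were reused after the underlying columns change. One possible mitigation, sketched here with the standard library only, is to fold a schema fingerprint into the cache key. The helpers `schema_fingerprint` and `cache_key` are hypothetical names, and the column-to-dtype mappings are assumed to be plain dicts of strings (for example built with `df.dtypes.astype(str).to_dict()`).

```python
import hashlib
import json
from typing import Dict, List


def schema_fingerprint(node_columns: Dict[str, str], edge_columns: Dict[str, str]) -> str:
    """Hash column names and dtype strings into a short, stable fingerprint."""
    payload = json.dumps(
        {"nodes": node_columns, "edges": edge_columns},
        sort_keys=True,  # key order must not affect the fingerprint
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cache_key(query: List[dict], fingerprint: str) -> str:
    """Combine the schema fingerprint with a hash of the query itself."""
    query_hash = hashlib.sha256(
        json.dumps(query, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return f"{fingerprint}:{query_hash}"


# Illustrative column/dtype mappings (hypothetical, not read from a real dataset)
fp_v1 = schema_fingerprint(
    {"node_id": "int64", "risk_score": "float64"},
    {"amount": "float64"},
)
fp_v2 = schema_fingerprint(
    {"node_id": "int64", "risk_score": "float64", "status": "object"},
    {"amount": "float64"},
)

query = [{"type": "n", "filter": {"risk_score": {"gte": 80}}}]
print("Keys differ after schema change:", cache_key(query, fp_v1) != cache_key(query, fp_v2))
```

With a key of this shape, adding or retyping a column changes the fingerprint, so old entries simply stop matching and no explicit invalidation step is needed.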

## Performance & Caching

Cache extracted schemas and validation results so that repeated queries skip revalidation in high-throughput systems.

In [None]:
class CachedSchemaValidator:
    """High-performance validator with schema caching."""
    
    def __init__(self, cache_size=1000, ttl_seconds=3600):
        self._schema_cache = {}
        self._query_cache = lru_cache(maxsize=cache_size)(self._validate_uncached)
        self.ttl_seconds = ttl_seconds
        self.stats = {
            "cache_hits": 0,
            "cache_misses": 0,
            "total_validations": 0
        }
    
    def extract_and_cache_schema(self, dataset_id: str, nodes_df: pd.DataFrame, 
                                edges_df: pd.DataFrame):
        """Extract and cache schema with TTL."""
        from graphistry.compute.validate import extract_schema_from_dataframes
        
        schema = extract_schema_from_dataframes(nodes_df, edges_df)
        self._schema_cache[dataset_id] = {
            "schema": schema,
            "timestamp": time.time(),
            "node_count": len(nodes_df),
            "edge_count": len(edges_df)
        }
        return schema
    
    def get_cached_schema(self, dataset_id: str) -> Optional[Any]:
        """Get schema from cache if valid."""
        if dataset_id not in self._schema_cache:
            return None
        
        cache_entry = self._schema_cache[dataset_id]
        age = time.time() - cache_entry["timestamp"]
        
        if age > self.ttl_seconds:
            del self._schema_cache[dataset_id]
            return None
        
        return cache_entry["schema"]
    
    def _validate_uncached(self, query_json: str, schema) -> Tuple[bool, str]:
        """Validate query (wrapped for LRU cache)."""
        query = json.loads(query_json)
        issues = validate_schema(query, schema)
        return len(issues) == 0, json.dumps([self._issue_to_dict(i) for i in issues])
    
    def validate_with_cache(self, query: List[Dict], dataset_id: str, 
                           schema=None) -> Tuple[bool, List[Dict]]:
        """Validate with caching."""
        self.stats["total_validations"] += 1
        
        # Get schema
        if schema is None:
            schema = self.get_cached_schema(dataset_id)
            if schema is None:
                raise ValueError(f"No cached schema for dataset {dataset_id}")
        
        # Check query cache
        query_json = json.dumps(query, sort_keys=True)
        
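        # lru_cache keeps its own hit counter; compare it before and after the call.
        # Note: lru_cache requires hashable arguments, so `schema` must be hashable.
        hits_before = self._query_cache.cache_info().hits
        is_valid, issues_json = self._query_cache(query_json, schema)
        if self._query_cache.cache_info().hits > hits_before:
            self.stats["cache_hits"] += 1
        else:
            self.stats["cache_misses"] += 1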
        
        return is_valid, json.loads(issues_json)
    
    def _issue_to_dict(self, issue: ValidationIssue) -> Dict:
        return {
            "level": issue.level,
            "message": issue.message,
            "operation_index": issue.operation_index,
            "field": issue.field
        }
    
    def get_stats(self) -> Dict[str, Any]:
        """Get cache statistics."""
        hit_rate = (self.stats["cache_hits"] / 
                   max(1, self.stats["total_validations"])) * 100
        
        return {
            **self.stats,
            "hit_rate": f"{hit_rate:.1f}%",
            "cached_schemas": len(self._schema_cache)
        }

# Test caching performance
cached_validator = CachedSchemaValidator()

# Cache schema
schema = cached_validator.extract_and_cache_schema("prod_dataset_1", nodes_df, edges_df)

# Performance test
test_queries = [
    [{"type": "n", "filter": {"entity_type": {"eq": "user"}}}],
    [{"type": "n", "filter": {"risk_score": {"gt": 50}}}],
    [{"type": "n"}, {"type": "e_forward"}, {"type": "n"}]
]

# Run multiple times to test cache
print("Performance test with caching:")
for round_num in range(3):
    start = time.time()
    
    for _ in range(100):
        for query in test_queries:
            cached_validator.validate_with_cache(query, "prod_dataset_1", schema)
    
    elapsed = time.time() - start
    print(f"Round {round_num + 1}: {elapsed:.3f}s for 300 validations")

print(f"\nCache stats: {json.dumps(cached_validator.get_stats(), indent=2)}")
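
`functools.lru_cache` already tracks hits and misses through `cache_info()`, which is a convenient way to check that a cache is being used as intended. The sketch below reads those counters; `check_query` is a hypothetical stand-in for an expensive validation call and does not use the GFQL validators.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=128)
def check_query(query_json: str) -> bool:
    """Hypothetical stand-in for an expensive validation call."""
    query = json.loads(query_json)
    return all(isinstance(op, dict) and "type" in op for op in query)


queries = [
    [{"type": "n"}],
    [{"type": "n"}, {"type": "e_forward"}, {"type": "n"}],
]

# Three passes over two distinct queries: the first pass misses, later passes hit
for _ in range(3):
    for q in queries:
        # Serialize with sorted keys so equivalent queries share one cache entry
        check_query(json.dumps(q, sort_keys=True))

info = check_query.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
print(f"hits={info.hits} misses={info.misses} hit_rate={hit_rate:.1%}")
```

Arguments to an `lru_cache`-wrapped function must be hashable, which is why the query is passed as a JSON string rather than as a list of dicts.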

In [None]:
# Batch validation for efficiency
def batch_validate_queries(queries: List[List[Dict]], plottable) -> Dict[str, Any]:
    """Validate multiple queries efficiently."""
    
    start_time = time.time()
    schema = extract_schema_from_plottable(plottable)
    
    results = []
    error_count = 0
    warning_count = 0
    
    for i, query in enumerate(queries):
        issues = validate_query(
            query,
            nodes_df=plottable._nodes,
            edges_df=plottable._edges
        )
        
        errors = [iss for iss in issues if iss.level == "error"]
        warnings = [iss for iss in issues if iss.level == "warning"]
        
        error_count += len(errors)
        warning_count += len(warnings)
        
        results.append({
            "query_index": i,
            "valid": len(errors) == 0,
            "errors": len(errors),
            "warnings": len(warnings)
        })
    
    elapsed = time.time() - start_time
    
    return {
        "total_queries": len(queries),
        "valid_queries": sum(1 for r in results if r["valid"]),
        "total_errors": error_count,
        "total_warnings": warning_count,
        "elapsed_seconds": elapsed,
        # Guard against a zero elapsed time on very fast runs or coarse clocks
        "queries_per_second": len(queries) / max(elapsed, 1e-9),
        "results": results
    }

# Test batch validation
batch_queries = [
    [{"type": "n", "filter": {"entity_type": {"eq": "user"}}}],
    [{"type": "n", "filter": {"invalid_col": {"eq": "value"}}}],  # Invalid
    [{"type": "n"}, {"type": "e_forward", "hops": 2}, {"type": "n"}],
    [{"type": "invalid_op"}],  # Invalid
] * 25  # 100 queries total

batch_results = batch_validate_queries(batch_queries, g)
print("Batch validation results:")
print(json.dumps({k: v for k, v in batch_results.items() if k != "results"}, indent=2))

## Testing Patterns

Unit and integration testing strategies for GFQL validation.

In [None]:
# Example pytest fixtures and tests
example_test_code = '''
import pytest
import pandas as pd
from graphistry.compute.validate import validate_query, extract_schema_from_dataframes

@pytest.fixture
def sample_data():
    """Fixture providing sample graph data."""
    nodes = pd.DataFrame({
        'id': [1, 2, 3],
        'type': ['A', 'B', 'A'],
        'value': [10, 20, 30]
    })
    
    edges = pd.DataFrame({
        'src': [1, 2],
        'dst': [2, 3],
        'weight': [1.0, 2.0]
    })
    
    return nodes, edges

@pytest.fixture
def schema(sample_data):
    """Fixture providing schema."""
    nodes, edges = sample_data
    return extract_schema_from_dataframes(nodes, edges)

class TestGFQLValidation:
    """Test suite for GFQL validation."""
    
    def test_valid_query(self, sample_data):
        """Test validation of valid query."""
        nodes, edges = sample_data
        query = [
            {"type": "n", "filter": {"type": {"eq": "A"}}}
        ]
        
        issues = validate_query(query, nodes, edges)
        assert len(issues) == 0
    
    def test_invalid_column(self, sample_data):
        """Test detection of invalid column."""
        nodes, edges = sample_data
        query = [
            {"type": "n", "filter": {"invalid": {"eq": "X"}}}
        ]
        
        issues = validate_query(query, nodes, edges)
        assert len(issues) > 0
        assert any("not found" in issue.message for issue in issues)
    
    def test_performance(self, sample_data):
        """Test validation performance."""
        import time
        nodes, edges = sample_data
        query = [{"type": "n"}, {"type": "e_forward"}, {"type": "n"}]
        
        start = time.time()
        for _ in range(100):
            validate_query(query, nodes, edges)
        elapsed = time.time() - start
        
        # Should validate 100 queries in under 1 second
        assert elapsed < 1.0
'''

print("Example pytest test suite:")
print(example_test_code)

# Demonstrate test data generation
def generate_test_cases() -> List[Dict[str, Any]]:
    """Generate test cases for validation."""
    return [
        {
            "name": "valid_simple_query",
            "query": [{"type": "n"}],
            "expected_valid": True,
            "expected_errors": 0
        },
        {
            "name": "invalid_operation_type",
            "query": [{"type": "nodes"}],
            "expected_valid": False,
            "expected_errors": 1
        },
        {
            "name": "orphaned_edge",
            "query": [{"type": "e_forward"}],
            "expected_valid": True,  # Valid syntax but has warning
            "expected_warnings": 1
        }
    ]

test_cases = generate_test_cases()
print(f"\nGenerated {len(test_cases)} test cases")
for tc in test_cases:
    print(f"  - {tc['name']}: expects valid={tc['expected_valid']}")
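
The cases from `generate_test_cases()` are only printed above. A table-driven runner that compares each case with a validator's output might look like the following sketch. Both `run_cases` and `toy_validator` are hypothetical: the toy validator returns simple `(level, message)` tuples rather than the library's `ValidationIssue` objects, and its rules are made up for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Issue = Tuple[str, str]  # (level, message); a simplified stand-in for ValidationIssue


def toy_validator(query: List[Dict[str, Any]]) -> List[Issue]:
    """Hypothetical validator that only knows two operation types."""
    known = {"n", "e_forward"}
    issues: List[Issue] = []
    for op in query:
        if op.get("type") not in known:
            issues.append(("error", f"Invalid operation type: {op.get('type')}"))
    if query and query[0].get("type") == "e_forward":
        issues.append(("warning", "Query starts with an edge operation"))
    return issues


def run_cases(cases: List[Dict[str, Any]],
              validate: Callable[[List[Dict[str, Any]]], List[Issue]]) -> List[str]:
    """Return the names of cases whose outcome differs from the expectation."""
    failures = []
    for case in cases:
        issues = validate(case["query"])
        errors = sum(1 for level, _ in issues if level == "error")
        warnings = sum(1 for level, _ in issues if level == "warning")
        ok = (errors == 0) == case["expected_valid"]
        # Counts are only compared when the case specifies them
        ok = ok and case.get("expected_errors", errors) == errors
        ok = ok and case.get("expected_warnings", warnings) == warnings
        if not ok:
            failures.append(case["name"])
    return failures


cases = [
    {"name": "valid_simple_query", "query": [{"type": "n"}],
     "expected_valid": True, "expected_errors": 0},
    {"name": "invalid_operation_type", "query": [{"type": "nodes"}],
     "expected_valid": False, "expected_errors": 1},
    {"name": "orphaned_edge", "query": [{"type": "e_forward"}],
     "expected_valid": True, "expected_warnings": 1},
]

failed = run_cases(cases, toy_validator)
print(f"{len(cases) - len(failed)}/{len(cases)} cases matched expectations")
```

Passing the validator in as a callable keeps the runner independent of any one validation backend, so the same table can be replayed against a real validator through a thin adapter.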

## CI/CD Integration

Integrate GFQL validation into continuous integration pipelines.

In [None]:
# GitHub Actions workflow example
github_actions_yaml = '''
name: GFQL Query Validation

on:
  pull_request:
    paths:
      - 'queries/**/*.json'
      - 'src/**/*.py'

jobs:
  validate-queries:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        pip install graphistry[ai]
        pip install pytest
    
    - name: Validate GFQL queries
      run: |
        python scripts/validate_queries.py queries/
    
    - name: Run validation tests
      run: |
        pytest tests/test_gfql_validation.py -v
    
    - name: Upload validation report
      if: failure()
      uses: actions/upload-artifact@v4
      with:
        name: validation-errors
        path: validation_report.json
'''

print("GitHub Actions workflow:")
print(github_actions_yaml)

# Validation script for CI
validation_script = '''
#!/usr/bin/env python
"""Validate GFQL queries in CI/CD pipeline."""

import sys
import json
import glob
from pathlib import Path
from graphistry.compute.validate import validate_syntax

def validate_query_files(directory):
    """Validate all query files in directory."""
    
    query_files = glob.glob(f"{directory}/**/*.json", recursive=True)
    results = {"total": 0, "passed": 0, "failed": 0, "errors": []}
    
    for file_path in query_files:
        results["total"] += 1
        
        try:
            with open(file_path) as f:
                query = json.load(f)
            
            issues = validate_syntax(query)
            
            if not any(i.level == "error" for i in issues):
                results["passed"] += 1
            else:
                results["failed"] += 1
                results["errors"].append({
                    "file": file_path,
                    "issues": [{
                        "level": i.level,
                        "message": i.message
                    } for i in issues]
                })
        
        except Exception as e:
            results["failed"] += 1
            results["errors"].append({
                "file": file_path,
                "error": str(e)
            })
    
    # Write report
    with open("validation_report.json", "w") as f:
        json.dump(results, f, indent=2)
    
    # Exit with error if any queries failed
    if results["failed"] > 0:
        print(f"❌ {results['failed']} queries failed validation")
        sys.exit(1)
    else:
        print(f"✅ All {results['total']} queries passed validation")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: validate_queries.py <directory>")
        sys.exit(1)
    
    validate_query_files(sys.argv[1])
'''

print("\nValidation script for CI:")
print(validation_script[:800] + "\n...")
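
The CI script writes `validation_report.json`, which the workflow uploads on failure. A short text summary of that report is often easier to read in a job log or pull request comment. The sketch below assumes the report has the same shape as the `results` dict in the script; `summarize_report` and the sample file names are hypothetical.

```python
import json
from typing import Any, Dict


def summarize_report(report: Dict[str, Any], max_files: int = 10) -> str:
    """Render the report dict written by the CI script as a short plain-text summary."""
    lines = [f"GFQL validation: {report['passed']}/{report['total']} files passed"]
    for entry in report["errors"][:max_files]:
        if "error" in entry:
            # The file could not be read or parsed at all
            lines.append(f"- {entry['file']}: {entry['error']}")
        else:
            messages = "; ".join(
                i["message"] for i in entry["issues"] if i["level"] == "error"
            )
            lines.append(f"- {entry['file']}: {messages}")
    hidden = len(report["errors"]) - max_files
    if hidden > 0:
        lines.append(f"... and {hidden} more file(s)")
    return "\n".join(lines)


# Illustrative report; the file names and messages are made up
sample_report = json.loads("""
{
  "total": 3, "passed": 1, "failed": 2,
  "errors": [
    {"file": "queries/a.json",
     "issues": [{"level": "error", "message": "Invalid operation type"}]},
    {"file": "queries/b.json", "error": "Expecting value: line 1 column 1 (char 0)"}
  ]
}
""")

print(summarize_report(sample_report))
```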

In [None]:
# Pre-commit hook example
pre_commit_config = '''
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: validate-gfql
        name: Validate GFQL Queries
        entry: python scripts/validate_gfql_hook.py
        language: system
        files: '\\.(json|py)$'
        pass_filenames: true
'''

# Pre-commit hook script
def create_pre_commit_hook():
    """Create pre-commit hook for GFQL validation."""
    
    # Raw string so that \n and the regex escapes reach the generated script unchanged
    hook_code = r'''
#!/usr/bin/env python
"""Pre-commit hook for GFQL validation."""

import ast
import sys
import json
import re
from graphistry.compute.validate import validate_syntax

def extract_gfql_from_python(content):
    """Extract GFQL queries from Python code."""
    # Simple pattern matching for demonstration
    pattern = r'query\s*=\s*(\[.*?\])'
    matches = re.findall(pattern, content, re.DOTALL)
    
    queries = []
    for match in matches:
        try:
            # literal_eval parses literals only, so matched text is never executed
            query = ast.literal_eval(match)
            queries.append(query)
        except (ValueError, TypeError, SyntaxError):
            # Not a pure literal (or a truncated match); skip it
            pass
    
    return queries

def validate_file(filepath):
    """Validate GFQL in a file."""
    
    if filepath.endswith('.json'):
        with open(filepath) as f:
            query = json.load(f)
        queries = [query]
    
    elif filepath.endswith('.py'):
        with open(filepath) as f:
            content = f.read()
        queries = extract_gfql_from_python(content)
    
    else:
        return True
    
    # Validate all queries
    for query in queries:
        issues = validate_syntax(query)
        errors = [i for i in issues if i.level == "error"]
        
        if errors:
            print(f"\n❌ GFQL validation failed in {filepath}:")
            for error in errors:
                print(f"  - {error.message}")
            return False
    
    return True

if __name__ == "__main__":
    failed_files = []
    
    for filepath in sys.argv[1:]:
        if not validate_file(filepath):
            failed_files.append(filepath)
    
    if failed_files:
        print(f"\n{len(failed_files)} file(s) have GFQL validation errors")
        sys.exit(1)
'''
    
    return hook_code

print("Pre-commit configuration:")
print(pre_commit_config)
print("\nPre-commit hook script preview:")
print(create_pre_commit_hook()[:600] + "\n...")
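
The hook above finds queries with a regular expression, and the non-greedy pattern stops at the first `]`, so nested lists are cut short. Walking the syntax tree is one alternative. The sketch below uses only the standard `ast` module; `extract_query_literals` and the convention that queries are assigned to a variable named `query` are assumptions for illustration.

```python
import ast
from typing import Any, List


def extract_query_literals(source: str, var_name: str = "query") -> List[Any]:
    """Collect list literals assigned to `var_name`, without executing any code."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if var_name not in targets or not isinstance(node.value, ast.List):
            continue
        try:
            # literal_eval accepts constants and containers only, never calls
            found.append(ast.literal_eval(node.value))
        except ValueError:
            pass  # the list contains non-literal expressions; skip it
    return found


sample_source = '''
query = [{"type": "n", "tags": ["a", "b"]}, {"type": "e_forward"}]
other = [1, 2, 3]
query = [build_op()]
'''

print(extract_query_literals(sample_source))
```

`ast.parse` raises `SyntaxError` on files that are not valid Python, so a hook built on this would need to decide whether to skip or fail such files.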

## Monitoring & Logging

Production monitoring patterns for GFQL validation.

In [None]:
import logging
from datetime import datetime

class ValidationMonitor:
    """Monitor GFQL validation in production."""
    
    def __init__(self, logger=None):
        self.logger = logger or logging.getLogger(__name__)
        self.metrics = {
            "total_validations": 0,
            "validation_errors": 0,
            "validation_warnings": 0,
            "validation_time_ms": [],
            "error_types": {},
            "query_patterns": {}
        }
    
    def log_validation(self, query: List[Dict], issues: List[ValidationIssue], 
                      elapsed_ms: float, context: Optional[Dict[str, Any]] = None):
        """Log validation event with metrics."""
        
        self.metrics["total_validations"] += 1
        self.metrics["validation_time_ms"].append(elapsed_ms)
        
        # Count errors and warnings
        errors = [i for i in issues if i.level == "error"]
        warnings = [i for i in issues if i.level == "warning"]
        
        if errors:
            self.metrics["validation_errors"] += 1
        if warnings:
            self.metrics["validation_warnings"] += 1
        
        # Track error types
        for issue in issues:
            error_type = self._categorize_error(issue)
            self.metrics["error_types"][error_type] = \
                self.metrics["error_types"].get(error_type, 0) + 1
        
        # Track query patterns
        pattern = self._extract_pattern(query)
        self.metrics["query_patterns"][pattern] = \
            self.metrics["query_patterns"].get(pattern, 0) + 1
        
        # Log event
        log_data = {
            "timestamp": datetime.utcnow().isoformat(),
            "validation_time_ms": elapsed_ms,
            "errors": len(errors),
            "warnings": len(warnings),
            "query_operations": len(query),
            "context": context or {}
        }
        
        if errors:
            self.logger.error("GFQL validation failed", extra=log_data)
        elif warnings:
            self.logger.warning("GFQL validation warnings", extra=log_data)
        else:
            self.logger.info("GFQL validation passed", extra=log_data)
    
    def _categorize_error(self, issue: ValidationIssue) -> str:
        """Categorize error for metrics."""
        if "Invalid operation type" in issue.message:
            return "invalid_operation"
        elif "Column" in issue.message and "not found" in issue.message:
            return "column_not_found"
        elif "Invalid filter" in issue.message:
            return "invalid_filter"
        elif "Invalid predicate" in issue.message:
            return "invalid_predicate"
        else:
            return "other"
    
    def _extract_pattern(self, query: List[Dict]) -> str:
        """Extract query pattern for tracking."""
        pattern_parts = []
        for op in query:
            op_type = op.get("type", "unknown")
            has_filter = "filter" in op
            pattern_parts.append(f"{op_type}{'[F]' if has_filter else ''}")
        return "-".join(pattern_parts)
    
    def get_metrics_summary(self) -> Dict[str, Any]:
        """Get metrics summary."""
        avg_time = sum(self.metrics["validation_time_ms"]) / \
                  max(1, len(self.metrics["validation_time_ms"]))
        
        return {
            "total_validations": self.metrics["total_validations"],
            "error_rate": self.metrics["validation_errors"] / 
                         max(1, self.metrics["total_validations"]),
            "warning_rate": self.metrics["validation_warnings"] / 
                           max(1, self.metrics["total_validations"]),
            "avg_validation_time_ms": avg_time,
            "top_error_types": sorted(
                self.metrics["error_types"].items(),
                key=lambda x: x[1],
                reverse=True
            )[:5],
            "top_query_patterns": sorted(
                self.metrics["query_patterns"].items(),
                key=lambda x: x[1],
                reverse=True
            )[:5]
        }

# Test monitoring
monitor = ValidationMonitor()

# Simulate production validations
test_scenarios = [
    ([{"type": "n"}, {"type": "e_forward"}, {"type": "n"}], []),  # Valid
    ([{"type": "node"}], [ValidationIssue("error", "Invalid operation type")]),  # Error
    ([{"type": "n", "filter": {"missing": {"eq": 1}}}], 
     [ValidationIssue("error", "Column 'missing' not found")]),  # Schema error
] * 10

for query, issues in test_scenarios:
    start = time.time()
    # Simulate validation time
    time.sleep(0.001)
    elapsed_ms = (time.time() - start) * 1000
    
    monitor.log_validation(query, issues, elapsed_ms, 
                          context={"user_id": "test123", "api_version": "v1"})

print("Monitoring Metrics Summary:")
print(json.dumps(monitor.get_metrics_summary(), indent=2))
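
The fields passed through `extra=log_data` are attached to the log record, but the standard `logging.Formatter` only prints them if the format string references them. A JSON formatter is one way to make them visible to a log aggregator. In the sketch below, `JsonFormatter`, its field list, and the logger name are assumptions chosen to mirror the keys of `log_data`.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Serialize a log record, including selected `extra` fields, as one JSON line."""

    # Field names mirror the keys of `log_data` in ValidationMonitor
    EXTRA_FIELDS = ("validation_time_ms", "errors", "warnings",
                    "query_operations", "context")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        # default=str keeps unusual values (e.g. datetimes) from breaking serialization
        return json.dumps(payload, default=str)


# Log to an in-memory stream so the output can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("gfql.validation.sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.error(
    "GFQL validation failed",
    extra={"errors": 1, "warnings": 0, "validation_time_ms": 1.2,
           "query_operations": 1, "context": {"api_version": "v1"}},
)

print(stream.getvalue().strip())
```

A logger configured this way could be passed to `ValidationMonitor(logger=...)` so that each validation event is emitted as one structured line.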

## API Integration

REST API endpoint validation patterns.

In [None]:
# Flask API example
flask_api_code = '''
from flask import Flask, request, jsonify
from graphistry.compute.validate import validate_syntax, validate_schema
import pandas as pd

app = Flask(__name__)

# Cache for schemas
schema_cache = {}

@app.route('/api/v1/validate', methods=['POST'])
def validate_gfql():
    """Validate GFQL query endpoint."""
    
    try:
        # silent=True returns None instead of raising on a malformed JSON body
        data = request.get_json(silent=True) or {}
        
        # Extract query and optional dataset ID
        query = data.get('query')
        dataset_id = data.get('dataset_id')
        # Named check_schema so it does not shadow the imported validate_schema()
        check_schema = data.get('validate_schema', False)
        
        if not query:
            return jsonify({
                'error': 'Missing query parameter'
            }), 400
        
        # Syntax validation
        syntax_issues = validate_syntax(query)
        
        # Schema validation if requested and a schema is cached for the dataset
        schema_issues = []
        schema_checked = False
        if check_schema and dataset_id:
            schema = schema_cache.get(dataset_id)
            if schema:
                schema_issues = validate_schema(query, schema)
                schema_checked = True
        
        # Combine issues
        all_issues = syntax_issues + schema_issues
        
        # Format response
        response = {
            'valid': not any(i.level == 'error' for i in all_issues),
            'issues': [{
                'level': issue.level,
                'message': issue.message,
                'operation_index': issue.operation_index,
                'field': issue.field,
                'suggestion': issue.suggestion
            } for issue in all_issues],
            'metadata': {
                'query_operations': len(query),
                'syntax_checked': True,
                'schema_checked': schema_checked
            }
        }
        
        return jsonify(response), 200
    
    except Exception as e:
        return jsonify({
            'error': f'Validation error: {str(e)}'
        }), 500

@app.route('/api/v1/schema/<dataset_id>', methods=['PUT'])
def update_schema(dataset_id):
    """Update cached schema for dataset."""
    
    try:
        data = request.get_json()
        
        # Extract node and edge schemas
        node_columns = data.get('node_columns', {})
        edge_columns = data.get('edge_columns', {})
        
        # Create and cache schema
        from graphistry.compute.validate import Schema
        schema = Schema(node_columns=node_columns, edge_columns=edge_columns)
        schema_cache[dataset_id] = schema
        
        return jsonify({
            'message': f'Schema updated for dataset {dataset_id}',
            'node_columns': list(node_columns.keys()),
            'edge_columns': list(edge_columns.keys())
        }), 200
    
    except Exception as e:
        return jsonify({
            'error': f'Schema update error: {str(e)}'
        }), 500

if __name__ == '__main__':
    app.run(debug=False, port=5000)
'''

print("Flask API Example:")
print(flask_api_code[:1500] + "\n...")

# Example API client
def create_api_client_example():
    """Create example API client code."""
    
    return '''
import requests
import json

class GFQLValidationClient:
    """Client for GFQL validation API."""
    
    def __init__(self, base_url):
        self.base_url = base_url
    
    def validate_query(self, query, dataset_id=None, validate_schema=False):
        """Validate GFQL query via API."""
        
        response = requests.post(
            f"{self.base_url}/api/v1/validate",
            json={
                "query": query,
                "dataset_id": dataset_id,
                "validate_schema": validate_schema
            }
        )
        
        response.raise_for_status()
        return response.json()
    
    def update_schema(self, dataset_id, node_columns, edge_columns):
        """Update schema for dataset."""
        
        response = requests.put(
            f"{self.base_url}/api/v1/schema/{dataset_id}",
            json={
                "node_columns": node_columns,
                "edge_columns": edge_columns
            }
        )
        
        response.raise_for_status()
        return response.json()

# Usage example
client = GFQLValidationClient("http://localhost:5000")

# Validate query
result = client.validate_query(
    query=[{"type": "n"}, {"type": "e_forward"}, {"type": "n"}],
    dataset_id="prod_graph",
    validate_schema=True
)

if result["valid"]:
    print("✅ Query is valid")
else:
    print("❌ Query has issues:")
    for issue in result["issues"]:
        print(f"  - {issue['level']}: {issue['message']}")
'''

print("\nAPI Client Example:")
print(create_api_client_example())
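
The endpoint above accepts any truthy `query` value, so a string or a number would reach the validator. Checking the request shape first allows a clear 400 response instead of an internal error. The following is a framework-independent sketch; `parse_validate_request` and its error messages are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def parse_validate_request(body: Any) -> Tuple[Optional[Dict[str, Any]], Optional[str]]:
    """Return (parsed_request, None) on success or (None, error_message) on failure."""
    if not isinstance(body, dict):
        return None, "Request body must be a JSON object"
    query = body.get("query")
    if not isinstance(query, list) or not query:
        return None, "'query' must be a non-empty list of operations"
    if not all(isinstance(op, dict) for op in query):
        return None, "Each operation in 'query' must be a JSON object"
    dataset_id = body.get("dataset_id")
    if dataset_id is not None and not isinstance(dataset_id, str):
        return None, "'dataset_id' must be a string"
    return {
        "query": query,
        "dataset_id": dataset_id,
        "validate_schema": bool(body.get("validate_schema", False)),
    }, None


parsed, error = parse_validate_request(
    {"query": [{"type": "n"}], "dataset_id": "prod_graph"}
)
print(parsed, error)

parsed_bad, error_bad = parse_validate_request({"query": "n"})
print(parsed_bad, error_bad)
```

In a request handler, a non-`None` error message would map to a 400 response, which leaves 500 for genuinely unexpected failures.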

## Security Considerations

Security best practices for production GFQL validation.

In [None]:
class SecureValidator:
    """Secure GFQL validator with rate limiting and sanitization."""
    
    def __init__(self, max_query_size=1000, max_operations=50, 
                 rate_limit_per_minute=100):
        self.max_query_size = max_query_size
        self.max_operations = max_operations
        self.rate_limit_per_minute = rate_limit_per_minute
        self._request_times = {}
    
    def validate_secure(self, query: List[Dict], user_id: str) -> Dict[str, Any]:
        """Validate with security checks."""
        
        # Check rate limit
        if not self._check_rate_limit(user_id):
            return {
                "error": "Rate limit exceeded",
                "retry_after_seconds": 60
            }
        
        # Check query size
        query_str = json.dumps(query)
        if len(query_str) > self.max_query_size:
            return {
                "error": f"Query too large (max {self.max_query_size} chars)"
            }
        
        # Check operation count
        if len(query) > self.max_operations:
            return {
                "error": f"Too many operations (max {self.max_operations})"
            }
        
        # Sanitize query
        sanitized_query = self._sanitize_query(query)
        
        # Validate
        try:
            issues = validate_syntax(sanitized_query)
            return {
                "valid": not any(i.level == "error" for i in issues),
                "issues": [{"level": i.level, "message": i.message} 
                          for i in issues]
            }
        except Exception:
            # Don't expose internal error details to the caller
            return {"error": "Validation failed"}
    
    def _check_rate_limit(self, user_id: str) -> bool:
        """Check if user is within rate limit."""
        current_time = time.time()
        
        if user_id not in self._request_times:
            self._request_times[user_id] = []
        
        # Remove old requests
        self._request_times[user_id] = [
            t for t in self._request_times[user_id] 
            if current_time - t < 60
        ]
        
        # Check limit
        if len(self._request_times[user_id]) >= self.rate_limit_per_minute:
            return False
        
        self._request_times[user_id].append(current_time)
        return True
    
    def _sanitize_query(self, query: List[Dict]) -> List[Dict]:
        """Sanitize query to prevent injection."""
        import copy
        sanitized = copy.deepcopy(query)
        
        # Drop keys associated with JavaScript prototype pollution, in case the
        # query is later forwarded to a JavaScript client or service
        dangerous_keys = ['__proto__', 'constructor', 'prototype']
        
        def clean_dict(d):
            if isinstance(d, dict):
                return {k: clean_dict(v) for k, v in d.items() 
                       if k not in dangerous_keys}
            elif isinstance(d, list):
                return [clean_dict(item) for item in d]
            else:
                return d
        
        return clean_dict(sanitized)

# Test secure validation
secure_validator = SecureValidator()

# Test rate limiting
user_id = "test_user_123"
for i in range(5):
    result = secure_validator.validate_secure(
        [{"type": "n"}], 
        user_id
    )
    print(f"Request {i+1}: {'Valid' if result.get('valid') else 'Error'}")

# Test query size limit
large_query = [{"type": "n", "filter": {"x" * 100: {"eq": "y" * 100}}} for _ in range(20)]
result = secure_validator.validate_secure(large_query, "user2")
print(f"\nLarge query result: {result}")
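
The size and operation-count limits above do not directly bound how deeply a filter is nested. A depth check is a cheap additional guard. The sketch below is iterative so that the check itself cannot hit the recursion limit; `nesting_depth`, `within_depth_limit`, and the default of 8 levels are assumptions.

```python
from typing import Any


def nesting_depth(value: Any) -> int:
    """Depth of nested dicts/lists; scalars have depth 0."""
    stack = [(value, 0)]
    deepest = 0
    while stack:  # iterative, so hostile input cannot exhaust the call stack
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = list(current.values())
        elif isinstance(current, list):
            children = current
        else:
            deepest = max(deepest, depth)
            continue
        deepest = max(deepest, depth + 1)
        stack.extend((child, depth + 1) for child in children)
    return deepest


def within_depth_limit(query: Any, max_depth: int = 8) -> bool:
    """True if the query nests no deeper than max_depth containers."""
    return nesting_depth(query) <= max_depth


normal_query = [{"type": "n", "filter": {"risk_score": {"gte": 80}}}]

# Build a filter nested 50 levels deep
deep_filter: Any = 1
for _ in range(50):
    deep_filter = {"x": deep_filter}
deep_query = [{"type": "n", "filter": deep_filter}]

print(nesting_depth(normal_query), within_depth_limit(normal_query))
print(nesting_depth(deep_query), within_depth_limit(deep_query))
```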

## Summary & Best Practices

### Production Checklist
- ✅ **Plottable Integration**: Use `extract_schema_from_plottable()` for seamless validation
- ✅ **Caching**: Implement schema and query result caching
- ✅ **Batch Processing**: Validate multiple queries efficiently
- ✅ **Testing**: Comprehensive test coverage with fixtures
- ✅ **CI/CD**: Automated validation in pipelines
- ✅ **Monitoring**: Track metrics and error patterns
- ✅ **API Design**: RESTful endpoints with proper error handling
- ✅ **Security**: Rate limiting, size limits, and sanitization

### Performance Guidelines
1. Cache schemas with appropriate TTL
2. Use batch validation for multiple queries
3. Implement connection pooling for API servers
4. Monitor p95 validation times
5. Set reasonable query size limits
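
For guideline 4, note that an average can hide a slow tail. A minimal sketch of computing percentiles from recorded timings with the standard library follows; `latency_percentile` is a hypothetical helper and the sample timings are made up.

```python
import statistics
from typing import List


def latency_percentile(samples_ms: List[float], percentile: int = 95) -> float:
    """Percentile of recorded latencies via statistics.quantiles (needs >= 2 samples)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index p-1 is the p-th percentile
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[percentile - 1]


# Illustrative timings: mostly fast, with a slow tail
timings_ms = [1.0] * 90 + [5.0] * 8 + [50.0, 80.0]

print(f"mean={statistics.mean(timings_ms):.2f}ms")
print(f"p50={latency_percentile(timings_ms, 50):.2f}ms")
print(f"p95={latency_percentile(timings_ms, 95):.2f}ms")
```

The same helper could be applied to a list of timings such as `validation_time_ms` in `ValidationMonitor`, although a long-running service would likely want a bounded window of samples rather than an ever-growing list.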

### Monitoring Metrics
- Validation success/failure rates
- Average validation time
- Common error patterns
- Cache hit rates
- API response times

### Next Steps
1. Implement production validation service
2. Set up monitoring dashboards
3. Create runbooks for common issues
4. Establish SLOs for validation performance
5. Build automated alerting

### Resources
- [GFQL Documentation](https://docs.graphistry.com/gfql/)
- [PyGraphistry API Reference](https://docs.graphistry.com/api/)
- [Production Deployment Guide](https://docs.graphistry.com/deployment/)