# GFQL Validation Fundamentals

Learn the basics of validating GFQL queries to catch errors early and build robust graph applications.

## What You'll Learn
- How to validate GFQL query syntax
- Understanding validation error messages
- Basic schema validation with DataFrames
- Common syntax errors and how to fix them

## Prerequisites
- Basic Python knowledge
- PyGraphistry installed (`pip install "graphistry[ai]"`; the quotes keep shells such as zsh from expanding the brackets)

## Setup and Imports

First, let's import the necessary modules and check our PyGraphistry version.

In [None]:
# Core imports
import pandas as pd
import graphistry

# Validation imports
from graphistry.compute.gfql.validate import (
    validate_syntax,
    validate_schema,
    validate_query,
    extract_schema_from_dataframes
)

# Check version
print(f"PyGraphistry version: {graphistry.__version__}")
print("\nValidation functions available:")
print("- validate_syntax(): Check query syntax")
print("- validate_schema(): Check query against data schema")
print("- validate_query(): Combined syntax + schema validation")

## Basic Syntax Validation

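In this notebook, a GFQL query is written as a list of operations. Each operation is a dict whose `type` is a node step (`n`) or an edge step (`e_forward`, `e_reverse`, `e`), optionally with a `filter` that maps a column name to an `{operator: value}` predicate. `validate_syntax()` checks this structure using the query alone, without any data. Let's start there.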

In [None]:
# Example 1: Valid query syntax
valid_query = [
    {"type": "n"},
    {"type": "e_forward", "hops": 1},
    {"type": "n", "filter": {"type": {"eq": "customer"}}}
]

# Validate syntax
issues = validate_syntax(valid_query)

print("Query:", valid_query)
print(f"\nValidation issues: {len(issues)}")
if not issues:
    print("[OK] Query syntax is valid!")
else:
    for issue in issues:
        print(f"- {issue.level}: {issue.message}")
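To make the structural rule concrete, here is a minimal, hypothetical sketch of the kind of operation-type check a syntax validator performs. It is not PyGraphistry's `validate_syntax()` implementation; `check_operation_types` is an illustrative helper, and the allowed type names are taken from this notebook's Quick Reference.

```python
# Hypothetical sketch: NOT PyGraphistry's validate_syntax() implementation.
# Allowed operation types as listed in this notebook's Quick Reference.
ALLOWED_TYPES = {"n", "e_forward", "e_reverse", "e"}


def check_operation_types(query):
    """Return (operation_index, message) pairs for structurally invalid operations."""
    if not isinstance(query, list):
        return [(None, "query must be a list of operations")]
    problems = []
    for i, op in enumerate(query):
        if not isinstance(op, dict):
            problems.append((i, f"operation must be a dict, got {type(op).__name__}"))
        elif "type" not in op:
            problems.append((i, "operation is missing the 'type' key"))
        elif op["type"] not in ALLOWED_TYPES:
            problems.append((i, f"unknown operation type {op['type']!r}"))
    return problems


print(check_operation_types([{"type": "n"}, {"type": "e_forward", "hops": 1}]))
print(check_operation_types([{"type": "node"}, {"type": "e_forward"}]))
```

Returning the operation index alongside each message mirrors the `operation_index` attribute the real issues expose, which makes it easy to point back at the offending step.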

## Common Syntax Errors

Let's look at common syntax errors and how validation catches them.

In [None]:
# Example 2: Invalid operation type
invalid_query_1 = [
    {"type": "node"},  # Should be "n"
    {"type": "e_forward"}
]

issues = validate_syntax(invalid_query_1)
print("Query with invalid operation type:")
print(invalid_query_1)
print(f"\nValidation found {len(issues)} issue(s):")
for issue in issues:
    print(f"- {issue.level}: {issue.message}")
    if issue.operation_index is not None:
        print(f"  At operation {issue.operation_index}: {invalid_query_1[issue.operation_index]}")

In [None]:
# Example 3: Invalid filter structure
invalid_query_2 = [
    {"type": "n", "filter": {"name": "Alice"}}  # Missing operator
]

issues = validate_syntax(invalid_query_2)
print("Query with invalid filter:")
print(invalid_query_2)
print(f"\nValidation found {len(issues)} issue(s):")
for issue in issues:
    print(f"- {issue.level}: {issue.message}")
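The problem here is the filter's shape: each column must map to an `{operator: value}` dict, not to a bare value. Below is a hypothetical sketch of that shape check. `check_filter_shape` and `KNOWN_OPERATORS` are illustrative names; the operator set is an assumption built from the operators used in this notebook plus a few common comparison names, not GFQL's actual predicate list, and the sketch also assumes one operator per column.

```python
# Hypothetical sketch: checks only the *shape* of a filter, not column names or types.
# Assumed operator set for illustration; GFQL's real predicate list differs.
KNOWN_OPERATORS = {"eq", "gt", "gte", "lt", "lte", "contains"}


def check_filter_shape(op_index, filter_dict):
    """Return (operation_index, message) pairs for malformed filter entries."""
    problems = []
    for column, predicate in filter_dict.items():
        if not isinstance(predicate, dict) or len(predicate) != 1:
            problems.append((op_index, f"filter on {column!r} must be a single {{operator: value}} dict"))
            continue
        (operator,) = predicate  # unpack the single key
        if operator not in KNOWN_OPERATORS:
            problems.append((op_index, f"unknown operator {operator!r} on {column!r}"))
    return problems


print(check_filter_shape(0, {"name": "Alice"}))          # bare value: flagged
print(check_filter_shape(0, {"name": {"eq": "Alice"}}))  # well-formed
```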

In [None]:
# Example 4: Semantic warning - orphaned edges
warning_query = [
    {"type": "e_forward", "hops": 1}  # Edge without starting node
]

issues = validate_syntax(warning_query)
print("Query with semantic warning:")
print(warning_query)
print(f"\nValidation found {len(issues)} issue(s):")
for issue in issues:
    print(f"- {issue.level.upper()}: {issue.message}")
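This check is semantic rather than structural: the query is well-formed, but an edge step with no preceding node step has no explicit starting set of nodes. The following is a minimal, hypothetical sketch of such a warning rule. The trigger used here (the first operation is an edge type) is an assumption about what produces the warning above, and `warn_on_leading_edge` is an illustrative helper.

```python
# Hypothetical sketch of a warning-level (semantic) rule; not PyGraphistry's code.
EDGE_TYPES = {"e_forward", "e_reverse", "e"}


def warn_on_leading_edge(query):
    """Return warning messages when a query begins with an edge step."""
    warnings = []
    if query and query[0].get("type") in EDGE_TYPES:
        warnings.append("query starts with an edge operation; consider a leading {'type': 'n'}")
    return warnings


print(warn_on_leading_edge([{"type": "e_forward", "hops": 1}]))
print(warn_on_leading_edge([{"type": "n"}, {"type": "e_forward", "hops": 1}]))
```

Because the query can still run, this rule returns warnings rather than errors, matching the two severity levels the notebook uses.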

## Understanding Validation Issues

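Each issue carries a severity `level` (`error` or `warning`) plus context for fixing it: a `message`, the `operation_index` of the offending operation, the `field` involved, and, when available, a `suggestion`.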

In [None]:
# Let's examine a validation issue in detail
from graphistry.compute.gfql.validate import ValidationIssue

# Create a query with multiple issues
complex_invalid = [
    {"type": "n"},
    {"type": "edge"},  # Invalid type
    {"type": "n", "filter": {"score": {"greater": 5}}}  # Invalid operator
]

issues = validate_syntax(complex_invalid)
print(f"Found {len(issues)} validation issues:\n")

for i, issue in enumerate(issues, start=1):
    print(f"Issue {i}:")
    print(f"  Level: {issue.level}")
    print(f"  Message: {issue.message}")
    print(f"  Operation index: {issue.operation_index}")
    print(f"  Field: {issue.field}")
    if issue.suggestion:
        print(f"  Suggestion: {issue.suggestion}")
    print()
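The loop above reads five attributes from each issue. If you collect issues in your own code (for example, to log them or return them from a service), a small container with the same fields is convenient. `IssueRecord` is a hypothetical stand-in, not graphistry's `ValidationIssue` class; only the field names come from the attributes used above, and the sample records are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueRecord:
    """Hypothetical stand-in carrying the same fields the notebook prints."""
    level: str                             # "error" or "warning"
    message: str
    operation_index: Optional[int] = None  # position of the offending operation
    field: Optional[str] = None            # column or key involved, if any
    suggestion: Optional[str] = None       # optional hint for fixing the issue

    def describe(self) -> str:
        """Render the issue as one line, e.g. 'ERROR [op 1, type]: ...'."""
        location = ""
        if self.operation_index is not None:
            location = f" [op {self.operation_index}"
            location += f", {self.field}]" if self.field else "]"
        hint = f" -> {self.suggestion}" if self.suggestion else ""
        return f"{self.level.upper()}{location}: {self.message}{hint}"


# Made-up sample records for illustration only.
sample = [
    IssueRecord("error", "unknown operation type 'edge'", operation_index=1, field="type"),
    IssueRecord("error", "unknown operator 'greater'", operation_index=2, field="score",
                suggestion="check the operator name"),
    IssueRecord("warning", "query starts with an edge operation", operation_index=0),
]
for record in sample:
    print(record.describe())
print(Counter(record.level for record in sample))  # tally issues by severity
```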

## Simple Schema Validation

Now let's validate queries against actual data schemas.

In [None]:
# Create sample data
nodes_df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'type': ['customer', 'customer', 'product', 'product', 'customer'],
    'score': [100, 85, 95, 120, 110]
})

edges_df = pd.DataFrame({
    'src': [1, 2, 3, 4, 5],
    'dst': [3, 4, 1, 2, 3],
    'weight': [1.0, 2.5, 0.8, 1.2, 3.0],
    'timestamp': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'])
})

print("Nodes DataFrame:")
print(nodes_df)
print("\nEdges DataFrame:")
print(edges_df)

In [None]:
# Extract schema from DataFrames
schema = extract_schema_from_dataframes(nodes_df, edges_df)

print("Extracted Schema:")
print(f"\nNode columns: {list(schema.node_columns.keys())}")
print(f"Edge columns: {list(schema.edge_columns.keys())}")

# Show column types
print("\nNode column types:")
for col, dtype in schema.node_columns.items():
    print(f"  {col}: {dtype}")
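Conceptually, schema extraction maps each column name to a type. The sketch below shows one way to do that with pandas, collapsing dtypes into coarse kinds that are convenient for predicate checks. `coarse_kind`, `coarse_schema`, and the kind names are hypothetical; `extract_schema_from_dataframes()` may represent types differently (the loop above prints whatever it stores).

```python
import pandas as pd


def coarse_kind(series: pd.Series) -> str:
    """Collapse a column's dtype into a coarse kind using NumPy dtype kind codes."""
    kind = series.dtype.kind
    if kind in "iuf":  # signed int, unsigned int, float
        return "numeric"
    if kind == "M":    # datetime64, timezone-naive or aware
        return "datetime"
    if kind == "b":
        return "boolean"
    return "string"    # everything else, including object/string columns, falls through here


def coarse_schema(df: pd.DataFrame) -> dict:
    """Map every column name to its coarse kind."""
    return {col: coarse_kind(df[col]) for col in df.columns}


nodes = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"], "score": [100, 85]})
edges = pd.DataFrame({"src": [1], "dst": [2], "timestamp": pd.to_datetime(["2024-01-01"])})
print(coarse_schema(nodes))
print(coarse_schema(edges))
```

Treating every non-numeric, non-date, non-boolean column as `"string"` is a deliberate simplification; categorical or timedelta columns would need their own kinds in a fuller version.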

In [None]:
# Valid query using existing columns
schema_valid_query = [
    {"type": "n", "filter": {"type": {"eq": "customer"}}},
    {"type": "e_forward"},
    {"type": "n", "filter": {"score": {"gte": 100}}}
]

# Validate against schema
issues = validate_schema(schema_valid_query, schema)

print("Query using valid columns:")
print(schema_valid_query)
print(f"\nSchema validation issues: {len(issues)}")
if not issues:
    print("[OK] Query is valid for this schema!")

## Column Not Found Errors

A common schema error is referencing a column that does not exist in the data.

In [None]:
# Query with non-existent column
invalid_column_query = [
    {"type": "n", "filter": {"category": {"eq": "VIP"}}}  # 'category' doesn't exist
]

issues = validate_schema(invalid_column_query, schema)

print("Query with non-existent column:")
print(invalid_column_query)
print(f"\nValidation found {len(issues)} issue(s):")
for issue in issues:
    print(f"\n- {issue.level}: {issue.message}")
    if issue.suggestion:
        print(f"  Suggestion: {issue.suggestion}")
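A column check needs to know which table each step reads from: node steps (`n`) filter on node columns, while edge steps filter on edge columns. The sketch below is a hypothetical illustration of that lookup. It uses Python's standard `difflib.get_close_matches` to propose near-miss column names; the validator's own suggestions may be produced differently, and `check_filter_columns` is an illustrative helper.

```python
from difflib import get_close_matches

EDGE_TYPES = {"e_forward", "e_reverse", "e"}


def check_filter_columns(query, node_columns, edge_columns):
    """Return (operation_index, message) pairs for filters on unknown columns."""
    problems = []
    for i, op in enumerate(query):
        # Edge steps filter edge columns; everything else is treated as a node step.
        columns = edge_columns if op.get("type") in EDGE_TYPES else node_columns
        for column in op.get("filter", {}):
            if column not in columns:
                close = get_close_matches(column, list(columns), n=1)
                hint = f"; did you mean {close[0]!r}?" if close else ""
                problems.append((i, f"column {column!r} not found{hint}"))
    return problems


node_columns = ["id", "name", "type", "score"]
edge_columns = ["src", "dst", "weight", "timestamp"]
print(check_filter_columns([{"type": "n", "filter": {"category": {"eq": "VIP"}}}], node_columns, edge_columns))
print(check_filter_columns([{"type": "n", "filter": {"scor": {"gte": 100}}}], node_columns, edge_columns))
```

A close-match hint only appears for plausible typos such as `scor`; a genuinely different name like `category` gets a plain "not found" message.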

## Type Mismatch Errors

Validation also catches predicates that do not fit a column's type, such as a string operator applied to a numeric column.

In [None]:
# String predicate on numeric column
type_mismatch_query = [
    {"type": "n", "filter": {"score": {"contains": "100"}}}  # 'contains' is for strings
]

issues = validate_schema(type_mismatch_query, schema)

print("Query with type mismatch:")
print(type_mismatch_query)
print(f"\nValidation found {len(issues)} issue(s):")
for issue in issues:
    print(f"\n- {issue.level}: {issue.message}")
    if issue.suggestion:
        print(f"  Suggestion: {issue.suggestion}")
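Type checking pairs each operator with the column kinds it makes sense for. Here is a hypothetical sketch, assuming a coarse `{column: kind}` schema and a small compatibility table. The table is an assumption for illustration (GFQL's actual rules may differ), chosen to agree with this notebook's examples: `contains` for text, ordering comparisons for numbers and dates, `eq` for anything.

```python
# Hypothetical compatibility table: operator -> column kinds it accepts (an assumption).
OPERATOR_KINDS = {
    "eq": {"numeric", "string", "datetime", "boolean"},
    "gt": {"numeric", "datetime"},
    "gte": {"numeric", "datetime"},
    "lt": {"numeric", "datetime"},
    "lte": {"numeric", "datetime"},
    "contains": {"string"},
}


def check_predicate_types(filter_dict, column_kinds):
    """Return messages for operators applied to columns of an incompatible kind."""
    problems = []
    for column, predicate in filter_dict.items():
        kind = column_kinds.get(column)
        if kind is None:
            continue  # unknown columns are left to a separate existence check
        for operator in predicate:
            allowed = OPERATOR_KINDS.get(operator)
            # Unknown operators are skipped here; only known ones are type-checked.
            if allowed is not None and kind not in allowed:
                problems.append(f"{operator!r} does not apply to {kind} column {column!r}")
    return problems


node_kinds = {"name": "string", "type": "string", "score": "numeric"}
print(check_predicate_types({"score": {"contains": "100"}}, node_kinds))
print(check_predicate_types({"score": {"gte": 100}}, node_kinds))
```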

## Complete Example: Building a Query Step by Step

Let's build a query incrementally, validating at each step.

In [None]:
# Step 1: Start with finding customers
query_v1 = [
    {"type": "n", "filter": {"type": {"eq": "customer"}}}
]

issues = validate_query(query_v1, nodes_df=nodes_df, edges_df=edges_df)
print("Step 1 - Find customers:")
print(f"Issues: {len(issues)}")
print("[OK] Valid!" if not issues else "[X] Has issues")

# Step 2: Add edge traversal
query_v2 = [
    {"type": "n", "filter": {"type": {"eq": "customer"}}},
    {"type": "e_forward", "hops": 1}
]

issues = validate_query(query_v2, nodes_df=nodes_df, edges_df=edges_df)
print("\nStep 2 - Add edge traversal:")
print(f"Issues: {len(issues)}")
print("[OK] Valid!" if not issues else "[X] Has issues")

# Step 3: Complete with destination filter
query_v3 = [
    {"type": "n", "filter": {"type": {"eq": "customer"}}},
    {"type": "e_forward", "hops": 1},
    {"type": "n", "filter": {"type": {"eq": "product"}}}
]

issues = validate_query(query_v3, nodes_df=nodes_df, edges_df=edges_df)
print("\nStep 3 - Add destination filter:")
print(f"Issues: {len(issues)}")
print("[OK] Valid!" if not issues else "[X] Has issues")
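The same step-by-step pattern can be automated: validate each prefix of a query and report the first step that introduces a problem. `first_failing_step` is a hypothetical helper. It accepts any validator that takes a query and returns a list of issues, so you could pass a wrapper such as `lambda q: validate_query(q, nodes_df=nodes_df, edges_df=edges_df)`. The demo uses a tiny stand-in validator so the snippet runs on its own.

```python
from typing import Callable, List, Optional


def first_failing_step(query: list, validator: Callable[[list], list]) -> Optional[int]:
    """Validate query[:1], query[:2], ... and return the index of the first operation
    whose addition produces issues, or None if every prefix passes."""
    for end in range(1, len(query) + 1):
        if validator(query[:end]):
            return end - 1
    return None


# Stand-in validator for the demo: flags operations whose type is not recognised.
def toy_validator(query: list) -> List[str]:
    allowed = {"n", "e_forward", "e_reverse", "e"}
    return [f"bad type at {i}" for i, op in enumerate(query) if op.get("type") not in allowed]


steps = [
    {"type": "n", "filter": {"type": {"eq": "customer"}}},
    {"type": "e_forward", "hops": 1},
    {"type": "node", "filter": {"type": {"eq": "product"}}},  # typo: should be "n"
]
print(first_failing_step(steps, toy_validator))
```

One caveat: if a validator warns about incomplete queries, an intermediate prefix may be flagged even though the full query is fine, so you may want to consider only `error`-level issues when scanning prefixes.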

print("\nFinal query finds: customers with a one-hop outgoing edge to a product")

## Quick Reference

### Error Levels
- **error**: Query will fail if executed
- **warning**: Query may work but has potential issues
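These levels suggest a simple gating policy: refuse to run a query that has any `error`, but allow it (while surfacing the messages) when there are only warnings. Here is a minimal sketch, assuming issues expose a `.level` string as in the examples above. `should_execute` is a hypothetical helper, and the policy is one reasonable choice, not a library default.

```python
from types import SimpleNamespace


def should_execute(issues) -> bool:
    """Allow execution only when no issue has level 'error'; warnings do not block."""
    return not any(issue.level == "error" for issue in issues)


# Made-up issues for illustration; real ones come from validate_syntax()/validate_query().
warning_only = [SimpleNamespace(level="warning", message="query starts with an edge operation")]
with_error = warning_only + [SimpleNamespace(level="error", message="unknown operation type 'node'")]
print(should_execute([]), should_execute(warning_only), should_execute(with_error))
```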

### Common Fixes
1. **Invalid operation type**: Use `n`, `e_forward`, `e_reverse`, or `e`
2. **Missing operator**: Add a comparison operator such as `eq`, `gte`, or `contains`
3. **Column not found**: Check available columns with `schema.node_columns`
4. **Type mismatch**: Use numeric operators for numbers, string operators for text

## Summary & Next Steps

You've learned the fundamentals of GFQL validation:
- [OK] Syntax validation catches structural errors
- [OK] Schema validation ensures columns exist and types match
- [OK] Combined validation provides comprehensive checking
- [OK] Clear error messages help fix issues quickly

### Next Steps
1. **Advanced Patterns**: Learn complex queries and multi-hop validation
2. **LLM Integration**: Use validation for AI-generated queries
3. **Production Use**: Implement validation in your applications

### Resources
- [GFQL Documentation](https://docs.graphistry.com/gfql/)
- [GFQL Language Specification](https://docs.graphistry.com/gfql/spec/language/)