# 3.3 GraphRAG Retrieval Patterns - Ungraph

This notebook covers the basic GraphRAG retrieval patterns: how to search the graph with strategies that take advantage of its structure.

## Objectives

1. **Understand GraphRAG patterns** - What they are and when to use them
2. **Basic Retriever** - Simple full-text search
3. **Metadata Filtering** - Filter by document metadata
4. **Parent-Child Retriever** - Hierarchical search with context
5. **Compare patterns** - When to use each one

## Patterns Covered

1. ✅ **Basic Retriever** - Simple, fast full-text search
2. ✅ **Metadata Filtering** - Search with metadata filters
3. ✅ **Parent-Child Retriever** - Search that includes hierarchical context

**Note:** The advanced patterns (Community Summary, Graph-Enhanced Vector Search) are covered in the notebook `3.4 Advanced GraphRAG Patterns`.

**References:**
- [GraphRAG Pattern Catalog](https://graphrag.com/reference/)
- [Search API](../../docs/api/search-patterns.md)


In [26]:
def add_src_to_path(path_folder: str):
    '''
    Add the "path_folder" directory (and its parent) to sys.path
    so that imports work from both notebooks and scripts.
    '''
    import sys
    from pathlib import Path

    base_path = Path().resolve()
    for parent in [base_path] + list(base_path.parents):
        candidate = parent / path_folder
        if candidate.exists():
            parent_dir = candidate.parent
            if str(parent_dir) not in sys.path:
                sys.path.insert(0, str(parent_dir))
                print(f"Path Folder parent added: {parent_dir}")
            if str(candidate) not in sys.path:
                sys.path.append(str(candidate))
                print(f"Path Folder {path_folder} added: {candidate}")
            return
    print(f"Folder '{path_folder}' not found in the directory hierarchy")

# Add the required folders to sys.path
add_src_to_path(path_folder="src")
add_src_to_path(path_folder="src/utils")
add_src_to_path(path_folder="src/data")


In [27]:
# Import required libraries
import sys
from pathlib import Path
from typing import List, Dict, Any

# Import handlers
from src.utils.handlers import find_in_project

# Import ungraph
try:
    import ungraph
    print("✅ Ungraph imported as an installed package")
except ImportError:
    import src
    ungraph = src
    print("✅ Ungraph imported from src/ (development mode)")

# Import GraphRAGSearchPatterns to use it directly
from infrastructure.services.graphrag_search_patterns import GraphRAGSearchPatterns

print(f"📦 Ungraph version: {ungraph.__version__}")
print("✅ GraphRAGSearchPatterns imported successfully")


✅ Ungraph imported from src/ (development mode)
📦 Ungraph version: 0.1.0
✅ GraphRAGSearchPatterns imported successfully


## Part 1: Introduction to GraphRAG Retrieval Patterns

GraphRAG patterns use the structure of the graph to improve search. Unlike pure vector search, they take the relationships between nodes into account to return richer context.

### What Are GraphRAG Patterns?

GraphRAG patterns are search strategies that:
- **Use the graph structure** - They follow relationships between nodes
- **Provide context** - They include related information
- **Improve precision** - They filter and expand results according to the structure

### Basic Patterns Available

- ✅ **Basic Retriever** - Simple, fast full-text search
- ✅ **Metadata Filtering** - Filters by document properties
- ✅ **Parent-Child Retriever** - Includes hierarchical context (Page → Chunks)

**Note:** The advanced patterns are covered in `3.4 Advanced GraphRAG Patterns`.
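
To make the difference between the three basic patterns concrete, here is a toy in-memory sketch. It does not use Neo4j or Ungraph: the `CHUNKS` list, the substring matching, and the helper names are all assumptions made for illustration.

```python
# Toy corpus: each chunk knows its page and source file (hypothetical data).
CHUNKS = [
    {"chunk_id": "c1", "page": 1, "filename": "a.md", "text": "graph databases store nodes"},
    {"chunk_id": "c2", "page": 1, "filename": "a.md", "text": "relationships connect nodes"},
    {"chunk_id": "c3", "page": 1, "filename": "b.md", "text": "vector search ranks chunks"},
    {"chunk_id": "c4", "page": 2, "filename": "b.md", "text": "graph structure adds context"},
]


def basic(term):
    """Basic Retriever: return every chunk whose text mentions the term."""
    return [c for c in CHUNKS if term in c["text"]]


def metadata_filtering(term, **filters):
    """Metadata Filtering: same match, restricted by chunk properties."""
    return [c for c in basic(term)
            if all(c.get(key) == value for key, value in filters.items())]


def parent_child(term):
    """Parent-Child: expand each hit to every chunk that shares its page."""
    expanded = []
    for hit in basic(term):
        siblings = [c["chunk_id"] for c in CHUNKS
                    if (c["filename"], c["page"]) == (hit["filename"], hit["page"])]
        expanded.append({"hit": hit["chunk_id"], "page_chunks": siblings})
    return expanded


print([c["chunk_id"] for c in basic("graph")])
print([c["chunk_id"] for c in metadata_filtering("graph", filename="b.md")])
print(parent_child("nodes"))
```

The filter narrows the basic result set, while the parent-child expansion widens each hit with the rest of its page.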


In [28]:
# List of the available GraphRAG search patterns
print("📋 GRAPHRAG SEARCH PATTERNS")
print("=" * 60)

search_patterns = {
    "basic": {
        "nombre": "Basic Retriever",
        "descripción": "Simple full-text search using a full-text index",
        "cuando_usar": "Keyword searches, simple queries",
        "velocidad": "⚡⚡⚡",
        "precisión": "⭐⭐"
    },
    "pattern_matching": {
        "nombre": "Pattern Matching",
        "descripción": "Searches using specific Cypher patterns",
        "cuando_usar": "Searches that target a specific graph structure",
        "velocidad": "⚡⚡",
        "precisión": "⭐⭐⭐"
    },
    "metadata_filtering": {
        "nombre": "Metadata Filtering",
        "descripción": "Full-text search with metadata filters",
        "cuando_usar": "Search only specific documents, filter by date/author",
        "velocidad": "⚡⚡⚡",
        "precisión": "⭐⭐⭐"
    },
    "parent_child": {
        "nombre": "Parent-Child Retriever",
        "descripción": "Searches parent nodes and expands to all their children",
        "cuando_usar": "When you need the full context of a section",
        "velocidad": "⚡⚡",
        "precisión": "⭐⭐⭐"
    },
    "community": {
        "nombre": "Community Summary (Global)",
        "descripción": "Finds communities of related nodes",
        "cuando_usar": "When you need broad context on a topic",
        "velocidad": "⚡",
        "precisión": "⭐⭐"
    },
    "local": {
        "nombre": "Local Retriever",
        "descripción": "Similar to Community Summary, but for small communities",
        "cuando_usar": "Exploring specific, narrowly focused knowledge",
        "velocidad": "⚡⚡",
        "precisión": "⭐⭐⭐"
    },
    "graph_enhanced_vector": {
        "nombre": "Graph-Enhanced Vector Search",
        "descripción": "Combines vector search with the graph structure",
        "cuando_usar": "Advanced searches that combine semantics and structure",
        "velocidad": "⚡",
        "precisión": "⭐⭐⭐⭐"
    }
}

for pattern_id, info in search_patterns.items():
    print(f"\n{info['nombre']} ({pattern_id}):")
    print(f"  📝 {info['descripción']}")
    print(f"  🎯 When to use: {info['cuando_usar']}")
    print(f"  ⚡ Speed: {info['velocidad']} | 🎯 Precision: {info['precisión']}")


📋 GRAPHRAG SEARCH PATTERNS

Basic Retriever (basic):
  📝 Simple full-text search using a full-text index
  🎯 When to use: Keyword searches, simple queries
  ⚡ Speed: ⚡⚡⚡ | 🎯 Precision: ⭐⭐

Pattern Matching (pattern_matching):
  📝 Searches using specific Cypher patterns
  🎯 When to use: Searches that target a specific graph structure
  ⚡ Speed: ⚡⚡ | 🎯 Precision: ⭐⭐⭐

Metadata Filtering (metadata_filtering):
  📝 Full-text search with metadata filters
  🎯 When to use: Search only specific documents, filter by date/author
  ⚡ Speed: ⚡⚡⚡ | 🎯 Precision: ⭐⭐⭐

Parent-Child Retriever (parent_child):
  📝 Searches parent nodes and expands to all their children
  🎯 When to use: When you need the full context of a section
  ⚡ Speed: ⚡⚡ | 🎯 Precision: ⭐⭐⭐

Community Summary (Global) (community):
  📝 Finds communities of rel

In [29]:
# Example Cypher queries for each pattern
print("📝 EXAMPLE CYPHER QUERIES")
print("=" * 80)

# 1. Basic Retriever
basic_query = """
CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node, score
RETURN node.page_content as content, 
       score,
       node.chunk_id as chunk_id,
       node.chunk_id_consecutive as chunk_id_consecutive
ORDER BY score DESC
LIMIT $limit
"""

print("\n1. BASIC RETRIEVER:")
print("-" * 80)
print(basic_query)
print("\n✅ Uses parameters: $query_text, $limit")
print("✅ Valid Neo4j syntax")


📝 EXAMPLE CYPHER QUERIES

1. BASIC RETRIEVER:
--------------------------------------------------------------------------------

CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node, score
RETURN node.page_content as content, 
       score,
       node.chunk_id as chunk_id,
       node.chunk_id_consecutive as chunk_id_consecutive
ORDER BY score DESC
LIMIT $limit


✅ Uses parameters: $query_text, $limit
✅ Valid Neo4j syntax
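
Neo4j full-text indexes are backed by Lucene, so the string bound to `$query_text` is parsed as a Lucene query, and characters such as `:` or `(` have special meaning. When the text comes straight from a user, escaping it first can avoid parse errors. The helper below is a minimal sketch: `escape_lucene` is a hypothetical name and not part of Ungraph or the Neo4j driver.

```python
import re

# Characters with special meaning in the Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(text: str) -> str:
    """Backslash-escape Lucene operators so the text is matched literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


# The escaped text is still passed as a bound parameter, never interpolated.
params = {"query_text": escape_lucene("error: disk (full)"), "limit": 5}
print(params["query_text"])
```

This sketch only handles punctuation. The uppercase words `AND`, `OR`, and `NOT` are also Lucene operators and would need separate handling, for example lowercasing them, if they can appear in user input.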


In [None]:
# Generate queries with the implemented patterns (EXECUTABLE)
print("📝 GENERATING CYPHER QUERIES WITH THE IMPLEMENTED PATTERNS")
print("=" * 80)

# 1. Basic Retriever
print("\n1. BASIC RETRIEVER:")
print("-" * 80)
query_basic, params_basic = GraphRAGSearchPatterns.basic_retriever(
    "test query",
    limit=5
)
print("Generated query:")
print(query_basic)
print(f"\nParameters: {list(params_basic.keys())}")
print("✅ Query generated with safe, bound parameters")

# 2. Metadata Filtering
print("\n2. METADATA FILTERING:")
print("-" * 80)
query_meta, params_meta = GraphRAGSearchPatterns.metadata_filtering(
    "test query",
    metadata_filters={"filename": "test.md", "page_number": 1},
    limit=5
)
print("Generated query:")
print(query_meta)
print(f"\nParameters: {list(params_meta.keys())}")
print("✅ Query generated with safe, bound parameters")

# 3. Parent-Child Retriever
print("\n3. PARENT-CHILD RETRIEVER:")
print("-" * 80)
query_parent, params_parent = GraphRAGSearchPatterns.parent_child_retriever(
    "test query",
    parent_label="Page",
    child_label="Chunk",
    relationship_type="HAS_CHUNK",
    limit=5
)
print("Generated query:")
print(query_parent)
print(f"\nParameters: {list(params_parent.keys())}")
print("✅ Query generated with safe, bound parameters")


📝 GENERATING CYPHER QUERIES WITH THE IMPLEMENTED PATTERNS

1. BASIC RETRIEVER:
--------------------------------------------------------------------------------
Generated query:

        CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
        YIELD node, score
        RETURN node.page_content as content,
               score,
               node.chunk_id as chunk_id,
               node.chunk_id_consecutive as chunk_id_consecutive
        ORDER BY score DESC
        LIMIT $limit
        

Parameters: ['query_text', 'limit']
✅ Query generated with safe, bound parameters

2. METADATA FILTERING:
--------------------------------------------------------------------------------
Generated query:

        CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
        YIELD node, score
        WHERE node.filename = $filename AND node.page_number = $page_number
        RETURN node.page_content as content,
               score,
               node.chunk_id as chunk_
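
Cypher parameters can stand in for values but not for property names, so a generator such as `metadata_filtering` has to splice the filter keys into the query text. The sketch below shows one way to do that safely, by accepting only keys from an allowlist. It is an illustration, not the Ungraph implementation: `ALLOWED_FILTER_KEYS` and `build_metadata_where` are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Hypothetical allowlist of node properties that may be used as filters.
ALLOWED_FILTER_KEYS = {"filename", "page_number"}


def build_metadata_where(filters: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return a WHERE clause and its parameters for the given filters."""
    unknown = set(filters) - ALLOWED_FILTER_KEYS
    if unknown:
        raise ValueError(f"Unsupported filter keys: {sorted(unknown)}")
    if not filters:
        return "", {}
    # Keys come from the allowlist; values are always bound as parameters.
    conditions = [f"node.{key} = ${key}" for key in filters]
    return "WHERE " + " AND ".join(conditions), dict(filters)


clause, params = build_metadata_where({"filename": "test.md", "page_number": 1})
print(clause)
print(params)
```

For the two filters used in the cell, this produces the same `WHERE` line that appears in the generated query, while a key outside the allowlist is rejected before it can reach the query text.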

## Part 3: Using the Patterns with Real Data

Now let's run the patterns against real data. This requires documents that have already been ingested into Neo4j.


In [31]:
# Try search_with_pattern against real data (EXECUTABLE)
print("🔍 TESTING search_with_pattern WITH REAL DATA")
print("=" * 80)

# ⚠️ Requires a configured Neo4j instance with ingested data
# This code exercises the 3 implemented patterns if data is available

try:
    # 1. Basic Retriever (via search_with_pattern)
    print("\n1. Testing Basic Retriever...")
    print("-" * 80)
    basic_results = ungraph.search_with_pattern(
        "test",
        pattern_type="basic",
        limit=5
    )
    print(f"   ✅ Basic Retriever: {len(basic_results)} results")

    if len(basic_results) > 0:
        print("\n   First result:")
        print(f"   - Score: {basic_results[0].score:.3f}")
        print(f"   - Chunk ID: {basic_results[0].chunk_id}")
        print(f"   - Content preview: {basic_results[0].content[:80]}...")

        # Look up the filename of the first result in Neo4j
        # so it can be used as a metadata filter
        from src.utils.graph_operations import graph_session
        driver = graph_session()
        try:
            with driver.session() as session:
                result = session.run(
                    "MATCH (c:Chunk {chunk_id: $chunk_id})-[:HAS_CHUNK]-(p:Page)-[:CONTAINS]-(f:File) "
                    "RETURN f.filename as filename LIMIT 1",
                    chunk_id=basic_results[0].chunk_id
                )
                record = result.single()
                test_filename = record["filename"] if record else None
        finally:
            # Close the driver even if the lookup fails
            driver.close()

        if test_filename:
            # 2. Metadata Filtering
            print("\n2. Testing Metadata Filtering...")
            print("-" * 80)
            metadata_results = ungraph.search_with_pattern(
                "test",
                pattern_type="metadata_filtering",
                metadata_filters={"filename": test_filename},
                limit=5
            )
            print(f"   ✅ Metadata Filtering: {len(metadata_results)} results")
            print(f"   📄 Filtering by: filename='{test_filename}'")

            if len(metadata_results) > 0:
                print("\n   First result:")
                print(f"   - Score: {metadata_results[0].score:.3f}")
                print(f"   - Chunk ID: {metadata_results[0].chunk_id}")
                print(f"   - Content preview: {metadata_results[0].content[:80]}...")

        # 3. Parent-Child Retriever
        print("\n3. Testing Parent-Child Retriever...")
        print("-" * 80)
        parent_results = ungraph.search_with_pattern(
            "test",
            pattern_type="parent_child",
            parent_label="Page",
            child_label="Chunk",
            relationship_type="HAS_CHUNK",
            limit=5
        )
        print(f"   ✅ Parent-Child Retriever: {len(parent_results)} results")

        if len(parent_results) > 0:
            print("\n   First result:")
            print(f"   - Score: {parent_results[0].score:.3f}")
            print(f"   - Chunk ID: {parent_results[0].chunk_id}")
            print(f"   - Content preview: {parent_results[0].content[:80]}...")
            # Parent-child returns a special structure that includes the children
            if hasattr(parent_results[0], 'next_chunk_content') and parent_results[0].next_chunk_content:
                print(f"   - Child context included: {len(parent_results[0].next_chunk_content)} characters")

        # Comparative summary
        print("\n" + "=" * 80)
        print("📊 COMPARATIVE SUMMARY")
        print("=" * 80)
        print(f"Basic Retriever:        {len(basic_results)} results")
        if test_filename:
            print(f"Metadata Filtering:     {len(metadata_results)} results (filtered by '{test_filename}')")
        print(f"Parent-Child Retriever: {len(parent_results)} results")

    else:
        print("\n⚠️  No results found. Check that:")
        print("   1. Documents have been ingested with ungraph.ingest_document()")
        print("   2. The documents contain the word 'test'")
        print("   3. Neo4j is configured correctly")

except Exception as e:
    print(f"\n❌ Error: {e}")
    import traceback
    traceback.print_exc()
    print("\n💡 Check that:")
    print("   1. Neo4j is configured and running")
    print("   2. Documents have been ingested first with ungraph.ingest_document()")
    print("   3. The configuration in .env or via ungraph.configure() is correct")
    print("   4. The 'chunk_content' index exists (ingest_document creates it automatically)")



🔍 TESTING search_with_pattern WITH REAL DATA

1. Testing Basic Retriever...
--------------------------------------------------------------------------------
   ✅ Basic Retriever: 0 results

⚠️  No results found. Check that:
   1. Documents have been ingested with ungraph.ingest_document()
   2. The documents contain the word 'test'
   3. Neo4j is configured correctly


## Part 4: Comparing the Implemented Patterns

Compare the results returned by the different search patterns.


In [32]:
# Compare results across the implemented patterns
print("📊 COMPARISON OF IMPLEMENTED PATTERNS")
print("=" * 80)

# This code runs the 3 implemented patterns with the same query
query = "test"  # Replace with your own test query

try:
    print(f"\n🔍 Test query: '{query}'")
    print("-" * 80)

    # 1. Basic Retriever
    print("\n1. Basic Retriever:")
    basic_results = ungraph.search_with_pattern(
        query,
        pattern_type="basic",
        limit=5
    )
    print(f"   Results: {len(basic_results)}")
    if basic_results:
        avg_basic = sum(r.score for r in basic_results) / len(basic_results)
        print(f"   Average score: {avg_basic:.3f}")
        print(f"   Max score: {max(r.score for r in basic_results):.3f}")
        print(f"   Min score: {min(r.score for r in basic_results):.3f}")

    # 2. Metadata Filtering (needs a filename)
    # Try to obtain a filename from the basic results
    test_filename = None
    if basic_results:
        from src.utils.graph_operations import graph_session
        driver = graph_session()
        try:
            with driver.session() as session:
                result = session.run(
                    "MATCH (c:Chunk {chunk_id: $chunk_id})-[:HAS_CHUNK]-(p:Page)-[:CONTAINS]-(f:File) "
                    "RETURN f.filename as filename LIMIT 1",
                    chunk_id=basic_results[0].chunk_id
                )
                record = result.single()
                if record:
                    test_filename = record["filename"]
        finally:
            # Close the driver even if the lookup fails
            driver.close()

    if test_filename:
        print(f"\n2. Metadata Filtering (filtered by '{test_filename}'):")
        metadata_results = ungraph.search_with_pattern(
            query,
            pattern_type="metadata_filtering",
            metadata_filters={"filename": test_filename},
            limit=5
        )
        print(f"   Results: {len(metadata_results)}")
        if metadata_results:
            avg_metadata = sum(r.score for r in metadata_results) / len(metadata_results)
            print(f"   Average score: {avg_metadata:.3f}")
            print(f"   Max score: {max(r.score for r in metadata_results):.3f}")
            print(f"   Min score: {min(r.score for r in metadata_results):.3f}")
    else:
        print("\n2. Metadata Filtering:")
        print("   ⚠️  Could not obtain a filename to filter by")

    # 3. Parent-Child Retriever
    print("\n3. Parent-Child Retriever:")
    parent_results = ungraph.search_with_pattern(
        query,
        pattern_type="parent_child",
        parent_label="Page",
        child_label="Chunk",
        relationship_type="HAS_CHUNK",
        limit=5
    )
    print(f"   Results: {len(parent_results)}")
    if parent_results:
        avg_parent = sum(r.score for r in parent_results) / len(parent_results)
        print(f"   Average score: {avg_parent:.3f}")
        print(f"   Max score: {max(r.score for r in parent_results):.3f}")
        print(f"   Min score: {min(r.score for r in parent_results):.3f}")

    # Comparative summary
    print("\n" + "=" * 80)
    print("📊 COMPARATIVE SUMMARY")
    print("=" * 80)
    print(f"{'Pattern':<25} {'Results':<12} {'Average Score':<15}")
    print("-" * 80)
    if basic_results:
        print(f"{'Basic Retriever':<25} {len(basic_results):<12} {avg_basic:.3f}")
    if test_filename and metadata_results:
        print(f"{'Metadata Filtering':<25} {len(metadata_results):<12} {avg_metadata:.3f}")
    if parent_results:
        print(f"{'Parent-Child Retriever':<25} {len(parent_results):<12} {avg_parent:.3f}")

    print("\n💡 Interpretation:")
    print("   - Basic Retriever: Fastest, general results")
    print("   - Metadata Filtering: More precise when you filter by a specific document")
    print("   - Parent-Child: Includes hierarchical context (Page + Chunks)")

except Exception as e:
    print(f"\n❌ Error: {e}")
    print("\n💡 Check that:")
    print("   1. Neo4j is configured and running")
    print("   2. Documents have been ingested first")
    print("   3. The query has results in your database")


📊 COMPARISON OF IMPLEMENTED PATTERNS

🔍 Test query: 'test'
--------------------------------------------------------------------------------

1. Basic Retriever:
   Results: 0

2. Metadata Filtering:
   ⚠️  Could not obtain a filename to filter by

3. Parent-Child Retriever:


Error in search_with_pattern (parent_child): {code: Neo.ClientError.Statement.SyntaxError} {message: In a WITH/RETURN with DISTINCT or an aggregation, it is not possible to access variables declared before the WITH/RETURN: parent_score (line 16, column 18 (offset: 574))
"        ORDER BY parent_score DESC"
                  ^}
neo4j.exceptions.GqlError: {gql_status: 42N44} {gql_status_description: error: syntax error or access rule violation - inaccessible variable. It is not possible to access the variable `parent_score` declared before the RETURN clause when using `DISTINCT` or an aggregation.} {message: 42N44: It is not possible to access the variable `parent_score` declared before the RETURN clause when using `DISTINCT` or an aggregation.} {diagnostic_record: {'_classification': 'CLIENT_ERROR', '_position': {'offset': 574, 'column': 18, 'line': 16}, 'OPERATION': '', 'OPERATION_CODE': '0', 'CURRENT_SCHEMA': '/'}} {raw_classification: CLIENT_ERROR}

The above exception was the direct


❌ Error: {code: Neo.ClientError.Statement.SyntaxError} {message: In a WITH/RETURN with DISTINCT or an aggregation, it is not possible to access variables declared before the WITH/RETURN: parent_score (line 16, column 18 (offset: 574))
"        ORDER BY parent_score DESC"
                  ^}

💡 Check that:
   1. Neo4j is configured and running
   2. Documents have been ingested first
   3. The query has results in your database
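
The average, maximum, and minimum scores are computed three times in the cell above. A small helper keeps the comparison readable and copes with empty result lists. This is a sketch: `score_stats` and the `Hit` stand-in class are hypothetical, and the only assumption about real results is that they expose a numeric `score` attribute, as the cell already relies on.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, Optional


@dataclass
class Hit:
    """Stand-in for a search result; only the score matters here."""
    score: float


def score_stats(results: Iterable) -> Optional[Dict[str, float]]:
    """Summarise result scores, or return None when there are no results."""
    scores = [r.score for r in results]
    if not scores:
        return None
    return {"count": len(scores), "avg": mean(scores),
            "max": max(scores), "min": min(scores)}


print(score_stats([Hit(1.0), Hit(2.0), Hit(3.0)]))
print(score_stats([]))
```

Returning `None` for an empty list makes the "no results" case explicit, so the summary table can skip that row instead of relying on variables that may never have been assigned.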


In [33]:
# 3. Parent-Child Retriever
# The children are aggregated in a WITH clause before RETURN: Neo4j rejects
# ORDER BY on parent_score after a RETURN that aggregates (see the error in Part 4).
parent_child_query = """
CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node as parent_node, score as parent_score

OPTIONAL MATCH (parent_node:Page)-[:HAS_CHUNK]->(child_node:Chunk)

WITH parent_node, parent_score,
     collect(DISTINCT {
         content: child_node.page_content,
         chunk_id: child_node.chunk_id
     }) as children

RETURN {
    parent_content: parent_node.page_content,
    parent_score: parent_score,
    parent_chunk_id: parent_node.chunk_id,
    children: children
} as result
ORDER BY result.parent_score DESC
LIMIT $limit
"""

print("\n3. PARENT-CHILD RETRIEVER:")
print("-" * 80)
print(parent_child_query)
print("\n✅ Expands to related child nodes")
print("✅ Returns a hierarchical structure")



3. PARENT-CHILD RETRIEVER:
--------------------------------------------------------------------------------

CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node as parent_node, score as parent_score

OPTIONAL MATCH (parent_node:Page)-[:HAS_CHUNK]->(child_node:Chunk)

WITH parent_node, parent_score,
     collect(DISTINCT {
         content: child_node.page_content,
         chunk_id: child_node.chunk_id
     }) as children

RETURN {
    parent_content: parent_node.page_content,
    parent_score: parent_score,
    parent_chunk_id: parent_node.chunk_id,
    children: children
} as result
ORDER BY result.parent_score DESC
LIMIT $limit


✅ Expands to related child nodes
✅ Returns a hierarchical structure
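
Each row returned by the query above is a map with the parent's content and a list of `children`. Before that can be passed to a language model it usually has to be flattened into text. A minimal sketch, assuming rows shaped like the `result` map in the query; `build_context` and the sample rows are hypothetical. Children whose content is null are skipped, because `OPTIONAL MATCH` yields null child properties when a parent has no children.

```python
from typing import Any, Dict, List


def build_context(rows: List[Dict[str, Any]], separator: str = "\n") -> List[str]:
    """Join each parent's content with the content of its children."""
    contexts = []
    for row in rows:
        parts = [row.get("parent_content")]
        parts += [child.get("content") for child in row.get("children", [])]
        # Drop nulls (e.g. the placeholder child produced by OPTIONAL MATCH).
        contexts.append(separator.join(p for p in parts if p))
    return contexts


rows = [
    {"parent_content": "Page 1 text", "parent_score": 2.1,
     "children": [{"content": "chunk A", "chunk_id": "a"},
                  {"content": "chunk B", "chunk_id": "b"}]},
    {"parent_content": "Page 2 text", "parent_score": 1.4,
     "children": [{"content": None, "chunk_id": None}]},
]
print(build_context(rows))
```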


In [34]:
# 4. Community Summary Retriever
community_query = """
CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node as central_node, score

MATCH path = (central_node)-[*1..2]-(community_node:Chunk)
WHERE community_node <> central_node

WITH central_node, score,
     collect(DISTINCT community_node) as community,
     count(DISTINCT community_node) as community_size

WHERE community_size >= $community_threshold

RETURN {
    central_content: central_node.page_content,
    central_score: score,
    central_chunk_id: central_node.chunk_id,
    central_chunk_id_consecutive: central_node.chunk_id_consecutive,
    community_size: community_size,
    community_summary: reduce(
        summary = "",
        node IN community |
        summary + " " + coalesce(node.page_content, "")
    )
} as result
ORDER BY score DESC, community_size DESC
LIMIT $limit
"""

print("\n4. COMMUNITY SUMMARY RETRIEVER:")
print("-" * 80)
print(community_query)
print("\n✅ Finds communities of related nodes")
print("✅ Generates a summary of the community")
print("✅ Uses parameters: $query_text, $community_threshold, $limit")



4. COMMUNITY SUMMARY RETRIEVER:
--------------------------------------------------------------------------------

CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node as central_node, score

MATCH path = (central_node)-[*1..2]-(community_node:Chunk)
WHERE community_node <> central_node

WITH central_node, score,
     collect(DISTINCT community_node) as community,
     count(DISTINCT community_node) as community_size

WHERE community_size >= $community_threshold

RETURN {
    central_content: central_node.page_content,
    central_score: score,
    central_chunk_id: central_node.chunk_id,
    central_chunk_id_consecutive: central_node.chunk_id_consecutive,
    community_size: community_size,
    community_summary: reduce(
        summary = "",
        node IN community |
        summary + " " + coalesce(node.page_content, "")
    )
} as result
ORDER BY score DESC, community_size DESC
LIMIT $limit


✅ Finds communities of related nodes
✅ Generates a summary
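
The pattern `(central_node)-[*1..2]-(community_node:Chunk)` collects every chunk within two hops of the matched node, and `$community_threshold` then discards matches whose neighbourhood is too small. The same idea on a plain adjacency dictionary, as a sketch: the `GRAPH` data and the `community_of` helper are hypothetical, and the traversal ignores labels and relationship types.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical undirected graph: node id -> neighbours.
GRAPH: Dict[str, List[str]] = {
    "c1": ["c2"],
    "c2": ["c1", "c3"],
    "c3": ["c2", "c4"],
    "c4": ["c3"],
    "c5": [],
}


def community_of(start: str, max_hops: int = 2) -> Set[str]:
    """Return the nodes reachable from start in 1..max_hops hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbour in GRAPH.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen - {start}  # mirrors WHERE community_node <> central_node


community_threshold = 2
for central in ("c1", "c5"):
    community = community_of(central)
    kept = len(community) >= community_threshold
    print(central, sorted(community), "kept" if kept else "dropped")
```

Raising the threshold or lowering the hop count shrinks the set of central nodes that survive, which is the main tuning knob of this pattern.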

In [35]:
# 5. Graph-Enhanced Vector Search
graph_enhanced_query = """
CALL db.index.vector.queryNodes('chunk_embeddings', toInteger($limit), $query_vector)
YIELD node as vec_node, score as vec_score

OPTIONAL MATCH path = (vec_node)-[:NEXT_CHUNK|HAS_CHUNK]*1..2-(related_node:Chunk)
WHERE related_node IS NOT NULL

WITH vec_node, vec_score,
     collect(DISTINCT related_node) as related_nodes,
     count(DISTINCT related_node) as related_count

CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node as text_node, score as text_score
WHERE text_node = vec_node

RETURN {
    content: vec_node.page_content,
    vector_score: vec_score,
    text_score: text_score,
    combined_score: (vec_score * 0.6 + text_score * 0.4),
    chunk_id: vec_node.chunk_id,
    chunk_id_consecutive: vec_node.chunk_id_consecutive,
    related_count: related_count
} as result
ORDER BY result.combined_score DESC
LIMIT $limit
"""

print("\n5. GRAPH-ENHANCED VECTOR SEARCH:")
print("-" * 80)
print(graph_enhanced_query)
print("\n✅ Combines vector search with the graph structure")
print("✅ Considers related nodes to enrich the context")
print("✅ Uses parameters: $query_text, $query_vector, $limit")



5. GRAPH-ENHANCED VECTOR SEARCH:
--------------------------------------------------------------------------------

CALL db.index.vector.queryNodes('chunk_embeddings', toInteger($limit), $query_vector)
YIELD node as vec_node, score as vec_score

OPTIONAL MATCH path = (vec_node)-[:NEXT_CHUNK|HAS_CHUNK]*1..2-(related_node:Chunk)
WHERE related_node IS NOT NULL

WITH vec_node, vec_score,
     collect(DISTINCT related_node) as related_nodes,
     count(DISTINCT related_node) as related_count

CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node as text_node, score as text_score
WHERE text_node = vec_node

RETURN {
    content: vec_node.page_content,
    vector_score: vec_score,
    text_score: text_score,
    combined_score: (vec_score * 0.6 + text_score * 0.4),
    chunk_id: vec_node.chunk_id,
    chunk_id_consecutive: vec_node.chunk_id_consecutive,
    related_count: related_count
} as result
ORDER BY result.combined_score DESC
LIMIT $limit


✅ Combines ve
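
The `combined_score` in the query adds a vector similarity score to a full-text score with fixed weights of 0.6 and 0.4. The two scores are on different scales (Lucene relevance scores are not bounded by 1), so one option is to rescale each list before mixing them. The sketch below applies min-max normalisation in Python. It is a swapped-in variation for illustration, not what the query does, and `min_max` and `combine` are hypothetical helpers.

```python
from typing import Dict, List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all 1.0."""
    low, high = min(scores), max(scores)
    if high == low:
        return [1.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


def combine(vector: Dict[str, float], text: Dict[str, float],
            w_vector: float = 0.6, w_text: float = 0.4) -> Dict[str, float]:
    """Weighted sum of normalised scores for chunks present in both lists."""
    shared = sorted(set(vector) & set(text))  # mirrors WHERE text_node = vec_node
    if not shared:
        return {}
    v_norm = min_max([vector[c] for c in shared])
    t_norm = min_max([text[c] for c in shared])
    return {c: w_vector * v + w_text * t
            for c, v, t in zip(shared, v_norm, t_norm)}


vector_scores = {"c1": 0.92, "c2": 0.80, "c3": 0.75}
text_scores = {"c1": 1.5, "c2": 6.0}
for chunk_id, score in combine(vector_scores, text_scores).items():
    print(chunk_id, round(score, 2))
```

Without the rescaling step, the raw full-text score of 6.0 would dominate the sum regardless of the weights.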

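## Part 5: Cypher Syntax Validation

This section runs a basic, heuristic check on the queries. It does not parse Cypher: it only looks for parameters, common keywords, and signs of string interpolation, so a query can pass here and still be rejected by Neo4j, as the Parent-Child error in Part 4 shows.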


In [36]:
# Basic Cypher syntax validation
print("🔍 CYPHER SYNTAX VALIDATION")
print("=" * 80)

def validate_cypher_basic(query: str) -> Dict[str, Any]:
    """Heuristic check of a Cypher query string (does not parse Cypher)."""
    issues = []
    warnings = []

    # Check that the query uses parameters
    if "$" not in query:
        warnings.append("No parameters ($param) detected")

    # Look for common keywords (at least one must be present)
    common_keywords = ["RETURN", "MATCH", "CALL"]
    found_keywords = [kw for kw in common_keywords if kw in query]

    if not found_keywords:
        issues.append("No common Cypher keywords found")

    # Check for signs of direct string interpolation
    dangerous_patterns = ['f"', "f'", '${', '%s', '%d']
    for pattern in dangerous_patterns:
        if pattern in query:
            warnings.append(f"Possible direct interpolation detected: {pattern}")

    return {
        "valid": len(issues) == 0,
        "issues": issues,
        "warnings": warnings,
        "found_keywords": found_keywords
    }

# Generate queries with the implemented methods (EXECUTABLE)
query_basic_val, _ = GraphRAGSearchPatterns.basic_retriever("test", limit=5)
query_meta_val, _ = GraphRAGSearchPatterns.metadata_filtering("test", {"filename": "test.md"}, limit=5)
query_parent_val, _ = GraphRAGSearchPatterns.parent_child_retriever("test", limit=5)

# Validate each implemented query (EXECUTABLE)
queries_to_validate = {
    "Basic Retriever": query_basic_val,
    "Metadata Filtering": query_meta_val,
    "Parent-Child Retriever": query_parent_val,
}

# Also validate the example queries (documented but not implemented)
# Define the example queries to validate
community_query = """
CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node as central_node, score
MATCH path = (central_node)-[*1..2]-(community_node:Chunk)
WHERE community_node <> central_node
WITH central_node, score,
     collect(DISTINCT community_node) as community,
     count(DISTINCT community_node) as community_size
WHERE community_size >= $community_threshold
RETURN {
    central_content: central_node.page_content,
    central_score: score,
    community_size: community_size
} as result
ORDER BY score DESC, community_size DESC
LIMIT $limit
"""

graph_enhanced_query = """
CALL db.index.vector.queryNodes('chunk_embeddings', toInteger($limit), $query_vector)
YIELD node as vec_node, score as vec_score
OPTIONAL MATCH path = (vec_node)-[:NEXT_CHUNK|HAS_CHUNK]*1..2-(related_node:Chunk)
WHERE related_node IS NOT NULL
WITH vec_node, vec_score,
     collect(DISTINCT related_node) as related_nodes,
     count(DISTINCT related_node) as related_count
CALL db.index.fulltext.queryNodes("chunk_content", $query_text)
YIELD node as text_node, score as text_score
WHERE text_node = vec_node
RETURN {
    content: vec_node.page_content,
    vector_score: vec_score,
    text_score: text_score,
    combined_score: (vec_score * 0.6 + text_score * 0.4)
} as result
ORDER BY result.combined_score DESC
LIMIT $limit
"""

queries_examples = {
    "Community Summary (example)": community_query,
    "Graph-Enhanced Vector (example)": graph_enhanced_query
}


def report_validation(queries: Dict[str, str]) -> None:
    """Validate each query and print the outcome."""
    for name, query in queries.items():
        print(f"\n{name}:")
        result = validate_cypher_basic(query)
        if result["valid"]:
            print("  ✅ Basic syntax check passed")
        else:
            print(f"  ❌ Issues found: {result['issues']}")

        if result["warnings"]:
            print(f"  ⚠️  Warnings: {result['warnings']}")

        print(f"  📝 Keywords found: {', '.join(result['found_keywords'])}")


print("\n✅ IMPLEMENTED PATTERNS:")
report_validation(queries_to_validate)

print("\n📝 DOCUMENTED PATTERNS (examples):")
report_validation(queries_examples)


üîç VALIDACI√ìN DE SINTAXIS CYPHER

‚úÖ PATRONES IMPLEMENTADOS:

Basic Retriever:
  ‚úÖ Sintaxis b√°sica v√°lida
  üìù Keywords encontradas: RETURN, CALL

Metadata Filtering:
  ‚úÖ Sintaxis b√°sica v√°lida
  üìù Keywords encontradas: RETURN, CALL

Parent-Child Retriever:
  ‚úÖ Sintaxis b√°sica v√°lida
  üìù Keywords encontradas: RETURN, MATCH, CALL

üìù PATRONES DOCUMENTADOS (ejemplos):

Community Summary (ejemplo):
  ‚úÖ Sintaxis b√°sica v√°lida
  üìù Keywords encontradas: RETURN, MATCH, CALL

Graph-Enhanced Vector (ejemplo):
  ‚úÖ Sintaxis b√°sica v√°lida
  üìù Keywords encontradas: RETURN, MATCH, CALL


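The `graph_enhanced_query` example in the validation cell above weights the vector score at 0.6 and the full-text score at 0.4. One caveat: Neo4j full-text scores come from Lucene and are not bounded to [0, 1], whereas similarity scores from the vector index are, so the raw weighted sum can be dominated by the text score. The sketch below is not part of Ungraph; it shows one possible way to min-max normalize the text scores on the client before combining them. The `combine_scores` function and the row layout are assumptions made for illustration.

```python
def combine_scores(rows, vector_weight=0.6, text_weight=0.4):
    """Return rows sorted by a weighted sum of vector and normalized text scores.

    Each row is a dict with 'vector_score' and 'text_score' keys (a hypothetical
    layout mirroring the fields returned by graph_enhanced_query).
    """
    if not rows:
        return []
    text_scores = [row["text_score"] for row in rows]
    low, high = min(text_scores), max(text_scores)
    span = high - low
    combined = []
    for row in rows:
        # Min-max normalize; if all text scores are equal, treat them as 1.0.
        normalized = (row["text_score"] - low) / span if span else 1.0
        score = vector_weight * row["vector_score"] + text_weight * normalized
        combined.append({**row, "combined_score": score})
    return sorted(combined, key=lambda r: r["combined_score"], reverse=True)


rows = [
    {"content": "a", "vector_score": 0.90, "text_score": 2.0},
    {"content": "b", "vector_score": 0.80, "text_score": 6.0},
]
ranked = combine_scores(rows)
print([r["content"] for r in ranked])
```

Normalizing per result set keeps the 0.6/0.4 weights meaningful, at the cost of making scores incomparable across different queries.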
In [37]:
# Pattern comparison table
import pandas as pd

print("📊 PATTERN COMPARISON TABLE")
print("=" * 80)

# Build the comparison DataFrame
comparison_data = []
for pattern_id, info in search_patterns.items():
    comparison_data.append({
        "Pattern": info["nombre"],
        "ID": pattern_id,
        "Speed": info["velocidad"],
        "Precision": info["precisión"],
        "Recommended Use": info["cuando_usar"][:50] + "..."
    })

df = pd.DataFrame(comparison_data)
print(df.to_string(index=False))

print("\n💡 Note: speed and precision are relative estimates")
print("   ⚡⚡⚡ = Very fast | ⚡ = Slow")
print("   ⭐⭐⭐⭐ = Very precise | ⭐⭐ = Less precise")


📊 PATTERN COMPARISON TABLE
                   Pattern                 ID Speed Precision                                 Recommended Use
           Basic Retriever              basic   ⚡⚡⚡        ⭐⭐             Keyword searches, simple queries...
          Pattern Matching   pattern_matching    ⚡⚡       ⭐⭐⭐      Searches with a specific graph structure...
        Metadata Filtering metadata_filtering   ⚡⚡⚡       ⭐⭐⭐ Search only in specific documents, filter by...
    Parent-Child Retriever       parent_child    ⚡⚡       ⭐⭐⭐  When you need the full context of a section...
Community Summary (Global)          community     ⚡        ⭐⭐            You need broad context on a topic...
           Local Retriever              local    ⚡⚡       ⭐⭐⭐   Exploration of specific, focused knowledge...
Graph-Enhanced Vector S
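
The comparison cell builds its last column with `info["cuando_usar"][:50] + "..."`, which appends an ellipsis even when the description is shorter than 50 characters. A small helper such as the one below (a hypothetical `truncate` function, not part of Ungraph) would add the suffix only when text is actually cut.

```python
def truncate(text: str, max_chars: int = 50, suffix: str = "...") -> str:
    """Cut text to max_chars and append suffix only if something was removed."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + suffix


print(truncate("Keyword searches, simple queries"))
print(truncate("x" * 60))
```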

## Part 5: Future Usage Examples

Examples of how these patterns will be used once they are implemented.


In [38]:
# Future usage examples (once the patterns are implemented)
print("💡 FUTURE USAGE EXAMPLES")
print("=" * 60)

examples = {
    "basic": """
# Basic search
results = ungraph.search_with_pattern(
    "quantum computing",
    pattern_type="basic",
    limit=5
)
""",

    "metadata_filtering": """
# Search with metadata filters
results = ungraph.search_with_pattern(
    "machine learning",
    pattern_type="metadata_filtering",
    metadata_filters={
        "filename": "ai_paper.md",
        "page_number": 1
    },
    limit=10
)
""",

    "parent_child": """
# Parent-Child search
results = ungraph.search_with_pattern(
    "artificial intelligence",
    pattern_type="parent_child",
    parent_label="Page",
    child_label="Chunk",
    relationship_type="HAS_CHUNK",
    limit=5
)
""",

    "community": """
# Community Summary search
results = ungraph.search_with_pattern(
    "deep learning",
    pattern_type="community",
    community_threshold=5,
    max_depth=2,
    limit=3
)
""",

    "graph_enhanced_vector": """
# Graph-Enhanced Vector search
from ungraph import HuggingFaceEmbeddingService

embedding_service = HuggingFaceEmbeddingService()
query_embedding = embedding_service.generate_embedding("deep learning")

results = ungraph.search_with_pattern(
    "deep learning",
    pattern_type="graph_enhanced_vector",
    query_vector=query_embedding.vector,
    relationship_types=["NEXT_CHUNK", "HAS_CHUNK"],
    limit=5
)
"""
}

for pattern_id, example_code in examples.items():
    pattern_name = search_patterns[pattern_id]["nombre"]
    print(f"\n{pattern_name} ({pattern_id}):")
    print("-" * 60)
    print(example_code)
    print("⏳ Available in Phase 3 of the plan")


💡 FUTURE USAGE EXAMPLES

Basic Retriever (basic):
------------------------------------------------------------

# Basic search
results = ungraph.search_with_pattern(
    "quantum computing",
    pattern_type="basic",
    limit=5
)

⏳ Available in Phase 3 of the plan

Metadata Filtering (metadata_filtering):
------------------------------------------------------------

# Search with metadata filters
results = ungraph.search_with_pattern(
    "machine learning",
    pattern_type="metadata_filtering",
    metadata_filters={
        "filename": "ai_paper.md",
        "page_number": 1
    },
    limit=10
)

⏳ Available in Phase 3 of the plan

Parent-Child Retriever (parent_child):
------------------------------------------------------------

# Parent-Child search
results = ungraph.search_with_pattern(
    "artificial intelligence",
    pattern_type="parent_child",
    parent_label="Page",
    child_label="Chunk",
    relationship_type="HAS_CHUNK",
    limit=5
)

⏳ Ava
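
The `metadata_filters` argument in the metadata-filtering example maps property names to required values. How Ungraph turns this mapping into Cypher is not shown in this notebook. The sketch below is one possible approach, assuming each filter becomes a parameterized equality check on a node alias; `build_metadata_where` and the `node` alias are hypothetical names.

```python
import re


def build_metadata_where(filters: dict, alias: str = "node"):
    """Build a parameterized Cypher WHERE clause from equality filters.

    Returns (clause, params). Values travel as query parameters and are never
    interpolated into the query text. Property names are validated because
    Cypher cannot parameterize them.
    """
    conditions, params = [], {}
    for key, value in filters.items():
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", key):
            raise ValueError(f"Invalid property name: {key!r}")
        param = f"filter_{key}"
        conditions.append(f"{alias}.{key} = ${param}")
        params[param] = value
    clause = "WHERE " + " AND ".join(conditions) if conditions else ""
    return clause, params


clause, params = build_metadata_where({"filename": "ai_paper.md", "page_number": 1})
print(clause)
print(params)
```

Passing values as parameters avoids Cypher injection and lets the database reuse the query plan across calls.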

## Part 6: Summary and Status

Summary of what was tested and the implementation status.

**✅ The basic patterns are implemented and working:**
- `basic` / `basic_retriever` - Simple full-text search
- `metadata_filtering` - Search with metadata filters
- `parent_child` / `parent_child_retriever` - Hierarchical search

**📚 See usage examples:**
- `docs/examples/phase3_search_patterns.md` - Complete examples
- `docs/api/search-patterns.md` - API documentation


In [None]:
# Test the existing basic search (if there is data in Neo4j)
print("🔍 TESTING EXISTING BASIC SEARCH")
print("=" * 60)

# Uncomment if you want to test against a real Neo4j instance
# ⚠️ Requires a configured Neo4j instance with ingested data


# Configure Neo4j (if not already configured)
ungraph.configure(
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="your_password"
)

# Test the basic search
try:
    results = ungraph.search("test query", limit=3)
    print(f"✅ Search succeeded: {len(results)} results")
    for i, result in enumerate(results, 1):
        print(f"\nResult {i}:")
        print(f"  Score: {result.score:.3f}")
        print(f"  Content: {result.content[:100]}...")
except Exception as e:
    print(f"❌ Error: {e}")
    print("💡 Make sure you have ingested documents first")


print("💡 Uncomment the code above to test against a real Neo4j instance")
print("⚠️  Requires a configured Neo4j instance with ingested data")


🔍 TESTING EXISTING BASIC SEARCH


Failed to create a graph session: {code: Neo.ClientError.Security.Unauthorized} {message: The client is unauthorized due to authentication failure.}
URI: bolt://localhost:7687
User: neo4j
Please check:
1. Neo4j is running
2. Credentials are correct
3. URI is accessible


❌ Error: Failed to create a graph session: {code: Neo.ClientError.Security.Unauthorized} {message: The client is unauthorized due to authentication failure.}
URI: bolt://localhost:7687
User: neo4j
Please check:
1. Neo4j is running
2. Credentials are correct
3. URI is accessible
💡 Make sure you have ingested documents first
💡 Uncomment the code above to test against a real Neo4j instance
⚠️  Requires a configured Neo4j instance with ingested data
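
The authentication failure above is consistent with the placeholder password being sent to a real server. A common way to avoid hardcoding credentials in a notebook is to read them from environment variables. The sketch below assumes the hypothetical variable names `NEO4J_URI` and `NEO4J_PASSWORD`; only the `neo4j_uri` and `neo4j_password` keyword names come from the `ungraph.configure` call in this notebook.

```python
import os


def neo4j_settings_from_env(env=os.environ):
    """Collect Neo4j connection settings from environment variables.

    NEO4J_URI and NEO4J_PASSWORD are hypothetical variable names chosen
    for this sketch.
    """
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("Set NEO4J_PASSWORD before connecting to Neo4j")
    return {
        "neo4j_uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "neo4j_password": password,
    }


# Demo with an explicit mapping instead of the real environment.
settings = neo4j_settings_from_env({"NEO4J_PASSWORD": "example-secret"})
print(settings["neo4j_uri"])
```

The resulting dictionary could then be passed along as `ungraph.configure(**settings)`.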


## Part 7: Summary and Best Practices

Summary of the patterns and when to use each one.


In [None]:
print("\n" + "=" * 80)
print("📊 SUMMARY OF BASIC GRAPHRAG PATTERNS")
print("=" * 80)

print("\n✅ Available Basic Patterns:")
print("  1. Basic Retriever - Simple full-text search")
print("  2. Metadata Filtering - Filtering by metadata")
print("  3. Parent-Child Retriever - Hierarchical context")

print("\n📋 When to Use Each Pattern:")
print("\n  Basic Retriever:")
print("    ✅ Simple keyword searches")
print("    ✅ You need fast results")
print("    ✅ You do not need to filter by a specific document")
print("\n  Metadata Filtering:")
print("    ✅ Search only in specific documents")
print("    ✅ Filter by date, author, or document type")
print("    ✅ You need precision within known documents")
print("\n  Parent-Child Retriever:")
print("    ✅ You need the full context of a section")
print("    ✅ Search in Pages and retrieve all their Chunks")
print("    ✅ You want to preserve the hierarchical structure")

print("\n💡 Best Practices:")
print("  1. Start with Basic Retriever for simple searches")
print("  2. Use Metadata Filtering when you know the document")
print("  3. Use Parent-Child when you need full context")
print("  4. Combine with vector search for better precision (see 3.2)")
print("=" * 80)



📊 RETRIEVAL TESTS SUMMARY
Documented Patterns: 7
Validated Cypher Queries: 3
Implementation Status: ⏳ Pending (Phase 3)

📋 GraphRAG Search Patterns:
  📝 Planned - Basic Retriever (basic)
  📝 Planned - Pattern Matching (pattern_matching)
  📝 Planned - Metadata Filtering (metadata_filtering)
  📝 Planned - Parent-Child Retriever (parent_child)
  📝 Planned - Community Summary (Global) (community)
  📝 Planned - Local Retriever (local)
  📝 Planned - Graph-Enhanced Vector Search (graph_enhanced_vector)

🎯 Next Steps:
  1. Phase 3: Implement GraphRAGSearchPatterns
  2. Phase 3: Create SearchWithPatternUseCase
  3. Phase 3: Expose in the public API (ungraph.search_with_pattern)
  4. Test with real data and compare results


## Final Summary

### Basic GraphRAG Patterns

| Pattern | Speed | Precision | Best For |
|---------|-------|-----------|----------|
| **Basic Retriever** | ⚡⚡⚡ | ⭐⭐ | Simple, fast searches |
| **Metadata Filtering** | ⚡⚡⚡ | ⭐⭐⭐ | Filtering by a specific document |
| **Parent-Child Retriever** | ⚡⚡ | ⭐⭐⭐ | Complete hierarchical context |
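
The recommendations in this notebook can be condensed into a small decision helper. The function below only illustrates that guidance: `choose_pattern` and its two flags are hypothetical, while the returned strings are the pattern identifiers used throughout this notebook. Giving priority to `parent_child` when both flags are set is an arbitrary choice of this sketch.

```python
def choose_pattern(knows_document: bool = False, needs_full_context: bool = False) -> str:
    """Pick a basic GraphRAG pattern ID following the notebook's guidance."""
    if needs_full_context:
        # Hierarchical context (a Page and its Chunks) takes priority here.
        return "parent_child"
    if knows_document:
        # Restrict the search to known documents via metadata.
        return "metadata_filtering"
    # Default: simple, fast full-text search.
    return "basic"


print(choose_pattern())
print(choose_pattern(knows_document=True))
print(choose_pattern(needs_full_context=True))
```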

### Next Step

Once you are comfortable with the basic patterns, continue with:
- **3.4 Advanced GraphRAG Patterns** - Advanced patterns with GDS and enhanced vector search
- **3.2 Basic Retrieval Patterns** - Vector and hybrid search

## References

- [GraphRAG Pattern Catalog](https://graphrag.com/reference/)
- [Search API](../../docs/api/search-patterns.md)
- [Neo4j Cypher Manual](https://neo4j.com/docs/cypher-manual/)
- [GraphRAGSearchPatterns Service](../../src/infrastructure/services/graphrag_search_patterns.py)
