# GraphRAG ChatBot For Stakeholder Model Whitepaper

This notebook will show the process of taking the Godot Stakeholder Model PDF and converting it into nodes and relationships in a NEO4J graqph database as well as using ChatGPT to communicate with the data in the database.

In [2]:
import os
from dotenv import load_dotenv

from langchain.document_loaders import WikipediaLoader
from langchain.evaluation.qa.eval_chain import QAEvalChain
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain_community.document_loaders import PyPDFLoader
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_neo4j import GraphCypherQAChain, Neo4jGraph
from langchain_openai import ChatOpenAI

In [5]:
load_dotenv(dotenv_path=".env")

URI = os.getenv("NEO4J_URI")
USER = os.getenv("NEO4J_USER")
PWD = os.getenv("NEO4J_PASSWORD")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [6]:
pdf_path = "The GODOT Stakeholder Value Model_ Whitepaper + Game Theory.pdf"
loader = PyPDFLoader(pdf_path)
pages = loader.load()

In [11]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(pages)

In [12]:
len(chunks)

168

In [15]:
graph = Neo4jGraph(url=URI, username=USER, password=PWD)
llm = ChatOpenAI(temperature=0, model_name="gpt-4o", openai_api_key=OPENAI_API_KEY)
llm_transformer = LLMGraphTransformer(llm=llm)

In [15]:
graph_documents = llm_transformer.convert_to_graph_documents(chunks)

In [16]:
graph.add_graph_documents(graph_documents)

In [8]:
enhanced_graph = Neo4jGraph(url=URI, username=USER, password=PWD, enhanced_schema=True)

In [18]:
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Cypher developer.
Translate the user's natural language question INTO Cypher queries
that retrieve information ONLY from the GODOT Stakeholder Value Model graph.

Schema:
{schema}

Rules:
- Only use labels and relationship types that exist in the schema.
- Scope all queries to the GODOT Stakeholder Value Model context. Do NOT return generic definitions.
- Use explicit property filters with WHERE, e.g. MATCH (c:Company) WHERE c.`id` = "Godot".
- For broad questions ("What is X?"), treat X as a graph concept and expand context with OPTIONAL MATCH.
- Return informative properties (`id`, `name`, `title`, `summary`, `description`) and relationship targets.
- Use clear aliases, e.g. RETURN c.`id` AS company, collect(DISTINCT p.`id`) AS policies.
- Do NOT invent labels or properties that aren’t in the schema.
- If nothing is found, still produce a minimal MATCH/WHERE that returns zero rows.

Few-shot guidance:

Q: "What is Godot as a company?"
Cypher:
MATCH (c:Company) WHERE c.`id` = "Godot"
OPTIONAL MATCH (c)-[:HAS_POLICY]->(p:Policy)
OPTIONAL MATCH (c)-[:HAS_STRUCTURE]->(s:Structure)
OPTIONAL MATCH (c)-[:HAS_EQUITY_TRUST]->(t:EquityTrust)
OPTIONAL MATCH (c)-[:RELATES_TO]->(k:Concept)
RETURN c.`id` AS company,
       collect(DISTINCT p.`id`) AS policies,
       collect(DISTINCT s.`id`) AS structures,
       collect(DISTINCT t.`id`) AS equity_trust_elements,
       collect(DISTINCT k.`id`) AS related_concepts

Q: "What is game theory?"
Cypher:
MATCH (g:Concept)
WHERE toLower(g.`id`) = "game theory" OR toLower(g.`name`) CONTAINS "game theory"
OPTIONAL MATCH (g)<-[:APPLIES_FRAMEWORK]-(:Analysis)-[:USES]->(f:GameTheoryFramework)
OPTIONAL MATCH (g)<-[:RELATES_TO]-(co:Company)-[:RELATES_TO]->(c:Concept)
WHERE co.`id` = "Godot"
RETURN g.`id` AS concept,
       coalesce(g.`summary`, g.`description`, g.`id`) AS concept_summary,
       collect(DISTINCT f.`id`) AS frameworks,
       collect(DISTINCT c.`id`) AS nearby_company_concepts

Q: "How are bonuses calculated?"
Cypher:
MATCH (co:Company) WHERE co.`id` = "Godot"
OPTIONAL MATCH (co)-[:HAS_EQUITY_TRUST]->(t:EquityTrust)
OPTIONAL MATCH (t)-[:ALLOCATES]->(b:BonusPolicy)
OPTIONAL MATCH (b)-[:DEFINED_BY]->(kpi:KPI)
RETURN b.`id` AS bonus_policy,
       coalesce(b.`formula`, b.`summary`, b.`description`) AS bonus_formula,
       kpi.`id` AS kpi_id, kpi.`definition` AS kpi_definition

Q: "Summarize the compensation model."
Cypher:
MATCH (co:Company) WHERE co.`id` = "Godot"
OPTIONAL MATCH (co)-[:HAS_POLICY]->(p:Policy)
OPTIONAL MATCH (co)-[:HAS_EQUITY_TRUST]->(t:EquityTrust)-[:ALLOCATES]->(alloc)
WITH co, p, t, collect(DISTINCT labels(alloc)[0] + ":" + coalesce(alloc.`id`, alloc.`name`)) AS trust_allocations
RETURN co.`id` AS company,
       collect(DISTINCT p.`id`) AS policies,
       trust_allocations

User Question:
{question}
"""







cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "query"]
)


In [19]:
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=enhanced_graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)


In [62]:
cypher_chain.invoke({"query":"What is game theory?"})



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (g:Concept)
WHERE toLower(g.`id`) = "game theory" OR toLower(g.`name`) CONTAINS "game theory"
OPTIONAL MATCH (g)<-[:APPLIES_FRAMEWORK]-(:Analysis)-[:USES]->(f:GameTheoryFramework)
OPTIONAL MATCH (g)<-[:RELATES_TO]-(co:Company)-[:RELATES_TO]->(c:Concept)
WHERE co.`id` = "Godot"
RETURN g.`id` AS concept,
       coalesce(g.`summary`, g.`description`, g.`id`) AS concept_summary,
       collect(DISTINCT f.`id`) AS frameworks,
       collect(DISTINCT c.`id`) AS nearby_company_concepts
[0m




Full Context:
[32;1m[1;3m[{'concept': 'Game Theory', 'concept_summary': 'Analytical lens in the model to evaluate incentive compatibility and cooperation.', 'frameworks': [], 'nearby_company_concepts': []}][0m

[1m> Finished chain.[0m


{'query': 'What is game theory?',
 'result': 'Game theory is an analytical lens in the model to evaluate incentive compatibility and cooperation.'}

In [1]:
cypher_chain.invoke({"query":"What are the incentives for employees?"})

NameError: name 'cypher_chain' is not defined

In [9]:
# Let's inspect the graph schema to understand what was extracted
print("=== GRAPH SCHEMA ===")
print(enhanced_graph.schema)
print("\n=== NODE LABELS ===")
result = enhanced_graph.query("CALL db.labels() YIELD label RETURN label ORDER BY label")
for record in result:
    print(f"- {record['label']}")

print("\n=== RELATIONSHIP TYPES ===")
result = enhanced_graph.query("CALL db.relationshipTypes() YIELD relationshipType RETURN relationshipType ORDER BY relationshipType")
for record in result:
    print(f"- {record['relationshipType']}")

print("\n=== SAMPLE NODES BY TYPE ===")
labels_result = enhanced_graph.query("CALL db.labels() YIELD label RETURN label LIMIT 5")
for label_record in labels_result:
    label = label_record['label']
    sample_query = f"MATCH (n:{label}) RETURN n LIMIT 3"
    try:
        samples = enhanced_graph.query(sample_query)
        print(f"\n{label} nodes:")
        for sample in samples:
            node = sample['n']
            print(f"  - {dict(node)}")
    except Exception as e:
        print(f"  Error querying {label}: {e}")

=== GRAPH SCHEMA ===
Node properties:
- **Person**
  - `id`: STRING Example: "Employees"
- **Organization**
  - `id`: STRING Example: "Godot"
- **Concept**
  - `id`: STRING Example: "Compensation Philosophy"
  - `summary`: STRING Available options: ['Analytical lens in the model to evaluate incentive']
- **Entity**
  - `id`: STRING Example: "Business"
- **Policy**
  - `id`: STRING Available options: ['Right to Dignity', 'Fair Pay with Aligned Incentives', 'Human-Centered Design', 'Equal Upside', 'Fair Pay With Aligned Incentives', 'Compensation Policy', 'Policy: The Right To Dignity', 'Uniform Compensation Policy', 'Risk Reduction', 'Engagement and Retention']
  - `summary`: STRING Available options: ['Human-centered policy: support autonomy, competenc', 'Transparent base salary plus equalized bonuses/ben', 'Design org/processes around psychological needs to', 'Founder places 65% equity into a trust; upside sha', 'Democratized communication, diverse teams, and ali', 'High base pay + cu

In [13]:
# IMPROVED CHUNKING STRATEGY
# Let's try semantic chunking for better entity extraction

def create_semantic_chunks(documents, chunk_size=800, overlap=100):
    """Create chunks that better preserve semantic meaning"""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n\n", "\n", ". ", " ", ""],
        keep_separator=True
    )
    return text_splitter.split_documents(documents)

# Create semantic chunks
semantic_chunks = create_semantic_chunks(pages)
print(f"Original chunks: {len(chunks)}")
print(f"Semantic chunks: {len(semantic_chunks)}")
print(f"\nSample semantic chunk:")
print(semantic_chunks[0].page_content[:500] + "...")

Original chunks: 168
Semantic chunks: 192

Sample semantic chunk:
GODOT  COMPENSATION  WHITE  
PAPER-
 
The
 
Stakeholder
 
value
 
model
 
  
Compensation  Philosophy  
GODOT’s  compensation  ideas  reflect  the  value  we  place  on  employee  
contributions.
 
Our
 
total
 
compensation
 
package
 
is
 
comprised
 
of
 
direct
 
and
 
indirect
 
benefits.
 
These
 
benefits
 
go
 
beyond
 
monetary
 
support
 
for
 
benefits
 
like
 
healthcare
 
or
 
retirement
 
and
 
include
 
opportunities
 
for
 
growth,
 
capacity
 
development,
 
flexibility,
 
job
 ...


In [16]:
# IMPROVED GRAPH TRANSFORMER WITH CUSTOM EXTRACTION PROMPTS

from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain.schema import Document

# Custom extraction prompt for better entity and relationship identification
ENTITY_EXTRACTION_PROMPT = """
Extract entities and relationships from the following text about the GODOT Stakeholder Value Model.

Focus on extracting:
1. ENTITIES (people, organizations, concepts, policies, frameworks, models, strategies)
2. RELATIONSHIPS (connections between entities)

For GODOT-specific content, pay special attention to:
- Stakeholder types (employees, investors, customers, etc.)
- Compensation/incentive mechanisms
- Governance structures
- Game theory concepts
- Value distribution models
- Equity trust structures

Text: {text}

Extract in a structured format that captures:
- Entity names and types
- Relationship labels that describe connections
- Key properties of entities (descriptions, roles, mechanisms)
"""

# Create improved transformer
improved_llm_transformer = LLMGraphTransformer(
    llm=llm,
    node_properties=["name", "description", "type", "role", "mechanism", "value"],
    allowed_nodes=["Company", "Stakeholder", "Employee", "Investor", "Customer", 
                   "Policy", "Framework", "Strategy", "Concept", "GameTheory",
                   "Incentive", "Compensation", "EquityTrust", "Governance"],
    allowed_relationships=["HAS_STAKEHOLDER", "IMPLEMENTS", "USES", "DEFINES",
                          "ALLOCATES", "GOVERNS", "INCENTIVIZES", "APPLIES_TO",
                          "RELATES_TO", "INFLUENCES", "DISTRIBUTES", "MANAGES"]
)

print("Improved transformer configured with domain-specific entities and relationships")

Improved transformer configured with domain-specific entities and relationships


In [22]:
# HYBRID RETRIEVAL: Vector Search + Graph Traversal

from langchain.vectorstores import Neo4jVector
from langchain.embeddings import OpenAIEmbeddings

# Create vector index for semantic search
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Vector store for semantic similarity
vector_store = Neo4jVector.from_documents(
    semantic_chunks,
    embeddings,
    url=URI,
    username=USER,
    password=PWD,
    index_name="document_embeddings"
)

def hybrid_retrieval_qa(question: str, k: int = 5):
    """
    Hybrid approach: Vector similarity + Graph traversal
    """
    # 1. Vector similarity search for relevant chunks
    similar_docs = vector_store.similarity_search(question, k=k)
    
    # 2. Extract key entities from the question using LLM
    entity_extraction_prompt = f"""
    Extract key entities from this question that might exist in a graph about the GODOT Stakeholder Value Model:
    Question: {question}
    
    Return only the entity names, separated by commas (e.g., "employee, incentive, bonus"):
    """
    
    entities_response = llm.invoke(entity_extraction_prompt)
    entities = [e.strip() for e in entities_response.content.split(",")]
    
    # 3. Graph traversal to find connected entities
    graph_context = []
    for entity in entities[:3]:  # Limit to first 3 entities
        graph_query = f"""
        MATCH (n) 
        WHERE toLower(n.name) CONTAINS toLower('{entity}') 
           OR toLower(n.id) CONTAINS toLower('{entity}')
        OPTIONAL MATCH (n)-[r]-(connected)
        RETURN n, type(r) as relationship, connected
        LIMIT 10
        """
        
        try:
            results = enhanced_graph.query(graph_query)
            for result in results:
                graph_context.append(result)
        except:
            continue
    
    # 4. Combine context
    combined_context = ""
    combined_context += "=== SIMILAR DOCUMENTS ===\n"
    for doc in similar_docs:
        combined_context += f"{doc.page_content}\n\n"
    
    combined_context += "=== GRAPH CONNECTIONS ===\n"
    for ctx in graph_context:
        if ctx.get('n') and ctx.get('connected'):
            combined_context += f"Entity: {dict(ctx['n'])} -> {ctx.get('relationship', 'RELATED')} -> {dict(ctx['connected'])}\n"
    
    return combined_context, similar_docs, graph_context

print("Hybrid retrieval system configured")

  embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)


Hybrid retrieval system configured


In [23]:
# COMPREHENSIVE Q&A SYSTEM WITH EVALUATION

def comprehensive_qa_system(question: str):
    """
    Complete Q&A pipeline with multiple retrieval strategies
    """
    print(f"🔍 Question: {question}")
    print("=" * 50)
    
    # Method 1: Cypher-based approach
    print("📊 Method 1: Cypher Query Approach")
    try:
        cypher_result = cypher_chain.invoke({"query": question})
        print(f"Answer: {cypher_result.get('result', 'No result')}")
        print(f"Generated Query: {cypher_result.get('intermediate_steps', [{}])[-1] if cypher_result.get('intermediate_steps') else 'N/A'}")
    except Exception as e:
        print(f"Error: {e}")
    
    print("\n" + "=" * 50)
    
    # Method 2: Hybrid retrieval approach
    print("🔗 Method 2: Hybrid Retrieval Approach")
    try:
        combined_context, docs, graph_ctx = hybrid_retrieval_qa(question)
        
        # Generate answer using retrieved context
        qa_prompt = f"""
        Based on the following context about the GODOT Stakeholder Value Model, answer the question.
        
        Context:
        {combined_context}
        
        Question: {question}
        
        Provide a comprehensive answer based on the context:
        """
        
        hybrid_answer = llm.invoke(qa_prompt)
        print(f"Answer: {hybrid_answer.content}")
        print(f"Sources: {len(docs)} documents, {len(graph_ctx)} graph connections")
    except Exception as e:
        print(f"Error: {e}")
    
    print("\n" + "=" * 50)

# Test questions for evaluation
test_questions = [
    "What are the incentives for employees?",
    "How does the equity trust work?",
    "What is game theory in the context of GODOT?",
    "How are bonuses calculated?",
    "What are the main stakeholder groups?",
    "How does the compensation model work?",
    "What governance structures are mentioned?"
]

print("Comprehensive Q&A system ready. Test with:")
for i, q in enumerate(test_questions, 1):
    print(f"{i}. {q}")

Comprehensive Q&A system ready. Test with:
1. What are the incentives for employees?
2. How does the equity trust work?
3. What is game theory in the context of GODOT?
4. How are bonuses calculated?
5. What are the main stakeholder groups?
6. How does the compensation model work?
7. What governance structures are mentioned?


In [26]:
# TEST THE COMPREHENSIVE SYSTEM
# Run this after setting up all components above

# Test with the employee incentives question
comprehensive_qa_system("What are the main stakeholder groups?")

🔍 Question: What are the main stakeholder groups?
📊 Method 1: Cypher Query Approach


[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (g:Group)
RETURN collect(DISTINCT g.`id`) AS stakeholder_groups
[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (g:Group)
RETURN collect(DISTINCT g.`id`) AS stakeholder_groups
[0m
Full Context:
[32;1m[1;3m[{'stakeholder_groups': ['Employees', 'Management', 'Workers', 'Managers', 'Best People', 'Homogenous Groups', 'Large Diverse Groups', 'Workforce', 'Academics', 'Policymakers', 'Everyone', 'External Investors', 'Investors', 'Employee', 'Stakeholder', 'Owners', 'Individuals', 'Departments', 'Actors', 'Prospective Employees', 'Teams', 'Employees (Ex Ante Identical)', 'Founder/Investors', 'Sub-Coalitions', 'Department', 'Subgroup Of Employees', 'Highly Skilled Engineers', 'Top Salespeople', 'Top Talent', 'Applicant Pool', 'People', 'Partners', 'Leadership Team', 'Leaders', 'Outsiders', 'Trust', 'Finance Te



Answer: Based on the context provided about the GODOT Stakeholder Value Model, the main stakeholder groups can be inferred as follows:

1. **Employees**: The model emphasizes the importance of employees sharing in the value they create, fostering an ownership mindset, and turning them into true stakeholders. This suggests that employees are a central stakeholder group, with a focus on their well-being, motivation, and equitable compensation.

2. **Founders/Leadership**: The model describes a managed cooperative where the founder retains decision control. This indicates that founders or leadership are key stakeholders, responsible for maintaining governance and decision-making to avoid inefficiencies.

3. **Governmental and Policy Organizations**: The model has implications for these groups, as it provides a template for balancing stakeholder interests without legislative mandates. Policymakers are considered stakeholders who could encourage such models through incentives.

4. **Academi

In [28]:
# ENHANCED GRAPH EXTRACTION - Run this to improve entity connections

def create_enhanced_graph_documents(chunks, batch_size=10):
    """
    Enhanced graph extraction with better entity resolution
    """
    print("Creating enhanced graph documents...")
    
    # Custom prompt for GODOT-specific extraction
    enhanced_prompt = """
    Extract entities and relationships from this text about the GODOT Stakeholder Value Model.
    
    ENTITIES TO FOCUS ON:
    - Stakeholder types: Employee, Founder, Investor, Customer, Government, Academic
    - Business concepts: Compensation, Equity, Trust, Governance, Value
    - Game theory concepts: Nash equilibrium, Cooperative game, Strategy
    - Organizational elements: Policy, Framework, Structure, Model
    
    RELATIONSHIPS TO CREATE:
    - BELONGS_TO (stakeholder belongs to organization)
    - RECEIVES (stakeholder receives compensation/value)
    - PARTICIPATES_IN (stakeholder participates in governance)
    - APPLIES (theory/framework applies to situation)
    - INFLUENCES (entity influences another)
    
    Text: {text}
    
    Create specific, meaningful relationships that capture the stakeholder dynamics.
    """
    
    # Configure transformer with specific settings
    enhanced_transformer = LLMGraphTransformer(
        llm=llm,
        node_properties=["name", "description", "type", "role", "value", "mechanism"],
        allowed_nodes=[
            "Employee", "Founder", "Investor", "Customer", "Government", "Academic",
            "Compensation", "Equity", "Trust", "Governance", "Value", "Policy",
            "GameTheory", "Strategy", "Framework", "Model", "Organization"
        ],
        allowed_relationships=[
            "BELONGS_TO", "RECEIVES", "PARTICIPATES_IN", "APPLIES", "INFLUENCES",
            "MANAGES", "CREATES", "DISTRIBUTES", "GOVERNS", "IMPLEMENTS"
        ]
    )
    
    # Process in batches for better performance
    all_graph_docs = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i+batch_size]
        print(f"Processing batch {i//batch_size + 1}/{(len(chunks) + batch_size - 1)//batch_size}")
        
        try:
            batch_docs = enhanced_transformer.convert_to_graph_documents(batch)
            all_graph_docs.extend(batch_docs)
        except Exception as e:
            print(f"Error in batch {i//batch_size + 1}: {e}")
            continue
    
    return all_graph_docs

# Optional: Re-extract with enhanced settings (uncomment to run)
enhanced_graph_docs = create_enhanced_graph_documents(semantic_chunks)
graph.add_graph_documents(enhanced_graph_docs)
print("Enhanced extraction function ready. Uncomment the last 2 lines to re-extract.")

Creating enhanced graph documents...
Processing batch 1/20
Processing batch 2/20
Processing batch 3/20
Processing batch 4/20
Processing batch 5/20
Processing batch 6/20
Processing batch 7/20
Processing batch 8/20
Processing batch 9/20
Processing batch 10/20
Processing batch 11/20
Processing batch 12/20
Processing batch 13/20
Processing batch 14/20
Processing batch 15/20
Processing batch 16/20
Processing batch 17/20
Processing batch 18/20
Processing batch 19/20
Processing batch 20/20
Enhanced extraction function ready. Uncomment the last 2 lines to re-extract.


In [29]:
# QUERY ANALYSIS & OPTIMIZATION

import time
from datetime import datetime

def analyze_question_performance(questions_list):
    """
    Analyze performance across different question types
    """
    results = []
    
    for question in questions_list:
        print(f"\n🔍 Analyzing: {question}")
        start_time = time.time()
        
        # Test both methods
        result = {
            'question': question,
            'timestamp': datetime.now().isoformat(),
            'cypher_success': False,
            'hybrid_success': False,
            'cypher_time': 0,
            'hybrid_time': 0,
            'cypher_answer_length': 0,
            'hybrid_answer_length': 0
        }
        
        # Test Cypher approach
        try:
            cypher_start = time.time()
            cypher_result = cypher_chain.invoke({"query": question})
            result['cypher_time'] = time.time() - cypher_start
            result['cypher_success'] = True
            if cypher_result.get('result'):
                result['cypher_answer_length'] = len(str(cypher_result['result']))
            print(f"  ✅ Cypher: {result['cypher_time']:.2f}s")
        except Exception as e:
            print(f"  ❌ Cypher failed: {e}")
        
        # Test Hybrid approach
        try:
            hybrid_start = time.time()
            combined_context, docs, graph_ctx = hybrid_retrieval_qa(question)
            qa_prompt = f"""
            Based on the following context about the GODOT Stakeholder Value Model, answer the question.
            Context: {combined_context[:2000]}...
            Question: {question}
            Provide a brief answer:
            """
            hybrid_answer = llm.invoke(qa_prompt)
            result['hybrid_time'] = time.time() - hybrid_start
            result['hybrid_success'] = True
            result['hybrid_answer_length'] = len(hybrid_answer.content)
            print(f"  ✅ Hybrid: {result['hybrid_time']:.2f}s, {len(docs)} docs, {len(graph_ctx)} graph connections")
        except Exception as e:
            print(f"  ❌ Hybrid failed: {e}")
        
        result['total_time'] = time.time() - start_time
        results.append(result)
    
    return results

# Performance analysis
analysis_questions = [
    "What are the main stakeholder groups?",
    "How does compensation work?",
    "What is game theory?",
    "Who are the employees?",
    "What governance structures exist?"
]

print("Query analysis function ready. Run: analyze_question_performance(analysis_questions)")

Query analysis function ready. Run: analyze_question_performance(analysis_questions)


In [30]:
# ANSWER QUALITY SCORING SYSTEM

def score_answer_quality(question: str, answer: str, context_sources: int = 0):
    """
    Score answer quality based on multiple factors
    """
    scoring_prompt = f"""
    Evaluate the quality of this answer about the GODOT Stakeholder Value Model on a scale of 1-10.
    
    Question: {question}
    Answer: {answer}
    Context sources used: {context_sources}
    
    Rate based on:
    1. Relevance to question (1-3 points)
    2. Accuracy and specificity to GODOT model (1-3 points) 
    3. Completeness of information (1-2 points)
    4. Use of context sources (1-2 points)
    
    Return only a number from 1-10 and a brief explanation:
    """
    
    try:
        score_response = llm.invoke(scoring_prompt)
        # Extract numeric score
        score_text = score_response.content
        score = float(score_text.split()[0]) if score_text.split()[0].replace('.', '').isdigit() else 5.0
        return min(max(score, 1.0), 10.0), score_text
    except:
        return 5.0, "Unable to score"

def comprehensive_qa_with_scoring(question: str):
    """
    Enhanced Q&A with quality scoring
    """
    print(f"🔍 Question: {question}")
    print("=" * 60)
    
    # Method 1: Cypher approach
    print("📊 Method 1: Cypher Query Approach")
    cypher_score = 0
    try:
        cypher_result = cypher_chain.invoke({"query": question})
        answer = cypher_result.get('result', 'No result')
        cypher_score, score_explanation = score_answer_quality(question, str(answer), 1)
        
        print(f"Answer: {answer}")
        print(f"Quality Score: {cypher_score}/10")
        print(f"Scoring: {score_explanation}")
    except Exception as e:
        print(f"Error: {e}")
    
    print("\n" + "=" * 60)
    
    # Method 2: Hybrid approach
    print("🔗 Method 2: Hybrid Retrieval Approach")
    hybrid_score = 0
    try:
        combined_context, docs, graph_ctx = hybrid_retrieval_qa(question)
        
        qa_prompt = f"""
        Based on the following context about the GODOT Stakeholder Value Model, answer the question.
        
        Context:
        {combined_context[:3000]}
        
        Question: {question}
        
        Provide a comprehensive answer based on the context:
        """
        
        hybrid_answer = llm.invoke(qa_prompt)
        answer = hybrid_answer.content
        hybrid_score, score_explanation = score_answer_quality(question, answer, len(docs) + len(graph_ctx))
        
        print(f"Answer: {answer}")
        print(f"Sources: {len(docs)} documents, {len(graph_ctx)} graph connections")
        print(f"Quality Score: {hybrid_score}/10")
        print(f"Scoring: {score_explanation}")
    except Exception as e:
        print(f"Error: {e}")
    
    print("\n" + "=" * 60)
    print(f"🏆 Recommendation: {'Cypher' if cypher_score > hybrid_score else 'Hybrid'} method performed better")
    
    return cypher_score, hybrid_score

print("Enhanced Q&A with scoring ready!")

Enhanced Q&A with scoring ready!


In [31]:
# GRAPH ENRICHMENT - Add inferred relationships

def enrich_graph_relationships():
    """
    Add inferred relationships to improve graph connectivity
    """
    print("🔧 Enriching graph with inferred relationships...")
    
    enrichment_queries = [
        # Connect stakeholders to related concepts
        """
        MATCH (s) WHERE s.name =~ '.*[Ee]mployee.*' OR s.id =~ '.*[Ee]mployee.*'
        MATCH (c) WHERE c.name =~ '.*[Cc]ompensation.*' OR c.id =~ '.*[Cc]ompensation.*'
        WHERE NOT (s)-[:RECEIVES]-(c)
        CREATE (s)-[:RECEIVES]->(c)
        """,
        
        # Connect game theory to stakeholders
        """
        MATCH (g) WHERE g.name =~ '.*[Gg]ame.*' OR g.id =~ '.*[Gg]ame.*'
        MATCH (s) WHERE s.name =~ '.*[Ss]takeholder.*' OR s.id =~ '.*[Ss]takeholder.*'
        WHERE NOT (g)-[:APPLIES_TO]-(s)
        CREATE (g)-[:APPLIES_TO]->(s)
        """,
        
        # Connect governance to policies
        """
        MATCH (g) WHERE g.name =~ '.*[Gg]overnance.*' OR g.id =~ '.*[Gg]overnance.*'
        MATCH (p) WHERE p.name =~ '.*[Pp]olicy.*' OR p.id =~ '.*[Pp]olicy.*'
        WHERE NOT (g)-[:IMPLEMENTS]-(p)
        CREATE (g)-[:IMPLEMENTS]->(p)
        """,
        
        # Create GODOT organization node if it doesn't exist
        """
        MERGE (godot:Organization {name: "GODOT", type: "Company"})
        WITH godot
        MATCH (n) WHERE n.name =~ '.*GODOT.*' OR n.id =~ '.*GODOT.*'
        WHERE NOT (n)-[:PART_OF]-(godot) AND n <> godot
        CREATE (n)-[:PART_OF]->(godot)
        """
    ]
    
    for i, query in enumerate(enrichment_queries):
        try:
            result = enhanced_graph.query(query)
            print(f"  ✅ Enrichment query {i+1} completed")
        except Exception as e:
            print(f"  ❌ Enrichment query {i+1} failed: {e}")
    
    # Check connectivity improvement
    connectivity_query = """
    MATCH ()-[r]->() 
    RETURN count(r) as total_relationships
    """
    
    result = enhanced_graph.query(connectivity_query)
    total_rels = result[0]['total_relationships'] if result else 0
    print(f"📊 Total relationships in graph: {total_rels}")
    
    return total_rels

def check_graph_statistics():
    """
    Get comprehensive graph statistics
    """
    stats_queries = {
        "Total Nodes": "MATCH (n) RETURN count(n) as count",
        "Total Relationships": "MATCH ()-[r]->() RETURN count(r) as count",
        "Node Types": "MATCH (n) RETURN labels(n)[0] as label, count(n) as count ORDER BY count DESC",
        "Relationship Types": "MATCH ()-[r]->() RETURN type(r) as rel_type, count(r) as count ORDER BY count DESC",
        "Most Connected Nodes": """
            MATCH (n)-[r]-() 
            RETURN n.name as node_name, labels(n)[0] as type, count(r) as connections 
            ORDER BY connections DESC 
            LIMIT 10
        """
    }
    
    print("📈 GRAPH STATISTICS")
    print("=" * 40)
    
    for stat_name, query in stats_queries.items():
        try:
            result = enhanced_graph.query(query)
            print(f"\n{stat_name}:")
            
            if stat_name in ["Total Nodes", "Total Relationships"]:
                print(f"  {result[0]['count']}")
            else:
                for record in result[:5]:  # Limit to top 5
                    if 'label' in record:
                        print(f"  {record['label']}: {record['count']}")
                    elif 'rel_type' in record:
                        print(f"  {record['rel_type']}: {record['count']}")
                    elif 'node_name' in record:
                        name = record['node_name'] or 'Unnamed'
                        print(f"  {name} ({record['type']}): {record['connections']} connections")
        except Exception as e:
            print(f"  Error: {e}")

print("Graph enrichment functions ready!")
print("Run: enrich_graph_relationships() to improve connectivity")
print("Run: check_graph_statistics() to see current graph state")

Graph enrichment functions ready!
Run: enrich_graph_relationships() to improve connectivity
Run: check_graph_statistics() to see current graph state


In [33]:
# QUICK TEST OF IMPROVEMENTS

# 1. Check current graph state
print("1. Current Graph Statistics:")
check_graph_statistics()

print("\n" + "="*60 + "\n")

# 2. Test enhanced Q&A with scoring
print("2. Testing Enhanced Q&A with Scoring:")
comprehensive_qa_with_scoring("How does the GODOT model incentivize employees?")

print("\n" + "="*60 + "\n")

# 3. Optional: Enrich graph (uncomment to run)
print("3. Enriching Graph Relationships:")
enrich_graph_relationships()
print("3. Graph enrichment ready (uncomment lines above to run)")

1. Current Graph Statistics:
📈 GRAPH STATISTICS

Total Nodes:
  2287

Total Relationships:
  2445

Node Types:
  Concept: 1024
  Value: 196
  Chunk: 192
  Person: 86
  Model: 84

Relationship Types:
  INFLUENCES: 339
  APPLIES: 99
  IMPLEMENTS: 80
  BELONGS_TO: 72
  INCLUDES: 68

Most Connected Nodes:
  Unnamed (Concept): 2308 connections
  Unnamed (Organization): 431 connections
  Unnamed (Person): 265 connections
  Unnamed (Value): 229 connections
  Unnamed (Model): 204 connections


2. Testing Enhanced Q&A with Scoring:
🔍 Question: How does the GODOT model incentivize employees?
📊 Method 1: Cypher Query Approach


[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (co:Company) WHERE co.`id` = "Godot"
OPTIONAL MATCH (co)-[:HAS_POLICY]->(p:Policy)
OPTIONAL MATCH (co)-[:HAS_EQUITY_TRUST]->(t:EquityTrust)
OPTIONAL MATCH (t)-[:ALLOCATES]->(b:BonusPolicy)
OPTIONAL MATCH (b)-[:DEFINED_BY]->(kpi:KPI)
OPTIONAL MATCH (co)-[:RELATES_TO]->(c:Concept)


In [34]:
comprehensive_qa_system("What are the main stakeholder groups?")

🔍 Question: What are the main stakeholder groups?
📊 Method 1: Cypher Query Approach


[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (g:Group)
RETURN collect(DISTINCT g.`id`) AS stakeholder_groups
[0m
Full Context:
[32;1m[1;3m[{'stakeholder_groups': ['Employees', 'Management', 'Workers', 'Managers', 'Best People', 'Homogenous Groups', 'Large Diverse Groups', 'Workforce', 'Academics', 'Policymakers', 'Everyone', 'External Investors', 'Investors', 'Employee', 'Stakeholder', 'Owners', 'Individuals', 'Departments', 'Actors', 'Prospective Employees', 'Teams', 'Employees (Ex Ante Identical)', 'Founder/Investors', 'Sub-Coalitions', 'Department', 'Subgroup Of Employees', 'Highly Skilled Engineers', 'Top Salespeople', 'Top Talent', 'Applicant Pool', 'People', 'Partners', 'Leadership Team', 'Leaders', 'Outsiders', 'Trust', 'Finance Team', 'Communities', 'Godotians', 'Group Working On Hackathon', 'Group', 'Founders']}][0m

[1m> Finished chain.