<a href="https://colab.research.google.com/github/edmar-silva/colabs/blob/main/falkordb_graphiti_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📊 How to Build a Knowledge Graph with FalkorDB & Graphiti
### 🗓️ July 29 | 10:00 AM PDT | 1:00 PM EDT | 7:00 PM CEST

**Workshop Overview:**
- 🧠 Build knowledge graphs from structured + unstructured data
- ⚽ Case study: Football/Soccer data knowledge graph
- 🚀 Powered by Zep's [Graphiti](https://zep.dev) and [FalkorDB](https://www.falkordb.com/)

---


## 🎯 Learning Objectives

By the end of this workshop, you'll know how to:

1. **🔄 Data Transformation**: Convert structured & unstructured data into graph form
2. **⚙️ Automation**: Automate data ingestion using Graphiti's pipelines
3. **🔍 Smart Retrieval**: Enable semantic search & question retrieval from graphs
4. **📈 Real-time Updates**: Keep your knowledge graph current with new information

---

## 🛠️ Setup & Installation

First, let's install the required dependencies:

In [None]:
# Install Graphiti with FalkorDB support
!pip install "graphiti-core[falkordb]==0.17.7"



## 📦 Import Libraries & Setup Environment

Let's import all necessary libraries and set up our environment:



In [None]:
import os
import json
import requests
import pandas as pd
from bs4 import BeautifulSoup
from google.colab import userdata
from graphiti_core import Graphiti
from datetime import datetime, timezone
from graphiti_core.nodes import EpisodeType
from graphiti_core.driver.falkordb_driver import FalkorDriver
from graphiti_core.search.search_config_recipes import NODE_HYBRID_SEARCH_RRF

# Configuration
os.environ["MODEL_NAME"] = "gpt-4.1"
group_id = "la-liga"

## 🔧 Helper Functions

Let's define utility functions for data processing and graph operations:





In [None]:
async def add_episodes_to_graph(graphiti, episodes, group_id, prefix="Episode"):
    """Add a list of episodes to the graph using Graphiti."""
    print(f"📝 Adding {len(episodes)} episodes to graph...")

    for i, episode in enumerate(episodes):
        name = episode.get('name', f"{prefix} {i+1}")
        content = episode['content']

        # Convert non-string content to JSON
        if not isinstance(content, str):
            content = json.dumps(content)

        # Graphiti method for adding data
        await graphiti.add_episode(
            name=name,
            episode_body=content,
            source=episode['type'],
            source_description=episode['description'],
            reference_time=datetime.now(timezone.utc),
            group_id=group_id
        )

    print(f"✅ Successfully added {len(episodes)} episodes!")


def get_standing_table(url, teams_filter, date):
    """Extract La Liga standings from Wikipedia."""
    print(f"🏆 Fetching standings data for {date}...")

    standing_table = pd.read_html(url)[5]

    # Clean and rename columns
    standing_table = standing_table.rename(columns={
        "Teamvte": "TeamName",
        "Pts": "Season Points",
        "Pos": "Season Position",
        "W": "Season Wins"
    })

    standing_table = standing_table[["TeamName", "Season Points", "Season Position", "Season Wins"]]

    # Clean team names (remove champion indicator)
    standing_table.at[0, "TeamName"] = standing_table.at[0, "TeamName"].replace(" (C)", "")
    standing_table['Relevant Period'] = date

    episodes = []
    for row in standing_table.to_dict(orient="records"):
        if row["TeamName"] in teams_filter:
            episode = {
                "content": row,
                "type": EpisodeType.json,
                "description": f"Extract the {date} La Liga Standing into entities"
            }
            episodes.append(episode)

    print(f"✅ Found {len(episodes)} relevant team standings")
    return episodes


def get_topscorers_table(url, teams_filter, date):
    """Extract top scorers data from Wikipedia."""
    print(f"⚽ Fetching top scorers data for {date}...")

    top_soccer_table = pd.read_html(url)[7]
    top_soccer_table = top_soccer_table.rename(columns={
        "Goals[51]": "Season Goals"})

    top_soccer_table['Season'] = date

    episodes = []
    for row in top_soccer_table.to_dict(orient="records"):
        if row["Club"] in teams_filter:
            episode = {
                "content": row,
                "type": EpisodeType.json,
                "description": f"Extract the {date} La Liga top player stats into different entities"
            }
            episodes.append(episode)

    print(f"✅ Found {len(episodes)} relevant top scorers")
    return episodes


def get_article_from_url(url):
    """Scrape article content from a URL."""
    print(f"📰 Fetching article from: {url}")

    headers = {"User-Agent": "Mozilla/5.0"}
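    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page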
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract date
    date_meta = soup.find("meta", {"name": "DC.date.issued"})
    article_date = date_meta["content"] if date_meta and date_meta.get("content") else "Date not found"

    # Extract article text
    paragraphs = soup.find_all("p")
    filtered = [p.get_text(strip=True) for p in paragraphs if len(p.get_text(strip=True)) > 50]
    article_text = "\n\n".join(filtered).encode("utf-8", "ignore").decode("utf-8")
    article_text = article_text.replace("í", "i")

    print("✅ Article extracted successfully")
    return article_date, article_text


async def search_and_display(graphiti, query, num_results=5):
    """Search the graph and display results in a clean format."""
    print(f"🔍 Searching for: '{query}'")
    print("-" * 50)

    results = await graphiti.search(query, num_results=num_results)

    for i, r in enumerate(results, 1):
        print(f"{i}. {r.fact}")
        print(f"   Label: {r.name}")
        print(f"   📅 Valid from: {r.valid_at}")
        if r.invalid_at:
            print(f"   ❌ Invalid at: {r.invalid_at}")
        print()

    return results
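
The episode dictionaries used throughout this notebook share an implicit shape: `add_episodes_to_graph` reads `content`, `type`, and `description` directly and falls back to a generated name. As a minimal sketch of that contract, the hypothetical helper below (`normalize_episode` is not part of Graphiti or of the notebook) validates one dictionary and applies the same JSON conversion, so that a missing key is reported before anything is sent to the graph. The `"json"` string stands in for `EpisodeType.json` only to keep the example free of dependencies.

```python
import json
from typing import Any

# Keys that add_episodes_to_graph indexes directly (assumption: no others are required).
REQUIRED_KEYS = ("content", "type", "description")


def normalize_episode(episode: dict[str, Any], index: int, prefix: str = "Episode") -> dict[str, Any]:
    """Validate one episode dict and return its name, body, and source fields."""
    missing = [key for key in REQUIRED_KEYS if key not in episode]
    if missing:
        raise KeyError(f"episode {index} is missing keys: {missing}")

    content = episode["content"]
    if not isinstance(content, str):
        # Same rule as the helper above: non-string content becomes JSON text.
        content = json.dumps(content, ensure_ascii=False)

    return {
        "name": episode.get("name", f"{prefix} {index + 1}"),
        "episode_body": content,
        "source": episode["type"],
        "source_description": episode["description"],
    }


row = {"TeamName": "Real Madrid", "Season Points": 95}
prepared = normalize_episode(
    {"content": row, "type": "json", "description": "standings"},
    index=0,
    prefix="Standings",
)
print(prepared["name"])          # Standings 1
print(prepared["episode_body"])  # {"TeamName": "Real Madrid", "Season Points": 95}
```
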


## 🚀 Initialize Database Connection

Now let's connect to FalkorDB and initialize our Graphiti instance:

In [None]:
# Set up API keys and database connection
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Initialize FalkorDB driver
falkor_driver = FalkorDriver(
    host=userdata.get('FALKORDB_HOST'),
    port=userdata.get('FALKORDB_PORT'),
    username=userdata.get('FALKORDB_USERNAME'),
    password=userdata.get('FALKORDB_PASSWORD'),
    database=group_id
)

# Initialize Graphiti
graphiti = Graphiti(graph_driver=falkor_driver)

# Build necessary indices and constraints
await graphiti.build_indices_and_constraints()
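
`userdata` is only available inside Colab. When running the notebook elsewhere, one option is to read the same four settings from environment variables. The sketch below is an assumption-laden example rather than part of the workshop code: `load_falkordb_settings` is a hypothetical helper, the variable names simply mirror the Colab secret names used above, and the `localhost` and `6379` defaults are assumptions for a local instance.

```python
import os
from typing import Mapping, Optional


def load_falkordb_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect FalkorDB connection settings from a mapping (defaults to os.environ)."""
    source = os.environ if env is None else env
    return {
        "host": source.get("FALKORDB_HOST", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        "port": int(source.get("FALKORDB_PORT", "6379")),
        # Empty or missing credentials become None (no authentication).
        "username": source.get("FALKORDB_USERNAME") or None,
        "password": source.get("FALKORDB_PASSWORD") or None,
    }


settings = load_falkordb_settings({"FALKORDB_HOST": "db.example.com", "FALKORDB_PORT": "6380"})
print(settings["host"], settings["port"])  # db.example.com 6380
```

The resulting values could then be passed to `FalkorDriver` in place of the `userdata.get(...)` calls.
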


## 📊 Part 1: Structured Data - La Liga Standings

Let's start by ingesting structured data from Wikipedia about La Liga standings:

La Liga Standings (2023/24)

In [None]:
# Configuration for our data extraction
standing_2324_url = 'https://en.wikipedia.org/wiki/2023%E2%80%9324_La_Liga'
TARGET_TEAMS = {"Real Madrid", "Barcelona"}  # Focus on El Clásico teams
relevant_date = '2023-07-25-2024-05-25'

# Extract standings data
episodes_2324 = get_standing_table(standing_2324_url, TARGET_TEAMS, relevant_date)

# Display what we extracted
print("\n📋 2023/24 Season Data:")
for episode in episodes_2324:
    content = episode['content']
    print(f"Team: {content['TeamName']}, Points: {content['Season Points']}, Position: {content['Season Position']}")

# Add 2023/24 standings to the graph
await add_episodes_to_graph(graphiti, episodes_2324, group_id, prefix="LaLiga 23/24 Standings")

🏆 Fetching standings data for 2023-07-25-2024-05-25...
✅ Found 2 relevant team standings

📋 2023/24 Season Data:
Team: Real Madrid, Points: 95, Position: 1
Team: Barcelona, Points: 85, Position: 2
📝 Adding 2 episodes to graph...
✅ Successfully added 2 episodes!
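
The `relevant_date` value packs a start and an end date into one string (`2023-07-25-2024-05-25`), and the notebook leaves it to Graphiti's extraction step to interpret it. If explicit timestamps are wanted, for example to pass a season-specific `reference_time` instead of the current time, the string can be split deterministically. The following is a minimal sketch; `parse_period` is a hypothetical helper and assumes the exact `YYYY-MM-DD-YYYY-MM-DD` layout used above.

```python
from datetime import datetime, timezone


def parse_period(period: str) -> tuple[datetime, datetime]:
    """Split 'YYYY-MM-DD-YYYY-MM-DD' into (start, end) as UTC-aware datetimes."""
    if len(period) != 21 or period[10] != "-":
        raise ValueError(f"expected 'YYYY-MM-DD-YYYY-MM-DD', got {period!r}")

    start = datetime.strptime(period[:10], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = datetime.strptime(period[11:], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    if end < start:
        raise ValueError("period ends before it starts")
    return start, end


start, end = parse_period("2023-07-25-2024-05-25")
print(start.isoformat())  # 2023-07-25T00:00:00+00:00
print(end.isoformat())    # 2024-05-25T00:00:00+00:00
```
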


Now let's add the current season data:

La Liga Standings (2024/25)

In [None]:
standing_2425_url = 'https://en.wikipedia.org/wiki/2024%E2%80%9325_La_Liga'
relevant_date = '2024-07-25-2025-05-25'

# Extract current season standings
episodes_2425 = get_standing_table(standing_2425_url, TARGET_TEAMS, relevant_date)

# Display current season data
print("\n📋 2024/25 Season Data:")
for episode in episodes_2425:
    content = episode['content']
    print(f"Team: {content['TeamName']}, Points: {content['Season Points']}, Position: {content['Season Position']}")

# Add to graph
await add_episodes_to_graph(graphiti, episodes_2425, group_id, prefix="LaLiga 24/25 Standings")

🏆 Fetching standings data for 2024-07-25-2025-05-25...
✅ Found 2 relevant team standings

📋 2024/25 Season Data:
Team: Barcelona, Points: 88, Position: 1
Team: Real Madrid, Points: 84, Position: 2
📝 Adding 2 episodes to graph...
✅ Successfully added 2 episodes!


## ⚽ Part 2: Player Performance Data - Top Scorers

Let's also add player performance data:

In [None]:
# Extract top scorers data
top_scorers_url = 'https://en.wikipedia.org/wiki/2024%E2%80%9325_La_Liga'
relevant_date = 'Season 24/25'

episodes_scorers = get_topscorers_table(top_scorers_url, TARGET_TEAMS, relevant_date)

# Add top scorers to graph
await add_episodes_to_graph(graphiti, episodes_scorers, group_id, prefix="LaLiga 24/25 Top Scorers")

⚽ Fetching top scorers data for Season 24/25...
✅ Found 3 relevant top scorers
📝 Adding 3 episodes to graph...
✅ Successfully added 3 episodes!


## 🔍 Graph-Based Query #1

Let's test our knowledge graph with some queries:

In [None]:
# Query about Real Madrid's performance
fact_results = await search_and_display(graphiti, "What was Real Madrid’s final points at the end of each season?")

🔍 Searching for: 'What was Real Madrid’s final points at the end of each season?'
--------------------------------------------------
1. Real Madrid has 84 season points in the 2024-07-25 to 2025-05-25 period.
   Label: HAS_SEASON_POINTS
   📅 Valid from: 2024-07-25 00:00:00+00:00
   ❌ Invalid at: 2025-05-25 00:00:00+00:00

2. Real Madrid has 95 season points in the period 2023-07-25 to 2024-05-25.
   Label: HAS_SEASON_POINTS
   📅 Valid from: 2023-07-25 00:00:00+00:00
   ❌ Invalid at: 2024-05-25 00:00:00+00:00

3. Real Madrid has 26 season wins in the 2024-07-25 to 2025-05-25 period.
   Label: HAS_SEASON_WINS
   📅 Valid from: 2024-07-25 00:00:00+00:00
   ❌ Invalid at: 2025-05-25 00:00:00+00:00

4. Real Madrid has 29 season wins in the period 2023-07-25 to 2024-05-25.
   Label: HAS_SEASON_WINS
   📅 Valid from: 2023-07-25 00:00:00+00:00
   ❌ Invalid at: 2024-05-25 00:00:00+00:00

5. Real Madrid has season position 1 in the period 2023-07-25 to 2024-05-25.
   Label: HAS_SEASON_POSITION
   📅
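
Each result carries a validity window (`valid_at`, `invalid_at`), which makes it possible to narrow a result list to the facts that held on a particular date. The sketch below illustrates the idea with a stand-in `Fact` dataclass that has only the three fields printed above; it is not Graphiti's result type, and treating a missing `valid_at` as "already started" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    """Stand-in for a search result, limited to the fields printed above."""
    fact: str
    valid_at: Optional[datetime]
    invalid_at: Optional[datetime]


def valid_on(facts: list[Fact], moment: datetime) -> list[Fact]:
    """Keep facts whose validity window contains `moment` (end is exclusive)."""
    kept = []
    for f in facts:
        started = f.valid_at is None or f.valid_at <= moment
        not_ended = f.invalid_at is None or moment < f.invalid_at
        if started and not_ended:
            kept.append(f)
    return kept


utc = timezone.utc
facts = [
    Fact("Real Madrid has 95 season points", datetime(2023, 7, 25, tzinfo=utc), datetime(2024, 5, 25, tzinfo=utc)),
    Fact("Real Madrid has 84 season points", datetime(2024, 7, 25, tzinfo=utc), datetime(2025, 5, 25, tzinfo=utc)),
]

for f in valid_on(facts, datetime(2024, 1, 1, tzinfo=utc)):
    print(f.fact)  # Real Madrid has 95 season points
```
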

## 📰 Part 3: Unstructured Data - News Article

Now let's add some unstructured data from news articles:

In [None]:
# Scrape a football news article
article_url = "https://www.espn.com/soccer/story/_/id/45783151/marcus-rashford-arrives-barcelona-loan-man-united"
article_date, article_text = get_article_from_url(article_url)

# Create episode from article
espn_episode = {
    'content': f"{article_date}\n\n{article_text}",
    'type': EpisodeType.text,
    'description': "Football transfer news and rumors"
}
print("\n📰 Article Preview:")
print(article_text[:800] + "...\n")

# Add article to graph
await add_episodes_to_graph(graphiti, [espn_episode], group_id, prefix="ESPN Transfer News")



📰 Fetching article from: https://www.espn.com/soccer/story/_/id/45783151/marcus-rashford-arrives-barcelona-loan-man-united
✅ Article extracted successfully

📰 Article Preview:
Mark Ogden discusses Marcus Rashford's potential move to Barcelona after the Spanish club were given permission to speak to the player. (2:01)

Marcus Rashfordlanded inBarcelonaon Sunday ahead of completing a season-long loan move fromManchester United.

Rashford, 27, will undergo a medical early in the week and, if everything goes to plan, will be presented as a Barça player before the club head off on tour, a source told ESPN.

Barça fly to Asia on Thursday and coach Hansi Flick was keen to have Rashford with the team in Japan and South Korea to give him as much time as possible to bed in before the season starts in August.

- Sources:Rashford close to Barcelona loan move-Nico Williams explains 10-year Athletic extension- Sources:Man Utd close on Mbeumo before U.S. tour

Rashford was cle...

📝 Adding 1 episodes

## 🔍 Graph-Based Query #2

Let's query the graph for transfer rumors:

In [None]:
# Query about Barcelona transfer rumors
fact_results = await search_and_display(graphiti, "Who are the players rumored to move to Barcelona?")

🔍 Searching for: 'Who are the players rumored to move to Barcelona?'
--------------------------------------------------
1. Mark Ogden discusses Marcus Rashford's potential move to Barcelona after the Spanish club were given permission to speak to the player.
   Label: DISCUSS
   📅 Valid from: 2025-07-20 20:37:00+00:00

2. Robert Lewandowski plays for Barcelona.
   Label: PLAYS_FOR
   📅 Valid from: 2025-08-07 13:39:34+00:00

3. Raphinha plays for Barcelona.
   Label: PLAYS_FOR
   📅 Valid from: 2025-08-07 13:39:40+00:00

4. Marcus Rashford landed in Barcelona on Sunday ahead of completing a season-long loan move from Manchester United.
   Label: LOANED_TO
   📅 Valid from: 2025-07-20 00:00:00+00:00

5. Barcelona has season position 2 in the 2023-07-25 to 2024-05-25 period.
   Label: HAS_SEASON_POSITION
   📅 Valid from: 2023-07-25 00:00:00+00:00
   ❌ Invalid at: 2024-05-25 00:00:00+00:00



## 🎯 Node Search

Graphiti also supports searching for specific entities (nodes) in the graph:

In [None]:
# Configure node search
node_search_config = NODE_HYBRID_SEARCH_RRF.model_copy(deep=True)
node_search_config.limit = 2

# Search for specific entity
node_search_results = await graphiti._search(
    query='Hansi Flick',
    config=node_search_config
)

print("🔍 Node Search Results:")
print("-" * 30)
for i, node in enumerate(node_search_results.nodes, 1):
    print(f"{i}. Name: {node.name}")
    print(f"   Summary: {node.summary[:200]}...")
    print()

🔍 Node Search Results:
------------------------------
1. Name: Hansi Flick
   Summary: Hansi Flick is a football coach who was keen to have Marcus Rashford join Barcelona in the upcoming season. The transfer involved Rashford moving on a season-long loan from Manchester United, with an ...

2. Name: Robert Lewandowski
   Summary: Marcus Rashford is set to join Barcelona on a season-long loan from Manchester United, with an option for a permanent transfer next summer. The deal is reported to be around €30 million. Rashford, 27,...
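
The recipe name `NODE_HYBRID_SEARCH_RRF` suggests that results from more than one search method are merged with reciprocal rank fusion (RRF). As an illustration of the general technique only, not of Graphiti's internal implementation, the sketch below fuses two ranked lists by summing `1 / (k + rank)` per item. The function name, the sample rankings, and the constant `k = 60` (a commonly used default) are assumptions for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists: each appearance contributes 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical rankings from a keyword search and a semantic search.
keyword_hits = ["Hansi Flick", "Barcelona", "Marcus Rashford"]
semantic_hits = ["Hansi Flick", "Robert Lewandowski", "Barcelona"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
for name, score in fused:
    print(f"{name}: {score:.4f}")
```

An item ranked well by both lists ("Hansi Flick" here) ends up ahead of items that appear in only one list, which is the behavior a hybrid search relies on.
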



## 🔍 Graph-Based Query #3

Let's ask for the latest news about Manchester United:

In [None]:
# Query for the latest Manchester United news
fact_results = await search_and_display(graphiti, "What the latest news about Manchester United")

🔍 Searching for: 'What the latest news about Manchester United'
--------------------------------------------------
1. Mark Ogden discusses Marcus Rashford's potential move to Barcelona after the Spanish club were given permission to speak to the player.
   Label: DISCUSS
   📅 Valid from: 2025-07-20 20:37:00+00:00

2. Marcus Rashford was told by coach Ruben Amorim that he does not feature in his plans at Old Trafford.
   Label: COACHED_BY
   📅 Valid from: None

3. Marcus Rashford is on loan from Manchester United.
   Label: OWNED_BY
   📅 Valid from: 2025-07-20 00:00:00+00:00

4. Marcus Rashford has not played for United since facing Viktoria Plzen in the Europa League last December.
   Label: PLAYED_AGAINST
   📅 Valid from: 2024-12-01 00:00:00+00:00

5. Marcus Rashford landed in Barcelona on Sunday ahead of completing a season-long loan move from Manchester United.
   Label: LOANED_TO
   📅 Valid from: 2025-07-20 00:00:00+00:00



## 🔄 Part 4: Real-time Updates

One of the key advantages of knowledge graphs is their ability to handle updates and maintain temporal consistency:

In [None]:
# Simulate new transfer updates
new_updates = [
    {
        "content": "Lionel Messi is rumored to be transferring to Barcelona from Inter Miami.",
        "type": EpisodeType.message,
        "description": "Latest transfer rumor update"
    },
    {
        "content": "Mark Ogden reports that Marcus Rashford has renewed his contract with Manchester United until 2028 and he is no longer connected to any move or loan to Barcelona anymore.",
        "type": EpisodeType.message,
        "description": "Previous facts updates - update the old facts"
    }
]

# Add updates to graph
await add_episodes_to_graph(graphiti, new_updates, group_id, prefix="Transfer Update")

📝 Adding 2 episodes to graph...
✅ Successfully added 2 episodes!
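
To make "temporal consistency" concrete, here is a deliberately simplified toy model of invalidation: when a new fact fills the same subject and relation with a different object, the older fact is closed by setting its `invalid_at`. Graphiti decides contradictions during its own ingestion pipeline, so this is not its algorithm. The `Edge` class, `upsert_edge`, and the placeholder names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Edge:
    """Toy fact: subject -[relation]-> obj, with a validity window."""
    subject: str
    relation: str
    obj: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None


def upsert_edge(edges: list[Edge], new_edge: Edge) -> None:
    """Close open edges that the new edge contradicts, then append the new edge."""
    for edge in edges:
        same_slot = edge.subject == new_edge.subject and edge.relation == new_edge.relation
        if same_slot and edge.invalid_at is None and edge.obj != new_edge.obj:
            # The older fact stops being valid when the newer one starts.
            edge.invalid_at = new_edge.valid_at
    edges.append(new_edge)


utc = timezone.utc
edges: list[Edge] = []
upsert_edge(edges, Edge("Player A", "PLAYS_FOR", "Club X", datetime(2024, 12, 1, tzinfo=utc)))
upsert_edge(edges, Edge("Player A", "PLAYS_FOR", "Club Y", datetime(2025, 7, 20, tzinfo=utc)))

for edge in edges:
    print(edge.obj, edge.valid_at.date(), edge.invalid_at.date() if edge.invalid_at else None)
```

The older edge is kept rather than deleted, so questions about the earlier period can still be answered.
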


## 🔍 Updated Query Results

Let's see how our graph handles the updated information:

In [None]:
# Query again about Barcelona transfers - should show updated information
fact_results = await search_and_display(graphiti, "Who are the players rumored to move to Barcelona?")

🔍 Searching for: 'Who are the players rumored to move to Barcelona?'
--------------------------------------------------
1. Lionel Messi is rumored to be transferring to Barcelona from Inter Miami.
   Label: RUMORED_TO_TRANSFER_FROM
   📅 Valid from: 2025-08-07 13:46:58+00:00

2. Lionel Messi is rumored to be transferring to Barcelona from Inter Miami.
   Label: RUMORED_TO_TRANSFER_TO
   📅 Valid from: 2025-08-07 13:46:58+00:00

3. Mark Ogden discusses Marcus Rashford's potential move to Barcelona after the Spanish club were given permission to speak to the player.
   Label: DISCUSS
   📅 Valid from: 2025-07-20 20:37:00+00:00

4. Robert Lewandowski plays for Barcelona.
   Label: PLAYS_FOR
   📅 Valid from: 2025-08-07 13:39:34+00:00

5. Raphinha plays for Barcelona.
   Label: PLAYS_FOR
   📅 Valid from: 2025-08-07 13:39:40+00:00



In [None]:
fact_results = await search_and_display(graphiti, "What is the latest news about Manchester United?")

🔍 Searching for: 'What is the latest news about Manchester United?'
--------------------------------------------------
1. Mark Ogden reports that Marcus Rashford has renewed his contract with Manchester United until 2028.
   Label: REPORTED_BY
   📅 Valid from: 2025-08-07 13:47:06+00:00

2. Marcus Rashford has renewed his contract with Manchester United until 2028.
   Label: HAS_RENEWED_CONTRACT_WITH
   📅 Valid from: 2025-08-07 13:47:06+00:00
   ❌ Invalid at: 2028-01-01 00:00:00+00:00

3. Marcus Rashford has not played for United since facing Viktoria Plzen in the Europa League last December.
   Label: PLAYED_AGAINST
   📅 Valid from: 2024-12-01 00:00:00+00:00

4. Marcus Rashford is on loan from Manchester United.
   Label: OWNED_BY
   📅 Valid from: 2025-07-20 00:00:00+00:00

5. Marcus Rashford was told by coach Ruben Amorim that he does not feature in his plans at Old Trafford.
   Label: COACHED_BY
   📅 Valid from: None



## 🎬 Complex Queries

Let's demonstrate the power of graph-based retrieval with complex queries:

In [None]:
from openai import OpenAI
from typing import Optional
from datetime import datetime

client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))

class FactWithTime:
    def __init__(self, fact: str, valid_at: Optional[datetime], invalid_at: Optional[datetime]):
        self.fact = fact
        self.valid_at = valid_at
        self.invalid_at = invalid_at

def format_temporal_context(facts: list[FactWithTime]) -> str:
    lines = []
    for f in facts:
        line = f"- {f.fact.strip()} (valid from "
        if f.valid_at:
            line += f.valid_at.strftime('%Y-%m-%d')
        else:
            line += "now"

        if f.invalid_at:
            line += f" to {f.invalid_at.strftime('%Y-%m-%d')}"
        else:
            line += ", still valid"
        line += ")"
        lines.append(line)
    return "\n".join(lines)

async def graphiti_agent(graphiti, query: str, num_results: int = 12, model: str = "gpt-4") -> str:
    print(f"🔍 Searching Graphiti for: '{query}'")
    results = await graphiti.search(query, num_results=num_results)

    facts = [
        FactWithTime(fact=r.fact, valid_at=r.valid_at, invalid_at=r.invalid_at)
        for r in results
    ]
    context = format_temporal_context(facts)
    system_prompt = (
    "You are Graphiti, an expert knowledge graph agent.\n"
    "You are given a list of facts, each with a date range indicating when it was valid.\n"
    "Some facts describe past events but may have been added or validated later.\n\n"
    "Your task is to answer the user's question using the facts that describe the relevant time period.\n"
    "The date when a fact became valid is not the same as the time it describes — for example, a fact added in 2025 can still describe the 2023/24 season.\n"
    "Therefore, use any fact that describes the relevant time period, even if the fact itself became valid later.\n\n"
    "If no fact describes the relevant time period, explain clearly why the available facts are insufficient to answer the question.\n"
    "Be transparent about missing or incomplete data if you can't fully answer.\n\n"
    "Respond with a **concise**, factual answer. Use bullet points if helpful, but avoid repeating the context or adding unnecessary commentary."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Facts:\n{context}\n\nQuestion:\n{query.strip()}"}
        ],
        temperature=0.5,
        max_tokens=500
    )

    return response.choices[0].message.content.strip()
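
To see what the model receives as context, the formatting step can be exercised on its own, without a graph or an API key. The sketch below uses a hypothetical single-line helper, `render_fact_line`, that follows the same layout as `format_temporal_context`; the two sample facts and their dates are taken from the search outputs earlier in this notebook.

```python
from datetime import datetime, timezone
from typing import Optional


def render_fact_line(fact: str, valid_at: Optional[datetime], invalid_at: Optional[datetime]) -> str:
    """Render one fact with its validity window as a bullet line."""
    start = valid_at.strftime("%Y-%m-%d") if valid_at else "now"
    end = f" to {invalid_at.strftime('%Y-%m-%d')}" if invalid_at else ", still valid"
    return f"- {fact.strip()} (valid from {start}{end})"


utc = timezone.utc
lines = [
    render_fact_line(
        "Real Madrid has 95 season points in the period 2023-07-25 to 2024-05-25.",
        datetime(2023, 7, 25, tzinfo=utc),
        datetime(2024, 5, 25, tzinfo=utc),
    ),
    render_fact_line(
        "Robert Lewandowski plays for Barcelona.",
        datetime(2025, 8, 7, 13, 39, 34, tzinfo=utc),
        None,
    ),
]
print("\n".join(lines))
# - Real Madrid has 95 season points in the period 2023-07-25 to 2024-05-25. (valid from 2023-07-25 to 2024-05-25)
# - Robert Lewandowski plays for Barcelona. (valid from 2025-08-07, still valid)
```
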


In [None]:
# 🎯 Complex Query Demo — requires reasoning across multiple facts
print("🎯 Complex Query Demo")
print("=" * 50)

queries = [
    "Compare Real Madrid and Barcelona performance between the 2023/24 and 2024/25 seasons.",
    "Where does the player with the most goals currently play?",
    "How many points was to the club that Messi rumored to",
]

for query in queries:
    try:
        answer = await graphiti_agent(graphiti, query)
        print("✅ Answer:", answer)
    except Exception as e:
        print("❌ Error:", str(e))
    print("-" * 50)


🎯 Complex Query Demo
🔍 Searching Graphiti for: 'Compare Real Madrid and Barcelona performance between the 2023/24 and 2024/25 seasons.'
✅ Answer: **2023/24 Season:**
- Real Madrid:
  - Position: 1
  - Points: 95
  - Wins: 29
- Barcelona:
  - Position: 2
  - Points: 85
  - Wins: 26

**2024/25 Season:**
- Real Madrid:
  - Position: 2
  - Points: 84
  - Wins: 26
- Barcelona:
  - Position: 1
  - Points: 88
  - Wins: 28

In the 2023/24 season, Real Madrid performed better than Barcelona, but in the 2024/25 season, Barcelona outperformed Real Madrid.
--------------------------------------------------
🔍 Searching Graphiti for: 'Where does the player with the most goals currently play?'
✅ Answer: The player with the most goals in the 24/25 season, according to the provided facts, is Kylian Mbappé with 31 goals. He played for Real Madrid during that season.
--------------------------------------------------
🔍 Searching Graphiti for: 'How many points was to the club that Messi rumored to'
✅ Answ