fix: remove PII from log messages by prasmussen15 · Pull Request #1237 · getzep/graphiti

prasmussen15 · 2026-02-17T19:47:46Z

Summary

Remove entity names, edge names/facts, and full LLM message content from log messages to prevent PII from leaking into logs
Replace with UUIDs, counts, and structural metadata only
Affects WARNING, ERROR, and DEBUG level logs across the ingestion pipeline

Files changed

graphiti_core/utils/maintenance/edge_operations.py — Remove entity names from 3 WARNING-level logs (source/target entity not found). Replace full edge objects and (name, uuid) tuples with UUID-only lists in 6 DEBUG logs.
graphiti_core/utils/maintenance/node_operations.py — Remove entity names from 2 WARNING logs (unknown entity summary, invalid duplicate name). Replace (name, uuid) tuples with UUID-only lists in 5 DEBUG logs. Remove entity names from dedup sampling logs.
graphiti_core/llm_client/client.py — Replace full message content dump in _get_failed_generation_log() (called at ERROR level by Gemini client) with message count and role metadata.
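
As a rough illustration of the "(name, uuid) tuples to UUID-only lists" change described above, the sketch below builds a log-safe description of a node list. The `Node` dataclass and the `describe_nodes` helper are hypothetical stand-ins for illustration, not code from graphiti.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Node:
    """Hypothetical stand-in for an entity node; not graphiti's real class."""
    uuid: str
    name: str  # may contain PII, so it must never reach the logs


def describe_nodes(nodes: list[Node]) -> str:
    """Return a log-safe description: a count plus UUIDs, never names."""
    return f'{len(nodes)} node(s), uuids: {[n.uuid for n in nodes]}'


nodes = [Node(uuid='u1', name='John Smith'), Node(uuid='u2', name='Jane Doe')]
# Lazy %-style formatting: the string is only built if DEBUG is enabled.
logger.debug('Extracted %s', describe_nodes(nodes))
```

The UUIDs can still be joined against the graph when someone with access needs the names, so the log line stays traceable without carrying user data.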

What was leaking

# BEFORE — entity names in WARNING logs
WARNING: Source entity name "John Smith" not found in nodes for edge WORKS_AT
WARNING: LLM returned summary for unknown entity: Jane Doe

# BEFORE — full LLM input content in ERROR logs
ERROR: Input messages: [{"role": "user", "content": "Extract entities from: John called Mary..."}]

# AFTER — only UUIDs and metadata
WARNING: Source entity not found in nodes for extracted edge
WARNING: LLM returned summary for unknown entity
ERROR: Input messages: 2 message(s), roles: ['system', 'user']
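
A minimal sketch of how the AFTER form of the ERROR line could be produced. It assumes messages are plain dicts with a `role` key; the real client may use a message class instead, and `summarize_messages` is a hypothetical name.

```python
def summarize_messages(messages: list[dict[str, str]]) -> str:
    """Describe LLM input by count and role only, never by content."""
    roles = [m.get('role', 'unknown') for m in messages]
    return f'{len(messages)} message(s), roles: {roles}'


messages = [
    {'role': 'system', 'content': 'You extract entities.'},
    {'role': 'user', 'content': 'Extract entities from: John called Mary...'},
]
log_line = f'Input messages: {summarize_messages(messages)}'
```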

Test plan

ruff check — 0 errors
pyright — 0 errors, 0 warnings
282 unit tests pass (1 pre-existing failure unrelated to this change)

🤖 Generated with Claude Code

Remove entity names, edge facts, and LLM input/output content from log messages to prevent personally identifiable information from leaking into logs. Replace with UUIDs, counts, and structural metadata only.

Changes:
- edge_operations.py: Remove entity names from WARNING logs, replace full edge objects and name tuples with UUIDs in DEBUG logs
- node_operations.py: Remove entity names from WARNING and DEBUG logs, log only UUIDs and counts instead of (name, uuid) tuples
- llm_client/client.py: Replace full message content dump in _get_failed_generation_log with message count and role metadata

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-02-17T19:49:11Z

graphiti_core/llm_client/client.py

-                log += f'Raw output: {output[:2000]}... (truncated) ...{output[-2000:]}\n'
-            else:
-                log += f'Raw output: {output}\n'
+            log += f'Raw output length: {len(output)} chars\n'


Completely omitting the raw output content removes valuable debugging information. When LLM generation fails, knowing the actual malformed output (JSON parsing errors, unexpected format, etc.) is crucial for diagnosing issues.

Consider truncating the output instead of just showing the length:

Suggested change

-            log += f'Raw output length: {len(output)} chars\n'
+        if output is not None:
+            # Truncate to avoid PII while preserving debug value
+            truncated = output[:500] + '...' if len(output) > 500 else output
+            log += f'Raw output (truncated): {truncated}\n'

This balances PII protection (500 chars is unlikely to contain substantial user data in a structured LLM response) with debuggability.
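
The same truncation idea can be factored into a small helper so that every call site applies one limit. This is a sketch only: `truncate_for_log` is a hypothetical name, and the 500-character default simply mirrors the suggestion above.

```python
from typing import Optional


def truncate_for_log(value: Optional[str], limit: int = 500) -> str:
    """Return at most `limit` leading characters, noting the full length."""
    if value is None:
        return '<none>'
    if len(value) <= limit:
        return value
    # Keep the head of the output and record how much was dropped.
    return f'{value[:limit]}... ({len(value)} chars total)'
```

Truncation reduces exposure but does not guarantee it: whatever falls inside the first `limit` characters is still logged verbatim.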

claude · 2026-02-17T19:49:14Z

graphiti_core/utils/maintenance/edge_operations.py

-                f'Source entity name "{source_name}" not found in nodes '
-                f'for edge {edge_data.relation_type}'
-            )
+            logger.warning('Source entity not found in nodes for extracted edge')


The warning now lacks enough context to diagnose the issue. When an LLM returns invalid entity names, operators need to know which edge relationship was affected to understand if there's a pattern (e.g., certain relation types consistently failing).

Consider keeping the relation type which is schema/metadata, not user PII:

Suggested change

-            logger.warning('Source entity not found in nodes for extracted edge')
+            logger.warning(
+                f'Source entity not found in nodes for edge relation: {edge_data.relation_type}'
+            )

claude · 2026-02-17T19:49:16Z

graphiti_core/utils/maintenance/edge_operations.py

-                f'Target entity name "{target_name}" not found in nodes '
-                f'for edge {edge_data.relation_type}'
-            )
+            logger.warning('Target entity not found in nodes for extracted edge')


Same concern - the relation type is not PII and helps diagnose which edge extractions are failing.

claude · 2026-02-17T19:49:19Z

graphiti_core/utils/maintenance/node_operations.py

        # Check if this entity type should be excluded
        if excluded_entity_types and entity_type_name in excluded_entity_types:
-            logger.debug(f'Excluding entity "{extracted_entity.name}" of type "{entity_type_name}"')
+            logger.debug(f'Excluding entity of type "{entity_type_name}"')


The entity type is not PII - it's schema metadata (e.g., "Person", "Organization"). This information is valuable for debugging without exposing user data. The original line was fine:

Suggested change

-            logger.debug(f'Excluding entity of type "{entity_type_name}"')
+            logger.debug(f'Excluding entity of type "{entity_type_name}"')

Wait, looking again, the suggestion is identical to the new line. The original line included the entity name, which is correct to remove. This change looks appropriate.

claude · 2026-02-17T19:49:27Z

graphiti_core/utils/maintenance/node_operations.py

            logger.warning(
-                'Invalid duplicate_name %r for extracted node %s; treating as no duplicate.',
-                duplicate_name,
+                'Invalid duplicate_name for extracted node %s; treating as no duplicate.',


Removing duplicate_name from this warning makes it impossible to diagnose LLM deduplication issues. The duplicate name returned by the LLM is structural metadata about how entities were matched, not necessarily user PII.

If the LLM consistently returns invalid duplicate names, operators need to see what value was returned to identify patterns (e.g., case sensitivity issues, partial matches, etc.).

Consider at minimum logging whether a duplicate_name was provided:

Suggested change

-                'Invalid duplicate_name for extracted node %s; treating as no duplicate.',
+            logger.warning(
+                'Invalid duplicate_name for extracted node %s; treating as no duplicate. '
+                'duplicate_name was: %r',
+                extracted_node.uuid,
+                duplicate_name[:50] + '...' if len(duplicate_name) > 50 else duplicate_name,
+            )

Or if truly concerned about PII, at least log the length to distinguish empty vs invalid values.

claude · 2026-02-17T19:49:30Z

graphiti_core/utils/maintenance/node_operations.py

                node.summary = truncated_summary
        else:
-            logger.warning(f'LLM returned summary for unknown entity: {summarized_entity.name}')
+            logger.warning('LLM returned summary for unknown entity')


This warning now provides no actionable information. When an LLM returns a summary for an entity that doesn't exist, it indicates either:

- A bug in the summarization prompt/context
- An LLM hallucination pattern

Without knowing which entity name the LLM returned, there's no way to debug this. Consider logging the summarized_entity.name truncated:

Suggested change

-            logger.warning('LLM returned summary for unknown entity')
+            logger.warning(
+                'LLM returned summary for unknown entity (first 30 chars): %.30s',
+                summarized_entity.name,
+            )

claude · 2026-02-17T19:49:42Z

Review Summary

The goal of removing PII from logs is important for security and compliance. However, this PR overcorrects in several places, removing debugging information that is not PII while making the logs nearly useless for diagnosing production issues.

Main Concerns

Distinguishing PII from metadata: Entity names can contain PII, but relation types (e.g., WORKS_AT, LIVES_IN) and entity types (e.g., Person, Organization) are schema metadata, not user data. These should be preserved in logs.
Debuggability regression: Several warnings now provide no actionable context:
- "Source entity not found in nodes for extracted edge" - which edge? what relation?
- "LLM returned summary for unknown entity" - what name did the LLM hallucinate?
- "Invalid duplicate_name for extracted node" - what value did the LLM return?
Raw output removal in _get_failed_generation_log: Completely removing the LLM's raw output means failed generation errors become nearly impossible to debug. The raw output typically contains structural/formatting issues, not user PII. Consider truncating rather than omitting entirely.

Recommendations

Preserve relation types and entity types in logs (these are not PII)
Truncate potentially sensitive values rather than removing them entirely
Consider structured logging with a separate "sensitive" field that can be filtered at the log collector level, rather than removing from source
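
One way the structured-logging recommendation might look with the standard library is sketched below. It assumes a convention, not present in the PR, that call sites put anything potentially sensitive under a single `sensitive` key in `extra`. The filter here runs in-process on a handler as an approximation; a log collector could apply the same rule to the structured field instead.

```python
import logging


class RedactSensitiveFilter(logging.Filter):
    """Replace the hypothetical `sensitive` record attribute before output."""

    def filter(self, record: logging.LogRecord) -> bool:
        if hasattr(record, 'sensitive'):
            record.sensitive = '[REDACTED]'
        return True  # keep the record; only the sensitive field is changed


logger = logging.getLogger('pii_demo')
handler = logging.StreamHandler()
handler.addFilter(RedactSensitiveFilter())
logger.addHandler(handler)

# Call sites keep full context; the handler decides what leaves the process.
logger.warning(
    'LLM returned summary for unknown entity',
    extra={'sensitive': {'entity_name': 'Jane Doe'}},
)
```

The design trade-off is that sensitive values still exist on the record inside the process, so this only helps if every handler or downstream collector applies the redaction.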

See inline comments for specific suggestions.

Address review feedback: the initial PII fix overcorrected by removing non-PII debugging context.

- Restore relation types in edge WARNING logs (schema metadata, not PII)
- Restore truncated duplicate_name in dedup WARNING (needed for diagnosis)
- Restore truncated entity name (first 30 chars) in summary WARNING
- Restore truncated raw LLM output (first 500 chars) in failed generation ERROR logs; malformed output is structural, not user content

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-02-17T19:53:29Z

graphiti_core/utils/maintenance/edge_operations.py

-            logger.warning(
-                f'Could not find nodes for edge: {edge_data.source_entity_name} -> {edge_data.target_entity_name}'
-            )
+            logger.warning('Could not find source or target node for extracted edge')


This warning still lacks the relation type context that was restored in the similar warnings above (lines 157, 164). Since edge_data is in scope here, consider including edge_data.relation_type for consistency:

Suggested change

-            logger.warning('Could not find source or target node for extracted edge')
+            logger.warning('Could not find source or target node for edge relation: %s', edge_data.relation_type)

prasmussen15 temporarily deployed to development February 17, 2026 19:47 — with GitHub Actions Inactive

prasmussen15 temporarily deployed to development February 17, 2026 19:52 — with GitHub Actions Inactive

prasmussen15 merged commit 7985997 into main Feb 17, 2026
13 checks passed

prasmussen15 deleted the fix/remove-pii-from-logs branch February 17, 2026 19:58

getzep locked and limited conversation to collaborators Feb 17, 2026

prasmussen15 merged 2 commits into main from fix/remove-pii-from-logs
