Open
22 commits
972b263
Add fact parser validation, tests, and bulk fact loading with tests
ColtonPayne Jan 21, 2026
e498f4a
Don't hardcode default values
ColtonPayne Jan 21, 2026
0e2e395
Prevent predicates from starting with a digit
ColtonPayne Jan 21, 2026
cf592e2
Fix typo in MAKEFILE
ColtonPayne Jan 21, 2026
0908471
Improve CSV loader tests
ColtonPayne Jan 21, 2026
5a78cea
Add fact string formatting rules in docstring
ColtonPayne Jan 21, 2026
83a2452
Remove extranious f string for linter
ColtonPayne Jan 21, 2026
ee0ea04
Fix api test file loading
ColtonPayne Jan 21, 2026
f1395dc
Add test for example with no header
ColtonPayne Jan 21, 2026
0e3db89
Make invalid csv file loads raise exceptions by default
ColtonPayne Jan 25, 2026
a402321
Upd tests
ColtonPayne Jan 25, 2026
d1cb309
Add support for negated interval and negated explicit true/false
ColtonPayne Jan 30, 2026
b903281
Load facts from json instead of csv
ColtonPayne Jan 30, 2026
cad2d95
Update file loading tests
ColtonPayne Jan 30, 2026
3eecf32
Revert
ColtonPayne Jan 30, 2026
05b3748
Final cleanup
ColtonPayne Jan 31, 2026
0c79988
Add back csv file loading and add duplicate name checks
ColtonPayne Feb 3, 2026
b1ea06c
Merge branch 'main' into input-validation
ColtonPayne Feb 3, 2026
01a20a4
CSV Formatting
ColtonPayne Feb 4, 2026
42d54e2
Add back load rules from file
ColtonPayne Feb 4, 2026
f4fbcb0
Merge branch 'input-validation' of github.com:lab-v2/pyreason into in…
ColtonPayne Feb 4, 2026
3ba07ec
Requrie exact header match for csv headers
ColtonPayne Feb 4, 2026
2 changes: 1 addition & 1 deletion Makefile
@@ -85,7 +85,7 @@ test-api: ## Run only API tests (tests/api_tests)

 test-jit: ## Run only JIT-disabled tests (tests/unit/disable_jit)
 	@echo "$(BOLD)$(BLUE)Running JIT-disabled tests...$(RESET)"
-	$(RUN_TESTS) --suite don_disable_jit
+	$(RUN_TESTS) --suite dont_disable_jit

 test-no-jit: ## Run only JIT-enabled tests (tests/unit/dont_disable_jit)
 	@echo "$(BOLD)$(BLUE)Running JIT-enabled tests...$(RESET)"
325 changes: 324 additions & 1 deletion pyreason/pyreason.py

Large diffs are not rendered by default.

40 changes: 39 additions & 1 deletion pyreason/scripts/facts/fact.py
@@ -6,7 +6,45 @@ class Fact:
     def __init__(self, fact_text: str, name: str = None, start_time: int = 0, end_time: int = 0, static: bool = False):
         """Define a PyReason fact that can be loaded into the program using `pr.add_fact()`

-        :param fact_text: The fact in text format. Example: `'pred(x,y) : [0.2, 1]'` or `'pred(x,y) : True'`
+        :param fact_text: The fact in text format. Must follow these formatting rules:
+
+        **Format:** `predicate(component)` or `predicate(component):bound`
+
+        **Predicate rules:**
+        - Must start with a letter (a-z, A-Z) or underscore (_)
+        - Can contain letters, digits (0-9), and underscores
+        - Cannot start with a digit
+        - Examples: `viewed`, `has_access`, `_internal`, `pred123`
+
+        **Component rules:**
+        - Node fact: Single component `predicate(node1)`
+        - Edge fact: Two components separated by comma `predicate(node1,node2)`
+        - Cannot contain parentheses, colons, or nested structures
+
+        **Bound rules:**
+        - If omitted, defaults to True (1.0)
+        - Boolean: `True` or `False` (case-insensitive)
+        - Interval: `[lower,upper]` where both values are in range [0, 1]
+        - Negation: `~predicate(component)`
+
+        **Valid examples:**
+        - `'viewed(Zach)'` - defaults to True
+        - `'viewed(Zach):True'` - explicit True
+        - `'viewed(Zach):False'` - explicit False
+        - `'~viewed(Zach)'` - negation (False)
+        - `'viewed(Zach):[0.5,0.8]'` - interval bound
+        - `'connected(Alice,Bob)'` - edge fact
+        - `'connected(Alice,Bob):[0.7,0.9]'` - edge fact with interval
+        - `'~pred(node):[0.2,0.8]'` - negation with explicit bound
+        NOTE: Negating an explicit bound will round the upper and lower bounds to 10 decimal places before taking the negation
+        This is needed to avoid floating point precision errors.
+
+        **Invalid examples:**
+        - `'123pred(node)'` - predicate starts with digit
+        - `'pred@name(node)'` - invalid characters in predicate
+        - `'pred(node1,node2,node3)'` - more than 2 components
+        - `'pred(node):[1.5,2.0]'` - values out of range [0,1]
+
         :type fact_text: str
         :param name: The name of the fact. This will appear in the trace so that you know when it was applied
         :type name: str
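The predicate rules in the docstring above can be checked in isolation. The following is a minimal sketch rather than PyReason code: it reuses the regular expression that `fact_parser.py` applies in this PR, while the helper name `is_valid_predicate` and the constant `PREDICATE_PATTERN` are hypothetical.

```python
import re

# Same pattern the PR applies in fact_parser.py: a letter or underscore first,
# followed by any mix of letters, digits, and underscores.
PREDICATE_PATTERN = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')


def is_valid_predicate(pred: str) -> bool:
    """Hypothetical helper: True if `pred` satisfies the docstring's predicate rules."""
    return PREDICATE_PATTERN.match(pred) is not None


if __name__ == "__main__":
    # Candidates taken from the docstring's valid and invalid examples.
    for candidate in ["viewed", "has_access", "_internal", "pred123", "123pred", "pred@name"]:
        print(f"{candidate!r}: {is_valid_predicate(candidate)}")
```

The first four candidates come from the docstring's valid examples and the last two from its invalid examples.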
147 changes: 135 additions & 12 deletions pyreason/scripts/utils/fact_parser.py
@@ -1,40 +1,163 @@
 import pyreason.scripts.numba_wrapper.numba_types.interval_type as interval
+import re


+# Input validation work was implemented with the help of Claude Sonnet 4.5.
 def parse_fact(fact_text):
+    # Validate input is not empty or whitespace only
+    if not fact_text or not fact_text.strip():
+        raise ValueError("Fact text cannot be empty or whitespace only")
+
     f = fact_text.replace(' ', '')

+    # Check for multiple colons
+    colon_count = f.count(':')
+    if colon_count > 1:
+        raise ValueError(f"Fact text contains multiple colons ({colon_count}), expected at most 1")
+
+    # Check for double negation
+    if f.startswith('~~'):
+        raise ValueError("Double negation is not allowed")
+
     # Separate into predicate-component and bound. If there is no bound it means it's true
+    negate_interval = False
     if ':' in f:
-        pred_comp, bound = f.split(':')
+        parts = f.split(':')
+        if len(parts) != 2:
+            raise ValueError("Invalid fact format: expected at most one colon separator")
+        pred_comp, bound = parts
+
+        # Check for negation with explicit bound
+        if pred_comp.startswith('~'):
+            pred_comp = pred_comp[1:]
+            if bound.lower() == 'true':
+                bound = 'False'
+            elif bound.lower() == 'false':
+                bound = 'True'
+            else:
+                negate_interval = True
     else:
         pred_comp = f
-        if pred_comp[0] == '~':
+        if pred_comp.startswith('~'):
             bound = 'False'
             pred_comp = pred_comp[1:]
         else:
             bound = 'True'

-    # Check if bound is a boolean or a list of floats
-    bound = bound.lower()
-    if bound == 'true':
-        bound = interval.closed(1, 1)
-    elif bound == 'false':
-        bound = interval.closed(0, 0)
-    else:
-        bound = [float(b) for b in bound[1:-1].split(',')]
-        bound = interval.closed(*bound)
+    # Validate predicate-component is not empty
+    if not pred_comp:
+        raise ValueError("Predicate-component cannot be empty")
+
+    # Validate parentheses exist and are properly formed
+    if '(' not in pred_comp:
+        raise ValueError("Missing opening parenthesis in fact")
+    if ')' not in pred_comp:
+        raise ValueError("Missing closing parenthesis in fact")
+
+    # Check for nested or multiple parentheses
+    open_count = pred_comp.count('(')
+    close_count = pred_comp.count(')')
+    if open_count != 1 or close_count != 1:
+        raise ValueError(f"Invalid parentheses: found {open_count} '(' and {close_count} ')', expected exactly 1 of each")
+
+    # Check parentheses are in correct order
+    open_idx = pred_comp.find('(')
+    close_idx = pred_comp.find(')')
+    if open_idx >= close_idx:
+        raise ValueError("Invalid parentheses order: '(' must come before ')'")
+
+    # Check closing parenthesis is at the end
+    if close_idx != len(pred_comp) - 1:
+        raise ValueError("Closing parenthesis must be at the end of predicate-component")

     # Split the predicate and component
     idx = pred_comp.find('(')
     pred = pred_comp[:idx]
     component = pred_comp[idx + 1:-1]

+    # Validate predicate is not empty
+    if not pred:
+        raise ValueError("Predicate cannot be empty")
+
+    # Validate predicate contains only valid characters (alphanumeric and underscore)
+    # Predicates must start with a letter or underscore (like Python identifiers)
+    if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', pred):
+        if pred[0].isdigit():
+            raise ValueError(f"Predicate '{pred}' cannot start with a digit. Must start with a letter or underscore")
+        else:
+            raise ValueError(f"Predicate '{pred}' contains invalid characters. Only letters, digits, and underscores allowed, must start with letter or underscore")
+
+    # Validate component is not empty
+    if not component:
+        raise ValueError("Component cannot be empty")
+
+    # Check for invalid characters in component
+    if '(' in component or ')' in component:
+        raise ValueError("Component cannot contain parentheses")
+    if ':' in component:
+        raise ValueError("Component cannot contain colons")
+
     # Check if it is a node or edge fact
     if ',' in component:
         fact_type = 'edge'
-        component = tuple(component.split(','))
+        components = component.split(',')
+
+        # Validate exactly 2 components for edges
+        if len(components) != 2:
+            raise ValueError(f"Edge facts must have exactly 2 components, found {len(components)}")
+
+        # Validate no empty components
+        for i, comp in enumerate(components):
+            if not comp:
+                raise ValueError(f"Component {i+1} in edge fact cannot be empty")
+
+        component = tuple(components)
     else:
         fact_type = 'node'

+    # Check if bound is a boolean or a list of floats
+    if bound.lower() == 'true':
+        bound = interval.closed(1, 1)
+    elif bound.lower() == 'false':
+        bound = interval.closed(0, 0)
+    else:
+        # Validate interval format
+        if not bound.startswith('['):
+            raise ValueError(f"Invalid bound format: expected '[' at start of interval, got '{bound[0] if bound else 'empty'}'")
+        if not bound.endswith(']'):
+            raise ValueError(f"Invalid bound format: expected ']' at end of interval, got '{bound[-1] if bound else 'empty'}'")
+
+        # Extract values between brackets
+        interval_content = bound[1:-1]
+        if not interval_content:
+            raise ValueError("Interval cannot be empty")
+
+        # Parse float values
+        parts = interval_content.split(',')
+        if len(parts) != 2:
+            raise ValueError(f"Interval must have exactly 2 values, found {len(parts)}")
+
+        try:
+            bound_values = [float(b) for b in parts]
+        except ValueError as e:
+            raise ValueError(f"Invalid interval values: {e}")
+
+        lower, upper = bound_values
+        # Validate bounds are in valid range [0, 1]
+        if lower < 0 or lower > 1:
+            raise ValueError(f"Interval lower bound {lower} is out of valid range [0, 1]")
+        if upper < 0 or upper > 1:
+            raise ValueError(f"Interval upper bound {upper} is out of valid range [0, 1]")
+
+        # Validate lower <= upper
+        if lower > upper:
+            raise ValueError(f"Interval lower bound {lower} cannot be greater than upper bound {upper}")
+
+        # We calculate ~[l,u] = [1-u, 1-l]
+        # Round to eliminate floating point precision errors (e.g., 1 - 0.8 = 0.19999999...)
+        if negate_interval:
+            lower, upper = round(1 - upper, 10), round(1 - lower, 10)
+
+        bound = interval.closed(lower, upper)

Member


@dyumanaditya Looks good to me, can you check this once?

     return pred, component, bound, fact_type
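The interval negation rule in the parser, `~[l,u] = [1-u, 1-l]` followed by rounding to 10 decimal places, can be tried without PyReason installed. This is a minimal sketch under that reading of the diff: `negate_bounds` is a hypothetical name, and it returns a plain tuple instead of the `interval.closed(...)` object the real parser builds.

```python
from typing import Tuple


def negate_bounds(lower: float, upper: float, ndigits: int = 10) -> Tuple[float, float]:
    """Hypothetical helper for the rule ~[l, u] = [1 - u, 1 - l].

    Each result is rounded to `ndigits` decimal places so that binary floating
    point error from the subtraction does not leak into the bounds.
    """
    return round(1 - upper, ndigits), round(1 - lower, ndigits)


if __name__ == "__main__":
    # The raw subtraction is not exactly 0.2 in binary floating point.
    print("raw 1 - 0.8:", 1 - 0.8)
    # The rounded negation of [0.2, 0.8] is printed for comparison.
    print("negated [0.2, 0.8]:", negate_bounds(0.2, 0.8))
```

Rounding after the subtraction keeps a negated fact such as `~pred(node):[0.2,0.8]` comparable with a bound that was written out directly.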
19 changes: 19 additions & 0 deletions tests/api_tests/test_files/example_facts.csv
@@ -0,0 +1,19 @@
fact_text,name,start_time,end_time,static
Viewed(Zach),seen-fact-zach,0,3,False
Viewed(Justin),seen-fact-justin,0,3,true
Viewed(Michelle),seen-fact-michelle,1,3,FALSE
Viewed(Amy),seen-fact-amy,2,3,0
"HaveAccess(Zach,TextMessage)",access-zach,0,5,True
"HaveAccess(Justin,TextMessage)",access-justin,0,5,1
"HaveAccess(Michelle,TextMessage)",access-michelle,0,5,yes
"HaveAccess(Amy,TextMessage)",access-amy,0,5,no
"Processed(Node1):[0.5,0.8]",interval-node,0,10,False
"Knows(Person1,Person2):[0.7,0.9]",interval-edge,0,10,True
Viewed(Valid),valid-fact,0,3,False
,empty-fact-text,0,3,False
InvalidSyntax,bad-syntax,0,3,False
"Viewed(OutOfRange):[2.5,3.0]",out-of-range,0,3,False
Viewed(BadStartTime),bad-start,abc,5,False
Viewed(BadEndTime),bad-end,0,xyz,False
Viewed(BadStaticValue),bad-static,0,5,invalid
Viewed(EmptyOptionals),,,,
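The rows in this test file exercise several spellings of the `static` column (`False`, `true`, `FALSE`, `0`, `1`, `yes`, `no`), non-integer times, and empty optional cells. The loader itself lives in `pyreason/pyreason.py`, whose diff is not rendered here, so the following is only a sketch of how such cells could be coerced. The names `parse_static`, `parse_time`, and `coerce_row` and the accepted token sets are assumptions; the fallback values are the defaults from the `Fact.__init__` signature shown earlier in this PR.

```python
from typing import Dict, List, Union

# Assumed token sets, inferred from the values used in the example file.
TRUE_TOKENS = {"true", "1", "yes"}
FALSE_TOKENS = {"false", "0", "no"}


def parse_static(raw: str) -> bool:
    """Map a `static` cell to a bool; an empty cell falls back to False."""
    token = raw.strip().lower()
    if token == "":
        return False
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    raise ValueError(f"Invalid static value: {raw!r}")


def parse_time(raw: str, field: str) -> int:
    """Map a time cell to an int; an empty cell falls back to 0."""
    if raw.strip() == "":
        return 0
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{field} must be an integer, got {raw!r}") from None


def coerce_row(record: List[str]) -> Dict[str, Union[str, int, bool, None]]:
    """Turn one five-column CSV record into keyword arguments for a fact."""
    if len(record) != 5:
        raise ValueError(f"Expected 5 columns, found {len(record)}")
    fact_text, name, start_time, end_time, static = record
    if not fact_text.strip():
        raise ValueError("fact_text cannot be empty")
    return {
        "fact_text": fact_text,
        "name": name or None,  # empty name falls back to None
        "start_time": parse_time(start_time, "start_time"),
        "end_time": parse_time(end_time, "end_time"),
        "static": parse_static(static),
    }


if __name__ == "__main__":
    print(coerce_row(["Viewed(Justin)", "seen-fact-justin", "0", "3", "true"]))
    print(coerce_row(["Viewed(EmptyOptionals)", "", "", "", ""]))
```

This sketch does not check the syntax of `fact_text`, so a row such as `InvalidSyntax` would only be rejected later, when the fact parser runs.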
121 changes: 121 additions & 0 deletions tests/api_tests/test_files/example_facts.json
@@ -0,0 +1,121 @@
[
{
"fact_text": "Viewed(Zach)",
"name": "seen-fact-zach",
"start_time": 0,
"end_time": 3,
"static": false
},
{
"fact_text": "Viewed(Justin)",
"name": "seen-fact-justin",
"start_time": 0,
"end_time": 3,
"static": true
},
{
"fact_text": "Viewed(Michelle)",
"name": "seen-fact-michelle",
"start_time": 1,
"end_time": 3,
"static": false
},
{
"fact_text": "Viewed(Amy)",
"name": "seen-fact-amy",
"start_time": 2,
"end_time": 3,
"static": 0
},
{
"fact_text": "HaveAccess(Zach,TextMessage)",
"name": "access-zach",
"start_time": 0,
"end_time": 5,
"static": true
},
{
"fact_text": "HaveAccess(Justin,TextMessage)",
"name": "access-justin",
"start_time": 0,
"end_time": 5,
"static": 1
},
{
"fact_text": "HaveAccess(Michelle,TextMessage)",
"name": "access-michelle",
"start_time": 0,
"end_time": 5,
"static": "yes"
},
{
"fact_text": "HaveAccess(Amy,TextMessage)",
"name": "access-amy",
"start_time": 0,
"end_time": 5,
"static": "no"
},
{
"fact_text": "Processed(Node1):[0.5,0.8]",
"name": "interval-node"
},
{
"fact_text": "Knows(Person1,Person2):[0.7,0.9]",
"name": "interval-edge",
"start_time": 0,
"end_time": 10,
"static": true
},
{
"fact_text": "Viewed(Valid)",
"name": "valid-fact",
"start_time": 0,
"end_time": 3,
"static": false
},
{
"fact_text": "",
"name": "empty-fact-text",
"start_time": 0,
"end_time": 3,
"static": false
},
{
"fact_text": "InvalidSyntax",
"name": "bad-syntax",
"start_time": 0,
"end_time": 3,
"static": false
},
{
"fact_text": "Viewed(OutOfRange):[2.5,3.0]",
"name": "out-of-range",
"start_time": 0,
"end_time": 3,
"static": false
},
{
"fact_text": "Viewed(BadStartTime)",
"name": "bad-start",
"start_time": "abc",
"end_time": 5,
"static": false
},
{
"fact_text": "Viewed(BadEndTime)",
"name": "bad-end",
"start_time": 0,
"end_time": "xyz",
"static": false
},
{
"fact_text": "Viewed(BadStaticValue)",
"name": "bad-static",
"start_time": 0,
"end_time": 5,
"static": "invalid"
},
{
"fact_text": "Viewed(EmptyOptionals)"
}
]
Member


Add cases where bounds are specified. Edge cases where bounds are wrong - non-numeric or outside [0,1] or lower > upper.

Collaborator Author


These checks are a part of the fact parser updates in the PR here. The load_from_json() validates that the csv inputs are integer values - see my response to the comment you left above.
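The bound edge cases raised in this thread (non-numeric values, values outside [0,1], and lower > upper) can be illustrated with a standalone check. This sketch loosely follows the interval branch of `parse_fact` in this PR but is not that function: `parse_interval` and `BAD_BOUNDS` are hypothetical names, and the result is a plain tuple so the example runs without PyReason.

```python
from typing import Tuple


def parse_interval(bound: str) -> Tuple[float, float]:
    """Hypothetical stand-in for the interval checks in parse_fact."""
    if not (bound.startswith('[') and bound.endswith(']')):
        raise ValueError("Interval must be wrapped in square brackets")
    parts = bound[1:-1].split(',')
    if len(parts) != 2:
        raise ValueError(f"Interval must have exactly 2 values, found {len(parts)}")
    try:
        lower, upper = (float(p) for p in parts)
    except ValueError as exc:
        raise ValueError(f"Invalid interval values: {exc}") from None
    if not (0 <= lower <= 1 and 0 <= upper <= 1):
        raise ValueError(f"Interval [{lower}, {upper}] is out of valid range [0, 1]")
    if lower > upper:
        raise ValueError(f"Lower bound {lower} cannot be greater than upper bound {upper}")
    return lower, upper


# One input per failure mode: non-numeric, out of range, inverted, wrong arity, no brackets.
BAD_BOUNDS = ["[a,b]", "[2.5,3.0]", "[0.8,0.2]", "[0.5]", "0.5,0.8"]

if __name__ == "__main__":
    print(parse_interval("[0.5,0.8]"))
    for text in BAD_BOUNDS:
        try:
            parse_interval(text)
        except ValueError as exc:
            print(f"{text!r} rejected: {exc}")
```

Each entry in `BAD_BOUNDS` could be turned into a `fact_text` such as `Viewed(X):[0.8,0.2]` to cover the same cases through the file loaders.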

6 changes: 6 additions & 0 deletions tests/api_tests/test_files/example_facts_no_headers.csv
@@ -0,0 +1,6 @@
Viewed(Alice),fact-alice,0,5,False
Viewed(Bob),fact-bob,1,5,False
"Connected(Alice,Bob):[0.7,0.9]",connection-fact,0,10,True
,empty-fact,0,5,False
InvalidNoParens,bad-fact,0,5,True
Viewed(Charlie),good-fact,bad-time,5,False
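This file has the same columns as `example_facts.csv` but no header row. One way to support both layouts, in line with the commit that requires an exact header match, is to compare the first record against the expected header and skip it only on an exact match. How the PR's loader actually does this is not visible in this diff, so `EXPECTED_HEADER` and `split_header` below are assumptions for illustration.

```python
import csv
import io
from typing import List, Tuple

# Column names taken from the header row of example_facts.csv.
EXPECTED_HEADER = ["fact_text", "name", "start_time", "end_time", "static"]


def split_header(text: str) -> Tuple[List[List[str]], bool]:
    """Return (data_rows, had_header) for CSV text.

    The first record is dropped only if it matches EXPECTED_HEADER exactly.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if rows and rows[0] == EXPECTED_HEADER:
        return rows[1:], True
    return rows, False


if __name__ == "__main__":
    sample = (
        'Viewed(Alice),fact-alice,0,5,False\n'
        '"Connected(Alice,Bob):[0.7,0.9]",connection-fact,0,10,True\n'
    )
    rows, had_header = split_header(sample)
    print(had_header, rows)
```

The quoted second record shows why edge facts need quoting in CSV: without the quotes, the comma inside `Connected(Alice,Bob)` would split the fact across two columns.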