This repository was archived by the owner on Apr 29, 2026. It is now read-only.

new benchmarks #185

Merged
navicore merged 3 commits into main from i-179 on Jan 5, 2026

Conversation

@navicore navicore (Owner) commented Jan 5, 2026

No description provided.

claude Bot commented Jan 5, 2026

PR Review: New Benchmarks

Summary

This PR adds a comprehensive compute benchmark suite alongside the existing concurrency benchmarks, comparing Seq performance against Rust and Go. Overall, this is a high-quality addition that significantly expands the project's benchmarking capabilities.

Code Quality & Best Practices ✅

Strengths:

  • Excellent organization: Clear separation between concurrency and compute benchmarks
  • Consistent implementation: All three languages (Seq, Rust, Go) implement identical algorithms
  • Well-documented: README updates are thorough and explain benchmark purposes clearly
  • Good shell scripting: run.sh properly uses set -e, handles colors, checks dependencies
  • Proper optimization flags: Rust uses -O, Go uses default optimizations
  • Validation built-in: All benchmarks verify expected results and exit with proper codes
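
The dependency-check pattern praised above can be sketched as follows. This is illustrative, not the actual run.sh code: the `need` helper name and the tools checked are assumptions.

```shell
#!/usr/bin/env bash
set -e

# Fail fast with a clear message when a required tool is missing.
need() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "error: required tool '$1' not found" >&2
        return 1
    }
}

need bash
need rm
echo "all dependencies present"
```

Checking every dependency up front, before any compilation starts, keeps partial-build debris out of the benchmark directory.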

Seq Code Quality:

  • fib.seq: Clean recursive implementation with proper base cases
  • sum_squares.seq: Good use of quotations with while combinator
  • primes.seq: Well-structured nested loops with clear stack effect comments

Potential Issues 🔍

1. Integer Overflow Risk (Low Severity)

In the sum_squares benchmarks, the result is 333,333,833,333,500,000, which is safely within i64 range; however, if the limit is raised beyond ~3 million, the accumulator will overflow silently. Consider adding a comment documenting the safe upper limit.
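
The expected value and the safe limit can both be sanity-checked from the closed form; a quick bash sketch (bash arithmetic is 64-bit two's-complement, so it would wrap silently just like the benchmark):

```shell
#!/usr/bin/env bash
# Sum of squares 1..n has the closed form n(n+1)(2n+1)/6.
# For n = 1,000,000 every intermediate product fits in a signed
# 64-bit integer (the largest is ~2.0e18, well under ~9.22e18),
# so the documented benchmark result can be reproduced exactly.
# Around n ~ 3,000,000 the intermediate product and the running
# total approach i64::MAX and wrap without warning.
n=1000000
expected=$(( n * (n + 1) * (2 * n + 1) / 6 ))
echo "$expected"   # 333333833333500000
```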

2. Shell Script Cleanup Issue

benchmarks/compute/run.sh:78-80 - Cleanup only removes specific binary names. If a benchmark is added/renamed, old binaries won't be cleaned. Consider using wildcards instead.

3. LATEST_RUN.txt Update

Ensure CI actually ran all benchmarks including the new compute ones.

Performance Considerations 🚀

Good:

  • Benchmark selection is appropriate (fib tests recursion, sum_squares tests loops, primes tests nested loops)
  • Rust and Go use release/optimized builds
  • Hyperfine integration with proper warmup and min-runs

Considerations:

  • The naive recursive Fibonacci is exponential, O(2^n). This is intentional for testing recursion overhead, but a comment clarifying the choice over a memoized or iterative version would be helpful.

Security Concerns 🔒

No significant security issues. Good practices observed:

  • All shell variables are properly quoted
  • mktemp is used correctly, and files are cleaned up
  • Fixed benchmark parameters (no user input)
  • Proper exit code validation
  • No use of eval or unquoted variables

Test Coverage 📊

Strengths:

  • All benchmarks have built-in validation (exit code 0 on success, 1 on failure)
  • Expected values are documented and checked
  • Three implementations per benchmark ensure correctness

Gaps:

  • No explicit test suite for the benchmarks themselves
  • Consider adding a test that verifies all three implementations produce identical results
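
A cross-check of the kind suggested above could look like this sketch. The shell functions here stand in for the real benchmark binaries (the real script would invoke ./fib_seq, ./fib_rust, ./fib_go); 102334155 is fib(40).

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-ins for the three benchmark binaries.
fib_seq()  { echo 102334155; }
fib_rust() { echo 102334155; }
fib_go()   { echo 102334155; }

# Treat one implementation as the reference and require the
# others to print byte-identical output.
expected=$(fib_rust)
for impl in fib_seq fib_go; do
    actual=$("$impl")
    if [ "$actual" != "$expected" ]; then
        echo "✗ $impl: got $actual, expected $expected" >&2
        exit 1
    fi
    echo "✓ $impl matches ($actual)"
done
```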

Documentation 📚

Excellent:

  • README updates are thorough and well-structured
  • New benchmarks/compute/README.md clearly explains each benchmark
  • Sample results table format is clear
  • Usage examples are helpful

Minor suggestions:

  • The TBD placeholders in compute/README.md should be filled in before merge

Recommendations 📝

Must Fix:

  • Fill in sample results in benchmarks/compute/README.md (currently TBD)
  • Verify CI runs compute benchmarks and updates LATEST_RUN.txt accordingly

Should Fix:

  • Add comment about overflow risk in sum_squares implementations
  • Change cleanup in compute/run.sh to use wildcards
  • Consider adding a correctness test script

Nice to Have:

  • Add comment explaining why naive recursive fib is chosen over iterative
  • Document the theoretical performance expectations (interpreter overhead range)
  • Consider adding a benchmark for string manipulation or data structure operations

Verdict ✨

Recommendation: APPROVE with minor suggestions

This is a well-crafted PR that significantly enhances the benchmark suite. The code quality is high, the implementations are correct, and the documentation is excellent. The identified issues are minor and don't block merging.

Impact:

  • Provides valuable insight into Seq's interpreter overhead
  • Demonstrates that ~15-30x slowdown vs native code is reasonable for an interpreter
  • Complements existing concurrency benchmarks well
  • Sets good foundation for future benchmark additions

Great work! The project now has both concurrency and compute benchmarks for comprehensive performance evaluation.

  Changes Made

  Must Fix (Done)

  1. Filled in sample results in benchmarks/compute/README.md - Added actual benchmark results from MacBook Pro M-series

  Should Fix (Done)

  2. Added overflow risk comments to all three sum_squares implementations:
    - sum_squares.seq
    - sum_squares.rs
    - sum_squares.go
  3. Changed cleanup to use wildcards in benchmarks/compute/run.sh:
  # Before
  rm -f fib_seq fib_rust fib_go
  rm -f sum_squares_seq sum_squares_rust sum_squares_go
  rm -f primes_seq primes_rust primes_go

  # After
  rm -f *_seq *_rust *_go

  Nice to Have (Done)

  4. Added explanation for naive recursive fib in benchmarks/compute/README.md:
  Why naive recursion? The exponential O(2^n) algorithm is intentional - it stress-tests function call overhead and stack operations, which are key performance indicators for interpreted languages.
  5. Added performance expectations section explaining typical interpreter overhead ranges (10-20x good, 20-50x expected, >50x investigate)
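
  The ranges in item 5 could even be applied mechanically in a report script; a sketch (the thresholds mirror the README guidance, the function and variable names are illustrative):

```shell
#!/usr/bin/env bash
# Classify interpreter overhead given two timings in milliseconds.
# Thresholds follow the README guidance: <=20x good, <=50x expected,
# anything above that worth investigating.
classify() {
    local seq_ms=$1 native_ms=$2
    local ratio=$(( seq_ms / native_ms ))
    if   [ "$ratio" -le 20 ]; then echo "${ratio}x: good"
    elif [ "$ratio" -le 50 ]; then echo "${ratio}x: expected"
    else                           echo "${ratio}x: investigate"
    fi
}

classify 300 20    # 15x: good
classify 900 20    # 45x: expected
classify 1200 20   # 60x: investigate
```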

  All CI checks pass.
claude Bot commented Jan 5, 2026

Code Review - PR #185: New Benchmarks

This PR adds comprehensive compute benchmarks and Rust comparison benchmarks. Overall, this is excellent work with high-quality implementations. Here's my detailed review:


Strengths

1. Well-Structured Additions

  • Clear separation between concurrency benchmarks and compute benchmarks
  • Excellent documentation in both main and compute README files
  • Consistent implementation patterns across Seq, Rust, and Go versions

2. Code Quality

  • Seq implementations: Clean, idiomatic stack-based code with correct tail recursion

    • fib.seq: Proper recursive structure
    • sum_squares.seq: Good use of while loop pattern
    • primes.seq: Correctly implements trial division with tail-recursive helpers
  • Rust implementations: Well-written with proper optimization flags (-O)

  • Go implementations: Clean and equivalent to other implementations

  • All implementations use equivalent algorithms (no unfair optimizations)

3. Testing & Validation

  • Each benchmark validates its output against expected values
  • Exit codes properly indicate success/failure
  • Expected values are documented in comments

4. Documentation Excellence

  • benchmarks/README.md: Comprehensive guide with clear categories, usage instructions, and interpretation guidelines
  • benchmarks/compute/README.md: Excellent explanations of why each benchmark matters
  • Good rationale for naive Fibonacci (testing function call overhead vs. iterative)

🔍 Issues & Recommendations

Critical: Missing Justfile Integration

The new compute benchmarks are not integrated into the justfile. Users won't discover them easily.

Recommendation: Add these recipes to justfile:

# Run compute benchmarks (Seq vs Rust vs Go)
bench-compute: build
    @echo "Running compute benchmarks..."
    cd benchmarks/compute && ./run.sh

# Run individual compute benchmarks
bench-fib: build
    @echo "Running Fibonacci benchmark..."
    cd benchmarks && ./run.sh fib

bench-sum-squares: build
    @echo "Running sum of squares benchmark..."
    cd benchmarks && ./run.sh sum_squares

bench-primes: build
    @echo "Running prime counting benchmark..."
    cd benchmarks && ./run.sh primes

Also update the main bench recipe description to mention both categories:

# Run all benchmarks (concurrency + compute)
bench: build
    @echo "Running all benchmarks..."
    cd benchmarks && ./run.sh

Minor: Shell Script Robustness

File: benchmarks/compute/run.sh

The script is good but could benefit from:

  1. Variable quoting: $SEQC is set and used correctly, but the quoting pattern is applied inconsistently elsewhere in the script
  2. Error handling: The script uses set -e which is good, but individual command failures in run_benchmark could be more explicit

Not blocking, but consider for future iterations.

Minor: Documentation Clarity

File: benchmarks/run.sh (lines 8-9)

The usage comment mentions ./run.sh compute and ./run.sh concurrency; since the implementation is truncated in the diff, verify that these commands actually work. If not, update the usage docs.

Minor: Benchmark Freshness Check

The check-bench-freshness in the justfile checks benchmarks/LATEST_RUN.txt. The PR updates this file, which is correct, but consider:

  • Does the CI run benchmarks automatically on PRs?
  • Should compute benchmarks also update this timestamp?

Not blocking, but worth clarifying in a comment.


🔒 Security

✅ No security concerns. The code:

  • Uses safe arithmetic (proper bounds checking)
  • No file I/O or network operations
  • No unsafe code blocks in Rust implementations
  • Proper memory management in all implementations

Performance Considerations

Good Design Choices

  1. Overflow awareness: Sum of squares correctly notes the 3M limit for i64
  2. Algorithm equivalence: All three languages use the same algorithms (fair comparison)
  3. Compiler flags: Rust uses -O, Go uses default optimizations

Observation: Integer Overflow Risk

File: benchmarks/compute/sum_squares.seq:24

The current limit of 1M is safe, but the comment warns about 3M+. Consider adding runtime overflow detection in future iterations, though this is not blocking for this PR since the benchmarks use safe values.


🧪 Test Coverage

Excellent

  • All benchmarks have expected output validation
  • Exit codes properly indicate pass/fail
  • Consistent test approach across all implementations

Suggestion: Add CI Integration Test

Currently, the benchmarks aren't run in CI (only freshness is checked). Consider adding a quick smoke test:

# In .github/workflows/ci-linux.yml
- name: Verify benchmarks compile and run
  run: |
    just bench-fib
    just bench-sum-squares  
    just bench-primes

This ensures benchmarks don't break without requiring full benchmark runs on every PR.


📊 Code Quality Summary

| Category      | Rating | Notes                                      |
| ------------- | ------ | ------------------------------------------ |
| Code Quality  | ⭐⭐⭐⭐⭐  | Clean, idiomatic implementations           |
| Documentation | ⭐⭐⭐⭐⭐  | Excellent READMEs with clear guidance      |
| Testing       | ⭐⭐⭐⭐   | Good validation, could add CI smoke tests  |
| Performance   | ⭐⭐⭐⭐⭐  | Fair comparisons, appropriate algorithms   |
| Integration   | ⭐⭐⭐    | Missing justfile recipes                   |

🎯 Recommendations Before Merge

  1. Required: Add compute benchmark recipes to justfile
  2. Recommended: Verify ./run.sh compute and ./run.sh concurrency work as documented
  3. Optional: Consider adding CI smoke test for benchmark compilation

🎉 Conclusion

This is high-quality work that significantly expands the benchmark suite. The compute benchmarks provide valuable insight into interpreter overhead, and the Rust comparisons offer an interesting perspective on concurrency models.

The implementations are correct, well-documented, and fair. The only blocking issue is the missing justfile integration, which is a quick fix.

Recommendation: Approve after adding justfile recipes.

Great work! 🚀

claude Bot commented Jan 5, 2026

PR Review - New Benchmarks Suite

Summary

This PR adds a comprehensive compute benchmark suite comparing Seq, Rust, and Go performance on pure computation tasks (fibonacci, sum of squares, prime counting), plus Rust implementations for existing concurrency benchmarks. The PR also addresses feedback from a previous review.

Code Quality ✅

Strengths:

  • Excellent documentation: README files are thorough and well-structured with clear explanations of what each benchmark tests
  • Consistent implementation patterns: All three language implementations (Seq/Rust/Go) follow equivalent algorithms, making comparisons fair
  • Good code organization: Compute benchmarks properly separated into their own directory with dedicated runner script
  • Clear commenting: Seq implementations include helpful stack effect comments that aid understanding
  • Professional shell scripting: The run.sh scripts properly handle errors (set -e), have color-coded output, and gracefully degrade when optional tools (hyperfine, jq) are unavailable

Areas for improvement:

  • The run.sh script at benchmarks/run.sh has grown quite large (255 lines). Consider extracting the summary table generation into a separate function or helper script for better maintainability
  • Minor: The .gitignore patterns could be consolidated (e.g., *_rust and *_seq cover compute benchmarks already)

Potential Bugs and Issues ✅

No critical bugs found. However, a few observations:

  1. Integer overflow noted but not handled: All three sum_squares implementations correctly document the overflow risk for n > 3M. This is acceptable for a benchmark, and the comment is accurate and helpful.

  2. Division by zero protection: In benchmarks/run.sh lines 73-76 and similar, the code properly checks elapsed_ms > 0 before division. Good defensive programming.

  3. Shell script portability: Uses bash-specific features (${BASH_SOURCE[0]}, arrays). This is fine since the shebang explicitly declares bash, but worth noting for portability.

  4. Temporary file cleanup: The json_file=$(mktemp) followed by rm -f "$json_file" pattern is correct. Consider using a trap for cleanup to handle interrupt signals:

    trap 'rm -f "$json_file"' EXIT
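
    Expanded into a minimal self-contained sketch (the file name and contents are hypothetical, not the actual run.sh code):

```shell
#!/usr/bin/env bash
set -e

# Create the temp file, then register cleanup immediately so the file
# is removed on normal exit, on error exit (via set -e), or on Ctrl-C.
json_file=$(mktemp)
trap 'rm -f "$json_file"' EXIT INT TERM

echo '{"results": []}' > "$json_file"
echo "wrote $(wc -c < "$json_file") bytes to temp file"
# No explicit rm needed: the EXIT trap removes "$json_file" on every path.
```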

Performance Considerations ✅

Benchmark design is sound:

  1. Algorithm equivalence: All implementations use the same algorithms (naive recursive fib, iterative sum_squares, trial division for primes). This ensures fair comparison.

  2. Appropriate test sizes:

    • fib(40) is a good stress test for recursion without being excessive
    • 1M iterations for sum_squares is safe for i64 and provides meaningful timing
    • 100K limit for prime counting provides good timing separation
  3. Compilation flags: Rust uses -O and Go uses default optimizations. Consistent with seqc's release build.

  4. Benchmark runner: Uses hyperfine when available (5 runs, 2 warmups for concurrency; 3 runs, 1 warmup for compute). These are reasonable defaults.

Minor suggestions:

  • Consider documenting in the README whether these benchmarks are representative of real-world workloads or purely synthetic micro-benchmarks (they're clearly the latter, which is fine)
  • The Rust concurrency benchmarks use std::thread (OS threads) rather than green threads. The README correctly explains this, but consider noting that this comparison is apples-to-oranges for pingpong

Security Concerns ✅

No security issues found.

  • No unsafe code blocks in Rust implementations
  • Shell scripts properly quote paths and variables
  • No credential or secret handling
  • Build commands use explicit flags without shell injection risks
  • The rm -f *_seq *_rust *_go cleanup uses safe wildcards within the controlled directory

Test Coverage ⚠️

Current state:

  • Each benchmark includes a correctness check (exit code 0 if result matches expected value)
  • The run.sh scripts build and run benchmarks but don't verify correctness automatically
  • No automated CI test that verifies benchmark correctness

Recommendations:

  1. Add a verification mode to the run scripts that checks exit codes:

    # Run in verify mode
    if "./${name}_seq"; then
        echo "✓ Seq result correct"
    else
        echo "✗ Seq result INCORRECT"
        exit 1
    fi
  2. CI integration: The PR updates LATEST_RUN.txt but doesn't show CI configuration. Consider adding a CI job that:

    • Builds all benchmark variants
    • Runs each once to verify correctness (exit code 0)
    • Doesn't require timing thresholds (too environment-dependent)
  3. Add edge case tests: Consider testing fib(0), fib(1), sum_squares(0), primes(2) as quick correctness checks

Additional Observations

Positive:

  • Addresses previous review feedback comprehensively (overflow comments, wildcards in cleanup, performance expectations)
  • The commit message for 26557d9 clearly documents what was changed and why
  • Sample results in README provide useful context for interpreting benchmark output
  • The distinction between concurrency and compute benchmarks is clear

Questions/Suggestions:

  1. Compiler change: The diff shows +1/-0 in crates/compiler/src/lib.rs. What was this change? It's not visible in the provided diff context.

  2. Benchmark stability: Consider documenting expected variance. Micro-benchmarks can be noisy - what's an acceptable stddev?

  3. Future work: The README mentions "adding new benchmarks". Consider adding a template file or example to make contributions easier.

Verdict

Approved with minor suggestions. This is high-quality work that significantly expands the benchmark coverage. The code is well-documented, properly structured, and implements fair comparisons. The few suggestions above are minor improvements, not blockers.

Recommended next steps:

  1. Add verification mode to run scripts (optional but recommended)
  2. Document the compiler change if it's related to this PR
  3. Consider CI integration for correctness verification

Great work on the comprehensive documentation and consistent implementation across languages!

@navicore navicore merged commit 58bb5bd into main Jan 5, 2026
2 checks passed
@navicore navicore deleted the i-179 branch January 5, 2026 03:08