feat: batch operations (toArray/fromArray/putAll/getAll) and atomic increment#44
feat: batch operations (toArray/fromArray/putAll/getAll) and atomic increment#44
Conversation
…ement Implements issue #43 features 1 and 2: Batch operations: - toArray(): convert Judy array to native PHP array - fromArray(int $type, array $data): static factory from PHP array - putAll(array $data): bulk-insert into existing Judy array - getAll(array $keys): retrieve multiple values at once Atomic increment: - increment(mixed $key, int $amount = 1): single-traversal counter update for INT_TO_INT and STRING_TO_INT types All methods reuse existing helpers (judy_build_data_array, judy_create_result, judy_object_write_dimension_helper) and require no changes to judy_handlers.c, judy_iterator.c, or judy_arrayaccess.c. 14 new tests (99 total, all passing).
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Test Results
Benchmark Summary (Judy vs PHP Array)Ratio = Judy / Array. Bold = Judy wins (≤0.95x). Plain = Array is faster/smaller. Time (Write / Read) — Linux
Memory — Linux
Time (Write / Read) — Windows
Memory — Windows
Raw benchmark dataWrite Time — Linux
Read Time — Linux
Memory — Linux
Write Time — Windows
Read Time — Windows
Memory — Windows
Batch & Set Operations BenchmarksBenchmarks for Batch Operations — PHP 8.5 (Linux)Set Operations — PHP 8.5 (Linux) |
There was a problem hiding this comment.
Code Review
This pull request introduces valuable batch operations (toArray, fromArray, putAll, getAll) and an atomic increment method, enhancing performance and usability. However, a critical security vulnerability was identified in the Judy::increment() method due to a missing length check for string keys. This can lead to stack buffer overflows in other methods, potentially enabling arbitrary code execution or denial of service. Additionally, there are suggestions to improve maintainability by reducing code duplication and to correct documentation discrepancies regarding the increment method's performance.
Add examples/judy-bench-batch-operations.php comparing: - Bulk add: putAll()/fromArray() vs individual assignment vs PHP array - Bulk get: getAll() vs individual access vs PHP array - Conversion: toArray() vs manual foreach - Increment: increment() vs manual read-modify-write vs PHP array Key findings (100K elements, PHP 8.3): - getAll() is 1.9x faster than individual Judy lookups (INT_TO_INT) - toArray() is 2.8-3.1x faster than manual foreach - increment() is 1.3-1.6x faster than manual $j[$k]+1 - fromArray() is 1.3x faster than individual inserts (INT_TO_INT) Add Tables 4-7 to BENCHMARK.md with results.
Restructure judy-bench-batch-operations.php so each type section (INT_TO_INT, STRING_TO_INT) runs all 4 benchmark categories together: 1. Bulk Add (PHP array vs individual Judy vs putAll vs fromArray) 2. Bulk Get (PHP array vs individual Judy vs getAll) 3. Conversion (toArray vs manual foreach) 4. Increment (PHP array vs manual Judy vs increment) Every method now has a consistent 3-way comparison in context.
- Security: add string key length validation in increment() before JSLG/JSLI calls to prevent potential buffer overflow - Refactor: extract judy_populate_from_array() shared helper for fromArray() and putAll() to eliminate code duplication - Refactor: restructure getAll() to group by is_integer_keyed vs is_string_keyed, reducing redundant key retrieval code - Docs: correct "single-traversal" claim for increment() — INT_TO_INT uses single traversal (JLI), STRING_TO_INT uses two (JSLG+JSLI) - CI: run batch ops and set ops benchmarks, include results in PR report
Summary
Implements issue #43 features 1 and 2 (batch operations and atomic increment). Features 3 (JudyHS) and 4 (INT_TO_PACKED) are deferred due to high complexity (105+ type-switch sites each).
New Methods
Batch operations:
toArray(): array— convert Judy array to native PHP array (reusesjudy_build_data_array)static fromArray(int $type, array $data): Judy— static factory from PHP arrayputAll(array $data): void— bulk-insert into existing Judy arraygetAll(array $keys): array— retrieve multiple values at once (missing keys return null)Atomic increment:
increment(mixed $key, int $amount = 1): int— efficient counter update for INT_TO_INT (single-traversal via JLI) and STRING_TO_INT (two traversals: JSLG for counter tracking + JSLI). Throws exception on unsupported types.Scope
php_judy.c— no modifications tojudy_handlers.c,judy_iterator.c, orjudy_arrayaccess.cjudy_build_data_array,judy_create_result,judy_object_write_dimension_helperjudy_populate_from_array()helper extracted forfromArray/putAllto reduce duplicationpackage.xml,README.md, andBENCHMARK.mdBenchmark Results (100K elements)
fromArray()vs individualgetAll()vs individualtoArray()vs manual foreachincrement()vs manualKey insights:
toArray()provides the biggest speedup (2.8-3.1x) by using native C iteration, bypassing PHP Iterator overheadgetAll()is 1.9x faster for integer keys by avoiding per-element ArrayAccess overheadincrement()achieves true single-traversal for INT_TO_INT via JLI; STRING_TO_INT uses two traversals (JSLG+JSLI) for counter tracking but still 1.3x faster than PHP-level read-modify-writefromArray()provides meaningful speedup for integer keys; string key performance is dominated by trie traversalFull benchmark details in BENCHMARK.md Tables 4-7. CI now runs batch and set operations benchmarks automatically.
Code Review Fixes (Gemini)
ZSTR_LEN >= PHP_JUDY_MAX_LENGTH) inincrement()before JSLG/JSLI calls to prevent buffer overflowjudy_populate_from_array()static helper shared byfromArray()andputAll()getAll()to group byis_integer_keyed/is_string_keyedfirst, reducing redundant key retrieval codeTest plan
pecl package package.xmlproduces valid.tgz