From e11aeaebe1fa9a653e2083918ffe7f293dbe144a Mon Sep 17 00:00:00 2001 From: "Flavio S. Glock" Date: Mon, 13 Apr 2026 08:43:17 +0200 Subject: [PATCH 1/4] docs: add CPAN compatibility reports for Scalar::Util and Memoize Detailed investigation of both modules against their CPAN test suites: - Scalar::Util (Scalar-List-Utils 1.70): 606/816 subtests pass (74.3%). All 14 exported functions implemented; isvstring is a stub, tainted always returns false (no taint mode), readonly partial. Most failures are in List::Util/Sub::Util functions. - Memoize 1.17: Core functionality fully working (7/16 test files pass). All failures are peripheral: deep recursion stack overflow, threads not available, DB_File shim bug. Good bundling candidate. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --- dev/cpan-reports/Memoize.md | 172 ++++++++++++++++++++++++++++++++ dev/cpan-reports/Scalar-Util.md | 136 +++++++++++++++++++++++++ 2 files changed, 308 insertions(+) create mode 100644 dev/cpan-reports/Memoize.md create mode 100644 dev/cpan-reports/Scalar-Util.md diff --git a/dev/cpan-reports/Memoize.md b/dev/cpan-reports/Memoize.md new file mode 100644 index 000000000..3b1da44e5 --- /dev/null +++ b/dev/cpan-reports/Memoize.md @@ -0,0 +1,172 @@ +# Memoize Compatibility Report for PerlOnJava + +> Investigated 2026-04-13 against Memoize 1.17 (CPAN, ARISTOTLE) with PerlOnJava + +## Summary + +| Metric | Value | +|--------|-------| +| **CPAN distribution** | Memoize-1.17 (ARISTOTLE) | +| **Bundled in PerlOnJava** | No (installed via jcpan for testing) | +| **Test files** | 16 (4 skipped) | +| **Subtests run** | 206 | +| **Subtests explicitly failed** | 0 | +| **Test programs crashed** | 5 / 12 that ran | +| **Test files passing** | 7 / 16 | +| **Overall status** | FAIL (but close to passing for core functionality) | + +## Architecture + +Memoize is **100% pure Perl** -- no XS required. It caches function return values +by wrapping functions via typeglob manipulation. + +Source is available in the Perl 5 checkout at `perl5/cpan/Memoize/` but is +**not bundled** in PerlOnJava's JAR. + +## Dependency Analysis + +### Direct Dependencies (all satisfied) + +| Dependency | Available | Location | +|-----------|-----------|----------| +| `Carp` | Yes | `src/main/perl/lib/Carp.pm` | +| `Scalar::Util` (>=1.11) | Yes (v1.63) | Java backend: `ScalarUtil.java` | +| `Exporter` | Yes | `src/main/perl/lib/Exporter.pm` | +| `warnings` | Yes | Java backend: `Warnings.java` | + +### Sub-module Dependencies + +| Sub-module | Extra Dependency | Available | +|-----------|------------------|-----------| +| `Memoize::Expire` | `Time::HiRes` | Yes (Java impl) | +| `Memoize::Storable` | `Storable` | Yes (with stub locking) | +| `Memoize::AnyDBM_File` | `AnyDBM_File` | No | +| `Memoize::NDBM_File` | `NDBM_File` | No | +| `Memoize::SDBM_File` | `SDBM_File` | No | + +### Perl Language Features Used + +| Feature | PerlOnJava Status | +|---------|------------------| +| `*{$name} = $wrapper` (typeglob CODE assign) | Implemented | +| `*{$name}{CODE}` (extract CODE slot) | Implemented | +| `Scalar::Util::set_prototype` | Implemented | +| `caller`, `wantarray` | Implemented | +| `no strict` + symbolic refs | Implemented | +| `prototype()` builtin | Implemented | +| `tied %$hash` | Implemented | +| `warnings::enabled('all')` | Implemented | + +## Test Results by File + +### Passing (7 files) + +| Test File | Subtests | What it tests | +|-----------|----------|---------------| +| t/basic.t | ok | Core memoize/unmemoize, INSTALL, NORMALIZER, prototype preservation | +| t/cache.t | ok | SCALAR_CACHE, LIST_CACHE with MEMORY/FAULT/MERGE | +| t/expmod.t | ok | Memoize::Expire module | +| t/expmod_t.t | ok | Memoize::Expire with timed expiration | +| t/flush.t | ok | flush_cache() | +| t/normalize.t | ok | NORMALIZER option | +| t/unmemoize.t | ok | unmemoize() | + +### Failing (5 files) + +| Test File | Ran/Planned | Root Cause | +|-----------|-------------|------------| +| t/correctness.t | 16/17 | **StackOverflowError** -- deep recursion test (~100k calls) exceeds JVM stack | +| t/threadsafe.t | 1/8 | `threads` module not available (PerlOnJava limitation) | +| t/tie.t | 0/7 | **StackOverflowError** in `DB_File.pm` line 238/240 (infinite recursion) | +| t/tie_db.t | 0/7 | **StackOverflowError** in `DB_File.pm` (same as tie.t) | +| t/tie_storable.t | 5/6 | 1 subtest not reached (likely `Storable` lock_store stub issue) | + +### Skipped (4 files) + +| Test File | Reason | +|-----------|--------| +| t/tie_gdbm.t | Could not load `GDBM_File` | +| t/tie_ndbm.t | Could not load `Memoize::NDBM_File` | +| t/tie_odbm.t | Could not load `ODBM_File` | +| t/tie_sdbm.t | Could not load `SDBM_File` | + +## Failure Analysis + +### 1. StackOverflowError in correctness.t (line 93) + +The test probes for the Perl "Deep recursion" warning threshold (~100 recursive calls +in standard Perl) and then verifies that Memoize's wrapper doesn't add extra stack +frames that would trigger the warning. The probe function recurses up to 100,000 times, +which overflows the JVM default stack. + +**Workaround**: Run with `JPERL_OPTS="-Xss256m"` to increase JVM stack size. +This is the same workaround used for `re/pat.t` and other recursive tests. + +### 2. threads not available (threadsafe.t) + +PerlOnJava does not implement Perl-style `threads`. This is a known systemic limitation. +The `CLONE` method in Memoize.pm is defined but harmless. + +### 3. DB_File infinite recursion (tie.t, tie_db.t) + +`DB_File.pm` has an infinite recursion at lines 238-240. This is a bug in the +`DB_File` shim, not in Memoize itself. These tests tie Memoize's cache to a DB_File +database. + +### 4. tie_storable.t partial failure + +5 of 6 subtests pass. The final subtest likely involves `lock_store`/`lock_retrieve` +which are stub implementations in PerlOnJava's Storable. + +## Core Functionality Assessment + +The **core Memoize functionality works correctly**: + +- `memoize()` -- caching function return values +- `unmemoize()` -- restoring original functions +- `flush_cache()` -- clearing caches +- `NORMALIZER` -- custom key normalization +- `INSTALL` -- installing under different names +- `SCALAR_CACHE` / `LIST_CACHE` -- cache configuration +- `MERGE` -- merging scalar/list caches +- `Memoize::Expire` -- time-based expiration +- Prototype preservation via `set_prototype` +- Context propagation (`wantarray`) + +All failures are in **peripheral features** (threads, DB backends, deep recursion edge case). + +## Recommendations + +### Bundling Memoize + +Memoize is an excellent candidate for bundling. All core dependencies are satisfied. + +To bundle, add to `dev/import-perl5/config.yaml`: +```yaml + # Memoize - Function return value caching (pure Perl) + - source: perl5/cpan/Memoize/Memoize.pm + target: src/main/perl/lib/Memoize.pm + + - source: perl5/cpan/Memoize/Memoize + target: src/main/perl/lib/Memoize + type: directory +``` + +### Improving Test Results + +1. **correctness.t**: Would pass with `JPERL_OPTS="-Xss256m"` (add to perl_test_runner config) +2. **tie.t / tie_db.t**: Fix DB_File.pm infinite recursion at line 238-240 +3. **tie_storable.t**: Investigate the 6th subtest failure +4. **threadsafe.t**: Will always skip/fail (no threads) -- acceptable + +### Expected Results After Fixes + +| Test | Current | After Fix | +|------|---------|-----------| +| t/correctness.t | FAIL (stack) | PASS (with -Xss256m) | +| t/threadsafe.t | FAIL (threads) | SKIP (acceptable) | +| t/tie.t | FAIL (DB_File) | PASS (after DB_File fix) | +| t/tie_db.t | FAIL (DB_File) | PASS (after DB_File fix) | +| t/tie_storable.t | FAIL (1/6) | Likely PASS | + +With these fixes, Memoize would go from 7/16 to 11/16 passing (4 skipped, 1 threads-only). diff --git a/dev/cpan-reports/Scalar-Util.md b/dev/cpan-reports/Scalar-Util.md new file mode 100644 index 000000000..7a96b8eec --- /dev/null +++ b/dev/cpan-reports/Scalar-Util.md @@ -0,0 +1,136 @@ +# Scalar::Util Compatibility Report for PerlOnJava + +> Investigated 2026-04-13 against Scalar-List-Utils 1.70 (CPAN) with PerlOnJava bundled Scalar::Util v1.63 + +## Summary + +| Metric | Value | +|--------|-------| +| **CPAN distribution** | Scalar-List-Utils-1.70 (PEVANS) | +| **Bundled version** | 1.63 (Java backend) | +| **Test files** | 38 | +| **Subtests run** | 816 | +| **Subtests passed** | 606 (74.3%) | +| **Subtests failed** | 210 | +| **Test files passing** | 10 / 38 | +| **Overall status** | FAIL | + +## Architecture + +Scalar::Util is implemented as a split Perl/Java module: + +- **Perl wrapper**: `src/main/perl/lib/Scalar/Util.pm` -- thin shim, uses `XSLoader::load` +- **Java backend**: `src/main/java/org/perlonjava/runtime/perlmodule/ScalarUtil.java` (380 lines) + +The CPAN distribution includes **List::Util** and **Sub::Util** alongside Scalar::Util. +Many test failures below are in List::Util or Sub::Util functions, not Scalar::Util itself. + +## Function Implementation Status + +All 14 standard Scalar::Util EXPORT_OK functions are declared and registered. + +| Function | Status | Notes | +|----------|--------|-------| +| `blessed` | Full | Handles blessed refs and `qr//` (implicit "Regexp" blessing) | +| `refaddr` | Full | Uses `System.identityHashCode()` (JVM -- not real memory address) | +| `reftype` | Full | Handles SCALAR, REF, ARRAY, HASH, CODE, GLOB, FORMAT, REGEXP, VSTRING | +| `weaken` | Full | Cooperative reference counting on JVM GC. Well tested. | +| `unweaken` | Full | Restores strong reference | +| `isweak` | Full | Delegates to `WeakRefRegistry.isweak()` | +| `dualvar` | Full | Creates `DualVar` record with separate numeric/string values | +| `isdual` | Full | Checks for DUALVAR type; handles READONLY_SCALAR unwrapping | +| `isvstring` | **Stub** | Always returns false. VSTRING type (ID 5) exists in runtime but is never checked. | +| `looks_like_number` | Full | Delegates to `ScalarUtils.looksLikeNumber()` with fast/slow path | +| `openhandle` | Full | Checks GLOB/GLOBREFERENCE; verifies IO handle not closed; handles `*{}` overload | +| `readonly` | **Partial** | Only detects compile-time constants (`RuntimeScalarReadOnly`). Does NOT detect runtime `Internals::SvREADONLY`. | +| `set_prototype` | Full | Sets/clears prototype on CODE refs | +| `tainted` | **Stub** | Always returns false. Taint mode is not implemented in PerlOnJava. | + +## Test Results by File + +### Passing (10 files) + +| Test File | Subtests | Notes | +|-----------|----------|-------| +| t/any-all.t | ok | List::Util any/all/none/notall | +| t/blessed.t | ok | Scalar::Util blessed | +| t/max.t | ok | List::Util max | +| t/maxstr.t | ok | List::Util maxstr | +| t/minstr.t | ok | List::Util minstr | +| t/prototype.t | ok | Sub::Util set_prototype | +| t/readonly.t | ok | Scalar::Util readonly | +| t/rt-96343.t | ok | Regression test | +| t/stack-corruption.t | ok | Stack safety | +| t/sum0.t | ok | List::Util sum0 | + +### Failing (28 files) + +| Test File | Ran | Failed | Root Cause | +|-----------|-----|--------|------------| +| t/00version.t | 4 | 1 | Version mismatch: LU::XS reports 1.70 vs bundled LU 1.63 | +| t/dualvar.t | 41 | 3 | `dualvar` increment and UTF-8 handling issues | +| t/exotic_names.t | 238/1560 | 120 | Sub renaming with control characters (`set_subname`); early abort | +| t/first.t | 24 | 6 | `first {}` block not called with `$_` properly | +| t/getmagic-once.t | 6 | 6 | Magic/tie get-magic not invoked correctly | +| t/head-tail.t | 42 | 2 | `head`/`tail` edge cases | +| t/isvstring.t | 3 | 1 | `isvstring` always returns false (stub) | +| t/lln.t | 19 | 1 | `looks_like_number` edge case | +| t/mesh.t | 0/8 | 0 | Crash before any tests run (mesh/zip not implemented) | +| t/min.t | 22 | 1 | `min` edge case | +| t/openhan.t | 21 | 2 | `openhandle` edge cases | +| t/pair.t | 19/29 | 3 | `pairmap`/`pairfirst` issues; early abort | +| t/product.t | 27 | 3 | `product` numeric edge cases | +| t/reduce.t | 33 | 7 | `reduce` block context / prototype issues | +| t/reductions.t | 7 | 1 | `reductions` edge case | +| t/refaddr.t | 32 | 4 | `refaddr` with overloaded/tied objects | +| t/reftype.t | 32 | 3 | `reftype` edge cases (FORMAT, LVALUE) | +| t/sample.t | 9 | 3 | `sample` not implemented or buggy | +| t/scalarutil-proto.t | 12/14 | 1 | Prototype check issues | +| t/shuffle.t | 7 | 1 | `shuffle` edge case | +| t/subname.t | 21 | 7 | `set_subname`/`subname` not fully implemented | +| t/sum.t | 18 | 3 | `sum` numeric edge cases | +| t/tainted.t | 5 | 3 | Taint mode not implemented | +| t/undefined-block.t | 18 | 18 | Undefined code block handling | +| t/uniq.t | 31 | 6 | `uniq`/`uniqstr` edge cases | +| t/uniqnum.t | 23 | 2 | `uniqnum` numeric edge cases | +| t/weak.t | 28 | 2 | Weak reference edge cases | +| t/zip.t | 0/8 | 0 | Crash before any tests run (zip not implemented) | + +## Key Failure Categories + +### 1. Missing/incomplete List::Util functions +`mesh`, `zip`, `sample` are not implemented or crash. `first`, `reduce`, `reductions` have block-calling issues. These are **List::Util** problems, not Scalar::Util. + +### 2. Sub::Util `set_subname`/`subname` (exotic_names.t, subname.t) +Sub renaming with exotic characters (control chars, UTF-8) does not work. +`set_subname` appears non-functional -- renamed closures still report `__ANON__`. + +### 3. Stubs returning incorrect values +- `isvstring`: always returns false (trivial fix available) +- `tainted`: always returns false (systemic: no taint mode) + +### 4. Magic/tie get-magic (getmagic-once.t) +All 6 tests fail -- get-magic is not invoked the correct number of times. + +### 5. Undefined code block handling (undefined-block.t) +All 18 tests fail -- functions don't properly die/warn when passed undefined blocks. + +### 6. Version mismatch (00version.t) +Bundled version is 1.63 but CPAN test suite is 1.70. List::Util::XS reports 1.70. + +## Existing Test Coverage in PerlOnJava + +| Area | Test Files | Coverage | +|------|-----------|----------| +| `weaken`/`isweak`/`unweaken` | 4 files (~634 lines) | Excellent | +| `blessed` | Incidental in subroutine.t | Minimal | +| All other functions | None | No dedicated tests | + +## Recommendations + +1. **Fix `isvstring`** -- trivial: check `s.type == VSTRING` instead of always returning false +2. **Fix version mismatch** -- update bundled version to 1.70 or sync XS version reporting +3. **Implement `mesh`/`zip`** -- these List::Util functions crash immediately +4. **Fix `first`/`reduce` block calling** -- `$_` not set correctly in the block +5. **Improve `set_subname`** -- critical for Moose/Moo ecosystem +6. **Add unit tests** for untested Scalar::Util functions (blessed, refaddr, reftype, dualvar, etc.) From 5cd3272cf38afca85d709e280e4c9c80421392f9 Mon Sep 17 00:00:00 2001 From: "Flavio S. Glock" Date: Mon, 13 Apr 2026 09:16:18 +0200 Subject: [PATCH 2/4] feat: fix isvstring, implement List::Util zip/mesh, bundle Memoize Scalar::Util: - Fix isvstring() to actually check VSTRING type instead of always returning false. The VSTRING type infrastructure was already complete; only the isvstring() function was missing the check. List::Util: - Implement zip, zip_shortest, zip_longest (returns list of arrayrefs) - Implement mesh, mesh_shortest, mesh_longest (returns flat interleaved list) - These 6 functions were declared in @EXPORT_OK but had no Java backend Memoize: - Bundle Memoize 1.17 from perl5/cpan/Memoize/ into the JAR - Add import-perl5 config entries for Memoize.pm and Memoize/ submodules - Pure Perl module, all dependencies already satisfied Plan documents: - Add dev/modules/scalar_util.md with full bug analysis and fix plan - Add dev/modules/memoize.md with dependency analysis and bundling plan Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --- dev/import-perl5/config.yaml | 8 + dev/modules/memoize.md | 207 ++++ dev/modules/scalar_util.md | 186 ++++ .../org/perlonjava/core/Configuration.java | 6 +- .../runtime/perlmodule/ListUtil.java | 118 +++ .../runtime/perlmodule/ScalarUtil.java | 5 +- src/main/perl/lib/Memoize.pm | 958 ++++++++++++++++++ src/main/perl/lib/Memoize/AnyDBM_File.pm | 37 + src/main/perl/lib/Memoize/Expire.pm | 352 +++++++ src/main/perl/lib/Memoize/NDBM_File.pm | 39 + src/main/perl/lib/Memoize/SDBM_File.pm | 27 + src/main/perl/lib/Memoize/Storable.pm | 75 ++ 12 files changed, 2013 insertions(+), 5 deletions(-) create mode 100644 dev/modules/memoize.md create mode 100644 dev/modules/scalar_util.md create mode 100644 src/main/perl/lib/Memoize.pm create mode 100644 src/main/perl/lib/Memoize/AnyDBM_File.pm create mode 100644 src/main/perl/lib/Memoize/Expire.pm create mode 100644 src/main/perl/lib/Memoize/NDBM_File.pm create mode 100644 src/main/perl/lib/Memoize/SDBM_File.pm create mode 100644 src/main/perl/lib/Memoize/Storable.pm diff --git a/dev/import-perl5/config.yaml b/dev/import-perl5/config.yaml index 8f6a1ce02..f75849d60 100644 --- a/dev/import-perl5/config.yaml +++ b/dev/import-perl5/config.yaml @@ -42,6 +42,14 @@ imports: - source: perl5/lib/Benchmark.pm target: src/main/perl/lib/Benchmark.pm + # Memoize - Function return value caching (pure Perl, core since 5.8) + - source: perl5/cpan/Memoize/Memoize.pm + target: src/main/perl/lib/Memoize.pm + + - source: perl5/cpan/Memoize/Memoize + target: src/main/perl/lib/Memoize + type: directory + # Net::Ping - Network ping module (required by CPAN::Mirrors) - source: perl5/dist/Net-Ping/lib/Net/Ping.pm target: src/main/perl/lib/Net/Ping.pm diff --git a/dev/modules/memoize.md b/dev/modules/memoize.md new file mode 100644 index 000000000..40e57dca6 --- /dev/null +++ b/dev/modules/memoize.md @@ -0,0 +1,207 @@ +# Memoize Support Plan for PerlOnJava + +## Overview + +**Module:** Memoize 1.17 (CPAN: ARISTOTLE) +**Bundled in PerlOnJava:** No (available via `jcpan`, candidate for bundling) +**Test command:** `./jcpan -t Memoize` +**Type:** Pure Perl (no XS) + +Memoize caches function return values, speeding up expensive computations. +It is a core Perl module since Perl 5.8. All its direct dependencies are +already satisfied in PerlOnJava. + +## Current Status + +**Branch:** `docs/cpan-reports-scalar-util-memoize` + +### Results History + +| Date | Programs Failed | Subtests Failed | Total Subtests | Key Fix | +|------|----------------|-----------------|----------------|---------| +| 2026-04-13 | 5/16 (4 skipped) | 0/206 explicit | 206 | Baseline (via jcpan) | + +### Test Results Summary + +| Test File | Status | Subtests | Root Cause | +|-----------|--------|----------|------------| +| t/basic.t | PASS | ok | Core memoize/unmemoize/INSTALL/NORMALIZER | +| t/cache.t | PASS | ok | SCALAR_CACHE/LIST_CACHE with MEMORY/FAULT/MERGE | +| t/correctness.t | FAIL | 16/17 | **Bug 1: StackOverflowError** in deep recursion test | +| t/expmod.t | PASS | ok | Memoize::Expire | +| t/expmod_t.t | PASS | ok | Timed expiration | +| t/flush.t | PASS | ok | flush_cache() | +| t/normalize.t | PASS | ok | NORMALIZER option | +| t/threadsafe.t | FAIL | 1/8 | Requires `threads` module (not available) | +| t/tie.t | FAIL | 0/7 | **Bug 2: DB_File infinite recursion** | +| t/tie_db.t | FAIL | 0/7 | **Bug 2: DB_File infinite recursion** | +| t/tie_gdbm.t | SKIP | — | Could not load GDBM_File | +| t/tie_ndbm.t | SKIP | — | Could not load Memoize::NDBM_File | +| t/tie_odbm.t | SKIP | — | Could not load ODBM_File | +| t/tie_sdbm.t | SKIP | — | Could not load SDBM_File | +| t/tie_storable.t | FAIL | 5/6 | 1 subtest not reached | +| t/unmemoize.t | PASS | ok | unmemoize() | + +--- + +## Dependency Analysis + +### Direct Dependencies (all satisfied) + +| Dependency | Available | Location | +|-----------|-----------|----------| +| Carp | Yes | `src/main/perl/lib/Carp.pm` | +| Scalar::Util (>=1.11) | Yes (v1.63) | Java backend: `ScalarUtil.java` | +| Exporter | Yes | `src/main/perl/lib/Exporter.pm` | +| warnings | Yes | Java backend: `Warnings.java` | + +### Sub-module Dependencies + +| Sub-module | Dependency | Available | +|-----------|-----------|-----------| +| Memoize::Expire | Time::HiRes | Yes (Java impl) | +| Memoize::Storable | Storable (lock_store) | Yes (stub locking) | +| Memoize::AnyDBM_File | AnyDBM_File | No | +| Memoize::NDBM_File | NDBM_File | No | +| Memoize::SDBM_File | SDBM_File | No | + +### Key Language Features Used + +| Feature | Used For | PerlOnJava Status | +|---------|----------|------------------| +| `*{$name} = $wrapper` | Install memoized function | Implemented | +| `*{$name}{CODE}` | Extract CODE slot | Implemented | +| `Scalar::Util::set_prototype` | Preserve prototype | Implemented | +| `caller`, `wantarray` | Context detection | Implemented | +| `no strict` + symbolic refs | Dynamic installation | Implemented | +| `tied %$hash` | Check tied cache | Implemented | + +--- + +## Bug Details + +### Bug 1: StackOverflowError in correctness.t + +**Impact:** t/correctness.t (1/17 tests not reached) + +**Root cause:** The test at lines 90-103 probes for Perl's "Deep recursion" +warning threshold by recursing up to 100,000 times (`deep_probe()`). This +overflows the JVM default stack size. + +```perl +sub deep_probe { deep_probe() if ++$limit < 100_000 and not $fail } +sub deep_test { no warnings "recursion"; deep_test() if $limit-- > 0 } +memoize "deep_test"; +``` + +**Workaround:** Run with `JPERL_OPTS="-Xss256m"`. The `perl_test_runner.pl` +already applies this for known deeply-recursive tests. + +**Status:** Won't fix in Memoize -- needs JVM stack config + +### Bug 2: DB_File Infinite Recursion + +**Impact:** t/tie.t (7/7), t/tie_db.t (7/7) + +**Root cause:** `DB_File.pm` lines 238-240 have infinite recursion: +``` +DB_File at /Users/fglock/.perlonjava/lib/DB_File.pm line 238 +DB_File at /Users/fglock/.perlonjava/lib/DB_File.pm line 240 +``` +This is a bug in the DB_File shim, not in Memoize. These tests tie Memoize's +cache to a DB_File database. + +**Status:** Deferred (fix in DB_File shim) + +### Bug 3: threads Not Available + +**Impact:** t/threadsafe.t (7/8) + +**Root cause:** PerlOnJava does not implement Perl-style `threads`. This is +a known systemic limitation. + +**Status:** Won't fix (architectural limitation) + +### Bug 4: tie_storable.t Partial Failure + +**Impact:** t/tie_storable.t (1/6 not reached) + +**Root cause:** Likely related to `Storable::lock_store`/`lock_retrieve` +being stubs in PerlOnJava. + +**Status:** Low priority + +--- + +## Bundling Plan + +Memoize source is available in the Perl 5 checkout at `perl5/cpan/Memoize/`. + +### Step 1: Add to import-perl5 config + +Add to `dev/import-perl5/config.yaml`: +```yaml + # Memoize - Function return value caching (pure Perl, core since 5.8) + - source: perl5/cpan/Memoize/lib/Memoize.pm + target: src/main/perl/lib/Memoize.pm + + - source: perl5/cpan/Memoize/lib/Memoize + target: src/main/perl/lib/Memoize + type: directory +``` + +### Step 2: Sync +```bash +perl dev/import-perl5/sync.pl +``` + +### Step 3: Update docs +Add to `docs/reference/bundled-modules.md`. + +### Step 4: Verify +```bash +make dev +./jperl -e 'use Memoize; memoize("fib"); sub fib { return $_[0] if $_[0] < 2; fib($_[0]-1)+fib($_[0]-2) } print fib(30), "\n"' +``` + +--- + +## Fix Order (Priority) + +1. **Bundle Memoize** -- add to import config, sync, verify +2. Investigate correctness.t with `-Xss256m` -- likely just works +3. Fix DB_File shim infinite recursion -- separate issue +4. threads support -- systemic, won't fix + +## Expected Results After Bundling + +| Test | Current | Expected | +|------|---------|----------| +| t/basic.t | PASS | PASS | +| t/cache.t | PASS | PASS | +| t/correctness.t | FAIL | PASS (with -Xss256m) | +| t/expmod.t | PASS | PASS | +| t/expmod_t.t | PASS | PASS | +| t/flush.t | PASS | PASS | +| t/normalize.t | PASS | PASS | +| t/threadsafe.t | FAIL | FAIL (no threads) | +| t/tie.t | FAIL | FAIL (DB_File bug) | +| t/tie_db.t | FAIL | FAIL (DB_File bug) | +| t/tie_storable.t | FAIL | Investigate | +| t/unmemoize.t | PASS | PASS | + +**Target: 8/12 passing** (4 skipped for missing DB backends) + +## Completed Fixes + +_(none yet)_ + +## Progress Tracking + +- [ ] Bundle Memoize via import-perl5 config +- [ ] Verify core functionality works from bundled JAR +- [ ] Re-run test suite and update results + +## Related Documents + +- `dev/cpan-reports/Memoize.md` -- detailed test investigation diff --git a/dev/modules/scalar_util.md b/dev/modules/scalar_util.md new file mode 100644 index 000000000..d288a87f9 --- /dev/null +++ b/dev/modules/scalar_util.md @@ -0,0 +1,186 @@ +# Scalar-List-Utils Support Plan for PerlOnJava + +## Overview + +**Module:** Scalar-List-Utils 1.70 (CPAN: PEVANS) +**Bundled version:** 1.63 (Java backend) +**Test command:** `./jcpan -t Scalar::Util` +**Sub-modules:** Scalar::Util, List::Util, Sub::Util + +The Scalar-List-Utils distribution provides essential utility functions used by +virtually every non-trivial CPAN module. PerlOnJava implements all three +sub-modules as thin Perl wrappers backed by Java classes: + +| Sub-module | Java class | Perl stub | +|-----------|-----------|-----------| +| Scalar::Util | `ScalarUtil.java` | `src/main/perl/lib/Scalar/Util.pm` | +| List::Util | `ListUtil.java` | `src/main/perl/lib/List/Util.pm` | +| Sub::Util | `SubUtil.java` | `src/main/perl/lib/Sub/Util.pm` | + +## Current Status + +**Branch:** `docs/cpan-reports-scalar-util-memoize` + +### Results History + +| Date | Programs Failed | Subtests Failed | Total Subtests | Key Fix | +|------|----------------|-----------------|----------------|---------| +| 2026-04-13 | 28/38 | 210/816 | 816 | Baseline | + +### Test Results Summary + +| Test File | Status | Subtests | Root Cause | +|-----------|--------|----------|------------| +| t/00version.t | FAIL | 1/4 | Version mismatch: LU::XS 1.70 vs bundled 1.63 | +| t/any-all.t | PASS | ok | | +| t/blessed.t | PASS | ok | | +| t/dualvar.t | FAIL | 3/41 | dualvar increment and UTF-8 handling | +| t/exotic_names.t | FAIL | 120/238 | set_subname with control chars; planned 1560 | +| t/first.t | FAIL | 6/24 | `$_` not aliased in caller's scope for `first` | +| t/getmagic-once.t | FAIL | 6/6 | No Perl 5-style get-magic protocol | +| t/head-tail.t | FAIL | 2/42 | Edge cases | +| t/isvstring.t | FAIL | 1/3 | **Bug 1: isvstring always returns false** | +| t/lln.t | FAIL | 1/19 | looks_like_number edge case | +| t/max.t | PASS | ok | | +| t/maxstr.t | PASS | ok | | +| t/mesh.t | FAIL | 0/8 | **Bug 2: mesh/zip not implemented** | +| t/min.t | FAIL | 1/22 | min edge case | +| t/minstr.t | PASS | ok | | +| t/openhan.t | FAIL | 2/21 | openhandle edge cases | +| t/pair.t | FAIL | 3/29 | pairmap/pairfirst issues | +| t/product.t | FAIL | 3/27 | Numeric edge cases | +| t/prototype.t | PASS | ok | | +| t/readonly.t | PASS | ok | | +| t/reduce.t | FAIL | 7/33 | Block context / prototype issues | +| t/reductions.t | FAIL | 1/7 | Edge case | +| t/refaddr.t | FAIL | 4/32 | Overloaded/tied objects | +| t/reftype.t | FAIL | 3/32 | FORMAT, LVALUE edge cases | +| t/rt-96343.t | PASS | ok | | +| t/sample.t | FAIL | 3/9 | sample edge cases | +| t/scalarutil-proto.t | FAIL | 1/14 | Prototype check issues | +| t/shuffle.t | FAIL | 1/7 | Edge case | +| t/stack-corruption.t | PASS | ok | | +| t/subname.t | FAIL | 7/21 | set_subname not fully effective | +| t/sum.t | FAIL | 3/18 | Numeric edge cases | +| t/sum0.t | PASS | ok | | +| t/tainted.t | FAIL | 3/5 | No taint mode | +| t/undefined-block.t | FAIL | 18/18 | Undefined code block handling | +| t/uniq.t | FAIL | 6/31 | uniq/uniqstr edge cases | +| t/uniqnum.t | FAIL | 2/23 | uniqnum edge cases | +| t/weak.t | FAIL | 2/28 | Weak reference edge cases | +| t/zip.t | FAIL | 0/8 | **Bug 2: zip not implemented** | + +--- + +## Bug Details + +### Bug 1: `isvstring()` Always Returns False + +**Impact:** t/isvstring.t (1 failure) + +**Root cause:** `ScalarUtil.java:238-243` is a stub that always returns `false`: +```java +// Placeholder for isvstring functionality +return new RuntimeScalar(false).getList(); +``` +The VSTRING type (constant 5) already exists in the runtime and is correctly +used by `reftype()` and `Version.java`. Only `isvstring()` doesn't check for it. + +**Fix:** Check `type == VSTRING`, following the `isdual()` pattern: +```java +RuntimeScalar s = args.get(0); +if (s.type == READONLY_SCALAR) s = (RuntimeScalar) s.value; +return new RuntimeScalar(s.type == VSTRING).getList(); +``` + +**Files:** `src/main/java/org/perlonjava/runtime/perlmodule/ScalarUtil.java` + +### Bug 2: `mesh`/`zip` Functions Not Implemented in ListUtil.java + +**Impact:** t/mesh.t (8 tests), t/zip.t (8 tests) -- both crash immediately + +**Root cause:** The Perl stub `List/Util.pm` declares these in `@EXPORT_OK`: +```perl +zip zip_longest zip_shortest mesh mesh_longest mesh_shortest +``` +But `ListUtil.java` never registers them in `initialize()`. When called, there's +no Java method to dispatch to. + +**Fix:** Implement 6 new methods in `ListUtil.java`: +- `zip` / `zip_shortest` / `zip_longest` -- takes arrayrefs, returns list of arrayrefs +- `mesh` / `mesh_shortest` / `mesh_longest` -- takes arrayrefs, returns flat interleaved list + +Per Perl 5 docs: +- `zip` returns list of arrayrefs (one per "row"), stopping at shortest input +- `zip_longest` pads with undef to longest input +- `zip_shortest` is an alias for `zip` +- `mesh` returns flat interleaved list, stopping at shortest input +- `mesh_longest` pads with undef to longest input +- `mesh_shortest` is an alias for `mesh` + +**Files:** `src/main/java/org/perlonjava/runtime/perlmodule/ListUtil.java` + +### Bug 3: `tainted()` Always Returns False (Systemic) + +**Impact:** t/tainted.t (3 failures) + +**Root cause:** Taint mode is not implemented in PerlOnJava. `RuntimeScalar.isTainted()` +always returns `false`. This is a systemic limitation, not fixable in Scalar::Util alone. + +**Status:** Won't fix (requires taint mode implementation) + +### Bug 4: `isvstring` Returns False -- Resolved by Bug 1 Fix + +### Bug 5: `set_subname` Doesn't Work With Exotic Characters + +**Impact:** t/exotic_names.t (120 failures), t/subname.t (7 failures) + +**Root cause:** `SubUtil.set_subname()` sets `code.packageName` and `code.subName` +correctly, but `caller()` doesn't always return the renamed sub's name. The issue +is that PerlOnJava's `__ANON__` handling may override the set name for closures. + +**Status:** Deferred -- needs deep investigation of RuntimeCode caller integration + +### Bug 6: `getmagic-once` -- No Magic Get Protocol + +**Impact:** t/getmagic-once.t (6/6 failures) + +**Root cause:** PerlOnJava doesn't implement Perl 5's mg_get()/mg_set() protocol. +Tied scalars use tiedFetch/tiedStore directly; there's no "magic invocation count" +concept. + +**Status:** Won't fix (architectural difference) + +### Bug 7: `undefined-block` -- Missing Error for Undefined Code Blocks + +**Impact:** t/undefined-block.t (18/18 failures) + +**Root cause:** When `undef` is passed as a code block to `first`, `any`, `all`, etc., +Perl 5 throws specific errors. PerlOnJava doesn't validate the code ref argument. + +**Status:** Deferred + +--- + +## Fix Order (Priority) + +1. **Fix `isvstring()`** -- trivial, 1 test file +2. **Implement `mesh`/`zip`** -- medium, 2 test files (16 tests) +3. Version sync (update bundled version to match) -- optional +4. `undefined-block` error handling -- deferred +5. `set_subname` caller integration -- deferred + +## Completed Fixes + +_(none yet)_ + +## Progress Tracking + +- [ ] Fix `isvstring()` to check VSTRING type +- [ ] Implement `mesh`/`zip`/`mesh_longest`/`zip_longest`/`mesh_shortest`/`zip_shortest` +- [ ] Re-run `./jcpan -t Scalar::Util` and update results + +## Related Documents + +- `dev/cpan-reports/Scalar-Util.md` -- detailed test investigation +- `dev/architecture/weaken-destroy.md` -- weaken/DESTROY architecture diff --git a/src/main/java/org/perlonjava/core/Configuration.java b/src/main/java/org/perlonjava/core/Configuration.java index c2d338cba..c1cd4dc8c 100644 --- a/src/main/java/org/perlonjava/core/Configuration.java +++ b/src/main/java/org/perlonjava/core/Configuration.java @@ -33,14 +33,14 @@ public final class Configuration { * Automatically populated by Gradle/Maven during build. * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String gitCommitId = "63c473e40"; + public static final String gitCommitId = "e11aeaebe"; /** * Git commit date of the build (ISO format: YYYY-MM-DD). * Automatically populated by Gradle/Maven during build. * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String gitCommitDate = "2026-04-12"; + public static final String gitCommitDate = "2026-04-13"; /** * Build timestamp in Perl 5 "Compiled at" format (e.g., "Apr 7 2026 11:20:00"). @@ -48,7 +48,7 @@ public final class Configuration { * Parsed by App::perlbrew and other tools via: perl -V | grep "Compiled at" * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String buildTimestamp = "Apr 12 2026 20:39:23"; + public static final String buildTimestamp = "Apr 13 2026 09:14:54"; // Prevent instantiation private Configuration() { diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/ListUtil.java b/src/main/java/org/perlonjava/runtime/perlmodule/ListUtil.java index 9c9132039..065a3d2b1 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/ListUtil.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/ListUtil.java @@ -76,6 +76,14 @@ public static void initialize() { listUtil.registerMethod("pairmap", "pairmap", "&@"); listUtil.registerMethod("pairgrep", "pairgrep", "&@"); listUtil.registerMethod("pairfirst", "pairfirst", "&@"); + + // Zip/mesh functions (take arrayrefs, no prototype) + listUtil.registerMethod("zip", "zip", null); + listUtil.registerMethod("zip_shortest", "zip_shortest", null); + listUtil.registerMethod("zip_longest", "zip", null); // alias for zip + listUtil.registerMethod("mesh", "mesh", null); + listUtil.registerMethod("mesh_shortest", "mesh_shortest", null); + listUtil.registerMethod("mesh_longest", "mesh", null); // alias for mesh } catch (NoSuchMethodException e) { System.err.println("Warning: Missing List::Util method: " + e.getMessage()); } @@ -756,4 +764,114 @@ public static RuntimeList pairfirst(RuntimeArray args, int ctx) { return ctx == RuntimeContextType.SCALAR ? scalarFalse.getList() : new RuntimeList(); } + + /** + * Zip arrayrefs into a list of arrayrefs (tuples). + * Pads shorter inputs with undef to the length of the longest input. + * zip(\@a, \@b, ...) returns ([a0,b0,...], [a1,b1,...], ...) + */ + public static RuntimeList zip(RuntimeArray args, int ctx) { + return zipImpl(args, ctx, false); + } + + /** + * Zip arrayrefs, stopping at the shortest input. + */ + public static RuntimeList zip_shortest(RuntimeArray args, int ctx) { + return zipImpl(args, ctx, true); + } + + /** + * Shared implementation for zip and zip_shortest. + */ + private static RuntimeList zipImpl(RuntimeArray args, int ctx, boolean shortest) { + if (args.isEmpty()) { + return new RuntimeList(); + } + + // Collect input arrays and find min/max lengths + RuntimeArray[] arrays = new RuntimeArray[args.size()]; + int maxLen = 0; + int minLen = Integer.MAX_VALUE; + for (int i = 0; i < args.size(); i++) { + RuntimeScalar ref = args.get(i); + if (ref.type != RuntimeScalarType.ARRAYREFERENCE) { + throw new RuntimeException("Not an ARRAY reference"); + } + arrays[i] = (RuntimeArray) ref.value; + maxLen = Math.max(maxLen, arrays[i].size()); + minLen = Math.min(minLen, arrays[i].size()); + } + + int len = shortest ? minLen : maxLen; + RuntimeArray result = new RuntimeArray(); + + for (int row = 0; row < len; row++) { + RuntimeArray tuple = new RuntimeArray(); + for (RuntimeArray array : arrays) { + if (row < array.size()) { + tuple.push(array.get(row)); + } else { + tuple.push(RuntimeScalarCache.scalarUndef); + } + } + result.push(tuple.createReference()); + } + + return result.getList(); + } + + /** + * Mesh (interleave) arrayrefs into a flat list. + * Pads shorter inputs with undef to the length of the longest input. + * mesh(\@a, \@b, ...) returns (a0, b0, ..., a1, b1, ...) + */ + public static RuntimeList mesh(RuntimeArray args, int ctx) { + return meshImpl(args, ctx, false); + } + + /** + * Mesh arrayrefs, stopping at the shortest input. + */ + public static RuntimeList mesh_shortest(RuntimeArray args, int ctx) { + return meshImpl(args, ctx, true); + } + + /** + * Shared implementation for mesh and mesh_shortest. + */ + private static RuntimeList meshImpl(RuntimeArray args, int ctx, boolean shortest) { + if (args.isEmpty()) { + return new RuntimeList(); + } + + // Collect input arrays and find min/max lengths + RuntimeArray[] arrays = new RuntimeArray[args.size()]; + int maxLen = 0; + int minLen = Integer.MAX_VALUE; + for (int i = 0; i < args.size(); i++) { + RuntimeScalar ref = args.get(i); + if (ref.type != RuntimeScalarType.ARRAYREFERENCE) { + throw new RuntimeException("Not an ARRAY reference"); + } + arrays[i] = (RuntimeArray) ref.value; + maxLen = Math.max(maxLen, arrays[i].size()); + minLen = Math.min(minLen, arrays[i].size()); + } + + int len = shortest ? minLen : maxLen; + RuntimeArray result = new RuntimeArray(); + + for (int row = 0; row < len; row++) { + for (RuntimeArray array : arrays) { + if (row < array.size()) { + result.push(array.get(row)); + } else { + result.push(RuntimeScalarCache.scalarUndef); + } + } + } + + return result.getList(); + } } diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/ScalarUtil.java b/src/main/java/org/perlonjava/runtime/perlmodule/ScalarUtil.java index 11c453983..163ab9841 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/ScalarUtil.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/ScalarUtil.java @@ -239,8 +239,9 @@ public static RuntimeList isvstring(RuntimeArray args, int ctx) { if (args.size() != 1) { throw new IllegalStateException("Bad number of arguments for isvstring() method"); } - // Placeholder for isvstring functionality - return new RuntimeScalar(false).getList(); + RuntimeScalar s = args.get(0); + if (s.type == READONLY_SCALAR) s = (RuntimeScalar) s.value; + return new RuntimeScalar(s.type == VSTRING).getList(); } /** diff --git a/src/main/perl/lib/Memoize.pm b/src/main/perl/lib/Memoize.pm new file mode 100644 index 000000000..bf26cf422 --- /dev/null +++ b/src/main/perl/lib/Memoize.pm @@ -0,0 +1,958 @@ +# -*- mode: perl; perl-indent-level: 2; -*- +# vim: ts=8 sw=2 sts=2 noexpandtab + +# Memoize.pm +# +# Copyright 1998, 1999, 2000, 2001, 2012 M. J. Dominus. +# You may copy and distribute this program under the +# same terms as Perl itself. + +use strict; use warnings; + +package Memoize; +our $VERSION = '1.17'; + +use Carp; +use Scalar::Util 1.11 (); # for set_prototype + +BEGIN { require Exporter; *import = \&Exporter::import } +our @EXPORT = qw(memoize); +our @EXPORT_OK = qw(unmemoize flush_cache); + +my %memotable; + +sub CLONE { + my @info = values %memotable; + %memotable = map +($_->{WRAPPER} => $_), @info; +} + +sub memoize { + my $fn = shift; + my %options = @_; + + unless (defined($fn) && + (ref $fn eq 'CODE' || ref $fn eq '')) { + croak "Usage: memoize 'functionname'|coderef {OPTIONS}"; + } + + my $uppack = caller; # TCL me Elmo! + my $name = (ref $fn ? undef : $fn); + my $cref = _make_cref($fn, $uppack); + + my $normalizer = $options{NORMALIZER}; + if (defined $normalizer && ! ref $normalizer) { + $normalizer = _make_cref($normalizer, $uppack); + } + + my $install_name = exists $options{INSTALL} + ? $options{INSTALL} # use given name (or, if undef: do not install) + : $name; # no INSTALL option provided: default to original name if possible + + if (defined $install_name) { + $install_name = $uppack . '::' . $install_name + unless $install_name =~ /::/; + } + + # convert LIST_CACHE => MERGE to SCALAR_CACHE => MERGE + # to ensure TIE/HASH will always be checked by _check_suitable + if (($options{LIST_CACHE} || '') eq 'MERGE') { + $options{LIST_CACHE} = $options{SCALAR_CACHE}; + $options{SCALAR_CACHE} = 'MERGE'; + } + + # These will be the caches + my %caches; + for my $context (qw(LIST SCALAR)) { # SCALAR_CACHE must be last, to process MERGE + my $fullopt = $options{"${context}_CACHE"} ||= 'MEMORY'; + my ($cache_opt, @cache_opt_args) = ref $fullopt ? @$fullopt : $fullopt; + if ($cache_opt eq 'FAULT') { # no cache + $caches{$context} = undef; + } elsif ($cache_opt eq 'HASH') { # user-supplied hash + my $cache = $cache_opt_args[0]; + _check_suitable($context, ref tied %$cache); + $caches{$context} = $cache; + } elsif ($cache_opt eq 'TIE') { + carp("TIE option to memoize() is deprecated; use HASH instead") + if warnings::enabled('all'); + my $module = shift(@cache_opt_args) || ''; + _check_suitable($context, $module); + my $hash = $caches{$context} = {}; + (my $modulefile = $module . '.pm') =~ s{::}{/}g; + require $modulefile; + tie(%$hash, $module, @cache_opt_args) + or croak "Couldn't tie memoize hash to `$module': $!"; + } elsif ($cache_opt eq 'MEMORY') { + $caches{$context} = {}; + } elsif ($cache_opt eq 'MERGE' and not ref $fullopt) { # ['MERGE'] was never supported + die "cannot MERGE $context\_CACHE" if $context ne 'SCALAR'; # should never happen + die 'bad cache setup order' if not exists $caches{LIST}; # should never happen + $options{MERGED} = 1; + $caches{SCALAR} = $caches{LIST}; + } else { + croak "Unrecognized option to `${context}_CACHE': `$cache_opt' should be one of (MERGE TIE MEMORY FAULT HASH)"; + } + } + + my $wrapper = _wrap($install_name, $cref, $normalizer, $options{MERGED}, \%caches); + + if (defined $install_name) { + no strict; + no warnings 'redefine'; + *{$install_name} = $wrapper; + } + + $memotable{$wrapper} = { + L => $caches{LIST}, + S => $caches{SCALAR}, + U => $cref, + NAME => $install_name, + WRAPPER => $wrapper, + }; + + $wrapper # Return just memoized version +} + +sub flush_cache { + my $func = _make_cref($_[0], scalar caller); + my $info = $memotable{$func}; + die "$func not memoized" unless defined $info; + for my $context (qw(S L)) { + my $cache = $info->{$context}; + if (tied %$cache && ! (tied %$cache)->can('CLEAR')) { + my $funcname = defined($info->{NAME}) ? + "function $info->{NAME}" : "anonymous function $func"; + my $context = {S => 'scalar', L => 'list'}->{$context}; + croak "Tied cache hash for $context-context $funcname does not support flushing"; + } else { + %$cache = (); + } + } +} + +sub _wrap { + my ($name, $orig, $normalizer, $merged, $caches) = @_; + my ($cache_L, $cache_S) = @$caches{qw(LIST SCALAR)}; + undef $caches; # keep the pad from keeping the hash alive forever + Scalar::Util::set_prototype(sub { + my $argstr = do { + no warnings 'uninitialized'; + defined $normalizer + ? ( wantarray ? ( $normalizer->( @_ ) )[0] : $normalizer->( @_ ) ) + . '' # coerce undef to string while the warning is off + : join chr(28), @_; + }; + + if (wantarray) { + _crap_out($name, 'list') unless $cache_L; + exists $cache_L->{$argstr} ? ( + @{$cache_L->{$argstr}} + ) : do { + my @q = do { no warnings 'recursion'; &$orig }; + $cache_L->{$argstr} = \@q; + @q; + }; + } else { + _crap_out($name, 'scalar') unless $cache_S; + exists $cache_S->{$argstr} ? ( + $merged ? $cache_S->{$argstr}[0] : $cache_S->{$argstr} + ) : do { + my $val = do { no warnings 'recursion'; &$orig }; + $cache_S->{$argstr} = $merged ? [$val] : $val; + $val; + }; + } + }, prototype $orig); +} + +sub unmemoize { + my $f = shift; + my $uppack = caller; + my $cref = _make_cref($f, $uppack); + + unless (exists $memotable{$cref}) { + croak "Could not unmemoize function `$f', because it was not memoized to begin with"; + } + + my $tabent = $memotable{$cref}; + unless (defined $tabent) { + croak "Could not figure out how to unmemoize function `$f'"; + } + my $name = $tabent->{NAME}; + if (defined $name) { + no strict; + no warnings 'redefine'; + *{$name} = $tabent->{U}; # Replace with original function + } + delete $memotable{$cref}; + + $tabent->{U}; +} + +sub _make_cref { + my $fn = shift; + my $uppack = shift; + my $cref; + my $name; + + if (ref $fn eq 'CODE') { + $cref = $fn; + } elsif (! ref $fn) { + if ($fn =~ /::/) { + $name = $fn; + } else { + $name = $uppack . '::' . $fn; + } + no strict; + if (defined $name and !defined(&$name)) { + croak "Cannot operate on nonexistent function `$fn'"; + } +# $cref = \&$name; + $cref = *{$name}{CODE}; + } else { + my $parent = (caller(1))[3]; # Function that called _make_cref + croak "Usage: argument 1 to `$parent' must be a function name or reference.\n"; + } + our $DEBUG and warn "${name}($fn) => $cref in _make_cref\n"; + $cref; +} + +sub _crap_out { + my ($funcname, $context) = @_; + if (defined $funcname) { + croak "Function `$funcname' called in forbidden $context context; faulting"; + } else { + croak "Anonymous function called in forbidden $context context; faulting"; + } +} + +# Raise an error if the user tries to specify one of these packages as a +# tie for LIST_CACHE +my %scalar_only = map {($_ => 1)} qw(DB_File GDBM_File SDBM_File ODBM_File), map +($_, "Memoize::$_"), qw(AnyDBM_File NDBM_File); +sub _check_suitable { + my ($context, $package) = @_; + croak "You can't use $package for LIST_CACHE because it can only store scalars" + if $context eq 'LIST' and $scalar_only{$package}; +} + +1; + +__END__ + +=pod + +=head1 NAME + +Memoize - Make functions faster by trading space for time + +=head1 SYNOPSIS + + use Memoize; + memoize('slow_function'); + slow_function(arguments); # Is faster than it was before + + +This is normally all you need to know. However, many options are available: + + memoize(function, options...); + +Options include: + + NORMALIZER => function + INSTALL => new_name + + SCALAR_CACHE => 'MEMORY' + SCALAR_CACHE => ['HASH', \%cache_hash ] + SCALAR_CACHE => 'FAULT' + SCALAR_CACHE => 'MERGE' + + LIST_CACHE => 'MEMORY' + LIST_CACHE => ['HASH', \%cache_hash ] + LIST_CACHE => 'FAULT' + LIST_CACHE => 'MERGE' + +=head1 DESCRIPTION + +I a function makes it faster by trading space for time. It +does this by caching the return values of the function in a table. +If you call the function again with the same arguments, C +jumps in and gives you the value out of the table, instead of letting +the function compute the value all over again. + +=head1 EXAMPLE + +Here is an extreme example. Consider the Fibonacci sequence, defined +by the following function: + + # Compute Fibonacci numbers + sub fib { + my $n = shift; + return $n if $n < 2; + fib($n-1) + fib($n-2); + } + +This function is very slow. Why? To compute fib(14), it first wants +to compute fib(13) and fib(12), and add the results. But to compute +fib(13), it first has to compute fib(12) and fib(11), and then it +comes back and computes fib(12) all over again even though the answer +is the same. And both of the times that it wants to compute fib(12), +it has to compute fib(11) from scratch, and then it has to do it +again each time it wants to compute fib(13). This function does so +much recomputing of old results that it takes a really long time to +run---fib(14) makes 1,200 extra recursive calls to itself, to compute +and recompute things that it already computed. + +This function is a good candidate for memoization. If you memoize the +C function above, it will compute fib(14) exactly once, the first +time it needs to, and then save the result in a table. Then if you +ask for fib(14) again, it gives you the result out of the table. +While computing fib(14), instead of computing fib(12) twice, it does +it once; the second time it needs the value it gets it from the table. +It doesn't compute fib(11) four times; it computes it once, getting it +from the table the next three times. Instead of making 1,200 +recursive calls to C, it makes 15. This makes the function about +150 times faster. + +You could do the memoization yourself, by rewriting the function, like +this: + + # Compute Fibonacci numbers, memoized version + { my @fib; + sub fib { + my $n = shift; + return $fib[$n] if defined $fib[$n]; + return $fib[$n] = $n if $n < 2; + $fib[$n] = fib($n-1) + fib($n-2); + } + } + +Or you could use this module, like this: + + use Memoize; + memoize('fib'); + + # Rest of the fib function just like the original version. + +This makes it easy to turn memoizing on and off. + +Here's an even simpler example: I wrote a simple ray tracer; the +program would look in a certain direction, figure out what it was +looking at, and then convert the C value (typically a string +like C) of that object to a red, green, and blue pixel value, like +this: + + for ($direction = 0; $direction < 300; $direction++) { + # Figure out which object is in direction $direction + $color = $object->{color}; + ($r, $g, $b) = @{&ColorToRGB($color)}; + ... + } + +Since there are relatively few objects in a picture, there are only a +few colors, which get looked up over and over again. Memoizing +C sped up the program by several percent. + +=head1 DETAILS + +This module exports exactly one function, C. The rest of the +functions in this package are None of Your Business. + +You should say + + memoize(function) + +where C is the name of the function you want to memoize, or +a reference to it. C returns a reference to the new, +memoized version of the function, or C on a non-fatal error. +At present, there are no non-fatal errors, but there might be some in +the future. + +If C was the name of a function, then C hides the +old version and installs the new memoized version under the old name, +so that C<&function(...)> actually invokes the memoized version. + +=head1 OPTIONS + +There are some optional options you can pass to C to change +the way it behaves a little. To supply options, invoke C +like this: + + memoize(function, NORMALIZER => function, + INSTALL => newname, + SCALAR_CACHE => option, + LIST_CACHE => option + ); + +Each of these options is optional; you can include some, all, or none +of them. + +=head2 INSTALL + +If you supply a function name with C, memoize will install +the new, memoized version of the function under the name you give. +For example, + + memoize('fib', INSTALL => 'fastfib') + +installs the memoized version of C as C; without the +C option it would have replaced the old C with the +memoized version. + +To prevent C from installing the memoized version anywhere, use +C undef>. + +=head2 NORMALIZER + +Suppose your function looks like this: + + # Typical call: f('aha!', A => 11, B => 12); + sub f { + my $a = shift; + my %hash = @_; + $hash{B} ||= 2; # B defaults to 2 + $hash{C} ||= 7; # C defaults to 7 + + # Do something with $a, %hash + } + +Now, the following calls to your function are all completely equivalent: + + f(OUCH); + f(OUCH, B => 2); + f(OUCH, C => 7); + f(OUCH, B => 2, C => 7); + f(OUCH, C => 7, B => 2); + (etc.) + +However, unless you tell C that these calls are equivalent, +it will not know that, and it will compute the values for these +invocations of your function separately, and store them separately. + +To prevent this, supply a C function that turns the +program arguments into a string in a way that equivalent arguments +turn into the same string. A C function for C above +might look like this: + + sub normalize_f { + my $a = shift; + my %hash = @_; + $hash{B} ||= 2; + $hash{C} ||= 7; + + join(',', $a, map ($_ => $hash{$_}) sort keys %hash); + } + +Each of the argument lists above comes out of the C +function looking exactly the same, like this: + + OUCH,B,2,C,7 + +You would tell C to use this normalizer this way: + + memoize('f', NORMALIZER => 'normalize_f'); + +C knows that if the normalized version of the arguments is +the same for two argument lists, then it can safely look up the value +that it computed for one argument list and return it as the result of +calling the function with the other argument list, even if the +argument lists look different. + +The default normalizer just concatenates the arguments with character +28 in between. (In ASCII, this is called FS or control-\.) This +always works correctly for functions with only one string argument, +and also when the arguments never contain character 28. However, it +can confuse certain argument lists: + + normalizer("a\034", "b") + normalizer("a", "\034b") + normalizer("a\034\034b") + +for example. + +Since hash keys are strings, the default normalizer will not +distinguish between C and the empty string. It also won't work +when the function's arguments are references. For example, consider a +function C which gets two arguments: A number, and a reference to +an array of numbers: + + g(13, [1,2,3,4,5,6,7]); + +The default normalizer will turn this into something like +C<"13\034ARRAY(0x436c1f)">. That would be all right, except that a +subsequent array of numbers might be stored at a different location +even though it contains the same data. If this happens, C +will think that the arguments are different, even though they are +equivalent. In this case, a normalizer like this is appropriate: + + sub normalize { join ' ', $_[0], @{$_[1]} } + +For the example above, this produces the key "13 1 2 3 4 5 6 7". + +Another use for normalizers is when the function depends on data other +than those in its arguments. Suppose you have a function which +returns a value which depends on the current hour of the day: + + sub on_duty { + my ($problem_type) = @_; + my $hour = (localtime)[2]; + open my $fh, "$DIR/$problem_type" or die...; + my $line; + while ($hour-- > 0) + $line = <$fh>; + } + return $line; + } + +At 10:23, this function generates the 10th line of a data file; at +3:45 PM it generates the 15th line instead. By default, C +will only see the $problem_type argument. To fix this, include the +current hour in the normalizer: + + sub normalize { join ' ', (localtime)[2], @_ } + +The calling context of the function (scalar or list context) is +propagated to the normalizer. This means that if the memoized +function will treat its arguments differently in list context than it +would in scalar context, you can have the normalizer function select +its behavior based on the results of C. Even if called in +a list context, a normalizer should still return a single string. + +=head2 C, C + +Normally, C caches your function's return values into an +ordinary Perl hash variable. However, you might like to have the +values cached on the disk, so that they persist from one run of your +program to the next, or you might like to associate some other +interesting semantics with the cached values. + +There's a slight complication under the hood of C: There are +actually I caches, one for scalar values and one for list values. +When your function is called in scalar context, its return value is +cached in one hash, and when your function is called in list context, +its value is cached in the other hash. You can control the caching +behavior of both contexts independently with these options. + +The argument to C or C must either be one of +the following four strings: + + MEMORY + FAULT + MERGE + HASH + +or else it must be a reference to an array whose first element is one of +these four strings, such as C<[HASH, arguments...]>. + +=over 4 + +=item C + +C means that return values from the function will be cached in +an ordinary Perl hash variable. The hash variable will not persist +after the program exits. This is the default. + +=item C + +C allows you to specify that a particular hash that you supply +will be used as the cache. You can tie this hash beforehand to give +it any behavior you want. + +A tied hash can have any semantics at all. It is typically tied to an +on-disk database, so that cached values are stored in the database and +retrieved from it again when needed, and the disk file typically +persists after your program has exited. See C for more +complete details about C. + +A typical example is: + + use DB_File; + tie my %cache => 'DB_File', $filename, O_RDWR|O_CREAT, 0666; + memoize 'function', SCALAR_CACHE => [HASH => \%cache]; + +This has the effect of storing the cache in a C database +whose name is in C<$filename>. The cache will persist after the +program has exited. Next time the program runs, it will find the +cache already populated from the previous run of the program. Or you +can forcibly populate the cache by constructing a batch program that +runs in the background and populates the cache file. Then when you +come to run your real program the memoized function will be fast +because all its results have been precomputed. + +Another reason to use C is to provide your own hash variable. +You can then inspect or modify the contents of the hash to gain finer +control over the cache management. + +=item C + +This option is no longer supported. It is still documented only to +aid in the debugging of old programs that use it. Old programs should +be converted to use the C option instead. + + memoize ... ['TIE', PACKAGE, ARGS...] + +is merely a shortcut for + + require PACKAGE; + { tie my %cache, PACKAGE, ARGS...; + memoize ... [HASH => \%cache]; + } + +=item C + +C means that you never expect to call the function in scalar +(or list) context, and that if C detects such a call, it +should abort the program. The error message is one of + + `foo' function called in forbidden list context at line ... + `foo' function called in forbidden scalar context at line ... + +=item C + +C normally means that the memoized function does not +distinguish between list and scalar context, and that return values in +both contexts should be stored together. Both C +MERGE> and C MERGE> mean the same thing. + +Consider this function: + + sub complicated { + # ... time-consuming calculation of $result + return $result; + } + +The C function will return the same numeric C<$result> +regardless of whether it is called in list or in scalar context. + +Normally, the following code will result in two calls to C, even +if C is memoized: + + $x = complicated(142); + ($y) = complicated(142); + $z = complicated(142); + +The first call will cache the result, say 37, in the scalar cache; the +second will cache the list C<(37)> in the list cache. The third call +doesn't call the real C function; it gets the value 37 +from the scalar cache. + +Obviously, the second call to C is a waste of time, and +storing its return value is a waste of space. Specifying C MERGE> will make C use the same cache for scalar and +list context return values, so that the second call uses the scalar +cache that was populated by the first call. C ends up +being called only once, and both subsequent calls return C<37> from the +cache, regardless of the calling context. + +=back + +=head3 List values in scalar context + +Consider this function: + + sub iota { return reverse (1..$_[0]) } + +This function normally returns a list. Suppose you memoize it and +merge the caches: + + memoize 'iota', SCALAR_CACHE => 'MERGE'; + + @i7 = iota(7); + $i7 = iota(7); + +Here the first call caches the list (1,2,3,4,5,6,7). The second call +does not really make sense. C cannot guess what behavior +C should have in scalar context without actually calling it in +scalar context. Normally C I call C in scalar +context and cache the result, but the C 'MERGE'> +option says not to do that, but to use the cache list-context value +instead. But it cannot return a list of seven elements in a scalar +context. In this case C<$i7> will receive the B of the +cached list value, namely 7. + +=head3 Merged disk caches + +Another use for C is when you want both kinds of return values +stored in the same disk file; this saves you from having to deal with +two disk files instead of one. You can use a normalizer function to +keep the two sets of return values separate. For example: + + local $MLDBM::UseDB = 'DB_File'; + tie my %cache => 'MLDBM', $filename, ...; + + memoize 'myfunc', + NORMALIZER => 'n', + SCALAR_CACHE => [HASH => \%cache], + LIST_CACHE => 'MERGE', + ; + + sub n { + my $context = wantarray() ? 'L' : 'S'; + # ... now compute the hash key from the arguments ... + $hashkey = "$context:$hashkey"; + } + +This normalizer function will store scalar context return values in +the disk file under keys that begin with C, and list context +return values under keys that begin with C. + +=head1 OTHER FACILITIES + +=head2 C + +There's an C function that you can import if you want to. +Why would you want to? Here's an example: Suppose you have your cache +tied to a DBM file, and you want to make sure that the cache is +written out to disk if someone interrupts the program. If the program +exits normally, this will happen anyway, but if someone types +control-C or something then the program will terminate immediately +without synchronizing the database. So what you can do instead is + + $SIG{INT} = sub { unmemoize 'function' }; + +C accepts a reference to, or the name of a previously +memoized function, and undoes whatever it did to provide the memoized +version in the first place, including making the name refer to the +unmemoized version if appropriate. It returns a reference to the +unmemoized version of the function. + +If you ask it to unmemoize a function that was never memoized, it +croaks. + +=head2 C + +C will flush out the caches, discarding I +the cached data. The argument may be a function name or a reference +to a function. For finer control over when data is discarded or +expired, see the documentation for C, included in +this package. + +Note that if the cache is a tied hash, C will attempt to +invoke the C method on the hash. If there is no C +method, this will cause a run-time error. + +An alternative approach to cache flushing is to use the C option +(see above) to request that C use a particular hash variable +as its cache. Then you can examine or modify the hash at any time in +any way you desire. You may flush the cache by using C<%hash = ()>. + +=head1 CAVEATS + +Memoization is not a cure-all: + +=over 4 + +=item * + +Do not memoize a function whose behavior depends on program +state other than its own arguments, such as global variables, the time +of day, or file input. These functions will not produce correct +results when memoized. For a particularly easy example: + + sub f { + time; + } + +This function takes no arguments, and as far as C is +concerned, it always returns the same result. C is wrong, of +course, and the memoized version of this function will call C