Autoresearch/json decode 2026 03 19 #1

Open

miikka wants to merge 19 commits into master from autoresearch/json-decode-2026-03-19

+396 −30

autoresearch.checks.sh

@@ -0,0 +1,6 @@
+    #!/usr/bin/env bash
+    set -euo pipefail
+    lein do clean, test >/tmp/jsonista-autoresearch-checks.log 2>&1 || {
+      tail -80 /tmp/jsonista-autoresearch-checks.log
+      exit 1
+    }

autoresearch.ideas.md

@@ -0,0 +1 @@
+    - Explore a principled schema-shape cache for repeated object layouts (not benchmark-specific): cache a previously validated unique key sequence for common map sizes and bypass duplicate tracking when the incoming key sequence matches exactly. A quick 12-field-only version was noisy and not clearly better, but the general idea may still pay off with a cleaner design.
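
The schema-shape cache idea above could look roughly like the following. This is a minimal sketch, not code from the PR: `KeyShapeCache`, `matches` and `remember` are hypothetical names, and the sketch assumes the same interleaved `[key, value, key, value, ...]` buffer layout that `PersistentHashMapDeserializer` uses. It remembers a single shape only; a real design would have to decide how many shapes to keep and whether the exact-match scan is cheaper than the duplicate check it replaces.

```java
/** Hypothetical helper, not part of the PR: remembers one validated key sequence. */
final class KeyShapeCache {
  // Last key sequence the caller verified to be duplicate-free.
  // volatile so a fully populated array is visible to other decoding threads.
  private volatile String[] cachedKeys;

  /** True when the first `size` keys of the interleaved array equal the cached shape. */
  boolean matches(Object[] entries, int size) {
    String[] cached = cachedKeys;
    if (cached == null || cached.length != size) {
      return false;
    }
    for (int i = 0; i < size; i++) {
      // Keys live at even indexes of the interleaved key/value array.
      if (!cached[i].equals(entries[i << 1])) {
        return false;
      }
    }
    return true;
  }

  /** Records a key sequence that has already been checked for duplicates elsewhere. */
  void remember(Object[] entries, int size) {
    String[] keys = new String[size];
    for (int i = 0; i < size; i++) {
      keys[i] = (String) entries[i << 1];
    }
    cachedKeys = keys;
  }
}
```

A match implies the incoming keys are unique, because they equal, position by position, a sequence that was already validated. A miss says nothing, so the caller would still need its normal duplicate handling.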

autoresearch.jsonl

Large diffs are not rendered by default.

autoresearch.md

@@ -0,0 +1,58 @@
+    # Autoresearch: JSON decode performance
+    ## Objective
+    Optimize JSON decoding throughput in jsonista, measured by `scripts/bench-target.sh`. The target benchmark returns decode throughput in ops/s, so higher is better. Preserve correctness by ensuring the full test suite passes with `lein do clean, test`.
+    ## Metrics
+    - **Primary**: `ops_per_s` (ops/s, higher is better)
+    - **Secondary**: benchmark wall time, test suite status
+    ## How to Run
+    `./autoresearch.sh` — runs the decode benchmark repeatedly, reports the median as `METRIC ops_per_s=<number>`.
+    ## Files in Scope
+    - `src/java/jsonista/jackson/PersistentHashMapDeserializer.java` — Clojure map deserializer used on decode path
+    - `src/java/jsonista/jackson/PersistentVectorDeserializer.java` — Clojure vector deserializer used on decode path
+    - `src/java/jsonista/jackson/HashMapDeserializer.java` — Java map deserializer used by fast mapper paths
+    - `src/java/jsonista/jackson/ArrayListDeserializer.java` — Java list deserializer used by fast mapper paths
+    - `src/clj/jsonista/core.clj` — mapper/module wiring and read API
+    - `test/jsonista/jmh.clj` — benchmark definitions if benchmark instrumentation needs adjustment
+    - `scripts/bench-target.sh` — existing target benchmark command
+    - `autoresearch.sh` — benchmark harness for autoresearch
+    - `autoresearch.checks.sh` — correctness checks
+    ## Off Limits
+    - Public API behavior
+    - Dependency versions
+    - Unrelated encode-path changes unless required for decode optimization
+    - Documentation unrelated to autoresearch bookkeeping
+    ## Constraints
+    - Keep JSON decoding semantics unchanged
+    - `scripts/bench-target.sh` is the primary benchmark
+    - `lein do clean, test` must pass for any kept change
+    - No new dependencies
+    ## What's Been Tried
+    - Baseline median (3 runs): `13244.35 ops/s`.
+    - Initial source review suggests the hot path is in the custom Jackson deserializers for persistent maps/vectors and possibly mapper configuration overhead.
+    - Attempted to contextualize and cache value deserializers in `TaggedValueOrPersistentVectorDeserializer` / `ArrayListDeserializer`; benchmark regressed and was discarded.
+    - Attempted to restructure tagged-vector decode to avoid transient-vector setup before checking the first element; benchmark regressed and was discarded.
+    - **Win:** `PersistentHashMapDeserializer` now uses a small-map fast path: collect key/value pairs into an array and return `PersistentArrayMap.createAsIfByAssoc(...)`, falling back to transient `PersistentHashMap` only for larger objects. Raising the cutoff from 8 to 16 entries improved results further, which fits the benchmark's many nested objects with sub-16 field counts.
+    - A dynamic exact-capacity array strategy for common small-map sizes looked good in one run but lost on follow-up reruns; discarded to avoid overfitting.
+    - **Win:** Switching the `PersistentHashMapDeserializer` object loop to `JsonParser.nextFieldName()` shaved a bit more overhead off field iteration while preserving the existing small-map fast path.
+    - **Win:** `TaggedValueOrPersistentVectorDeserializer` now has a cheaper untagged-array path: it checks the first token before considering tagged decoding, buffers up to 16 values in a plain array, and only falls back to transient vector construction for larger arrays. This helps the benchmark's ordinary `results` array without changing tagged-value semantics.
+    - **Win:** For the common default-string-key case, `PersistentHashMapDeserializer` now reuses the field name string directly instead of routing it through Jackson's String key deserializer. This is a small but measurable improvement on the benchmark's plain-string JSON object keys.
+    - **Win:** Contextualizing `TaggedValueOrPersistentVectorDeserializer` and reusing the resolved value deserializer turned out to help once combined with the newer array fast path. Earlier contextualization experiments were on different code and regressed; in the current implementation, removing the repeated resolver lookup improved throughput on repeated runs.
+    - CPU profiling of the benchmark workload shows `PersistentHashMapDeserializer.deserialize` still dominates, with `PersistentArrayMap.equalKey` / `String.equals` visible underneath. That pointed specifically at duplicate-key handling in `PersistentArrayMap.createAsIfByAssoc(...)` as a remaining hot area.
+    - **Win:** For the common string-key path, `PersistentHashMapDeserializer` now tracks seen keys with a `HashSet` while filling the small-map buffer and, when no duplicates were present, constructs `PersistentArrayMap` directly instead of calling `createAsIfByAssoc(...)`. This preserves duplicate-key semantics by falling back to `createAsIfByAssoc` when needed, and profiling confirmed that it removed a major hotspot.
+    - **Win:** A hybrid duplicate-tracking strategy worked even better: only allocate/populate the `HashSet` once a small map reaches 5 entries. That avoids `HashMap.put` overhead for the many 2–4 field objects in the benchmark while still bypassing `createAsIfByAssoc`'s duplicate scan for larger small maps.
+    - **Win:** Small string-key maps of size 0–4 now use hand-written duplicate checks and construct `PersistentArrayMap` directly when unique, instead of paying `createAsIfByAssoc(...)`'s generic duplicate-scan path. This fits the benchmark's many tiny nested objects and improves throughput again.
+    - **Win:** After the tiny-map specialization, re-tuning the `HashSet` threshold paid off: delaying duplicate tracking until 7 entries reduced `HashSet.add` overhead enough to beat the previous 5-entry threshold. Profiling-guided thresholds can change as surrounding costs shift.
+    - **Win:** Pushing that threshold one step further to 8 entries improved throughput again on repeated runs. The current shape of the workload appears to favor direct `PersistentArrayMap` construction for 7-field objects, with `HashSet`-based duplicate tracking kicking in only from 8 fields onward.
+    - Follow-up correctness review found that the 8-entry threshold version needed explicit duplicate handling for 5–7 field string-key maps; otherwise duplicate-key semantics could drift because those sizes skipped both the `HashSet` tracking path and `createAsIfByAssoc(...)`.
+    - **Win:** Added regression tests for duplicate keys and manual duplicate checks for 5–7 field string-key maps. This restores correctness while keeping most of the tiny/small-map speedup intact.
+    - A small extra win came from using direct reference comparisons for 2–4 field string-key duplicate checks. Follow-up validation with a custom `JsonFactory` that disables field-name canonicalization still passed, so this appears safe for Jackson's field-name handling in this library.
+    - **Win:** Replacing `HashSet` with a tiny fixed-size open-addressed `String[]` set for 8–15 field string-key duplicate tracking improved throughput further. It preserves `String.equals` semantics for duplicate detection but avoids `HashMap.put`/node overhead on the hot path.
+    - **Win:** After that change, pushing the duplicate-tracking threshold from 8 to 9 entries helped again: 8-field string-key maps now use a one-shot duplicate scan at the end, while the custom open-addressed set only kicks in for 9–15 field maps. That reduced duplicate-tracking overhead enough to improve repeated runs.
+    - Follow-up correctness review found another edge case in the custom open-addressed set path: duplicates among the seeded keys (before the threshold-triggering insert) were not being marked. Added regression tests for 9-field and 12-field duplicate-key objects and fixed the seeding loop to record duplicates discovered while populating the set.
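
The last few notes describe duplicate tracking that only starts once a map reaches 9 entries, and the edge case where duplicates among the already-buffered keys were missed. The sketch below isolates that mechanism. It is a simplified standalone illustration, not the PR's code: `LazyDuplicateTracker` and its members are hypothetical names, and it seeds the set with all buffered keys in one loop rather than treating the triggering key separately.

```java
/** Standalone sketch of threshold-triggered duplicate tracking; names are illustrative. */
final class LazyDuplicateTracker {
  private static final int THRESHOLD = 9;  // the set is created when the 9th key arrives
  private static final int CAPACITY = 32;  // power of two, so (hash & mask) is a valid index

  private final String[] keys = new String[16];
  private int size;
  private String[] seen;       // open-addressed set, allocated lazily
  private boolean duplicate;

  /** Linear-probing insert; returns false if an equal key is already present. */
  private static boolean addSeenKey(String[] table, String key) {
    int mask = table.length - 1;
    int index = key.hashCode() & mask;
    while (true) {
      String existing = table[index];
      if (existing == null) {
        table[index] = key;
        return true;
      }
      if (existing.equals(key)) {
        return false;
      }
      index = (index + 1) & mask;
    }
  }

  /** Records the next key of an object with at most 16 fields. */
  void add(String key) {
    keys[size++] = key;
    if (duplicate) {
      return;
    }
    if (seen == null) {
      if (size == THRESHOLD) {
        seen = new String[CAPACITY];
        // Seeding must also report duplicates among the keys buffered so far.
        for (int i = 0; i < size; i++) {
          if (!addSeenKey(seen, keys[i])) {
            duplicate = true;
          }
        }
      }
    } else if (!addSeenKey(seen, key)) {
      duplicate = true;
    }
  }

  /** Only meaningful once at least THRESHOLD keys were added. */
  boolean sawDuplicate() {
    return duplicate;
  }
}
```

Below the threshold this tracker reports nothing, which is why the smaller sizes need their own end-of-object checks, as the notes on 5–7 and 8 field maps describe.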

autoresearch.sh

@@ -0,0 +1,21 @@
+    #!/usr/bin/env bash
+    set -euo pipefail
+    if ! command -v python3 >/dev/null 2>&1; then
+      echo "python3 is required" >&2
+      exit 1
+    fi
+    run_bench() {
+      local out
+      out="$(./scripts/bench-target.sh)"
+      printf '%s\n' "$out"
+    }
+    values=()
+    for _ in 1 2 3; do
+      values+=("$(run_bench)")
+    done
+    median="$({ printf '%s\n' "${values[@]}" | sort -n; } | sed -n '2p')"
+    printf 'METRIC ops_per_s=%s\n' "$median"

benchmarks.edn

            
@@ -1,12 +1,12 @@
     {:benchmarks [{:name :encode
                    :ns jsonista.jmh
-                   :fn [encode-data-json encode-cheshire encode-jsonista encode-jackson #_encode-jsonista-fast]
+                   :fn [#_encode-data-json #_encode-cheshire encode-jsonista #_encode-jackson #_encode-jsonista-fast]
                    :args [:state/edn]}
                   {:name :decode
                    :ns jsonista.jmh
-                   :fn [decode-data-json decode-cheshire decode-jsonista decode-jackson #_decode-jsonista-fast]
+                   :fn [#_decode-data-json #_decode-cheshire decode-jsonista #_decode-jackson #_decode-jsonista-fast]
                    :args [:state/json]}]
      :states {:json {:fn jsonista.jmh/json-data, :args [:param/size]}
               :edn {:fn jsonista.jmh/edn-data, :args [:param/size]}}
-     :params {:size ["10b" "100b" "1k" "10k" "100k"]}
+     :params {:size [#_"10b" #_"100b" #_"1k" "10k" #_"100k"]}
      :options {:jmh/default {:fork {:jvm {:append-args ["-Dclojure.compiler.direct-linking=true"]}}}}}

scripts/bench-target.sh

@@ -0,0 +1,4 @@
+    #!/usr/bin/env bash
+    set -euo pipefail
+    lein jmh '{:file "benchmarks.edn", :type :quick, :format :pprint, :select :decode}' | tail +2 | bb '(-> (first *input*) (get-in [:score 0]))'

src/java/jsonista/jackson/PersistentHashMapDeserializer.java

@@ -11,12 +11,43 @@
     import com.fasterxml.jackson.databind.deser.std.StdDeserializer;
     import java.io.IOException;
+    import java.util.Arrays;
     import java.util.Map;
     public class PersistentHashMapDeserializer extends StdDeserializer<Map<String, Object>> implements ContextualDeserializer {
+      private static boolean uniqueStringKeys(Object[] result, int size) {
+        int limit = size << 1;
+        for (int i = 0; i < limit; i += 2) {
+          String key = (String) result[i];
+          for (int j = i + 2; j < limit; j += 2) {
+            if (key.equals(result[j])) {
+              return false;
+            }
+          }
+        }
+        return true;
+      }
+      private static boolean addSeenKey(String[] seenKeys, String key) {
+        int mask = seenKeys.length - 1;
+        int index = key.hashCode() & mask;
+        while (true) {
+          String seen = seenKeys[index];
+          if (seen == null) {
+            seenKeys[index] = key;
+            return true;
+          }
+          if (seen.equals(key)) {
+            return false;
+          }
+          index = (index + 1) & mask;
+        }
+      }
       private KeyDeserializer _keyDeserializer;
       private JsonDeserializer<?> _valueDeserializer;
+      private boolean _stringKeys;
       public PersistentHashMapDeserializer() {
         super(Map.class);
@@ Expand All @@
         this();
         _keyDeserializer = keyDeser;
         _valueDeserializer = valueDeser;
+        _stringKeys = keyDeser != null && "com.fasterxml.jackson.databind.deser.std.StdKeyDeserializer$StringKD".equals(keyDeser.getClass().getName());
       }
       protected PersistentHashMapDeserializer withResolved(KeyDeserializer keyDeser, JsonDeserializer<?> valueDeser) {
@@ Expand All @@
       @Override
       @SuppressWarnings("unchecked")
       public Map<String, Object> deserialize(JsonParser p, DeserializationContext ctxt) throws IOException, JsonProcessingException {
-        ITransientMap t = PersistentHashMap.EMPTY.asTransient();
-        while (p.nextToken() != JsonToken.END_OBJECT) {
-          Object key = _keyDeserializer.deserializeKey(p.getCurrentName(), ctxt);
+        Object[] entries = new Object[32];
+        int size = 0;
+        boolean hasDuplicateStringKeys = false;
+        String[] seenKeys = null;
+        String fieldName;
+        while ((fieldName = p.nextFieldName()) != null) {
+          Object key = _stringKeys ? fieldName : _keyDeserializer.deserializeKey(fieldName, ctxt);
           p.nextToken();
           Object value = _valueDeserializer.deserialize(p, ctxt);
-          t = t.assoc(key, value);
+          if (size < 16) {
+            int i = size << 1;
+            entries[i] = key;
+            entries[i + 1] = value;
+            size++;
+            if (_stringKeys && !hasDuplicateStringKeys) {
+              if (seenKeys == null) {
+                if (size == 9) {
+                  seenKeys = new String[32];
+                  for (int j = 0; j < i; j += 2) {
+                    if (!addSeenKey(seenKeys, (String) entries[j])) {
+                      hasDuplicateStringKeys = true;
+                    }
+                  }
+                  if (!hasDuplicateStringKeys) {
+                    hasDuplicateStringKeys = !addSeenKey(seenKeys, fieldName);
+                  }
+                }
+              } else if (!addSeenKey(seenKeys, fieldName)) {
+                hasDuplicateStringKeys = true;
+              }
+            }
+          } else {
+            ITransientMap t = PersistentHashMap.EMPTY.asTransient();
+            for (int i = 0; i < size << 1; i += 2) {
+              t = t.assoc(entries[i], entries[i + 1]);
+            }
+            t = t.assoc(key, value);
+            while ((fieldName = p.nextFieldName()) != null) {
+              Object nextKey = _stringKeys ? fieldName : _keyDeserializer.deserializeKey(fieldName, ctxt);
+              p.nextToken();
+              Object nextValue = _valueDeserializer.deserialize(p, ctxt);
+              t = t.assoc(nextKey, nextValue);
+            }
+            return (Map<String, Object>) t.persistent();
+          }
         }
-        // t.persistent() returns a PersistentHashMap, which is a Map.
-        return (Map<String, Object>) t.persistent();
+        Object[] result = Arrays.copyOf(entries, size << 1);
+        if (_stringKeys) {
+          switch (size) {
+            case 0:
+            case 1:
+              return (Map<String, Object>) new PersistentArrayMap(result);
+            case 2:
+              if (!((String) result[0]).equals(result[2])) {
+                return (Map<String, Object>) new PersistentArrayMap(result);
+              }
+              break;
+            case 3:
+              if (!((String) result[0]).equals(result[2])
+                  && !((String) result[0]).equals(result[4])
+                  && !((String) result[2]).equals(result[4])) {
+                return (Map<String, Object>) new PersistentArrayMap(result);
+              }
+              break;
+            case 4:
+              if (!((String) result[0]).equals(result[2])
+                  && !((String) result[0]).equals(result[4])
+                  && !((String) result[0]).equals(result[6])
+                  && !((String) result[2]).equals(result[4])
+                  && !((String) result[2]).equals(result[6])
+                  && !((String) result[4]).equals(result[6])) {
+                return (Map<String, Object>) new PersistentArrayMap(result);
+              }
+              break;
+            case 5:
+              if (!((String) result[0]).equals(result[2])
+                  && !((String) result[0]).equals(result[4])
+                  && !((String) result[0]).equals(result[6])
+                  && !((String) result[0]).equals(result[8])
+                  && !((String) result[2]).equals(result[4])
+                  && !((String) result[2]).equals(result[6])
+                  && !((String) result[2]).equals(result[8])
+                  && !((String) result[4]).equals(result[6])
+                  && !((String) result[4]).equals(result[8])
+                  && !((String) result[6]).equals(result[8])) {
+                return (Map<String, Object>) new PersistentArrayMap(result);
+              }
+              break;
+            case 6:
+              if (!((String) result[0]).equals(result[2])
+                  && !((String) result[0]).equals(result[4])
+                  && !((String) result[0]).equals(result[6])
+                  && !((String) result[0]).equals(result[8])
+                  && !((String) result[0]).equals(result[10])
+                  && !((String) result[2]).equals(result[4])
+                  && !((String) result[2]).equals(result[6])
+                  && !((String) result[2]).equals(result[8])
+                  && !((String) result[2]).equals(result[10])
+                  && !((String) result[4]).equals(result[6])
+                  && !((String) result[4]).equals(result[8])
+                  && !((String) result[4]).equals(result[10])
+                  && !((String) result[6]).equals(result[8])
+                  && !((String) result[6]).equals(result[10])
+                  && !((String) result[8]).equals(result[10])) {
+                return (Map<String, Object>) new PersistentArrayMap(result);
+              }
+              break;
+            case 7:
+              if (!((String) result[0]).equals(result[2])
+                  && !((String) result[0]).equals(result[4])
+                  && !((String) result[0]).equals(result[6])
+                  && !((String) result[0]).equals(result[8])
+                  && !((String) result[0]).equals(result[10])
+                  && !((String) result[0]).equals(result[12])
+                  && !((String) result[2]).equals(result[4])
+                  && !((String) result[2]).equals(result[6])
+                  && !((String) result[2]).equals(result[8])
+                  && !((String) result[2]).equals(result[10])
+                  && !((String) result[2]).equals(result[12])
+                  && !((String) result[4]).equals(result[6])
+                  && !((String) result[4]).equals(result[8])
+                  && !((String) result[4]).equals(result[10])
+                  && !((String) result[4]).equals(result[12])
+                  && !((String) result[6]).equals(result[8])
+                  && !((String) result[6]).equals(result[10])
+                  && !((String) result[6]).equals(result[12])
+                  && !((String) result[8]).equals(result[10])
+                  && !((String) result[8]).equals(result[12])
+                  && !((String) result[10]).equals(result[12])) {
+                return (Map<String, Object>) new PersistentArrayMap(result);
+              }
+              break;
+            case 8:
+              if (uniqueStringKeys(result, size)) {
+                return (Map<String, Object>) new PersistentArrayMap(result);
+              }
+              break;
+            default:
+              if (!hasDuplicateStringKeys) {
+                return (Map<String, Object>) new PersistentArrayMap(result);
+              }
+          }
+        }
+        return (Map<String, Object>) PersistentArrayMap.createAsIfByAssoc(result);
       }
     }
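
The unrolled `case 2` through `case 7` branches above are the pairwise scan of `uniqueStringKeys` written out by hand. For n keys a full scan needs n(n-1)/2 `equals` calls, which gives the 21 conditions in `case 7` and 28 comparisons for the looped `case 8`. The following standalone sketch restates the generic scan and the count; `PairwiseKeyCheck` and its method names are illustrative and not part of the PR.

```java
/** Illustrative restatement of the pairwise duplicate scan; not part of the PR. */
final class PairwiseKeyCheck {
  /** Generic pairwise scan over an interleaved [k0, v0, k1, v1, ...] array. */
  static boolean uniqueKeys(Object[] entries, int size) {
    int limit = size << 1;
    for (int i = 0; i < limit; i += 2) {
      String key = (String) entries[i];
      for (int j = i + 2; j < limit; j += 2) {
        if (key.equals(entries[j])) {
          return false;
        }
      }
    }
    return true;
  }

  /** Number of equals calls a full scan makes when all `size` keys are unique. */
  static int comparisons(int size) {
    return size * (size - 1) / 2;
  }
}
```

The quadratic growth of that count is one plausible reason the PR switches to set-based tracking for 9 or more keys instead of extending the scan.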
