Nontrivial string processing can get very slow #222

copumpkin · 2016-07-24T14:40:06Z

I think it's mostly caused the back-and-forth between lists of characters and strings.

For example, I have a function that takes a list of strings and "camelCases" the list, so that ["foo", "bar", "baz"] turns into "fooBarBaz". Using that function on lots of strings in my larger evaluation makes the whole thing take about 40s. Replacing it with a simple function (x) std.join("-", x) for hyphenated names reduces total evaluation time to 10s, keeping other things equal.

The camelcase code is as follows:

local upper(c): local cp = std.codepoint(c); if cp >= 97 && cp < 123 then std.char(cp - 32) else c;
local capitalize(str): util.upper(str[0]) + std.substr(str, 1, std.length(str) - 1);
local zipWithIndex(f, list): std.makeArray(std.length(list), function (i) f(i, list[i]));
local camelcase(list): std.join("", util.zipWithIndex(function (i, x) if i == 0 then x else util.capitalize(x), list));

Not the most beautiful thing ever, but it works as intended, other than being super slow.

The text was updated successfully, but these errors were encountered:

sparkprime · 2016-07-30T17:25:18Z

I wonder whether most of the problem is that std.join is currently not implemented with O(1) allocations and O(n) work like it should be.

copumpkin · 2016-07-30T17:43:18Z

That could probably be improved, but in my example above, the big speed difference came from replacing the fancy std.join call with a simpler invocation of the same function, so in my case at least I don't think join is the culprit.

fare · 2016-09-06T21:00:55Z

You can use divide and conquer to build your string in O(n log n) instead of O(n**2) with:

/**
 * @param {(value, value) -> value} combine An associative operation on values
 * @param {integer -> value} data A function that given a parameter yields a value
 * @param {integer} start The minimum parameter, inclusive (must be greater than -maxint/2)
 * @param {integer} end The maximum parameter, exclusive (must be lesser than maxint/2)
 * @return {value} The combination of all values of the function data for increasing
 * integer parameters from between start (included) and end (excluded) (where start < end)
 * This allows to do in O(n log n) what naive iteration would do in O(n^2),
 * when combining objects of size s is O(s)
 */
local fold_dichotomy_1(combine, data, start, end) =
  // assert(is_integer(start) && is_integer(end) && start < end)
  // assert( -maxint/2 < start && end < maxint/2);  // fails if overflow or underflow
  if start == end - 1 then data(start) else
    local mid = std.floor( (start + end) / 2 ) ;
    combine(fold_dichotomy_1(combine, data, start, mid),
            fold_dichotomy_1(combine, data, mid, end));

/**
 * @param {(value, value) -> value} combine An associative operation on values
 * @param {value} empty An identity value for the operation monoid
 * @param {integer -> value} data A function that given a parameter yields a value
 * @param {integer} start The minimum parameter, inclusive (must be greater than -maxint/2)
 * @param {integer} end The maximum parameter, exclusive (must be lesser than maxint/2)
 * @return {value} The combination of all values of the function data for increasing
 * integer parameters from between start (included) and end (excluded)
 * This allows to do in O(n log n) what naive iteration would do in O(n^2),
 * when combining objects of size s is O(s)
 */
local fold_dichotomy(combine, empty, data, start, end) =
  if start >= end then empty else fold_dichotomy_1(combine, data, start, end);

// O(n log n) replacement for join()
local join_strings(separator, string_array) =
  fold_dichotomy(function(x,y) x+separator+y, "", function(i) string_array[i], 0, std.length(string_array));

fare · 2016-09-06T21:06:20Z

A better solution would be that internally, jsonnet should use pure functional balanced binary trees to represent strings as well as arrays and dicts. Then, all simple operations would be O(log n).

Actually, independently from the above, we could also have special O(n) constructors based on monadic iterators: functions that when applied to an accumulator, return a triplet [isDone, nextValue, nextAccumulator]. Then performance would have the correct asymptotic behavior.

sbarzowski mentioned this issue Sep 11, 2017

More efficient array comprehensions and join #362

Open

sbarzowski added the performance label May 22, 2020

This was referenced Nov 20, 2023

stack-overflow exists in the function parse in parser.cpp #1116

Open

stack-overflow exists in the function maybeParseGreedy in parser.cpp #1117

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nontrivial string processing can get very slow #222

Nontrivial string processing can get very slow #222

copumpkin commented Jul 24, 2016

sparkprime commented Jul 30, 2016

copumpkin commented Jul 30, 2016

fare commented Sep 6, 2016

fare commented Sep 6, 2016

Nontrivial string processing can get very slow #222

Nontrivial string processing can get very slow #222

Comments

copumpkin commented Jul 24, 2016

sparkprime commented Jul 30, 2016

copumpkin commented Jul 30, 2016

fare commented Sep 6, 2016

fare commented Sep 6, 2016