Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nontrivial string processing can get very slow #222

Open
copumpkin opened this issue Jul 24, 2016 · 4 comments
Open

Nontrivial string processing can get very slow #222

copumpkin opened this issue Jul 24, 2016 · 4 comments

Comments

@copumpkin
Copy link

I think it's mostly caused the back-and-forth between lists of characters and strings.

For example, I have a function that takes a list of strings and "camelCases" the list, so that ["foo", "bar", "baz"] turns into "fooBarBaz". Using that function on lots of strings in my larger evaluation makes the whole thing take about 40s. Replacing it with a simple function (x) std.join("-", x) for hyphenated names reduces total evaluation time to 10s, keeping other things equal.

The camelcase code is as follows:

local upper(c): local cp = std.codepoint(c); if cp >= 97 && cp < 123 then std.char(cp - 32) else c;
local capitalize(str): util.upper(str[0]) + std.substr(str, 1, std.length(str) - 1);
local zipWithIndex(f, list): std.makeArray(std.length(list), function (i) f(i, list[i]));
local camelcase(list): std.join("", util.zipWithIndex(function (i, x) if i == 0 then x else util.capitalize(x), list));

Not the most beautiful thing ever, but it works as intended, other than being super slow.

@sparkprime
Copy link
Member

I wonder whether most of the problem is that std.join is currently not implemented with O(1) allocations and O(n) work like it should be.

@copumpkin
Copy link
Author

That could probably be improved, but in my example above, the big speed difference came from replacing the fancy std.join call with a simpler invocation of the same function, so in my case at least I don't think join is the culprit.

@fare
Copy link

fare commented Sep 6, 2016

You can use divide and conquer to build your string in O(n log n) instead of O(n**2) with:

/**
 * @param {(value, value) -> value} combine An associative operation on values
 * @param {integer -> value} data A function that given a parameter yields a value
 * @param {integer} start The minimum parameter, inclusive (must be greater than -maxint/2)
 * @param {integer} end The maximum parameter, exclusive (must be lesser than maxint/2)
 * @return {value} The combination of all values of the function data for increasing
 * integer parameters from between start (included) and end (excluded) (where start < end)
 * This allows to do in O(n log n) what naive iteration would do in O(n^2),
 * when combining objects of size s is O(s)
 */
local fold_dichotomy_1(combine, data, start, end) =
  // assert(is_integer(start) && is_integer(end) && start < end)
  // assert( -maxint/2 < start && end < maxint/2);  // fails if overflow or underflow
  if start == end - 1 then data(start) else
    local mid = std.floor( (start + end) / 2 ) ;
    combine(fold_dichotomy_1(combine, data, start, mid),
            fold_dichotomy_1(combine, data, mid, end));

/**
 * @param {(value, value) -> value} combine An associative operation on values
 * @param {value} empty An identity value for the operation monoid
 * @param {integer -> value} data A function that given a parameter yields a value
 * @param {integer} start The minimum parameter, inclusive (must be greater than -maxint/2)
 * @param {integer} end The maximum parameter, exclusive (must be lesser than maxint/2)
 * @return {value} The combination of all values of the function data for increasing
 * integer parameters from between start (included) and end (excluded)
 * This allows to do in O(n log n) what naive iteration would do in O(n^2),
 * when combining objects of size s is O(s)
 */
local fold_dichotomy(combine, empty, data, start, end) =
  if start >= end then empty else fold_dichotomy_1(combine, data, start, end);

// O(n log n) replacement for join()
local join_strings(separator, string_array) =
  fold_dichotomy(function(x,y) x+separator+y, "", function(i) string_array[i], 0, std.length(string_array));

@fare
Copy link

fare commented Sep 6, 2016

A better solution would be that internally, jsonnet should use pure functional balanced binary trees to represent strings as well as arrays and dicts. Then, all simple operations would be O(log n).

Actually, independently from the above, we could also have special O(n) constructors based on monadic iterators: functions that when applied to an accumulator, return a triplet [isDone, nextValue, nextAccumulator]. Then performance would have the correct asymptotic behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants