
Allow lists to be built from iterators #486

Closed · josevalim wants to merge 4 commits

Conversation

@josevalim (Contributor)

In particular, allow lists to be built from double-ended iterators, which avoids allocating intermediate vectors.

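For context, here is a minimal sketch of the technique (not the exact PR code, and assuming rustler's Term::list_new_empty and Term::list_prepend helpers): walk the iterator from the back and prepend one cons cell per element, so no intermediate Vec of terms is needed.

```rust
use rustler::{Encoder, Env, Term};

// Build the list back-to-front: start from [] and prepend each element,
// consuming the iterator in reverse so the result keeps the original order.
fn encode_from_end<'a, T, I>(env: Env<'a>, iter: I) -> Term<'a>
where
    T: Encoder,
    I: DoubleEndedIterator<Item = T>,
{
    let mut list = Term::list_new_empty(env);
    for item in iter.rev() {
        list = list.list_prepend(item.encode(env));
    }
    list
}
```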
@filmor (Member) commented Sep 11, 2022

Nice! Did you compare the performance of both approaches?

@josevalim (Contributor, Author)

make_list_from_end was actually 20% faster when encoding Arrow arrays. I have yet to benchmark whether it makes the default Vec encoding faster too. Later today I will have some numbers. :)

@josevalim (Contributor, Author) commented Sep 11, 2022

Ok, so this is tricky. Here are the numbers:

## New (this PR)

Name               ips        average  deviation         median         99th %
16 elems        1.03 M      971.63 ns  ±1498.29%         958 ns        1083 ns
32 elems      668.49 K        1.50 μs   ±918.15%        1.46 μs        1.58 μs
128 elems     220.57 K        4.53 μs   ±388.58%        4.38 μs        4.75 μs
1k elems       31.51 K       31.73 μs     ±5.36%       31.29 μs       36.92 μs
1M elems         28.28       35.37 ms     ±7.33%       33.73 ms       40.14 ms

## Old (master)

Name               ips        average  deviation         median         99th %
16 elems      915.15 K        1.09 μs  ±1399.89%        1.08 μs        1.21 μs
32 elems      696.04 K        1.44 μs   ±943.38%        1.42 μs        1.54 μs
128 elems     274.22 K        3.65 μs   ±289.96%        3.54 μs        3.88 μs
1k elems       42.16 K       23.72 μs    ±67.46%       23.17 μs       29.50 μs
1M elems         38.43       26.02 ms     ±7.68%       24.69 ms       30.16 ms

You can see this PR is faster for small lists but then it gets slower, most likely because enif_make_list_from_array allocates space for the whole list upfront, while this PR has to allocate cell by cell. On the other hand, the approach in this PR uses less memory, since it never materializes the intermediate vector.
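To make the double allocation concrete, here is a sketch of the master-style path (not verbatim library code; it reuses the internal as_c_arg/list::make_list helpers that appear in the snippet further down): the elements are first collected into a Vec<NIF_TERM>, and enif_make_list_from_array then builds the whole list from it in one go.

```rust
// Master-style encoding: two allocations per list, the intermediate
// Vec<NIF_TERM> plus the block of cons cells that
// enif_make_list_from_array reserves on the process heap upfront.
fn encode_via_array<'a, T: Encoder>(env: Env<'a>, items: &[T]) -> Term<'a> {
    let terms: Vec<NIF_TERM> = items.iter().map(|x| x.encode(env).as_c_arg()).collect();
    unsafe { Term::new(env, list::make_list(env.as_c_arg(), &terms)) }
}
```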

I will be glad to revert the changes to rustler::types::list if you would prefer not to change the default encoding.

EDIT: After thinking a bit more about it, I would argue that this PR may be the way to go:

  1. This PR is faster for small lists (say up to 16 elems)
  2. For larger lists, it is slower, but that's when avoiding the double allocation matters more

@hansihe (Member) commented Sep 11, 2022

With regards to erlang/otp#6293:

There are a couple of places I can think of where the performance difference between approaches 1 and 2 might come from:

a. The alloc/free of the buffer used to store terms passed to enif_make_list_from_array
b. The allocation on the process heap inside enif_make_list_from_array/enif_make_list_cell
c. The function call overhead each time enif_make_list_cell is called

The two different approaches are affected by the different factors as follows:

  1. a and b dominate
  2. b and c dominate

I have nothing hard to base this on, but I have a gut feeling that the reason approach 2 gets slower is that c becomes a pretty large dominating factor.

If this is the case, introducing an API like enif_alloc_list/enif_set_list might not improve things by much.
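Purely for illustration (neither function exists; these are the hypothetical additions under discussion, with guessed signatures), such an API might look roughly like this. Note that it would still pay cost c once per element:

```rust
use rustler_sys::{ErlNifEnv, ERL_NIF_TERM};

extern "C" {
    // Hypothetical: reserve the cons cells for an n-element list in one
    // heap allocation, addressing costs a and b.
    fn enif_alloc_list(env: *mut ErlNifEnv, n: usize) -> ERL_NIF_TERM;
    // Hypothetical: write one element into a pre-allocated cell. Still one
    // function call per element, so cost c is unchanged.
    fn enif_set_list(env: *mut ErlNifEnv, list: ERL_NIF_TERM, index: usize, value: ERL_NIF_TERM);
}
```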

A solution to this would be to introduce some way of allowing our code to populate the data on the process heap directly, without going through a function call. This might be a hard sell to the OTP team, because (at least in the simplest approach) it would require compiled NIFs to have some knowledge of how lists are laid out on process heaps.

Of course all of this is very speculative on my end, but I think it would be useful to validate what causes performance to drop before we introduce new NIF APIs that might not improve things by much.

@josevalim (Contributor, Author)

@hansihe if the NIF call cost is the culprit, then we can emulate it. Here is what I did: I changed encode to perform a call to enif_make_list(env, 0) for every element, which is as cheap as a NIF call can be:

```rust
impl<T> Encoder for [T]
where
    T: Encoder,
{
    fn encode<'b>(&self, env: Env<'b>) -> Term<'b> {
        let env_as_c_arg = env.as_c_arg();
        let term_array: Vec<NIF_TERM> = self
            .iter()
            .map(|x| {
                // Extra no-op NIF call per element, purely to emulate the
                // per-cell function call overhead (cost c) on top of the
                // old array-based approach.
                unsafe { rustler_sys::_enif_make_list(env_as_c_arg, 0) };
                x.encode(env).as_c_arg()
            })
            .collect();
        // Old approach: build the whole list in one call from the
        // intermediate term array.
        unsafe { Term::new(env, list::make_list(env.as_c_arg(), &term_array)) }
    }
}
```

These are the results:

Name               ips        average  deviation         median         99th %
16 elems       945.72 K        1.06 μs  ±1417.14%        1.04 μs        1.17 μs
32 elems       703.49 K        1.42 μs   ±889.53%        1.38 μs        1.58 μs
128 elems      278.42 K        3.59 μs   ±367.05%        3.46 μs        3.88 μs
1k elems        43.41 K       23.04 μs    ±51.07%       22.46 μs       29.45 μs
1M elems          37.20       26.88 ms     ±7.57%       25.52 ms       31.26 ms

As you can see, it is slightly slower for the large case, but sometimes even slightly faster, so I don't think the NIF call adds too much. WDYT?

@hansihe (Member) commented Sep 11, 2022

@josevalim Smart way of testing it!

Seems I was very off target about which cost dominated. The API you suggested would probably work pretty well.

@josevalim closed this by deleting the head repository on Feb 10, 2023.