Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variant memory pools #61315

Merged
merged 1 commit into from Aug 2, 2022
Merged

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented May 23, 2022

Memory pools via PagedAllocator for Transform2D, Transform3D, Basis and AABB.

Simplified version of #58942 in response to PR meeting feedback, using a couple of bucket sizes - one for Transform2D and AABB (6 reals), and one for Basis and Transform3D (12 reals).

Notes

  • In the PR meeting it was agreed to just pool the Variant sub types for now, using a couple of bucket sizes.
  • The only non-obvious part of this PR is we needed a way of turning off the PagedAllocator error messages during the fork in detect_prime(), otherwise we would get false error messages. See Use memory pools for Variant extended types #58633 for more details.
  • The _err_print_error change is to allow at least basic printing of errors outside the lifetime of OS (for instance if OS is destroyed before PagedAllocator pools). Order of construction / destruction requires some care with pools such as these as other modules depend on them being present during startup and shutdown.

@akien-mga
Copy link
Member

@Calinou Would you be able to run quick benchmarks on this to see if it makes a significant change?

@lawnjelly
Copy link
Member Author

lawnjelly commented Jun 28, 2022

PR Meeting:

  • print_string.h contains some similar globals, we can put these all together in a header in a namespace to keep them together.

Update:
I have now created a core/core_globals.h file, and moved leak_reporting_enabled into there as well as the globals from print_string.h. As before, let me know any better ideas for naming etc for this.

@lawnjelly lawnjelly requested a review from a team as a code owner June 28, 2022 15:15
@lawnjelly lawnjelly force-pushed the variant_bucket_pools branch 2 times, most recently from 7a312bb to ab8c72b Compare June 28, 2022 15:26
@lawnjelly lawnjelly requested a review from a team as a code owner June 28, 2022 15:26
@Calinou
Copy link
Member

Calinou commented Jun 28, 2022

@Calinou Would you be able to run quick benchmarks on this to see if it makes a significant change?

I'll try to do that this week.

@Calinou
Copy link
Member

Calinou commented Jul 3, 2022

I ran a benchmark comparing this PR to the commit on master before this PR. Performance seems to be identical within margin of error:

SCons flags: tools=yes target=release_debug use_lto=yes

Opening and quitting project manager

Benchmark #1: bin/godot.linuxbsd.opt.tools.64 --quit
  Time (mean ± σ):      2.518 s ±  0.049 s    [User: 1.282 s, System: 0.192 s]
  Range (min … max):    2.451 s …  2.571 s    10 runs
 
Benchmark #2: bin/godot.linuxbsd.opt.tools.64.variant_bucket_pools --quit
  Time (mean ± σ):      2.500 s ±  0.055 s    [User: 1.282 s, System: 0.193 s]
  Range (min … max):    2.435 s …  2.584 s    10 runs
 
Summary
  'bin/godot.linuxbsd.opt.tools.64.variant_bucket_pools --quit' ran
    1.01 ± 0.03 times faster than 'bin/godot.linuxbsd.opt.tools.64 --quit'

Opening and quitting editor

Benchmark #1: bin/godot.linuxbsd.opt.tools.64 /tmp/4/project.godot --quit
  Time (mean ± σ):      4.224 s ±  0.444 s    [User: 2.990 s, System: 0.312 s]
  Range (min … max):    3.616 s …  4.625 s    10 runs
 
Benchmark #2: bin/godot.linuxbsd.opt.tools.64.variant_bucket_pools /tmp/4/project.godot --quit
  Time (mean ± σ):      4.246 s ±  0.433 s    [User: 2.982 s, System: 0.321 s]
  Range (min … max):    3.623 s …  4.633 s    10 runs
 
Summary
  'bin/godot.linuxbsd.opt.tools.64 /tmp/4/project.godot --quit' ran
    1.01 ± 0.15 times faster than 'bin/godot.linuxbsd.opt.tools.64.variant_bucket_pools /tmp/4/project.godot --quit'

@lawnjelly
Copy link
Member Author

Yes I'll try and rebase this and also run some benchmarks.

Expected it should be pretty similar for a lot of cases, as malloc (on desktop linux at least) is pretty good in the general case. At the least, pooling should not be slower than malloc, provided the implementation is sound, and your tests confirm this.

To quote from wikipedia on benefits:

  • Memory pools allow memory allocation with constant execution time. The memory release for thousands of objects in a pool is just one operation, not one by one if malloc is used to allocate memory for each object (Note: We don't use bulk deletion in PagedAllocator).
  • Memory pools can be grouped in hierarchical tree structures, which is suitable for special programming structures like loops and recursions.
  • Fixed-size block memory pools do not need to store allocation metadata for each allocation, describing characteristics like the size of the allocated block. Particularly for small allocations, this provides substantial space savings.
  • Allows deterministic behavior on real-time systems avoiding the out of memory errors.

For myself the main attraction is the O(1) constant time operation, and the lack of fragmentation, housekeeping data and padding. Essentially you can really hammer a memory pool in recursive functions etc and not expect any glitches, whereas the same is not guaranteed with malloc.

Also although I'm leading with this on 4.x, I'm hoping to do the same on 3.x, where the fragmentation can lead to problems particularly on 32 bit OS : e.g. #61835

Juan has also pointed out that using PagedAllocator means allocations of same time will often be closer together in memory, which may have cache advantages. But this kind of thing is quite hard to quantify, because it depends on historical allocations.

Memory pools via PagedAllocator for Transform2D, Transform3D, Basis and AABB.
@lawnjelly
Copy link
Member Author

lawnjelly commented Jul 4, 2022

Quick really naive benchmark from c++ suggests this PR is a little faster taking 3/4 of the time of old version:

void BenchmarkVariant::run()
{
	const int NUM_ITERATIONS = 10000000;
	
	AABB val;
	uint64_t before;
	
	
	before = OS::get_singleton()->get_ticks_msec();
	for (int n=0; n<NUM_ITERATIONS; n++)
	{
		Variant v2 = Variant(AABB());
		AABB aabb = v2;
		val.merge_with(aabb);
	}
	
	uint64_t takenA = OS::get_singleton()->get_ticks_msec() - before;
	print_line(val);
	print_line("took A " + itos(takenA) + " ms.");

	before = OS::get_singleton()->get_ticks_msec();
	for (int n=0; n<NUM_ITERATIONS; n++)
	{
		AABB * test = memnew(AABB());
		AABB aabb = *test;
		val.merge_with(aabb);
		
		memdelete(test);
	}
	
	
	uint64_t takenB = OS::get_singleton()->get_ticks_msec() - before;
	print_line(val);
	print_line("took B " + itos(takenB) + " ms.");


	before = OS::get_singleton()->get_ticks_msec();
	for (int n=0; n<NUM_ITERATIONS; n++)
	{
		Variant v2 = Variant(AABB());
		AABB aabb = v2;
		val.merge_with(aabb);
	}
	
	uint64_t takenC = OS::get_singleton()->get_ticks_msec() - before;
	print_line(val);
	print_line("took C " + itos(takenC) + " ms.");
	
}

Result:
[P: (0, 0, 0), S: (0, 0, 0)]
took A 303 ms.
[P: (0, 0, 0), S: (0, 0, 0)]
took B 408 ms.
[P: (0, 0, 0), S: (0, 0, 0)]
took C 303 ms.

Test B is actually just to save me compiling everything twice, as the old Variant news and deletes an AABB each time, so it is pretty much doing the same thing (maybe a little less in fact that the actual old Variant). Test C is just a confirmation repeat of A, to check for things like hot cache effects etc.

So for my system (Linux Mint) the PagedAllocator appears to be > about 4/3 faster than malloc in this situation of hammering the allocator. I'm assuming @Calinou 's test before was just timing the startup, in which case it is unlikely to make a lot of difference. I'm also assuming the optimizer isn't doing something to throw off the timings, which is always a possibility with these things (the print statements are to prevent optimizing out).

But as I say the primary reason (in my mind) is for the O(1) constant time allocation / deallocation, any increase in speed is a nice bonus.

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Also approved by @reduz in PR meeting.

@akien-mga akien-mga merged commit 33258d8 into godotengine:master Aug 2, 2022
@akien-mga
Copy link
Member

Thanks!

@lawnjelly lawnjelly deleted the variant_bucket_pools branch August 2, 2022 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants