Vector<T> does not yield performance results that capacity usually would #24731

Zylann · 2019-01-03T02:47:45Z

Godot 3.1 b60939b
Windows 10 64 bits

Today I got stuck in an optimization problem: I am generating a lot of 2D geometry to be drawn often with canvas_item_add_triangle_array. This forces to use Vectors, which I decided re-use each time the geometry gets generated. But it keeps lagging at 6 ms just doing that since I started using it (without even including the call to draw)...

Then, I realized something for good about Vector:

The two following codes take the same amount of time to complete.

	for(int i = 0; i < 10000; ++i) {
		Vector<uint8_t> test; // Never re-use it
		test.resize(5000);
		test.resize(2);
	}

	Vector<uint8_t> test; // Re-use it
	for(int i = 0; i < 10000; ++i) {
		// I expected this to be free past the first iteration... it isn't.
		test.resize(5000);
		test.resize(2);
	}

So it appears there is no way to re-use a Vector and hope it will be cost-free over repeated uses, which prevents from making my code faster. Is that really it? Is it a bug?
To describe more my problem, I tried rewriting my code in a way it never clears vectors, and instead resizes only once to the correct size before calling canvas_item_add_triangle_array... and I gained 200% speed, yet it still has to resize at the end (I am lucky enough to draw geometry that often roughly has the same amounts of vertices).

So Vector does copy on write, it's easy to pass around I guess, and it resizes by power of two. But how should we deal with this case? Are we stuck allocating memory each time and half-reimplementing capacity on top of it if we want more speed? I mean, it does sound like it has capacity somehow, but it doesn't yields the results I thought.

Note: this would also concern Line2D since it uses the same method too.

The text was updated successfully, but these errors were encountered:

hpvb · 2019-01-03T02:53:45Z

A Vector is under the hood an array of T. What we currently clear() we just call resize(0) on the underlaying CowData (previously we did the same btw).

Some suggestions:
a) Add an int parameter defaulting to 0 to clear() which gives the desired capacity after the clear. Then only destroy all elements rather than resizing.
b) Add a new method which does the above.
c) Add a new member to vector that gives it a minimum capacity (perhaps on creation?) which then does the case of (a)

Thoughts anyone? @reduz perhaps?

hpvb · 2019-01-03T02:55:26Z

Another option could be for Vector to know whether or not it is filled with POD types. We can do a template specialization for that case then and upon clear()/resize() etc just drop the elements on the floor since we won't have to call any dtors anyway.

We can probably achieve this with C++03 but it'd be much prettier with 11 :)

(It would still need some kind of notion of a 'capacity' though)

hpvb · 2019-01-03T03:09:05Z

Another option is actually to just use __is_pod, this is a compiler intrinsic supported by clang, msvc, and gcc. Then have a specialization for that and don't do anything with capacity at all. This would still mean a resize operation but it would be much, much faster. It may really be all that's necessary.

Zylann · 2019-01-03T21:15:52Z

@hpvb here are the times I get from the same benchmark:

Debug, without your PR:

Re-using the vector: 515ms
Not re-using the vector: 539ms

Debug, with your PR:

Re-using the vector: 39ms
Not re-using the vector: 65ms

Release-tools, without your PR:

Re-using the vector: 225ms
Not re-using the vector: 336ms

Release-tools, with your PR:

Re-using the vector: 39ms
Not re-using the vector: 63ms

Note: there seem to be a bit of random at stake, because some rare times, for the same test, the total durations I recorded were up to 3 times less or more, so I wrote down the most occurring times. I assume it's due to core switches or the way memory is rearranged.

Now, to better highlight my concern, I also tested my geometry generation code with an std::vector, in release_debug build:

With Godot Vector: 3ms
With std::vector: 0.18ms

(for a quick look of what I'm doing, it's a code minimap renderer, which is just a bunch of coloured lines, and I use vectors for batching: https://sebsauvage.net/paste/?c29ea94d9fa3949a#MrxGYB4a5NwuONNYAEBoxcmar9sZyamLZJHnCaJWKHg=)

It really looks like the bottleneck lies in clear() and resize(), because I even tried working on a raw pointer using ptrw() and that didn't improve it.

Zylann · 2019-04-13T20:49:36Z

I got another insight today by switching my Minecrafty mesher from Vector to std::vector, in particular these ones.
This mesher may generate one or more meshes per frame, and I logged the one taking the most time, in microseconds:

On the left, Godot release build using Vector to build up mesh data before creating the Mesh.
On the right, Godot release build, using std::vector instead.

As you can see, just changing the container at least DOUBLED how fast the mesher can turn voxels into mesh data in a release build, which is directly noticeable in the game without even looking at numbers. That means I can modify more voxels at once with less visual lag. I think that's enough to seriously consider an improvement to this container.

This gain is caused by the way I used the vector: I keep them around for re-use, and push_back/resize/clear are frequently called. std::vector grows a buffer more and more as capacity and reallocs less and less often. Vector appears to alloc upfront by power of two as well, but keeps calling realloc on every call and gets rid of it if size is zero. Hence std::vector calls are much cheaper.

Another note:
This container change is inside my module, so I can do my own soup easily (I basically took it away from everywhere by now).
BUT, Vector is mandatory in VisualServer APIs, and there is no intention of changing that. This means any heavy mesh generation routine (like the one I do in my voxel module but also the one in Line2D and cie.) is slowed down by the copy/allocs/resizes required to get a Vector. It's not even possible to re-use it because of that whole issue. I would be happy to use a Vector all the way if it was as fast as std::vector.

Calinou · 2020-05-19T19:38:08Z

The pull request that would originally close this wasn't merged, but since #38386 was, I'll close this.

Zylann · 2020-05-19T19:43:54Z

I need to profile it again some day

raymoo · 2020-05-19T22:18:05Z

Does this mean that the visual server can take LocalVectors now as parameters? I can see LocalVector::operator Vector<T>() const, but it appears to just create a new Vector<T> and then resize it, which would still be allocating the whole thing. I don't see any overload for canvas_item_add_triangle_array that takes LocalVector<*> directly. Am I understanding this correctly?

hpvb added enhancement feature proposal topic:core labels Jan 3, 2019

hpvb mentioned this issue Jan 3, 2019

Optimizations for trivial types #24732

Merged

vblanco20-1 mentioned this issue Jul 23, 2019

Using the slowest data structure almost every time. #23998

Closed

Zylann mentioned this issue Jul 24, 2019

Arrays and vectors always resize after a push or a remove #22561

Closed

profan mentioned this issue Aug 27, 2019

Add size tracking to CowData, make it never reallocate when shrinking. #31694

Closed

Zylann mentioned this issue Jan 4, 2020

Integrated the new NavigationServer and NavigationServer2D #34776

Merged

reduz mentioned this issue Mar 28, 2020

Create a LocalVector with capacity. #37370

Closed

Calinou closed this as completed May 19, 2020

Calinou added this to the 4.0 milestone May 19, 2020

Calinou removed the feature proposal label May 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vector<T> does not yield performance results that capacity usually would #24731

Vector<T> does not yield performance results that capacity usually would #24731

Zylann commented Jan 3, 2019 •

edited

hpvb commented Jan 3, 2019

hpvb commented Jan 3, 2019 •

edited

hpvb commented Jan 3, 2019

Zylann commented Jan 3, 2019 •

edited

Zylann commented Apr 13, 2019 •

edited

Calinou commented May 19, 2020 •

edited

Zylann commented May 19, 2020

raymoo commented May 19, 2020 •

edited

Vector<T> does not yield performance results that capacity usually would #24731

Vector<T> does not yield performance results that capacity usually would #24731

Comments

Zylann commented Jan 3, 2019 • edited

hpvb commented Jan 3, 2019

hpvb commented Jan 3, 2019 • edited

hpvb commented Jan 3, 2019

Zylann commented Jan 3, 2019 • edited

Zylann commented Apr 13, 2019 • edited

Calinou commented May 19, 2020 • edited

Zylann commented May 19, 2020

raymoo commented May 19, 2020 • edited

Zylann commented Jan 3, 2019 •

edited

hpvb commented Jan 3, 2019 •

edited

Zylann commented Jan 3, 2019 •

edited

Zylann commented Apr 13, 2019 •

edited

Calinou commented May 19, 2020 •

edited

raymoo commented May 19, 2020 •

edited