-
Notifications
You must be signed in to change notification settings - Fork 65
[ML] Use custom Boost::JSON allocator #2674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
buildkite run_qa_tests |
1 similar comment
|
buildkite run_qa_tests |
valeriy42
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work on figuring out and changing the allocator behavior. I think we need to report the allocator memory usage as part of the memory_stats since it may be considerable.
When considering whether or not to update the model, we use the following code:
std::size_t CResourceMonitor::allocationLimit() const {
return this->highLimit() - std::min(this->highLimit(), this->totalMemory());
}
Hence, we need to be transparent why the job gets into the hard_limit state while model_bytes is lower.
include/api/CJsonOutputWriter.h
Outdated
| //! from the CResourceMonitor via a callback | ||
| void reportMemoryUsage(const model::CResourceMonitor::SModelSizeStats& modelSizeStats); | ||
|
|
||
| std::size_t getAllocatorMemUsage() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a short documentation?
Just to clarify @valeriy42 , by |
* Add JSON allocator memory usage to the reported model memory stats
Exactly. Sorry for mixing up |
|
buildkite build this |
valeriy42
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
lib/api/CModelSizeStatsJsonWriter.cc
Outdated
| const std::string CATEGORIZER_STATS{"categorizer_stats"}; | ||
| const std::string PARTITION_FIELD_NAME{"partition_field_name"}; | ||
| const std::string PARTITION_FIELD_VALUE{"partition_field_value"}; | ||
| const std::string JSON_MEMORY_ALLOCATOR_BYTES("json_memory_allocator_bytes"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I am wondering if this name is not too specific. Maybe output_memory_allocator_bytes?
* rename jsonMemoryAllocator -> outputMemoryAllocator
The current code uses the monotonic resource allocator, for allocating memory to boost::json objects, which allocates memory in ever increasing chunks, which can lead to over allocation. The other disadvantage of the monotonic resource allocator is that no deallocations are performed until the resource allocator is destroyed - hence the name monotonic as resource allocations can only increase during its lifetime. These factors make the choice of the monotonic resource allocator unsuitable for its current use. This PR introduces a very simplistic custom allocator that allocates and deallocates individual objects upon request using standard operator ::new and ::delete. This gives a much better experience as only as much memory is allocated at any point in time as absolutely needs to be, and gives a much more predictable memory profile
The current code uses the monotonic resource allocator, for allocating memory to boost::json objects, which allocates memory in ever increasing chunks, which can lead to over allocation. The other disadvantage of the monotonic resource allocator is that no deallocations are performed until the resource allocator is destroyed - hence the name monotonic as resource allocations can only increase during its lifetime. These factors make the choice of the monotonic resource allocator unsuitable for its current use. This PR introduces a very simplistic custom allocator that allocates and deallocates individual objects upon request using standard operator ::new and ::delete. This gives a much better experience as only as much memory is allocated at any point in time as absolutely needs to be, and gives a much more predictable memory profile Backports #2674
|
Sorry to say that elastic/elasticsearch#109833 thoroughly breaks the ES wire protocol, I'm going to have to revert it to fix the ES build. I guess that means something needs to be reverted here too, but I'm not qualified to address that. |
The current code uses the monotonic resource allocator, for allocating memory to
boost::jsonobjects, which allocates memory in ever increasing chunks, which can lead to over allocation. The image below shows a typical series of memory allocations when using the monotonic resource allocatorThe other disadvantage of the monotonic resource allocator is that no deallocations are performed until the resource allocator is destroyed - hence the name
monotonicas resource allocations can only increase during its lifetime.These factors make the choice of the monotonic resource allocator unsuitable for its current use.
This PR introduces a very simplistic custom allocator that allocates and deallocates individual objects upon request using standard
operator::newand::delete. This gives a much better experience as only as much memory is allocated at any point in time as absolutely needs to be, and gives a much more predictable memory profileOn small data sets this change appears performant, but I do think it would be wise to run the QA tests against this PR, before merging.