[ML] Use custom Boost::JSON allocator #2674

edsavage · 2024-05-28T07:25:28Z

The current code allocates memory for boost::json objects with the monotonic resource allocator, which requests memory in ever-increasing chunks and can therefore over-allocate. The image below shows a typical series of memory allocations when using the monotonic resource allocator.

The other disadvantage of the monotonic resource allocator is that no deallocations are performed until the resource allocator is destroyed; hence the name monotonic, as the memory it holds can only increase during its lifetime.

Together, these factors make the monotonic resource allocator unsuitable for its current use.
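The two behaviours described above can be reproduced with a minimal sketch. It uses std::pmr::monotonic_buffer_resource from the standard library as a stand-in for the Boost.JSON monotonic resource (an assumption made so the example needs no Boost headers). CCountingResource, SMonotonicDemoResult and runMonotonicDemo are hypothetical names for illustration, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical upstream resource that records every chunk requested from it.
class CCountingResource : public std::pmr::memory_resource {
public:
    std::size_t outstandingBytes() const { return m_Outstanding; }
    const std::vector<std::size_t>& chunkSizes() const { return m_ChunkSizes; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        void* p = std::pmr::new_delete_resource()->allocate(bytes, alignment);
        m_Outstanding += bytes;
        m_ChunkSizes.push_back(bytes);
        return p;
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
        m_Outstanding -= bytes;
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::size_t m_Outstanding{0};
    std::vector<std::size_t> m_ChunkSizes;
};

struct SMonotonicDemoResult {
    std::size_t s_HeldAfterFrees;   // bytes still held after every block was "freed"
    std::size_t s_HeldAfterRelease; // bytes held after release()
    std::size_t s_ChunkCount;       // number of chunks requested from upstream
};

SMonotonicDemoResult runMonotonicDemo() {
    CCountingResource upstream;
    SMonotonicDemoResult result{};

    std::pmr::monotonic_buffer_resource monotonic{&upstream};
    std::vector<void*> blocks;
    for (int i = 0; i < 1000; ++i) {
        blocks.push_back(monotonic.allocate(256));
    }
    // The chunks fetched from upstream must cover everything handed out.
    assert(upstream.outstandingBytes() >= 1000 * 256);

    for (void* block : blocks) {
        monotonic.deallocate(block, 256); // a no-op for a monotonic resource
    }
    result.s_HeldAfterFrees = upstream.outstandingBytes();
    result.s_ChunkCount = upstream.chunkSizes().size();

    monotonic.release(); // only now is memory returned to upstream
    result.s_HeldAfterRelease = upstream.outstandingBytes();
    return result;
}
```

Inspecting chunkSizes() on a given standard library shows how the chunk size grows between requests; the exact growth factor is implementation-defined, so the sketch does not assume one.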

This PR introduces a very simple custom allocator that allocates and deallocates individual objects on request using the standard ::operator new and ::operator delete. Only as much memory as is actually needed is allocated at any point in time, which gives a much more predictable memory profile.
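As a rough illustration of the approach described above, and not the code in this PR, the sketch below implements a memory resource that forwards each request to ::operator new and ::operator delete and keeps a running byte count. It derives from std::pmr::memory_resource, which has the same do_allocate / do_deallocate / do_is_equal interface as the Boost.JSON memory resource; using the standard library type is an assumption made to keep the example self-contained. CTrackingNewDeleteResource and trackPeakUsage are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <string>
#include <vector>

// Hypothetical resource: one ::operator new / ::operator delete call per request,
// plus a running count of the bytes currently handed out. Not thread safe.
class CTrackingNewDeleteResource : public std::pmr::memory_resource {
public:
    std::size_t bytesInUse() const { return m_BytesInUse; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        void* p = ::operator new(bytes, std::align_val_t{alignment});
        m_BytesInUse += bytes;
        return p;
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        ::operator delete(p, std::align_val_t{alignment});
        assert(m_BytesInUse >= bytes);
        m_BytesInUse -= bytes;
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::size_t m_BytesInUse{0};
};

// Builds and destroys a container backed by the resource and returns the
// largest usage observed while it was alive.
std::size_t trackPeakUsage(CTrackingNewDeleteResource& resource) {
    std::size_t peak = 0;
    {
        std::pmr::vector<std::pmr::string> values{&resource};
        for (int i = 0; i < 100; ++i) {
            values.emplace_back("a string long enough to defeat the small string optimisation");
            peak = std::max(peak, resource.bytesInUse());
        }
    } // the container is destroyed here and every block goes back via ::operator delete
    return peak;
}
```

Unlike the monotonic resource, usage here falls back to zero as soon as the objects are destroyed, which is what makes the memory profile predictable and the usage easy to report.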

On small data sets this change appears to perform well, but I think it would be wise to run the QA tests against this PR before merging.

edsavage · 2024-05-30T00:41:48Z

buildkite run_qa_tests

wwang500 · 2024-05-30T13:17:40Z

buildkite run_qa_tests

valeriy42

Good work on figuring out and changing the allocator behavior. I think we need to report the allocator memory usage as part of the memory_stats since it may be considerable.

When considering whether or not to update the model, we use the following code:

std::size_t CResourceMonitor::allocationLimit() const {
    return this->highLimit() - std::min(this->highLimit(), this->totalMemory());
}

Hence, we need to be transparent about why the job gets into the hard_limit state while model_bytes is lower.
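To make the concern concrete, here is a small sketch of the same arithmetic as a free function. It assumes, for illustration only, that the total memory is the sum of the model bytes and the allocator bytes; the function names and the numbers are hypothetical and not taken from CResourceMonitor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Same saturating subtraction as CResourceMonitor::allocationLimit(), written
// as a hypothetical free function so it can be exercised on its own.
std::size_t allocationLimit(std::size_t highLimit, std::size_t totalMemory) {
    return highLimit - std::min(highLimit, totalMemory);
}

// Assumption for illustration: total memory = model bytes + allocator bytes.
std::size_t headroom(std::size_t highLimit, std::size_t modelBytes, std::size_t allocatorBytes) {
    return allocationLimit(highLimit, modelBytes + allocatorBytes);
}

void checkAllocationLimitExamples() {
    // Model alone is under the limit, so there is room to grow.
    assert(headroom(100, 80, 0) == 20);
    // Allocator usage pushes the total past the limit: no headroom is left,
    // even though the model bytes (80) are still below the limit (100).
    assert(headroom(100, 80, 30) == 0);
}
```

In the second case a user who only sees model_bytes would have no explanation for the limit being hit, which is the reason for reporting the allocator usage alongside it.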

valeriy42 · 2024-06-04T09:23:49Z

include/api/CJsonOutputWriter.h

    //! from the CResourceMonitor via a callback
    void reportMemoryUsage(const model::CResourceMonitor::SModelSizeStats& modelSizeStats);

+    std::size_t getAllocatorMemUsage() const;


Can you please add a short documentation comment?

* adjust test limits


edsavage · 2024-06-12T03:13:25Z

report the allocator memory usage as part of the memory_stats

Just to clarify @valeriy42 , by memory_stats do you mean the Model size stats (as reported in the counts tab in the AD job results in Kibana)? i.e. include the JSON allocator mem usage in model::CResourceMonitor::SModelSizeStats?

* Add JSON allocator memory usage to the reported model memory stats

valeriy42 · 2024-06-12T11:07:04Z

report the allocator memory usage as part of the memory_stats

Just to clarify @valeriy42 , by memory_stats do you mean the Model size stats (as reported in the counts tab in the AD job results in Kibana)? i.e. include the JSON allocator mem usage in model::CResourceMonitor::SModelSizeStats?

Exactly. Sorry for mixing up memory_stats and model_size_stats.

edsavage · 2024-06-13T02:44:18Z

buildkite build this

valeriy42

LGTM.

valeriy42 · 2024-06-17T09:19:17Z

lib/api/CModelSizeStatsJsonWriter.cc

 const std::string CATEGORIZER_STATS{"categorizer_stats"};
 const std::string PARTITION_FIELD_NAME{"partition_field_name"};
 const std::string PARTITION_FIELD_VALUE{"partition_field_value"};
+const std::string JSON_MEMORY_ALLOCATOR_BYTES("json_memory_allocator_bytes");


nit: I wonder whether this name is too specific. Maybe output_memory_allocator_bytes?

* rename jsonMemoryAllocator -> outputMemoryAllocator


DaveCTurner · 2024-06-18T07:39:54Z

Sorry to say that elastic/elasticsearch#109833 thoroughly breaks the ES wire protocol, so I'm going to have to revert it to fix the ES build. I guess that means something needs to be reverted here too, but I'm not qualified to address that.

[ML] Experiment with different allocator types

dbe8b9b

edsavage added >non-issue :ml v8.15.0 labels May 28, 2024

Tidy up

4ecf14e

edsavage added the ci:run-qa-tests Run a subset of the QA tests label May 29, 2024

edsavage changed the title ~~[ML] Experiment with different allocator types~~ [ML] Use custom Boost::JSON allocator May 29, 2024

Formatting

707886e

edsavage added >bug affects-results v8.13.0 v8.14.1 and removed >non-issue affects-results v8.13.0 labels May 29, 2024

Update changelog

2e4b1ac

edsavage added v8.14.0 and removed v8.14.1 labels May 29, 2024

edsavage marked this pull request as ready for review May 29, 2024 03:36

Further tidy up

36fedd5

Experiment with accounting for memory used by JSON memory allocators.

aa3e938

valeriy42 reviewed Jun 4, 2024


edsavage added 6 commits June 5, 2024 16:31

Document the new getJsonMemoryAllocatorUsage function

4f9e304

Formatting

e7f1ba0

* Fix compilation issue

9af7a89

* adjust test limits

Formatting

ad7144c

Remove troublesome trace logging

9213537

Merge branch 'main' of github.com:elastic/ml-cpp into boost_json_cust…

11925f9

…om_allocator

edsavage added 3 commits June 12, 2024 16:26

Attend to review comments

7773f8a

* Add JSON allocator memory usage to the reported model memory stats

Adjust format of test code

bbfe911

Override clang format for test code

89dc65e

edsavage mentioned this pull request Jun 13, 2024

[ML] Handle the "output memory allocator bytes" field elastic/elasticsearch#109653

Merged

Fix ES test runner script

889d128

valeriy42 approved these changes Jun 17, 2024


Attend to review comments

e72eb49

* rename jsonMemoryAllocator -> outputMemoryAllocator

edsavage merged commit 7d08ac6 into elastic:main Jun 18, 2024

edsavage mentioned this pull request Jun 18, 2024

[8.14][ML] Use custom Boost::JSON allocator (#2674) #2682

Merged

edsavage deleted the boost_json_custom_allocator branch September 19, 2024 01:26
