Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: improve event.clone memory usage #11794

Closed
wants to merge 1 commit into from

Conversation

kares
Copy link
Contributor

@kares kares commented Apr 15, 2020

by doing copy-on-write strings when deep cloning

this is motivated "big" events reaching plugins such as split
which might produce several new events out of a single one ...

did a naive test using LS_JAVA_OPTS="-Xms300m -Xmx300m" with a JSON event prototype:

input {
  generator {
    message => '
{ "data":
  [{"type": "articles",
    "id": "1",
    "attributes": {
      "title": "JSON:API paints my bikeshed!",
      "body": "The shortest article. Ever. Well maybe not, but still quite short ... 01234567890",
      "created": "2015-05-22T14:56:29.000Z",
      "updated": "2015-05-22T14:56:28.000Z"
    }
  },
  {"type": "articles",
    "id": "2",
    "attributes": {
      "title": "2JSON:API paints my bikeshed!",
      "body": "The shortest article. Ever.",
      "created": "2015-05-22T14:56:29.000Z",
      "updated": "2015-05-22T14:56:28.000Z"
    }
  },
  {"type": "articles",
    "id": "3",
    "attributes": {
      "title": "3JSON:API paints my bikeshed!",
      "body": "The shortest article. Ever.",
      "created": "2015-05-22T14:56:29.000Z",
      "updated": "2015-05-22T14:56:28.000Z"
    }
  },
  {"type": "articles",
    "id": "4",
    "attributes": {
      "title": "4JSON:API paints my bikeshed!",
      "body": "              ",
      "created": "2015-05-22T14:56:29.000Z",
      "updated": "2015-05-22T14:56:28.000Z"
    }
  },
  {"type": "articles",
    "id": "5",
    "attributes": {
      "title": "55555",
      "body": "a body - a body - a body - a body - a body",
      "created": "2015-05-22T14:56:29.000Z",
      "updated": "2015-05-22T14:56:28.000Z"
    }
  }
  ]
}
'
    codec => 'json'
    count => 1800
  }
}

filter {
  mutate { copy => { "sequence" => "[data][0][sequence]" } }
  mutate { copy => { "sequence" => "[data][1][sequence]" } }
  mutate { copy => { "sequence" => "[data][2][sequence]" } }
  mutate { copy => { "sequence" => "[data][3][sequence]" } }
  mutate { copy => { "sequence" => "[data][4][sequence]" } }
  split {
    field => 'data' # each event -> 5 clones
    target => 'json1_data'
  }
  split {
    field => 'data' # each event -> 5 clones
    target => 'json2_data'
  }
  split {
    field => 'data' # each event -> 5 clones
    target => 'json3_data'
  }
}

(1) doClone is getting out-of-memory spliting ~ 850-th event (after 5:30 minutes)
(1) strDup COW is getting stuck ~ 1550-th event (after 6:00 minutes)

The setup is very artificial but serves well and was motivated by a real-world scenario which was actually way more "agressive" - spliting on an array with 100s of elements (here we only have 5).

@yaauie
Copy link
Member

yaauie commented Apr 15, 2020

Test fail because we explicitly check for a different Bytelist object in ClonerTest.

AFAICT, we exclusively send RubyString#getBytelist in that ClonerTest and in our EventCondition implementations, which appear to cache the ByteLists of the RubyStrings they are instantiated with (from config IR) in order to improve lookup time.

Since RubyString#modify() produces a new copy of the underlying Bytelist for the object being modified, EventCondition's use of Bytelist is safe if either:

  • The objects it is instantiated with are not the products of cloning or subject future to cloning; OR
  • The Bytelists retrieved by RubyString#getBytelist are exclusively used for read operations.

@kares
Copy link
Contributor Author

kares commented Apr 16, 2020

Hey Ry, thanks. You're way ahead of me 🥰 was doing some LS testing to "prove" this makes sense.
Updated the PR description with results - they confirm this will be a huge memory saver in certain event.clone cases, will try getting this out of the draft form now by fixing tests and concerns.

@kares
Copy link
Contributor Author

kares commented Apr 16, 2020

EventCondition implementations, which appear to cache the ByteLists of the RubyStrings they are instantiated with (from config IR) in order to improve lookup time.

I've checked EventCondition impls and I do not see any potential (new) issues there.
These ByteList extractions are read-only and if they had issues they would be problematic with SHARED_BYTELIST coming in from other places not just from potentially event.clone sharing bytes.

Actually not sure doing indexOf with ByteList is 100% accurate (in case the encoding would be incompatible that operation makes no sense or if a multi-byte UTF-8 could ever in part match a single byte UTF-8 char) but I do not want to diverge here.

@kares kares marked this pull request as ready for review April 16, 2020 10:34
@kares kares requested a review from yaauie April 16, 2020 10:34
for Strings with copy-on-write semantics when deep cloning.

motivated by "big" events reaching plugins such as split,
which might produce several new events out of a single one.
@kares kares added the v7.8.0 label May 7, 2020
@elasticsearch-bot
Copy link

Karol Bucek merged this into the following branches!

Branch Commits
master 413a7fe
7.x 1cc1ce3
7.8 0dc71a7

elasticsearch-bot pushed a commit that referenced this pull request May 7, 2020
for Strings with copy-on-write semantics when deep cloning.

motivated by "big" events reaching plugins such as split,
which might produce several new events out of a single one.

Fixes #11794
elasticsearch-bot pushed a commit that referenced this pull request May 7, 2020
for Strings with copy-on-write semantics when deep cloning.

motivated by "big" events reaching plugins such as split,
which might produce several new events out of a single one.

Fixes #11794
@kares kares deleted the perf-event-clone-cow branch May 7, 2020 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants