Importing a large linux perf file will crash the application #433

Goose97 · 2023-06-27T16:26:58Z

Hi, thank you for your work on this awesome project.

Back to the story: I was importing a large file (linux perf tool format), around 1.3 GB, and it crashed the application. The root cause is this line. When we're splitting the file into blocks, a block's size can exceed V8's string length limit. The situation is similar to #385.

My plan is:

change splitBlocks signature to

splitBlocks(contents: TextFileContent): TextFileContent[]

pending will now use StringBackedTextFileContent | BufferBackedTextFileContent. It will initially use StringBackedTextFileContent for performance, then fall back to BufferBackedTextFileContent if the block size exceeds the limit. Something like this:

try {
  pending += line
} catch (error) {
  if (check(error)) {
    // convert to BufferBackedTextFileContent and continue
  }
}

To do this, we might need to add some methods to the TextFileContent interface:

isEmpty(): to replace pending.length > 0
append(): to replace string concatenation
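A minimal sketch of what such an accumulator could look like. PendingBlock, isBufferBacked(), toBytes() and the maxStringLength parameter are hypothetical names used only for illustration; they are not part of the existing TextFileContent interface. Instead of catching the RangeError that V8 raises for an oversized string, this version compares against a configurable limit, which makes the fallback easy to exercise with small inputs.

```typescript
// Hypothetical accumulator: starts string-backed, switches to byte chunks
// once appending would exceed a configurable length limit.
class PendingBlock {
  private text = ''
  private chunks: Uint8Array[] | null = null
  private readonly encoder = new TextEncoder()
  private readonly maxStringLength: number

  constructor(maxStringLength: number) {
    this.maxStringLength = maxStringLength
  }

  // Replaces the `pending.length > 0` style of check.
  isEmpty(): boolean {
    if (this.chunks === null) return this.text.length === 0
    return !this.chunks.some(chunk => chunk.length > 0)
  }

  isBufferBacked(): boolean {
    return this.chunks !== null
  }

  // Replaces `pending += line`.
  append(line: string): void {
    if (this.chunks === null) {
      if (this.text.length + line.length <= this.maxStringLength) {
        this.text += line // fast path: plain string concatenation
        return
      }
      // Too large for one string: move what we have into byte chunks.
      this.chunks = [this.encoder.encode(this.text)]
      this.text = ''
    }
    this.chunks.push(this.encoder.encode(line))
  }

  // Returns the accumulated content as UTF-8 bytes, whichever backing is in use.
  toBytes(): Uint8Array {
    const parts = this.chunks !== null ? this.chunks : [this.encoder.encode(this.text)]
    const total = parts.reduce((sum, part) => sum + part.length, 0)
    const out = new Uint8Array(total)
    let offset = 0
    for (const part of parts) {
      out.set(part, offset)
      offset += part.length
    }
    return out
  }
}
```

In a real implementation the limit would presumably be derived from the engine's actual maximum string length rather than passed in by hand.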

Another approach that I'm considering is to implement a generic splitBy(delimiter: string) for TextFileContent. I haven't thought it through yet but it seems like a non-trivial thing.

The file is fairly large, so I won't attach it here. I still have it on my computer in case you want to have a look.


jlfwong · 2023-06-27T22:05:54Z

Hey @Goose97!

Thanks for the very clear description of the problem you're trying to solve, and the links to the specific parts of the code!

If I'm understanding the problem correctly, I think there's a simpler solution that's probably more performant.

The splitBlocks function doesn't really need to exist at all. It was just a simple way to reason about the problem.

Instead, I think we can change the importer to just operate on the output of splitLines() directly without using \n\n delimited blocks as an intermediate. This also has the benefit of not doubling the memory requirements.

If we do your proposed change, I think we end up with the 1.3GB in memory three times: once as a buffer, once as the return value of .splitLines() and once as the TextFileContent[]. If we skip the splitBlocks, then we just have the first two.
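As a rough sketch of operating on lines directly: the generator below groups consecutive non-empty lines into one event and uses an empty line as the separator, so no block is ever concatenated into a single large string. The name eventsFromLines and the "array of lines per event" shape are assumptions for illustration, not the importer's actual code.

```typescript
// Hypothetical helper: yields one array of lines per blank-line-delimited event.
function* eventsFromLines(lines: Iterable<string>): Generator<string[]> {
  let current: string[] = []
  for (const line of lines) {
    if (line === '') {
      // An empty line ends the current event; repeated empty lines are ignored.
      if (current.length > 0) {
        yield current
        current = []
      }
    } else {
      current.push(line)
    }
  }
  // Flush the last event if the input does not end with an empty line.
  if (current.length > 0) yield current
}
```

Because each event stays a list of short strings, the size of any single string is bounded by the longest line rather than by the largest block.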

Really, ideally, we'd have only the one in the buffer and splitLines() should return an iterator instead, then we'd only have the 1.3GB in memory once. That's a separate issue from the one you're trying to solve though.
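A sketch of the iterator idea, assuming the file contents are available as UTF-8 bytes in a Uint8Array. The function name splitLinesLazily is hypothetical, and this is not the actual splitLines() implementation. Each line is decoded only when the consumer asks for it, so only the buffer plus the current line need to be in memory at once.

```typescript
// Hypothetical lazy line splitter over a UTF-8 byte buffer.
function* splitLinesLazily(bytes: Uint8Array): Generator<string> {
  const decoder = new TextDecoder('utf-8')
  const NEWLINE = 0x0a // '\n' never occurs inside a multi-byte UTF-8 sequence
  let start = 0
  while (start < bytes.length) {
    let end = bytes.indexOf(NEWLINE, start)
    if (end === -1) end = bytes.length // last line without a trailing newline
    // subarray() creates a view on the same memory, not a copy.
    yield decoder.decode(bytes.subarray(start, end))
    start = end + 1
  }
}
```

Unlike String.prototype.split('\n'), this sketch does not emit a trailing empty string when the input ends with a newline.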

Goose97 · 2023-06-28T01:15:58Z

Hi @jlfwong,

That makes a lot of sense. I'll try a PR, going the way you suggested.

I guess I'll go ahead and convert the splitLines() to return an iterator first (neat idea btw), then come back to this issue. Is that cool?

jlfwong · 2023-06-28T03:51:14Z

> I guess I'll go ahead and convert the splitLines() to return an iterator first (neat idea btw), then come back to this issue. Is that cool?

Yeah, if you'd like to tackle that, then please go ahead!

Please make it a separate PR from the one addressing the issue described here (so one for the iterator, one for the perf file fix).

If, in the process of doing that, you realize it's kind of a pain to do, it's also fine to address the linux perf-file specific issue without solving the iterator.

The current behavior of splitLines is to eagerly split all the lines and return an array of strings. This PR improves this by returning an iterator instead, which will emit lines. This lets callers decide how best to use the splitLines function (e.g. lazily enumerating over lines). Relates to jlfwong#433

Currently, importing files generated by the linux perf tool in which some blocks exceed V8's string length limit can crash the application. This issue is similar to the one in jlfwong#385. This PR fixes it by changing parseEvents to work directly with lines instead of chunking lines into blocks first. Fixes jlfwong#433


Goose97 mentioned this issue Jun 28, 2023

Improve splitLines: return iterator instead #434

Merged

Goose97 mentioned this issue Jun 28, 2023

Fix crash when importing big linux perf tool files #435

Merged

jlfwong closed this as completed in #435 Jun 28, 2023
