Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importing a large linux perf file will crash the application #433

Closed
Goose97 opened this issue Jun 27, 2023 · 3 comments · Fixed by #435
Closed

Importing a large linux perf file will crash the application #433

Goose97 opened this issue Jun 27, 2023 · 3 comments · Fixed by #435

Comments

@Goose97
Copy link
Contributor

Goose97 commented Jun 27, 2023

Hi, thank you for your work on this awesome project.

Back to the story, I was importing a large file (linux perf tool format), around ~1.3 GB and it crashed the application. The root cause is this line. When we're splitting the file into blocks, a block size can exceeds V8 strings limit. The situation is similar to #385.

My plan is:

  1. change splitBlocks signature to
splitBlocks(contents: TextFileContent): TextFileContent[]
  1. pending will now use StringBackedTextFileContent | BufferBackedTextFileContent. It will initially use StringBackedTextFileContent for performance then fallback to use BufferBackedTextFileContent if the block size exceeds the limit. Something like this
try {
  pending += line
} catch (error) {
  if (check(error)) {
    // convert to BufferBackedTextFileContent and continue
  }
}
  1. To do this, we might need to add some methods to TextFileContent interface:

Another approach that I'm considering is to implement a generic splitBy(delimiter: string) for TextFileContent. I haven't thought it through yet but it seems like a non-trivial thing.

The file is fairly large file so I won't attach it here. I still keep it in my computer in case you want to have a look.

@jlfwong
Copy link
Owner

jlfwong commented Jun 27, 2023

Hey @Goose97!

Thanks for the very clear description of the problem you're trying to solve, and the links to the specific parts of the code!

If I'm understanding the problem correctly, I think there's a simpler solution that's probably more performant.

The splitBlocks function doesn't really need to exist at all. It was just a simple way to reason about the problem.

Instead, I think we can change the importer to just operate on the output of splitLines() directly without using \n\n delimited blocks as an intermediate. This also has the benefit of not doubling the memory requirements.

If we do your proposed change, I think we end up with the 1.3GB in memory three times: once as a buffer, once as the return value of .splitLines() and once as the TextFileContent[]. If we skip the splitBlocks, then we just have the first two.

Really, ideally, we'd have only the one in the buffer and splitLines() should return an iterator instead, then we'd only have the 1.3GB in memory once. That's a separate issue than the one you're trying to solve though.

@Goose97
Copy link
Contributor Author

Goose97 commented Jun 28, 2023

Hi @jlfwong,

That makes a lot of sense. I'll try an PR, going the way you suggested.

I guess I'll go ahead and convert the splitLines() to return an iterator first (neat idea btw), then come back to this issue. Is that cool?

@jlfwong
Copy link
Owner

jlfwong commented Jun 28, 2023

I guess I'll go ahead and convert the splitLines() to return an iterator first (neat idea btw), then come back to this issue. Is that cool?

Yeah, if you'd like to tackle that, then please go ahead!

Please make it a separate PR from the addressing the issue described here (so one of the iterator, one for the perf file fix)

If, in the process of doing that, you realize it's kind of a pain to do, it's also fine to address the linux perf-file specific issue without solving the iterator.

Goose97 added a commit to Goose97/speedscope that referenced this issue Jun 28, 2023
The current behavior of splitLines is to eagerly split all the lines and return an array of strings.

This PR improves this by returning an iterator instead, which will emit lines. This lets callers decide how to best use the splitLines function (i.e. lazily enumerate over lines)

Relates to jlfwong#433
Goose97 added a commit to Goose97/speedscope that referenced this issue Jun 28, 2023
Currently, importing files generated by linux perf tool whose some blocks exceed V8 strings limit can crash the application. This issue is similar to the one in jlfwong#385.

This PR fixes it by changing parseEvents to work directly with lines instead of chunking lines into blocks first.

Fixes jlfwong#433
jlfwong pushed a commit that referenced this issue Jun 28, 2023
The current behavior of splitLines is to eagerly split all the lines and return an array of strings.

This PR improves this by returning an iterator instead, which will emit lines. This lets callers decide how to best use the splitLines function (i.e. lazily enumerate over lines)

Relates to #433
Goose97 added a commit to Goose97/speedscope that referenced this issue Jun 28, 2023
Currently, importing files generated by linux perf tool whose some blocks exceed V8 strings limit can crash the application. This issue is similar to the one in jlfwong#385.

This PR fixes it by changing parseEvents to work directly with lines instead of chunking lines into blocks first.

Fixes jlfwong#433
Goose97 added a commit to Goose97/speedscope that referenced this issue Jun 28, 2023
Currently, importing files generated by linux perf tool whose some blocks exceed V8 strings limit can crash the application. This issue is similar to the one in jlfwong#385.

This PR fixes it by changing parseEvents to work directly with lines instead of chunking lines into blocks first.

Fixes jlfwong#433
jlfwong pushed a commit that referenced this issue Jun 28, 2023
Currently, importing files generated by linux perf tool whose some blocks exceed V8 strings limit can crash the application. This issue is similar to the one in #385.

This PR fixes it by changing parseEvents to work directly with lines instead of chunking lines into blocks first.

Fixes #433
jackerghan pushed a commit to jackerghan/speedscope that referenced this issue Jul 28, 2023
The current behavior of splitLines is to eagerly split all the lines and return an array of strings.

This PR improves this by returning an iterator instead, which will emit lines. This lets callers decide how to best use the splitLines function (i.e. lazily enumerate over lines)

Relates to jlfwong#433
jackerghan pushed a commit to jackerghan/speedscope that referenced this issue Jul 28, 2023
Currently, importing files generated by linux perf tool whose some blocks exceed V8 strings limit can crash the application. This issue is similar to the one in jlfwong#385.

This PR fixes it by changing parseEvents to work directly with lines instead of chunking lines into blocks first.

Fixes jlfwong#433
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants