cover command performance issue #208

Open
venupec opened this issue Mar 1, 2018 · 3 comments

Comments

@venupec

venupec commented Mar 1, 2018

UPDATE:

It is probably an I/O issue from reading all the digest files. I plan to acquire more powerful machines and test again.

The cover report spends most of its time in Devel::Cover::DB::cover(). The @Runs loop there accounts for almost 80-90% of it.

The cover text report files are about 28MB for each test suite, though some are smaller, for example 6MB.


Hello,

We run Devel::Cover on some long-running harness test suites that take a couple of days to complete. We're seeing a performance issue with the cover <db> -report text command.

Issues I'm seeing:

  1. The first few calls to cover report seem to take longer.
  2. I consistently see that generating the cover text report takes about 66% of the total run time.
  3. Generating the cover text report takes anywhere from 1 second to 11 minutes per test suite.
  4. The time to generate the cover text report is directly proportional to the amount of coverage data collected: the more data, the longer the text report takes.
  5. The cover report timings are acceptable (below 2 minutes) up to 210 test suites. After 210 test suites, the cover text report consistently takes longer (anywhere from 2 to 11 minutes).
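
Per-suite timings like the ones above can be collected with a small wrapper. The following is a minimal sketch, not part of Devel::Cover: the function names (`time_command`, `time_reports`, `cover_text_argv`) are hypothetical, and the command line simply mirrors the `cover <db> -report text` form quoted in this post.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, completed.returncode


def time_reports(db_dirs, build_argv):
    """Time one report command per coverage DB directory.

    build_argv maps a DB path to the command line to run. It is a
    parameter so the same loop can time any report type.
    """
    rows = []
    for db in db_dirs:
        elapsed, code = time_command(build_argv(db))
        rows.append((db, elapsed, code))
    return rows


def cover_text_argv(db):
    # Assumed invocation, copied from the "cover <db> -report text" form above.
    return ["cover", db, "-report", "text"]


if __name__ == "__main__":
    # Usage: python time_reports.py cover_db_1 cover_db_2 ...
    for db, elapsed, code in time_reports(sys.argv[1:], cover_text_argv):
        print(f"{db}\t{elapsed:.1f}s\texit={code}")
```

Recording the size of each coverage DB next to the elapsed time would make it possible to check whether the growth is really linear in the amount of data.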

Our harness test suite set up:

  1. Our harness system has around 30,000 test suites (.t files).
  2. We've divided these into several smaller chunks for easier execution.
  3. I've customized the harness to capture coverage using the DEVEL_COVER_OPTIONS env var.
  4. All the harness invocations run through CI/CD.

What have I tried to improve performance?

  1. I did a quick benchmark of print_statement() and print_subroutine() in Devel::Cover::Report::Text::report(). The results were not that bad, at least from what I've seen so far. It took about 3 minutes in total to generate the report for each of the 13 test suites.

  2. I tried generating the JSON report, but it doesn't provide 'covered' or 'uncovered' module information, so it's of no use to us as is. I customized the code to include the list of covered modules as well, but performance still has not improved.

Has anyone seen this kind of issue before? I'm at a loss as to what other optimizations I could make to the cover command.

I'd really appreciate your help and insight into this issue. I'm happy to supply additional data supporting the stats above.

Thank you!

@pjcj
Owner

pjcj commented Jul 9, 2018

Thanks very much for reporting this. Did you manage to get any further with it? If you still think it's an IO speed problem that can be reasonably solved in hardware then I'm happy to close the ticket. But if you think there's a real problem here then we should investigate further.

I'm aware that merging the DBs can be quite slow and can use a fair bit of memory. Fixing that would probably require a fundamental change to the way the coverage DB is structured. Perhaps by using a real DB. But obviously that's a fair bit of work.

@jpsalvesen
Contributor

A queryable SQLite backend? Yes please! That would significantly lower the effort needed to mine the data, basically opening it up.

One piece of advice, aka premature optimization: start without indexes. Create them after the coverage is gathered but before you generate the report. Inserting without indexes is much faster than inserting with indexes, especially once your data set grows enough for the associated trees to grow deep.
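
The load-then-index order can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `statement_hits` table, its columns, and the function names are hypothetical and are not Devel::Cover's actual data model.

```python
import sqlite3


def load_then_index(rows):
    """Bulk-insert rows into an unindexed table, then build the index."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (run, file, line) with its hit count.
    conn.execute(
        "CREATE TABLE statement_hits "
        "(run TEXT, file TEXT, line INTEGER, hits INTEGER)"
    )
    # One transaction for all inserts; no index has to be maintained yet.
    with conn:
        conn.executemany(
            "INSERT INTO statement_hits VALUES (?, ?, ?, ?)", rows
        )
    # Build the index once, after loading and before any report query.
    conn.execute("CREATE INDEX idx_file_line ON statement_hits (file, line)")
    return conn


def uncovered_lines(conn, file):
    """Return lines of `file` that no run executed, merged across runs."""
    cur = conn.execute(
        "SELECT line FROM statement_hits WHERE file = ? "
        "GROUP BY line HAVING SUM(hits) = 0 ORDER BY line",
        (file,),
    )
    return [line for (line,) in cur]
```

With this layout, merging runs is just the `GROUP BY` in the query, which is the kind of data mining a queryable backend would open up.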

But looking at the code, this would be a very significant rewrite indeed, especially if we are to realize the potential of such a change beyond the initial rationale (faster reports and merges).

@jtk18

jtk18 commented May 24, 2023

I had some luck speeding up a Devel::Cover coverage run for a large codebase by specifying JSON as the output format instead of Sereal. The parser for the cover output can read this output much faster than the Sereal database; this is important for repositories with a large number of Perl files to cover. It's also much easier to create your own parser -- I wrote one in Golang and one in Rust for a code analysis tool for work.

I could probably rewrite and improve the Golang script pretty easily. I'll look into doing that for faster parsing of runs written as JSON.
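
As a starting point for a standalone reader, here is a minimal sketch that walks a coverage directory, parses every file that is valid JSON, and records how long each parse takes. It deliberately assumes nothing about the layout of the data inside the files; the function name `scan_json_files` is hypothetical, and this is not the Golang or Rust tool mentioned above.

```python
import json
import time
from pathlib import Path


def scan_json_files(root):
    """Parse every JSON file under root and return per-file statistics.

    Files that are not valid JSON (for example binary formats) are skipped.
    """
    stats = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        start = time.perf_counter()
        try:
            data = json.loads(raw)
        except ValueError:
            # Covers both JSON syntax errors and undecodable bytes.
            continue
        elapsed = time.perf_counter() - start
        entries = len(data) if isinstance(data, (dict, list)) else 1
        stats.append(
            {
                "path": str(path),
                "bytes": len(raw),
                "parse_seconds": elapsed,
                "top_level_entries": entries,
            }
        )
    return stats
```

Sorting the result by `parse_seconds` shows which files dominate the read time, which is useful before writing a format-specific parser.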
