Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data frame #831

Merged
merged 41 commits into from May 16, 2021
Merged

Data frame #831

merged 41 commits into from May 16, 2021

Conversation

dantleech
Copy link
Member

@dantleech dantleech commented May 9, 2021

Introduces the concept of a data frame.

The data frame can be used programatically in reports and can be accessed through the expression language. The data frame is introduced as a new parameter suite in the expression report.

This addition will enable any value from the suite to be referenced, the following expression f.e. will show a % comparison to the named benchmark for each row:

percent_diff(mode(result_time_avg), mode(suite[subject_name = "benchFoo"]["result_time_avg"]))

TODO:

  • Create DataFrame and associated classes
  • Support expression access for DataFrames
  • Improve tests
  • Regression test perforamance
  • Update docs
  • Split out unrelated improvements/features

Performance

Initially report generation performance was degraded due to the way the evaluation cache worked (it serialized the parameters as a cache key), removing the cache actually results in a significant gain for report generation at the expense of assertions and progtress loggers:

+------------------------------+---------------+-------------+------+-----+------------------+-------------------+----------------+
| benchmark                    | subject       | set         | revs | its | mem_peak         | mode              | rstdev         |
+------------------------------+---------------+-------------+------+-----+------------------+-------------------+----------------+
| ParserBench                  | benchEvaluate | comp. w/tol | 10   | 10  | 1.883mb -2.83%   | 74.146μs +1.40%   | ±1.18% -62.76% |
| ParserBench                  | benchEvaluate | comp.       | 10   | 10  | 1.883mb -2.83%   | 63.677μs +0.55%   | ±2.23% -6.48%  |
| AssertionProcessorBench      | benchAssert   | 0           | 10   | 10  | 3.093mb -3.53%   | 412.578μs +29.59% | ±1.16% -24.57% |
| ExpressionGeneratorBench     | benchGenerate | 0           | 10   | 10  | 12.214mb -52.33% | 8.260ms -35.83%   | ±1.51% -20.55% |
| RunCommandBench              | benchDefault  | 0           | 1    | 10  | 5.881mb -1.02%   | 73.072ms -20.05%  | ±2.25% -3.37%  |
| RunCommandBench              | benchInBand   | 0           | 1    | 10  | 5.881mb -1.02%   | 74.573ms -25.33%  | ±1.45% -50.43% |
| RunCommandBench              | benchNoEnv    | 0           | 1    | 10  | 5.501mb -1.26%   | 21.621ms -3.46%   | ±2.16% -10.81% |
| VariantSummaryFormatterBench | benchFormat   | 0           | 10   | 10  | 3.269mb +1.98%   | 622.163μs +9.24%  | ±2.13% +6.92%  |
+------------------------------+---------------+-------------+------+-----+------------------+-------------------+----------------+

As the regressions are at are sub-millisecond it seems like an acceptable trade-off with the added benefit of removing a cache.

lib/Data/DataFrame.php Outdated Show resolved Hide resolved
lib/Data/DataFrame.php Outdated Show resolved Hide resolved
lib/Data/Series.php Outdated Show resolved Hide resolved
@dantleech dantleech force-pushed the data-frame branch 2 times, most recently from c115b50 to 105a5ab Compare May 15, 2021 17:38
@dantleech dantleech added this to Backlog in PHPBench 1.1 May 15, 2021
@dantleech dantleech moved this from Backlog to In Progress in PHPBench 1.1 May 15, 2021
@dantleech dantleech marked this pull request as ready for review May 15, 2021 21:38
lib/Data/DataFrame.php Outdated Show resolved Hide resolved

$columnValues = array_merge(
[
'suite' => $allFrame
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth prefixing _? the columns are are not extensible however

* @return array<string,array<string,mixed>>
*/
public function suiteToTable(SuiteCollection $collection, bool $includeBaseline = false): array
public function suiteToTable(SuiteCollection $collection, bool $includeBaseline = false): DataFrame
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a BC break potentially - add another method and deprecate the array one

@dantleech dantleech merged commit 4ecdb15 into master May 16, 2021
@dantleech dantleech deleted the data-frame branch May 16, 2021 17:39
@dantleech dantleech mentioned this pull request May 16, 2021
@dantleech dantleech moved this from In Progress to Done in PHPBench 1.1 May 16, 2021
@tillkruss
Copy link
Contributor

This is fantastic!

Screen Shot 2021-05-16 at 5 17 14 PM

@dantleech
Copy link
Member Author

dantleech commented May 17, 2021

Nice! I'm not sure percent_diff works here though - red is "bad", while negative is "good". You could also change the expresion to use the x format or do that maths for percentage rather than using percent_diff to remove the colors. Or just leave it if it works :)

@tillkruss
Copy link
Contributor

Agreed. The red is a little aggressive. Could you explain how to do the Nx notation?

@dantleech
Copy link
Member Author

Math and concatenation: (mode(suite[subjectName="foo"]) / mode(result_time_avg)) ~ "x" or something ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants