
Feature cProfile #59

Merged
merged 8 commits into from
Oct 9, 2016
Conversation

Artimi commented Sep 9, 2016:
Hello again,

this PR adds cProfile support to pytest-benchmark. With cProfile we can see which subfunctions of a benchmarked function are the most demanding. When you run py.test with the --benchmark-cprofile option, you get the top 10 functions for every benchmarked function, as shown here:
[screenshot: pytest_benchmark_cprofile]

These statistics are also saved to storage, so you can compare them later.
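For context, the underlying idea can be sketched with only the stdlib cProfile and pstats modules (the helper name and the single-call profiling are illustrative assumptions, not the plugin's actual internals):

```python
import cProfile
import pstats


def profile_top_functions(func, top_n=10):
    """Profile one call of func and return the top_n hottest rows by cumtime."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) ->
    #   (callcount, primitive calls, tottime, cumtime, callers)
    rows = list(stats.stats.items())
    rows.sort(key=lambda item: item[1][3], reverse=True)  # index 3 is cumtime
    return rows[:top_n]
```

The cumtime value sits at index 3 of each pstats tuple, which is what the sort key above relies on.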

@@ -168,6 +169,7 @@ def __init__(self, fixture, iterations, options):
self.group = fixture.group
self.param = fixture.param
self.params = fixture.params
self.cprofile_stats = fixture.cprofile_stats

self.iterations = iterations
self.stats = Stats()
ionelmc (Owner):
Now it becomes really obvious that I picked a confusing name for these two stats attributes. If you have naming ideas let me know (eg: what if it were rawstats or data? or maybe rename BenchmarkStats to BenchmarkData, and so on?)

Artimi (Author):

Yes, it's a bit confusing. What about
BenchmarkStats -> BenchmarkMetadata
and
Stats -> BenchmarkStats?
And you can probably keep self.stats.

ionelmc (Owner):

Sounds good, but we should strip the "Benchmark" prefix; there's no sense in having it everywhere (it's obvious the stuff is going to be about benchmarks, right?).

ionelmc (Owner) commented Sep 9, 2016:

This saves the cProfile top ten into the storage, right? I was wondering: what if we saved the full data, so that the user could sort differently later, without re-running?

Artimi (Author) commented Sep 9, 2016:

Yes, it saves the top ten functions according to the current sorting column. I would rather keep it limited, because there can be hundreds to thousands of functions and you are usually not interested in most of them. Also, as we discussed in the team, we mostly use the cumtime column and sometimes maybe tottime, and the top functions of these two columns should be the same most of the time.

ionelmc (Owner) commented Sep 9, 2016:

What if we'd save just the top 10 for all the columns? That can't be that much data.

It's just that I usually have very slow suites, and it's a pain to rerun everything just because you wanted to see a different column. I'd expect it's the same for other people.

Artimi (Author) commented Sep 9, 2016:

OK, I will think about the best way to store all the needed information.

Artimi (Author) commented Sep 12, 2016:

I updated the code so we now store the top ten functions for every column (without redundancy). The list is still ordered by cprofile_sort_by, and functions for the remaining columns are added only if they are not already present in the list. It's not optimal right now because I have to scan the list sequentially, but there are at most 70 dictionaries, so it won't be computationally demanding. With this implementation, if you later choose a different column you will see all the needed results.
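The merging strategy described above could look roughly like this sketch (the column names and dict-based rows are assumptions about the data shape, not the PR's exact code):

```python
def collect_top_functions(cprofile_functions, sort_by="cumtime", top_n=10):
    """Keep the top_n entries for every stats column, without duplicates.

    Entries for the primary sort column come first; entries that already
    made the list via an earlier column are skipped.
    """
    stats_columns = ["ncalls", "tottime", "cumtime"]  # assumed column set
    # Put the primary sort column first so its top entries lead the output.
    columns = [sort_by] + [c for c in stats_columns if c != sort_by]
    selected = []
    for column in columns:
        ranked = sorted(cprofile_functions, key=lambda f: f[column], reverse=True)
        for entry in ranked[:top_n]:
            if entry not in selected:  # linear scan; list stays small (~70 max)
                selected.append(entry)
    return selected
```

The `entry not in selected` check is the sequential scan mentioned in the comment; with at most a few dozen rows it is cheap enough in practice.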

ionelmc (Owner) left a comment:
This is pretty good, there are just a few nits.

@@ -215,6 +215,13 @@ def pytest_addoption(parser):
help="Fail test if performance regresses according to given EXPR"
" (eg: min:5%% or mean:0.001 for number of seconds). Can be used multiple times."
)
group.addoption(
"--benchmark-cprofile",
metavar="COLUMN", default=None,
ionelmc (Owner):
This option should have some validation.
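One way to add that validation (a sketch; the actual option wiring in the plugin may differ) is an argparse type callback that rejects unknown sort columns early:

```python
import argparse

# Columns cProfile/pstats output can reasonably be sorted by;
# this exact tuple is an assumption for the sketch.
CPROFILE_COLUMNS = ("ncalls", "tottime", "percall", "cumtime", "function")


def parse_cprofile_column(value):
    """argparse `type=` callback: accept only known cProfile columns."""
    if value not in CPROFILE_COLUMNS:
        raise argparse.ArgumentTypeError(
            "Invalid column %r. Must be one of: %s"
            % (value, ", ".join(CPROFILE_COLUMNS))
        )
    return value
```

It would then be hooked up with something like `group.addoption("--benchmark-cprofile", type=parse_cprofile_column, ...)`, so a typo fails at parse time instead of deep inside the reporting code.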

@@ -196,7 +199,8 @@ def __getitem__(self, key):
def has_error(self):
return self.fixture.has_error

def as_dict(self, include_data=True, flat=False, stats=True):
def as_dict(self, include_data=True, flat=False, stats=True,
cprofile_sort_by="cumtime", cprofile_all_columns=False):
ionelmc (Owner):
I would try to reduce the cprofile arguments to a single cprofile argument (because this call boilerplate is all over the place). If it's specified, it means we want to display the cProfile stats (sorted by that column). If it's not specified, it means we want all the columns for storage.

Eg:

  • .as_dict(cprofile=getoption(...)) if we display the stats in table
  • .as_dict() if we generate json for storage
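The consolidation being proposed could be sketched like this (a minimal stand-in class; the real as_dict has many more parameters and fields):

```python
class BenchmarkReport:
    """Minimal sketch: one `cprofile` parameter replaces the previous two."""

    def __init__(self, cprofile_stats):
        self.cprofile_stats = cprofile_stats  # list of per-function dicts

    def as_dict(self, cprofile=None):
        """If `cprofile` names a column, return the top rows sorted by it
        (display mode); if it is None, return everything (storage mode)."""
        if cprofile is None:
            rows = list(self.cprofile_stats)  # keep all rows for storage
        else:
            rows = sorted(
                self.cprofile_stats,
                key=lambda row: row[cprofile],
                reverse=True,
            )[:10]
        return {"cprofile": rows}
```

The single optional argument encodes both intents: `.as_dict(cprofile="cumtime")` for table display, plain `.as_dict()` for the JSON storage path.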

stats_columns.insert(0, cprofile_sort_by)
for column in stats_columns:
cprofile_functions.sort(key=operator.itemgetter(column), reverse=True)
for cprofile_function in cprofile_functions[:10]:
ionelmc (Owner):
Maybe default to 25 rows? Just in case we need more info later.

@@ -207,6 +211,20 @@ def as_dict(self, include_data=True, flat=False, stats=True):
(k, funcname(v) if callable(v) else v) for k, v in self.options.items()
)
}
if self.cprofile_stats:
result["cprofile"] = []
ionelmc (Owner):
I would use an intermediate variable (eg cprofile = result["cprofile"] = []) for the loops below.
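The chained-assignment idiom being suggested, in isolation (the loop contents are made up):

```python
# Bind the same list once, under two names: `cprofile` for appends inside
# the loops, `result["cprofile"]` for the returned dict.
result = {}
cprofile = result["cprofile"] = []
for row in ({"cumtime": 0.5}, {"cumtime": 0.1}):  # hypothetical rows
    cprofile.append(row)
# Both names refer to the same list object, so no final re-assignment is needed.
assert result["cprofile"] is cprofile
```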

Artimi (Author) commented Oct 6, 2016:

I've addressed your suggestions. It should be OK now. Please review and merge.

codecov-io commented:
Current coverage is 60.92% (diff: 73.13%)

No coverage report found for master at e21d193.

Powered by Codecov. Last update e21d193...f078f60

@ionelmc ionelmc merged commit ee2f36b into ionelmc:master Oct 9, 2016
ionelmc (Owner) commented Oct 9, 2016:

Alright, thanks for these changes. I know a few people have been asking for the cProfile stats.
