Elasticsearch report backend #58

Artimi · 2016-09-09T08:36:46Z

Hello,

we found your plugin very useful. I wrote this PR so we can be able to store information about benchmarks in elasticsearch. As we have plenty of services and we want to track them in CI this is more suitable option for us than files. We can also then generate reports in Kibana as this:

I put saving to files to FileReportBackend and created new one ElasticsearchReportBackend. Package elasticsearch is only optional. I hope I kept spirit of your awesome plugin and did not bend it too much.

ionelmc · 2016-09-09T08:45:32Z

Very interesting! It'll take me bit to fully review but meanwhile some question regarding the --benchmark-elasticsearch-* options. I'm not familiar with elasticsearch but could we compress those into a single option that looks like host/index:doctype? It's just that there are so many options people will forget that they are using pytest :D

Also, --benchmark-elasticsearch-host=URL is that an url or a hostname?

Artimi · 2016-09-09T09:04:26Z

@ionelmc Thanks for fast response. I can squash that information to host/index/doctype so there will be only one argument. However, elasticsearch python lib allows to specify more hosts and it would be more complicated to add that functionality later if we squashed the information. Maybe I can add more hosts right now although we don't need it currently. Something like --benchmark-elasticsearch-hosts host1:9200 host2:9200 and --benchmark-elasticsearch-index-doctype benchmark/benchmark?

--benchmark-elasticsearch-host should have only host not url. Thanks for catching that.

ionelmc · 2016-09-09T09:09:09Z

To be honest I'd rather have a single option for the backends, eg: --benchmark-storage=elasticsearch://host1,host2/index/doctype or --benchmark-storage=file://... (with some backwards compatibility that supports old style paths).

Can you specify doctype without index? elasticsearch://host1,host2//doctype could work but is confusing, perhaps a different scheme like elasticsearch://host1,host2/index:doctype? What kinds of values are acceptable for index and doctype?

Artimi · 2016-09-09T09:28:15Z

Ok we can do that. But I would not like to invent new protocols like elasticsearch://. You are connected to elasticsearch via HTTP so it does not make sense to add protocol elasticsearch://, although I don't know how else tell plugin to use elasticsearch.

You usually call host/index/doctype so I think that colon instead of slash would be confusing. And doctype is part of one index so if you want to specify doctype you probably want to specify also index.

Index and doctype are lower case strings that don't start with underscore and dot. elastic/elasticsearch#6736.

ionelmc · 2016-09-09T09:32:16Z

Well is either inventing an URI protocol or inventing 5 more command line options. There are way too many options, I'm just trying to keep the option list under control. What if someone else comes with a different backend?

Does elasticsearch support just http? Or https too? Is there support for other protocols?

ionelmc · 2016-09-09T09:35:02Z

In hindsight, elasticsearch+https://host1:123,host2:234/index/doc should support that as well.

Unfortunately the URI parser in python doesn't support well the host1:123,host2:234 convention. Implementing a custom parser shouldn't be that hard right?

Artimi · 2016-09-09T09:38:00Z

Elasticsearch can be accessed via http or https so I can use this format elasticsearch+https://host1:123,host2:234/index/doc that you suggested. I will implement it manually.

Artimi · 2016-09-09T13:35:54Z

I removed those --benchmark-elasticsearch* options and updated behavior of --benchmark-storage as requested.

ionelmc · 2016-09-09T14:35:49Z

One more thing that I was wondering about is what if there would be some package holding the report backends, eg: src/pytest_benchmark/backends/something.py? But leave that thought for now (just typing so I don't forget) ...

There are two issues I think:

If you enable the ES storage then --benchmark-json option don't work anymore? I think users would expect that it works.
The two reporting backends kinda have similar code, it should be factored out a bit (later on it would bite me in the ass cause I'll forget to update one of the backends 😁)

Also, the --benchmark-project - should it be merged into the --benchmark-storage URI - what do you think?

Artimi · 2016-09-09T14:53:39Z

Oh yeah I left --benchmark-json functionality out. I will add it there
I was thinking about refactoring but then I didn't want to change your code too much. I will move similar code to BaseReportingBackend
--benchmark-project should contain name of the project so you can later filter by it in elasticsearch so I don't see way how to merge it with path to files/address to elasticsearch in --benchmark-storage. However, I can leave it out if you don't want to specify it and we can rely on method utils.get_project_name.

ionelmc · 2016-09-09T16:41:08Z

On Fri, Sep 9, 2016 at 5:53 PM, Petr Šebek notifications@github.com wrote:

--benchmark-project should contain name of the project so you can later
filter by it in elasticsearch so I don't see way how to merge it with path
to files/address to elasticsearch in --benchmark-storage. However, I can
leave it out if you don't want to specify it and we can rely on method
utils.get_project_name.

Could we integrate that into the storage URI? Also, out of curiosity,
does ES have any optional options (anything from hostname, proj name,
index, doctype)?

Artimi · 2016-09-12T08:14:57Z

I've refactored report backends so there is no duplicated code. Therefore, --benchmark-json and --benchmark-compare now works properly in ElasticsearchReportBacked. Check it out.

I don't understand how we should integrate project name to storage URI. We can live without it and just use function utils.get_project_name().

Yes you can add more options to Elasticsearch connection, e.g. use_ssl and certificates paths (see in docs).

ionelmc · 2016-09-14T00:51:57Z

src/pytest_benchmark/elasticsearch_storage.py

+
+    @staticmethod
+    def _benchmark_from_es_record(source_es_record):
+        result = {}


Does the ES record have stuff we don't need?

You can see how single record looks in es in the bottom. I had to split benchmarks of each run to single record and therefore put common information (e.g. machine_info, datetime) to each record. Function _benchmark_from_es_record selects only keys that belongs to single benchmark, while _run_info_from_es_record selects data that are common among these benchmarks.

{ "_index": "benchmark", "_type": "benchmark", "_id": "ukuran_6eac319724dbd9beebf21aafd52d00ece31f61a4_20160912_080154_tests/test_performance.py::test_pedantic", "_score": null, "_source": { "version": "3.0.0", "options": { "timer": "perf_counter", "warmup": false, "disable_gc": false, "max_time": 1, "min_rounds": 5, "min_time": 0.000005 }, "commit_info": { "project": "ukuran", "id": "6eac319724dbd9beebf21aafd52d00ece31f61a4", "dirty": false }, "stats": { "stddev": 0.0000033206052045174887, "q1": 0.005085450800015679, "stddev_outliers": 2, "q3": 0.005090232499969716, "rounds": 10, "median": 0.005088654999963182, "max": 0.0050920920999487865, "iqr": 0.000004781699954037148, "outliers": "2;0", "min": 0.005080863300008787, "iqr_outliers": 0, "ld15iqr": 0.005080863300008787, "mean": 0.005087793719976617, "iterations": 10, "hd15iqr": 0.0050920920999487865 }, "machine_info": { "node": "ubuntu", "processor": "x86_64", "machine": "x86_64", "system": "Linux", "release": "4.4.0-36-generic", "python_implementation": "CPython", "python_build": [ "default", "Jul 5 2016 12:43:10" ], "python_implementation_version": "3.5.2", "python_compiler": "GCC 5.4.0 20160609", "python_version": "3.5.2" }, "group": null, "fullname": "tests/test_performance.py::test_pedantic", "name": "test_pedantic", "datetime": "2016-09-12T08:01:57.865021", "param": null, "params": null }, "sort": [ 1473667317865 ] }

ionelmc · 2016-09-14T01:43:02Z

About the project name, could the scheme be es+http://host/index/doc/project or es+http://host?project_name=foobar? What do you think?

Artimi · 2016-09-15T08:13:31Z

I think that adding project_name to --benchmark-storage would only confuse user. Don't you like solution I made? That project_name is deduced automatically?

Artimi · 2016-09-19T06:35:13Z

@ionelmc Hi, so what do you think about those two PRs? Will you eventually merge it? Is there some problem? What's your plan with it?

ionelmc · 2016-09-19T06:40:46Z

Yes they will be merged. I was a bit busy last week.

ionelmc · 2016-09-19T06:45:24Z

Regarding project_name I like the fact there is a sensible default. It's just that it's a backend specific (even if other future backends might need it as well), so it makes sense to "integrate" it somehow in the storage URI. Do you like the ?project_name=foobar idea? It's a bit verbose but acceptable given how verbose other stuff is and how often it's required ...

ionelmc · 2016-09-19T06:50:37Z

(?project_name=foobar being optional, of course)

codecov-io · 2016-09-26T14:52:10Z

Current coverage is 56.71% (diff: 42.91%)

Merging #58 into master will decrease coverage by 4.21%

@@             master        #58   diff @@
==========================================
  Files            15         20     +5   
  Lines          1423       1564   +141   
  Methods           0          0          
  Messages          0          0          
  Branches        260        271    +11   
==========================================
+ Hits            867        887    +20   
- Misses          507        631   +124   
+ Partials         49         46     -3

Powered by Codecov. Last update 7198674...f44cfba

Artimi · 2016-09-26T14:55:25Z

Ok, I added ?project_name=foobar although I don't think it fits there. It required more magic with config.
I also fixed problem with unserializable parameters in a similar manner you've done it.

ionelmc · 2016-09-26T17:05:30Z

src/pytest_benchmark/elasticsearch_storage.py

@@ -11,7 +32,7 @@ def __init__(self, elasticsearch_hosts, elasticsearch_index, elasticsearch_docty
        self._elasticsearch_hosts = elasticsearch_hosts
        self._elasticsearch_index = elasticsearch_index
        self._elasticsearch_doctype = elasticsearch_doctype
-        self._elasticsearch = elasticsearch.Elasticsearch(self._elasticsearch_hosts)
+        self._elasticsearch = elasticsearch.Elasticsearch(self._elasticsearch_hosts, serializer=SaveElasticsearchJSONSerializer())


Note that this will create an error at runtime if elasticsearch could not be imported (None is not callable).

If elasticsearch could not be imported it raises ImportError before we get to this point. The control is at start of ElasticsearchStorage.__init__ method.

Artimi · 2016-10-06T06:59:45Z

@ionelmc So is there something more I could do with this pull request? Can you merge it now?

ionelmc · 2016-10-06T07:10:15Z

src/pytest_benchmark/plugin.py

+        config.option.__dict__["benchmark_project_name"] = project_name
+        config.option.__dict__["benchmark_autosave"] = get_tag(project_name)
+        return ElasticReportBackend(config)
+    else:


Regarding #58 (comment), can we also support legacy-style paths (auto-convert them to file://<value> if value doesn't contain ://)? Tests are failing anyway because of this :)

@ionelmc Done, thanks for the fast reply.

ionelmc · 2016-10-06T07:11:13Z

@Artimi I think that's the last nit there, getting pretty close now.

ionelmc · 2016-10-09T12:36:56Z

Don't worry about the conflicts, I can do the rebasing.

…standalone CLI.

ionelmc · 2016-10-09T16:14:25Z

Ok so I've played with this a bit, what I've found:

the ES backend doesn't support namespacing by machine_id. If you run benchmarks on different machines, and they do into same storage then you'll get wrong comparisons. Such "wrong" use should be allowed, but only by use of explicit options. The defaults should prevent the user from making such mistake (saving benchmarks run on different machines/vms into same "place").
the CLI part (the standalone py.test-benchmark bin) is still rigged to use only the file storage. It should work with any backend.

Another strange thing in the CLI is that it don't use any machine_id (it has access to all the runs, regardless of what machine_id).

What I'm proposing here:

Rip out the file/ES reporting classes. IOW: move out all the storage-concerned parts into the storage classes.
Instantiate the storage as early as possible.

For purpose of speeding this up a bit I did the rebasing and refactoring in this branch https://github.com/ionelmc/pytest-benchmark/tree/es-storage-cli-support (refactoring commit: 7d38c66). I probably broke something in there (waiting on tests, don't consider it something final).

So (how) would it be possible to support the machine_id namespacing in the ES storage?

I can explain more if not clear, or feel free to argue if you disagree :-)

Artimi · 2016-10-10T15:12:11Z

Thanks for your suggestions. I removed ReportBackend classes and returned handle_saving and handle_loading to Session.
I also added machine_id to the id of elasticsearch document and benchmark_id which uniquely identifies each benchmark.
Last thing was the cli which should now work with elasticsearch too.

ionelmc · 2016-10-10T15:12:42Z

@Artimi hmmm, it would be easier if you'd work new changes on top of https://github.com/ionelmc/pytest-benchmark/tree/es-storage-cli-support no?

Artimi · 2016-10-10T15:17:47Z

Ah ok, I thought we will talk about it here. I will cherry pick tomorrow.

Artimi · 2016-10-11T10:22:05Z

@ionelmc I've pushed requested changes to our organization repo https://github.com/qntln/pytest-benchmark/tree/es-storage-cli-supportt. It seems that I don't have permissions to write to your branch https://github.com/ionelmc/pytest-benchmark/tree/es-storage-cli-support.

ionelmc · 2016-10-11T10:36:01Z

@Artimi it seems it's a private org. You can either push -f to feature-reporting-backend or make another PR?

Artimi · 2016-10-11T11:15:27Z

@ionelmc I force pushed it here.

ionelmc · 2016-11-24T00:54:53Z

src/pytest_benchmark/utils.py

+def get_tag(project_name=None):
+    info = get_commit_info(project_name)
+    parts = []
+    if info['project']:


@Artimi I plan to remove the project part (to get back the old "brief" naming style from 3.0) in the next release (alpha2). Let me know if you have objections or not.

ionelmc reviewed Sep 14, 2016
View reviewed changes

ionelmc reviewed Sep 26, 2016

View reviewed changes

ionelmc requested changes Oct 6, 2016

View reviewed changes

Petr Šebek added 5 commits October 9, 2016 15:14

Separate report backend

7734158

Save records to elasticsearch

a325fbb

Save project name to commit info and elasticsearch _id

53241cc

Create mapping of elasticsearch to prevent analysis

6857f64

Get project name from mercurial

31b75e7

Petr Šebek and others added 6 commits October 9, 2016 15:20

Project name as a part of --benchmark-storage

1255a91

Handle unserializable parameters to elasticsearch

a14e352

Legacy benchmark-storage without schema

904af6f

Merge last two changelog entries.

6bbe117

Styling.

39eca25

Require pytest 2.8 at minimum (we don't test with <2.8 anymore).

e2dd7bf

ionelmc approved these changes Oct 9, 2016

View reviewed changes

ionelmc added 4 commits October 9, 2016 15:48

Styling.

8bcb09b

Add note about backwards compat.

51d49ed

Some restructuring.

b9c586a

Heavy hand refactoring to allow the new ES storage to be used in the …

7d38c66

…standalone CLI.

ionelmc added 5 commits October 10, 2016 03:06

Add missing file.

51c3faf

Update assertion.

c06c537

Also update template.

dec243b

Pin it and fix env name.

d4dbb04

Fix imports and some mockings.

544e023

Petr Šebek added 2 commits October 11, 2016 10:19

handle_{saving,loading} to Session

4018589

Add to es storage benchmark_id and machine_id

264d645

Artimi force-pushed the feature-reporting-backend branch from 3cb4230 to 264d645 Compare October 11, 2016 11:14

ionelmc merged commit 264d645 into ionelmc:master Oct 22, 2016

ionelmc reviewed Nov 24, 2016

View reviewed changes

varac mentioned this pull request Mar 21, 2017

Graph soledad benchmark results pixelated/puppet-pixelated#65

Closed

Elasticsearch report backend #58

Elasticsearch report backend #58

Conversation

Artimi commented Sep 9, 2016

ionelmc commented Sep 9, 2016

Artimi commented Sep 9, 2016

ionelmc commented Sep 9, 2016

Artimi commented Sep 9, 2016

ionelmc commented Sep 9, 2016

ionelmc commented Sep 9, 2016

Artimi commented Sep 9, 2016

Artimi commented Sep 9, 2016

ionelmc commented Sep 9, 2016

Artimi commented Sep 9, 2016

ionelmc commented Sep 9, 2016

Artimi commented Sep 12, 2016

ionelmc Sep 14, 2016

Choose a reason for hiding this comment

Artimi Sep 15, 2016

Choose a reason for hiding this comment

ionelmc commented Sep 14, 2016

Artimi commented Sep 15, 2016

Artimi commented Sep 19, 2016

ionelmc commented Sep 19, 2016

ionelmc commented Sep 19, 2016

ionelmc commented Sep 19, 2016

codecov-io commented Sep 26, 2016 • edited Loading

Current coverage is 56.71% (diff: 42.91%)

Artimi commented Sep 26, 2016

ionelmc Sep 26, 2016

Choose a reason for hiding this comment

Artimi Sep 27, 2016

Choose a reason for hiding this comment

Artimi commented Oct 6, 2016

ionelmc Oct 6, 2016

Choose a reason for hiding this comment

Artimi Oct 6, 2016

Choose a reason for hiding this comment

ionelmc commented Oct 6, 2016

ionelmc commented Oct 9, 2016

ionelmc commented Oct 9, 2016

Artimi commented Oct 10, 2016

ionelmc commented Oct 10, 2016

Artimi commented Oct 10, 2016

Artimi commented Oct 11, 2016

ionelmc commented Oct 11, 2016

Artimi commented Oct 11, 2016

ionelmc Nov 24, 2016

Choose a reason for hiding this comment

codecov-io commented Sep 26, 2016 •

edited

Loading