Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch report backend #58

Merged
merged 31 commits into from
Oct 22, 2016
Merged

Conversation

Artimi
Copy link

@Artimi Artimi commented Sep 9, 2016

Hello,

we found your plugin very useful. I wrote this PR so we can be able to store information about benchmarks in elasticsearch. As we have plenty of services and we want to track them in CI this is more suitable option for us than files. We can also then generate reports in Kibana as this:
pytest_benchmark_elasticsearch_kibana_graph

I put saving to files to FileReportBackend and created new one ElasticsearchReportBackend. Package elasticsearch is only optional. I hope I kept spirit of your awesome plugin and did not bend it too much.

@ionelmc
Copy link
Owner

ionelmc commented Sep 9, 2016

Very interesting! It'll take me bit to fully review but meanwhile some question regarding the --benchmark-elasticsearch-* options. I'm not familiar with elasticsearch but could we compress those into a single option that looks like host/index:doctype? It's just that there are so many options people will forget that they are using pytest :D

Also, --benchmark-elasticsearch-host=URL is that an url or a hostname?

@Artimi
Copy link
Author

Artimi commented Sep 9, 2016

@ionelmc Thanks for fast response. I can squash that information to host/index/doctype so there will be only one argument. However, elasticsearch python lib allows to specify more hosts and it would be more complicated to add that functionality later if we squashed the information. Maybe I can add more hosts right now although we don't need it currently. Something like --benchmark-elasticsearch-hosts host1:9200 host2:9200 and --benchmark-elasticsearch-index-doctype benchmark/benchmark?

--benchmark-elasticsearch-host should have only host not url. Thanks for catching that.

@ionelmc
Copy link
Owner

ionelmc commented Sep 9, 2016

To be honest I'd rather have a single option for the backends, eg: --benchmark-storage=elasticsearch://host1,host2/index/doctype or --benchmark-storage=file://... (with some backwards compatibility that supports old style paths).

Can you specify doctype without index? elasticsearch://host1,host2//doctype could work but is confusing, perhaps a different scheme like elasticsearch://host1,host2/index:doctype? What kinds of values are acceptable for index and doctype?

@Artimi
Copy link
Author

Artimi commented Sep 9, 2016

Ok we can do that. But I would not like to invent new protocols like elasticsearch://. You are connected to elasticsearch via HTTP so it does not make sense to add protocol elasticsearch://, although I don't know how else tell plugin to use elasticsearch.

You usually call host/index/doctype so I think that colon instead of slash would be confusing. And doctype is part of one index so if you want to specify doctype you probably want to specify also index.

Index and doctype are lower case strings that don't start with underscore and dot. elastic/elasticsearch#6736.

@ionelmc
Copy link
Owner

ionelmc commented Sep 9, 2016

Well is either inventing an URI protocol or inventing 5 more command line options. There are way too many options, I'm just trying to keep the option list under control. What if someone else comes with a different backend?

Does elasticsearch support just http? Or https too? Is there support for other protocols?

@ionelmc
Copy link
Owner

ionelmc commented Sep 9, 2016

In hindsight, elasticsearch+https://host1:123,host2:234/index/doc should support that as well.

Unfortunately the URI parser in python doesn't support well the host1:123,host2:234 convention. Implementing a custom parser shouldn't be that hard right?

@Artimi
Copy link
Author

Artimi commented Sep 9, 2016

Elasticsearch can be accessed via http or https so I can use this format elasticsearch+https://host1:123,host2:234/index/doc that you suggested. I will implement it manually.

@Artimi
Copy link
Author

Artimi commented Sep 9, 2016

I removed those --benchmark-elasticsearch* options and updated behavior of --benchmark-storage as requested.

@ionelmc
Copy link
Owner

ionelmc commented Sep 9, 2016

One more thing that I was wondering about is what if there would be some package holding the report backends, eg: src/pytest_benchmark/backends/something.py? But leave that thought for now (just typing so I don't forget) ...

There are two issues I think:

  • If you enable the ES storage then --benchmark-json option don't work anymore? I think users would expect that it works.
  • The two reporting backends kinda have similar code, it should be factored out a bit (later on it would bite me in the ass cause I'll forget to update one of the backends 😁)

Also, the --benchmark-project - should it be merged into the --benchmark-storage URI - what do you think?

@Artimi
Copy link
Author

Artimi commented Sep 9, 2016

  • Oh yeah I left --benchmark-json functionality out. I will add it there
  • I was thinking about refactoring but then I didn't want to change your code too much. I will move similar code to BaseReportingBackend
  • --benchmark-project should contain name of the project so you can later filter by it in elasticsearch so I don't see way how to merge it with path to files/address to elasticsearch in --benchmark-storage. However, I can leave it out if you don't want to specify it and we can rely on method utils.get_project_name.

@ionelmc
Copy link
Owner

ionelmc commented Sep 9, 2016

On Fri, Sep 9, 2016 at 5:53 PM, Petr Šebek notifications@github.com wrote:

--benchmark-project should contain name of the project so you can later
filter by it in elasticsearch so I don't see way how to merge it with path
to files/address to elasticsearch in --benchmark-storage. However, I can
leave it out if you don't want to specify it and we can rely on method
utils.get_project_name.

​Could we integrate that into the storage URI?​ Also, out of curiosity,
does ES have any optional options (anything from hostname, proj name,
index, doctype)?

@Artimi
Copy link
Author

Artimi commented Sep 12, 2016

I've refactored report backends so there is no duplicated code. Therefore, --benchmark-json and --benchmark-compare now works properly in ElasticsearchReportBacked. Check it out.

I don't understand how we should integrate project name to storage URI. We can live without it and just use function utils.get_project_name().

Yes you can add more options to Elasticsearch connection, e.g. use_ssl and certificates paths (see in docs).


@staticmethod
def _benchmark_from_es_record(source_es_record):
result = {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the ES record have stuff we don't need?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see how single record looks in es in the bottom. I had to split benchmarks of each run to single record and therefore put common information (e.g. machine_info, datetime) to each record. Function _benchmark_from_es_record selects only keys that belongs to single benchmark, while _run_info_from_es_record selects data that are common among these benchmarks.

{ "_index": "benchmark", "_type": "benchmark", "_id": "ukuran_6eac319724dbd9beebf21aafd52d00ece31f61a4_20160912_080154_tests/test_performance.py::test_pedantic", "_score": null, "_source": { "version": "3.0.0", "options": { "timer": "perf_counter", "warmup": false, "disable_gc": false, "max_time": 1, "min_rounds": 5, "min_time": 0.000005 }, "commit_info": { "project": "ukuran", "id": "6eac319724dbd9beebf21aafd52d00ece31f61a4", "dirty": false }, "stats": { "stddev": 0.0000033206052045174887, "q1": 0.005085450800015679, "stddev_outliers": 2, "q3": 0.005090232499969716, "rounds": 10, "median": 0.005088654999963182, "max": 0.0050920920999487865, "iqr": 0.000004781699954037148, "outliers": "2;0", "min": 0.005080863300008787, "iqr_outliers": 0, "ld15iqr": 0.005080863300008787, "mean": 0.005087793719976617, "iterations": 10, "hd15iqr": 0.0050920920999487865 }, "machine_info": { "node": "ubuntu", "processor": "x86_64", "machine": "x86_64", "system": "Linux", "release": "4.4.0-36-generic", "python_implementation": "CPython", "python_build": [ "default", "Jul 5 2016 12:43:10" ], "python_implementation_version": "3.5.2", "python_compiler": "GCC 5.4.0 20160609", "python_version": "3.5.2" }, "group": null, "fullname": "tests/test_performance.py::test_pedantic", "name": "test_pedantic", "datetime": "2016-09-12T08:01:57.865021", "param": null, "params": null }, "sort": [ 1473667317865 ] }

@ionelmc
Copy link
Owner

ionelmc commented Sep 14, 2016

About the project name, could the scheme be es+http://host/index/doc/project or es+http://host?project_name=foobar? What do you think?

@Artimi
Copy link
Author

Artimi commented Sep 15, 2016

I think that adding project_name to --benchmark-storage would only confuse user. Don't you like solution I made? That project_name is deduced automatically?

@Artimi
Copy link
Author

Artimi commented Sep 19, 2016

@ionelmc Hi, so what do you think about those two PRs? Will you eventually merge it? Is there some problem? What's your plan with it?

@ionelmc
Copy link
Owner

ionelmc commented Sep 19, 2016

Yes they will be merged. I was a bit busy last week.

@ionelmc
Copy link
Owner

ionelmc commented Sep 19, 2016

Regarding project_name I like the fact there is a sensible default. It's just that it's a backend specific (even if other future backends might need it as well), so it makes sense to "integrate" it somehow in the storage URI. Do you like the ?project_name=foobar idea? It's a bit verbose but acceptable given how verbose other stuff is and how often it's required ...

@ionelmc
Copy link
Owner

ionelmc commented Sep 19, 2016

(?project_name=foobar being optional, of course)

@codecov-io
Copy link

codecov-io commented Sep 26, 2016

Current coverage is 56.71% (diff: 42.91%)

Merging #58 into master will decrease coverage by 4.21%

@@             master        #58   diff @@
==========================================
  Files            15         20     +5   
  Lines          1423       1564   +141   
  Methods           0          0          
  Messages          0          0          
  Branches        260        271    +11   
==========================================
+ Hits            867        887    +20   
- Misses          507        631   +124   
+ Partials         49         46     -3   

Powered by Codecov. Last update 7198674...f44cfba

@Artimi
Copy link
Author

Artimi commented Sep 26, 2016

Ok, I added ?project_name=foobar although I don't think it fits there. It required more magic with config.
I also fixed problem with unserializable parameters in a similar manner you've done it.

@@ -11,7 +32,7 @@ def __init__(self, elasticsearch_hosts, elasticsearch_index, elasticsearch_docty
self._elasticsearch_hosts = elasticsearch_hosts
self._elasticsearch_index = elasticsearch_index
self._elasticsearch_doctype = elasticsearch_doctype
self._elasticsearch = elasticsearch.Elasticsearch(self._elasticsearch_hosts)
self._elasticsearch = elasticsearch.Elasticsearch(self._elasticsearch_hosts, serializer=SaveElasticsearchJSONSerializer())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will create an error at runtime if elasticsearch could not be imported (None is not callable).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If elasticsearch could not be imported it raises ImportError before we get to this point. The control is at start of ElasticsearchStorage.__init__ method.

@Artimi
Copy link
Author

Artimi commented Oct 6, 2016

@ionelmc So is there something more I could do with this pull request? Can you merge it now?

config.option.__dict__["benchmark_project_name"] = project_name
config.option.__dict__["benchmark_autosave"] = get_tag(project_name)
return ElasticReportBackend(config)
else:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding #58 (comment), can we also support legacy-style paths (auto-convert them to file://<value> if value doesn't contain ://)? Tests are failing anyway because of this :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ionelmc Done, thanks for the fast reply.

@ionelmc
Copy link
Owner

ionelmc commented Oct 6, 2016

@Artimi I think that's the last nit there, getting pretty close now.

@ionelmc
Copy link
Owner

ionelmc commented Oct 9, 2016

Don't worry about the conflicts, I can do the rebasing.

@ionelmc
Copy link
Owner

ionelmc commented Oct 9, 2016

Ok so I've played with this a bit, what I've found:

  • the ES backend doesn't support namespacing by machine_id. If you run benchmarks on different machines, and they do into same storage then you'll get wrong comparisons. Such "wrong" use should be allowed, but only by use of explicit options. The defaults should prevent the user from making such mistake (saving benchmarks run on different machines/vms into same "place").
  • the CLI part (the standalone py.test-benchmark bin) is still rigged to use only the file storage. It should work with any backend.

Another strange thing in the CLI is that it don't use any machine_id (it has access to all the runs, regardless of what machine_id).

What I'm proposing here:

  • Rip out the file/ES reporting classes. IOW: move out all the storage-concerned parts into the storage classes.
  • Instantiate the storage as early as possible.

For purpose of speeding this up a bit I did the rebasing and refactoring in this branch https://github.com/ionelmc/pytest-benchmark/tree/es-storage-cli-support (refactoring commit: 7d38c66). I probably broke something in there (waiting on tests, don't consider it something final).

So (how) would it be possible to support the machine_id namespacing in the ES storage?

I can explain more if not clear, or feel free to argue if you disagree :-)

@Artimi
Copy link
Author

Artimi commented Oct 10, 2016

Thanks for your suggestions. I removed ReportBackend classes and returned handle_saving and handle_loading to Session.
I also added machine_id to the id of elasticsearch document and benchmark_id which uniquely identifies each benchmark.
Last thing was the cli which should now work with elasticsearch too.

@ionelmc
Copy link
Owner

ionelmc commented Oct 10, 2016

@Artimi hmmm, it would be easier if you'd work new changes on top of https://github.com/ionelmc/pytest-benchmark/tree/es-storage-cli-support no?

@Artimi
Copy link
Author

Artimi commented Oct 10, 2016

Ah ok, I thought we will talk about it here. I will cherry pick tomorrow.

@Artimi
Copy link
Author

Artimi commented Oct 11, 2016

@ionelmc I've pushed requested changes to our organization repo https://github.com/qntln/pytest-benchmark/tree/es-storage-cli-supportt. It seems that I don't have permissions to write to your branch https://github.com/ionelmc/pytest-benchmark/tree/es-storage-cli-support.

@ionelmc
Copy link
Owner

ionelmc commented Oct 11, 2016

@Artimi it seems it's a private org. You can either push -f to feature-reporting-backend or make another PR?

@Artimi
Copy link
Author

Artimi commented Oct 11, 2016

@ionelmc I force pushed it here.

@ionelmc ionelmc merged commit 264d645 into ionelmc:master Oct 22, 2016
def get_tag(project_name=None):
info = get_commit_info(project_name)
parts = []
if info['project']:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Artimi I plan to remove the project part (to get back the old "brief" naming style from 3.0) in the next release (alpha2). Let me know if you have objections or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants