Description
I've chatted with @honzakral about this in the past, but I wanted to resurface the suggestion: with the 5.X series we are having a lot of trouble getting elasticsearch-py 5.X to talk nicely to 1.X clusters, whereas elasticsearch-py 2.X mostly talked fine to both 2.X and 1.X clusters.
Context: If an organization has a large Python code base that needs to communicate with multiple versions of Elasticsearch, that is really hard to do with the current packaging strategy of elasticsearch-py, because you can't install multiple versions of the same library in the same virtualenv. In the past we've gotten away with this because orgs were using other libraries (e.g. pyes) to communicate with 0.X and elasticsearch-py for 1.X, so both could be installed in the same virtualenv.
Proposal: Since elasticsearch-py already does branch-based development and uses relative imports, I propose that each major version be published under a different package name, e.g. elasticsearch5. This way large Python projects can import as many major versions of the client library as they want and not worry about cross-version compatibility.
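To make the proposal concrete, here is a minimal sketch of how an application could pick up whichever versioned clients happen to be installed. The package names `elasticsearch1` and `elasticsearch5` follow the naming suggested above and are assumptions, as is the helper `load_client_modules`; nothing here is an existing elasticsearch-py API.

```python
import importlib


def load_client_modules(package_by_major):
    """Return {major_version: module} for every versioned package that imports.

    `package_by_major` maps an Elasticsearch major version to the name of
    the client package that targets it. Packages that are not installed
    are skipped rather than treated as errors.
    """
    modules = {}
    for major, package_name in package_by_major.items():
        try:
            modules[major] = importlib.import_module(package_name)
        except ImportError:
            continue
    return modules


# Hypothetical package names, following the "elasticsearch5" example above.
CLIENT_PACKAGES = {1: "elasticsearch1", 5: "elasticsearch5"}

# Maps each cluster major version to its client module, side by side in
# one virtualenv; empty if none of the packages are installed.
clients = load_client_modules(CLIENT_PACKAGES)
```

Application code would then choose `clients[1]` or `clients[5]` depending on which cluster it is talking to, instead of relying on one client version to cover every cluster.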
Tradeoffs: This solution mainly translates to some maintenance overhead in the tests, where the code actually references the package name. Since elasticsearch-py already uses branches, this ought to be relatively straightforward. You can also still publish master to "elasticsearch" to maintain backwards compatibility or, if applying patches across branches gets annoying, use symlinks and tell setuptools explicitly which packages to build (this wouldn't work for tests).
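The test overhead mostly amounts to rewriting absolute imports of the package name. A rough sketch of that rewrite in Python, assuming the only references are `import` / `from ... import` statements at the start of a line; `relocate_imports` is a hypothetical helper, not part of elasticsearch-py or setuptools:

```python
import re


def relocate_imports(source, old="elasticsearch", new="elasticsearch5"):
    """Rewrite `import old...` / `from old... import` lines to use `new`.

    Only a whole top-level package name is matched, so an unrelated
    package such as `elasticsearch_dsl` is left untouched.
    """
    pattern = re.compile(
        r"^([ \t]*(?:from|import)[ \t]+)" + re.escape(old) + r"(?=[\s.]|$)",
        re.MULTILINE,
    )
    return pattern.sub(lambda match: match.group(1) + new, source)


example = (
    "from elasticsearch import Elasticsearch\n"
    "import elasticsearch.helpers\n"
    "from elasticsearch_dsl import Search\n"
)
relocated = relocate_imports(example)
```

Running it a second time on its own output changes nothing, because `elasticsearch5` no longer matches the old name, so it is safe to reapply after merging patches between branches.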
Alternatives: Last time I chatted with Honza he suggested two alternatives. The first is to just use the latest version of elasticsearch-py to talk to all cluster versions; it would probably mostly work, because it is the request bodies that change from version to version. For Elasticsearch 2.X this worked out OK, but Elasticsearch 5.X appears to have introduced some gnarly if-else complexity around node status endpoints and the like, and our integration tests are definitely complaining about elasticsearch-py 5.X's ability to communicate with 1.X clusters, especially around the fields/store_fields/source_fields search API change. The second alternative he suggested was a 'compat' namespace using AddonClient. I think this could definitely work, but it puts a lot of maintenance overhead on customers running multiple versions, who would each have to maintain the compatibility layer.
Totally cool with the answer being "let's just have the customers that need it do namespaces with AddonClient" or "we'll put the if-else logic in elasticsearch-py", but I wanted to raise the suggestion again. I think there is a lot of good prior art in system packaging (apache vs apache2), Java packaging (shaded jars and package relocation) and even Python packaging (boto2 vs boto3) to support the usefulness of changing the package name on major version changes.
Yelp#1 was my go at this in our fork; it took me about 15 minutes to relocate the package with the help of sed (also, fwiw, well done: elasticsearch-py is a super simple library!).