Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Componentisation has gone too far. #19957

Closed
lowenna opened this issue Feb 3, 2016 · 16 comments
Closed

Componentisation has gone too far. #19957

lowenna opened this issue Feb 3, 2016 · 16 comments

Comments

@lowenna
Copy link
Member

lowenna commented Feb 3, 2016

While I agree that making clearly delineated separation between components is good, the approach that has gone on in the past few months is severely hindering ongoing development and bug-fixing of the docker engine.

I concur that certain components make perfect sense to be in independent, such as libcontainer or libnetwork. But for other components, engine-api for example being the latest I hit today, IMO it simply makes it harder to develop changes/bug fixes to the engine for reasonably little gain. As it stands as I type, it is not even possible to build the docker engine with the latest engine api - it's not in a state that can be vendored in and compiled. That makes issues such as #18077 (comment) almost impossible to fix in the current code base.

My suggestion is to keep separation at the pkg/ level, rather than fracturing out into so many distinct repos. The current design philosophy makes submitting changes incrementally hard and onerous, and I'm far from convinced that this level of componentisation is worth the advantages. It sets the the learning curve bar for new contributors at 'extreme'. And 'complicated++' for relatively seasoned ones.

It also (in the current CI world where Windows CI is only enabled on docker/docker) is prone to increased error possibilities. We should try to make this easy for everyone, not overly complicate things.

Componentisation to separate repos makes sense if the overall code base were stable. However, this remains far from the case with the level of changes going through daily and weekly.

Please discuss 😄

@shykes @docker-maintainers @thaJeztah @friizm @icecrime

@duglin
Copy link
Contributor

duglin commented Feb 3, 2016

@jhowardmsft agreed - and mentioned it here: docker/engine-api#65 (comment)

In addition to what you mentioned, the spit at the repo level forces a duplication of testing too - and a lot of work to "mock" things. This extra "mocking" effort ends up 1) being a lot of wasted effort when the real data is available to us if we just used docker itself, and 2) gives us a false sense of security because we only end up verifying the most basic cases and miss the unplanned interactions of a real live daemon. And if the answer there is "we'll catch those when we vendor it into the engine", then we should just move it back to the engine now because then its not really as independent as we claim since it can't be truly tested on its own.

With go's import/pkg system there's really no reason (at this time) not to keep it with the engine and let people import it if they want just that bit. If at some point the engine-api (and the go-builder repo too) are really stand-alone and can be fully run/tested w/o the engine, then I think we can reconsider this but right now its just pain with no real gain. I don't see what we get by being in a different repo that we can't do in a pkg (aside from a different set of maintainers, which IMO isn't necessarily a good thing since the apis & builder are so related to the engine anyway).

@icecrime
Copy link
Contributor

icecrime commented Feb 3, 2016

This is an important discussion to have, I'll share my current feelings.

[...] for other components, engine-api for example being the latest I hit today, IMO it simply makes it harder to develop changes/bug fixes to the engine for reasonably little gain

Yes, that is absolutely true: it makes it harder on the Engine. But the reasoning behind this decision is that Engine is not alone anymore. Tons of projects out there use the Docker API, both outside (Rancher, CI systems, ...), and inside Docker (Swarm, Compose, ...). One of the most recurring criticism against the project is that the API changes too much, sometimes for no good reason.

Having a reference API implementation that is upstream from all products seems like a sane thing to do: we can discuss changes to the API independently of any particular product, and when such changes are approved, all clients stand on equal ground and can manage their own related patches.

What we are experiencing on Engine right now is what every other project out there has been experiencing forever. We made it worse on Engine, to make it better for everybody else.

Componentisation to separate repos makes sense if the overall code base were stable. However, this remains far from the case with the level of changes going through daily and weekly.

I honestly wonder how much more stable would the API be if it hadn't been so trivial to change, including by mistake (!), through the Engine codebase. I think it's worth considering that maybe nothing is stable because there is no technical incentive for it to be.

This extra "mocking" effort ends up 1) being a lot of wasted effort when the real data is available to us if we just used docker itself, and 2) gives us a false sense of security because we only end up verifying the most basic cases and miss the unplanned interactions of a real live daemon.

This statement is Engine centric: what about all the other clients depending on the API today?

If at some point the engine-api (and the go-builder repo too) are really stand-alone and can be fully run/tested w/o the engine, then I think we can reconsider this but right now its just pain with no real gain.

Do you think anybody is happy having those repositories untested? The only reason there are no tests is because nobody had time, and the fact that it's issue docker/engine-api#1 makes it pretty clear how much we think it matters. It's extremely ironic for a maintainer to complain that something is missing.

I'm just going to conclude this way. Am I convinced that a separate repository is a better idea than a sub pkg/? No, but I don't think we tried hard enough to make that call just yet. Maybe the conclusion will be that it indeed doesn't make sense, we'll say "well, we tried", and we'll move on. I know for sure that I'd be happier writing code against docker/engine-api than against docker/docker/pkg/engine-api.

@duglin
Copy link
Contributor

duglin commented Feb 3, 2016

On Feb 3, 2016, at 12:07 PM, Arnaud Porterie notifications@github.com wrote:

This is an important discussion to have, I'll share my current feelings.
[...] for other components, engine-api for example being the latest I hit today, IMO it simply makes it harder to develop changes/bug fixes to the engine for reasonably little gain

Yes, that is absolutely true: it makes it harder on the Engine. But the reasoning behind this decision is that Engine is not alone anymore. Tons of projects out there use the Docker API, both outside (Rancher, CI systems, ...), and inside Docker (Swarm, Compose, ...). One of the most recurring criticism against the project is that the API changes too much, sometimes for no good reason

The problem with this is that the engine-api’s don’t stand alone either - they have no meaning without an impl behind it - and the only real impl (as of now) that matters is the engine. No change should go into the engine-api, that will require a change to the engine, without also being reviewed by the engine team. Otherwise the engine-api change will be an orphan and will probably need to be removed. Wasted time/effort.

See comment below about “API changes too much”.

Having a reference API implementation that is upstream from all products seems like a sane thing to do: we can discuss changes to the API independently of any particular product, and when such changes are approved, all clients stand on equal ground and can manage their own related patches.

An API w/o impl makes no sense. All api changes should be reviewed within the full context of Docker, meaning (mainly) the engine. Unless you actually see a point in time where the API has a feature that the Docker engine doesn’t support - but even if that’s possible (way way in the future) I don’t see it in the cards right now.
What we are experiencing on Engine right now is what every other project out there has been experiencing forever. We made it worse on Engine, to make it better for everybody else.

Componentisation to separate repos makes sense if the overall code base were stable. However, this remains far from the case with the level of changes going through daily and weekly.

I honestly wonder how much more stable would the API be if it hadn't been so trivial to change, including by mistake (!), through the Engine codebase. I think it's worth considering that maybe nothing is stable because there is no technical incentive for it to be.

I don’t think the stability, or lack of it, can be blamed on which repo the code sits in. That’s strictly on us for allowing changes when we maybe shouldn’t have. And, in fact, a separate repo will make it worse because there’s not only nothing stopping us from making these types of changes in the future, but now we’re doing it w/o any analysis (testing) of what it means for the engine and the rest of Docker.

This extra "mocking" effort ends up 1) being a lot of wasted effort when the real data is available to us if we just used docker itself, and 2) gives us a false sense of security because we only end up verifying the most basic cases and miss the unplanned interactions of a real live daemon.

This statement is Engine centric: what about all the other clients depending on the API today?

People care about the engine-api's because they want compatibility with Docker - its not some random API out there. So, yes its very engine centric because the engine-api repo is for the engine :-) Trying to give the main interface of a product a different lifecycle/feature-set/etc... from the product itself seems really odd to me - unless its stand-alone, which its not.
If at some point the engine-api (and the go-builder repo too) are really stand-alone and can be fully run/tested w/o the engine, then I think we can reconsider this but right now its just pain with no real gain.

Do you think anybody is happy having those repositories untested? The only reason there are no tests is because nobody had time, and the fact that it's issue docker/engine-api#1 docker/engine-api#1 makes it pretty clear how much we think it matters. It's extremely ironic for a maintainer to complain that something is missing.

But we have tests - its called integration cli :-) What you’re suggesting is that we either duplicate it or mock it. Who has the time to maintain two sets of tests that should look pretty similar? Who really wants to?
I'm just going to conclude this way. Am I convinced that a separate repository is a better idea than a sub pkg/? No, but I don't think we tried hard enough to make that call just yet. Maybe the conclusion will be that it indeed doesn't make sense, we'll say "well, we tried", and we'll move on. I know for sure that I'd be happier writing code against docker/engine-api than against docker/docker/pkg/engine-api.

A better approach would be to move it to the pkg dir and see how that feels first. It provides the isolation 3rd party tooling needs w/o the pain. Once we have proven that’s ok AND we have a solid framework for testing and reviewing PRs for that pkg w/o the rest of Docker then we can consider moving it to a new repo.

@icecrime
Copy link
Contributor

icecrime commented Feb 3, 2016

Let's ask actual & potential users: ping @vieux @aluzzardi @ibuildthecloud @justincormack @ehazlett @donhcd @dnephin @vdemeester.

@dnephin
Copy link
Member

dnephin commented Feb 3, 2016

I think the split into engine-api is actually working extremely well. It's forcing us to consider API changes in isolation, which leads to more consistent and well thought out API design. It allows other projects which depend on the API to watch for changes and contribute feedback without the torrent of implementation-related changes in the engine.

We've just started this process, so I think it's normal there are some pain points. We should work on addressing them directly instead of abandoning the idea entirely.

What we are experiencing on Engine right now is what every other project out there has been experiencing forever. We made it worse on Engine, to make it better for everybody else.

+100

Most of the issues I see here I think can be fixed with some better tooling.

But we have tests - its called integration cli

integration-cli tests are part of the problem. If you test from the client you completely miss any API regressions. The client can cover up unexpected api changes. The end-to-end tests for engine should really be using the API directly, not the cli.

engine-api is a special case. You don't test an interface by testing its implementation. I think the relevant tests for an interface are:

  • is it consistent
  • is it backwards compatible
  • is it complete

The first two can be verified without any implementation, and the last can't be verified without every implementation, so can't be verified in engine either (each implementation is responsible for testing itself).

Testing of packages like go-builder is easy to do in isolation. All the important unit and functional tests can be done in the go-builder repo. engine only needs a couple high level end-to-end tests for the integration of the package.

I don’t think the stability, or lack of it, can be blamed on which repo the code sits in.

I think it makes a big difference. When you look at API changes as a set of 20+ other file changes it's easy to miss problems. You're focused on too many things at once. If you look at API changes in the context of "the api repo" and there are only 2 or 3 file changes, it's much easier to focus on the API design, and not get lost in the implementation changes.

It also makes it possible for other projects who depend on the API to be involved, because watching the engine-api repo is a much easier than watching every change to docker/docker.

With go's import/pkg system there's really no reason (at this time) not to keep it with the engine and let people import it if they want just that bit.
A better approach would be to move it to the pkg dir and see how that feels first.

If we went with this approach it would never be extracted. In order to make any progress on this we need to pull it out and start severing the ties early. It's painful at first, but it gives us a chance to identify the pain points and create the right tooling to address it.

Keeping it in the engine means the pain points are only felt by those outside the engine team, which aren't in a position to fix them.

I personally think this split out is long overdue, not premature.

@LK4D4
Copy link
Contributor

LK4D4 commented Feb 3, 2016

Also I saw that @bobrik used engine-api for his project.

@cpuguy83
Copy link
Member

cpuguy83 commented Feb 3, 2016

Note too that once the server api is moved out things should be less painful since the API change happens in one place, vendor-in and update the implementation only.

I agree it's certainly been more painful, however I think in the long run it's a huge benefit to all projects involved and will help interoperability.
Changes to the API becomes a discussion of how the change affects multiple projects rather than "what cool new thing can we make Docker do"

@vieux
Copy link
Contributor

vieux commented Feb 3, 2016

Swarm maintainer here,

I am very happy about the engine-api split.
Swarm is currently using samalba/dockercient and it's extremely outdated, every time a new command/flag is merged in engine, we have to update samalba/dockercient.
Even worse, we aren't even notified.

When swarm-1.1.0-rc1 was released, it was incompatible with docker update and all the new networking flags like --ip or --alias because there is no way for us to see what's new in engine except by reading the changelog, that was produced with engine 1.10-rc1 (at the same time)

Hopefully we had time to fix during our "code freeze" period, but it's really not ideal.
Thanks to engine-api we are going to share:

  1. the API types so they will always be up to date (--ip or --alias would have worked right away)
  2. the API Client
  3. In the future, the API server, so we now when a new endpoint is added (like docker update)

@vdemeester
Copy link
Member

Well almost everything I wanted to say has been said, either by @icecrime or @dnephin 😻.

I think the split into engine-api is actually working extremely well. It's forcing us to consider API changes in isolation, which leads to more consistent and well thought out API design. It allows other projects which depend on the API to watch for changes and contribute feedback without the torrent of implementation-related changes in the engine.

We've just started this process, so I think it's normal there are some pain points. We should work on addressing them directly instead of abandoning the idea entirely.

+1000

integration-cli tests are part of the problem. If you test from the client you completely miss any API regressions. The client can cover up unexpected api changes. The end-to-end tests for engine should really be using the API directly, not the cli.

I couldn't agree more. That's a thought that is turning and turning in my head (both for docker/docker and docker/libcompose project), that all our integration tests are cli-oriented ; there should be way more API integration tests than cli's.

The docker/engine-api most certainly needs more maintainers (from docker/docker but most importantly from other project that actively depends on it — swarm, etc., soon libcompose and probably many other projects). And @cpuguy83 said it very well ; couldn't agree more.

Changes to the API becomes a discussion of how the change affects multiple projects rather than "what cool new thing can we make Docker do"

In short, I agree it appears a little bit more painful but in the long run, I really think it's gonna pay of !

@tiborvass
Copy link
Contributor

@vdemeester in fact, we should probably use the engine-api client in our tests too unless necessary to use the binary client. The client itself has to be tested to make sure it does the right api calls.

@ehazlett
Copy link
Contributor

ehazlett commented Feb 3, 2016

Coming from using a third party Go client I'm a huge +1 for this. It's very painful to try and keep up with the change (types and lack of support for X in the third party, etc). Having a single client that is always in step reduces a tremendous amount of maintenance burden.

@unclejack
Copy link
Contributor

API testing and integration-cli testing are two different things. Before integration-cli, we only had API testing and testing done using internal APIs. Tests were passing and we still had lots of regressions between Docker versions. I don't see API based tests deprecating integration-cli.

I think these separate repositories have been useful when some projects had to evolve rapidly (libnetwork and distribution), but it's also a problem for projects which use these. I don't think we should merge these back in to the main repository for now, but we need to make sure we don't cause problems for downstream users of these packages and that we avoid problems for those who use them.

@dnephin
Copy link
Member

dnephin commented Feb 4, 2016

API testing and integration-cli testing are two different things. Before integration-cli, we only had API testing and testing done using internal APIs. Tests were passing and we still had lots of regressions between Docker versions. I don't see API based tests deprecating integration-cli.

Definitely, I didn't mean to suggest that integration-cli was completely unnecessary. It's still important to have some tests that cover the client behaviour, I was just suggesting that engine behaviour should be tested at the API level. There have been major regressions because of a lack of API level testing.

I was trying to suggest that having tests at different boundaries isn't test duplication, it's a necessity for complete test coverage.

@justincormack
Copy link
Contributor

There are good use cases for engine-api, as outlined above. But please do file issues for enabling Windows CI where it is not being run, and for "testing X is too difficult", these are definitely issues that need fixing.

@ibuildthecloud
Copy link
Contributor

Just random two cents. The engine-api project immensely helps us (Rancher). If it is a separate project as in github.com/docker/engine-api or in Docker as github.com/docker/pkg/engine-api makes little difference to me. One possible thing I could think of is that currently docker as a A LOT of pkg/ projects that I use. I could possibly see an issue in which I want a different version of the client, but not a different version of all the pkg/ projects. For example, what I could see is that often we are conservative in bumping the client version (for most of our stuff we still use Docker 1.6 API) so we might want an old client but newer code from pkg/ for something else. From that perspective I'd I lean toward a separate repo.

Another thing about separating repos is that because you have to change in two places and coordinate it makes people think a lot more about the impact. This may have an advantage in that I feel like the API often changes too rapidly and people to easily break backwards compatibility.

@thaJeztah
Copy link
Member

engine-api has been moved back in, so I'll close this issue, but as always, feel free to comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests