
Proposal for meta-generator #1315

Closed
wants to merge 6 commits into from


pwittrock
Member

No description provided.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 1, 2017
```sh
kubegen --apis-dir notpkg/apis --apis-dir pkg/notapis
```

- run all code generators against discovered APIs
Contributor

Does "discovered API" mean that a types.go file exists?

Member Author

No, I don't think it should since folks may want to break up a monolithic types.go file. I was thinking purely based off directory structure, and then the presence of +genclient
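A minimal sketch of what discovery by directory structure plus the `+genclient` tag could look like. This is not the proposal's implementation: `hasGenclientTag` and `discoverAPIDirs` are hypothetical names, and the file contents are passed in as a map so the example runs without touching the disk. A real tool would walk the `--apis-dir` tree instead.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"sort"
	"strings"
)

// hasGenclientTag reports whether Go source text contains a "+genclient"
// comment tag on a comment-only line. (Hypothetical helper for illustration.)
func hasGenclientTag(src string) bool {
	scanner := bufio.NewScanner(strings.NewReader(src))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "//") {
			continue
		}
		tag := strings.TrimSpace(strings.TrimPrefix(line, "//"))
		if tag == "+genclient" || strings.HasPrefix(tag, "+genclient:") {
			return true
		}
	}
	return false
}

// discoverAPIDirs returns the sorted set of directories holding at least one
// tagged file. files maps a slash-separated path to the file's contents.
func discoverAPIDirs(files map[string]string) []string {
	seen := map[string]bool{}
	for name, src := range files {
		if hasGenclientTag(src) {
			seen[path.Dir(name)] = true
		}
	}
	dirs := make([]string, 0, len(seen))
	for d := range seen {
		dirs = append(dirs, d)
	}
	sort.Strings(dirs)
	return dirs
}

func main() {
	files := map[string]string{
		"pkg/apis/apps/v1/deployment.go": "// +genclient\ntype Deployment struct{}\n",
		"pkg/apis/apps/v1/helpers.go":    "package v1\n",
		"pkg/util/strings.go":            "package util\n",
	}
	fmt.Println(discoverAPIDirs(files)) // prints [pkg/apis/apps/v1]
}
```

Note that no file has to be named types.go for a directory to be discovered, which matches the point about splitting up a monolithic types.go.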

provide both as positional arguments.

```sh
kubegen apps apps/v1 apps/v1beta1 extensions extensions/v1beta1
```
Contributor

do you have to pass internal and external groups? Will conversions be created if you only pass the external groups? What about clientsets and informers?

Member Author

A bare group always refers to the internal version; an external API requires a version.
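To illustrate the rule, here is a hedged sketch of how positional arguments such as `apps` and `apps/v1` might be classified. The `apiTarget` type and `parseTarget` function are hypothetical names made up for this example, not part of the proposal.

```go
package main

import (
	"fmt"
	"strings"
)

// apiTarget is a hypothetical parsed form of one positional argument.
type apiTarget struct {
	Group    string
	Version  string // empty for the internal (unversioned) form
	Internal bool
}

// parseTarget treats "<group>" as internal and "<group>/<version>" as external.
func parseTarget(arg string) (apiTarget, error) {
	parts := strings.Split(arg, "/")
	switch {
	case len(parts) == 1 && parts[0] != "":
		return apiTarget{Group: parts[0], Internal: true}, nil
	case len(parts) == 2 && parts[0] != "" && parts[1] != "":
		return apiTarget{Group: parts[0], Version: parts[1]}, nil
	default:
		return apiTarget{}, fmt.Errorf("invalid API target %q: want <group> or <group>/<version>", arg)
	}
}

func main() {
	for _, arg := range []string{"apps", "apps/v1", "extensions/v1beta1"} {
		t, err := parseTarget(arg)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-20s group=%s version=%q internal=%v\n", arg, t.Group, t.Version, t.Internal)
	}
}
```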

@pwittrock
Member Author

PTAL

@pwittrock
Member Author

cc @kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-feature-requests

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. labels Nov 6, 2017
@pwittrock
Member Author

@caesarxuchao @mbohlool

Mind taking a look at this?

@mikedanese
Member

@BenTheElder @ixdy please take a look at bazel pieces.

@sttts
Contributor

sttts commented Nov 6, 2017

/cc @munnerz @nikhita @ericchiang

@pwittrock
Member Author

@thockin No, the goal is not to move the existing generators to a single binary. Existing Makefiles and build systems will continue to function without change.

User story: As an author of Kubernetes extensions, I want a simple way to run the code generators against my types.go files.

The goal is to provide a simplified interface for running code generators so that folks building extensions don't need to spend hours reading about each of our code generators and writing and maintaining dozens of lines of bespoke Makefile logic.

@thockin
Member

thockin commented Nov 13, 2017

I see. Devil's advocate: If we linked all our generators into a single binary and made it single-pass we could:

  • get rid of the crazy Makefile (one pass would probably be fast enough to just always do generation)
  • use the same tool for ourselves as 3rd parties
  • get rid of the crazy Makefile

A big part of the perf hit is parsing ~ the whole codebase multiple times.

@sttts
Contributor

sttts commented Nov 14, 2017

A big part of the perf hit is parsing ~ the whole codebase multiple times.

This is worth a prototype. This was not in our focus at all, but if your claim holds, it would be a great improvement.

@pwittrock
Member Author

@thockin I like the idea. Though it wasn't the original focus, any area where we can simultaneously reduce both development complexity and build times seems like a good place to focus our efforts.

I suggest we prototype performance optimizations after completing the MVP described in this proposal. I am guessing that doing so will require some restructuring of the code generator libraries to inject singleton instances of dependencies.

@sttts
Contributor

sttts commented Nov 14, 2017

I suggest we prototype performance optimizations after completing the MVP described in this proposal. I am guessing that doing so will require some restructuring of the code generator libraries to inject singleton instances of dependencies.

The MVP exists with the shell scripts already. We need to prototype a binary which can launch the other generators. Command line is actually secondary for that.

@pwittrock
Member Author

What is still required to get consensus on this?

Command line is actually secondary for that

Do you mean compiling them into a single binary is secondary?

@sttts
Contributor

sttts commented Nov 17, 2017

Do you mean compiling them into a single binary is secondary?

There is work to be done to make the generators work inside one binary (some preparations for that have already been merged, so it shouldn't be too hard). But to evaluate whether Tim's use-case has any chance to work out performance-wise (it would be fantastic for our tooling complexity), the actual command line is secondary.

But: if the use-case "generate for the whole kube/kube repo" is feasible, it might have a bigger influence on our generator command line: we have more than one clientset in the repo, i.e. we would need a way to specify that on a kubegen command line, while sharing the gengo Universe (i.e. to avoid re-parsing for every clientset).
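The idea of sharing one parsed universe can be pictured as plain memoization. The sketch below is not gengo's API: `pkgInfo`, `sharedUniverse` and the injected `parse` function are stand-ins invented for illustration, and the cache is not safe for concurrent use.

```go
package main

import "fmt"

// pkgInfo stands in for whatever a parser produces for one package.
type pkgInfo struct{ Path string }

// sharedUniverse memoizes parse results so each package is parsed at most
// once, no matter how many clientsets or generators ask for it.
type sharedUniverse struct {
	parse  func(path string) pkgInfo
	cache  map[string]pkgInfo
	Parses int // number of real parse calls, for demonstration
}

func newSharedUniverse(parse func(string) pkgInfo) *sharedUniverse {
	return &sharedUniverse{parse: parse, cache: map[string]pkgInfo{}}
}

// Package returns the cached result for path, parsing it on first use.
func (u *sharedUniverse) Package(path string) pkgInfo {
	if p, ok := u.cache[path]; ok {
		return p
	}
	p := u.parse(path)
	u.Parses++
	u.cache[path] = p
	return p
}

func main() {
	u := newSharedUniverse(func(path string) pkgInfo { return pkgInfo{Path: path} })
	// Two clientsets that overlap on k8s.io/api/apps/v1.
	clientsets := [][]string{
		{"k8s.io/api/apps/v1", "pkg/apis/apps"},
		{"k8s.io/api/apps/v1", "k8s.io/api/core/v1"},
	}
	for _, dirs := range clientsets {
		for _, d := range dirs {
			u.Package(d)
		}
	}
	fmt.Println("lookups: 4, parses:", u.Parses) // prints lookups: 4, parses: 3
}
```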

@thockin
Member

thockin commented Nov 19, 2017

Even if we reduced the Makefile to a single target that figured out what needed to be generated-for, it would be a HUGE reduction in complexity.

@pwittrock
Member Author

Thanks @thockin and @sttts for the feedback. Your comments have been insightful and will help us come up with an implementation that leads to better future iterations.

This proposal has been open for discussion for nearly 3 weeks without any comments suggesting significant changes in design or goals. Can we call this lazy consensus and assume that we will address additional things we discover as we implement V0 and then iterate? Is there anything we need to resolve that we can't iterate on after this is merged?

But to evaluate whether Tim's use-case has any chance to work out performance-wise (it would be fantastic for our tooling complexity), the actual command line is secondary.

I suggest we worry about performance optimizations after we have a working implementation of the desired user experience.

But: if the use-case "generate for the whole kube/kube repo" is feasible, it might have a bigger influence on our generator command line: we have more than one clientset in the repo, i.e. we would need a way to specify that on a kubegen command line, while sharing the gengo Universe (i.e. to avoid re-parsing for every clientset).

We will want to get here eventually, but I would prefer to launch something that meets the use case of reducing complexity for extensions (since extension authors will not have already built out complex Makefiles that meet their needs). It sounds like there is enough complexity around multi-clientsets that we don't want to block starting development on issues related to this.

@thockin
Member

thockin commented Nov 20, 2017

I have no further opinion on this PR as-is :)

@sttts
Contributor

sttts commented Nov 20, 2017

We will want to get here eventually, but I would prefer to launch something that meets the use case of reducing complexity for extensions (since extension authors will not have already built out complex Makefiles that meet their needs). It sounds like there is enough complexity around multi-clientsets that we don't want to block starting development on issues related to this.

It makes a big difference in scope and the command line interface. I don't think it's a good idea to leave this question open if we don't have a plan for how to extend the tool towards that use-case. I am fine with starting small, but let's flesh out a sketch of what this "multi clientset" use-case could look like. Until then, I am against calling the proposal ready.

@pwittrock
Member Author

It makes a big difference in scope and the command line interface

I don't have a great grasp of the details around multi-clientset. Can you elaborate? How does it change the interface for the command? What additional information is missing that will need to be present?

@sttts
Contributor

sttts commented Nov 21, 2017

I don't have a great grasp of the details around multi-clientset. Can you elaborate? How does it change the interface for the command? What additional information is missing that will need to be present?

Currently the command handles one clientset, i.e. all given API groups are put into one clientset. For the kubernetes/kubernetes use-case we need multiple sets of API groups and API directories. I could imagine this:

kubegen \
  internal --api-dir k8s.io/api --api-dir pkg/apis --target pkg/client apps/v1 apps/v1beta1 core/v1 \
  versioned --api-dir k8s.io/api --target k8s.io/client-go/kubernetes apps/v1 core/v1 \
  ...

This would also require detecting overlapping directories for deepcopy, conversions, etc. (all the files that are created inside the API dirs) so that we avoid double-generation.
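One way to picture the double-generation check is to collect the API directories of all clientsets into a single de-duplicated list before running the per-directory generators. This is only a sketch under that assumption; the `clientset` struct and `uniqueAPIDirs` are hypothetical names, and the directory values are illustrative.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// clientset is a hypothetical description of one clientset invocation.
type clientset struct {
	Name    string
	APIDirs []string
}

// uniqueAPIDirs returns every API directory referenced by any clientset
// exactly once, normalized and sorted, so that per-directory generators
// (deepcopy, conversion, ...) run a single time per directory.
func uniqueAPIDirs(sets []clientset) []string {
	seen := map[string]bool{}
	var dirs []string
	for _, cs := range sets {
		for _, d := range cs.APIDirs {
			clean := path.Clean(d) // "a/b/" and "a/b" count as the same directory
			if !seen[clean] {
				seen[clean] = true
				dirs = append(dirs, clean)
			}
		}
	}
	sort.Strings(dirs)
	return dirs
}

func main() {
	sets := []clientset{
		{Name: "internal", APIDirs: []string{"k8s.io/api/apps/v1", "pkg/apis/apps"}},
		{Name: "versioned", APIDirs: []string{"k8s.io/api/apps/v1/", "k8s.io/api/core/v1"}},
	}
	// prints [k8s.io/api/apps/v1 k8s.io/api/core/v1 pkg/apis/apps]
	fmt.Println(uniqueAPIDirs(sets))
}
```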

@sttts
Contributor

sttts commented Nov 21, 2017

I want to see a very minimal proof-of-concept of the multi-clientset idea to prove that the speed-up from not reparsing 25 times pays off. If it does, it's awesome for Kubernetes itself. If not, we don't need that additional complexity in the CLI implementation.

@pwittrock
Member Author

The multi-clientset as shown definitely muddies up the interface and seems like it is more complicated and harder to build consensus around. It is also trying to solve a different problem - speeding up Kubernetes builds vs improving development velocity by reducing complexity.

I am worried that focussing on performance optimizations will keep us from being able to move forward on the original stated goal - make running code generators for extensions accessible to non-kubernetes-veterans. While I would love to reduce complexity in the main repo and support multi-clientsets, I don't have the domain expertise to drive such a thing.

Why don't we agree that multi-clientset support is out of scope for v1 and revisit it in the future?

@sttts
Contributor

sttts commented Nov 22, 2017

Why don't we agree that multi-clientset support is out of scope for v1 and revisit it in the future?

If our command line is extensible towards a multi-client use-case, I am fine with that plan. But I would like to keep that door open.

IMO, using a multi-subcommand syntax the command line is not that bad. The question is whether the CLI library that we will use supports this.

@pwittrock
Member Author

I think we have enough agreement on the details to green light development. We can leave the door open to adapting this proposal based on feedback given during development.

IMO, using a multi-subcommand syntax the command line is not that bad. The question is whether the CLI library that we will use supports this.

Let's just use one that does (e.g. cobra).

If our command line is extensible towards a multi-client use-case, I am fine with that plan. But I would like to keep that door open.

Sure, as long as trying to do this doesn't block progress on the stated goals or keep us from making any improvements.

@sttts
Contributor

sttts commented Nov 22, 2017

@pwittrock and I talked on Slack about how to move forward with this. There was some misunderstanding about who does what and how the bigger use-case of multi-clientsets fits into the design proposal and its implementation plan.

As far as I see, there are the following work items:

  1. sketch what multi-clientsets could look like in the CLI as an extension of the single-clientset use-case
  2. refactor main.go of defaulter-gen, conversion-gen, client-gen, lister-gen, informer-gen, deepcopy-gen to be free of non-command-line logic such that the generators can be instantiated either in kubegen or in their standalone-binaries
  3. prototype multi-clientsets without reparsing packages again and again (requires the client-gen part of 2)
  4. implement this proposal (requires 2)
  5. extend it to multi-clientset (requires 1, 2, 3, 4)

My plan is to do the most complicated part of (2), namely client-gen, as a blueprint for the others ASAP, possibly this week or early next week. I would be happy if @pwittrock or his colleagues could help with refactoring the other generators' main.go files following the client-gen example.

When client-gen's main.go is done, I will do the clientset-only prototype of (3) with a shared gengo universe by KubeCon.

Step 4 does not depend on (3) to be finished. For (1), here is an extension from the previous sketch that should unblock (4):

kubegen --dry-run --license BSD \
  internal --apis-dir k8s.io/api --apis-dir pkg/apis --client-output pkg/client --api-version apps/v1 --api-version apps/v1beta1 --api-version core/v1 -- \
  versioned --apis-dir k8s.io/api --client-output k8s.io/client-go/kubernetes --api-group apps --api-group core \
  ...

In other words: we define a separator -- between sub-commands and their flags and pre-parse the command line into multiple logical lists of flags, which we then pass to cobra. I.e. the command above will behave as two single, independent invocations:

kubegen internal --dry-run --license BSD --apis-dir k8s.io/api --apis-dir pkg/apis --client-output pkg/client --api-version apps/v1 --api-version apps/v1beta1 --api-version core/v1
kubegen versioned --dry-run --license BSD --apis-dir k8s.io/api --client-output k8s.io/client-go/kubernetes --api-group apps --api-group core
  ...

This way we can implement the single-clientset use-case soon'ish, with a feasible extension to multi-clientset later, without too much ugliness in the command line.

Potentially, we will add more sub-commands to kubegen; in particular, we know that we need kubegen deepcopy <directories-that-are-no-apis>. The multi-clientset sketch above naturally extends this way, i.e. we can handle real api-groups and support other non-api generation steps in parallel.

@pwittrock
Member Author

SGTM

```py
http_archive(
name = "io_k8s_rules_go",
url = "https://github.com/kubernetes/bazelbuild/releases/download/v1.8.0/rules_go-1.8.0.tar.gz",
```
Member

is this a new repo? or is it a clone of something?

### Running the Bazel target

```sh
bazel run //:kubegen
```
Member

having a separate bazel run command means that you can't just do bazel build and have it do the right thing.

for MVP this is fine, but you can certainly get more benefits by turning this into something more like a genrule.

There are 2 methods for running kubegen

- Directly through the kubegen command line by downloading the binary and running it from the project root
- Through Bazel by adding rules to `WORKSPACE` and `BUILD.gazel`
Member

typo: gazel should be bazel?

```sh
bazel run //:kubegen
```

### Bazel options
Member

by "options" I'm assuming you mean attributes of the kubegen rule?

### Running the Bazel target

```sh
bazel run //:kubegen
```
Member

one other thing that might be worth noting is that calling bazel run kubegen may need to be followed by bazel run gazelle or whatever, since source files may change.

@k8s-github-robot k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Feb 6, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 4, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
