Initial workflow (OWNERS, nodes, node channels, etc.) #1
Conversation
(force-pushed from b694135 to 3ba7267)
Two high-level questions on this:
The latter. A post-merge Prow hook will collect all of this, massage it into a single JSON blob, and publish it somewhere outside of GitHub for Cincinnati to consume (S3? An app-sre ConfigMap? Quay labels?).
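For concreteness, the graph Cincinnati serves has roughly this shape (nodes plus index-pair edges), so the massaged blob would presumably be something in this vein; this is a sketch, with placeholder versions and elided digests, not a settled format:

```json
{
  "nodes": [
    {"version": "4.1.0-rc.5", "payload": "quay.io/openshift-release-dev/ocp-release@sha256:..."},
    {"version": "4.1.0-rc.6", "payload": "quay.io/openshift-release-dev/ocp-release@sha256:..."}
  ],
  "edges": [[0, 1]]
}
```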
The org name is OpenShift-specific, and folks who need to distinguish between multiple Cincinnati instances can rename their fork (e.g. my fork of openshift/release is wking/openshift-release). But on both points, the repo is new. Push back if I'm not convincing ;).
All good. On the former, it's nice to have a massaged format; it'd be good to keep it close to the way we query the key-value Quay API. On the latter, it's just a minor naming concern. Cincinnati is originally the name of the protocol replacing Omaha; we are slowly conflating the implementation and the metadata format under that name now, but only for the OpenShift-specific API. However, there are already other implementations of the protocol (e.g. the endpoint for nightlies) and other metadata formats (e.g. the FCOS one).
JSON looks like Python dicts and JavaScript objects, so it can't be too human-unfriendly ;). But sure, as long as a YAML/TOML parser is an acceptable dependency for the massaging/CI tools, I'd be ok with that. I'm fine deferring to the release managers; it should be easy to change later.
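For example, the same hypothetical node entry in both forms (the field names are illustrative, not a schema this repo has settled on); YAML is a superset of JSON, so the JSON form would still parse:

```yaml
# YAML form:
version: 4.1.0
payload: quay.io/openshift-release-dev/ocp-release@sha256:...
# Equivalent JSON form:
# {"version": "4.1.0", "payload": "quay.io/openshift-release-dev/ocp-release@sha256:..."}
```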
My long-term goal is to get nightlies into this flow and serve a graph for
I'd expect FCOS metadata to flow out of repos under other orgs. But no sense in diverging data formats. I'll take a closer look at the format in that repo.
Checking the FCOS node definition, it looks like most of the time is spent distinguishing between multiple arches and formats. Those distinctions will eventually be managed in OpenShift with multi-arch images, so I don't think we need to be quite so elaborate, especially out of the gate when we are amd64-only. Do you think the FCOS feeder and this Cincinnati feeder are consuming different-enough artifacts to motivate different input formats? Or do you think my current
(force-pushed from 3ba7267 to bbcf2d8)
I've pushed up bbcf2d8 adding edge handling and doing some other polishing.
(force-pushed from 5b6281e to bfd6034)
I've pushed a WIP script for publishing this data to Quay labels, which we can use until we work out a better way to feed it into Cincinnati. Doesn't actually have POST/DELETE support yet, but you can run it to see what it would change:

```console
$ hack/push-to-quay.py
changing 4.1.0-rc.6 previous.add from [] to [u'4.1.0-rc.5']
(u'4.1.0-rc.6', 'FIXME post', 'https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/manifest/sha256:1dabe42b5c94841fd8736d8f3a80afeaf5f5ad3833cef8d304c419a97b0efbc3/labels')
(u'4.1.0-rc.5', 'FIXME delete', 'https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/manifest/sha256:dc67ad5edd91ca48402309fe0629593e5ae3333435ef8d0bc52c2b62ca725021/labels/io.openshift.upgrades.graph.next.add')
changing 4.1.0-rc.8 previous.add from [u'4.1.0-rc.7'] to []
(u'4.1.0-rc.8', 'FIXME delete', 'https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/manifest/sha256:8250bbf79d4f567e24a7b85795aba97f9e75a9df18738891a0cb6ba42e422584/labels/io.openshift.upgrades.graph.previous.add')
```

That's replacing the
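For reference, the FIXME post/delete entries above correspond to calls like these against Quay's labels API (a hedged sketch: the Bearer token and label ID are placeholders, and note that Quay deletes labels by label ID rather than by key, so the finished script will need to look the ID up first):

```console
$ curl -H "Authorization: Bearer ${TOKEN}" -H 'Content-Type: application/json' \
    -d '{"key": "io.openshift.upgrades.graph.previous.add", "value": "4.1.0-rc.5", "media_type": "text/plain"}' \
    https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/manifest/sha256:1dabe42b5c94841fd8736d8f3a80afeaf5f5ad3833cef8d304c419a97b0efbc3/labels
$ curl -X DELETE -H "Authorization: Bearer ${TOKEN}" \
    "https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/manifest/sha256:dc67ad5edd91ca48402309fe0629593e5ae3333435ef8d0bc52c2b62ca725021/labels/${LABEL_ID}"
```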
(force-pushed from c3ed336 to 91cf6a3)
Ok, fixed up the POST and DELETE handling (although I don't have creds to actually run those, so there are a handful of untested lines in the script). That should be all we need to start using this repo to manage the production graph metadata, vs. directly editing Quay tags. I'll start a follow-up branch to add CI checks to catch things like "listed version does not exist", impossible edge, cyclic graph, version removed from channel, etc. And once this branch lands, I'll add a postsubmit job to Prow to automatically publish after a PR merges.
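For instance, the cyclic-graph check could be a simple depth-first search (a sketch with hypothetical names, not the eventual CI code; it assumes edges come in as (from, to) version pairs):

```python
def find_cycle(edges):
    """Return a list of versions forming a cycle, or None if acyclic.

    edges: iterable of (from_version, to_version) pairs.
    """
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
    state = {}  # version -> 'active' (on the current DFS path) or 'done'

    def visit(node, path):
        state[node] = 'active'
        for child in sorted(graph.get(node, ())):
            if state.get(child) == 'active':
                return path + [child]  # back-edge; the path shows the loop
            if child not in state:
                cycle = visit(child, path + [child])
                if cycle:
                    return cycle
        state[node] = 'done'
        return None

    for node in sorted(graph):
        if node not in state:
            cycle = visit(node, [node])
            if cycle:
                return cycle
    return None


# find_cycle([('4.1.0', '4.1.1'), ('4.1.1', '4.1.0')])
# -> ['4.1.0', '4.1.1', '4.1.0']
```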
(force-pushed from cd30087 to 7d73466)
(force-pushed from 3f93e46 to 1dafd8c)
@wking the metadata you were looking at for FCOS is only used for boot-images. The relevant ones for Cincinnati are (example from the
I think it is fine if they diverge; let's just try to keep the overall approach similar. Relevant semantic differences in FCOS are:
How does this represent edges? It has a
Correct, a rollout only specifies the …

You can observe the current FCOS …

Note: all of these are moving pieces as of today; full details will be in a public repo once settled.
Additionally, we just triggered a 24h-spread rollout; the procedure is coreos/fedora-coreos-streams#17.
Say a user upgrades from A->B, and then some flaw is discovered in B and you mark it a dead-end. How do they get off? With explicit from/to edges you can tune this sort of thing directly, and tooling can automate the tedium (e.g. see 1dafd8c).
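As a minimal sketch of that tuning, using hypothetical versions and the per-edge file layout this repo is growing (edges/{major.minor}/{to}/{from}.json; the file contents shown are an assumption): if 4.2.0 turns out to be a dead-end and 4.2.1 carries the fix, you can publish an explicit 4.2.0->4.2.1 edge without touching either release image:

```console
$ # Hypothetical rescue edge; the JSON contents (a list of channels
$ # carrying the edge) are an assumption.
$ mkdir -p edges/4.2/4.2.1
$ echo '["candidate-4.2"]' > edges/4.2/4.2.1/4.2.0.json
```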
Luke pushed this up to Quay. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.0-rc.0
```
Generated with:

```console
$ mkdir channels/prerelease-4.2
$ ln -rs nodes/4.2/4.2.0-rc.0.json channels/prerelease-4.2/
$ hack/graph-util.py extract-edges 4.2.0-rc.0
```
Luke pushed this up to Quay earlier today. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.1.19
```

We aren't going to tag it into 4.1.z because it failed its 4.2.0-rc.3->4.1.19 promotion gate [1,2]. Possibly because the Kubernetes API server melted down:

```console
$ ERRORS="$(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/8217/artifacts/e2e-aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-7c5476564-hznw4_cluster-version-operator.log)"
$ echo "${ERRORS}" | grep 'server is currently unable to handle the request' | head -n2
E1010 16:01:57.406046 1 memcache.go:135] couldn't get resource list for project.openshift.io/v1: the server is currently unable to handle the request
E1010 16:09:57.352725 1 memcache.go:135] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
$ echo "${ERRORS}" | grep 'server is currently unable to handle the request' | tail -n2
E1010 17:17:57.303418 1 memcache.go:135] couldn't get resource list for user.openshift.io/v1: the server is currently unable to handle the request
E1010 17:17:57.352110 1 memcache.go:135] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
$ echo "${ERRORS}" | grep 'server is currently unable to handle the request' | wc -l
255
```

I think this is just a flake in the 4.2->4.1 rollback, but we're going to skip the release just in case.

[1]: https://openshift-release.svc.ci.openshift.org/releasestream/4-stable/release/4.1.19
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/8217
Luke pushed this up to Quay on the 4th. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.0-rc.1
```
Avoid:

```console
$ hack/graph-util.py extract-edges 4.2.0-rc.1
Traceback (most recent call last):
  File "hack/graph-util.py", line 360, in <module>
    extract_edges_for_versions(directory='.', versions=args.version)
  File "hack/graph-util.py", line 319, in extract_edges_for_versions
    extract_edges(node=node, directory=os.path.join(directory, 'edges', '{major}.{minor}'.format(**match.groupdict()), version))
  File "hack/graph-util.py", line 303, in extract_edges
    with open(os.path.join(directory, '{}.json'.format(previous)), 'w+') as f:
IOError: [Errno 2] No such file or directory: './edges/4.2/4.2.0-rc.1/4.1.18.json'
```
Generated with:

```console
$ ln -rs nodes/4.2/4.2.0-rc.1.json channels/prerelease-4.2/
$ hack/graph-util.py extract-edges 4.2.0-rc.1
```
Luke pushed this up to Quay on the 8th. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.0-rc.2
```

But we're not going to tag it into 4.2.z, because its machine-os-content was from a failed build, the RHCOS pipeline will recycle its tag for its next build, and the machine-os-content referenced from rc.2 will be garbage-collected.
Luke pushed this up to Quay on the 9th. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.0-rc.3
```

But that hit [1], so we're not going to tag it into 4.2.z.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1760103
Luke pushed this up to Quay on the 10th. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.0-rc.4
```

But that hit [1], so we're not going to tag it into 4.2.z.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1721583
Luke pushed this up to Quay a few hours ago. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.0-rc.5
```
Only mark edges into channels to which both the previous and next nodes belong. Also skip edges where the previous node doesn't exist at all.
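In rough pseudocode, the rule amounts to something like this (a sketch with hypothetical names, not the actual graph-util.py code):

```python
def edge_channels(previous, next_version, node_channels):
    """Channels in which a previous->next edge should be marked.

    node_channels maps each known version to its set of channel names.
    Returns None when the previous node does not exist at all, meaning
    the edge should be skipped entirely.
    """
    if previous not in node_channels:
        return None  # previous node missing; skip this edge
    # The next node is the one being processed, so it is assumed to exist.
    # Mark the edge only into channels holding both endpoints.
    return node_channels[previous] & node_channels[next_version]
```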
4.1.18 is not in prerelease-4.2. Pick up the fix from the previous graph-util commit. Generated with:

```console
$ hack/graph-util.py extract-edges 4.2.0-rc.1
```
Also tagging 4.2.0-rc.5 into prerelease-4.2. Generated with:

```console
$ mkdir channels/candidate-4.2
$ ln -rs nodes/4.2/4.2.0-rc.5.json channels/prerelease-4.2/
$ ln -rs nodes/4.2/4.2.0-rc.5.json channels/candidate-4.2/
$ ln -rs nodes/4.1/4.1.18.json channels/candidate-4.2/
$ hack/graph-util.py extract-edges 4.2.0-rc.5
```
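Since a channel is just a directory of symlinks into nodes/, membership is easy to eyeball afterwards (illustrative output, assuming only the commands above have run):

```console
$ ls channels/candidate-4.2/
4.1.18.json  4.2.0-rc.5.json
```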
Because:

```
edge channels [u'prerelease-4.2'] for 4.1.18->4.2.0-rc.1 differ from node channels []
```

is more actionable than the previous:

```
edge channels [u'prerelease-4.2'] differ from node channels []
```
(force-pushed from 15ce1fc to ced4fed)
Luke pushed this up to Quay a few hours ago. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.1.20
```

It failed its upgrade from 4.2.0-rc.5 [1], but we may decide to promote it into channels anyway (at least the 4.1 channels).

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/8325
Luke pushed this up to Quay 27 hours ago. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.0
```
Generated with:

```console
$ ln -rs nodes/4.1/4.1.20.json channels/prerelease-4.1/
$ ln -rs nodes/4.1/4.1.20.json channels/stable-4.1/
$ hack/graph-util.py extract-edges 4.1.20
```
Generated with:

```console
$ mkdir channels/fast-4.2 channels/stable-4.2
$ ln -rs nodes/4.2/4.2.0.json channels/candidate-4.2/
$ ln -rs nodes/4.2/4.2.0.json channels/fast-4.2/
$ ln -rs nodes/4.2/4.2.0.json channels/stable-4.2/
$ hack/graph-util.py extract-edges 4.2.0
```
Thiago pushed this up to Quay yesterday. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.1.21
```
Thiago pushed this up to Quay yesterday. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.1
```
Thiago pushed this up to Quay today, when Jessica noticed that we left 4.2.0->4.2.1 out of the 4.2.1 metadata. 4.2.2 has the same payload except for the fixed previous-version metadata. Generated with:

```console
$ hack/graph-util.py update-node quay.io/openshift-release-dev/ocp-release:4.2.2
```
Generated with:

```console
$ ln -rs nodes/4.1/4.1.21.json channels/prerelease-4.1/
$ hack/graph-util.py extract-edges 4.1.21
```
Generated with:

```console
$ ln -rs nodes/4.2/4.2.1.json channels/prerelease-4.2/
$ hack/graph-util.py extract-edges 4.2.1
```

We left 4.2.0->4.2.1 off the 4.2.1 metadata by mistake, and fixed it by cutting 4.2.2 as described in c9e45b0 (nodes/4.2/4.2.2: Add a new release, 2019-10-25).
Generated with:

```console
$ ln -rs nodes/4.2/4.2.2.json channels/prerelease-4.2/
$ hack/graph-util.py extract-edges 4.2.2
```
Each time, before updating Quay. This is slow (tens of seconds), but it means release admins don't have to bother tracking the nodes and their default edges in this repository (personally, I'd rather track nodes and edges [1], but I'm not a release admin ;). The delay is mitigated by keeping a local cache (outside of version control) of the node metadata.

[1]: openshift#1
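The cache itself can be as simple as one JSON file per pullspec (a hypothetical sketch; the script's actual caching may differ, and the cache location is an assumption):

```python
import hashlib
import json
import os

CACHE_DIR = os.path.expanduser('~/.cache/graph-util')  # hypothetical location


def cached_node_metadata(pullspec, fetch):
    """Return node metadata for pullspec, fetching only on a cache miss.

    fetch: callable taking the pullspec and returning a JSON-serializable dict.
    """
    key = hashlib.sha256(pullspec.encode('utf-8')).hexdigest()
    path = os.path.join(CACHE_DIR, '{}.json'.format(key))
    try:
        with open(path) as f:
            return json.load(f)  # cache hit; skip the slow registry round-trip
    except IOError:
        pass  # cache miss; fall through to fetch
    metadata = fetch(pullspec)
    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    with open(path, 'w') as f:
        json.dump(metadata, f)
    return metadata
```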
Still need to work out edge handling, but here's what I have so far for folks to look at. CC @lucab, @steveej