Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Helm release to name resources #1246

Merged

Conversation

TBBle
Copy link
Contributor

@TBBle TBBle commented Aug 5, 2020

This set of changes ensures that every k8s resource created by the Helm chart includes the fullname macro which includes the name of the Helm release.

It's a big change, and is probably much easier to read commit-by-commit, due to the distance between different parts for some of the specific changes.

The main observable effect is that all the resources that were hard-coded to om-XXX or open-match-XXX are now RELEASE-open-match-XXX, although if RELEASE contains "open-match" then they will be RELEASE-XXX.

In a special case (or rather, removing a special case), e01c77b causes the open-match-scale subchart's resources to be RELEASE-open-match-scale-XXX, which means a release named "open-match" gets, e.g., Deployments named open-match-backend and open-match-open-match-scale-backend. It wasn't clear to be if this is desirable, and if not, that commit can be removed, producing open-match-scale-backend instead, but cementing that the subcharts are really just part of the main chart.

Similarly, redis no longer has a fullnameOverride set, so its resources will generally be of the form RELEASE-redis-XXX, e.g., a StatefulSet named open-match-redis-master.

Note: Redis PVCs names change with this. Since it's a StatefulSet, the old PVCs are left behind. It's still possible to set redis.fullnameOverride to "om-redis" to use any existing PVCs in a namespace, when upgrading. If your release is named om, this will be the default name anyway.

As noted in the values.yaml, the hostName and serviceAccount values now default to empty to generate names, and can be overridden where desired.

To regain the short-names, you can use the YAML file (inside the Details) with --values. The only thing that will differ is the ConfigMap created by open-match-scale for Grafana dashboards, because it was already named open-match-scale-dashboard. If you do this, you can only have one such release in a namespace, and you cannot then have a release named 'om' or 'open-match' installed at the same time, as some resources will conflict.

# Reinstate all the resource names as if I hadn't done anything
fullnameOverride: "om"
open-match-scale:
  fullnameOverride: "om"
  configs:
    default:
      configName: "om-configmap-default"
    override:
      configName: "om-configmap-override"
  scaleBackend:
    hostName: "om-scale-backend"
  scaleFrontend:
    hostName: "om-scale-frontend"

open-match-customize:
  fullnameOverride: "om"

redis:
  fullnameOverride: "om-redis"
  serviceAccount:
    name: "open-match-redis-service"
global:
  kubernetes:
    serviceAccount: "open-match-unprivileged-service"

configs:
  default:
    configName: "om-configmap-default"
  override:
    configName: "om-configmap-override"

Tested to support multiple installs using the following commands from install/helm/open-match

helm dep build
helm install open-match . --values om-install-all.values.yaml --set-string global.image.tag=1.0.0
helm install another-open-match . --values om-install-all.values.yaml --set-string global.image.tag=1.0.0

with TLS keys generated per TLS Encryption

where om-install-all-values.yaml is set to install as many of the templated resources as possible, as below:

open-match-core:
    enabled: true
    swaggerui:
        enabled: true
    redis:
        enabled: true

open-match-customize:
    enabled: true
    function:
        enabled: true
    evaluator:
        enabled: true

open-match-scale:
    enabled: true

open-match-telemetry:
    enabled: true

open-match-override: true

usingHelmTemplate: false

ci: false

global:
    tls:
        enabled: true
    telemetry:
        grafana:
            enabled: true
        jaeger:
            enabled: true
        prometheus:
            enabled: true

I repeated the test with --set ci=true appended to both, to confirm that the testing resources do not conflict.

Fixes: #1235

Paul "Hampy" Hampson added 4 commits August 4, 2020 14:20
Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
Picked up the VSCode Yaml auto-formatter.

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
It's not used.

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
They're not used.

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
@TBBle
Copy link
Contributor Author

TBBle commented Aug 5, 2020

My build failed, but I cannot see why, as I don't have access to the logs.

@TBBle
Copy link
Contributor Author

TBBle commented Aug 5, 2020

I tried to run the build on our own GCP account, but it fails to submit:

PS D:\third-party-repos\open-match> gcloud builds submit --config=cloudbuild.yaml --substitutions=_OM_VERSION=DEV .
Creating temporary tarball archive of 483 file(s) totalling 13.0 MiB before compression.
Uploading tarball of [.] to [gs://GCP_ID_cloudbuild/source/1596626035.019253-a76e8f7428934d93bc573650e12090a6.tgz]
ERROR: (gcloud.builds.submit) INVALID_ARGUMENT: Request contains an invalid argument.

There's no clues as to which argument is invalid, nor any logs in Cloud Build that I can see.

{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT"
  }
}

If it'd give better feedback, I could try and arrange access to a Linux host with Docker installed, and set up cloud-build-local here to diagnose this. Although I assume the INVALID_ARGUMENT would be the same...

@Laremere
Copy link

Laremere commented Aug 5, 2020

Due to limitations from google cloud build, you are required to join the Open Match mailing list to see build failures: https://groups.google.com/forum/#!forum/open-match-discuss

@Laremere Laremere self-requested a review August 5, 2020 18:15
@TBBle
Copy link
Contributor Author

TBBle commented Aug 6, 2020

I've joined the discussion group, and I can see the build logs now.

@TBBle
Copy link
Contributor Author

TBBle commented Aug 6, 2020

CI failure was

--- FAIL: TestMatchFunctionMatchCollision (0.45s)
    fetch_matches_test.go:158: 
        	Error Trace:	fetch_matches_test.go:158
        	Error:      	Should be true
        	Test:       	TestMatchFunctionMatchCollision
        	Messages:   	218.840219ms

which should not have any connection to my changes, but appears to have been a 20ms overshoot for a 200ms interval, but it's not clear if this is a failure, because it's not clear what this is actually trying to protect/ensure. Introduced in #1210 by @Laremere, which by my reading was supposed to be ensuring that registrationInterval had elapsed?

We'll see if a new test run fails the same way.

@TBBle TBBle force-pushed the release_name_in_helm_resource_names branch from 9b4b721 to 697ddcb Compare August 6, 2020 08:56
@TBBle
Copy link
Contributor Author

TBBle commented Aug 6, 2020

Good. Test rerun got past the timeout issue, and is now seeing the changes.

@TBBle TBBle force-pushed the release_name_in_helm_resource_names branch 2 times, most recently from 9ffce26 to 9b954c0 Compare August 6, 2020 10:16
@TBBle TBBle marked this pull request as ready for review August 6, 2020 10:44
@TBBle
Copy link
Contributor Author

TBBle commented Aug 6, 2020

I still need to fix the two Grafana dashboards which hard-code the pod names (redis.json and go-processes.json) to use labels instead, but apart from that this should be ready for review.

Edit: Oops, a few hard-coded hostnames referring to prometheus and jaeger as well. I suspect these would previously have only worked when a release named open-match was installed with the telemetry systems enabled, otherwise the names would still be wrong as there was never a fullnameOverride for the telemetry subcharts like there was for the redis subchart.

Paul "Hampy" Hampson added 10 commits August 7, 2020 16:54
This ensures that multiple OpenMatch installs in a single namespace do
not attempt to install Redis stacks with the same resource names.

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
This avoids conflicts between multiple Open Match installations in the
same namespace.

`openmatch.fullname` named template per Helm default chart.

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
This makes the existing global.kubernetes.serviceAccount value an
override if specified, but if left unspecified, an appropriate name will
be chosen.

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
This makes the hostname templates more standard in their case, because
there is no need to coordinate the hostname with the superchart.

This chart still uses a lot of templates from the open-match chart
though, so it's not yet standalone-installable.

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
A specific ConfigMap can be applied in the same way it was previously,
by overriding configs.default.configName and
configs.override.configName, in which case it is up to the person doing
the deployment to manage name conflicts.

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
@TBBle TBBle force-pushed the release_name_in_helm_resource_names branch 2 times, most recently from d477a25 to dcaacad Compare August 7, 2020 07:38
@TBBle TBBle force-pushed the release_name_in_helm_resource_names branch 5 times, most recently from 3a3e541 to 8441dd0 Compare August 7, 2020 09:07
Paul "Hampy" Hampson added 2 commits August 7, 2020 19:27
This fixes an existing issue where the Jaeger connection URLs in
the configuration would be incorrect if your Helm chart was not
installed as a release named "open-match".

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
This allows us to access the Prometheus subchart's named template to get
the correct Service name for the datasource.

This fixes an existing issue where the Prometheus data source URL in
Grafana would be incorrect if your Helm chart was not installed as
a release named "open-match".

Signed-off-by: Paul "Hampy" Hampson <p_hampson@wargaming.net>
@TBBle TBBle force-pushed the release_name_in_helm_resource_names branch from 8441dd0 to 993e48b Compare August 7, 2020 09:28
@TBBle
Copy link
Contributor Author

TBBle commented Aug 7, 2020

Okay, this should now be complete. I believe I've looked at and addressed (or ignored) every occurrance of om- or open-match- in this codebase.

The docs are going to need work to align with this too.

Both the Jaeger and Grafana setup were already invalid if you didn't name your release open-match, as the configuration had hard-coded names to match names the subcharts generate based on the release name. This was fixed as well, although it pushes some hard-coding into the Makefile when trying to generate partial manifests that refer to data only known by previous or future partial manifests.

It's also probably feasible to combine Helm options weirdly and get invalid setups, but hopefully that's limited to situations that would be invalid anyway, e.g., enabling Grafana without Prometheus won't give you a usable Grafana, but it didn't before I started either.

@Laremere
Copy link

Hey thanks for the PR.

I've already starting looking through it, but there's a lot going on here so the review will take a bit, and probably take a couple rounds...

Are you able to handle the documentation changes as well?

@TBBle
Copy link
Contributor Author

TBBle commented Aug 11, 2020

Yeah, I'm happy to do the documentation changes as well. I didn't want to start on those until I had some feedback on the work itself, since it lives in a separate repo so I can't make it part of this PR anyway, and I don't want to rework the documentation changes if there's UX-visible changes needed in this PR.

I broke it down into commits that should each be reasonably easy to review. Trying to process it just from the "Files changed" list is hard, due to a lot of adjacent changes for unconnected things, and distant changes in connected things.

Copy link

@Laremere Laremere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is changing the service names, which will screw with anyone who is hard coding access to those. I'm not too worried though - as long as we include in the release notes something about that and needing to set the overrides in the config, I'm good.

On a different note - Do people.... actually think helm is a good tool? Your PR is not to blame, but this is a templating nightmare for what is a few deployments, services, and autoscalers. It's really forcing a lot of unnecessary complexity.

Makefile Show resolved Hide resolved
@TBBle
Copy link
Contributor Author

TBBle commented Aug 12, 2020

On a different note - Do people.... actually think helm is a good tool? Your PR is not to blame, but this is a templating nightmare for what is a few deployments, services, and autoscalers. It's really forcing a lot of unnecessary complexity.

Frankly, this chart setup has been done in a very complex (and rather non-idiomatic) way, which is why it's such a pain to work with. It's not really a Helm thing, it's mostly because it's trying to install a lot of stuff optionally, in very manual ways, and has not really taken good advantage of Helm.

The entire telemetry subchart would be greatly simplified, for example, if we made use of operators rather than directly installing (probably-duplicate) daemonsets and deployments for Prometheus, Grafana, Jaeger, etc.

I haven't pushed on simplification here at all, because I wanted to leave that for #1244 instead; I feel there's already plenty going on in this PR already.

@Laremere
Copy link

Would it be easier to start from scratch?

@TBBle
Copy link
Contributor Author

TBBle commented Aug 13, 2020

I don't think so, particularly. Easier to evolve from working state to working state. And we'd need to have the same discussions either way about what aspects of the current Helm chart UX can be or should be changed, either way.

@@ -30,16 +30,16 @@ var (
// serviceAddresses is a list of all the HTTP hostname:port combinations for generating a TLS certificate.
// It appears that gRPC does not care about validating the port number so we only add the HTTP addresses here.
serviceAddressList = []string{
"om-backend:51505",
"open-match-backend:51505",
"om-demo:51507",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't these values changing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea where they come from. They're not part of the Helm chart generated code. I think they're used in the tutorials?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of them come from https://github.com/googleforgames/open-match/blob/master/install/02-open-match-demo.yaml for example, which appears to be hand-maintained. I assume that will be cleaned up as part of #1244 cleaning up the openmatch-customize subchart.

@Laremere
Copy link

So I think were I am is that this PR looks pretty good, I don't spot in obvious flaws (sans outstanding comments), and the tests are happy. I'm not happy with how the helm charts keep growing in complexity, but I'll approve the PR after the comments are resolved.

@Laremere Laremere merged commit 79e9afe into googleforgames:master Aug 17, 2020
@TBBle TBBle deleted the release_name_in_helm_resource_names branch August 18, 2020 14:50
@TBBle
Copy link
Contributor Author

TBBle commented Aug 18, 2020

Am I correct in assuming that the 0.0.0-dev version in the Helm repository is an autobuild from master?

Also, note to @TBBle, do not forget the doc updates.

@Laremere
Copy link

I'm actually not sure - I haven't used the helm repository and others set it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support installing more than one Open Match instances in the same namespace
5 participants