-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[origin-aggregated-logging 207] Add diagnostics for aggregated logging #10964
Conversation
[test] |
cc @sosiouxme first pass review.. |
} | ||
for _, name := range clusterReaderRoleBindingNames.List() { | ||
if !boundServiceAccounts.Has(name) { | ||
r.Error("AGL0610", nil, fmt.Sprintf(clusterReaderUnboundServiceAccount, name, project, project, name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of repeating arguments, use the %[n]s
arg-specifier notation in the template. So
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:%[2]s:%[1]s
|
||
func newFakeDiagnostic(t *testing.T) *fakeDiagnostic { | ||
return &fakeDiagnostic{ | ||
messages: make(map[string]fakeLogMessage, 20), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make
is not used often. This would normally be:
map[string]fakeLogMessage{}
func (d *AggregatedLogging) Check() types.DiagnosticResult { | ||
project := retrieveLoggingProject(d.result, d.masterConfig, d.OsClient) | ||
if len(project) != 0 { | ||
checkServiceAccounts(d, d, project) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look really, really weird. Is there a reason you couldn't have d.result
as the first arg for each of these and skip the Error/Debug/Warn/Info
definitions above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sosiouxme changed it to facilitate testing of the code by introducing an interface 'diagnosticReporter' that would allow verifying various messages were produced. It may be there is a way to utilize the actual code and retrieve the results from where it saves them. I sort of envision moving this reporter to a higher, public interface so it could be used across the spectrum of diagnostics for testing.
cc @openshift/ui-review |
@openshift/cli-review |
869d82e
to
e42e60e
Compare
|
||
const daemonSetPartialNodesLabeled = ` | ||
There are some nodes that do not match the selector for DaemonSet '%s'. | ||
A list of these nodes can be discovered by running: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't this be the list of nodes that do match, might be more clear if this sentence said "A list of matching nodes can be discovered by running:"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the error, but I'm wondering about the value now. Wouldn't be more useful to be able to determine the nodes that DO NOT match the selector which is what the first line in the sentence is saying. Is there a way to find all nodes that do not match these labels? That should be the correct ones right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modified to address nodes that match... not sure how we would advise to get unmatched nodes.
The Pod '%[1]s' matched by DaemonSet '%[2]s' is not in '%[3]s' status: %[4]s. | ||
|
||
Depending upon the state, this could mean there is an error running the image | ||
for one or more pod container, the node could be pulling images, etc. Try running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plural containers, think this occurs in a couple other places in this PR
return false, errors.New("config must include a cluster-admin context to run this diagnostic") | ||
} | ||
if d.KubeClient == nil { | ||
return false, errors.New("config must include a cluster-admin context to run this diagnostic") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these two don't start capitalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
func decodeSecret(secret *kapi.Secret, key string) (string, error) { | ||
value, ok := secret.Data[key] | ||
if !ok { | ||
return "", errors.New(fmt.Sprintf("The %s secret did not have an data entry for %s", secret.ObjectMeta.Name, key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'an entry' or 'a data entry'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
) | ||
|
||
const routeUnaccepted = ` | ||
An unaccepted route is a most likely due to one of the following reasons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'is most likely' or 'is usually'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
if route.Spec.TLS != nil && len(route.Spec.TLS.Certificate) != 0 && len(route.Spec.TLS.Key) != 0 { | ||
checkRouteCertificate(r, route) | ||
} else { | ||
r.Debug("AGL0331", fmt.Sprintf("Skipping check of key and certificate of route '%s'. It could be be missing one or the other or both", route.ObjectMeta.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipping key and certificate checks on route X. Either of them may be missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
var serviceAccountNames = sets.NewString("logging-deployer", "aggregated-logging-kibana", "aggregated-logging-curator", "aggregated-logging-elasticsearch", fluentdServiceAccountName) | ||
|
||
const serviceAccountsMissing = ` | ||
Did not find any logging ServiceAccounts. The logging infrastructure may not be able to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logging infrastructure may not function properly without a service account with cluster permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
re-run the installer. | ||
` | ||
const serviceAccountMissing = ` | ||
Did not find ServiceAccount '%s'. The logging infrastructure may not be able to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logging infrastructure may not function properly without a service account with cluster permissions.
var loggingServices = sets.NewString("logging-es", "logging-es-cluster", "logging-es-ops", "logging-es-ops-cluster", "logging-kibana", "logging-kibana-ops") | ||
|
||
const serviceNotFound = ` | ||
Expected to find '%s' among the logging services for the project but did not. It may be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the options chosen while running the deployer, these services may not have been provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified slightly to exclude mention of the deployer since it will be going away
return | ||
} | ||
if len(endpoints.Subsets) == 0 { | ||
r.Warn("AGL0225", nil, fmt.Sprintf("There are no endpoints found for service '%s'. This may be immaterial if the backing pods were not deployed (e.g. ops).", service)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not a fan of using the word immaterial but I dont have another suggestion at the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not a fan of using the word immaterial but I dont have another suggestion at the moment
"This might not be a problem if..."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
On Fri, Sep 23, 2016 at 2:06 PM, Sam Padgett notifications@github.com
|
f9dafd2
to
2d26c36
Compare
@openshift/cli-review @sosiouxme any more comments? any +1, LGTM...Beuller? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit otherwise LGTM. This is nice!
There are no nodes that match the selector for DaemonSet '%[1]s'. | ||
An example of a command to target a specific node for this DaemonSet: | ||
|
||
oc label node/ip-172-18-2-170.ec2.internal %[2]s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure the indentation is consistent across all messages. We usually like 2 spaces which is the same used in commands examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which indentation are you refering to? The initial statement like "There are..." or the command example: " oc label"? Do you maybe have an example to reference I can follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean this line oc label ...
. Just make sure there are 2 spaces of indentation here and in other messages where you give command examples. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed one file where it was an issue
f9c763e
to
51b07af
Compare
` | ||
|
||
func checkClusterRoleBindings(r diagnosticReporter, adapter clusterRoleBindingsAdapter, project string) { | ||
r.Info("AGL0600", "Checking ClusterRoleBindings...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an effort to minimize the "wall of text" that you get when running diagnostics, I would prefer that these sub-steps be announced only at Debug level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
if loggingUrl.Host == route.Spec.Host { | ||
project := route.ObjectMeta.Namespace | ||
r.Debug("AGL0015", fmt.Sprintf("aggregated logging project name: '%s'", project)) | ||
return project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like this loop to go on looking for other routes that match and raise a warning if there are any, as that would be a pretty clear sign that someone had multiple logging deployments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it should check that the project has an empty nodeSelector annotation and issue a warning if it doesn't, as that's an easy problem to have and not notice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to look for additional issues. Added empty nodeSelector check
r.Debug("AGL0013", fmt.Sprintf("Comparing URL to route.Spec.Host: %s", route.Spec.Host)) | ||
if loggingUrl.Host == route.Spec.Host { | ||
project := route.ObjectMeta.Namespace | ||
r.Debug("AGL0015", fmt.Sprintf("aggregated logging project name: '%s'", project)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be promoted to an Info e.g.
"Found route %s matching logging URL %s in project %s"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
d.result.Info(id, message) | ||
} | ||
|
||
func (d *AggregatedLogging) Error(id string, err error, message string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's one thing I really don't like about these wrapper routines... they obscure the source of the error. You get:
ERROR: [AGL0515 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:96]
... regardless of where in the code it actually was called. So instead of a file and line number to look at to see what the context of the error was, you have to go by ID. I don't know if that's worth changing your whole testing framework over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tests have value. I am leaning towards leaving it as is and to modify it if we find it becomes an issue going forward.
d.Error("AGL0505", err, fmt.Sprintf("There was an error while trying to retrieve the pods for the AggregatedLogging stack: %s", err)) | ||
return | ||
} | ||
if len(saList.Items) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will always be service accounts in a project... they're created by default. So this will never be seen. I would rather see this message instead of four "can't find SA" msgs, but the meaning should be that none of the expected service accounts are present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to log single message. removed zero check
|
||
//checkKibanaRoutesInOauthClient verifies the client contains the correct redirect uris | ||
func checkKibanaRoutesInOauthClient(r types.DiagnosticResult, osClient *client.Client, project string, oauthclient *oauthapi.OAuthClient) { | ||
r.Info("AGL0141", "Checking oauthclient redirectURIs for the logging routes...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
if foundServices.Has(service) { | ||
checkServiceEndpoints(r, adapter, project, service) | ||
} else { | ||
r.Warn("AGL0215", nil, fmt.Sprintf(serviceNotFound, service)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that all the services are treated as equal here. The non-ops ones should be an error if they're missing, as nothing will work. The ops ones should be errors too, if we can determine that they're supposed to exist, but I'm not sure we can cleanly, and I think the deployment creates them regardless of whether there's anything to use them. So I think the ops ones should just be warnings if they're missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to warn for ops and error on non-ops
return | ||
} | ||
if len(dcList.Items) == 0 { | ||
r.Error("AGL0047", nil, fmt.Sprintf("Did not find any matching DeploymentConfigs in project '%s'", project)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little context for what this implies would be good. Something like "This means that no logging components have been deployed."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
r.Info("AGL0400", fmt.Sprintf("Checking DaemonSets in project '%s'...", project)) | ||
dsList, err := adapter.daemonsets(project, kapi.ListOptions{LabelSelector: loggingInfraFluentdSelector.AsSelector()}) | ||
if err != nil { | ||
r.Error("AGL0405", err, fmt.Sprintf("There was an error while trying to retrieve the logging DaemonSets in project '%s'", project)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would almost certainly be a transient error, would probably be nice to say that and print the error (which would probably be something like "connection reset" or whatever).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated to include the error. Im not sure how else to identify its transient. These errors as you note I would expect to almost never see.
return | ||
} | ||
if len(dsList.Items) == 0 { | ||
r.Error("AGL0407", err, fmt.Sprintf("There were no DaemonSets in project '%s' that included label '%s'", project, loggingInfraFluentdSelector.AsSelector())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give context on what this implies: The fluentd log collectors are not deployed, or possibly they still have an old version of logging that they need to upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
} | ||
nodeList, err := adapter.nodes(kapi.ListOptions{}) | ||
if err != nil { | ||
r.Error("AGL0410", err, "There was an error while trying to retrieve the list of Nodes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the error and note that it's probably transient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
) | ||
|
||
const daemonSetNoLabeledNodes = ` | ||
There are no nodes that match the selector for DaemonSet '%[1]s'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, context for if I don't really know what a Daemonset is or why it matters...
"... so fluentd isn't running and gathering logs from any nodes."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
r.Debug("AGL0435", fmt.Sprintf("Checking for running pods for DaemonSet '%s' with matchLabels '%s'", ds.ObjectMeta.Name, podSelector)) | ||
podList, err := adapter.pods(project, kapi.ListOptions{LabelSelector: podSelector}) | ||
if err != nil { | ||
r.Error("AGL0438", err, fmt.Sprintf("There was an error retrieving pods matched to DaemonSet '%s'", ds.ObjectMeta.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print error, note it's transient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
} | ||
|
||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverse the conditional and return from this branch, so that you don't have to indent the main flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
r.Error("AGL0438", err, fmt.Sprintf("There was an error retrieving pods matched to DaemonSet '%s'", ds.ObjectMeta.Name)) | ||
return | ||
} | ||
if len(podList.Items) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it also check that the number of pods is equal to the number of nodes labeled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added:
r.Error("AGL0443", nil, fmt.Sprintf("The number of deployed pods %s does not match the number of labeled nodes %s", len(podList.Items), numLabeledNodes))
exists := found.Has(entry) | ||
if !exists { | ||
if strings.HasSuffix(entry, "-ops") { | ||
r.Warn("AGL0060", nil, fmt.Sprintf(deploymentConfigWarnMissingForOps, entry)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like having warnings for completely correct configurations. Once again I wish that we could determine after the deployment whether there is supposed to be an ops deployment or not. Someone without one shouldn't "get used to" seeing warnings from diagnostics. Maybe we should go down the messy path of checking the fluentd env vars for whether the ops host is different from the regular one.
Otherwise, I'd say this only warrants an Info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to info
return project | ||
} | ||
} | ||
r.Error("AGL0014", errors.New("aggregated logging project name is empty"), "Unable to determine the project from the loggingPublicURL defined in the master config") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unable to find a route matching the loggingPublicURL defined in the master config:
[URL]
Check that the URL is correct and aggregated logging is deployed.
(This may be the only message they see so make it informative)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
|
||
routeList, err := osClient.Routes(kapi.NamespaceAll).List(kapi.ListOptions{LabelSelector: loggingSelector.AsSelector()}) | ||
if err != nil { | ||
r.Error("AGL0012", err, fmt.Sprintf("There was an error while trying to find the route associated with '%s'", loggingUrl)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the error, note it's probably transient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
func retrieveLoggingProject(r types.DiagnosticResult, masterCfg *configapi.MasterConfig, osClient *client.Client) string { | ||
r.Debug("AGL0010", fmt.Sprintf("masterConfig.AssetConfig.LoggingPublicURL: '%s'", masterCfg.AssetConfig.LoggingPublicURL)) | ||
|
||
loggingUrl, err := url.Parse(masterCfg.AssetConfig.LoggingPublicURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before doing this, need to exclude the case where the URL is empty. I should not get an error in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added check of len for 0, added debug message, returned empty
return | ||
} | ||
if len(endpoints.Subsets) == 0 { | ||
r.Warn("AGL0225", nil, fmt.Sprintf("There are no endpoints found for service '%s'. This may not be a problem if the backing pods were not deployed (e.g. ops).", service)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another place that warns needlessly for just not deploying ops... could this be an info for -ops services?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a little context... what does this mean to the end user? Need to let them know it means there are no pods serviced by this service, so the component is not functioning.
f4372c5
to
38d3246
Compare
@sosiouxme care to glance at the updates again? |
} | ||
project, err := osClient.Projects().Get(projectName) | ||
if err != nil { | ||
r.Error("AGL0018", err, fmt.Sprintf("There was an error retrieving project '%s' which most likely a transient error: %s", projectName, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which /is/ most likely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
r.Debug("AGL0013", fmt.Sprintf("Comparing URL to route.Spec.Host: %s", route.Spec.Host)) | ||
if loggingUrl.Host == route.Spec.Host { | ||
if len(projectName) == 0 { | ||
projectName := route.ObjectMeta.Namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should not be :=
but just plain =
means Fluentd is not running and is not gathering logs from any nodes. | ||
An example of a command to target a specific node for this DaemonSet: | ||
|
||
oc label node/ip-172-18-2-170.ec2.internal %[2]s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a little too specific. Could it be something like:
oc label node/node1.example.com %[2]s
Alternatively, I think we could leave the specific case out and just give the command to label all of them which is most likely to be what we want anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
|
||
Try finding and deleting the other route by running the following: | ||
|
||
oc get --all-namespaces routes --template='{{println}}{{range .items}}{{if eq .spec.host "%[2]s"}}{{.metadata.name}}{{println}}{{end}}{{end}}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I suggest the slightly more informative:
oc get --all-namespaces routes --template='{{range .items}}{{if eq .spec.host "kibana.example.com"}}{{println .metadata.name "in" .metadata.namespace}}{{end}}{{end}}'
Has output like:
logging-kibana in logging
Also, I think "Try finding and deleting" is a little too imperative. I don't want them doing this blindly. How about something like "If a router has been deployed, look for duplicate matching routes by running the following:"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually updated with the exception of removed 'kibana.example.com' in favor of '%[2]s'
38d3246
to
6580d57
Compare
6580d57
to
1fbfe81
Compare
Evaluated for origin test up to 1fbfe81 |
[merge] |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9615/) |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9615/) (Image: devenv-rhel7_5129) |
Evaluated for origin merge up to 1fbfe81 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found some more nits, then LGTM.
func checkKibana(r types.DiagnosticResult, osClient *client.Client, kClient *kclient.Client, project string) { | ||
oauthclient, err := osClient.OAuthClients().Get(kibanaProxyOauthClientName) | ||
if err != nil { | ||
r.Error("AGL0115", err, fmt.Sprintf("Error retrieving the OauthClient '%s'. Unable to check Kibana", kibanaProxyOauthClientName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the error in the message
nodeSelector, ok := project.ObjectMeta.Annotations["openshift.io/node-selector"] | ||
if ok && len(nodeSelector) != 0 { | ||
r.Warn("AGL0030", nil, fmt.Sprintf(projectNodeSelectorWarning, projectName)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the annotation is not there then I would like them to get a warning because it leaves the project subject to the master config projectConfig.defaultNodeSelector which could be set now or later to something that would limit fluentd being deployed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !ok {
r.Warn(<new error>);
} else if len(nodeSelector) != 0 {
r.Warn("AGL0030"...)
}
r.Debug("AGL0100", "Checking oauthclient secrets...") | ||
secret, err := kClient.Secrets(project).Get(kibanaProxySecretName) | ||
if err != nil { | ||
r.Error("AGL0105", err, fmt.Sprintf("Error retrieving the secret '%s'", kibanaProxySecretName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the error in the message
} | ||
decoded, err := decodeSecret(secret, oauthSecretKeyName) | ||
if err != nil { | ||
r.Error("AGL0110", err, fmt.Sprintf("Unable to decode Kibana Secret")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the error
r.Debug("AGL0141", "Checking oauthclient redirectURIs for the logging routes...") | ||
routeList, err := osClient.Routes(project).List(kapi.ListOptions{LabelSelector: loggingSelector.AsSelector()}) | ||
if err != nil { | ||
r.Error("AGL0143", err, "Error retrieving the logging routes.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put error in message
r.Debug("AGL0300", "Checking routes...") | ||
routeList, err := adapter.routes(project, kapi.ListOptions{LabelSelector: loggingSelector.AsSelector()}) | ||
if err != nil { | ||
r.Error("AGL0305", err, fmt.Sprintf("There was an error retrieving routes in the project '%s' with selector '%s'", project, loggingSelector.AsSelector())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put error in message
if block != nil { | ||
cert, err := x509.ParseCertificate(block.Bytes) | ||
if err != nil { | ||
r.Error("AGL0335", err, fmt.Sprintf("Unable to parse the certificate for route '%s'", route.ObjectMeta.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print error in message; this one is actually reasonably likely to happen at some point...
r.Debug("AGL0355", fmt.Sprintf("Checking certificate matches key for route '%s'", route.ObjectMeta.Name)) | ||
_, err := tls.X509KeyPair([]byte(route.Spec.TLS.Certificate), []byte(route.Spec.TLS.Key)) | ||
if err != nil { | ||
r.Error("AGL0365", err, fmt.Sprintf("Route '%s' key and certficate do not match: %s. The router will be unable to pass traffic using this route.", route.ObjectMeta.Name, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
certificate
This PR satisfies: https://trello.com/c/BAwWkEiy
It provides diagnostic check for: