
Add log tagging for aggregation #1828

Merged
merged 7 commits into openshift:master from 2uasimojo:HIVE-1966/aro-log-tagging on Aug 4, 2022

Conversation

2uasimojo
Member

@2uasimojo 2uasimojo commented Jul 21, 2022

Add support for a `hive.openshift.io/additional-log-fields` annotation
on hive-owned CRs. If present, it is parsed as a JSON map containing
arbitrary key/value pairs which are included in all log lines for
controllers handling the CR. They are also appended to log lines from installer pods.

The assumption is that this is being used for log aggregation by an external consumer that needs to know which logs came from which component. Thus, if the value of the annotation is nonempty and parseable, we add `component=hive` to whatever it contains when logging from controllers, and `component=installer` when decorating installer logs.
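As a rough illustration of that behavior (a sketch only, not hive's actual code; the `decorate` helper and its signature are made up), the controller-side decoration amounts to something like:

```go
package main

import (
	"encoding/json"

	log "github.com/sirupsen/logrus"
)

const additionalLogFieldsAnnotation = "hive.openshift.io/additional-log-fields"

// decorate returns logger with the annotation's key/value pairs (plus
// component=hive) attached, or logger unchanged if the annotation is absent,
// empty, or unparseable.
func decorate(logger log.FieldLogger, annotations map[string]string) log.FieldLogger {
	raw := annotations[additionalLogFieldsAnnotation]
	if raw == "" {
		return logger
	}
	fields := log.Fields{}
	if err := json.Unmarshal([]byte(raw), &fields); err != nil {
		logger.WithError(err).Warn("failed to parse additional log fields annotation")
		return logger
	}
	// Lets the external aggregator tell hive's lines apart from the installer's.
	fields["component"] = "hive"
	return logger.WithFields(fields)
}

func main() {
	annotations := map[string]string{
		additionalLogFieldsAnnotation: `{"HELLO": "WORLD"}`,
	}
	decorate(log.StandardLogger(), annotations).Info("reconcile complete")
	// prints roughly: level=info msg="reconcile complete" HELLO=WORLD component=hive
}
```

With the annotation set to `{"HELLO": "WORLD"}`, log lines come out tagged `HELLO=WORLD component=hive`, matching the snippets later in this thread.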

PROVISOS [1]:

  • Our controllers currently reconcile on all and only the following
    objects (so annotating others will have no effect):

    • ClusterClaim
    • ClusterDeployment
    • ClusterDeprovision
    • ClusterPool
    • ClusterProvision
    • DNSZone
    • FakeClusterInstall
    • MachinePool
      The clusterpoolnamespace controller reconciles on Namespace; but we
      do not support these annotations there.
  • Log fields are added to the logger for a given controller once it has
    loaded the CR. So there may be a couple of log messages that will be
    missed. They should be unimportant for consumers -- things like the
    initial "Reconciling $namespace/$name".

  • The hive code generates CRs in certain circumstances.

    • ClusterProvision and ClusterDeprovision are generated when the CD is
      created or deleted, respectively. They inherit the annotation at the
      time they are generated. If you want the fields updated after that,
      you must edit the ClusterProvision/ClusterDeprovision object directly.
    • DNSZone can be generated if not already present for clusters with
      managed DNS. The clusterdeployment controller will generate the
      DNSZone with the annotation from the CD, but will also sync it from
      the CD on each reconcile.
    • The ClusterPool controller generates ClusterDeployments. We do not
      copy the annotation from the ClusterPool.Metadata; you would have to
      put it into ClusterPool.Spec.Annotations instead (see the sketch after
      these provisos).
    • ClusterDeployment and MachinePool are generated by `hiveutil create cluster`.
      You can use the `-a/--annotations` flag to initialize the
      ClusterDeployment with the annotation [2]. The machinepool controller
      will sync it from the CD to the MachinePool on each reconcile.
    • We currently do not support passing (any) annotations into
      `hiveutil clusterpool create-pool` or `hiveutil clusterpool claim`. In
      the future we may wish to add `-a/--annotations` like we have for
      `create cluster`.

[1] https://www.youtube.com/watch?v=pb0PuaikL4w
[2] The CLI syntax is a bit tricky. This works for me:
`--annotations '"hive.openshift.io/additional-log-fields={""HELLO"":""WORLD""}"'`
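To make the ClusterPool proviso concrete, here is a minimal sketch (the `hivev1` import path and field shapes are assumed from the description above, not copied from hive's source; `NewTaggedPool` is an invented helper):

```go
package sketch

import (
	hivev1 "github.com/openshift/hive/apis/hive/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const logFieldsAnnotation = "hive.openshift.io/additional-log-fields"

// NewTaggedPool builds a ClusterPool whose own logs and whose generated
// ClusterDeployments both carry the extra log fields.
func NewTaggedPool(name, namespace, fieldsJSON string) *hivev1.ClusterPool {
	return &hivev1.ClusterPool{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
			// Tags log lines for the ClusterPool itself...
			Annotations: map[string]string{logFieldsAnnotation: fieldsJSON},
		},
		Spec: hivev1.ClusterPoolSpec{
			// ...but only Spec.Annotations is copied onto the
			// ClusterDeployments the pool creates.
			Annotations: map[string]string{logFieldsAnnotation: fieldsJSON},
		},
	}
}
```

Calling it with `{"HELLO":"WORLD"}` as the fields JSON yields a pool whose ClusterDeployments log `HELLO=WORLD`.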

HIVE-1966

@openshift-ci openshift-ci bot added the `approved` label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jul 21, 2022
@2uasimojo
Member Author

/cc @m1kola

/hold -- need to add a commit to plumb the fields into the installer.

@openshift-ci openshift-ci bot requested a review from m1kola July 21, 2022 23:52
@openshift-ci openshift-ci bot added the `do-not-merge/hold` label (Indicates that a PR should not merge because someone has issued a /hold command.) Jul 21, 2022
@codecov

codecov bot commented Jul 22, 2022

Codecov Report

Merging #1828 (2effe85) into master (71e4b50) will increase coverage by 0.22%.
The diff coverage is 68.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1828      +/-   ##
==========================================
+ Coverage   41.59%   41.81%   +0.22%     
==========================================
  Files         362      363       +1     
  Lines       34120    34141      +21     
==========================================
+ Hits        14192    14277      +85     
+ Misses      18725    18657      -68     
- Partials     1203     1207       +4     
| Impacted Files | Coverage Δ |
|---|---|
| contrib/pkg/deprovision/alibabacloud.go | 0.00% <0.00%> (ø) |
| contrib/pkg/deprovision/awstagdeprovision.go | 0.00% <0.00%> (ø) |
| contrib/pkg/deprovision/azure.go | 0.00% <0.00%> (ø) |
| contrib/pkg/deprovision/gcp.go | 0.00% <0.00%> (ø) |
| contrib/pkg/deprovision/ibmcloud.go | 0.00% <0.00%> (ø) |
| contrib/pkg/deprovision/openstack.go | 0.00% <0.00%> (ø) |
| contrib/pkg/deprovision/ovirt.go | 0.00% <0.00%> (ø) |
| contrib/pkg/deprovision/vsphere.go | 0.00% <0.00%> (ø) |
| pkg/constants/constants.go | 100.00% <ø> (ø) |
| pkg/controller/dnszone/dnszone_controller.go | 28.70% <0.00%> (-0.09%) ⬇️ |
... and 32 more

@2uasimojo 2uasimojo changed the title from "Add log tagging" to "Add log tagging for aggregation" Jul 22, 2022
@2uasimojo
Member Author

/cc @patrickdillon @rna-afk

Last commit has the installer plumbing. PTAL.

@2uasimojo
Member Author

TODO: Deprovision pod logs??

@patrickdillon
Contributor

Approach looks sane to me. I have opened openshift/enhancements#1197 to discuss log-config.yaml. openshift/installer#6161 provides an implementation which could be used for testing.

> TODO: Deprovision pod logs??

FWIW logs from the installer in `openshift-install destroy cluster` would be decorated the same as all other logs

@2uasimojo
Member Author

> Approach looks sane to me.

Thanks for the look!

> I have opened openshift/enhancements#1197 to discuss log-config.yaml.

👀

> openshift/installer#6161 provides an implementation which could be used for testing.

Cool, building...

> TODO: Deprovision pod logs??
>
> FWIW logs from the installer in `openshift-install destroy cluster` would be decorated the same as all other logs

Thanks. It looks like we're invoking the go code directly -- but we're doing so with a logger that I can decorate up front. Just have to decide how to pass the tags into hiveutil. Considering using the same log-config.yaml...
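Roughly the shape of that idea, with hypothetical names throughout (the real deprovision entry points under contrib/pkg/deprovision look different):

```go
package deprovisionsketch

import (
	log "github.com/sirupsen/logrus"
)

// destroyer stands in for provider-specific deprovision code that accepts an
// injected logger.
type destroyer interface {
	Run(logger log.FieldLogger) error
}

// runDecorated builds the decorated logger up front, so every line the
// deprovision code emits already carries the extra fields.
func runDecorated(d destroyer, extraFields log.Fields) error {
	logger := log.StandardLogger().WithFields(extraFields).WithField("component", "hive")
	return d.Run(logger)
}
```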

@2uasimojo
Member Author

> TODO: Deprovision pod logs??

TODO:

  • Finish logtagger UT.
  • Squash out "Plumb additional log fields into installer".
  • Consider moving latest logtagger impl to "Add controller log tagging for aggregation". Maybe.

@2uasimojo
Member Author

Tested this (at 0e6eb95) with openshift/installer#6161 (at 0dc2623) and everything seems to be working as expected. Snippet of provision pod logs:

time="2022-07-27T19:09:09Z" level=debug msg="Couldn't find install logs provider environment variable. Skipping."
time="2022-07-27T19:09:11Z" level=debug msg="checking for SSH private key" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="unable to initialize host ssh key" HELLO=WORLD component=hive error="cannot configure SSH agent as SSH_PRIV_KEY_PATH is unset or empty" installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="waiting for files to be available: [/output/openshift-install /output/oc]" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="found file" HELLO=WORLD component=hive installID=j9b6rqxp path=/output/openshift-install
time="2022-07-27T19:09:11Z" level=info msg="found file" HELLO=WORLD component=hive installID=j9b6rqxp path=/output/oc
time="2022-07-27T19:09:11Z" level=info msg="all files found, ready to proceed" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="copied /output/openshift-install to /home/hive/openshift-install" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="copied /output/oc to /home/hive/oc" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="copying install-config.yaml" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="waiting for files to be available: [/output/.openshift_install.log]" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="copied /installconfig/install-config.yaml to /output/install-config.yaml" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="generating assets" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="running openshift-install create manifests" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="writing log-config.yaml" HELLO=WORLD component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="running openshift-install binary" HELLO=WORLD args="[create manifests]" component=hive installID=j9b6rqxp
time="2022-07-27T19:09:11Z" level=info msg="found file" HELLO=WORLD component=hive installID=j9b6rqxp path=/output/.openshift_install.log
time="2022-07-27T19:09:11Z" level=info msg="all files found, ready to proceed" HELLO=WORLD component=hive installID=j9b6rqxp
### [efried] This is where hiveutil has invoked the installer. Notice `component=hive` above, `component=installer` below ###
time="2022-07-27T19:09:11Z" level=debug msg="OpenShift Installer unreleased-master-6266-ge9ecb4596bc40370179a16539a6783e83de463b0-dirty" HELLO=WORLD component=installer
time="2022-07-27T19:09:11Z" level=debug msg="Built from commit e9ecb4596bc40370179a16539a6783e83de463b0" HELLO=WORLD component=installer
time="2022-07-27T19:09:11Z" level=debug msg="Fetching Master Machines..." HELLO=WORLD component=installer
time="2022-07-27T19:09:11Z" level=debug msg="Loading Master Machines..." HELLO=WORLD component=installer
time="2022-07-27T19:09:11Z" level=debug msg="  Loading Cluster ID..." HELLO=WORLD component=installer
time="2022-07-27T19:09:11Z" level=debug msg="    Loading Install Config..." HELLO=WORLD component=installer
time="2022-07-27T19:09:11Z" level=debug msg="      Loading SSH Key..." HELLO=WORLD component=installer
...

@2uasimojo
Member Author

UT is WIP

@2uasimojo
Member Author

Still not quiiiite done with UT

Add support for a `hive.openshift.io/additional-log-fields` annotation
on hive-owned CRs. If present, it is parsed as a JSON map containing
arbitrary key/value pairs which are included in all log lines for
controllers handling the CR.

If the value of the annotation is nonempty and parseable, we will add
`component=hive` to whatever it contains. (The assumption is that this
is being used for log aggregation by an external consumer that needs to
know that hive's logs came from hive.)

PROVISOS [1]:

- Our controllers currently reconcile on all and only the following
objects (so annotating others will have no effect):
  - ClusterClaim
  - ClusterDeployment
  - ClusterDeprovision
  - ClusterPool
  - ClusterProvision
  - DNSZone
  - FakeClusterInstall
  - MachinePool
  The clusterpoolnamespace controller reconciles on Namespace; but we
  *do not* support these annotations there.

- Log fields are added to the logger for a given controller once it has
loaded the CR. So there may be a couple of log messages that will be
missed. They should be unimportant for consumers -- things like the
initial "Reconciling $namespace/$name".

- The hive code *generates* CRs in certain circumstances.
  - ClusterProvision and ClusterDeprovision are generated when the CD is
  created or deleted, respectively. They inherit the annotation *at the
  time they are generated*. If you want the fields updated after that,
  you must edit the ClusterProvision/ClusterDeprovision object directly.
  - DNSZone can be generated if not already present for clusters with
  managed DNS. The clusterdeployment controller will generate the
  DNSZone with the annotation from the CD, but will also *sync* it from
  the CD on each reconcile.
  - The ClusterPool controller generates ClusterDeployments. We do *not*
  copy the annotation from the ClusterPool.Metadata; you would have to
  put it into ClusterPool.Spec.Annotations instead.
  - ClusterDeployment and MachinePool are generated by `hiveutil create
  cluster`. You can use the `-a/--annotations` flag to initialize the
  ClusterDeployment with the annotation [2]. The machinepool controller
  will *sync* it from the CD to the MachinePool on each reconcile.
  - We currently do *not* support passing (any) annotations into
  `hiveutil clusterpool create-pool` or `hiveutil clusterpool claim`. In
  the future we may wish to add `-a/--annotations` like we have for
  `create cluster`.

[1] https://www.youtube.com/watch?v=pb0PuaikL4w
[2] The CLI syntax is a bit tricky. This wfm:
`--annotations '"hive.openshift.io/additional-log-fields={""HELLO"":""WORLD""}"'`

HIVE-1966

In preparation for needing to plumb some data from the ClusterProvision
into the invocations of the openshift-install command, add a new field
to InstallManager containing the actual ClusterProvision object. This
allows us to change a bunch of function prototypes to remove it as an
argument, as those functions can get it from the InstallManager argument
they already accept (or are methods on).

HIVE-1966
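A tiny before/after sketch of that refactor (only InstallManager and ClusterProvision are real names here; the method and import path are illustrative):

```go
package sketch

import (
	hivev1 "github.com/openshift/hive/apis/hive/v1"
)

type InstallManager struct {
	// ...existing fields elided...

	// ClusterProvision is the provision being installed, loaded once up front.
	ClusterProvision *hivev1.ClusterProvision
}

// Before: helpers took the provision as a parameter, e.g.
//   func (m *InstallManager) doSomething(provision *hivev1.ClusterProvision) error
// After: they read it from the receiver instead.
func (m *InstallManager) doSomething() error {
	_ = m.ClusterProvision.Annotations // e.g. the additional-log-fields annotation
	return nil
}
```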
Get rid of yellow squiggly lines in my IDE.

A previous commit introduced support for adding fields specified via the
`hive.openshift.io/additional-log-fields` annotation to all (well, most)
log lines dealing with the CR containing the annotation.

This commit makes the `provision` pod subsume those values and
- add them to log lines produced by install-manager commands running in
that pod
- append them to each line (including redacted ones) of the install log,
overriding `component` to `installer`

Note that, if the annotation is specified, log lines from hive will
include `component=hive`; but we will configure the installer's
`log-config.yaml` with `component=installer`.

HIVE-1966
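A minimal sketch of that per-line decoration (illustrative only; `decorateInstallerLog` is invented, not the install-manager's actual code):

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// decorateInstallerLog appends the extra key=value pairs to every line read
// from in, forcing component=installer so aggregators can tell installer
// lines apart from hive's own.
func decorateInstallerLog(in io.Reader, out io.Writer, extra map[string]string) error {
	fields := map[string]string{"component": "installer"}
	for k, v := range extra {
		fields[k] = v
	}
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable ordering, as in logrus's text formatter

	scanner := bufio.NewScanner(in)
	for scanner.Scan() {
		parts := make([]string, 0, len(keys))
		for _, k := range keys {
			parts = append(parts, fmt.Sprintf("%s=%s", k, fields[k]))
		}
		fmt.Fprintf(out, "%s %s\n", scanner.Text(), strings.Join(parts, " "))
	}
	return scanner.Err()
}

func main() {
	line := strings.NewReader(`time="2022-08-03T16:03:14Z" level=debug msg="Fetching Root CA..."`)
	_ = decorateInstallerLog(line, os.Stdout, map[string]string{"HELLO": "WORLD"})
	// => time="2022-08-03T16:03:14Z" level=debug msg="Fetching Root CA..." HELLO=WORLD component=installer
}
```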
In preparation for supporting additional log fields for aggregation,
factor logger setup for hiveutil subcommands into a common utility
function so we can do that in one place.

HIVE-1966

- The `hiveutil` executable is changed to look for and process an
environment variable, `HIVE_ADDITIONAL_LOG_FIELDS`. This is a JSON string
representing a flat map of key/value pairs to be added to all log lines
produced by `hiveutil` (and passed through to the installer for the same
purpose).
- Each instance where hive code invokes `hiveutil` via `Job`s is wired
to copy such JSON from the `hive.openshift.io/additional-log-fields`
annotation on the object on which it is operating, to the env of the
`Job` that will invoke `hiveutil`.

HIVE-1966
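Sketching both halves of that wiring (only the `HIVE_ADDITIONAL_LOG_FIELDS` name and the annotation key come from this change; the helper functions below are invented for illustration):

```go
package sketch

import (
	"encoding/json"
	"os"

	log "github.com/sirupsen/logrus"
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

const additionalLogFieldsEnvVar = "HIVE_ADDITIONAL_LOG_FIELDS"

// loggerFromEnv is the consumer side: hiveutil parses the env var and
// decorates its root logger with the resulting fields.
func loggerFromEnv() log.FieldLogger {
	logger := log.FieldLogger(log.StandardLogger())
	raw := os.Getenv(additionalLogFieldsEnvVar)
	if raw == "" {
		return logger
	}
	fields := log.Fields{}
	if err := json.Unmarshal([]byte(raw), &fields); err != nil {
		logger.WithError(err).Warn("could not parse HIVE_ADDITIONAL_LOG_FIELDS")
		return logger
	}
	return logger.WithFields(fields)
}

// addLogFieldsEnv is the producer side: copy the annotation from the object
// being operated on into the env of the Job that will run hiveutil.
func addLogFieldsEnv(job *batchv1.Job, annotations map[string]string) {
	raw := annotations["hive.openshift.io/additional-log-fields"]
	if raw == "" {
		return
	}
	containers := job.Spec.Template.Spec.Containers
	for i := range containers {
		containers[i].Env = append(containers[i].Env,
			corev1.EnvVar{Name: additionalLogFieldsEnvVar, Value: raw})
	}
}
```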
HIVE-1966
@2uasimojo
Member Author

Updated to cut the installer log config out and instead wrap the installer and decorate its logs on the fly. Live test checks out. Snippet from controller logs:

time="2022-08-03T16:03:15.981Z" level=debug msg="loading pull secrets" HELLO=WORLD clusterDeployment=dnstimeout/logtest component=hive controller=clusterDeployment reconcileID=jbjcdfdq
time="2022-08-03T16:03:15.981Z" level=debug msg="Existing and the new merged pull secret are same" HELLO=WORLD clusterDeployment=dnstimeout/logtest component=hive controller=clusterDeployment reconcileID=jbjcdfdq
time="2022-08-03T16:03:15.981Z" level=debug msg="Controller expectations fulfilled &utils.ControlleeExpectations{add:0, del:0, key:\"dnstimeout/logtest\", timestamp:time.Time{wall:0xc0b2c4ea67cd56c4, ext:10702893077868, loc:(*time.Location)(0x7ba0940)}}" controller=clusterDeployment controllerKey=dnstimeout/logtest
time="2022-08-03T16:03:15.981Z" level=debug msg="reconciling existing provision" HELLO=WORLD clusterDeployment=dnstimeout/logtest component=hive controller=clusterDeployment provision=logtest-0-9tckq reconcileID=jbjcdfdq
time="2022-08-03T16:03:15.981Z" level=debug msg="still provisioning" HELLO=WORLD clusterDeployment=dnstimeout/logtest component=hive controller=clusterDeployment provision=logtest-0-9tckq reconcileID=jbjcdfdq
time="2022-08-03T16:03:15.981Z" level=info msg="reconcile complete" clusterDeployment=dnstimeout/logtest controller=clusterDeployment elapsedMillis=0 elapsedMillisGT=0 outcome=unspecified reconcileID=jbjcdfdq

Snippet from installer logs:

time="2022-08-03T16:03:14Z" level=debug msg="  Fetching Certificate (mcs)..." HELLO=WORLD component=installer
time="2022-08-03T16:03:14Z" level=debug msg="  Reusing previously-fetched Certificate (mcs)" HELLO=WORLD component=installer
time="2022-08-03T16:03:14Z" level=debug msg="  Fetching Root CA..." HELLO=WORLD component=installer
time="2022-08-03T16:03:14Z" level=debug msg="  Reusing previously-fetched Root CA" HELLO=WORLD component=installer
time="2022-08-03T16:03:14Z" level=debug msg="  Fetching Key Pair (service-account.pub)..." HELLO=WORLD component=installer
time="2022-08-03T16:03:14Z" level=debug msg="  Generating Key Pair (service-account.pub)..." HELLO=WORLD component=installer
time="2022-08-03T16:03:14Z" level=debug msg="Failed to marshal PKIX public key: %!s(<nil>)" HELLO=WORLD component=installer

@2uasimojo
Member Author

/hold cancel

ready for review

@openshift-ci openshift-ci bot removed the `do-not-merge/hold` label (Indicates that a PR should not merge because someone has issued a /hold command.) Aug 3, 2022
@2uasimojo
Member Author

/retest-required

Member

@abutcher abutcher left a comment


Looks solid and tests are thorough. 👍

/lgtm

@@ -30,6 +30,7 @@ func cleanupDNSZone(dynClient client.Client, cd *hivev1.ClusterDeployment, logge
dnsZoneNamespacedName := types.NamespacedName{Namespace: cd.Namespace, Name: controllerutils.DNSZoneName(cd.Name)}
if err := dynClient.Get(context.TODO(), dnsZoneNamespacedName, dnsZone); err != nil {
logger.WithError(err).Error("error looking up managed dnszone")
// TODO: Return the err?!?
Member


Nice find.

log "github.com/sirupsen/logrus"
)

type AdditionalLogFieldHavinThing interface {
Member


NIT: Would consider naming this something like AdditionalLogFieldAnnotatedObject but I am completely fine with AdditionalLogFieldHavinThing as it conveys the same info to me.

Member Author


Yeah, not the most formal/professional name, but it's an internal thing.

I'd just like to note (cc @dlom) that this is another example of using generics to increase the #LOC.
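For readers following along, a hedged sketch of the shape under discussion (the interface name is from the diff above; its method and the generic helper are guesses, not hive's actual code):

```go
package sketch

import (
	"encoding/json"

	log "github.com/sirupsen/logrus"
)

// AdditionalLogFieldHavinThing is satisfied by any hive CR that can carry the
// additional-log-fields annotation; metav1.ObjectMeta already provides
// GetAnnotations on every CR.
type AdditionalLogFieldHavinThing interface {
	GetAnnotations() map[string]string
}

// AddLogFields decorates logger with whatever the object's annotation contains.
func AddLogFields[T AdditionalLogFieldHavinThing](obj T, logger log.FieldLogger) log.FieldLogger {
	raw := obj.GetAnnotations()["hive.openshift.io/additional-log-fields"]
	if raw == "" {
		return logger
	}
	fields := log.Fields{}
	if err := json.Unmarshal([]byte(raw), &fields); err != nil {
		logger.WithError(err).Warn("failed to parse additional log fields")
		return logger
	}
	return logger.WithFields(fields)
}
```

A plain interface parameter would do the same job in fewer lines, which is presumably the point of the quip above.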

@openshift-ci openshift-ci bot added the `lgtm` label (Indicates that a PR is ready to be merged.) Aug 3, 2022
@openshift-ci
Contributor

openshift-ci bot commented Aug 3, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, abutcher

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

/retest-required

Remaining retests: 2 against base HEAD f6b7a1c and 8 for PR HEAD 2effe85 in total

@openshift-ci-robot

/retest-required

Remaining retests: 1 against base HEAD f6b7a1c and 7 for PR HEAD 2effe85 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f6b7a1c and 6 for PR HEAD 2effe85 in total

@2uasimojo
Member Author

/retest

3 similar comments
@2uasimojo
Member Author

/retest

@2uasimojo
Member Author

/retest

@2uasimojo
Member Author

/retest

@2uasimojo
Member Author

/test e2e

@openshift-ci
Contributor

openshift-ci bot commented Aug 4, 2022

@2uasimojo: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot merged commit 8350880 into openshift:master Aug 4, 2022
@2uasimojo 2uasimojo deleted the HIVE-1966/aro-log-tagging branch August 4, 2022 22:20
m1kola added a commit to m1kola/ARO-RP that referenced this pull request Aug 5, 2022
New version includes support for log decoration with extra fields
(see openshift/hive#1828 for more details).
m1kola added a commit to m1kola/ARO-RP that referenced this pull request Aug 5, 2022
New version includes support for log decoration with extra fields
For more details see:

* Azure#2284
* openshift/hive#1828
m1kola added a commit to Azure/ARO-RP that referenced this pull request Aug 5, 2022
New version includes support for log decoration with extra fields
For more details see:

* #2284
* openshift/hive#1828