Skip to content

Conversation

JStickler
Copy link
Contributor

@JStickler JStickler commented Nov 16, 2020

This PR documents how to connect the service mesh control plane to an existing Jaeger instance.
Adds a topic for enabling tracing.
Adds a topic for Jaeger configuration in 2.0.
Updates the note in the stand alone Jaeger docs about how to configure with SM.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 16, 2020
@openshift-docs-preview-bot

The preview will be available shortly at:

@JStickler JStickler force-pushed the OSSMDOC-153 branch 3 times, most recently from c6ad834 to 1ceda05 Compare November 16, 2020 21:51
@JStickler
Copy link
Contributor Author

@objectiser

Copy link

@objectiser objectiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, but this is primarily Service Mesh docs for configuring Jaeger, so needs to be reviewed/approved by that team.
cc @rcernich

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"value of 'name' exists" - may need to add "within the same namespace as the SMCP" or something like that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we need to explicitly state this or not, but using an existing jaeger resource is only supported for v2.0 control planes, i.e. not for spec.version=v1.0 or spec.version=v1.1.

Copy link
Contributor Author

@JStickler JStickler Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcernich , are you sure? It's been documented on the Maistra web site since April and I'm just now getting around to adding it to the service mesh docs.
https://maistra.io/docs/monitoring_and_tracing/using-existing-jaeger/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes/no. We do support it, but not by specifying the jaeger name alone. That said, I'm not sure the v1 method will work with the 2.0 operator (see my other comment about how this might be broken in 1.1). Really, what I meant to say was that referencing an existing jaeger resource only works with 2.0 installs. 1.x installs would need to go about specifying the zipkin address and other settings, as before. Having said all that, in my brief look through the code, that may not work anymore in 2.0, but that was just a brief scanning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration for 1.1 is in a separate PR -> #27263

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the documentation is correct, but I think there may be an issue with the 2.0 operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if I need to add a "known issue" to the release notes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments will be obsolete as soon as we have a release that includes the fix for using external jaeger with 1.1 control planes, and it will work using either the v2 method (set addons.jaeger.name to an existing jaeger cr) or using the old v1 method (specifying a number of fields). Here's the related pr, maistra/istio-operator#629

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we need to explicitly state this or not, but using an existing jaeger resource is only supported for v2.0 control planes, i.e. not for spec.version=v1.0 or spec.version=v1.1.

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 18, 2020
@JStickler JStickler requested a review from rcernich November 20, 2020 18:51
@JStickler
Copy link
Contributor Author

I'd like to try to get this approved and merged ASAP, since we keep getting questions about how to configure Jaeger with the new v2 SMCP. @objectiser , @rcernich

@objectiser
Copy link

I think it will need to be @rcernich decision as it is about the SMCP.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments will be obsolete as soon as we have a release that includes the fix for using external jaeger with 1.1 control planes, and it will work using either the v2 method (set addons.jaeger.name to an existing jaeger cr) or using the old v1 method (specifying a number of fields). Here's the related pr, maistra/istio-operator#629

@FilipB
Copy link

FilipB commented Dec 3, 2020

@JStickler Could you please add links to chapters in preview docs where we can see the changes? Thank you.

@JStickler
Copy link
Contributor Author

@FilipB , the majority of the changes are in the topic for configuring the Custom Resource (AKA the Control Plane)
https://ossmdoc-153--ocpdocs.netlify.app/openshift-enterprise/latest/service_mesh/v2x/ossm-custom-resources.html#configuring-distributed-tracing
(FYI, I had to include some of the v1x files from the PR for the 1x configuration to get the build to pass for this PR, but those can be reviewed when #27423 is ready for review).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the default sampling rate. The default value is actually 100 (1%).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value sampling;10000 is what I'm seeing in the OpenShift console when creating a v2 SMCP. Is that a bug?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh. Well, not a bug, but not the default value, but a default value. The resource in the console may be initialized to that, but if it is unspecified, e.g. if the user deletes that field, the value used is 100. I don't think this is an issue with the documentation, just may be confusing when the heading for the example includes the word, "default." I'm okay with the changes, as they are, hence my approving this PR.

Copy link

@FilipB FilipB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any link to an example of Jaeger configuration for this external Jaeger setup. Maybe it should be added the same way as for 1.1. It's the "Example Jaeger resource" part from https://ossmdoc-152--ocpdocs.netlify.app/openshift-enterprise/latest/service_mesh/v1x/ossm-custom-resources.html#ossm-configuring-jaeger-existing-v1x_ossm-controler-items-v1x

Copy link

@FilipB FilipB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to this jaeger configuration:

apiVersion: jaegertracing.io/v1
kind: "Jaeger"
metadata:
  name: "external-jaeger"
  # Deploy to the Control Plane Namespace
  namespace: istio-system
spec:
  # Set Up Authentication
  ingress:
    enabled: true
    security: oauth-proxy
    openshift:
      # This limits user access to the Jaeger instance to users who have access
      # to the control plane namespace. Make sure to set the correct namespace here
      sar: '{"namespace": "istio-system", "resource": "pods", "verb": "get"}'
      htpasswdFile: /etc/proxy/htpasswd/auth

  volumeMounts:
  - name: secret-htpasswd
    mountPath: /etc/proxy/htpasswd
  volumes:
  - name: secret-htpasswd
    secret:
      secretName: htpasswd

It's not referenced from Service Mesh 2.x chapter (only in 1.x) and I believe it's required to correctly configure the external jaeger. @rcernich could you please confirm?

@JStickler JStickler requested a review from rcernich January 4, 2021 21:18
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The procedure in this module describes how to replace the allinone parameter with production elasticsearch. Can this title be more descriptive about that? It seems like you're configuring elasticsearch and Jaeger for a production deployment? I think that would be a more descriptive title.

I think it would be better to start this module with the paragraph from configuring elasticsearch. Then move the the procedure from the bottom under that. I think you could add the default allinone Jaeger parameters to that proceedure in step 7 as an example.

If you still need the default elasticsearch examples, you can add them after the procedure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you just look at the files and not view the preview? This section follows the existing format in the Custom Resources chapter but just updates the content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the files and the preview. There may be a format that exists, but from my perspective, I think that changing that format around would improve the readability of this content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this deployment example is right in the custom resources. Shouldn't it be added to the deployment assembly instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is updating the Service Mesh custom resource assembly, not the Deploying Jaeger assembly (which is for stand alone Jaeger). Again I'm thinking that maybe you reviewed the individual files and didn't look at the preview to see how the information fits together in the assembly??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I looked at the preview and the files. As a customer, I don't think I would come to the Custom Resources section to find deployment information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that configuring distributed tracing describes what this is about. Is this use case about configuring an external Jaeger instance? If so, and I have an external Jaeger instance how do I search for that? I guess I'm a little confused because if you're using the default Jaeger, there isn't much configuration to do. For the external Jaeger scenario, there is more configuration but I don't see any of those keywords. Connecting to an existing Jaeger instance? Configuring distributed tracing outside of OSSM?

If they want a default Jaeger config, they have it already from the installation docs. I think this heading should more specifically speak to what steps you need to take if you want to set up an external instance.

I also think this might not be best in the custom resources assembly. If it's stepwise configuration information it might be better in the configuration assembly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm confused, we no longer have a configuration assembly in OSSM, we moved that information into the Custom Resource assembly.

Yes the overall use case for this PR is how to point to an external Jaeger instance, which includes updating how to configure Jaeger in the new SMCP and pointing out that you MUST point to an external Jaeger instance for a streaming deployment (because you cannot configure streaming in the new SMCP). The file that actually supports the "external jaeger" use case is ossm-specifying-jaeger-configuration. Everything else in this PR is updating the Jaeger configuration docs for the SMCP v2.0.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of "external jaeger," we should use, "user managed jaeger."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that configuration documentation should go in the configuration assembly. I moved this content to the custom resources assembly because it seemed more like reference documentation than configuration documentation.

I understand this is updating something that already exists. I'm not sure the content about how to enable the default Jaeger needs more explanation here. I think it would be stronger if it focused on the user managed jaeger content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like the rest of the product, the default Jaeger configuration is NOT suitable for production and must be configured before you deploy to a production environment. Which is why I didn't understand why you considered the configuration information "reference" material. But at the time, pulling that information out of the Customizing the Installation assembly wasn't something I was completely opposed to, as long as the reference section came after the Customization assembly (which it doesn't). If we're going to revist how we present the configuration and deployment information, I'd really like to do that in another PR, because the Jaeger configuration information in the docs has been incorrect ever since the v2 release on Nov 4. I really just want to get this information published.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the files and the preview. There may be a format that exists, but from my perspective, I think that changing that format around would improve the readability of this content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I looked at the preview and the files. As a customer, I don't think I would come to the Custom Resources section to find deployment information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that configuration documentation should go in the configuration assembly. I moved this content to the custom resources assembly because it seemed more like reference documentation than configuration documentation.

I understand this is updating something that already exists. I'm not sure the content about how to enable the default Jaeger needs more explanation here. I think it would be stronger if it focused on the user managed jaeger content.

Copy link

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, apart from the discussion about the content organization.

[id="ossm-enabling-tracing_{context}"]
= Enabling and disabling tracing

You enable tracing by specifying a tracing type and a sampling rate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This first sentence says enable tracing, but the next sentence after the example says that's done by default. So do you need to enable tracing?

The next describes how you disable it. If you need to disable it, it's not clear why as part of the larger assembly you would want to and how it connects to the rest of the content.


Currently the only tracing type that is supported is `Jaeger`. Jaeger is enabled by default. To disable tracing, set `type` to `None`.

The sampling rate determines how often a trace is generated. You configure `sampling` as a scaled integer representing 0.01% increments. For example setting the value to `1` samples 0.01% of traces and a setting of `10000` samples 100% of traces.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does sampling more traces get you?

@neal-timpe neal-timpe merged commit f60c589 into openshift:master Jan 25, 2021
@neal-timpe
Copy link
Contributor

neal-timpe commented Jan 25, 2021

/cherry-pick enterprise-4.6

@neal-timpe
Copy link
Contributor

neal-timpe commented Jan 25, 2021

/cherry-pick enterprise-4.7

@openshift-cherrypick-robot
Copy link

openshift-cherrypick-robot commented Jan 25, 2021

@neal-timpe: new pull request created: #28825

In response to this:

/cherry-pick enterprise-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot
Copy link

openshift-cherrypick-robot commented Jan 25, 2021

@neal-timpe: new pull request created: #28826

In response to this:

/cherry-pick enterprise-4.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bergerhoffer
Copy link
Contributor

Hi @neal-timpe, can you please squash your commits in your PRs to 1 commit before merging? Thanks!

@bergerhoffer
Copy link
Contributor

Apologies, I meant @JStickler. I looked at the merger not the owner :)

@neal-timpe
Copy link
Contributor

@bergerhoffer Yes, sorry. I missed it in peer review this time. I know to check, but made a mistake. :-(

@JStickler
Copy link
Contributor Author

@bergerhoffer ah phooey, that's my mistake. Sorry about that. =(

@bergerhoffer
Copy link
Contributor

@JStickler @neal-timpe Not sure if you guys have ever looked at using git commend, but it saves you from ever having to squash. When you use that you don't ever have more than 1 commit, so you don't have to worry about squashing at the end. Just ping me if you have any questions on it if you're interested.

@JStickler
Copy link
Contributor Author

@bergerhoffer I love git commend, but I've had requests from my devs and QE not to use it because it makes subsequent reviews harder for them.

@bergerhoffer
Copy link
Contributor

@JStickler ack. Typically though you should be able to click on the "force-pushed" link when you do comment/force push to the PR, which adds an entry like below:

 @bergerhoffer bergerhoffer force-pushed the bergerhoffer:BZ-1907402-startup-probes branch from 7b6ecae to e7dbb96 8 days ago 

And then the "force-pushed" word is a link that shows you just the diff of what you just pushed. Just in case they weren't aware of that option.

@JStickler JStickler deleted the OSSMDOC-153 branch July 28, 2021 02:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

service-mesh Label for all Service Mesh PRs size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.