This repository has been archived by the owner on May 18, 2020. It is now read-only.

Traces randomly appearing/disappearing from web UI query #95

Closed
eroji opened this issue Jun 27, 2018 · 4 comments

Comments

@eroji

eroji commented Jun 27, 2018

Problem - what in Jaeger blocks you from solving the requirement?

Traces are randomly appearing and disappearing in the web UI if I try to perform searches for all operations.

Any open questions to address

I took the production deployment and modified it to suit my needs, with a Cassandra cluster as the persistent storage. However, the traces coming from my applications intermittently appear and disappear in the web UI, and I can't identify the cause. Below are the YAML configs I currently have running. They run within a 'jaeger' namespace, with an Nginx ingress for the query service. The agent pods are currently accessed via NodePort while we migrate our application to the k8s cluster and convert it into microservices.

Collector:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
  creationTimestamp: 2018-06-13T06:12:38Z
  generation: 3
  labels:
    app: jaeger
    jaeger-infra: collector-deployment
  name: jaeger-collector
  namespace: jaeger
  resourceVersion: "6435888"
  selfLink: /apis/apps/v1/namespaces/jaeger/deployments/jaeger-collector
  uid: c5dd174c-6ed0-11e8-8721-0050568f492d
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: jaeger
      jaeger-infra: collector-pod
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: jaeger
        jaeger-infra: collector-pod
    spec:
      containers:
      - command:
        - /go/bin/collector-linux
        - --config-file=/conf/collector.yaml
        env:
        - name: SPAN_STORAGE_TYPE
          valueFrom:
            configMapKeyRef:
              key: span-storage-type
              name: jaeger-configuration
        image: jaegertracing/jaeger-collector:1.5.0
        imagePullPolicy: IfNotPresent
        name: jaeger-collector
        ports:
        - containerPort: 14267
          protocol: TCP
        - containerPort: 14268
          protocol: TCP
        - containerPort: 9411
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: jaeger-configuration-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: collector
            path: collector.yaml
          name: jaeger-configuration
        name: jaeger-configuration-volume
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-06-13T06:12:38Z
    lastUpdateTime: 2018-06-27T20:09:16Z
    message: ReplicaSet "jaeger-collector-5c9757d497" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2018-06-27T20:10:04Z
    lastUpdateTime: 2018-06-27T20:10:04Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Query:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: 2018-06-13T06:12:38Z
  generation: 2
  labels:
    app: jaeger
    jaeger-infra: query-deployment
  name: jaeger-query
  namespace: jaeger
  resourceVersion: "6435897"
  selfLink: /apis/apps/v1/namespaces/jaeger/deployments/jaeger-query
  uid: c5f584e0-6ed0-11e8-8721-0050568f492d
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: jaeger
      jaeger-infra: query-pod
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: jaeger
        jaeger-infra: query-pod
    spec:
      containers:
      - command:
        - /go/bin/query-linux
        - --config-file=/conf/query.yaml
        env:
        - name: SPAN_STORAGE_TYPE
          valueFrom:
            configMapKeyRef:
              key: span-storage-type
              name: jaeger-configuration
        image: jaegertracing/jaeger-query:1.5.0
        imagePullPolicy: IfNotPresent
        name: jaeger-query
        ports:
        - containerPort: 16686
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 16686
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: jaeger-configuration-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: query
            path: query.yaml
          name: jaeger-configuration
        name: jaeger-configuration-volume
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-06-13T06:12:38Z
    lastUpdateTime: 2018-06-27T20:09:41Z
    message: ReplicaSet "jaeger-query-bfc859864" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2018-06-27T20:10:04Z
    lastUpdateTime: 2018-06-27T20:10:04Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Agent:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "3"
  creationTimestamp: 2018-06-13T06:12:38Z
  generation: 3
  labels:
    app: jaeger
    jaeger-infra: agent-daemonset
  name: jaeger-agent
  namespace: jaeger
  resourceVersion: "6435952"
  selfLink: /apis/apps/v1/namespaces/jaeger/daemonsets/jaeger-agent
  uid: c607e875-6ed0-11e8-8721-0050568f492d
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: jaeger
      jaeger-infra: agent-instance
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: jaeger
        jaeger-infra: agent-instance
    spec:
      containers:
      - command:
        - /go/bin/agent-linux
        - --config-file=/conf/agent.yaml
        image: jaegertracing/jaeger-agent:1.5.0
        imagePullPolicy: IfNotPresent
        name: agent-instance
        ports:
        - containerPort: 5775
          hostPort: 5775
          protocol: UDP
        - containerPort: 6831
          hostPort: 6831
          protocol: UDP
        - containerPort: 6832
          hostPort: 6832
          protocol: UDP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: jaeger-configuration-volume
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: agent
            path: agent.yaml
          name: jaeger-configuration
        name: jaeger-configuration-volume
  updateStrategy:
    type: OnDelete
status:
  currentNumberScheduled: 6
  desiredNumberScheduled: 6
  numberAvailable: 6
  numberMisscheduled: 0
  numberReady: 6
  observedGeneration: 3
  updatedNumberScheduled: 6

Cassandra:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: 2018-06-20T22:02:16Z
  generation: 3
  labels:
    app: jaeger
  name: cassandra
  namespace: jaeger
  resourceVersion: "6265495"
  selfLink: /apis/apps/v1/namespaces/jaeger/statefulsets/cassandra
  uid: 987b5a14-74d5-11e8-8721-0050568f492d
spec:
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cassandra
  serviceName: cassandra
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: cassandra
        jaeger-infra: cassandra-replica
    spec:
      containers:
      - command:
        - /docker-entrypoint.sh
        - -R
        env:
        - name: MAX_HEAP_SIZE
          value: 1024M
        - name: HEAP_NEWSIZE
          value: 256M
        - name: CASSANDRA_LISTEN_ADDRESS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: CASSANDRA_CLUSTER_NAME
          value: jaeger
        - name: CASSANDRA_DC
          value: dc1
        - name: CASSANDRA_RACK
          value: rack1
        - name: CASSANDRA_ENDPOINT_SNITCH
          value: GossipingPropertyFileSnitch
        - name: CASSANDRA_SEEDS
          value: cassandra-0.cassandra
        image: cassandra:3.11
        imagePullPolicy: Always
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - nodetool drain
        name: cassandra
        ports:
        - containerPort: 7000
          name: intra-node
          protocol: TCP
        - containerPort: 7001
          name: tls-intra-node
          protocol: TCP
        - containerPort: 7199
          name: jmx
          protocol: TCP
        - containerPort: 9042
          name: cql
          protocol: TCP
        - containerPort: 9160
          name: thrift
          protocol: TCP
        resources: {}
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /cassandra_data
          name: cassandra-data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 1800
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: cassandra-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
      storageClassName: fast
    status:
      phase: Pending
status:
  collisionCount: 0
  currentReplicas: 3
  currentRevision: cassandra-64c697f78
  observedGeneration: 3
  readyReplicas: 3
  replicas: 3
  updateRevision: cassandra-64c697f78

Services:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-06-13T06:12:38Z
  labels:
    app: jaeger
    jaeger-infra: collector-service
  name: jaeger-collector
  namespace: jaeger
  resourceVersion: "3682379"
  selfLink: /api/v1/namespaces/jaeger/services/jaeger-collector
  uid: c5e60b13-6ed0-11e8-8721-0050568f492d
spec:
  clusterIP: 10.107.96.143
  ports:
  - name: jaeger-collector-tchannel
    port: 14267
    protocol: TCP
    targetPort: 14267
  - name: jaeger-collector-http
    port: 14268
    protocol: TCP
    targetPort: 14268
  - name: jaeger-collector-zipkin
    port: 9411
    protocol: TCP
    targetPort: 9411
  selector:
    jaeger-infra: collector-pod
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-06-13T05:42:18Z
  labels:
    app: jaeger
    jaeger-infra: agent-service
  name: jaeger-agent
  namespace: jaeger
  resourceVersion: "5481075"
  selfLink: /api/v1/namespaces/jaeger/services/jaeger-agent
  uid: 88c5ad8f-6ecc-11e8-8721-0050568f492d
spec:
  clusterIP: 10.107.104.15
  externalTrafficPolicy: Cluster
  ports:
  - name: jaeger-agent-udp
    nodePort: 30831
    port: 6831
    protocol: UDP
    targetPort: 6831
  selector:
    jaeger-infra: agent-instance
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: NodePort
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-06-11T22:16:44Z
  labels:
    app: jaeger
    jaeger-infra: query-service
  name: jaeger-query
  namespace: jaeger
  resourceVersion: "3436616"
  selfLink: /api/v1/namespaces/jaeger/services/jaeger-query
  uid: 1feaf2b1-6dc5-11e8-8721-0050568f492d
spec:
  clusterIP: 10.101.239.12
  ports:
  - name: jaeger-query
    port: 80
    protocol: TCP
    targetPort: 16686
  selector:
    jaeger-infra: query-pod
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

@eroji changed the title from "Traces randomly disappearing from web UI query" to "Traces randomly appearing/disappearing from web UI query" on Jun 27, 2018

@jpkrohling
Collaborator

Are they really disappearing (not in the database anymore), or are the results just listed in a different order? You can confirm this by loading the page, opening the first few results in separate tabs, then reloading the page. Once one of the first results is missing, refresh the tab where that trace was opened. If you can still load the trace, then it's just the order of the results on the first page that is "wrong".

We've seen this before with Cassandra, where the ordering of the results seems odd and inconsistent between page loads. Also, does this problem happen only when Jaeger is deployed on Kubernetes, or do you think it would also happen outside? If it's more generic, I'd move this issue to the main repository, as this repository is only for the Jaeger templates for Kubernetes.
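
The manual check described above (open the first results, reload, then see whether a "missing" trace still loads) could be scripted roughly as follows. This is only a sketch: `classify_missing` and `trace_exists_http` are hypothetical helper names, and the `/api/traces/<traceID>` path is the internal HTTP endpoint the Jaeger UI talks to, which is an assumption here and not a stable public API.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List


def classify_missing(
    first_ids: Iterable[str],
    second_ids: Iterable[str],
    trace_exists: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Split trace IDs that vanished between two searches into two groups.

    "still_stored": absent from the second result page but still loadable
                    by ID, so only the ordering/paging of results changed.
    "gone":         absent from the second result page and not loadable.
    """
    second = set(second_ids)
    result: Dict[str, List[str]] = {"still_stored": [], "gone": []}
    for trace_id in first_ids:
        if trace_id in second:
            continue  # present in both searches, nothing to classify
        key = "still_stored" if trace_exists(trace_id) else "gone"
        result[key].append(trace_id)
    return result


def trace_exists_http(base_url: str, trace_id: str) -> bool:
    """Look a trace up by ID through the query service (assumed endpoint)."""
    url = f"{base_url.rstrip('/')}/api/traces/{trace_id}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # the query service does not know this trace
        raise
```

In use, the two ID lists would come from two consecutive searches, and the lookup could be `lambda tid: trace_exists_http("http://jaeger-query.jaeger", tid)`, matching the `jaeger-query` service on port 80 shown earlier. An empty "gone" list would point at result ordering and not at data loss.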

@eroji
Author

eroji commented Jun 28, 2018

This can be closed. My attempt to frankenstein the setup seems to have been the root of the issue. I switched to the Helm chart and it appears to be working correctly now. Thank you.

@eroji closed this as completed on Jun 28, 2018

@jpkrohling
Collaborator

Would it be OK for you to share what exactly went wrong? Someone in the future might face the same problem and it would be valuable to them if you share what was wrong and how you fixed it :)

@eroji
Author

eroji commented Jun 29, 2018

I am not entirely sure what was causing the strange behavior, but my initial setup was dropping traces. Of the ones Jaeger did receive, many were incomplete.

I think the "all" operations search, with its default of 20 results, also added to the confusion. With my previous deployment, a search over the last hour returned either nothing or incomplete traces. With the Helm chart, this is better now. The traces look complete; however, I still don't seem to get 20 results. I'm not sure if this is intended.
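
One way to take the UI out of the picture when counting results is to issue the same search directly against the query service with an explicit limit and lookback. The sketch below only builds the URL and counts the traces in a response; `build_search_url` and `count_traces` are hypothetical helpers, and the `service`, `operation`, `limit` and `lookback` parameters and the `data` field are assumptions based on the internal API the UI uses, so they may differ between versions.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    service: str,
    limit: int = 20,
    lookback: str = "1h",
    operation: Optional[str] = None,
) -> str:
    """Build a trace search URL; leaving `operation` unset mirrors "all"."""
    params: Dict[str, Any] = {
        "service": service,
        "limit": limit,
        "lookback": lookback,
    }
    if operation is not None:
        params["operation"] = operation
    return f"{base_url.rstrip('/')}/api/traces?{urlencode(params)}"


def count_traces(payload: Dict[str, Any]) -> int:
    """Count traces in a decoded search response (assumed shape: {"data": [...]})."""
    return len(payload.get("data") or [])
```

Comparing the count for `limit=20` with the count for a larger limit over the same lookback would show whether fewer than 20 traces exist in the window or whether the search itself returns fewer than requested.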
