
[Bug]: Spark no longer works with new charts that use elasticsearch 8+ #574

Open · Stevenpc3 opened this issue on May 22, 2024 · 5 comments
Labels: bug (Something isn't working)

@Stevenpc3 (Contributor)

What happened?

As a user of the "system architecture" tab in Jaeger, I would like to use Spark to generate the diagrams.

However, Spark no longer works with the new charts that use Elasticsearch 8+.

The Spark job fails with logs stating that its bundled ES-Hadoop client supports Elasticsearch 7 at most.

Steps to reproduce

  1. Deploy using the new charts with Elasticsearch 8+
  2. Produce traces that are written to Elasticsearch
  3. Run a Spark job (see the sketch after this list)
  4. Check the logs of the Spark job for errors
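
For quick verification, the Spark job can be triggered manually instead of waiting for the nightly schedule. A minimal sketch, assuming the chart created a CronJob named jaeger-spark (the actual name depends on your Helm release, so adjust accordingly):

# Create a one-off Job from the existing CronJob (jaeger-spark is an assumed name)
kubectl create job spark-manual --from=cronjob/jaeger-spark
# Follow its logs and look for the ES-Hadoop version error
kubectl logs -f job/spark-manual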

Expected behavior

The Spark job completes as it did before.

Relevant log output

24/05/22 15:41:20 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2024-05-22T00:00Z, reading from jaeger-span-2024-05-22 index, result storing to jaeger-dependencies-2024-05-22
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
        at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:220)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:79)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:78)
        at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:285)
        at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:691)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.groupBy(RDD.scala:690)
        at org.apache.spark.api.java.JavaRDDLike$class.groupBy(JavaRDDLike.scala:243)
        at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:236)
        at io.jaegertracing.spark.dependencies.elastic.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:212)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.run(DependenciesSparkJob.java:54)
        at io.jaegertracing.spark.dependencies.DependenciesSparkJob.main(DependenciesSparkJob.java:40)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unsupported/Unknown Elasticsearch version [8.13.2].Highest supported version is [7.x]. You may need to upgrade ES-Hadoop.
        at org.elasticsearch.hadoop.util.EsMajorVersion.parse(EsMajorVersion.java:91)
        at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:746)
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
        ... 33 more
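
The root cause is in the last frames: the ES-Hadoop client bundled with the Spark job reads the cluster version from Elasticsearch's root endpoint and rejects anything newer than 7.x. One way to confirm which version the job sees, as a sketch run from a pod inside the cluster (the service name jaeger-elasticsearch comes from the fullnameOverride in the config below):

# The root endpoint reports the cluster version, e.g. "number" : "8.13.2"
curl -s http://jaeger-elasticsearch:9200 | grep '"number"'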

Screenshot

No response

Additional context

No response

Jaeger backend version

3.0.7

SDK

No response

Pipeline

No response

Storage backend

Elasticsearch 8+

Operating system

Linux

Deployment model

Kubernetes

Deployment configs

# -- enable or disable Jaeger
enabled: true

storage:
  type: elasticsearch
  elasticsearch:
    # make this a template that decides based on devMode and can configure properly
    host: "jaeger-elasticsearch"
    usePassword: false
    antiAffinity: "soft"

# -- Preferred long term backend storage
elasticsearch:
  master:
    masterOnly: false
    replicaCount: 1
    lifecycleHooks:
      postStart:
        exec:
          command:
            - bash
            - -c
            - |
              #!/bin/bash
              # Add a template to adjust number of shards/replicas
              TEMPLATE_NAME=no_replicas
              # INDEX_PATTERN1="jaeger-span-*"
              # INDEX_PATTERN2="jaeger-service-*"
              INDEX_PATTERN1="jaeger-dependencies-*"
              ES_URL=http://localhost:9200
              while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
              curl -XPUT "$ES_URL/_index_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN1"\"'],"template":{"settings":{"number_of_replicas":"0"}}}'
  data:
    replicaCount: 0
  coordinating:
    replicaCount: 0
  ingest:
    replicaCount: 0
  fullnameOverride: "jaeger-elasticsearch"
  volumeClaimTemplate:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 3Gi

# -- For support with older Trace formats
agent:
    enabled: false

# -- The backend storage type to use
provisionDataStore:
  cassandra: false
  elasticsearch: true
  kafka: false

# -- The service that collects and serves trace information
collector:
  service:
    otlp:
      grpc:
        port: 4317
        name: oltp-grpc
      http:
        port: 4318
        name: oltp-http
  cmdlineParams:
    es.num-replicas: "0"

# -- The Jaeger UI service
query:
  agentSidecar:
    enabled: false
  # -- This should start with a /
  basePath: /jaeger

# Jaeger Spark job to generate the system architecture
spark:
  enabled: true
  schedule: "00 21 * * *"
Stevenpc3 added the bug (Something isn't working) label on May 22, 2024
@Stevenpc3 (Contributor, Author) commented on May 22, 2024

The registry in the chart needs to be updated based on this comment: #532 (comment).

The correct registry should be made part of the chart.

@Stevenpc3 (Contributor, Author)

@dpericaxon @yurishkuro Why is spark-dependencies hosted on GitHub's container registry while the rest are hosted on Docker Hub? https://github.com/orgs/jaegertracing/packages

That is a bit confusing, especially since there is one on Docker Hub that is claimed to be outdated via jaegertracing/spark-dependencies#137 (comment).

@yurishkuro (Member)

I don't know how/why that decision was made. I agree it would've been better to use the same Docker and Quay hosting we use for other images.

@yurishkuro (Member)

I updated the README for spark-dependencies. I think this Helm chart should be pointing to a different location too:

repository: jaegertracing/spark-dependencies

@Stevenpc3 (Contributor, Author) commented on May 28, 2024

Yeah, that link to the values is what I meant in #574 (comment).

I can make a PR. I think just setting the registry and repo to default to ghcr.io will be fine. I did this locally, since we use global.imageRegistry; then it will work out of the box for others.
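
A minimal values override along those lines, as a sketch (the spark.image keys and the tag here are assumptions; verify the exact key names against the chart's values.yaml):

spark:
  enabled: true
  image:
    # Point at the GitHub Container Registry location referenced above
    registry: ghcr.io
    repository: jaegertracing/spark-dependencies
    tag: latest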
