Spark Data Connector cannot connect to locally running Spark cluster #1251

Closed
phillipleblanc opened this issue May 1, 2024 · 1 comment
Labels
kind/bug Something isn't working

@phillipleblanc
Contributor

Describe the bug
Configuring a dataset to use the Spark Data Connector with the spark_remote parameter set to sc://localhost:15002 (a Spark Connect server running locally) results in a transport error.

Spice.ai runtime starting...
2024-05-01T00:19:11.836924Z  INFO spiced: Metrics listening on 127.0.0.1:9000
2024-05-01T00:19:12.018627Z  WARN runtime: Failed to get data connector from source for dataset traces: Unable to initialize data connector spark: transport error

This is the Spicepod I used:

version: v1beta1
kind: Spicepod
name: spark

datasets:
- from: spark:traces
  name: traces
  params:
    spark_remote: sc://localhost:15002

To Reproduce
Follow this quickstart to start a Spark Connect cluster locally: https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html

Use pyspark to register a table named traces from a Parquet file:

from pyspark.sql import SparkSession
SparkSession.builder.master("local[*]").getOrCreate().stop()
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
# Replace the path with a path to a parquet file on your system
spark.catalog.createTable("traces", path="/path/to/traces.parquet", source="parquet")

Start the Spice runtime with the above Spicepod, which connects to the Spark cluster, and observe the transport error in the runtime log along with this error from Spark:

24/05/01 09:19:13 INFO connections: Transport failed io.netty.handler.codec.http2.Http2Exception: HTTP/2 client preface string missing or corrupt. Hex dump for received bytes: 16030100f0010000ec03033053ae382b06852dedb277ee05

Expected behavior
The connection is made and I can query my table.

Additional context
I believe this is because we try to connect over TLS by default. We should provide an option to bypass TLS for this local scenario.
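
The hex dump in the Spark log is consistent with that guess. As a rough sketch (the helper name `looks_like_tls_client_hello` is made up for illustration, and the check is only a heuristic on the first few bytes), the received bytes can be compared against the fixed HTTP/2 client preface and against the header layout of a TLS handshake record:

```python
# Bytes reported by Spark in the Http2Exception above (as printed in the log).
received = bytes.fromhex("16030100f0010000ec03033053ae382b06852dedb277ee05")

# A plaintext HTTP/2 connection is expected to open with this fixed client preface.
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def looks_like_tls_client_hello(data: bytes) -> bool:
    """Heuristic only (hypothetical helper): checks the TLS record header.

    Byte 0 is the record type (0x16 = handshake), byte 1 is the major
    protocol version (0x03), and byte 5 is the handshake message type
    (0x01 = ClientHello).
    """
    return (
        len(data) >= 6
        and data[0] == 0x16
        and data[1] == 0x03
        and data[5] == 0x01
    )


# Does the data start like an HTTP/2 client preface?
is_http2_preface = received.startswith(HTTP2_PREFACE[: len(received)])

# Length field of the TLS record (bytes 3-4, big-endian).
record_length = int.from_bytes(received[3:5], "big")

print("HTTP/2 preface:", is_http2_preface)
print("TLS ClientHello:", looks_like_tls_client_hello(received))
print("TLS record length:", record_length)
```

If the second check holds, the client opened the connection with a TLS handshake while the Spark Connect server on port 15002 was expecting plaintext HTTP/2, which would explain the "client preface string missing or corrupt" message.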

@Sevenannn
Contributor

Closed by #1439

5 participants