spark_read_jdbc returns columns with quotes instead of backticks #3196

Open
crogers923 opened this issue Nov 1, 2021 · 2 comments


@crogers923

spark_read_jdbc quotes column names with double quotes instead of backticks in the query it generates. This causes the query to return the column names as string literals instead of the data.

sc <- spark_connect(master = "yarn")
spark_read_jdbc(sc, "tbl_nm", options = list(
  url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;..",
  driver = "com.simba.googlebigquery.jdbc.Driver",
  dbtable = "project.dataset.tbl_nm",
  partitionColumn = "COLUMN1",
  lowerBound = 1,
  upperBound = 100,
  numPartitions = 100
))

Generates (as an example):
SELECT "COLUMN1", "COLUMN2", "COLUMN3" FROM project.dataset.tbl_nm where "COLUMN1" >= 1;

This results in the error: "No matching signature for operator >= for argument types: STRING, INT64", because BigQuery standard SQL treats double-quoted tokens as string literals rather than column identifiers.

This should instead be:
SELECT `COLUMN1`, `COLUMN2`, `COLUMN3` FROM project.dataset.tbl_nm where `COLUMN1` >= 1;
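
One possible workaround, not something sparklyr provides out of the box: Spark's JDBC source delegates identifier quoting to a JdbcDialect, and since no built-in dialect matches the Simba BigQuery driver, the default double-quote behaviour is used. A minimal Scala sketch of a dialect that quotes with backticks might look like the following (the object name BigQueryDialect is made up, and it assumes the driver's URLs start with jdbc:bigquery; it has to run on the JVM side, e.g. pasted into spark-shell or compiled into a jar on the cluster classpath, before the read):

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object BigQueryDialect extends JdbcDialect {
  // Apply this dialect to Simba BigQuery JDBC URLs
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:bigquery")

  // Quote identifiers with backticks (BigQuery standard SQL) instead of
  // Spark's default double quotes
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

// Must be registered before spark_read_jdbc() runs
JdbcDialects.registerDialect(BigQueryDialect)

With such a dialect registered, the generated query should quote the partition column as `COLUMN1`, so the predicate compares the column value rather than a string literal.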

@gboyega1

gboyega1 commented Jul 13, 2022

Hello. How were you able to implement this connectivity between sparklyr and BigQuery? I keep getting the following error:
Error: java.lang.NoSuchMethodError: 'java.util.List com.google.common.base.Splitter.splitToList(java.lang.CharSequence)'
Do you have any idea what I might be doing wrong?

@gboyega1

Hello again. I found that the Guava jar versions used by Spark and the Simba BigQuery connector are different (14.0.1 and 31.1 respectively), so I had to replace the one shipped with Spark. I've also since had to replace or add a few other jars in the Spark jars folder. Now I have the following error message:

Failed to fetch data: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 24.0 failed 4 times, most recent failure: Lost task 0.3 in stage 24.0 (TID 39) (cluster-xxx.location-yyy.project-zzz.internal executor 2): java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.
  at com.simba.googlebigquery.exceptions.ExceptionConverter.toSQLException(Unknown Source)
  at com.simba.googlebigquery.utilities.conversion.TypeConverter.toLong(Unknown Source)
  at com.simba.googlebigquery.jdbc.common.SForwardResultSet.getLong(Unknown Source)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$9(JdbcUtils.scala:446)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$9$adapted(JdbcUtils.scala:445)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:367)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:349)
  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
  at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:131)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:750)

I've read a few blogs suggesting that "implementing a custom JDBC dialect" would solve this; unfortunately, I have no idea how that is done, and my coding ability is limited to R, some Python and SQL. I wonder if you have a more straightforward workaround, or can explain how to go about implementing the custom JDBC dialect.
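
For reference, a custom JDBC dialect is a small Scala object registered with Spark. The general shape is sketched below, hedged heavily: BigQueryTypeDialect is a made-up name, and the right type mapping depends on the BigQuery type of the column that fails the toLong conversion, so the NUMERIC-to-DecimalType(38, 9) line is only an illustrative guess.

import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

object BigQueryTypeDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:bigquery")

  // Tell Spark which Catalyst type to use for a JDBC type reported by the
  // driver, so it stops calling getLong on a column that isn't a long.
  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    sqlType match {
      case Types.NUMERIC => Some(DecimalType(38, 9)) // illustrative mapping only
      case _             => None                     // fall back to Spark's defaults
    }
}

JdbcDialects.registerDialect(BigQueryTypeDialect)

As far as I know this has to be compiled into a jar (or pasted into spark-shell) on the JVM side; there is no pure-R way to define the dialect, but once it is on the cluster classpath the sparklyr code itself stays unchanged.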
