
NO_DUPLICATES causes deadlock as multiple tasks try to insert data into same global temp table #49

Closed · ankitbko opened this issue Sep 5, 2020 · 10 comments

@ankitbko (Member) commented Sep 5, 2020

Because the temp table name is unique per worker for a given target table (not per task), multiple tasks executing on the same worker insert into the same global temp table and deadlock.

(screenshot of the deadlock attached)
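For illustration, here is a minimal sketch of the collision (the naming scheme below is a hypothetical reconstruction, not the connector's actual code):

# Hypothetical reconstruction of the problem: the staging-table name depends
# only on the application and the target table, NOT on the task doing the insert.
def staging_table_name(app_id, target_table):
    return f"##{app_id}_{target_table}"

# Two tasks on the same worker writing to the same target table resolve to
# the same global temp table, so their bulk inserts contend and can deadlock.
assert staging_table_name("app-123", "dbo.Sales") == staging_table_name("app-123", "dbo.Sales")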

@pramodnagare commented

Hi @ankitbko, try adding the option below; it will solve your table lock issue.

option("tableLock", True)

@ankitbko (Member, Author) commented Oct 9, 2020

@pramodnagare The tableLock option is meant to be used with heap tables only. It is already set to true automatically for staging tables:

val newOptions = new SQLServerBulkJdbcOptions(options.parameters + ("tableLock" -> "true"))

cc: @shivsood

@bunkersdev commented Oct 12, 2020

Hello, I am encountering the same issue. I need to use the following options:

.option("schemaCheckEnabled", False) \
.option("reliabilityLevel", "NO_DUPLICATES") \

@shivsood

@bunkersdev commented

Any update on this?

@rajmera3 added the bug label Mar 1, 2021
@shivsood (Collaborator) commented Jun 7, 2021

@ankitbko does this happen only on Databricks? Can you send repro steps and your scripts for us to reproduce this? @luxu1-ms as FYI.

@gerardwolf commented Aug 23, 2021

I also encounter this issue when running multiple instances of the same notebook on the same cluster (DBR 7.3 LTS) using Azure Databricks. My notebook attempts to load data from a Delta Lake table source and overwrite the data in an existing Azure SQL Database table using the following options:

try:
    dfSource.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("overwrite") \
        .option("url", url) \
        .option("dbtable", sinkTable) \
        .option("user", userName) \
        .option("password", password) \
        .option("truncate", "true") \
        .option("tablock", "true") \
        .option("schemaCheckEnabled", "false") \
        .option("batchsize", batchsize) \
        .option("reliabilityLevel", "NO_DUPLICATES") \
        .save()
except ValueError as error:
    print("Connector write failed", error)

The intent is to preserve the table definition (and indexes) in the sink database. I realise the tablock is only required for heaps, but this is a generic notebook that loads to both indexed and heap tables.
Importantly, I don't get an error when I run the notebook in isolation, so it appears that there is some shared global temp table that is causing deadlocks in the parallel execution scenario.

@luxu1-ms (Collaborator) commented Aug 23, 2021

I think @gerardwolf's issue is different from #49, but related to #132.

@gerardwolf commented

It certainly looks like it is caused by the name of the temp staging table not being granular enough. Adding the sink table name to the temp table name should resolve this, yes.
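A sketch of that suggestion (hypothetical naming, not the connector's actual implementation): derive the staging-table name from the sink table, and ideally something task-specific, so concurrent writers stop sharing one table:

# Hypothetical fix: include the sink table (and a task/partition id) in the
# global temp table name so concurrent writes each get their own staging table.
def staging_table_name(app_id, sink_table, partition_id):
    safe_sink = sink_table.replace(".", "_")
    return f"##{app_id}_{safe_sink}_{partition_id}"

# Different sink tables (or different tasks) now get distinct temp tables:
assert staging_table_name("app-123", "dbo.Sales", 0) != staging_table_name("app-123", "dbo.Orders", 0)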

@luxu1-ms (Collaborator) commented

@ankitbko Could you please provide repro scripts? I am not able to repro this issue.

@luxu1-ms added the enhancement label and removed the bug label Sep 27, 2021
@luxu1-ms (Collaborator) commented Oct 4, 2021

Closing this issue since no more info was provided and we cannot repro it. Please reopen if more info becomes available.

@luxu1-ms closed this as completed Oct 4, 2021