
NO_DUPLICATES causes deadlock as multiple tasks try to insert data into same global temp table #49

Closed · ankitbko opened this issue Sep 5, 2020 · 10 comments

@ankitbko (Member) commented Sep 5, 2020

Because the temp table name is unique per worker for a given target table (not per task), multiple tasks executing on the same worker insert into the same global temp table and deadlock.

(screenshot of the deadlock attached)
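For illustration, here is a minimal sketch of the collision (the naming scheme below is a hypothetical reconstruction, not the connector's actual code):

# Hypothetical reconstruction of the problem: the staging-table name depends
# only on the application and the target table, NOT on the task doing the insert.
def staging_table_name(app_id, target_table):
    return f"##{app_id}_{target_table}"

# Two tasks on the same worker writing to the same target table resolve to
# the same global temp table, so their bulk inserts contend and can deadlock.
assert staging_table_name("app-123", "dbo.Sales") == staging_table_name("app-123", "dbo.Sales")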

@pramodnagare commented

Hi @ankitbko, try adding the option below; it will solve your table lock issue.

option("tableLock", True)

@ankitbko (Member, Author) commented Oct 9, 2020

@pramodnagare The tableLock option is meant to be used with heap tables only. It is already set to true automatically for staging tables:

val newOptions = new SQLServerBulkJdbcOptions(options.parameters + ("tableLock" -> "true"))

cc: @shivsood

@bunkersdev commented Oct 12, 2020

Hello, I am encountering the same issue. I need to use the following options:

.option("schemaCheckEnabled", False) \
.option("reliabilityLevel", "NO_DUPLICATES") \

@shivsood

@bunkersdev commented

Any update on this?

@rajmera3 added the bug label Mar 1, 2021
@shivsood (Collaborator) commented Jun 7, 2021

@ankitbko does this happen only on Databricks? Can you send repro steps and your scripts for us to reproduce this? @luxu1-ms as FYI.

@gerardwolf commented Aug 23, 2021

I also encounter this issue when running multiple instances of the same notebook on the same cluster (DBR 7.3 LTS) using Azure Databricks. My notebook attempts to load data from a Delta Lake table source and overwrite the data in an existing Azure SQL Database table using the following options:

try:
    dfSource.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("overwrite") \
        .option("url", url) \
        .option("dbtable", sinkTable) \
        .option("user", userName) \
        .option("password", password) \
        .option("truncate", "true") \
        .option("tablock", "true") \
        .option("schemaCheckEnabled", "false") \
        .option("batchsize", batchsize) \
        .option("reliabilityLevel", "NO_DUPLICATES") \
        .save()
except ValueError as error:
    print("Connector write failed", error)

The intent is to preserve the table definition (and indexes) in the sink database. I realise the tablock is only required for heaps, but this is a generic notebook that loads to both indexed and heap tables.
Importantly, I don't get an error when I run the notebook in isolation, so it appears that there is some shared global temp table that is causing deadlocks in the parallel execution scenario.

@luxu1-ms (Collaborator) commented Aug 23, 2021

I think @gerardwolf's issue is different from #49, but related to #132.

@gerardwolf commented

It certainly looks like it is caused by the name of the temp staging table not being granular enough. Adding the sink table name to the temp table name should resolve this, yes.
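A sketch of that suggestion (hypothetical naming, not the connector's actual implementation): derive the staging-table name from the sink table, and ideally something task-specific, so concurrent writers stop sharing one table:

# Hypothetical fix: include the sink table (and a task/partition id) in the
# global temp table name so concurrent writes each get their own staging table.
def staging_table_name(app_id, sink_table, partition_id):
    safe_sink = sink_table.replace(".", "_")
    return f"##{app_id}_{safe_sink}_{partition_id}"

# Different sink tables (or different tasks) now get distinct temp tables:
assert staging_table_name("app-123", "dbo.Sales", 0) != staging_table_name("app-123", "dbo.Orders", 0)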

@luxu1-ms (Collaborator) commented

@ankitbko Could you please provide repro scripts? I am not able to repro this issue.

@luxu1-ms added the enhancement label and removed the bug label Sep 27, 2021
@luxu1-ms (Collaborator) commented Oct 4, 2021

Closing this issue since no more info was provided and we cannot repro it. Please reopen if more info becomes available.

@luxu1-ms closed this as completed Oct 4, 2021