Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout on acquiring lock when insert duplicated record in distributed mode #7391

Closed
rayxai opened this issue May 3, 2017 · 19 comments
Closed
Assignees
Labels
Milestone

Comments

@rayxai
Copy link

rayxai commented May 3, 2017

OrientDB Version: 2.2.18

Java Version: 1.8.0_73

OS: el6

issue:

when inserting duplicated record (with unique index) in distributed mode, have following exception:
ODistributedRecordLockedException: Timeout (3938ms) on acquiring lock on record #13:1 on server 'node2'. It is locked by request 0.14
For version 2.1.x, OrientDB will throw ORecordDuplicatedException, which I think make sense. because I can handle it immediately without delay.
why v2.2.x does not throw ORecordDuplicatedException, did I miss something?

Expected behavior

should throw ORecordDuplicatedException

Actual behavior

ODistributedRecordLockedException: Timeout (3938ms) on acquiring lock on record #13:1 on server 'node2'. It is locked by request 0.14

Steps to reproduce

  1. create unique index:
    drop class orion_v unsafe;
    create class orion_v extends v;
    create property orion_v.pk string;
    alter property orion_v.pk mandatory true;
    drop index idx_orion_v_pk;
    create index idx_orion_v_pk on orion_v(pk) unique_hash_index;

  2. run java code:
    String pk1="11";
    graph.addVertex("class:orion_v","pk",pk1);
    graph.addVertex("class:orion_v","pk",pk1);
    graph.commit();

@lvca
Copy link
Member

lvca commented May 3, 2017

In your pseudo code to reproduce the problem I don't see any edge. It's weird that you have a ODistributedRecordLockedException with just adding vertices. If this is the case, can you please provide a test case?

@lvca lvca self-assigned this May 3, 2017
@rayxai
Copy link
Author

rayxai commented May 4, 2017

yes, no need edge. I just tested it on win 10, same exception.

@Test
public void testDserver() throws Exception {
	OrientGraphFactory factory = new OrientGraphFactory("remote:localhost:2424/db1" , "admin", "admin").setupPool(1,50);
	OrientBaseGraph graph=factory.getTx();
	String pk1="11";
	graph.command(new OCommandSQL("drop class v1 unsafe")).execute();
	graph.command(new OCommandSQL("create class v1 extends v")).execute();
	graph.command(new OCommandSQL("create property v1.pk string")).execute();
	graph.command(new OCommandSQL("alter property v1.pk mandatory true")).execute();
	graph.command(new OCommandSQL("drop index idx_v1_pk")).execute();
	graph.command(new OCommandSQL("create index idx_v1_pk on v1(pk) unique_hash_index")).execute();
	graph.addVertex("class:v1","pk",pk1);
	graph.addVertex("class:v1","pk",pk1);
	graph.commit();
	graph.shutdown();
}

@lvca
Copy link
Member

lvca commented May 4, 2017

What record is #13:1 ? Can you print here the result of select from #13:1?

@rayxai
Copy link
Author

rayxai commented May 4, 2017

the result is empty, seem orientdb server catch the ORecordDuplicatedException and commit fail.
output of above test case is :

com.orientechnologies.orient.server.distributed.task.ODistributedRecordLockedException: Timeout (3963ms) on acquiring lock on record #22:0 on server 'node1'. It is locked by request 0.7
	DB name="db1"
	DB name="db1"
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.lang.reflect.Constructor.newInstance(Unknown Source)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:442)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:393)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:275)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167)
	at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2225)
	at com.orientechnologies.orient.client.remote.OStorageRemote$28.execute(OStorageRemote.java:1396)
	at com.orientechnologies.orient.client.remote.OStorageRemote$28.execute(OStorageRemote.java:1368)
	at com.orientechnologies.orient.client.remote.OStorageRemote$2.execute(OStorageRemote.java:198)
	at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:243)
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetry(OStorageRemote.java:195)
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperation(OStorageRemote.java:206)
	at com.orientechnologies.orient.client.remote.OStorageRemote.commit(OStorageRemote.java:1368)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:533)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:104)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2840)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2809)
	at com.tinkerpop.blueprints.impls.orient.OrientTransactionalGraph.commit(OrientTransactionalGraph.java:182)
	at com.smy.util.OrientDBUtilTest.testDserver(OrientDBUtilTest.java:469)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

@rayxai
Copy link
Author

rayxai commented May 4, 2017

additionally, there is following log in orient-server.log.0 after changing log level to FINE, seems orientdb server catch the ORecordDuplicatedException but not unlock the record.

2017-05-04 14:05:45:653 FINE [node1] 76 Selected cluster 21 for class 'v1' from [21, 22, 23, 24] (dCfgVersion=0) [OLocalClusterWrapperStrategy]
2017-05-04 14:05:45:655 FINE [node1] 76 Selected cluster 22 for class 'v1' from [21, 22, 23, 24] (dCfgVersion=0) [OLocalClusterWrapperStrategy]
2017-05-04 14:05:45:659 FINE [node1] Distributed transaction: registered request 0.1 in database 'db1' (thread=76) [ODistributedDatabaseImpl]
2017-05-04 14:05:45:662 FINE [node1] Locked record #21:-1 in database 'db1' (reqId=0.1 thread=76) [ODistributedDatabaseImpl]
2017-05-04 14:05:45:665 FINE [node1] Locked record #22:-1 in database 'db1' (reqId=0.1 thread=76) [ODistributedDatabaseImpl]
2017-05-04 14:05:45:670 FINE 76 Committing transaction 1 on database 'db1' (items=2)... [OLocalPaginatedStorage]
2017-05-04 14:05:45:678 FINE 76 Created record #21:0 v.1 size=13 bytes [OLocalPaginatedStorage]
2017-05-04 14:05:45:679 FINE 76 Created record #22:0 v.1 size=13 bytes [OLocalPaginatedStorage]Error during transaction commit, transaction will be rolled back (tx-id=1)
com.orientechnologies.orient.core.storage.ORecordDuplicatedException: Cannot index record #22:0: found duplicated key '11' in index 'idx_v1_pk' previously assigned to the record #21:0
DB name="db1" INDEX=idx_v1_pk RID=#21:0
at com.orientechnologies.orient.core.index.OIndexUnique$1.validate(OIndexUnique.java:47)
at com.orientechnologies.orient.core.index.OIndexUnique$1.validate(OIndexUnique.java:37)
at com.orientechnologies.orient.core.index.hashindex.local.OLocalHashTable.doPut(OLocalHashTable.java:1539)
at com.orientechnologies.orient.core.index.hashindex.local.OLocalHashTable.put(OLocalHashTable.java:1459)
at com.orientechnologies.orient.core.index.hashindex.local.OLocalHashTable.validatedPut(OLocalHashTable.java:412)
at com.orientechnologies.orient.core.index.engine.OHashTableIndexEngine.validatedPut(OHashTableIndexEngine.java:156)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doValidatedPutIndexValue(OAbstractPaginatedStorage.java:1990)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.validatedPutIndexValue(OAbstractPaginatedStorage.java:1966)
at com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:82)
at com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:35)
at com.orientechnologies.orient.core.index.OIndexAbstract.putInSnapshot(OIndexAbstract.java:960)
at com.orientechnologies.orient.core.index.OIndexAbstract.applyIndexTxEntry(OIndexAbstract.java:794)
at com.orientechnologies.orient.core.index.OIndexAbstract.addTxOperation(OIndexAbstract.java:766)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:1479)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:1443)
at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage.commit(ODistributedStorage.java:1380)
at com.orientechnologies.orient.server.distributed.impl.ODistributedTransactionManager$1.call(ODistributedTransactionManager.java:116)
at com.orientechnologies.orient.core.db.OScenarioThreadLocal.executeAsDistributed(OScenarioThreadLocal.java:70)
at com.orientechnologies.orient.server.distributed.impl.ODistributedTransactionManager.commit(ODistributedTransactionManager.java:113)
at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage.commit(ODistributedStorage.java:1410)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:539)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:104)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2840)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2809)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.commit(ONetworkProtocolBinary.java:1343)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkProtocolBinary.java:601)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.sessionRequest(ONetworkProtocolBinary.java:336)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.execute(ONetworkProtocolBinary.java:200)
at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:77)

@lvca
Copy link
Member

lvca commented May 4, 2017

This is interesting. So the problem is on duplicated key in the same distributed transaction?

@rayxai
Copy link
Author

rayxai commented May 4, 2017

yes

@lvca
Copy link
Member

lvca commented May 4, 2017

In branch 2.2.x we improved the management of timeouts with distributed. Could you please try with 2.2.20-SNAPSHOT if the error persists?

@rayxai
Copy link
Author

rayxai commented May 4, 2017

I tried 2.2.20-SNAPSHOT from git, the code stuck at line "graph.commit()", still failed.

@Test
public void testDserver() throws Exception {
	OrientGraphFactory factory = new OrientGraphFactory("remote:localhost:2424/db1" , "admin", "admin").setupPool(1,50);
	OrientBaseGraph graph=factory.getTx();
	String pk1="11";
	graph.command(new OCommandSQL("drop class v1 unsafe")).execute();
	graph.command(new OCommandSQL("create class v1 extends v")).execute();
	graph.command(new OCommandSQL("create property v1.pk string")).execute();
	graph.command(new OCommandSQL("alter property v1.pk mandatory true")).execute();
	graph.command(new OCommandSQL("drop index idx_v1_pk")).execute();
	graph.command(new OCommandSQL("create index idx_v1_pk on v1(pk) unique_hash_index")).execute();
	graph.addVertex("class:v1","pk",pk1);
	graph.addVertex("class:v1","pk",pk1);
	graph.commit();
	graph.shutdown();
}

@lvca
Copy link
Member

lvca commented May 19, 2017

We pushed a fix a few hours ago that seems the very same bug. Could you please build an OrientDB distribution from 2.2.x and check if this is working now? Thanks in advance.

@rayxai
Copy link
Author

rayxai commented May 19, 2017

I tried to start orientdb with -Ddistributed=true in IDEA, but have following exception, did I miss something.
BTW I can start with -Ddistributed=false, and pass the test case.

> 
> com.orientechnologies.orient.server.distributed.ODistributedStartupException: Error on starting distributed plugin
> 	at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.startup(OHazelcastPlugin.java:309)
> 	at com.orientechnologies.orient.server.OServer.registerPlugins(OServer.java:1242)
> 	at com.orientechnologies.orient.server.OServer.activate(OServer.java:397)
> 	at com.orientechnologies.orient.server.OServerMain$1.run(OServerMain.java:46)
> Caused by: com.orientechnologies.orient.core.exception.OStorageException: Cannot open local storage 'D:/1ray/tool/database/graph/OrientDB/orientdb-2.2.x/databases/db1' with mode=rw
> 	DB name="db1"
> 	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:298)
> 	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:258)
> 	at com.orientechnologies.orient.server.OServer.openDatabaseBypassingSecurity(OServer.java:980)
> 	at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:947)
> 	at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:932)
> 	at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.reassignClustersOwnership(ODistributedAbstractPlugin.java:727)
> 	at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin$6.call(OHazelcastPlugin.java:740)
> 	at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin$6.call(OHazelcastPlugin.java:727)
> 	at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.executeInDistributedDatabaseLock(ODistributedAbstractPlugin.java:1643)
> 	at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.loadLocalDatabases(OHazelcastPlugin.java:727)
> 	at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.startup(OHazelcastPlugin.java:254)
> 	... 3 more
> Caused by: com.orientechnologies.orient.core.exception.OStorageException: File with name internal.pcl does not exist in storage db1
> 	DB name="db1"
> 	at com.orientechnologies.orient.core.storage.cache.local.OWOWCache.loadFile(OWOWCache.java:428)
> 	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.openFile(ODurableComponent.java:183)
> 	at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.open(OPaginatedCluster.java:215)
> 	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.openClusters(OAbstractPaginatedStorage.java:353)
> 	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:268)
> 	... 13 more

@lvca
Copy link
Member

lvca commented May 19, 2017

Which release are you using? The error is File with name internal.pcl does not exist in storage db1. It looks like some file is missing?

@rayxai
Copy link
Author

rayxai commented May 19, 2017

I can start it using branch 2.2.18, but after checkout to 2.2.x , it fail to start . with same config.

@rayxai
Copy link
Author

rayxai commented May 19, 2017

the test case passed just now, thanks.
will run other test cases later.

@lvca
Copy link
Member

lvca commented May 19, 2017

Cool, I'm closing this exception, in case the problem persists, please comment this issue, so I can reopen it. Thanks.

@lvca lvca closed this as completed May 19, 2017
@sblommers
Copy link

Just to confirm.

I had this same problem with Object API in distributed mode, using 2.2.20 it did not throw me this exception and halts everything, in my case it deadlocks without a timeout. After building 2.2.21 snapshot all is fine

@robfrank robfrank modified the milestones: 2.2.x (next hotfix), 2.2.21 May 31, 2017
@robfrank robfrank modified the milestones: 2.2.21, 2.2.x (next hotfix) May 31, 2017
@giggs131400
Copy link

I'm using orientdb 2.22 in distribute environment, when i use concurrent thread to write the data and it occurs the exception
Error: com.orientechnologies.orient.server.distributed.task.ODistributedRecordLockedException: Timeout (5000ms) on acquiring lock on record #46:0 on server 'node75'. It is locked by request 0.324309
DB name="graphics"
DB name="graphics"

below is my code:

i will create the sql by my java code, and then using java api to execute it.

java code:
OrientGraph graph = getOrientGraphTx();
graph.command(new OCommandScript(sql)).execute();

sql:

BEGIN

LET a0 = CREATE EDGE tic FROM #17:0 TO #52:0

LET a1 = CREATE EDGE trip FROM #17:0 TO #28:0

LET a2 = CREATE EDGE tmtel FROM #17:0 TO #59:0

COMMIT RETRY 0

RETURN [$a0, $a1, $a2]

I have some questions:

a. is that possible if a connection disconnected, and the record still locked?

b. how long the lock will be released? is there any way to unlock the recored?

c. Is there any better way to avoid such exceptipn?

d. if it is caused by a deallock, how can i config it to found it?

Thanks

@arpitaagarwal
Copy link

I am using Orient 3.0.10 and facing same issue on concurrent write. I have created following setup -

  • 2 node cluster
  • copied orientdb folder at 2 different locations and started ./dserver.sh file. Both nodes connected to each other through multicast
  • Using Person-FriendOf example code to write 10k Person-FriendOf-Person (vertex-edge-vertex). There are multiple threads writing same data each.
  • Person:name has unique index

It works well for 2 threads but as number of threads increases, it times out.
Getting following logs - Timeout (1244ms) on waiting for synchronous responses from nodes=[node2, node1] responsesSoFar=[node2] request=(id=1.105226 task=TxPhase1) [ODistributedDatabaseImpl]

Any clues?

@jonsalvas
Copy link

Same issue in 3.0.13. Any ideas how we can fix this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

8 participants