
replication from compound using python #103

Open
ciklysta opened this issue Jan 17, 2022 · 8 comments

@ciklysta

Bug Report

iRODS version 4.2.11, CentOS 7

I have the following resource hierarchy (OldResource being an old resource that I want to migrate data from):

DiskResource:unixfilesystem
OldResource:compound
├── OldArchiveResource:univmss
└── OldCacheResource:unixfilesystem

Data is only in the archive, the cache is empty.
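
For reference, a hierarchy like this is typically assembled with iadmin, roughly as follows (a sketch only; the hostname and vault paths are illustrative assumptions, not taken from this report):

iadmin mkresc DiskResource unixfilesystem myhost:/data/disk
iadmin mkresc OldResource compound
iadmin mkresc OldCacheResource unixfilesystem myhost:/data/cache
iadmin mkresc OldArchiveResource univmss myhost:/data/archive migration-interface.sh
iadmin addchildtoresc OldResource OldCacheResource cache
iadmin addchildtoresc OldResource OldArchiveResource archive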

I've written a custom rule that handles the replication, since it is a lengthy operation that I need to manage myself.

migrateOneObj.r content:

main {
    testReplicationFromCompound();
}
INPUT null
OUTPUT ruleExecOut

core.re content:

testReplicationFromCompound {
    msiDataObjRepl("/zone/file.zip",
        "rescName=OldResource++++backupRescName=DiskResource", *Status);
}

When I run irule -F migrateOneObj.r as the user that owns file.zip, it works correctly.

However, if I move the function testReplicationFromCompound to Python (core.py):

def testReplicationFromCompound(rule_args, callback, rei):
    # the trailing 0 is a placeholder for the *Status output parameter
    callback.msiDataObjRepl("/zone/file.zip",
        "rescName=OldResource++++backupRescName=DiskResource", 0)

and run irule -F migrateOneObj.r (as the user that owns file.zip), the following happens:

  • the operation blocks while retrieving data from the archive and never finishes
  • there are 2 irodsServer processes that started when irule was started; strace shows that
    • first is blocked on read from a pipe
    • second (child of the previous one) is blocked on futex(0x7f85e89d8ec8, FUTEX_WAIT_PRIVATE, 2, NULL
  • rodsLog's last line is
    Jan 17 12:52:03 pid:88 NOTICE: execCmd:../../var/lib/irods/msiExecCmd_bin/migration-interface.sh argv:stageToCache '/data/archive/dev/file.zip' '/data/cache/dev/file.zip'
    
  • the univmss driver (the shell script migration-interface.sh) is never run (otherwise it would create an entry in a custom log file)
  • an empty file is created in the cache resource vault
  • in the SQL database there are 2 rows (see the iquest sketch after this list):
    • the first corresponds to the archive replica (data_is_dirty = 4)
    • the second corresponds to the cache replica (data_is_dirty = 2)
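
For completeness, the same replica states can be inspected without raw SQL via GenQuery (a sketch, using the zone/path from above; DATA_REPL_STATUS corresponds to the data_is_dirty column):

iquest "SELECT DATA_REPL_NUM, DATA_RESC_NAME, DATA_REPL_STATUS WHERE COLL_NAME = '/zone' AND DATA_NAME = 'file.zip'"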
@trel
Member

trel commented Jan 17, 2022

I am not sure of the functionality/quality of the backupRescName keyword (it should probably be deprecated)...
try using destRescName instead to define the target of the replication operation.

https://docs.irods.org/4.2.11/doxygen/reDataObjOpr_8cpp.html#a957a06d93d1100dceb5a497bb9d1253f
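
The same Python function with that change would look roughly like this (a sketch, reusing the names from the report):

def testReplicationFromCompound(rule_args, callback, rei):
    # destRescName names the target resource of the replication
    callback.msiDataObjRepl("/zone/file.zip",
        "rescName=OldResource++++destRescName=DiskResource", 0)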

It's possible you've found an issue similar to #54 - but this sounds a bit different.

@ciklysta
Author

I"ve just tried destRescName. There is no difference.

The linked bug arises only when multiple threads are used. However, this one indeed has a different cause, as it occurs even with numThreads=1.

@trel
Member

trel commented Jan 20, 2022

Upon further reading/consultation.... we think this is definitely the same as #54.

#54 was reported against 4.2.6 and 4.2.7, before we introduced logical locking in 4.2.9, which makes all data movement create a placeholder in the catalog first... like only parallel transfer did in 4.2.8 and before. This matches the scenario you're seeing above.

Pretty sure this is the reason... #1

  • a PREP-wide mutex getting wedged waiting on a child waiting on its parent.

Also explains why it works fine without coming through the Python rule engine plugin.
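
To see the suspected pattern in isolation, here is a minimal standalone sketch (plain Python, not iRODS code): a process that forks while a lock is held leaves the child with a locked mutex that no thread in the child will ever release.

import os
import threading

lock = threading.Lock()   # stands in for the plugin-wide mutex

with lock:                # parent takes the lock...
    pid = os.fork()       # ...then forks a child while still holding it
    if pid == 0:
        # the child's copy of the lock is already held, but the thread
        # that would release it does not exist here: blocks forever
        lock.acquire()
        os._exit(0)
    os.waitpid(pid, 0)    # parent never releases: it waits on the child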

We'll test whether that lock is still required/essential.

@ciklysta
Author

Thank you for the investigation. In that case, this ticket is a duplicate.

trel transferred this issue from irods/irods Jan 24, 2022
@dworkin

dworkin commented Nov 9, 2022

@trel @ciklysta This may be a duplicate of irods/irods#6622 instead of #54: the problem occurs when numThreads=1 and msiExecCmd is used with a Python rule on the call stack.

@trel
Member

trel commented Nov 12, 2022

Oh, interesting...

Any chance you think that irods/irods#6622 is actually, itself, the same as #54?

In other words, should we now re-test #54 to see if it is still a deadlock with the new irods/irods#6622 codefix in place?

@dworkin

dworkin commented Nov 12, 2022

Both issues deadlock on the same lock, and both require a Python rule on the call stack, but they are distinct. #54 deadlocks without using msiExecCmd.
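
A sketch of two minimal probes to tell them apart (function names are hypothetical; "hello" is assumed to be the stock example script in msiExecCmd_bin, and the trailing 0s are output-parameter placeholders):

def probe_exec_cmd(rule_args, callback, rei):
    # expected to hang only if the irods/irods#6622 deadlock is present
    callback.msiExecCmd("hello", "", "null", "null", "null", 0)

def probe_replication(rule_args, callback, rei):
    # exercises the #54 path: data movement without msiExecCmd
    callback.msiDataObjRepl("/zone/file.zip",
        "rescName=OldResource++++destRescName=DiskResource", 0)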

@trel
Member

trel commented Nov 13, 2022

Got it - right.
