# The Hidden Requirement
After experimenting with and comparing the behavior of the C++, Python, and Jargon parallel transfer implementations, the server appears to have a hidden requirement regarding closing a replica and updating the catalog via `rx_replica_close()`.
That requirement: the first stream to open a replica for parallel transfer must also be the final stream to close the replica and perform the catalog updates.
- Testing was performed against a slightly modified iRODS 4.2.11 server.
- The zone does not contain the target data object prior to running the test.
- The test creates a brand new data object targeting the `repl` resource.
## Findings
The C++ and Python implementations trigger replication correctly because they force the first stream to handle the catalog updates (i.e., the first stream waits for its sibling streams to complete their transfers before closing).
The Jargon implementation leaves the catalog update to whichever stream finishes its transfer last. This results in hierarchy resolution issues (see the tables below):

- The hierarchy resolution operation is `CREATE` for all streams instead of `OPEN` for overlapping secondary streams.
- The voting operation is `WRITE` instead of `OPEN` for all overlapping secondary streams.
- Resolution results in an empty replica list.
- The replica state table logs a warning about not having the original replica statuses available for restoration. (It is not clear whether this warning always appears.)
This scheme makes a lot of sense; however, it is problematic for certain scenarios. For example:

NFSRODS is driven by the user. If the user runs a command such as `dd`, NFSRODS is at the mercy of how `dd` decides to write bytes. Jargon attempts to detect overlapping streams to simplify things for the developer, but this presents a problem because overlap may not happen, depending on how the JVM/OS decides to schedule threads.
How do we solve this for applications similar to NFSRODS?
A simple multi-threaded Jargon application, using latches to enforce the first-stream requirement, resulted in replication working consistently.
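The latch-based approach above can be sketched with plain JDK concurrency primitives. This is a minimal illustration, not the actual test application: the Jargon open/write/close calls are elided as comments (hypothetical stand-ins), and only the ordering constraint is shown. A `CountDownLatch` ensures the secondary streams open after the first stream, and a second latch makes the first stream wait for all siblings before performing the final close and catalog update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Sketch of the first-stream requirement: the first stream to open the
// replica is forced to be the last stream to close it.
public class FirstStreamCloseOrder {
    public static void main(String[] args) throws InterruptedException {
        final int siblingCount = 2;                                 // secondary streams
        final CountDownLatch opened = new CountDownLatch(1);        // first stream has opened
        final CountDownLatch siblingsDone = new CountDownLatch(siblingCount);
        final List<String> closeOrder = new CopyOnWriteArrayList<>();

        // First stream: opens the replica first, closes it last, and would
        // be the one to trigger the catalog update on close.
        Thread first = new Thread(() -> {
            // open the replica here (hypothetical Jargon call), then:
            opened.countDown();
            // write this stream's byte range ...
            try {
                siblingsDone.await();     // wait for all sibling streams to close
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            closeOrder.add("first");      // final close + catalog update
        });

        List<Thread> siblings = new ArrayList<>();
        for (int i = 0; i < siblingCount; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                try {
                    opened.await();       // never open before the first stream
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // open a secondary stream against the same replica, write bytes ...
                closeOrder.add("sibling-" + id); // close without catalog update
                siblingsDone.countDown();
            });
            siblings.add(t);
            t.start();
        }
        first.start();

        for (Thread t : siblings) t.join();
        first.join();

        // The first stream is always the last entry in the close order.
        System.out.println(closeOrder.get(closeOrder.size() - 1)); // prints "first"
    }
}
```

The same shape works regardless of how many secondary streams the application opens; only the latch count changes.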
## Additional Information regarding Hierarchy Resolution
The tables below show the values recorded for three agents performing a parallel write of a zero-length file, one table for the C++ implementation and one for the Jargon implementation. The first row of each table represents the first stream to open the replica.
To resolve this issue, we either have to declare that the requirement is part of the design, or lift the requirement so that the last stream to close is responsible for updating the catalog and triggering policy.