[GOBBLIN-1755] Support extended ACLs and sticky bit for file based distcp #3616
Codecov Report
@@            Coverage Diff            @@
##           master   #3616    +/-   ##
=========================================
  Coverage   46.58%   46.58%
- Complexity  10672    10675     +3
=========================================
  Files        2133     2133
  Lines       83557    83570    +13
  Branches     9290     9294     +4
=========================================
+ Hits        38928    38935     +7
- Misses      41068    41069     +1
- Partials     3561     3566     +5
looking very close!
...a couple last Qs
Path outputDir = writer.outputDir;
String[] splitExpectedOutputPath = expectedOutputPath.toString().split("output");
Path dstOutputPath = new Path(outputDir.toString().concat(splitExpectedOutputPath[1])).getParent();
Path stgFilePath = new Path("file:".concat(stagingDir.toString().concat("/file")));
don't think I've ever actually used the String.concat method before... what about the canonical +? is there some advantage, since:
new Path("file:" + stagingDir + "/file");
seems easier to read?
String.concat throws an NPE on a null argument, whereas + just appends the text "null"... I guess that's the only difference.
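To make the null-handling difference between the two forms concrete, here is a minimal sketch. It is not taken from the PR: the class and method names are hypothetical, and it uses a plain `String` in place of the Hadoop `Path` used in the test under review.

```java
public final class ConcatVsPlus {
    /** Builds the string with the + operator; a null operand is rendered as the text "null". */
    public static String withPlus(String stagingDir) {
        return "file:" + stagingDir + "/file";
    }

    /** Builds the same string with String.concat; a null argument throws NullPointerException. */
    public static String withConcat(String stagingDir) {
        return "file:".concat(stagingDir).concat("/file");
    }

    /** Returns true if withConcat throws NullPointerException for the given input. */
    public static boolean concatThrowsNpe(String stagingDir) {
        try {
            withConcat(stagingDir);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(withPlus("/tmp/staging"));   // file:/tmp/staging/file
        System.out.println(withConcat("/tmp/staging")); // file:/tmp/staging/file
        System.out.println(withPlus(null));             // file:null/file
        System.out.println(concatThrowsNpe(null));      // true
    }
}
```

For non-null inputs the two forms produce the same string, so the choice is mostly about readability and whether a null should fail fast or silently end up in the path.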
just needs javadoc plus realignment so the expected values, not the actual/observed ones, drive the verification
nice work on the test cases, meeth--they now verify the crux of the impl.
there still seems room to share impl between the two new test cases. ideally, the only source difference between the two would be the initialization of some different param values that lead to the changed response by the system. (plus maybe different or additional assertions).
that said, looks ok for now... we can always return to refactor, esp. if we later extend this suite. this looks ready to get in so we can start benefiting from it!
* upstream/master:
  [GOBBLIN-1774] Util for detecting non optional uniontypes Hive tables (apache#3632)
  [GOBBLIN-1773] Fix bugs in quota manager (apache#3636)
  [GOBBLIN-1782] Fix Merge State for Flow Pending Resume statuses (apache#3639)
  [GOBBLIN-1755] Support extended ACLs and sticky bit for file based distcp (apache#3616)
  [GOBBLIN-1780] Refactor/rename YarnServiceIT to YarnServiceTest (apache#3637)
  [GOBBLIN-1778] Add house keeping thread in DagManager to periodically sync in memory state with mysql table (apache#3635)
  Register gauge metrics for change monitors (apache#3634)
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
We want to support extended ACLs for any type of file-based distcp. The previous Hadoop version used in Gobblin, 2.4, had this limitation because there were no APIs in the Hadoop FileSystem that we could leverage. As part of this PR, I built on top of our Hadoop upgrade to version 2.10 and added support for extended ACLs.
This PR allows us to preserve ACLs during a distcp, along with the previously supported attributes, which include but are not limited to: file permissions with sticky bit, owner, group, and modification times. We preserve these ACLs for all the directories/paths that were created on the destination as part of the distcp operation. That said, the user can limit the ancestor directory up to which these attributes are preserved. All attribute preservation for the FileSystem is configurable, including the new support for ACLs. Below are key changes made as part of this PR:
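As an illustration of the ancestor limit mentioned in the description, the following sketch lists the paths whose attributes would be preserved for one copied file. It is only a sketch under stated assumptions, not Gobblin's implementation: the class and method names are hypothetical, it uses `java.nio.file.Path` rather than Hadoop's `Path`, and it assumes the limit directory itself is excluded.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public final class AncestorWalk {
    /**
     * Lists the copied file and its ancestor directories, child first,
     * stopping just below the limit directory (assumption: the limit is exclusive).
     */
    public static List<Path> pathsToPreserve(Path file, Path limit) {
        if (!file.startsWith(limit)) {
            throw new IllegalArgumentException(limit + " is not an ancestor of " + file);
        }
        List<Path> result = new ArrayList<>();
        // Walk upwards from the file until the limit directory is reached.
        for (Path p = file; p != null && !p.equals(limit); p = p.getParent()) {
            result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        Path file = Paths.get("/data/out/2023/01/part-0.avro");
        Path limit = Paths.get("/data/out");
        // Each listed path is where owner, group, permissions, and ACLs would be applied.
        for (Path p : pathsToPreserve(file, limit)) {
            System.out.println(p);
        }
    }
}
```

With the example inputs, the walk covers the file, `/data/out/2023/01`, and `/data/out/2023`, and leaves `/data/out` and everything above it untouched.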
Tests
Commits