-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add decryption functionality to presto. #17791
Add decryption functionality to presto. #17791
Conversation
This is a rebased PR for #17728 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shangxinli I think mostly good. Only 2 minor issues.
@@ -140,4 +163,14 @@ public static long getFirstRowIndex(int pageIndex, OffsetIndex offsetIndex) | |||
{ | |||
return offsetIndex == null ? -1 : offsetIndex.getFirstRowIndex(pageIndex); | |||
} | |||
|
|||
// additional authenticated data for AES cipher | |||
private Slice decryptSliceIfNeeded(Slice slice, byte[] aad) throws IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throws IOException
in new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
private int pageIndex; | ||
private byte[] dataPageAAD; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/dataPageAAD/dataPageAdditionalAuthenticationData/g
s/dictionaryPageAAD/dictionaryPageAdditionalAuthenticationData/g
I feel AAD is kind of hard to understand, shall we replace all AAD to Additional Authentication Data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
from.setPosition(0); | ||
from.read(serializedFooter, 0, serializedFooter.length); | ||
|
||
byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AAD, let's replace with Additional Authentication Data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
} | ||
Decryptor footerDecryptor = null; | ||
// additional authenticated data for AES cipher | ||
byte[] aad = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/aad/additionalAuthenticationData/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
throws IOException | ||
{ | ||
LinkedList<DataPage> pages = new LinkedList<>(); | ||
DictionaryPage dictionaryPage = null; | ||
long valueCount = 0; | ||
int dataPageCount = 0; | ||
int pageOrdinal = 0; | ||
byte[] dataPageHeaderAAD = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AAD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
while (hasMorePages(valueCount, dataPageCount)) { | ||
PageHeader pageHeader = readPageHeader(); | ||
byte[] pageHeaderAAD = dataPageHeaderAAD; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AAD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if (stats != null) { | ||
return stats.hasDictionaryPages() && stats.hasDictionaryEncodedPages(); | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of stats == null, we will rely on encodings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, the logic is needed, shall we remove:
else {
since the if statement already returns
import java.util.List; | ||
import java.util.Map; | ||
|
||
public class EncDecPropertiesHelper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/EncDecPropertiesHelper/EncryptDecryptUtil/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't started reviewing the functionality, but can we
- Fix the release note part
- For commit message, Limit the subject line to 50 characters to ensure that they are readable (https://cbea.ms/git-commit/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you squash two commits?
...to-parquet/src/main/java/com/facebook/presto/parquet/cache/CachingParquetMetadataSource.java
Outdated
Show resolved
Hide resolved
...to-parquet/src/main/java/com/facebook/presto/parquet/cache/CachingParquetMetadataSource.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/main/java/com/facebook/presto/parquet/reader/ParquetReader.java
Outdated
Show resolved
Hide resolved
@kewang1024 Thanks for the review! Fixed the release notes and the commit message, and addressed the feedback. |
7c70f16
to
b5804ad
Compare
Just did. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still in the process of understanding the overall logic, so my comments are mainly abut the tests
- Can we also add an additional benchmark test for parquetReader when it has
fileDecryptor
? - I have an improvement idea on the design of class
EncryptDecryptUtil
I think it would be better to be able to pass in key information (FOOTER_KEY, FOOTER_KEY_METADATA, COL_KEY, COL_KEY_METADATA) and create aEncryptDecryptGenerator
instance, which can generate a pair ofFileDecryptionProperties
andFileEncryptionProperties
With this design, in the future (or in this PR) we can also test edge cases forEncryptDecryptGenerator
presto-parquet/src/main/java/com/facebook/presto/parquet/cache/MetadataReader.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/EncryptionTestFile.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/TestEncryption.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/TestEncryption.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/TestEncryption.java
Outdated
Show resolved
Hide resolved
FileSystem fileSystem = path.getFileSystem(conf); | ||
FSDataInputStream inputStream = fileSystem.open(path); | ||
long fileSize = fileSystem.getFileStatus(path).getLen(); | ||
Optional<InternalFileDecryptor> fileDecryptor = createFileDecryptor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can fileDecryptor be shared? and looks like no need to make it an Optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shareable? yes, made it a member variable.
Optional? From a test perspective, we don't need it but the signature of readFooter() need optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: It turned out that I am wrong. The decryptor is not sharable. After I share, I got the error: 'ParquetCryptoRuntimeException: Decryptor re-use'.
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/TestEncryption.java
Outdated
Show resolved
Hide resolved
private static final byte[] FOOTER_KEY_METADATA = "footkey".getBytes(StandardCharsets.UTF_8); | ||
private static final byte[] COL_KEY = {0x02, 0x03, 0x4, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, | ||
0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11}; | ||
private static final byte[] COL_KEY_METADATA = "col".getBytes(StandardCharsets.UTF_8); | ||
|
||
public static FileDecryptionProperties getFileDecryptionProperties() | ||
throws IOException | ||
{ | ||
DecryptionKeyRetrieverMock keyRetriever = new DecryptionKeyRetrieverMock(); | ||
keyRetriever.putKey("footkey", FOOTER_KEY); | ||
keyRetriever.putKey("col", COL_KEY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract "footkey" and "col" to static final variable
Correct me if I'm wrong, my understanding is that the names have be the same for (L55, L64), (L58, L65), otherwise, the decryptor and encryptor won't match. Thus we should avoid typing them manually in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already final static, right? Not able to understand the ask.
No, we don't need them match. The encrypt and decrypt using the same key would be fine.
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/EncryptDecryptUtil.java
Show resolved
Hide resolved
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/TestEncryption.java
Show resolved
Hide resolved
|
9f4e66d
to
0b8bc6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have finished my review and most of my comments are regarding coding style and misleading naming; Resolving those would put us in a good state.
I'm not an expert in the parquet encryption/decryption logic, I will leave that to @zhenxiao for a final approval
presto-delta/src/main/java/com/facebook/presto/delta/DeltaPageSourceProvider.java
Outdated
Show resolved
Hide resolved
FileDecryptionProperties fileDecryptionProperties = (cryptoFactory == null) ? | ||
null : cryptoFactory.getFileDecryptionProperties(configuration, path); | ||
Optional<InternalFileDecryptor> fileDecryptor = (fileDecryptionProperties == null) ? | ||
Optional.empty() : Optional.of(new InternalFileDecryptor(fileDecryptionProperties)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
presto-parquet/src/main/java/com/facebook/presto/parquet/reader/ParquetReader.java
Outdated
Show resolved
Hide resolved
@@ -172,6 +176,7 @@ public ParquetReader(MessageColumnIO | |||
} | |||
this.currentBlock = -1; | |||
this.columnIndexFilterEnabled = columnIndexFilterEnabled; | |||
this.fileDecryptor = fileDecryptor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: requireNonNull
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
presto-parquet/src/main/java/com/facebook/presto/parquet/reader/ParquetReader.java
Outdated
Show resolved
Hide resolved
// Lambda expression below requires final variable, so we define a new variable parquetDataSource. | ||
final ParquetDataSource parquetDataSource = buildHdfsParquetDataSource(inputStream, path, stats); | ||
dataSource = parquetDataSource; | ||
DecryptionPropertiesFactory cryptoFactory = DecryptionPropertiesFactory.loadFactory(configuration); | ||
FileDecryptionProperties fileDecryptionProperties = (cryptoFactory == null) ? | ||
null : cryptoFactory.getFileDecryptionProperties(configuration, path); | ||
Optional<InternalFileDecryptor> fileDecryptor = (fileDecryptionProperties == null) ? | ||
Optional.empty() : Optional.of(new InternalFileDecryptor(fileDecryptionProperties)); | ||
ParquetMetadata parquetMetadata = hdfsEnvironment.doAs(user, () -> MetadataReader.readFooter(parquetDataSource, fileSize, fileDecryptor).getParquetMetadata()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code block is shared in all those three places “DeltaPageSourceProvider”, “”, “ParquetPageSourceFactory”, “IcebergPageSourceProvider”, can we extract them to be an util function / static function in presto-parquet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. In addition to the new change now, there are a few other existing places that should do the same. I create the issue #17835 to work on this after this PR is merged. This PR is already too large.
presto-delta/src/main/java/com/facebook/presto/delta/DeltaPageSourceProvider.java
Show resolved
Hide resolved
if (!HiddenColumnChunkMetaData.isHiddenColumn(columnMetaData)) { | ||
Statistics<?> columnStatistics = columnMetaData.getStatistics(); | ||
if (columnStatistics != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Merge those two ifs, too many nested statements hurt code readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between these two diffs, we can save the .toArray() call if the first 'if' is not true. But fine, not a big deal anyway. I just merged it.
presto-parquet/src/main/java/com/facebook/presto/parquet/predicate/PredicateUtils.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/EncryptDecryptUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge all the appending commits that addressed the comments back into original commit, overall looks good to me
@zhenxiao a final approval for the decryption and encryption logic?
requireNonNull(path, "path should not be null"); | ||
this.path = path; | ||
requireNonNull(filePath, "filePath should not be null"); | ||
this.filePath = filePath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
this.path = requireNonNull(path, "path should not be null");
this.filePath = requireNonNull(filePath, "filePath should not be null");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, many thanks for the contribution!
0c00b2a
to
cbbe3c4
Compare
Thanks for all who provided comments! I just squashed all the commits and addressed the last feedback from @kewang1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks nice, @shangxinli
I am good to merge this PR
could you please fix the 2 remaining issues, and squash into 1 commit?
private final FSDataInputStream inputStream; | ||
private long readTimeNanos; | ||
private long readBytes; | ||
//private final ParquetReaderOptions options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we remove this line? @shangxinli
if (stats != null) { | ||
return stats.hasDictionaryPages() && stats.hasDictionaryEncodedPages(); | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, the logic is needed, shall we remove:
else {
since the if statement already returns
Co-authored-by: ggershinsky <ggershinsky@users.noreply.github.com> Summary: This is to port parquet-mr decryption apache/parquet-java@65b95fb
4c0ade5
to
b1f3d03
Compare
Fixed |
Co-authored-by: ggershinsky ggershinsky@users.noreply.github.com
Summary: This is to port parquet-mr decryption functionality. The main commits in parquet-mr for encryption/decryption are apache/parquet-java@65b95fb and several other fixes. This change only port the decryption only.
Test plan - (Please fill in how you tested your changes)
This feature was tested in the Uber environment and then rolled out to production for 2+ years.
Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.