Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ORC support for iceberg connector #16391

Merged
merged 2 commits into from Jul 15, 2021
Merged

Add ORC support for iceberg connector #16391

merged 2 commits into from Jul 15, 2021

Conversation

junyi1313
Copy link
Contributor

@junyi1313 junyi1313 commented Jul 6, 2021

Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483, trinodb/trino@ecce4a2

Co-authored-by: Parth Brahmbhatt pbrahmbhatt@netflix.com
Co-authored-by: David Phillips david@acz.org
Co-authored-by: Xingyuan Lin linxingyuan1102@gmail.com
Co-authored-by: Dain Sundstrom dain@iq80.com

This PR implements the issue: #16305
Test plan - Unit Tests

== RELEASE NOTES ==

Iceberg Connector Changes
* Add ORC support for iceberg connector

Cherry-pick of trinodb/trino@ecce4a2

Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
@junyi1313
Copy link
Contributor Author

@ChunxuTang @zhenxiao @beinan Could you help take a look? Thanks.

Copy link
Collaborator

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, @junyi1313 one minor thing
@beinan @ChunxuTang could you please take a look?

@@ -184,5 +184,9 @@
<directory>${project.build.directory}/dependency/presto-bigquery-${project.version}</directory>
<outputDirectory>plugin/presto-bigquery</outputDirectory>
</fileSet>
<fileSet>
<directory>${project.build.directory}/dependency/presto-iceberg-${project.version}</directory>
<outputDirectory>plugin/iceberg</outputDirectory>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presto-iceberg or iceberg? either is fine, just a note

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. Just like mysql connector: assembly/presto.xml#presto-mysql

@zhenxiao zhenxiao requested a review from beinan July 7, 2021 08:01
@ChunxuTang
Copy link
Member

@junyi1313 @zhenxiao
Sure, I'm glad to help with the review. will take a look this weekend.

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junyi1313 Thanks for your work! This is a very nice feature we want for the iceberg connector!!
From my initial review, the implementation on the iceberg connector part generally looks good to me.
Just one reminder: I noticed that there're some new features/improvements but are not in the PRs you cherry-picked. I think the new features may deserve a bit more documentation or clarification as they are unique contributions.

@zhenxiao @beinan There are some changes in the presto-orc package, including the upgrade of ORC. I'm not familiar with that package. Could you folks have a closer look at those presto-orc changes? Or any other folks know more?

@Singleton
@Provides
public OrcFileTailSource createOrcFileTailSource(OrcCacheConfig orcCacheConfig, MBeanExporter exporter)
{
OrcFileTailSource orcFileTailSource = new StorageOrcFileTailSource();
if (orcCacheConfig.isFileTailCacheEnabled()) {
Cache<OrcDataSourceId, OrcFileTail> cache = CacheBuilder.newBuilder()
.maximumWeight(orcCacheConfig.getFileTailCacheSize().toBytes())
.weigher((id, tail) -> ((OrcFileTail) tail).getFooterSize() + ((OrcFileTail) tail).getMetadataSize())
.expireAfterAccess(orcCacheConfig.getFileTailCacheTtlSinceLastAccess().toMillis(), MILLISECONDS)
.recordStats()
.build();
CacheStatsMBean cacheStatsMBean = new CacheStatsMBean(cache);
orcFileTailSource = new CachingOrcFileTailSource(orcFileTailSource, cache);
exporter.export(generatedNameOf(CacheStatsMBean.class, connectorId + "_OrcFileTail"), cacheStatsMBean);
}
return orcFileTailSource;
}

@Singleton
@Provides
public StripeMetadataSource createStripeMetadataSource(OrcCacheConfig orcCacheConfig, MBeanExporter exporter)
{
StripeMetadataSource stripeMetadataSource = new StorageStripeMetadataSource();
if (orcCacheConfig.isStripeMetadataCacheEnabled()) {
Cache<StripeReader.StripeId, Slice> footerCache = CacheBuilder.newBuilder()
.maximumWeight(orcCacheConfig.getStripeFooterCacheSize().toBytes())
.weigher((id, footer) -> toIntExact(((Slice) footer).getRetainedSize()))
.expireAfterAccess(orcCacheConfig.getStripeFooterCacheTtlSinceLastAccess().toMillis(), MILLISECONDS)
.recordStats()
.build();
Cache<StripeReader.StripeStreamId, Slice> streamCache = CacheBuilder.newBuilder()
.maximumWeight(orcCacheConfig.getStripeStreamCacheSize().toBytes())
.weigher((id, stream) -> toIntExact(((Slice) stream).getRetainedSize()))
.expireAfterAccess(orcCacheConfig.getStripeStreamCacheTtlSinceLastAccess().toMillis(), MILLISECONDS)
.recordStats()
.build();
CacheStatsMBean footerCacheStatsMBean = new CacheStatsMBean(footerCache);
CacheStatsMBean streamCacheStatsMBean = new CacheStatsMBean(streamCache);
stripeMetadataSource = new CachingStripeMetadataSource(stripeMetadataSource, footerCache, streamCache);
exporter.export(generatedNameOf(CacheStatsMBean.class, connectorId + "_StripeFooter"), footerCacheStatsMBean);
exporter.export(generatedNameOf(CacheStatsMBean.class, connectorId + "_StripeStream"), streamCacheStatsMBean);
}
return stripeMetadataSource;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the PRs you cherry-picked, but it seems that this snippet of code is not in those PRs. Is this a new feature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to provide some context of these two functions -- they are for caching the metadata of orc files -- more details: #13501

But looks like these two are duplicated with the functions in presto-hive/src/main/java/com/facebook/presto/hive/HiveClientModule.java
Have we included HiveClientModule in the iceberg connector? if not, can we reuse the functions in HiveClientModule or extract them to a separate module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beinan @ChunxuTang These two functions are copied from HiveClientModule because we haven't included HiveClientModule in the iceberg connector. Besides, the presto-raptor StorageModule also has these two functions. I think extracting them to a separate module is better. Should we do it in this PR or open a new PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junyi1313 I'm ok with either. @ChunxuTang what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junyi1313 @beinan
Thanks for your clarification! Yeah, I'm also ok with either way. @junyi1313 your call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I will find time to send a new PR about this work after this PR has been merged.

@@ -0,0 +1,164 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is a new file that is not in the PRs cherry-picked. Any specific reasons to create this class?

nit: Some setter functions (e.g. setOrcType, setAttributes, etc.) are unused.

Copy link
Contributor Author

@junyi1313 junyi1313 Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IcebergOrcColumn.java is similar to the OrcColumn. I have updated the cherry-pick infos(add trinodb/trino#1629, trinodb/trino#3483) and removed the unused functions. Pls help with the review again. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for your work!

min = min.setScale(((Types.DecimalType) icebergType).scale());
max = max.setScale(((Types.DecimalType) icebergType).scale());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May we directly import DecimalType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Great contribution, many thanks!

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junyi1313
Thanks for your nice work! Looks that there're some errors in CI tests. Could you fix the errors and update the PR to pass the tests?

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jul 14, 2021

CLA Signed

The committers are authorized under a signed CLA.

Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483

Co-authored-by: Parth Brahmbhatt <pbrahmbhatt@netflix.com>
Co-authored-by: David Phillips <david@acz.org>
Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
@junyi1313
Copy link
Contributor Author

@ChunxuTang I have updated the PR and fixed the CI errors.
@zhenxiao @beinan Pls help to merge.
Thanks so much.

@zhenxiao zhenxiao merged commit d3823d5 into prestodb:master Jul 15, 2021
40 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants