-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update hadoop to 3.2.0 #53
Update hadoop to 3.2.0 #53
Conversation
@highker Could you please review this PR for me? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate what features are currently broken, because of the old Hadoop version ?
@@ -402,7 +448,7 @@ | |||
<plugin> | |||
<groupId>org.apache.maven.plugins</groupId> | |||
<artifactId>maven-shade-plugin</artifactId> | |||
<version>2.4.2</version> | |||
<version>3.2.1</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the old shaded jar size versus the new size ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to add support for Hadoop 3 to presto so presto works when environment has Hadoop 3.
Currently, queries with encryption flag set to true fails (example is updated in description) when it runs on Hadoop 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not an expert on Hadoop, so will defer this to Hadoop experts.
@@ -426,10 +472,18 @@ | |||
<pattern>org.apache.http</pattern> | |||
<shadedPattern>${shadeBase}.org.apache.http</shadedPattern> | |||
</relocation> | |||
<relocation> | |||
<pattern>mozilla</pattern> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these new dependencies from hadoop upgrade ? Seems like hadoop3 has lot of new dependencies.
@arunthirupathi @imjalpreet, Can you please help me review this PR. I have updated it with more information since last change. |
I work for Meta/Facebook. We don't use the hadoop at all, so not sure, if this would break anything. @imjalpreet do you have concerns ? My concerns are around the changed dependency and how it impacts other libraries already out there. |
@arunthirupathi @rajatrj20 Thanks for the PR. I will try to review this early next week and let you know. |
@rajatrj20 Apologies for such a long delay. I will complete the review this week. But before I begin, I have a small clarification to make. Since we are taking reference from a Trino PR trinodb/trino-hadoop-apache#3, is there any reason behind picking only partial changes? IMO we should cherry pick the entire commit. |
@imjalpreet I decided to leave classes that will not be used in PrestoDB from the original commit. Also, there are some differences in excluded dependencies between the trino commit and this change, hence the original commit was not cherry-picked as it is. |
@rajatrj20 oh okay, I will take a look. Also, I see there is already a tag created for 3.2.0-1: https://github.com/prestodb/presto-hadoop-apache2/tree/3.2.0-1 which has all the required changes. @aweisberg Do you have some context on this tag and can it be used for the upgrade? |
You should be able to use it. I think it was already released to maven. I don’t recall why I didn’t complete the upgrade. |
@rajatrj20 I think it should be good as well, what do you think about this tag? It is already released to maven as well (https://mvnrepository.com/artifact/com.facebook.presto.hadoop/hadoop-apache2/3.2.0-1)
@aweisberg we should think about this too. Do we want to rename the repository or create a new for hadoop3? |
@imjalpreet Is it possible now to build presto with presto-hadoop-apache 3 version?? |
@imjalpreet What are you suggesting here - Do you want me to rebase my changes on top of 3.2.0-1 tag? Also, I am not sure if we can release same tag again to maven central please confirm if it can be done. |
We can't release the same tag again. You do want to be on the 3.2.x branch. Is this adding anything or have we already released an artifact for this and we just need to update Presto itself? |
I am not updated with 3.2.x branch, @aweisberg is the current branch 3.2.x a complete update for hadoop 3? In that case I can rebase my PR to 3.2.x branch. Also what do you think about creating new repository for hadoop 3 shaded version? |
My preference is to do what Trino did and rename the repository to hadoop-apache and stop including the version. Realistically only the version in use in Presto matters. If we had finished the update to hadoop 3 then master would have become the 3.2.x branch since we don't need to release hadoop 2 versions once Presto upgrades. We should make a new maven artifact that is just |
Yes, renaming repository to If you know what needs to be done to rename this repository and if I can help with it please let me know. |
c01941b
to
21db71a
Compare
|
21db71a
to
c01941b
Compare
I can rename it, but the bigger issue is getting it merged into Presto and having a working Presto PR to pair with it. I had a PR for this prestodb/presto#13311 but I can't recall why I dropped it. Looks like a hive dependency upgrade is also mixed into that PR. |
Closing this PR as it is now failing EasyCLA check. Opened PR against branch |
When presto runs on Hadoop 3 with
dfs.encrypt.data.transfer=true
, queries fails with error:and on HDFS side following error is thrown:
This happens because presto still uses hadoop 2 libraries which results in failure when running on hadoop 3 system. This PR is to update support of hadoop 3 to presto engine by upgrading hadoop to version 3.2.0. This changes are not tested on hadoop 2 environment after updating it to hadoop 3 and if visibility is needed I can update that.
To add some historical context on how PrestoDB moved from hadoop-apache1 to hadoop-apache2, Presto was using hadoop-apache1 and then plugin for hadoop-apache2 was added as well. After sometime plugin for hadoop-apache1 was removed from the dependencies and it started using only hadoop-apache2.
Also, Trino has moved away from hadoop 2 and is now on hadoop 3. Reference: Update to Hadoop 3.2.0 #3
Ideally, we should also look at renaming this repository to presto-hadoop-apache to make it more generic or a new repository presto-hadoop-apache3 should be created for hadoop version 3 the way repo presto-hadoop-apache1 and presto-hadoop-apache2 already exist for version 1 and 2 respectively. Since, I am not sure which route needs to be taken, I didn't create or tried to rename the repository. Please let me know on the steps to be taken and I can follow those steps to do the further work required.