Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove reflection for file system cache #16

Merged
merged 4 commits into from Jan 10, 2020
Merged

Conversation

@electrum
Copy link
Member

electrum commented Jan 9, 2020

No description provided.

@cla-bot cla-bot bot added the cla-signed label Jan 9, 2020
@SuppressWarnings("DeprecatedIsStillUsed")
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class FileSystem extends Configured

This comment has been minimized.

Copy link
@findepi

findepi Jan 9, 2020

Member

I'd rather not inline this class, unless we create real running fork of hadoop.

Any alternatives?

This comment has been minimized.

Copy link
@findepi

This comment has been minimized.

Copy link
@dain

dain Jan 9, 2020

Member

The direction of the JVM architects is pretty clear: final fields will no longer be modifiable. You can see the discussions in these locations:

If we assume this direction, there are not a lot of choices. The best would be to get Hadoop patched so we can swap the cache out. Short of that, we need a patch to the Hadoop code, and I personally feel this is the cleanest technique. I say that because this library is rarely updated, and when it is updated, it is done (and must be done) by someone that understands the impact of the upgrade.

This comment has been minimized.

Copy link
@electrum

electrum Jan 9, 2020

Author Member

I looked at modifying the bytecode during the shade process, but that has complications from a build process perspective, and also means the source jar will be wrong. This seems like the most straightforward solution.

We already do this with other classes. It means we need to be diligent about applying upstream changes when updating Hadoop versions (which fortunately happens rarely and is a very involved process anyway).

I’ll also see about submitting a patch to Hadoop to make this pluggable.

This comment has been minimized.

Copy link
@findepi

findepi Jan 9, 2020

Member

I’ll also see about submitting a patch to Hadoop to make this pluggable.

very good idea.

The direction of the JVM architects is pretty clear: final fields will no longer be modifiable.

i'm aware. But we still have a choice -- accommodate to this as early or as late as possible.

Besides, the advice in https://bugs.openjdk.java.net/browse/JDK-8217225 and link to powermock conflicts with my experience.
I used powermock in the past and I don't want to go there back.

None of you replies directly to my question, so let me repeat it -- is using Unsafe as way to get away (for the time being)?

This comment has been minimized.

Copy link
@electrum

electrum Jan 10, 2020

Author Member

Based on the links above, it seems that using unsafe to set the static final field will work with current JVMs. However, it's also clear from the comment by Alan Bateman (one of the JDK engineers who worked on modules and modularizing the JDK) that this should not be done:

The change in JDK 12 is intentional to improve the robustness and security of the platform. In general, there is no supported way to change static final fields. Mocking or other testing tools that want to change static final fields should explore dropping the final field at class load time.

The motivation for this change is that Presto is broken on JDK 12+ due to the existing reflection code. We weren't aware that it was broken until a user reported it. The approach in this PR is future proof. I'd rather not switch from one unsupported hack to another that can break at any point in the future.

This comment has been minimized.

Copy link
@findepi

findepi Jan 12, 2020

Member

Another option instead of inlining FileSystem.java would be to replace org.apache.hadoop.fs.FileSystem.Cache during repackaging, with our own class.

This comment has been minimized.

Copy link
@electrum

electrum Jan 13, 2020

Author Member

I thought about that, but I didn’t think it works because it’s a nested class, so the outer FileSystem class would use ours during compilation and tests (none of the needed methods or behavior would exist).

It should work if we could give it the correct name directly, without overriding the outer class. Maybe it can be named with a $ in the source file name.

This comment has been minimized.

Copy link
@findepi

findepi Jan 13, 2020

Member

so the outer FileSystem class would use ours during compilation and tests (none of the needed methods or behavior would exist).

For our compilation -- it doesn't matter, because FileSystem is already compiled.
For our tests -- it's even better, because we test what we ship. (Your impl'd approach also does this, good.)

Maybe it can be named with a $ in the source file name.

Yes, this is what i meant.
Java used to have no problem with that.

pom.xml Show resolved Hide resolved
@electrum electrum force-pushed the electrum:fscache branch from b6b2c4c to ef8764f Jan 9, 2020
@electrum electrum requested review from dain and findepi Jan 9, 2020
@dain
dain approved these changes Jan 9, 2020
@findepi
findepi approved these changes Jan 9, 2020
Copy link
Member

findepi left a comment

LGTM % ongoing discussion (#16 (comment))
(i'm fine whatever the conclusion will be)

@electrum electrum merged commit 9726883 into prestosql:master Jan 10, 2020
7 checks passed
7 checks passed
build (ubuntu-latest, 8)
Details
build (ubuntu-latest, 11)
Details
build (ubuntu-latest, 13)
Details
build (macos-latest, 8)
Details
build (macos-latest, 11)
Details
build (macos-latest, 13)
Details
verification/cla-signed
Details
@electrum electrum deleted the electrum:fscache branch Jan 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants
You can’t perform that action at this time.