Is it possible to decompile the executable and see the original code? #4003

Closed
bhargavmodi opened this issue Nov 11, 2021 · 12 comments

@bhargavmodi

Describe the issue
One can easily generate an executable using GraalVM. It is also claimed that the original code gets obfuscated during executable generation.

So, if the executable is shared on a public platform, is it possible for a hacker to decompile it and see the original code?

If that is possible, a hacker could modify the original code, regenerate the executable and share it on a public forum.

@adinn
Collaborator

adinn commented Nov 11, 2021

In brief: No. This is almost always impossible and, even where it might be possible, is almost always impractical. The remaining cases are trivial programs.

The first thing to note is that it is no easier to do this for GraalVM than for any other compiled binary such as, say, a compiled C++ executable. If you are worried about this threat with a GraalVM native binary then you ought to be just as worried about almost every other software deployment.

The reality is that only a very small fraction of a percent of programmers might be capable of reconstructing some parts of the original source code from a binary. However, doing so would be incredibly laborious and time-consuming, and for some (probably most) of the binary code it would almost certainly fail. Reconstructing a whole program would take an enormous amount of time, quite possibly years for a moderately complex application, and it is highly improbable that there would be enough information to reconstruct all the source. Production compilation does not just heavily obfuscate code; a compiler also drops a lot of information present in the source code that is not needed in the compiled code.

You are really worrying about the wrong type of threat here. Anyone wanting to tamper with a binary would not need to recreate the whole program from source. There are far simpler ways to patch or rebuild a binary and introduce different, possibly insecure, behaviour. The sort of spoofing attack you are talking about is normally dealt with by securing the public server on which the binary is published and publishing a checksum for it, which can be used to check that the downloaded code has not been tampered with.
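The checksum check described in that last point can be sketched as follows. This is a minimal Python example, not part of GraalVM; it assumes the publisher lists a SHA-256 digest (as a hex string) next to the download, and both function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks so large binaries are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare the local digest with the checksum published alongside the download."""
    actual = sha256_of_file(path)
    # Normalize whitespace/case of the published value; compare_digest avoids an early-exit compare.
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

A matching digest only means something if the published checksum itself comes from the secured server (for example over HTTPS); an attacker who can replace the binary on an unsecured mirror can usually replace a checksum hosted there as well.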

@rodrigar-mx rodrigar-mx removed the bug label Nov 12, 2021
@rodrigar-mx rodrigar-mx self-assigned this Nov 12, 2021
@bhargavmodi
Author

Thank you so much @adinn for the detailed comment. I think it resolves many doubts.

But there is one point I need your help to clarify. Generally, a Java application generates a jar file, and this jar can easily be decompiled. If we apply a tool such as ProGuard to obfuscate the code when generating the jar, it becomes hard for an ordinary developer to recover the original code, but a hacker can still easily do it.

But I see that GraalVM’s native image feature converts a Java application into a native binary: the Java bytecode is compiled into native code ahead of time (AOT). And I think converting native code back is next to impossible.

@adinn
Collaborator

adinn commented Nov 15, 2021

Generally, java application generates jar file. ...

In general, a Java application does not generate a jar file. What I think you are trying to express is that 1) Java applications are compiled (by the javac program) to an intermediate format called class files and 2) these application class files are normally deployed as jar files (along with the JVM, i.e. the java program).
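To make the "class files deployed as jars" point concrete: a jar is an ordinary zip archive, and every class file starts with the fixed header defined in chapter 4 of the Java Virtual Machine Specification (the magic number 0xCAFEBABE, then the minor and major version). The Python sketch below, with a hypothetical function name, lists each class file in a jar together with its class-file version. It only reads the 8-byte header and does not decode the rest of the class.

```python
import struct
import zipfile

CLASS_MAGIC = 0xCAFEBABE


def class_file_versions(jar):
    """Map each .class entry in a jar (path or file-like object) to its (major, minor) version."""
    versions = {}
    with zipfile.ZipFile(jar) as archive:
        for name in archive.namelist():
            if not name.endswith(".class"):
                continue
            header = archive.read(name)[:8]
            if len(header) < 8:
                raise ValueError(f"{name} is too short to be a class file")
            # u4 magic, u2 minor_version, u2 major_version, all big-endian
            magic, minor, major = struct.unpack(">IHH", header)
            if magic != CLASS_MAGIC:
                raise ValueError(f"{name} does not start with 0xCAFEBABE")
            versions[name] = (major, minor)
    return versions
```

For example, major version 52 corresponds to Java 8 and 61 to Java 17.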

Note that it is the method bytecode in these class files that the JVM's JIT compiler translates to machine code at runtime. However, it is exactly the same class bytecode that GraalVM takes as input and uses to generate a native image executable ahead of time. GraalVM does not take Java source code as input.

Now, it is true that it is much easier to reverse engineer Java class files to reconstruct the original Java application source than it is to reverse engineer the executable generated by GraalVM, or even the JIT-compiled code the JVM generates at runtime. However, this is not exactly remarkable. In particular, it does not imply that deploying an application as a suite of jars is less secure than deploying it as a GraalVM native image. Jars are usually secured either by signing them when they are built or by installing them in locations that restrict updates to legitimate users. Knowing what is in the jars does not make it particularly easy to find security holes in the product, but it certainly makes it a lot easier to have a vast ecosystem of Java libraries and toolkits.
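As an illustration of the signing point: a signed jar carries its signature data under META-INF/, as a signature file (*.SF) plus a signature block file (commonly *.RSA, *.DSA or *.EC). The hypothetical Python helpers below only check whether those entries are present; they do not verify any signature, which is what `jarsigner -verify` in the JDK is for.

```python
import zipfile

# Common signature block extensions; the JAR spec allows others, so this is not exhaustive.
BLOCK_EXTENSIONS = (".RSA", ".DSA", ".EC")


def signature_entries(jar):
    """Return the .SF and signature block entries found directly inside META-INF/."""
    with zipfile.ZipFile(jar) as archive:
        names = archive.namelist()
    found = []
    for name in names:
        upper = name.upper()
        # Skip anything outside META-INF/ or in a subdirectory of it
        if not upper.startswith("META-INF/") or "/" in upper[len("META-INF/"):]:
            continue
        if upper.endswith(".SF") or upper.endswith(BLOCK_EXTENSIONS):
            found.append(name)
    return found


def looks_signed(jar) -> bool:
    """True if the jar contains both a signature file and a signature block (presence only)."""
    entries = [e.upper() for e in signature_entries(jar)]
    has_sf = any(e.endswith(".SF") for e in entries)
    has_block = any(e.endswith(BLOCK_EXTENSIONS) for e in entries)
    return has_sf and has_block
```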

I'm not really sure what your point is in asking these questions. You seem to be concerned that access to the original source code for an application (or alternatively, the class files) is somehow a security threat. That is patently mistaken. Open source projects give full details of their source code to anyone who wants it, along with recipes for building the project deliverables, without being any less secure than similar closed-source projects. Indeed, in many cases open source projects end up being more secure than closed ones because many, many more developers are able to read, test and debug the code, spot and report security problems and identify ways to resolve them. There are countless critical open source projects that prove this point, Linux and OpenJDK being two of the most notable examples.

@bhargavmodi
Author

Again, thank you so much @adinn for covering all the points so nicely.
I understand there's no shame in showing your code as long as you're following best practices, and open sourcing is a great way to make a codebase mature.

Still, one point in favour of code obfuscation is that it doesn't only hide your code; it can also stop others from copying it. I know you can bundle a license with your codebase, but realistically, that does not work in some countries, or enforcing it is a nightmare.

@bhargavmodi
Author

Hi @adinn

I have some follow-up questions. Thank you so much in advance.

  • GraalVM generates the executable ahead of time (AOT). Could you please suggest a way to distribute the native image, given that we don't know in advance which machines it will run on?
  • Is any special treatment required to run the native image on AWS or AKS?

@adinn
Collaborator

adinn commented Nov 18, 2021

@bhargavmodi Those are interesting and pertinent questions. However, I'm afraid you will have to see if someone else, perhaps someone from Oracle's team, can provide answers. I am merely a (non-Oracle) contributor to the GraalVM project and am not normally involved in deploying applications.

@bhargavmodi
Author

OK, thank you very much @adinn for sharing your knowledge and clearing up my doubts.

@rodrigar-mx
Contributor

@bhargavmodi for distributing the native image, you can either build it inside a container or build an image for each architecture. As far as I know, there are no extra requirements to build and run a native image on AWS. Depending on the OS, you may need to install some libraries (see https://www.graalvm.org/reference-manual/native-image/#prerequisites). I also found this interesting article about AWS and GraalVM.
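The "one image per architecture" option usually means publishing several binaries and letting the installer pick the right one. The Python sketch below assumes a hypothetical naming scheme `<app>-<os>-<arch>` (nothing GraalVM prescribes) and maps the host reported by the standard `platform` module onto it.

```python
import platform
from typing import Optional

# Hypothetical naming scheme: one native-image build per OS/architecture pair.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def artifact_name(app: str, system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the name of the prebuilt binary to download for the given (or current) host."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(machine)
    if arch is None:
        raise RuntimeError(f"no native image is published for architecture {machine!r}")
    suffix = ".exe" if system == "windows" else ""
    return f"{app}-{system}-{arch}{suffix}"
```

Normalizing aliases matters because the same CPU family is reported as `x86_64` on Linux but `AMD64` on Windows, and as `aarch64` on Linux but `arm64` on macOS.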

@bhargavmodi
Author

Hi @rodrigar-mx
Thanks for the container-based suggestion; it makes sense.

I have some follow-up questions. They may sound silly, but I need to confirm with you:

  • Cross-compilation is not supported. If a native image is generated on Ubuntu, will it work on other Linux distributions such as Debian, Red Hat and so on?
  • Will an OS upgrade have any impact? If a native image is generated on a particular version of Ubuntu, will it keep working on future OS releases, or will we need to regenerate the native image to support them?

Thanks in advance!

@rodrigar-mx
Contributor

Hi @bhargavmodi. Yes, it should generally work on different distros as long as they share the same architecture, but the common practice is to build the native image again. And no, there should be no impact from an OS upgrade.
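Since the answer hinges on the target sharing the same architecture, one way to check a Linux native image before copying it to another distro is to read the architecture from its ELF header. Below is a minimal Python sketch (the function name is hypothetical); the `file` command reports the same information. It says nothing about whether the shared libraries the binary links against are available on the target distro.

```python
import struct
from typing import Tuple

# e_machine values from the ELF specification (small subset)
EM_NAMES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}


def elf_architecture(path: str) -> Tuple[str, int]:
    """Return (architecture, word size in bits) read from an ELF binary's header."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF binary")
    bits = {1: 32, 2: 64}.get(header[4], 0)  # EI_CLASS
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    # e_machine is the 2-byte field at offset 18, after e_ident (16 bytes) and e_type (2 bytes)
    (machine,) = struct.unpack(endian + "H", header[18:20])
    return EM_NAMES.get(machine, f"unknown({machine})"), bits
```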

@vlinx-io
Copy link

Hi @bhargavmodi, this project shows that it is possible: https://medium.com/@vlinx/nativeimage-reverse-engineering-ba235db950ff

@NCLnclNCL
Copy link

Hi @bhargavmodi, this project shows that it is possible: https://medium.com/@vlinx/nativeimage-reverse-engineering-ba235db950ff

Haha
