Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Improve constant replacement by using android sdk libs #2119

Open
nitram84 opened this issue Mar 9, 2024 · 11 comments
Open

[feature] Improve constant replacement by using android sdk libs #2119

nitram84 opened this issue Mar 9, 2024 · 11 comments

Comments

@nitram84
Copy link
Contributor

nitram84 commented Mar 9, 2024

Describe your idea

In some apps I recompiled recently, I had compilation errors because attributes of custom views in values/attrs.xml were not defined as "declare-styleable". So I looked into the <app package>.R.styleable.* definitions. For each view the is an array of attributes in R.styleable. I found that not all literals in R.styleable array were resolved as constants. All those literals could be replaced to constants by using jadx-core/src/main/resources/android/res-map.txt or android.R.* of android sdk.

Examples for apps with unresolved R.styleable literals:

https://candy-bubble-shooter.apk.gold/
https://apkpure.com/rest-api-client/in.pounkumar.restclient/download/1.1.2

Here is my feature request:

Would it be possible to add all public constant definitions of android sdk libs to jadx-core/src/main/resources/clst/core.jcst and use the constants in ConstStorage? Resolving android internal constants would make obfuscated code a bit more readable, lots of android linter warnings would be fixed (I'm thinking of android linter warnings due to unresolved constants of android.content.Context .) and it might be possible to recover declare-styleable attributes *).

If constant information could be loaded from jadx-core/src/main/resources/clst/core.jcst the file jadx-core/src/main/resources/android/res-map.txt wouldn't be needed anymore because this file is redundant to the constants of android.R.*.

This issue is a duplicate to #244.

*) Recovering declare-styleable attributes is a known issue for jadx and apktool:

#247
#244
iBotPeaches/Apktool#775
iBotPeaches/Apktool#1217
iBotPeaches/Apktool#1910
iBotPeaches/Apktool#224

For Apktool this issue is considered as unsolvable. But I think, there is a way for jadx to recover "declare-styleable" attributes when all values in R.styleable.* arrays are resolved to constants.

@skylot
Copy link
Owner

skylot commented Mar 17, 2024

@nitram84 thanks for the suggestion, but such change have a lot of things to consider, so I need some time for deeper investigation.
Here are some considerations for now:

  • Adding all constant fields from android.jar may be very inefficient because a lot of constants have same value and we can't use them to replace literals.
  • Adding constants only from android style classes to core.jcst maybe confusing, and I prefer more common approach
  • Two previous points were the reason why I introduce res-map.txt. It is unique const map for only android resources.
  • Also res-map.txt contains ids from all android versions (actually it is up to 30 API, so we may need to update it). This allows to restore names of resources not supported or deprecated in latest android.

@skylot skylot added this to the TBD milestone Mar 17, 2024
@nitram84
Copy link
Contributor Author

nitram84 commented Mar 23, 2024

Hi @skylot

Thank you for your answer! Before you start investigating this issue, I should provide more details about android sdk constants and the related "R.styleable"-issue.

Let's take my first sample app: https://candy-bubble-shooter.apk.gold/

In com.androbros.candybubbleshooter.R.styleable we have this array definition:

public static final int[] ActionMenuItemView = {16843071};

What is 16843071? The array is defined here (or better aapt2 generates it from here): https://android.googlesource.com/platform/frameworks/base/+/refs/heads/main/core/res/res/values/attrs.xml#4409

minWidth is documented here https://developer.android.com/reference/android/R.attr#minWidth : Constant Value: 16843071 (0x0101013f)

And indeed 0x0101013f can be found here https://github.com/skylot/jadx/blob/master/jadx-core/src/main/resources/android/res-map.txt#L320

(You might have an idea how I would restore <declare-styleable> in attrs.xml . This is off-topic. If it is doable I would open a specific issue for that.)

Currently res-map.txt is only used to decode resources but not for ConstStorage to replace constants in decompiled java code. So updating res-map.txt to api 34 wouldn't help: minWidth is api 1.

At the moment it's not easy to inject new constant definitions to ConstStorage since it works with FieldNodes that are referencing class representations of the current apk. So I thought class definitions loaded from jadx-core/src/main/resources/clst/core.jcst might be adapted for this to inject sdk constants. But you are right, there are better solutions possible.

The next questions is "About how many unique constants are we talking"? 8418 constants for android-34, counted quick and dirty with:

// Use dependency org.ow2.asm:asm:9.6 and pass path to android.jar as single argument, dump all unique constants as csv to stdout
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.HashMap;
import java.util.Set;
import java.util.Map;

import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.FieldVisitor;
import org.objectweb.asm.Opcodes;

public class AndroidConstDump {
  
  protected static Set<String> constValues = new HashSet<>();
  protected static Map<String, String> constants = new HashMap<>();
  
  public static final void main(String[] args) {
    if (args.length != 1) {
      System.exit(1);
    }
    String pathToAndroidJar = args[0];
    try (ZipInputStream zis = new ZipInputStream(new FileInputStream(pathToAndroidJar))) {
      ZipEntry zipEntry = zis.getNextEntry();
      while (zipEntry != null) {
        if (!zipEntry.isDirectory()) {
          String className = zipEntry.getName();
          if (className.endsWith(".class")) {
            processClass(zis, className);
          }
        }
        zipEntry = zis.getNextEntry();
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
    
    for(String constRecord: constants.values()) {
      System.out.println(constRecord);
    }
  }

  private static void processClass(InputStream in, final String className) {
    try {
      ClassReader cr = new ClassReader(in);
      cr.accept(new ClassVisitor(Opcodes.ASM9) {
  
        @Override
        public FieldVisitor visitField(int access, String name, String descriptor, String signature, Object value) {
          if(value != null && access == 25) {
            String constValue = value.toString();
            String constRecord = className.replace('/', '.').replace('$', '.').replace(".class", ".")  + name + ";" + descriptor + ";" + value.toString();
            if (constants.get(constValue) != null) {
              constants.remove(constValue);
            }
            if (!constValues.contains(constValue)) {
              constants.put(constValue, constRecord);
            }
            constValues.add(constValue);
          }
          return super.visitField(access, name, descriptor, signature, value);
        }
      }, 0);
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

You added the resources flag, but in my opinion this issue is not about resources: Resource constants are included in the list of unique constants, but the missing constants are not restricted to resources. If needed I can provide a sample with a non-replaced constant of android.content.Context which is not related to resources.

For the moment I would ignore constants of "org.apache.http.legacy.jar" or "android.car.jar".

I have a last sample to show that this issue is not restricted to R files and to show a special case how (some) non-unique constants could be replaced without false-positives:

https://mahjong-solitaire-free2.apk.gold/

Class com.freekidsgames.solitaire.mahjongsoliatirefree.MahjonggData:showInfo(Context):

Using the unique constants
dialog.setPositiveButton(17039370, (DialogInterface.OnClickListener) null);
could be transformed to
dialog.setPositiveButton(android.R.string.ok, (DialogInterface.OnClickListener) null);

But what is the constant in view.findViewById(R.id....).setVisibility(8); ? Android linter/Android Studio says it must be android.view.View.GONE. Linter rules for constants are defined by annotations using @ StringDef, @ IntDef, @ LogDef annotations.

Example:

https://android.googlesource.com/platform/frameworks/base/+/master/core/java/android/view/View.java#1105 -> https://android.googlesource.com/platform/frameworks/base/+/master/core/java/android/view/View.java#13018
https://android.googlesource.com/platform/frameworks/base/+/refs/heads/main/core/java/android/content/Context.java#4081 -> https://android.googlesource.com/platform/frameworks/base/+/refs/heads/main/core/java/android/content/Context.java#4393

If those rules could be extracted, I could imagine an AndroidLinterPass to replace those constants. Linter rules would be related to "core.jcst" because rules are bound to method signatures loaded from there. In the past linter rules helped me a bit to understand obfuscated code. If I find a way to export the linter rules I would open a new issue.

Edit: typo, more formatting

@skylot
Copy link
Owner

skylot commented Mar 25, 2024

@nitram84 thank you for details! It becomes much easier to understand the issue. Mostly because I am not very familiar with Android resources processing/usage (relevant jadx code was added by contributors). Also, I add resources tag just because you mention it 🤣

The next questions is "About how many unique constants are we talking"? 8418 constants for android-34

Actually I was more conserned about constant fields with duplicated values, such fields will "taking space" in "core.jcst" but can't be used for constant replacement. So I made a simple script to count such duplications:

import jadx.core.dex.info.ConstStorage

val jadx = getJadxInstance()
jadx.args.isReplaceConsts = false

val constMap = mutableMapOf<Any, Int>()

jadx.stages.prepare { root ->
	for (cls in root.classes) {
		for (fld in cls.fields) {
			ConstStorage.getFieldConstValue(fld)?.let { constVal ->
				constMap.merge(constVal, 1, Int::plus)
			}
		}
	}
}

jadx.afterLoad {
	for (entry in constMap.entries.sortedBy { it.value }) {
		log.debug { "value: ${entry.key} - found in ${entry.value} fields" }
	}
	val total = constMap.values.sum()
	val unique = constMap.values.count { it == 1 }
	val duplicate = constMap.values.filter { it > 1 }.sum()
	log.info { "Constant fields: total=$total, unique=$unique, duplicate=$duplicate" }
}

Running it with android.jar in jadx output this (start omitted):

...
DEBUG: value: 4 - found in 685 fields
DEBUG: value: 3 - found in 701 fields
DEBUG: value: 0 - found in 1101 fields
DEBUG: value: 2 - found in 1230 fields
DEBUG: value: 1 - found in 1483 fields
INFO : Constant fields: total=23994, unique=8816, duplicate=15178

As you can see, duplication rate is very high. This mostly caused by fields which used for enumeration or flags, like in mentioned View class:

    public static final int VISIBLE = 0x00000000;
    public static final int INVISIBLE = 0x00000004;
    public static final int GONE = 0x00000008;

    @IntDef({VISIBLE, INVISIBLE, GONE})
    @Retention(RetentionPolicy.SOURCE)
    public @interface Visibility {}

So, current options to resolve this issue are:

  • use core.jcst but skip duplicated fields 🙂
  • add constants from "res-map.txt" into ConstStorage with fake field info (this is not hard to fix)

About android flags like in view.findViewById(R.id....).setVisibility(8);:

I put Visibility annotation above to show that retention police set to source, which means it will not appear in compiled code and android.jar. And we can't grab it from there and put into core.jcst.
So I don't know how we can collect info to match values to names/fields. I will try to look for linter rules or code, but this will take some time 🙁

Small Note:

This issue can be slightly related to #2134 because it may need to change a way to store all possible additional information for other classes without involving these classes in decompilation.

@nitram84
Copy link
Contributor Author

Thank you for your deeper analysis! With your information I would now argue for using fake field info to add new constants to ConstStorage:

  • a lot of constants with unique values that would be added to core.jcst are resource constants (android.R.*). So we would have redundant information in core.jcst and in res-map.txt
  • I'm sure using the fake field info approach would make it easy to load additional text files with useful constants (eventually including constants for an android linter pass). This would be more modular.
  • constants using fake fields could also be added by plugins or a script. This would allow a simple code transformation to introduce synthetic constants.

Perhaps it's better to apply sdk constants before constants defined in the app itself. This could reduce false positives.

For a possible android linter pass:

RetentionPolicy.SOURCE was a good point. But I think you don't need to look into the linter code. All annotations were already exported in the file platforms/android-XX/data/annotations.zip . This file should contain all necessary information for a linter pass.

It is also possible to add custom IntDef or StringDef rules in libraries. If used in a library, those rules are exported in a file annotations.zip as part of the AAR (see https://developer.android.com/studio/write/annotations#adding-library). I could imagine a jadx plugin, that tries to detect library dependencies, loads the AARs with annotations.zip and applies linter rules for constants. This should work even it's not possible to detect the exact version of a library.

@skylot
Copy link
Owner

skylot commented Mar 31, 2024

Ok. I commit changes to put resources fields (from res-map.txt) into const storage. Please check 🙂

All annotations were already exported in the file platforms/android-XX/data/annotations.zip

Oh, I didn't know that, thanks!
Indeed, we can use that to restore constant names, but it will require adding info about all const fields from android.jar 🤣

@nitram84
Copy link
Contributor Author

nitram84 commented Apr 2, 2024

Thank you very much for your work. All constants in R.styleable arrays are now resolved, so I can continue investigating the declare-styleable problem.

I'll try to write and provide a script that identifies more useful android constants (non resource constants, unique value) that could be added in the same way.

For a android linter pass I first have to parse and analyze the annotation xml files. I don't think we all const field from android.jar, but be need a collection of all int/log/string-def rule. I have to check how many rules and constants we have and find a way to merge them to one rules file. In my opinion one file per library instead of one file per package would be easier to work with.

But it may take some time until I can provide more details.

@nitram84
Copy link
Contributor Author

nitram84 commented Jun 6, 2024

Hi @skylot,

Indeed, we can use that to restore constant names, but it will require adding info about all const fields from android.jar 🤣

I did that and I was able to implement a linter pass as a jadx-plugin: https://github.com/nitram84/jadx/tree/android-linter-plugin. And even more (See https://github.com/nitram84/jadx/blob/android-linter-plugin/jadx-plugins/jadx-android-linter/README.md):

  • I scraped the whole Google Maven repository and exported all linter rules of each library
  • The plugin supports constant unfolding to restore the original expression for a value before the compiler optimized it away. I don't think there is any other decompiler out there which supports constant unfolding.
  • Having linter rules of third-party libraries, the plugin is able to detect dependencies.

Before I open a pull request, I would like to ask if this is something potentially mergeable or should this plugin become standalone?

@skylot
Copy link
Owner

skylot commented Jun 6, 2024

@nitram84 great work 👍

or should this plugin become standalone?

Actually, this is up to you to decide 🙂
There are many pros and cons, so I am not sure myself which approach is better.
Anyway, you already did a great job by putting this in a separate module. Thank you! 🎉

@nitram84
Copy link
Contributor Author

nitram84 commented Jun 9, 2024

Thank you too. Your changes for this issue helped be to implement this.
I think the plugin part with the sdk rules should be usable for everyone without drawbacks. If the plugin is part of Jadx, I think most people would enable it or keep it enabled. So if the plugin is considered as useful, it I think it would be nice,if it could be part of Jadx. My concern are the third party libraries and the database. I don't want to pollute your project with database updates.
The best option is probably to separate the plugin and the database into different repositories. Maybe the database repo could perhaps be a subrepo owned by https://github.com/jadx-decompiler. What is your opinion?

And perhaps using third party rules should be optional and configurable.

@skylot
Copy link
Owner

skylot commented Jun 9, 2024

If the plugin is part of Jadx

Well, "part of jadx" is a vague term, it can mean several things:

  1. module inside jadx repo
  2. plugin bundled with jadx, but located in separate repo
  3. plugin located in jadx-decompiler group
  4. both 2 and 3 🤣

Actually, I am still thinking about moving jadx repo into jadx-decompiler group, and also splitting modules into separate repos. So 4th option can be only one used in the future.

separate the plugin and the database into different repositories

These parts look tightly coupled, not sure if splitting is really necessary. But sure if you plan to update database very often it is better to make it downloadable and auto update on changes. And, please consider to support option for manual update to minimize network access because it can be long and very faulty (and better to avoid at all in jadx-cli).

Maybe the database repo could perhaps be a subrepo owned by https://github.com/jadx-decompiler

Sure, this is possible. And, if you want, we can put your plugin repo also into this group.

@nitram84
Copy link
Contributor Author

Option 4 sounds good. This way the plugin may help to test a modularized version of jadx.

I'm not sure how frequent there will be rule updates. Rule updates may be caused by a new Android api level, api changes to third party libs (means breaking changes of libraries) and new libraries (either scraped or suggested by a user). Most updates may come from new rules of maven central. I will add some rules from maven central, but I don't think scraping is an option. So I don't expect very frequent updates to the database. Working with an older state of the database my produce incomplete results but should not produce incorrect results. With this assumptions I would always bundle a static database with the plugin to make the plugin usable in offline mode. This way there is also no need to separate database and plugin into different repos anymore.

For client side updates incremental updates should be possible, if I use a different index layout (with timestamps and hashes for each file, a bit like npm). When the database is updated the first time, a copy of the internal rule db could be created. The copy will be updated and used. If a user deletes the updated database, the plugin would still be usable with the internal static database. A different index layout may solve some more problems (there are libs without having own constants -> no empty constant file needed and there may be libs with "legacy rules" that only exists in earlier library versions) and would allow recoverable updates with low bandwidth usage. With an updatable copy of the database I think it should also be possible for users to add their own custom rules in a special directory.

I will work on better updatable database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants