Skip to content

Commit

Permalink
More exact usage of merging model for method deduping
Browse files Browse the repository at this point in the history
Summary:
Previously, DexLimitsChecker crashed due to mref usage exceeding dex limits, both after the mergeability-aware InterDexReshufflePass (MAReshuffle) and the IntraDexClassMergingPass (IDCM).

It is expected that the limit is exceeded after MAReshuffle because it accounts for the mref deduping that will happen in the subsequent IDCM.

The limit was exceeded after the IDCM because our approximation of mref deduping was inaccurate. During the process of debugging to identify the gap between our approximation in MAReshuffle and what actually happened in IDCM, we realized that the following contributed to at least 90% of the gap, closing which could bring down the maximum number of mrefs exceeding limit per dex from ~2100 to ~150:

- MAReshuffle does not account for the `excluded_types` used by IDCM
- MAReshuffle assumes that all ctors of mergeables belonging to the same merger in the same dex can be collapsed into a single new ctor. In practice, this is not the case as we also need to group ctors by protos.

To close the gap, we are going to proceed with the following changes:
1) Directly use the info about dedupable vmethods and intf_methods in merging_model and not consider dmethods for deduping. The former would make the model more accurate, and the latter would make the model more conservative so that the last ~150 mrefs exceeding dex limit is taken care of.
2) Make MAReshuffle a service that is called by IDCM, instead of its own pass. This way, the merging model configs shared by IDCM and MAReshuffle that are established in IDCM can easily be used by MAReshuffle, including but not limited to `exclude_types`.

This diff makes necessary changes for 1.

Reviewed By: beicy

Differential Revision: D56850663

fbshipit-source-id: f47ac73614763985ecd453e3ba36abf61e6951ce
  • Loading branch information
ShatianWang authored and facebook-github-bot committed May 2, 2024
1 parent 5ce3b2a commit c126ee1
Show file tree
Hide file tree
Showing 5 changed files with 77 additions and 59 deletions.
19 changes: 10 additions & 9 deletions libredex/DexStructure.h
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ using MethodRefs = std::unordered_set<DexMethodRef*>;
using FieldRefs = std::unordered_set<DexFieldRef*>;
using TypeRefs = std::unordered_set<DexType*>;
using MergerIndex = size_t;
using MethodGroup = size_t;

struct DexInfo {
bool primary{false};
Expand Down Expand Up @@ -225,30 +226,30 @@ also reject some legal cases.
}

void set_merging_type_method_usage(
std::unordered_map<MergerIndex, std::unordered_map<std::string, size_t>>&
std::unordered_map<MergerIndex, std::unordered_map<MethodGroup, size_t>>&
merging_type_method_usage) {
m_merging_type_method_usage = merging_type_method_usage;
}

size_t get_merging_type_method_usage(MergerIndex merging_type,
const std::string& name) const {
MethodGroup group) const {
auto it = m_merging_type_method_usage.find(merging_type);
if (it == m_merging_type_method_usage.end()) {
return 0;
}
auto it2 = it->second.find(name);
auto it2 = it->second.find(group);
return it2 == it->second.end() ? 0 : it2->second;
}

void increase_merging_type_method_usage(MergerIndex merging_type,
const std::string& name) {
m_merging_type_method_usage[merging_type][name]++;
MethodGroup group) {
m_merging_type_method_usage[merging_type][group]++;
}

void decrease_merging_type_method_usage(MergerIndex merging_type,
const std::string& name) {
always_assert(m_merging_type_method_usage[merging_type][name] > 0);
m_merging_type_method_usage[merging_type][name]--;
MethodGroup group) {
always_assert(m_merging_type_method_usage[merging_type][group] > 0);
m_merging_type_method_usage[merging_type][group]--;
}

void set_num_new_methods(size_t num_new_methods) {
Expand Down Expand Up @@ -288,7 +289,7 @@ also reject some legal cases.

// The following are used to track (hypothetical) class merging stats.
std::unordered_map<MergerIndex, size_t> m_merging_type_usage;
std::unordered_map<MergerIndex, std::unordered_map<std::string, size_t>>
std::unordered_map<MergerIndex, std::unordered_map<MethodGroup, size_t>>
m_merging_type_method_usage;
int m_num_new_methods;
int m_num_deduped_methods;
Expand Down
37 changes: 18 additions & 19 deletions opt/class-merging/ModelSpecGenerator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -207,30 +207,29 @@ void find_all_mergeables_and_roots(const TypeSystem& type_system,
merging_spec->merging_targets.size(), merging_spec->roots.size());
}

class_merging::Model construct_global_model(DexClasses& scope,
PassManager& mgr,
ConfigFiles& conf,
DexStoresVector& stores) {
class_merging::ModelSpec m_merging_spec;
// The specs below should match those used by IntraDexClassMerging
m_merging_spec.use_stable_shape_names = true;
m_merging_spec.interdex_config.init_type("non-ordered-set");
m_merging_spec.interdex_config.init_inferring_mode("class-loads");
m_merging_spec.dedup_fill_in_stack_trace = false;
size_t global_min_count = 4;

// The specs below removes dex boundaries and max size of mergers.
m_merging_spec.per_dex_grouping = false;
m_merging_spec.strategy = class_merging::strategy::BY_CLASS_COUNT;
m_merging_spec.min_count = 2;
m_merging_spec.max_count = std::numeric_limits<size_t>::max();
class_merging::Model construct_global_model(
DexClasses& scope,
PassManager& mgr,
ConfigFiles& conf,
DexStoresVector& stores,
const class_merging::ModelSpec& merging_spec,
size_t global_min_count) {
// Copy merging_spec to avoid changing the original one.
class_merging::ModelSpec global_model_merging_spec = merging_spec;
// The global_model_merging_spec share everything with the input merging_spec
// expect for the following, which removes dex boundaries and max size of
// mergers to create a global model.
global_model_merging_spec.per_dex_grouping = false;
global_model_merging_spec.strategy = class_merging::strategy::BY_CLASS_COUNT;
global_model_merging_spec.min_count = 2;
global_model_merging_spec.max_count = std::numeric_limits<size_t>::max();

TypeSystem type_system(scope);
find_all_mergeables_and_roots(type_system, scope,
/*global_min_count=*/global_min_count, mgr,
&m_merging_spec);
&global_model_merging_spec);
return class_merging::construct_model(type_system, scope, conf, mgr, stores,
m_merging_spec);
global_model_merging_spec);
};

} // namespace class_merging
11 changes: 7 additions & 4 deletions opt/class-merging/ModelSpecGenerator.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,9 +48,12 @@ void find_all_mergeables_and_roots(const TypeSystem& type_system,
* IntraDexClassMerging. As a result, certain merging specs used are
* set to match those used by the IntraDexClassMerging pass.
*/
class_merging::Model construct_global_model(DexClasses& scope,
PassManager& mgr,
ConfigFiles& conf,
DexStoresVector& stores);
class_merging::Model construct_global_model(
DexClasses& scope,
PassManager& mgr,
ConfigFiles& conf,
DexStoresVector& stores,
const class_merging::ModelSpec& merging_spec,
size_t global_min_count);

} // namespace class_merging
54 changes: 34 additions & 20 deletions opt/interdex/InterDexReshuffleImpl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,7 @@ InterDexReshuffleImpl::InterDexReshuffleImpl(
});

// Initialize m_class_to_merging_info and m_num_field_defs
if (!m_merging_model) {
if (m_merging_model == boost::none) {
return;
}

Expand All @@ -161,20 +161,35 @@ InterDexReshuffleImpl::InterDexReshuffleImpl(
return;
}
for (const DexType* mergeable : merger.mergeables) {
auto cls = type_class(mergeable);
const auto cls = type_class(mergeable);
if (cls != nullptr) {
m_class_to_merging_info.emplace(cls, MergingInfo());
auto& merging_info = m_class_to_merging_info.at(cls);
merging_info.merging_type = num_merging_types;
auto& dedupable_mrefs = merging_info.dedupable_mrefs;
for (const auto& method : cls->get_vmethods()) {
dedupable_mrefs.insert(method);
}
}
MethodGroup group = 0;
if (!merger.vmethods.empty()) {
for (const auto& vmeths : merger.vmethods) {
for (const auto* meth : vmeths.overrides) {
const auto meth_cls = type_class(meth->get_class());
auto& merging_info = m_class_to_merging_info.at(meth_cls);
merging_info.dedupable_mrefs[meth] = group;
}
for (const auto& method : cls->get_ctors()) {
dedupable_mrefs.insert(method);
group++;
}
}
if (!merger.intfs_methods.empty()) {
for (const auto& intf_meths : merger.intfs_methods) {
for (const auto* meth : intf_meths.methods) {
const auto meth_cls = type_class(meth->get_class());
auto& merging_info = m_class_to_merging_info.at(meth_cls);
merging_info.dedupable_mrefs[meth] = group;
}
group++;
}
}

m_num_field_defs.emplace(num_merging_types, merger.shape.field_count());
num_merging_types++;
});
Expand All @@ -183,7 +198,7 @@ InterDexReshuffleImpl::InterDexReshuffleImpl(
for (size_t dex_idx = m_first_dex_index; dex_idx < dexen.size(); dex_idx++) {
auto& dex = dexen.at(dex_idx);
auto& mutable_dex = m_mutable_dexen.at(dex_idx);
std::unordered_map<MergerIndex, std::unordered_map<std::string, size_t>>
std::unordered_map<MergerIndex, std::unordered_map<MethodGroup, size_t>>
merging_type_method_usage;
std::unordered_map<MergerIndex, size_t> merging_type_usage;
int num_new_methods = 0;
Expand All @@ -194,9 +209,8 @@ InterDexReshuffleImpl::InterDexReshuffleImpl(
MergerIndex merging_type = merging_info.merging_type;
merging_type_usage[merging_type]++;
for (const auto& method : merging_info.dedupable_mrefs) {
const std::string& method_name =
method->get_simple_deobfuscated_name();
merging_type_method_usage[merging_type][method_name]++;
MethodGroup group = method.second;
merging_type_method_usage[merging_type][group]++;
num_deduped_methods++;
}
}
Expand Down Expand Up @@ -470,12 +484,12 @@ bool InterDexReshuffleImpl::try_plan_move(const Move& move,
merging_type = m_class_to_merging_info.at(move.cls).merging_type;
// Compute the number of method definitions in move.cls that can be deduped
// in target_dex.
std::unordered_set<DexMethod*> dedupable_mrefs =
const std::unordered_map<const DexMethod*, MethodGroup>& dedupable_mrefs =
m_class_to_merging_info.at(move.cls).dedupable_mrefs;
for (const auto& method : dedupable_mrefs) {
const std::string& method_name = method->get_simple_deobfuscated_name();
MethodGroup group = method.second;
const size_t old_usage =
target_dex.get_merging_type_method_usage(merging_type, method_name);
target_dex.get_merging_type_method_usage(merging_type, group);
if (old_usage > 0) {
clazz_num_dedupable_method_defs++;
}
Expand Down Expand Up @@ -516,23 +530,23 @@ bool InterDexReshuffleImpl::try_plan_move(const Move& move,
// Update class merging stats in source_dex and target_dex
source_dex.decrease_merging_type_usage(merging_type);
target_dex.increase_merging_type_usage(merging_type);
std::unordered_set<DexMethod*> dedupable_mrefs =
const std::unordered_map<const DexMethod*, MethodGroup>& dedupable_mrefs =
m_class_to_merging_info.at(move.cls).dedupable_mrefs;
for (const auto& method : dedupable_mrefs) {
const std::string& method_name = method->get_simple_deobfuscated_name();
MethodGroup group = method.second;
// Source_dex updates
const int source_old_usage =
source_dex.get_merging_type_method_usage(merging_type, method_name);
source_dex.get_merging_type_method_usage(merging_type, group);
always_assert(source_old_usage > 0);
source_dex.decrease_merging_type_method_usage(merging_type, method_name);
source_dex.decrease_merging_type_method_usage(merging_type, group);
if (source_old_usage == 1) {
source_dex.decrease_num_new_methods();
}
source_dex.decrease_num_deduped_methods();
// Target_dex updates
const int target_old_usage =
target_dex.get_merging_type_method_usage(merging_type, method_name);
target_dex.increase_merging_type_method_usage(merging_type, method_name);
target_dex.get_merging_type_method_usage(merging_type, group);
target_dex.increase_merging_type_method_usage(merging_type, group);
if (target_old_usage == 0) {
target_dex.increase_num_new_methods();
}
Expand Down
15 changes: 8 additions & 7 deletions opt/interdex/InterDexReshuffleImpl.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ struct ReshuffleConfig {

struct MergingInfo {
MergerIndex merging_type;
std::unordered_set<DexMethod*> dedupable_mrefs;
std::unordered_map<const DexMethod*, MethodGroup> dedupable_mrefs;
};

// Compute gain powers by reference occurrences. We don't use the upper 20 (19,
Expand Down Expand Up @@ -298,20 +298,21 @@ class MoveGains {
compute_gain(source_merging_type_usage, target_merging_type_usage,
for_removal);

const std::unordered_map<const DexMethod*, MethodGroup>& dedupable_mrefs =
m_class_to_merging_info.at(cls).dedupable_mrefs;
for (auto* mref : refs.mrefs) {
auto source_occurrences = source.get_mref_occurrences(mref);
auto target_occurrences = target.get_mref_occurrences(mref);
const auto ref_cls = type_class(mref->get_class());
// If mref is defined in cls, then we use its corresponding merging type
// method usage in source and target to approximate the
// source_occurrences and target_occurrences after merging.
if (ref_cls == cls) {
const std::string& method_name =
mref->as_def()->get_simple_deobfuscated_name();
const DexMethod* meth = mref->as_def();
if (dedupable_mrefs.count(meth)) {
MethodGroup group = dedupable_mrefs.at(meth);
source_occurrences =
source.get_merging_type_method_usage(merging_type, method_name);
source.get_merging_type_method_usage(merging_type, group);
target_occurrences =
target.get_merging_type_method_usage(merging_type, method_name);
target.get_merging_type_method_usage(merging_type, group);
gain +=
m_deduped_weight *
compute_gain(source_occurrences, target_occurrences, for_removal);
Expand Down

0 comments on commit c126ee1

Please sign in to comment.