Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decoding External Term Format (ETF) heap map with a flat map as a key results in a broken map #6496

Closed
potatosalad opened this issue Nov 22, 2022 · 2 comments
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@potatosalad
Copy link
Contributor

Describe the bug

While running some stateful property-based tests on the Erlang Distribution Protocol, I encountered an issue with specific Map encodings which can result in a "broken map".

Specifically: if a map with more than 32 key-pairs (or heap map) contains a key which is a flat map with at least 2 key pairs of different types (for example: a float() and an integer() like with #{0 => [],0.0 => []}), the exact ordering of the encoded key pairs in this nested flat map can result in a decoded map that has a key that cannot be fetched with maps:get/2,3, cannot be pattern matched against, cannot be displayed properly with erlang:display/1, etc.

Example nested encoding which can result in a broken map:

%% Encoding which can result in a broken map if used as the key of a larger heap map:
<<
    116,                % MAP_EXT
    0,0,0,2,            % Arity = 2
    70,0,0,0,0,0,0,0,0, % [0] Key = 0.0
    106,                % [0] Val = []
    97,0,               % [1] Key = 0
    106                 % [1] Val = []
>>.

%% Encoding from `erlang:term_to_binary/2` which does not result in a broken map:
<<
    116,                % MAP_EXT
    0,0,0,2,            % Arity = 2
    97,0,               % [0] Key = 0
    106,                % [0] Val = []
    70,0,0,0,0,0,0,0,0, % [1] Key = 0.0
    106                 % [1] Val = []
>>.

To Reproduce

See prop_find_broken_maps and the counterexample.log for more details on how these encodings were produced.

1> BrokenBin = <<131,116,0,0,0,33,97,0,106,97,1,106,97,2,106,97,3,106,97,4,106,97,5,106,97,6,106,97,7,106,97,8,106,97,9,106,97,10,106,97,11,106,97,12,106,97,13,106,97,14,106,97,15,106,97,16,106,97,17,106,97,18,106,97,19,106,97,20,106,97,21,106,97,22,106,97,23,106,97,24,106,97,25,106,97,26,106,97,27,106,97,28,106,97,29,106,97,30,106,97,31,106,116,0,0,0,2,70,0,0,0,0,0,0,0,0,106,97,0,106,106>>,
1> BrokenMap = erlang:binary_to_term(BrokenBin).
#{16 => [],18 => [],5 => [],19 => [],27 => [],10 => [],
  24 => [],4 => [],21 => [],22 => [],17 => [],3 => [],8 => [],
  1 => [],7 => [],2 => [],14 => [],15 => [],20 => [],13 => [],
  0 => [],6 => [],28 => [],25 => [],9 => [],11 => [],31 => [],
  26 => [],30 => [],
  #{0 => [],0.0 => []} => [],
  29 => [],23 => [],12 => []}
2> % Try fetching every key from the broken Map (should not error):
2> [maps:get(Key, BrokenMap) || Key <- maps:keys(BrokenMap)].
** exception error: bad key: #{0 => [],0.0 => []}
     in function  maps:get/2
        called as maps:get(#{0 => [],0.0 => []},
                           #{16 => [],18 => [],5 => [],19 => [],27 => [],10 => [],
                             24 => [],4 => [],21 => [],22 => [],17 => [],3 => [],8 => [],
                             1 => [],7 => [],2 => [],14 => [],15 => [],20 => [],13 => [],
                             0 => [],6 => [],28 => [],25 => [],9 => [],11 => [],
                             31 => [],...})
        *** argument 1: not present in map
3> % Try fetching the flat map key from the heap map (should not error):
3> maps:get(#{0 => [],0.0 => []} , BrokenMap).
** exception error: bad key: #{1 => 0,2 => 0,3 => 0,0.0 => 0}
     in function  maps:get/2
        called as maps:get(#{1 => 0,2 => 0,3 => 0,0.0 => 0},
                           #{16 => 0,18 => 0,5 => 0,19 => 0,
                             #{1 => 0,2 => 0,3 => 0,0.0 => 0} => 0,
                             27 => 0,10 => 0,24 => 0,4 => 0,21 => 0,22 => 0,17 => 0,
                             3 => 0,8 => 0,1 => 0,7 => 0,2 => 0,14 => 0,15 => 0,20 => 0,
                             13 => 0,0 => 0,6 => 0,28 => 0,25 => 0,9 => 0,11 => 0,...})
        *** argument 1: not present in map
4> % Try pattern matching with the flat map key (should not error):
4> Key = hd([K || K <- maps:keys(BrokenMap), is_map(K)]).
#{0 => [],0.0 => []}
5> #{Key := _} = BrokenMap.
** exception error: no match of right hand side value #{16 => [],18 => [],5 => [],19 => [],27 => [],10 => [],
                                                        24 => [],4 => [],21 => [],22 => [],17 => [],3 => [],8 => [],
                                                        1 => [],7 => [],2 => [],14 => [],15 => [],20 => [],13 => [],
                                                        0 => [],6 => [],28 => [],25 => [],9 => [],11 => [],31 => [],
                                                        26 => [],30 => [],...}
6> % Try calling `erlang:display/1` with the broken Map (results in corrupted output):
6> erlang:display(BrokenMap).
",[0]},[6],<1400>{[28],[25]},<210>{"{[19",[11]},[31],[26],<2440>{[30],[#{0=>[],0.000000e+00=>[]}],<8020>{[29],[23]}},[12]}<8512>{[7],[2],[14],[15],[20]},<1080>{"
true
7> % Try encoding and decoding the broken Map:
7> FixedBin = erlang:term_to_binary(BrokenMap, [{minor_version, 2}]).
<<131,116,0,0,0,33,97,12,106,97,23,106,97,29,106,116,0,0,0,2,97,0,106,70,0,0,0,0,0,0,0,0,106,106,97,30,106,97,26,106,97,31,106,97,11,106,97,9,106,97,25,106,97,28,106,97,6,106,97,0,106,97,13,106,97,20,106,97,15,106,97,14,106,97,2,106,97,7,106,97,1,106,97,8,106,97,3,106,97,17,106,97,22,106,97,21,106,97,4,106,97,24,106,97,10,106,97,27,106,97,19,106,97,5,106,97,18,106,97,16,106>>
8> FixedMap = erlang:binary_to_term(FixedBin).
#{16 => [],18 => [],
  #{0 => [],0.0 => []} => [],
  5 => [],19 => [],27 => [],10 => [],24 => [],4 => [],
  21 => [],22 => [],17 => [],3 => [],8 => [],1 => [],7 => [],
  2 => [],14 => [],15 => [],20 => [],13 => [],0 => [],6 => [],
  28 => [],25 => [],9 => [],11 => [],31 => [],26 => [],
  30 => [],29 => [],23 => [],12 => []}
8> % Fetching the flat map key from the heap map with the Fixed Map (works correctly):
8> maps:get(#{0 => [],0.0 => []}, FixedMap).
[]
9> % Also notice that the BrokenMap and FixedMap do not equal one another:
9> FixedMap == BrokenMap.
false

Expected behavior

Depending on the desired behavior here, either:

  1. The External Term Format needs to have more specific documentation about MAP_EXT if the ordering of the key pairs is important and binary_to_term should error if out-of-order key pairs are detected when decoding.
  2. Decoding a MAP_EXT should allow the key pairs to be in any arbitrary order without resulting in a "broken map".

My vote would be for (2).

Here are a couple property definitions in pseudo-quickcheck code:

% Calling `maps:get/2,3` on a `Map` for each `Key` returned by `maps:keys(Map)` should succeed.
?FORALL(
    Map,
    map(),
    begin
        Tag = erlang:make_ref(),
        Pred = fun(Key) -> maps:get(Key, Map, Tag) =/= Tag end,
        lists:all(Pred, maps:keys(Map))
    end
).

% A "roundtrip" `Map` term should be equal to itself after encoding and decoding.
?FORALL(
    Map,
    map(),
    begin
        RoundtripMap = erlang:binary_to_term(erlang:term_to_binary(Map)),
        RoundtripMap == Map
    end
).

Affected versions

  • OTP 25.x (maint)
  • OTP 26.x (master)

Additional context

I think external.c may be erroneously assuming that the key pairs being decoded are in a specific order.

@potatosalad potatosalad added the bug Issue is reported as a bug label Nov 22, 2022
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Nov 23, 2022
@sverker
Copy link
Contributor

sverker commented Nov 28, 2022

Thanks for the good report.

Yes, the ambition was for MAP_EXT to not demand sorted key.

The bug is at the end of function dec_term()

if (!PSTACK_IS_EMPTY(hamt_array)) {
do {
struct dec_term_hamt* hamt = PSTACK_TOP(hamt_array);
*hamt->objp = erts_hashmap_from_array(factory,
hamt->leaf_array,
hamt->size,
1);
if (is_non_value(*hamt->objp))
goto error_hamt;
(void) PSTACK_POP(hamt_array);
} while (!PSTACK_IS_EMPTY(hamt_array));
}
/* Iterate through all the (flat)maps and check for validity and sort keys
* - done here for when we know it is complete.
*/
while(!WSTACK_ISEMPTY(flat_maps)) {
next = (Eterm *)WSTACK_POP(flat_maps);
if (!erts_validate_and_sort_flatmap((flatmap_t*)next))
goto error;
}
where all (big) hashmaps are generated followed by sorting of all (small) flatmaps. If an unsorted flatmap is key in a hashmap its hash value will be calculated wrong. Seems like we have to fixup all maps in one go bottom-up.
Working on it...

@sverker
Copy link
Contributor

sverker commented Nov 30, 2022

Fix #6521 merged to maint for OTP 25.2.

@sverker sverker closed this as completed Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

3 participants