
Bring merge sort and insertion sort cmp function semantics together #17473

Merged
merged 6 commits into from Sep 9, 2020


anisse
Contributor

@anisse anisse commented Aug 16, 2020

Merge sort uses cmp (a, b) < 0 for its first test branch, and insertion
sort uses cmp (a, b) > 0; this means that when cmp returns 0, each sort
function takes a different branch.

We keep the semantics of insertion sort, because it keeps equal elements
in the same (stable) order with both sort functions.

Update tests that were broken because of wrong register ordering.
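As a sketch of the boundary this PR settles on (plain C arrays rather than radare2's RList; item, cmp_key, insertion_sort and merge_sort are hypothetical names), both algorithms below let an element move ahead of another only when cmp (a, b) > 0. Elements that compare equal therefore keep their input order in both sorts:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element: sorted by key; tag records the input position. */
typedef struct {
	int key;
	int tag;
} item;

/* Three-way comparator: negative, 0 or positive, like strcmp(). */
static int cmp_key(const item *a, const item *b) {
	return (a->key > b->key) - (a->key < b->key);
}

/* Insertion sort: x moves left only past elements that compare strictly
 * greater (cmp > 0), so elements with equal keys are never reordered. */
static void insertion_sort(item *v, size_t n) {
	for (size_t i = 1; i < n; i++) {
		item x = v[i];
		size_t j = i;
		while (j > 0 && cmp_key(&v[j - 1], &x) > 0) {
			v[j] = v[j - 1];
			j--;
		}
		v[j] = x;
	}
}

/* Merge sort with the same boundary: the right run wins only when the left
 * head is strictly greater (cmp > 0), so ties come from the left run. */
static void merge_rec(item *v, item *tmp, size_t n) {
	if (n < 2) {
		return;
	}
	size_t mid = n / 2, i = 0, j = mid, k = 0;
	merge_rec(v, tmp, mid);
	merge_rec(v + mid, tmp, n - mid);
	while (i < mid && j < n) {
		tmp[k++] = cmp_key(&v[i], &v[j]) > 0 ? v[j++] : v[i++];
	}
	while (i < mid) {
		tmp[k++] = v[i++];
	}
	while (j < n) {
		tmp[k++] = v[j++];
	}
	memcpy(v, tmp, n * sizeof *v);
}

static void merge_sort(item *v, size_t n) {
	if (n < 2) {
		return;
	}
	item *tmp = malloc(n * sizeof *tmp);
	if (tmp) {
		merge_rec(v, tmp, n);
		free(tmp);
	}
}

/* Usage: both sorts must produce the same order, ties included. */
static void demo_same_order(void) {
	item a[] = { {3, 0}, {1, 1}, {3, 2}, {2, 3}, {1, 4}, {3, 5} };
	item b[sizeof a / sizeof a[0]];
	size_t n = sizeof a / sizeof a[0];
	memcpy(b, a, sizeof a);
	insertion_sort(a, n);
	merge_sort(b, n);
	for (size_t i = 0; i < n; i++) {
		assert(a[i].key == b[i].key && a[i].tag == b[i].tag);
	}
}
```

Calling demo_same_order() from main() runs the check. With the pre-PR merge test (take the left element only when cmp (a, b) < 0), a tie would be taken from the right run first, so merge sort and insertion sort could order equal elements differently.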

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've added tests that prove my fix is effective or that my feature works (if possible)
  • I've updated the documentation and the radare2 book with the relevant information (if needed)

Detailed description

The register order changed on arm32 after more registers were added at the end: everything was seemingly in a random order. This happened because the list grew beyond 43 elements, so it started being sorted with merge sort instead of insertion sort. This PR makes merge sort behave properly with compare functions that return a bool.
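The symptom can be reproduced with a single merge step. The sketch below works on int arrays rather than radare2's list.c, and cmp_gt and merge_runs are hypothetical names. A comparator that returns a > b only ever yields 0 or 1, so the pre-PR test cmp (a, b) < 0 is never true and the merge always takes from the right run, while the cmp (a, b) > 0 test keeps the output sorted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "bool" comparator of the kind discussed in this PR: it only
 * answers "is a greater than b?", returning 0 for both a < b and a == b. */
static int cmp_gt(int a, int b) {
	return a > b;
}

/* Merge two sorted runs into out. With old_test, the left head is taken when
 * cmp (l, r) < 0 (the pre-PR merge sort test); otherwise the right head is
 * taken only when cmp (l, r) > 0 (the insertion sort test kept by the PR). */
static void merge_runs(const int *l, size_t nl, const int *r, size_t nr,
		int *out, bool old_test) {
	size_t i = 0, j = 0, k = 0;
	while (i < nl && j < nr) {
		int c = cmp_gt(l[i], r[j]);
		bool take_left = old_test ? c < 0 : !(c > 0);
		out[k++] = take_left ? l[i++] : r[j++];
	}
	while (i < nl) {
		out[k++] = l[i++];
	}
	while (j < nr) {
		out[k++] = r[j++];
	}
}

/* Usage: a bool comparator never returns < 0, so the old test always drains
 * the right run first and the output is not sorted. */
static void demo_merge_tests(void) {
	static const int l[] = { 1, 3 }, r[] = { 2, 4 };
	static const int want_old[] = { 2, 4, 1, 3 }, want_new[] = { 1, 2, 3, 4 };
	int out[4];
	merge_runs(l, 2, r, 2, out, true);
	assert(memcmp(out, want_old, sizeof out) == 0);
	merge_runs(l, 2, r, 2, out, false);
	assert(memcmp(out, want_new, sizeof out) == 0);
}
```

Applied at every level of the recursion, the old test with such a comparator always emits the right run before the left one, so merge sort returns a fixed permutation of its input that does not depend on the values at all.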

Test plan

...

Closing issues

This should unblock the test regressions from PR #17462.

@anisse
Contributor Author

anisse commented Aug 17, 2020

I rebased on master and removed sdb modifications after sending radareorg/sdb#212

@XVilka
Contributor

XVilka commented Aug 19, 2020

Please rebase on top of master and sync SDB here, since that PR was merged.

@XVilka
Contributor

XVilka commented Aug 20, 2020

It seems to have broken the reverse debugger:

[XX] db/archos/linux-x64/dbg_step_back dbg.stepback
R2_NOPLUGINS=1 radare2 -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -d -e dbg.bpsysign=true -Qc 'db main
db 0x004028fe
dc
dts+
dc
dsb
dsb
dr rbx,rcx,rdx,r12,rip
dk 9
' bins/elf/analysis/ls-linux-x86_64-zlul
-- stdout
@@ -1,5 +1,5 @@
 0x00000001
 0x00000001
-0x00000001
+0x0000a401
 0x00404870

-- stderr
Process with PID 33840 started...
= attach 33840 33840
bin.baddr 0x00400000
Using 0x400000
asm.bits 64
hit breakpoint at: 0x4028a0
Reading 4096 byte(s) from 0x0061c000...
Reading 4096 byte(s) from 0x0061d000...
Reading 135168 byte(s) from 0x02466000...
Reading 4096 byte(s) from 0x7fa88960b000...
Reading 16384 byte(s) from 0x7fa88960c000...
Reading 4096 byte(s) from 0x7fa889814000...
Reading 4096 byte(s) from 0x7fa889a18000...
Reading 4096 byte(s) from 0x7fa889c89000...
Reading 8192 byte(s) from 0x7fa88a075000...
Reading 16384 byte(s) from 0x7fa88a077000...
Reading 4096 byte(s) from 0x7fa88a282000...
Reading 4096 byte(s) from 0x7fa88a4a8000...
Reading 4096 byte(s) from 0x7fa88a282000...
Reading 4096 byte(s) from 0x7fa88a4a8000...
Reading 8192 byte(s) from 0x7fa88a4a9000...
Reading 28672 byte(s) from 0x7fa88a6b9000...
Reading 4096 byte(s) from 0x7fa88a6d3000...
Reading 4096 byte(s) from 0x7fa88a6d4000...
Reading 135168 byte(s) from 0x7ffe012d9000...
r_reg_get_value: Bit size 256 not supported
r_reg_get_value: Bit size 256 not supported
r_reg_get_value: Bit size 256 not supported
r_reg_get_value: Bit size 256 not supported
r_reg_get_value: Bit size 256 not supported
r_reg_get_value: Bit size 256 not supported
r_reg_get_value: Bit size 256 not supported
r_reg_get_value: Bit size 256 not supported
r_reg_get_value: Bit size 256 not supported
...

@anisse
Contributor Author

anisse commented Aug 20, 2020

Yes, I didn't notice it at first because the test is also failing on my machine (Fedora 32), but differently. I saw the same result before and after the PR. I'll install Ubuntu to try to reproduce it.

@anisse
Contributor Author

anisse commented Aug 20, 2020

I installed Ubuntu Bionic in a chroot, and the test passes on this branch; I'll try something else.

@anisse
Contributor Author

anisse commented Aug 20, 2020

I think it may be because my (old) CPU doesn't have the AVX extensions; see the logs of a GitHub Action on my account, with additional debug info:

[…]
r_reg_get_value: Bit size 256 not supported for reg ymm10 
r_reg_get_value: Bit size 256 not supported for reg ymm11 
r_reg_get_value: Bit size 256 not supported for reg ymm12 
r_reg_get_value: Bit size 256 not supported for reg ymm13 
r_reg_get_value: Bit size 256 not supported for reg ymm14 
[…]

https://github.com/anisse/radare2/pull/3/checks?check_run_id=1005021519

Now I need to find a machine with those or debug through github actions :-/

@anisse
Contributor Author

anisse commented Aug 23, 2020

Apparently glibc has a faster memset implementation that uses AVX instructions, such as vzeroupper. That's what the trace debugger is hitting: it tries to get the values of the ymm0-15 registers and fails, because this isn't implemented.

I'm pretty sure this PR accidentally fixed something (I don't know what) which is now making this trace debugger test fail. I wouldn't mind a second pair of eyes for this.

I could add a fake case for getting the value of a 256-bit register, just like the 128-bit case (which isn't really implemented). What do you think?

@ret2libc
Contributor

I'm not sure about this. By definition in r_list.h, RListComparator should return -1, 0 or 1, so I think it is just wrong to pass a RListComparator that returns a bool. IMHO it is wrong to use cmp_order as in test_list.c and expect good results when it returns a bool. @thestr4ng3r what do you think?
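For reference, a comparator that honours that contract might look like the sketch below. The Comparator typedef is an assumption meant to mirror RListComparator's shape (two const void * arguments, int result), and reg_item, reg_offset_cmp and the offsets are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: this mirrors the shape of radare2's RListComparator, a
 * function taking two const void * and returning an int. */
typedef int (*Comparator)(const void *a, const void *b);

/* Hypothetical register record; the offsets used below are made up. */
typedef struct {
	const char *name;
	int offset;
} reg_item;

/* Three-way comparator with the strcmp()/qsort() contract: < 0, 0 or > 0.
 * Converting both const void * arguments inside the function keeps its type
 * exactly Comparator, so callers never need a function-pointer cast. */
static int reg_offset_cmp(const void *a, const void *b) {
	const reg_item *ra = a;
	const reg_item *rb = b;
	return (ra->offset > rb->offset) - (ra->offset < rb->offset);
}

/* Usage: the same function works with qsort() and as an equality test. */
static void demo_three_way(void) {
	reg_item regs[] = { {"rsi", 16}, {"rdi", 8}, {"rax", 0} };
	Comparator cmp = reg_offset_cmp;
	qsort(regs, sizeof regs / sizeof regs[0], sizeof regs[0], cmp);
	assert(strcmp(regs[0].name, "rax") == 0);
	assert(strcmp(regs[2].name, "rsi") == 0);
	assert(cmp(&regs[1], &regs[1]) == 0);
	assert(cmp(&regs[0], &regs[1]) < 0);
}
```

A version returning ra->offset > rb->offset would still compile, but it returns 0 for both "less than" and "equal", which is the ambiguity discussed in this thread.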

@anisse
Contributor Author

anisse commented Aug 25, 2020

> I'm not sure about this. By definition in r_list.h, RListComparator should return -1, 0 or 1, so I think it is just wrong to pass a RListComparator that returns a bool. IMHO it is wrong to use cmp_order as in test_list.c and expect good results when it returns a bool. @thestr4ng3r what do you think?

I've thought about this as well, but decided against it, for the following reasons:

  • there are comparators that rely on the "wrong" behaviour. And this behaviour works for insertion sort.
  • both insertion sort and merge sort should have the same behaviour
  • historically, there was only insertion sort, and merge sort was added later, with a different semantic at the 0 boundary. The "definition" in r_list.h was also added later, without fixing all the comparators. It was also probably added as a consequence of this issue, without fixing the core issue.

I could provide a different PR that changes the comparators, but IMHO it would require changing insertion sort as well, so that it stops working as it always did, since "the code is the API"; the behaviour should be the same for both functions, to avoid any surprise once the list size changes. Ideally this should be propagated to sdb as well, which already took this fix. What do you think?

Right now there are lists in the code that aren't properly sorted: I've found the register lists for x86 and aarch64, and the new one for arm32 in PR #17462.

@ret2libc
Contributor

> Right now there are lists in the code that aren't properly sorted: I've found the register lists for x86 and aarch64, and the new one for arm32 in PR #17462.

I think they are not properly sorted because of this bool/int mess. If you convert the comparator functions to use -1/0/1, they should work.

> I'm not sure about this. By definition in r_list.h, RListComparator should return -1, 0 or 1, so I think it is just wrong to pass a RListComparator that returns a bool. IMHO it is wrong to use cmp_order as in test_list.c and expect good results when it returns a bool. @thestr4ng3r what do you think?

> I've thought about this as well, but decided against it, for the following reasons:

> * there are comparators that rely on the "wrong" behaviour. And this behaviour works for insertion sort.

The fact that it works is just a side effect, IMO. In general, C comparator functions like strcmp and similar return -1/0/1 (actually a negative number, 0 or a positive number), which is why having the same semantics seems better to me.

> * both insertion sort and merge sort should have the same behaviour

I agree, and I think they do, if the right comparator function (that is, one that returns -1/0/1) is used in both cases. If merge or insertion sort doesn't work properly when given a -1/0/1 comparator function, then that is definitely a bug.

> * historically, there was only insertion sort, and merge sort was added later, with a different semantic at the 0 boundary. The "definition" in r_list.h was also added later, without fixing all the comparators. It was also probably added as a consequence of this issue, without fixing the core issue.

I didn't dig into the history, but the different semantic at the 0 boundary is probably only a "small issue" and should only affect the relative order of two elements that are equal according to cmp. Anyway, RListComparator is used not only in insertion/merge sort, and the same comparator function should work well in all the functions that use it (e.g. r_list_sort, but also r_list_find). A comparator function that returns a > b would return false both when a == b and when a < b. False is interpreted as 0, and r_list_find uses !cmp to check whether an element was found in a list, so r_list_find would not work correctly. This is just one example, but in general I do not see it as a good thing to have a boolean disguised as an int.

All this to say that if there are problems when moving from insertion to merge sort, it is probably just because the comparator functions passed to insertion sort abused the way the sorting is implemented, and those comparators are actually wrong. We should change those comparator functions. You can also see that you are actually casting a boolean (a > b) to int (the return type of RListComparator), which should usually raise a red flag. That said, it is also fine to ensure that the behaviour when two elements are evaluated as equal (RListComparator returns 0) is consistent between merge and insertion sort.
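The r_list_find argument can be illustrated with a small, self-contained lookup. find_index, cmp_gt and cmp_3way are hypothetical names, and the lookup only models the !cmp test described above, not the actual r_list_find code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup modelled on the behaviour described for r_list_find:
 * an element matches when !cmp (key, elem), i.e. when cmp returns 0. */
static long find_index(const int *v, size_t n, int key, int (*cmp)(int, int)) {
	for (size_t i = 0; i < n; i++) {
		if (!cmp(key, v[i])) {
			return (long)i;
		}
	}
	return -1;
}

/* "Bool" comparator: returns 0 for both key < elem and key == elem. */
static int cmp_gt(int a, int b) {
	return a > b;
}

/* Three-way comparator: returns 0 only when the values are equal. */
static int cmp_3way(int a, int b) {
	return (a > b) - (a < b);
}

/* Usage: with the bool comparator, the lookup matches the first element that
 * is >= key, even when it is not equal to it. */
static void demo_find(void) {
	const int v[] = { 9, 7, 5 };
	assert(find_index(v, 3, 5, cmp_gt) == 0);    /* wrong: v[0] is 9 */
	assert(find_index(v, 3, 5, cmp_3way) == 2);  /* correct */
	assert(find_index(v, 3, 4, cmp_gt) == 0);    /* 4 is not even present */
	assert(find_index(v, 3, 4, cmp_3way) == -1);
}
```

With the three-way comparator, !cmp becomes a real equality test, which is what this kind of lookup relies on.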

@anisse
Contributor Author

anisse commented Aug 25, 2020

> * historically, there was only insertion sort, and merge sort was added later, with a different semantic at the 0 boundary. The "definition" in r_list.h was also added later, without fixing all the comparators. It was also probably added as a consequence of this issue, without fixing the core issue.

> I didn't dig into the history, but the different semantic at the 0 boundary is probably only a "small issue" and should only affect the relative order of two elements that are equal according to cmp. Anyway, RListComparator is used not only in insertion/merge sort, and the same comparator function should work well in all the functions that use it (e.g. r_list_sort, but also r_list_find). A comparator function that returns a > b would return false both when a == b and when a < b. False is interpreted as 0, and r_list_find uses !cmp to check whether an element was found in a list, so r_list_find would not work correctly. This is just one example, but in general I do not see it as a good thing to have a boolean disguised as an int.

It's not a small issue: once merge sort is used, lists aren't sorted at all. You can look at the test fixes to see how the order changes.

> All this to say that if there are problems when moving from insertion to merge sort, it is probably just because the comparator functions passed to insertion sort abused the way the sorting is implemented, and those comparators are actually wrong. We should change those comparator functions. You can also see that you are actually casting a boolean (a > b) to int (the return type of RListComparator), which should usually raise a red flag. That said, it is also fine to ensure that the behaviour when two elements are evaluated as equal (RListComparator returns 0) is consistent between merge and insertion sort.

I agree it's a comparator bug, but this PR has the advantage of fixing this whole bug class. Otherwise it will reappear, because casting the function once to (RListComparator) is slightly easier than casting both const void* arguments.

Here is my proposal:

  • keep this behaviour change for merge sort, but also fix the bad RListComparators. I've found about 14 of them; I'll fix them in a separate patch in this PR.
  • I'll remove the test, to show that this isn't an acceptable use of r_list_sort

@anisse
Contributor Author

anisse commented Aug 25, 2020

I've updated the PR with my proposal and changed the main description.

libr/core/agraph.c (outdated, resolved review thread)
@ret2libc
Contributor

> I've updated the PR with my proposal and changed the main description.

Thanks! Much better now ;) Let's wait for CI. Please just fix that small comment (unless there is a reason to keep it like that), then it's ok for me.

test/db/anal/vars (resolved review thread)
test/db/archos/linux-x64/dbg_drt (resolved review thread)
libr/core/cconfig.c (outdated, resolved review thread)
@ret2libc ret2libc self-assigned this Aug 28, 2020
This fixes the trace debugger test by no longer checking the content of rdx,
which changes with Fedora's glibc, or on recent Ubuntu with glibc AVX2 support.

Ideally this test should be modified to depend less on the system libc.
Collaborator

@trufae trufae left a comment

Change the commit title and add ##anal; I think this is an important change and it should be in the changelog. The message should make clear that the order of function variables and reflines is what is mainly affected. Do you have a screenshot of the change in reflines after this change? Is the change in sdb correct, or should it be changed again because of the cmp <=?

libr/core/cconfig.c (3 outdated, resolved review threads)
@@ -1334,9 +1334,9 @@ var char ** var_20h @ rbp-0x20
var int64_t var_14h @ rbp-0x14
var void * var_10h @ rbp-0x10
var int64_t var_4h @ rbp-0x4
arg int argc @ rdi
Contributor

I still think there is something wrong here. I would expect variables and arguments to be shown in the "right order", not just sorted by register names. I know this depends on the ABI, etc., but it kinda made sense to have rdi, rsi, rdx, while I find this new sorting inappropriate in this particular context. @XVilka @thestr4ng3r @kazarmy WDYT?

Contributor Author

I agree, the argument order feels reversed. I'm not sure how to address that here though.

Contributor

This needs a little more investigation. Maybe it's just the way it is, but maybe we are missing something.

@anisse
Contributor Author

anisse commented Aug 28, 2020

> Change the commit title and add ##anal; I think this is an important change and it should be in the changelog. The message should make clear that the order of function variables and reflines is what is mainly affected.

Should it be for all commits?

> Do you have a screenshot of the change in reflines after this change?

I'm not sure how to test, but here is a small example; before:

$ ./binr/radare2/radare2 -e "asm.describe=false" -e "scr.color=0" -Qc "pd 12 @ 0x00065f70" test/bins/mach0/Alamofire-stripped
            0x00065f70      81040f58       ldr x1, 0x84000
            0x00065f74      60070094       bl 0x67cf4
        ┌─< 0x00065f78      e00000b4       cbz x0, 0x65f94
       ┌──> 0x00065f7c      1f0013eb       cmp x0, x19
      ┌───< 0x00065f80      80000054       b.eq 0x65f90
      │╎│   0x00065f84      ff060094       bl 0x67b80
      │└──< 0x00065f88      a0ffffb5       cbnz x0, 0x65f7c
      │┌──< 0x00065f8c      02000014       b 0x65f94
      └───> 0x00065f90      e0030032       orr w0, wzr, 1
       └└─> 0x00065f94      fd7b41a9       ldp x29, x30, [sp, 0x10]
            0x00065f98      f44fc2a8       ldp x20, x19, [sp], 0x20
            0x00065f9c      c0035fd6       ret

after:

$ ./binr/radare2/radare2 -e "asm.describe=false" -e "scr.color=0" -Qc "pd 12 @ 0x00065f70" test/bins/mach0/Alamofire-stripped
            0x00065f70      81040f58       ldr x1, 0x84000
            0x00065f74      60070094       bl 0x67cf4
        ┌─< 0x00065f78      e00000b4       cbz x0, 0x65f94
       ┌──> 0x00065f7c      1f0013eb       cmp x0, x19
      ┌───< 0x00065f80      80000054       b.eq 0x65f90
      │╎│   0x00065f84      ff060094       bl 0x67b80
      │└──< 0x00065f88      a0ffffb5       cbnz x0, 0x65f7c
      │┌──< 0x00065f8c      02000014       b 0x65f94
      └───> 0x00065f90      e0030032       orr w0, wzr, 1
       └└─> 0x00065f94      fd7b41a9       ldp x29, x30, [sp, 0x10]
            0x00065f98      f44fc2a8       ldp x20, x19, [sp], 0x20
            0x00065f9c      c0035fd6       ret

(no change)

> Is the change in sdb correct, or should it be changed again because of the cmp <=?

Yes, I've made sure we keep the same behaviour with sdb.

…#anal

Merge sort uses cmp (a, b) < 0 for its first test branch, and insertion
sort uses cmp (a, b) > 0; this means that when cmp returns 0, each sort
function takes a different branch.

This patch makes it possible to support compare functions that return
true/false instead of -1/0/1; although this isn't an acceptable use of
RListComparator, supporting it prevents future bugs, because such
functions used to work with insertion sort but not with merge sort.

The main advantage of this patch is that both sort functions should sort
equal elements the same way. This stability is important for zignatures,
for example.
@anisse
Contributor Author

anisse commented Aug 28, 2020

> Change the commit title and add ##anal; I think this is an important change and it should be in the changelog. The message should make clear that the order of function variables and reflines is what is mainly affected.

> Should it be for all commits?

I did it for commits that impact sort order and test fixes.

@ret2libc
Contributor

ret2libc commented Sep 2, 2020

@XVilka @thestr4ng3r @trufae someone please have another look at this and merge if you think it's ok!

Contributor

@XVilka XVilka left a comment

The comparison change looks good to merge. What looks wrong is the reversed order of the arguments/etc like @ret2libc pointed out.

radare
radare previously requested changes Sep 3, 2020
xmm1h
xmm1
ds
xmm1l
Collaborator

I think this order is messed up.

Contributor Author

It is, but that's because xmm1l and ds have the same register offset: 184.

"xmm@fpu xmm1 .128 176 16\n"
"fpu xmm1h .64 176 8\n"
"fpu xmm1l .64 184 8\n"

"seg@gpr ds .64 184 0\n"

ymm14
ymm13
ymm12
ymm11
Collaborator

It's properly (reversed) sorted.

Contributor Author

Considering that the original function is meant to sort in order (not reversed), this is the issue this PR fixes:
https://github.com/radareorg/radare2/blob/master/libr/reg/reg.c#L200-L203

You can verify this by always using insertion sort instead here:
https://github.com/radareorg/radare2/blob/master/libr/util/list.c#L573-L577

r12
dr3
mxcr_mask
Collaborator

random order :?

Contributor Author

Contributor Author

@anisse anisse Sep 3, 2020

Sorry, wrong links, this is the file:


"fpu mxcr_mask .32 28 0\n"

@XVilka XVilka marked this pull request as draft September 3, 2020 00:06
@ret2libc
Contributor

ret2libc commented Sep 3, 2020

@XVilka @trufae as said in previous comments, we think those "random" orders are just because they were never really sorted in the first place. If you look at those tests, the original order was not really better than the new one. They only appeared to be sorted, but there is no real sorting underneath. Those commands should probably sort the items themselves before printing them.

@trufae
Collaborator

trufae commented Sep 4, 2020

lgtm

@ret2libc
Contributor

ret2libc commented Sep 6, 2020

@anisse can you make the PR ready if it's ok to merge? Just to be sure you don't intend to add anything to this. Thanks again for this!

@anisse anisse marked this pull request as ready for review September 6, 2020 18:07
@ret2libc ret2libc dismissed radare’s stale review September 9, 2020 13:43

dismissing because trufae has accepted the changes.

@ret2libc ret2libc merged commit a4c76ff into radareorg:master Sep 9, 2020
@anisse anisse deleted the mergesort branch September 24, 2020 21:41
Labels: r2r Regression tests

5 participants