Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tiny optim to avoid some spilling of floats in x87 #6924

Closed
vicuna opened this issue Jul 9, 2015 · 3 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link

commented Jul 9, 2015

Original bug ID: 6924
Reporter: @alainfrisch
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2017-02-16T14:15:02Z)
Resolution: fixed
Priority: low
Severity: tweak
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: back end (clambda to assembly)
Monitored by: @gasche jmeber

Bug description

The code generated for

let f (m:float array) i x = Array.set m i x

on x86 (32-bit) unboxes the float argument, spills it, does the checkbound, and loads the float back from the stack before putting it in the array:

	sub	esp, 8
L100:
	fld	REAL8 PTR [ecx]
	fstp	REAL8 PTR [esp]
	mov	ecx, DWORD PTR [eax-4]
	shr	ecx, 10
	cmp	ecx, ebx
	jbe	L101
	fld	REAL8 PTR [esp]
	fstp	REAL8 PTR [ebx*4+eax-4]
	mov	eax, 1
	add	esp, 8
	ret

I guess that the current strategy for the x87 fp stack is to use it only within a basic block, and it would probably be quite difficult to change that in general. Fragments of numerical code which have branches only because of check bounds could probably benefit a little bit from optimizing cases such as above. The spill above can be avoided by the attached patch, which doesn't bind the rhs of the assignment (i.e. the unboxing code) before the checkbound. The patch is quite restrictive, it should be ok to do so as long as the rhs cannot raise an exception.

The code generated after the patch is:

	sub	esp, 8
L100:
	mov	edx, DWORD PTR [eax-4]
	shr	edx, 10
	cmp	edx, ebx
	jbe	L101
	fld	REAL8 PTR [ecx]
	fstp	REAL8 PTR [ebx*4+eax-4]
	mov	eax, 1
	add	esp, 8
	ret

A very rough micro-benchmark on the code below:

let f a (x : float) =
  for i = 1 to 1000 do
    for j = 0 to Array.length a - 1 do
      a.(j) <- x;
      a.(j) <- x;
      a.(j) <- x;
      a.(j) <- x
    done
  done

let () =
  let a = Array.make 1024 0. in
  for i = 1 to 1000 do
    f a (float i)
  done

gives the following results:

  Before patch,  -unsafe mode:  1.8s
  Before patch,     safe mode:  3.2s
  After patch,   -unsafe mode:  1.8s
  After patch,      safe mode:  2.1s

which shows that most of the overhead for bound checks comes from the spilling overhead.

File attachments

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 9, 2015

Comment author: jmeber

I strongly advocate inclusion of this patch as it gives big speedups (on x86) of a very common and frequent operation in numeric code (in-place modification of float array with "simple" values) when you still want to benefit from bounds checking.
We noted this behavior during a more general initiative in our company to try to implement some numerical kernel routines "back" in OCaml (instead of having them as external C routines) but without loosing too much speed (and yes, we have to support x86 platform for now).

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 25, 2015

Comment author: @gasche

I merged the patch in trunk.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 27, 2015

Comment author: @alainfrisch

Thanks for merging. That said, the patch only covers a very specific case and the problem might be more general. I wonder whether we should keep the ticket open for this reason.

@vicuna vicuna closed this Feb 16, 2017

@vicuna vicuna added the back-end label Mar 14, 2019

@vicuna vicuna added this to the 4.03.0 milestone Mar 14, 2019

@vicuna vicuna added the bug label Mar 20, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.