Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Register d7 trampled by float_of_int #5731

Closed
vicuna opened this issue Aug 16, 2012 · 7 comments
Closed

Register d7 trampled by float_of_int #5731

vicuna opened this issue Aug 16, 2012 · 7 comments

Comments

@vicuna
Copy link

@vicuna vicuna commented Aug 16, 2012

Original bug ID: 5731
Reporter: jeffsco
Assigned to: meurer
Status: closed (set by @xavierleroy on 2015-12-11T18:19:43Z)
Resolution: fixed
Priority: normal
Severity: major
Platform: ARM
OS: Debian
OS Version: armel 2.6.32-5
Version: 4.00.0
Fixed in version: 4.00.1+dev
Category: back end (clambda to assembly)
Monitored by: cookedm

Bug description

Code generation error. A value is loaded into d7. A call to float_of_int generates code that loads s14, which overwrites the value in d7 (low order mantissa). The value in d7 is then used in a later computation, but the value has been destroyed.

If I make small changes to the code, the load of d7 happens after the float_of_int computation, and things work OK. Possibly some constraints on instruction orderings aren't being applied properly.

Steps to reproduce

Need a version of ocamlopt with support for armv7 with hardware floating point (VFPv3).

Then compile the following code:

let rate_pos scounts : float =
let m_MIN = -999.0
in let max1s = Array.make 14 m_MIN
in let max2s = Array.make_matrix 14 14 m_MIN
in let try_build (k1: int) (m: float) : unit =
let denom = 12
in let try1b (sawk1, xct) k =
let () =
if max2s.(k1).(k) > m then
let adjm = if m <= m_MIN then 0.0 else m
in let numer =
if k = k1 then 48
else if sawk1 then 36
else 24
in let f = float_of_int numer /. float_of_int denom
in let () =
if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0
in
max1s.(k1) <-
max1s.(k1) +. (max2s.(k1).(k) -. adjm) *. f
in
if k = k1 then
(true, xct)
else
(sawk1, xct + scounts.(k))
in
ignore (List.fold_left try1b (false, 0) [])
in let () = Array.iteri try_build max1s
in
0.0

My compile line looks like this:

$ ocamlopt -ffpu vfpv3 -c -S rate.ml

I'm attaching the full assembly code output. The section with the error contains the code for the following lines:

in let f = float_of_int numer /. float_of_int denom
in let () =
if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0

The erroneous code looks like this (with added annotations):

   ldr     r12, [r2, #16]    @ r12 <- m_MIN block
   mov     r0, r7, asr #1    
   ldr     r7, [r2, #20]
   movs    r6, #0xc          @ r6 <- denom
   fmsr    s14, r6           
   fsitod  d10, s14          @ d10 <- float_of_int denom
   ldr     r6, [r7, #-4]
   fldd    d7, [r12, #0]     @ d7 <- m_MIN
   ldr     r12, [r2, #28]
   fmsr    s14, r0           @ *** d7 is destroyed here ***
   fsitod  d9, s14           @ d9 <- float_of_int numer
   cmp     r12, r6, lsr #10
   bcs     .L111
   add     r6, r7, r12, lsl #2
   fldd    d6, [r6, #-4]     @ d6 <- max1s.(k1)
   fdivd   d8, d9, d10
   fcmpd   d6, d7            @ *** This comparison fails ***
   fmstat
   bhi     .L104

Additional information

I ran my tests on a stock OCaml 4.00.0 compiler in an emulated Linux ARMEL environment under QEMU on OS X. I don't think this affects the results; I see exactly the same instruction sequence in my experimental cross-compiler builds (compiling for iOS under OS X).

Here is my configure line:

$ ./configure --host armv5tejl-unknown-linux-gnueabihf

(Need to use gnueabihf to get VFPv3 support.)

Then build ocamlopt as usual.

File attachments

@vicuna
Copy link
Author

@vicuna vicuna commented Aug 17, 2012

Comment author: meurer

Jep, the issue is in the scheduling code, that doesn't care about the destroyed_at_oper constraint. There are several possible fixes for this. Either we adjust schedgen to respect destroyed_at_oper, or don't use scheduling at all on arm (just like amd64/i386), or we adjust the arm code generation to avoid the hidden s14 usage for inttofloat, floattoint and single Load/Store.

Xavier, any opinions on this?

@vicuna
Copy link
Author

@vicuna vicuna commented Aug 20, 2012

Comment author: @xavierleroy

Tentative fix to the scheduler committed in SVN trunk, revision 12865. Needs review and testing before inclusion in 4.00 bugfix branch.

@vicuna
Copy link
Author

@vicuna vicuna commented Aug 21, 2012

Comment author: meurer

Jeff, can you test the fix with your iOS backend?

@vicuna
Copy link
Author

@vicuna vicuna commented Aug 21, 2012

Comment author: jeffsco

Yes, I'll test first thing tomorrow (Tuesday morning Seattle time).

@vicuna
Copy link
Author

@vicuna vicuna commented Aug 22, 2012

Comment author: jeffsco

I installed the new schedgen.ml into my modified version of 4.00.0+beta2 and rebuilt the app that was failing before. By eye, the spot that was broken before is fixed (d7 is loaded after the float_of_int calculation). I ran the app, and it works perfectly on the device. I also built and ran our other app (another card game), and it also ran perfectly. The whittled-down example above also looks perfect by eye (as I would expect). So, things are looking excellent to me.

One limitation of this test is that I installed Xavier's scheduling change into my worktree. If you like I can instead apply my iOS patches to the INRIA trunk and try that instead. The results will almost certainly be the same, but it feels slightly more rigorous (though a little more trouble for me to set up). Let me know if you'd like me to run this test (or any other test).

(Also let me know if I'm supposed to click a button somewhere in Mantis to say I've given feedback! I'm hoping these notes are sufficient.)

@vicuna
Copy link
Author

@vicuna vicuna commented Aug 23, 2012

Comment author: meurer

Ok, so this is fixed in trunk then. Xavier, do you want to merge that into the 4.00 branch?

Concerning the iOS merge, I already started work on this here:

https://github.com/bmeurer/ocaml-arm/tree/bm/ios

You may want to clone that and contribute the missing bits and pieces. I'll merge everything into upstream once working and time permits.

@vicuna
Copy link
Author

@vicuna vicuna commented Aug 24, 2012

Comment author: @xavierleroy

Fix to schedgen.ml commited in 4.00 bugfix branch, r12876. Will be released with 4.00.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant