Float metrics values overflow on ARM systems #117

bekriebel · 2018-04-13T18:46:42Z

When compiled for ARM, metrics values are overflowing when they are cast from float64 to int. This seems to be caused by ARM using int32 instead of int64 during the conversion. This results in high-value metrics getting set to either 2147483.647 or 0 (depending on which GOARM value is used) as soon as the value overflows int32.

The code causing this:

cloudprober/metrics/float.go

Lines 92 to 95 in 882d33f

    
           // Truncate the float to 3-digit precision. 
        
           // Example: 3.33333333333 -> 3.333 
        
           ff := float64(int(f.f*1000)) / 1000 
        
           return strconv.FormatFloat(ff, 'f', -1, 64)

This can be fixed in a few ways:

Don't truncate at all and just output the float as-is.
Specify int64 instead of int. However, this still has a chance of overflow for large values.
Instead of using the int cast and str.FormatFloat, use fmt.Sprintf("%.3f", f.f). This performs the truncate but does not remove the trailing zeros in the decimal in the same way FormatFloat currently does.
Use fmt.Sprintf, but also perform strings.TrimRight to remove trailing zeros and the decimal point where valid. This is functionally similar to the existing version.

I implemented the fourth option on an incoming pull request. Let me know if you'd prefer a different version of the fix.

manugarg · 2018-04-16T05:53:34Z

@bekriebel This is certainly a bug. Thanks for debugging this and the PR. Cloudprober's source of truth (SoT) is Google internal, which means that I'll have to import his pull request to the internal source and then export it back. Thankfully, this process preserves the original author.

Unfortunately, cloudprober was not setup properly for this kind of transformations (reverse). I was trying to sort that out over past few days. That's why delay in accepting the PR. I'll try to finish this process tomorrow.

bekriebel · 2018-04-16T19:27:24Z

No problem. I already have it running in my environment with this change and it is working well. This seems to be the only change necessary for the code to work well on the ARM platform.

Thanks for your work on it!

manugarg · 2018-04-16T23:55:18Z

@bekriebel Thanks. I wanted to ask if you'd mind if I get rid of the trimming completely (that is, we'll print trailing zeros) . I think it's making code a bit obscure and not adding as much value.

I can just do it at my end if that sounds good to you.

bekriebel · 2018-04-17T04:53:00Z

@manugarg Yeah, that was my first inclination as well but wasn't sure if it was done this way for a reason. What do you think about just returning fmt.Sprintf("%f", f.f)? Since this is most likely being ingested by a receiver that is accepting it as a float anyway, I don't see the need to truncate the string at all. If a receiver can't accept this, it could always be truncated in the specific surfacer.

bekriebel · 2018-04-17T05:23:16Z

I went ahead and pushed a new version that doesn't trim or truncate. Personally, I prefer having the extra precision. It can always be truncated on a UI side or in a surfacer if desired.

On ARM system, the int type is defaulting to the size of int32 and can easily be overrun. As discussed in #117, we can skip trimming the trailing zeros. It's not adding much value and making the code obscure. PiperOrigin-RevId: 193150323

manugarg · 2018-04-17T05:44:39Z

Importing this change back as #119. Please take a look.

Keeping precision of 3 decimal points as sometimes we get really long strings in the latency (.3333333.. kind of things). Also, using strconv instead of fmt.Sprintf as strconv is a bit faster.

On ARM system, the int type is defaulting to the size of int32 and can easily be overrun. As discussed in #117, we can skip trimming the trailing zeros. It's not adding much value and making the code obscure. PiperOrigin-RevId: 193150323

manugarg · 2018-04-17T06:56:15Z

Thanks again, Brint. Closing this issue now.

bekriebel mentioned this issue Apr 13, 2018

float: Don't cast a float value as an int #118

Closed

manugarg added the bug label Apr 16, 2018

manugarg mentioned this issue Apr 17, 2018

float: Don't cast a float value as an int #119

Merged

manugarg closed this as completed Apr 17, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Float metrics values overflow on ARM systems #117

Float metrics values overflow on ARM systems #117

bekriebel commented Apr 13, 2018

manugarg commented Apr 16, 2018

bekriebel commented Apr 16, 2018

manugarg commented Apr 16, 2018

bekriebel commented Apr 17, 2018

bekriebel commented Apr 17, 2018

manugarg commented Apr 17, 2018

manugarg commented Apr 17, 2018

Float metrics values overflow on ARM systems #117

Float metrics values overflow on ARM systems #117

Comments

bekriebel commented Apr 13, 2018

manugarg commented Apr 16, 2018

bekriebel commented Apr 16, 2018

manugarg commented Apr 16, 2018

bekriebel commented Apr 17, 2018

bekriebel commented Apr 17, 2018

manugarg commented Apr 17, 2018

manugarg commented Apr 17, 2018