Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFE_PSP_GetTime Loses Resolution in Conversion to Microseconds (GSFC DCR 14952) #76

Closed
skliper opened this issue Sep 25, 2019 · 17 comments · Fixed by #147
Closed

CFE_PSP_GetTime Loses Resolution in Conversion to Microseconds (GSFC DCR 14952) #76

skliper opened this issue Sep 25, 2019 · 17 comments · Fixed by #147
Labels
bug Something isn't working
Milestone

Comments

@skliper
Copy link
Contributor

skliper commented Sep 25, 2019

There's a 'bug' in the implementation of the mcp750/vxworks PSP which causes the loss of timing resolution in the function CFE_PSP_GetTime(). for example:

{{{
void CFE_PSP_GetTime( OS_time_t *LocalTime)
{   
uint32 DecCount;   

/* Reads the time from the hardware register, then converts it     
* into usable seconds and microseconds */   

sysPciRead32(0xFC0011C0, (UINT32 )(&DecCount));   
DecCount = DecCount & 0x7FFFFFFF;   
DecCount = ((uint32)  0x0D6937E5) - DecCount;   
LocalTime->seconds = DecCount / 8333311;   
DecCount = DecCount % 8333311;   
LocalTime->microsecs = (DecCount/8333) * 1000;
}/
end CFE_PSP_GetLocalTime */
}}}

In this case, (DecCount/8333) * 1000 is performed as an integer calculation (as DecCount and LocalTime->microsecs are integers), basically reducing resolution from microseconds (which this function can calculated from fractions of microseconds) to miliseconds.

NOTE: GSFC's rad750 version has a similar calculation ((DecCount / 2062) * 1000)

Should this be changed to something like (for mcp750):   

{{{
LocalTime->microsecs = (uint32) (( ((double)DecCount) /8333.0) * 1000.0);
}}}

@skliper skliper self-assigned this Sep 25, 2019
@skliper
Copy link
Contributor Author

skliper commented Sep 25, 2019

Imported from trac issue 72. Created by sstrege on 2016-08-08T16:39:59, last modified: 2019-08-14T14:13:14

@skliper skliper added the bug Something isn't working label Sep 25, 2019
@skliper skliper removed their assignment Sep 26, 2019
@skliper
Copy link
Contributor Author

skliper commented Oct 2, 2019

@avan989 This would be a good one to look into if the issue still exists.

If it does I'd suggest:

LocalTime->microsecs = DecCount/8.333;

for simplest solution, and check if there's a test case to confirm this works as expected. Maybe even run on the target and confirm.

@jphickey
Copy link
Contributor

I am not familiar with the MCP750 hardware timer register that is being read here, but I would strongly discourage using any floating-point arithmetic in the calculation. This will almost certainly result in reduced performance.

Note that suggested literal numbers like "8.333" or even "8333.0" or "1000.0" are of type double. This will cause all the integer values to be promoted to doubles, and value to be computed as a double, and then truncated back to an int in the final assignment.

The complication of the existing calculation is almost certainly done with the intent of keeping the calculation entirely within the integer domain. Integer ops are usually an order of magnitude faster.

@skliper
Copy link
Contributor Author

skliper commented Oct 28, 2019

Good performance point. Really the two lines of concern are:

DecCount = DecCount % 8333311;
LocalTime->microsecs = (DecCount/8333) * 1000;

How about:

LocalTime->microsecs=(DecCount*1000000L)/8333311;

So it promotes to long and multiplies (never overflows because DecCount is always <8333311), does the full precision integer divide, then back into the uint32?

@jphickey
Copy link
Contributor

But the intermediate value (DecCount*1000000L) may overflow....

It also looks like the original code has another problem where if the DecCount ends up being within the range of [8333000, 8333310], then the division will result in 1000, which will produce a microsecs value of 1000000, which is invalid, as it is supposed to be in range of [0,999999]. So I concur that this code definitely needs a fix.

My recommendation is to do the multiply before the divide, but limit your multiplier to avoid uint32 overrun. So long as DecCount is guaranteed to be <8333311, you should be able to multiply by up to 512 (or do a 9 bit shift) first. (8333311 * 512) < 2^32.

So either:

LocalTime->microsecs = (DecCount << 9) / 4267;

OR:

LocalTime->microsecs = (DecCount * 500) / 4167;

Neither are perfect because 83333311 doesn't have any common factors with 1000000. But both of these approaches will get you a much better microseconds approximation than the existing code does. The left shift might be slightly faster.

@jphickey
Copy link
Contributor

Edit, just thought why not (3/25). This is 1/8.333333 and that is probably as close of an approximation as one can get with integer ratios, even better than my previous suggestions.

So, LocalTime->microsecs = (DecCount * 3) / 25;

Simple and probably the most accurate approximation of all.

@skliper
Copy link
Contributor Author

skliper commented Oct 28, 2019

But the intermediate value (DecCount*1000000L) may overflow....

Whoops, I meant LL (or ULL). Linux/gcc seems to promote it to 64 bit either way (I tested before I commented), but I haven't had a chance to figure out if that behavior is in the standard... either way probably more obvious as ULL.

So LocalTime->microsecs = (DecCount * 1000000ULL) / 8333311;

I'd prefer the exact conversion, vs 3/25 approximation, since they do give different answers. Biggest error is at max value to rollover your approximation would skip from 999997 to 0, where as ULL solution smoothly goes from 999999 to 0.

@jphickey
Copy link
Contributor

Although C99 requires long long of [at least] 64 bits, the 64-bit ops are not necessarily implemented in hardware. On a 32 bit processor the 64 bit operation it is likely to be emulated in software. At that point the floating-point solution might be more performant.

All this is just speculation though - you might want to write a test program and benchmark all three. But I do think if you do, you'll find that the 3/25 ratio is the best compromise of speed and accuracy.

Note that in my experience control loops do often call some form of polling of "gettime" from high priority threads. So its worth making it as efficient/lightweight as possible.

@skliper
Copy link
Contributor Author

skliper commented Oct 28, 2019

Lacking context, how do you suggest we evaluate accuracy vs speed?

@jphickey
Copy link
Contributor

Just had one more thought on this. How about:

LocalTime->microsecs = ((300 * DecCount) + (DecCount / 1244)) / 2500;

This is an effective conversion ratio of ((300 + (1/1244) / 2500), or approx 1/8.33331100399, so its almost exact, with probably less performance penalty than either of the other options, as it f just an extra integer division. It should roll over smoothly with no "skipped" values.

@jphickey
Copy link
Contributor

You could test for speed by simply writing a small C program that implements all three of the conversion styles (32 bit int, 64 bit int, double floating point) and just run it in a tight loop for a few seconds and count iterations. You need to run it on the actual target hardware, of course.

@skliper
Copy link
Contributor Author

skliper commented Oct 28, 2019

I wasn't questioning how to test the speed of the solutions, but how do you suggest trading accuracy vs speed (3/25 is faster by x, but at the "cost" of up to 2 additional usec of error).

Note (300*DecCount) overflows on Linux w/ gcc withough declaring 300 as unsigned. So:

LocalTime->microsecs = ((300u * DecCount) + (DecCount / 1244)) / 2500;

Gives the exact same results as the ULL solution over the full range, on Linux. Definitely worth a speed check on all three. Since it's fun to guess, I bet this solution takes ~2x as long as 3/25, and is within 10% of ULL.

@jphickey
Copy link
Contributor

I think for the tradeoff, it is just a subjective value decision at this point whether 2-3us of possible error is worth the extra CPU cycles that a more arithmetically-correct implementation produces. Obviously existing code was running OK with up to 1000us (or more!) of error in the existing implementation. To put it in context, most things in CFE apps don't get scheduled at a frequency higher than 100Hz or so anyway, usually more like 1Hz or less.

It does speak to the need for PSP requirements though. Otherwise one cannot say whether CFE_PSP_GetTime is behaving within requirements or not here, or be able to quantify if a slight loss of precision for a faster implementation is "worth it" or not. It is a judgement call for the platform's users to make.

This is also an area that IMO could use clean up because the MPC750 PSP passes a somewhat hw-specific timer up the stack that "rolls over" at weird intervals and there is a lot of corresponding complexity in both CFE_TIME and the PSP API to deal with that rollover, which is really specific to this PSP. But that is a separate issue, a discussion to be had when reviewing PSP requirements. I would definitely advocate for simplification here.

Pedantic note: you really shouldn't need the "u" suffix on literals in the code, because the DecCount is already a uint32, not an int32. So the literal will be promoted to unsigned to match before the multiplication occurs, and the result will be unsigned. If you tried this, my guess is your test variable in the test implementation might be signed? Usually I do not like to add the suffixes, they can have unintended consequences in cross platform environments where the C-standard type promotion rules usually work fine.

@skliper
Copy link
Contributor Author

skliper commented Oct 29, 2019

it is just a subjective value decision at this point

Definitely agree, which is why I lean towards just picking a solution, and documenting the other options in a comment. I'd prefer to avoid implying there is one right answer for all use cases, rather pick one present the trade and move on rather than spend too much time trying to pick the "right" one since it depends on context.

Obviously existing code was running OK with up to 1000us (or more!) of error in the existing implementation.

User submitted a bug report and a fix. I'd bet they "cloned and owned" the fix for their use rather than living with 1000us of error. I don't think your assumption here that 1000us of error was "OK".

It does speak to the need for PSP requirements though. Otherwise one cannot say whether CFE_PSP_GetTime is behaving within requirements or not here, or be able to quantify if a slight loss of precision for a faster implementation is "worth it" or not. It is a judgement call for the platform's users to make.

I suggest we take an approach similar to OSAL, where the API and behavior gets documented. Basically this is what the code is expected to do and why, but allow projects the freedom to make the judgement calls (avoid expressing "requirements" that aren't really "required" without additional project context, like a timer shall be accurate to Xus or similar). The PSP's we provide as part of the open source cFS Framework are example implementations, so I'd prefer to scope our effort to describing the choices made vs trying to generate requirements that work for all use cases.

This is also an area that IMO could use clean up...

Agreed, but I'd consider it pretty low priority (at least from GSFC point of view) since no future projects have targeted this PSP for use. I certainly encourage a ticket to document clean-up opportunities/suggestions, but I'd consider it a low priority enhancement (unless funded I likely won't focus resources on it).

Pedantic note: you really shouldn't need the "u" suffix on literals in the code, because the DecCount is already a uint32, not an int32.

Agreed! I found my bug. I was defining DecCount as uint32_t, but then I used an iterator to check accuracy and defined it as int. Good catch.

@jphickey
Copy link
Contributor

This is also an area that IMO could use clean up...

Agreed, but I'd consider it pretty low priority (at least from GSFC point of view) since no future projects have targeted this PSP for use. I certainly encourage a ticket to document clean-up opportunities/suggestions, but I'd consider it a low priority enhancement (unless funded I likely won't focus resources on it).

The only reason I bring it up is because the issue is not confined to this particular PSP. If only this PSP had to deal with it, then things would be fine. But its an area where the abstraction is lacking, which in turn increases complexity in code on top of it (specifically CFE_TIME which uses this API) which IMO is the real issue.

The CFE_TIME has all sorts of complexity to handle the frequent wraparound because the PSP isn't actually handling it, its just passing basically a "raw" value up the stack. There are other related PSP calls - see CFE_PSP_Get_Dec, CFE_PSP_Get_Timebase, CFE_PSP_GetTimerLow32Rollover, etc. that are related to the insufficient abstraction, but I couldn't tell you what they are for or how one is supposed to implement them on other platforms.

The CFE_TIME has way too much complexity and there are a number of bugs against this. Fixing this should entail simplifying and bringing some sanity/consistency to the PSP time access functions, which would enable much of the complex wraparound logic to be removed from CFE_TIME, which really exists there only for this platform - other platforms don't have this wraparound issue.

Sorry if this is a diversion from the original ticket, this is a longer term fix but it should be on the to-do list somewhere because it will greatly improve CFE_TIME and other apps that need timing info.

@skliper
Copy link
Contributor Author

skliper commented Oct 29, 2019

... it should be on the to-do list somewhere because it will greatly improve CFE_TIME and other apps that need timing info.

Definitely agree. The following tickets are in at the cFE level under @acudmore review, addressing these and your comments above from an architectural design level is definitely on the to-do list. A review of abstraction across the entire cFS for time makes sense to me. That said, I don't have identified resources to focus on it this FY.

nasa/cFE#123
nasa/cFE#109
nasa/cFE#78

@skliper
Copy link
Contributor Author

skliper commented Oct 29, 2019

I did push a branch, but didn't do any associated testing. Really not all that useful without doing all the tests, but at least an example of what could be done.

astrogeco added a commit that referenced this issue Apr 21, 2020
Fix #76, Microsec conversion lost resolution
@skliper skliper added this to the 1.5.0 milestone Jun 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants