Normal core FPU emulation is incompatible with "Fast" Pentium memcpy trick #119
Comments
Reproduced your very interesting finding! PCem seems to handle the above conversion correctly in its "normal core"; I think the (abbreviated) excerpts below may be hints.
|
:) That's actually an interesting hack. But, it's a hack. I think a better approach would be to use the "long double" datatype where possible to operate with 80-bit extended precision. I know that's not possible with Microsoft C++, but Linux+GCC has it, and that's where I can start to improve FPU emulation. DOSBox-X is more concerned with emulation accuracy, rather than shortcuts at the expense of FPU precision. |
Just emulate floating-point in software. On Mon, Jan 4, 2016 at 7:22 PM, Jonathan Campbell notifications@github.com
|
Of course! But for x86 targets, we could keep the accuracy and gain some performance if the C++ compiler can map the "long double" type to the right FPU instruction at compile time in the way it does now for "double". |
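To make the "long double" idea concrete, here is a minimal sketch, assuming the host compiler maps `long double` to the x87 extended format (as GCC does on x86 Linux). The names `kLongDoubleHolds64Bits`, `roundtrip_via_long_double`, and `self_check` are hypothetical and do not come from the DOSBox-X source; the point is only that a 64-bit integer survives the load/store pair when the register type has a 64-bit significand.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True when the host long double has at least a 64-bit significand, as the
// x87 extended format does. False where long double is the same as double
// (for example with Microsoft C++).
constexpr bool kLongDoubleHolds64Bits =
    std::numeric_limits<long double>::digits >= 64;

// Hypothetical model of a FILD m64 / FISTP m64 pair that uses long double
// as the emulated register type.
inline std::int64_t roundtrip_via_long_double(std::int64_t v) {
    const long double st0 = static_cast<long double>(v);  // FILD
    // If the host type is too narrow, the value may have rounded up to 2^63;
    // an out-of-range store yields the x87 "integer indefinite" value.
    if (!(st0 >= -9223372036854775808.0L && st0 < 9223372036854775808.0L))
        return std::numeric_limits<std::int64_t>::min();
    return static_cast<std::int64_t>(st0);                // FISTP
}

// Usage: the round trip is only guaranteed to be exact on hosts where the
// compile-time check above holds.
inline void self_check() {
    const std::int64_t v = 0x0102030405060708LL;
    if (kLongDoubleHolds64Bits)
        assert(roundtrip_via_long_double(v) == v);
}
```

The compile-time check matters because the same source silently falls back to 53 bits of precision on compilers where `long double` is an alias for `double`.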
Here are further details on this issue: I compiled the source code from the above link, and the DOS program is available here: That should provide a simple test of the fild-fbstp operation. |
The fild/fistp issue is also reported here: And at their forum: And a comment from a user in that thread: That sounds somewhat similar to the PCem method of correcting the OP issue. However, wd favored the compiler approach for accuracy, presumably the same "long double" approach discussed above (full 80-bit, not just recovery of 64 bits for the integer operation). Perhaps the already proven PCem method is worthwhile as an intermediate solution until the more thorough "long double" method is implemented, given that the full solution is difficult to implement and to fully test, and its drawbacks are unknown.

Verified that PCem without this "fpu copy patch" does result in "black vertical bars on the screen" as in DOSBox-X:

Edit: confirmed that the above patch runs a parallel higher-precision (HP) stack, "ST_i64", at least for the integer operation, and consistently uses bitwise operations to encode which stack an FPU command should use. The other part is the pair of functions "x87_st_fsave" and "x87_ld_frstor", which are called to pack and unpack the HP value (retaining the integer portion) so bits are not lost as they are in the static_cast to a double.

The FILD command activates the HP stack by setting tag[TOP], while the FISTP command checks that same tag[TOP] value to identify which stack to use (FILD vs. others). I think FLD and FXCH operate on both stacks. The statement "tag[TOP] &= ~TAG_UINT64" turns off the 3rd bit of tag[TOP] so the normal stack is active (many arithmetic and trigonometric commands). The statement "tag[TOP] |= TAG_UINT64" turns on the 3rd bit so the HP stack is active (FILD command). Any tag[#] value with the 3rd bit set to 1 indicates the HP stack.

There are two issues here: the expansion of the FPU from 64 bits to the full 80 bits, and the problem of static_cast demoting a 64-bit integer to 53 bits (or thereabouts :) ). The "solution" to the second issue would fix the problem in the demo from OP and the other instances mentioned in the web links above, while the loss from 80 to 64 bits seems to cause fewer incompatibilities with software. If there were generally a loss of function from 80 to 64 bits, then wouldn't a Quake software renderer engine compiled without x86 code also exhibit graphical problems? However, I understand that the DOSBox-X goal is accuracy, not a subset of accuracy sufficient for running software. |
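The tag-bit scheme described in this comment can be sketched roughly as follows. This is not the PCem patch itself: the names `ST_i64`, `tag`, `TOP`, and `TAG_UINT64` are taken from the description above, but the struct `FpuSketch`, its methods, the helper `store_as_int64`, and the value `0x04` for `TAG_UINT64` (guessed from "3rd bit") are assumptions for illustration. Empty-register tags, FLD, and FXCH are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

constexpr std::uint8_t TAG_UINT64 = 0x04;  // assumed value: "3rd bit" of the tag

// Store a double as int64 using the current rounding mode; out-of-range or
// NaN inputs yield the x87 "integer indefinite" value.
inline std::int64_t store_as_int64(double d) {
    if (!(d >= -9223372036854775808.0 && d < 9223372036854775808.0))
        return std::numeric_limits<std::int64_t>::min();
    return static_cast<std::int64_t>(std::llrint(d));
}

struct FpuSketch {
    double       ST[8]{};      // normal stack, limited to 53 significant bits
    std::int64_t ST_i64[8]{};  // parallel stack holding the exact integer
    std::uint8_t tag[8]{};
    unsigned     TOP = 0;

    // FILD m64: push the value on both stacks and mark the exact copy as valid.
    void fild64(std::int64_t v) {
        TOP = (TOP - 1) & 7;
        ST[TOP] = static_cast<double>(v);
        ST_i64[TOP] = v;
        tag[TOP] |= TAG_UINT64;
    }

    // Any arithmetic on ST(0) invalidates the exact integer copy.
    void fadd_st0(double x) {
        ST[TOP] += x;
        tag[TOP] = static_cast<std::uint8_t>(tag[TOP] & ~TAG_UINT64);
    }

    // FISTP m64: use the exact copy if the tag bit is still set, then pop.
    std::int64_t fistp64() {
        assert(TOP < 8);
        const std::int64_t out =
            (tag[TOP] & TAG_UINT64) ? ST_i64[TOP] : store_as_int64(ST[TOP]);
        tag[TOP] = static_cast<std::uint8_t>(tag[TOP] & ~TAG_UINT64);
        TOP = (TOP + 1) & 7;
        return out;
    }
};
```

In this sketch a FILD immediately followed by FISTP returns the original 64 bits, while any intervening arithmetic falls back to the double-precision value, which matches the behaviour the comment attributes to the patch.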
Logged the 'Bit64s' FILD values generated during key games and demos. Here is the log statement at the end of function "FPU_FLD_I64" in fpu_instructions.h: From brief testing, the 'Bit64s' FILD values are:

Codeholio calculated that a 64-bit double-precision value can hold a mantissa of nearly 52 bits (my paraphrase). For these approximations, we can assume that the true FILD integer value starts to lose bits in the FPU emulation at around 2^51, since the emulated conversion to double cannot keep every bit. In decimal, 2^51 is 2251799813685248, or about 2.25 × 10^15; the exact limit for a double is 2^53, about 9 × 10^15, so ~10^16 serves as a rough threshold above which the FILD/FISTP command pair will show inaccuracies.

None of the above test games show values near 10^16 except for the Toontown demo, which exceeds it by a factor of about 10^3, so the last 3 or 4 digits of those integers are presumably lost in the non-x86 FPU emulation. This is consistent with examples from x86 source code, where FILD may operate on a double word (32-bit value) or on a quad word (64-bit): the latter is shown by Codeholio's example using the "Fast Pentium memcpy trick", whereas in the Quake source code FILD typically operates on double word values. |
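The threshold estimate can be checked directly. A double has a 53-bit significand, so every integer with magnitude up to 2^53 converts exactly, and larger values survive only if their dropped low bits happen to be zero. The sketch below is illustrative only; `kExactLimit`, `survives_double`, and `self_check` are hypothetical names, not DOSBox-X code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// 2^53 on IEEE 754 hosts: the largest power of two below which every
// integer is exactly representable as a double.
constexpr std::int64_t kExactLimit =
    std::int64_t{1} << std::numeric_limits<double>::digits;

// True if v comes back unchanged from an int64 -> double -> int64 round trip.
inline bool survives_double(std::int64_t v) {
    const double d = static_cast<double>(v);
    // Values near INT64_MAX round up to 2^63, which cannot be stored back.
    if (!(d >= -9223372036854775808.0 && d < 9223372036854775808.0))
        return false;
    return static_cast<std::int64_t>(d) == v;
}

// Usage: values just above 2^51 are still exact; the first loss is at 2^53 + 1.
inline void self_check() {
    assert(survives_double((std::int64_t{1} << 51) + 1));
    assert(survives_double(kExactLimit));
    assert(!survives_double(kExactLimit + 1));
}
```

A logging check like `survives_double` could be applied to each logged FILD value to see which programs actually load integers that the double-based emulation cannot hold.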
Ref: http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Website/articles/CUJ/1996/9612/durham1/durl1.htm
The normal core attempts to use the host floating point support to emulate Intel x87 instructions, but it does so by typecasting everything to "double", even 80-bit extended formats.
Unfortunately, this typecasting causes data corruption for any demo or game that uses FILD+FISTP (64-bit integer load and store) as a "faster" method of copying memory on Pentium-class (pre-MMX) hardware. On an actual processor, the 64-bit integer fits normally into the mantissa of the 80-bit FPU register without losing bits, while in DOSBox-X's normal core, the typecast to "double" loses bits and causes data corruption. In most demos, since the fast memcpy trick is used to copy from a system buffer into video memory, this corruption is visible as black vertical bars on the screen that sometimes shift or blend into the picture content depending on how the 64-bit int -> double conversion truncates.
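The corruption can be reproduced outside the emulator. The sketch below models the trick as an 8-byte copy that passes through a host `double`, the way the normal core is described as doing; `store_as_int64` and `copy8_via_double` are hypothetical names for illustration and are not functions from the DOSBox-X source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Model of FISTP m64 from a host double. Out-of-range or NaN values yield
// the x87 "integer indefinite" value instead of undefined behaviour.
inline std::int64_t store_as_int64(double st0) {
    if (!(st0 >= -9223372036854775808.0 && st0 < 9223372036854775808.0))
        return std::numeric_limits<std::int64_t>::min();
    return static_cast<std::int64_t>(st0);
}

// Copy 8 bytes the way the "fast" memcpy does, but with the register
// modelled as a double: FILD qword [src], then FISTP qword [dst].
inline void copy8_via_double(const unsigned char* src, unsigned char* dst) {
    assert(src != nullptr && dst != nullptr);
    std::int64_t v;
    std::memcpy(&v, src, sizeof v);
    const double st0 = static_cast<double>(v);  // keeps only 53 significant bits
    const std::int64_t out = store_as_int64(st0);
    std::memcpy(dst, &out, sizeof out);
}
```

Feeding this eight distinct non-zero bytes (for example pixel values `08 07 06 05 04 03 02 01`) produces a destination that differs from the source in its least significant bytes, whereas a pattern that fits in 53 bits is copied intact.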