
Variable BFI #11342

Merged
merged 2 commits into libretro:master on Sep 19, 2020

Conversation

Ophidon
Contributor

@Ophidon Ophidon commented Sep 18, 2020

BFI support added for 180hz / 240hz / etc. Solves the image-retention issue from voltage issues at 120hz BFI. Also disables BFI while in the menu, since a value set incorrectly for the current refresh rate could cause severe flickering and difficulty reverting to the correct value.

Guidelines

  1. Rebase before opening a pull request
  2. If you are sending several unrelated fixes or features, use a branch and a separate pull request for each
  3. If possible try squashing everything in a single commit. This is particularly beneficial in the case of feature merges since it allows easy bisecting when a problem arises

Description

Changes BFI from an on/off value to an unsigned integer to support a variable number of black frames inserted between each real frame, for the rendering engines that already supported the existing 120hz BFI. 1 is for 120hz, 2 for 180hz, 3 for 240hz, etc. I have also turned off BFI while in the menu, so that if the value is set incorrectly (or you change refresh rates after setting it), you won't experience severe flickering while trying to revert the value.
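
A conceptual sketch of what the new unsigned setting means (present_frame() is a hypothetical stand-in for the video driver's output call; this is not the actual PR diff):

/* Conceptual sketch only, not the actual PR code. */
static void run_one_emulator_frame(unsigned bfi, void (*present_frame)(float))
{
   /* bfi = number of black frames inserted after each real frame.
    * A 60fps core therefore needs a (bfi + 1) * 60 Hz display:
    * 1 -> 120hz, 2 -> 180hz, 3 -> 240hz, etc. */
   unsigned k;
   present_frame(1.0f);        /* the real emulator frame */
   for (k = 0; k < bfi; k++)
      present_frame(0.0f);     /* inserted black frames */
}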

Related Issues

#10754 [Feature Request] (BFIv2) BFI is more CRT-like at higher Hz. Need 60Hz BFI for 180Hz, 240Hz, 300Hz, 360Hz
This is not a -full- implementation of the request, as more customizable options for the BFI pattern are not implemented, but I believe by far the most common use will be on/off/off at 180hz and on/off/off/off at 240hz, which is covered by this pull request. Further customization may come in a later update.

Related Pull Requests

None

Reviewers

Unknown who should review.

@inactive123
Contributor

inactive123 commented Sep 19, 2020

Hi, the PR overall is good, but I see some things that need addressing before we can merge this:

  • For-loop initial declarations have to go; that's not C89 compliant. All variables have to be declared at either the beginning of a code block or the beginning of the function.
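
For illustration, the C89-compliant form looks like this (a generic example, not the PR's actual code):

static int sum(const int *values, int n)
{
   int i;       /* C89: declared at the top of the block */
   int total = 0;
   for (i = 0; i < n; i++)   /* instead of: for (int i = 0; ...) */
      total += values[i];
   return total;
}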

For loop iterators updated for C89 compliance.
@Ophidon
Contributor Author

Ophidon commented Sep 19, 2020

For loop iterator declarations have been updated.

@inactive123 inactive123 merged commit 7b600d4 into libretro:master Sep 19, 2020
@inactive123
Contributor

Hi there @Ophidon ,

can you double check whether this PR possibly broke fast-forwarding in windowed mode on Windows? I seem to no longer be able to use it, and I was wondering if this PR could be related to that.

@Ophidon
Contributor Author

Ophidon commented Sep 21, 2020

I just checked and windowed FF is working for me. BFI is supposed to (and correctly is for me) shut down during fast forward.

BFI itself, however, does not work without extreme flickering in windowed mode, even when set to the correct multiple of 60hz. It doesn't work correctly in the original 120hz version for me either, though. I presume that's DWM hijacking the frame timings.

@Ophidon
Contributor Author

Ophidon commented Sep 21, 2020

To clarify, I haven't rebuilt with any of the newer merges since this was done; I will do that this evening and see if FF fails for me then as well.

Also, I was already planning an update to further address customization of the BFI pattern from the BFI v2 issue, but I can also force-disable BFI and hide it in the menu in situations where it is known to work incorrectly (so far, Win10 windowed mode, or with G-SYNC timing on).

@mdrejhon

mdrejhon commented Sep 21, 2020

Congratulations on the initial BFIv2 source code commit!

This makes RetroArch 180Hz BFI and 240Hz BFI capable now!

This really looks great, superior to 120Hz BFI as expected. And also burnin-proof.

Very good for fast-scrolling arcade games, including platformers and other things that are often motion-blurred on LCDs.

Still Needs Custom Configurability

However, the comma-separated string feature is still missing, so this item can't be closed until BFIv2 is considered feature-complete. Internally, the string would become a float array, where each value is the alpha between the visible image and a black frame (0.5 representing a half-dimmed frame).

  1. The comma-separated BFI string could initially be a configuration-file-only feature, basically of interest only for experimentation. Wouldn't this be a quick change? Doesn't RetroArch support undocumented options? (If I am wrong, then yeah, it would be harder.)

  2. Use padding (zeros) for BFI-sequence float arrays / comma strings that are too short. "1,0" for 120Hz would mean "1,0,0,0" for 240Hz.

  3. Use truncation for BFI-sequence float arrays / comma strings that are too long. "1, 0.5, 0.25, 0" for 240Hz becomes "1, 0.5" for 120Hz.

  4. Standard settings can be in menus, while a custom string would be loaded from the configuration file.

Settings configurability considerations

Since comma-separated strings can't be done in the settings menus, you can simply load a string from the configuration file and parse it into a float array. Internally, RetroArch can synthetically generate its own BFI-sequence float array (a stand-in for the comma-separated string).
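
A minimal sketch of that parse step (hypothetical function name, not existing RetroArch code), covering the pad/truncate rules from the numbered list above:

#include <stdlib.h>
#include <string.h>

/* Parse a comma-separated BFI string such as "1,0.5,0.25,0" into 'out',
 * truncating or zero-padding so the result has exactly 'frames' entries
 * (the number of refresh cycles per emulator frame). */
static void bfi_parse_sequence(const char *str, float *out, unsigned frames)
{
   unsigned i    = 0;
   const char *p = str;

   while (i < frames && p && *p)
   {
      out[i++] = (float)strtod(p, NULL);
      p        = strchr(p, ',');
      if (p)
         p++;                /* skip past the comma */
   }
   while (i < frames)
      out[i++] = 0.0f;       /* pad short strings with black frames */
}

So "1,0,0,0,0,0" parsed with frames=3 (180Hz) yields {1, 0, 0}, and "1,0" parsed with frames=4 (240Hz) yields {1, 0, 0, 0}.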

Easiest is probably to use a list of named profiles (predefined comma strings for common BFI modes), like:

Example named profiles for hardcoded BFI strings built into RetroArch:

  One Visible Frame (120Hz+)      "1,0,0,0,0,0"
  Two Visible Frames (180Hz+)     "1,1,0,0,0,0"
  Three Visible Frames (240Hz+)   "1,1,1,0,0,0"
  Phosphor Decay Simulation       "1,0.5,0.25,0.10,0,0"
  Custom                          (load string from config file)

One of the settings being "Custom" to load a hand-edited string from configuration file.

If you want to give users more menu configurability, you can use things like "BFI Full Brightness Frame Count: Number" and "BFI Decay Frame Count: Number" (phosphor decay simulation: fade frames to black over consecutive refresh cycles). And use a "Custom" setting to optionally load a string from the configuration file.

A custom BFI string feature is needed to enable advanced-user configuration (people like me) to discover improved BFI sequences, burnin-proof BFI sequences for specific monitors, more eye-friendly BFI sequences, and sequences that are brighter without sacrificing too much motion blur reduction, etc.

That way, discovered good strings can later be added in future commits as extra named profiles for easy user selection. Since custom BFI strings are essentially a trailblazer feature, it is acceptable for that to be a hand-edited line in a configuration file. Not all advanced users can compile RetroArch; most of the Blur Busters audience can't. So to close BFIv2, access to an optional custom string is a required part of the feature request, for incubation's sake.

Any of the above methods would qualify for closing this BFIv2 github feature request item.

@mdrejhon

mdrejhon commented Sep 21, 2020

It's Easy To Make BFI work with G-SYNC

Also, I was already planning an update to further address customization of the BFI pattern from the BFI v2 issue, but I can also force-disable BFI and hide it in the menu in situations where it is known to work incorrectly (so far, Win10 windowed mode, or with G-SYNC timing on).

You don't have to force-disable BFI during G-SYNC. You just have to correctly pace your Present()'s to make G-SYNC simulate a fixed-Hz monitor of a specified refresh rate.

What many developers don't realize is that software has universal control over the refresh rate of a G-SYNC monitor. The monitor dynamically refreshes when the software presents a frame. As long as the intervals between frame-presentation API calls -- DirectX Present() or OpenGL SwapBuffers() -- are within the VRR range, the software is controlling the refresh rate of the VRR display.

So voila -- to emulate a fixed-Hz monitor via G-SYNC -- simply pace your frame presentations, and you've got your software-driven fixed-Hz refresh rate!

image

(Yes -- even today. A display refreshes from top to bottom (raster), whether it's a 1930s analog TV or a 2020s DisplayPort screen, as seen in the high-speed videos of display refresh cycles at www.blurbusters.com/scanout. Raster refreshing has been top-to-bottom for almost a century.)

Now, to make a VRR display mimic a fixed-Hz display, you want to accurately time the starts of refresh cycles. But, guess what? The software controls refresh cycle timing on FreeSync and G-SYNC screens!

By using a fixed microsecond-precise timer (via a thread doing a busywait) on frame presentation, you can simulate a fixed-Hz display on top of VSYNC OFF or a variable refresh rate display. You can detect unsynchronized behavior via consistently non-blocking frame presentation that doesn't conform to a VSYNC interval. RetroArch already supports G-SYNC, so just piggyback on its existing G-SYNC metronome.

Note that G-SYNC + VSYNC ON will start blocking if you try to do a waitable-swapchain Present() with a frametime less than the refreshtime, since a display busy in scanout will block the next frame-present call.

Now, G-SYNC + VSYNC OFF will never block, so you will have to rely fully on a timer-based approach and determine the refresh rate of the display (this can be retrieved by querying the operating system) to use in your BFI calculations.

A Simple Universal BFI Timing Algorithm For VSYNC ON/OFF/VRR

  1. Detect display Hz, or preferred overdrive Hz that is within VRR range (user-specified "180")
  2. Divide display Hz by emulator Hz
  3. Use the BFI sequence that matches (truncate or pad as needed)
  4. Wait (event and/or busywait) until one display refresh period after the timestamp of the last present
  5. Get timestamp + Present.
  6. Repeat steps 4-5 for the BFI sequence of the current emulator refresh.
  7. Repeat steps 4-6 for the next emulator refresh

If it's VSYNC ON, step 5 will be blocking anyway, and step 4 will have no busywait delay.
If it's VSYNC OFF, step 5 won't block, so step 4 will do the blocking behaviour
If it's G-SYNC / FreeSync (default VSYNC ON), step 5 sometimes will block and sometimes won't, step 4 compensates

Algorithms can interrupt step #6 as needed.

I think RetroArch already has a mechanism for step 4 because RetroArch supports G-SYNC. So just piggyback off its existing technique? I am not sure whether the existing G-SYNC framepacing uses a timer or a busywaiter (the latter is superior, though you can use a timer until about 0.5ms-1ms prior, then busywait the rest of the way), but clearly RetroArch already has it.

This universal algorithm should work with both G-SYNC and non-G-SYNC operation, VSYNC ON and VSYNC OFF. There will be a stationary or slowly-scrolling tearline during BFI with VSYNC OFF, but then it's easy to say "whoops, I forgot to use VSYNC ON". It will work normally with VSYNC ON, FreeSync, Fast Sync, G-SYNC, Enhanced Sync, etc; it is most reliable with VSYNC ON, but it remains a good user experience either way.

Busywaiting is recommended in this particular case because sub-millisecond divergences in BFI are human-visible flicker. 400 microseconds extra in a 4ms flash is a 10% brightness difference, which can look like a slight candlelight flicker if the length variance is erratic/random. Since monitor refresh cycles are driven by software with G-SYNC and FreeSync, software timing variances in Present() time create an error margin. So you may want to use a timer until about 0.5ms before the event, then busywait (loop) on the CPU to the correct microsecond. Most timer events are not accurate to the microsecond.

If you don't have a G-SYNC monitor, then test the algorithm using VSYNC OFF. Optimize it until BFI works correctly during VSYNC OFF (except for a near-stationary tearline effect). Once you do that, it'll automagically work correctly for G-SYNC and FreeSync (without tearing).
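
A minimal sketch of the hybrid timer-plus-busywait pacing described above, assuming Windows and QueryPerformanceCounter (illustrative only; function names are hypothetical, not existing RetroArch code, and the Sleep(1) step assumes 1ms timer resolution):

#include <windows.h>

/* Wait until 'deadline' (QueryPerformanceCounter ticks): coarse Sleep()
 * until roughly 0.5ms before the deadline, then busywait the rest of
 * the way. */
static void wait_until_qpc(LONGLONG deadline, LONGLONG qpc_per_sec)
{
   LARGE_INTEGER now;
   LONGLONG margin = qpc_per_sec / 2000;   /* ~0.5 ms */

   for (;;)
   {
      QueryPerformanceCounter(&now);
      if (now.QuadPart >= deadline)
         return;
      if (deadline - now.QuadPart > margin)
         Sleep(1);                         /* coarse timer portion */
      /* else: spin until the deadline */
   }
}

/* Steps 4-6 of the algorithm: pace one emulator frame's BFI sequence at
 * 'display_hz'. present_frame(alpha) is a hypothetical stand-in for the
 * real present call (alpha = brightness of that refresh cycle). */
static void present_bfi_sequence(const float *seq, unsigned frames,
                                 double display_hz, LONGLONG *deadline,
                                 LONGLONG qpc_per_sec,
                                 void (*present_frame)(float alpha))
{
   unsigned i;
   LONGLONG period = (LONGLONG)((double)qpc_per_sec / display_hz);

   for (i = 0; i < frames; i++)
   {
      wait_until_qpc(*deadline, qpc_per_sec);   /* step 4 */
      present_frame(seq[i]);                    /* step 5 */
      *deadline += period;   /* pace from the schedule, not from 'now' */
   }
}

Advancing the deadline from the schedule rather than from the current time keeps long-term drift out of the flicker, which matters when the Present() calls themselves are driving a VRR display's refresh timing.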

It Makes 180 Hz BFI MUCH Easier For 240 Hz Users

You can also override the G-SYNC refresh rate via a refresh rate option (settings screen) or cap option.

A 240Hz monitor can easily emulate a 180Hz monitor simply via 1/180sec framepacing of Present(), and you can have it at exactly 3x the RetroArch refresh. Your 54Hz MAME emulator, your 50Hz PAL emulator, and your 60Hz NTSC emulator are all automatically BFI-accommodated with no settings fiddling, no ToastyX needed, no NVIDIA Control Panel needed, no custom Hz needed! No need for a custom 180 Hz refresh rate, because 240 Hz G-SYNC can already emulate 180 Hz via 180 fps quite accurately and easily. 150Hz for PAL and 180Hz for NTSC are within the VRR ranges of all known 240Hz VRR monitors.

Theoretically, you can use variable BFI that isn't integer-divisible, like a visible frame of 1.73851/240ths of a second with the remainder black, giving you analog BFI ratios as long as the frametimes of all visible and black frames are within the refresh rate range.

However, for simplicity, I'd suggest fixed-Hz simulation with a universal sync-agnostic algorithm, that way it automagically works properly in more situations.

Extra Reading

I helped a game developer add VRR support to their Unity engine game (Cloudpunk), and they credit me in their release notes with many rave user reviews in the comments section.

Long-term: Separate Frame Presenter Thread

Not mandatory to close BFIv2, but might make G-SYNC BFI support programmatically easier

Long-term, frame presentation needs to be put in a separate thread from rendering. This allows the thread to busywait without interfering with the rest of the emulator. Basically, the thread becomes a software-based VSYNC method for all VSYNC OFF technologies, and it would do its own refresh-partitioning, simulating the blocking behaviour of 60Hz VSYNC ON. It would also emulate refresh rate switching without actually switching refresh rates (e.g. 150Hz BFI or 200Hz BFI for PAL videogames on a 240Hz monitor, while preserving the original 50Hz blocking behaviour for the emulator module).

Basically a wrapper for your existing frame presentation.

It also makes it easier to switch G-SYNC framepacing from a timer-based system (good, but could be better) to a hybrid busywait-based system (even better). Yes, busywaits are a necessary evil, simply because of software-driven refresh cycles. To minimize CPU core load, you use a timer until ~0.5ms prior (a configurable margin perhaps) and busywait the rest of the way. And do it on only one core, in a separate thread -- which is a good reason to make the frame presenter a separate thread.

It also better decouples emulator activity from frame presenting, helping emulator hogs on slower computers too.

A separate frame presenter thread also makes it easier to add future beamracing support.

This frame-present thread approach also future-proofs a path toward rolling-scan BFI (the BFIv3) described at #10757

The separate frame-presenter thread enables long-term software-based VSYNC ON emulation regardless of the actual hardware sync technology in use. So you can more easily implement accurate G-SYNC BFI, tearingless VSYNC OFF (beamracing #6984 ...), improved G-SYNC framepacing precision with heavy emulators, etc.

It also opens opportunities to decouple from the emulator Hz:

  • Future dynamic adaptation. BFI could transparently enable/disable itself when the emulator framerate coming into the present thread becomes mismatched with the refresh rate, or automatically switch (50Hz vs 60Hz) without needing to actually switch display refresh rates. For example, a 300Hz fixed-Hz display (like the Razer 300Hz laptop) can do both 50Hz and 60Hz BFI. And 240Hz VRR screens can do any refresh rate within the VRR range purely in software, supporting all odd emulator Hz.
  • Future flicker-prevention recovery from missed refresh cycles. If missed refresh cycles are detected, extra brightness or extra dimness can be added to subsequent refresh cycles -- as a compensation move -- to make the missed refresh less visible as flicker.
  • Future image-retention prevention. We know evenly-divisible Hz (120Hz) has image-retention issues with BFI on some screens. But there are algorithms to eliminate that. For example, BFI could dynamically slew a little slow or a little fast (0.01%) to add duplicate refresh cycles for burnin-prevention (240Hz, 360Hz), emulating a fixed refresh rate slightly higher or lower than the real refresh rate, achieving invisible burnin-prevention by adding a phase-swap refresh cycle for even-Hz situations (rough worked example below).
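
As a rough worked example of that slew (illustrative numbers only): at 120Hz, a +0.01% longer simulated refresh period is about 0.83 microseconds extra per refresh, so the phase drifts through one full refresh cycle roughly every 10,000 refreshes (about 83 seconds at 120Hz), inserting an occasional duplicate refresh that swaps which physical refresh carries the lit frame.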

The frame-presenting thread should do nothing except timing/framepacing frames where possible, and should run at a higher thread priority than anything else in the emulator. This present thread is essentially a de facto software implementation of a display (a software-based display), so it should naturally pre-empt everything else, to prevent erratic flicker artifacts.

This universal technique is a good architectural best practice for emulators of the 2020s and 2030s.

@hizzlekizzle
Contributor

Config-only options are acceptable, yeah. However, is it really necessary to expose the comma-separated values to the user? That is, will users really need to micromanage this? Wouldn't it be better to just hardcode them?

@mdrejhon

mdrejhon commented Sep 21, 2020

Config-only options are acceptable, yeah. However, is it really necessary to expose the comma-separated values to the user? That is, will users really need to micromanage this? Wouldn't it be better to just hardcode them?

That's exactly what I said. Profiles. With "Custom" to optionally load a custom string. :)

Custom BFI strings are needed for incubation by advanced users; see above & below for explanation.

Example named profiles for hardcoded BFI strings built into RetroArch:

  One Visible Frame (120Hz+)      "1,0,0,0,0,0"
  Two Visible Frames (180Hz+)     "1,1,0,0,0,0"
  Three Visible Frames (240Hz+)   "1,1,1,0,0,0"
  Phosphor Decay Simulation       "1,0.5,0.25,0.10,0,0"
  Custom                          (load string from config file)

One of the settings being "Custom" to load a hand-edited string from configuration file.

If you want to give users more menu configurability, you can use things like "BFI Full Brightness Frame Count: Number" and "BFI Decay Frame Count: Number" (phosphor decay simulation: fade frames to black over consecutive refresh cycles). And use a "Custom" setting to optionally load a string from the configuration file.

A custom BFI string feature is needed to enable advanced-user configuration (people like me) to discover improved BFI sequences, burnin-proof BFI sequences for specific monitors, more eye-friendly BFI sequences, and sequences that are brighter without sacrificing too much motion blur reduction, etc.

That way, discovered good strings can later be added in future commits as extra named profiles for easy user selection. Since custom BFI strings are essentially a trailblazer feature, it is acceptable for that to be a hand-edited line in a configuration file. Not all advanced users can compile RetroArch; most of the Blur Busters audience can't. So to close BFIv2, access to an optional custom string is a required part of the feature request, for incubation's sake.

In other words, this is what I meant by a user-visible option -- a list of named BFI profiles mapping to internal hardcoded strings.

Select BFI Mode: [One Visible Frame]

<PseudoCode>
[ListBoxKeys]
"One Visible Frame"="1,0,0,0,0,0"
"Two Visible Frame"="1,1,0,0,0,0"
"Three Visible Frame"="1,1,1,0,0,0"
"Phosphor Decay Simulation"="1,0.5,0,25,0.10,0,0"
"Custom From Config"="Load user edited string from config file"
</PseudoCode>

Or whatever format is used internally, such as the string pre-split into an array of floats. The "1,0,0,0,0,0,0" would be truncated to "1,0,0,0" for a 240Hz display, obviously, and "1,0,0" for a 180Hz display.

I am open to other ideas, but a custom feature of some kind is required to close BFIv2.

I just think that this is probably the easiest, most intuitive way.

@Ophidon
Contributor Author

Ophidon commented Sep 21, 2020

If you want to give users more menu configurability, you can use things like "BFI Full Brightness Frame Count: Number" and "BFI Decay Frame Count: Number" (phosphor decay simulation: fade frames to black over consecutive refresh cycles).

Working off a combination of 2 simple values for full brightness frames and decay length was my plan for the customization. Simpler for the users and simpler for me, lol.

Which brings me to the next point: proper G-SYNC handling for BFI would be in a considerably later update, unless someone else wants to code it. I am a software developer by both job and degree, but emulation, graphics programming, and C in general are all considerably outside my typical working area, and I have far from unlimited time for hobby projects. I'm just doing what I can as a blur-reduction enthusiast with a modicum of coding ability to make things better than they were. If someone more skilled in these fields wants to take over, feel free. ;)

@mdrejhon

mdrejhon commented Sep 21, 2020

Working off a combination of 2 simple values for full brightness frames and decay length was my plan for the customization. Simpler for the users and simpler for me, lol.

(See most recent post above for context)

<PseudoCode>
[ListBoxKeys]
"One Visible Frame"="1,0,0,0,0,0"
"Two Visible Frame"="1,1,0,0,0,0"
"Three Visible Frame"="1,1,1,0,0,0"
"Phosphor Decay Simulation"="1,0.5,0,25,0.10,0,0"
"Custom From Config"="Load user edited string from config file" (you can skip this if you don't have time)
</PseudoCode>

At minimum, what's your opinion of using a list of named profiles instead, as in the above message? It would reduce the BFI adjustments to just one setting, potentially simplifying things for both developer and user.

  • It can simply be an array of structs
  • each struct is one string (the visible named option for the list of profiles) and one fixed array of floats (the BFI sequence)

In other words, a const array of structs.
Very easy to define as a constant in C++, no?

Then that way -- if you're not going to add a "Custom" -- somebody else can add a "Custom" option that loads from a config file later. Most users will probably only be interested in presets anyway. Fewer options make it less complicated for everyday users (who may not understand phosphor decay), but still keep access to more customized BFI. Very advanced users interested in the nuances of phosphor decay would likely already be experienced in hand-editing MAME HLSL configuration files, and thus have no discomfort editing a custom BFI string.

You could skip "Custom" for now -- at least if you code this way, you've saved yourself programming time (less math, only one option instead of two), and saved the next developer's time (easier ability to add custom string).

If you haven't yet started programming the feature.

Thoughts?

@mdrejhon

mdrejhon commented Sep 21, 2020

P.S. As a possible timesaver, a C++ initializer for the hardcoded profile list might be something like:

(not necessarily RetroArch coding/naming standard, but similar)

#include <string>

typedef struct BFI_Profile {
  std::string name;
  float sequence[8];
} BFI_Profile;

const BFI_Profile BFI_LIST_BOX[5] = {
  { "One Visible Frame", { 1, 0, 0, 0, 0, 0, 0, 0 } },
  { "Two Visible Frame", { 1, 1, 0, 0, 0, 0, 0, 0 } },
  { "Three Visible Frame", { 1, 1, 1, 0, 0, 0, 0, 0 } },
  { "Phosphor Decay Simulation", { 1, 0.5, 0.25, 0.1, 0, 0, 0, 0 } },
  { "Custom", { 1, 1, 1, 1, 1, 1, 1, 1 } }
};

(Not compile tested, might not work in all variants of C++, so optimize accordingly)

That way, developers can experiment with adding/removing strings, even before Custom logic (to load from configuration file) is implemented.

I don't know if this is a timesaver for Ophidon?

Hits three birds with one stone (saves time for Ophidon + saves time for the next developer + makes BFI less complicated for 90% of users).

Also, this is not mutually exclusive with the two-settings configurable option idea in future; it would just pre-generate the sequence array. I consider paths to complete configurability much more important than full UI flexibility, considering two audiences:
(A) Everyday users who just want an easy way to turn on BFI
(B) Advanced users who have a special interest in the nuances of BFI

This path would simply initially be:
(A) Achieved
(B) Achieved for software developers (edit the source code)

Which is easier to progress later to:
(A) Achieved
(B) Achieved (via config load implemented by another developer, or a future advanced Custom BFI Settings screen)

@Ophidon
Contributor Author

Ophidon commented Sep 21, 2020

Note that we're working in C, not C++. A far more annoying (but admittedly powerful in its own ways) language.

Anyway, predefined patterns were my previous solution, but I sort of liked the idea of a bit more direct user control with full-bright frames and decay speed when you brought it up. If I do go with the predefined patterns, the actual comma-separated list still isn't critical; you just need to implicitly know what to execute based on the pattern value, and yes, the same pattern value would be able to optimize itself to each refresh rate.

In the end, I think the conflict comes down to this: I'm not as concerned as you are about giving even power users absolute 100% control over the pattern; I'm just concerned about covering the (extreme) majority of actual use cases in the way that is simplest to implement.

@mdrejhon

mdrejhon commented Sep 21, 2020

Gotcha, yes, C instead of C++. (Different parts of RetroArch use C++).

Yeah... So scrap the easy struct idea (I tried).

Appreciate your due diligence so far, just trying to brainstorm a compromise that considers the target audiences,
(people who just want an easy way to turn on/off BFI with common recommended settings, and people who truly want to fine-tune BFI)

People often ask "what settings do I use!?" so having a high minimum floor of complexity is occasionally problematic. As long as you keep to comfortable defaults for BFI on/off, you're probably OK.

We're in a situation where we're going from simple BFI (50%:50%) to a potentially complex explosion of BFI flexibility, so we have to be responsible trailblazers for future BFI use in other emulators -- users may just ignore BFI if it's unreliable (doesn't work with most sync technologies) or looks bad with randomly adjusted, non-self-explanatory settings users aren't familiar with. Ultimately BFIv2 explodes into BFIv3 complexity, and eventually we need predefined BFI settings much like we need predefined CRT filter settings (to mimic common CRTs).

Make sure what you're doing is not mutually exclusive with what a future developer wants to do for custom settings -- see if you can architect it in a way that lets another developer easily add custom patterns (like internally using float arrays). Basically, pre-generate your BFI sequence (based on the adjustable settings) into an internal array of floats, rather than dynamically calculating mid-BFI -- even if you have to regenerate the BFI array at the beginning of every emulator refresh (if dynamic math is part of your plan).

Just an architectural idea, if it's zero extra time to implement.
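
A tiny sketch of pre-generating that float array from the two proposed settings (hypothetical names, not existing RetroArch code):

/* Fill 'seq' (length 'frames' = refresh cycles per emulator frame) from two
 * user settings: 'bright' full-brightness frames, then 'decay' frames fading
 * toward black, then pure black for the remainder. */
static void bfi_generate_sequence(float *seq, unsigned frames,
                                  unsigned bright, unsigned decay)
{
   unsigned i;
   for (i = 0; i < frames; i++)
   {
      if (i < bright)
         seq[i] = 1.0f;
      else if (i < bright + decay)
         seq[i] = 1.0f - (float)(i - bright + 1) / (float)(decay + 1);
      else
         seq[i] = 0.0f;
   }
}

For example, frames=4, bright=1, decay=2 yields roughly {1.0, 0.67, 0.33, 0.0}, and bright=1, decay=0 is the plain on/off/off/off case.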

@Ophidon
Contributor Author

Ophidon commented Sep 21, 2020

As of this morning, I rebuilt with fully up-to-date commits, and fast forward still works in windowed mode on Windows for me.

@Ophidon
Contributor Author

Ophidon commented Sep 21, 2020

To you Mark,

I unfortunately don't see using BFI being super straightforward for the uninitiated at the moment, regardless of the in-emulator settings (though I will strive to make the defaults the best to my eyes for the various current refresh rates).

I think a blog post how-to will be required. Conveniently, I think you happen to be familiar with a site devoted to all things blur busting. ;) The reason I believe that is the number of factors entirely outside of RetroArch that are required for it to perform optimally.

6bit + FRC panels still have color issues even at 180hz and 240hz for me. These are getting rarer as even the newest TNs are true 8bit like my HP Omen X 27, but it'll be an issue for a while yet.

The 120hz voltage image retention issue will remain a big thing to constantly inform new people trying it out about. As 120hz will be the refresh rate used (and only available) for most people trying out BFI for quite a while.

DWM in Win10 will continue to ruin it in windowed mode on Windows (though this can be hidden from users by just not making it available in that situation). I don't see a software solution to get around DWM like the one you described for G-SYNC.

Extraneous software like overlays, or even MSI Afterburner with overlays and power monitoring off (which -many- people use, and which I verified in detail was causing these issues for me, on your own forum), or whatever else people have running on their machines, can cause intermittent flashing from skipped frames.

The level of hardware people are running on, and how strenuous a core they are running, will also determine how likely they are to experience intermittent skipped-frame flashing. I was testing this on a well-OC'd 8700K, and it took finding and fixing the Afterburner issue to make it run perfectly. I can only imagine how much more difficult it might be on a low-end laptop.

In the end, for the 95% not pushing the bleeding edge of hardware, this is probably a feature ready for them that they won't be ready to use optimally for a good half decade at least.

@mdrejhon

mdrejhon commented Sep 23, 2020

I certainly make huge walls of text, so I'm adding headings to my reply to partition my wall for easier reading.

Easier Self-Explanatory Settings

I unfortunately don't see using BFI being super straightforward for the uninitiated at the moment, regardless of the in-emulator settings (though I will strive to make the defaults the best to my eyes for the various current refresh rates).

This is why I suggested a single BFI setting that has hardcoded profiles, with everything else hidden behind Custom.

This could be as simple as using very descriptive named profiles that are more self-explanatory (instead of "1 Frame", "2 Frame", "3 Frame"). That's why I like named profiles for everyday users; it's easy to rename them to be more self-descriptive, perhaps also with optional detail text.

Example descriptive BFI profiles can be:

  • "Best Motion Blur Reduction (1 visible frame, for 120Hz and up)"
  • "Less Motion Blur Reduction + Brighter (2 visible frame, for 180Hz and up)"
  • "Least Motion Blur Reduction + Brightest (3 visible frame, for 240Hz and up)"
  • "CRT Phosphor Decay Simulation (For 180Hz and up)"
  • "Load Custom Settings from Config"

Yes, I'll Eventually Make Blog Post

I think a blog post how-to will be required. Conveniently, I think you happen to be familiar with a site devoted to all things blur busting. ;) The reason I believe that is the number of factors entirely outside of RetroArch that are required for it to perform optimally.

Yes, I plan to create articles about this within the next year, once more experimentation is done. GroovyMAME now has the 180Hz and 240Hz BFI feature, and now RetroArch will have this (unlocking 180Hz and 240Hz BFI for a huge number of emulator modules).

However, with descriptive profiles and an automatic algorithm, this can open BFI up to a larger population with fewer instructions.

Good News, 8-bit IPS May Mostly Replace TN In Quantities

6bit + FRC panels still have color issues even at 180hz and 240hz for me. These are getting rarer as even the newest TNs are true 8bit like my HP Omen X 27, but it'll be an issue for a while yet.

I found that there are no color issues on high-Hz IPS panels. The only panel technology 360 Hz monitors currently come in is IPS, and software BFI looks massively better on these high-Hz IPS panels than on high-Hz TN panels.

Several people said the ViewSonic XG270 (the Blur Busters Approved certified one) running software BFI looked like a Sony FW900 CRT tube (motion-clarity-wise), with zero strobe crosstalk visible and zero color degradation. That's pretty neat. 4ms MPRT yields only slightly under 100 nits, but 180 Hz permits 5.5ms MPRT that exceeds roughly 100 nits, which is brighter than many legacy CRT tubes.

The sheer extra refresh rate (360Hz) on an IPS, combined with the IPS technology being 3x faster GtG than yesterday's IPS panels, makes them nearly as fast as TN, so software BFI has even fewer disadvantages.

TN will continue to be useful, but mainstream sales, once prices fall sufficiently, may eventually be largely IPS going forward.

There Is a Burn-In Fix Available For 120 Hz BFI

The 120hz voltage image retention issue will remain a big thing to constantly inform new people trying it out about. As 120hz will be the refresh rate used (and only available) for most people trying out BFI for quite a while.

See Burn In Fix For Software BFI

That is, until 120Hz BFI adds an automated burnin-avoidance algorithm. That becomes easy once you create your own frame-presenting thread (your own software-based VSYNC) and run it a tiny bit off-Hz (microseconds per refresh cycle), allowing the refresh rate to slew one full refresh cycle roughly every minute, so the phase-swap prevents the burn-in, as described in the link above.

DWM isn't a problem if using the universal algorithm that works for G-SYNC

DWM in Win10 will continue to ruin it in windowed mode on Windows (though this can be hidden from users by just not making it available in that situation). I don't see a software solution to get around DWM like the one you described for G-SYNC.

DWM is not a problem if you use the algorithm I described. BFI works fine as long as you're microsecond-framepacing the presents to the known refresh rate of the display, which can be detected via OS calls. You can wrap the refresh-rate-detection OS call in a crossplatform manner, to obtain the appropriate Hz from each operating system to decimal precision.

The Windows method is (double)vSyncFreq.Numerator / (double)vSyncFreq.Denominator from data returned by the Windows API QueryDisplayConfig() ... you will get the exact number, such as 59.940 Hz or 60.003 Hz, and you can speed up / slow down RetroArch to sync to the exact Windows refresh rate, if you wanted... You can use this to automatically detect whether multiple-compatible BFI is possible. You can do this for all monitors, so you can use the refresh rate of the monitor that the biggest portion of the RetroArch window is positioned on (this is how Windows chooses which monitor's DPI to use for on-the-fly DPI switching: for windows overlapping multiple monitors, the biggest surface area of the application "wins" the monitor).
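
A hedged Win32 sketch of that QueryDisplayConfig() lookup (illustrative only, not existing RetroArch code; it reads the first active path rather than the path for the monitor under the window, and error handling is minimal):

#include <windows.h>
#include <stdlib.h>

/* Query the OS for the exact refresh rate of the first active display path,
 * e.g. 59.940 or 60.003. Link with user32.lib. */
static double get_active_refresh_hz(void)
{
   UINT32 path_count = 0, mode_count = 0;
   DISPLAYCONFIG_PATH_INFO *paths = NULL;
   DISPLAYCONFIG_MODE_INFO *modes = NULL;
   double hz = 0.0;
   UINT32 idx;

   if (GetDisplayConfigBufferSizes(QDC_ONLY_ACTIVE_PATHS,
         &path_count, &mode_count) != ERROR_SUCCESS)
      return 0.0;

   paths = (DISPLAYCONFIG_PATH_INFO*)calloc(path_count, sizeof(*paths));
   modes = (DISPLAYCONFIG_MODE_INFO*)calloc(mode_count, sizeof(*modes));

   if (paths && modes &&
       QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS, &path_count, paths,
                          &mode_count, modes, NULL) == ERROR_SUCCESS &&
       path_count > 0)
   {
      idx = paths[0].targetInfo.modeInfoIdx;   /* the target (output) mode */
      if (idx < mode_count &&
          modes[idx].infoType == DISPLAYCONFIG_MODE_INFO_TYPE_TARGET)
      {
         DISPLAYCONFIG_RATIONAL v =
            modes[idx].targetMode.targetVideoSignalInfo.vSyncFreq;
         if (v.Denominator)
            hz = (double)v.Numerator / (double)v.Denominator;
      }
   }

   free(paths);
   free(modes);
   return hz;
}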

MSI Afterburner is actually partially/mostly solvable

Extraneous software like overlays, or even MSI Afterburner with overlays and power monitoring off (which -many- people use, and which I verified in detail was causing these issues for me, on your own forum), or whatever else people have running on their machines, can cause intermittent flashing from skipped frames.

This definitely is true. However, I don't get flashing from MSI afterburner when the overlay disappears, as long as I'm correctly framepacing the frame presentations with high-thread-priority precision busywaits (instead of timers).

The level of hardware people are running on, and how strenuous a core they are running, will also determine how likely they are to experience intermittent skipped-frame flashing. I was testing this on a well-OC'd 8700K, and it took finding and fixing the Afterburner issue to make it run perfectly. I can only imagine how much more difficult it might be on a low-end laptop.

This is true, but the next programmer can also make it more user-friendly by using the high-priority threaded presenter (to allow roll-your-own sync technologies). That's how RTSS achieved a software-based low-lag "VSYNC ON clone" via tearingless VSYNC OFF -- pretty much a DIY sync technology in a microsecond-precision frame presenter thread.

Making It Automagically User Friendly >90% Of Time

Flicker-reducing manoeuvres from my experience:

  • Present thread + raised thread priority + the algorithm I described
  • Remember to keep a trailing second's worth of frame-present timestamps from RDTSC or another high-precision clock. Do a check before/after present, and watch for jitter on both leading and trailing edges.
  • Busywaits instead of (or in addition to) timers, to align frame presentation best-effort to the microsecond. One can also use a timer until about ~0.5ms prior, then busywait the remaining way.
  • Decouple frame presenting from the emulator thread; keep sequencing BFI even if the number of refresh cycles per emulator refresh cycle begins varying (this is doable if you're running a separate present thread). Use a blending algorithm to slowly slew the phase of BFI back to the lowest-lag sequence in a dynamic manner without flicker.
  • Automatic disabling of power management whenever BFI is enabled or whenever future beamracing is enabled (Device Power API)
  • Self-detection of mis-precision (detection of excess time jitter between frame presentations)
  • Detecting the max refresh rate of the monitor the RetroArch window is currently positioned on (QueryDisplayConfig for decimal-precision refresh rate; other OSes have similar methods, so use wrappers)
  • Letting the user configure the refresh rate
  • Automatic enabling of anti-burnin logic (by default) for evenly-divisible Hz
  • Disable BFI automatically when conditions fail (e.g. fixed Hz not divisible by emulator Hz, a window overlapping multiple different-Hz monitors, fast forward, or present timestamps jittering randomly for sustained periods (like a full second) -- e.g. slow systems, power management fluctuations, massive background activity, Afterburner, etc). Frame-presentation jitter injected by externals (like Afterburner) is detectable via RDTSC / high-precision clocks / etc.

Autoconfigure KISS: assume max Hz. For VRR displays, the API check will return the max Hz, and you can simply assume that for safety and run the logic on it. For VRR users, you can let them manually enter a Hz or a VRR range to allow automatic switching decisions to be made (e.g. 150fps=Hz PAL versus 180fps=Hz NTSC). There's no difference between fps and Hz within the VRR range -- the fps is the Hz, the Hz is the fps; the frame presentation drives the refresh cycles.

This then turns software-based BFI into an easy setting that can be turned on/off and works reliably with fixed-Hz, G-SYNC, Fast Sync, Enhanced Sync, Windows DWM, etc.

I have achieved 10 microsecond precision in C#, enough to eliminate flicker

Using the higher-thread-priority + busywait technique for the frame presenter thread, I have achieved sub-10-microsecond frame-present-time jitter in C# using Tearline Jedi when it is the only app running. In other words, the RDTSC / QueryPerformanceCounter() timestamps measuring the interval between Present() calls varied by less than 10 microseconds on a typical i7 + NVIDIA system, when compiled as Release x64 code (rather than Debug), even though I was using a slow garbage-collected language like C#.

But yes, this is extra programming complexity to make things easier for users. I've already done this programming (on a paid basis in other application projects) so I'm happy to give you advice on how to improve BFI reliability.

I've done over a thousand hours of BFI research with multiple parties / multiple vendors.

Aside: One minor further-future (easily solved) potential complication is pre-rendered curved CRT filters combined with beam racing, in a theoretical present-one-pixel-row-at-a-time workflow made possible via #10758. In this case there are multiple solutions, all easily hidden in the jitter safety margin of beamracing emuHz to realHz. You don't need a 1:1 mapping of scanlines between the CRT-filtered emulator buffer and the output display; with a beamrace (jitter) margin, present-a-scanline can present a jumble of wrong scanlines or portions of different scanlines without problem, as long as the real display's raster is chasing several rasters behind the point where the curved filter is fully complete in the emulator framebuffer, in the relative vertical dimensions of the emulator buffer and the actual monitor. It all hides easily in the jitter margin.

PresentScanLine could theoretically be PresentFrameSlice() instead, and the size of frameslices in the rendering thread and the presenting thread can be decoupled from each other, since it's typically just a BitBlt (framebuffer copy) between two threads. Either way, out-of-sync scanlines are easily hidden in the beamrace margin / beamchase margin / jitter margin for emuHz-vs-realHz beam racing. More important is avoiding multicore rendering and making the present thread a present-only thread.

Presumably you don't have to worry about curved CRT filters for now when (a) implementing this presenter-thread idea for the first time, (b) implementing retro_set_raster_poll #10758 for the first time, or (c) then implementing either Lagless VSYNC #6984 or the BFIv3 CRT emulator #10757, or both. Basically, the developer doing (a) or (b) doesn't need to know the details of (c), since the algorithms have the flexibility to accommodate; (a), (b), and (c) could even be separate programmers only superficially familiar with the other items.

In the end, for the 95% not pushing the bleeding edge of hardware, this is probably a feature ready for them that they won't be ready to use optimally for a good half decade at least.

Unless extra effort is done as I described above. It CAN be made easy (I've done it before), but it's very difficult for a programmer that doesn't know all the special BFI considerations.

I realize this is work for the next programmer this decade

Ophidon, I realize you don't have time to do this programming work. I realize it's an ease-versus-programming-complexity problem.

Understandably this is a niche feature. As 120Hz commoditizes in the 2020s (witness both Apple and Samsung biting the 120Hz bait in coming years) and 240Hz commoditizes in the 2030s (with 1000Hz+ premium), this will become a more in-demand feature.

I simply write this information for the next programmer, who may come by in days, months, or even years. (As well as for Ophidon, to be careful not to "architect it into a corner", making it hard for the next programmer.) This is a long-term journey, even into the 2030s.

@mdrejhon

mdrejhon commented Sep 29, 2020

Tips for future software developer wanting to do the presenter-thread technique

  • No multithreaded rendering
  • Presenting thread is a present-only thread
  • CRT filters should be done by the rendering thread before frames are passed to the presenting thread
  • One technique for BFI is that the rendering thread would pre-render all BFI frames for one emulator refresh, and then pass all frames to the future presenter-thread.
  • Presenter thread would manage a frame queue
  • It can automatically drop sequences to prevent flicker for odd emulator framerates / above-Hz situations (e.g. if 2 refresh cycles' worth of BFI gets buffered up, drop the oldest refresh cycle's worth).
  • It can automatically repeat sequences to prevent flicker for odd emulator framerates / below Hz (e.g. if no new sequence of BFI framebuffers is received by presenter thread, keep cycling on the last frame sequence)

Advantages:

  • Presenter-thread has no "rendering work". No multithreaded render bugs!
  • Presenter thread is only a presenter thread
  • Ability to drop emulator frames without creating interrupting flicker.
  • Ability to repeat emulator frames without creating interrupting flicker.
  • Ability to avoid flicker during wrong Hz BFI (e.g. 75Hz BFI at 60fps).
  • Wrong-Hz BFI just looks like stutter, while the flicker continues perfectly merrily, e.g. 60fps at 50Hz or 75Hz

Automatic perfect smoothness (BFI and non-BFI):

  • Works out-of-box for VSYNC ON at Hz multiples (120Hz, 180Hz, 240Hz, etc)
  • Works out-of-box for VRR systems (automatic adapting for both PAL and NTSC)
  • Works out-of-box for DWM + VSYNC OFF
  • Works out-of-box for triple buffers (NVIDIA Fast Sync and AMD Enhanced Sync)

Automatic with tolerable artifacts (BFI and non-BFI):

  • Works out-of-box for VSYNC OFF at Hz multiples; a stationary tearline will appear (can be steered offscreen via future beamracing techniques)
  • Works out-of-box for VSYNC ON at wrong Hz. (BFI flickers correctly, motion simply stutters)

Intolerable flicker artifact will generally now be reduced only to these situations (BFI only):

  • VSYNC OFF at wrong Hz (erratic flickering + tearing), like 60fps emulator trying to do 120Hz BFI during 144Hz VSYNC OFF
  • Window overlapping two different-Hz monitors on multimonitor

Thus, the user experience is maximized with the high-priority presenter-only thread technique that does best-effort microsecond-precision framepacing. (As smooth as RTSS is already able to achieve -- except as a simplified version built into RetroArch as a presenter thread.)

You have to framepace anyway for VRR, so this is just an enhancement of that which adds BFI compatibility and catch-all capability (works with all sync tech, not just VRR).

And it futureproofs for future technologies, and can be made compatible with future beam racing methods (e.g. PresentScanLine() to blit one scanline at a time from the emulator thread to the presenter thread, which can then do whatever beamracing algorithms are supported, such as hardware-based beamracing (frameslice beamracing, usually emuHz=realHz) or software-based beamracing (ultrahigh realHz, using rolling-scan output framebuffers)).

It's just one fits-all algorithm that usually only needs to detect the max Hz (via a wrapped call to whatever OS API returns the current decimal-precision refresh rate), and that upon failure simply assumes the user-defined refresh rate (the existing RetroArch Refresh Rate setting, perhaps).

Doing a single algorithm that does a catch-all will reduce the amount of HOWTOs I have to write, so I highly encourage a future software developer to create the futureproof presenter-thread technique.

Conclusion: Think Simple

It's actually probably much simpler than my walls of text -- it's simply the existing G-SYNC framepacing algorithm, improved in precision (by using busywaits instead of just a timer) and moved into a separate frame-presenting thread.

And BFI is simply that existing algorithm partitioned up (e.g. same G-SYNC framepacing algorithm but at higher frequency). With a few best-practices thrown in to make sure it's a catch-all.

Decoupling the presenting from the rendering, so that frame presenting can continue (at higher precision) independently of the RetroArch emulators.

The separate present thread could be a separate task from BFI initially. This adds a future-proofing technique (e.g. BFI, beamracing, future CRT emulators, etc). BFI keeps flickering merrily at best-effort microsecond precision (and in a VRR-compatible / DWM-compatible way). Precision beamracing also becomes much easier to add (once #10758 is completed), since no further modifications to any emulators are needed to add beamracing support. And it also more easily preserves compatibility with features such as RunAhead and emulator pause, since you don't need to add code to enable/disable BFI outside the presenting thread.
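
A heavily simplified sketch of what such a presenter thread's main loop could look like (pthreads; wait_until(), now_ns() and present_frame() are hypothetical helpers, not RetroArch APIs):

#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define MAX_BFI_FRAMES 8

/* Assumed helpers: microsecond framepacing and the actual video present
 * with a brightness multiplier. */
extern uint64_t now_ns(void);
extern void     wait_until(uint64_t deadline_ns);
extern void     present_frame(float alpha);

typedef struct
{
   pthread_mutex_t lock;
   float    sequence[MAX_BFI_FRAMES]; /* brightness per refresh cycle  */
   unsigned frames;                   /* refresh cycles per emu frame  */
   int      new_frame;                /* set by the emulator thread    */
   int      quit;
} presenter_state_t;

static void *presenter_thread(void *data)
{
   presenter_state_t *st = (presenter_state_t*)data;
   float    seq[MAX_BFI_FRAMES] = { 1.0f };
   unsigned frames   = 1;
   unsigned i        = 0;
   uint64_t period   = 1000000000ull / 120;  /* simulated 120Hz, for example */
   uint64_t deadline = now_ns() + period;

   for (;;)
   {
      pthread_mutex_lock(&st->lock);
      if (st->quit)
      {
         pthread_mutex_unlock(&st->lock);
         break;
      }
      if (st->new_frame && st->frames > 0)
      {
         /* New emulator frame arrived: restart its BFI sequence. */
         memcpy(seq, st->sequence, sizeof(seq));
         frames        = st->frames;
         i             = 0;
         st->new_frame = 0;
      }
      pthread_mutex_unlock(&st->lock);

      wait_until(deadline);       /* timer until ~0.5ms prior, then busywait */
      present_frame(seq[i]);      /* keep cycling even if no new frame came  */
      i = (i + 1) % frames;
      deadline += period;
   }
   return NULL;
}

If more than one emulator frame's worth of sequences ever queues up, the oldest can simply be discarded before the copy, which is the drop/repeat behaviour described in the bullets above.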

@mdrejhon

mdrejhon commented Sep 29, 2020

UPDATE:

I have resummarized the frame-presenter thread idea in a much, much simpler way here:

[Feature Request] Futureproof RetroArch with precision frame pacing presenter thread
