Fantasy CPU: Driving the Display (poll vs wait vs interrupt) #1685

Closed
joshgoebel opened this issue Nov 9, 2021 · 8 comments
Labels
discussion Issues with no clear action or preferred solution fantasy cpu Related to the fantasy CPU effort

Comments

@joshgoebel
Collaborator

joshgoebel commented Nov 9, 2021

Related: #1678 #1660 #1007

I wanted to open a thread just on driving the display "hardware". We've talked elsewhere about how a fantasy CPU might actually draw to the screen:

  • Writing directly to VRAM
  • Accessing the "GPU" directly via GPU command sequences
  • Both

Writing to VRAM would be no different than how scripting languages do it. Accessing the GPU directly would mean some set of memory-mapped registers, perhaps paired specifically with features of the given fantasy (or real) CPU in question. Perhaps you first write to a few registers in RAM, then send a "run_command" signal to the GPU over an IO port, etc... In this thread I'd like to talk about the whole process in a bit more detail, especially in relation to all the current complexity of the graphics "pipeline". For example, our current drawing callback sequence:

-- FRAME BEGINS (60 FPS)

   TIC()
   BDR(0) BDR(1) BDR(2)... -- top border
   SCN(x) BDR(y) SCN(x+1) BDR(y+1) -- the actual display portion of the screen, interleaved BDR/SCN callbacks
   BDR(x) BDR(y) BDR(z)... BDR(143) -- bottom border
   OVR() -- do fancy overlay stuffs

-- FRAME END

This is all done via callbacks... There is some talk that BDR and SCN are currently a bit of a duplication, but it's how things stand right now. ...so scripting environments now get:

  • TIC (1)
  • BDR (144)
  • SCN (136)
  • OVR (1)

282 callbacks per frame... Would these all be interrupts? None of the retro hardware I've worked on used interrupts for screen drawing (Gamebuino, Gamebuino META, Arduboy, etc)... the software driving the hardware was all rather simple:

  • "VRAM" is reserved CPU side to "cache" the state of the screen, as the real hardware VRAM is not accessible
  • LCD/OLED is attached via a serial/parallel bus (and typically serial), graphics are streamed to the display a full screen at a time...
  • Often the LCD controller would support more than a few commands (inverse, offset), but these weren't commonly used - we preferred to do most effects in software instead.
  • Frame management was all done via timing. For 60 FPS you rendered (and streamed) the screen every 16.66ms... if you got done early you slept until the next frame was needed. The difference in time between frames was tracked so that the frame rate stayed consistent even if rendering time varied slightly from frame to frame. (A rough sketch of that loop follows this list.)
  • None of this was interrupt driven. (well, the time tracking itself was via timers, but that's indirect)...
  • ...what I mean is there was no interrupt saying "I need another frame"
  • None of this hardware supported VSYNC.

I have no familiarity with how early retro systems worked (NES, Gameboy, etc)... perhaps someone else can weigh in there.


So how might we think about handling this from a fantasy CPU perspective? Is our CPU mostly idle, but with support for software interrupts for TIC, SCN, OVR, BDR? Or would we instead have only a timer interrupt, with the CPU responsible for figuring out these other things? Or perhaps we have only a single TIC/VSYNC interrupt that everything keys off of.

I feel like the complexity of the callbacks might force our hand here... for example, currently OVR is a very specific "hardware mode" that allows writing to VRAM but via a bit-mask - i.e. keeping track of which pixels are written and which aren't, such that during the final "signal rendering" what you have is (see the C sketch after this list):

  • BUFFER = internal 240x136 24-bit color buffer
  • BITMASK = 240x136 bit buffer to track changes to any given pixel during the OVR pass
  • TIC()
  • During SCN/BDR:
    • initial VRAM layer 0 is painted into BUFFER (using the palette)
  • OVR()
    • any draw commands (or pokes) are written to VRAM and also tracked by BITMASK
  • VRAM layer 1 (currently just a bit-masked layer 0) is painted into BUFFER (using the palette)
  • BUFFER is finally painted to our "hardware"/SDL2, etc...

Right now there is no way we could do this with just a VSYNC interrupt without access to the "magic" BUFFER or BITMASK... since we'd have no way to do the color mapping needed from the palette, etc... and allowing us direct access to the BUFFER as if it were part of the hardware would allow us to write 24-bit color games. Meanwhile, driving all this nuance from the CPU side seems weird as well - meaning the CPU saying "GPU, go into OVR now" or "GPU, prepare for scanline 5 now"...

Maybe this stuff just doesn't translate perfectly when you start thinking about real hardware... the Gamebuino META (in its 16-color high-res mode) is very close to TIC-80 in some ways... you had a 16 color palette... the CPU was wired to the LCD controller in RGB565 mode (16-bit color)... so whenever it was time to "paint" the screen, whatever 4-bit screen data was in RAM was translated (on the fly) to 16-bit color from a 24-bit color space... we didn't support BDR or SCN type palette swaps per scanline, but we easily could have (especially since there was no VSYNC and the timing wasn't critical).
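
For reference, that on-the-fly 4-bit to RGB565 translation is roughly this kind of thing (a sketch from memory, not the actual META source; all names here are made up):

#include <stdint.h>

static uint16_t rgb565_palette[16];

/* pack a 24-bit 0xRRGGBB color down to RGB565 */
static uint16_t to_rgb565(uint32_t rgb24) {
    return (uint16_t)(((rgb24 >> 8) & 0xF800) |   /* top 5 bits of R */
                      ((rgb24 >> 5) & 0x07E0) |   /* top 6 bits of G */
                      ((rgb24 >> 3) & 0x001F));   /* top 5 bits of B */
}

void build_rgb565_palette(const uint32_t *palette24) {
    for (int i = 0; i < 16; i++)
        rgb565_palette[i] = to_rgb565(palette24[i]);
}

/* stream one 4-bit row: each byte holds two pixels; send_pixel() stands in
   for whatever pushes a 16-bit word out to the LCD controller */
void send_pixel(uint16_t px);

void stream_row_4bpp(const uint8_t *row, int width_bytes) {
    for (int i = 0; i < width_bytes; i++) {
        send_pixel(rgb565_palette[row[i] >> 4]);
        send_pixel(rgb565_palette[row[i] & 0x0F]);
    }
}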

But, I'm rambling... so any thoughts on how this whole graphics pipeline might translate over to a fantasy CPU?

@joshgoebel joshgoebel added the discussion Issues with no clear action or preferred solution label Nov 9, 2021
@joshgoebel
Collaborator Author

Technically it's possible that in the "fantasy" there is no BUFFER... you could build the 24-bit output (to a "fantasy" dumb LCD) on the fly based on VRAM and BITMASK alone... so perhaps it's safer to assume that's what the internals of the TIC-80 fantasy hardware look like... rather than assuming there is 100kb of hidden RAM to buffer the screen. :-) Though our fantasy GPU is going to have to have pretty fast bit-shifting if we pretend the bitmask is truly a bitmask. (instead of the very RAM heavy nibble mask we actually use in C) :-)
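
A toy version of that buffer-less scan-out, with a true bit-packed mask (all names invented, purely to show the shape of it):

#include <stdint.h>

extern uint8_t  vram0[240 * 136], vram1[240 * 136]; /* 4-bit palette indices   */
extern uint8_t  bitmask[(240 * 136 + 7) / 8];       /* 1 bit per pixel, packed */
extern uint32_t palette[16];                        /* 24-bit colors           */

/* compute one output pixel on the fly during scan-out - no hidden frame buffer */
uint32_t scanout_pixel(int x, int y) {
    int i = y * 240 + x;
    int overlaid = (bitmask[i >> 3] >> (i & 7)) & 1;  /* the fast bit test */
    return palette[(overlaid ? vram1[i] : vram0[i]) & 0x0F];
}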

@joshgoebel
Collaborator Author

I'll start:

In "real life" (for simple retro graphics hardware) it seems more likely we'd have an interrupt driven VSYNC (or we'd be polling for VSYNC)... and that then we'd have to stream the entire screen fairly quickly to an LCD controller (before the next VSYNC)... and that the LCD contents would be buffered somewhere in the LCD controller - such that if we didn't refresh the screen a few frames that whatever was there last would persist.

Or should I stop talking about LCDs at all and go back to the days of CRTs and real scanlines? If we're going to say there are a discrete CPU and GPU, we have to start thinking about it that way, and that might mean there are some things the CPU truly can't do - such as change colors every border/scanline. That's probably tied up in very tight timing on the GPU side... asking and waiting on the CPU to answer might be far too slow.

@joshgoebel
Collaborator Author

I also assume that none of this would need to be used (even if provided) - given the assumption that VRAM is mapped directly to output. So much like someone writing a regular cartridge must only implement TIC, perhaps with respect to an actual CPU the minimum requirement is no callbacks/interrupts at all... since we could just assume "constant execution" rather than a scripting language that requires callbacks...

IE, the following should produce output:

start:
  LOAD A, 0        ; point A at address 0 (the start of VRAM)
  LOAD (A), 0xFF   ; write 0xFF there - two color-15 pixels at 4bpp
  JMP start        ; spin forever, re-writing the same byte

IE, write to address 0, pushing two pixels (assuming VRAM is still addressable at address 0)... I assume this would draw two (color 15) pixels to the screen permanently (as the CPU is in a tight loop)... (this is obviously wasteful of CPU cycles, yes...)

Or even if an interrupt WAS required it could probably just be ignored... draw the screen once, then sleep, interrupts briefly wake us then we go right back to sleep:

start:
  LOAD A, 0        ; point A at VRAM address 0
  LOAD (A), 0xFF   ; draw the two pixels once
nap:
  SLEEP            ; halt until some interrupt arrives
  JMP nap          ; ...then go right back to sleep

interrupt(TIC):
  return           ; acknowledge the TIC interrupt and do nothing

@Anrock
Collaborator

Anrock commented Nov 9, 2021

I think it's time to create a project to group all these hardware issues.


Maybe I'm missing something, but what's the problem with having exactly the same callbacks, just in the form of interrupts? For what it's worth, it would at least be familiar to anyone who has already coded for TIC, and we wouldn't have to invent a whole new system and maintain documentation and separate code paths for it.

Speaking of real retro consoles - they all seem to have had at least a VBLANK interrupt, sometimes HBLANK and others on top of it. So TIC's current pipeline doesn't seem to be much out of line. Maybe except BDR, but why not have it, just for compatibility's sake.

@joshgoebel
Collaborator Author

Maybe I'm missing something, but what's the problem with having exactly the same callbacks, just in the form of interrupts?

Well per the original request #1007 the idea was to get closer to "real". Just doing the same thing we do for scripting languages for a fantasy CPU doesn't seem "real" to me at all from what I know of such things. Bit-banging out an analog VGA signal (for example) is a VERY time sensitive process... you could indeed (depending on the speed of your CPU) play around with palette and such concerns per scanline, but you'd have to be very careful to keep everything in sync.

Unless you were running quite fast, the timing of the whole round-trip seems questionable:

  • GPU fires an interrupt for a scanline
  • GPU waits
  • CPU receives scanline interrupt
  • CPU processes and writes response data to bus/RAM
  • CPU returns
  • GPU receives scanline response data
  • GPU rasterizes the scanline to the output device
  • repeat

And remember we'd have to do this TWICE... once for SCN, then again for BDR... two round-trips between the GPU and CPU per line...

Now you could solve this by saying "it's all buffered, exact timing is less critical"... so that would presume our GPU has a 100kb BUFFER (or some significant portion thereof)... such that all this data is getting written to the BUFFER... and only when a full buffer is prepared would the GPU output that (aligned with VSYNC one would guess). But if the video card truly held such a buffer it'd be cool to let us at it more directly... I think it's more fun to pretend it doesn't - and that might force certain limitations on us.

It could be that for fantasy purposes none of this matters, but I think it's worth thinking about all this up front and seeing if we can learn anything from it. Particularly since no one has offered up a real working version yet. :-)

I guess the hardware question about interrupts (if we care) is what the interrupt latency is... we can assume the CPU is soft-realtime... such that it has nothing better to do than process interrupts... so if the SCN routines were short enough... perhaps a round-trip to the CPU and back could be performed while the GPU was painting the right side border, then the left side border... and if you burnt too many CPU cycles, you'd start to see the display signal mess up because the timing was out of sync.
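
To put very rough numbers on that (purely illustrative, ignoring blanking intervals): 60 FPS × 144 lines is 8,640 lines per second, so each BDR/SCN line lasts about 116µs. A hypothetical 1 MHz fantasy CPU would then have on the order of 115 cycles per line to take the interrupt, run the handler and return; at 10 MHz that grows to roughly 1,150 cycles, which starts to feel workable.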

Do we want that type of realism? ;-) Or do we want the programmer free to ignore all such implementation concerns? :-)

@joshgoebel
Collaborator Author

I think it's time to create a project to group all these hardware issues.

Or at least a tag... @nesbox You have anything against a project?

@Anrock
Collaborator

Anrock commented Nov 9, 2021

@joshgoebel from what I know, retro consoles work like what you described. There is a single buffer in the GPU; the GPU constantly draws it to the screen while producing V/H blank interrupts, during which the CPU is free to alter that buffer (via direct writes or GPU commands, doesn't matter). Since outputting a VGA signal is a realtime process, there is a limited (but known beforehand) number of cycles available to alter the buffer in the interrupt handler. If the interrupt handler takes too many cycles, then there is either garbage on the screen or, I'm not sure here, the GPU just ignores those commands.
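
A minimal sketch of that pattern (names and the write budget are invented; the idea is just that game code queues writes during the frame and a small VBLANK handler applies as many as fit while the beam is off-screen):

#include <stdint.h>

#define VBLANK_BUDGET 64  /* arbitrary: how many writes we assume fit in the blank */

struct gpu_write { uint16_t addr; uint8_t value; };

extern struct gpu_write queue[256];      /* filled by game code during the frame */
extern volatile int queue_len;
extern volatile uint8_t *const GPU_MEM;  /* pretend memory-mapped GPU buffer     */

void vblank_isr(void) {
    int n = queue_len < VBLANK_BUDGET ? queue_len : VBLANK_BUDGET;
    for (int i = 0; i < n; i++)          /* apply only what fits before the beam returns */
        GPU_MEM[queue[i].addr] = queue[i].value;
    for (int i = n; i < queue_len; i++)  /* keep any overflow for the next vblank */
        queue[i - n] = queue[i];
    queue_len -= n;
}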

@nesbox
Owner

nesbox commented Nov 10, 2021

You have anything against a project?

I don't have anything against it, please create a project if you need one. For now, I'm just watching where this road will lead us :)

@joshgoebel joshgoebel added the fantasy cpu Related to the fantasy CPU effort label Nov 10, 2021