-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PSP: implement SDL_GetWindowSurface() API for direct VRAM access #9267
Conversation
Good stuff by the way! |
4616653
to
c4a108a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
Hey @rofl0r thanks for implementing this, this is a really cool change. I'll do some testing with it tonight and merge it. |
Rather than reaching into the PSP renderer internals, you should set properties on the renderer that will make it possible for this approach to be used by applications as well. The direct3d12 renderer uses this approach to expose the command queue in Also, please remove the change using the texture access as a mask. If this is really necessary we should evaluate changing that field to a Uint32 mask for SDL3, and that can be done as a separate PR. |
oops sorry, somehow that crept in. fixed now
you mean getting the front/back pointers from the renderer ? not sure what you mean with applications. /me off reading the d3d12 code... |
Yes, maybe have something like: That way, any code that instantiates the PSP renderer will have access to the front and back buffer if they need it, and you don't need to reach into the PSP renderer internals from the video subsystem. We try to keep the video and renderer subsystems mostly independent as a way of keeping ourselves honest about what an application can do outside of SDL. |
hmm, it seems SDL_GetProperty is an SDL3-only feature. how do you want me to tackle this for the SDL2 branch ? is it OK if i create a property enum and a special function to query it inside PSP_render.c ? |
For SDL2 it's fine to leave as-is. Can you port this to SDL3 and use properties in that PR? |
Actually yeah, that seems like a better approach than splitting apart the PSP rendering internals to a separate header. |
Do you have some example code for testing this? I'm looking at this now, but I'm not sure how to implement it in my code. |
Certainly, will start working on it as soon as this is merged
done
i made sure to put all relevant info the commit message, but maybe this not-yet-cleaned-up diff for gnuboy using the new method helps more: diff --git a/sys/sdl2/sdl2.c b/sys/sdl2/sdl2.c
index 9013348..340fff7 100644
--- a/sys/sdl2/sdl2.c
+++ b/sys/sdl2/sdl2.c
@@ -18,6 +18,17 @@
#include "input.h"
#include "rc.h"
+#if 0 && defined(__psp__)
+extern int sceMpegRingbufferConstruct(void *r, int i, void*p, int s, void*c, void* pa);
+/* shows n in the PPSSPP debug output */
+static void ppsspp_debug_int(int n) {
+ sceMpegRingbufferConstruct(0, n, 0, 0, 0, 0);
+}
+#else
+#include <stdio.h>
+static void ppsspp_debug_int(int n) { dprintf(2, "%d\n", n); }
+#endif
+
extern void sdljoy_process_event(SDL_Event *event);
struct fb fb;
@@ -25,10 +36,14 @@ struct fb fb;
static int fullscreen = 0;
static int use_altenter = 1;
static int vsync;
+static int softfb;
static SDL_Window *win;
static SDL_Renderer *renderer;
static SDL_Texture *texture;
+static SDL_Surface *fbs;
+
+static float stretchh, stretchv;
static int vmode[3] = { 0, 0, 32 };
@@ -38,6 +53,8 @@ rcvar_t vid_exports[] =
RCV_VECTOR("vmode", &vmode, 3, "video mode: w h bpp"),
RCV_BOOL("fullscreen", &fullscreen, "start in fullscreen mode"),
RCV_BOOL("altenter", &use_altenter, "alt-enter can toggle fullscreen"),
+ RCV_FLOAT("stretchh", &stretchh, "horizontal stretch factor"),
+ RCV_FLOAT("stretchv", &stretchv, "vertical stretch factor"),
RCV_END
};
@@ -72,12 +89,26 @@ void vid_init()
SDL_PixelFormat *format;
SDL_RendererInfo info;
+ softfb = 0;
+
+ stretchh = rc_getfloat("stretchh");
+ stretchv = rc_getfloat("stretchv");
+
if (!vmode[0] || !vmode[1])
{
if (scale < 1) scale = 1;
vmode[0] = 160 * scale;
vmode[1] = 144 * scale;
}
+#ifdef __psp__
+ vmode[0] = 480;
+ vmode[1] = 272;
+ vmode[2] = 16;
+ stretchh = 480 / 160;
+ stretchv = 272.0 / 144.0;
+ softfb = 1;
+ fullscreen = 1;
+#endif
if (vmode[2] == 32)
fmt = SDL_PIXELFORMAT_BGRA32;
else if(vmode[2] == 16)
@@ -99,6 +130,17 @@ void vid_init()
SDL_WINDOWPOS_UNDEFINED, vmode[0], vmode[1], flags)))
die("SDL: can't set video mode: %s\n", SDL_GetError());
+ if(softfb) {
+ SDL_DisplayMode dm = {.format = fmt, .w = vmode[0], .h = vmode[1] };
+ if(SDL_SetWindowDisplayMode(win, &dm) < 0)
+ die("SDL_SetWindowDisplayMode failed %s\n", SDL_GetError());
+ // SDL_BlitScaled
+ if(!(fbs = SDL_CreateRGBSurfaceWithFormat(0, dm.w, dm.h, vmode[2], fmt)))
+ die("SDL_CreateRGBSurfaceWithFormat failed %s\n", SDL_GetError());
+ pixels = fbs->pixels;
+ pitch = fbs->pitch;
+ } else {
+
/* for SDL2, which uses OpenGL, we internally use scale 1 and
render everything into a 32bit high color buffer, and let the
hardware do the scaling; thus "fb.delegate_scaling" */
@@ -122,10 +164,12 @@ void vid_init()
texture = SDL_CreateTexture(renderer, fmt,
SDL_TEXTUREACCESS_STREAMING, 160, 144);
- SDL_ShowCursor(0);
+ SDL_LockTexture(texture, NULL, &pixels, &pitch);
+ }
+
+ //SDL_ShowCursor(0);
format = SDL_AllocFormat(fmt);
- SDL_LockTexture(texture, NULL, &pixels, &pitch);
fb.delegate_scaling = 1;
fb.w = 160;
@@ -142,9 +186,12 @@ void vid_init()
fb.cc[2].r = format->Bloss;
fb.cc[2].l = format->Bshift;
- SDL_UnlockTexture(texture);
+ if(!softfb)
+ SDL_UnlockTexture(texture);
+
SDL_FreeFormat(format);
+ ppsspp_debug_int(1);
fb.enabled = 1;
fb.dirty = 0;
}
@@ -226,20 +273,41 @@ void vid_settitle(char *title)
SDL_SetWindowTitle(win, title);
}
+static int drawtime;
+
void vid_begin()
{
void *p = 0;
int pitch;
- SDL_LockTexture(texture, NULL, &p, &pitch);
+ if(!softfb)
+ SDL_LockTexture(texture, NULL, &p, &pitch);
+ else
+ p = fbs->pixels;
fb.ptr = p;
+ drawtime = SDL_GetTicks();
}
void vid_end()
{
- SDL_UnlockTexture(texture);
- if (fb.enabled) {
- SDL_RenderCopy(renderer, texture, NULL, NULL);
- SDL_RenderPresent(renderer);
+ SDL_Rect destrect = {0, 0, 160*stretchh, 144*stretchv}, *d;
+ static const SDL_Rect srcrect = {0, 0, 160, 144};
+ if(stretchh > 0.0) d = &destrect;
+ else d = NULL;
+
+ if(!softfb) {
+ SDL_UnlockTexture(texture);
+ if (fb.enabled) {
+ SDL_RenderCopy(renderer, texture, d, d);
+ SDL_RenderPresent(renderer);
+ }
+ } else if(fb.enabled) {
+ SDL_Surface *screen;
+ if(!(screen = SDL_GetWindowSurface(win)))
+ die("SDL_GetWindowSurface failed %s\n", SDL_GetError());
+ SDL_BlitScaled(fbs, &srcrect, screen, d);
+ SDL_UpdateWindowSurface(win);
+ drawtime = SDL_GetTicks() - drawtime;
+ ppsspp_debug_int(drawtime);
}
}
the old method is used if softfb == 0, the new one if softfb == 1 note that this code isn't optimal, because the gameboy screen is 160x144 and needs to be stretched - so here i first render on a normal surface and then stretch onto the vram buffer (in software) with SDL_BlitScaled. however that one indirection is already sufficient that the framerate drops around 40. for this application it might be preferable to use actual textures and let the GE do the stretching. however without SDL_TEXTUREACCESS_STREAMING you can't get the memory behind a texture directly (via SDL_LockTexture) - unless i missed something - , and without SDL_TEXTUREACCESS_TARGET the texture won't be created in vram. with the changes i had in the original form of the PR it was actually possible to use both modes ored together, so getting a texture on screen involved only a single copy. |
@slouken would it be acceptable to backport SDL properties to SDL2 ? if so i could use the same code for both branches, and introduce a property to tell the renderer that a texture is already swizzled (or do you see a better way to achieve this ?). when reading about the speed benefits of PSP swizzling, it seems there's a great speedup to be achieved if the SDL user already prepares the image data in a swizlled form. @fjtrujy it appears the ShouldSwizzle function has the logic reversed - the leading |
No, the properties is a massive API change and can't be backported to SDL2. If you'd like to make this an SDL3 only feature, that's fine. We're wrapping up the API changes for SDL3 preview release soon, so the timing works out well. |
It seems like PSP textures should always convert to swizzled form when being updated, for speed. Conceptually the application can't assume anything about the data backing the texture, it might be on the video card, or tiled or compressed for optimal usage in rendering. |
@sharkwouter did you get an opportunity to test ? |
I found that this change does not break existing functionality, but would need an example of how to use the code to test more. |
/me presents silver platter main.c: #include <SDL2/SDL.h>
#include <stdlib.h>
#include <stdio.h>
#define PIXEL_FMT SDL_PIXELFORMAT_BGR565
SDL_Window *win;
SDL_Surface *sur;
typedef unsigned short pix;
typedef unsigned char uchar;
static pix *get_vram_and_pitch(unsigned *pitch) {
sur = SDL_GetWindowSurface(win);
if(sur == 0) return 0;
*pitch = sur->pitch;
return sur->pixels;
}
static void release_vram(void) {
SDL_UpdateWindowSurface(win);
sur = 0;
// if you use other drawing methods, such as a texture, release/unlock
// it here.
}
static void draw_font(pix *p, unsigned pitch, int x, int y, int ch) {
int xx, yy;
pix* pline;
pitch /= sizeof(pix);
p += y*pitch;
pline = p;
// reuse font from pspDebugPrintf
extern uchar msx[];
uchar *font = &msx[(ch&0xff)*8];
for(yy = y; yy < y+8; ++yy, ++font) {
p = pline + x;
for(xx = x; xx < x+8; ++xx)
*(p++) = *font & (128 >> (xx-x)) ? 0x0000 : 0xffff;
pline += pitch;
}
}
static void draw_bg(pix *p, unsigned pitch) {
unsigned x, y;
pix *pline = p, bg = rand()%0xffff;
pitch /= sizeof(pix);
for(y = 0; y < 272; ++y) {
for(x = 0; x < 480; ++x)
*(p++) = bg;
pline += pitch;
p = pline;
}
}
static void draw_frame() {
static long drawtime;
if(drawtime == 0) drawtime = 1;
long drawstart = SDL_GetTicks();
unsigned x, i, pitch;
pix *pixels;
char buf[20];
snprintf(buf, sizeof buf, "fps %04.2f", 1000.0/(float)drawtime);
pixels = get_vram_and_pitch(&pitch);
if(!pixels) exit(1);
draw_bg(pixels, pitch);
for(i = 0, x = 240; i < strlen(buf); x += 8, ++i)
draw_font(pixels, pitch, x, 136, buf[i]);
release_vram();
drawtime = SDL_GetTicks() - drawstart;
}
int SDL_main(int argc, char **argv) {
srand(0x1337);
if (SDL_Init(SDL_INIT_VIDEO) < 0) {
dprintf(2, "Could not initialize SDL: %s\n", SDL_GetError());
return 1;
}
win = SDL_CreateWindow("sdl-demo",
SDL_WINDOWPOS_UNDEFINED,
SDL_WINDOWPOS_UNDEFINED,
480, 272,
SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE);
SDL_DisplayMode dm = {.format = PIXEL_FMT, .w = 480, .h = 272 };
if(SDL_SetWindowDisplayMode(win, &dm) < 0) {
dprintf(2, "SDL_SetWindowDisplayMode failed %s\n", SDL_GetError());
}
while(1) {
SDL_Event ev;
if(SDL_PollEvent(&ev) && ev.type == SDL_QUIT)
break;
draw_frame();
}
return 0;
} Makefile: TARGET = All
OBJS = main.o
BUILD_PRX=0
CFLAGS = -O2 -g -mno-gpopt -Wall
ASFLAGS = $(CFLAGS)
SDL_LIBS := $(shell pkg-config --static --libs sdl2)
LIBS = $(SDL_LIBS) -lm -lpspdebug
EXTRA_TARGETS = EBOOT.PBP
PSP_EBOOT_TITLE =
PSP_EBOOT_ICON =
PSP_EBOOT_PIC1 =
PSPSDK=$(shell psp-config --pspsdk-path)
include $(PSPSDK)/lib/build.mak
|
if all one needs is a raw framebuffer to the PSP's vram, instead of dealing with renderers and textures, that need to be copied hence and forth, this method allows one to create a window, set the pixel format using SDL_SetWindowDisplayMode() - preferably BGR565 for optimal speed (the other possible natively supported option is ABGR8888) - and then request SDL_GetWindowSurface(), which provides one with a surface with direct framebuffer access. note that the pixels pointer inside the surface will be switched after each call because of double-buffering. it's advisable to overwrite all pixels of the PSP visible area (480x272) to not encounter old data. after writing the pixels, a call to SDL_UpdateWindowSurface() sends the changes to the graphics chip. the result is a raw framerate of 250 fps with BGR565 mode, under optimal circumstances - i.e. nothing else is done than drawing, and the drawing loop is as simple as possible. that leaves about 12 ms per frame for other tasks and still allow a fluent 60 fps.
updated PR with a few SDL3-style fixes and rebased on current SDL2 HEAD. |
Would it be possible for applications to detect if the video driver is using double/triple buffering? This seems like a feature that would be useful on RISC OS as well. Also, would it be necessary for the framebuffer to be lockable if the pointer to the pixel data can change, or would it be sufficient to explicitly document that the pointer may change after calling |
in the case of the PSP not, because its OS implements cooperative multitasking, no real hardware threads. this has caused me quite some headache already before i found that out, because i've been waiting for the sound thread to do its work, but nothing happened. the OS thread scheduler only kicks in on blocking syscalls. (for platforms like this, SDL should actually have a function like SDL_yield(), because abusing SDL_Delay(1) for this purpose already slims down the available time budget for 60 fps considerably).
that would certainly make sense, yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested the example you sent and it works like expected. Existing code is also not impacted by this change. Thanks for this work, from my side this is approved.
@rofl0r, can you update your PR with any documentation that's needed? Once you've done that, I'll go ahead and merge this. Also, I'm about to create an SDL 3.1.0 preview release. Do you want this in for that? |
will do it in a few mins
the getwindowsurface api already exists, so i guess it's nbd if the fast version for PSP is missing for the preview. |
@slouken does this look alright ? |
Just to confirm, is it still possible to implement dirty rectangle based rendering this way by copying the modified areas for the last 2 frames instead of 1? |
practically yes (i think), but that's highly specific to the system. though you could probably measure in your code after how many calls the pixels pointer is restored to the first value, so you know whether double or triple-buffering is at play (if at all). |
Looks great, thanks! |
instead of dealing with renderers and textures, this allows you to create a window, set the pixel format using SDL_SetWindowDisplayMode() - preferably BGR565 for optimal speed - and then request SDL_GetWindowSurface() for each redraw, which provides you a surface with direct framebuffer access - that's why you need to call it on every repaint, rather than storing the surface variable.
note that you need to overwrite all pixels of the PSP visible area (480x272) to not encounter old data.
after you're done writing your pixels, call SDL_UpdateWindowSurface() and your changes will be sent to the graphics chip.
the result is a framerate of 60+ fps with BGR565 mode, unlike the other methods involving renderer and textures and a lot of copying.
Existing Issue(s)
#9260