Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PSP: implement SDL_GetWindowSurface() API for direct VRAM access #9267

Merged
merged 2 commits into from
Mar 24, 2024

Conversation

rofl0r
Copy link
Contributor

@rofl0r rofl0r commented Mar 14, 2024

instead of dealing with renderers and textures, this allows you to create a window, set the pixel format using SDL_SetWindowDisplayMode() - preferably BGR565 for optimal speed - and then request SDL_GetWindowSurface() for each redraw, which provides you a surface with direct framebuffer access - that's why you need to call it on every repaint, rather than storing the surface variable.

note that you need to overwrite all pixels of the PSP visible area (480x272) to not encounter old data.

after you're done writing your pixels, call SDL_UpdateWindowSurface() and your changes will be sent to the graphics chip.

the result is a framerate of 60+ fps with BGR565 mode, unlike the other methods involving renderer and textures and a lot of copying.

Existing Issue(s)

#9260

@fjtrujy
Copy link
Contributor

fjtrujy commented Mar 14, 2024

Good stuff by the way!

@rofl0r rofl0r force-pushed the psp_fb branch 2 times, most recently from 4616653 to c4a108a Compare March 14, 2024 11:03
Copy link
Contributor

@fjtrujy fjtrujy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

@sharkwouter
Copy link
Collaborator

Hey @rofl0r thanks for implementing this, this is a really cool change. I'll do some testing with it tonight and merge it.

@slouken
Copy link
Collaborator

slouken commented Mar 14, 2024

Rather than reaching into the PSP renderer internals, you should set properties on the renderer that will make it possible for this approach to be used by applications as well. The direct3d12 renderer uses this approach to expose the command queue in SDL_PROP_RENDERER_D3D12_COMMAND_QUEUE_POINTER for example.

Also, please remove the change using the texture access as a mask. If this is really necessary we should evaluate changing that field to a Uint32 mask for SDL3, and that can be done as a separate PR.

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 14, 2024

Also, please remove the change using the texture access as a mask. If this is really necessary we should evaluate changing that field to a Uint32 mask for SDL3, and that can be done as a separate PR.

oops sorry, somehow that crept in. fixed now

Rather than reaching into the PSP renderer internals, you should set properties on the renderer that will make it possible for this approach to be used by applications as well.

you mean getting the front/back pointers from the renderer ? not sure what you mean with applications. /me off reading the d3d12 code...

@slouken
Copy link
Collaborator

slouken commented Mar 14, 2024

Rather than reaching into the PSP renderer internals, you should set properties on the renderer that will make it possible for this approach to be used by applications as well.

you mean getting the front/back pointers from the renderer ? not sure what you mean with applications. /me off reading the d3d12 code...

Yes, maybe have something like:
SDL_PROP_RENDERER_PSP_FRONT_BUFFER_POINTER
SDL_PROP_RENDERER_PSP_FRONT_BUFFER_PITCH_NUMBER
SDL_PROP_RENDERER_PSP_BACK_BUFFER_POINTER
SDL_PROP_RENDERER_PSP_BACK_BUFFER_PITCH_NUMBER

That way, any code that instantiates the PSP renderer will have access to the front and back buffer if they need it, and you don't need to reach into the PSP renderer internals from the video subsystem. We try to keep the video and renderer subsystems mostly independent as a way of keeping ourselves honest about what an application can do outside of SDL.

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 14, 2024

hmm, it seems SDL_GetProperty is an SDL3-only feature. how do you want me to tackle this for the SDL2 branch ? is it OK if i create a property enum and a special function to query it inside PSP_render.c ?

@slouken
Copy link
Collaborator

slouken commented Mar 14, 2024

For SDL2 it's fine to leave as-is. Can you port this to SDL3 and use properties in that PR?

@slouken
Copy link
Collaborator

slouken commented Mar 14, 2024

hmm, it seems SDL_GetProperty is an SDL3-only feature. how do you want me to tackle this for the SDL2 branch ? is it OK if i create a property enum and a special function to query it inside PSP_render.c ?

Actually yeah, that seems like a better approach than splitting apart the PSP rendering internals to a separate header.

@sharkwouter
Copy link
Collaborator

Do you have some example code for testing this? I'm looking at this now, but I'm not sure how to implement it in my code.

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 15, 2024

Can you port this to SDL3 and use properties in that PR?

Certainly, will start working on it as soon as this is merged

Actually yeah, that seems like a better approach than splitting apart the PSP rendering internals to a separate header.

done

Do you have some example code for testing this? I'm looking at this now, but I'm not sure how to implement it in my code.

i made sure to put all relevant info the commit message, but maybe this not-yet-cleaned-up diff for gnuboy using the new method helps more:

diff --git a/sys/sdl2/sdl2.c b/sys/sdl2/sdl2.c
index 9013348..340fff7 100644
--- a/sys/sdl2/sdl2.c
+++ b/sys/sdl2/sdl2.c
@@ -18,6 +18,17 @@
 #include "input.h"
 #include "rc.h"
 
+#if 0 && defined(__psp__)
+extern int sceMpegRingbufferConstruct(void *r, int i, void*p, int s, void*c, void* pa);
+/* shows n in the PPSSPP debug output */
+static void ppsspp_debug_int(int n) {
+        sceMpegRingbufferConstruct(0, n, 0, 0, 0, 0);
+}
+#else
+#include <stdio.h>
+static void ppsspp_debug_int(int n) { dprintf(2, "%d\n", n); }
+#endif
+
 extern void sdljoy_process_event(SDL_Event *event);
 
 struct fb fb;
@@ -25,10 +36,14 @@ struct fb fb;
 static int fullscreen = 0;
 static int use_altenter = 1;
 static int vsync;
+static int softfb;
 
 static SDL_Window *win;
 static SDL_Renderer *renderer;
 static SDL_Texture *texture;
+static SDL_Surface *fbs;
+
+static float stretchh, stretchv;
 
 static int vmode[3] = { 0, 0, 32 };
 
@@ -38,6 +53,8 @@ rcvar_t vid_exports[] =
 	RCV_VECTOR("vmode", &vmode, 3, "video mode: w h bpp"),
 	RCV_BOOL("fullscreen", &fullscreen, "start in fullscreen mode"),
 	RCV_BOOL("altenter", &use_altenter, "alt-enter can toggle fullscreen"),
+	RCV_FLOAT("stretchh", &stretchh, "horizontal stretch factor"),
+	RCV_FLOAT("stretchv", &stretchv, "vertical stretch factor"),
 	RCV_END
 };
 
@@ -72,12 +89,26 @@ void vid_init()
 	SDL_PixelFormat *format;
 	SDL_RendererInfo info;
 
+        softfb = 0;
+
+	stretchh = rc_getfloat("stretchh");
+	stretchv = rc_getfloat("stretchv");
+
 	if (!vmode[0] || !vmode[1])
 	{
 		if (scale < 1) scale = 1;
 		vmode[0] = 160 * scale;
 		vmode[1] = 144 * scale;
 	}
+#ifdef __psp__
+	vmode[0] = 480;
+	vmode[1] = 272;
+	vmode[2] = 16;
+	stretchh = 480 / 160;
+	stretchv = 272.0 / 144.0;
+	softfb = 1;
+	fullscreen = 1;
+#endif
 	if (vmode[2] == 32)
 		fmt = SDL_PIXELFORMAT_BGRA32;
 	else if(vmode[2] == 16)
@@ -99,6 +130,17 @@ void vid_init()
 		SDL_WINDOWPOS_UNDEFINED, vmode[0], vmode[1], flags)))
 		die("SDL: can't set video mode: %s\n", SDL_GetError());
 
+	if(softfb) {
+		SDL_DisplayMode dm = {.format = fmt, .w = vmode[0], .h = vmode[1] };
+		if(SDL_SetWindowDisplayMode(win, &dm) < 0)
+			die("SDL_SetWindowDisplayMode failed %s\n", SDL_GetError());
+		// SDL_BlitScaled
+		if(!(fbs = SDL_CreateRGBSurfaceWithFormat(0, dm.w, dm.h, vmode[2], fmt)))
+			die("SDL_CreateRGBSurfaceWithFormat failed %s\n", SDL_GetError());
+		pixels = fbs->pixels;
+		pitch = fbs->pitch;
+	} else {
+
 	/* for SDL2, which uses OpenGL, we internally use scale 1 and
 	   render everything into a 32bit high color buffer, and let the
 	   hardware do the scaling; thus "fb.delegate_scaling" */
@@ -122,10 +164,12 @@ void vid_init()
 	texture = SDL_CreateTexture(renderer, fmt,
 			SDL_TEXTUREACCESS_STREAMING, 160, 144);
 
-	SDL_ShowCursor(0);
+	SDL_LockTexture(texture, NULL, &pixels, &pitch);
+	}
+
+	//SDL_ShowCursor(0);
 
 	format = SDL_AllocFormat(fmt);
-	SDL_LockTexture(texture, NULL, &pixels, &pitch);
 
 	fb.delegate_scaling = 1;
 	fb.w = 160;
@@ -142,9 +186,12 @@ void vid_init()
 	fb.cc[2].r = format->Bloss;
 	fb.cc[2].l = format->Bshift;
 
-	SDL_UnlockTexture(texture);
+	if(!softfb)
+		SDL_UnlockTexture(texture);
+
 	SDL_FreeFormat(format);
 
+	ppsspp_debug_int(1);
 	fb.enabled = 1;
 	fb.dirty = 0;
 }
@@ -226,20 +273,41 @@ void vid_settitle(char *title)
 	SDL_SetWindowTitle(win, title);
 }
 
+static int drawtime;
+
 void vid_begin()
 {
 	void *p = 0;
 	int pitch;
-	SDL_LockTexture(texture, NULL, &p, &pitch);
+	if(!softfb)
+		SDL_LockTexture(texture, NULL, &p, &pitch);
+	else
+		p = fbs->pixels;
 	fb.ptr = p;
+	drawtime = SDL_GetTicks();
 }
 
 void vid_end()
 {
-	SDL_UnlockTexture(texture);
-	if (fb.enabled) {
-		SDL_RenderCopy(renderer, texture, NULL, NULL);
-		SDL_RenderPresent(renderer);
+	SDL_Rect destrect = {0, 0, 160*stretchh, 144*stretchv}, *d;
+	static const SDL_Rect srcrect = {0, 0, 160, 144};
+	if(stretchh > 0.0) d = &destrect;
+	else d = NULL;
+
+	if(!softfb) {
+		SDL_UnlockTexture(texture);
+		if (fb.enabled) {
+			SDL_RenderCopy(renderer, texture, d, d);
+			SDL_RenderPresent(renderer);
+		}
+	} else if(fb.enabled) {
+		SDL_Surface *screen;
+		if(!(screen = SDL_GetWindowSurface(win)))
+			die("SDL_GetWindowSurface failed %s\n", SDL_GetError());
+		SDL_BlitScaled(fbs, &srcrect, screen, d);
+		SDL_UpdateWindowSurface(win);
+		drawtime = SDL_GetTicks() - drawtime;
+		ppsspp_debug_int(drawtime);
 	}
 }
 

the old method is used if softfb == 0, the new one if softfb == 1

note that this code isn't optimal, because the gameboy screen is 160x144 and needs to be stretched - so here i first render on a normal surface and then stretch onto the vram buffer (in software) with SDL_BlitScaled. however that one indirection is already sufficient that the framerate drops around 40. for this application it might be preferable to use actual textures and let the GE do the stretching.

however without SDL_TEXTUREACCESS_STREAMING you can't get the memory behind a texture directly (via SDL_LockTexture) - unless i missed something - , and without SDL_TEXTUREACCESS_TARGET the texture won't be created in vram. with the changes i had in the original form of the PR it was actually possible to use both modes ored together, so getting a texture on screen involved only a single copy.

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 15, 2024

@slouken would it be acceptable to backport SDL properties to SDL2 ? if so i could use the same code for both branches, and introduce a property to tell the renderer that a texture is already swizzled (or do you see a better way to achieve this ?). when reading about the speed benefits of PSP swizzling, it seems there's a great speedup to be achieved if the SDL user already prepares the image data in a swizlled form.

@fjtrujy it appears the ShouldSwizzle function has the logic reversed - the leading ! should be removed afaict

@slouken
Copy link
Collaborator

slouken commented Mar 15, 2024

No, the properties is a massive API change and can't be backported to SDL2. If you'd like to make this an SDL3 only feature, that's fine. We're wrapping up the API changes for SDL3 preview release soon, so the timing works out well.

@slouken
Copy link
Collaborator

slouken commented Mar 15, 2024

It seems like PSP textures should always convert to swizzled form when being updated, for speed. Conceptually the application can't assume anything about the data backing the texture, it might be on the video card, or tiled or compressed for optimal usage in rendering.

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 21, 2024

@sharkwouter did you get an opportunity to test ?

@sharkwouter
Copy link
Collaborator

I found that this change does not break existing functionality, but would need an example of how to use the code to test more.

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 22, 2024

/me presents silver platter

main.c:

#include <SDL2/SDL.h>
#include <stdlib.h>
#include <stdio.h>

#define PIXEL_FMT SDL_PIXELFORMAT_BGR565

SDL_Window *win;
SDL_Surface *sur;

typedef unsigned short pix;
typedef unsigned char uchar;

static pix *get_vram_and_pitch(unsigned *pitch) {
	sur = SDL_GetWindowSurface(win);
	if(sur == 0) return 0;
	*pitch = sur->pitch;
	return sur->pixels;
}

static void release_vram(void) {
	SDL_UpdateWindowSurface(win);
	sur = 0;
	// if you use other drawing methods, such as a texture, release/unlock
	// it here.
}

static void draw_font(pix *p, unsigned pitch, int x, int y, int ch) {
	int xx, yy;
	pix* pline;
	pitch /= sizeof(pix);
	p += y*pitch;
	pline = p;
	// reuse font from pspDebugPrintf
	extern uchar msx[];
	uchar *font = &msx[(ch&0xff)*8];
	for(yy = y; yy < y+8; ++yy, ++font) {
		p = pline + x;
		for(xx = x; xx < x+8; ++xx)
			*(p++) = *font & (128 >> (xx-x)) ? 0x0000 : 0xffff;
		pline += pitch;
	}
}

static void draw_bg(pix *p, unsigned pitch) {
	unsigned x, y;
	pix *pline = p, bg = rand()%0xffff;
	pitch /= sizeof(pix);
	for(y = 0; y < 272; ++y) {
		for(x = 0; x < 480; ++x)
			*(p++) = bg;
		pline += pitch;
		p = pline;
	}
}

static void draw_frame() {
	static long drawtime;
	if(drawtime == 0) drawtime = 1;
	long drawstart = SDL_GetTicks();
	unsigned x, i, pitch;
	pix *pixels;
	char buf[20];
	snprintf(buf, sizeof buf, "fps %04.2f", 1000.0/(float)drawtime);
	pixels = get_vram_and_pitch(&pitch);
	if(!pixels) exit(1);
	draw_bg(pixels, pitch);
	for(i = 0, x = 240; i < strlen(buf); x += 8, ++i)
		draw_font(pixels, pitch, x, 136, buf[i]);
	release_vram();
	drawtime = SDL_GetTicks() - drawstart;
}

int SDL_main(int argc, char **argv) {
	srand(0x1337);
        if (SDL_Init(SDL_INIT_VIDEO) < 0) {
		dprintf(2, "Could not initialize SDL: %s\n", SDL_GetError());
		return 1;
	}
	win = SDL_CreateWindow("sdl-demo",
                             SDL_WINDOWPOS_UNDEFINED,
                             SDL_WINDOWPOS_UNDEFINED,
                             480, 272,
                             SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE);
	SDL_DisplayMode dm = {.format = PIXEL_FMT, .w = 480, .h = 272 };
	if(SDL_SetWindowDisplayMode(win, &dm) < 0) {
		dprintf(2, "SDL_SetWindowDisplayMode failed %s\n", SDL_GetError());
	}
	while(1) {
		SDL_Event ev;
		if(SDL_PollEvent(&ev) && ev.type == SDL_QUIT)
			break;
		draw_frame();
	}
	return 0;
}

Makefile:

TARGET = All
OBJS = main.o
BUILD_PRX=0

CFLAGS = -O2 -g -mno-gpopt -Wall
ASFLAGS = $(CFLAGS)

SDL_LIBS := $(shell pkg-config --static --libs sdl2)

LIBS = $(SDL_LIBS) -lm -lpspdebug

EXTRA_TARGETS = EBOOT.PBP
PSP_EBOOT_TITLE =
PSP_EBOOT_ICON =
PSP_EBOOT_PIC1 =

PSPSDK=$(shell psp-config --pspsdk-path)
include $(PSPSDK)/lib/build.mak

if all one needs is a raw framebuffer to the PSP's vram,
instead of dealing with renderers and textures, that need to be
copied hence and forth, this method allows one to create a window,
set the pixel format using SDL_SetWindowDisplayMode() - preferably
BGR565 for optimal speed (the other possible natively supported
option is ABGR8888) - and then request SDL_GetWindowSurface(),
which provides one with a surface with direct framebuffer access.
note that the pixels pointer inside the surface will be switched
after each call because of double-buffering.

it's advisable to overwrite all pixels of the PSP visible area
(480x272) to not encounter old data.

after writing the pixels, a call to SDL_UpdateWindowSurface()
sends the changes to the graphics chip.

the result is a raw framerate of 250 fps with BGR565 mode, under
optimal circumstances - i.e. nothing else is done than drawing,
and the drawing loop is as simple as possible.
that leaves about 12 ms per frame for other tasks and still allow
a fluent 60 fps.
@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 23, 2024

updated PR with a few SDL3-style fixes and rebased on current SDL2 HEAD.

@ccawley2011
Copy link
Contributor

Would it be possible for applications to detect if the video driver is using double/triple buffering? This seems like a feature that would be useful on RISC OS as well.

Also, would it be necessary for the framebuffer to be lockable if the pointer to the pixel data can change, or would it be sufficient to explicitly document that the pointer may change after calling SDL_UpdateWindowSurface()?

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 23, 2024

would it be necessary for the framebuffer to be lockable if the pointer to the pixel data can change

in the case of the PSP not, because its OS implements cooperative multitasking, no real hardware threads. this has caused me quite some headache already before i found that out, because i've been waiting for the sound thread to do its work, but nothing happened. the OS thread scheduler only kicks in on blocking syscalls. (for platforms like this, SDL should actually have a function like SDL_yield(), because abusing SDL_Delay(1) for this purpose already slims down the available time budget for 60 fps considerably).

would it be sufficient to explicitly document that the pointer may change after calling SDL_UpdateWindowSurface()

that would certainly make sense, yes

Copy link
Collaborator

@sharkwouter sharkwouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested the example you sent and it works like expected. Existing code is also not impacted by this change. Thanks for this work, from my side this is approved.

@slouken
Copy link
Collaborator

slouken commented Mar 24, 2024

@rofl0r, can you update your PR with any documentation that's needed? Once you've done that, I'll go ahead and merge this.

Also, I'm about to create an SDL 3.1.0 preview release. Do you want this in for that?

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 24, 2024

can you update your PR with any documentation that's needed?

will do it in a few mins

Also, I'm about to create an SDL 3.1.0 preview release. Do you want this in for that?

the getwindowsurface api already exists, so i guess it's nbd if the fast version for PSP is missing for the preview.

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 24, 2024

@slouken does this look alright ?

@ccawley2011
Copy link
Contributor

note that you need to overwrite all pixels of the PSP visible area (480x272) to not encounter old data.

Just to confirm, is it still possible to implement dirty rectangle based rendering this way by copying the modified areas for the last 2 frames instead of 1?

@rofl0r
Copy link
Contributor Author

rofl0r commented Mar 24, 2024

Just to confirm, is it still possible to implement dirty rectangle based rendering this way by copying the modified areas for the last 2 frames instead of 1?

practically yes (i think), but that's highly specific to the system. though you could probably measure in your code after how many calls the pixels pointer is restored to the first value, so you know whether double or triple-buffering is at play (if at all).

@slouken
Copy link
Collaborator

slouken commented Mar 24, 2024

@slouken does this look alright ?

Looks great, thanks!

@slouken slouken merged commit 6ba3e56 into libsdl-org:SDL2 Mar 24, 2024
35 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants