Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List of broken examples #26

Closed
ijacquez opened this issue Nov 18, 2018 · 16 comments
Closed

List of broken examples #26

ijacquez opened this issue Nov 18, 2018 · 16 comments

Comments

@ijacquez
Copy link

@ijacquez ijacquez commented Nov 18, 2018

I ran through these examples, and those marked FAILED are the ones that caused some sort of exception. I compiled each example with PSP_FW_REVISION=660.

Below is the list.

Status Filename
FAILED morphskin/morphskin.prx
PASSED shadowprojection/shadowprojection.prx
PASSED logic/logic.prx
PASSED speed/speed.prx
FAILED envmap/envmap.prx
FAILED beginobject/beginobject.prx
FAILED splinesurface/splinesurface.prx
PASSED blend/blend.prx
FAILED sprite/sprite.prx
FAILED spharm/spharm.prx
PASSED skinning/skinning.prx
PASSED blit/blit.prx
PASSED copy/copy.prx
PASSED text/gufont.prx
FAILED morph/morph.prx
FAILED reflection/reflection.prx
FAILED zbufferfog/zbufferfog.prx
FAILED cube/cube.prx
FAILED integerdrawing/integerdrawing.prx
FAILED vertex/vertex.prx
FAILED timing/timing.prx
FAILED lights/lights.prx
FAILED signals/signals.prx
PASSED lines/lines.prx
PASSED clut/clut.prx
FAILED ortho/ortho.prx
FAILED rendertarget/rendertarget.prx
FAILED celshading/celshading.prx
@carstene1ns
Copy link
Member

@carstene1ns carstene1ns commented Nov 20, 2018

Just a quick guess:
Did you build them with a (recent) toolchain version that has the fix for double precision crash?
See pspdev/psptoolchain#85 and pspdev/psptoolchain#65 for reference.

@ijacquez
Copy link
Author

@ijacquez ijacquez commented Nov 20, 2018

I just double checked, and I'm at a435ee0df19d8e07fa4c0a01f1f47fb7aeca834d, which is the HEAD.

@yoanlcq
Copy link

@yoanlcq yoanlcq commented Nov 30, 2018

This looks similar to the situation I was in when I started out (that is, recently), and I remember that setting USE_PSPSDK_LIBC = 1 in the Makefiles (or passing it as argument to make) did the trick for me.
Some seemingly basic floating-point operations caused a crash otherwise (these were easy to figure out by sprinkling printf() calls through one of the samples).

I didn't dig too deep and this does sound like a bug, but hopefully that could help.

(edit: I'm always building with PSP_FW_VERSION = 635.)

@ijacquez
Copy link
Author

@ijacquez ijacquez commented Dec 1, 2018

Thanks for that. I'll give it a try.

@ijacquez
Copy link
Author

@ijacquez ijacquez commented Dec 2, 2018

First run of cube.prx and a few other tests I've marked as failed do work.

Yeah, this may be a bug. Can anyone confirm?

@carstene1ns
Copy link
Member

@carstene1ns carstene1ns commented Dec 2, 2018

So, something wrong with our newlib? Very strange.

@ijacquez
Copy link
Author

@ijacquez ijacquez commented Dec 2, 2018

I haven't checked, but Newlib being built with floating point support?

@yoanlcq
Copy link

@yoanlcq yoanlcq commented Dec 8, 2018

I tried to understand the issue with the cube sample. It turns out that sceGumPerspective() is the culprit in this particular case, because the remainder of the code works just fine.
Since the sample links against pspgum and not pspgum_vfpu, the source for sceGumPerspective() is the following :

void sceGumPerspective(float fovy, float aspect, float near, float far)
{
	float angle = (fovy / 2) * (GU_PI/180.0f);
	float cotangent = cosf(angle) / sinf(angle);
	float delta_z = near-far;
	register ScePspFMatrix4* t __asm("a0") = GUM_ALIGNED_MATRIX();

	memset(t,0,sizeof(ScePspFMatrix4));
	t->x.x = cotangent / aspect;
	t->y.y = cotangent;
	t->z.z = (far + near) / delta_z; // -(far + near) / delta_z
	t->w.z = 2.0f * (far * near) / delta_z; // -2 * (far * near) / delta_z
	t->z.w = -1.0f;
	t->w.w = 0.0f;

	sceGumMultMatrix(t);
}

With

// these macros are because GCC cannot handle aligned matrices declared on the stack
#define GUM_ALIGNED_MATRIX() (ScePspFMatrix4*)((((unsigned int)alloca(sizeof(ScePspFMatrix4)+64)) + 63) & ~63)
#define GUM_ALIGNED_VECTOR() (ScePspFVector4*)((((unsigned int)alloca(sizeof(ScePspFVector4)+64)) + 63) & ~63)

Debugging in psp-gdb with PSPLINK shows that a store exception happens at the assignment immediately following the call to memset(). From the disassembly, I'm assuming that alloca() did resolve to GCC's __builtin_alloca(), because there's no jal alloca instruction. I also see that the call to memset() was indeed compiled as a library call, and was not inlined by GCC.

Here's the thing: newlib's memset() clobbers the a0 register, meaning that t points to garbage after the call. PSPSDK libc's memset() does not; I've reached this conclusion by looking at the disassembly of both functions in psp-gdb.

It looks to me like the fix would be to remove the __asm("a0") part when declaring t. Why would that be so important? In fact I don't see the point in doing an alloca() either. ScePspFMatrix4 is declared with an alignment of 16. If "GCC cannot handle aligned matrices declared on the stack" as the comment says, then isn't all user code screwed anyway? That looks unlikely to me.

What I would do is simply declare t just like a regular variable, and spare the call to memset() altogether via zero-initialization. Is there something critical I'm missing?

Worth noting is that sceGumOrtho() does a similar trick.

@ijacquez
Copy link
Author

@ijacquez ijacquez commented Dec 10, 2018

Yeah, this all seems rather odd. I'm fairly sure that __attribute__ ((aligned(64))) would be sufficient for ScePspFVector4?

@yoanlcq
Copy link

@yoanlcq yoanlcq commented Dec 10, 2018

I'm fairly sure that attribute ((aligned(64))) would be sufficient for ScePspFVector4?

Maybe you meant ScePspFMatrix4? Such a requirement on ScePspFVector4 would forbid contiguous arrays of it... Or maybe you mean putting the attribute at the declaration site instead of the type itself, which looks reasonable (way better than resorting to alloca()... yikes!)

From what I gather, 16-byte alignment is a requirement for VFPU "unaligned" load/stores (ulv.q/usv.q), and 64-byte alignment is a requirement for VFPU "aligned" load/stores (lv.q/sv.q) (none of which pspgum actually needs....)

There are these pieces of info in the chapter 4 of Hitmen's Yet Another PSP Documentation:

VFPU load/store instructions seem to support only 16-byte-aligned accesses (similiar to Altivec and SSE).

But right below, in the section for lv and sv (Load/Store Vector Quadword) instructions :

Final Address needs to be 64-byte aligned.

That second statement was confusing to me at first, but I think I "got it" by looking at how pspgum_vfpu.c manages its loads and stores.

About the a0 stuff, I'm surprised GCC doesn't do "the right thing" by saving it or something, if it knows that the ABI allows clobbering it, but oh well. This makes me wonder which other pieces of code are doing such a thing in the SDK, probably accidents waiting to happen.
In any case let's just remove that __asm("a0") and call it a day; that's the only problematic part really x).

(For posterity, I think this issue's title should be changed to match what we know now :) )

@ijacquez
Copy link
Author

@ijacquez ijacquez commented Dec 10, 2018

Yes, I referred to the wrong one. Adding the attribute to the declaration site is preferred.

I'll run a few tests, removing the need for the a0 register.

@dialupnoises
Copy link

@dialupnoises dialupnoises commented Apr 8, 2019

Can confirm that removing that __asm("a0") bit fixes the issue.

@albe
Copy link

@albe albe commented Aug 24, 2019

"raphael" here from old ps2dev days. I spent a couple years playing with digging into the VFPU opcodes and also added some GE related code to the pspsdk.

From what I recall from back then (that's been ages 😱), GCC had a couple of bugs, especially with alignment of variables on stack. Since the VFPU, but also the GE was pretty strict with the alignment requirements, this hack with aligning the matrix with alloca was necessary, though I'm not sure why the tacking onto the "a0" register was in place.

@ijacquez
Copy link
Author

@ijacquez ijacquez commented Sep 4, 2019

Thanks!

Do we at least know when that GCC bug occurred? If there is still a need for backwards compatibility with older versions of GCC, then we could wrap these workarounds around a pre-processor GCC version check.

@albe
Copy link

@albe albe commented Sep 4, 2019

Uhhh, I don't really remember which GCC version exactly I used back then. I roughly remember something 4.0 or 4.1 at least at some point, which would roughly fit https://gcc.gnu.org/releases.html and the fact that I did a lot of that deeper stuff in 2006-07 (see e.g. http://lukasz.dk/mirror/forums.ps2dev.org/viewtopic35a4.html?t=6929#47791)

But if the removal of just the __asm("a0") part fixes things, that should be fine from my gut-feeling.

carstene1ns added a commit that referenced this issue Jul 10, 2020
Fixes some examples (see #26)

Reason is newlib's `memset()` clobbers the a0 register, meaning that `t`
points to garbage after the call.
@carstene1ns
Copy link
Member

@carstene1ns carstene1ns commented Jul 10, 2020

I have pushed a commit that has the removes the assembly register call. Tested some examples in PPSSPP (cube example even on hardware) and all worked.
Would be nice if we can get some other hacks (like the alloca() stuff mentioned earlier) out of the SDK in the future, but we need some contributors.

Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants