In mGBA 0.8.4, the UMLAL instruction yields incorrect results when rd_hi and rd_lo are the same register.
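On hardware (3DS + open-agb-firm), UMLAL r4, r4, r4, r4 with r4 = 20001h sets r4 to 20005h. mGBA sets r4 to 60006h. Presumably, this is the correct order of operations, with rd_hi written last so that its value is the one that remains when both destinations are the same register: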
a = do_unsigned_long_multiplication(rm, rs)
a += concat(rd_hi, rd_lo)
rd_lo = a_lo
rd_hi = a_hi
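
For reference, here is a minimal C sketch of that ordering. It is not taken from mGBA's source: the `Regs` struct and the `umlal` and `umlal_same_register_case` functions are hypothetical names made up for illustration. It only assumes that all operands are read before any destination is written, and that rd_lo is written before rd_hi.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file for illustration: 16 32-bit registers. */
typedef struct {
    uint32_t r[16];
} Regs;

/* UMLAL RdLo, RdHi, Rm, Rs under the presumed ordering:
 * read every operand first, then write RdLo, then RdHi. */
void umlal(Regs *regs, int rd_lo, int rd_hi, int rm, int rs) {
    assert(rd_lo >= 0 && rd_lo < 16 && rd_hi >= 0 && rd_hi < 16);
    assert(rm >= 0 && rm < 16 && rs >= 0 && rs < 16);

    /* a = do_unsigned_long_multiplication(rm, rs) */
    uint64_t a = (uint64_t)regs->r[rm] * (uint64_t)regs->r[rs];

    /* a += concat(rd_hi, rd_lo), using the values from before any write */
    a += ((uint64_t)regs->r[rd_hi] << 32) | (uint64_t)regs->r[rd_lo];

    /* rd_lo first, rd_hi last: if both name the same register, a_hi remains. */
    regs->r[rd_lo] = (uint32_t)a;
    regs->r[rd_hi] = (uint32_t)(a >> 32);
}

/* The case from this report: UMLAL r4, r4, r4, r4 with r4 = 20001h. */
uint32_t umlal_same_register_case(void) {
    Regs regs = {{0}};
    regs.r[4] = 0x20001u;
    umlal(&regs, 4, 4, 4, 4);
    return regs.r[4]; /* 0x20005, matching the hardware result */
}
```

Worked through by hand: 20001h * 20001h = 400040001h, adding 0002000100020001h gives 0002000500060002h, so a_lo = 60002h and a_hi = 20005h. Writing a_hi last leaves r4 = 20005h, which is what the hardware reports.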

