Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory refactorings #313

Merged
merged 66 commits into from Oct 14, 2017
Merged

Conversation

bsmiles32
Copy link
Member

@bsmiles32 bsmiles32 commented Jun 15, 2017

DON'T MERGE !
This is a WIP, with several refactorings concerning mainly the memory module :

  • now the memory module only deals with "physical" mappings (virtual to physical translation is done in r4300)
  • no more usage of global variables in this module
  • single allocation of memory for all device memories. I allocate 512M, but only a fraction gets used (eg RAM, ROM, PIF_RAM, SP_MEM). This simplify the fast_mem_access function which gives a nice speed boost on the pure_interpreter. This also paves the way for the "fast memory" implementation.
  • memory mappings are defined in the device module
  • pifbootrom has been reworked to not use the device structure, but only r4300_core struct. This also paves the way for external PIFBootROM support. Not there yet, but closer.

edit: It also breaks the cached_interpreter, don't know why. And I didn't really bother with the new_dynarec so it should break it quite badly.

When I get some time, I will try to rework it and put it into shape. But early testers are welcome :)
Especially, I'd like to know if it improves the emulation speed on other computers without bad regressions.

@loganmc10
Copy link
Member

I tried Super Mario and Smash Bros, neither booted with x64 dynarec or either interpreter, it didn't crash, but CPU was at 100%, here is what GDB had if I Ctrl+C'ed out of the program:

(gdb) backtrace
#0  SPECIAL (PC=3915365300, inst=<optimized out>) at ../..//su.c:1702
#1  run_task () at ../..//su.c:2006
#2  0x00007fffccc1e7c9 in DoRspCycles (cycles=4294967295) at ../..//module.c:375
#3  0x00007ffff05edead in do_SP_Task (sp=0x7ffff1939718 <g_dev+17834392>) at ../../src/device/rsp/rsp_core.c:297
#4  0x00007ffff05eb583 in r4300_write_aligned_word (r4300=0x7ffff0837580 <g_dev>, address=67371024, value=1, mask=<optimized out>) at ../../src/device/r4300/r4300_core.c:393
#5  0x00007ffff05d0eee in pifbootrom_hle_execute (r4300=0x7ffff0837580 <g_dev>) at ../../src/device/pifbootrom/pifbootrom.c:74
#6  0x00007ffff05f5e64 in main_run () at ../../src/main/main.c:1120
#7  0x00007ffff05d834d in CoreDoCommand (Command=<optimized out>, ParamInt=<optimized out>, ParamPtr=<optimized out>, ParamPtr=<optimized out>, ParamInt=<optimized out>, 
    Command=<optimized out>) at ../../src/api/frontend.c:225
#8  0x0000000000402e7f in main (argc=<optimized out>, argv=<optimized out>) at ../../src/main.c:798

I tested this using rsp-cxd4 in HLE video mode

@loganmc10
Copy link
Member

Also, malloc'ing 512MB of RAM will be pretty tight on some ARM devices. A lot of Android devices only have 2GB of RAM, my Shield Tablet has 2GB and it says only 487 is free right now (although I assume it would just close some old apps if an app requested memory, so it's probably not a problem). The Raspberry Pi also has 1GB of RAM.

Is there some way to have a "fallback mode"? Either if the malloc fails or if the user requests it. I'm really excited for fast memory access so I think it should go forward, but it might mean that performance actually suffers on low memory devices

@bsmiles32
Copy link
Member Author

Normally the kernel should lazy alloc it, so only pages that are really accessed are allocated. I guess it should be no problem.

@bsmiles32
Copy link
Member Author

bsmiles32 commented Jun 15, 2017

Several fixes which re-enable the cached interpreter and improve pifbootrom hle implementation.
As far as I understand, only new dynarec needs fixing and that is mostly at commit 424938e because memory mappings are now defined with physical addresses instead of virtual addresses.
@Gillou68310 : any ideas on how to fix this. Because previously the new_dynarec hooked his own MMIO routines to do its own TLB operations. But with physical mappings this hooking is out of question.

@loganmc10
Copy link
Member

Crashing is fixed for me, tested several games on x64 dynarec and Cached/Pure Interpreter, seems to be working well now

@loganmc10
Copy link
Member

Just tried compiling x86 new dynarec, I get this error on linking:

_obj/device/r4300/new_dynarec/x86/linkage_x86.o:../../src/device/r:(.text+0x5c8): undefined reference to `write_mi'

I'm not sure if the error is a result of a commit from this PR or the previous one

@bsmiles32
Copy link
Member Author

New_dynarec requires some more work unfortunately. I will need help from Gillou on this, because I'm not familiar with how the new_dynarec does code invalidation, and TLB translation.
Still if you manage to hack around and get the pure or cached interpreters to run on your rpi I'm interested in your results 😀

@loganmc10
Copy link
Member

I've tested the Pure Interpreter inside my VMware machine. In my testing this PR saves about 2% CPU time in Pure Interpreter mode on average. Sometimes it was pretty much the same, I would just run the same scene in both versions and compare what I saw in "top", so the testing wasn't anything too scientific. I don't see any regressions anyway.

@bsmiles32
Copy link
Member Author

Good ! Thanks for testing ! 2% now, hope for some more with full fast mem 😀

@bsmiles32
Copy link
Member Author

This also validate the big allocation approach. So this is good.

@loganmc10
Copy link
Member

My dream would be to get to the point where the cached interpreter would be fast enough on mobile devices to get rid of the dynarec's altogether (but this is probably an unrealistic dream). I think having an arm dynarec without anyone with enough time or knowledge to maintain it is really going to hold things back.

@bsmiles32
Copy link
Member Author

I feel the same way ! Only one person knows how to work with the new dynarec... And its not me. So all my refactorings take much more time than needed. I have hope (and maybe a long term plan) to unify the pure,cached and both dynarec. But this will take more time than I have now.

@bsmiles32
Copy link
Member Author

Tried something to fix the new dynarec. Don't know if it will work, but it's a step closer I guess.

@ricrpi
Copy link
Contributor

ricrpi commented Jun 18, 2017

What if only 128MB + ROM size is used and a catch for PIF access?

Also how does Linux handle lazy allocation if MAP_LOCKED is used with mmap() to try and reduce page faults?

@bsmiles32
Copy link
Member Author

Not really sure, but from what I undestand it allocates 512MB of virtual memory, then when accesses are made physical pages are allocated when needed. Of all of these 512MB only ram_size+rom_size+rsp_mem_size+pif_boot_rom_size+pif_ram_size+rounding to next physical page size will get physically allocated (approximately).
Note that for now no pagefault mecanism is in use (this is not yet the fast mem approach). This just make all N64 memories accessible through à single pointer, which gives a small speed boost on fast_mem_address due to the removed if/else logic at the cost of a small increase of memory usage.
The next step is indeed to change memory protection of region with mmio semantic in order to trigger a pagefault which will allow to call the appropriate mmio handler. But that's for later.
Still I'm interested to know if this big allocation works reliably on 32bit systems such as the rpi.

@dankcushions
Copy link
Contributor

build error with arm + RPI3:

   CC  _obj/asm_defines/asm_defines.o
    CXX _obj/osd/screenshot.o
nm _obj/asm_defines/asm_defines.o | awk -v dest_dir="../../src/asm_defines" -f ../../tools/gen_asm_defines.awk
    CC  _obj/device/r4300/new_dynarec/arm/linkage_arm.o
    LD  libmupen64plus.so.2.0.0
/usr/bin/ld: _obj/device/r4300/new_dynarec/arm/linkage_arm.o: relocation R_ARM_MOVW_ABS_NC against `dynarec_read_aligned_dword' can not be used when making a shared object; recompile with -fPIC
_obj/device/r4300/new_dynarec/arm/linkage_arm.o: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:706: recipe for target 'libmupen64plus.so.2.0.0' failed
make: *** [libmupen64plus.so.2.0.0] Error 1

this might be a compilation options issue with -fPIC, but by default I get this.

@loganmc10
Copy link
Member

loganmc10 commented Jun 20, 2017

I briefly tested a few games with the x86 new dynarec and it seems to be working properly.

EDIT: Actually I must have been pointing to the wrong thing, I still get this with x86 new dynarec:

UI-Console Error: dlopen('./libmupen64plus.so.2') failed: ./libmupen64plus.so.2: undefined symbol: write_mi

@bsmiles32
Copy link
Member Author

Updated with 2 fixes :

  • one from remove_memd PR concerning the way I called dynarec_{read,write}_aligned_dword
  • one for the missing write_mi function

Thank you guys for testing :)

@loganmc10 loganmc10 mentioned this pull request Jun 22, 2017
@dankcushions
Copy link
Contributor

builds! but segfaults for me. unfortunately i think the backtrace is not useful:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0x478aa460 (LWP 15533)]
[New Thread 0x46ef3460 (LWP 15534)]
[New Thread 0x464ff460 (LWP 15535)]
[New Thread 0x45cff460 (LWP 15536)]
[New Thread 0x454ff460 (LWP 15537)]
[New Thread 0x44cff460 (LWP 15538)]
[New Thread 0x444ff460 (LWP 15539)]

Program received signal SIGSEGV, Segmentation fault.
0x61746144 in ?? ()
(gdb) bt
#0  0x61746144 in ?? ()
#1  0x67ade5e0 in ?? () from /opt/retropie/emulators/mupen64plus/lib/libmupen64plus.so.2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

went back to master and everything worked ok via gdb. that's all i've got, sorry :(

@bsmiles32
Copy link
Member Author

@dankcushions Thanks for testing.
I guess, this PR will be on hold until we can merge PR312 "remove_memd".

@bsmiles32 bsmiles32 force-pushed the memory_refactorings branch 2 times, most recently from 4f0b5df to bd42118 Compare July 3, 2017 06:47
@bsmiles32
Copy link
Member Author

Rebased against master, includes the changes from PR remove_memd, cleaned up commits.

@bsmiles32 bsmiles32 force-pushed the memory_refactorings branch 2 times, most recently from 762b38b to bfb3842 Compare October 13, 2017 22:51
The allocated memory is big enough (512M) to hold all the directly
accessible physical memory an N64 system have.
By having a single allocated block of memory we don't have to if/switch
for accessing ram, sp_mem, pif_ram and cart_rom in fast_mem_access.

This is also a first step toward implementation of "fast memory".
I broke it when introducing opaque data to read/write memory handlers.
The issue was that there is a single opaque pointer for both read and
write function, whereas read and write breakpoints can be activated
independently. This was leading to a situation where wrong opaque
pointers was passed to read/write handlers.
@bsmiles32
Copy link
Member Author

Rebased against master.

Finally, this PR should be ready for merging.

@bsmiles32 bsmiles32 changed the title [WIP] Memory refactorings Memory refactorings Oct 13, 2017
@richard42 richard42 merged commit 94bcb40 into mupen64plus:master Oct 14, 2017
@Gillou68310
Copy link
Member

@bsmiles32 something in this PR broke the reset functionality

@Gillou68310
Copy link
Member

It was introduced by 1c01c23

@bsmiles32 bsmiles32 deleted the memory_refactorings branch November 2, 2017 01:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants