Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A64 fastmem #528

Merged
merged 1 commit into from Jul 1, 2020
Merged

A64 fastmem #528

merged 1 commit into from Jul 1, 2020

Conversation

vdwjeremy
Copy link
Contributor

Direct access to host (virtual) address space with fastmem offset, propagated from A32.
The goal is to delegate page table computation and page boundaries detection to the hardware.

Benchmark
From memory intensive sequential read/write

    env.MemoryWrite32(0, 0x910004E7); // ADD X7, X7, 1
    env.MemoryWrite32(4, 0x91002042); // ADD X2, X2, 8
    env.MemoryWrite32(8, 0xF9400046); // LDR X6, [X2]
    env.MemoryWrite32(12, 0x91002063); // ADD X3, X3, 8
    env.MemoryWrite32(16, 0xF9000066); // STR X6, [X3]
    env.MemoryWrite32(20, 0xF9400046); // LDR X6, [X2]
    env.MemoryWrite32(24, 0xF9000066); // STR X6, [X3]
    env.MemoryWrite32(28, 0xF9400046); // LDR X6, [X2]
    env.MemoryWrite32(32, 0xF9000066); // STR X6, [X3]
    env.MemoryWrite32(36, 0xF9400046); // LDR X6, [X2]
    env.MemoryWrite32(40, 0xF9000066); // STR X6, [X3]
    env.MemoryWrite32(44, 0x17FFFFF5); // B -24

Elapse time on i5-4590 compared to page table is

  • 70% when copying 2MB
  • 71% when copying 20MB
  • 78% when copying 200MB

src/backend/x64/a64_emit_x64.cpp Outdated Show resolved Hide resolved
tests/A64/testenv.h Outdated Show resolved Hide resolved
tests/A64/testenv.h Outdated Show resolved Hide resolved
tests/A64/testenv.h Outdated Show resolved Hide resolved
tests/A64/testenv.h Outdated Show resolved Hide resolved
tests/A64/testenv.h Outdated Show resolved Hide resolved
@merryhime
Copy link
Owner

merryhime commented Jun 18, 2020

Please match the codestyle of the rest of the codebase.

Design comments:

  1. Handling of segmentation faults is required. Users of this library intentionally map inaccessible memory into a fastmem region to trigger this behaviour. This must fallback to another method of memory access. Recompilation should be a configuration option.
  2. Not all users of this library find mirroring the address space desirable behaviour. This is why it’s a configuration option.
  3. You can reserve a register for the fastmem base pointer.

Misc comments:

  1. Your memory read/write implementations in RawMemoryTestEnv are incorrect. Misaligned memory accesses need to be supported.

I will give this a proper review when I can.

Copy link
Owner

@merryhime merryhime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for my stricter than usual comments, but this touches a part of the codebase which is difficult to adequately test.

include/dynarmic/A64/config.h Outdated Show resolved Hide resolved
include/dynarmic/A64/config.h Outdated Show resolved Hide resolved
tests/A64/testenv.h Outdated Show resolved Hide resolved
tests/A64/testenv.h Outdated Show resolved Hide resolved
@vdwjeremy
Copy link
Contributor Author

Apologies for my stricter than usual comments, but this touches a part of the codebase which is difficult to adequately test.

no worry, that's helpful :)

@vdwjeremy
Copy link
Contributor Author

Design comments:
1. Handling of segmentation faults is required. Users of this library intentionally map inaccessible memory into a fastmem region to trigger this behaviour. This must fallback to another method of memory access. Recompilation should be a configuration option.
2. Not all users of this library find mirroring the address space desirable behaviour. This is why it’s a configuration option.
3. You can reserve a register for the fastmem base pointer.

  1. ok, I was not sure this could be useful, in that case I will add this part
  2. can you develop ?
  3. will do, the register will be reserved for the full block though, do you foresee potential issues ?

@merryhime
Copy link
Owner

Just FYI, I've implemented what I've said above here: https://github.com/MerryMage/dynarmic/compare/fm

@vdwjeremy
Copy link
Contributor Author

vdwjeremy commented Jun 19, 2020

Just FYI, I've implemented what I've said above here: https://github.com/MerryMage/dynarmic/compare/fm

Too fast !
I was working on something similar, but it won't be compatible I'm afraid, and I didn't have the R13 register yet. Do you want to continue on branch fm ?

On the exception fallback I think we had the same, but I couldn't validate it succesfully, it goes through A64EmitX64::FastmemCallback but then crashes.
I'm testing with restrictions on windows pages such as

    void* vmem = VirtualAlloc(nullptr, memory_size, MEM_RESERVE, PAGE_NOACCESS);
    VirtualAlloc(vmem, 1024, MEM_COMMIT, PAGE_READWRITE); // code memory
    VirtualAlloc((char*)vmem+ 1024 * 1024, 1024 * 1024, MEM_COMMIT, PAGE_NOACCESS); // no access

Do you get better results ?

@vdwjeremy
Copy link
Contributor Author

vdwjeremy commented Jun 21, 2020

rebased the PR on top of fm branch, changed on A32/A64:

  • added unit tests
  • fixed side effect on register allocation when changing page_table/fastmem
  • fixed conditions to use user callbacks (was always using callbacks)

The few tests I did on page faults still can't clear the SIGSEGV though, even on A32, I would be curious to know how this mechanism is used

@merryhime
Copy link
Owner

Hey! Apologies for getting back to this late. LGTM as is, I'll rebase onto mp when I can.

@merryhime
Copy link
Owner

merryhime commented Jun 27, 2020

can't clear the SIGSEGV

Could you clarify what you mean by this? If you're running in a debugger on Windows, some debuggers hook segfaults (and get to them before we can intercept them). We're already using the fastest possible SEH handling method, though.

@vdwjeremy
Copy link
Contributor Author

OK I understood what happens.
Effectivelly the MSVC debugger intercepts the segfault, that can be resumed properly and does not appear when the debugger is not attached.
But also the test framework interferes with signals. I could get a different behaviour by playing with these flags

CATCH_CONFIG_NO_POSIX_SIGNALS
CATCH_INTERNAL_CONFIG_NO_POSIX_SIGNALS
CATCH_CONFIG_NO_WINDOWS_SEH
CATCH_INTERNAL_CONFIG_NO_WINDOWS_SEH

But it starts to be messy and I can't find a way to make a portable test out of it. For now I will leave it as is, if that's ok with you.

@merryhime merryhime changed the base branch from master to fm July 1, 2020 20:50
@merryhime merryhime merged commit 9261f56 into merryhime:fm Jul 1, 2020
@degasus
Copy link
Contributor

degasus commented May 26, 2021

Is there any way how I can help to get this merged to master as well? I can confirm that this implementation is working fine. I've seen a 15% performance increase in the CPU emulation in yuzu ingame BOTW.

@merryhime
Copy link
Owner

I can rebase this branch to a commit which is yuzu-compatible. (I've removed some yuzu-specific hacks.)

@degasus
Copy link
Contributor

degasus commented May 26, 2021

Thanks for the offer, I already did so: https://github.com/degasus/dynarmic/commits/fm
That's how I've tested this branch. However I think in the long term, this should all go to master, shouldn't it?

Edit: Would it help you if I rebase it on latest master, and move the fixups from the second commit to the first?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants