Fix locking of Working set in various places #3798
zefklop wants to merge into reactos:master
Conversation
zefklop, you mentioned this PR in https://jira.reactos.org/browse/CORE-17698 , so in JIRA it sounds as if it would fix that. Did I understand that right?

With your PR I can no longer reproduce https://jira.reactos.org/browse/CORE-17690 ; see https://jira.reactos.org/browse/CORE-17690?focusedCommentId=129089&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-129089 for a successful log with the patch.

But this PR does NOT fix CORE-17595, as the ticket's description currently implies. I retested with the artifacts ISO. So please remove that JIRA ID from the PR's description! It's a false promise!
ntoskrnl/mm/ARM3/miarm.h (Outdated)
    _In_ PEPROCESS CurrentProcess)
{
    BOOLEAN ret = _MiMakeSystemAddressValid(PageTableVirtualAddress, CurrentProcess);
    KeMemoryBarrierWithoutFence();
Why would this be needed? A function call is a sequence point and everything (as visible to the current thread) is evaluated before the return. If the concern is that this function is inlined on release builds and the write to the PTE is reordered to after the access to the page, then we should rather add a memory barrier to the MI_WRITE_VALID_PTE and MI_UPDATE_VALID_PTE functions.
Correction: In standard C, there is a sequence point between the return and the start of the next full expression, so MiMakeSystemAddressValid(PageTableVirtualAddress, CurrentProcess) + *(PUCHAR)PageTableVirtualAddress wouldn't have a sequence point, but hopefully nobody would write that code :).
It's not about "being evaluated before the return", it's about the compiler moving memory accesses to before or after the call to such functions. Only volatile accesses are guaranteed to be performed between sequence points.
I believe that an optimizing compiler can make the following actually access *PointerPte before the call to MiMakeSystemAddressValid, and this leads to problems:
PMMPTE PointerPte = MiAddressToPte(addr);
MiMakeSystemAddressValid(DifferentPointerPteWithinPDRange);
MMPTE TempPte = *PointerPte;
In that case all code that maps any PTE is prone to this. Therefore I suggest adding the memory barrier to the end of MI_WRITE_VALID_P*E and MI_UPDATE_VALID_P*E, as well as to the beginning of MI_WRITE_INVALID_PTE. This would "propagate" to any place it's needed, as long as these functions are used.
LTO may inline all the functions and start reordering code which once was in different ones.
Two questions: why not put the barrier inside the function, and shouldn't this be something stronger than a compiler barrier? If ordering matters here, how can we be sure the CPU won't reorder stuff?
why not put the barrier inside the function?

The compiler won't see it on the caller side if it's inside the function / in a different compilation unit.
If that's important, how can we be sure the CPU won't reorder stuff?

We can't. But the CPU won't trigger a page fault until it's "sure" it has to, and by then the instructions that actually make the page table valid are already in the pipeline.
The CPU doesn't reorder reads and writes on the same CPU core/thread (the exception is speculative execution, and that doesn't cause faults).
If it's in a different compilation unit and not inlined, then the compiler cannot move memory accesses around it, because it doesn't know the side effects of that function. Otherwise no locking function would ever work without an explicit memory barrier after it.
So the only reason to add that memory barrier is to prevent the compiler from shuffling things around when it inlines the function and (wrongly) determines that there is no dependency between them. So adding the memory barrier in the function does exactly what is needed.
The CPU doesn't reorder reads and writes on the same CPU core/thread
As far as I understand, this is true only for x86. Not sure if that's important for us right now though.
ntoskrnl/mm/ARM3/mdlsup.c (Outdated)
-    //HACK: Pass a placeholder TrapInformation so the fault handler knows we're unlocked
-    Status = MmAccessFault(TRUE, Address, KernelMode, (PVOID)(ULONG_PTR)0xBADBADA3BADBADA3ULL);
+    Status = MmAccessFault(TRUE, Address, KernelMode, NULL);
Stupid remark, but you pass TRUE/FALSE values to a ULONG field.
Shouldn't we define a set of flags somewhere? Currently I only see the MI_IS_* set of macros, which use raw constants inside.
Yes, I have to make up some names.
ntoskrnl/mm/ARM3/miarm.h (Outdated)
    KeMemoryBarrierWithoutFence();
    ret = PointerPte->u.Hard.Valid;
    KeMemoryBarrierWithoutFence();
Looks like we're trying to emulate atomic_load() from C11 here
Doesn't atomic_load involve some specific CPU instruction? Here the memory barrier is for the compiler only.
On x86, it doesn't do anything special except being a compiler barrier
https://godbolt.org/z/h6Yo7Y4z5
For those trying to figure out what this memory barrier stuff is about: adapted from a code sample from @tkreuzer

[~zefklop] I downloaded the latest artifacts again today from #3798 after the most recent changes. It was the gcc 386 dbg build, which identified itself as [...]. And CORE-17642 has also been reported as already fixed in master head. In sum this means that this PR currently fixes no known JIRA ticket!

Can't we make MMPTE or some parts of it [...]?

I've got a problem while testing this PR inside a clang-cl build. It triggers an IRQL check in [...]. So we're in scsiport's [...]
Ok, I've got a bit further. Inside MmProbeAndLockPages, this code path goes the wrong way:
/* Check how we should lock */
if (MI_IS_SESSION_ADDRESS(Base))
{
WorkingSet = &MmSessionSpace->GlobalVirtualAddress->Vm;
}
else if (MI_IS_NON_PAGED_POOL_ADDRESS(Base))
{
UsePfnLock = TRUE;
OldIrql = MiAcquirePfnLock();
}
else
{
WorkingSet = &MmSystemCacheWs;
}
In this case, the Base address is the one that comes from IoAllocateMdl. It was a small allocation, so it was taken from a lookaside buffer (LookasideMdlList):
reactos/ntoskrnl/io/iomgr/iomdl.c, lines 53 to 54 in 911fc3c
MI_IS_NON_PAGED_POOL_ADDRESS(Base) returns FALSE for such an address; looks like this is what's wrong here.
The address in my case is 0xf2e95fe8, which (according to the table) belongs to 0xEB000000 - 0xF7BE0000 System PTE Space.
Base is the buffer passed to IoBuildAsynchronousFsdRequest. Why should that be in system PTE space ?
Thanks for the analysis. Indeed, this should take PTE space into account; I'll see how to correct this.
Sorry, I was wrong about some things; I don't know how I overlooked that :) It is reproducible with both MSVC and Clang when Special Pool is enabled. Sorry for the confusion. (I sometimes enable it without paying much attention.) I guess GCC will trigger it too (can't check right now).

I've found the allocation actually, it comes from cdrom. Nothing interesting: reactos/drivers/storage/class/cdrom/scratch.c, lines 261 to 263 in db8dd3b. What's interesting to me is that according to the log, special pool has this range: [...] But the allocated address is higher than that area - [...]
The branch [...]
Purpose
Avoid some race conditions, non-serialized reads & writes on the VAD tree, etc.
JIRA issues: CORE-17595 CORE-17642
Proposed changes