Skip to content

Fix locking of Working set in various places#3798

Closed
zefklop wants to merge 0 commit intoreactos:masterfrom
zefklop:CORE-17595
Closed

Fix locking of Working set in various places#3798
zefklop wants to merge 0 commit intoreactos:masterfrom
zefklop:CORE-17595

Conversation

@zefklop
Copy link
Copy Markdown
Contributor

@zefklop zefklop commented Jul 5, 2021

Purpose

Avoid some race conditions, non-serialized read & writes on the VAD tree, etc.

JIRA issues: CORE-17595 CORE-17642

Proposed changes

  • Get rid of old MmAccessFault, use MmArm3Accessfault instead (renamed to MmAccessFault)
  • From MmAccessFault, handle Working Set locking
    • Do not lock when not coming from a trap
    • Fix callers accordingly
  • From MmAccessFault, dispatch to old Mm if needed
  • Misc related fixes

@binarymaster binarymaster added the bugfix For bugfix PRs. label Jul 5, 2021
@github-actions github-actions bot added drivers Kernel mode drivers and frameworks kernel&hal Code changes to the ntoskrnl and HAL labels Jul 5, 2021
@github-actions github-actions bot removed the drivers Kernel mode drivers and frameworks label Jul 5, 2021
@zefklop zefklop force-pushed the CORE-17595 branch 3 times, most recently from 9c8668e to ada8b0b Compare July 5, 2021 13:29
@zefklop zefklop force-pushed the CORE-17595 branch 4 times, most recently from cf7e4e1 to 95f2a3b Compare July 23, 2021 16:36
@JoachimHenze
Copy link
Copy Markdown
Contributor

zefklop, you mentioned this PR in https://jira.reactos.org/browse/CORE-17698 . So it sounds in JIRA as if it would fix that. Understood right?
Why don't you add https://jira.reactos.org/browse/CORE-17698 and https://jira.reactos.org/browse/CORE-17690 to the JIRA issues it addresses in the PRs description then?

@JoachimHenze
Copy link
Copy Markdown
Contributor

With your PR I can not longer reproduce https://jira.reactos.org/browse/CORE-17690 see https://jira.reactos.org/browse/CORE-17690?focusedCommentId=129089&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-129089 for a successful log with the patch.
I guess you should mention that ticket in the description also! It happifies me for CORE-17690.

@JoachimHenze
Copy link
Copy Markdown
Contributor

But this ticket does NOT fix CORE-17595 as the tickets description currently implies. I retested with the artifacts iso. So please remove that JIRA-ID from the PRs description! It's a false promise!
https://jira.reactos.org/browse/CORE-17595?focusedCommentId=129090&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-129090

@zefklop zefklop force-pushed the CORE-17595 branch 2 times, most recently from ac7ac64 to b32d2f0 Compare July 26, 2021 10:09
@zefklop zefklop mentioned this pull request Jul 26, 2021
@zefklop zefklop force-pushed the CORE-17595 branch 4 times, most recently from c392406 to 2e985a8 Compare July 27, 2021 13:37
_In_ PEPROCESS CurrentProcess)
{
BOOLEAN ret = _MiMakeSystemAddressValid(PageTableVirtualAddress, CurrentProcess);
KeMemoryBarrierWithoutFence();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be needed? A function call is a sequence point and everything (as visible to the current thread) is evaluated before the return. If the concern is, that this function is inlined on release builds and the write to the PTE is reordered to after access to the page, then we should rather add a memory barrier to the MI_WRITE_VALID_PTE and MI_UPDATE_VALID_PTE functions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: In standard C, there is a sequence point between the return and the start of the next full expression, so MiMakeSystemAddressValid(PageTableVirtualAddress, CurrentProcess) + *(PUCHAR)PageTableVirtualAddress wouldn't have a sequence point, but hopefully nobody would write that code :).

Copy link
Copy Markdown
Contributor Author

@zefklop zefklop Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about "being evaluated before the return", it's about compiler optimizing memory accesses before or after the call to such functions. Only volatiles are guaranteed to be read or written between sequence points.
I believe that an optimizing compiler can make the following actually access *PointerPte before the call to MiMakeSystemAddressValid and this leads to problems:

PMMPTE PointerPte = MiAddressToPte(addr);
MiMakeSystemAddressValid(DifferentPointerPteWithinPDRange);
MMPTE TempPte =*PointerPte;

Copy link
Copy Markdown
Contributor

@tkreuzer tkreuzer Jul 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case all code that maps any PTE is prone to this. Therefore I suggest adding the memory barrier to the end of MI_WRITE_VALID_P*E and MI_UPDATE_VALID_P*E, as well as to the beginning of MI_WRITE_INVALID_PTE. This would "propagate" to any place it's needed, as long as these functions are used.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTO may inline all the functions and start reordering code which once was in different ones.
Two questions: why not putting the barrier inside the function, and shouldn't here be something more hard than a compiler barrier? If that's important, how can we be sure CPU won't reorder stuff?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not putting the barrier inside the function ?

The compiler won't see it on caller side if it's inside the function / in a different compilation unit.

If that's important, how can we be sure CPU won't reorder stuff?

We can't. But the CPU won't trigger a PF until it's "sure" it has to, and we would have the instructions actually making the PT valid in the pipeline.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CPU doesn't reorder reads and writes on the same CPU core/thread (exception is speculative execution and that doesn't cause faults).
If it's in a different compilation unit and not inlined, then the compiler cannot move memory accesses around it, because it doesn't know the side effects of that function. Otherwise no locking function would ever work without an explicit memory barrier after it.
So the only reason to add that memory barrier is to prevent the compiler from shuffling things around, when it inlines the function and (wrongly) determines that there is no dependency between them . So adding the memory barrier in the function does exactly what is needed.

Copy link
Copy Markdown
Member

@Extravert-ir Extravert-ir Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CPU doesn't reorder reads and writes on the same CPU core/thread

As far as I understand, this is true only for x86. Not sure if that's important for us right now though.


//HACK: Pass a placeholder TrapInformation so the fault handler knows we're unlocked
Status = MmAccessFault(TRUE, Address, KernelMode, (PVOID)(ULONG_PTR)0xBADBADA3BADBADA3ULL);
Status = MmAccessFault(TRUE, Address, KernelMode, NULL);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stupid remark, but you pass TRUE/FALSE values to an ULONG field.
Shouldn't we define a set of flags somewhere, because currently I see only a set of MI_IS_* set of macroses which use raw constants inside.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have to make up some names.

_In_ PEPROCESS CurrentProcess)
{
BOOLEAN ret = _MiMakeSystemAddressValid(PageTableVirtualAddress, CurrentProcess);
KeMemoryBarrierWithoutFence();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTO may inline all the functions and start reordering code which once was in different ones.
Two questions: why not putting the barrier inside the function, and shouldn't here be something more hard than a compiler barrier? If that's important, how can we be sure CPU won't reorder stuff?

@zefklop zefklop force-pushed the CORE-17595 branch 4 times, most recently from 9dcc76e to a238b9b Compare August 4, 2021 10:11
@zefklop zefklop force-pushed the CORE-17595 branch 2 times, most recently from 81eb350 to 5da1c51 Compare August 4, 2021 15:05
@zefklop zefklop added the refactoring For refactoring changes. label Aug 4, 2021
@zefklop zefklop self-assigned this Aug 4, 2021
@zefklop zefklop force-pushed the CORE-17595 branch 2 times, most recently from 1f6a492 to ae7e0a6 Compare August 4, 2021 17:04
Comment on lines +2444 to +2446
KeMemoryBarrierWithoutFence();
ret = PointerPte->u.Hard.Valid;
KeMemoryBarrierWithoutFence();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're trying to emulate atomic_load() from C11 here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't atomic_load involve some specific CPU instruction ? Here the Memory Barrier is for compiler only.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On x86, it doesn't do anything special except being a compiler barrier
https://godbolt.org/z/h6Yo7Y4z5

@zefklop
Copy link
Copy Markdown
Contributor Author

zefklop commented Aug 5, 2021

For those trying to figure out what this memory barrier stuff is about:
https://gcc.godbolt.org/z/16qejTcGn

Adapted from a code sample from @tkreuzer

@JoachimHenze
Copy link
Copy Markdown
Contributor

[~zefklop] I downloaded the latest artifacts again today from #3798 after the most recent changes. It was the gcc 386 dbg build, which identified itself as
ReactOS 0.4.15-x86-dev (Build 20210805-57cd3d8) (Commit 57cd3d8)
(x) and even that did fail with the strong-name bug at 2nd try.
(x) #3798 does still not fix CORE-17595.

And CORE-17642 also has been reported to be fixed already in master head.

In sum this means that this PR does currently fix no known JIRA-ticket!
I don't want to say, that it is wrong therefore, but we should not give any false promises or keep wrong relations if we ultimately intend to commit something from within here!

@Extravert-ir
Copy link
Copy Markdown
Member

Can't we make MMPTE or some parts of it volatile? That should eliminate the need in explicit compiler barriers

@Extravert-ir
Copy link
Copy Markdown
Member

I've got a problem while testing this PR inside a clang-cl build. It triggers an IRQL check in MiLockWorkingSet. Here is a stack trace:

kd> kp
 # ChildEBP RetAddr  
00 (Inline) -------- nt!MiLockWorkingSet+0x4d [C:\rosgit\ntoskrnl\mm\ARM3\miarm.h @ 1315] 
01 f2741e28 804a467a nt!MmProbeAndLockPages(struct _MDL * Mdl = 0xf30c4f88, char AccessMode = 0n0 '', _LOCK_OPERATION Operation = IoReadAccess (0))+0x826 [C:\rosgit\ntoskrnl\mm\ARM3\mdlsup.c @ 1179] 
02 f2741e88 f7a4c9f9 nt!IoBuildAsynchronousFsdRequest(unsigned long MajorFunction = 3, struct _DEVICE_OBJECT * DeviceObject = 0xf2f52a08 Device for "\Driver\UniATA", void * Buffer = 0xf3072fe8, unsigned long Length = 0x12, union _LARGE_INTEGER * StartingOffset = 0xf2741eb0 {0x1}, struct _IO_STATUS_BLOCK * IoStatusBlock = 0x00000000)+0x25a [C:\rosgit\ntoskrnl\io\iomgr\irp.c @ 824] 
03 f2741ef0 f7a4becd scsiport!SpiSendRequestSense(struct _SCSI_PORT_LUN_EXTENSION * LunExtension = 0xf2f52ac0, struct _SCSI_REQUEST_BLOCK * InitialSrb = 0xf2f74fc0)+0xf9 [C:\rosgit\drivers\storage\port\scsiport\scsi.c @ 555] 
04 f2741f30 f7a4b44e scsiport!SpiProcessCompletedRequest(struct _SCSI_PORT_DEVICE_EXTENSION * DeviceExtension = 0xf2d0cb30, struct _SCSI_REQUEST_BLOCK_INFO * SrbInfo = 0x00000000, unsigned char * NeedToCallStartIo = 0xf2741f57 "")+0x89d [C:\rosgit\drivers\storage\port\scsiport\scsi.c @ 951] 
05 f2741f80 804d0bf1 scsiport!ScsiPortDpcForIsr(struct _KDPC * Dpc = 0xf2d0caec, struct _DEVICE_OBJECT * DpcDeviceObject = 0xf2d0ca78 Device for "\Driver\UniATA", struct _IRP * DpcIrp = 0x00000000, void * DpcContext = 0xf2d0cb30)+0x32e [C:\rosgit\drivers\storage\port\scsiport\scsi.c @ 1381] 
06 f2741ff4 80645e7f nt!KiRetireDpcList(struct _KPRCB * Prcb = 0xffdff120)+0x1c1 [C:\rosgit\ntoskrnl\ke\dpc.c @ 633] 
07 f2741ff8 f26f5740 nt!KiRetireDpcListInDpcStack+0xa

So we're in scsiport's DpcForIsr handler, so the IRQL is DISPATCH_LEVEL. It calls IoBuildAsynchronousFsdRequest, which calls MmProbeAndLockPages with a MDL address (which itself is nonpaged), so the usage of MmProbeAndLockPages at DISPATCH_LEVEL should be fine at this case.
What's wrong then, and why it is not triggered on MSVC?

Copy link
Copy Markdown
Member

@Extravert-ir Extravert-ir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've got a bit further. Inside MmProbeAndLockPages, this code path goes wrong way:

/* Check how we should lock */
if (MI_IS_SESSION_ADDRESS(Base))
{
    WorkingSet = &MmSessionSpace->GlobalVirtualAddress->Vm;
}
else if (MI_IS_NON_PAGED_POOL_ADDRESS(Base))
{
    UsePfnLock = TRUE;
    OldIrql = MiAcquirePfnLock();
}
else
{
    WorkingSet = &MmSystemCacheWs;
}

In this case, Base address is the one which comes from IoAllocateMdl. It was a small allocation, so it was taken from a lookaside buffer (LookasideMdlList):

/* Allocate one from the lookaside list */
Mdl = IopAllocateMdlFromLookaside(LookasideMdlList);

MI_IS_NON_PAGED_POOL_ADDRESS(Base) returns FALSE for such address, looks like this is what's wrong here
The address in my case is 0xf2e95fe8 which (according to table) belongs to 0xEB000000 - 0xF7BE0000 System PTE Space

@zefklop
Copy link
Copy Markdown
Contributor Author

zefklop commented Aug 16, 2021

Ok, I've got a bit further. Inside MmProbeAndLockPages, this code path goes wrong way:

/* Check how we should lock */
if (MI_IS_SESSION_ADDRESS(Base))
{
    WorkingSet = &MmSessionSpace->GlobalVirtualAddress->Vm;
}
else if (MI_IS_NON_PAGED_POOL_ADDRESS(Base))
{
    UsePfnLock = TRUE;
    OldIrql = MiAcquirePfnLock();
}
else
{
    WorkingSet = &MmSystemCacheWs;
}

In this case, Base address is the one which comes from IoAllocateMdl. It was a small allocation, so it was taken from a lookaside buffer (LookasideMdlList):

/* Allocate one from the lookaside list */
Mdl = IopAllocateMdlFromLookaside(LookasideMdlList);

Base is the buffer passed to IoBuildAsynchronousFsdRequest. Why should that be in system PTE space ?

MI_IS_NON_PAGED_POOL_ADDRESS(Base) returns FALSE for such address, looks like this is what's wrong here
The address in my case is 0xf2e95fe8 which (according to table) belongs to 0xEB000000 - 0xF7BE0000 System PTE Space

Thanks for the analysis. Indeed, this should take PTE space into account, I'll see how to correct this.

@Extravert-ir
Copy link
Copy Markdown
Member

Sorry, I was wrong at some things, don't know how had I overlooked that :)

It is reproducible with both MSVC and Clang when Special Pool is enabled. Sorry for a confusion. (I sometimes enable it without taking much attention). I guess for GCC it will trigger that too (can't check right now).

Base is the buffer passed to IoBuildAsynchronousFsdRequest. Why should that be in system PTE space ?

I've found the allocation actually, it comes from cdrom. Nothing interesting:

DeviceExtension->ScratchContext.ScratchSense = ExAllocatePoolWithTag(NonPagedPoolNx,
sizeof(SENSE_DATA),
CDROM_TAG_SCRATCH);

What's interesting to me is that according to log, special pool has this range:

(ntoskrnl\mm\ARM3\special.c:196) Special pool start F272B000 - end F2B2A000

But the address allocated is higher than that area - 0xf300dfe8. Is it a mistake in DPRINT?

@binarymaster
Copy link
Copy Markdown
Member

The branch zefklop:CORE-17595 got synced with ROS master, but without PR commits applied on top, and this led to PR close... perhaps by accident? 👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix For bugfix PRs. refactoring For refactoring changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants