[NTOS:CM] Implement registry validation checks & registry healing #4571

Merged
merged 15 commits into from Nov 19, 2023


GeoB99
Member

@GeoB99 GeoB99 commented Jul 3, 2022

CmCheckRegistry is the kernel routine at the core of registry validation: it checks the CM hive, the underlying hive, and the corresponding bins and cells. CmCheckRegistry comes into action after logs have been replayed and, more specifically, after operations such as hive initialization, key flushing, key saving and key restoring.

CmCheckRegistry also provides the bulk of registry healing, as one piece of the puzzle among others. Its behavior depends on the bit flags passed in the Flags parameter. The flags are:

  • CM_CHECK_REGISTRY_DONT_PURGE_VOLATILES -- Don't do any volatile purges
  • CM_CHECK_REGISTRY_PURGE_VOLATILES -- Purge volatile data from a registry hive, on demand
  • CM_CHECK_REGISTRY_BOOTLOADER_PURGE_VOLATILES -- Purge bootloader-related volatile data, whatever that is
  • CM_CHECK_REGISTRY_VALIDATE_HIVE -- Perform hive validation checks and a thorough analysis of the hive's bins and cells; HvValidateHive is triggered in this case
  • CM_CHECK_REGISTRY_FIX_HIVE -- Fix a broken registry hive on demand. This flag is typically used by an offline registry repair tool such as the ReactOS Check Registry Utility, which can repair a hive without the need to boot the ReactOS Usetup.
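
As a rough illustration of how a caller might combine these flags, here is a minimal sketch. The flag names come from the list above; the numeric bit values, the `check_plan` structure and `plan_from_flags` are assumptions made up for this example and are not taken from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only; the real definitions live in CMLIB. */
#define CM_CHECK_REGISTRY_DONT_PURGE_VOLATILES       0x00000000u
#define CM_CHECK_REGISTRY_PURGE_VOLATILES            0x00000002u
#define CM_CHECK_REGISTRY_BOOTLOADER_PURGE_VOLATILES 0x00010000u
#define CM_CHECK_REGISTRY_VALIDATE_HIVE              0x00020000u
#define CM_CHECK_REGISTRY_FIX_HIVE                   0x00040000u

/* Hypothetical summary of what the caller asked for. */
typedef struct check_plan {
    bool purge_volatiles;
    bool validate_hive;
    bool fix_hive;
} check_plan;

/* Decode the bit flags into a plan; unknown bits are simply ignored here. */
static check_plan plan_from_flags(uint32_t flags)
{
    check_plan plan;

    plan.purge_volatiles =
        (flags & (CM_CHECK_REGISTRY_PURGE_VOLATILES |
                  CM_CHECK_REGISTRY_BOOTLOADER_PURGE_VOLATILES)) != 0;
    plan.validate_hive = (flags & CM_CHECK_REGISTRY_VALIDATE_HIVE) != 0;
    plan.fix_hive = (flags & CM_CHECK_REGISTRY_FIX_HIVE) != 0;
    return plan;
}
```

A caller that wants validation plus volatile purging would pass `CM_CHECK_REGISTRY_VALIDATE_HIVE | CM_CHECK_REGISTRY_PURGE_VOLATILES`; passing only `CM_CHECK_REGISTRY_DONT_PURGE_VOLATILES` leaves every field of the plan false.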

CmCheckRegistry performs the following actions in this strict order:

  • Validate the hive if requested through the CM_CHECK_REGISTRY_VALIDATE_HIVE flag. Each bin's internal header structure is validated for correctness (not corrupt), for size and for whether it is mapped, and cell sizes and cell pointers are checked for sanity, etc.
  • Validate and analyze the security descriptor of each hive block with the RtlValidSecurityDescriptor routine, in accordance with the requirements imposed by the Security subsystem (aka Se). A corrupt security descriptor is reported as such and its security information is reset to defaults. From there, security caching comes into play.
  • Perform deep recursive registry checking. Each key is validated, and lexicographical order is validated as well.
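
To make the lexicographical order check concrete, here is a minimal sketch, assuming the order in question is that of subkey names under a key. It uses plain ASCII strings and a case-insensitive comparison as a simplification; `compare_names` and `subkeys_are_sorted` are hypothetical helpers, not functions from the PR.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive comparison of two NUL-terminated ASCII names. */
static int compare_names(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        int ca = toupper((unsigned char)*a);
        int cb = toupper((unsigned char)*b);
        if (ca != cb)
            return ca - cb;
        a++;
        b++;
    }
    /* At least one string ended; the shorter one sorts first. */
    return (unsigned char)*a - (unsigned char)*b;
}

/* A subkey list passes only if names are strictly ascending (no duplicates). */
static bool subkeys_are_sorted(const char *const names[], size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (compare_names(names[i - 1], names[i]) >= 0)
            return false;
    }
    return true;
}
```

In this sketch a list such as "Control", "Enum", "services" passes, while "Enum" followed by "Control", or two names that differ only by case, is reported as out of order.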

KVM https://reactos.org/testman/compare.php?ids=85306,85313

TODO

For the last point: quite a few parts of the kernel's Configuration Manager are apparently stubs or barely implemented at all, namely mapped views of hives and security caching support. All of that further work has to be shipped separately in different PRs.

  • Implement HvValidateHive
  • Implement HvValidateBin
  • [ ] Implement security descriptor validation (to be implemented on a separate PR in the future)
  • Implement deep checking of the registry
  • Implement key validation, value lists, lexicographical order, etc
  • Implement registry self healing (CmSelfHeal and CmpBootType switch indicators)
  • Implement bootloader hive healing & recovery
  • Implement log transaction & other necessary code
  • Fix registry lazy flushing (merged separately in [NTOS:CM] Don't lazy flush the registry during unlocking operation #4955)
  • Do more investigations for missing implementations of stuff in Configuration Manager that prevent the further progress of the implementation of registry checking itself
  • Run testbot / perform further tests
  • Fix / work around freeldr size issue on x64

JIRA Issues: CORE-9195, CORE-6762, CORE-18303, CORE-19207

@GeoB99 GeoB99 added enhancement For PRs with an enhancement/new feature. kernel&hal Code changes to the ntoskrnl and HAL labels Jul 3, 2022
@GeoB99 GeoB99 added this to New PRs in ReactOS PRs via automation Jul 3, 2022
@GeoB99 GeoB99 self-assigned this Jul 3, 2022
@HBelusca HBelusca self-requested a review July 3, 2022 19:13
@HBelusca
Contributor

HBelusca commented Jul 4, 2022

It could be of interest to move the cmcheck file (+ the associated defines) in the cmlib instead. The reasons:

  • it's going to be used in the bootloader (for fixing the SYSTEM hive);
  • it's of course also used in the kernel;
  • it would help in testing the code in user-mode (and debugging it with e.g. VS or whatever);
  • it would help in creating an offline registry repair tool, similar to Windows' chkreg.exe : https://www.microsoft.com/en-us/download/details.aspx?id=20068

@HBelusca
Contributor

HBelusca commented Jul 4, 2022

Build has been fixed. I'll let you re-order + squash the commits as needed:

  • the [CMLIB][NTOS:CM] Deduplicate other common definitions between CMLIB and the NTOS CM commit must be moved at the very first position, and never be squashed (ideally it should be committed in master right now);
  • the ** TO BE SQUASHED ** commit should be squashed with your current last commit "Implement registry check validation".

@GeoB99 GeoB99 force-pushed the cm-check-registry branch 8 times, most recently from 21b0eca to 43be481 Compare July 10, 2022 12:28
@GeoB99 GeoB99 force-pushed the cm-check-registry branch 3 times, most recently from d164689 to 8377bd8 Compare July 17, 2022 10:09
@GeoB99 GeoB99 changed the title [NTOS:CM] Implement registry validation checks [NTOS:CM] Implement registry validation checks & registry healing Jul 17, 2022
@GeoB99 GeoB99 force-pushed the cm-check-registry branch 4 times, most recently from 265a1e2 to d7dc765 Compare July 21, 2022 17:38
@GeoB99 GeoB99 removed the help wanted Request for help. label Nov 11, 2023
@GeoB99 GeoB99 marked this pull request as ready for review November 11, 2023 17:09
@GeoB99 GeoB99 force-pushed the cm-check-registry branch 3 times, most recently from b4b5876 to ac024c3 Compare November 19, 2023 15:50
…ellaneous Stuff

=== DOCUMENTATION REMARKS ===

HBOOT_TYPE_REGULAR and HBOOT_TYPE_SELF_HEAL are boot type values defined by the CMLIB library for the BootType field. HBOOT_TYPE_REGULAR indicates a normal system boot, whereas HBOOT_TYPE_SELF_HEAL indicates that the boot is assisted by self-healing mode.

Whether the former or the latter value is set is governed by both the kernel and the bootloader, which negotiate to determine whether any of the registry properties (the hive, the base block, the registry base, etc.) are severely corrupted. In extreme cases where
registry healing is possible, the base block of the damaged hive has its flags marked with HBOOT_TYPE_SELF_HEAL. From this point the boot no longer follows the default path; it is assisted, as described above.

HBOOT_NO_BOOT_RECOVER, HBOOT_BOOT_RECOVERED_BY_HIVE_LOG and HBOOT_BOOT_RECOVERED_BY_ALTERNATE_HIVE, on the other hand, are identifiers for the BootRecover field of the BASE_BLOCK header structure. They are used exclusively by FreeLdr to tell the kernel whether the bootloader recovered the SYSTEM hive. If the bootloader did recover the SYSTEM hive,
the kernel performs a flush request to write the dirty data down to disk. In the (almost) worst case, where FreeLdr cannot repair the main hive by applying log data, it loads the alternate mirror version of the hive.
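
The kernel-side reaction to the BootRecover field can be sketched as a small dispatch. The identifier names come from the text above; their numeric values, the `kernel_action` enum and `kernel_action_for` are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* BootRecover identifiers named in the text; numeric values are assumed here. */
#define HBOOT_NO_BOOT_RECOVER                   0u
#define HBOOT_BOOT_RECOVERED_BY_HIVE_LOG        1u
#define HBOOT_BOOT_RECOVERED_BY_ALTERNATE_HIVE  2u

/* Hypothetical follow-up work the kernel has to schedule. */
typedef enum kernel_action {
    KERNEL_NOTHING_TO_DO,
    KERNEL_FLUSH_RECOVERED_DATA,   /* log-recovered data is dirty in memory */
    KERNEL_REBUILD_PRIMARY_HIVE    /* booted from the alternate mirror */
} kernel_action;

/* Map what the bootloader reported to what the kernel should do next. */
static kernel_action kernel_action_for(uint32_t boot_recover)
{
    switch (boot_recover) {
    case HBOOT_BOOT_RECOVERED_BY_HIVE_LOG:
        return KERNEL_FLUSH_RECOVERED_DATA;
    case HBOOT_BOOT_RECOVERED_BY_ALTERNATE_HIVE:
        return KERNEL_REBUILD_PRIMARY_HIVE;
    case HBOOT_NO_BOOT_RECOVER:
    default:
        return KERNEL_NOTHING_TO_DO;
    }
}
```

Treating unknown values like HBOOT_NO_BOOT_RECOVER is a design choice of this sketch; a real implementation might prefer to flag them as corruption.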

In addition to that, declare other miscellaneous CMLIB identifiers for log transaction writes purposes.
When shutting down the registry of the system, we don't want the registry to get poked again, for example by flushing the hives or syncing the hives and their respective logs. The reasoning is simple: during a complete shutdown the system performs its final check-ups until the computer
shuts down.

Any write operations done to the registry at that point can lead to erratic behavior. The CmShutdownSystem call already performs a final flush of all the hives to the backing storage, which is more than enough to ensure consistency of the last session's configuration. So after that final flush, mark HvShutdownComplete as TRUE, indicating
that any later flushing or syncing request (in the case where HvSyncHive gets called) is outright ignored.
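
The guard described above can be sketched as follows. This is a toy user-mode model: `shutdown_complete` stands in for HvShutdownComplete, and `toy_hive`, `sync_hive` and `shutdown_registry` are hypothetical stand-ins rather than the actual CMLIB or kernel routines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the HvShutdownComplete indicator described in the text. */
static bool shutdown_complete = false;

/* Toy hive: a dirty bit and a counter of writes to backing storage. */
typedef struct toy_hive {
    bool dirty;
    unsigned writes;
} toy_hive;

/* Returns true only if data was actually written out. */
static bool sync_hive(toy_hive *hive)
{
    assert(hive != NULL);

    if (shutdown_complete)
        return false;           /* request ignored after the final flush */
    if (!hive->dirty)
        return false;           /* nothing to write */

    hive->writes++;
    hive->dirty = false;
    return true;
}

/* Final flush of every hive, then refuse any further sync requests. */
static void shutdown_registry(toy_hive *hives, size_t count)
{
    for (size_t i = 0; i < count; i++)
        sync_hive(&hives[i]);
    shutdown_complete = true;
}
```

After `shutdown_registry` has run, a hive that becomes dirty again stays dirty: `sync_hive` returns false without touching the backing storage.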
…ile / annotate CmpFileSetSize parameters with SAL

During an I/O failure of whatever kind, the upper-level driver, namely an FSD, can raise a hard error and a deadlock can occur. We don't want that to happen for particular files like hives or logs, so in such cases we must disable hard errors before touching the hives, and keep them disabled until we're done.
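
The disable-then-restore pattern can be sketched like this. It is a user-mode model with a simulated per-thread flag; `set_hard_error_mode`, `run_with_hard_errors_disabled` and `record_mode` are hypothetical names. The NT kernel offers IoSetThreadHardErrorMode for this kind of toggle, but whether this commit uses exactly that call is not stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated "hard errors enabled" state of the current thread. */
static bool hard_errors_enabled = true;

/* Set the mode and return the previous one so the caller can restore it. */
static bool set_hard_error_mode(bool enable)
{
    bool previous = hard_errors_enabled;
    hard_errors_enabled = enable;
    return previous;
}

typedef int (*file_operation)(void *context);

/* Run a hive/log file operation with hard errors off, then restore the old mode. */
static int run_with_hard_errors_disabled(file_operation op, void *context)
{
    assert(op != NULL);

    bool previous = set_hard_error_mode(false);
    int status = op(context);
    set_hard_error_mode(previous);
    return status;
}

/* Example operation: records the mode it observed into *context. */
static int record_mode(void *context)
{
    *(bool *)context = hard_errors_enabled;
    return 0;
}
```

Restoring the previous mode, rather than unconditionally re-enabling hard errors, keeps the helper safe to call from code that had already disabled them.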

In addition to that, annotate the CmpFileSetSize function's parameters with SAL.
…atus codes

Add these NTSTATUS codes to the CMLIB library. STATUS_INVALID_PARAMETER will be used mostly by the HvInitialize function, and STATUS_REGISTRY_IO_FAILED by routines that read from or write to a hive file.
This implements the cmheal.c file, which provides the basic registry self-heal infrastructure needed by the public CmCheckRegistry function. The infrastructure provides a range of self-heal helpers for the hive, such as subkey, class, value and node healing functions.
CmCheckRegistry is a function that provides the necessary validation checks for a registry hive. This function usually comes into action when logs have been replayed, for example, or when a registry hive's internals have changed, such as when saving a key, loading a key, etc.

This commit implements the whole Check Registry infrastructure (cmcheck.c) in CMLIB library for ease of usage and wide accessibility across parts of the OS. In addition, two more functions for registry checks are also implemented -- HvValidateHive and HvValidateBin.

Rather than having the CmCheckRegistry implementation in the kernel, it's better to have it in the Configuration Manager library (aka CMLIB). The benefits of having it in the library are the following:

- CmCheckRegistry can be used in FreeLdr to fix the SYSTEM hive
- It can be used on-demand in the kernel
- It can be used for offline registry repair tools
- It makes the underlying CmCheckRegistry implementation code debug-able in user mode

CORE-9195
CORE-6762
…ckRegistry & add missing CmCheckRegistry calls

In addition to that, in some functions like CmFlushKey, CmSaveKey and CmSaveMergedKeys we must validate the underlying hives as a precaution, to make sure everything is alright and we don't corrupt anything.
=== DOCUMENTATION REMARKS ===

This implements (and re-enables some code that had been decaying for years) transacted writing of the registry. Transacted writing (writing to the registry in a transactional way) is an operation whose success can be verified by monitoring two main points.
In CMLIB, these points are what we internally call the primary and secondary sequences. A sequence is a numeric field that is incremented each time a write operation (namely one done with the FileWrite function and such) has completed successfully.

The primary sequence is incremented to indicate that the initial work of syncing the registry is in progress. During this phase, the base block header is written to the primary hive file and registry data is written to that file in the form of blocks. Afterwards the secondary sequence
is incremented to report completion of the transactional write. This operation occurs in the HvpWriteHive function (invoked by HvSyncHive for syncing). If the transactional write fails, or if the lazy flushing of the registry fails, LOG files come into play.
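
The two-sequence protocol can be sketched as follows. This is a simplified model, not the HvpWriteHive code: `toy_base_block`, `write_hive_transacted`, `base_block_is_consistent` and the two stand-in writers are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only the two sequence fields of the base block matter for this sketch. */
typedef struct toy_base_block {
    uint32_t primary_sequence;
    uint32_t secondary_sequence;
} toy_base_block;

typedef bool (*write_blocks_fn)(void *context);

/* Bump primary, write the data, and bump secondary only if the write succeeded. */
static bool write_hive_transacted(toy_base_block *base,
                                  write_blocks_fn write_blocks,
                                  void *context)
{
    assert(base != NULL && write_blocks != NULL);

    base->primary_sequence++;        /* "sync in progress" */
    if (!write_blocks(context))
        return false;                /* sequences now differ: write was interrupted */

    base->secondary_sequence++;      /* "sync completed" */
    return true;
}

/* A hive whose sequences differ was interrupted mid-write and needs recovery. */
static bool base_block_is_consistent(const toy_base_block *base)
{
    return base->primary_sequence == base->secondary_sequence;
}

/* Stand-in writers used to exercise both outcomes. */
static bool write_ok(void *context)    { (void)context; return true; }
static bool write_fails(void *context) { (void)context; return false; }
```

Starting from matching sequences, a successful write leaves them equal again, while a failed write leaves the primary sequence one ahead, which is the signal that LOG-based recovery is needed.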

LOGs are updated by HvpWriteLog, the counterpart of HvpWriteHive, which writes dirty data (base block header included) to the LOG files themselves. These files serve recovery and emergency purposes in case the primary machine hive has been damaged by a previous forced interruption of a write to
the registry hive. With specific recovery algorithms, the data gathered from a LOG is applied to the primary hive, salvaging it. But if the LOG file is corrupt as well, the system performs resuscitation techniques: it reconstructs the base block header with reasonable values,
resets the registry signature and so on.

This work is inspired by PR reactos#3932 by mrmks04 (aka Max Korostil). I have continued his work with some more tweaks. In addition to that, the whole transaction writing code is documented.

=== IMPORTANT NOTES ===

HvpWriteLog -- Currently this function lacks the ability to grow the log file size, since we lack the necessary code that deals with hive shrinking and with log shrinking/growing. This part is not super critical for us, so it is left as a TODO for the future.

HvLoadHive -- Currently there's a hack that prevents us from refactoring this function properly. That is, we should not be reading the whole hive and preparing the hive storage using HvpInitializeMemoryHive, which is strictly meant for HINIT_MEMORY. Rather, we must read the hive file block by block
and deconstruct the read buffer so that we can get the bins read from the file. The hive storage is then prepared based on those bins. If one of the bins is corrupt, self healing is applied.

For this matter, if the hive we are about to read is corrupt, we could read corrupt data and lead the system into failure. So we have to perform header and data recovery as well before reading the whole hive.
…heckRegistry

Thanks to CmCheckRegistry, the function can perform volatile data purging upon boot, which removes the old hacky CmPrepareHive code. This also slightly refactors HvInitialize, making it more proper.
…CheckRegistry for registry validation

Validate the SYSTEM hive with CmCheckRegistry and purge volatile data with the same function when initializing a hive descriptor for SYSTEM.
Also implement SYSTEM recovery code that makes use of the SYSTEM log in case something is fishy with the hive. If the hive repair has not fully recovered the SYSTEM hive, FreeLdr will load the alternate variant of the SYSTEM hive, aka SYSTEM.ALT.

If FreeLdr repairs the hive with a LOG, it marks it with HBOOT_BOOT_RECOVERED_BY_HIVE_LOG in the BootRecover field of the header. All the recovered data that is present as dirty in memory has to be flushed by the kernel once it is in charge of the system.
Otherwise, if the system booted by loading SYSTEM.ALT instead, FreeLdr marks it with HBOOT_BOOT_RECOVERED_BY_ALTERNATE_HIVE, and the kernel starts recovering the main hive as soon as it does any I/O on it.
The newly implemented code for registry recovery makes the FreeLdr binary grow
in size, to the point that it would BSOD because the PE image is too big.
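
The bootloader-side fallback chain (primary hive, then LOG repair, then SYSTEM.ALT) can be sketched as a pure decision function. The BootRecover names are from the text; their numeric values, `system_hive_probe`, `load_outcome` and `choose_system_hive` are assumptions for illustration, and the probe results are supplied by the caller instead of being computed from real files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* BootRecover identifiers named in the text; numeric values are assumed here. */
#define HBOOT_NO_BOOT_RECOVER                   0u
#define HBOOT_BOOT_RECOVERED_BY_HIVE_LOG        1u
#define HBOOT_BOOT_RECOVERED_BY_ALTERNATE_HIVE  2u

/* Hypothetical results of probing the files on disk. */
typedef struct system_hive_probe {
    bool primary_is_valid;     /* SYSTEM passed validation as-is */
    bool log_repair_succeeds;  /* applying the SYSTEM log yields a valid hive */
    bool alternate_is_valid;   /* SYSTEM.ALT passed validation */
} system_hive_probe;

typedef struct load_outcome {
    bool loaded;               /* false means no usable SYSTEM hive was found */
    uint32_t boot_recover;     /* value to store in the BootRecover field */
} load_outcome;

/* Try the cheapest option first and fall back step by step. */
static load_outcome choose_system_hive(const system_hive_probe *probe)
{
    assert(probe != NULL);

    load_outcome outcome = { false, HBOOT_NO_BOOT_RECOVER };

    if (probe->primary_is_valid) {
        outcome.loaded = true;
    } else if (probe->log_repair_succeeds) {
        outcome.loaded = true;
        outcome.boot_recover = HBOOT_BOOT_RECOVERED_BY_HIVE_LOG;
    } else if (probe->alternate_is_valid) {
        outcome.loaded = true;
        outcome.boot_recover = HBOOT_BOOT_RECOVERED_BY_ALTERNATE_HIVE;
    }
    return outcome;
}
```

Keeping the decision separate from the file I/O makes each branch of the chain easy to exercise in user mode.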

For now we have to temporarily disable any of the newly added code, until
either FreeLdr is split into a basic PE bootloader image itself and a
"FreeLdrlib" that is used by the PE image to access various bootloader APIs
or another proper solution is found.
…covered by FreeLdr

If FreeLdr performed recovery against the SYSTEM hive with a log, all of the recovered data is present only in volatile memory and is thus dirty. So the kernel is responsible for flushing all the data that has been recovered within the SYSTEM hive to the backing storage.
…rrupt

As we iterate over the chunk hive data pointer for hive bins that we are going
to enlist, we might encounter one or several bins that got corrupted when a
registry write operation was aborted prematurely, for example due to
a power outage of the system, hardware malfunction, etc.

Corruption at the level of hive bins is nasty because they contain actual cell
data of registry information such as keys, values etc. Assuming a bin is corrupt
only in part, we can fix it by recovering some of the bin properties that, theoretically,
can be fixed -- namely the signature, size and offset.

For size and offset we are more or less safe because a bin typically has the size
of a block, and the offset is the coordinate index of where a hive bin should lie.
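
A minimal sketch of that header repair follows. The "hbin" signature and the 4096-byte block size are properties of the registry file format; the simplified `toy_bin_header` layout and `heal_bin_header` are hypothetical and only model the three properties mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HBLOCK_SIZE    0x1000u       /* hive block size: 4096 bytes */
#define HBIN_SIGNATURE 0x6E696268u   /* "hbin" read as a little-endian 32-bit value */

/* Simplified bin header: only the three repairable properties. */
typedef struct toy_bin_header {
    uint32_t signature;
    uint32_t file_offset;   /* where the bin sits relative to the hive data */
    uint32_t size;          /* expected to be a non-zero multiple of HBLOCK_SIZE */
} toy_bin_header;

/*
 * Repair whatever can be inferred. expected_offset is the running position
 * of the bin while iterating over the hive data. Returns true if any field
 * had to be fixed.
 */
static bool heal_bin_header(toy_bin_header *bin, uint32_t expected_offset)
{
    assert(bin != NULL);

    bool fixed = false;

    if (bin->signature != HBIN_SIGNATURE) {
        bin->signature = HBIN_SIGNATURE;
        fixed = true;
    }
    if (bin->file_offset != expected_offset) {
        bin->file_offset = expected_offset;
        fixed = true;
    }
    if (bin->size == 0 || bin->size % HBLOCK_SIZE != 0) {
        bin->size = HBLOCK_SIZE;    /* fall back to the typical one-block bin */
        fixed = true;
    }
    return fixed;
}
```

Falling back to a one-block size is the guess described in the text; it cannot recover the cell data inside the bin, which still has to be validated separately.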
Repairing a broken hive with a hive log does not always guarantee that the hive
in question has fully recovered. In the worst case the LOG itself is corrupt
too, which would certainly lead to a totally unbootable system. This is most likely
if the victim hive is the SYSTEM hive.

This can be solved with the help of a mirror hive, also called an "alternate hive".
Alternate hives serve as backups for primary hives whose loss is
a risk not worth taking. For now only the SYSTEM hive is granted the right to have
a backup alternate hive.

=== NOTE ===

Currently the SYSTEM hive can only rely on the alternate SYSTEM.ALT hive, which means the
corresponding LOG file never gets updated. When the time comes, the existing code must be adapted
to allow .ALT and .LOG hives to be used simultaneously.
@GeoB99 GeoB99 merged commit f3141fb into reactos:master Nov 19, 2023
37 checks passed
ReactOS PRs automation moved this from New PRs to Done Nov 19, 2023
@GeoB99
Member Author

GeoB99 commented Nov 19, 2023

It's been finally merged. I want to thank @mrmks04, @HBelusca, testers and everybody who helped in and made this possible!

@GeoB99 GeoB99 deleted the cm-check-registry branch November 19, 2023 20:07
@JoachimHenze
Contributor

Attention: This PR introduced regression:
https://jira.reactos.org/browse/CORE-19337

@binarymaster binarymaster removed this from Done in ReactOS PRs Feb 29, 2024
Labels
enhancement For PRs with an enhancement/new feature. freeldr Freeloader changes kernel&hal Code changes to the ntoskrnl and HAL
Projects
None yet
5 participants