Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swap minhook for Microsoft Detours #12964

Merged
merged 20 commits into from Jan 10, 2022
Merged

Conversation

LeonarddeR
Copy link
Collaborator

@LeonarddeR LeonarddeR commented Oct 20, 2021

Link to issue number:

Related to #8420

Summary of the issue:

NVDA has been using minhook from the start to hook several Windows APIs, particularly related to the display model. However, the version of minhook currently in use is very old. I tried to update it once, but that failed and was reverted in #8456.

Description of how this pull request fixes the issue:

Microsoft always had their own hooking library, but it was closed source in the past. Now it is open, it offers us the following benefits:

  1. Support for hooking on ARM64, which should make the display model work.
  2. It is well documented and much more widely used and proven than Minhook
  3. Its code is more recent than minhook's.

Testing strategy:

Apart from testing in Alpha, this can be tested by checking whether screen review mode still works in several apps, including Windows Explorer, Notepad and other Win32 apps. Additionaly, it would be good to verify that #8420 does not return. I can't test that myself.

Test display model and general smoke test with:

Known issues with pull request:

None known so far.

Change log entries:

For Developers

  • Switched from Minhook to Microsoft Detours as a hooking library for NVDA. Hooking with this library is mainly used to aid the display model.

Code Review Checklist:

  • Pull Request description:
    • description is up to date
    • change log entries
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • API is compatible with existing add-ons.
  • Documentation:
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English

Copy link
Member

@seanbudd seanbudd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a very useful change - have you tested this with an ARM64 device yet?

readme.md Show resolved Hide resolved
@LeonarddeR
Copy link
Collaborator Author

Have you tested this with an ARM64 device yet?

Afraid not as I don't own such a device.

@lukaszgo1
Copy link
Contributor

Is this work mature enough to be used for testing? I rely pretty heavily on display model and would like to avoid it broken in Alpha if possible hence my question.

@LeonarddeR
Copy link
Collaborator Author

I would really love it if you could test drive the try build for a bit to see how it behaves.

@seanbudd
Copy link
Member

@LeonarddeR - would a signed try build be better for testing this? Happy to create one from this branch when ready.

@LeonarddeR
Copy link
Collaborator Author

LeonarddeR commented Oct 21, 2021

@LeonarddeR - would a signed try build be better for testing this? Happy to create one from this branch when ready.

That would be very helpful indeed!

@seanbudd
Copy link
Member

@zstanecic
Copy link
Contributor

Hi! How this should affect users on w11? How to test this on the user side? what should be tested?

@feerrenrut
Copy link
Contributor

General approach looks good. I'll need to do a more detailed review once successful testing is reported on arm devices. I assume this will also need testing on 32 bit and older Windows releases. A list of the intended test targets would be helpful.

@LeonarddeR
Copy link
Collaborator Author

@seanbudd wrote:

@LeonarddeR @lukaszgo1 Here is the signed try-build and appveyor build link

Thanks, I'll, test this for some days on my Windows 11 machine.

Hi! How this should affect users on w11? How to test this on the user side? what should be tested?

Honestly, the intention is that users don't get affected. It is like changing an essential part of a car to something broader supported without breaking the car.

General approach looks good. I'll need to do a more detailed review once successful testing is reported on arm devices.

I'mnot sure how to do this, as I don't know an ARM device user who can test. How was this done by NV Access in the past? May be @jcsteh?

I assume this will also need testing on 32 bit and older Windows releases. A list of the intended test targets would be helpful.

While this is trictly spoken correct, like Minhook, Detours behaves exactly the same on all versions of Windows, and hooking has never been a problem when a new major version of Windows came out. I think it is enough to get this to Alpha for broader testing as soon as ARM64 tests are performed successfully.

Having said that, something like a system test for display model behaviour could be helpful.

@LeonarddeR LeonarddeR changed the title Proof of concept: Swap minhook for Microsoft Detours Swap minhook for Microsoft Detours Oct 22, 2021
@zstanecic
Copy link
Contributor

zstanecic commented Oct 22, 2021 via email

@jcsteh
Copy link
Contributor

jcsteh commented Oct 22, 2021

General approach looks good. I'll need to do a more detailed review once successful testing is reported on arm devices.

I'mnot sure how to do this, as I don't know an ARM device user who can test. How was this done by NV Access in the past? May be @jcsteh?

I implemented ARM64 support for NVDA as part of a project at Mozilla. I do still have a Mozilla ARM64 machine floating around here which I could probably test on if it still works; I haven't used it in years. I don't know when I could get to this, though.

That said, I've never seen ARM64 API hooking as being particularly important. Display model is mostly useful in legacy apps... but it seems highly unlikely that a legacy app would still be using legacy GDI and yet be recompiled for ARM64. I only say this because if I'm unable to get to this in a reasonable time frame, I would suggest you could ship this with ARM64 support disabled (as we currently have with MinHook) and then deal with ARM64 at some future point.

@jcsteh
Copy link
Contributor

jcsteh commented Oct 22, 2021

I vaguely recall @michaelDCurran might also have an ARM64 machine? I'm not sure.

@LeonarddeR
Copy link
Collaborator Author

Display model is mostly useful in legacy apps... but it seems highly unlikely that a legacy app would still be using legacy GDI and yet be recompiled for ARM64.

We rely on the display model for some Inhouse Windows apps, the management console for example (see the mmc appModule).

Basically I agree that's not super important indeed.

@michaelDCurran
Copy link
Member

michaelDCurran commented Oct 22, 2021 via email

@zstanecic
Copy link
Contributor

zstanecic commented Oct 22, 2021 via email

@lukaszgo1
Copy link
Contributor

That said, I've never seen ARM64 API hooking as being particularly important. Display model is mostly useful in legacy apps... but it seems highly unlikely that a legacy app would still be using legacy GDI and yet be recompiled for ARM64.

There are still parts of MS Office i.e. controls used to edit custom lists in MS Excel which cannot be used without GDI Hooks, Until recently this probably worked on ARM64 as people were using normal 32-bit versions of Office, but some time ago Microsoft released Office which is recompiled for ARM64

I only say this because if I'm unable to get to this in a reasonable time frame, I would suggest you could ship this with ARM64 support disabled (as we currently have with MinHook) and then deal with ARM64 at some future point.

The only currently known improvement from this PR is support for ARM64 therefore merging this without it seems pretty pointless (we don't have any problems with hooking for other architectures after all) and therefore there is no rush with getting this merged.

Having said that the most usages of Windows for ARM64 with NVDA is probably on the new M1 mac computers where people virtualize it. I believe @pitermach does so. Would you be willing to do some tests for us assuming you indeed have such a device with a Windows for ARM64 Vm?

@LeonarddeR
Copy link
Collaborator Author

LeonarddeR commented Oct 22, 2021

@lukaszgo1 wrote:

... there is no rush with getting this merged.

While that is true, it is API breaking at the C++ level and therefore I prefer to have it in an API breaking release. Theoretically, add-ons could use the minhook dll bundled with NVDA for all sort of magic things they really should avoid. Additionally, #11768 was delayed for similar reasons, though availability of newer WinRT headers played a role in there as well.

@LeonarddeR LeonarddeR marked this pull request as ready for review October 22, 2021 09:23
@LeonarddeR LeonarddeR requested a review from a team as a code owner October 22, 2021 09:23
@pitermach
Copy link

Hi,

I tried the try build linked a few comments back inside a Windows 10 ARM virtual machine in UTM running build 21370, NVDA was running in portable mode. Screen Review appeared to work inside the NVDA menu, but was completely blank in other dialogs I looked at (run dialog, the WinVer message box, notepad, and a properties dialog for a file in explorer).

I can try updating this VM to a newer version of Windows (since this build will expire soon anyway) or I can do other tests if you need me to like running this build installed. For the moment I don't think I can try this in Windows 11 because I'm not sure UTM supports a virtual TPM, but I can look into that if possible or point someone who has a parallels license at this thread.

@lukaszgo1
Copy link
Contributor

Screen Review appeared to work inside the NVDA menu,

Just to make sure - how screen review behaves with the normal builds of NVDA when in NVDA menu on ARM?

but was completely blank in other dialogs I looked at (run dialog, the WinVer message box, notepad, and a properties dialog for a file in explorer).

That's pretty bad and debugging this is going to be painful. It looks like QEMU can emulate ARM on x64 so that might be an option assuming it would not be unbearably slow.

@jcsteh
Copy link
Contributor

jcsteh commented Oct 22, 2021

NVDA itself is an x86 process, so screen review has always worked there. Winver, Run dialog, etc. are all native ARM64.

@LeonarddeR, I think you missed the ifdef around gdiHooks_inProcess_initialize in nvdaHelper/remote/inProcess.cpp.

@josephsl
Copy link
Collaborator

josephsl commented Oct 22, 2021 via email

@LeonarddeR
Copy link
Collaborator Author

I got rid of the macro. I would like to remove apiHook_hookFunction as well, e.g. in the header:

template<typename funcType>
bool apiHook_hookFunction_safe(funcType realFunction, funcType fakeFunction, funcType* targetPointerRef);

In the cpp, swap this:

bool apiHook_hookFunction(void* realFunction, void* fakeFunction, void** targetPointerRef) {

with

template<typename funcType>
bool apiHook_hookFunction_safe(funcType realFunction, funcType fakeFunction, funcType* targetPointerRef) {

It compiles, but than fails linking. I have not enough understanding of templating to fix this. Would anyone be able to help out here?

@lukaszgo1
Copy link
Contributor

1. How much memory your machine has

16 GB

2. Whether this is an x86 or x64 process (I assume it is safe to say X64, looking at the name of the executable).

Yes - this is a 64-bit executable

3. How much memory the executable is using at the time of error

574 408 K of ram

4. What OS you're running on the system

Windows 7 X64

5. Whether this is reproducible on another system.

I've tried on two different machines - one with 8 GB of ram running Windows 10, and the second with 6 GB of ram running Windows 7, and this issue cannot be reproduced on any of them.
I'm inclined to say that it is specific to something on my main machine, and while it still would be nice to get to the bottom of it this should not block this PR.

@LeonarddeR
Copy link
Collaborator Author

I'm inclined to say that it is specific to something on my main machine, and while it still would be nice to get to the bottom of it this should not block this PR.

I agree it would be helpful to go to the bottom of this. There are two things we do not know though:

  1. May be Detours really can't hook. Question would be why detours can't and minhook could, but as hooking this app doesn't make much sense in the first place, there's no real problem here.
  2. May be minhook doesn't raise an error even though it can't hook correctly on your machine as well.

I think it would be helpful to have this merged ASAP. If any C++ hero could address my question in #12964 (comment), that would be extremely helpful.

@jcsteh
Copy link
Contributor

jcsteh commented Nov 16, 2021

In the cpp, swap this:

bool apiHook_hookFunction(void* realFunction, void* fakeFunction, void** targetPointerRef) {

with

template<typename funcType>
bool apiHook_hookFunction_safe(funcType realFunction, funcType fakeFunction, funcType* targetPointerRef) {

It compiles, but than fails linking. I have not enough understanding of templating to fix this. Would anyone be able to help out here?

Templated functions are created by the compiler when they're called, with each instantiation of the template effectively created as a separate function. When another module (e.g. gdiHooks) tries to call this, it tries to find the exported function in an external module. Since this specific version of this function was never created in apiHook, you get a linker error.

If you want to do this with just a template, you need to do it in the header file so that all modules can create the templated functions as needed.

@jcsteh
Copy link
Contributor

jcsteh commented Nov 16, 2021

To put this another way, templates are effectively syntactic sugar. They're basically macros, just with a lot more safety.

@LeonarddeR
Copy link
Collaborator Author

Ah, thanks for the great explanation. I guess I'll leave it this way since moving the whole function to the header file, including calls to the logger and detours function is pretty ugly I suppose.

@jcsteh
Copy link
Contributor

jcsteh commented Nov 16, 2021 via email

@lukaszgo1
Copy link
Contributor

I agree it would be helpful to go to the bottom of this. There are two things we do not know though:

It looks like we won't be able to find out - as of today these errors are not occurring anymore across several restarts, even though I haven't done any changes to my machine in the last week.

@lukaszgo1
Copy link
Contributor

Is this PR planned for inclusion in 2022.1? If so perhaps it should be added to the milestone.

@LeonarddeR
Copy link
Collaborator Author

LeonarddeR commented Nov 24, 2021

I think it makes sense to do so as Microsoft seems to intend to expand on their ARM64 interests.

I added the milestone so it can at least be tracked. NV Access can always decide to further delay this pr.

@LeonarddeR LeonarddeR added this to the 2022.1 milestone Nov 24, 2021
nvdaHelper/remote/injection.cpp Outdated Show resolved Hide resolved
@feerrenrut feerrenrut self-assigned this Dec 14, 2021
Copy link
Contributor

@feerrenrut feerrenrut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking pretty good. Happy to discuss whether these changes should be in this PR or a subsequent one.

* a helper template used internally by apiHook_hookFunction_safe
*/
template<typename funcType> funcType _apiHook_hookFunction_tpl(const char* moduleName, const char* functionName, funcType funcSyg, funcType fakeFunction) { return (funcType)apiHook_hookFunction(moduleName,functionName,(void*)fakeFunction); }
bool apiHook_hookFunction(void* realFunction, void* fakeFunction, void** targetPointerRef);

/**
* Safely hooks a given function from a given module with a given fake function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets expand on what makes this "safe", Eg what danger does this avoid?

It seems to me, the only thing this ensures is that the three parameters have matching function signatures.
EG

// the function to be replaced
bool testReal(int x, bool y){
	return false;
}

// a valid replacement
auto *real_func = testReal;
bool testFake(int x, bool y){
	return real_func(x, y);
}

// a broken replacement
auto *real_func2 = testReal;
bool testBroken(int x, float y){
	return false;
}

void doHookTest(){
	auto res = apiHook_hookFunction_safe(testReal, testFake, &real_func);
	// auto res2 = apiHook_hookFunction_safe(testReal, testBroken, &real_func2); // this line causes a compiler error
}

Note I have used auto to get the type of the real function pointer, you could instead do decltype(testReal) *real_func = testReal;. But I think that is overly verbose.

nvdaHelper/remote/apiHook.h Outdated Show resolved Hide resolved

* @param realFunction the name of the function you wish to hook.
* @param fakeFunction the function you wish to be called instead of the original one.
* @param targetPointerRef Pointer to the target pointer to which the detour will be attached.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is accurate, this just seems to be a way to maintain access to the original function (by saving its address). Look at the implementation in nvdaHelper/remote/apiHook.cpp:52, it overrides this with the realFunction.

The same pattern used by Detours: https://github.com/microsoft/Detours/wiki/Using-Detours

Given this, we don't really need the realFunction argument, though it is helpful for type checking, making it harder to mess up.

real_ScriptStringOut=apiHook_hookFunction_safe("USP10.dll",ScriptStringOut,fake_ScriptStringOut);
real_ScriptTextOut=apiHook_hookFunction_safe("USP10.dll",ScriptTextOut,fake_ScriptTextOut);
// Hook needed functions
apiHook_hookFunction_safe(TextOutA, hookClass_TextOut<char>::fakeFunction, &hookClass_TextOut<char>::realFunction);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of these realFunction vars are more complicated than they need to be. I think they should be initialized with the real function rather than NULL

LeonarddeR and others added 2 commits January 6, 2022 09:17
Co-authored-by: Reef Turner <feerrenrut@users.noreply.github.com>
@LeonarddeR
Copy link
Collaborator Author

@feerrenrut I'm not sure what you want me to do with the remaining comments. You're right that the safe part is meant to type check, and that's why I left the real function argument in tact even though it is not strictly necessary any more.

@feerrenrut
Copy link
Contributor

I think the doc strings for those functions should be updated to be more clear about their design (and explaining what safe means), and also the targetPointerRef param. Then in a subsequent PR lets update the definitions of the variables used for the targetPointerRef param. They can just be done with auto *savedRealFunctionTarget = realFunction;

Copy link
Contributor

@feerrenrut feerrenrut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @LeonarddeR

@AppVeyorBot

This comment has been minimized.

@feerrenrut feerrenrut merged commit 1e4f869 into nvaccess:master Jan 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants