Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define UNICODE and _UNICODE preprocessors for windows #6338

Merged
merged 11 commits into from Jul 24, 2020
Merged

Define UNICODE and _UNICODE preprocessors for windows #6338

merged 11 commits into from Jul 24, 2020

Conversation

farfella
Copy link
Contributor

Define UNICODE and _UNICODE preprocessors. This will result in Windows API function macros defaulting to the wide implementation instead of the ANSI (local) implementations. This PR ought to be the final PR in the line of PRs that aim to resolve internationalization/localization issues observed in osquery on Windows.

This should resolve #2569 from 2016. :-)

Please encourage future PR submitters to not call the *A functions but the *W functions instead.

@directionless directionless added this to the 4.2.1 milestone Mar 31, 2020
@Smjert Smjert modified the milestones: 4.3.0, 4.3.1 Apr 9, 2020
@farfella
Copy link
Contributor Author

@Breakwell if you could please review the changes to windows\process.cpp I made, would appreciate it. :-) I think you had also made changes recently around the code I updated here.

@theopolis
Copy link
Member

This is great! I'll bring up the performance concern again. :D

Is there a way to compare the before / after:

» time osqueryi --profile 1  --disable_extensions "select * from processes"

Heads up I am not sure the Windows equivalent of time is.

Also @alessandrogario, do you have time to help test performance impact to windows_events usage on a system under minor load?

@farfella
Copy link
Contributor Author

farfella commented Apr 26, 2020

Is there a way to compare the before / after:

» time osqueryi --profile 1  --disable_extensions "select * from processes"

With powershell you can do something like

Measure-Command {your command statement}

Performance-wise, these changes will be the same or better than what currently exists in master. Reasoning is below:

Existing case. Let's say, a piece of osquery code calls OS's CreateFileA(S, ...). CreateFileA will performW = MultiByteToWideChar(GetACP(), A, ...) and then call CreateFileW(W, ...).

So in pseudo-code simplification,

void OurCode()
{
      const std::string S = "Hello, world!";
      CreateFileA(S.c_str());
      ...
}

HANDLE CreateFile(A, ...)
{
    W = MultiByteToWideChar(CP_ACP, A);
    return CreateFileW(W, ...);
}

What the update does. We perform MultiByteToWideChar(CP_UTF8, A, W, ...), and then call CreateFileW(W).

void OurCode()
{
    std::string S = "Hello, world!";
    W = MultiByteToWideChar(CP_UTF8, S.c_str());*
    CreateFileW(W,...);
    ...
}

So, these should two are at least equal in performance (ignoring CP_ACP/CP_UTF8 parameter difference).

Why the updated code may be better* performance-wise: In the case of ANSI Code Page (ACP), the system is performing localization table lookups, whereas converting from UTF-16 to UTF-8 or reverse is a functional transformation as per RFC 2279. So, it depends on whether the localization table (for ACP) is on disk or loaded in memory, and if it's from disk, IO will make it slower than functional transformation between UTF-16 and UTF-8.

@theopolis
Copy link
Member

My performance concern is with the added string conversions, stringToWstring and wstringToString will at least double the memory impact for these functions. I'd love to see some of the more critical parts of code measured before and after to see if there's a noticeable change.

@farfella
Copy link
Contributor Author

Understood. And I agree-- let's perform some experiments to determine performance impact of these changes.

@directionless
Copy link
Member

I assume this is still in limbo.

@directionless directionless modified the milestones: 4.4.0, 4.5.0 May 27, 2020
@farfella
Copy link
Contributor Author

@directionless Yes, that's correct. I do not have enough sample data to perform comparative measurements. I think @theopolis wanted @alessandrogario to perform some tests, if he had time available. Conceptually, the string conversions ought to not double memory impact because we're doing this conversion explicitly now whereas before this conversion was happening implicitly when the ANSI version of the functions (*A) were performing conversion and then calling the Wide (*W) and then reconverting the string back to ANSI. You can observe by disassembling the *A function's implementation to see that it's performing MultiByteToWideChar, then calling *W version, and then calling WideCharToMultiByte.

@theopolis
Copy link
Member

Ok, I'll take another detailed pass at the changes here and if everything looks OK we can merge.

@farfella
Copy link
Contributor Author

farfella commented Jul 9, 2020

Great! Thanks @theopolis! Sorry, I've been a bit busy on my end with our own releases in July, so I haven't had a chance to look over any changes in osquery.

@theopolis
Copy link
Member

Great! Thanks @theopolis! Sorry, I've been a bit busy on my end with our own releases in July, so I haven't had a chance to look over any changes in osquery.

No worries, thanks for all the work thus far here! I can take care of rebasing the PR and resolving any conflicts that show up.

Copy link
Member

@theopolis theopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions, then this looks good!

osquery/process/windows/process.cpp Outdated Show resolved Hide resolved
@theopolis
Copy link
Member

Is there a way to prevent the use of *A functions?

@farfella
Copy link
Contributor Author

Is there a way to prevent the use of *A functions?

Haha, if it were that easy. :) I do not know of any way. However, what this PR does is if a programmer does not explicitly specify A suffix, e.g., just calls CreateFile, this will now be defined as CreateFileW. So, when reviewing, someone explicitly uses an *A function it will now stick out like a sore thumb.

@theopolis
Copy link
Member

@farfella, thanks for the insight/details. With this understanding I feel confident we can land this with the nitpick fills for NULL->nullptr. I am happy to rebase this on master and apply the change if you don't have time this week, let me know.

@farfella
Copy link
Contributor Author

I will have some time starting Wednesday night this week. Not sure if other *A functions have been added to master recently. I see r["mode"] = TEXT(meta->getMode()); in sleuthkit.cpp, which should have SQL_ prefix. Not sure if there are any others.

@theopolis
Copy link
Member

Right, it would be any code that was added since your original PR. This is my fault for letting the PR sit for so long.

@theopolis theopolis merged commit f79d7e3 into osquery:master Jul 24, 2020
@farfella
Copy link
Contributor Author

Sorry, I had a migraine last night, and didn't get a chance to work on this. I was in fact building right now, as you approved and fixed the remaining merge issues. :) Thank you, @theopolis !

@farfella farfella deleted the define_unicode_preprocessors_for_windows branch August 16, 2020 23:30
@farfella farfella restored the define_unicode_preprocessors_for_windows branch August 16, 2020 23:31
@B3DTech
Copy link

B3DTech commented Jan 19, 2021

Should this have resolved the following that's flooding my Windows event logs? Running 4.5.1

caller=log.go:124 ts=2021-01-19T16:50:26.1264442Z caller=level.go:63 level=info caller=publish_logs.go:179 method=PublishLogs uuid=6e2ccedd-1f13-4ea9-9bcc-37d9611e3eff logType=string log_count=1103 message= errcode= reauth=false err="rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8" took=14.0018ms

caller=log.go:124 ts=2021-01-19T16:52:01.0215442Z caller=level.go:63 level=info caller=log.go:69 component=osquery level=stderr msg="I0119 11:52:01.021148 11232 registry.cpp:558] Failed to expand globs: Failed to open registry handle" caller=registry.cpp:558

@farfella
Copy link
Contributor Author

Hmm not sure. This seems to be a grpc marshalling error. Might not be related to osquery proper.

@directionless
Copy link
Member

The GRPC error is coming from https://github.com/kolide/launcher/, but it launcher isn't doing anything other than trying to marshal the data from osquery. So it's likely indicative of non-utf8 data.

Whether it's this, or an instance of #5288 is hard to know without seeing the logs it's trying to encode.

@B3DTech
Copy link

B3DTech commented Jan 29, 2021

It was determined that there is mis-formatted osquery results in the osquery store from the previous version of osquery, and Launcher is still trying to send that. Removing the C:\Program Files\Kolide\Launcher-so-launcher\data\ directory and restarting the service fixed the issue - no more UTF8 events.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should not use ANSI Windows APIs
5 participants