Instrumentation output varies across runs #111

Open
MostafaSoliman opened this Issue May 8, 2018 · 8 comments

MostafaSoliman commented May 8, 2018

Hello Ivan,
I was following the msxml fuzzing tutorial at https://symeonp.github.io/all_posts.html. I followed the same steps, but WinAFL always warns me because it sees different paths when executing the same sample. I think this results in false-positive samples being placed in the queue.

To investigate, I used afl-showmap to print the bitmap of the same sample twice and compared the outputs; the comparison is shown in the attachments.

I also ran drrun -t drcov twice on the same sample and loaded the coverage files in IDA to compare them; I didn't see any difference.

This is the code I used, compiled with VS2017 with optimization disabled.


#include <comdef.h>
#include <stdio.h>
#include <stdlib.h>   // mbstowcs
#include <string.h>   // strlen
#include <tchar.h>
#include <windows.h>
#import "MSXML6.dll" rename_namespace(_T("MSXML2"))

extern "C" __declspec(dllexport)  int main(int argc, char** argv);
extern "C" __declspec(dllexport) _bstr_t fuzzme(wchar_t* filename);

// Macro that calls a COM method returning HRESULT value.
#define CHK_HR(stmt)        do { hr=(stmt); if (FAILED(hr)) goto CleanUp; } while(0)

void dump_com_error(_com_error &e)
{
	_bstr_t bstrSource(e.Source());
	_bstr_t bstrDescription(e.Description());

	printf("Error\n");
	printf("\a\tCode = %08lx\n", e.Error());
	printf("\a\tCode meaning = %s\n", e.ErrorMessage());
	printf("\a\tSource = %s\n", (LPCSTR)bstrSource);
	printf("\a\tDescription = %s\n", (LPCSTR)bstrDescription);
}



_bstr_t validateFile(_bstr_t bstrFile)
{
	// Initialize objects and variables.
	MSXML2::IXMLDOMDocument2Ptr pXMLDoc;
	MSXML2::IXMLDOMParseErrorPtr pError;
	_bstr_t bstrResult = L"";
	HRESULT hr = S_OK;

	// Create a DOMDocument and set its properties.
	CHK_HR(pXMLDoc.CreateInstance(__uuidof(MSXML2::DOMDocument60), NULL, CLSCTX_INPROC_SERVER));

	pXMLDoc->async = VARIANT_FALSE;
	pXMLDoc->validateOnParse = VARIANT_TRUE;
	pXMLDoc->resolveExternals = VARIANT_TRUE;

	// Load and validate the specified file into the DOM.
	// And return validation results in message to the user.
	if (pXMLDoc->load(bstrFile) != VARIANT_TRUE)
	{
		pError = pXMLDoc->parseError;

		bstrResult = _bstr_t(L"Validation failed on ") + bstrFile +
			_bstr_t(L"\n=====================") +
			_bstr_t(L"\nReason: ") + _bstr_t(pError->Getreason()) +
			_bstr_t(L"\nSource: ") + _bstr_t(pError->GetsrcText()) +
			_bstr_t(L"\nLine: ") + _bstr_t(pError->Getline()) +
			_bstr_t(L"\n");
	}
	else
	{
		bstrResult = _bstr_t(L"Validation succeeded for ") + bstrFile +
			_bstr_t(L"\n======================\n") +
			_bstr_t(pXMLDoc->xml) + _bstr_t(L"\n");
	}

CleanUp:
	return bstrResult;
}

wchar_t* charToWChar(const char* text)
{
	size_t size = strlen(text) + 1;
	wchar_t* wa = new wchar_t[size];
	mbstowcs(wa, text, size);
	return wa;
}
_bstr_t fuzzme(wchar_t* filename)
{
	_bstr_t bstrOutput = validateFile(filename);
	//bstrOutput += validateFile(L"nn-notValid.xml");
	//MessageBoxW(NULL, bstrOutput, L"noNamespace", MB_OK);
	return bstrOutput;

}
int main(int argc, char** argv)
{
	if (argc < 2) {
		printf("Usage: %s <xml file>\n", argv[0]);
		return 0;
	}

	HRESULT hr = CoInitialize(NULL);
	if (SUCCEEDED(hr))
	{
		try
		{
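			// Free the converted file name so repeated iterations do not leak it.
			wchar_t* wideName = charToWChar(argv[1]);
			_bstr_t bstrOutput = fuzzme(wideName);
			delete[] wideName;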
			//MessageBoxW(NULL, bstrOutput, L"noNamespace", MB_OK);
		}
		catch (_com_error &e)
		{
			dump_com_error(e);
		}
		CoUninitialize();
	}

	return 0;

}

I have seen a similar thread, but it was marked as an issue in the application itself. I don't think that is the case here, as I am following the same steps as the tutorial.

Thanks,
Mostafa

drcov.XMLValidate.exe.11728.0000.proc.log
drcov.XMLValidate.exe.06916.0000.proc.log

afl-showmap
bitmap-compare-2
bitmap-compare-1
afl-fuzz
drrun


symeonp commented May 8, 2018

Hi Ivan, apologies for hijacking this thread, but it looks like someone finally found my blog more or less useful! 😅 (Your comments are more than welcome, though!)

Hey Mostafa, to begin with, from the attached coverage files it looks like you are running on a 64-bit version of Windows:

-- snip --
 32, 524288, C:\Windows\SysWOW64\uxtheme.dll
 33, 536576, C:\Windows\SysWOW64\clbcatq.dll
 34, 356352, C:\Windows\SysWOW64\shlwapi.dll
 35, 1409024, C:\Windows\SysWOW64\msxml6.dll
BB Table: 25077 bbs

My environment is a 32-bit version, including the compiled harness, where I'm getting the following coverage:

-- snip --
C:\Windows\System32\msxml6.dll
 30, 0x74ee0000, 0x74ee9000, 0x74ee1220, 0x000138c1, 0x4a5bdb2b, C:\Windows\System32\version.dll
 31, 0x762c0000, 0x764f5000, 0x762c3b90, 0x00238d09, 0x59548dc1, C:\Windows\System32\iertutil.dll
 32, 0x75be0000, 0x75beb000, 0x75be1992, 0x000126fb, 0x4a5bbf41, C:\Windows\System32\profapi.dll
 33, 0x75bf0000, 0x75c07000, 0x75bf1c9d, 0x0001bf8b, 0x4ce7ba28, C:\Windows\System32\userenv.dll
 34, 0x76500000, 0x767ab000, 0x76502a70, 0x002a6628, 0x595481fb, C:\Windows\System32\wininet.dll
 35, 0x75f50000, 0x7609b000, 0x75f52b80, 0x001412a4, 0x59548100, C:\Windows\System32\urlmon.dll
BB Table: 40745 bbs

Your coverage looks good, honestly, but if you are running a 64-bit version, run the 64-bit versions of drrun and afl-fuzz as well, and generally try not to mix 32-bit and 64-bit (maybe that's the reason you can't see any coverage with the Lighthouse plugin?).

Regarding the samples: before starting the whole fuzzing campaign, see the 'Corpus minimisation' section of the tutorial. Instead of throwing a few hundred files at the target, minimise them and then start your campaign with the minimised cases as the initial seed files. As I mentioned, I had to create this simple bash one-liner (run via Cygwin):

$ for file in *; do printf "==== FILE: $file =====\n"; /cygdrive/c/xmlvalidate.exe $file ;sleep 1; done

because winafl-cmin.py expects all the files to lead to the same path (that is, either 'Validation succeeded' or 'Validation failed'). As such, start with seed files that either all succeed or all fail; don't mix the test cases! In short: follow the KISS (Keep It Simple, Stupid) principle and run the debug version with one XML file that passes validation:

C:\winafl\bin32>C:\DRIO6\bin32\drrun.exe -c winafl.dll -debug -coverage_module msxml6.dll -target_module xmlvalidate.exe
 -target_method main -fuzz_iterations 10 -nargs 2 -- C:\xmlvalidate.exe C:\xml_samples\nn-valid.xml

[+] Validation succeeded
[+] Validation succeeded
[+] Validation succeeded
[+] Validation succeeded
[+] Validation succeeded
[+] Validation succeeded
[+] Validation succeeded
[+] Validation succeeded
[+] Validation succeeded
[+] Validation succeeded

Notice how the harness was indeed executed 10 times and correctly hit the succeeded message each time.
Now if I open the .log file from WinAFL, I get:

-- snip --
Module loaded, bcrypt.dll
Module loaded, MSXML6.dll
Instrumenting MSXML6.dll with the 'bb' mode.      <=== That's what you're looking for
In OpenFileW, reading C:\xml_samples\nn-valid.xml
Module loaded, VERSION.dll
Module loaded, iertutil.dll
Module loaded, profapi.dll
Module loaded, USERENV.dll
Module loaded, WININET.dll
Module loaded, urlmon.dll
-- snip --
Module loaded, bcrypt.dll
Module loaded, MSXML6.dll
In OpenFileW, reading C:\xml_samples\nn-valid.xml
In OpenFileW, reading C:\xml_samples\nn.xsd
In post_fuzz_handler
Everything appears to be running normally.

This is a very good indication that my samples are well defined, and I can continue with the fuzzing scenario.

Also, I realised that the harness is a bit buggy: if you provide a file that doesn't exist, you get back '[+] Validation failed', which it shouldn't, since the file does not exist at all :)
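
One way to avoid that false 'Validation failed' is to check that the input can be opened before handing it to the parser. The following is only a sketch: `file_is_readable` is a hypothetical helper name, not part of the harness in this thread, and it uses the standard library rather than any Win32 or MSXML call.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of the original harness):
// returns true if the file at `path` can be opened for reading.
bool file_is_readable(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return in.good();
}
```

The harness could call this on argv[1] and report a missing file separately, so that only files that actually reached the parser are counted as validation failures.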

Hope that helps a bit and good luck!

Edit: I always prefer to provide absolute paths rather than relative ones. From your last command, I'm assuming that both XMLValidate.exe and test.xml are located in d:\fuzzing\prog.

ivanfratric commented May 8, 2018

Hey Mostafa,

That's a nice analysis. One thing to note about drcov is that it only records a basic block once (when that basic block is executed for the first time), unlike AFL/WinAFL, which keep a counter. Looking at the output of showmap, it's possible some basic blocks were executed a different number of times during different runs. This would explain the different showmap output and the identical drcov output.
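
A minimal sketch of that difference, under the assumption that a run can be modelled as a plain list of basic-block addresses. The `Trace` type and both function names are made up for illustration and are not taken from drcov or WinAFL.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical model of one run: the sequence of basic-block addresses hit.
using Trace = std::vector<std::uint32_t>;

// drcov-style view: each block is recorded once, however often it ran.
std::set<std::uint32_t> unique_blocks(const Trace& trace) {
    return std::set<std::uint32_t>(trace.begin(), trace.end());
}

// Counter-style view: every block carries the number of times it was hit.
std::map<std::uint32_t, unsigned> hit_counts(const Trace& trace) {
    std::map<std::uint32_t, unsigned> counts;
    for (std::uint32_t block : trace) {
        ++counts[block];
    }
    return counts;
}
```

Two runs that visit the same blocks, with one block looping a different number of times, produce equal results from `unique_blocks` and different results from `hit_counts`.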

Unfortunately, it's difficult to tell more without knowing which basic blocks in which function(s) were executed a different number of times and why. If you want to pursue this further, that would be interesting information to have. Comparing those variable blocks with what you have in the drcov logs could help you identify them. This is how WinAFL calculates the counter offset for a block: https://github.com/ivanfratric/winafl/blob/master/winafl.c#L275-L276
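
To find the variable blocks, the two showmap dumps can be diffed entry by entry. The sketch below assumes each dump is text with one `offset:count` pair per line in decimal; that format, the `Bitmap` type, and the function names are assumptions for illustration, so adjust the parsing to whatever your afl-showmap build actually writes.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Bitmap = std::map<std::uint32_t, unsigned>;

// Parse lines of the assumed form "offset:count" (decimal); skip anything else.
Bitmap parse_bitmap(const std::string& text) {
    Bitmap bitmap;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        try {
            const auto offset = static_cast<std::uint32_t>(std::stoul(line.substr(0, colon)));
            const auto count = static_cast<unsigned>(std::stoul(line.substr(colon + 1)));
            bitmap[offset] = count;
        } catch (const std::exception&) {
            // Not a numeric pair; ignore the line.
        }
    }
    return bitmap;
}

// Offsets whose counters differ between two runs (a missing entry counts as 0).
std::vector<std::uint32_t> differing_offsets(const Bitmap& a, const Bitmap& b) {
    std::vector<std::uint32_t> result;
    for (const auto& entry : a) {
        const auto it = b.find(entry.first);
        const unsigned other = (it == b.end()) ? 0u : it->second;
        if (other != entry.second) result.push_back(entry.first);
    }
    for (const auto& entry : b) {
        if (a.find(entry.first) == a.end() && entry.second != 0) {
            result.push_back(entry.first);
        }
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

The offsets returned are the entries worth mapping back to basic blocks in the drcov logs.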

MostafaSoliman commented May 8, 2018

Thanks Symeonp, I will try to unify everything to either 64-bit or 32-bit. By the way, the blog is awesome; I am sad it only contains one tutorial :)

Thanks Ivan. If unifying the architecture doesn't solve it, I will dig deeper by inspecting the bitmap entries that were executed a different number of times.

MostafaSoliman commented May 8, 2018

Hi Ivan,
I thought the numbers in the bitmap file were addresses of the executed instructions, but after a closer look they are not. What are they, and how can I map them to the instructions executed? Thanks

ivanfratric commented May 9, 2018

In the basic block coverage mode (default), the first number is (relative virtual address of basic block)%65536; see the lines of code I linked in the previous comment. The second number is the counter (how many times the basic blocks corresponding to the first number have been executed).

If you want to get actual addresses, you can increase the MAP_SIZE in these places
https://github.com/ivanfratric/winafl/blob/ba9c460821aee5689216da100a67ad1c235475a5/config.h#L322
https://github.com/ivanfratric/winafl/blob/c37a71b4e24a5ed4e303395c6562db5119e46038/winafl.c#L25
to something larger than the size of coverage_module
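
The offset arithmetic described in this comment can be sketched as follows. This is an illustration only, not WinAFL's code: the function names and `kDefaultMapSize` are made up here, and only the modulus 65536 comes from the comment above.

```cpp
#include <cstdint>
#include <vector>

// Modulus quoted in the thread; the constant name is an assumption.
constexpr std::uint32_t kDefaultMapSize = 65536;

// Map offset for a basic block at relative virtual address `rva`.
std::uint32_t map_offset(std::uint32_t rva,
                         std::uint32_t map_size = kDefaultMapSize) {
    return rva % map_size;
}

// Every RVA inside a module of `module_size` bytes that lands on `offset`.
// With a map smaller than the module, several blocks can share one entry.
std::vector<std::uint32_t> candidate_rvas(std::uint32_t offset,
                                          std::uint32_t module_size,
                                          std::uint32_t map_size = kDefaultMapSize) {
    std::vector<std::uint32_t> rvas;
    if (map_size == 0 || offset >= map_size) return rvas;
    for (std::uint64_t rva = offset; rva < module_size; rva += map_size) {
        rvas.push_back(static_cast<std::uint32_t>(rva));
    }
    return rvas;
}
```

For a module of 1409024 bytes (the size listed for msxml6.dll in the drcov log quoted earlier), an offset can correspond to roughly twenty different RVAs with the default map size. Once the map size exceeds the module size, each offset has a single candidate, which is why enlarging MAP_SIZE yields actual addresses.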

yoava333 commented May 30, 2018

Another thing to watch for is that on Windows 10 under WoW64, the stack alignment the process starts with is not % 8 but % 4, which causes memcpy to execute different paths (AVX/SSE instructions require alignment). I start each harness by aligning the stack.

MostafaSoliman commented May 30, 2018

@yoava333 what is the best way to do that?

yoava333 commented May 30, 2018

I use something like this:

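#include <malloc.h>  // alloca
#include <stdio.h>   // printf

// Real harness entry point; noinline keeps it in its own stack frame.
int __declspec(noinline) _main(int argc, char * argv[]) {
	int a = 0;

	printf("stack alignment = %d\n", (int)(((size_t)&a) % 8));

	return 1;
}

int main(int argc, char * argv[]) {
	size_t a = 0;

	// If this local is not 8-byte aligned, take 4 more bytes of stack
	// before calling the real harness.
	if (((size_t)&a) % 8 != 0) {
		printf("alignment != 8\n");
		alloca(4);
	}
	else {
		printf("alignment == 8\n");
	}

	return _main(argc, argv);
}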