PESpy is a C#/PowerShell library for reverse engineering, analyzing and visualizing Microsoft compiler-generated file formats.
Given a file, PESpy aims to:
- Understand the meaning of every single byte within that file
- Support parsing all known entities, no matter how obscure
- Minimize abstractions, and mirror native type names wherever possible
- Be highly performant while still being ergonomic. Allocations should be kept as low as possible!
- Support all known symbol formats: COFF, OMF, CodeView, SYM, DBG, PDB files, DNDRB, NB00-NB10, RSDS. If symbols exist, PESpy will read them and show them to you
- Unironically provide information at your fingertips. The entire file hierarchy is exposed via properties: simply open a file and poke around in the Locals window
- Support reading PE files out of a remote debug target where the size of the PE file isn't known upfront
- Provide tools for performing various file operations, including:
  - Detecting file types
  - Locating symbol files (no more `symsrv.dll`!)
  - Resolving RPC servers
  - Manipulating symbol keys
  - Parsing vftables
  - Undecorating symbol names
  - Reading and decompressing files contained in Windows installation media
- Be highly NativeAOT friendly
PESpy can work with the following file types:
| Name | Description |
|---|---|
| PE | Portable Executable files, first seen in Windows NT 3.1 |
| PDB | "Old Style" (JG 1.0), MSF (JG 2.0, DS 7.0) and Portable PDB files |
| OBJ | Principally `*.obj` files, but strictly speaking anything that uses COFF (such as `*.exp`, `*.iobj`, etc.) can be opened |
| DOS | Simple DOS files with an `IMAGE_DOS_HEADER` and possibly trailing CodeView data |
| NE | 16-bit New Executable files, as seen in 16-bit Windows and, to a lesser extent, in Windows 9x |
| LE | 32-bit Linear Executable files; specifically, the format used by VxD driver files |
| DBG | COFF-based files containing debug metadata that has been split out of the main executable file |
| LIB | COFF-based archive libraries used by the linker, which may contain embedded object files |
| OMF | `*.obj` files emitted by older, DOS-era compiler toolchains that use the Object Module Format, a precursor to COFF |
| OMFLIB | `*.lib` files emitted and consumed by older, DOS-era compiler toolchains that use OMF |
| OMFDBG | An older style `*.dbg` file whose entire contents are the raw OMF-style CodeView section |
| SYM | `*.sym` files generated by `mapsym.exe` or by the compiler from parsing a `*.map` file |
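Many of the formats in the table above can be told apart by a signature at a fixed offset. The following is a minimal sketch of that idea, not PESpy's own detection logic: the `SketchFileKind` enum and `FileSniffer` class are hypothetical names, and the sketch only covers a subset of the formats (it makes no attempt to recognize COFF objects, OMF or SYM files).

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical result type for this sketch; not part of PESpy
public enum SketchFileKind { Unknown, Dos, PE, NE, LE, Pdb, PortablePdb, Lib, Dbg }

public static class FileSniffer
{
    public static SketchFileKind Detect(ReadOnlySpan<byte> data)
    {
        // MSF-based PDBs begin with an ASCII banner: "program database 2.00" (JG) or "MSF 7.00" (DS)
        if (data.StartsWith("Microsoft C/C++ MSF 7.00"u8) ||
            data.StartsWith("Microsoft C/C++ program database 2.00"u8))
            return SketchFileKind.Pdb;

        // Portable PDBs are ECMA-335 metadata blobs, whose root signature is "BSJB"
        if (data.StartsWith("BSJB"u8))
            return SketchFileKind.PortablePdb;

        // COFF archive (*.lib) signature
        if (data.StartsWith("!<arch>\n"u8))
            return SketchFileKind.Lib;

        // IMAGE_SEPARATE_DEBUG_HEADER.Signature is 0x4944, i.e. "DI" on disk
        if (data.StartsWith("DI"u8))
            return SketchFileKind.Dbg;

        // The remaining formats handled here all start with an IMAGE_DOS_HEADER ("MZ")
        if (!data.StartsWith("MZ"u8))
            return SketchFileKind.Unknown;

        // e_lfanew lives at offset 0x3C; without a usable value this is treated as a plain DOS file
        if (data.Length < 0x40)
            return SketchFileKind.Dos;

        int lfanew = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(0x3C, 4));
        if (lfanew <= 0 || lfanew >= data.Length)
            return SketchFileKind.Dos;

        // The signature at e_lfanew identifies the "new" header format
        ReadOnlySpan<byte> header = data.Slice(lfanew);
        if (header.StartsWith("PE\0\0"u8)) return SketchFileKind.PE;
        if (header.StartsWith("NE"u8)) return SketchFileKind.NE;
        if (header.StartsWith("LE"u8)) return SketchFileKind.LE;
        return SketchFileKind.Dos;
    }
}
```

Because it works on a `ReadOnlySpan<byte>`, the check can run over just the first few kilobytes of a file without allocating; in that case an `e_lfanew` pointing past the buffer is reported as `Dos` by this sketch, so a real implementation would need to read further into the file.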
```powershell
Install-Package PESpy
```

PESpy is available on both nuget.org and the PowerShell Gallery. PESpy provides targets for both .NET 9.0 and .NET Standard, and is SourceLink compatible. In order to install PESpy from the PowerShell Gallery you must be running PowerShell 5.1+. PESpy is compatible with both Windows PowerShell and PowerShell Core.
PESpy's major selling point is that, wherever it can, it shows you the true shape of the data that resides within a file. The following snippets show the various entry points to PESpy's key functionality. For extremely thorough documentation on all that PESpy has to offer, please see the wiki.
```csharp
/* Retrieving imports in native code involves traversing the IMAGE_IMPORT_DESCRIPTOR entities, resolving various RVAs,
 * traversing a list of IMAGE_THUNK_DATA entities, checking various bit fields and resolving even more RVAs, before
 * finally retrieving the strings you're after. That is what the data looks like. PESpy provides many mechanisms to
 * simplify complex lookups, but it will never hide the underlying shape of the data to "make it easy" */
using var peFile = PEFile.FromFile("C:\\Windows\\system32\\kernel32.dll");

ImageImportDescriptor[]? importTable = peFile.ImportTable;

if (importTable != null)
{
    foreach (var imageImportDescriptor in importTable)
    {
        /* Any field that is an RVA to another entity is modelled as a field of type RVA<T>. This type
         * provides access to the original RVA that was listed in the field, whether the RVA could actually
         * be resolved to a valid address, and the actual value that was read from that address */
        RVA<AnsiString> dllName = imageImportDescriptor.Name;

        if (!dllName.IsValid)
            continue;

        RVA<ImageThunkDataList> originalFirstThunk = imageImportDescriptor.OriginalFirstThunk;

        if (!originalFirstThunk.IsValid)
            continue;

        /* A custom collection type saves us from having to allocate a large array to access all
         * of the thunks in the list. Note that the trailing "null" IMAGE_THUNK_DATA is also included
         * as the last item in this list */
        foreach (ImageThunkData entry in originalFirstThunk.Value)
        {
            //IMAGE_THUNK_DATA is defined as a union of four possible fields. PESpy tries to figure out
            //which logical type the thunk represents, and stores this in an added Kind field
            if (entry.Value == 0)
                continue; //This is the trailing "null" entry which marks the end of this import's thunks

            if (entry.Kind == ImageThunkData.DataKind.Name)
            {
                RVA<ImageImportByName> thunkName = entry.Name;

                if (!thunkName.IsValid)
                    Console.WriteLine($"{dllName}: Invalid Name (0x{thunkName.ListedOffset:X})"); //Print the unresolvable RVA in hex
                else
                    Console.WriteLine($"{dllName}: {thunkName}");
            }
        }
    }
}
```

PESpy's `Locator` class provides a managed implementation of the `LOCATOR` class found in mspdbcore, which also powers DIA.
- `Locator` can locate all kinds of symbols: PDBs (whether regular, Portable, Embedded or NGEN), `*.dbg` files (which may in turn point to `*.pdb` files) and even legacy `*.sym` files
- It knows how to read your symbol path; if `_NT_SYMBOL_PATH` isn't set, it automatically uses a symbol path that includes `msdl.microsoft.com`
- It can download symbols from remote HTTP servers and cascade them down your symbol path
- It provides various entry points for all kinds of different scenarios, with both synchronous and asynchronous modes available
- It allows specifying a callback to receive progress notifications
- It jumps through various hoops to keep allocations as low as possible
- It is fully portable, with zero reliance on `symsrv.dll`
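The symbol path handling described above follows the standard `_NT_SYMBOL_PATH` syntax: entries are separated by `;`, and a `srv*` entry lists stores separated by `*`, searched from left to right, with files found in a later store copied into the downstream stores to its left. The following is a minimal sketch of parsing such a path, not PESpy's implementation; `SymbolPathSketch` and its `DefaultPath` value are hypothetical, and real symbol paths support more forms (such as `cache*` and `symsrv*`) than are handled here.

```csharp
using System;
using System.Collections.Generic;

public static class SymbolPathSketch
{
    // Hypothetical fallback used when _NT_SYMBOL_PATH is unset; PESpy's real default may differ
    public const string DefaultPath = "srv*https://msdl.microsoft.com/download/symbols";

    public static string GetSymbolPath()
    {
        string? path = Environment.GetEnvironmentVariable("_NT_SYMBOL_PATH");
        return string.IsNullOrWhiteSpace(path) ? DefaultPath : path;
    }

    // Returns one array per ';'-separated entry. For "srv*" entries the array holds the
    // stores in search order (left to right); plain directories yield a single element
    public static List<string[]> Parse(string symbolPath)
    {
        var entries = new List<string[]>();
        var options = StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries;

        foreach (string entry in symbolPath.Split(';', options))
        {
            if (entry.StartsWith("srv*", StringComparison.OrdinalIgnoreCase))
                entries.Add(entry.Substring(4).Split('*', options));
            else
                entries.Add(new[] { entry });
        }

        return entries;
    }
}
```

For example, `srv*C:\sym*https://msdl.microsoft.com/download/symbols` parses to a single entry whose first store is the local cache `C:\sym`, which is where a file downloaded from the HTTP server would be cascaded to.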
```csharp
var pdbPath = Locator.LocatePDB("C:\\Windows\\system32\\ntdll.dll");
```

`Locator` is such a small part of PESpy's surface area, but I'm amazed how often I use it; it has surprisingly become one of PESpy's best features for me!
```csharp
/* PEFile provides various members (SymStoreKeys, GetSymStoreKey()) that provide identifiers for files
 * that you can look up on a symbol server. If you're writing unit tests for a diagnostic application that analyzes
 * a certain DLL, you can potentially "bookmark" that DLL by hardcoding its SymStoreKey, and then have your test re-download
 * that file as needed so that your test always produces the same result! */
var key = new SymStoreKey("coreclr.pdb/75099299D3D948A68B594FC4439DFA521/coreclr.pdb");

var pdbPath = Locator.LocatePDB(key);
```
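The key in the snippet above follows the usual symbol server layout for PDBs: the PDB name, then the PDB's GUID as 32 hex digits immediately followed by its age in hex, then the PDB name again. As a minimal sketch (the `SymStoreKeySketch` class is hypothetical and is not PESpy's `SymStoreKey` API), such a key can be built from a GUID and age, such as those stored in a PE file's RSDS CodeView record:

```csharp
using System;

public static class SymStoreKeySketch
{
    // Layout: "<name>/<GUID as 32 uppercase hex digits, no dashes><age in hex>/<name>"
    // Using uppercase hex digits for the age is an assumption of this sketch
    public static string ForPdb(string pdbName, Guid guid, uint age) =>
        $"{pdbName}/{guid.ToString("N").ToUpperInvariant()}{age:X}/{pdbName}";
}
```

Calling `SymStoreKeySketch.ForPdb("coreclr.pdb", new Guid("75099299-D3D9-48A6-8B59-4FC4439DFA52"), 1)` reproduces the `coreclr.pdb` key used in the snippet above.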
```csharp
/* The PDBFile class provides access to every single piece of functionality you might see in an MSF-based PDB file.
 * Every hash, every lookup, every struct since the introduction of MSF in Visual C++ 2.0 (1994) */
using var pdbFile = PDBFile.FromFile(pdbPath);

/* The native representation of a symbol is a SYMTYPE*. SymType is a zero-cost abstraction over a pointer, but unlike
 * a native SYMTYPE*, SymType uses insane debugger magic to show you all of the symbol's fields in the Locals window
 * without you having to write any code */
foreach (SymType symType in pdbFile.EnumerateSymbols())
{
    /* A SymType can be cast to a more specific symbol type (e.g. ProcSym32) based on the `SYM_ENUM_e` value of its `rectyp`,
     * or you can use extension methods that replicate the behavior of the various getters seen on `IDiaSymbol` */
    if (symType.TryGetFramePointerPresent(out var framePointerPresent))
    {
        //S_GPROC32 records are PROCSYM32 structures, which are modelled by ProcSym32
        if (symType.rectyp == SYM_ENUM_E.S_GPROC32)
        {
            var procSym32 = (ProcSym32) symType;

            /* Modern PDBs contain UTF-8 null-terminated strings, but older PDBs use length-prefixed "ST" strings.
             * PESpy can use magic to figure out what the expected string format is, or you can just provide the PDBFile.
             * ProcSym32's "name" property provides easy access to the symbol's name, but for high-performance access
             * you'll want to use the GetName method */
            SymString name = procSym32.GetName(pdbFile);
        }
    }
}
```

```csharp
/* In two lines of code, you can visualize the entire contents of a file: view all sections, the regions
 * within those sections, how code and data intertwine, and the xrefs between everything. Explore
 * the entire structure of a file right from within your debugger. Query offsets, RVAs and VAs to find
 * exactly what is located at that address. Strings are automatically detected, and an interface is provided
 * to facilitate tagging disassembled code */
using var peFile = PEFile.FromFile("C:\\Windows\\system32\\kernel32.dll");

/* Unless you say otherwise, GetView will automatically attempt to download symbols,
 * so the first time you call this you may need to wait while symbols are downloaded.
 * Specify a progress callback to be notified of what is going on. See the wiki for
 * more information on interfacing with views */
var view = peFile.GetView();
```

For much more information on using PESpy, please see the wiki.