lordmilko/PESpy

PESpy

PESpy is a C#/PowerShell library for reverse engineering, analyzing and visualizing Microsoft compiler generated file formats.

Given a file, PESpy aims to

  • Understand the meaning of every single byte within that file
  • Support parsing all known entities, no matter how obscure
  • Minimize abstractions, and mirror native type names wherever possible
  • Be highly performant while still being ergonomic. Allocations need to be as low as possible!
  • Support all known symbol formats; COFF, OMF, CodeView, SYM, DBG, PDB files, DNDRB, NB00-NB10, RSDS - if symbols exist, PESpy will read and show them to you
  • Unironically provide information at your fingertips. The entire file hierarchy is exposed via properties; simply open a file, and then poke around in the Locals window
  • Support reading PE Files out of a remote debug target where the size of the PE File isn't known upfront
  • Provide tools for performing various file operations, including
    • Detecting file types
    • Locating symbol files (no more symsrv.dll!)
    • Resolving RPC Servers
    • Manipulating Symbol Keys
    • Parsing vftables
    • Undecorating symbol names
    • Reading and decompressing files contained in Windows installation media
  • Be highly NativeAOT friendly
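To give a feel for what detecting file types involves, here is a minimal sketch that tells the MZ-based formats apart by following `e_lfanew` from the `IMAGE_DOS_HEADER` to the signature of the newer header. `FileKind` and `FileKindSniffer` are hypothetical names made up for this example and are not part of PESpy's API. Formats that do not start with a DOS header (COFF objects, archives, PDBs and so on) are deliberately not handled.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical names for illustration only; none of this is PESpy's API.
public enum FileKind { Unknown, Dos, NE, LE, PE }

public static class FileKindSniffer
{
    private const ushort DosSignature = 0x5A4D; // IMAGE_DOS_SIGNATURE ("MZ")
    private const int DosHeaderSize = 0x40;     // sizeof(IMAGE_DOS_HEADER)
    private const int LfanewOffset = 0x3C;      // offset of IMAGE_DOS_HEADER.e_lfanew

    public static FileKind Detect(ReadOnlySpan<byte> data)
    {
        // Without a complete DOS header starting with "MZ" we cannot say anything
        if (data.Length < DosHeaderSize ||
            BinaryPrimitives.ReadUInt16LittleEndian(data) != DosSignature)
            return FileKind.Unknown;

        int lfanew = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(LfanewOffset));

        // e_lfanew points nowhere useful: treat the file as a plain DOS executable
        if (lfanew <= 0 || lfanew > data.Length - 2)
            return FileKind.Dos;

        ReadOnlySpan<byte> sig = data.Slice(lfanew);

        // "PE\0\0" is four bytes; "NE" and "LE" are two
        if (sig.Length >= 4 && sig[0] == (byte)'P' && sig[1] == (byte)'E' && sig[2] == 0 && sig[3] == 0)
            return FileKind.PE;
        if (sig[0] == (byte)'N' && sig[1] == (byte)'E')
            return FileKind.NE;
        if (sig[0] == (byte)'L' && sig[1] == (byte)'E')
            return FileKind.LE;

        return FileKind.Dos;
    }
}
```

Calling `FileKindSniffer.Detect(File.ReadAllBytes(path))` is enough to try it out. A real detector needs many more checks than this; the sketch only shows the general shape of the problem.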

PESpy is capable of interfacing with the following file types

| Name | Description |
|------|-------------|
| PE | Portable Executable files, first seen in Windows NT 3.1 |
| PDB | "Old Style" (JG 1.0), MSF (JG 2.0, DS 7.0) and Portable PDB files |
| OBJ | Principally we are interested in *.obj files, but strictly speaking anything that uses COFF (such as *.exp, *.iobj, etc) can be opened |
| DOS | Simple DOS files with an IMAGE_DOS_HEADER and possible trailing CodeView data |
| NE | 16-bit New Executable files, as seen in 16-bit Windows and to a lesser extent in Windows 9x |
| LE | 32-bit Linear Executable files; specifically, the format used by VxD driver files |
| DBG | COFF based files containing debug metadata that has been split out of the main executable file |
| LIB | COFF based Archive libraries used by the linker, that potentially contain object files embedded within them |
| OMF | *.obj files emitted by older compiler toolchains from the DOS era that use the Object Module Format, a precursor to COFF |
| OMFLIB | *.lib files emitted and consumed by older compiler toolchains from the DOS era that use OMF |
| OMFDBG | An older style *.dbg file whose entire contents is the raw OMF style CodeView section |
| SYM | *.sym files generated by mapsym.exe or by the compiler from parsing a *.map file |

Installation

Install-Package PESpy

PESpy is available on both nuget.org and PowerShell Gallery. PESpy provides targets for both .NET 9.0 and .NET Standard, and is SourceLink compatible. In order to install PESpy from the PowerShell Gallery you must be running PowerShell 5.1+. PESpy is compatible with both Windows PowerShell and PowerShell Core.

Getting Started

PESpy's major selling point is that, wherever it can, it shows you the true shape of the data that resides within a file. The following snippets show the various entry points to PESpy's key functionality. For extremely thorough documentation on all that PESpy has to offer, please see the wiki.

Enumerate All Imports

/* Retrieving imports in native code involves traversing the IMAGE_IMPORT_DESCRIPTOR entities, resolving various RVAs,
 * traversing a list of IMAGE_THUNK_DATA entities, checking various bit fields, resolving
 * even more RVAs, before finally retrieving the strings you're after. That is what the data looks like. PESpy provides
 * many mechanisms to simplify complex lookups, but it will never hide the underlying shape of the data to "make it easy" */
using var peFile = PEFile.FromFile("C:\\Windows\\system32\\kernel32.dll");

ImageImportDescriptor[]? importTable = peFile.ImportTable;

if (importTable != null)
{
    foreach (var imageImportDescriptor in importTable)
    {
        /* Any field that is an RVA to another entity is modelled as a field of type RVA<T>. This type
         * provides access to the original RVA that was listed in the field, whether the RVA could actually
         * be resolved to a valid address, and the actual value that was read from that address */
        RVA<AnsiString> dllName = imageImportDescriptor.Name;

        if (!dllName.IsValid)
            continue;
            
        RVA<ImageThunkDataList> originalFirstThunk = imageImportDescriptor.OriginalFirstThunk;
        
        if (!originalFirstThunk.IsValid)
            continue;
        
        /* A custom collection type prevents us from having to allocate a large array to access all
         * of the thunks in the section. Note that the trailing "null" IMAGE_THUNK_DATA is also included
         * as the last item in this list */
        foreach (ImageThunkData entry in originalFirstThunk.Value)
        {
            //IMAGE_THUNK_DATA is defined as a union of four possible fields. PESpy tries to figure out
            //which logical type the thunk represents, and stores this in an added Kind field
            
            if (entry.Value == 0)
                continue; //This is the trailing "null" entry which marks the end of this import's thunks
            
            if (entry.Kind == ImageThunkData.DataKind.Name)
            {
                RVA<ImageImportByName> thunkName = entry.Name;
                
                if (!thunkName.IsValid)
                    Console.WriteLine($"{dllName}: Invalid Name (0x{thunkName.ListedOffset})");
                else
                    Console.WriteLine($"{dllName}: {thunkName}");
            }
        }
    }
}
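For readers who have not resolved an RVA by hand, the sketch below shows roughly what "resolving an RVA" and "checking bit fields" mean for a PE file on disk: the RVA is matched against the section table to get a file offset, and the top bit of an `IMAGE_THUNK_DATA` value says whether the import is by ordinal. `SectionInfo` and `RvaMath` are hypothetical names invented for this example. This is not how PESpy's `RVA<T>` is implemented, and it ignores edge cases such as RVAs that fall inside the headers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER; not PESpy's type.
public readonly record struct SectionInfo(
    uint VirtualAddress, uint PointerToRawData, uint SizeOfRawData);

public static class RvaMath
{
    public const ulong ImageOrdinalFlag64 = 0x8000000000000000; // IMAGE_ORDINAL_FLAG64
    public const ulong ImageOrdinalFlag32 = 0x80000000;         // IMAGE_ORDINAL_FLAG32

    // Maps an RVA to an offset in the file on disk, if any section's raw data covers it
    public static bool TryRvaToFileOffset(
        uint rva, IReadOnlyList<SectionInfo> sections, out uint fileOffset)
    {
        foreach (SectionInfo section in sections)
        {
            if (rva < section.VirtualAddress)
                continue;

            uint delta = rva - section.VirtualAddress;

            // Only bytes backed by raw data exist in the file
            if (delta < section.SizeOfRawData)
            {
                fileOffset = section.PointerToRawData + delta;
                return true;
            }
        }

        fileOffset = 0;
        return false;
    }

    // The high bit of a thunk marks an import by ordinal rather than by name
    public static bool IsImportByOrdinal(ulong thunkValue, bool is64Bit)
        => (thunkValue & (is64Bit ? ImageOrdinalFlag64 : ImageOrdinalFlag32)) != 0;
}
```

With a single section at RVA 0x1000 whose raw data starts at file offset 0x400, RVA 0x1010 maps to file offset 0x410. An RVA that no section covers is reported as unresolvable, which is the same situation the `IsValid` checks in the snippet above guard against.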

Locate Symbol Files

PESpy's Locator class provides a managed implementation of the LOCATOR class found in mspdbcore, which also powers DIA.

  • Locator can locate all kinds of symbols; PDBs (be they regular, Portable, Embedded or NGEN), *.dbg files (that may in turn point to *.pdb files) and even legacy *.sym files
  • It knows how to read your symbol path; if _NT_SYMBOL_PATH isn't set, it'll automatically use a symbol path that includes msdl.microsoft.com
  • It can download symbols from remote HTTP servers and cascade them down your symbol path
  • Provides various entry points for all kinds of different scenarios, with both synchronous and asynchronous modes available
  • Allows specifying a callback to receive progress notifications
  • Jumps through various hoops to be as low allocation as possible
  • Fully portable, with zero reliance on symsrv.dll

var pdbPath = Locator.LocatePDB("C:\\Windows\\system32\\ntdll.dll");
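
As background on what "reading your symbol path" entails, the following is a simplified sketch of splitting an `_NT_SYMBOL_PATH` style string into its elements. `SymbolPathEntry` and `SymbolPathSketch` are hypothetical names and this is not Locator's actual implementation. The fallback value is an assumption too: the list above only says the default includes msdl.microsoft.com. The `cache*` and `symsrv*` element forms are not handled.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not PESpy's API.
public sealed record SymbolPathEntry(bool IsSymbolServer, string[] Stores);

public static class SymbolPathSketch
{
    // Assumption: a plausible fallback that points at Microsoft's public symbol server
    public const string AssumedDefault = "srv*https://msdl.microsoft.com/download/symbols";

    public static List<SymbolPathEntry> Parse(string? symbolPath)
    {
        if (string.IsNullOrWhiteSpace(symbolPath))
            symbolPath = AssumedDefault;

        var entries = new List<SymbolPathEntry>();

        // Elements are separated by semicolons
        foreach (string raw in symbolPath.Split(';', StringSplitOptions.RemoveEmptyEntries))
        {
            string element = raw.Trim();

            if (element.Length == 0)
                continue;

            if (element.StartsWith("srv*", StringComparison.OrdinalIgnoreCase))
            {
                // srv*store1*store2*...: earlier stores cache what is found in later ones
                string[] stores = element.Substring(4)
                    .Split('*', StringSplitOptions.RemoveEmptyEntries);
                entries.Add(new SymbolPathEntry(true, stores));
            }
            else
            {
                // Anything else is treated as a plain directory to search
                entries.Add(new SymbolPathEntry(false, new[] { element }));
            }
        }

        return entries;
    }
}
```

`SymbolPathSketch.Parse(Environment.GetEnvironmentVariable("_NT_SYMBOL_PATH"))` returns the entries in search order. Downloading from the last store of an `srv*` element and copying the result into the earlier ones is what "cascading symbols down your symbol path" refers to.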

Locator is only a small part of PESpy's surface area, but I'm amazed how often I use it; it has surprisingly become one of PESpy's best features for me!

Enumerate All Symbols

/* PEFile provides various members (SymStoreKeys, GetSymStoreKey()) that provide identifiers for files
 * that you can look up on a symbol server. If you're writing unit tests for a diagnostic application that analyzes
 * a certain DLL, you can potentially "bookmark" that DLL by hardcoding its SymStoreKey, and then have your test re-download
 * that file as needed so your test always produces the same result! */
var key = new SymStoreKey("coreclr.pdb/75099299D3D948A68B594FC4439DFA521/coreclr.pdb");
var pdbPath = Locator.LocatePDB(key);

/* The PDBFile class provides access to every single piece of functionality you might see in an MSF based PDB File.
 * Every hash, every lookup, every struct since the introduction of MSF in Visual C++ 2.0 (1994) */
using var pdbFile = PDBFile.FromFile(pdbPath);

/* The native representation of a symbol is a SYMTYPE*. SymType is a zero cost abstraction over a pointer, but unlike
 * a native SYMTYPE*, SymType uses insane debugger magic to show you all of the symbol's fields in the Locals window
 * without you having to write any code */
foreach (SymType symType in pdbFile.EnumerateSymbols())
{
    /* A SymType can be cast to a more specific symbol type (e.g. ProcSym32) based on the `SYM_ENUM_e` of its `rectyp`,
     * or you can use extension methods that replicate the behavior of the various getters seen on `IDiaSymbol` */
    if (symType.TryGetFramePointerPresent(out var framePointerPresent))
    {
        if (symType.rectyp == SYM_ENUM_E.S_GPROC32)
        {
            var procSym32 = (ProcSym32) symType;
            
            /* Modern PDBs contain UTF-8 null terminated strings. But older PDBs use length prefixed "ST" strings.
             * PESpy can use magic to figure out what the expected string format is, or you can just provide the PDBFile.
             * ProcSym32's "name" property provides easy access to the symbol's name, but for high performance access
             * you'll want to use the GetName method */
            SymString name = procSym32.GetName(pdbFile);
        }
    }
}
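
The key string used at the top of this example follows the usual symbol server layout of `name/identifier/name`. As a sketch of how such a string is put together: for a PDB the identifier is the GUID as 32 uppercase hex digits followed by the age in hex, and for a PE image it is conventionally the timestamp as 8 uppercase hex digits followed by SizeOfImage in lowercase hex. `SymbolServerKeySketch` is a hypothetical helper written for this example. PESpy's own `SymStoreKey` type may expose this differently.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; PESpy's own SymStoreKey type may work differently.
public static class SymbolServerKeySketch
{
    // <pdbName>/<GUID as 32 uppercase hex digits><age in hex>/<pdbName>
    public static string ForPdb(string pdbName, Guid signature, uint age)
    {
        string guid = signature.ToString("N").ToUpperInvariant();
        string ageHex = age.ToString("X", CultureInfo.InvariantCulture);

        return $"{pdbName}/{guid}{ageHex}/{pdbName}";
    }

    // <fileName>/<TimeDateStamp as 8 hex digits><SizeOfImage in hex>/<fileName>
    public static string ForPE(string fileName, uint timeDateStamp, uint sizeOfImage)
    {
        string stamp = timeDateStamp.ToString("X8", CultureInfo.InvariantCulture);
        string size = sizeOfImage.ToString("x", CultureInfo.InvariantCulture);

        return $"{fileName}/{stamp}{size}/{fileName}";
    }
}
```

Passing `"coreclr.pdb"`, the GUID `75099299-D3D9-48A6-8B59-4FC4439DFA52` and an age of 1 to `ForPdb` reproduces the key shown above: `coreclr.pdb/75099299D3D948A68B594FC4439DFA521/coreclr.pdb`.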

Visualize A File

/* In two lines of code, you can visualize the entire contents of a file: view all sections, the regions
 * within those sections, how code and data intertwine, and the xrefs between everything. Explore
 * the entire structure of a file right from within your debugger. Query offsets, RVAs and VAs to find
 * exactly what is located at that address. Strings are automatically detected, and an interface is provided
 * to facilitate tagging disassembled code */
using var peFile = PEFile.FromFile("C:\\Windows\\system32\\kernel32.dll");

/* Unless you say otherwise, GetView will automatically attempt to download symbols,
 * so the first time you call this you may need to wait while symbols are downloaded.
 * Specify a progress callback to receive notice of what is going on. See the wiki for
 * more information on interfacing with views */
var view = peFile.GetView();

For much more information on the usage of PESpy, please see the wiki.
