diff --git a/csharp-il/.gitignore b/csharp-il/.gitignore new file mode 100644 index 000000000000..f81ecc73dffa --- /dev/null +++ b/csharp-il/.gitignore @@ -0,0 +1,13 @@ +obj/ +TestResults/ +*.manifest +*.pdb +*.suo +*.mdb +*.vsmdi +csharp.log +**/bin/Debug +**/bin/Release +*.tlog +.vs +*.user \ No newline at end of file diff --git a/csharp-il/README.md b/csharp-il/README.md new file mode 100644 index 000000000000..ee73ed224a72 --- /dev/null +++ b/csharp-il/README.md @@ -0,0 +1,217 @@ +# C# IL Extractor for CodeQL + +A CodeQL extractor that analyzes compiled .NET assemblies (DLL/EXE files) at the IL (Intermediate Language) level. + +## Overview + +This extractor enables CodeQL analysis of compiled C# code without requiring source code. It directly extracts IL instructions from .NET assemblies and creates a queryable database for control flow and call graph analysis. + +## Features + +- ✅ Extract from any .NET DLL/EXE file +- ✅ Capture complete IL instruction streams +- ✅ Track control flow (branches, loops) +- ✅ Build call graphs across assemblies +- ✅ Analyze exception handlers (try/catch/finally) +- ✅ Support for cross-assembly flow tracing + +## Quick Start + +### Prerequisites + +- .NET 8.0 SDK or later +- Mono.Cecil library (automatically restored via NuGet) + +### Build the Extractor + +```bash +cd csharp-il +dotnet build extractor/Semmle.Extraction.CSharp.IL +``` + +### Extract a DLL + +```bash +dotnet run --project extractor/Semmle.Extraction.CSharp.IL -- \ + path/to/your/assembly.dll \ + output.trap +``` + +### Try the Test Example + +```bash +# Extract the test assembly +dotnet run --project extractor/Semmle.Extraction.CSharp.IL -- \ + test-inputs/TestAssembly/bin/Debug/net8.0/TestAssembly.dll \ + test-inputs/TestAssembly.trap + +# View the results +head -100 test-inputs/TestAssembly.trap +``` + +## What Gets Extracted + +The extractor captures: + +1. **Assemblies**: Name, version, file location +2. **Types**: Classes, structs, interfaces, enums +3. 
**Methods**: Signatures, parameters, return types +4. **IL Instructions**: Opcodes, operands, offsets +5. **Control Flow**: Branch targets, fall-through paths +6. **Call Graph**: Method calls with qualified names +7. **Exception Handlers**: Try/catch/finally blocks + +## Database Schema + +The extractor creates a CodeQL database with the following structure: + +``` +assemblies(id, file, name, version) +types(id, full_name, namespace, name) +methods(id, name, signature, type_id) +il_instructions(id, opcode_num, opcode_name, offset, method) +il_branch_target(instruction, target_offset) +il_call_target_unresolved(instruction, target_method_name) +... +``` + +See `documentation/dbscheme-guide.md` for complete schema documentation. + +## Use Cases + +### Security Analysis +- Trace data flow through compiled libraries +- Find paths to sensitive API calls +- Analyze third-party dependencies + +### Code Understanding +- Build call graphs from compiled code +- Understand control flow in obfuscated assemblies +- Analyze library usage patterns + +### Cross-Assembly Analysis +- Trace execution across multiple DLLs +- Find inter-assembly dependencies +- Analyze full application stacks + +## Project Status + +**Current Phase**: Schema Complete ✅ + +- ✅ Phase 0: POC with Mono.Cecil +- ✅ Phase 1: TRAP File Extractor +- ✅ Phase 2: Database Schema +- ⬜ Phase 3: QL Library (In Progress) +- ⬜ Phase 4: Call Graph Predicates +- ⬜ Phase 5: Basic Blocks +- ⬜ Phase 6: End-to-End Testing + +See `wipStatus/CURRENT-STATUS.md` for detailed progress. + +## Directory Structure + +``` +csharp-il/ +├── extractor/ # IL extraction tool +│ └── Semmle.Extraction.CSharp.IL/ +├── ql/ # QL library (coming soon) +│ └── lib/ +│ └── semmlecode.csharp.il.dbscheme +├── test-inputs/ # Test assemblies +│ └── TestAssembly/ +├── documentation/ # Documentation +│ └── dbscheme-guide.md +└── wipStatus/ # Development notes + ├── CURRENT-STATUS.md + ├── PLAN.md + └── ... 
+``` + +## Example: Extracting TestAssembly + +The `test-inputs/TestAssembly` project contains example C# code with: +- If/else statements +- Method calls +- Loops +- Arithmetic operations + +After extraction, you can see the IL representation: + +```trap +types(3, "TestNamespace.SimpleClass", "TestNamespace", "SimpleClass") +methods(4, "SimpleMethod", "Void SimpleMethod()", 3) +il_instructions(13, 43, "brfalse.s", 9, 4) +il_branch_target(13, 26) +il_instructions(16, 39, "call", 17, 4) +il_call_target_unresolved(16, "System.Console.WriteLine") +``` + +## Design Philosophy + +### Simple Extraction, Smart Queries + +The extractor follows CodeQL best practices: + +- **Extractor**: Simple and fast - just write IL facts to TRAP files +- **QL Library**: Smart analysis - compute CFG, reachability, etc. at query time + +This architecture keeps extraction fast while enabling sophisticated analysis. + +### Why IL Instead of Decompilation? + +1. **Accurate**: IL is the ground truth, no decompiler errors +2. **Fast**: No expensive decompilation step +3. **Reliable**: Works on all .NET code, even obfuscated +4. **Complete**: Exact control flow and calling conventions + +## Documentation + +- `documentation/dbscheme-guide.md` - Complete schema reference +- `wipStatus/PLAN.md` - Project plan and approach +- `wipStatus/IMPLEMENTATION.md` - Implementation roadmap +- `wipStatus/CURRENT-STATUS.md` - Current progress + +## Contributing + +This is an experimental extractor under active development. Contributions welcome! 
+ +Current focus areas: +- QL library implementation +- Basic block computation +- Call graph predicates +- Test query development + +## Technical Details + +### Technologies Used + +- **Language**: C# (.NET 8.0) +- **IL Parser**: Mono.Cecil +- **Target**: .NET Standard 2.0+ assemblies +- **Output Format**: CodeQL TRAP files + +### Limitations + +Currently extracts compiled IL only: +- ✅ Class and method names +- ✅ Control flow (branches, calls) +- ✅ Method signatures +- ❌ Local variable names (without PDB files) +- ❌ Source locations (without PDB files) + +These are sufficient for control flow and call graph analysis! + +## License + +Part of the CodeQL project. See LICENSE in repository root. + +## Contact + +For questions about this extractor, see the wipStatus documents or create an issue. + +--- + +**Quick Links**: +- [Current Status](wipStatus/CURRENT-STATUS.md) +- [Schema Guide](documentation/dbscheme-guide.md) +- [Implementation Plan](wipStatus/IMPLEMENTATION.md) diff --git a/csharp-il/codeql-extractor.yml b/csharp-il/codeql-extractor.yml new file mode 100644 index 000000000000..5d367c132203 --- /dev/null +++ b/csharp-il/codeql-extractor.yml @@ -0,0 +1,17 @@ +name: "csharpil" +aliases: + - "cil" + - "csharp-il" +display_name: "C# IL" +version: 0.0.1 +column_kind: "utf16" +build_modes: + - autobuild + - manual + - none +file_types: + - name: cil + display_name: C# IL sources + extensions: + - .dll + - .exe \ No newline at end of file diff --git a/csharp-il/downgrades/initial/semmlecode.csharp.il.dbscheme b/csharp-il/downgrades/initial/semmlecode.csharp.il.dbscheme new file mode 100644 index 000000000000..c38bfad4ca22 --- /dev/null +++ b/csharp-il/downgrades/initial/semmlecode.csharp.il.dbscheme @@ -0,0 +1,207 @@ +/* Database schema for C# IL extraction + * + * This schema defines the database structure for extracting and analyzing + * compiled C# assemblies at the IL (Intermediate Language) level. 
+ * + * The extractor reads .NET DLL files and extracts: + * - Assembly and type metadata + * - Method signatures + * - IL instructions with opcodes and operands + * - Control flow information (branches) + * - Call graph information (method calls) + * - Exception handlers + */ + +/** EXTERNAL DATA **/ + +/** + * External data, loaded from CSV files during snapshot creation. + * This allows importing additional data into CodeQL databases. + */ +externalData( + int id: @externalDataElement, + string path: string ref, + int column: int ref, + string value: string ref +); + +/** FILES AND LOCATIONS **/ + +/** + * Files, including DLL/EXE assemblies and any referenced source files. + */ +files( + unique int id: @file, + string name: string ref +); + +/** + * Folders containing files. + */ +folders( + unique int id: @folder, + string name: string ref +); + +/** + * Container hierarchy for files and folders. + */ +@container = @folder | @file; + +containerparent( + int parent: @container ref, + unique int child: @container ref +); + +/** ASSEMBLIES AND TYPES **/ + +/** + * Compiled .NET assemblies. + * Each assembly represents a DLL file that has been extracted. + * The file field references the DLL/EXE file in the files table. + */ +assemblies( + unique int id: @assembly, + int file: @file ref, + string name: string ref, + string version: string ref +); + +/** + * Types defined in assemblies. + * Includes classes, structs, interfaces, enums, and delegates. + */ +types( + unique int id: @type, + string full_name: string ref, + string namespace: string ref, + string name: string ref +); + +/** METHODS **/ + +/** + * Methods defined in types. + * Includes instance methods, static methods, constructors, and property accessors. + */ +methods( + unique int id: @method, + string name: string ref, + string signature: string ref, + int type_id: @type ref +); + +/** IL INSTRUCTIONS **/ + +/** + * IL (Intermediate Language) instructions within method bodies. 
+ * Each instruction represents a single IL opcode with its operand. + * + * The opcode_num is the integer value of Mono.Cecil's Code enum for the opcode, as written by the extractor (not the encoded opcode bytes). + * The opcode_name is the mnemonic (e.g., "ldloc", "call", "br.s"). + * The offset is the byte offset of the instruction within the method body. + */ +il_instructions( + unique int id: @il_instruction, + int opcode_num: int ref, + string opcode_name: string ref, + int offset: int ref, + int method: @method ref +); + +/** + * Parent relationship between instructions and methods. + * The index represents the sequential position of the instruction (0-based). + * This allows instructions to be ordered even when offsets are non-sequential. + */ +#keyset[instruction, index] +il_instruction_parent( + int instruction: @il_instruction ref, + int index: int ref, + int parent: @method ref +); + +/** + * Branch target for branch instructions. + * The target_offset is the byte offset of the instruction that is the target of the branch. + * Used for control flow analysis. + */ +il_branch_target( + int instruction: @il_instruction ref, + int target_offset: int ref +); + +/** + * Unresolved method call targets. + * The target_method_name is the fully qualified name of the called method. + * These are stored as strings because they may reference methods in other assemblies + * that haven't been extracted yet. + */ +il_call_target_unresolved( + int instruction: @il_instruction ref, + string target_method_name: string ref +); + +/** + * String operands for IL instructions. + * Used for ldstr (load string) instructions. + */ +il_operand_string( + int instruction: @il_instruction ref, + string value: string ref +); + +/** + * Integer operands for IL instructions. + * Used for ldc.i4 (load constant int32) and similar instructions. + */ +il_operand_int( + int instruction: @il_instruction ref, + int value: int ref +); + +/** + * Long integer operands for IL instructions. + * Used for ldc.i8 (load constant int64) and similar instructions. 
+ */ +il_operand_long( + int instruction: @il_instruction ref, + int value: int ref +); + +/** EXCEPTION HANDLERS **/ + +/** + * Exception handlers (try/catch/finally blocks) in methods. + * Each handler represents a try block with its associated catch/finally/fault handler. + * + * The handler_type indicates the type of handler: + * - "Catch": catch block for specific exception types + * - "Finally": finally block + * - "Fault": fault block (like finally but only runs on exception) + * - "Filter": exception filter block + * + * Offsets indicate the start and end positions of the try and handler blocks. + * An offset of -1 indicates the position is not applicable or not set. + */ +il_exception_handler( + unique int id: @il_exception_handler, + int method: @method ref, + string handler_type: string ref, + int try_start: int ref, + int try_end: int ref, + int handler_start: int ref, + int handler_end: int ref +); + +/** + * Union type representing all elements in the database. + */ +@element = @assembly | @type | @method | @il_instruction | @il_exception_handler | @externalDataElement; + +/** + * Union type representing elements that can be located in source code. + * For IL extraction, most elements are located in compiled assemblies, + * but this provides a hook for future source location mapping. + */ +@locatable = @type | @method; diff --git a/csharp-il/downgrades/qlpack.yml b/csharp-il/downgrades/qlpack.yml new file mode 100644 index 000000000000..2ffd6b94f29d --- /dev/null +++ b/csharp-il/downgrades/qlpack.yml @@ -0,0 +1,5 @@ +name: codeql/csharp-downgrades +groups: csharp +downgrades: . 
+library: true +warnOnImplicitThis: true diff --git a/csharp-il/extractor/Semmle.Extraction.CSharp.IL/ILExtractor.cs b/csharp-il/extractor/Semmle.Extraction.CSharp.IL/ILExtractor.cs new file mode 100644 index 000000000000..5d29c981e646 --- /dev/null +++ b/csharp-il/extractor/Semmle.Extraction.CSharp.IL/ILExtractor.cs @@ -0,0 +1,156 @@ +using Mono.Cecil; +using Mono.Cecil.Cil; +using Semmle.Extraction.CSharp.IL.Trap; + +namespace Semmle.Extraction.CSharp.IL; + +/// <summary> +/// Main extractor - reads DLL and writes TRAP files. +/// </summary> +public class ILExtractor +{ + private readonly TrapWriter trap; + private readonly Dictionary<string, int> methodIds = new(); + private readonly Dictionary<string, int> typeIds = new(); + + public ILExtractor(TrapWriter trapWriter) + { + trap = trapWriter; + } + + public void Extract(string dllPath) + { + Console.WriteLine($"Extracting {dllPath}..."); + + var assembly = AssemblyDefinition.ReadAssembly(dllPath); + + // Write file info + var fileId = trap.GetId(); + trap.WriteTuple("files", fileId, dllPath); + + // Write assembly info + var assemblyId = trap.GetId(); + trap.WriteTuple("assemblies", assemblyId, fileId, assembly.Name.Name, assembly.Name.Version.ToString()); + + foreach (var module in assembly.Modules) + { + foreach (var type in module.Types) + { + // Skip compiler-generated types (names containing '<') for now + if (type.Name.Contains('<')) + continue; + + ExtractType(type); + } + } + + Console.WriteLine("Extraction complete!"); + } + + private void ExtractType(TypeDefinition type) + { + var typeId = trap.GetId(); + typeIds[type.FullName] = typeId; + + // Write type info + trap.WriteTuple("types", typeId, type.FullName, type.Namespace, type.Name); + + foreach (var method in type.Methods) + { + // Skip static constructors + if (method.IsConstructor && method.IsStatic) + continue; + + ExtractMethod(method, typeId); + } + } + + private void ExtractMethod(MethodDefinition method, int typeId) + { + var methodId = trap.GetId(); + var methodKey = 
$"{method.DeclaringType.FullName}.{method.Name}"; + methodIds[methodKey] = methodId; + + // Write method info + var signature = GetMethodSignature(method); + trap.WriteTuple("methods", methodId, method.Name, signature, typeId); + + if (method.HasBody) + { + ExtractMethodBody(method, methodId); + } + } + + private void ExtractMethodBody(MethodDefinition method, int methodId) + { + var body = method.Body; + + // Write each IL instruction + var index = 0; + foreach (var instruction in body.Instructions) + { + var instrId = trap.GetId(); + + // Basic instruction info + trap.WriteTuple("il_instructions", + instrId, + (int)instruction.OpCode.Code, + instruction.OpCode.Name, + instruction.Offset, + methodId); + + // Parent relationship + trap.WriteTuple("il_instruction_parent", instrId, index, methodId); + + // Handle operand based on type + if (instruction.Operand is Instruction targetInstr) + { + // Branch target + trap.WriteTuple("il_branch_target", instrId, targetInstr.Offset); + } + else if (instruction.Operand is MethodReference methodRef) + { + // Method call - we'll resolve this in a second pass + var targetMethodName = $"{methodRef.DeclaringType.FullName}.{methodRef.Name}"; + trap.WriteTuple("il_call_target_unresolved", instrId, targetMethodName); + } + else if (instruction.Operand is string str) + { + trap.WriteTuple("il_operand_string", instrId, str); + } + else if (instruction.Operand is int i) + { + trap.WriteTuple("il_operand_int", instrId, i); + } + else if (instruction.Operand is long l) + { + trap.WriteTuple("il_operand_long", instrId, l); + } + + index++; + } + + // Exception handlers + if (body.HasExceptionHandlers) + { + foreach (var handler in body.ExceptionHandlers) + { + var handlerId = trap.GetId(); + trap.WriteTuple("il_exception_handler", + handlerId, + methodId, + handler.HandlerType.ToString(), + handler.TryStart.Offset, + handler.TryEnd?.Offset ?? -1, + handler.HandlerStart?.Offset ?? -1, + handler.HandlerEnd?.Offset ?? 
-1); + } + } + } + + private string GetMethodSignature(MethodDefinition method) + { + var parameters = string.Join(", ", method.Parameters.Select(p => $"{p.ParameterType.Name} {p.Name}")); + return $"{method.ReturnType.Name} {method.Name}({parameters})"; + } +} diff --git a/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Program.cs b/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Program.cs new file mode 100644 index 000000000000..77f582db8c14 --- /dev/null +++ b/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Program.cs @@ -0,0 +1,49 @@ +using Semmle.Extraction.CSharp.IL.Trap; + +namespace Semmle.Extraction.CSharp.IL; + +class Program +{ + static void Main(string[] args) + { + if (args.Length == 0) + { + Console.WriteLine("Usage: Semmle.Extraction.CSharp.IL [output.trap]"); + return; + } + + var dllPath = args[0]; + + if (!File.Exists(dllPath)) + { + Console.WriteLine($"Error: File not found: {dllPath}"); + return; + } + + var outputPath = args.Length > 1 + ? args[1] + : Path.ChangeExtension(dllPath, ".trap"); + + Console.WriteLine($"Extracting: {dllPath}"); + Console.WriteLine($"Output: {outputPath}"); + Console.WriteLine(new string('=', 80)); + Console.WriteLine(); + + try + { + using var trapWriter = new TrapWriter(outputPath); + var extractor = new ILExtractor(trapWriter); + + extractor.Extract(dllPath); + + Console.WriteLine(); + Console.WriteLine(new string('=', 80)); + Console.WriteLine($"TRAP file written to: {outputPath}"); + } + catch (Exception ex) + { + Console.WriteLine($"Error: {ex.Message}"); + Console.WriteLine(ex.StackTrace); + } + } +} diff --git a/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Semmle.Extraction.CSharp.IL.csproj b/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Semmle.Extraction.CSharp.IL.csproj new file mode 100644 index 000000000000..85a9a3629b24 --- /dev/null +++ b/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Semmle.Extraction.CSharp.IL.csproj @@ -0,0 +1,14 @@ + + + + Exe + net8.0 + enable + enable + + + + + + + diff 
--git a/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Trap/TrapWriter.cs b/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Trap/TrapWriter.cs new file mode 100644 index 000000000000..379b77cc2aef --- /dev/null +++ b/csharp-il/extractor/Semmle.Extraction.CSharp.IL/Trap/TrapWriter.cs @@ -0,0 +1,88 @@ +using System.IO; + +namespace Semmle.Extraction.CSharp.IL.Trap; + +/// +/// Simple TRAP file writer - just writes tuples as text lines. +/// We'll create the schema later to match what we write here. +/// +public class TrapWriter : IDisposable +{ + private readonly TextWriter writer; + private readonly string trapFilePath; + private int nextId = 1; + + public TrapWriter(string outputPath) + { + trapFilePath = outputPath; + writer = new StreamWriter(trapFilePath); + } + + /// + /// Get a unique ID for an entity. + /// + public int GetId() + { + return nextId++; + } + + /// + /// Write a tuple to the TRAP file. + /// Format: predicate(arg1, arg2, ...) + /// + public void WriteTuple(string predicate, params object[] args) + { + writer.Write(predicate); + writer.Write('('); + + for (int i = 0; i < args.Length; i++) + { + if (i > 0) + writer.Write(", "); + + WriteValue(args[i]); + } + + writer.WriteLine(')'); + } + + private void WriteValue(object value) + { + switch (value) + { + case int i: + writer.Write(i); + break; + case long l: + writer.Write(l); + break; + case string s: + // Escape string and wrap in quotes + writer.Write('"'); + writer.Write(EscapeString(s)); + writer.Write('"'); + break; + case null: + writer.Write("null"); + break; + default: + writer.Write(value.ToString()); + break; + } + } + + private string EscapeString(string s) + { + // Basic escaping - may need to be more sophisticated + return s.Replace("\\", "\\\\") + .Replace("\"", "\\\"") + .Replace("\n", "\\n") + .Replace("\r", "\\r") + .Replace("\t", "\\t"); + } + + public void Dispose() + { + writer.Dispose(); + } +} diff --git a/csharp-il/ql/lib/qlpack.yml b/csharp-il/ql/lib/qlpack.yml new file 
mode 100644 index 000000000000..dab48ed4917b --- /dev/null +++ b/csharp-il/ql/lib/qlpack.yml @@ -0,0 +1,5 @@ +name: codeql/csharp-il-all +version: 0.0.1 +library: true +dbscheme: semmlecode.csharp.il.dbscheme +extractor: csharpil diff --git a/csharp-il/ql/lib/semmlecode.csharp.il.dbscheme b/csharp-il/ql/lib/semmlecode.csharp.il.dbscheme new file mode 100644 index 000000000000..8062f5158ed4 --- /dev/null +++ b/csharp-il/ql/lib/semmlecode.csharp.il.dbscheme @@ -0,0 +1,214 @@ +/* Database schema for C# IL extraction + * + * This schema defines the database structure for extracting and analyzing + * compiled C# assemblies at the IL (Intermediate Language) level. + * + * The extractor reads .NET DLL files and extracts: + * - Assembly and type metadata + * - Method signatures + * - IL instructions with opcodes and operands + * - Control flow information (branches) + * - Call graph information (method calls) + * - Exception handlers + */ + +/** SOURCE LOCATION PREFIX **/ + +/** + * The source location prefix for the snapshot. + */ +sourceLocationPrefix(string prefix : string ref); + +/** EXTERNAL DATA **/ + +/** + * External data, loaded from CSV files during snapshot creation. + * This allows importing additional data into CodeQL databases. + */ +externalData( + int id: @externalDataElement, + string path: string ref, + int column: int ref, + string value: string ref +); + +/** FILES AND LOCATIONS **/ + +/** + * Files, including DLL/EXE assemblies and any referenced source files. + */ +files( + unique int id: @file, + string name: string ref +); + +/** + * Folders containing files. + */ +folders( + unique int id: @folder, + string name: string ref +); + +/** + * Container hierarchy for files and folders. + */ +@container = @folder | @file; + +containerparent( + int parent: @container ref, + unique int child: @container ref +); + +/** ASSEMBLIES AND TYPES **/ + +/** + * Compiled .NET assemblies. + * Each assembly represents a DLL file that has been extracted. 
+ * The file field references the DLL/EXE file in the files table. + */ +assemblies( + unique int id: @assembly, + int file: @file ref, + string name: string ref, + string version: string ref +); + +/** + * Types defined in assemblies. + * Includes classes, structs, interfaces, enums, and delegates. + */ +types( + unique int id: @type, + string full_name: string ref, + string namespace: string ref, + string name: string ref +); + +/** METHODS **/ + +/** + * Methods defined in types. + * Includes instance methods, static methods, constructors, and property accessors. + */ +methods( + unique int id: @method, + string name: string ref, + string signature: string ref, + int type_id: @type ref +); + +/** IL INSTRUCTIONS **/ + +/** + * IL (Intermediate Language) instructions within method bodies. + * Each instruction represents a single IL opcode with its operand. + * + * The opcode_num is the integer value of Mono.Cecil's Code enum for the opcode, as written by the extractor (not the encoded opcode bytes). + * The opcode_name is the mnemonic (e.g., "ldloc", "call", "br.s"). + * The offset is the byte offset of the instruction within the method body. + */ +il_instructions( + unique int id: @il_instruction, + int opcode_num: int ref, + string opcode_name: string ref, + int offset: int ref, + int method: @method ref +); + +/** + * Parent relationship between instructions and methods. + * The index represents the sequential position of the instruction (0-based). + * This allows instructions to be ordered even when offsets are non-sequential. + */ +#keyset[instruction, index] +il_instruction_parent( + int instruction: @il_instruction ref, + int index: int ref, + int parent: @method ref +); + +/** + * Branch target for branch instructions. + * The target_offset is the byte offset of the instruction that is the target of the branch. + * Used for control flow analysis. + */ +il_branch_target( + int instruction: @il_instruction ref, + int target_offset: int ref +); + +/** + * Unresolved method call targets. 
+ * The target_method_name is the fully qualified name of the called method. + * These are stored as strings because they may reference methods in other assemblies + * that haven't been extracted yet. + */ +il_call_target_unresolved( + int instruction: @il_instruction ref, + string target_method_name: string ref +); + +/** + * String operands for IL instructions. + * Used for ldstr (load string) instructions. + */ +il_operand_string( + int instruction: @il_instruction ref, + string value: string ref +); + +/** + * Integer operands for IL instructions. + * Used for ldc.i4 (load constant int32) and similar instructions. + */ +il_operand_int( + int instruction: @il_instruction ref, + int value: int ref +); + +/** + * Long integer operands for IL instructions. + * Used for ldc.i8 (load constant int64) and similar instructions. + */ +il_operand_long( + int instruction: @il_instruction ref, + int value: int ref +); + +/** EXCEPTION HANDLERS **/ + +/** + * Exception handlers (try/catch/finally blocks) in methods. + * Each handler represents a try block with its associated catch/finally/fault handler. + * + * The handler_type indicates the type of handler: + * - "Catch": catch block for specific exception types + * - "Finally": finally block + * - "Fault": fault block (like finally but only runs on exception) + * - "Filter": exception filter block + * + * Offsets indicate the start and end positions of the try and handler blocks. + * An offset of -1 indicates the position is not applicable or not set. + */ +il_exception_handler( + unique int id: @il_exception_handler, + int method: @method ref, + string handler_type: string ref, + int try_start: int ref, + int try_end: int ref, + int handler_start: int ref, + int handler_end: int ref +); + +/** + * Union type representing all elements in the database. 
+ */ +@element = @assembly | @type | @method | @il_instruction | @il_exception_handler | @externalDataElement; + +/** + * Union type representing elements that can be located in source code. + * For IL extraction, most elements are located in compiled assemblies, + * but this provides a hook for future source location mapping. + */ +@locatable = @type | @method; diff --git a/csharp-il/semmlecode.csharp.il.dbscheme b/csharp-il/semmlecode.csharp.il.dbscheme new file mode 120000 index 000000000000..9f5fe8c5f019 --- /dev/null +++ b/csharp-il/semmlecode.csharp.il.dbscheme @@ -0,0 +1 @@ +ql/lib/semmlecode.csharp.il.dbscheme \ No newline at end of file diff --git a/csharp-il/test-inputs/TestAssembly/SimpleClass.cs b/csharp-il/test-inputs/TestAssembly/SimpleClass.cs new file mode 100644 index 000000000000..d2073a3759e4 --- /dev/null +++ b/csharp-il/test-inputs/TestAssembly/SimpleClass.cs @@ -0,0 +1,36 @@ +namespace TestNamespace +{ + public class SimpleClass + { + public void SimpleMethod() + { + var x = 5; + if (x > 0) + { + Console.WriteLine("positive"); + } + else + { + Console.WriteLine("negative"); + } + } + + public void CallsOtherMethod() + { + SimpleMethod(); + } + + public int Add(int a, int b) + { + return a + b; + } + + public void LoopExample() + { + for (int i = 0; i < 10; i++) + { + Console.WriteLine(i); + } + } + } +} diff --git a/csharp-il/test-inputs/TestAssembly/TestAssembly.csproj b/csharp-il/test-inputs/TestAssembly/TestAssembly.csproj new file mode 100644 index 000000000000..f3d056535e94 --- /dev/null +++ b/csharp-il/test-inputs/TestAssembly/TestAssembly.csproj @@ -0,0 +1,10 @@ + + + + net8.0 + enable + enable + Library + + + diff --git a/csharp-il/test-queries/flow-summary.ql b/csharp-il/test-queries/flow-summary.ql new file mode 100644 index 000000000000..d8b65357a71d --- /dev/null +++ b/csharp-il/test-queries/flow-summary.ql @@ -0,0 +1,13 @@ +/** + * @name Complete control flow analysis + * @description Shows instructions, branches, and calls for 
all methods + * @kind table + * @id csharp-il/complete-flow + */ + +from @method method, string method_name, string signature +where methods(method, method_name, signature, _) +select method_name, + count(@il_instruction insn | il_instruction_parent(insn, _, method)), + count(@il_instruction br | il_instructions(br, _, _, _, method) and il_branch_target(br, _)), + count(@il_instruction call | il_instructions(call, _, "call", _, method)) diff --git a/csharp-il/test-queries/list-calls.ql b/csharp-il/test-queries/list-calls.ql new file mode 100644 index 000000000000..a7af65d8a887 --- /dev/null +++ b/csharp-il/test-queries/list-calls.ql @@ -0,0 +1,14 @@ +/** + * @name List all method calls + * @description Lists all call instructions and their targets + * @kind table + * @id csharp-il/test-calls + */ + +from @il_instruction call_insn, string opcode, @method caller, string target_method, string caller_name +where + il_instructions(call_insn, _, opcode, _, caller) and + opcode = "call" and + il_call_target_unresolved(call_insn, target_method) and + methods(caller, caller_name, _, _) +select caller_name, target_method diff --git a/csharp-il/test-queries/list-methods.ql b/csharp-il/test-queries/list-methods.ql new file mode 100644 index 000000000000..5b4be6fca8f7 --- /dev/null +++ b/csharp-il/test-queries/list-methods.ql @@ -0,0 +1,10 @@ +/** + * @name List all methods + * @description Simple test query to list all methods in the database + * @kind table + * @id csharp-il/test-methods + */ + +from @method id, string name, string signature, @type type_id +where methods(id, name, signature, type_id) +select name, signature diff --git a/csharp-il/test-queries/trace-branches.ql b/csharp-il/test-queries/trace-branches.ql new file mode 100644 index 000000000000..a8b1147b030c --- /dev/null +++ b/csharp-il/test-queries/trace-branches.ql @@ -0,0 +1,13 @@ +/** + * @name Control flow with branches + * @description Shows all branch instructions and their targets + * @kind table + * 
@id csharp-il/test-branches + */ + +from @il_instruction branch_insn, string opcode, int offset, @method method, string method_name, int target_offset +where + il_instructions(branch_insn, _, opcode, offset, method) and + il_branch_target(branch_insn, target_offset) and + methods(method, method_name, _, _) +select method_name, opcode, offset, target_offset diff --git a/csharp-il/test-queries/trace-simple-method.ql b/csharp-il/test-queries/trace-simple-method.ql new file mode 100644 index 000000000000..4d5762ce8ade --- /dev/null +++ b/csharp-il/test-queries/trace-simple-method.ql @@ -0,0 +1,15 @@ +/** + * @name Control flow trace for SimpleMethod + * @description Shows the complete IL instruction sequence with control flow + * @kind table + * @id csharp-il/trace-simplemethod + */ + +from @il_instruction insn, int idx, int opcode_num, string opcode, int offset, @method method, string method_name +where + methods(method, method_name, _, _) and + method_name = "SimpleMethod" and + il_instruction_parent(insn, idx, method) and + il_instructions(insn, opcode_num, opcode, offset, method) +select idx, offset, opcode, opcode_num +order by idx diff --git a/csharp-il/tools/autobuild.cmd b/csharp-il/tools/autobuild.cmd new file mode 100644 index 000000000000..a76b44354fd4 --- /dev/null +++ b/csharp-il/tools/autobuild.cmd @@ -0,0 +1,4 @@ +@echo off +REM For C# IL, autobuild and buildless extraction are the same - just extract the DLLs +call "%~dp0index.cmd" +exit /b %ERRORLEVEL% diff --git a/csharp-il/tools/autobuild.sh b/csharp-il/tools/autobuild.sh new file mode 100755 index 000000000000..3e9f4bd1b55a --- /dev/null +++ b/csharp-il/tools/autobuild.sh @@ -0,0 +1,10 @@ +#!/bin/bash + +set -eu + +if [[ -z "${CODEQL_EXTRACTOR_CSHARPIL_ROOT:-}" ]]; then + export CODEQL_EXTRACTOR_CSHARPIL_ROOT="$(dirname "$(dirname "${BASH_SOURCE[0]}")")" +fi + +# For C# IL, autobuild and buildless extraction are the same - just extract the DLLs +exec "${CODEQL_EXTRACTOR_CSHARPIL_ROOT}/tools/index.sh" 
diff --git a/csharp-il/tools/index.cmd b/csharp-il/tools/index.cmd
new file mode 100644
index 000000000000..3c4097e49de7
--- /dev/null
+++ b/csharp-il/tools/index.cmd
@@ -0,0 +1,39 @@
+@echo off
+setlocal enabledelayedexpansion
+
+if "%CODEQL_EXTRACTOR_CSHARPIL_ROOT%"=="" (
+    for %%i in ("%~dp0..") do set "CODEQL_EXTRACTOR_CSHARPIL_ROOT=%%~fi"
+)
+
+set "TRAP_DIR=%CODEQL_EXTRACTOR_CSHARPIL_TRAP_DIR%"
+
+echo C# IL Extractor: Starting extraction
+echo Source root: %CD%
+echo TRAP directory: %TRAP_DIR%
+
+set "EXTRACTOR_PATH=%CODEQL_EXTRACTOR_CSHARPIL_ROOT%\extractor\Semmle.Extraction.CSharp.IL\bin\Debug\net8.0\Semmle.Extraction.CSharp.IL.exe"
+
+if not exist "%EXTRACTOR_PATH%" (
+    echo ERROR: Extractor not found at %EXTRACTOR_PATH%
+    echo Please build the extractor first with: dotnet build extractor\Semmle.Extraction.CSharp.IL
+    exit /b 1
+)
+
+set FILE_COUNT=0
+
+for /r %%f in (*.dll *.exe) do (
+    echo Extracting: %%f
+
+    set "ASSEMBLY_PATH=%%f"
+    set "TRAP_NAME=!ASSEMBLY_PATH:\=_!"
+    set "TRAP_NAME=!TRAP_NAME:/=_!"
+    set "TRAP_NAME=!TRAP_NAME::=_!"
+    set "TRAP_FILE=%TRAP_DIR%\!TRAP_NAME!.trap"
+
+    "%EXTRACTOR_PATH%" "%%f" "!TRAP_FILE!" || echo Warning: Failed to extract %%f
+
+    set /a FILE_COUNT+=1
+)
+
+echo C# IL Extractor: Completed extraction of %FILE_COUNT% assemblies
+exit /b 0
diff --git a/csharp-il/tools/index.sh b/csharp-il/tools/index.sh
new file mode 100755
index 000000000000..dbe337d334d7
--- /dev/null
+++ b/csharp-il/tools/index.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+
+set -eu
+
+if [[ -z "${CODEQL_EXTRACTOR_CSHARPIL_ROOT:-}" ]]; then
+    export CODEQL_EXTRACTOR_CSHARPIL_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+fi
+
+# Get the trap directory from CodeQL environment
+TRAP_DIR="${CODEQL_EXTRACTOR_CSHARPIL_TRAP_DIR}"
+SRC_ARCHIVE="${CODEQL_EXTRACTOR_CSHARPIL_SOURCE_ARCHIVE_DIR}"
+
+echo "C# IL Extractor: Starting extraction"
+echo "Source root: $(pwd)"
+echo "TRAP directory: ${TRAP_DIR}"
+
+# Ensure TRAP directory exists
+mkdir -p "${TRAP_DIR}"
+
+# Locate the extractor binary
+EXTRACTOR_PATH="${CODEQL_EXTRACTOR_CSHARPIL_ROOT}/extractor/Semmle.Extraction.CSharp.IL/bin/Debug/net8.0/Semmle.Extraction.CSharp.IL"
+
+if [[ ! -f "${EXTRACTOR_PATH}" ]]; then
+    echo "ERROR: Extractor not found at ${EXTRACTOR_PATH}"
+    echo "Please build the extractor first with: dotnet build extractor/Semmle.Extraction.CSharp.IL"
+    exit 1
+fi
+
+# Extract all DLL and EXE files (loop runs in the current shell so FILE_COUNT survives)
+FILE_COUNT=0
+while IFS= read -r assembly; do
+    echo "Extracting: ${assembly}"
+
+    # Normalize the assembly path (remove leading ./)
+    normalized_path="${assembly#./}"
+
+    # Create a unique trap file name based on the assembly path
+    TRAP_FILE="${TRAP_DIR}/$(echo "${assembly}" | sed 's/[^a-zA-Z0-9]/_/g').trap"
+
+    # Run the extractor
+    "${EXTRACTOR_PATH}" "${assembly}" "${TRAP_FILE}" || echo "Warning: Failed to extract ${assembly}"
+
+    # Copy the assembly to the source archive
+    ARCHIVE_PATH="${SRC_ARCHIVE}/${normalized_path}"
+    ARCHIVE_DIR="$(dirname "${ARCHIVE_PATH}")"
+    mkdir -p "${ARCHIVE_DIR}"
+    cp "${assembly}" "${ARCHIVE_PATH}"
+    echo "Archived: ${assembly} -> ${ARCHIVE_PATH}"
+
+    FILE_COUNT=$((FILE_COUNT + 1))
+done < <(find . -type f \( -name "*.dll" -o -name "*.exe" \))
+
+echo "C# IL Extractor: Completed extraction of ${FILE_COUNT} assemblies"
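The summary line in `index.sh` depends on a counter that is incremented inside the file loop. The following is a minimal sketch, assuming bash with default options, of how such a counter behaves when the loop is fed by a pipeline versus by process substitution. The scratch directory and file names are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
set -eu

# Hypothetical scratch directory: three fake assemblies and one unrelated file.
workdir="$(mktemp -d)"
mkdir -p "${workdir}/sub"
touch "${workdir}/a.dll" "${workdir}/b.exe" "${workdir}/sub/c.dll" "${workdir}/notes.txt"

# Pipeline form: the while loop runs in a subshell, so the increments
# are not visible once the loop finishes.
piped_count=0
find "${workdir}" -type f \( -name "*.dll" -o -name "*.exe" \) | while IFS= read -r f; do
    piped_count=$((piped_count + 1))
done

# Process-substitution form: the loop runs in the current shell,
# so the counter keeps its value after the loop.
kept_count=0
while IFS= read -r f; do
    kept_count=$((kept_count + 1))
done < <(find "${workdir}" -type f \( -name "*.dll" -o -name "*.exe" \))

echo "pipeline: ${piped_count}, process substitution: ${kept_count}"
rm -rf "${workdir}"
```

Process substitution is bash-specific, which matches the `#!/bin/bash` shebang used by the scripts in this change. For file names containing newlines, `find -print0` with `read -r -d ''` would be a more robust variant.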