codeql-lab: Centralized Git Repository for CodeQL Development

Overview

codeql-lab is a consolidated Git repository that collects all relevant CodeQL components, resources, and tooling into a single version-controlled location.

Purpose

The goal of this repository is to provide an integrated development environment (“lab”) for CodeQL research, experimentation, and custom query development. It simplifies setup by maintaining all required submodules, configuration files, and datasets in one place.

Repository Location

The primary repository is hosted at: https://github.com/hohn/codeql-lab

Intended Use Cases

Local experimentation with CodeQL queries and libraries.
End-to-end testing of custom model data and query logic. This includes writing and validating custom data flow models, adjusting model coverage, and confirming that query results behave as expected across controlled datasets. The lab setup supports rapid iteration on QL logic, helping detect unintended changes and enabling reproducible evaluations of taint tracking, control flow, or API usage patterns.
Structured collaboration and controlled updates across all CodeQL-related artifacts.
Simplified onboarding and reproducible setup for new contributors or analysis environments.

Prerequisites

Working with this repository assumes prior experience with:

Git, Bash, and standard Unix command-line tools. These are used throughout and are required for setup and day-to-day tasks. Tools such as ripgrep, GNU Bash, and grep/regex workflows are assumed.
At least one supported programming language, such as C, C++, Java, Python, Go, or Ruby. A solid understanding of the target language is necessary to interpret analysis results and write effective queries. See general background on programming languages if needed.
Basic familiarity with program structure concepts, including abstract syntax trees (ASTs), control-flow graphs (CFGs), and data-flow graphs (DFGs). These are core to how CodeQL models code behavior.
Optional but helpful: familiarity with structural or functional programming languages (e.g. Lisp or OCaml) can make working with CodeQL’s query language and type system more intuitive. See overview of functional programming for related context.

Repository Layout

Core Structure

Repository is based on: https://github.com/github/vscode-codeql-starter.git
All development work is done on the branch: qllab

CodeQL version is pinned via the ql/ submodule:

commit 4d681f05bd671f8b5e31624f16a2b4d75e61c071 (tag: codeql-cli/v2.22.0)

A prebuilt CodeQL CLI binary is included:

1104625939  assets/codeql-osx64.zip

Project-specific repositories can be added directly under the root. Example: the C dataflow workshop in ./codeql-dataflow-sql-injection-c
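To confirm that a checkout matches the pin above, one option is to compare the commit the superproject records for ql/ against the expected hash. This is a minimal sketch, not a script shipped with the repository: the function names and the checkout location passed as an argument are assumptions made for illustration.

```shell
# Expected pin, copied from the commit line above.
EXPECTED_QL_COMMIT=4d681f05bd671f8b5e31624f16a2b4d75e61c071

# Clone the lab on the qllab branch together with its submodules.
# (Defined for reference only; nothing in this sketch calls it.)
clone_lab() {
    git clone --recurse-submodules --branch qllab \
        https://github.com/hohn/codeql-lab "$1"
}

# Print the commit that the superproject records for the ql/ submodule.
# "git ls-tree" prints "<mode> commit <sha><TAB>ql" for a submodule entry.
recorded_ql_commit() {
    git -C "$1" ls-tree HEAD ql | awk '{print $3}'
}

# Compare the recorded commit against the expected pin.
check_ql_pin() {
    actual=$(recorded_ql_commit "$1")
    if [ "$actual" = "$EXPECTED_QL_COMMIT" ]; then
        echo "ql/ is pinned as expected"
    else
        echo "ql/ is at '${actual}', expected $EXPECTED_QL_COMMIT" >&2
        return 1
    fi
}
```

Typical use would be check_ql_pin followed by the path of the checkout, for example check_ql_pin ~/codeql-lab.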

Additional Structure Notes

The original upstream README.md is preserved at ./README-vscode-codeql-starter.md

Possible Reading Orders

Data Flow

Debugging data flow config (instead of taint flow), Java

We can illustrate taint-flow debugging using the Java SQL injection sample.

Debugging data flow config (instead of taint flow), C

A corresponding example for C is planned, using a simplified query to trace value propagation in ~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c. Unlike Java, C may require manual modeling even to visualize basic flows.

Modeling

There are two primary approaches to modeling: direct use of CodeQL predicates and the models-as-data system. The models-as-data system is implemented in QL but relies on external YAML files that are interpreted at query evaluation time.

The model editor provides a GUI for managing YAML-based models, but the underlying format is identical to that used by the models-as-data system. In C and other cases where GUI support is limited or unavailable, we write these YAML models manually and invoke them directly from queries.

When YAML models are written directly, the use of GPT-based tooling becomes very natural. GPTs can extract function signatures, parameter semantics, and flow annotations from documentation or code examples, then generate valid YAML model entries automatically.

As a diagram:

                                  +----------------------+
                                  |     Modeling in      |
                                  |       CodeQL         |
                                  +----------+-----------+
                                             |
              +------------------------------+------------------------------+
              |                                                             |
     +--------v--------+                                          +---------v---------+
     | Direct CodeQL   |                                          |  Models-as-Data   |
     | (QL predicates) |                                          |  (YAML + QL eval) |
     +--------+--------+                                          +---------+---------+
              |                                                             |
              |                                                             |
 +------------v------------+                                +---------------v---------------+
 | Manual customization    |                                |     YAML models via GUI       |
 | via Customizations.qll  |                                |    (Model Editor frontend)    |
 +------------+------------+                                +---------------+---------------+
              |                                                             |
              |                                                             |
   +----------v----------+                                      +-----------v-----------+
   | Java: built-in      |                                      | Java: Jedis + Console |
   | includes .qll hook  |                                      | GUI modeling examples |
   +----------+----------+                                      +-----------------------+
              |
              | Manual setup needed for:
              v
     +------------------------+
     |   C / C++: requires    |
     |   cpp.qll patch +      |
     |   Customizations.qll   |
     +------------------------+
              |
              v
+-------------------------------+
| Use models-as-data directly   |
| (YAML only, no editor)        |
+-------------------------------+
              |
              v
+-------------------------------+
| GPT-assisted YAML generation  |
| from docs, code, or examples  |
+-------------------------------+

Review: SQLite Injection Workshop, Java

We begin with a recap of the Java-based injection example, focusing on the vulnerable code in AddUser.java. Following that, we examine a fully manual CodeQL query available in full-query.ql, which was written to explicitly trace tainted data through the program. Next, we explore the out-of-the-box query SqlTainted.ql included in the standard CodeQL packs, and conclude with an inspection of the relevant base classes and framework modeling in Illustrations.ql.

Customizations via codeql (Java)

To customize CodeQL for Java, we identify and extend base classes to add custom flow sources and sinks. A general explanation of this approach is available in the file README.org, particularly the section supplement codeql: Add to FlowSource or a subclass. For Java, java.qll includes Customizations.qll, which provides extension points for custom flow modeling – this structure is common across most CodeQL-supported languages, with the notable exception of C. Further details on this customization process can be found in incoming.codeql-customizations-workshop.md.

Customizations via Model Editor: Jedis Example (Java Redis client)

The Jedis example is a straightforward case with no unexpected behavior. Although the library contains many functions, they follow a simple and repetitive pattern, making it ideal for large-scale modeling. The CodeQL model editor can be used to efficiently define sources and sinks for such cases. A detailed explanation is provided in Modeling Jedis as a Dependency in Model Editor, while validation of the modeled sink is discussed in Verifying the Modeled Sink. Finally, the query-level usage of these models can be seen in Identify usage of injection-related models in existing queries.

Customizations via Model Editor: Single-function case (Java SQLite sample)

We extend the Java SQLite example using the model editor, with both the necessary data and specification already available. This example highlights a subtle issue with the model editor: the method java.io.Console.readLine() is already modeled as a taint step and therefore does not appear in the editor interface, even though we need it modeled as a source. This requires special handling. The relevant extensions are defined in ./.github/codeql/extensions/sqlite-db/codeql-pack.yml, and the extension data is provided in ./.github/codeql/extensions/sqlite-db/models/sqlite.model.yml. A detailed explanation is available in Using sqlite to illustrate models-as-data.

To support this, we explain how the “models-as-data” system works internally. A diagnostic query can be used to enumerate currently recognized sources and sinks. From there, the relevant entry points – such as QL classes and predicates – can be identified by inspecting representative queries like SqlTainted.ql.
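To make the shape of such an extension concrete, the following sketch writes a pack manifest and a single source model for java.io.Console.readLine() into a scratch directory. It is an illustration under stated assumptions, not a copy of the files in ./.github/codeql/extensions/sqlite-db: the function name, the pack name example/sqlite-db, and the source kind "remote" are all hypothetical choices, and the kind in particular should be checked against what the target query recognizes.

```shell
# Write an illustrative model pack into the directory given as $1.
# The pack name "example/sqlite-db" is hypothetical.
write_sqlite_model_pack() {
    mkdir -p "$1/models" || return 1

    # Pack manifest: names the library pack being extended and the
    # location of the data extension files.
    cat > "$1/codeql-pack.yml" <<'EOF'
name: example/sqlite-db
version: 0.0.1
library: true
extensionTargets:
  codeql/java-all: "*"
dataExtensions:
  - models/**/*.yml
EOF

    # One sourceModel row. Columns: package, type, subtypes, name,
    # signature, ext, output, kind, provenance.
    # The kind "remote" is an assumption made for this sketch.
    cat > "$1/models/sqlite.model.yml" <<'EOF'
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      - ["java.io", "Console", False, "readLine", "()", "", "ReturnValue", "remote", "manual"]
EOF
}
```

Writing into a scratch directory first, for example write_sqlite_model_pack "$(mktemp -d)", keeps the sketch from overwriting the repository's own extension files.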

Review: SQLite Injection Workshop (C)

This is the C version of the injection workshop, based on ./codeql-dataflow-sql-injection-c/add-user.c. It serves as the basis for both the “models-as-data” manual modeling and the extension via Customizations.qll.

(PARTIAL) Use models-as-data QL code directly (no graphical editor)

This section focuses on using the models-as-data system without the graphical model editor. While model definition files and supporting data already exist, we manually write YAML files to add or override flow behavior. This approach is especially relevant for C, where graphical tooling is limited or nonexistent.

As reinforcement, we reuse the C version of the SQLite injection workshop:

The code sample is at ~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/add-user.c.
The accompanying query is ~/work-gh/codeql-lab/codeql-dataflow-sql-injection-c/SqlInjection.ql.

For structural reference, see the Java version’s documentation (not the editor interface): Using sqlite to illustrate models-as-data. There is no separate C-specific walkthrough because the YAML structure and logic are nearly identical.

For workshop use, we extend the example by modeling key functions manually:

Add a source model for: count = read(STDIN_FILENO, buf, BUFSIZE);
Add a sink model for: rc = sqlite3_exec(db, query, NULL, 0, &zErrMsg);

We demonstrate how to define YAML-based models for standard functions like read() and verify their effect using the out-of-the-box query: SqlTainted.ql.

As an additional teaching case, we introduce the higher-level, redundant function char* get_user_info() as a custom source—even though it internally calls a function already modeled as a source—to illustrate how user-defined extensions affect propagation logic.
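The three models described above could look roughly like the YAML written by the sketch below. It assumes the C/C++ data extension row layout (namespace, type, subtypes, name, signature, ext, input or output, kind, provenance) and is not taken from the workshop files: the function name write_c_models, the kinds "local" and "sql-injection", and the choice of Argument[*1] and ReturnValue[*] as access paths are assumptions that should be verified against the pinned library version.

```shell
# Write illustrative C models into the file given as $1.
write_c_models() {
    cat > "$1" <<'EOF'
extensions:
  # Sources: the buffer filled by read(fd, buf, count), and the string
  # returned by the workshop's get_user_info().
  - addsTo:
      pack: codeql/cpp-all
      extensible: sourceModel
    data:
      - ["", "", False, "read", "", "", "Argument[*1]", "local", "manual"]
      - ["", "", False, "get_user_info", "", "", "ReturnValue[*]", "local", "manual"]
  # Sink: the SQL text passed as the second argument of sqlite3_exec.
  - addsTo:
      pack: codeql/cpp-all
      extensible: sinkModel
    data:
      - ["", "", False, "sqlite3_exec", "", "", "Argument[*1]", "sql-injection", "manual"]
EOF
}
```

The get_user_info row is deliberately redundant with the read row, matching the teaching case above: removing one of the two rows and re-running the query shows which model a given result depends on.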

(PARTIAL) Extending Queries with Customizations.qll for C

The manual YAML modeling approach described earlier works well for isolated or prototype cases. However, for idiomatic, large-scale, or reusable CodeQL analysis, it is often preferable to define custom dataflow logic directly in QL—using Customizations.qll.

Most CodeQL-supported languages (e.g., Java, Python) include built-in support for this mechanism. For example, Java’s primary entry point java.qll automatically imports Customizations.qll, exposing extension points for user-defined sources, sinks, and flow steps.

In contrast, C and C++ do not support this out of the box. To enable it, you must manually patch the language pack and (optionally) rebuild the CodeQL bundle.

This section is partially complete: we document the required source-level QL changes, but the bundling process is still pending.

To enable Customizations.qll support for C/C++, perform the following:

Modify ql/cpp/ql/lib/cpp.qll to import your Customizations.qll module.
Create and populate ql/cpp/ql/lib/Customizations.qll with new source/sink/flow logic.
For full deployment: Rebuild the CodeQL bundle to include the updated QL files.
- This allows portable use in CLI runs and IDE workflows.
- Once bundled, C/C++ customization behaves like any other supported language.
For workshops and local development: No bundling is needed.
- If you run queries directly from the modified source tree, the changes take effect immediately.
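The first two steps above can be sketched as a small idempotent shell function. This is an assumption-laden illustration rather than the patch used in the workshop: the function name is hypothetical, the import is simply appended to the end of cpp.qll, and the generated Customizations.qll is an empty stub modeled on the pattern used by other languages.

```shell
# Enable Customizations.qll in the C/C++ library of the ql checkout at $1.
enable_cpp_customizations() {
    lib="$1/cpp/ql/lib"
    [ -f "$lib/cpp.qll" ] || { echo "no cpp.qll under $lib" >&2; return 1; }

    # Step 1: make cpp.qll import the customization module (only once).
    grep -qx 'import Customizations' "$lib/cpp.qll" ||
        printf '\nimport Customizations\n' >> "$lib/cpp.qll"

    # Step 2: create a stub module if none exists yet.
    [ -f "$lib/Customizations.qll" ] || cat > "$lib/Customizations.qll" <<'EOF'
/**
 * Local customizations for C/C++ analysis.
 * Add classes here that extend the library's source, sink, or flow-step classes.
 */

import cpp
EOF
}
```

Called as enable_cpp_customizations ql from the repository root, it leaves an already patched tree unchanged, so it can be re-run after updating the submodule.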

A working demonstration of this modification (without bundling) is provided in: ./codeql-dataflow-sql-injection-c/README.org

CodeQL Bundling

This section will provide a detailed walkthrough of the CodeQL bundling process using the CLI tool at https://github.com/advanced-security/codeql-bundle. This tool enables custom pack composition and is necessary when extending language libraries (e.g., adding `Customizations.qll` support for C/C++).

While the official tool is somewhat of a black box, we will demystify the underlying structure and show how to build, inspect, and deploy custom bundles from source. Notes and scripts will be collected in codeql-bundling/README.org.

Tool Setup

Some scripts used here are found in ./bin/. To ensure that the ones written in Python have access to their prerequisites, set up a virtual environment via

# 1. Create the virtualenv
python3 -m venv ~/codeql-lab/venv

# 2. Install any packages
source ~/codeql-lab/venv/bin/activate
pip install pyyaml

For any of these scripts to work, add them to the PATH via

export PATH="$HOME/codeql-lab/bin:$PATH"
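A quick check that both steps took effect in the current shell might look like the sketch below. The function name check_lab_tools and the LAB override variable are hypothetical; the default location matches the ~/codeql-lab path used in the commands above.

```shell
# Report whether the lab's tool setup is visible in the current shell.
# LAB defaults to the location used in the setup commands above.
check_lab_tools() {
    lab="${LAB:-$HOME/codeql-lab}"
    rc=0

    # The bin/ directory must be on PATH for the scripts to be found.
    case ":$PATH:" in
        *":$lab/bin:"*) echo "ok: $lab/bin is on PATH" ;;
        *) echo "missing: $lab/bin is not on PATH"; rc=1 ;;
    esac

    # The virtualenv's interpreter must be able to import PyYAML.
    if "$lab/venv/bin/python" -c 'import yaml' 2>/dev/null; then
        echo "ok: PyYAML is importable from $lab/venv"
    else
        echo "missing: PyYAML is not importable from $lab/venv"; rc=1
    fi
    return $rc
}
```

The function returns a non-zero status if either check fails, so it can guard other scripts, as in check_lab_tools && some-script.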

