CodeRail is a tool designed to act as a guardrail against introducing vulnerabilities in generated code. It runs a pipeline of checks, including static analysis, secret detection, and dependency scanning, to identify and automatically remediate potential security issues before generated code is accepted.
The tool follows a multi-step pipeline to ensure the generated code is as secure as possible, incorporating a Chain of Thought (CoT) process for more robust vulnerability remediation.
```mermaid
graph TD
    A[Start: User Prompt] --> B{Input Moderation};
    B -- Malicious Prompt --> X[Reject];
    B -- Safe Prompt --> C["LLM Generates Code & Dependencies"];
    C --> D{Schema Validation};
    D -- Invalid JSON/Schema --> X;
    D -- Valid --> E["Static Analysis (Semgrep)"];
    E --> E2[Secret Detection];
    E2 --> F["Dependency Scanning (OSV)"];
    F --> G{"Vulnerabilities, Secrets, or CVEs Found?"};
    G -- No --> H[Accept Code];
    G -- Yes --> I["CoT Repair: Analyze, Plan, Execute"];
    I --> J{Re-validate Repaired Code};
    J -- Still Vulnerable --> K[Reject or Flag for Manual Review];
    J -- Fixed --> H;
    H --> Z[End: Secure Code Output];
    K --> Z;
    X --> Z;
```
- Input Moderation: The user's prompt is first checked for malicious keywords or phrases (e.g., "reverse shell", "exploit"). If any are found, the request is blocked (see the moderation sketch after this list).
- Code Generation: A Large Language Model (LLM) generates code, dependencies, and other metadata based on the user's prompt. The output is expected to be in a structured JSON format.
- Schema Validation: The JSON output from the LLM is validated against a predefined schema (`guardrails_schema.yml`) to ensure it has the correct structure (e.g., files, dependencies); see the validation sketch after this list.
- Static Analysis: The generated code is scanned with `semgrep` using a custom ruleset (`semgrep/ruleset.yml`) to find common security vulnerabilities, identified by CWE (see the scan sketch below).
- Secret Detection: The code is scanned for hardcoded secrets, such as API keys and private keys, using regular expressions (see the pattern sketch below).
- Dependency Scanning: The list of dependencies is checked against the Open Source Vulnerability (OSV) database to find any known CVEs (see the query sketch below).
- Decision & Repair (Chain of Thought):
  - If no critical vulnerabilities, secrets, or CVEs are found, the code is accepted.
  - If issues are found, the tool initiates a Chain of Thought (CoT) repair process (see the prompt sketch below). Instead of simply asking for a patch, the LLM is prompted to:
    - Analyze: Explain the vulnerability.
    - Plan: Outline the steps to fix it.
    - Execute: Generate the complete, patched code.
- Re-validation: The patched code is run through the `semgrep` and secret scanners again.
  - If the patch resolves the critical issues, the code is accepted.
  - If issues persist, the patch is considered failed, and the code is flagged for manual review.
- Output: The final result, including the code, findings, the LLM's reasoning (from the CoT process), and the action taken (accept, patch, or fail), is returned as a JSON object.
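
Input moderation can be as simple as a blocklist scan over the prompt. A minimal sketch, assuming an illustrative keyword list (the patterns and helper name are not CodeRail's actual implementation):

```python
import re

# Illustrative blocklist; the real keyword set is project-specific.
BLOCKED_PATTERNS = [r"reverse\s+shell", r"\bexploit\b", r"\bkeylogger\b"]

def is_prompt_safe(prompt: str) -> bool:
    """Return False if the prompt matches any blocked pattern."""
    lowered = prompt.lower()
    return not any(re.search(pattern, lowered) for pattern in BLOCKED_PATTERNS)
```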
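
Schema validation amounts to parsing the LLM's response as JSON and checking it against the YAML schema. A sketch using the `jsonschema` package (the helper and its failure handling are illustrative; the actual schema lives in `guardrails_schema.yml`):

```python
import json
from typing import Optional

import yaml
from jsonschema import ValidationError, validate

def validate_llm_output(raw_response: str,
                        schema_path: str = "guardrails_schema.yml") -> Optional[dict]:
    """Parse the LLM's JSON output and check it against the schema.

    Returns the parsed object on success, or None (-> Reject) on
    invalid JSON or a schema mismatch.
    """
    with open(schema_path) as f:
        schema = yaml.safe_load(f)
    try:
        output = json.loads(raw_response)
        validate(instance=output, schema=schema)
    except (json.JSONDecodeError, ValidationError):
        return None
    return output
```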
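
Static analysis is typically driven through Semgrep's CLI; `--config` and `--json` are standard Semgrep flags, while the wrapper itself is only a sketch:

```python
import json
import subprocess

def run_semgrep(target_dir: str, ruleset: str = "semgrep/ruleset.yml") -> list:
    """Scan generated code with the custom ruleset; return Semgrep findings."""
    result = subprocess.run(
        ["semgrep", "--config", ruleset, "--json", target_dir],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout).get("results", [])
```

Each finding carries the matched rule's metadata, including the CWE tagged in `semgrep/ruleset.yml`, which is what feeds the decision step.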
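
Secret detection is regex-based; the patterns below are a small illustrative subset, not the tool's full list:

```python
import re

# Illustrative subset of secret patterns.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "Hardcoded API key": re.compile(r"(?i)api[_-]?key\s*[=:]\s*['\"][\w-]{16,}['\"]"),
}

def find_secrets(code: str) -> list:
    """Return the names of all secret patterns that match the code."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(code)]
```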
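
Dependency scanning can use the public OSV API, which takes a package name, ecosystem, and version and returns known advisories. A sketch using `requests` (the helper is illustrative; `https://api.osv.dev/v1/query` is the real endpoint):

```python
import requests

def check_dependency(name: str, version: str, ecosystem: str = "PyPI") -> list:
    """Return IDs of known vulnerabilities (CVEs, GHSAs, ...) for one dependency."""
    response = requests.post(
        "https://api.osv.dev/v1/query",
        json={"version": version, "package": {"name": name, "ecosystem": ecosystem}},
        timeout=10,
    )
    response.raise_for_status()
    return [vuln["id"] for vuln in response.json().get("vulns", [])]

# Example: check_dependency("flask", "0.12") returns a non-empty list of advisory IDs.
```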
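
The CoT repair step is driven by a structured prompt. The template below shows an illustrative shape, not CodeRail's exact wording:

```python
COT_REPAIR_TEMPLATE = """You are a security engineer. The following generated code
has security findings:

{findings}

Code:
{code}

Respond in three steps:
1. Analyze: explain each vulnerability.
2. Plan: outline the steps needed to fix it.
3. Execute: output the complete, patched code in the original JSON schema.
"""

def build_repair_prompt(code: str, findings: list) -> str:
    """Assemble the CoT repair prompt from the code and its findings."""
    return COT_REPAIR_TEMPLATE.format(code=code, findings="\n".join(findings))
```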
First, install the necessary dependencies. It is recommended to use a virtual environment.
```bash
# Create and activate a virtual environment (optional but recommended)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# If you are using Ollama, make sure it is installed and the model is pulled
# ollama pull llama2
```
Create a text file containing the prompt you want to send to the LLM. For example, `my_prompt.txt`:

```text
Create a Python Flask application with a single endpoint `/` that returns "Hello, World!".
```
Run the `src/main.py` script with the required arguments:

```bash
python3 src/main.py --prompt-file my_prompt.txt
```
- `--backend`: The LLM backend to use. Choices: `ollama`, `hf` (Hugging Face). Default: `hf`.
- `--model`: The model name to use (e.g., `gpt2`, `llama2`). Default: `gpt2`.
- `--prompt-file`: (Required) Path to the file containing the user prompt.
- `--schema`: Path to the guardrails schema file. Default: `guardrails_schema.yml`.
- `--rules`: Path to the Semgrep ruleset file. Default: `semgrep/ruleset.yml`.
Example with Ollama:
```bash
python3 src/main.py --backend ollama --model llama2 --prompt-file my_prompt.txt
```