Skip to content

fix(db): persist CPE column in postgres and mysql writers#2494

Merged
Mzack9999 merged 4 commits into
devfrom
fix/2487-store-cpe-in-db
May 13, 2026
Merged

fix(db): persist CPE column in postgres and mysql writers#2494
Mzack9999 merged 4 commits into
devfrom
fix/2487-store-cpe-in-db

Conversation

@dogancanbakir
Copy link
Copy Markdown
Member

@dogancanbakir dogancanbakir commented May 12, 2026

Proposed changes

Closes #2487. Add cpe column to postgres and mysql schemas with idempotent migration for existing tables and JSON binding in InsertBatch. MongoDB unaffected.

Proof

Fresh and pre-existing tables verified on postgres 17 and mysql 8.4. go vet and unit tests pass.

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • Chores
    • Enhanced database schema and migration to add and backfill a new CPE JSON column for MySQL and PostgreSQL.
    • Updated batch insertion to include and persist CPE data during bulk writes.
  • Bug Fixes / Reliability
    • Improved input streaming and resolver file handling to process input line-by-line and surface per-line errors with clearer context.

Review Change Stack

Adds cpe column to both SQL schemas, idempotent migration for
existing tables, and JSON binding in InsertBatch.
@auto-assign auto-assign Bot requested a review from Mzack9999 May 12, 2026 14:10
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Walkthrough

MySQL and PostgreSQL drivers add a cpe JSON/JSONB column to table DDLs, runtime migration steps to add the column when missing, and update batch insert logic to marshal and bind CPE data during record insertion. Input streaming and resolver file handling now use line-iterator helpers that surface per-line errors.

Changes

CPE Column Database Support

Layer / File(s) Summary
Schema definition and migration for cpe column
internal/db/mysql.go, internal/db/postgres.go
Table DDLs declare cpe (MySQL JSON, Postgres JSONB). MySQL adds ensureColumn(ctx, column, definition) which queries information_schema.columns and runs ALTER TABLE ... ADD COLUMN when missing; Postgres runs ALTER TABLE ... ADD COLUMN IF NOT EXISTS cpe JSONB during schema setup.
Batch insert cpe column binding
internal/db/mysql.go, internal/db/postgres.go
InsertBatch INSERT column lists and VALUES placeholders include cpe. Per-row code marshals r.CPE into cpeJSON and passes it as an additional argument to ExecContext.

Input streaming and resolver handling

Layer / File(s) Summary
ValidateOptions: resolver file line iteration
runner/options.go
Resolver file processing now iterates with fileutil.Lines(resolver) and wraps per-line errors with Couldn't process resolver file \"<resolver>\".
streamInput: file/stdin line streaming
runner/runner.go
streamInput() switches file, glob, and stdin branches to fileutil.Lines(...) / fileutil.LinesReader(...), returning early on the first read error while preserving trimming, dedupe, and trySend behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through schema, column set free,
JSON for MySQL, JSONB for PG,
Marshalled CPE tucked in every row,
Lines now read one-by-one where inputs flow,
A rabbit cheers — database seeds grow!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Changes to runner/options.go and runner/runner.go for line-iterator processing appear unrelated to the CPE column persistence objective. Review whether runner changes are necessary for this PR or should be addressed in a separate pull request to maintain scope clarity.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding CPE column persistence to Postgres and MySQL database writers.
Linked Issues check ✅ Passed The PR implements all coding requirements from issue #2487: CPE column creation in Postgres/MySQL schemas, idempotent migration support, and CPE value binding in batch inserts.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/2487-store-cpe-in-db

Comment @coderabbitai help to get the list of available commands and usage tips.

@dogancanbakir dogancanbakir self-assigned this May 12, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/db/mysql.go (1)

176-194: 💤 Low value

Consider restricting allowed column definitions to prevent SQL injection.

The definition parameter is interpolated directly into the SQL statement without validation. While currently called only with hardcoded "JSON", this pattern could become a risk if the function is reused with dynamic input.

Static analysis flagged this as a potential SQL injection vector. Since table and column names are already properly quoted via quoteIdentifier, the practical risk is low given current usage.

♻️ Optional: validate definition against allowed types
+var allowedColumnTypes = map[string]bool{
+	"JSON": true,
+}
+
 func (m *mysqlDatabase) ensureColumn(ctx context.Context, column, definition string) error {
+	if !allowedColumnTypes[definition] {
+		return fmt.Errorf("unsupported column definition: %s", definition)
+	}
 	var count int
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/db/mysql.go` around lines 176 - 194, The ensureColumn function
currently interpolates the definition string directly into an ALTER TABLE SQL
and should validate it first to avoid injection; update ensureColumn to allow
only a small whitelist (e.g., "JSON", "TEXT", fixed-size "VARCHAR(n)" matching a
strict regex like ^VARCHAR\([1-9][0-9]?\)$) or explicit allowed tokens, reject
or return an error for anything else, and also reject characters like ';' or
'--'. Keep using quoteIdentifier for m.cfg.TableName and column, and only call
fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", ...) with the
validated/normalized definition so dynamic input cannot inject SQL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/db/mysql.go`:
- Around line 176-194: The ensureColumn function currently interpolates the
definition string directly into an ALTER TABLE SQL and should validate it first
to avoid injection; update ensureColumn to allow only a small whitelist (e.g.,
"JSON", "TEXT", fixed-size "VARCHAR(n)" matching a strict regex like
^VARCHAR\([1-9][0-9]?\)$) or explicit allowed tokens, reject or return an error
for anything else, and also reject characters like ';' or '--'. Keep using
quoteIdentifier for m.cfg.TableName and column, and only call fmt.Sprintf("ALTER
TABLE %s ADD COLUMN %s %s", ...) with the validated/normalized definition so
dynamic input cannot inject SQL.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7527d076-4683-4924-8b5b-4c790eb6b290

📥 Commits

Reviewing files that changed from the base of the PR and between 31af5c0 and 7ff4ad0.

📒 Files selected for processing (2)
  • internal/db/mysql.go
  • internal/db/postgres.go

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/db/mysql.go`:
- Around line 169-176: The current ad-hoc migration in m.ensureColumn uses a
COUNT(*) then ALTER TABLE which races during concurrent upgrades; update
ensureColumn to make the schema change idempotent by performing the ALTER TABLE
ADD COLUMN using "ADD COLUMN IF NOT EXISTS" when supported or, if not, by
executing the ALTER TABLE and treating MySQL error 1060 (duplicate column) as a
non-fatal success. Locate the ensureColumn implementation and change the logic
that does COUNT(*) + ALTER to instead attempt the ALTER directly and
ignore/handle the duplicate-column error (or prefer ADD COLUMN IF NOT EXISTS) so
concurrent callers do not return an error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f88c6546-557c-4a37-a6df-28d34beb904f

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff4ad0 and f545560.

📒 Files selected for processing (2)
  • internal/db/mysql.go
  • internal/db/postgres.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/db/postgres.go

Comment thread internal/db/mysql.go
Comment on lines +169 to +176
// Back-compat for databases whose schema was created before CPE support.
// New installs already get this column via the CREATE TABLE above; this
// path only matters for in-place upgrades.
// TODO: replace these ad-hoc ensureColumn calls with a proper migration
// framework (e.g. golang-migrate / goose) once more schema changes accumulate.
if err := m.ensureColumn(ctx, "cpe", "JSON"); err != nil {
return fmt.Errorf("failed to ensure cpe column: %w", err)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cd internal/db && head -200 mysql.go | tail -50

Repository: projectdiscovery/httpx

Length of output: 1521


🏁 Script executed:

cat -n internal/db/mysql.go | sed -n '169,200p'

Repository: projectdiscovery/httpx

Length of output: 1329


🏁 Script executed:

# Check the import structure to understand current state
cat -n internal/db/mysql.go | head -20

Repository: projectdiscovery/httpx

Length of output: 526


🏁 Script executed:

# Verify MySQL error code 1060
web_search

Repository: projectdiscovery/httpx

Length of output: 117


🌐 Web query:

MySQL error code 1060 duplicate column

💡 Result:

MySQL error code 1060 (ER_DUP_FIELDNAME, SQLSTATE 42S21) occurs with the message "Duplicate column name '%s'".[1][2][3] It is triggered when attempting to define a column with a name that already exists in the table or result set, enforcing unique column names within tables and query results.[4][5][1] Common scenarios: - CREATE TABLE with duplicate column definitions, e.g., two 'email' columns.[4] - ALTER TABLE ADD COLUMN where the column already exists.[6] - SELECT * from joined tables with matching column names (e.g., both have 'id').[7] - CREATE VIEW without unique column aliases.[8] - Subqueries or INSERT...SELECT without column aliases.[9][10] Fixes: 1. Check the error message for the specific duplicate column name '%s'. 2. For CREATE/ALTER: Remove or rename the duplicate (e.g., ADD COLUMN new_email).[4][6] 3. For SELECT/JOIN/VIEW: Use aliases, e.g., SELECT t1.id AS t1_id, t2.id AS t2_id.[7][8] Example: SELECT t.id AS tips_id FROM tips t LEFT JOIN tip_usage ON tip_usage.tip_id = t.id GROUP BY t.id;[7] 4. Verify table schema with SHOW CREATE TABLE or DESCRIBE table to confirm existing columns.[6] This behavior is consistent across MySQL versions including 8.0+ and 9.x, and also in MariaDB.[3][1][2]

Citations:


🏁 Script executed:

# Check if errors package is already imported
rg "^import|\"errors\"" internal/db/mysql.go

Repository: projectdiscovery/httpx

Length of output: 75


🏁 Script executed:

# Verify go-sql-driver/mysql exports MySQLError type
# Search for usage patterns in the codebase
rg "MySQLError|mysql\.MySQLError" --type go

Repository: projectdiscovery/httpx

Length of output: 48


🏁 Script executed:

# Check the actual go-sql-driver/mysql module to understand the error type
web_search

Repository: projectdiscovery/httpx

Length of output: 117


🌐 Web query:

go-sql-driver mysql MySQLError type documentation

💡 Result:

In github.com/go-sql-driver/mysql, MySQL server errors are represented by the struct type MySQLError (pointer receiver form is used): - Type: type MySQLError struct { Number uint16; SQLState [1]byte; Message string } [2] - String formatting: func (me *MySQLError) Error string {... } [2] - Error matching: func (me *MySQLError) Is(err error) bool { if merr, ok:= err.(*MySQLError); ok { return merr.Number == me.Number } return false } [2] So, for documentation/usage you typically: 1) Import the driver and run queries via database/sql. [3] 2) Type-assert / extract *mysql.MySQLError from returned errors when you want the MySQL error number and SQLState; the driver supports matching on MySQLError, including via errors.Is-style number matching because MySQLError defines Is. [2][4] Example (extract typed MySQL error): var mysqlErr *mysql.MySQLError if errors.As(err, &mysqlErr) { log.Printf("MySQL Error %d (%s): %s", mysqlErr.Number, string(mysqlErr.SQLState[:]), mysqlErr.Message) } This works because MySQLError carries Number/SQLState/Message. [2] Notes on exported sentinel errors: - The driver also defines various exported error variables (e.g., ErrInvalidConn, ErrMalformPkt, ErrNoTLS, etc.) for non–server-returned error scenarios, and these can be checked directly. [2] If you want the exact code definition, see errors.go in the driver repository. [2]

Citations:


Make the column migration safe for concurrent upgrades.

The COUNT(*) + ALTER TABLE sequence at lines 181-199 is racy. If two instances start against the same pre-existing table, both can observe the column missing; one ADD COLUMN succeeds and the other fails with MySQL error 1060 (duplicate column), so the migration is not idempotent during rolling deploys.

💡 Suggested fix
 import (
 	"context"
 	"database/sql"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"strings"
 
-	_ "github.com/go-sql-driver/mysql"
+	mysql "github.com/go-sql-driver/mysql"
 	"github.com/projectdiscovery/httpx/runner"
 )
@@
 func (m *mysqlDatabase) ensureColumn(ctx context.Context, column, definition string) error {
 	var count int
 	err := m.db.QueryRowContext(ctx,
 		`SELECT COUNT(*) FROM information_schema.columns
 		 WHERE table_schema = DATABASE() AND table_name = ? AND column_name = ?`,
 		m.cfg.TableName, column,
 	).Scan(&count)
 	if err != nil {
 		return err
 	}
 	if count > 0 {
 		return nil
 	}
 	_, err = m.db.ExecContext(ctx,
 		fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s",
 			quoteIdentifier(m.cfg.TableName), quoteIdentifier(column), definition),
 	)
+	if err != nil {
+		var mysqlErr *mysql.MySQLError
+		if errors.As(err, &mysqlErr) && mysqlErr.Number == 1060 {
+			return nil
+		}
+	}
 	return err
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/db/mysql.go` around lines 169 - 176, The current ad-hoc migration in
m.ensureColumn uses a COUNT(*) then ALTER TABLE which races during concurrent
upgrades; update ensureColumn to make the schema change idempotent by performing
the ALTER TABLE ADD COLUMN using "ADD COLUMN IF NOT EXISTS" when supported or,
if not, by executing the ALTER TABLE and treating MySQL error 1060 (duplicate
column) as a non-fatal success. Locate the ensureColumn implementation and
change the logic that does COUNT(*) + ALTER to instead attempt the ALTER
directly and ignore/handle the duplicate-column error (or prefer ADD COLUMN IF
NOT EXISTS) so concurrent callers do not return an error.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@runner/runner.go`:
- Around line 746-749: In the loops iterating "for item, err := range
fileutil.Lines(r.options.InputFile)" do not silently return on err; instead
surface the error (log it via the runner's logger or return it up the call
chain). Replace the bare "if err { return }" with code that logs a descriptive
message including r.options.InputFile and the error (or wraps and returns the
error) so read errors at the fileutil.Lines call are visible and fail the scan;
apply the same change to the other two occurrences handling errors from
fileutil.Lines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b2c1de0-cbba-4122-a14c-37ce9652b7c3

📥 Commits

Reviewing files that changed from the base of the PR and between f545560 and 8e6bb79.

📒 Files selected for processing (2)
  • runner/options.go
  • runner/runner.go

Comment thread runner/runner.go
Comment on lines +746 to +749
for item, err := range fileutil.Lines(r.options.InputFile) {
if err != nil {
return
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not swallow stream read errors silently.

At Line 748, Line 765, and Line 778, read errors cause an immediate return with no log/error signal. That can truncate scans silently and is very hard to diagnose.

Proposed fix
@@
-               for item, err := range fileutil.Lines(r.options.InputFile) {
+               for item, err := range fileutil.Lines(r.options.InputFile) {
                    if err != nil {
+                       gologger.Error().Msgf("Could not read input file '%s': %s\n", r.options.InputFile, err)
                        return
                    }
@@
-               for item, err := range fileutil.Lines(file) {
+               for item, err := range fileutil.Lines(file) {
                    if err != nil {
+                       gologger.Error().Msgf("Could not read input file '%s': %s\n", file, err)
                        return
                    }
@@
-           for item, err := range fileutil.LinesReader(os.Stdin) {
+           for item, err := range fileutil.LinesReader(os.Stdin) {
                if err != nil {
+                   gologger.Error().Msgf("Could not read input from stdin: %s\n", err)
                    return
                }

Also applies to: 763-766, 776-779

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@runner/runner.go` around lines 746 - 749, In the loops iterating "for item,
err := range fileutil.Lines(r.options.InputFile)" do not silently return on err;
instead surface the error (log it via the runner's logger or return it up the
call chain). Replace the bare "if err { return }" with code that logs a
descriptive message including r.options.InputFile and the error (or wraps and
returns the error) so read errors at the fileutil.Lines call are visible and
fail the scan; apply the same change to the other two occurrences handling
errors from fileutil.Lines.

@Mzack9999 Mzack9999 merged commit 0dc2c9f into dev May 13, 2026
14 checks passed
@Mzack9999 Mzack9999 deleted the fix/2487-store-cpe-in-db branch May 13, 2026 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CPE (Common Platform Enumeration) is not saved in the database.

2 participants