Skip to content

Feature: SQL Window Functions → MongoDB $setWindowFields (ROW_NUMBER, RANK, DENSE_RANK, RUNNING SUM/AVG) #4

@hoangsonww

Description

@hoangsonww

Summary

Add translation support for common SQL window functions into MongoDB’s $setWindowFields (MongoDB ≥ 5.0). Cover ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(n) (best-effort), running/tumbling SUM/AVG/MIN/MAX/COUNT, and partitioned order-aware windows. Provide graceful fallback/errors for unsupported server versions.

Motivation

  • Window functions are standard in analytics SQL; many migrations stall when they hit OVER (PARTITION BY … ORDER BY …).
  • MongoDB’s $setWindowFields is the canonical equivalent; supporting it unlocks powerful reporting in-place without ETL.

Scope (v1)

Supported SQL patterns:

  • ROW_NUMBER() OVER (PARTITION BY … ORDER BY …)
  • RANK() / DENSE_RANK() over partition+order
  • Running aggregates: SUM/AVG/MIN/MAX/COUNT with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • Basic frame variants: CURRENT ROW, UNBOUNDED PRECEDING; (optional) N PRECEDING where feasible
  • NTILE(n) (emit bucket index via computed percentile approximation)

Out of scope (follow-ups):

  • Window frames with FOLLOWING, RANGE (value-based) frames, and complex expressions
  • LAG/LEAD (separate issue)

API

from sql_mongo_converter import sql_to_pipeline

pipeline = sql_to_pipeline("""
  SELECT
    u.country,
    u.user_id,
    ROW_NUMBER() OVER (PARTITION BY u.country ORDER BY u.created_at) AS rn,
    RANK()       OVER (PARTITION BY u.country ORDER BY u.total_spend DESC) AS rnk,
    SUM(u.total_spend) OVER (
      PARTITION BY u.country
      ORDER BY u.created_at
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_spend
  FROM users u
  WHERE u.status = 'ACTIVE';
""", options={
  "mongoServerVersion": "6.0",   # enable $setWindowFields
  "emitCompatibilityNotes": True
})

Example Output (abridged)

{
  "collection": "users",
  "pipeline": [
    {"$match": {"status": "ACTIVE"}},
    {"$setWindowFields": {
      "partitionBy": "$country",
      "sortBy": {"created_at": 1},
      "output": {
        "rn": {"$documentNumber": {}},               // ROW_NUMBER
        "running_spend": {
          "$sum": "$total_spend",
          "window": {"documents": ["unbounded","current"]}
        }
      }
    }},
    {"$setWindowFields": {
      "partitionBy": "$country",
      "sortBy": {"total_spend": -1},
      "output": {
        "rnk": {"$rank": {}}                         // RANK (DENSE_RANK → $denseRank)
      }
    }},
    {"$project": {"country": 1, "user_id": 1, "rn": 1, "rnk": 1, "running_spend": 1, "_id": 0}}
  ]
}

Note: multiple $setWindowFields stages may be emitted to accommodate different ORDER BY specs per window.

Translation Rules

  • PARTITION BYpartitionBy: <expr>
  • ORDER BYsortBy: { … }
  • ROW_NUMBER"$documentNumber": {}
  • RANK"$rank": {}, DENSE_RANK"$denseRank": {}
  • Running aggregates → accumulator + window.documents: ["unbounded","current"]
  • N PRECEDING (best-effort) → window.documents: [-N, "current"] (warn when not supported)
  • NTILE(n) (approx): compute bucket index using $rank/$denseRank + derived math; document approximation caveats.

Dialect & Validation

  • Gate feature behind options.mongoServerVersion >= 5.0; otherwise raise a clear, actionable error with a suggestion to precompute or upgrade.
  • Validate that each window has a single ORDER BY and compatible frame; emit warnings for unsupported frames.

Tests

  • Row number by partition (stable order)
  • Rank vs dense_rank with ties
  • Running sum/avg by time order
  • Mixed windows (different ORDER BYs) on same query
  • Version guard (4.4 → friendly error)
  • NTILE(n) sanity checks on uniform partitions

README / Docs

  • New section: “Window Functions → $setWindowFields”
  • Matrix table (SQL function → Mongo expression)
  • Two full copy-paste examples + caveats (frames, server version)
  • Performance note: ensure supporting indexes on partitionBy and sortBy keys

Acceptance Criteria

  • sql_to_pipeline() translates listed window functions into valid $setWindowFields stages.
  • Multiple windows with different ORDER BY are supported (multiple stages emitted).
  • Version gating & helpful error/warning messages implemented.
  • Comprehensive unit tests + README examples added.
  • Backward compatibility preserved for existing APIs.

Metadata

Metadata

Assignees

Labels

documentationImprovements or additions to documentationduplicateThis issue or pull request already existsenhancementNew feature or requestgood first issueGood for newcomershelp wantedExtra attention is neededquestionFurther information is requested

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions