docs(skills): audit vectorization + cuDF compatibility in review skill #1283

Merged
lmeyerov merged 1 commit into master from docs/review-skill-vectorization-cudf-checks
May 4, 2026

Conversation

lmeyerov (Contributor) commented May 4, 2026

Summary

Adds three audit checks to agents/skills/review/SKILL.md's "Pygraphistry-specific review checks" section so reviewers consistently catch GFQL/compute regressions:

  1. Vectorization — flag apply(axis=1), iterrows(), itertuples(), per-row Python loops, scalar-in-loop access on hot row paths.
  2. Mutation (pygraphistry is pure-functional) — flag df[col]=v, df.loc[mask,c]=v, cell-wise mutation in loops, inplace=True, del df[col], df.append(...), mutating-then-returning-input. Inline pure alternatives (df.assign, df.where, df.drop, engine-polymorphic df_concat(engine), etc.).
  3. cuDF compatibility — flag pandas-only APIs, .to_pandas() round-trips, is pd.NA comparisons, dtype string matches, signatures locked to pd.DataFrame instead of DataFrameT.
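To make the first two checks concrete, here is a minimal pandas sketch of the kind of rewrite a reviewer might suggest. The frame, its column names (`src`, `dst`, `w`), and the function names are hypothetical and not taken from the pygraphistry codebase; only the flagged patterns and the pure alternatives (`df.assign`, `where`) come from the list above.

```python
import pandas as pd

# Hypothetical edge table, for illustration only.
df = pd.DataFrame({"src": ["a", "b", "c"], "dst": ["b", "c", "a"], "w": [1.0, -2.0, 3.0]})


def label_rowwise(frame):
    # Would be flagged: apply(axis=1) runs a Python callback once per row.
    return frame.assign(label=frame.apply(lambda r: r["src"] + "->" + r["dst"], axis=1))


def label_vectorized(frame):
    # Suggested: column-wise string concatenation, returned as a new frame.
    return frame.assign(label=frame["src"] + "->" + frame["dst"])


def clip_mutating(frame):
    # Would be flagged: df.loc[mask, col] = v mutates the caller's frame.
    frame.loc[frame["w"] < 0, "w"] = 0.0
    return frame


def clip_pure(frame):
    # Suggested: where() keeps values that satisfy the condition and
    # substitutes 0.0 elsewhere; the input frame is left untouched.
    return frame.assign(w=frame["w"].where(frame["w"] >= 0, 0.0))


labeled = label_vectorized(df)
clipped = clip_pure(df)
```

Both pure variants return a new frame and leave `df` as it was, which is the property the mutation check is protecting.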

Plus: a list of engine-polymorphic helpers reviewers should recommend (df_cons, df_concat, df_to_engine, safe_merge, resolve_engine, template_df_cons, DataFrameT); paired-cuDF-coverage rule; flag-wording template; false-positive guard distinguishing hot-row paths from control-plane code.
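As a rough illustration of the cuDF compatibility check, the sketch below avoids the flagged patterns (`is pd.NA`, `.to_pandas()`, a signature locked to `pd.DataFrame`). `FrameT` is a hypothetical local stand-in for the project's `DataFrameT`, whose import path and definition are not shown in this PR; the function names are likewise made up. The code is exercised with pandas only; it relies on `isna()`, `fillna()`, and `assign()`, which cuDF also provides, but it has not been run on a GPU here.

```python
from typing import TypeVar

import pandas as pd

# Hypothetical stand-in for an engine-neutral frame type (assumption,
# not the real DataFrameT from pygraphistry).
FrameT = TypeVar("FrameT")


def count_missing(df, col: str) -> int:
    # Null check via the frame's own isna(), not `value is pd.NA`.
    return int(df[col].isna().sum())


def fill_missing(df: FrameT, col: str, default: float = 0.0) -> FrameT:
    # Stays in the caller's engine: no .to_pandas() round-trip, and a new
    # frame is returned instead of writing into the input.
    return df.assign(**{col: df[col].fillna(default)})


edges = pd.DataFrame({"w": [1.0, None, 3.0]})
filled = fill_missing(edges, "w")
```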

Why now

The skill currently has one GPU-related line ("if the PR touches GPU code, run GPU CI on dgx-spark") that fires AFTER a code change ships. There's nothing pre-CI guiding a reviewer to flag, e.g., a new df.apply(fn, axis=1) on the row pipeline (cuDF-incompatible), a new df[col] = v mutation pattern, an inplace=True, or a helper signature locking the DataFrame param to pd.DataFrame.
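The patterns named above are syntactic enough that a pre-CI pass can be approximated mechanically. The following is only a sketch of that idea using Python's standard `ast` module; the skill itself is guidance for reviewers, and `find_flags` and its messages are hypothetical, not part of this PR. It also does not attempt the hot-path versus control-plane distinction, which still needs reviewer judgment.

```python
import ast

ROWWISE_METHODS = {"iterrows", "itertuples"}


def find_flags(source: str):
    """Return sorted (line, message) pairs for a few flagged call patterns."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else None
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg}
        if name in ROWWISE_METHODS:
            flags.append((node.lineno, f"per-row iteration: .{name}()"))
        axis = kwargs.get("axis")
        if name == "apply" and isinstance(axis, ast.Constant) and axis.value == 1:
            flags.append((node.lineno, "per-row callback: .apply(axis=1)"))
        inplace = kwargs.get("inplace")
        if isinstance(inplace, ast.Constant) and inplace.value is True:
            flags.append((node.lineno, "mutation: inplace=True"))
    return sorted(flags)


sample = """
for _, row in df.iterrows():
    pass
df2 = df.apply(fn, axis=1)
df.drop(columns=["x"], inplace=True)
ok = df.assign(y=df["x"] * 2)
"""

for lineno, message in find_flags(sample):
    print(f"line {lineno}: {message}")
```

The scan only parses the source text, so it can run on a diff hunk without importing pandas or cuDF.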

Pygraphistry #1279 (free-form intermediate MATCH admit, in review) made these assumptions concrete enough to codify. This PR is independent of #1279.

What's in / out

In: a #### Vectorization & engine compatibility (GFQL / row pipeline / compute) subsection (~53 LOC under "Pygraphistry-specific review checks"). Three terse severity tables, engine-polymorphic-helpers list, paired-cuDF rule, flag wording, FP guard.

Out: GPU-validation-evidence rule (lines 172-174) unchanged. No changes to other skill phases.

Test plan

  • Single-file 53-line addition; reads cleanly with surrounding skill structure.
  • CI green (push pending).

Refs

lmeyerov force-pushed the docs/review-skill-vectorization-cudf-checks branch from 243163f to c597214 (May 4, 2026 05:11)
Adds a "Vectorization & engine compatibility (GFQL / row pipeline /
compute)" subsection under "Pygraphistry-specific review checks" so
reviewers consistently catch:

* Per-row Python loops (`apply(axis=1)`, `iterrows`, `itertuples`,
  `loc[i, col] = ...` in a loop) on hot row paths — both a perf cliff
  on pandas and an outright break on cuDF.
* cuDF compatibility regressions — pandas-only API patterns, GPU↔CPU
  round-trips that signal missing vectorization, signatures locked to
  `pd.DataFrame` instead of the engine-neutral `DataFrameT`.
* When a code change requires paired cuDF coverage vs. when CPU-only
  is acceptable.

Severity rubric and false-positive guards included so reviewers
distinguish hot row paths from control-plane code and don't flag
pre-existing repo debt as new findings.

The pre-existing GPU-validation-evidence rule (line 172-174) keeps its
current scope; this subsection adds the static-review-time checks that
should fire BEFORE running GPU CI.

Catalyst: pygraphistry #1279 (free-form intermediate MATCH admit) made
the vectorization + cuDF assumptions concrete enough to codify.
lmeyerov force-pushed the docs/review-skill-vectorization-cudf-checks branch from c597214 to b0865bc (May 4, 2026 05:23)
lmeyerov merged commit 5e1ac3d into master (May 4, 2026)
93 checks passed
lmeyerov deleted the docs/review-skill-vectorization-cudf-checks branch (May 4, 2026 05:32)