Fix commitment discount eligibility fetch (Retail Prices API pagination)#2164
Fix commitment discount eligibility fetch (Retail Prices API pagination)#2164RolandKrummenacher wants to merge 14 commits into
Conversation
The weekly Update Commitment Discount Eligibility workflow has been failing. Two independent problems: 1. The Azure Retail Prices API now rejects the multi-field $orderby the script sent (HTTP 400), and its NextPageLink generator decrements $top per page into negative values (also 400). $orderby/$top are undocumented anyway. Rewrote the fetch to follow NextPageLink verbatim, shard by serviceFamily, and repeat each shard unioning by meterId until the set is stable -- which makes a full run reproducible (was ~24% run-to-run drift) and complete (captures meters the old single $skip walk silently dropped). Added a completeness guard that aborts before writing if the run is materially incomplete, and switched to writing the freshly-fetched set (retired meters age out instead of accumulating forever). 2. GitHub Actions cannot open pull requests in this org, so the gh pr create step always failed. The workflow now pushes a branch and surfaces a one-click create-PR link in the job summary for a maintainer to open. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Regenerated with the corrected fetch. Net change vs the previously published file: +6,630 meters the old broken pagination missed, -35,768 retired meters no longer returned by the API (the old preserve-forever cache had hoarded them), and 166 corrected flags. 121,301 -> 92,163 rows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to make the weekly “Update Commitment Discount Eligibility” automation reliable again by fixing Azure Retail Prices API pagination handling in Update-CommitmentDiscountEligibility.ps1 and updating the GitHub Actions workflow to push an update branch + provide a manual PR creation link (since Actions cannot open PRs in this org).
Changes:
- Reworks the Retail Prices API fetch to follow
NextPageLinkverbatim, shard queries byserviceFamily, and repeat/union results until stable; adds a completeness guard to prevent overwriting open data with incomplete results. - Updates the workflow to push a uniquely named branch and write a “Create pull request” compare-link to the job summary instead of calling
gh pr create. - Regenerates the open-data dataset (per PR description).
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/scripts/Update-CommitmentDiscountEligibility.ps1 |
Implements sharded/repeated API traversal with convergence + completeness guards and writes a fresh (non-cached) dataset. |
.github/workflows/opendata-commitment-eligibility.yml |
Removes PR-creation permissions/step; pushes a branch and surfaces a PR link in the job summary. |
src/open-data/CommitmentDiscountEligibility.csv |
Regenerated eligibility data output (not shown in diff snippet, but described in PR). |
On network/DNS/timeout errors Invoke-RestMethod's exception has no .Response, so reading .Headers['Retry-After'] indexed $null and threw "Cannot index into a null array" inside the catch -- bypassing retry/backoff and aborting the run. Guard against a null .Response; treat it as a retryable "Network error". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks @copilot-pull-request-reviewer. Addressed: Retry/backoff null
$r = @{ Keys = @{ a=1; b=2; c=3 }; NonConverged = @('x') }
$r.Keys.GetType().Name # Hashtable
$r.Keys.Count # 3 (not 2)
$r.Keys.ContainsKey('a') # TrueThis also matches the end-to-end runs: |
flanakin
left a comment
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.8] PR Review
Summary: Solid, well-diagnosed fix. The root-cause analysis (multi-field $orderby 400s, $top decrementing negative via NextPageLink, and unstable total order dropping a scattered fraction of rows) is accurate, and the shard-and-union-to-stable approach with a completeness guard is a sound workaround. The prior Copilot review was addressed correctly (null .Response guard is real; the .Keys defense is right — hashtable key access shadows the .Keys member). I verified the regenerated CSV: 92,163 rows, LF, no BOM, fully quoted, ordinally sorted by MeterId — matches the PR claims. PowerShell parses clean; the only PSScriptAnalyzer warnings are pre-existing Write-Host usage (consistent with the original script) plus one false positive noted below. No blockers.
⚠️ Should fix (2)
-le 1000single-page heuristic hardcodes the API page size — if the server default changes, a paginated shard could be falsely treated as complete, silently re-introducing the partial-miss bug.- Completeness guard checks only the aggregate total, not per-shard health, so a systematically under-fetching shard can be masked by growth elsewhere.
💡 Suggestions (1)
PSReviewUnusedParameterfalse positive on$CollectKey(used only inside the nested-OnItemclosure, which the analyzer can't trace). Resolvable with a suppression attribute if CI treats this rule as an error.
- Detect single-page shards by NextPageLink page count instead of a hardcoded -le 1000 page size, so a server page-size change can't mask a real partial-miss. - Add a per-shard completeness guard: persist per-family counts in a CommitmentDiscountEligibility.shardcounts.json sidecar and abort before write if any shard falls below its MaxShrinkFraction floor, so a single under-fetching family can't be masked by growth elsewhere. First run (no baseline) skips the check, mirroring the aggregate guard. - Push the sidecar alongside the CSV so the baseline survives each fresh CI checkout. - Suppress the PSReviewUnusedParameter false positive on $CollectKey (used inside the -OnItem closure the analyzer can't trace). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks @flanakin. Addressed all three in 7037999: 1. 2. Completeness guard was aggregate-only — added a per-shard guard. The published CSV carries only 3. Verified: script parses clean; |
flanakin
left a comment
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.8] Follow-up review — $ShardList is a silent allowlist (live data loss today)
Digging into how $ShardList is managed surfaced a current correctness bug, not a hypothetical. I verified everything below against the live Retail Prices API.
The bug: because every shard is a positive serviceFamily eq '<x>' filter and the union of shards is the whole dataset, any family not in $ShardList is never queried. A full scan of priceType eq 'Reservation' / meterRegion='primary' today returns the family AI + Machine Learning with 168 meters — and it is not in $ShardList, so those meters are silently dropped right now. The -MaxShrinkFraction guard can't catch it: 168 is ~0.2% of the total, far under the 15% floor. Renames (these families get renamed periodically) fail the same silent way.
API constraints I confirmed while looking for a cheap canary:
serviceFamily in (...)andnot (serviceFamily in (...))are both rejected (BadRequest) — the API supports only a small OData subset (eq,ne,and/or).- A negative
ne-AND canary caps at ~20 clauses (21+ →BadRequest), so it can't cover the 26-family list in one request. - There is no hard result cap — I paged a single shard past
$skip=49000andNextPageLinkwas still being served; traversal ends only atNextPageLink=null. So a discovery scan that just collects the distinct family set is safe (the ordering instability drops scattered rows, never an entire family).
Recommended fix (also makes the job faster):
Fetches are latency-bound (~600ms/page; ~2.5× faster at 5 concurrent in my test), and the shards are independent — so run them with bounded parallelism (ForEach-Object -Parallel -ThrottleLimit 5-8; unbounded will trip the 429s the script already backs off on). Two viable shapes:
- Discover then fan out: one quick scan for
DISTINCT serviceFamily, then shard over the discovered set in parallel. Self-correcting — picks upAI + Machine Learningand future families automatically; the parallelism pays for the extra pass. - Keep the list + reconcile (smallest change): fan out the known shards in parallel and, riding along on the responses you're already pulling, collect every
serviceFamilyseen; abort if a seen family isn't in$ShardList. No extra requests — converts today's silent drop into a loud, actionable failure.
Either closes the silent-data-loss hole; #1 removes the maintenance burden entirely. Happy to expand on whichever direction you prefer.
The hardcoded $ShardList was a silent allowlist: each shard is a positive `serviceFamily eq '<x>'` filter, so any family not in the list was never queried and never counted. flanakin confirmed against the live API that `AI + Machine Learning` (168 Reservation meters) exists today and was missing from the list, so those meters were dropped -- and the -MaxShrinkFraction aggregate guard can't catch it (~0.2%, far under the 15% floor). Replace the hardcoded list with runtime discovery (Get-ServiceFamily): one scan per price type collects the distinct serviceFamily set, then we shard over the discovered families. A discovery scan is a long traversal exposed to the API's unstable paging order, but that only drops scattered rows, never an entire family (each family has many meters), so the family SET is reliable; the per-family union-to-stable fetch still recovers the dropped rows. The shard list is now self-correcting -- new and renamed families are picked up automatically. Also harden Get-ShardShortfall to iterate the BASELINE families instead of the current run's, so a family that disappears entirely (discovery miss or rename) is caught as a 100% shortfall against its baseline rather than silently skipped. Note: reconciling seen families against the list (the smaller suggested option) can't detect the gap -- positive `eq` filters mean responses only ever contain families already in the list, so an omitted family is never "seen". Discovery is required to close the hole. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks @flanakin — great catch, and verified. Fixed in 9894db4 by going with option 1 (discover then shard). What changed: the hardcoded On option 2 (reconcile against the list): I don't think it can actually close the hole — every shard is a positive Belt-and-suspenders: I also hardened On parallelism: I kept the fetch sequential for now. Data refresh: I didn't regenerate the committed CSV here — the |
…on (review)
The SuppressMessageAttribute named the parameter ('CollectKey') as the
target, which does not silence PSReviewUnusedParameter on PSScriptAnalyzer
1.x (only later versions honor it). Open Data CI may run an older analyzer
and fail the build. Use the empty-target form flanakin verified, which
suppresses the rule across analyzer versions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Discovery surfaced 'AI + Machine Learning', but its sharded fetch still returned 0 meters: the family name was interpolated into the $filter unencoded, and in a query string an unescaped '+' is decoded server-side to a space, so `serviceFamily eq 'AI + Machine Learning'` matched nothing and the 168 meters flanakin flagged were still silently dropped. Escape the value with [uri]::EscapeDataString so '+' becomes %2B (and spaces %20); the server decodes it back to the literal before OData parsing. Verified against the live API: raw '+' returns 0 rows, the encoded form returns 168. AI + Machine Learning is the only currently discovered family containing '+'; space-only families are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Regenerated with the discovery-based + URL-encoded script. Net +119 meters (92,163 -> 92,282) from families the old hardcoded allowlist never queried -- most notably AI + Machine Learning (82 RI-eligible meters), plus Blockchain, Microsoft 365 Copilot, Windows 365, and others surfaced by runtime serviceFamily discovery. Also adds the per-shard baseline sidecar (CommitmentDiscountEligibility.shardcounts.json) that the completeness guard reads to detect a single family under-fetching. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@flanakin heads-up on a follow-up since your last review — regenerating the data locally surfaced that the discovery rewrite alone didn't actually close the blocker, so I want to flag it explicitly. After switching to dynamic discovery,
Fixed in ff6892d by escaping the value with Regenerated the open data in 166d47b: +119 net meters (92,163 → 92,282) — AI + Machine Learning (82 RI-eligible) plus Blockchain, Microsoft 365 Copilot, Windows 365, and other families the old hardcoded allowlist never queried. Also added the per-shard baseline sidecar ( All four of your review threads are addressed/resolved. Ready for another look when you have a moment. |
Use $cachedShardCounts['Reservation']/['Consumption'] instead of dot access. Dot access on a hashtable works on PowerShell 7 (CI uses pwsh), so the per-shard guard was functioning -- verified against the generated baseline -- but bracket indexing is clearer for a -AsHashtable object and removes the ambiguity a reviewer flagged. Null-safe when no baseline exists. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
.github/workflows/opendata-commitment-eligibility.yml:32
- The workflow’s “No changes detected” check only considers CommitmentDiscountEligibility.csv. If the shard baseline JSON changes but the CSV does not, the job will exit early and won’t push the updated shardcounts file (even though it is added/committed later). Include CommitmentDiscountEligibility.shardcounts.json in both the diff and untracked-file checks.
# Check for changes (handles both new file and modified file)
if git diff --quiet --exit-code -- src/open-data/CommitmentDiscountEligibility.csv 2>/dev/null && \
! git ls-files --others --exclude-standard | grep -q 'CommitmentDiscountEligibility.csv'; then
echo "No changes detected"
Addresses the latest Copilot review: - Tiny-family discovery miss: a single discovery traversal is exposed to the ordering drop, and some families have only 1-2 meters, so the only row of a tiny family could be missed and the family omitted entirely. Union the discovered set with the prior baseline's families (Add-BaselineShard) so a previously-seen family is always queried. - Completeness floor used Floor(), which rounds the bound down -- for a baseline of 1 it floored to 0, so a family vanishing to 0 would pass. Use Ceiling() in both the per-shard and aggregate guards so the fractional bound is preserved and small-baseline regressions are caught. - Make MaxShrinkFraction an explicit parameter of Get-ShardShortfall instead of an implicit script-scope read (clearer dependency, testable). - Add Pester unit tests for the guard arithmetic and shard-list union. The script runs the live API top-to-bottom when dot-sourced, so the tests extract just the pure functions via the AST and evaluate those. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion - The workflow's change-detection gate only checked the CSV, so a run that updated only the per-shard baseline (counts shifted, no eligibility change) would report "No changes" and never surface it. Check the shardcounts.json sidecar too, for both modified and untracked states. - Constrain -MaxShrinkFraction to 0..1 with [ValidateRange]: a value >1 made the floor negative and silently disabled the completeness guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Logs the open-data refresh fix making the commitment discount eligibility dataset reproducible and complete (#2164). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lity-pagination # Conflicts: # docs-mslearn/toolkit/changelog.md
flanakin
left a comment
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.8] PR Review (re-review)
Summary: The implementation has been substantially reworked and every prior finding is addressed — dynamic serviceFamily discovery unioned with the prior-run baseline (no more hardcoded allowlist; AI + Machine Learning is now captured, 82 RI meters), URL-encoding of the + in the shard filter, a per-shard completeness guard that iterates the baseline so a vanished/under-fetching family is caught, page-count-based single-page detection replacing the brittle -le 1000 heuristic, [ValidateRange(0,1)] on -MaxShrinkFraction, the verified empty-target lint suppression, and a new unit-test file. I verified locally: script and tests parse, PSScriptAnalyzer is clean (only pre-existing Write-Host), 8/8 Pester tests pass, the CSV is well-formed (LF, no BOM, fully quoted, ordinally sorted), and the sidecar round-trips. Nicely done. One should-fix below; no blockers.
⚠️ Should fix (1)
- The new
CommitmentDiscountEligibility.shardcounts.jsonis an internal guard-baseline artifact, butPackage-Toolkit.ps1packagessrc/open-data/*.json, so it would ship in the public release alongside real reference data. Recommend excluding it from packaging or relocating the sidecar.
| # that systematically under-fetches (which an aggregate-only check could miss when | ||
| # growth elsewhere masks it). Committed/pushed beside the CSV so it survives the | ||
| # fresh checkout of each scheduled CI run. | ||
| $ShardCountPath = [System.IO.Path]::ChangeExtension($OutputPath, 'shardcounts.json') |
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.8]
This sidecar is an internal operational artifact (per-shard baseline counts for the completeness guard), but it lands in src/open-data/ where Package-Toolkit.ps1's Copy-OpenDataFiles copies src/open-data/*.json verbatim into the public release package (next to legitimate reference data like ResourceTypes.json). So CommitmentDiscountEligibility.shardcounts.json would ship to customers as if it were reference data, which it isn't.
It also matches the src/open-data/*.json trigger in opendata-ci.yml, so each weekly auto-update fires Open Data CI — harmless (Build-OpenData only processes the CSV allowlist), but worth being aware of.
The packaging leak is the real one. A couple of options:
- Exclude this file in
Copy-OpenDataFiles(e.g.-Exclude '*.shardcounts.json'), or - Relocate the sidecar outside the packaged glob (e.g. a non-
open-datapath, or a dotfile/subfolder the packager skips) — though that complicates the workflow's checkout/commit step, so an exclude is likely simpler.
Not a blocker, but it ships an internal build-state file publicly until resolved.
Summary
The weekly Update Commitment Discount Eligibility workflow has been failing on every run. This fixes two independent root causes and regenerates the open-data file.
1. The fetch was broken and non-reproducible
Update-CommitmentDiscountEligibility.ps1paged the Azure Retail Prices API with an undocumented multi-field$orderbyand an explicit$top:$orderbywith HTTP 400 (fails on page 1 — the recent ~30s failures).NextPageLinkwhile sending$topis also broken: the API'sNextPageLinkdecrements$topper page into negatives (1000 → 0 → -1000), returning HTTP 400.NextPageLink/$skippagination has no stable total order across page requests — a single long traversal silently drops a scattered ~2–24% of rows, so two identical runs disagreed (~76% Jaccard).Fix: follow
NextPageLinkverbatim (no$orderby, no$top), shard each query byserviceFamilyso traversals stay short, and repeat each shard unioning bymeterIduntil the set is stable. A full run is now reproducible (two runs byte-identical) and more complete (captures meters the old walk dropped). Added a completeness guard (-MaxShrinkFraction, default 0.15) that aborts before writing if a run is materially incomplete, and switched to writing the freshly-fetched set so retired meters age out instead of accumulating forever.2. The workflow tried to open a PR, which Actions can't do here
gh pr createfailed every run (Actions aren't permitted to open PRs in this org — confirmed via run history: theCreate PRstep failed while the fetch succeeded). The workflow now pushes a dated branch and surfaces a one-click "create PR" link in the job summary for a maintainer to open (which then triggers Open Data CI and normal review).Data change
Regenerated
CommitmentDiscountEligibility.csv: 121,301 → 92,163 rowsTest plan
MeterId, correct columns🤖 Generated with Claude Code