purview-v1.11.2#62
Merged
Thanks for the contribution, @Rance9! ✅ This PR is from an approved team member and will be reviewed normally.
jordankingisalive
approved these changes
May 18, 2026
Release Notes: v1.11.x
Release Information
Script Download & Support
Download the script below. For questions or issues, refer to the documentation.
Overview
v1.11.2
Version 1.11.2 redesigns the output destination model around symmetric per-data-type switch pairs, extends cross-run append/merge to every data stream PAX produces, and introduces Microsoft Fabric Lakehouse Delta-table output. Existing v1.11.1 behavior is preserved when none of the new switches are used.
Unified Per-Data-Type Destination Model
A symmetric -OutputPath* / -Append* switch pair is provided for each output stream: Purview audit (-OutputPath / -AppendFile), EntraUsers / MAC licensing (-OutputPathUserInfo / -AppendUserInfo), Microsoft Agent 365 catalog (-OutputPathAgent365Info / -AppendAgent365Info), and run log (-OutputPathLog). Storage tier is inferred from each path's form: drive-rooted absolute paths resolve to Local, https://...sharepoint.com/... URLs resolve to SharePoint, and https://...onelake.dfs.fabric.microsoft.com/...Lakehouse/... URLs resolve to Fabric. UNC paths are rejected on every destination switch, and every destination supplied to a single run must resolve to the same storage tier. The legacy -OutputPathSP and -OutputPathFabric switches are removed; express remote destinations via any -OutputPath* value whose form is a SharePoint or OneLake URL.
Per-Dimension Append and Cross-Run Merge for All Outputs
-AppendFile now works across all rollup modes (-Rollup, -RollupPlusRaw) on all three storage tiers. Two new switches, -AppendUserInfo and -AppendAgent365Info, extend the same union-merge contract to the EntraUsers and Agent 365 catalog outputs respectively. Every append-mode run emits a standard Retained / New / Departed / Union merge tally for each merged stream; the merge is union-only, so rows are never dropped from the target. Departed rows are kept in the merged file with In_Latest_Append=FALSE. Three provenance columns (Date_Added, Latest_Append_Date, In_Latest_Append) are appended to any merged file so analysts can see when each row first appeared and whether it was present in the most recent run. The CopilotInteraction rollup Fact CSV additionally gains two raw identity columns (Message_Id_Raw, ThreadId_Raw) so per-run integer surrogate keys remain stable across appends.
Microsoft Fabric Lakehouse Delta-Table Output
When any -OutputPath* value resolves to a Fabric OneLake URL, customer-visible outputs are written as Delta tables under the Lakehouse Tables/ namespace, queryable directly from the Fabric SQL endpoint and consumable by Direct Lake Power BI semantic models. Table names are evergreen (CSV basename with the _YYYYMMDD_HHMMSS run-timestamp stripped), so the same table is overwritten run after run while CSV filenames continue to carry the timestamp suffix. Schema evolution is automatic via schema_mode='merge' so dynamic -ExplodeDeep columns are absorbed as new nullable columns on subsequent appends; mode mismatches across runs into the same target table are rejected at pre-flight. The deltalake>=0.15 Python package is auto-installed on first use, mirroring the existing orjson install pattern. Resume artifacts are mirrored to durable OneLake storage at <Lakehouse>/Files/.pax_resume/<RunTimestamp>/ so resume survives ephemeral container restarts.
Operational Hardening for Noninteractive Hosts
A new noninteractive-host detector and a bootstrap-log infrastructure layer harden PAX for execution inside Azure Container Apps Jobs, Windows services, scheduled tasks, and CI runners. The bootstrap log opens at the first executable line of the script body so pre-flight failures leave a readable log file behind; at log finalization the bootstrap content migrates into the final resolved log path.
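The bootstrap-log pattern described above can be illustrated with a minimal Python sketch. PAX itself is a PowerShell script, so this is not its implementation: the class name `BootstrapLog`, the method `migrate_to`, and the choice to append the early lines to the final file are all assumptions made for illustration.

```python
import os
import tempfile


class BootstrapLog:
    """Hypothetical early log: opened before the final log path is known."""

    def __init__(self, directory=None):
        # Create the early log file immediately so that even a failure during
        # pre-flight validation leaves a readable file behind.
        fd, self.path = tempfile.mkstemp(prefix="bootstrap_", suffix=".log", dir=directory)
        os.close(fd)

    def write(self, message):
        # Append one line to whichever file is currently the active log.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(message + "\n")

    def migrate_to(self, final_path):
        # Once the final log path is resolved, carry the early lines over,
        # delete the bootstrap file, and keep logging to the final path.
        with open(self.path, "r", encoding="utf-8") as src:
            early_lines = src.read()
        os.makedirs(os.path.dirname(os.path.abspath(final_path)), exist_ok=True)
        with open(final_path, "a", encoding="utf-8") as dst:
            dst.write(early_lines)
        os.remove(self.path)
        self.path = final_path
```

In this sketch a run that dies before `migrate_to` is called still leaves the bootstrap file on disk, which is the property the release note highlights for noninteractive hosts.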
Fabric / ACA Deployment Helpers (fabric_resources/)
A new top-level fabric_resources/ folder ships two supported Fabric on-ramps and the shared prereqs script: a top-level overview / path decision guide, a Path A local-run README (laptop, on-prem server, or Azure VM with managed identity), a Path B Dockerfile and ACA Job deploy helper (with the mandatory Azure Files mount for the bootstrap-log volume), a shared scope-grant script, and a compatibility matrix.
Switch Surface Simplification
Alongside the new features above, v1.11.2 includes a focused streamlining pass that retires several optional features whose real-world adoption was narrow but whose code paths added a disproportionate amount of script complexity, test surface, and documentation overhead. Sharpening PAX around the workflows the majority of customers actually run leaves a smaller, more readable codebase and frees subsequent versions to land core improvements faster. Retired feature areas include the DSPM-for-AI activity-set helper, the in-script schema-explosion modes, native Excel workbook output, offline replay mode, the Microsoft Agent 365 catalog enrichment, and the separate remote-destination switches (now folded into a single tier-inferring -OutputPath). See Switch Surface Simplification (v1.11.2) for the per-feature replacement path and rationale. The legacy C:\Temp\ default on -OutputPath is also removed: -OutputPath is required for normal runs and may be omitted only when -OnlyUserInfo is used (in which case -OutputPathUserInfo carries the EntraUsers destination).
v1.11.1
Version 1.11.1 is a large functional release. It introduces three flagship capabilities (the -Rollup / -RollupPlusRaw post-processor, Microsoft Agent 365 catalog enrichment, and remote output destinations for SharePoint and Microsoft Fabric / OneLake) alongside a new ManagedIdentity auth mode for Azure-hosted unattended runs and major reliability and authentication hardening. Existing Purview audit-log processing behavior is unchanged when none of the new switches are used.
Rollup Post-Processor (-Rollup / -RollupPlusRaw)
The new -Rollup and -RollupPlusRaw switches turn PAX into an end-to-end pipeline: as soon as the audit export succeeds, an embedded Python post-processor runs against the raw CSV(s) and emits rolled-up CSVs shaped specifically for the Microsoft Copilot Growth ROI Advisory Team's Power BI templates published at https://github.com/microsoft/Analytics-Hub. This collapses what was previously a multi-step, manual hand-off (run PAX → locate raw CSV → run a separate Python script → load into Power BI) into a single command line.
Highlights:
-Rollupdeletes the raw CSV(s) on processor success (only the rollup output remains);-RollupPlusRawkeeps the raw CSV(s) alongside the rollup output. Mutually exclusive.Purview_CopilotInteraction_Processorv3.0.0.-IncludeUserInfois auto-enabled because this processor consumes both the Purview CSV and the Entra users CSV. Target Analytics-Hub dashboards: AI-in-One and AI Business Value.-IncludeM365Usagerun → embeddedPurview_M365_Usage_Bundle_Explosion_Processorv2.1.0.-CombineOutputis auto-enabled so a single combined Purview CSV is fed to the processor. Target Analytics-Hub dashboard: M365 Usage Analytics..ps1. At runtime the selected source is materialized into.pax_incremental\PAX_<Label>_<RunTimestamp>.py, executed, and reaped by the function'sfinallyblock plus an end-of-run safety-net sweep. No external Python files to ship or maintain.python→py -3.13/-3.12/-3.11/-3.10→python3). If none is found it attempts a per-user silent install of Python 3.13 via winget (Python.Python.3.13), falling back to the python.org offline installer.orjsonis installed best-effort for ~5–10× faster JSON parsing; both processors fall back to stdlibjsonon import failure.-Rollupvs-RollupPlusRaw); the raw CSV(s) already on disk remain the canonical successful artifact. The audit run is never marked failed because of a rollup failure.rollupMode(None/Rollup/RollupPlusRaw) andprocessorMode(None/CopilotInteraction/M365Bundle). On-Resume, the original rollup intent is restored automatically; if the resume command line passes a rollup switch explicitly, last-write-wins (override logged in yellow).-Rollupis compatible with-IncludeAgent365Infoand never deletes theAgent365_<timestamp>.csv— Analytics-Hub dashboards consume it as a companion input alongside the rollup output.See Rollup Post-Processor:
-Rollup / -RollupPlusRaw (v1.11.1) below for the full feature matrix, blocked combinations, and examples.
Microsoft Agent 365 Catalog Enrichment (-IncludeAgent365Info / -OnlyAgent365Info)
A pair of new switches, -IncludeAgent365Info (audit run + Agent 365 enrichment) and -OnlyAgent365Info (Agent 365 enrichment only), produce a dedicated Agent365_<timestamp>.csv (or Agents365 Excel tab) whose 28-column schema matches the manual Agent 365 dashboard export. Data is sourced from the Microsoft Graph Agent Package Management API (https://graph.microsoft.com/beta/copilot/admin/catalog/packages). Available to tenants enrolled in the Microsoft Agent 365 Frontier program; the signed-in caller must hold AI Administrator (preferred) or Global Administrator.
Remote Output Destinations: SharePoint & Microsoft Fabric / OneLake (
-OutputPathSP/-OutputPathFabric)Two new mutually-exclusive parameters extend
-OutputPath(local directory) with first-class remote destinations so PAX can publish directly into a SharePoint document library or a Microsoft Fabric Lakehouse without an intermediate local copy.-OutputPathSP <SharePointFolderUrl>— Uploads every customer-visible artifact (CSV, XLSX, run log, metrics JSON) directly to a SharePoint Online document-library folder via Microsoft Graph (createUploadSessionfor files >4 MiB,PUT /contentfor small files). Folder hierarchy is created server-side if missing. RequiresSites.ReadWrite.All+Files.ReadWrite.Allon the same identity used for the audit phase.-OutputPathFabric <OneLakeUrl>— Uploads to a Fabric Lakehouse / WarehouseFilespath via the OneLake DFS REST surface (ADLS Gen2 create → append → flush). Requires Azure RBACStorage Blob Data Contributoron the workspace plus Fabric portalContributormembership; for service-principal / managed-identity runs the tenant setting "Service principals can use Fabric APIs" must be enabled.exit 1— no partial artifacts, no stack trace.$env:TEMP\PAX_<RunTimestamp>\) PAX uses internally is never surfaced to the customer..pax_checkpoint_<RunTimestamp>.json,*_PARTIAL.csv,.pax_incremental/*.jsonl) are always written to the local scratch folder and are never mirrored remotely.-Resumeis a same-host operation — re-run from the same machine that produced the checkpoint. Only customer-visible final artifacts upload at end of run.Managed-Identity Authentication for Azure-Hosted Runs (
-Auth ManagedIdentity)
New sixth value on the -Auth ValidateSet for Azure-hosted headless execution (Container Apps Jobs, Functions, App Service, VMs). Supports system-assigned and user-assigned identities (the latter via AZURE_CLIENT_ID) and binds both the Microsoft Graph and Azure (storage) contexts to the same identity, so a single managed identity drives both the audit pull and the Fabric upload. Failures (missing identity, missing consent, IMDS unreachable) exit cleanly with no interactive fallback. -IncludeAgent365Info and -OnlyAgent365Info are blocked under ManagedIdentity (no interactive sign-in surface for the Agent 365 delegated-only API).
Reliability & Authentication Hardening
-IncludeM365Usage or DSPM bundles).
AADSTS70002, invalid handle) and remove silent fallback to interactive sign-in in unattended scheduled-task scenarios.
What's New
v1.11.2
Unified Destination Model and Per-Stream Append Targets (v1.11.2)
-OutputPathSP/-OutputPathFabricwith a symmetric per-data-type destination + append switch pair for each output stream. Storage tier is inferred from each path's form, so the same switch surface targets Local, SharePoint, and Fabric destinations interchangeably.-OutputPath/-AppendFile(Purview audit),-OutputPathUserInfo/-AppendUserInfo(EntraUsers / MAC licensing),-OutputPathAgent365Info/-AppendAgent365Info(Microsoft Agent 365 catalog — currently gated; see below),-OutputPathLog(run log; no paired append switch).C:\Exports\Foo.csvorC:\Exports\) → Local.https://...sharepoint.com/...URL → SharePoint.https://...onelake.dfs.fabric.microsoft.com/...Lakehouse/...URL → Fabric. UNC paths (\\server\share\...) are rejected on every destination switch.-OutputPathLogis the one minor exception in that it accepts a FabricFiles/target even when the data destinations are Tables/* paths (logs are not tabular data).-OutputPath*/-Append*switch pair must be supplied; both bound or neither bound is rejected. Out-of-scope streams (e.g. UserInfo when neither-IncludeUserInfonor-OnlyUserInfois set) reject either side being bound.-OutputPath*value may be a folder (the script auto-defaults the basename) or a full path including basename.-OutputPathSP <url>→-OutputPath <url>(and/or-OutputPathUserInfo/-OutputPathAgent365Info/-OutputPathLog).-OutputPathFabric <url>→-OutputPath <url>(same pattern).Cross-Run Append and Merge Behavior (v1.11.2)
-AppendFileextended to all rollup modes (-Rollup,-RollupPlusRaw). New-AppendUserInfoand-AppendAgent365Infoswitches add per-dimension append to the EntraUsers and Agent 365 catalog outputs. Behavior identical across Local, SharePoint, and Fabric tiers.RecordId. Event-level exploded CSV (-AppendFile -RollupPlusRaw, 153-column shape) →RecordId. CopilotInteraction rollup Fact CSV (-AppendFile -Rollup) →Message_Id_Raw. EntraUsers CSV →PersonId_Normalized. M365 Bundle rollup CSVs → native (sum / min / max aggregates are associative; processor pre-seeds its accumulator). Agent 365 catalog →AgentId.Date_Added(YYYY-MM-DD, immutable after first write),Latest_Append_Date(YYYY-MM-DD, updated on every append),In_Latest_Append(TRUE/FALSE).Message_Id_Raw,ThreadId_Raw) so the per-runMessage_Id/ThreadIdinteger surrogates remain stable across appends. TheUserKeysurrogate stays stable without a corresponding raw column on the Fact CSV — the processor loads{PersonId_Normalized → UserKey}from the merged Users CSV at fact-write time.Retained=X New=Y Departed=Z Union=W.Retained= rows present in both the target and this run (target'sDate_Addedpreserved, last-write-wins on other columns,In_Latest_Append=TRUE).New= rows present only in this run (assignedDate_Added = <run date>,In_Latest_Append=TRUE).Departed= rows present only in the target (retained verbatim in the merged file withIn_Latest_Append=FALSE; the merge is union-only — rows are never dropped).Union= the merged total.-AppendUserInfo, the EntraUsers CSV the audit phase writes is the pristine raw snapshot for that run;Merge-UsersCsvreads it alongside the-AppendUserInfotarget and writes the union to the-AppendUserInfopath. On non-rollup paths the pristine raw is removed after the merge succeeds so the destination folder holds a single EntraUsers CSV at run-end. 
Rollup paths defer cleanup to the rollup post-processor's existing raw-retention logic (-Rollupdeletes;-RollupPlusRawretains).Raw Purview CSV: <path>andRaw EntraUsers CSV: <path>; after the merge completes anAppended to: <url>line is emitted for each merged stream, immediately preceded by the merge-statistics line.Microsoft Fabric Lakehouse Delta-Table Output (v1.11.2)
-OutputPath*value resolves to a Fabric OneLake URL. Customer-visible CSV outputs are written as Delta tables under the LakehouseTables/namespace; operational artifacts (run log, metrics JSON) and non-tabular artifacts land underFiles/.https://<region>-onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.Lakehouse), the explicit…/Tablessuffix (legacy non-Schemas Lakehouse),…/Tables/<schema>(Schemas-mode Lakehouse — current Fabric default, typicallydbo), and…/Files/...(for-OutputPathLogand non-tabular artifacts). The<schema>segment must match[A-Za-z_][A-Za-z0-9_]*; malformed schema segments are rejected at parameter validation._YYYYMMDD_HHMMSSrun-timestamp stripped. Tables are therefore evergreen (the same table is overwritten run after run) while the CSV filenames continue to carry the timestamp suffix. Example:Purview_Audit_UsageActivity_CopilotInteraction_<ts>.csv→ tablePurview_Audit_UsageActivity_CopilotInteraction.-AppendFile,-AppendUserInfo,-AppendAgent365Info) read the existing table into a scratch CSV via thedeltalakePython library, feed it to the embedded processor as a seed input, and write the merged result back as a Delta overwrite. The merge logic is identical between local CSV and Fabric Delta destinations.schema_mode='merge'so dynamic columns produced by-ExplodeDeep(theCopilotEventData.*parent-key namespace) are absorbed as new nullable columns on subsequent appends. A schema-mode mismatch across runs into the same target table (for example, Standard 8-column vs.-ExplodeArrays153-column) is detected at pre-flight and rejected with a clear error before any audit query is issued.,;{}()\n\t=in column names. At Delta-write time only (the on-disk CSV is untouched), each forbidden character is replaced with_, with numeric disambiguation on collision. 
Example:Has license→Has_licenseandLicense Status→License_Statusin the Delta table; the CSV consumed by the PBIP semantic model keeps the original names..pax_checkpoint_<RunTimestamp>.json,.pax_incremental/*.jsonl,*_PARTIAL.csv) are mirrored to a durable OneLake path at<Lakehouse>/Files/.pax_resume/<RunTimestamp>/. Working copies stay on container-local temp for fast I/O; after every checkpoint write the script uploads the artifact set to the mirror inside the same atomic block. On startup an in-progress run is detected by scanning the mirror and the local working copy is hydrated from it before any work begins; on successful completion the mirror is deleted. Local and SharePoint tiers do not mirror — their resume artifacts live in$PSScriptRootonly.Refresh-FabricTokenIfNeededso no independent authentication is needed inside thedeltalakelibrary.deltalakeauto-installdeltalake>=0.15Python package is verified on first use; if absent, a quiet per-userpip installruns once. On failure a clear actionable error is emitted. Offline / locked-down hosts can pre-install the package manually and PAX will skip the install step.-ExportWorkbook-OutputPath*value that resolves to Fabric while-ExportWorkbookis set is rejected at parameter validation.Checkpoint Schema and Resume Behavior (v1.11.2)
The checkpoint snapshot persists the new destination and append fields (outputPath, outputPathUserInfo, outputPathAgent365Info, outputPathLog, appendFile, appendUserInfo, appendAgent365Info, rollup, rollupPlusRaw, and the feature-flag context the resumed run uses for path resolution) plus new compatibility metadata (checkpointSchemaVersion, compatibilityMinimumVersion, createdByVersion, createdUtc, checkpointType). A resumed run requires only -Resume "<full checkpoint path>" (plus optional auth overrides); the resume command line rejects any destination switch, so the checkpoint is the sole source of truth. The destination-pair XOR validation and the parse-time tier-inference / path-validation pipeline are re-run against the restored values on every resume so SharePoint and Fabric tiers re-engage the correct upload paths. Banners, parameter snapshots, output-files / log-file display lines, and date-range display are patched in place from the restored values after Read-Checkpoint completes so the displayed run identity always reflects the original run's intent rather than the resume command line's parse-time defaults. The legacy includeDSPMForAI field is ignored with a one-line warning when reading v1.11.1 checkpoints; v1.11.2 checkpoints opened by v1.11.1 are rejected by the existing checkpoint-version guard.
Fabric / ACA Deployment Helpers:
fabric_resources/(v1.11.2)A new top-level folder shipped alongside the script. Contents:
fabric_resources/README.mdfabric_resources/CompatibilityMatrix.mdfabric_resources/LocalRun/README.mdabfss://is not accepted); verification checklist.fabric_resources/Dockerfile/PAX.DockerfilePSModulePath; durable bootstrap-log volume/pax-logs; optional build-time supply-chain verification of the released script.fabric_resources/Deploy/Deploy-PAXAcaJob.ps1$LASTEXITCODEguards on every mutatingazcall.fabric_resources/Deploy/README.mdfabric_resources/Prereqs/Grant-PAXPermissions.ps1-IncludeM365Usageto add the workload-specific scopes. Idempotent re-runs stay silent; real failures surface clearly.Switch Surface Simplification (v1.11.2)
A focused streamlining pass retires several optional features whose real-world adoption was narrow but whose code paths added a disproportionate amount of script complexity, test surface, and documentation overhead. Sharpening PAX around the workflows the majority of customers actually run leaves a smaller, more readable codebase, a cleaner first-time-user experience, and a faster path forward for the features customers depend on most. Each retired feature is summarized below with its replacement path and the reasoning behind retirement.
-IncludeDSPMForAI(and the-DSPMOutputModeselector)-ActivityTypes.-ActivityTypesthat obscured what was actually being queried. Direct-ActivityTypesinvocation makes audit scope explicit and self-documenting.-ExplodeArrays,-ExplodeDeep,-ExplosionThreads-ExportWorkbook(including the Excel append / multi-tab modes)ImportExcelmodule dependency, a separate file-naming scheme, tab-collision handling, and append-time structural-error recovery that few consumers exercised. CSV is the more portable, more performant, and more analyst-friendly default.-RAWInputCSV-OutputPathSP,-OutputPathFabric-OutputPath. The storage tier is inferred from the URL form (local folder vs.*.sharepoint.comURL vs.onelake.dfs.fabric.microsoft.comURL).-OutputPathwhose target is inferred from the path shape. One switch, one mental model, three destinations.-IncludeAgent365Info,-OnlyAgent365Info,-OutputPathAgent365Info,-AppendAgent365InfoDefault-value change: The legacy
-OutputPath default of C:\Temp\ is removed. -OutputPath is required for normal runs and may be omitted only when -OnlyUserInfo is used (in which case -OutputPathUserInfo carries the EntraUsers destination).
Checkpoint compatibility: Legacy fields for retired switches (includeDSPMForAI, explodeArrays, explodeDeep, explosionThreads, rawInputCSV, exportWorkbook, outputPathSP, outputPathFabric, includeAgent365Info, onlyAgent365Info, outputPathAgent365Info, appendAgent365Info) are ignored with a single one-line warning when reading a v1.11.1 checkpoint. v1.11.2 checkpoints opened by v1.11.1 are rejected by the existing checkpoint-version guard.
v1.11.1
Microsoft Agent 365 Catalog Enrichment:
-IncludeAgent365Info/-OnlyAgent365Info(v1.11.1)https://graph.microsoft.com/beta/copilot/admin/catalog/packages).-IncludeAgent365Info(audit run + Agent 365) or-OnlyAgent365Info(Agent 365 only — skips the audit pull).Agent365_<timestamp>.csv— 28 columns matching the manual export schema, UTF-8 with BOM, dates formattedyyyy-MM-dd HH:mm:ssZ(UTC).Agents365tab appended after theEntraUserstab when-ExportWorkbookis used.Date createdandCreated bycolumns are populated via a single narrow audit query through the existing/security/auditLog/queriesinfrastructure (~5–30 seconds added per run, independent of tenant size). In-OnlyAgent365Infomode these two columns are intentionally left blank.Authentication Behavior — Agent 365 (v1.11.1)
The Agent 365 catalog endpoint requires two independent permission gates, both of which must be satisfied:
CopilotPackages.Read.AllandApplication.Read.All. The endpoint is delegated only — there is no app-only equivalent.WebLogin,DeviceCode,Credential,Silent)-Auth AppRegistration+-IncludeAgent365Info(dual-context run)-Auth AppRegistration+-OnlyAgent365InfoFrontier Enrollment Probe (v1.11.1)
PAX performs an eager Frontier enrollment / role probe at startup. Tenants not enrolled in the Microsoft Agent 365 Frontier program (or callers without AI Administrator / Global Administrator role) receive an informational banner with a Microsoft Learn URL. The Agent 365 CSV / tab is silently skipped at end of run; the rest of the run continues unaffected.
All output banners, parameter snapshots, output-mode displays, Excel tab-list builders, and run-completion summaries honor the probe result — so unavailable tenants never see references to a file or tab that won't be produced.
Parameter Compatibility — Agent 365 (v1.11.1)
-IncludeAgent365Info and -OnlyAgent365Info are mutually exclusive.
-RAWInputCSV (replay mode) and -UseEOM.
-IncludeAgent365Info IS compatible with -Resume. The checkpoint persists the flag, so the audit phase resumes from the checkpoint and Agent 365 enrichment runs automatically at end of run; there is no need to specify the switch again on the resume command line.
-IncludeUserInfo is allowed with both new switches (still emits the EntraUsers CSV alongside).
-OnlyAgent365Info is rejected when combined with any of: -IncludeM365Usage, -IncludeCopilotInteraction, -IncludeDSPMForAI, -AgentsOnly, -ExcludeAgents, -CombineOutput, -OnlyUserInfo, -AppendFile, -Auth AppRegistration, -Resume.
Permissions Display Banner (v1.11.1)
The startup QUERY MODE: Microsoft Graph Security API banner now reports the effective auth context for the run as APP-ONLY (application permissions), DELEGATED (interactive user sign-in), or DUAL-CONTEXT RUN (when -Auth AppRegistration is combined with -IncludeAgent365Info, with explicit Phase 1 / Phase 2 sub-lines).
Each Graph permission line is tagged with one of [App-only], [Delegated], or [Role] so the reader knows exactly where to grant it. The Agent 365 block always uses [Delegated] Graph scopes plus a [Role] line listing AI Administrator (preferred) / Global Administrator (alternative). The connection-success message and the AppRegistration + Agent 365 informational banner use matching Phase 1 / Phase 2 vocabulary.
Reliability: Audit-Query Poll 4-Hour Ceiling (v1.11.1)
The audit-query polling loop has been extended from a 5-minute hard timeout to a 4-hour ceiling with periodic heartbeat status messages and exponential backoff between polls.
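A polling loop with a hard ceiling, heartbeat messages, and exponential backoff can be sketched as follows. This is an illustrative Python sketch, not PAX's PowerShell implementation: the function name `poll_until_done`, the status strings, and every timing value other than the 4-hour ceiling (initial delay, delay cap, backoff factor, heartbeat interval) are assumptions.

```python
import time


def poll_until_done(check_status, *, ceiling_s=4 * 60 * 60, initial_delay_s=5.0,
                    max_delay_s=300.0, backoff=2.0, heartbeat_every_s=60.0,
                    sleep=time.sleep, clock=time.monotonic, log=print):
    """Poll check_status() until a terminal state is reported or the ceiling passes."""
    start = clock()
    last_heartbeat = start
    delay = initial_delay_s
    while True:
        status = check_status()
        if status in ("succeeded", "failed"):  # hypothetical terminal states
            return status
        now = clock()
        elapsed = now - start
        if elapsed >= ceiling_s:
            raise TimeoutError(f"query still '{status}' after {elapsed:.0f}s")
        if now - last_heartbeat >= heartbeat_every_s:
            # Periodic heartbeat so a long wait is visibly alive in the log.
            log(f"[heartbeat] status={status} elapsed={elapsed:.0f}s")
            last_heartbeat = now
        # Never sleep past the ceiling; grow the delay geometrically up to a cap.
        sleep(min(delay, ceiling_s - elapsed))
        delay = min(delay * backoff, max_delay_s)
```

Because the first delay is short, a query that completes quickly returns after one or two polls, while a slow query settles into polls spaced at the capped delay instead of hammering the service.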
Large-tenant audit queries, especially with -IncludeM365Usage or DSPM bundles, that previously failed with a premature "query timed out" error now run to completion. No behavior change for fast queries.
Rollup Post-Processor:
-Rollup/-RollupPlusRaw(v1.11.1)-Rollup(deletes raw CSV(s) on processor success — only rollup output remains) or-RollupPlusRaw(keeps raw CSV(s) alongside the rollup output). Mutually exclusive.-ActivityTypes 'CopilotInteraction') → embeddedPurview_CopilotInteraction_Processorv3.0.0;-IncludeUserInfois auto-enabled because this processor consumes both the Purview CSV and the Entra users CSV (EntraUsers_MAClicensing_<timestamp>.csv). Target Analytics-Hub dashboards: AI-in-One and AI Business Value.-IncludeM365Usagerun → embeddedPurview_M365_Usage_Bundle_Explosion_Processorv2.1.0;-CombineOutputis auto-enabled by-IncludeM365Usageso a single combined Purview CSV is fed to the processor. Target Analytics-Hub dashboard: M365 Usage Analytics.-IncludeAgent365Infois compatible with rollup. The resultingAgent365_<timestamp>.csvis a point-in-time snapshot of the live tenant catalog at the moment the script runs (sourced from the Microsoft Graph Package Management API/beta/copilot/admin/catalog/packages, a current-inventory call with no historical / as-of semantic). It is not filtered by-StartDate/-EndDate, returns all currently-cataloged items regardless of age (deleted items are not retrievable), and is consumed by the same Analytics-Hub dashboards as a companion input alongside the rollup output. Always retained —-Rollupnever deletes it. Created / Created By columns are populated via a separate Unified Audit Log join bounded by tenant audit retention (180 days E3 / 1 year E5 / up to 10 years with Audit Premium) and the run's date window (default 30 days). Note the temporal mismatch: rollup data spans the audit window, Agent 365 reflects catalog state at run time..ps1, preserving the single-file distribution. At runtime the selected source is materialized to a tempPAX_<Label>_<RunTimestamp>.pyinside.pax_incremental(UTF-8 no-BOM), executed, and deleted in the function'sfinallyblock. 
The end-of-run cleanup sweep and the outerfinally-block safety-net both reap any leftoverPAX_*_<RunTimestamp>.pyscoped to the current run.python→py -3.13/-3.12/-3.11/-3.10→python3). If none is found, PAX attempts a per-user silent install of Python 3.13 — winget (Python.Python.3.13) first, then the python.org offline installer (https://www.python.org/ftp/python/3.13.1/python-3.13.1-amd64.exe) as a fallback.orjsonis installed best-effort for ~5–10× faster JSON parsing; both processors fall back to stdlibjsonon import failure.-Rollupvs-RollupPlusRaw). It does NOT throw past this point and does NOT mark the audit run as failed — the raw CSV(s) already on disk are the canonical successful artifact.rollupMode(None/Rollup/RollupPlusRaw) andprocessorMode(None/CopilotInteraction/M365Bundle). On-Resume, the original rollup intent is restored when the resume command line does not pass-Rollupor-RollupPlusRaw; if the resume command line DOES pass a rollup switch, last-write-wins (resume value overrides checkpoint, override logged in yellow). The processor mode is re-derived after the full resume merge so a checkpoint-restored-IncludeM365Usagecorrectly maps toM365Bundleeven when the resume CLI omits it.-Rollupor-RollupPlusRawis combined with any of:-UseEOM,-ExportWorkbook,-OnlyUserInfo,-OnlyAgent365Info,-IncludeDSPMForAI,-RAWInputCSV,-AppendFile, or-ExcludeCopilotInteractionwithout-IncludeM365Usage.-UseEOMis set (previously the guard exempted-RAWInputCSVand-Resume). Rollup is incompatible with both-UseEOMand-RAWInputCSV, so the new guard correctly forces PS 7+ for any rollup run.Remote Output Destinations:
-OutputPathSP/-OutputPathFabric(v1.11.1)Filespath — no intermediate local copy required.-OutputPathSP <SharePointFolderUrl>or-OutputPathFabric <OneLakeUrl>. Exactly one of-OutputPath,-OutputPathSP,-OutputPathFabricmay be specified per run; mutually exclusive at parameter validation.https://<tenant>.sharepoint.com/sites/<site>/<library>[/<sub>...]. Resolved at startup to a Graph drive item; missing folder segments are created server-side.https://[<region>-]onelake.dfs.fabric.microsoft.com/<workspace>/<item>.Lakehouse/Files[/<sub>...](Lakehouse or Warehouse). Usesx-ms-version: 2021-06-08against the ADLS Gen2 DFS surface.Sites.ReadWrite.All+Files.ReadWrite.Allon the auth identity.Storage Blob Data Contributoron the workspace + Fabric portalContributormembership. Service-principal / managed-identity runs additionally require the tenant setting "Service principals can use Fabric APIs".Connect-PurviewAudit, before any audit query is issued. Resolves the URL, verifies reachability, creates the destination folder hierarchy server-side for SharePoint. Failures abort with a single structured Cause / Action banner classified by HTTP status (401 / 403 / 404), auth context (delegated vs app-only), and destination class (workspace RBAC vs Fabric portal role vs IMDS unreachable). No stack trace, no partial artifacts.Refresh-FabricTokenIfNeeded,Invoke-FabricWebRequest) mirrors the existing Graph token-refresh design. Proactive refresh ≥50 minutes age (below 60-minute issuance), 5-minute expiry buffer, single transparent 401 retry. Long-running Fabric uploads (multi-hour audit windows,-IncludeM365Usagebundles) no longer fail mid-stream.$script:OutputDirectoryis transparently redirected to a per-run scratch folder under$env:TEMP\PAX_<RunTimestamp>\so all existing local-write code paths work unmodified. Each artifact uploads immediately after the local writer closes the handle. 
Scratch folder is removed on successful completion; failed runs preserve it for diagnostics.
.pax_checkpoint_<RunTimestamp>.json, *_PARTIAL.csv, .pax_incremental/*.jsonl) are always written to the local scratch folder and are never mirrored remotely. -Resume is a same-host operation: re-run from the same machine that produced the checkpoint. Only customer-visible final artifacts upload at end of run.
fabric_resources folder distributed alongside the script for the Azure Container Apps Job runbook (Dockerfile, deployment templates, permission scripts, README) and step-by-step Fabric / OneLake configuration.
-Auth ManagedIdentity (v1.11.1)
New sixth value on the
-Auth ValidateSet for Azure-hosted headless execution.
Connect-MgGraph -Identity (system-assigned) or Connect-MgGraph -Identity -ClientId $env:AZURE_CLIENT_ID (user-assigned). The same Graph application permissions required by -Auth AppRegistration (AuditLogsQuery.Read.All, Directory.Read.All, etc.) must be consented to the managed-identity service principal via New-MgServicePrincipalAppRoleAssignment.
Connect-AzAccount -Identity (system-assigned) or Connect-AzAccount -Identity -AccountId $env:AZURE_CLIENT_ID (user-assigned). Used when -OutputPathFabric is in effect.
AZURE_CLIENT_ID switches both Graph and Az connect calls to the user-assigned identity automatically. The identity binding is logged at startup.
(managed identity) qualifier when no UPN is available on the Graph context.
-OnlyAgent365Info (Agent Package Management API is delegated-only) and -IncludeAgent365Info (no interactive sign-in surface for the dual-context Phase 2 step). Validation emits explicit error messages.
fabric_resources for the Azure Container Apps Job Dockerfile, deployment script, permission-grant script, and README for unattended scheduled runs.
CSV Filename Convention (v1.11.1)
CSV output filenames now consistently identify the activity-type shape of the run.
-CombineOutput: Purview_Audit_UsageActivity_CombinedActivityTypes_<timestamp>.csv
Purview_Audit_UsageActivity_<ActivityType>_<timestamp>.csv (e.g. Purview_Audit_UsageActivity_CopilotInteraction_20260511_191638.csv).
Excel filenames, Excel tab names, and the EntraUsers_* / Agent365_* filenames are unchanged. The rollup-input glob continues to match both patterns so -Rollup / -RollupPlusRaw workflows are unaffected.
Bug Fixes
v1.11.2
(v1.11.2) Append-merge correctness across all storage tiers and rollup modes. Resolves several edge cases where
-AppendFile / -AppendUserInfo could either leave the customer's target untouched, replace it with a header-only file, or report inverted merge statistics. The streaming-merge fast path now reliably emits the canonical Retained / New / Departed / Union tally; resume runs that find no skipped partitions preserve in-flight JSONL streaming state; header-only emissions on zero-records runs are skipped under -AppendFile so the customer's target byte-shape is unchanged; header-only CSV emissions now match the quoted-everywhere shape of populated CSVs; the zero-records early-exit branch emits the pipeline summary inline; and the embedded Python rollup processor no longer drops source rows whose Message_Id is seeded from the -AppendFile target (the dedup belongs to the PowerShell post-merge step, not the rollup loop).
(v1.11.2) Pristine-raw EntraUsers separation under -AppendUserInfo produces a single clean file at run-end. Resolves a build issue where a non-rollup -AppendUserInfo run left two EntraUsers CSVs on disk (the merged union at the target leaf and a spurious _raw-suffixed companion). The pristine raw is now removed after the merge succeeds (matching -AppendFile's single-file outcome); the _raw suffix path is reserved for the genuine same-leaf collision case where a customer-supplied -AppendUserInfo leaf and the natural raw leaf coincide.
(v1.11.2) -OnlyUserInfo + -AppendUserInfo and -OnlyAgent365Info + -AppendAgent365Info are valid combinations. An earlier v1.11.2 validation block rejected these pairings as mutually exclusive. The underlying writer code already supported them; the validator was the only blocker. The remaining -AppendFile + -Only* blockers (which do reflect a genuine logic conflict: the Only* switches suppress audit-log retrieval, so there is no activity data to merge) keep their behavior with clearer error wording.
(v1.11.2) Resume runs display the original run's parameters in startup banners and the parameter snapshot. The startup banner, parameter snapshot, output-files display, log-file line, authentication context, and date range now reflect the values restored from the checkpoint rather than the resume command line's parse-time defaults. The graceful-exit resume hint additionally includes the
-ClientSecret slot for AppRegistration auth (the secret is intentionally never persisted to the checkpoint, so the hint is the only place the customer is reminded to re-supply it on resume).
(v1.11.2) Resume-mode wording distinguishes "QueryId reused from checkpoint" from a true cold start. Earlier builds emitted the same "WARNING: No partial data found … Will start fresh data collection." line in both cases, misleading operators into thinking PAX was re-issuing the Purview audit query when it was actually continuing to poll the original server-side query.
(v1.11.2) Checkpoint persists the customer-supplied -OutputPath, not the per-run scratch directory. Resolves a regression where resume runs that originally targeted SharePoint or Fabric reclassified themselves as Local at end-of-run and left every artifact in scratch. New-run and resume-run code paths now both source the persisted value from the parse-time / restore-time canonical destination map, not the scratch redirect.
(v1.11.2) -OutputPath inferred from the dominant in-scope stream when omitted. When the customer pinned a destination via -AppendFile / -AppendUserInfo / -AppendAgent365Info but did not also pass -OutputPath, secondary artifacts (run log, rollup scratch shards, embedded-processor temp files) previously leaked into the script's own folder ($PSScriptRoot). They now follow the dominant Append target's folder, with the inferred path announced in a single INFO host line so operators can see where the run is staging.
(v1.11.2) Run-log filename canonicalized on non-rollup -AppendFile runs. The log filename now consistently follows the Purview_Audit_<currentRunTs>.log shape (matching non-append runs) instead of inheriting the AppendFile target's leaf or original timestamp. Subsequent -AppendFile runs against the same target no longer overwrite each other's logs, and the end-of-run "Output files created" listing correctly surfaces the run's log.
(v1.11.2) Path display in banners and summary lines resolves to the actual customer-visible URL. The
OUTPUT DESTINATIONS banner resolves every row (Purview audit, EntraUsers, Agent 365, run log) to a full file URL; post-streaming-merge summary URLs no longer carry a baked-in _PARTIAL.csv suffix; rollup intermediate-CSV delete log lines show the actual local scratch path with an explicit (scratch only; not uploaded) qualifier; and the rollup seed-from URL surfaces the canonical customer-supplied path verbatim instead of a synthesized phantom URL.
(v1.11.2) Local end-of-run "Output files created" listing surfaces appended targets. Append-mode runs that previously appeared to produce only the log file now list every in-place merge target with a trailing [appended] marker so customers can distinguish newly-created files from in-place merge updates.
(v1.11.2) .pax_incremental/ rollup seed JSONs reaped on every run. The post-rollup cleanup sweep now removes the per-run seed JSON files used by -Rollup -AppendFile / -Rollup -AppendUserInfo to pre-seed the embedded Python processor's surrogate-INT maps, so the .pax_incremental/ directory itself is reliably removed at end of run regardless of whether the run used append seeds.
(v1.11.2) Pre-rollup EntraUsers append-merge deferred when the Python rollup will redo it. Under -AppendUserInfo -Rollup (or -AppendUserInfo -RollupPlusRaw) in CopilotInteraction mode, the PowerShell-side pre-rollup merge and the post-rollup Python-side merge were both writing the same target back-to-back with potentially inconsistent stats. The PowerShell-side merge is now skipped on this combination and replaced with a single info line; the post-rollup merge runs as the sole writer.
(v1.11.2) Append/merge tally legend banner emitted once at run start. A new one-time
APPEND/MERGE TALLY LEGEND banner explains the Retained / New / Departed / Union vocabulary, especially the Departed count, which is naturally read as "rows removed" but in fact means "rows kept with In_Latest_Append=FALSE because they did not surface in the current run's audit window." Gated on -Append* being bound so non-append runs see no extra output.
(v1.11.2) Partition poll-loop network-message flood suppression. During oscillating connectivity (blip → recover → blip → recover), the recovery side of the partition poll loop emitted a green "[NET] Connectivity restored after <N> minutes" banner on every successful poll while the matching transient-issue line on the error side was throttled to silence by the existing 60-second guard. The recovery banner now uses the same throttling logic so sustained outages still print exactly one paired transient/recovered cycle while short blips are silently absorbed.
(v1.11.2) Cosmetic log-output cleanups. The embedded M365 Bundle rollup summary uses plain-ASCII -> digraphs instead of the Unicode RIGHTWARDS ARROW (which mojibake'd to → on Windows hosts whose console code page defaulted to cp437 / cp1252). The graceful-exit "Partitions: X/Y complete" status line is now emitted as a single log entry instead of fragmenting into multiple timestamped log entries under the script's Write-Host proxy. The Save-CheckpointToDisk helper self-gates on $script:CheckpointEnabled so any caller forgetting the gate is a silent no-op rather than a wrong write. New Assert-MetricsShape and Assert-PartitionStatusEntry helpers freeze the required-field contracts of the $script:metrics and per-partition $script:partitionStatus entries, asserted once at init.
(v1.11.2) Inline merge derivation hardened against a PowerShell-7 parameter-set issue. A Split-Path -LiteralPath … -Parent call inside the non-rollup -AppendFile streaming-merge inline merge block was unreachable in PowerShell 7 (the -Parent switch is not defined on the -LiteralPath parameter set), causing the script to abort immediately after the streaming-merge writer landed rows in the _PARTIAL.csv scratch but BEFORE the inline Merge-FactCsv call ran. Replaced with [System.IO.Path]::GetDirectoryName($OutputFile) so the inline merge runs to completion on all three storage tiers.
v1.11.1
The following authentication and certificate-handling fixes apply to
-Auth AppRegistration flows:
(v1.11.1) Ephemeral PFX key loading. PFX certificate loading now uses X509KeyStorageFlags.EphemeralKeySet so the script never persists a private key to the local machine's user profile. Resolves failures in environments where the user account has no write access to the per-user MachineKeys folder.
(v1.11.1) Certificate pinning for the run. The cached X509Certificate2 object is pinned on $script: scope and reused for all token refreshes within a run. The Dispose() call that previously lived in the finally block, which invalidated the credential's SafeCertContext and produced intermittent "invalid handle" token-refresh failures, has been removed. EphemeralKeySet ensures there are no on-disk artifacts to clean up.
(v1.11.1) Token refresh certificate reuse. Token refresh now binds to the same X509Certificate2 instance acquired at initial connect, eliminating a class of intermittent AADSTS70002 errors observed when MSAL re-resolved the cert by subject name.
(v1.11.1) No interactive fallback under
-Auth AppRegistration. When AppRegistration auth fails (bad thumbprint, expired cert, missing tenant consent, etc.) the script now exits cleanly with a clear error rather than silently falling back to interactive browser sign-in. App-only runs are expected to be fully unattended; silent fallback masked misconfigurations and produced wrong-identity results in scheduled-task scenarios.
(v1.11.1) App-only scope-warning suppression. Suppressed the spurious "the following scopes are not granted" warnings emitted by Connect-MgGraph under app-only auth, where scopes are not the relevant permission model (app roles are). Reduces log noise without altering behavior.
(v1.11.1) Clean pre-flight failure exit for remote destinations. When the -OutputPathSP / -OutputPathFabric pre-flight probe fails, the run aborts cleanly with exit 1 immediately after the structured Cause / Action banner. No trailing "Script failed: …" line, no PowerShell stack trace, no _PARTIAL artifact rename, no local or remote partial artifacts. The structured banner is the entire failure output.
(v1.11.1) Eliminated duplicate upload-failure WARNINGs. Invoke-OutputUpload no longer emits its own per-call failure WARNING; every caller (upload sweep, metrics, log file, checkpoint mirror) now handles its own messaging with caller-specific context. Previously each upload failure produced two log lines: a generic inner WARNING followed by the caller's own message.
(v1.11.1) Managed-identity "Connected as" line no longer shows $null. The startup banner's "Connected as" line now falls back to the managed-identity client ID (with (managed identity) qualifier) when no UPN is available on the Graph context.
(v1.11.1) Checkpoint and incremental-file cleanup on successful runs. Successful runs now reliably remove the
.pax_checkpoint_<RunTimestamp>.json file and this run's .pax_incremental\Part*_<RunTimestamp>_*.jsonl files (the .pax_incremental directory is also removed when empty). Three independent regressions were fixed:
(1) Complete-CheckpointRun was early-returning whenever the intermediate _PARTIAL.csv had already been deleted by CSV-split or -ExportWorkbook paths, orphaning the checkpoint. It was restructured so the missing partial file only skips the rename and checkpoint deletion always proceeds.
(2) The JSONL cleanup wildcard (*_<RunTimestamp>_*records.jsonl) did not match per-page memory-flush files (Part{N}_<RunTimestamp>_qid-<QueryId>_<JobRunId>.jsonl) added in v1.10.7. The pattern was broadened to *_<RunTimestamp>_*.jsonl, kept strictly per-run via the embedded run timestamp.
(3) Cleanup lived in the main try block, so any late-stage exception (Agent 365 phase, output summary) bypassed it. An idempotent safety-net cleanup was added in the finally block, gated on the same success criteria as the _PARTIAL log-rename.
Per-run scoping is preserved end-to-end: only files matching the current run's timestamp are deleted.
Known Considerations
v1.11.2
(v1.11.2) Storage tier inferred from each path's form: Drive-rooted absolute paths resolve to Local;
https://...sharepoint.com/... URLs resolve to SharePoint; https://...onelake.dfs.fabric.microsoft.com/...Lakehouse/... URLs resolve to Fabric. UNC paths (\\server\share\...) are rejected on every destination switch.
(v1.11.2) Same-tier-per-run rule: Every -OutputPath* value in a single invocation must resolve to the same storage tier. -OutputPathLog is the one exception in that it accepts a Fabric Files/ target even when the data destinations are Tables/* paths (logs are not tabular).
(v1.11.2) Destination switch pair XOR (per stream in scope): For each output stream in scope, exactly one of the -OutputPath* / -Append* switch pair must be supplied; both bound or neither bound is rejected. Out-of-scope streams reject either side being bound.
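The three v1.11.2 rules above (tier inference from path form, same tier per run, and the per-stream switch-pair XOR) can be illustrated with a minimal sketch. It is written in Python, the language of the embedded processors, and is not PAX's actual PowerShell validation code. The regular expressions, the function names (infer_tier, check_same_tier, check_switch_pair), and the error messages are all assumptions made for illustration. The Files/ versus Tables/ nuance of -OutputPathLog is not modelled.

```python
import re
from enum import Enum
from typing import Dict, Optional


class Tier(Enum):
    LOCAL = "Local"
    SHAREPOINT = "SharePoint"
    FABRIC = "Fabric"


# Assumed patterns, modelled on the URL shapes quoted in the notes above.
DRIVE_ROOTED = re.compile(r"^[A-Za-z]:[\\/]")
SHAREPOINT_URL = re.compile(r"^https://[^/]+\.sharepoint\.com/", re.IGNORECASE)
FABRIC_URL = re.compile(
    r"^https://[^/]*onelake\.dfs\.fabric\.microsoft\.com/.*Lakehouse/",
    re.IGNORECASE,
)


def infer_tier(path: str) -> Tier:
    """Classify one destination by its form; reject UNC and unknown shapes."""
    if path.startswith("\\\\"):
        raise ValueError(f"UNC paths are rejected: {path}")
    if DRIVE_ROOTED.match(path):
        return Tier.LOCAL
    if SHAREPOINT_URL.match(path):
        return Tier.SHAREPOINT
    if FABRIC_URL.match(path):
        return Tier.FABRIC
    raise ValueError(f"Unrecognized destination form: {path}")


def check_same_tier(destinations: Dict[str, str]) -> Tier:
    """Require every supplied destination to resolve to one storage tier."""
    tiers = {switch: infer_tier(path) for switch, path in destinations.items()}
    distinct = set(tiers.values())
    if len(distinct) != 1:
        raise ValueError(f"Destinations must share one storage tier: {tiers}")
    return distinct.pop()


def check_switch_pair(stream: str, output_path: Optional[str],
                      append_path: Optional[str], in_scope: bool) -> None:
    """Enforce the per-stream XOR: exactly one side bound when in scope."""
    bound = sum(value is not None for value in (output_path, append_path))
    if in_scope and bound != 1:
        raise ValueError(f"{stream}: supply exactly one of OutputPath/Append")
    if not in_scope and bound != 0:
        raise ValueError(f"{stream}: stream is out of scope for this run")


if __name__ == "__main__":
    run = {"-OutputPath": r"C:\PAX\audit", "-OutputPathLog": r"D:\logs"}
    print(check_same_tier(run).value)  # prints "Local"
```

Raising on the first violation mirrors the notes' description of these rules as rejections at validation time; a real validator would presumably report them through the script's own error banners.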