[BUG] Pivot cache triggers Excel repair

(Disclaimer, I used AI to figure this out and generate the bug report.)

# Bug report: Pivot cache generated by `@protobi/exceljs` triggers Excel repair

## Summary
When generating a workbook with a Pivot Table using `@protobi/exceljs`, Excel opens the file with:

`Repaired Records: PivotTable report from /xl/pivotCache/pivotCacheDefinition1.xml part (PivotTable cache)`

The same script and data generated with `exceljs` opens cleanly.

## Environment
- macOS
- Node/TypeScript script
- Workbook contains:
  - a data worksheet
  - a pivot worksheet

## Repro
1. Generate workbook using `exceljs` and unzip output to `temp/excel-orig`.
2. Generate workbook using `@protobi/exceljs` and unzip output to `temp/excel-protobi`.
3. Open both `.xlsx` files in desktop Excel.

## Expected
Both files open without repair prompts.

## Actual
Only the `@protobi/exceljs` output triggers a repair for `pivotCacheDefinition1.xml`.

## Findings from XML comparison
The pivot table definition is essentially the same between outputs (`xl/pivotTables/pivotTable1.xml`).

The key difference is pivot cache serialization:

- `exceljs` writes many non-pivot cache fields in a lightweight form in `pivotCacheDefinition1.xml`, e.g.:
  - `<sharedItems containsSemiMixedTypes="0" containsString="0" containsNumber="1" containsInteger="1" />`
- `@protobi/exceljs` writes fully enumerated `<sharedItems count="..."> ... </sharedItems>` for many fields, including very large text fields (Description / File Names / Dependency Roots).

Also:
- `exceljs` cache records are mostly inline strings (`<s v="..."/>`) in `pivotCacheRecords1.xml`.
- `@protobi/exceljs` cache records are index-based (`<x v="..."/>`) and rely on huge shared item tables.

This expanded shared-items representation appears to be what causes Excel to repair the pivot cache definition.

## Why this seems to be a library issue
- Input data and pivot configuration are the same.
- `pivotTable1.xml` is comparable across both outputs.
- Repair points specifically to pivot cache definition.
- Reducing non-used fields in `sharedItems` to lightweight forms avoids repair in downstream normalization.

## Minimal standalone repro script
This is a minimal script pattern (no project-specific data required):

```ts
import * as ExcelJS from '@protobi/exceljs';

async function main() {
  const workbook = new ExcelJS.Workbook();

  const dataSheet = workbook.addWorksheet('Data');
  dataSheet.columns = [
    { header: 'Repo', key: 'repo', width: 20 },
    { header: 'Group', key: 'group', width: 20 },
    { header: 'Severity', key: 'severity', width: 20 },
    { header: 'Description', key: 'description', width: 80 },
    { header: 'Counter', key: 'count', width: 10 },
  ];

  for (let index = 0; index < 1500; index++) {
    dataSheet.addRow({
      repo: `repo-${index % 50}`,
      group: `group-${index % 100}`,
      severity: ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO'][index % 5],
      description:
        `long-text-${index}-` +
        'x'.repeat(500 + (index % 200)),
      count: 1,
    });
  }

  const pivotSheet = workbook.addWorksheet('Summary');
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  (pivotSheet as any).addPivotTable({
    sourceSheet: dataSheet,
    rows: ['Repo', 'Group'],
    columns: ['Severity'],
    values: ['Counter'],
    metric: 'sum',
  });

  await workbook.xlsx.writeFile('repro-protobi.xlsx');
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```

Open `repro-protobi.xlsx` in desktop Excel and inspect whether a repair prompt references `pivotCacheDefinition`.

## Patch currently used as workaround
After writing the workbook to disk, we patch the generated `.xlsx` zip and normalize pivot cache definitions.

```ts
import AdmZip from 'adm-zip';

export const normalizeProtobiPivotCache = (filePath: string) => {
  const zip = new AdmZip(filePath);
  const entries = zip.getEntries();

  const usedFieldIndexes = new Set<number>();

  for (const entry of entries) {
    if (!/^xl\/pivotTables\/pivotTable\d+\.xml$/.test(entry.entryName)) {
      continue;
    }

    const pivotXml = entry.getData().toString('utf8');
    const fieldRefRx = /<(?:field\s+x|dataField\s+fld)="(\d+)"/g;
    let fieldRefMatch;
    while ((fieldRefMatch = fieldRefRx.exec(pivotXml))) {
      usedFieldIndexes.add(Number(fieldRefMatch[1]));
    }
  }

  let modified = false;

  for (const entry of entries) {
    if (
      !/^xl\/pivotCache\/pivotCacheDefinition\d+\.xml$/.test(entry.entryName)
    ) {
      continue;
    }

    let cacheXml = entry.getData().toString('utf8');

    let fieldIndex = -1;
    const updated = cacheXml.replace(
      /<cacheField([^>]*)>([\s\S]*?)<\/cacheField>/g,
      (full, attrs, body) => {
        fieldIndex += 1;

        if (usedFieldIndexes.has(fieldIndex)) {
          const normalizedBody = body.replace(
            /<sharedItems([^>]*)>([\s\S]*?)<\/sharedItems>/g,
            (_siFull: string, siAttrs: string, siBody: string): string => {
              const itemCount =
                (siBody.match(/<(?:s|n|b|d|e|m)(?:\s|\/>)/g) || []).length ||
                (siBody.match(/<(?:s|n|b|d|e|m)\b/g) || []).length;
              const attrsWithoutCount = siAttrs
                .replace(/\s*count="[^"]*"/g, '')
                .trim();
              const nextAttrs = attrsWithoutCount
                ? ` ${attrsWithoutCount} count="${itemCount}"`
                : ` count="${itemCount}"`;
              return `<sharedItems${nextAttrs}>${siBody}</sharedItems>`;
            }
          );

          return `<cacheField${attrs}>${normalizedBody}</cacheField>`;
        }

        const sharedItemsMatch = body.match(
          /<sharedItems([^>]*)>([\s\S]*?)<\/sharedItems>/
        );

        if (!sharedItemsMatch) {
          return full;
        }

        const sharedItemsBody = sharedItemsMatch[2] || '';
        const hasBlank =
          sharedItemsBody.includes('<s v="" />') ||
          sharedItemsBody.includes('<s v=""/>');
        const hasLongText = sharedItemsBody.length > 5000;

        let newSharedItems = '<sharedItems/>';
        if (hasLongText) {
          newSharedItems = '<sharedItems longText="1"/>';
        } else if (hasBlank) {
          newSharedItems = '<sharedItems containsBlank="1"/>';
        }

        const normalizedBody = body.replace(
          /<sharedItems[^>]*>[\s\S]*?<\/sharedItems>/,
          newSharedItems
        );
        return `<cacheField${attrs}>${normalizedBody}</cacheField>`;
      }
    );

    if (updated !== cacheXml) {
      modified = true;
      cacheXml = updated;
      zip.updateFile(entry, Buffer.from(cacheXml, 'utf8'));
    }
  }

  if (modified) {
    zip.writeZip(filePath);
  }
};
```

## Suggested fix direction
When writing pivot cache definitions:
- Prefer lightweight `<sharedItems .../>` for non-pivoted / high-cardinality / long-text fields.
- Avoid fully materializing massive shared item lists when not needed for row/column/value axes.
- Ensure generated cache definition is compatible with strict Excel parser behavior.

## Notes on attachments
I cannot include sample workbook files in the report. If needed, I can share redacted XML snippets from the relevant pivot cache parts.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Pivot cache triggers Excel repair #42

Bug report: Pivot cache generated by `@protobi/exceljs` triggers Excel repair

Summary

Environment

Repro

Expected

Actual

Findings from XML comparison

Why this seems to be a library issue

Minimal standalone repro script

Patch currently used as workaround

Suggested fix direction

Notes on attachments

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BUG] Pivot cache triggers Excel repair #42

Description

Bug report: Pivot cache generated by @protobi/exceljs triggers Excel repair

Summary

Environment

Repro

Expected

Actual

Findings from XML comparison

Why this seems to be a library issue

Minimal standalone repro script

Patch currently used as workaround

Suggested fix direction

Notes on attachments

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Bug report: Pivot cache generated by `@protobi/exceljs` triggers Excel repair