(Disclaimer, I used AI to figure this out and generate the bug report.)
Bug report: Pivot cache generated by @protobi/exceljs triggers Excel repair
Summary
When generating a workbook with a Pivot Table using @protobi/exceljs, Excel opens the file with:
Repaired Records: PivotTable report from /xl/pivotCache/pivotCacheDefinition1.xml part (PivotTable cache)
The same script and data generated with exceljs opens cleanly.
Environment
- macOS
- Node/TypeScript script
- Workbook contains:
- a data worksheet
- a pivot worksheet
Repro
- Generate workbook using
exceljs and unzip output to temp/excel-orig.
- Generate workbook using
@protobi/exceljs and unzip output to temp/excel-protobi.
- Open both
.xlsx files in desktop Excel.
Expected
Both files open without repair prompts.
Actual
Only the @protobi/exceljs output triggers a repair for pivotCacheDefinition1.xml.
Findings from XML comparison
The pivot table definition is essentially the same between outputs (xl/pivotTables/pivotTable1.xml).
The key difference is pivot cache serialization:
exceljs writes many non-pivot cache fields in a lightweight form in pivotCacheDefinition1.xml, e.g.:
<sharedItems containsSemiMixedTypes="0" containsString="0" containsNumber="1" containsInteger="1" />
@protobi/exceljs writes fully enumerated <sharedItems count="..."> ... </sharedItems> for many fields, including very large text fields (Description / File Names / Dependency Roots).
Also:
exceljs cache records are mostly inline strings (<s v="..."/>) in pivotCacheRecords1.xml.
@protobi/exceljs cache records are index-based (<x v="..."/>) and rely on huge shared item tables.
This expanded shared-items representation appears to be what causes Excel to repair the pivot cache definition.
Why this seems to be a library issue
- Input data and pivot configuration are the same.
pivotTable1.xml is comparable across both outputs.
- Repair points specifically to pivot cache definition.
- Reducing non-used fields in
sharedItems to lightweight forms avoids repair in downstream normalization.
Minimal standalone repro script
This is a minimal script pattern (no project-specific data required):
import * as ExcelJS from '@protobi/exceljs';
async function main() {
const workbook = new ExcelJS.Workbook();
const dataSheet = workbook.addWorksheet('Data');
dataSheet.columns = [
{ header: 'Repo', key: 'repo', width: 20 },
{ header: 'Group', key: 'group', width: 20 },
{ header: 'Severity', key: 'severity', width: 20 },
{ header: 'Description', key: 'description', width: 80 },
{ header: 'Counter', key: 'count', width: 10 },
];
for (let index = 0; index < 1500; index++) {
dataSheet.addRow({
repo: `repo-${index % 50}`,
group: `group-${index % 100}`,
severity: ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'INFO'][index % 5],
description:
`long-text-${index}-` +
'x'.repeat(500 + (index % 200)),
count: 1,
});
}
const pivotSheet = workbook.addWorksheet('Summary');
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(pivotSheet as any).addPivotTable({
sourceSheet: dataSheet,
rows: ['Repo', 'Group'],
columns: ['Severity'],
values: ['Counter'],
metric: 'sum',
});
await workbook.xlsx.writeFile('repro-protobi.xlsx');
}
main().catch((error) => {
console.error(error);
process.exit(1);
});
Open repro-protobi.xlsx in desktop Excel and inspect whether a repair prompt references pivotCacheDefinition.
Patch currently used as workaround
After writing the workbook to disk, we patch the generated .xlsx zip and normalize pivot cache definitions.
import AdmZip from 'adm-zip';
export const normalizeProtobiPivotCache = (filePath: string) => {
const zip = new AdmZip(filePath);
const entries = zip.getEntries();
const usedFieldIndexes = new Set<number>();
for (const entry of entries) {
if (!/^xl\/pivotTables\/pivotTable\d+\.xml$/.test(entry.entryName)) {
continue;
}
const pivotXml = entry.getData().toString('utf8');
const fieldRefRx = /<(?:field\s+x|dataField\s+fld)="(\d+)"/g;
let fieldRefMatch;
while ((fieldRefMatch = fieldRefRx.exec(pivotXml))) {
usedFieldIndexes.add(Number(fieldRefMatch[1]));
}
}
let modified = false;
for (const entry of entries) {
if (
!/^xl\/pivotCache\/pivotCacheDefinition\d+\.xml$/.test(entry.entryName)
) {
continue;
}
let cacheXml = entry.getData().toString('utf8');
let fieldIndex = -1;
const updated = cacheXml.replace(
/<cacheField([^>]*)>([\s\S]*?)<\/cacheField>/g,
(full, attrs, body) => {
fieldIndex += 1;
if (usedFieldIndexes.has(fieldIndex)) {
const normalizedBody = body.replace(
/<sharedItems([^>]*)>([\s\S]*?)<\/sharedItems>/g,
(_siFull: string, siAttrs: string, siBody: string): string => {
const itemCount =
(siBody.match(/<(?:s|n|b|d|e|m)(?:\s|\/>)/g) || []).length ||
(siBody.match(/<(?:s|n|b|d|e|m)\b/g) || []).length;
const attrsWithoutCount = siAttrs
.replace(/\s*count="[^"]*"/g, '')
.trim();
const nextAttrs = attrsWithoutCount
? ` ${attrsWithoutCount} count="${itemCount}"`
: ` count="${itemCount}"`;
return `<sharedItems${nextAttrs}>${siBody}</sharedItems>`;
}
);
return `<cacheField${attrs}>${normalizedBody}</cacheField>`;
}
const sharedItemsMatch = body.match(
/<sharedItems([^>]*)>([\s\S]*?)<\/sharedItems>/
);
if (!sharedItemsMatch) {
return full;
}
const sharedItemsBody = sharedItemsMatch[2] || '';
const hasBlank =
sharedItemsBody.includes('<s v="" />') ||
sharedItemsBody.includes('<s v=""/>');
const hasLongText = sharedItemsBody.length > 5000;
let newSharedItems = '<sharedItems/>';
if (hasLongText) {
newSharedItems = '<sharedItems longText="1"/>';
} else if (hasBlank) {
newSharedItems = '<sharedItems containsBlank="1"/>';
}
const normalizedBody = body.replace(
/<sharedItems[^>]*>[\s\S]*?<\/sharedItems>/,
newSharedItems
);
return `<cacheField${attrs}>${normalizedBody}</cacheField>`;
}
);
if (updated !== cacheXml) {
modified = true;
cacheXml = updated;
zip.updateFile(entry, Buffer.from(cacheXml, 'utf8'));
}
}
if (modified) {
zip.writeZip(filePath);
}
};
Suggested fix direction
When writing pivot cache definitions:
- Prefer lightweight
<sharedItems .../> for non-pivoted / high-cardinality / long-text fields.
- Avoid fully materializing massive shared item lists when not needed for row/column/value axes.
- Ensure generated cache definition is compatible with strict Excel parser behavior.
Notes on attachments
I cannot include sample workbook files in the report. If needed, I can share redacted XML snippets from the relevant pivot cache parts.
(Disclaimer, I used AI to figure this out and generate the bug report.)
Bug report: Pivot cache generated by
@protobi/exceljstriggers Excel repairSummary
When generating a workbook with a Pivot Table using
@protobi/exceljs, Excel opens the file with:Repaired Records: PivotTable report from /xl/pivotCache/pivotCacheDefinition1.xml part (PivotTable cache)The same script and data generated with
exceljsopens cleanly.Environment
Repro
exceljsand unzip output totemp/excel-orig.@protobi/exceljsand unzip output totemp/excel-protobi..xlsxfiles in desktop Excel.Expected
Both files open without repair prompts.
Actual
Only the
@protobi/exceljsoutput triggers a repair forpivotCacheDefinition1.xml.Findings from XML comparison
The pivot table definition is essentially the same between outputs (
xl/pivotTables/pivotTable1.xml).The key difference is pivot cache serialization:
exceljswrites many non-pivot cache fields in a lightweight form inpivotCacheDefinition1.xml, e.g.:<sharedItems containsSemiMixedTypes="0" containsString="0" containsNumber="1" containsInteger="1" />@protobi/exceljswrites fully enumerated<sharedItems count="..."> ... </sharedItems>for many fields, including very large text fields (Description / File Names / Dependency Roots).Also:
exceljscache records are mostly inline strings (<s v="..."/>) inpivotCacheRecords1.xml.@protobi/exceljscache records are index-based (<x v="..."/>) and rely on huge shared item tables.This expanded shared-items representation appears to be what causes Excel to repair the pivot cache definition.
Why this seems to be a library issue
pivotTable1.xmlis comparable across both outputs.sharedItemsto lightweight forms avoids repair in downstream normalization.Minimal standalone repro script
This is a minimal script pattern (no project-specific data required):
Open
repro-protobi.xlsxin desktop Excel and inspect whether a repair prompt referencespivotCacheDefinition.Patch currently used as workaround
After writing the workbook to disk, we patch the generated
.xlsxzip and normalize pivot cache definitions.Suggested fix direction
When writing pivot cache definitions:
<sharedItems .../>for non-pivoted / high-cardinality / long-text fields.Notes on attachments
I cannot include sample workbook files in the report. If needed, I can share redacted XML snippets from the relevant pivot cache parts.