Google Sheets → link map + Mermaid diagram + HTML viewer.
This Apps Script crawls one or more Drive folders for Google Sheets, extracts formulas that create dependencies between sheets (e.g., IMPORTRANGE, QUERY, VLOOKUP/HLOOKUP/XLOOKUP, INDEX, FILTER), deduplicates results, and produces:
- FormulaLinks: a clean, tabular list of links and formula types (with per-type range extraction)
- Flowchart: a Mermaid graph ready to paste anywhere
- flowchart_viewer.html: a Drive-hosted HTML page rendering the diagram (no extra deployment)
- Grouped_Tree: a simplified, unique “From Doc / From Sheet → To Sheet” table for quick reading
- Progress: timestamped run logs
Current extractor version referenced here: v4.8 (column-first scanning; internal sheet refs dropped for performance).
- Features
- How it works (data flow)
- Installation
- Configuration
- Usage
- What the script creates
- Code architecture & function reference
- Performance notes & limits
- Troubleshooting
- Security & privacy
- Contributing
- Roadmap
- License
- Scans many files safely: recursive crawl of selected Drive folders, with batch processing and progress logging.
- Column-first early-exit: inspects only the first formula per column (rows 1–7) to avoid scanning 1000 blank rows from array-formulas.
- Understands popular formulas:
IMPORTRANGE,QUERY,VLOOKUP/HLOOKUP/XLOOKUP,INDEX,FILTER. - Per-type range capture: best-effort extraction of the key range argument(s) for each function type.
- Deduplication: avoids bloat; keeps only unique edges/rows when building outputs.
- Mermaid graph: ready for GitHub, wikis, or Mermaid Live.
- HTML viewer: one-click Update Diagram generates/updates a Drive file and opens it in a modal, no web-app deployment required.
- Helper “tree” view: unique 3-column, human-friendly grouping (From Document / From Sheet / To Sheet).
graph TD
A[Drive Folders] -->|collectSheetsRecursive| B[File IDs]
B -->|batchProcessNext (BATCH_SIZE N)| C[Scan Sheets]
C -->|detect formula types & ranges| D[LinkBuffer (raw)]
D -->|dedupe & normalize| E[FormulaLinks (clean)]
E -->|edges| F[Flowchart (Mermaid code)]
F -->|cleanMermaid| G[flowchart_cleaned.mmd]
G -->|wrap HTML| H[flowchart_viewer.html]
E --> I[Grouped_Tree]
C --> J[Progress log]
- Internal sheet-to-sheet references are intentionally skipped in v4.8 to keep output small and avoid noise. External links (across files) and top-level functions are preserved.
-
Open the Google Sheet you’ll use as the controller.
-
Extensions ▸ Apps Script.
-
Create Code.gs and paste the extraction code (v4.8) plus the viewer add-on block:
- The extractor implements:
runFullFlowchart,initializeScan,batchProcessNext,buildFlowchart, and helpers. - The viewer add-on implements:
cleanMermaid,updateDiagramHtml, and the updatedonOpenmenu.
- The extractor implements:
-
Save. Reload the spreadsheet so the menu appears.
-
First run will ask for permissions (Drive + Spreadsheet access).
Tip: Keep all code in a single file (Code.gs) to avoid scoping mistakes. If you split files, ensure there are no duplicate function names or unbalanced braces.
At the top of the script:
const FOLDERS = [
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', // Folder ID 1
'yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy' // Folder ID 2
];
// Empty FOLDERS → scans entire Drive (not recommended)
const BATCH_SIZE = 20; // files per batch
const PROG_SHEET = 'Progress';
const FLOW_SHEET = 'Flowchart';
const BUFFER_SHEET = 'LinkBuffer';- FOLDERS: paste Drive folder IDs you want scanned. The scan is recursive.
- BATCH_SIZE: tune if you hit timeouts; 20 is a good default.
-
Extract Formulas From the spreadsheet menu: Flowchart Tools ▸ Extract Formulas
- Scans queued files in batches, writes LinkBuffer, then FormulaLinks and Flowchart.
- Progress sheet logs every step.
-
Visualize (Hierarchical) Flowchart Tools ▸ Visualize (Hierarchical)
- Writes Grouped_Tree: unique
From Document, From Sheet, To Sheetrows (external only).
- Writes Grouped_Tree: unique
-
Update Diagram Flowchart Tools ▸ Update Diagram
- Cleans
Flowchart!A1Mermaid into ID-safe nodes, writes/overwrites flowchart_cleaned.mmd and flowchart_viewer.html in Drive, and shows the rendered graph in a modal. - You can open flowchart_viewer.html from Drive later; its file ID stays constant.
- Cleans
-
LinkBuffer: raw rows collected during scanning (13 columns).
-
FormulaLinks: deduplicated final table (same 13 columns):
From Doc, From Sheet, From RangeTo Doc, To Sheet, To RangeFormula TypesImportrange Range, Query Range, Lookup Range, Index Range, Filter Range, Combined Range
-
Flowchart: Mermaid code in cell A1 (raw, quoted node labels).
-
Grouped_Tree: simplified table without To Document (unique rows only).
-
flowchart_cleaned.mmd (Drive): cleaned Mermaid (ID safe).
-
flowchart_viewer.html (Drive): one-file viewer that renders the cleaned graph.
-
Progress: timestamped log of every action.
-
runFullFlowchart()Orchestrates the whole flow in one shot:initializeScan()→ loopbatchProcessNext()→buildFlowchart(). -
initializeScan()Clears state/sheets; collects file IDs from configured folders; creates a 1-minute time-based trigger for batching; kicks off first batch. -
batchProcessNext()Pops the next BATCH_SIZE file IDs, opens each spreadsheet, and scans column-first: rows 1–7 only, stopping at the first non-empty formula per column.- Detects formula types with regex:
IMPORTRANGE,QUERY,XLOOKUP/VLOOKUP/HLOOKUP,INDEX,FILTER. - Extracts key ranges per type into dedicated columns and a
Combined Rangecolumn. - Skips INTERNAL sheet references for performance.
- Buffers rows to LinkBuffer.
- Detects formula types with regex:
-
buildFlowchart()Deduplicates LinkBuffer → writes FormulaLinks & Flowchart (Mermaid). SetsextractionDone = true.
collectSheetsRecursive(folder, list)– DFS traversal of Drive folders for Google Sheets files.resolveTitle(id, cache)– resolves spreadsheet name by ID (cached).log(msg)– timestamped entry to Progress.
-
buildHierarchicalTree()Reads FormulaLinks; outputs unique[From Doc, From Sheet, To Sheet]to Grouped_Tree. RequiresextractionDone = true. -
cleanMermaid()Converts"Label" --> "Label"edges intoN1["Label"] --> N2["Label"](ID-safe). -
updateDiagramHtml()Wraps cleaned Mermaid in a tiny HTML page, upserts both Drive files, and shows the diagram in a modal. -
onOpen()Adds menu:- Extract Formulas →
runFullFlowchart - Visualize (Hierarchical) →
buildHierarchicalTree - Update Diagram →
updateDiagramHtml
- Extract Formulas →
-
Apps Script 6-min limit (per sync run). The batch trigger keeps work going minute-by-minute for large sets.
-
Google Sheets cell limit: 10,000,000 cells per spreadsheet. If you push too many rows to LinkBuffer or FormulaLinks, you may see:
“This action would increase the number of cells in the workbook above the limit of 10000000 cells.”
- Reduce BATCH_SIZE; prune FOLDERS; or archive older runs to another spreadsheet.
-
50,000 characters per cell: Mermaid blocks can hit this if you inline huge graphs. Use Update Diagram to view via HTML instead.
-
Array-formulas: The column-first + row-cap (
rows 1–7) prevents scanning hundreds/thousands of repeated formulas.
Menu item runs but nothing appears
- Reload the spreadsheet after saving the script (menus bind on open).
- Check Progress sheet for logs/errors.
ReferenceError: batchProcessNext is not defined
- You likely have duplicate function names or a missing brace (
}) above the function, causing parse failure. - Ensure the project outline lists functions at top level (no nested functions).
“Lost connection to the server” during large scans
- That’s the editor UI timing out. The time-based trigger will continue in the background.
- Keep an eye on Progress for confirmation.
Mermaid won’t render
- Use Update Diagram to generate flowchart_cleaned.mmd and flowchart_viewer.html (ID-safe version) and open the HTML.
I only want IMPORTRANGE
- Keep v4.8 as is (it already optimises for IMPORTRANGE), or set the non-IMPORTRANGE regex checks behind a flag.
-
The script reads file names, sheet names, formulas, and ranges in the folders you specify.
-
No data leaves your Google Workspace; the HTML viewer loads Mermaid from jsDelivr CDN.
-
The Drive files created (flowchart_cleaned.mmd & flowchart_viewer.html) inherit your Drive’s sharing defaults.
- To share publicly, change sharing on the HTML file to “Anyone with the link”.
-
Use feature branches; submit PRs with:
- A short problem statement
- Clear before/after notes
- Test notes (e.g., number of files scanned, time taken)
-
Keep regex changes minimal and well-commented.
- Optional toggle to include internal sheet references (with throttling)
- CSV/BigQuery export
- Per-cluster Mermaid export (split giant graphs into logical subgraphs)
- Unit tests for range parsers
Specify your chosen license here (e.g., MIT). If omitted, the repository is effectively “all rights reserved”.
-
IMPORTRANGE("docIdOrUrl","Sheet!A1:Z99")- Extracts: doc ID/URL +
toSheet+toRange
- Extracts: doc ID/URL +
-
QUERY(range, ...)→ captures first argument as Query Range -
XLOOKUP(lookup, table, ...),VLOOKUP,HLOOKUP→ captures the table/range argument -
INDEX(range, ...)→ captures first argument as Index Range -
FILTER(range, ...)→ captures first argument as Filter Range
These values populate the dedicated columns plus Combined Range in FormulaLinks.
-
Paste script into Code.gs (extractor + viewer block).
-
Set FOLDERS with the Drive folder IDs you want to scan.
-
Reload the spreadsheet.
-
Menu ▸ Flowchart Tools
- Extract Formulas → builds everything
- Update Diagram → generates/opens HTML viewer
- Visualize (Hierarchical) → writes Grouped_Tree
You’re set.