Skip to content

hitanshuac/Mermaid

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Flowchart Link Extractor (Google Apps Script)

Google Sheets → link map + Mermaid diagram + HTML viewer. This Apps Script crawls one or more Drive folders for Google Sheets, extracts formulas that create dependencies between sheets (e.g., IMPORTRANGE, QUERY, VLOOKUP/HLOOKUP/XLOOKUP, INDEX, FILTER), deduplicates results, and produces:

  • FormulaLinks: a clean, tabular list of links and formula types (with per-type range extraction)
  • Flowchart: a Mermaid graph ready to paste anywhere
  • flowchart_viewer.html: a Drive-hosted HTML page rendering the diagram (no extra deployment)
  • Grouped_Tree: a simplified, unique “From Doc / From Sheet → To Sheet” table for quick reading
  • Progress: timestamped run logs

Current extractor version referenced here: v4.8 (column-first scanning; internal sheet refs dropped for performance).


Table of contents


Features

  • Scans many files safely: recursive crawl of selected Drive folders, with batch processing and progress logging.
  • Column-first early-exit: inspects only the first formula per column (rows 1–7) to avoid scanning 1000 blank rows from array-formulas.
  • Understands popular formulas: IMPORTRANGE, QUERY, VLOOKUP/HLOOKUP/XLOOKUP, INDEX, FILTER.
  • Per-type range capture: best-effort extraction of the key range argument(s) for each function type.
  • Deduplication: avoids bloat; keeps only unique edges/rows when building outputs.
  • Mermaid graph: ready for GitHub, wikis, or Mermaid Live.
  • HTML viewer: one-click Update Diagram generates/updates a Drive file and opens it in a modal, no web-app deployment required.
  • Helper “tree” view: unique 3-column, human-friendly grouping (From Document / From Sheet / To Sheet).

How it works (data flow)

graph TD
  A[Drive Folders] -->|collectSheetsRecursive| B[File IDs]
  B -->|batchProcessNext (BATCH_SIZE N)| C[Scan Sheets]
  C -->|detect formula types & ranges| D[LinkBuffer (raw)]
  D -->|dedupe & normalize| E[FormulaLinks (clean)]
  E -->|edges| F[Flowchart (Mermaid code)]
  F -->|cleanMermaid| G[flowchart_cleaned.mmd]
  G -->|wrap HTML| H[flowchart_viewer.html]
  E --> I[Grouped_Tree]
  C --> J[Progress log]
Loading
  • Internal sheet-to-sheet references are intentionally skipped in v4.8 to keep output small and avoid noise. External links (across files) and top-level functions are preserved.

Installation

  1. Open the Google Sheet you’ll use as the controller.

  2. Extensions ▸ Apps Script.

  3. Create Code.gs and paste the extraction code (v4.8) plus the viewer add-on block:

    • The extractor implements: runFullFlowchart, initializeScan, batchProcessNext, buildFlowchart, and helpers.
    • The viewer add-on implements: cleanMermaid, updateDiagramHtml, and the updated onOpen menu.
  4. Save. Reload the spreadsheet so the menu appears.

  5. First run will ask for permissions (Drive + Spreadsheet access).

Tip: Keep all code in a single file (Code.gs) to avoid scoping mistakes. If you split files, ensure there are no duplicate function names or unbalanced braces.


Configuration

At the top of the script:

const FOLDERS = [
  'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', // Folder ID 1
  'yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy'  // Folder ID 2
];
// Empty FOLDERS → scans entire Drive (not recommended)
const BATCH_SIZE   = 20;   // files per batch
const PROG_SHEET   = 'Progress';
const FLOW_SHEET   = 'Flowchart';
const BUFFER_SHEET = 'LinkBuffer';
  • FOLDERS: paste Drive folder IDs you want scanned. The scan is recursive.
  • BATCH_SIZE: tune if you hit timeouts; 20 is a good default.

Usage

  1. Extract Formulas From the spreadsheet menu: Flowchart Tools ▸ Extract Formulas

    • Scans queued files in batches, writes LinkBuffer, then FormulaLinks and Flowchart.
    • Progress sheet logs every step.
  2. Visualize (Hierarchical) Flowchart Tools ▸ Visualize (Hierarchical)

    • Writes Grouped_Tree: unique From Document, From Sheet, To Sheet rows (external only).
  3. Update Diagram Flowchart Tools ▸ Update Diagram

    • Cleans Flowchart!A1 Mermaid into ID-safe nodes, writes/overwrites flowchart_cleaned.mmd and flowchart_viewer.html in Drive, and shows the rendered graph in a modal.
    • You can open flowchart_viewer.html from Drive later; its file ID stays constant.

What the script creates

  • LinkBuffer: raw rows collected during scanning (13 columns).

  • FormulaLinks: deduplicated final table (same 13 columns):

    • From Doc, From Sheet, From Range
    • To Doc, To Sheet, To Range
    • Formula Types
    • Importrange Range, Query Range, Lookup Range, Index Range, Filter Range, Combined Range
  • Flowchart: Mermaid code in cell A1 (raw, quoted node labels).

  • Grouped_Tree: simplified table without To Document (unique rows only).

  • flowchart_cleaned.mmd (Drive): cleaned Mermaid (ID safe).

  • flowchart_viewer.html (Drive): one-file viewer that renders the cleaned graph.

  • Progress: timestamped log of every action.


Code architecture & function reference

Extraction core (v4.8)

  • runFullFlowchart() Orchestrates the whole flow in one shot: initializeScan() → loop batchProcessNext()buildFlowchart().

  • initializeScan() Clears state/sheets; collects file IDs from configured folders; creates a 1-minute time-based trigger for batching; kicks off first batch.

  • batchProcessNext() Pops the next BATCH_SIZE file IDs, opens each spreadsheet, and scans column-first: rows 1–7 only, stopping at the first non-empty formula per column.

    • Detects formula types with regex: IMPORTRANGE, QUERY, XLOOKUP/VLOOKUP/HLOOKUP, INDEX, FILTER.
    • Extracts key ranges per type into dedicated columns and a Combined Range column.
    • Skips INTERNAL sheet references for performance.
    • Buffers rows to LinkBuffer.
  • buildFlowchart() Deduplicates LinkBuffer → writes FormulaLinks & Flowchart (Mermaid). Sets extractionDone = true.

Helpers

  • collectSheetsRecursive(folder, list) – DFS traversal of Drive folders for Google Sheets files.
  • resolveTitle(id, cache) – resolves spreadsheet name by ID (cached).
  • log(msg) – timestamped entry to Progress.

Visual / frontend

  • buildHierarchicalTree() Reads FormulaLinks; outputs unique [From Doc, From Sheet, To Sheet] to Grouped_Tree. Requires extractionDone = true.

  • cleanMermaid() Converts "Label" --> "Label" edges into N1["Label"] --> N2["Label"] (ID-safe).

  • updateDiagramHtml() Wraps cleaned Mermaid in a tiny HTML page, upserts both Drive files, and shows the diagram in a modal.

  • onOpen() Adds menu:

    • Extract FormulasrunFullFlowchart
    • Visualize (Hierarchical)buildHierarchicalTree
    • Update DiagramupdateDiagramHtml

Performance notes & limits

  • Apps Script 6-min limit (per sync run). The batch trigger keeps work going minute-by-minute for large sets.

  • Google Sheets cell limit: 10,000,000 cells per spreadsheet. If you push too many rows to LinkBuffer or FormulaLinks, you may see:

    “This action would increase the number of cells in the workbook above the limit of 10000000 cells.”

    • Reduce BATCH_SIZE; prune FOLDERS; or archive older runs to another spreadsheet.
  • 50,000 characters per cell: Mermaid blocks can hit this if you inline huge graphs. Use Update Diagram to view via HTML instead.

  • Array-formulas: The column-first + row-cap (rows 1–7) prevents scanning hundreds/thousands of repeated formulas.


Troubleshooting

Menu item runs but nothing appears

  • Reload the spreadsheet after saving the script (menus bind on open).
  • Check Progress sheet for logs/errors.

ReferenceError: batchProcessNext is not defined

  • You likely have duplicate function names or a missing brace (}) above the function, causing parse failure.
  • Ensure the project outline lists functions at top level (no nested functions).

“Lost connection to the server” during large scans

  • That’s the editor UI timing out. The time-based trigger will continue in the background.
  • Keep an eye on Progress for confirmation.

Mermaid won’t render

  • Use Update Diagram to generate flowchart_cleaned.mmd and flowchart_viewer.html (ID-safe version) and open the HTML.

I only want IMPORTRANGE

  • Keep v4.8 as is (it already optimises for IMPORTRANGE), or set the non-IMPORTRANGE regex checks behind a flag.

Security & privacy

  • The script reads file names, sheet names, formulas, and ranges in the folders you specify.

  • No data leaves your Google Workspace; the HTML viewer loads Mermaid from jsDelivr CDN.

  • The Drive files created (flowchart_cleaned.mmd & flowchart_viewer.html) inherit your Drive’s sharing defaults.

    • To share publicly, change sharing on the HTML file to “Anyone with the link”.

Contributing

  • Use feature branches; submit PRs with:

    • A short problem statement
    • Clear before/after notes
    • Test notes (e.g., number of files scanned, time taken)
  • Keep regex changes minimal and well-commented.


Roadmap

  • Optional toggle to include internal sheet references (with throttling)
  • CSV/BigQuery export
  • Per-cluster Mermaid export (split giant graphs into logical subgraphs)
  • Unit tests for range parsers

License

Specify your chosen license here (e.g., MIT). If omitted, the repository is effectively “all rights reserved”.


Appendix: Formula & range detection (regex cheatsheet)

  • IMPORTRANGE("docIdOrUrl","Sheet!A1:Z99")

    • Extracts: doc ID/URL + toSheet + toRange
  • QUERY(range, ...) → captures first argument as Query Range

  • XLOOKUP(lookup, table, ...), VLOOKUP, HLOOKUP → captures the table/range argument

  • INDEX(range, ...) → captures first argument as Index Range

  • FILTER(range, ...) → captures first argument as Filter Range

These values populate the dedicated columns plus Combined Range in FormulaLinks.


Quick start (TL;DR)

  1. Paste script into Code.gs (extractor + viewer block).

  2. Set FOLDERS with the Drive folder IDs you want to scan.

  3. Reload the spreadsheet.

  4. Menu ▸ Flowchart Tools

    • Extract Formulas → builds everything
    • Update Diagram → generates/opens HTML viewer
    • Visualize (Hierarchical) → writes Grouped_Tree

You’re set.

About

Google Sheets → link map + Mermaid diagram + HTML viewer. This Apps Script crawls one or more Drive folders for Google Sheets, extracts formulas that create dependencies between sheets (e.g., IMPORTRANGE, QUERY, VLOOKUP/HLOOKUP/XLOOKUP, INDEX, FILTER), deduplicates results, and produces:

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors