Add Knowledge Graphs dashboard pages #2001

Merged
kevinschaper merged 9 commits into main from feature/upstream-kg-dashboard-pages
Jan 23, 2026

Conversation

@kevinschaper
Collaborator

@kevinschaper kevinschaper commented Dec 20, 2025

Add Knowledge Graphs Dashboard Pages

Summary

  • Add dynamic dashboard pages for upstream knowledge graphs (RTX-KG2, ROBOKOP, PrimeKG, etc.)
  • Index page with comparison table showing nodes, edges, and normalization rates
  • Detail pages with normalization pipeline funnels, knowledge source flow diagram, epistemic robustness scores, edge type validation, and ABox/TBox classification
  • Dynamic SQL generation for pipeline metrics via generate-release-sql.cjs
  • Consolidated SQL sources (kg_list, kg_summary, kg_categories, kg_predicates)

What's Changed

| Before | After |
| --- | --- |
| Hardcoded rtx-kg2_normalization.md, robokop_normalization.md | Single dynamic [knowledge_graph].md |
| Per-KG SQL files (robokop_node_summary.sql, etc.) | Consolidated kg_summary.sql, kg_pipeline_metrics.sql |
| Version extraction via Makefile env vars | Version extraction from globals.yml at build time |

Complexity Notes

1. RTX-KG2 Naming Mismatch

There's an inconsistency between config and BigQuery naming:

  • globals.yml uses rtx_kg2 (underscore)
  • BigQuery upstream_data_source uses rtx-kg2 (hyphen)

This is handled by an explicit mapping in the SQL generator:

// generate-release-sql.cjs:258
const GLOBALS_TO_BQ_NAME = {
  'rtx_kg2': 'rtx-kg2'  // globals.yml uses underscore, BigQuery uses hyphen
};

And when generating kg_versions.sql:

// generate-release-sql.cjs:337
const unionClauses = kgNames.map(kg => {
  // Use BigQuery name if different from globals.yml name
  const bqName = GLOBALS_TO_BQ_NAME[kg] || kg;
  return `  SELECT '${bqName}' as knowledge_graph, ...`;
});
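
To make the effect of the mapping concrete, here is a minimal sketch of how a UNION ALL body for kg_versions.sql could be assembled. Only GLOBALS_TO_BQ_NAME comes from the PR; toBigQueryName, sqlLiteral, buildVersionsSql, the version column name, and the sample version strings are assumptions for illustration, not the script's actual code.

```javascript
// From the PR: globals.yml uses underscore, BigQuery uses hyphen.
const GLOBALS_TO_BQ_NAME = {
  rtx_kg2: 'rtx-kg2',
};

// Hypothetical helper: map a globals.yml key to the BigQuery name,
// falling back to the key itself when no override exists.
function toBigQueryName(kg) {
  return Object.prototype.hasOwnProperty.call(GLOBALS_TO_BQ_NAME, kg)
    ? GLOBALS_TO_BQ_NAME[kg]
    : kg;
}

// Hypothetical helper: only allow simple tokens inside a SQL string literal,
// so no quoting or escaping rules are needed.
function sqlLiteral(value) {
  const text = String(value);
  if (!/^[\w.\-]+$/.test(text)) {
    throw new Error(`Unexpected characters in SQL value: ${text}`);
  }
  return `'${text}'`;
}

// Build one SELECT per KG and join them with UNION ALL.
// The "version" column name is an assumption.
function buildVersionsSql(kgVersions) {
  const clauses = Object.entries(kgVersions).map(
    ([kg, version]) =>
      `  SELECT ${sqlLiteral(toBigQueryName(kg))} AS knowledge_graph, ` +
      `${sqlLiteral(version)} AS version`
  );
  return clauses.join('\n  UNION ALL\n');
}

// Sample versions are made up for the example.
const sql = buildVersionsSql({ rtx_kg2: 'v2.10.0', robokop: '1.0' });
console.log(sql);
```

Because the generated rows carry the BigQuery spelling, a join on upstream_data_source can match rtx-kg2 without the dashboard pages needing to know about the globals.yml key.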

2. KG Discovery via Table Pattern Matching

The script auto-discovers KG sources by pattern-matching BigQuery table names rather than hardcoding a list. New KGs are automatically picked up if their tables follow the naming convention:

// generate-release-sql.cjs:376-384
const query = `
  SELECT table_name
  FROM \`${projectId}.${datasetId}.INFORMATION_SCHEMA.TABLES\`
  WHERE (
    table_name LIKE '%_nodes_ingested%'
    OR table_name LIKE '%_nodes_transformed'
    OR table_name LIKE '%_nodes_normalized'
    OR table_name LIKE '%_edges_ingested%'
    OR table_name LIKE '%_edges_transformed'
    OR table_name LIKE '%_edges_normalized'
  )
`;

Table names are parsed to extract KG source, entity type, and stage:

// generate-release-sql.cjs:407-420
// Pattern: {kg_source}_{entity_type}_{stage}[_{version}]
// Examples: rtx_kg2_nodes_ingested_v2_10_0, robokop_edges_transformed

// Ingested tables (with version suffix)
const ingestedMatch = tableName.match(/^(.+?)_(nodes|edges)_ingested_(.+)$/);

// Transformed/normalized tables (no version suffix)
const otherMatch = tableName.match(/^(.+?)_(nodes|edges)_(transformed|normalized)$/);
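
The two patterns can be combined into a single parser. The sketch below reuses the regular expressions quoted above; the function name parseTableName and the shape of the returned object are assumptions, not necessarily what the script does.

```javascript
// Pattern: {kg_source}_{entity_type}_{stage}[_{version}]
const INGESTED_RE = /^(.+?)_(nodes|edges)_ingested_(.+)$/;
const OTHER_RE = /^(.+?)_(nodes|edges)_(transformed|normalized)$/;

// Hypothetical helper: return { kgSource, entityType, stage, version },
// or null when the table name matches neither pattern.
function parseTableName(tableName) {
  const ingested = tableName.match(INGESTED_RE);
  if (ingested) {
    const [, kgSource, entityType, version] = ingested;
    return { kgSource, entityType, stage: 'ingested', version };
  }
  const other = tableName.match(OTHER_RE);
  if (other) {
    const [, kgSource, entityType, stage] = other;
    return { kgSource, entityType, stage, version: null };
  }
  return null;
}

const examples = [
  'rtx_kg2_nodes_ingested_v2_10_0',
  'robokop_edges_transformed',
  'unrelated_table',
].map(parseTableName);
console.log(examples);
```

The lazy `(.+?)` group lets a KG source that itself contains underscores, such as rtx_kg2, be captured whole. In this sketch an ingested table without a version suffix matches neither pattern and yields null, so callers would need to decide whether to skip or report such tables.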

3. Exclusion Lists (Repeated in Multiple Places)

Non-KG data sources are excluded to ensure the Knowledge Graphs section only shows upstream graph sources. This logic is repeated in:

SQL sources (kg_list.sql, kg_summary.sql):

WHERE upstream_data_source NOT IN (
  'disease_list',           -- EC Core Entity
  'drug_list',              -- EC Core Entity
  'ec_ground_truth',        -- Evaluation data
  'ec_clinical_trials',     -- Internal data
  'kgml_xdtd_ground_truth', -- Evaluation data
  'off_label'               -- Internal data
)

JavaScript generator (generate-release-sql.cjs:220):

const KG_EXCLUSIONS = [
  'disease_list',
  'drug_list',
  'ec_ground_truth',
  'ec_clinical_trials',
  'kgml_xdtd_ground_truth',
  'off_label'
];

Note: This duplication could be consolidated in a future refactor (e.g., single source of truth in a config file).
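
One possible shape for that consolidation, sketched under the assumption that the generator also emits the SQL filter: the list lives in one place and the NOT IN clause is rendered from it. buildExclusionClause and isKnowledgeGraph are hypothetical names, and the PR itself does not include this refactor.

```javascript
// Single list of non-KG sources (values taken from the PR description).
const KG_EXCLUSIONS = [
  'disease_list',
  'drug_list',
  'ec_ground_truth',
  'ec_clinical_trials',
  'kgml_xdtd_ground_truth',
  'off_label',
];

// Hypothetical helper: render a WHERE ... NOT IN (...) clause from the list.
// Names are restricted to simple tokens so no SQL escaping is required.
function buildExclusionClause(column, exclusions) {
  const safe = /^[A-Za-z0-9_-]+$/;
  for (const name of [column, ...exclusions]) {
    if (!safe.test(name)) {
      throw new Error(`Unsafe SQL token: ${name}`);
    }
  }
  const list = exclusions.map((name) => `  '${name}'`).join(',\n');
  return `WHERE ${column} NOT IN (\n${list}\n)`;
}

// Hypothetical helper: the same list drives the JavaScript-side filter.
function isKnowledgeGraph(source) {
  return !KG_EXCLUSIONS.includes(source);
}

const clause = buildExclusionClause('upstream_data_source', KG_EXCLUSIONS);
console.log(clause);
```

With this arrangement, adding a new non-KG source would mean editing one array rather than two SQL files and the generator.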

4. Version Extraction from globals.yml

KG versions are now extracted at build time from pipelines/matrix/conf/base/globals.yml and written to kg_versions.sql. This replaces the previous Makefile approach that parsed globals.yml via shell commands and passed versions as env vars.

// generate-release-sql.cjs:272-297
function extractKGVersions() {
  const globalsContent = fs.readFileSync(GLOBALS_YML_PATH, 'utf8');
  const globals = yaml.load(globalsContent);
  const dataSources = globals.data_sources || {};
  const kgVersions = {};

  for (const [sourceName, sourceConfig] of Object.entries(dataSources)) {
    // Only include known KG sources (not disease_list, drug_list, etc.)
    if (!KG_SOURCES.includes(sourceName)) {
      continue;
    }
    if (sourceConfig && sourceConfig.version !== undefined) {
      kgVersions[sourceName] = String(sourceConfig.version);
    }
  }
  return kgVersions;
}

The explicit KG source list ensures only actual upstream knowledge graphs get versions displayed:

// generate-release-sql.cjs:246-252
const KG_SOURCES = [
  'rtx_kg2',
  'robokop',
  'primekg',
  'spoke',
  'embiology'
];
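
To illustrate what the filter keeps and drops, the following sketch applies the same logic to an already-parsed object, so it runs without reading globals.yml or loading a YAML library. pickKGVersions and the sample data are assumptions made for the example; the real values come from pipelines/matrix/conf/base/globals.yml.

```javascript
// From the PR: only these sources get a version displayed.
const KG_SOURCES = ['rtx_kg2', 'robokop', 'primekg', 'spoke', 'embiology'];

// Hypothetical variant of extractKGVersions that takes a parsed object
// instead of reading and parsing the YAML file itself.
function pickKGVersions(globals) {
  const dataSources = (globals && globals.data_sources) || {};
  const kgVersions = {};
  for (const [sourceName, sourceConfig] of Object.entries(dataSources)) {
    // Skip anything that is not a known upstream KG.
    if (!KG_SOURCES.includes(sourceName)) continue;
    // Skip KGs that declare no version; stringify the rest.
    if (sourceConfig && sourceConfig.version !== undefined) {
      kgVersions[sourceName] = String(sourceConfig.version);
    }
  }
  return kgVersions;
}

// Made-up sample data covering each branch of the filter.
const sampleGlobals = {
  data_sources: {
    rtx_kg2: { version: 'v2.10.0' }, // kept as-is
    robokop: { version: 1.5 },       // numeric scalar, kept as a string
    spoke: {},                       // known KG without a version, dropped
    drug_list: { version: 'v1' },    // not in KG_SOURCES, dropped
  },
};

const versions = pickKGVersions(sampleGlobals);
console.log(versions);
```

Note that the keys of the result are still globals.yml names such as rtx_kg2; the translation to BigQuery names happens later, when the SQL is generated.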

Files Changed

Added

  • pages/Knowledge Graphs/index.md - Index page with comparison table
  • pages/Knowledge Graphs/[knowledge_graph].md - Dynamic detail page
  • sources/bq/kg_list.sql - KG listing with display names
  • sources/bq/kg_summary.sql - Aggregate metrics per KG
  • sources/bq/kg_categories.sql - Top category pairs per KG
  • sources/bq/kg_predicates.sql - Top predicates per KG

Removed

  • pages/normalization/robokop_normalization.md
  • pages/normalization/rtx-kg2_normalization.md
  • sources/bq/robokop_node_summary.sql
  • sources/bq/robokop_edge_summary.sql
  • sources/bq/rtx_kg2_node_summary.sql
  • sources/bq/rtx_kg2_edge_summary.sql

Modified

  • scripts/generate-release-sql.cjs - Added KG discovery and version extraction
  • Makefile - Removed version env vars, added new SQL targets
  • pages/normalization/index.md - Added link to Knowledge Graphs section

Implement dynamic dashboard pages for upstream knowledge graphs with:
- Index page showing all KGs with comparison table (nodes, edges, normalization rates)
- Detail pages with normalization pipeline funnels, knowledge source flow diagram,
  epistemic robustness scores, edge type validation, and ABox/TBox classification
- Dynamic SQL generation for pipeline metrics via generate-release-sql.cjs
- Consolidated SQL sources (kg_list, kg_summary, kg_categories, kg_predicates)

Removes old hardcoded KG-specific pages (rtx-kg2_normalization, robokop_normalization)
in favor of the new parameterized approach.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Handle missing node levels (e.g., unifiedNodes) gracefully by defaulting
to empty arrays instead of destructuring undefined values.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented Jan 4, 2026

This pull request has been automatically marked as stale because it has had no activity in the last 14 days.

If this PR is still relevant:

  • Please update it to the latest main/master branch
  • Address any existing review comments
  • Leave a comment to keep it open

Otherwise, it will be closed in 2 weeks if no further activity occurs.

@github-actions github-actions Bot added the stale label Jan 4, 2026
Resolved conflict in generate-release-sql.cjs by keeping both:
- bqReleaseVersion (from feature branch) for dataset ID construction
- benchmarkVersion (from main) for major-releases-only filtering

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Contributor

This pull request has been automatically closed because it has been stale for 90 days with no activity.

If you believe this is still relevant, please feel free to reopen it or create a new PR.

@github-actions github-actions Bot closed this Jan 19, 2026
kevinschaper and others added 4 commits January 20, 2026 10:56
- Replace LIKE '%kg%' with proper SPLIT/UNNEST pattern for upstream_data_source filtering
- Fix duplicate Step 5 comment (now Step 6) in generate-release-sql.cjs
- Add documentation for isMajorRelease function clarifying its purpose

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add generateKGVersionsSQL() to create kg_versions.sql from globals.yml
- Remove VITE_kg_versions JSON export from Makefile
- Update index.md to query kg_versions via SQL join instead of JSON parsing
- Add version display to Knowledge Graphs detail page header
- Add Version column to Knowledge Graphs index comparison table
- Update build/clean targets for kg_versions.sql

This removes the need for hardcoded version variables and makes KG
version data available through the same SQL query pattern as other
dashboard data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
BigQuery returns 'rtx-kg2' (with hyphen) but globals.yml uses 'rtx_kg2'
(with underscore). This caused:
- Version JOIN to fail (no match)
- Display name to fall through to ELSE case showing 'Rtx-Kg2'

Fix:
- Add GLOBALS_TO_BQ_NAME mapping in generate-release-sql.cjs
- Update CASE statements in kg_list.sql and kg_summary.sql to match 'rtx-kg2'

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@kevinschaper kevinschaper marked this pull request as ready for review January 21, 2026 02:50
@kevinschaper kevinschaper requested a review from a team as a code owner January 21, 2026 02:50
@kevinschaper kevinschaper requested a review from piotrkan January 21, 2026 02:50
@kevinschaper kevinschaper requested review from JacquesVergine and removed request for piotrkan January 21, 2026 02:51
Collaborator

@JacquesVergine JacquesVergine left a comment

Looks good to me, thanks Kevin!

@kevinschaper kevinschaper merged commit ca1d62f into main Jan 23, 2026
10 checks passed
@kevinschaper kevinschaper deleted the feature/upstream-kg-dashboard-pages branch January 23, 2026 00:47

2 participants