Nelson/aip 611 determine number of experimentsruns in last 3 months #1963
Merged
Dashing-Nelson merged 4 commits into main on Nov 26, 2025
- Implemented a Jupyter notebook that connects to the MLflow tracking server.
- Added steps for GKE authentication, service discovery, and port-forwarding.
- Included functionality to query and analyze experiments and runs from the last 3 months.
- Summarized results by experiment and status, with options for detailed statistics and CSV export.
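The notebook's core step, counting runs from the last 3 months by experiment and status, might look roughly like the sketch below. It is not the notebook's actual code. It assumes "3 months" means 90 days, that the server is reachable through a port-forward at a hypothetical URI such as `http://localhost:5000`, and that the installed MLflow version supports `attributes.start_time` in run filters. The helper names `cutoff_ms`, `summarize_runs`, and `fetch_runs` are illustrative only.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def cutoff_ms(now: datetime, days: int = 90) -> int:
    """Epoch milliseconds for `days` before `now` (MLflow stores start_time in ms)."""
    return int((now - timedelta(days=days)).timestamp() * 1000)


def summarize_runs(runs, since_ms):
    """Count runs started at or after `since_ms`, keyed by (experiment, status)."""
    counts = Counter()
    for run in runs:
        if run["start_time"] >= since_ms:
            counts[(run["experiment"], run["status"])] += 1
    return counts


def fetch_runs(tracking_uri, since_ms):
    """Fetch recent runs from an MLflow server as plain dicts.

    Requires the `mlflow` package and a reachable tracking server, so it is
    imported lazily and the helpers above stay usable without it.
    """
    from mlflow.tracking import MlflowClient

    client = MlflowClient(tracking_uri=tracking_uri)
    # Only the first page of experiments is read here (assumption: fewer
    # experiments than the server's default page size).
    names = {e.experiment_id: e.name for e in client.search_experiments()}
    if not names:
        return []

    records, token = [], None
    while True:
        page = client.search_runs(
            experiment_ids=list(names),
            filter_string=f"attributes.start_time >= {since_ms}",
            page_token=token,
        )
        for run in page:
            records.append(
                {
                    "experiment": names[run.info.experiment_id],
                    "status": run.info.status,
                    "start_time": run.info.start_time,
                }
            )
        token = page.token  # empty or None once the last page is reached
        if not token:
            break
    return records
```

With a port-forward in place, `summarize_runs(fetch_runs("http://localhost:5000", since), since)` with `since = cutoff_ms(datetime.now(timezone.utc))` would give the per-experiment, per-status counts that the summary and CSV export build on.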
6cf404b to 81223c0
JacquesVergine approved these changes on Nov 26, 2025
…tsruns-in-last-3-months
JacquesVergine pushed a commit that referenced this pull request on Nov 26, 2025

…1963)
* Change test runner from ThreadRunner to ParallelRunner in docker test command
* Change test runner from ParallelRunner to ThreadRunner in docker test command
* Add runbook to count MLflow experiments from the last 3 months
  - Implemented a Jupyter notebook that connects to the MLflow tracking server.
  - Added steps for GKE authentication, service discovery, and port-forwarding.
  - Included functionality to query and analyze experiments and runs from the last 3 months.
  - Summarized results by experiment and status, with options for detailed statistics and CSV export.
JacquesVergine added a commit that referenced this pull request on Nov 27, 2025

* Correct trails to trial
* Update ec_clinical_trial ingestion, transform and fabricator with ec_id
* Update ec_clinical_trials transformation columns
* Update off label ingestion and transformation to handle ec_drug_id
* Update kgml ground truth ingestion, transformation and fabrication
* Update drugbank
* Update ec ground truth
* Update pipelines/matrix/conf/base/fabricator/parameters.yml
* Update pipelines/matrix/conf/base/fabricator/parameters.yml
* Update pipelines/matrix/conf/base/fabricator/parameters.yml
* Follow Piotr's widsom
* Minor fixes
* Join generated pairs with known pairs on EC_id
* Add xgboost
* Join with embeddings on curie
* Bump secrets
* Remove source like columns before predicting
* Update pipelines/matrix/src/matrix/pipelines/matrix_generation/nodes.py

  Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com>
* UV lock
* Fix matrix test data
* Fix fabricator params
* Drop disease synonyms column
* Fix drug/disease in the embeddings
* Add default pipeline
* Remove polars changes
* Add key node pages to dashboard (#1887)
* Add index & templated page for characterizing key nodes
* remove prefixes, show count col first, remove unique subjects/objects cols
* Optimize key_nodes_stats query performance by removing expensive descendant edge computations and sourcing from pre-computed release aggregate data
* pull primary knowledge source out of the edge tables on key nodes pages
* Add interactive chord diagram for key node category visualization

  This commit adds an interactive chord/radial diagram to visualize key node connections to biolink categories, with drill-down functionality to show example edges.

  New Features:
  - Interactive chord diagram showing key node in center with connected categories in oval layout
  - Direction-agnostic category grouping using biolink hierarchy (reduces 44 to ~14 parent groups)
  - Node sizes scaled by distinct connected nodes count
  - Link widths scaled by total edge count
  - Click-to-drill-down: select category to see example edges
  - Diverse edge sampling: 10 example edges per primary knowledge source
  - Clickable knowledge source links to detail pages
  - Dark mode compatible styling

  New Files:
  - sources/bq/key_nodes_category_summary.sql: Aggregates edges by parent biolink categories
  - sources/bq/key_nodes_category_edges.sql: Fetches example edges with diverse sampling
  - pages/_components/KeyNodeChordDashboard.svelte: Main visualization component with ECharts
  - pages/_lib/key-node-chord/constants.js: Layout constants and color palette
  - pages/_lib/key-node-chord/chord-layout.js: Position calculations and formatting utilities

  Modified Files:
  - pages/Key Nodes/[key_node_id].md: Integrated chord dashboard component

  Technical Details:
  - Used ECharts with direct initialization for click event handling (Evidence.dev workaround)
  - Oval-shaped layout (OUTER_RADIUS_X: 280, OUTER_RADIUS_Y: 150)
  - Straight connecting lines (curveness: 0)
  - Relative paths for knowledge source links with infores: prefix
  - Evidence.dev DataTable integration for drill-down with pagination reset

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>
* Code tidying for chord dashboard

  Code Cleanup:
  - Remove unused SQL columns (subject_category, object_category) from queries
  - Remove unused import (CATEGORY_COLORS) from Svelte component
  - Improve variable naming clarity (c → category, n → node)

  Documentation:
  - Add explanatory comments for complex category mapping logic in SQL
  - Document design decisions (oval shape, scaling ranges, straight lines)
  - Add edge case handling comments (equal counts in scaling)
  - Add component header documenting key technical decisions

  Technical Details:
  - Explain node size range (20-60px) ensures clickability without overlap
  - Explain link width range (2-12px) maintains visibility without overwhelming
  - Document biolink hierarchy mapping strategy reduces 44+ categories to ~14 groups
  - Clarify why direct ECharts initialization is used (for click event handling)

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>
* Remove search from DataTables and fix premature error message

  Fixes:
  - Remove search=true from 5 DataTable components (chord dashboard + 4 on key node page)
  - Change error message condition to check if key_node_info is defined before showing

  The search feature requires Query objects but we're passing filtered JavaScript arrays, triggering "Search Failed - Please use a query instead" toast warnings.

  The error message was showing immediately on page load before the query completed. Now uses {:else if key_node_info !== undefined} to only show after query finishes.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>
* Remove Connection Flow Sankey and rename section to Graph Edges

  Changes:
  - Remove "Connection Flow" Sankey diagram section (replaced by chord diagram)
  - Remove unused key_node_connected_categories SQL query
  - Rename "Interactive Category Explorer" to "{node_name} Graph Edges"
  - Update section description to better explain the visualization

  The chord diagram provides a cleaner, more interactive way to explore category connections than the Sankey flow diagram.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>
* use a shared data source for bar charts, graph and table, make sure connected category is always the other side of the association
* removed unused source queries
* remove unused another no-longer used query
* move category colors from component to shared colors.js
* key node custom viz labels compatible with dark mode
* use category color from key node in key node viz, improve border colors in graph & bar charts
* Update title for example/sample edges table
* randomize key node sample edges
* revert to deterministic sorting of example edges
* update key node graph node and label colors for better visibility
* include display of pks in edge counts added/removed/significant change, since it was already the natural key we were using, added explanatory text for each table
* swap out markdown bold for strong tags

---------

Co-authored-by: Claude <noreply@anthropic.com>

* Nelson/aip 616 rd make topological embeddings resilient to spot failures (#1957)
* Add spot node pools to GKE configuration and remove spot tolerations from Argo workflow template
* Add tolerations for neo4j pod to enhance resource scheduling

---------

Co-authored-by: Jacques Vergine <jacques.vergine35@gmail.com>

* Nelson/aip 611 determine number of experimentsruns in last 3 months (#1963)
* Change test runner from ThreadRunner to ParallelRunner in docker test command
* Change test runner from ParallelRunner to ThreadRunner in docker test command
* Add runbook to count MLflow experiments from the last 3 months
  - Implemented a Jupyter notebook that connects to the MLflow tracking server.
  - Added steps for GKE authentication, service discovery, and port-forwarding.
  - Included functionality to query and analyze experiments and runs from the last 3 months.
  - Summarized results by experiment and status, with options for detailed statistics and CSV export.
* docs: add LiteLLM New Provider Guide and update usage documentation (#1964)
* Add xgboost
* UV lock
* Bump uv lock
* Correct trails to trial
* trails to trials

---------

Co-authored-by: Alexei Stepanenko <alexei.stepa@gmail.com>
Co-authored-by: Kevin Schaper <kevinschaper@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Nelson Alfonso <45660392+Dashing-Nelson@users.noreply.github.com>
drhodesbrc pushed a commit that referenced this pull request on Nov 27, 2025
Dashing-Nelson added a commit that referenced this pull request on Dec 18, 2025
Dashing-Nelson added a commit that referenced this pull request on Dec 18, 2025
Description of the changes
This pull request introduces several improvements to the pipeline configuration and execution for the matrix project, focusing on better thread safety, enhanced configurability via environment variables, and updated dependencies. The most significant changes include switching to a thread-safe Spark session manager, enabling pipeline parameters to be set through environment variables, and running Kedro pipelines with the `ThreadRunner` for enhanced parallelism.

Pipeline execution and configuration:
- `Makefile` and `docker-compose.ci.yml` now use the `ThreadRunner`, enabling parallel execution and potentially faster test runs.
- Pipeline parameters (`n_cross_val_folds` and `num_shards`) in `settings.py` are now configurable via environment variables, improving flexibility for different environments.

Thread safety and Spark session management:
- `getActiveSession()` is replaced with `SparkManager.get_or_create_session()` in `gcp.py` to ensure thread safety when creating Spark DataFrames.

Dependency and environment management:
- `infra/secrets` is updated, likely pulling in new secrets or configuration changes.
- `os` is imported in `settings.py` to support environment variable configuration.

Fixes / Resolves the following issues:

Checklist:
- … `enhancement` or `bug`)
- … pulling in latest main, uncomment the below "Merge Notification" section and describe steps necessary for people
- … `kedro run -e sample -p test_sample` (see sample environment guide)
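
The description says that `n_cross_val_folds` and `num_shards` in `settings.py` can be set through environment variables. A minimal sketch of that pattern follows. The actual variable names and defaults in `settings.py` are not shown on this page, so `N_CROSS_VAL_FOLDS`, `NUM_SHARDS`, the defaults, and the `env_int` helper are all assumptions for illustration.

```python
import os


def env_int(name, default, environ=None):
    """Read an integer setting from the environment, falling back to `default`.

    `environ` can be any mapping (defaults to os.environ), which keeps the
    helper easy to exercise without touching the real process environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError as exc:
        # Fail loudly rather than silently running with a wrong value.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc


# Hypothetical variable names and defaults, not taken from the repository.
N_CROSS_VAL_FOLDS = env_int("N_CROSS_VAL_FOLDS", 3)
NUM_SHARDS = env_int("NUM_SHARDS", 1)
```

Parsing once at import time keeps the rest of the settings module unchanged. An invalid value raises immediately, so a mistyped variable surfaces at startup instead of part-way through a pipeline run.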