✅ TypeScript Types Auto-Updated
The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.
Pull request overview
This PR enhances the SSRS dashboard connector to extract richer metadata from SSRS reports by fetching and parsing RDL content, emitting DashboardDataModels per dataset, and generating table lineage from dataset SQL.
Changes:
- Added `SsrsDataModel` as a supported `DataModelType` in the DashboardDataModel schema.
- Implemented SSRS RDL fetching and decoding in the SSRS client, plus an XML RDL parser that extracts data sources, datasets, and fields.
- Extended the SSRS source to cache parsed RDL, emit data models, and compute lineage using the SQL lineage parser; added unit/integration tests and fixtures.
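The RDL parsing step can be sketched with the standard library's `xml.etree.ElementTree`. This is a minimal illustration, not the PR's `rdl_parser.py`: the names `parse_datasets`, `RdlDataSet`, and `RdlField` are hypothetical, and the sketch assumes the usual RDL layout in which each `<DataSet>` holds a `<Query>` (with `<DataSourceName>`, `<CommandType>`, `<CommandText>`) and a `<Fields>` list. It matches elements by local name so the 2008, 2010, and 2016 RDL namespaces are handled alike, and it applies no DTD or entity guard.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RdlField:  # hypothetical model, for illustration only
    name: Optional[str]
    data_field: Optional[str]


@dataclass
class RdlDataSet:  # hypothetical model, for illustration only
    name: Optional[str]
    data_source_name: Optional[str]
    command_type: str
    command_text: Optional[str]
    fields: List[RdlField] = field(default_factory=list)


def _local(tag: str) -> str:
    # Drop the "{namespace}" prefix so one parser reads every RDL schema version.
    return tag.rsplit("}", 1)[-1]


def _child(elem: Optional[ET.Element], name: str) -> Optional[ET.Element]:
    # First direct child whose local name matches, or None.
    if elem is None:
        return None
    return next((c for c in elem if _local(c.tag) == name), None)


def _child_text(elem: Optional[ET.Element], name: str) -> Optional[str]:
    child = _child(elem, name)
    return child.text.strip() if child is not None and child.text else None


def parse_datasets(rdl_xml: bytes) -> List[RdlDataSet]:
    """Extract every <DataSet> with its query and field list (sketch only)."""
    root = ET.fromstring(rdl_xml)
    datasets: List[RdlDataSet] = []
    for ds in root.iter():
        if _local(ds.tag) != "DataSet":
            continue
        query = _child(ds, "Query")
        fields_elem = _child(ds, "Fields")
        fields = (
            []
            if fields_elem is None
            else [
                RdlField(f.get("Name"), _child_text(f, "DataField"))
                for f in fields_elem
                if _local(f.tag) == "Field"
            ]
        )
        datasets.append(
            RdlDataSet(
                name=ds.get("Name"),
                data_source_name=_child_text(query, "DataSourceName"),
                # RDL treats a missing <CommandType> as "Text" (inline SQL).
                command_type=_child_text(query, "CommandType") or "Text",
                command_text=_child_text(query, "CommandText"),
                fields=fields,
            )
        )
    return datasets
```

Each returned dataset maps naturally onto one data model, and its `command_text` is what a SQL lineage parser would consume when `command_type` is `Text`.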
Reviewed changes
Copilot reviewed 14 out of 17 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| openmetadata-spec/src/main/resources/json/schema/entity/data/dashboardDataModel.json | Adds SsrsDataModel to the enum of supported dashboard data model types. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/client.py | Fetches report RDL bytes from SSRS endpoints and decodes XML/JSON(base64) responses. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/rdl_parser.py | New RDL XML parser to extract data sources, datasets, commands, and fields. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/metadata.py | Loads/caches RDL per report, emits datamodels, and yields lineage from parsed SQL tables. |
| ingestion/tests/unit/topology/dashboard/test_ssrs_rdl_parser.py | Unit tests for RDL parsing and connection-string parsing. |
| ingestion/tests/unit/topology/dashboard/test_ssrs.py | Unit tests for SSRS datamodel emission and lineage behavior. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/*.rdl | RDL fixtures used by the unit tests. |
| ingestion/tests/integration/ssrs/conftest.py | Extends mock SSRS server to serve RDL endpoints. |
| ingestion/tests/integration/ssrs/test_metadata.py | Integration tests validating RDL fetch + parse via mock server. |
| .claude/scheduled_tasks.lock | Adds a lock file to the repo. |
Pull request overview
This PR enhances the SSRS dashboard connector to extract SSRS RDL metadata (datasets/data sources/SQL) and use it to emit Dashboard Data Models and table lineage, while also adding the new SsrsDataModel enum value to the shared DashboardDataModel schema and generated UI types.
Changes:
- Add an SSRS RDL XML parser and integrate it into the SSRS ingestion flow for datamodel + lineage extraction.
- Extend the SSRS client to retrieve report definitions from SSRS content endpoints (including base64-decoded JSON payloads).
- Update schema + generated UI types to support `SsrsDataModel`, and add unit/integration test coverage with RDL fixtures.
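The client-side decoding can be sketched as follows. It assumes a response body is either raw RDL XML (as from a `.../Content/$value` endpoint) or a JSON object that carries the RDL as a base64 string in a `Content` property; that property name and the helper `decode_report_content` are assumptions for illustration, not code taken from the PR's `client.py`.

```python
import base64
import json
from typing import Optional

_UTF8_BOM = b"\xef\xbb\xbf"


def decode_report_content(body: bytes) -> Optional[bytes]:
    """Return RDL XML bytes from a raw-XML or JSON/base64 body, else None.

    Assumption: JSON bodies expose the RDL as a base64 "Content" string.
    """
    text = body[len(_UTF8_BOM):] if body.startswith(_UTF8_BOM) else body
    text = text.lstrip()
    if text.startswith(b"<"):
        # Already XML: hand it straight to the RDL parser.
        return text
    try:
        payload = json.loads(text)
    except ValueError:  # covers json.JSONDecodeError
        return None
    content = payload.get("Content") if isinstance(payload, dict) else None
    if not isinstance(content, str):
        return None
    try:
        # validate=True rejects non-alphabet characters instead of skipping them.
        return base64.b64decode(content, validate=True)
    except ValueError:  # binascii.Error subclasses ValueError
        return None
```

Returning `None` for bodies it cannot decode lets a caller move on to the next candidate endpoint.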
Reviewed changes
Copilot reviewed 15 out of 18 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| openmetadata-ui/src/main/resources/ui/src/generated/entity/data/dashboardDataModel.ts | Adds SsrsDataModel to the generated DataModelType enum for UI usage. |
| openmetadata-ui/src/main/resources/ui/src/generated/api/data/createDashboardDataModel.ts | Adds SsrsDataModel to the generated API enum used by UI create requests. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/dashboardDataModel.json | Extends the schema enum and javaEnums to include SsrsDataModel. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/rdl_parser.py | New module to parse RDL into structured datasets/data sources and extract connection info. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/models.py | Adds created_by field mapping (CreatedBy) to the SSRS report model. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/metadata.py | Implements RDL caching, datamodel emission, owner resolution, and dataset SQL lineage extraction. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/client.py | Adds report-definition fetch + decode logic (multiple endpoints, base64 JSON decoding, size limiting). |
| ingestion/tests/unit/topology/dashboard/test_ssrs_rdl_parser.py | Unit tests for parsing RDL variants, security rejection, and connection string parsing. |
| ingestion/tests/unit/topology/dashboard/test_ssrs.py | Unit tests for SSRS ownership, datamodel emission, lineage behavior, and hidden-report status filtering. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/shared_datasource.rdl | RDL fixture for shared datasource reference. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/no_datasource.rdl | RDL fixture with empty sources/datasets. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/malformed.rdl | RDL fixture for malformed XML. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/inline_single_dataset_2016.rdl | RDL fixture for single dataset with inline SQL and fields. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/inline_multi_dataset_2010.rdl | RDL fixture for multiple datasets with inline SQL. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/expression_commandtype.rdl | RDL fixture for CommandType=Expression. |
| ingestion/tests/integration/ssrs/test_metadata.py | Integration tests for report-definition fetching and end-to-end RDL parsing via mock server. |
| ingestion/tests/integration/ssrs/conftest.py | Extends mock SSRS server to serve RDL content endpoints used by the client. |
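The connection-string parsing exercised by the parser tests can be sketched for ADO.NET-style strings such as `Data Source=sql01;Initial Catalog=Sales`. The helpers `parse_connect_string` and `server_and_database` and the alias lists are hypothetical. The sketch does not handle semicolons inside quoted values, and it skips RDL expression-based connect strings (those starting with `=`), which are only resolved when the report runs.

```python
from typing import Dict, Optional, Tuple

# Hypothetical alias lists; ADO.NET accepts several spellings for each key.
SERVER_KEYS = ("data source", "server", "address", "addr", "network address")
DATABASE_KEYS = ("initial catalog", "database")


def parse_connect_string(conn: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys."""
    if conn.lstrip().startswith("="):
        # An RDL expression, evaluated at run time; nothing static to extract.
        return {}
    props: Dict[str, str] = {}
    for part in conn.split(";"):
        key, sep, value = part.partition("=")
        if not sep:
            continue
        props[key.strip().lower()] = value.strip().strip('"').strip("'")
    return props


def server_and_database(conn: str) -> Tuple[Optional[str], Optional[str]]:
    props = parse_connect_string(conn)
    server = next((props[k] for k in SERVER_KEYS if props.get(k)), None)
    database = next((props[k] for k in DATABASE_KEYS if props.get(k)), None)
    return server, database
```

A server/database pair like this is what lets the lineage step qualify a bare table name from the dataset SQL.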
```python
def get_report_definition(self, report_id: str) -> Optional[bytes]:
    """Return the RDL XML bytes for a report, or ``None`` if unavailable.

    Tries ``/Reports({id})/Content/$value`` first, then ``/CatalogItems({id})/Content``.
    Not-found responses (404/400) trigger fallback silently; transport errors
    propagate so operators see outages instead of empty catalogs."""
    last_err: Optional[Exception] = None
    for template in RDL_CONTENT_PATHS:
        path = template.format(id=report_id)
        try:
            body = self._fetch_report_content(path)
        except requests.RequestException as exc:
            last_err = exc
            logger.warning("RDL fetch transport error for %s: %s", path, exc)
            continue
        if body is not None:
            return body
    if last_err is not None:
        raise SourceConnectionException(
            f"Failed to fetch RDL content for report [{report_id}]: {last_err}"
        ) from last_err
    return None
```
```python
    self, dashboard: SsrsReport
) -> Optional[SsrsReportDefinition]:
    """Fetch and cache RDL lazily. Returns ``None`` when the report has no
    sources or the RDL cannot be fetched/parsed."""
    cached = self._report_definitions.get(dashboard.id)
    if cached is not None:
        return cached
    if dashboard.has_data_sources is False:
        return None
    try:
        rdl_bytes = self.client.get_report_definition(dashboard.id)
    except Exception as exc:
        logger.debug(traceback.format_exc())
        logger.warning(
            "Could not fetch RDL for report [%s]: %s", dashboard.name, exc
        )
        return None
    if not rdl_bytes:
```
```python
if not rdl:
    return
for dataset in rdl.data_sets:
    try:
        datamodel_request = self._build_datamodel_request(
            dashboard_details, dataset
        )
        if datamodel_request is None:
            continue
        yield Either(right=datamodel_request)
        self.register_record_datamodel(datamodel_request=datamodel_request)
    except Exception as exc:
        yield Either(
            left=StackTraceError(
                name=f"{dashboard_details.name}.{dataset.name}",
                error=(
                    f"Error yielding DataModel [{dataset.name}] for report "
                    f"[{dashboard_details.name}]: {exc}"
                ),
                stackTrace=traceback.format_exc(),
            )
        )
```
Code Review ✅ Approved · 4 resolved / 4 findings
SSRS Connector lineage improvements now correctly preserve source table identity, prevent XXE bypasses, and implement memory-efficient response sizing. The accidentally committed .claude configuration file has been removed.
✅ 4 resolved
✅ Bug: prefix_table overrides every source table, collapsing lineage
✅ Quality: Accidentally committed .claude/scheduled_tasks.lock
✅ Security: XXE guard is case-sensitive and can be bypassed with mixed-case declarations
✅ Performance: Size-limit check runs after the full response is already in memory
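The security and performance findings lend themselves to a short sketch: a case-insensitive check for DTD/entity declarations, and a size cap enforced while the body is streamed rather than after it is fully buffered. The names `reject_dtd`, `read_capped`, and `RdlRejected` are hypothetical, and the 10 MiB value for `MAX_RDL_BYTES` is a placeholder, not the PR's setting. The byte-level regex assumes an ASCII-compatible encoding such as UTF-8.

```python
import re
from typing import Iterable

MAX_RDL_BYTES = 10 * 1024 * 1024  # placeholder cap, not the PR's actual value

# IGNORECASE closes the mixed-case bypass ("<!doctype", "<!EnTiTy", ...).
_DTD_PATTERN = re.compile(rb"<!\s*(doctype|entity)", re.IGNORECASE)


class RdlRejected(ValueError):
    """Hypothetical error for content the guards refuse to parse."""


def reject_dtd(content: bytes) -> bytes:
    # Refuse any DTD or entity declaration before the XML parser sees it.
    if _DTD_PATTERN.search(content):
        raise RdlRejected("RDL contains a DTD or entity declaration")
    return content


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_RDL_BYTES) -> bytes:
    """Accumulate streamed chunks, aborting as soon as the cap is exceeded."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise RdlRejected(f"RDL exceeds {max_bytes} bytes")
    return bytes(buf)
```

With `requests`, the chunks could come from `session.get(url, stream=True).iter_content(chunk_size=64 * 1024)`, so memory stays bounded by the cap plus one chunk instead of the whole response.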
🟡 Playwright Results: all passed (13 flaky)
✅ 3698 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 89 skipped
🟡 13 flaky test(s) (passed on retry)
How to debug locally:

```bash
# Download the playwright-test-results-<shard> artifact and unzip it
npx playwright show-trace path/to/trace.zip  # view trace
```
* Improve SSRS Connector - Lineage
* Update generated TypeScript types
* Add ownership extraction
* remove claude file
* Address comments
* address comments

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>





Describe your changes:
Fixes
I worked on ... because ...
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>

Summary by Gitar
`MAX_RDL_BYTES`. `SourceConnectionException` during outages, preventing accidental entity deletion. `_current_rdl` cache for `<!DOCTYPE` and `<!ENTITY` declarations in RDL content.

This will update automatically on new commits.