Skip metadata.json parse in drop path#589
Merged
abhisheknath2011 merged 3 commits intoMay 26, 2026
Merged
Conversation
…ropped Previously, dropping a table went through catalog.loadTable twice: 1. TablesServiceImpl.deleteTable -> openHouseInternalRepository.findById for the auth-check TableDto, which parses metadata.json. 2. OpenHouseInternalCatalog.dropTable -> loadTable(id).location() to get the table base directory for fileIO.deletePrefix on purge. Both fail when metadata.json is corrupted (e.g. "Cannot find schema with current-schema-id=6 from schemas"), making it impossible to drop the table through the normal flow. This change avoids loadTable in both spots by going directly to HTS: - OpenHouseInternalCatalog#findHouseTable: HTS-only lookup returning the HouseTable row, no metadata parse. - OpenHouseInternalRepository#findStubById: returns a partial TableDto (databaseId, tableId, tableUUID, tableLocation) built from the HouseTable row. Enough for the auth check, which only needs db + UUID. - TablesServiceImpl#deleteTable now uses findStubById. - OpenHouseInternalCatalog#dropTable uses findHouseTable and derives the table base as the parent directory of the metadata.json path -- the same derivation OpenHouseInternalRepositoryImpl#save already uses in the replace flow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Unit tests:
- OpenHouseInternalCatalogTest: findHouseTable happy path + HouseTableNotFoundException
handling; dropTable behavior across {row missing, purge=true, purge=false} including
verifying the metadata.json -> base location derivation via deletePrefix arg.
- OpenHouseInternalRepositoryImplTest: findStubById happy path (asserting which fields
are populated and which stay null), empty-Optional propagation, and the instanceof
guard for non-OpenHouseInternalCatalog catalogs.
Integration test:
- TablesServiceTest (h2): testTableDeleteSucceedsWhenMetadataJsonIsCorrupted creates a
table, overwrites metadata.json with garbage so TableMetadataParser would throw, then
verifies deleteTable still succeeds and the HTS row is gone. Regression coverage for
the corrupted-table scenario this change set fixes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Switch metadata.json -> table base derivation in OpenHouseInternalCatalog.dropTable
from substring to org.apache.hadoop.fs.Path.getParent().toString() (URI-aware).
- Add .endsWith(".metadata.json") guard before deriving the base, so a directory
value in metadata_location (from a bad migration / manual edit) fails loudly
instead of letting deletePrefix wipe the parent directory.
- Extract derivation + guard into private getTableBaseLocation helper.
- Rename OpenHouseInternalRepository.findStubById -> findTableRefById to convey
that the returned partial TableDto is a catalog-level reference (identifiers +
tableUUID + metadata.json location), not a generic stub.
- Add unit test for the new guard
(dropTableRefusesWhenMetadataLocationIsNotAMetadataJsonFile).
- Rename existing findStubById tests to match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jiang95-dev
approved these changes
May 25, 2026
abhisheknath2011
approved these changes
May 26, 2026
17 tasks
cbb330
pushed a commit
that referenced
this pull request
May 28, 2026
## Summary Reverts commit `637896219c53cad672a4a3d381c1dad302971854` — "Support returning HTS fields in table list api (#579)". The revert conflicted with #589 (merged after #579) in `OpenHouseInternalCatalog.java`. To make the conflict visible to the reviewer, this PR is split into two commits: 1. **Revert with conflict markers preserved** (`a15b0da5`) — raw `git revert` output, `OpenHouseInternalCatalog.java` left with `<<<<<<<` / `=======` / `>>>>>>>` markers. Committed with `--no-verify` because the markers fail spotless. 2. **Resolve conflict** (`387dd773`) — keep `findHouseTable` (from #589, still used by the drop path), drop `listHouseTables` (from #579, the new addition being reverted). The intent of the split is to let the reviewer inspect what conflicted, then see exactly how it was resolved. ## Changes - [X] Client-facing API Changes - [ ] Internal API Changes - [ ] Bug Fixes - [ ] New Features - [ ] Performance Improvements - [ ] Code Style - [X] Refactoring - [ ] Documentation - [X] Tests Reverts the public list-table API additions and the supporting tests from #579. ## Testing Done - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [ ] Added new tests for the changes made. - [X] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. Tests added by #579 are removed as part of the revert. Relying on CI to confirm the remaining suite (including the #589 drop-path tests that depend on `findHouseTable`) still passes. # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [ ] Large PR broken into smaller PRs, and PR plan linked in the description. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
17 tasks
maluchari
added a commit
that referenced
this pull request
May 29, 2026
Reverts the 17 commits that landed on main after v0.5.417, bringing the tree back to exactly the v0.5.417 state. Squashed into a single revert commit for reviewability and to allow reinstating everything as one unit (revert this commit to bring all 17 changes back). Reverted commits (v0.5.417..main, newest first): - Revert #579 (HTS fields in table list api) (#610) - feat(optimizer): [3/N] Analyzer (#533) - [DataLoader] Handle Cast(Literal, TIMESTAMP/DATE/TIME) in scan optimizer (#569 follow-up) (#583) - Skip metadata.json parse in drop path (#589) - feat(optimizer): [2/N] Optimizer REST Service and Controller (#531) - [BDP-102028] feat(optimizer): [1/N] Optimizer Database (#530) - [RTAS]: Fix bug - remove fs scheme from tableLocation in commit (cont) (#594) - Trigger ELR process (#593) - [BDP-102028] feat(optimizer): [0/N] Optimizer API and internal model (#527) - Fail retention app when the columnPattern mismatch partition spec (#552) - [DataLoader] Drop OpenTelemetry minimum version to 1.38.0 (#590) - [DataLoader] Emit OpenTelemetry metrics for read operations (#582) - Cache iceberg metadata to reduce redundant requests to storage (#509) - bump iceberg 1.2 version to 1.2.0.17 (#587) - Support returning HTS fields in table list api (#579) - [DataLoader] Add unique id property to OpenHouseDataLoader (#580) - [DataLoader] Add OpenTelemetry metrics support (#575) ## Summary <!--- HINT: Replace #nnn with corresponding Issue number, if you are fixing an existing issue --> [Issue](https://github.com/linkedin/openhouse/issues/#nnn)] Briefly discuss the summary of the changes made in this pull request in 2-3 lines. ## Changes - [ ] Client-facing API Changes - [ ] Internal API Changes - [ ] Bug Fixes - [ ] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [ ] Tests For all the boxes checked, please include additional details of the changes made in this pull request. ## Testing Done <!--- Check any relevant boxes with "x" --> - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [ ] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request. # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [ ] Large PR broken into smaller PRs, and PR plan linked in the description. For all the boxes checked, include additional details of the changes made in this pull request. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Dropping a table currently goes through
catalog.loadTabletwice:TablesServiceImpl.deleteTable→findByIdparses metadata.json to build aTableDtofor the ACL check.OpenHouseInternalCatalog.dropTable→loadTable(id).location()parses metadata.json again to get the table base directory forfileIO.deletePrefixon purge.Both parses are redundant — everything drop needs (
databaseId,tableId,tableUUID, the metadata.json path) is already on the HTS row. The redundant parses also make drop fragile: if metadata is semantically invalid (e.g.Cannot find schema with current-schema-id=6 from schemas), drop fails even though the HTS row and storage files are still well-formed.This PR removes both
loadTablecalls from the drop path:OpenHouseInternalCatalog#findHouseTable(TableIdentifier)— HTS-only lookup that returns theHouseTablerow, no metadata parse.OpenHouseInternalRepository#findStubById(TableDtoPrimaryKey)— returns a partialTableDto(databaseId,tableId,tableUUID,tableLocation) built from the HTS row. Enough for the OPA auth check, which only uses db + UUID.TablesServiceImpl#deleteTableusesfindStubByIdinstead offindById.OpenHouseInternalCatalog#dropTableusesfindHouseTableand derives the storage base location as the parent directory of the metadata.json path — same derivationOpenHouseInternalRepositoryImpl#savealready uses in its replace flow, since OpenHouse writesmetadata.jsondirectly under<base>/(no/metadata/subdir).A side benefit: corrupted-metadata tables can now be dropped end-to-end through the normal API.
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.