[RTAS]: Fix bug - remove fs scheme from tableLocation in commit (cont) #594

Merged
cbb330 merged 3 commits into linkedin:main from jiang95-dev:lejiang/fix-rtas2
May 21, 2026

@jiang95-dev jiang95-dev commented May 21, 2026

Summary

This is a follow-up to #542. After that fix, we were able to replace the table once but not multiple times. Root cause analysis:

  1. OpenHouseInternalTableOperations uses metadata.location() as the new HTS tableLocation. That value comes from the .withLocation(tableLocation) call we make, so the location we pass there must be schemeless.
  2. We populated .withLocation(tableLocation) from tableDto.getTableVersion(), but the value in tableDto always contains the scheme. (Why is tableLocation equal to tableVersion in tableDto? Because tableVersion comes from the client request, and the client request reuses the tableLocation from the server response, so the two are always the same and both contain the scheme.)
  3. The last PR fixed the tableLocation in the table properties, so the new tableVersion for the first replace is schemeless. But the new tableLocation still has the scheme, so the next replace fails.

Therefore, this PR passes the tableLocation from the table properties, which the last PR already strips of its scheme, to .withLocation(tableLocation).
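To make "schemeless" concrete, here is a minimal sketch of what stripping the filesystem scheme from a location could look like. It is not the project's stripPathScheme implementation; the class name, the method name stripScheme, and the example hosts and paths are all hypothetical, and it assumes the location is a well-formed URI with no unescaped spaces.

```java
import java.net.URI;

// Hypothetical sketch, not the OpenHouse implementation.
final class SchemeStripSketch {
    private SchemeStripSketch() {}

    /**
     * Returns only the path component of a location, dropping any
     * scheme ("hdfs", "file") and authority ("namenode:9000").
     * Assumes the input is a syntactically valid URI; URI.create
     * throws IllegalArgumentException otherwise.
     */
    static String stripScheme(String location) {
        URI uri = URI.create(location);
        String path = uri.getPath();
        // getPath() is null for opaque URIs; fall back to the input unchanged.
        return path != null ? path : location;
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        System.out.println(stripScheme("hdfs://namenode:9000/data/openhouse/db1/tbl1"));
        System.out.println(stripScheme("/data/openhouse/db1/tbl1"));
    }
}
```

An already schemeless location passes through unchanged, so applying such a helper on every replace would be idempotent.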

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually tested on local docker setup. Please include commands run, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

Unit tests
I removed stripPathScheme from the Spark E2E tests to guarantee that scheme issues are caught by the unit tests.
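The kind of check such a test might make can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the sample locations are made up, and the actual assertions in the PR's tests are not shown in this description.

```java
import java.net.URI;

// Hypothetical sketch of a schemeless-location check for tests.
final class SchemelessCheckSketch {
    private SchemelessCheckSketch() {}

    /** True when the location carries no URI scheme such as "hdfs" or "file". */
    static boolean isSchemeless(String location) {
        return URI.create(location).getScheme() == null;
    }

    public static void main(String[] args) {
        // A test could fail fast if a committed location still has a scheme.
        String committedLocation = "/data/openhouse/db1/tbl1"; // made-up value
        if (!isSchemeless(committedLocation)) {
            throw new AssertionError("tableLocation still has a scheme: " + committedLocation);
        }
        System.out.println("location is schemeless");
    }
}
```

Running the check against the raw location, without normalizing it first in the test, is what lets a scheme regression surface.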

Test 1: RTAS

scala> spark.sql(s"CREATE TABLE $tableName TBLPROPERTIES ('prop1'='val1', 'prop2'='val2') AS SELECT * FROM $sourceName");
res3: org.apache.spark.sql.DataFrame = []                                       

scala> spark.sql(s"INSERT INTO $tableName values (4, 'd')");
res4: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"REPLACE TABLE $tableName PARTITIONED BY (part) TBLPROPERTIES ('prop1'='newval1', 'prop3'='val3') AS SELECT id, data, CASE WHEN (id % 2) = 0 THEN 'even' ELSE 'odd' END AS part FROM $sourceName ORDER BY 3, 1");
res5: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"REPLACE TABLE $tableName PARTITIONED BY (part) AS SELECT 2 * id as id, data, CASE WHEN ((2 * id) % 2) = 0 THEN 'even' ELSE 'odd' END AS part FROM $sourceName ORDER BY 3, 1");
res12: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"SELECT * FROM $tableName").show(false)
+---+----+----+
|id |data|part|
+---+----+----+
|2  |a   |even|
|4  |b   |even|
|6  |c   |even|
+---+----+----+

Test 2: CRTAS

scala> spark.sql(s"CREATE OR REPLACE TABLE $tableName TBLPROPERTIES ('prop1'='val1', 'prop2'='val2') AS SELECT * FROM $sourceName");
res19: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"INSERT INTO $tableName values (4, 'd')");
res20: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"CREATE OR REPLACE TABLE $tableName PARTITIONED BY (part) AS SELECT id, data, CASE WHEN id % 2 = 0 THEN 'even' ELSE 'odd' END AS part FROM $sourceName ORDER BY 3, 1");
res21: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"CREATE OR REPLACE TABLE $tableName PARTITIONED BY (part) AS SELECT 2 * id as id, data, CASE WHEN ((2 * id) % 2) = 0 THEN 'even' ELSE 'odd' END AS part FROM $sourceName ORDER BY 3, 1");
res22: org.apache.spark.sql.DataFrame = []

scala> spark.sql(s"SELECT * FROM $tableName").show(false)
+---+----+----+
|id |data|part|
+---+----+----+
|2  |a   |even|
|4  |b   |even|
|6  |c   |even|
+---+----+----+

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@jiang95-dev jiang95-dev requested review from cbb330 and rohitkum2506 and removed request for rohitkum2506 May 21, 2026 09:14
@cbb330 cbb330 merged commit fb0097c into linkedin:main May 21, 2026
1 check passed