Always recreate Hive metastore_db to fix TABLE_NOT_FOUND on repeated runs#597

Closed
charles-typ wants to merge 1 commit into facebookresearch:v2-beta from charles-typ:export-D102699392-to-v2-beta

Conversation

@charles-typ (Contributor) commented:

Summary:
Symptom: SparkBench via automark reported "No known db metrics found for
spark_standalone_remote" — the benchmark produced architectural metrics
(mpstat, perf-stat, etc.) but no execution-time results.

Error: Spark query log showed:
  [TABLE_OR_VIEW_NOT_FOUND] The table or view `table_SPdBtF_Y9di_...`
  cannot be found.

Analysis (traced through three layers):

1. perfpub couldn't find metrics — the metrics JSON only contained
   worker_cores/worker_memory, no execution_time_test_93586-stage-* keys.

2. Spark SQL queries failed with TABLE_OR_VIEW_NOT_FOUND because the Hive
   tables were never registered in the metastore.

3. create_tables.sql was silently failing — install_database() in runner.py
   used os.popen(), which doesn't check return codes, so the error was
   swallowed. The actual error in create_tables.log was:
     [LOCATION_ALREADY_EXISTS] Cannot name the managed table as `...`,
     as its associated location
     'file:/flash23/warehouse/bpc_t93586_s2_synthetic.db/...'
     already exists. SQLSTATE: 42710
   Spark 4.0 is stricter than 2.x — it refuses CREATE TABLE when the
   warehouse directory already has data from a prior run, even after
   DROP TABLE IF EXISTS clears the metastore entry.
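The silent-failure half of step 3 is easy to reproduce: os.popen() returns a pipe and discards the command's exit status unless the caller remembers to inspect the return value of .close(). A minimal sketch of the safer pattern (command and log path are illustrative, not the actual runner.py code):

```python
import subprocess

def run_sql_script(cmd: list[str], log_path: str) -> None:
    """Run a command, capture its output in a log file, and fail loudly.

    Unlike os.popen(), subprocess.run(check=True) raises
    CalledProcessError as soon as the command exits non-zero,
    so the failure cannot be silently swallowed.
    """
    with open(log_path, "w") as log:
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)
```

With check=True, a failing create_tables.sql run would have surfaced immediately instead of leaving the error buried in create_tables.log.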

Root cause: install_database() skipped table re-creation when metastore_db/
and the warehouse .db/ directory existed from a prior run. When it didn't
skip (e.g. after manual cleanup of metastore_db), it still failed because
Spark 4.0 rejects CREATE TABLE on a pre-existing warehouse directory.

Fix: Always remove both metastore_db/ and the warehouse .db/ directory
before running install. The source-dataset download is independently
guarded, so this only adds ~30s of CREATE TABLE DDL overhead per run.
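The fix amounts to an unconditional cleanup before install. A rough sketch of the idea (function and parameter names are hypothetical, not the actual install_database() change):

```python
import shutil
from pathlib import Path

def reset_local_spark_state(metastore_db: Path, warehouse_db_dir: Path) -> None:
    """Delete the Derby metastore and the warehouse .db directory.

    Removing both means every run re-registers its tables from scratch,
    so neither a stale metastore entry nor a leftover warehouse directory
    can trip Spark 4.0's LOCATION_ALREADY_EXISTS check.
    """
    for path in (metastore_db, warehouse_db_dir):
        shutil.rmtree(path, ignore_errors=True)  # a missing dir is not an error
```

Because the source-dataset download sits behind its own guard, only the cheap CREATE TABLE DDL is repeated on each run.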

Differential Revision: D102699392

meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed) on Apr 28, 2026.
meta-codesync Bot commented Apr 28, 2026:

@charles-typ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102699392.

meta-codesync Bot pushed a commit that referenced this pull request Apr 28, 2026
…runs (#597)

Summary:
Pull Request resolved: #597

Reviewed By: gandhijayneel

Differential Revision: D102699392

fbshipit-source-id: 104c6ef8dcdff0b501ba206c46320c6f2f3f45f5

Labels

CLA Signed, fb-exported, meta-exported
