Bug: PostgreSQL storage crashes when updating the trial status #999

@motus

Description

Our current implementation allows repeated inserts with the same key. In databases like MySQL and DuckDB, that does not cause an error; instead, the insert statement returns a special value. Our existing code depends on such return values and does not expect an exception to be thrown. We need to handle this case so that it works for all DBs, including Postgres.
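
One possible way to make the duplicate-key case behave the same across backends is to treat the constraint violation as an expected outcome rather than relying on a return value. The following is a minimal sketch of that idea, not the MLOS implementation: it uses the standard-library sqlite3 module as a stand-in for the real storage backend, the helper name insert_status is hypothetical, and the table only mirrors the columns and unique key visible in the logs. With SQLAlchemy, the exception to catch would presumably be sqlalchemy.exc.IntegrityError instead of sqlite3.IntegrityError.

```python
import sqlite3
from datetime import datetime, timezone


def insert_status(conn, exp_id, trial_id, ts, status):
    """Insert a status row; return False if the same key is already stored.

    Hypothetical helper for illustration only.
    """
    try:
        # `with conn` commits on success and rolls back on an exception,
        # so a rejected insert does not leave an open, failed transaction.
        with conn:
            conn.execute(
                "INSERT INTO trial_status (exp_id, trial_id, ts, status) "
                "VALUES (?, ?, ?, ?)",
                (exp_id, trial_id, ts.isoformat(), status),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate (exp_id, trial_id, ts): report it instead of crashing.
        return False


# Stand-in schema: same columns and unique key as in the error message.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trial_status ("
    "exp_id TEXT, trial_id INTEGER, ts TEXT, status TEXT, "
    "UNIQUE (exp_id, trial_id, ts))"
)

ts = datetime(2025, 9, 19, 20, 20, 34, 851653, tzinfo=timezone.utc)
first = insert_status(conn, "tqp-local-002", 1, ts, "FAILED")
second = insert_status(conn, "tqp-local-002", 1, ts, "FAILED")
row_count = conn.execute("SELECT COUNT(*) FROM trial_status").fetchone()[0]
```

Here the second call hits the unique constraint and returns False, and the table still holds a single row. Rolling back the failed insert before issuing further statements is a deliberate choice in this sketch, since some databases refuse further commands in a transaction after an error.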

Logs:

2025-09-19 13:20:34,922 base_storage.py:677 update_telemetry INFO Store telemetry: tqp-local-002:1:1:1 :: Status.FAILED 0 records
2025-09-19 13:20:35,131 base_storage.py:644 update INFO Store trial: tqp-local-002:1:1:1 :: Status.FAILED None
2025-09-19 13:20:35,234 trial.py:243 _update_status WARNING Status with that timestamp already exists: tqp-local-002:1:1:1 2025-09-19 20:20:34.851653+00:00 :: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "trial_status_exp_id_trial_id_ts_key"
DETAIL:  Key (exp_id, trial_id, ts)=(tqp-local-002, 1, 2025-09-19 20:20:34.851653+00) already exists.

[SQL: INSERT INTO trial_status (exp_id, trial_id, ts, status) VALUES (%(exp_id)s, %(trial_id)s, %(ts)s, %(status)s)]
[parameters: {'exp_id': 'tqp-local-002', 'trial_id': 1, 'ts': datetime.datetime(2025, 9, 19, 20, 20, 34, 851653, tzinfo=<UTC>), 'status': 'FAILED'}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

Stack trace:

  File "/Users/sergiym/devel/MLOS/mlos_bench/mlos_bench/schedulers/trial_runner.py", line 221, in run_trial
    trial.update(status, timestamp, results)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sergiym/devel/MLOS/mlos_bench/mlos_bench/storage/sql/trial.py", line 122, in update
    cur_status = conn.execute(
        self._schema.trial.update()
    ...<16 lines>...
        )
    )
