Fix: dataset deletion cleanup gaps #21
Merged
Dataset deletion removed the feature type but left the layer's .r/.w ACL rules in GeoServer. Since the final table name becomes reusable once the row is deleted, a stale rule could silently apply to a future dataset published under the same name. Add GeoServerService.delete_layer_acl (best-effort, 404-tolerant) and call it from DatasetDeletionService next to delete_layer.
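A minimal sketch of what a best-effort, 404-tolerant ACL cleanup could look like. This is not the actual GeoServerService code: `http_delete` is a hypothetical stand-in for the service's HTTP client (it takes a URL and returns a status code), and the `/rest/security/acl/layers/<workspace>.<layer>.<mode>` path is assumed from GeoServer's layer ACL REST endpoint.

```python
from typing import Callable


def delete_layer_acl(
    http_delete: Callable[[str], int],  # hypothetical client: URL -> HTTP status
    base_url: str,
    workspace: str,
    layer: str,
) -> list[str]:
    """Try to remove the layer's read and write ACL rules.

    Never raises; returns the rule names that could not be removed.
    """
    failed = []
    for mode in ("r", "w"):
        rule = f"{workspace}.{layer}.{mode}"
        try:
            status = http_delete(f"{base_url}/rest/security/acl/layers/{rule}")
        except Exception:
            # Best-effort: a network error must not abort dataset deletion.
            failed.append(rule)
            continue
        # 404 means the rule is already gone, which is the desired end state.
        if status != 404 and not 200 <= status < 300:
            failed.append(rule)
    return failed
```

Returning the failed rule names instead of raising lets the caller log what is left over while the rest of the deletion proceeds.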
DAG deletion was guarded by the schedule still being set. When the schedule had been cleared before the dataset was deleted, the stale ingestion_<id> DAG (metadata + run history) stayed in Airflow forever. Call delete_dag unconditionally — it already treats 404 as success.
A staging_dag or process_dag run still in flight when the dataset was deleted would recreate the staging/final table after cleanup (its callbacks 404, leaving an orphaned table with no GS/GN artifacts). Force-fail all runs of the dataset first (best-effort): the scheduled ingestion DAG, process_dag runs (prefix '<id>_') and staging_dag runs (prefix '<id>' — the first staging run id is exactly the uuid).
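The run selection described above can be sketched as a pure function. The function name and the input shape (a mapping from DAG id to its known run ids) are assumptions for illustration; only the prefixes and the `ingestion_<id>` DAG name come from the commit message.

```python
def runs_to_force_fail(
    dataset_id: str, run_ids_by_dag: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Pick the runs belonging to one dataset across the three DAGs."""
    selected: dict[str, list[str]] = {}

    # The scheduled DAG is dedicated to the dataset, so all its runs qualify.
    ingestion_dag = f"ingestion_{dataset_id}"
    if run_ids_by_dag.get(ingestion_dag):
        selected[ingestion_dag] = list(run_ids_by_dag[ingestion_dag])

    # Shared DAGs: match on run id prefix. The staging prefix has no trailing
    # underscore because the first staging run id is exactly the uuid.
    prefixes = {"staging_dag": dataset_id, "process_dag": f"{dataset_id}_"}
    for dag_id, prefix in prefixes.items():
        matches = [r for r in run_ids_by_dag.get(dag_id, []) if r.startswith(prefix)]
        if matches:
            selected[dag_id] = matches
    return selected
```

Because dataset ids are fixed-length uuids, the bare `<id>` prefix cannot accidentally match a run belonging to another dataset.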
staging_dag/process_dag run records (with task instances and XComs
holding staging/final table names) outlived the dataset forever.
Add purge_dataset_dag_runs, matching run ids by SQL LIKE prefix
('<id>%' for staging_dag, '<id>_%' for process_dag), called best-effort
before the IntegrityLink row is deleted.
Known limitation: failed-run log files under /opt/airflow/logs live on
the Airflow volume and cannot be reached from the backend; success-run
logs are already removed by the DAG success callback.
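The LIKE-prefix matching above can be illustrated against a stand-in table. This sketch uses an in-memory SQLite `dag_run` table with only `dag_id` and `run_id` columns; the real purge runs against Airflow and also removes task instances and XComs, which are not modelled here. The function body is an assumption, not the PR's implementation.

```python
import sqlite3


def purge_dataset_dag_runs(conn: sqlite3.Connection, dataset_id: str) -> int:
    """Delete the dataset's staging_dag/process_dag runs; return the row count."""
    patterns = {
        "staging_dag": f"{dataset_id}%",   # the first run id is the bare uuid
        "process_dag": f"{dataset_id}_%",  # note: "_" is itself a LIKE wildcard
    }
    deleted = 0
    for dag_id, pattern in patterns.items():
        cur = conn.execute(
            "DELETE FROM dag_run WHERE dag_id = ? AND run_id LIKE ?",
            (dag_id, pattern),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted


def demo() -> tuple[int, list[tuple[str, str]]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag_run (dag_id TEXT, run_id TEXT)")
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?)",
        [
            ("staging_dag", "abc"),
            ("staging_dag", "abc_2"),
            ("staging_dag", "zzz"),
            ("process_dag", "abc_x"),
            ("other_dag", "abc_1"),
        ],
    )
    deleted = purge_dataset_dag_runs(conn, "abc")
    rows = conn.execute("SELECT dag_id, run_id FROM dag_run ORDER BY dag_id").fetchall()
    return deleted, rows
```

In LIKE, `_` matches any single character, so `'<id>_%'` means "the id followed by at least one more character". With uuid ids and the run-id scheme described above this selects the same rows as a literal underscore would.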
The per-org GeoServer workspace ({org}) and datastore ({org}_ds) and
the org PostgreSQL schema survived forever, even after the org's last
dataset was deleted.
When no IntegrityLink remains for the organization, delete the
datastore and workspace via raw non-recursive REST calls (GeoServer
refuses to delete non-empty resources, so anything still in use
survives; geoservercloud's own delete helpers hardcode recurse=true and
are deliberately avoided) and DROP SCHEMA ... RESTRICT the org schema.
The shared 'staging' and default 'data' schemas are never touched.
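A rough sketch of the org-level cleanup order described above. All names here are hypothetical: `http_delete` and `execute_sql` stand in for the real GeoServer and database clients, `is_shared_schema` is passed in as a callable, and the `?recurse=false` URLs are assumed from GeoServer's workspace and datastore REST endpoints.

```python
from typing import Callable


def cleanup_empty_org(
    org: str,
    schema: str,
    remaining_links: int,
    http_delete: Callable[[str], int],  # hypothetical: URL -> HTTP status
    execute_sql: Callable[[str], None],  # hypothetical: runs one statement
    is_shared_schema: Callable[[str], bool],
) -> list[str]:
    """Best-effort removal of per-org resources; returns what was removed."""
    if remaining_links > 0:
        return []  # the org still has datasets, leave everything in place

    removed = []
    workspace_url = f"/rest/workspaces/{org}"
    # Datastore first, then workspace, both non-recursive so that GeoServer
    # itself refuses to delete anything that is still in use.
    targets = (
        ("datastore", f"{workspace_url}/datastores/{org}_ds?recurse=false"),
        ("workspace", f"{workspace_url}?recurse=false"),
    )
    for label, url in targets:
        try:
            status = http_delete(url)
        except Exception:
            continue
        if status == 404 or 200 <= status < 300:
            removed.append(label)

    if not is_shared_schema(schema):
        quoted = '"' + schema.replace('"', '""') + '"'
        try:
            # RESTRICT makes PostgreSQL refuse if the schema still has objects.
            execute_sql(f"DROP SCHEMA {quoted} RESTRICT")
            removed.append("schema")
        except Exception:
            pass
    return removed
```

Both the non-recursive REST calls and `RESTRICT` push the "is it really empty?" decision to GeoServer and PostgreSQL, so a wrong guess by the backend fails safely instead of deleting live data.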
The never-drop guard hardcoded the 'staging' literal inside the deletion service; when get_staging_schema becomes configurable the guard would silently stop covering it. is_shared_schema now derives from get_staging_schema/DEFAULT_DATA_SCHEMA in core.config.
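One way the derived guard could look, as a sketch only. The names `get_staging_schema`, `DEFAULT_DATA_SCHEMA` and `is_shared_schema` come from the commit message; their bodies and the `STAGING_SCHEMA` environment variable are assumptions used to show a configurable value.

```python
import os

DEFAULT_DATA_SCHEMA = "data"


def get_staging_schema() -> str:
    # Hypothetical: read the staging schema name from the environment,
    # defaulting to the current literal.
    return os.environ.get("STAGING_SCHEMA", "staging")


def is_shared_schema(name: str) -> bool:
    """True for schemas shared across organizations, which are never dropped."""
    # Resolved on every call so the guard follows configuration changes.
    return name in {get_staging_schema(), DEFAULT_DATA_SCHEMA}
```

Keeping the guard next to the schema configuration means a renamed staging schema stays protected without touching the deletion service.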
This was referenced Jun 3, 2026
tonio (Contributor) approved these changes on Jun 3, 2026 and left a comment:
Nice, I wouldn't have guessed there were so many things to clean up!
Collaborator
Author
I was the first to be surprised!
Audit of what each action creates vs. what deletion cleaned up — the gaps found on the deletion path, fixed:
- Deletion left the layer's .r/.w ACL rules in GeoServer. Since final_table_name is reused by get_available_table_name, old rules could silently apply to a future dataset. delete_layer_acl is now called alongside delete_layer (best-effort, 404-tolerant).
- A staging_dag or process_dag run still in flight at deletion time could recreate tables after cleanup. cancel_ingestion_dag now force-fails the scheduled ingestion DAG, process_dag and staging_dag runs first.
- The ingestion_<id> DAG was only deleted when a schedule was still set at deletion time. Deletion now always calls delete_dag (404-tolerant).
- staging_dag/process_dag run records (dag runs, task instances, XComs carrying table names) were never removed. purge_dataset_dag_runs deletes them via run_id_pattern LIKE before the IntegrityLink row is deleted.
- The per-org GeoServer workspace {org}, datastore {org}_ds and the org PostgreSQL schema outlived all datasets forever. After the last dataset of an org is deleted, the service attempts (best-effort) non-recursive REST deletes (GeoServer refuses when the resource is non-empty, so a failed attempt is harmless) and DROP SCHEMA … RESTRICT (skipped for the shared staging/data schemas; the guard is co-located with the schema config).