Unexpected data loss modifying table #1481
Details for the issue
What did you do?
I added Not Null & Unique to the primary keys of some tables with the "Database structure" "Modify table" GUI of DB4S.
What did you expect to see?
Just the attributes changed
What did you see instead?
Entire dependent table emptied. (!)
Here is what I think happened:
The dependent table has a number of fields that use FOREIGN KEY clauses to reference the tables where I added "Not Null" and "Unique". The "references" clauses also include ON DELETE CASCADE actions.
The (clever DB4S) implementation of checking "NN" and "U" is that the table is copied to a temporary table; the original is DROPped and the temporary table is renamed to replace the original. (This does take a while with a several-million-row table...)
The DROP caused an implicit DELETE of all rows, triggering the dependent table's ON DELETE CASCADE clause and emptying it.
This all makes sense at the statement level - but is unexpected in the GUI.
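The failure can be reproduced outside the GUI with a minimal sketch (Python's sqlite3 module; the table and column names here are invented for illustration, not taken from the actual database):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
            "parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE)")
con.execute("INSERT INTO parent VALUES (1)")
con.execute("INSERT INTO child VALUES (1, 1)")

# The copy/drop/rename sequence used to add the NOT NULL/UNIQUE attributes:
con.execute("CREATE TABLE parent_new (id INTEGER PRIMARY KEY NOT NULL UNIQUE)")
con.execute("INSERT INTO parent_new SELECT * FROM parent")
con.execute("DROP TABLE parent")  # implicit DELETE fires ON DELETE CASCADE
con.execute("ALTER TABLE parent_new RENAME TO parent")

print(con.execute("SELECT COUNT(*) FROM child").fetchone()[0])  # 0: child is empty
```

With foreign key enforcement on, the DROP's implicit DELETE runs the CASCADE action before the replacement table is renamed into place, so the dependent rows are gone even though the parent's contents are fully restored.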
You appear to be trying to handle this by bracketing the sequence with
PRAGMA defer_foreign_keys = 1; ... PRAGMA defer_foreign_keys = 0;
However, this doesn't prevent the implicit DELETEs. I suspect that the (untested) easy fix is to bracket the sequence with PRAGMA foreign_keys = 0; ... PRAGMA foreign_keys = 1; instead.
Since you are replicating the existing table's contents, the result should not cause any (new) foreign key constraint violations.
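A sketch of that suggested fix (again with invented table names; this is my untested assumption about the approach, not the actual DB4S code):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
            "parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE)")
con.execute("INSERT INTO parent VALUES (1)")
con.execute("INSERT INTO child VALUES (1, 1)")

con.execute("PRAGMA foreign_keys = OFF")   # must be issued outside any transaction
con.execute("CREATE TABLE parent_new (id INTEGER PRIMARY KEY NOT NULL UNIQUE)")
con.execute("INSERT INTO parent_new SELECT * FROM parent")
con.execute("DROP TABLE parent")           # no implicit DELETE while FKs are off
con.execute("ALTER TABLE parent_new RENAME TO parent")
con.execute("PRAGMA foreign_keys = ON")

print(con.execute("SELECT COUNT(*) FROM child").fetchone()[0])  # 1: rows survive
```

With enforcement off, the DROP performs no implicit DELETE; running PRAGMA foreign_key_check afterwards would confirm the schema change introduced no new violations.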
Here is an abbreviated copy of the SQL log from one such sequence:
Useful extra information
The info below often helps; please fill it out if you're able to. :)
What operating system are you using?
What is your DB4S version?
Did you also
Also, as noted, the copy/drop/rename sequence can be quite slow on a large table. And if you hit the wrong box, or change your mind, you wait for each click...
I don't see why the action is triggered on each click of a checkbox. It would be better to aggregate all changes and apply them when OK is clicked.
If the concern is that the GUI reflect the DB state, you could display modified items in a different color, footnoted as "pending".
This is less severe than the data loss, but still annoying. (The copy/drop/rename time is measured in minutes for a modest 2M row table...).
I can't speak to the SQL history. I can speak to my experience.
I believe that the code you referenced is successful in solving a different, but related problem.
By enabling "defer_foreign_keys", it prevents the DROP from failing, since consistency rules are applied at COMMIT time - and by then, everything is back in place.
This doesn't address my case - the DROP creates an implicit DELETE, which runs the ON DELETE CASCADE clauses, deleting rows in the dependent table. In my case, all of them.
With respect to these issues, defer_foreign_keys is a subset of the required effect. It will prevent the DROP from failing. It won't prevent the DROP from cascading DELETEs to the tables that refer to the dropped table via foreign keys.
Setting foreign_keys to zero addresses both scenarios.
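To make the distinction concrete, here is a sketch showing that defer_foreign_keys leaves the cascade in place (invented schema, Python's sqlite3 module):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT below
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
            "parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE)")
con.execute("INSERT INTO parent VALUES (1)")
con.execute("INSERT INTO child VALUES (1, 1)")

con.execute("PRAGMA defer_foreign_keys = 1")  # defers violation *checks* to COMMIT...
con.execute("BEGIN")
con.execute("CREATE TABLE parent_new (id INTEGER PRIMARY KEY)")
con.execute("INSERT INTO parent_new SELECT * FROM parent")
con.execute("DROP TABLE parent")  # ...but the implicit DELETE still runs the CASCADE
con.execute("ALTER TABLE parent_new RENAME TO parent")
con.execute("COMMIT")

print(con.execute("SELECT COUNT(*) FROM child").fetchone()[0])  # 0: still emptied
```

The COMMIT succeeds because no violation remains to be checked - the dependent rows that would have violated the constraint were already cascaded away.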
However, the comments indicate that you can't disable foreign keys because you're in a transaction. That means you either have to commit the transaction; or roll it back, change the table, open a new transaction, and replay the rolled-back statements; or defer these actions until the transaction is committed.
Note that the performance issue of doing the copy/drop/rename sequence for every checkbox click militates toward deferral for other reasons...
The fact that a single click can cause (silent) data loss is worrisome. Also, that the natural reaction to clicking a checkbox by mistake is to click it again - but while that restores the checkbox, it doesn't restore the deleted records.
It occurred to me that until you come up with and implement a transparent solution, you might want to disable anything that does an implicit DROP if foreign keys are enabled and any table has an ON clause associated with a foreign key. Note that ON DELETE SET NULL and ON DELETE SET DEFAULT can cause as much damage as CASCADE. While less obvious than emptying an entire table, losing references corrupts a database.
If you were to implement this, the user can still do anything by manually disabling foreign keys and applying any pending changes first.
While passing responsibility to the user is ugly, this seems preferable to allowing silent data loss/corruption...
The only solution I see is to open a new transaction just for the table modification. It's the only way to disable the foreign_key pragma. This will force saving the changes before the table modification and after. I'm trying to implement this following the recommendations in:
added a commit (Aug 7, 2018)
referenced this issue (Aug 7, 2018)
I've opened a PR for this, since it has a drawback (the changes must be saved before and after the table modification for the pragma foreign_keys change to take effect). I'm also unsure whether the use of the pragma defer_foreign_keys should be removed; it is still there in the source code.
I don't think you need to prompt the user unless there are pending changes to commit. I didn't see a test for that in your commit.
If foreign_keys is OFF, defer_foreign_keys is irrelevant, so unless it's reachable in another path through the code, I think it can be removed.
lang_altertable seems to be the right reference. The remark "can make other arbitrary changes to the format of a table using a simple sequence of operations" seems a bit, er, oversimplified :-)
FWIW - a few random thoughts. I haven't read the DB4S code, so perhaps you handle some or all of them already.
There seem to be a number of potential failure cases that might cause problems:
At one point, I'm pretty sure that DROP TABLE was considered a DDL change and could not be rolled back.
On the other hand, this simple test seems to say that it (sometimes) can:
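Such a test might look like this (a minimal sketch using Python's sqlite3 module; not the verbatim original):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t VALUES (42)")

con.execute("BEGIN")
con.execute("DROP TABLE t")
con.execute("ROLLBACK")  # the DROP is undone along with the rest of the transaction

print(con.execute("SELECT x FROM t").fetchone()[0])  # 42: table and row are back
```

On this build, the table and its contents reappear after the ROLLBACK - but that is exactly the "sometimes" that needs confirming across versions.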
It may be worth verifying with the SQLite folks that a DROP is (now) something that can always be rolled back if something goes wrong. If so, they should update their documentation. And if I'm correct that this has not always been the case, find out which version number you need to check for before relying on it.
Considering the complexity, this would seem to be a good time to add "backup database" to the "Tools" menu. (I expose the built-in backup in my applications - it's quite fast.) You may want to offer it before executing the complex metadata changes.
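For reference, the built-in online backup is exposed in Python's sqlite3 as Connection.backup (Python 3.7+); a minimal sketch (the destination here is in-memory purely for illustration - a real backup would open a file):

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x)")
src.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
src.commit()

# In-memory destination for the demo; in practice this would be a file path.
dest = sqlite3.connect(":memory:")
src.backup(dest)  # wraps the SQLite online backup API

print(dest.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 3: full copy
```

Because the backup API copies pages while the source stays usable, offering it just before a complex metadata change would cost little time even on large databases.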
added a commit (Aug 9, 2018)
We'll still need to figure out the rollback of a DROP though. With the current approach we just assume that works...
Regarding the other points:
We check for errors when dropping the table. If there is an error you'll get a message box and we roll back.
I noticed that too. We'll need to rewrite parts of the alter table code anyway to allow multiple changes at once without copying the entire table every time. It makes sense to keep that second procedure in mind when doing this.
It is. Especially for triggers it's very hard because they can contain all sorts of SQL statements with references to a table and its structure. We currently just add the triggers as they were before and hope for the best. If there is an error we show that to the user along with the problematic SQL statement and tell them to fix that statement.
Views are as hard as triggers. But as far as I remember views persist even if the referenced table is deleted. So we don't even know what view is affected by the changes and can't even tell the user about that. And that is for all statements, even simple non-WITH ones.
@mgrojo is doing that in his PR, so it should be fine.
That's a good point. We might want to look into that. Thanks