Handling of "violates foreign key constraint" exception still not working correctly #831
Comments
My updated version of @kderme's test is as follows:

```haskell
insertForeignKeyMissing :: IO ()
insertForeignKeyMissing = do
    time <- getCurrentTime
    runDbNoLogging $ do
      slid <- insertSlotLeader testSlotLeader
      bid <- insertBlockChecked (blockZero slid)
      txid <- insertTx (txZero bid)
      phid <- insertPoolHash poolHash0
      pmrid <- insertPoolMetadataRef $ poolMetadataRef txid phid
      let fe = poolOfflineFetchError phid pmrid time

      void . insertAbortForeignKey nullPrint . void $ insertPoolOfflineFetchError fe
      count0 <- poolOfflineFetchErrorCount
      assertBool (show count0 ++ "/= 1") (count0 == 1)

      -- Delete the foreign key.
      delete pmrid
      -- Following insert should fail internally, but the exception should be caught
      -- and swallowed.
      void . insertAbortForeignKey nullPrint . void $ insertPoolOfflineFetchError fe
      count1 <- poolOfflineFetchErrorCount
      assertBool (show count1 ++ "/= 0") (count1 == 0)

      -- Try again to insert the PoolOfflineFetchError
      pmrid2 <- insertPoolMetadataRef $ poolMetadataRef txid phid
      void . insertAbortForeignKey nullPrint .
        void $ insertPoolOfflineFetchError (poolOfflineFetchError phid pmrid2 time)
      count2 <- poolOfflineFetchErrorCount
      assertBool (show count2 ++ "/= 1") (count2 == 1)
  where
    nullPrint :: Text -> IO ()
    nullPrint = const $ pure ()

    poolOfflineFetchErrorCount :: MonadIO m => ReaderT SqlBackend m Int
    poolOfflineFetchErrorCount = do
      ls :: [Entity PoolOfflineFetchError] <- selectList [] []
      pure $ length ls
```

This now fails with:
which occurs on the line:
The problem is that the PostgreSQL row identifiers
This particular test can be fixed by adding a second call to
@kderme's PR #829 adds exception handlers in other places, but because of the transaction abort issue, I do not think these are useful. They will catch and log the exception, but then cause problems elsewhere due to the data lost when the transaction was aborted. |
I have tried adding a |
The problem is the exception. As soon as that happens we are in a world of hurt. I think the solution might be a custom insert function (something like |
Can I just say again, publicly, how much I hate exceptions. |
A couple of possible "simple solutions":
|
I noticed we already do a commit per block. So this won't really have a huge performance impact. |
Sorry, that is not correct. We do a commit per block when we are following the chain, but only commit once every 1000 or so blocks when syncing (because committing on every block is a huge performance hit). |
Where is the 1000 specified? |
Sorry, mis-remembered the
In
so that is once per epoch. In
which is a commit on every new block when we are following the chain tip, but not when syncing. |
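Purely as an illustration (hypothetical helper names, not the db-sync code), the batched-commit behaviour while syncing looks roughly like this, assuming persistent's `transactionSave` performs the commit:

```haskell
import Control.Monad (when)
import Control.Monad.IO.Class (MonadIO)
import Control.Monad.Trans.Reader (ReaderT)
import Data.Foldable (for_)
import Database.Persist.Sql (SqlBackend, transactionSave)

-- Hypothetical helper: while bulk syncing, commit only every 1000th block.
-- 'insertBlockRows' stands in for whatever function inserts one block's rows;
-- when following the chain tip the caller would instead run one block per
-- transaction, giving a commit per block.
insertBatched
    :: MonadIO m
    => (block -> ReaderT SqlBackend m ())
    -> [block]
    -> ReaderT SqlBackend m ()
insertBatched insertBlockRows blocks =
  for_ (zip [1 :: Int ..] blocks) $ \(n, blk) -> do
    insertBlockRows blk
    when (n `mod` 1000 == 0) transactionSave  -- batch commit while syncing
```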
There is a new transaction for every |
This code has changed so many times it's confusing. However, the code is:

```haskell
insertDefaultBlock
    :: SqlBackend -> Trace IO Text -> SyncEnv -> [BlockDetails]
    -> IO (Either SyncNodeError ())
insertDefaultBlock backend tracer env blockDetails = do
    thisIsAnUglyHack tracer (envLedger env)
    DB.runDbIohkLogging backend tracer $
      runExceptT (traverse_ insert blockDetails)
```

So the commit is done for every call to
However, this does mean that some of the other |
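For context, persistent's `runSqlConn`-style wrappers run the whole action inside a single transaction and commit when the action finishes, which is why each call to `insertDefaultBlock` above amounts to one commit. A minimal sketch of that wrapper pattern (not the actual `DB.runDbIohkLogging`):

```haskell
import Control.Monad.Logger (LoggingT, runStdoutLoggingT)
import Control.Monad.Trans.Reader (ReaderT)
import Database.Persist.Sql (SqlBackend, runSqlConn)

-- Sketch only: runSqlConn begins a transaction, runs the whole action, and
-- commits once at the end (rolling back if the action throws), so one call
-- to a wrapper like this is one commit.
runDbLogging :: SqlBackend -> ReaderT SqlBackend (LoggingT IO) a -> IO a
runDbLogging backend action = runStdoutLoggingT (runSqlConn action backend)
```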
Actually we don't call
That's rare enough to add a commit. If we add a 'transactionCommit' first thing in |
But this is all getting away from the problem. The problem is the exception. We should be finding a way to avoid it rather than trying to fix the issues resulting from the exception. |
Well for the problem I think we need to go back to #806 (comment). We have a separate thread which collects |
Also I don't think it's the exception that causes the issue. Postgres aborts the transaction and returns an error code. The error code is translated to an exception in postgresql-simple. We catch the exception, but the transaction is still aborted in postgres, which causes the error |
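A minimal standalone sketch of that behaviour (hypothetical table names, using postgresql-simple directly rather than the db-sync code): even with the `SqlError` caught, every later statement on the same connection fails until the transaction ends.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (try)
import Data.Int (Int64)
import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.Transaction (begin, commit)

-- Hypothetical demo (table names made up): catching the SqlError does not
-- un-abort the transaction on the postgres side.
demoAbortedTransaction :: Connection -> IO ()
demoAbortedTransaction conn = do
  begin conn
  -- This insert violates a foreign key constraint; postgres aborts the whole
  -- transaction and postgresql-simple surfaces the error code as a SqlError.
  res <- try (execute_ conn "INSERT INTO child (parent_id) VALUES (42)")
           :: IO (Either SqlError Int64)
  print res
  -- The exception was caught above, but the transaction is already aborted, so
  -- this (and every other statement until COMMIT/ROLLBACK) fails with:
  --   "current transaction is aborted, commands ignored until end of transaction block"
  _ <- query_ conn "SELECT 1" :: IO [Only Int]
  commit conn  -- postgres treats COMMIT of an aborted transaction as ROLLBACK
```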
But restarting the transaction would require a roll forward of all the insertions that were aborted. That would be a huge bookkeeping nightmare.
No, the separate thread is there to remove the HTTP fetch from the main code path. We need that thread because each HTTP fetch takes at least 10 seconds (and fetches that fail can take 30 seconds or more). I still think the cleanest and nicest solution is to avoid the exception, even if that means that when we insert the
I think we can write a custom insert that first checks if the foreign key is valid and, only if it is, does the regular insert; if the foreign key is invalid, it does nothing. |
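A minimal sketch of that idea, assuming persistent's standard `get` and `insert` (the function name and exact types here are illustrative, not the eventual db-sync API):

```haskell
import Control.Monad (void)
import Control.Monad.IO.Class (MonadIO)
import Control.Monad.Trans.Reader (ReaderT)
import Database.Persist (get, insert)
import Database.Persist.Sql (SqlBackend)
-- PoolMetadataRefId and PoolOfflineFetchError come from the project's schema module.

-- Illustrative only: insert a PoolOfflineFetchError only if the PoolMetadataRef
-- it references still exists; otherwise drop the data (it will be refetched later).
insertIfForeignKeyExists
    :: MonadIO m
    => PoolMetadataRefId
    -> PoolOfflineFetchError
    -> ReaderT SqlBackend m ()
insertIfForeignKeyExists pmrid fetchError = do
    mRef <- get pmrid                     -- look up the referenced row first
    case mRef of
      Nothing -> pure ()                  -- foreign key missing: do nothing
      Just _  -> void (insert fetchError) -- foreign key present: normal insert
```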
Yes, I'm starting to agree: handling the exception properly through all the layers (persistent, postgresql-simple) would be a nightmare.
Actually, thinking about this again, there won't be any race condition. There is no other thread inserting into the db or doing rollbacks, so checking before inserting is 100% reliable. |
Threads are hard, but I like threads. Exceptions I just hate. |
Fix here: #835 |
FYI it happened again today on multiple instances running this version at block 6266650. Here's the log in case it might help:
|
Yes @1000101, that version is known to still have this issue. |
This avoids the race condition described in ticket #806 below. It does so by ensuring the required foreign keys exist before the insert. If the required foreign keys do not exist, the data is just dropped and will be refetched later.

Cherry pick of d658f38 from master.

Closes: #806
Closes: #823
Closes: #831
The supposed "fix" in a105cb5 was not actually a fix. @kderme has a PR (#829) with a correction and a test, but even that is not sufficient.
This issue is more complex. I updated @kderme's test and it seems that if an exception occurs, the whole transaction is aborted.