Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reschedule pending tasks during startup #168

Merged
merged 28 commits into from
Jun 22, 2022
Merged

Conversation

adzialocha
Copy link
Member

@adzialocha adzialocha commented Jun 17, 2022

Summary

  • New internal broadcast channel inside of Factory to send status updates ("pending", "completed") of tasks to external subscribers, they can subscribe via the on_update method
  • Materializer service ..
    • .. subscribes to these status updates and accordingly inserts pending tasks into a new tasks SQL table, and removes them again if they are "completed"
    • .. asks the database about pending queries once during startup and dispatches them so they get "rescheduled"

Notes

  • Added a nice e2e test for the materializer service (woho test-utils ✨ )
  • I've changed the GraphQL mutation now to not fail on SendError. It made testing a little bit annoying when you forgot to not drop your subscribers on the broadcast channel
  • Stumbled a bit over to_string by causing bugs with it and added a note about it here

Closing #162

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@adzialocha adzialocha changed the base branch from main to development June 17, 2022 11:51
@adzialocha adzialocha linked an issue Jun 17, 2022 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Jun 17, 2022

Codecov Report

Merging #168 (5117402) into development (72df98b) will increase coverage by 0.54%.
The diff coverage is 94.73%.

@@               Coverage Diff               @@
##           development     #168      +/-   ##
===============================================
+ Coverage        88.79%   89.34%   +0.54%     
===============================================
  Files               42       43       +1     
  Lines             2302     2486     +184     
===============================================
+ Hits              2044     2221     +177     
- Misses             258      265       +7     
Impacted Files Coverage Δ
aquadoggo/src/db/provider.rs 87.50% <ø> (ø)
aquadoggo/src/db/stores/test_utils.rs 98.97% <ø> (ø)
aquadoggo/src/db/utils.rs 99.52% <ø> (ø)
aquadoggo/src/materializer/input.rs 100.00% <ø> (ø)
aquadoggo/src/db/stores/task.rs 91.52% <91.52%> (ø)
aquadoggo/src/materializer/service.rs 94.44% <95.77%> (+94.44%) ⬆️
aquadoggo/src/materializer/worker.rs 90.45% <97.22%> (+2.59%) ⬆️
aquadoggo/src/graphql/client/mutation.rs 93.10% <100.00%> (+0.12%) ⬆️
aquadoggo/src/materializer/tasks/dependency.rs 98.26% <100.00%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 72df98b...5117402. Read the comment docs.

@adzialocha adzialocha changed the title Reschedule tasks Reschedule pending tasks during startup Jun 17, 2022
@adzialocha adzialocha marked this pull request as ready for review June 17, 2022 17:13
Copy link
Member

@cafca cafca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool 😎 👍

Comment on lines 25 to 42
let task_row = query_as::<_, TaskRow>(
"
SELECT
name,
document_id,
document_view_id
FROM
tasks
",
)
.fetch_optional(&self.pool)
.await
.map_err(|err| SqlStorageError::Transaction(err.to_string()))?;

// If yes, we are already done here
if task_row.is_some() {
return Ok(());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could spare this query with a unique constraint on the tasks table couldn't we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, No 😅 When the pending tasks are loaded into the queue during rescheduling, they are not being removed from the database (yet). The worker itself will fire on_update with a new TaskStatus::Pending as soon as it got queued, which leads to a new row in the tasks table and ultimately to a duplicate.

The worker itself makes sure that no duplicates exist in the queues, but thats not the case with the database, this is why I've added that check before insertion.

I guess one could take the rows out of the database after we run get_tasks and before we queue them up. Or we need a new Factory::reschedule method which is used for rescheduling tasks which should lead to not firing TaskStatus::Pending? 🤔

DELETE FROM
tasks
WHERE
name IS $1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen IS used for comparison and couldn't find anything in the pg manual. Does it actually work? The standard operator is = and I only know IS as part of sth like IS DISTINCT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know! Funnily it didn't work when I did = .. and using IS it suddenly did. Maybe because there are NULL values involved?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, made some experiments, and this works (I changed name to =, the others stay with IS):

DELETE FROM
  tasks
WHERE
  name = $1
  AND document_id IS $2
  AND document_view_id IS $3

And changing all to = fails, as document_id and document_view_id can be NULL and we need to exactly match that to fullfil the query.

aquadoggo/src/materializer/worker.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/worker.rs Outdated Show resolved Hide resolved
Copy link
Member

@sandreae sandreae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey hey! Great work 👍 this feels like an elegant/simple (as possible) solution to a tricky problem. Nice to have it in place already!

I had one comment on a method name but the rest is goood.

aquadoggo/src/materializer/worker.rs Outdated Show resolved Hide resolved
name TEXT NOT NULL,
document_id TEXT NULL,
document_view_id TEXT NULL,
PRIMARY KEY (name, document_id, document_view_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just stumbling about this primary key with null columns. If I understand correctly this is actually only possible in Sqlite:

According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a bug in some early versions, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY or the table is a WITHOUT ROWID table or a STRICT table or the column is declared NOT NULL, SQLite allows NULL values in a PRIMARY KEY column. SQLite could be fixed to conform to the standard, but doing so might break legacy applications. Hence, it has been decided to merely document the fact that SQLite allows NULLs in most PRIMARY KEY columns.

https://www.sqlite.org/lang_createtable.html

@adzialocha
Copy link
Member Author

Beautiful! Thats a very elegant solution, thanks @cafca for the improvements! And I'm learning something new about SQL again.

@adzialocha adzialocha merged commit a4acac6 into development Jun 22, 2022
adzialocha added a commit that referenced this pull request Jun 27, 2022
* main:
  Update README.md
  UPSERT documents (#173)
  Run tests against PostgreSQL, fix compatibility (#170)
  End-to-end publishEntry tests (#167)
  Reschedule pending tasks during startup (#168)
@adzialocha adzialocha deleted the reschedule-tasks branch June 29, 2022 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-schedule unmaterialized operations during startup
3 participants