New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better async job status handling #2851
Conversation
Thanks for the pull request! Here is what will happen next:
Thank you for contributing! |
ee90df9
to
1357ae1
Compare
LIMIT $1 | ||
); | ||
END | ||
$$ LANGUAGE plpgsql VOLATILE"#, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these functions can be implemented in SQL rather than PL/pgSQL. Functions that are implemented in SQL are "transparent" to the planner, allowing it to optimize execution in ways that it can't when given the "black box" of a PL/pgSQL function.
Granted, for these simple functions, that's likely not going to make a huge impact, but always giving the system more information is probably a good habit to get into. As a nice bonus, you'll also get to drop the BEGIN
and END
keywords, so you get shorter functions 👅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah these are super simple functions that can go either way - we should look at a high level at our PL/pgSQL vs. SQL usage. PL/pgSQL seems to offer much more flexibility, including the ability to trap errors, better performance with plan caching when things don't get inlined, etc. It would be nice to just pick one language and stick with it so we can copy/paste more easily and use the patterns for that language. I've leaned toward pgSQL since that's what the vast bulk of our code is atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the to see as little PL/pgSQL as possible. Not such a big deal for these functions but removing as much complexity from the data layer even if it sacrifices some sugar is a +. The only exception is where there are blatant perf improvements to the PL/pgSQL route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're wrong about the PL/pgSQL planner conversation. Each function gets each statement automatically made into a prepared one, and then the actual execution plan gets cached and re-used. It's the most efficient thing possible for the planner - to get the same, you would need to have every statement prepared in advance manually. There is no benefit to using raw SQL functions most of the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also made a concerted decision to use functions as part of the contract between the application and the database. The alternative is putting this kind of logic in the application, which in large part is not better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. My primary concern is to avoid the temptation of including extra logic in postgres procs and keeping them in rust but that may be a different topic entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres is actually able to inline code and fold function bodies into outer queries in some cases when the function is written in SQL, but can't do that when it's in PL/pgSQL.
Another good thing about writing in SQL is that it forces you to express your problem in terms of set operations, which is almost without exception going to be better than expressing it iteratively. For instance, I helped @raskchanky refactor a PL/pgSQL function that was making n+1 insert statements into an SQL statement that does the same, but in a single operation. True, nothing is stopping you from executing that same single query in a PL/pgSQL function, but that seems like just adding noise at that point.
Certainly, if there are situations where we really do need the capabilities of PL/pgSQL (like exception handling, for example), then by all means, we should use PL/pgSQL. You can do a lot more things in SQL in Postgres these days, though, so that's becoming increasingly unnecessary, particularly for the kinds of things we're doing on this project. That, coupled with the clarity that being forced to express your problem in set-oriented operations, makes me reach for SQL first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya - the important thing is to not change the contract between the database and the application. We're consciously making the database interface be functions with versions in order to make forward migration and new behavior easy to add. It's not the only way to do that, but it's the way we chose, and we should make sure we don't change it accidentally. :) All things being equal, I'm 2/10 on wether a given function is written in SQL or PL/pgSQL. I'm 8/10 that we should keep the current function/app boundaries as they are, unless we want to really reconsider the entire approach.
net_error_msg text, | ||
build_started_at timestamptz, | ||
build_finished_at timestamptz | ||
) RETURNS SETOF job_status AS $$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can our Rust database code deal with functions that return a single database type, or not? If not, that's fine; I was just thinking that declaring this as RETURNS job_status
would be ideal, since it's only ever going to return one; there's no real possibility of it returning multiple rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we can go either way, however the way we are doing things above is part of a pattern that is established to help keep things DRY. There's some explanatory comments at https://github.com/habitat-sh/habitat/blob/master/components/builder-jobsrv/src/data_store.rs#L124
JobState::Processing => "Processing", | ||
JobState::Complete => "Complete", | ||
JobState::Rejected => "Rejected", | ||
JobState::Failed => "Failed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we implement FromStr
(so we can use parse
) and Display
for JobState
instead, to handle the conversion to and from a String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - I shoulda done that.
DateTime::<UTC>::from_str(job.get_build_finished_at()).unwrap(), | ||
), | ||
false => None, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These might be more concise as if
statements, rather than match
statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - some copy pasta here.
LIMIT $1; | ||
RETURN; | ||
END | ||
$$ LANGUAGE plpgsql VOLATILE"#)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name and argument type of this function suggests that you're getting the status for the job whose ID you're supplying; naming it something like get_oldest_active_job
, pop_job_queue
or similar would be more intention-revealing.
Also, is there any time we're calling this without an argument of 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will tweak the function name to make it clearer. The current usage is always 1, however like a number of other functions like the pending jobs ones, this is designed to eventually be able to handle more than just one status at a time.
FROM job_status | ||
ORDER BY created_at ASC | ||
LIMIT $1 | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be implemented in terms of a JOIN
with the output of get_job_status_v1
, rather than duplicating that function's logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will simplify this by deleting based on id, as per your other comment.
} | ||
|
||
// Clear the top job status from the queue | ||
self.datastore.clear_job_status(1)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it could have the potential for subtle bugs; can we just remove, by ID, the entry for the job we just processed, instead of assuming that it's at the top of our queue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah agreed, I think that will be simpler as well - will update.
Ok(()) | ||
} | ||
|
||
pub fn get_job_status(&self, count: i32) -> Result<Vec<Job>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re: naming applies here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will tweak.
I care more about not abandoning the function boundaries we have now. If
it's a SQL function rather than a pl/pgsql one, great. If we move more
logic into the application, we have to start rethinking the design for the
forward compatibility of the back end. Not that we can't do it, but that's
why it is the way it is.
…On Tue, Aug 1, 2017 at 10:55 AM Christopher Maier ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In components/builder-scheduler/src/data_store.rs
<#2851 (comment)>:
> + END
+ $$ LANGUAGE plpgsql VOLATILE"#)?;
+
+ migrator.migrate(
+ "scheduler",
+ r#"CREATE OR REPLACE FUNCTION delete_job_status_v1 (integer) RETURNS void AS $$
+ BEGIN
+ DELETE FROM job_status
+ WHERE id IN (
+ SELECT id
+ FROM job_status
+ ORDER BY created_at ASC
+ LIMIT $1
+ );
+ END
+ $$ LANGUAGE plpgsql VOLATILE"#,
Postgres is actually able to inline code and fold function bodies into
outer queries in some cases when the function is written in SQL, but can't
do that when it's in PL/pgSQL.
Another good thing about writing in SQL is that it forces you to express
your problem in terms of set operations, which is almost without exception
going to be better than expressing it iteratively. For instance, I helped
@raskchanky <https://github.com/raskchanky> refactor a PL/pgSQL function
that was making n+1 insert statements into an SQL statement that does the
same, but in a single operation. True, nothing is stopping you from
executing that same single query in a PL/pgSQL function, but that seems
like just adding noise at that point.
Certainly, if there are situations where we really do need the
capabilities of PL/pgSQL (like exception handling, for example), then by
all means, we should use PL/pgSQL. You can do a lot more things in SQL in
Postgres these days, though, so that's becoming increasingly unnecessary,
particularly for the kinds of things we're doing on this project. That,
coupled with the clarity that being forced to express your problem in
set-oriented operations, makes me reach for SQL first.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#2851 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAQ0IKkuCJVRFtIz2OvpyiGUExBpf1vks5sT2abgaJpZM4OnOxv>
.
|
1357ae1
to
574ce5c
Compare
4859382
to
2463c2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look good overall, but thinking a bit more on it, I wonder if it'd be better to rethink how we're describing the functionality that's being added here. It seems like we're fundamentally talking about managing a queue in the database, but a lot of the naming is more narrow than that, and potentially confusing. We're not just dealing with the status of a job in a narrow sense; we're managing our queue of work. The more that our tables, functions, etc. reflect that intention, the clearer the code will be.
"Failed" => jobsrv::JobState::Failed, | ||
_ => return Err(Error::UnknownJobState), | ||
}; | ||
let job_state = jobsrv::JobState::from_str(&js).map_err( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit pick, but it seems like it's more idiomatic to use parse
rather than calling from_str
directly.
_ => (), | ||
} | ||
|
||
// Clear the top job status from the queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't quite in line with the implementation anymore.
50da7a2
to
02a5c63
Compare
Signed-off-by: Salim Alam <salam@chef.io>
02a5c63
to
1194cc1
Compare
Looks good to me. In particular, this is a step toward the right thing regardless - it's either going to be fine for the long term, or need another refactor (one that deals with job volume/persistence in the database) - but assuming @christophermaier feels good, I say merge this. Nice work all around, btw - @chefsalim for the code, and the team for the review. |
I agree with @adamhjk... I really like how this ended up; talking about the queue as a queue really makes the code flow very nicely. Nice job @chefsalim. @thesentinels approve |
🤘 I am testing your branch against master before merging it. We do this to ensure that the master branch is never failing tests. |
Travis CI has started testing this PR. |
💖 Travis CI reports this PR passed. It always makes me feel nice when humans approve of one anothers work. I'm merging this PR now. I just want you and the contributor to answer me one question: |
This change fixes a race condition in the builder-scheduler where a job status can get lost. Now, the incoming job status is persisted by the handler thread, and later picked up from the datastore to be processed by the scheduler. Switching over to this mode also yields a simplification where we can eliminate the need to keep a separate socket around for status notifications.
Signed-off-by: Salim Alam salam@chef.io