ref(celery): Make renaming tasks more painless #29421
@tonyo @oioki: This PR also updates the metrics associated with symbolication tasks to reflect the namespace change made in #28973. This will most likely cause a minor blip in our current metrics dashboards, which I plan to update if and when this gets merged. I'd like your thoughts on whether this part should be reverted, or whether additional prep work is needed in anticipation of this change.
src/sentry/tasks/base.py
I assume this registers a handler for any task with the given name?
That means we register the same handler under multiple names, right?
Yep, this registers the handler under all of the provided legacy names.
flub left a comment:
I don't really like this; it's too much complexity and code movement for very little gain. Why can't we keep these tasks named by their original task names forever?
See also #28973 (comment), which is related to this opinion 😄
These names are currently generated so that they carry some meaning about the task itself, and Celery's documentation also encourages using module names as namespaces. Semantic names like these are expected to change over time to reflect changes in the code, its use cases, and its ownership; keeping them immutable and pinned to their original values imposes restrictions that poorly reflect how they're actually used and interpreted. Consider the scenario that motivated this change in the first place: subsections of a shared file, … #28973 (comment) suggests renaming the file paths so that they conform to the existing task naming scheme: … I think the greater problem this unearths is that perhaps such names shouldn't carry any meaning if they're expected to stay static for a task's entire lifetime. If names were generated with zero or minimal association with the task's purpose, there would be no reason to change them under most circumstances, barring something like an accidental naming collision. At the very least, the change in this PR opens up the ability to reassign names when needed, buying us time to discuss how tasks are named and whether the issue outlined above warrants reconsidering that scheme.
The Celery documentation describes in detail how it can auto-name tasks based on their modules. I wouldn't claim (or at least I didn't see) that it encourages doing so. I think the Sentry codebase chose a very good pattern: making task names explicit rather than letting them depend on the modules where the code happens to live. This decouples task naming from module naming, which is great for refactoring.
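A minimal, pure-Python model may help make the naming distinction concrete. It is not Celery's implementation: registry, task, and the task name example.store.save_event are hypothetical. Without an explicit name, the function is keyed by its module and function name (a simplified version of the auto-naming the Celery docs describe); with an explicit name, the key stays the same regardless of which module the function lives in.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a task registry: maps task names to callables.
registry: Dict[str, Callable[[], str]] = {}


def task(name: Optional[str] = None) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Register a function under an explicit name, or derive one from its module."""

    def decorator(func: Callable[[], str]) -> Callable[[], str]:
        # Simplified auto-naming: "<module>.<function name>".
        task_name = name or f"{func.__module__}.{func.__name__}"
        registry[task_name] = func
        return func

    return decorator


@task()
def auto_named_task() -> str:
    # Key depends on where this function lives; moving it changes the key.
    return "auto"


@task(name="example.store.save_event")  # hypothetical explicit name
def save_event() -> str:
    # Key is fixed by the explicit name, independent of the module.
    return "saved"
```

If save_event were later moved to another module, its __module__ would change, but messages referring to example.store.save_event would still resolve, which is the refactoring property argued for in the comment above.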
I would argue that the current task naming is logical: we have a bunch of tasks that all relate to the part of the pipeline that's involved in "storing" a received event: … Now the implementation in this single module is getting unwieldy for a number of reasons, and we want to move the code around. I've already argued why I don't think renaming the tasks is appropriate, so it's just a question of how we'd like to structure the implementation code. The current … I feel far less strongly about where the split-up code ends up living than about not renaming the tasks. I'm fine with the split PR as is, partly because I now (think I) understand how Celery finds tasks, and it's fairly easy to make sure the tasks are available in the workers after moving things around. So if further restructuring of the code is desired, I don't think it will be difficult.
This may be going slightly off topic, but naming tasks logically makes sense to me: it makes understanding the code and architecture much easier, it makes debugging easier, and it facilitates prod hotfixes and prodding at prod data during incidents.
I'm making a nitpicky point here, but it states this underneath the linked section:
That reads pretty heavily to me as encouragement, or a recommendation. The documentation does explain how its auto-naming behaviour works, as you've mentioned, but this recommendation to use the module name sits outside that explanation.
I'm confused about this assertion, particularly the bit about decoupling: … As far as I can tell, a large number of task names are scoped by the modules they live in: everything in … Regarding the actual naming choice of …
Perhaps this is splitting hairs, but I would be hesitant to agree that a majority of the tasks that live in … All of the other functions in both files are concerned with processing events prior to storage: they simply mutate events and let those 4-5 aforementioned functions take care of the actual work of storing them. This means that approximately 5/23 functions in these files, or 1/10 tasks, are actually related to event storage. As a result, I believe "store" is more of a misnomer than an accurate name for these tasks. Despite this, I'm not willing to die on this hill for this specific name change; if you feel strongly enough that "store" is an appropriate name, I can drop those parts of the diff from the PR. Hair-splitting about names aside, my main concern here is how to make renaming tasks easier. I'm open to other ideas on how to restructure existing modules that contain tasks while avoiding the issues outlined in the PR description (i.e. dropped events). In other words, I'm curious what you mean by the statement below:
I'm assuming this is saying that it's actually fairly easy to ensure that tasks whose names have been changed won't lose events after the change is deployed. Could you elaborate on this, if you feel it's a better solution to the problem than what's proposed in the PR?
I feel this at least agrees with what this PR is trying to accomplish: all it does is make it easier for task names to correctly reflect their purpose and/or location in the codebase, instead of being stuck with legacy names that can't easily be changed. The point of that paragraph is that I don't think a meaningful value like a task's name should be used in a field that is expected to be set in stone after creation. A potential solution could be to use a stable, less meaningful identifier as the task name for registering it with Celery, plus a secondary name that prioritizes human readability.
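The two-name idea proposed above can be sketched in plain Python. This is a hypothetical model, not an existing Sentry or Celery feature: TaskSpec, REGISTRY, register, run, and the IDs and labels are all made up for illustration. Routing uses only the stable task_id, so changing the human-readable display_name never orphans queued work.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class TaskSpec:
    task_id: str  # stable, opaque routing key; never changes once deployed
    display_name: str  # human-readable label for logs and dashboards; may change
    func: Callable[[int], int]


# Hypothetical registry keyed only by the stable identifier.
REGISTRY: Dict[str, TaskSpec] = {}


def register(spec: TaskSpec) -> None:
    REGISTRY[spec.task_id] = spec


def run(task_id: str, arg: int) -> int:
    """Route by stable ID; the display name plays no part in dispatch."""
    return REGISTRY[task_id].func(arg)


def double(x: int) -> int:
    return x * 2


register(TaskSpec("t-0001", "store.double", double))
# A "rename" only touches the display name, so queued messages that carry
# "t-0001" keep working.
register(replace(REGISTRY["t-0001"], display_name="ingest.double"))
```

The trade-off is the one debated in the thread: opaque IDs are harder to read when debugging, which is why the sketch keeps a display name alongside them.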
Ok, fine. I'll go a step further and say I disagree with celery's assertion in that case 😄
What I mean is that every invocation of
My main assertion is that it is never desirable to rename a task. My secondary assertion is that task names do not need to match module and function names, that these will diverge over time due to refactoring, and that this is fine.
By doing what was done in #28973, you can move the implementation of tasks without renaming the tasks. I've tried to explain why I think it is a bad idea to rename tasks. If none of that reasoning is convincing and you really believe this PR is important, then go ahead; I'm now assuming that's because you disagree with me rather than because I failed to explain myself.
Considering that the only usage of this change has now been removed, I'm closing this PR.
There's currently a significant cost (well, greater than it should be) to renaming Celery tasks. As mentioned in #28973 (comment), and as can be verified via the instructions in #28973 (comment), renaming a task causes all in-flight tasks queued under the previous name to fail and be dropped. Whoever renames a task must be willing to eat the cost of those dropped tasks, which could be a fairly costly decision; that's a high impact for a change that doesn't meaningfully alter any of the logic in the code.
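The failure mode described above can be modelled with a simplified, pure-Python queue. This is an illustration under stated assumptions, not Celery's actual worker logic: drain, the message tuples, and the task names old.double and new.double are hypothetical. Messages enqueued before a deploy still carry the old name, and a worker that only knows the new name has nothing to run them with.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: each queued message names the task to run and its argument.
Message = Tuple[str, int]


def drain(
    queue: List[Message], registry: Dict[str, Callable[[int], int]]
) -> Tuple[List[int], List[str]]:
    """Process queued messages; return (results, names that had no registered task)."""
    results: List[int] = []
    dropped: List[str] = []
    for task_name, arg in queue:
        handler = registry.get(task_name)
        if handler is None:
            # A worker that no longer knows this name cannot run the message.
            dropped.append(task_name)
        else:
            results.append(handler(arg))
    return results, dropped


def double(x: int) -> int:
    return x * 2


# Messages enqueued by the old deployment, before the rename.
in_flight: List[Message] = [("old.double", 1), ("old.double", 2)]
# New workers only register the task under its new name.
new_registry = {"new.double": double}

results, dropped = drain(in_flight + [("new.double", 3)], new_registry)
```

Here only the message sent under the new name runs (results is [6]), while both messages sent under the old name end up in dropped.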
This attempts to make such changes more painless by adding a legacy_names parameter to the instrumented_task decorator: if a task is renamed, its previous names can be added to the legacy_names list, which ensures that any tasks queued under the previous name(s) will still complete. Metrics associated with instrumented tasks have also been updated so that it's possible to determine when a given legacy name is no longer in use and can be removed from the task.

The nitty-gritty implementation overview: this registers additional tasks with the Celery registry, one per legacy name supplied in legacy_names. None of these are associated with the actual task that is exposed and exported to other modules; only the task registered under the current name, as specified by name, is available to and invokable by other modules.
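The registration scheme described above can be sketched without Celery. This is a hypothetical, simplified model: the PR's real change lives in src/sentry/tasks/base.py and registers actual Celery tasks, while here REGISTRY, METRICS, the trimmed-down instrumented_task signature, and the task names are assumptions. Each legacy name gets its own registry entry that calls the same function, invocations are counted per registered name, and only the primary handler is returned to importers.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional

# Hypothetical stand-ins for the task registry and the metrics backend.
REGISTRY: Dict[str, Callable[..., str]] = {}
METRICS: Counter = Counter()


def instrumented_task(name: str, legacy_names: Optional[Iterable[str]] = None):
    """Sketch: register func under `name`, plus one alias per legacy name."""

    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        def make_handler(registered_as: str) -> Callable[..., str]:
            def handler(*args, **kwargs):
                # Count invocations per registered name, so it becomes visible
                # when a legacy name stops receiving traffic and can be removed.
                METRICS[registered_as] += 1
                return func(*args, **kwargs)

            return handler

        primary = make_handler(name)
        REGISTRY[name] = primary
        for legacy in legacy_names or ():
            # Aliases are reachable only through the registry (i.e. by queued
            # messages); they are never returned to importing modules.
            REGISTRY[legacy] = make_handler(legacy)
        return primary

    return decorator


@instrumented_task(name="tasks.ingest.save_event", legacy_names=["tasks.store.save_event"])
def save_event(event_id: int) -> str:
    return f"saved {event_id}"
```

A message still queued under the old name, e.g. REGISTRY["tasks.store.save_event"](1), runs the same function and is counted under the legacy name, while callers in other modules only ever see save_event, the primary handler.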