-
Notifications
You must be signed in to change notification settings - Fork 216
Show warning when restarting job and assets are missing #2676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Show warning when restarting job and assets are missing #2676
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Our change to look in the config in the new location broke other tests, e.g.
t/ui/18-tests-details.t ................ Can't call method "config" on an undefined value at /home/squamata/project/lib/OpenQA/Schema/Result/Assets.pm line 172.
t/ui/18-tests-details.t ................ 1/?
# Failed test 'on main page'
# at t/ui/18-tests-details.t line 100.
# got: ''
# expected: 'openQA'
I would like to avoid that we need to add the boiler-plate fake app code in all affected test modules though. Maybe we should fall back to a default value in the code if the whole "config" can not be found or we change the "app->config" lookup in general to make that easier in general as well as for testing. As we already discussed the topic of the "global $app variable" yesterday, I wonder what's a better approach, e.g. have a method returning the app instead as a first step?
| @@ -164,13 +164,18 @@ sub refresh_size { | |||
| return $new_size; | |||
| } | |||
|
|
|||
| sub hidden { | |||
| sub is_type_hidden { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems weird to have this as a public function in a result class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If OpenQA::Setup wasn't so chaotic i would say just put it there as a config utility function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be good finding a better place for this function. But for now this is likely the least problem with that draft.
Me, too. And yes, it would make sense to let the code handle this instead of providing an app within all tests. |
|
I suppose it is better to make this a warning as a first step to avoid potential problems. Then we can decide what we actually want to do if an asset is missing:
The check for missing assets introduced here would be useful regardless whether we implement option 1 or 2 in the end. |
ffdc09d to
4f14198
Compare
4f14198 to
1ed83a6
Compare
1ed83a6 to
9f4890e
Compare
Codecov Report
@@ Coverage Diff @@
## master #2676 +/- ##
==========================================
- Coverage 92.03% 91.98% -0.05%
==========================================
Files 190 190
Lines 11764 11779 +15
==========================================
+ Hits 10827 10835 +8
- Misses 937 944 +7
Continue to review full report at Codecov.
|
|
"There's one ticket which suggests the download should be restarted" I'd quite like this, FWIW. Also what about the case where the asset was generated by a chained parent but has now been garbage-collected? Restart the parent? |
|
Yes, if the asset was provided by the parent this is what I suggest for now: prevent the restart of the child. The parent can still be restarted which would generate the necessary asset and also trigger the child |
|
This (and the later change making it an error) breaks restarting a job that has its hard disk image created by the parent job. I get: The asset is there, but is named 00007628-disk_unencrypted_updated.qcow2 (because it is created by the parent job). PS I'd comment on progress.opensuse.org, but my user is not allowed there ("marmarek was authenticated but needs to be created and/or activated in Redmine first."). |
|
I'll look into it on Monday. |
|
oh, yeah, the name check will need to do the same as the code that modifies asset names on upload I guess. I think that's this bit here. |
|
So, looking a bit more: |
|
I also don't understand why this ignores hidden assets. Hidden assets are hidden from the web UI. It is still going to be a problem if the job needs to access them but they don't exist - why are we ignoring them? In Fedora openQA we set ISOs and disk images to be "hidden" because we don't want people to be able to download them from the web UI (our server doesn't really have the bandwidth for that), so that means this check will just ignore them and be more or less useless for us. Did you really want to exclude repos from the check, and this just 'works' because the default config is that only repos are hidden? |
|
@AdamWill The point of "hidden assets" is that these are "hidden dependencies" which we don't want to (try to) clone on |
* Otherwise the check for missing assets is unable to find the asset under its adjusted name * See os-autoinst#2676 (comment)
|
@marmarek It turns out that this is not that trivial to implement because the job settings don't have the (parent) job ID which is added on asset creation. It would be possible to apply the same logic as in Another idea is to simply adjust the job settings when naming the asset: #2961 I could also add a "Force restart" button to make it at least not so inconvenient to restart the job when a false alert happens. |
|
@okurz we are not "abusing it for something else". The hidden assets feature you're using is specifically designed for concealing assets from display as downloadable by the web UI. Once again I can tell you this because I wrote it. :) At the time it was written it was only used for this purpose, so if |
|
Mh... your commit makes repos the only default "hidden asset". That's why I though of repos as an example of a hidden asset: Something we can not download locally and don't display on the web page. Apparently it was only about the last property, though. (The commit message make that clear.) But the clone script didn't even adopt the feature. It has a hard-coded exception for repos only for the "can not download locally part": So maybe we should open yet another category for not clonable assets if we want to avoid hardcoding in the clone script. These assets could then be ignored by the check for missing assets and not all hidden assets. The good thing is that this is not a regression for you. It simply means you can not benefit from the new feature to prevent starting jobs with missing assets. But that feature still has its problems anyways. |
|
right, in practice I'm happy that bug exists, but it's still a bug =) Before my commit, this was similarly hardcoded for the web UI: it just hardcoded "don't show repo assets". My commit made that a config setting instead. I left the default value as just repos, because that's what SUSE wants (wanted?), but for Fedora, ever since that commit we've had that setting set to also hide ISOs and hard disk images. And yeah, I agree it makes sense to abstract out the concept of "non-clonable assets", it just needs to be different from "not-displayed-in-the-web-UI assets". |
Ideally a proper solution would be better. But in the meantime, is there a single line I can comment out to workaround the issue? I'm unable to restart most of the jobs right now... |
|
You can still force the restart via the API (using the query parameter |
|
I didn't know |
Inspired by register_assets_from_settings, check using _asset_find() instead of database query. What does matter here is not really looking for the actual file, but that _asset_find() also consider private assets named after parent job id. For this purpose, query also parent(s) job ids in missing_assets(). Related to os-autoinst#2676
When checking for missing assets, consider also private assets uploaded by the parent job (with name prefixes by the parent id). For this purpose, factor out _parent_job_ids() from register_assets_from_settings() and use it in missing_assets() too. Related to os-autoinst#2676 Co-authored-by: Marius Kittler <mkittler@suse.de>
When checking for missing assets, consider also private assets uploaded by the parent job (with name prefixes by the parent id). For this purpose, factor out _parent_job_ids() from register_assets_from_settings() and use it in missing_assets() too. Related to os-autoinst#2676 Co-authored-by: Marius Kittler <mkittler@suse.de>
When checking for missing assets, consider also private assets uploaded by the parent job (with name prefixes by the parent id). For this purpose, factor out _parent_job_ids() from register_assets_from_settings() and use it in missing_assets() too. Related to os-autoinst#2676 Co-authored-by: Marius Kittler <mkittler@suse.de>
When checking for missing assets, consider also private assets uploaded by the parent job (with name prefixes by the parent id). For this purpose, factor out _parent_job_ids() from register_assets_from_settings() and use it in missing_assets() too. Related to os-autoinst#2676 Co-authored-by: Marius Kittler <mkittler@suse.de>
When checking for missing assets, consider also private assets uploaded by the parent job (with name prefixes by the parent id). For this purpose, factor out _parent_job_ids() from register_assets_from_settings() and use it in missing_assets() too. Related to os-autoinst#2676 Co-authored-by: Marius Kittler <mkittler@suse.de>
|
ok, so that fixes the file name calculation, but is someone planning to fix the incorrect use of |
|
Not sure what incorrect use you mean. But there is #2978 to enable forced restart over the UI as well so is only about internal semantics done right or is there a broken feature behind that you want to see fixed? |
|
@okurz I'm talking about the discussion we had last week where we determined that this code uses |
As discussed in os-autoinst#2676 (comment) and follow-ups, this check misunderstands what the "hidden" type means. The code assumes that "hidden" assets are the same thing as "assets we don't really want to copy down when cloning jobs", but they are not. The "hidden" attribute was written (by me) to mean "asset types not to be shown for downloading in the web UI". For SUSE, it happens to be the case that ["repo"] would be the right array of both "not-to-be-shown-in-the-web-ui assets" and "non-clonable assets", so this bug wasn't apparent, as SUSE deployments leave the `hide_asset_types` config setting at its default value of just 'repo'. But on Fedora deployments, this setting is changed to 'repo iso hdd' (because we don't want to show those asset types for download in the web UI), so this code also ignored ISO and HDD assets when checking for "missing" assets, which we don't want. As discussed in the pull request we could potentially make this a configurable attribute and have the clone_job script use it too, but doing that is a bit harder, and I don't think for now anyone wants a different definition of "non-clonable assets", so it doesn't seem really necessary. Signed-off-by: Adam Williamson <awilliam@redhat.com>
As discussed in os-autoinst#2676 (comment) and follow-ups, this check misunderstands the "hidden" attribute. The code assumes that "hidden" assets are the same thing as "assets we don't really want to copy down when cloning jobs", but they are not. The "hidden" attribute was written (by me) to mean "asset types not to be shown for downloading in the web UI". For SUSE, it happens to be the case that ["repo"] would be the right array of both "not-to-be-shown-in-the-web-ui assets" and "non-clonable assets", so this bug wasn't apparent, as SUSE deployments leave the `hide_asset_types` config setting at its default value of just 'repo'. But on Fedora deployments, this setting is changed to 'repo iso hdd' (because we don't want to show those asset types for download in the web UI), so this code also ignored ISO and HDD assets when checking for "missing" assets, which we don't want. As discussed in the pull request we could potentially make this a configurable attribute and have the clone_job script use it too, but doing that is a bit harder, and I don't think for now anyone wants a different definition of "non-clonable assets", so it doesn't seem really necessary. Signed-off-by: Adam Williamson <awilliam@redhat.com>
|
I sent #3017 which just hardcodes 'repo', for now, since at least that's correct. I looked at adding a |
As discussed in os-autoinst#2676 (comment) and follow-ups, this check misunderstands the "hidden" attribute. The code assumes that "hidden" assets are the same thing as "assets we don't really want to copy down when cloning jobs", but they are not. The "hidden" attribute was written (by me) to mean "asset types not to be shown for downloading in the web UI". For SUSE, it happens to be the case that ["repo"] would be the right array of both "not-to-be-shown-in-the-web-ui assets" and "non-clonable assets", so this bug wasn't apparent, as SUSE deployments leave the `hide_asset_types` config setting at its default value of just 'repo'. But on Fedora deployments, this setting is changed to 'repo iso hdd' (because we don't want to show those asset types for download in the web UI), so this code also ignored ISO and HDD assets when checking for "missing" assets, which we don't want. As discussed in the pull request we could potentially make this a configurable attribute and have the clone_job script use it too, but doing that is a bit harder, and I don't think for now anyone wants a different definition of "non-clonable assets", so it doesn't seem really necessary. Signed-off-by: Adam Williamson <awilliam@redhat.com>
The retry commit is now possible to rever thanks to the merged pr of Dominik Heidler os-autoinst#2676 This will revert the changes made by prs that mitigate the problem: os-autoinst#6244 and os-autoinst#6115 related: https://progress.opensuse.org/issues/175060
The retry commit is now possible to rever thanks to the merged pr of Dominik Heidler os-autoinst#2676 This will revert the changes made by prs that mitigate the problem: os-autoinst#6244 and os-autoinst#6115 related: https://progress.opensuse.org/issues/175060
The retry commit is now possible to rever thanks to the merged pr of Dominik Heidler os-autoinst#2676 This will revert the changes made by prs that mitigate the problem: os-autoinst#6244 and os-autoinst#6115 related: https://progress.opensuse.org/issues/175060
The retry commit is now possible to rever thanks to the merged pr of Dominik Heidler os-autoinst#2676 This will revert the changes made by prs that mitigate the problem: os-autoinst#6244 and os-autoinst#6115 related: https://progress.opensuse.org/issues/175060
The retry commit is now possible to rever thanks to the merged pr of Dominik Heidler os-autoinst#2676 This will revert the changes made by prs that mitigate the problem: os-autoinst#6244 and os-autoinst#6115 related: https://progress.opensuse.org/issues/175060
The retry commit is now possible to rever thanks to the merged pr of Dominik Heidler os-autoinst#2676 This will revert the changes made by prs that mitigate the problem: os-autoinst#6244 and os-autoinst#6115 related: https://progress.opensuse.org/issues/175060
The retry commit is now possible to rever thanks to the merged pr of Dominik Heidler os-autoinst#2676 This will revert the changes made by prs that mitigate the problem: os-autoinst#6244 and os-autoinst#6115 related: https://progress.opensuse.org/issues/175060
The retry commit is now possible to rever thanks to the merged pr of Dominik Heidler os-autoinst#2676 This will revert the changes made by prs that mitigate the problem: os-autoinst#6244 and os-autoinst#6115 related: https://progress.opensuse.org/issues/175060
See #2676 (comment) for the current state.
Related issue: https://progress.opensuse.org/issues/34783
This is WIP as it turns out to be not so easy as expected:
Note that this is only about restarting assets to prevent the creation of a clone in the first place.