treat kernels / initrds as assets, allow download of all assets #535
Conversation
I took a brief look and it looks fine (except the failed tidy :) ), but one thing worries me, and should have worried us since the introduction of ISOURL: …
That was something I considered with the …
Sounds reasonable to me - if it comes with proper error handling. If you post a job and get a 404 because you're not in the whitelist, you will get paranoid quickly :)
My plan was to have …
So there's a slight fun thing there, which is that …
I think that regex is the appropriate way to validate - basically the URL's host must end with one of the strings in the whitelisted host list. So if you whitelisted …, I'm not sure if it'd be worth including a warning above …
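The suffix-match idea can be sketched outside of openQA's Perl; here is a minimal Python illustration (the function name and the dot-boundary anchoring are my own choices, not the actual openQA implementation):

```python
from urllib.parse import urlparse

def host_in_whitelist(url, whitelist):
    """Accept a URL only if its host is a whitelisted domain or a
    subdomain of one. Anchoring the suffix match at a dot boundary
    means whitelisting 'opensuse.org' does not also admit a host
    like 'evilopensuse.org'."""
    host = urlparse(url).hostname
    if not host:
        return False
    return any(host == dom or host.endswith('.' + dom) for dom in whitelist)
```

So with `['opensuse.org']` whitelisted, `http://download.opensuse.org/x.iso` would pass while `http://evilopensuse.org/x.iso` would not.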
Force-pushed from f675d4a to 74c3e33.
So I managed to test the whitelisting stuff; it seems to work fine (with one small fix - ten points to whoever spots the error in the diff :>), so I've updated the pull request.
LGTM, but it's a bit complicated, so I would like @aaannz to merge if he's OK with it.
Force-pushed from 74c3e33 to 34cbbe7.
Just rebased to master and fixed the …
Force-pushed from 34cbbe7 to 634e2ec.
...and added a bit more to the comment in the config file explaining that the option must be set to enable asset downloading.
LGTM, one thought though. Am I overthinking this, or should the check against whitelisted domains be done in the gru jobs where the actual download happens, rather than in the Iso controller?
I didn't really think about doing it that way. At first I guess I assumed gru doesn't have the config file settings available to it, but now I look, I think maybe it does, which would make that viable. I agree with your assessment. We could possibly put a 'check whitelist' function in shared code somewhere (…)
I didn't check whether GRU has access to the config. It does have access to some mojo app, but I'm not certain how it all works together. I doubt, however, that the … So what I think we should do is move the whitelist check from …
If Gru doesn't have the config, how does it know whether it should log debug output?
Actually, gru is a webapi plugin, so it's basically a webapi without the http stuff ;)
So it does run startup and ignore routing? And starts another worker checker? Hmm.. indeed. I just tried to use a bogus auth plugin and GRU fails to run due to requiring a non-existent module.
I have a feeling there are some assumptions in …
Well, the gru service is gru run - which triggers a different run function than the one in Mojo::Base.
I'll investigate when I have a bit of spare time, I guess =)
Force-pushed from 634e2ec to 2def19c.
Whew, OK, maybe this is good enough now? :) It seems the … So this does what I proposed: the check code is moved to … I tested this, and it really seems to work. To test that the 'failsafe' check really works, you can comment out the whole block in …
Force-pushed from 2def19c to 95ecd1a.
Small note on this: it doesn't explicitly handle the case of someone passing complete garbage as the URL. However, this should be handled OK, because we use Mojo's own download code, and it won't be able to download anything we can't parse a host out of. In practice, if you set a garbage URL it will fail the whitelist check anyway, as …
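As a rough analogue of the behaviour described here (openQA itself uses Mojo::URL, this is a hedged Python sketch with urllib.parse), garbage input simply yields no host, so a host-based whitelist check fails closed:

```python
from urllib.parse import urlparse

def url_has_host(url):
    """True only if a host can be parsed out of the URL at all;
    anything we can't parse a host from could never be downloaded."""
    try:
        return urlparse(url).hostname is not None
    except ValueError:  # e.g. a malformed port number
        return False
```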
LGTM
```perl
# subroutine as parse_assets_from_settings
my $assettype = asset_type_from_setting($short);
# We're only going to allow downloading of asset types
die "_URL downloading only allowed for asset types! $short is not an asset type" unless ($assettype);
```
I am not sure about dying here. I think this will kill the Scheduler DBUS handler and it will refuse to schedule new isos until rebooted. Sadly we don't have any text feedback from the scheduler back to the webapi, so I would just return an empty array to signal that no job was created.
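The suggestion above, sketched in Python rather than the scheduler's actual Perl (all names here are illustrative): validation failures are logged and reported as "no jobs created" instead of killing the handler with a die.

```python
import logging

def schedule_iso(settings, asset_types):
    """Reject bad _URL params without raising: a raise here would be
    the equivalent of die() taking down the scheduler's DBus handler."""
    for key in settings:
        if key.endswith('_URL'):
            short = key[: -len('_URL')]
            if short not in asset_types:
                logging.warning(
                    '_URL downloading only allowed for asset types; '
                    '%s is not an asset type', short)
                return []  # empty list signals "no job was created"
    return [{'settings': settings}]  # stand-in for real job creation
```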
One comment, the rest LGTM.
Thank you! I didn't think to check that. I will check it out and change as appropriate.
Force-pushed from 95ecd1a to 3c36fba.
There are several other points where we … so does that mean those will also kill the scheduler entirely?
Force-pushed from 3c36fba to b3a6afb.
Still, how about this: just log the issue as a warning and continue with the next param. This is more or less equivalent to what we do in other failure cases on this path; we just go ahead and run the jobs. Perhaps someone's using a custom param that happens to end in …
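The "warn and skip" variant proposed here, again as an illustrative Python sketch (openQA's real code is Perl and these names are hypothetical): an unrecognized _URL param is logged and ignored, and the remaining params still get processed.

```python
import logging

def collect_downloads(args, asset_types):
    """Build the url -> asset-type map, skipping (not aborting on)
    any _URL param whose stem is not a known asset type."""
    downloads = {}
    for arg, url in args.items():
        if not arg.endswith('_URL'):
            continue
        short = arg[: -len('_URL')]
        if short not in asset_types:
            logging.warning('%s is not an asset type, ignoring %s', short, arg)
            continue  # keep going with the next param
        downloads[url] = short
    return downloads
```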
Force-pushed from 1a4b3d9 to 8714013.
LGTM, what about the coverage? Let's punish @AdamWill with some more test writing? I'm not completely familiar with how coveralls computes test coverage. Maybe expanding the scheduler tests with one that checks whether a GRU task is created in the db for an iso scheduled with ISO_URL would be sufficient for that 0.2% drop.
This coverage is not computed by coveralls but by "cover". I wouldn't call it punishment. Yes, I recommend extending the tests, i.e. @AdamWill are you willing to do this?
Didn't we go over this before? AFAICS there is currently no test code that hits … One problem I see immediately is that the test database has no job templates, so I don't think … edit: hmm, I see there's some stuff in …
Force-pushed from 8714013 to a035f83.
there, have some goddamn tests...they can't go in …
This is to make it possible to do 'remote scheduling' (where the scheduler runs on some machine which does not have access to the openQA 'factory' directory) of jobs that boot using a kernel and/or initrd and/or hdd image, just as the 'ISOURL' feature made it possible for jobs that boot using an ISO. I had to change a few other things to be happy with it, though.

The convention now is that you can request openQA download for any asset type by adding _URL to the appropriate setting name. To download the ISO set 'ISO_URL', to download a kernel set 'KERNEL_URL', and so on. The POST will error out if you try to download something that is *not* an asset type. This is because we can't really guess where we should store non-assets. As with the old ISOURL code, if you set the normal setting (ISO, KERNEL etc.), that value will be used as the filename of the downloaded file; otherwise the original filename will be split out from the URL and used.

Of course, in order to achieve the original goal, I had to make kernels and initrds be treated as assets :) In garretraziel's initial patch they were not. We have another good reason to treat them as assets, though - it will allow us to clean them up with the limit_assets gru task (this patch does not change that yet, but it's necessary to make it possible). For now we treat kernels and initrds as 'other' assets; they could be given a new asset type instead if we think that's a good idea. HDDs of course already have an asset type.

It seemed clean to factor out the 'figure out the asset type for a given setting' code in parse_assets_from_settings, because we need to do that same thing in job_schedule_iso now, to decide where the asset should be downloaded to (we need to know its type). So this adds 'asset_type_from_setting' to Utils, and has both parse_assets_from_settings and job_schedule_iso use it.

It's also necessary to define OTHER_DIR (like HDD_DIR and ISO_DIR) in Common.pm so isotovideo can use it. The code for attaching the appropriate path to the KERNEL and INITRD values is almost identical to the code for ISO, but it's not easy to reuse, unfortunately.

As suggested by @aaannz, this also adds whitelisting for asset download domains. Allowing asset download from anywhere on the internet adds a potential attack vector: by compromising the credentials (especially the API key / secret) of an admin, an attacker could craft an ISO or other asset to try and break out of a worker VM and attack the worker host (which may also be the server, in a small deployment). We mitigate this with the download_domains config option, which specifies the domains from which assets can be downloaded; asset download requests for files from any other domain will be rejected. The check is done twice: when the API handles the iso POST request (with a nice error message returned to the user if the check fails), and also directly by download_asset, as a safety measure in case someone somehow finds a way to bypass the API and schedule a GRU task directly.

Finally, I added a bit more handling of potential error cases in download_asset (renamed from download_iso) to try and avoid triggering the Eternal Gru Failure Loop Of Pain (it now checks if it can write to the target path, and catches errors when moving the temporary file to the final location, since we found out that was possible).
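The factored-out lookup described above might look roughly like this; note this is a guess at the mapping from the commit message (ISO and HDD_n settings map to their own types, kernels/initrds to 'other'), sketched in Python, not openQA's actual Utils code:

```python
import re

def asset_type_from_setting(setting):
    """Map a job setting name to an asset type, or None if the
    setting is not an asset (in which case a _URL POST should fail)."""
    if setting == 'ISO':
        return 'iso'
    if re.fullmatch(r'HDD_\d+', setting):
        return 'hdd'
    if setting in ('KERNEL', 'INITRD'):
        return 'other'  # kernels/initrds are 'other' assets for now
    return None
```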
Force-pushed from a035f83 to c7699c7.
I would have preferred the tests written like:

```diff
-# check we have no gru download tasks to start with
-my @tasks = $schema->resultset("GruTasks")->search({taskname => 'download_asset'});
-ok(scalar @tasks == 0);
+is(scalar @tasks = $schema->resultset("GruTasks")->search({taskname => 'download_asset'}), 0,
+   'check we have no gru download tasks to start with');
```

but better @aaannz just merges this before the swearing continues ;-)
How about this: you merge this, then send a PR to rewrite them however you please, and I'll ack it ;)
treat kernels / initrds as assets, allow download of all assets
Tada..!
```perl
# the file at that URL before running the job
my %downloads = ();
for my $arg (keys %args) {
    next unless ($arg =~ /_URL$/);
```
this hit on a rather innocent URL in our tests - is there a way to only trigger on URLs you would also download?
I'm sorry, I'm not sure I quite get the question. Are you saying you happen to have an existing parameter which ends in _URL which isn't for downloading?
Hmm - I guess what we can do is reverse the ordering just below this chunk, so we check if it's a downloadable type before we generate the shortened param, and complain less when it's not downloadable? Sound good?