treat kernels / initrds as assets, allow download of all assets #535

Merged
merged 1 commit into os-autoinst:master from download-generic on Feb 17, 2016

Conversation

@AdamWill (Contributor) commented Feb 4, 2016

this is to make it possible to do 'remote scheduling' (where
the scheduler runs on some machine which does not have access
to the openQA 'factory' directory) of jobs that boot using a
kernel and/or initrd and/or hdd image, just as the 'ISOURL'
feature made it possible for jobs that boot using an ISO. I had
to change a few other things to be happy with it, though.

The convention now is you can request openQA download for any
asset type, by adding _URL to the appropriate setting name. To
download the ISO set 'ISO_URL', to download a kernel set
'KERNEL_URL', and so on. The POST will error out if you try to
download something that is not an asset type. This is because
we can't really guess where we should store non-assets.

As with the old ISOURL code, if you set the normal setting
(ISO, KERNEL etc.), that value will be used as the filename of
the downloaded file, otherwise the original filename will be
split out from the URL and used.
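
For illustration, the filename rule can be sketched with Mojo::URL (this is not the actual openQA code, and the example URL is invented):

use Mojo::URL;

# KERNEL_URL requests the download; KERNEL, if set, names the local file
my %params = (KERNEL_URL => 'http://download.example.com/boot/vmlinuz-4.4');
my $name = $params{KERNEL}
  // Mojo::URL->new($params{KERNEL_URL})->path->parts->[-1];
# KERNEL is unset here, so $name is 'vmlinuz-4.4', taken from the URL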

Of course, in order to achieve the original goal, I had to make
kernels and initrds be treated as assets :) In garretraziel's
initial patch they were not. We have another good reason to
treat them as assets, though - it will allow us to clean them
up with the limit_assets gru task (this patch does not change
that yet, but it's necessary to make it possible). For now we
treat kernels and initrds as 'other' assets, they could be
given a new asset type instead if we think that's a good idea.
HDDs of course already have an asset type.

It seemed clean to factor out the 'figure out the asset type for
a given setting' code from parse_assets_from_settings, because we
need to do that same thing in job_schedule_iso now, to decide
where the asset should be downloaded to (we need to know its
type). So this adds 'asset_type_from_setting' to Utils, and
has both parse_assets_from_settings and job_schedule_iso use
it.
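
A minimal sketch of such a helper, assuming the mapping described in this PR (the real asset_type_from_setting in Utils may differ in detail):

# sketch only: map a setting name to an asset type, or '' for non-assets
sub asset_type_from_setting {
    my ($setting) = @_;
    return 'iso'   if $setting eq 'ISO';
    return 'hdd'   if $setting =~ /^HDD_\d$/;
    # kernels and initrds are treated as 'other' assets, as described above
    return 'other' if $setting eq 'KERNEL' || $setting eq 'INITRD';
    return '';     # not an asset type
}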

It's also necessary to define OTHER_DIR (like HDD_DIR and
ISO_DIR) in Common.pm so isotovideo can use it. The code for
attaching the appropriate path to the KERNEL and INITRD values
is almost identical to the code for ISO, but it's not easy to
reuse unfortunately.

Finally I added a bit more handling of potential error cases
in download_asset (renamed from download_iso) to try and avoid
triggering the Eternal Gru Failure Loop Of Pain (it now checks
if it can write to the target path, and catches errors when
moving the temporary file to the final location, since we found
out that was possible).

@aaannz (Contributor) commented Feb 4, 2016

I took a brief look and it looks fine (except for the failed tidy check :) ), but one thing worries me, and it has been a concern since the introduction of ISOURL.
Everyone with operator authorization or higher can post REST requests, including this one. So far, if someone's access credentials are compromised, they can mess with openQA but are shielded from the underlying system. With this (and ISOURL before it), it is very easy to craft a custom ISO, or now another type of asset, containing e.g. a qemu-kvm exploit disguised as some standard utility called by our test jobs, and thus to attack the openQA workers, which are usually firewalled from the outside network.
I would like to have a domain whitelist in e.g. openqa.ini so that openQA will not download files from all over the internet if such an event occurs.

@AdamWill (Author) commented Feb 4, 2016

That was something I considered with the ISOURL version, but I was kinda operating under the assumption that anyone who could post jobs would have write access somewhere on the box anyhow. The point about compromised credentials is a sensible one indeed, though. I can certainly add a whitelist, and perhaps we can leave it empty by default and require the admin to define it explicitly? Thanks for the suggestion.

@coolo (Contributor) commented Feb 5, 2016

sounds reasonable to me - if it comes with proper error handling. If you post a job and get a 404 in case you're not in the whitelist, you will get paranoid quickly :)

@AdamWill (Author) commented Feb 5, 2016

My plan was to have job_schedule_iso fail with an error message: a different error depending on whether the whitelist is configured but the requested URL is not in it, or the whitelist is not configured at all. IIRC errors from job_schedule_iso wind up in the HTTP return message, right?

@AdamWill (Author) commented Feb 5, 2016

So there's a slight fun thing there, which is that job_schedule_iso is in the scheduler and the scheduler has no config. So I figured, let's put it in the WebAPI ISO create method instead. I'm having trouble testing it as I'm at devconf with insanely slow wifi, so before I go any further, does this look conceptually OK to everyone, or should I have another think?

diff --git a/lib/OpenQA/WebAPI/Controller/API/V1/Iso.pm b/lib/OpenQA/WebAPI/Controller/API/V1/Iso.pm
index 8d9ff58..bdde767 100644
--- a/lib/OpenQA/WebAPI/Controller/API/V1/Iso.pm
+++ b/lib/OpenQA/WebAPI/Controller/API/V1/Iso.pm
@@ -43,6 +43,32 @@ sub create {
     my %up_params = map { uc $_ => $params->{$_} } keys %$params;
     # restore URL encoded /
     my %params = map { $_ => $up_params{$_} =~ s@%2F@/@gr } keys %up_params;
+
+    # check asset download params against domain whitelist
+    @okdomains = split(/ /, $self->app->config->{global}->{download_domains});
+    for my $param (keys %params) {
+        next unless ($param =~ /_URL$/);
+        # special error if asset download requested but no whitelisted domains
+        unless (@okdomains) {
+            my $error = "Asset download requested but no domains whitelisted! Set download_domains";
+            $self->app->log->debug("$param - $params{$param}");
+            $self->res->message($error);
+            return $self->rendered(403);
+        }
+        my $host = Mojo::URL->new($params{$param})->host;
+        my $ok = 0;
+        for my $okdomain (@okdomains) {
+            my $quoted = qr/$okdomain/;
+            $ok = 1 if ($host =~ /${quoted}$/);
+        }
+        next if ($ok);
+        # we only get here if we weren't OK, i.e. host is not whitelisted
+        my $error = "Asset download requested from non-whitelisted host $host";
+        $self->app->log->debug("$param - $params{$param}");
+        $self->res->message($error);
+        return $self->rendered(403);
+    }
+
     $self->emit_event('openqa_iso_create', \%params);

     my $ids = $ipc->scheduler('job_schedule_iso', \%params);

I think that regex is the appropriate way to validate - basically the URL's host must end with one of the strings in the whitelisted host list. So if you whitelisted fedoraproject.org, files from dl.fedoraproject.org and koji.fedoraproject.org would be OK, but fedoraproject.org.mordor.evil wouldn't be.
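
A standalone snippet shows the rule in action, using the same kind of suffix match as the diff above (the example URLs are invented):

use Mojo::URL;

# hosts ending in a whitelisted domain pass; lookalike domains do not
for my $url ('https://dl.fedoraproject.org/pub/boot.iso',
             'https://fedoraproject.org.mordor.evil/boot.iso') {
    my $host = Mojo::URL->new($url)->host;
    printf "%s: %s\n", $host,
      ($host =~ /fedoraproject\.org$/ ? 'allowed' : 'rejected');
}
# dl.fedoraproject.org: allowed
# fedoraproject.org.mordor.evil: rejected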

I'm not sure if it'd be worth including a warning above job_schedule_iso that anything calling it must validate the _URL params in this way, or making this a function in Util or something, or if we're just assuming that ISO scheduling is only ever going to be allowed to happen from a web API request?

@AdamWill (Author) commented Feb 5, 2016

So I managed to test the whitelisting stuff; it seems to work fine (with one small fix - ten points to whoever spots the error in the diff :>), so I've updated the pull request.

@coolo (Contributor) commented Feb 11, 2016

LGTM, but it's a bit complicated so I would like @aaannz to merge if ok

@AdamWill (Author):

Just rebased to master and fixed the openqa.ini comment text explaining the option to refer to fedoraproject.org.evil as an example 'bad' domain rather than foo.com.evil.

@AdamWill (Author):

...and added a bit more to the comment in the config file explaining that the option must be set to enable asset downloading.
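
For illustration, the option would look something like this in openqa.ini (the domain values here are examples only; the diff above reads the option from the [global] section as a space-separated list):

[global]
# space-separated list of domains assets may be downloaded from;
# asset downloading stays disabled until this is set
download_domains = fedoraproject.org opensuse.org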

@aaannz (Contributor) commented Feb 11, 2016

LGTM, one thought though. Am I overthinking this, or is it a problem that the check against whitelisted domains is done in the Iso controller and not in the gru job where the actual download happens?
Pro for the controller: we return an error immediately.
Pro for the gru job: if someone somehow manages to create a custom gru job, they can bypass this check.

@AdamWill (Author):

I didn't really think about doing it that way. At first I guess I assumed gru doesn't have the config file settings available to it, but now that I look, I think maybe it does, which would make that viable.

I agree with your assessment. We could possibly put a 'check whitelist' function in shared code somewhere (Util?) and have both callers read in the whitelist and run the check function? That way we can have both benefits without too much code duplication... WDYT?

@aaannz (Contributor) commented Feb 11, 2016

I didn't check whether GRU has access to the config. It does have access to some Mojo app, but I'm not certain how it all works together. I doubt, however, that WebAPI.pm::startup is run, so the config is probably not available.

So what I suggest is to move the whitelist check from the Iso controller to Util and merge this PR. Then, in another PR, move the config parsing from startup to Util (?), reuse it in GRU, and hook GRU up to the domain whitelist check.

@coolo (Contributor) commented Feb 11, 2016

if Gru doesn't have the config, how does it know it should or should not log debug?

@coolo (Contributor) commented Feb 11, 2016

actually, gru is a webapi plugin, so it's basically a webapi without the http stuff ;)

@aaannz (Contributor) commented Feb 11, 2016

So it does run startup and ignores routing? And starts another worker checker? Hmm... indeed. I just tried using a bogus auth plugin and GRU failed to run because it tried to require the non-existent module.

@aaannz (Contributor) commented Feb 11, 2016

I have a feeling there are some assumptions in WebAPI.pm which do not hold for GRU. :/

@coolo (Contributor) commented Feb 11, 2016

well, the gru service is 'gru run', which triggers a different run function than the one in Mojo::Base

@AdamWill (Author):

I'll investigate when I have a bit of spare time, I guess =)

@AdamWill (Author):

Whew, OK, maybe this is good enough now? :) It seems the download_asset sub does have access to the config; you can verify this easily enough by just having it log the setting and seeing that it does so successfully.

So this does what I proposed: the check code is moved to Util and both the ISO API controller and the download_asset sub use it; each handles only its own error reporting. Since the API controller has the entire params hash while download_asset only has a URL, I split the check code into two subs to give each an appropriate entry point. Technically speaking, at present, the bits unique to check_download_whitelist are only used by the API controller and so could just as well be inlined there (with only check_download_url shared), but it seems reasonable to keep them in Util in case we wind up needing to reuse them in some future case.
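
For reference, a rough sketch of how that split could look (the sub names come from this comment, but the bodies and return-value conventions are assumptions, not the exact openQA code):

use Mojo::URL;

# sketch: 0 if the URL is OK, 2 if no whitelist is configured at all,
# 1 if a whitelist exists but the URL's host is not on it
sub check_download_url {
    my ($url, $whitelist) = @_;
    my @okdomains = split(/ /, $whitelist // '');
    return 2 unless @okdomains;
    my $host = Mojo::URL->new($url)->host // '';
    for my $okdomain (@okdomains) {
        return 0 if $host =~ /\Q$okdomain\E$/;
    }
    return 1;
}

# sketch: run check_download_url over every _URL param in a settings hash
sub check_download_whitelist {
    my ($params, $whitelist) = @_;
    for my $param (keys %$params) {
        next unless $param =~ /_URL$/;
        my $status = check_download_url($params->{$param}, $whitelist);
        return ($status, $param, $params->{$param}) if $status;
    }
    return 0;
}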

I tested this, and it really seems to work. To test that the 'failsafe' check really works you can comment out the whole block in Iso.pm that runs the check, then check the logs from gru and verify that it refused to download the file.

@AdamWill (Author):

Small note on this: it doesn't explicitly handle the case of someone passing complete garbage as the URL. However, this should be handled OK, because we use Mojo's own download code, and it won't be able to download anything we can't parse a host out of. In practice if you set a garbage URL it will fail the check anyway, as $host winds up uninitialized, and it won't pass the regex match.

@okurz (Member) commented Feb 13, 2016

LGTM

# subroutine as parse_assets_from_settings
my $assettype = asset_type_from_setting($short);
# We're only going to allow downloading of asset types
die "_URL downloading only allowed for asset types! $short is not an asset type" unless ($assettype);
On the hunk above, a contributor commented:

I am not sure about dying here. I think this will kill the Scheduler DBUS handler and it will refuse to schedule new isos until it is restarted. Sadly we don't have any text feedback from the scheduler back to the webapi, so I would just return an empty array to signal that no job was created.

@aaannz (Contributor) commented Feb 13, 2016

One comment, the rest LGTM.

@AdamWill (Author):

Thank you! I didn't think to check that. I will check it out and change as appropriate.

@AdamWill (Author):

There are several other points where we die in Scheduler.pm, including another in job_schedule_iso:

try {
    schema->txn_do($coderef);
}
catch {
    my $error = shift;
    OpenQA::Utils::log_debug("rollback job_schedule_iso: $error");
    die "Rollback failed during failed job_schedule_iso: $error"
      if ($error =~ /Rollback failed/);
    @ids = ();
};

so does that mean those will also kill the scheduler entirely?

@AdamWill (Author):

Still, how about this: just log the issue as a warning and continue with the next param. This is more or less equivalent to what we do in other failure cases on this path; we just go ahead and run the jobs. Perhaps someone's using a custom param that happens to end in _URL or something.
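
A sketch of that warn-and-skip approach (not the final code; it reuses the log_debug call already seen on this path):

# instead of dying, skip non-asset _URL params with a log message
my $assettype = asset_type_from_setting($short);
unless ($assettype) {
    OpenQA::Utils::log_debug(
        "_URL downloading only allowed for asset types! $short is not an asset type");
    next;
}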

@AdamWill force-pushed the download-generic branch 3 times, most recently from 1a4b3d9 to 8714013 on February 16, 2016 at 17:15
@aaannz (Contributor) commented Feb 17, 2016

LGTM, what about the coverage? Let's punish @AdamWill with some more test writing? I'm not completely familiar with how coveralls computes test coverage. Maybe expanding the scheduler tests with one that checks whether a GRU task is created in the db for an iso scheduled with ISO_URL would be sufficient to recover that 0.2% drop.

@okurz (Member) commented Feb 17, 2016

this coverage is not computed by coveralls but by "cover". I wouldn't call it punishment. Yes, I recommend extending the tests, i.e. @AdamWill, are you willing to do this?

@AdamWill (Author):

Didn't we go over this before? AFAICS there is currently no test code that hits job_schedule_iso, so this is not really a case of 'just add a new test that covers the new bits' but 'add a whole new type of test we are not currently doing'. I can try just simple-mindedly throwing a test in 04-scheduler.t or something which hits job_schedule_iso, I guess...

one problem I see immediately is that the test database has no job templates, so I don't think _generate_jobs will generate any jobs, and so the #enqueue gru jobs block will be skipped because there will be no @ids at that point.

edit: hmm, I see there's some stuff in t/fixtures/04-products.pl which might work if we create a schema without skipping fixtures...I can play with it I guess.
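
Presumably something along these lines (a sketch; OpenQA::Test::Database is assumed to load the fixtures unless asked to skip them):

use OpenQA::Test::Database;

# create a test schema *with* fixtures, so the job templates from
# t/fixtures/04-products.pl exist and _generate_jobs can produce jobs
my $schema = OpenQA::Test::Database->new->create();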

@AdamWill (Author):

there, have some goddamn tests... they can't go in 04-scheduler because it has no fixtures (as per the previous comment) and you can't recreate the schema halfway through testing (I tried). They could be stuck into 05-scheduler-capabilities, but then the workers are gonna pick them up and it just seems icky; a new file feels right.

The force-pushed commit message repeats the PR description above, adding this paragraph on the download whitelist:

As suggested by @aaannz, this also adds whitelisting for asset
download domains. Allowing asset download from anywhere on the
internet adds a potential attack vector by compromising the
credentials (especially the API key / secret) of an admin:
an attacker could craft an ISO or other asset to try and break
out of a worker VM and attack the worker host (which may also
be the server, in a small deployment). We mitigate this with
the download_domains config option, which specifies the domains
from which assets can be downloaded; asset download requests
for files from any other domain will be rejected. The check is
done twice: when the API handles the iso POST request (with a
nice error message returned to the user if the check fails) and
also directly by download_asset, as a safety measure in case
someone somehow finds a way to bypass the API and schedule a GRU
task directly.

@okurz (Member) commented Feb 17, 2016

I would have preferred the tests to be written like

-# check we have no gru download tasks to start with
-my @tasks = $schema->resultset("GruTasks")->search({taskname => 'download_asset'});
-ok(scalar @tasks == 0);
is(scalar @tasks = $schema->resultset("GruTasks")->search({taskname => 'download_asset'}), 0,
    'check we have no gru download tasks to start with');

but better @aaannz just merges this before the swearing continues ;-)

@AdamWill (Author):

how about this: you merge this, then send a PR to rewrite them however you please, and I'll ack it ;)

aaannz pushed a commit that referenced this pull request on Feb 17, 2016: treat kernels / initrds as assets, allow download of all assets
@aaannz merged commit ac603b1 into os-autoinst:master on Feb 17, 2016
@aaannz (Contributor) commented Feb 17, 2016

Tada..!

# the file at that URL before running the job
my %downloads = ();
for my $arg (keys %args) {
    next unless ($arg =~ /_URL$/);
On the hunk above, a contributor commented:

this hit on a rather innocent URL in our tests - is there a way to trigger only on URLs you would also download?

@AdamWill (Author) replied:

I'm sorry, I'm not sure I quite get the question. Are you saying you happen to have an existing parameter which ends in _URL but isn't for downloading?

Hmm - I guess what we can do is reverse the ordering just below this chunk, so we check whether it's a downloadable type before we generate the shortened param, and complain less when it's not downloadable? Sound good?
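
A sketch of roughly that behaviour (hedged: the order of operations and the %downloads bookkeeping shape are illustrative guesses, using the asset_type_from_setting helper added by this PR):

my %downloads;
for my $arg (keys %args) {
    next unless $arg =~ /_URL$/;
    my $short     = substr($arg, 0, -4);    # 'KERNEL_URL' -> 'KERNEL'
    my $assettype = asset_type_from_setting($short);
    # not an asset type: ignore quietly instead of complaining
    next unless $assettype;
    $downloads{$args{$arg}} = [$assettype, $short];
}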
