treat kernels / initrds as assets, allow download of all assets #535

Merged
merged 1 commit into os-autoinst:master from download-generic on Feb 17, 2016

Conversation

@AdamWill (Contributor) commented Feb 4, 2016

this is to make it possible to do 'remote scheduling' (where
the scheduler runs on some machine which does not have access
to the openQA 'factory' directory) of jobs that boot using a
kernel and/or initrd and/or hdd image, just as the 'ISOURL'
feature made it possible for jobs that boot using an ISO. I had
to change a few other things to be happy with it, though.

The convention now is you can request openQA download for any
asset type, by adding _URL to the appropriate setting name. To
download the ISO set 'ISO_URL', to download a kernel set
'KERNEL_URL', and so on. The POST will error out if you try to
download something that is not an asset type. This is because
we can't really guess where we should store non-assets.

As with the old ISOURL code, if you set the normal setting
(ISO, KERNEL etc.), that value will be used as the filename of
the downloaded file, otherwise the original filename will be
split out from the URL and used.
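
For illustration, the filename rule can be sketched with Mojo::URL (this is not the actual openQA code, and the example URL is invented):

use Mojo::URL;

# KERNEL_URL requests the download; KERNEL, if set, names the local file
my %params = (KERNEL_URL => 'http://download.example.com/boot/vmlinuz-4.4');
my $name = $params{KERNEL}
  // Mojo::URL->new($params{KERNEL_URL})->path->parts->[-1];
# KERNEL is unset here, so $name is 'vmlinuz-4.4', taken from the URL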

Of course, in order to achieve the original goal, I had to make
kernels and initrds be treated as assets :) In garretraziel's
initial patch they were not. We have another good reason to
treat them as assets, though - it will allow us to clean them
up with the limit_assets gru task (this patch does not change
that yet, but it's necessary to make it possible). For now we
treat kernels and initrds as 'other' assets, they could be
given a new asset type instead if we think that's a good idea.
HDDs of course already have an asset type.

It seemed clean to factor out the 'figure out the asset type for
a given setting' code from parse_assets_from_settings, because we
need to do that same thing in job_schedule_iso now, to decide
where the asset should be downloaded to (we need to know its
type). So this adds 'asset_type_from_setting' to Utils, and
has both parse_assets_from_settings and job_schedule_iso use
it.
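
A minimal sketch of such a helper, assuming the mapping described in this PR (the real asset_type_from_setting in Utils may differ in detail):

# sketch only: map a setting name to an asset type, or '' for non-assets
sub asset_type_from_setting {
    my ($setting) = @_;
    return 'iso'   if $setting eq 'ISO';
    return 'hdd'   if $setting =~ /^HDD_\d$/;
    # kernels and initrds are treated as 'other' assets, as described above
    return 'other' if $setting eq 'KERNEL' || $setting eq 'INITRD';
    return '';     # not an asset type
}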

It's also necessary to define OTHER_DIR (like HDD_DIR and
ISO_DIR) in Common.pm so isotovideo can use it. The code for
attaching the appropriate path to the KERNEL and INITRD values
is almost identical to the code for ISO, but it's not easy to
reuse unfortunately.

Finally I added a bit more handling of potential error cases
in download_asset (renamed from download_iso) to try and avoid
triggering the Eternal Gru Failure Loop Of Pain (it now checks
if it can write to the target path, and catches errors when
moving the temporary file to the final location, since we found
out that was possible).

@aaannz (Contributor) commented Feb 4, 2016

I took a brief look and it looks fine (except for the failed tidy check :) ), but one thing worries me, and it has been a concern since the introduction of ISOURL.
Everyone with operator authorization or higher can post REST requests, including this one. So far, if someone's access credentials are compromised, they can mess with openQA but are shielded from the underlying system. With this (and ISOURL before it), it is very easy to craft a custom ISO, or now another type of asset, containing e.g. a qemu-kvm exploit disguised as some standard utility called by our test jobs, and thus to attack the openQA workers, which are usually firewalled from the outside network.
I would like to have a domain whitelist in e.g. openqa.ini so that openQA will not download files from all over the internet if such an event occurs.

@AdamWill (Author) commented Feb 4, 2016

That was something I considered with the ISOURL version, but I was kinda operating under the assumption that anyone who could post jobs would have write access somewhere on the box anyhow. The point about compromised credentials is a sensible one indeed, though. I can certainly add a whitelist, and perhaps we can leave it empty by default and require the admin to define it explicitly? Thanks for the suggestion.

@coolo (Contributor) commented Feb 5, 2016

sounds reasonable to me - if it comes with proper error handling. If you post a job and get a 404 in case you're not in the whitelist, you will get paranoid quickly :)

@AdamWill (Author) commented Feb 5, 2016

My plan was to have job_schedule_iso fail with an error message: a different error depending on whether the whitelist is configured but the requested URL is not in it, or the whitelist is not configured at all. IIRC errors from job_schedule_iso wind up in the HTTP return message, right?

@AdamWill (Author) commented Feb 5, 2016

So there's a slight fun thing there, which is that job_schedule_iso is in the scheduler and the scheduler has no config. So I figured, let's put it in the WebAPI ISO create method instead. I'm having trouble testing it as I'm at devconf with insanely slow wifi, so before I go any further, does this look conceptually OK to everyone, or should I have another think?

diff --git a/lib/OpenQA/WebAPI/Controller/API/V1/Iso.pm b/lib/OpenQA/WebAPI/Controller/API/V1/Iso.pm
index 8d9ff58..bdde767 100644
--- a/lib/OpenQA/WebAPI/Controller/API/V1/Iso.pm
+++ b/lib/OpenQA/WebAPI/Controller/API/V1/Iso.pm
@@ -43,6 +43,32 @@ sub create {
     my %up_params = map { uc $_ => $params->{$_} } keys %$params;
     # restore URL encoded /
     my %params = map { $_ => $up_params{$_} =~ s@%2F@/@gr } keys %up_params;
+
+    # check asset download params against domain whitelist
+    @okdomains = split(/ /, $self->app->config->{global}->{download_domains});
+    for my $param (keys %params) {
+        next unless ($param =~ /_URL$/);
+        # special error if asset download requested but no whitelisted domains
+        unless (@okdomains) {
+            my $error = "Asset download requested but no domains whitelisted! Set download_domains";
+            $self->app->log->debug("$param - $params{$param}");
+            $self->res->message($error);
+            return $self->rendered(403);
+        }
+        my $host = Mojo::URL->new($params{$param})->host;
+        my $ok = 0;
+        for my $okdomain (@okdomains) {
+            my $quoted = qr/$okdomain/;
+            $ok = 1 if ($host =~ /${quoted}$/);
+        }
+        next if ($ok);
+        # we only get here if we weren't OK, i.e. host is not whitelisted
+        my $error = "Asset download requested from non-whitelisted host $host";
+        $self->app->log->debug("$param - $params{$param}");
+        $self->res->message($error);
+        return $self->rendered(403);
+    }
+
     $self->emit_event('openqa_iso_create', \%params);

     my $ids = $ipc->scheduler('job_schedule_iso', \%params);

I think that regex is the appropriate way to validate - basically the URL's host must end with one of the strings in the whitelisted host list. So if you whitelisted fedoraproject.org, files from dl.fedoraproject.org and koji.fedoraproject.org would be OK, but fedoraproject.org.mordor.evil wouldn't be.
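
A standalone snippet shows the rule in action, using the same kind of suffix match as the diff above (the example URLs are invented):

use Mojo::URL;

# hosts ending in a whitelisted domain pass; lookalike domains do not
for my $url ('https://dl.fedoraproject.org/pub/boot.iso',
             'https://fedoraproject.org.mordor.evil/boot.iso') {
    my $host = Mojo::URL->new($url)->host;
    printf "%s: %s\n", $host,
      ($host =~ /fedoraproject\.org$/ ? 'allowed' : 'rejected');
}
# dl.fedoraproject.org: allowed
# fedoraproject.org.mordor.evil: rejected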

I'm not sure if it'd be worth including a warning above job_schedule_iso that anything calling it must validate the _URL params in this way, or making this a function in Util or something, or if we're just assuming that ISO scheduling is only ever going to be allowed to happen from a web API request?

@AdamWill (Author) commented Feb 5, 2016

So I managed to test the whitelisting stuff; it seems to work fine (with one small fix - ten points to whoever spots the error in the diff :>), so I've updated the pull request.

@coolo (Contributor) commented Feb 11, 2016

LGTM, but it's a bit complicated so I would like @aaannz to merge if ok

@AdamWill (Author):

Just rebased to master and fixed the openqa.ini comment text explaining the option to refer to fedoraproject.org.evil as an example 'bad' domain rather than foo.com.evil.

@AdamWill (Author):

...and added a bit more to the comment in the config file explaining that the option must be set to enable asset downloading.
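
For illustration, the option would look something like this in openqa.ini (the domain values here are examples only; the diff above reads the option from the [global] section as a space-separated list):

[global]
# space-separated list of domains assets may be downloaded from;
# asset downloading stays disabled until this is set
download_domains = fedoraproject.org opensuse.org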

@aaannz (Contributor) commented Feb 11, 2016

LGTM, one thought though. Am I overthinking this, or is it a problem that the check against whitelisted domains is done in the Iso controller and not in the gru job where the actual download happens?
Pro for the controller: we return an error immediately.
Pro for the gru job: if someone somehow manages to create a custom gru job, they can bypass this check.

@AdamWill (Author):

I didn't really think about doing it that way. At first I guess I assumed gru doesn't have the config file settings available to it, but now that I look, I think maybe it does, which would make that viable.

I agree with your assessment. We could possibly put a 'check whitelist' function in shared code somewhere (Util?) and have both callers read in the whitelist and run the check function? That way we can have both benefits without too much code duplication... WDYT?

@aaannz (Contributor) commented Feb 11, 2016

I didn't check whether GRU has access to the config. It does have access to some Mojo app, but I'm not certain how it all works together. I doubt, however, that WebAPI.pm::startup is run, so the config is probably not available.

So what I suggest is to move the whitelist check from the Iso controller to Util and merge this PR. Then, in another PR, move the config parsing from startup to Util (?), reuse it in GRU, and hook GRU up to the domain whitelist check.

@coolo (Contributor) commented Feb 11, 2016

if Gru doesn't have the config, how does it know it should or should not log debug?

@coolo (Contributor) commented Feb 11, 2016

actually, gru is a webapi plugin, so it's basically a webapi without the http stuff ;)

@aaannz (Contributor) commented Feb 11, 2016

So it does run startup and ignores routing? And starts another worker checker? Hmm... indeed. I just tried using a bogus auth plugin and GRU failed to run because it tried to require the non-existent module.

@aaannz (Contributor) commented Feb 11, 2016

I have a feeling there are some assumptions in WebAPI.pm which do not hold for GRU. :/

@coolo (Contributor) commented Feb 11, 2016

well, the gru service is 'gru run', which triggers a different run function than the one in Mojo::Base

@AdamWill (Author):

I'll investigate when I have a bit of spare time, I guess =)

@AdamWill (Author):

Whew, OK, maybe this is good enough now? :) It seems the download_asset sub does have access to the config; you can verify this easily enough by just having it log the setting and seeing that it does so successfully.

So this does what I proposed: the check code is moved to Util and both the ISO API controller and the download_asset sub use it; each handles only its own error reporting. Since the API controller has the entire params hash while download_asset only has a URL, I split the check code into two subs to give each an appropriate entry point. Technically speaking, at present, the bits unique to check_download_whitelist are only used by the API controller and so could just as well be inlined there (with only check_download_url shared), but it seems reasonable to keep them in Util in case we wind up needing to reuse them in some future case.
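
For reference, a rough sketch of how that split could look (the sub names come from this comment, but the bodies and return-value conventions are assumptions, not the exact openQA code):

use Mojo::URL;

# sketch: 0 if the URL is OK, 2 if no whitelist is configured at all,
# 1 if a whitelist exists but the URL's host is not on it
sub check_download_url {
    my ($url, $whitelist) = @_;
    my @okdomains = split(/ /, $whitelist // '');
    return 2 unless @okdomains;
    my $host = Mojo::URL->new($url)->host // '';
    for my $okdomain (@okdomains) {
        return 0 if $host =~ /\Q$okdomain\E$/;
    }
    return 1;
}

# sketch: run check_download_url over every _URL param in a settings hash
sub check_download_whitelist {
    my ($params, $whitelist) = @_;
    for my $param (keys %$params) {
        next unless $param =~ /_URL$/;
        my $status = check_download_url($params->{$param}, $whitelist);
        return ($status, $param, $params->{$param}) if $status;
    }
    return 0;
}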

I tested this, and it really seems to work. To test that the 'failsafe' check really works you can comment out the whole block in Iso.pm that runs the check, then check the logs from gru and verify that it refused to download the file.

@AdamWill (Author):

Small note on this: it doesn't explicitly handle the case of someone passing complete garbage as the URL. However, this should be handled OK, because we use Mojo's own download code, and it won't be able to download anything we can't parse a host out of. In practice if you set a garbage URL it will fail the check anyway, as $host winds up uninitialized, and it won't pass the regex match.

@okurz (Member) commented Feb 13, 2016

LGTM

# subroutine as parse_assets_from_settings
my $assettype = asset_type_from_setting($short);
# We're only going to allow downloading of asset types
die "_URL downloading only allowed for asset types! $short is not an asset type" unless ($assettype);
On the hunk above, a contributor commented:

I am not sure about dying here. I think this will kill the Scheduler DBUS handler and it will refuse to schedule new isos until it is restarted. Sadly we don't have any text feedback from the scheduler back to the webapi, so I would just return an empty array to signal that no job was created.

@aaannz (Contributor) commented Feb 13, 2016

One comment, the rest LGTM.

@AdamWill (Author):

Thank you! I didn't think to check that. I will check it out and change as appropriate.

@AdamWill (Author):

There are several other points where we die in Scheduler.pm, including another in job_schedule_iso:

try {
    schema->txn_do($coderef);
}
catch {
    my $error = shift;
    OpenQA::Utils::log_debug("rollback job_schedule_iso: $error");
    die "Rollback failed during failed job_schedule_iso: $error"
      if ($error =~ /Rollback failed/);
    @ids = ();
};

so does that mean those will also kill the scheduler entirely?

@AdamWill (Author):

Still, how about this: just log the issue as a warning and continue with the next param. This is more or less equivalent to what we do in other failure cases on this path; we just go ahead and run the jobs. Perhaps someone's using a custom param that happens to end in _URL or something.
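
A sketch of that warn-and-skip approach (not the final code; it reuses the log_debug call already seen on this path):

# instead of dying, skip non-asset _URL params with a log message
my $assettype = asset_type_from_setting($short);
unless ($assettype) {
    OpenQA::Utils::log_debug(
        "_URL downloading only allowed for asset types! $short is not an asset type");
    next;
}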

@AdamWill force-pushed the download-generic branch 3 times, most recently from 1a4b3d9 to 8714013 on February 16, 2016 at 17:15
@aaannz (Contributor) commented Feb 17, 2016

LGTM, what about the coverage? Let's punish @AdamWill with some more test writing? I'm not completely familiar with how coveralls computes test coverage. Maybe expanding the scheduler tests with one that checks whether a GRU task is created in the db for an iso scheduled with ISO_URL would be sufficient to recover that 0.2% drop.

@okurz (Member) commented Feb 17, 2016

this coverage is not computed by coveralls but by "cover". I wouldn't call it punishment. Yes, I recommend extending the tests, i.e. @AdamWill, are you willing to do this?

@AdamWill (Author):

Didn't we go over this before? AFAICS there is currently no test code that hits job_schedule_iso, so this is not really a case of 'just add a new test that covers the new bits' but 'add a whole new type of test we are not currently doing'. I can try just simple-mindedly throwing a test in 04-scheduler.t or something which hits job_schedule_iso, I guess...

one problem I see immediately is that the test database has no job templates, so I don't think _generate_jobs will generate any jobs, and so the #enqueue gru jobs block will be skipped because there will be no @ids at that point.

edit: hmm, I see there's some stuff in t/fixtures/04-products.pl which might work if we create a schema without skipping fixtures...I can play with it I guess.
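
Presumably something along these lines (a sketch; OpenQA::Test::Database is assumed to load the fixtures unless asked to skip them):

use OpenQA::Test::Database;

# create a test schema *with* fixtures, so the job templates from
# t/fixtures/04-products.pl exist and _generate_jobs can produce jobs
my $schema = OpenQA::Test::Database->new->create();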

@AdamWill (Author):

there, have some goddamn tests... they can't go in 04-scheduler because it has no fixtures (as per the previous comment) and you can't recreate the schema halfway through testing (I tried). They could be stuck into 05-scheduler-capabilities, but then the workers are gonna pick them up and it just seems icky; a new file feels right.

The force-pushed commit message repeats the PR description above, adding this paragraph on the download whitelist:

As suggested by @aaannz, this also adds whitelisting for asset
download domains. Allowing asset download from anywhere on the
internet adds a potential attack vector by compromising the
credentials (especially the API key / secret) of an admin:
an attacker could craft an ISO or other asset to try and break
out of a worker VM and attack the worker host (which may also
be the server, in a small deployment). We mitigate this with
the download_domains config option, which specifies the domains
from which assets can be downloaded; asset download requests
for files from any other domain will be rejected. The check is
done twice: when the API handles the iso POST request (with a
nice error message returned to the user if the check fails) and
also directly by download_asset, as a safety measure in case
someone somehow finds a way to bypass the API and schedule a GRU
task directly.

@okurz (Member) commented Feb 17, 2016

I would have preferred the tests to be written like

-# check we have no gru download tasks to start with
-my @tasks = $schema->resultset("GruTasks")->search({taskname => 'download_asset'});
-ok(scalar @tasks == 0);
is(scalar @tasks = $schema->resultset("GruTasks")->search({taskname => 'download_asset'}), 0,
    'check we have no gru download tasks to start with');

but better @aaannz just merges this before the swearing continues ;-)

@AdamWill (Author):

how about this: you merge this, then send a PR to rewrite them however you please, and I'll ack it ;)

aaannz pushed a commit that referenced this pull request on Feb 17, 2016: treat kernels / initrds as assets, allow download of all assets
@aaannz merged commit ac603b1 into os-autoinst:master on Feb 17, 2016
@aaannz (Contributor) commented Feb 17, 2016

Tada..!

# the file at that URL before running the job
my %downloads = ();
for my $arg (keys %args) {
    next unless ($arg =~ /_URL$/);
On the hunk above, a contributor commented:

this hit on a rather innocent URL in our tests - is there a way to trigger only on URLs you would also download?

@AdamWill (Author) replied:

I'm sorry, I'm not sure I quite get the question. Are you saying you happen to have an existing parameter which ends in _URL but isn't for downloading?

Hmm - I guess what we can do is reverse the ordering just below this chunk, so we check whether it's a downloadable type before we generate the shortened param, and complain less when it's not downloadable? Sound good?
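
A sketch of roughly that behaviour (hedged: the order of operations and the %downloads bookkeeping shape are illustrative guesses, using the asset_type_from_setting helper added by this PR):

my %downloads;
for my $arg (keys %args) {
    next unless $arg =~ /_URL$/;
    my $short     = substr($arg, 0, -4);    # 'KERNEL_URL' -> 'KERNEL'
    my $assettype = asset_type_from_setting($short);
    # not an asset type: ignore quietly instead of complaining
    next unless $assettype;
    $downloads{$args{$arg}} = [$assettype, $short];
}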
