Treat kernels / initrds as assets, allow download of all assets
this is to make it possible to do 'remote scheduling' (where
the scheduler runs on some machine which does not have access
to the openQA 'factory' directory) of jobs that boot using a
kernel and/or initrd and/or hdd image, just as the 'ISOURL'
feature made it possible for jobs that boot using an ISO. I had
to change a few other things to be happy with it, though.

The convention now is that you can ask openQA to download any
asset type by adding _URL to the appropriate setting name: to
download the ISO set 'ISO_URL', to download a kernel set
'KERNEL_URL', and so on. The POST will error out if you try to
download something that is *not* an asset type, because we
can't really guess where we should store non-assets.

As with the old ISOURL code, if you set the normal setting
(ISO, KERNEL etc.), that value will be used as the filename of
the downloaded file, otherwise the original filename will be
split out from the URL and used.
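To illustrate the filename fallback: the real code uses Mojo::URL->new($url)->path->parts->[-1], but the idea boils down to this standalone sketch (filename_from_url is a hypothetical helper name, and the plain regex is a simplified stand-in for the Mojo::URL call):

```perl
use strict;
use warnings;

# Simplified stand-in for Mojo::URL->new($url)->path->parts->[-1]:
# take the last path segment of the URL as the asset filename,
# stripping any query string or fragment first.
sub filename_from_url {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//;    # drop query string / fragment
    my ($name) = $path =~ m{/([^/]+)$};
    return $name;
}

# e.g. with ISO_URL set but ISO unset, ISO becomes the URL's filename
print filename_from_url('https://dl.fedoraproject.org/pub/Fedora-Live.iso'), "\n";
# prints 'Fedora-Live.iso'
```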

Of course, in order to achieve the original goal, I had to make
kernels and initrds be treated as assets :) In garretraziel's
initial patch they were not. We have another good reason to
treat them as assets, though - it will allow us to clean them
up with the limit_assets gru task (this patch does not change
that yet, but it's necessary to make it possible). For now we
treat kernels and initrds as 'other' assets, they could be
given a new asset type instead if we think that's a good idea.
HDDs of course already have an asset type.

It seemed cleanest to factor out the 'figure out the asset
type for a given setting' code from parse_assets_from_settings,
because we now need to do the same thing in job_schedule_iso to
decide where the asset should be downloaded to (we need to know
its type). So this adds 'asset_type_from_setting' to Utils and
has both parse_assets_from_settings and job_schedule_iso use
it.
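The shared mapping boils down to this (a condensed restatement of the new Utils sub as added by this commit):

```perl
use strict;
use warnings;

# Condensed restatement of asset_type_from_setting: map a setting
# name to its asset type, or '' if it is not an asset setting.
sub asset_type_from_setting {
    my ($setting) = @_;
    return 'iso'   if $setting eq 'ISO' || $setting =~ /^ISO_\d$/;
    return 'hdd'   if $setting =~ /^HDD_\d$/;
    return 'repo'  if $setting =~ /^REPO_\d$/;
    return 'other' if $setting =~ /^ASSET_\d$/ || $setting eq 'KERNEL' || $setting eq 'INITRD';
    return '';
}

print asset_type_from_setting('KERNEL'), "\n";    # prints 'other'
print asset_type_from_setting('ARCH') || 'not an asset', "\n";
```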

It's also necessary to define OTHER_DIR (like HDD_DIR and
ISO_DIR) in Common.pm so isotovideo can use it. The code for
attaching the appropriate path to the KERNEL and INITRD values
is almost identical to the code for ISO, but it's not easy to
reuse unfortunately.

As suggested by @aaannz, this also adds whitelisting for asset
download domains. Allowing asset download from anywhere on the
internet adds a potential attack vector via the compromised
credentials (especially the API key / secret) of an admin: an
attacker could craft an ISO or other asset to try to break out
of a worker VM and attack the worker host (which may also be
the server, in a small deployment). We mitigate this with
the download_domains config option, which specifies the domains
from which assets can be downloaded; asset download requests
for files from any other domain will be rejected. The check is
done twice: when the API handles the iso POST request (with a
nice error message returned to the user if the check fails) and
also directly by download_asset, as a safety measure in case
someone somehow finds a way to bypass the API and schedule a GRU
task directly.
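The whitelist check itself is a hostname suffix match. This standalone sketch captures the idea (host_allowed is a hypothetical name; the real check_download_url additionally distinguishes 'no whitelist at all' from 'host not whitelisted', and this version adds \Q...\E to neutralize regex metacharacters in the configured domains):

```perl
use strict;
use warnings;

# Sketch of the download_domains check: the URL's host passes only
# if it ends with one of the whitelisted domains; an undefined or
# empty whitelist means asset download is disabled entirely.
sub host_allowed {
    my ($host, $whitelist) = @_;
    my @okdomains = split / /, ($whitelist // '');
    return 0 unless @okdomains;
    for my $okdomain (@okdomains) {
        return 1 if $host =~ /\Q$okdomain\E$/;
    }
    return 0;
}

my $wl = 'fedoraproject.org opensuse.org';
print host_allowed('dl.fedoraproject.org',   $wl) ? "allowed\n" : "blocked\n";    # allowed
print host_allowed('fedoraproject.org.evil', $wl) ? "allowed\n" : "blocked\n";    # blocked
```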

Finally I added a bit more handling of potential error cases
in download_asset (renamed from download_iso) to try and avoid
triggering the Eternal Gru Failure Loop Of Pain (it now checks
if it can write to the target path, and catches errors when
moving the temporary file to the final location, since we found
out that was possible).
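The defensive pattern is roughly this (a sketch under assumptions: place_download is a hypothetical name, and it uses eval plus core File::Copy instead of the Try::Tiny the real code uses, so it is self-contained). The key point is that a Gru task must log and return on failure rather than die, since dying makes Gru retry the task forever:

```perl
use strict;
use warnings;
use File::Copy 'move';
use File::Spec::Functions 'splitpath';

# Sketch of download_asset's error handling: refuse early if the
# target directory is not writable, and catch failure when moving
# the downloaded file into place, warning instead of dying.
sub place_download {
    my ($tmpfile, $dlpath) = @_;
    my $dldir = (splitpath($dlpath))[1];
    unless (-w $dldir) {
        warn "download_asset: cannot write to $dldir\n";
        return 0;
    }
    my $ok = eval { move($tmpfile, $dlpath) or die "$!\n"; 1 };
    unless ($ok) {
        warn "Error renaming temporary file to $dlpath: $@";
        return 0;
    }
    return 1;
}
```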
AdamWill committed Feb 17, 2016
1 parent 610dd74 commit c7699c7
Showing 9 changed files with 294 additions and 65 deletions.
9 changes: 9 additions & 0 deletions etc/openqa/openqa.ini
@@ -11,6 +11,15 @@
## space separated list of IPs or regular expressions that match IPs
#allowed_hosts = 127.0.0.1 ::1

## space separated list of domains from which asset download with
## _URL params is allowed. Matched at the end of the hostname in
## the URL. With the example values below, downloads from opensuse.org,
## dl.fedoraproject.org, and a.b.c.opensuse.org are allowed; downloads
## from moo.net, dl.opensuse and fedoraproject.org.evil are not.
## Default is undefined, meaning asset download is *disabled*; you
## must set this option to enable it.
#download_domains = fedoraproject.org opensuse.org

## set if you have a local repo mirror
#suse_mirror = http://FIXME

71 changes: 45 additions & 26 deletions lib/OpenQA/Scheduler/Scheduler.pm
@@ -30,7 +30,7 @@ use Data::Dump qw/dd pp/;
use Date::Format qw/time2str/;
use DBIx::Class::Timestamps qw/now/;
use DateTime;
use File::Spec::Functions qw/catfile/;
use File::Spec::Functions qw/catfile catdir/;
use File::Temp qw/tempdir/;
use Mojo::URL;
use Mojo::Util 'url_unescape';
@@ -41,7 +41,7 @@ use OpenQA::Schema::Result::JobDependencies;
use FindBin;
use lib $FindBin::Bin;
#use lib $FindBin::Bin.'Schema';
use OpenQA::Utils qw/log_debug parse_assets_from_settings/;
use OpenQA::Utils qw/log_debug parse_assets_from_settings asset_type_from_setting/;
use db_helpers qw/rndstr/;

use OpenQA::IPC;
@@ -1179,23 +1179,39 @@ sub job_schedule_iso {
asset_register(%$a);
}
my $noobsolete = delete $args{_NOOBSOLETEBUILD};
# ISOURL == ISO download. If the ISO already exists, skip the
# download step (below) entirely by leaving $isodlpath unset.
my $isodlpath;
if ($args{ISOURL}) {
# Any arg name ending in _URL is special: it tells us to download
# the file at that URL before running the job
my %downloads = ();
for my $arg (keys %args) {
next unless ($arg =~ /_URL$/);
# As this comes in from an API call, URL will be URI-encoded
# This obviously creates a vuln if untrusted users can POST ISOs
$args{ISOURL} = url_unescape($args{ISOURL});
# set $args{ISO} to the URL filename if we only got ISOURL.
# This has to happen *before* _generate_jobs so the jobs have
# ISO set
if (!$args{ISO}) {
$args{ISO} = Mojo::URL->new($args{ISOURL})->path->parts->[-1];
# This obviously creates a vuln if untrusted users can POST
$args{$arg} = url_unescape($args{$arg});
my $url = $args{$arg};
# if $args{FOO_URL} is set but $args{FOO} is not, we will
# set $args{FOO} (the filename of the downloaded asset) to
# the URL filename. This has to happen *before*
# generate_jobs so the jobs have FOO set
my $short = substr($arg, 0, -4);
if (!$args{$short}) {
$args{$short} = Mojo::URL->new($url)->path->parts->[-1];
}
# full path to download target location
my $fulliso = catfile($OpenQA::Utils::isodir, $args{ISO});
unless (-s $fulliso) {
$isodlpath = $fulliso;
# full path to download target location. We need to guess
# the asset type to know where to put it, using the same
# subroutine as parse_assets_from_settings
my $assettype = asset_type_from_setting($short);
# We're only going to allow downloading of asset types
unless ($assettype) {
OpenQA::Utils::log_warning("_URL downloading only allowed for asset types! $short is not an asset type");
next;
}
my $dir = catdir($OpenQA::Utils::assetdir, $assettype);
my $fullpath = catfile($dir, $args{$short});

unless (-s $fullpath) {
# if the file doesn't exist, add the url/target path
# as a key/value pair to the %downloads hash
$downloads{$url} = $fullpath;
}
}
my $jobs = _generate_jobs(%args);
@@ -1263,18 +1279,21 @@ sub job_schedule_iso {
};

# enqueue gru jobs
if ($isodlpath and @ids) {
if (%downloads and @ids) {
# array of hashrefs job_id => id; this is what create needs
# to create entries in a related table (gru_dependencies)
my @jobsarray = map +{job_id => $_}, @ids;
schema->resultset('GruTasks')->create(
{
taskname => 'download_iso',
priority => 20,
args => [$args{ISOURL}, $isodlpath],
run_at => now(),
jobs => \@jobsarray,
});
for my $url (keys %downloads) {
my $path = $downloads{$url};
schema->resultset('GruTasks')->create(
{
taskname => 'download_asset',
priority => 20,
args => [$url, $path],
run_at => now(),
jobs => \@jobsarray,
});
}
}
schema->resultset('GruTasks')->create(
{
54 changes: 46 additions & 8 deletions lib/OpenQA/Schema/Result/Assets.pm
@@ -22,6 +22,8 @@ use OpenQA::Utils;
use OpenQA::Scheduler::Scheduler 'job_notify_workers';
use Date::Format;
use Mojo::UserAgent;
use File::Spec::Functions 'splitpath';
use Try::Tiny;

use db_helpers;

@@ -114,27 +116,63 @@ sub ensure_size {

# A GRU task...arguments are the URL to grab and the full path to save
# it in. scheduled in job_schedule_iso()
sub download_iso {
my ($app, $args) = @_;
my ($url, $isodlpath) = @{$args};
sub download_asset {
my ($app, $args) = @_;
my ($url, $dlpath) = @{$args};
# Bail if the dest file exists (in case multiple downloads of same ISO
# are scheduled)
return if (-e $isodlpath);
return if (-e $dlpath);

OpenQA::Utils::log_debug("Downloading " . $url . " to " . $isodlpath . "...");
my $dldir = (splitpath($dlpath))[1];
unless (-w $dldir) {
OpenQA::Utils::log_error("download_asset: cannot write to $dldir");
# we're not going to die because this is a gru task and we don't
# want to cause the Endless Gru Loop Of Despair, just return and
# let the jobs fail
job_notify_workers();
return;
}

# check URL is whitelisted for download. this should never fail;
# if it does, it means this task has been created without going
# through the ISO API controller, and that means either a code
# change we didn't think through or someone being evil
my @check = check_download_url($url, $app->config->{global}->{download_domains});
if (@check) {
my ($status, $host) = @check;
if ($status == 2) {
OpenQA::Utils::log_error("download_asset: no hosts are whitelisted for asset download!");
}
else {
OpenQA::Utils::log_error("download_asset: URL $url host $host is blacklisted!");
}
OpenQA::Utils::log_error("**API MAY HAVE BEEN BYPASSED TO CREATE THIS TASK!**");
job_notify_workers();
return;
}

OpenQA::Utils::log_debug("Downloading " . $url . " to " . $dlpath . "...");
my $ua = Mojo::UserAgent->new(max_redirects => 5);
my $tx = $ua->build_tx(GET => $url);
# Allow >16MiB downloads
# http://mojolicio.us/perldoc/Mojolicious/Guides/Cookbook#Large-file-downloads
$tx->res->max_message_size(0);
$tx = $ua->start($tx);
if ($tx->success) {
$tx->res->content->asset->move_to($isodlpath);
try {
$tx->res->content->asset->move_to($dlpath);
}
catch {
# again, we're trying not to die here, but log and return on fail
OpenQA::Utils::log_error("Error renaming temporary file to $dlpath: $_");
job_notify_workers();
return;
};
}
else {
# Clean up after ourselves. Probably won't exist, but just in case
OpenQA::Utils::log_debug("Downloading failed! Deleting files");
unlink($isodlpath);
OpenQA::Utils::log_error("Downloading failed! Deleting files");
unlink($dlpath);
}
# We want to notify workers either way: if we failed to download the ISO,
# we want the jobs to run and fail.
100 changes: 86 additions & 14 deletions lib/OpenQA/Utils.pm
@@ -4,6 +4,7 @@ require 5.002;

use Carp;
use IPC::Run();
use Mojo::URL;

require Exporter;
our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);
@@ -27,6 +28,9 @@ $VERSION = sprintf "%d.%03d", q$Revision: 1.12 $ =~ /(\d+)/g;
&parse_assets_from_settings
&bugurl
&bugref_to_href
&asset_type_from_setting
&check_download_url
&check_download_whitelist
);


@@ -230,25 +234,32 @@ sub commit_git {
return 1;
}

sub asset_type_from_setting {
my ($setting) = @_;
if ($setting eq 'ISO' || $setting =~ /^ISO_\d$/) {
return 'iso';
}
if ($setting =~ /^HDD_\d$/) {
return 'hdd';
}
if ($setting =~ /^REPO_\d$/) {
return 'repo';
}
if ($setting =~ /^ASSET_\d$/ || $setting eq 'KERNEL' || $setting eq 'INITRD') {
return 'other';
}
# empty string if this doesn't look like an asset type
return '';
}

sub parse_assets_from_settings {
my ($settings) = (@_);
my $assets = {};

for my $k (keys %$settings) {
if ($k eq 'ISO') {
$assets->{$k} = {type => 'iso', name => $settings->{$k}};
}
if ($k =~ /^ISO_\d$/) {
$assets->{$k} = {type => 'iso', name => $settings->{$k}};
}
if ($k =~ /^HDD_\d$/) {
$assets->{$k} = {type => 'hdd', name => $settings->{$k}};
}
if ($k =~ /^REPO_\d$/) {
$assets->{$k} = {type => 'repo', name => $settings->{$k}};
}
if ($k =~ /^ASSET_\d$/) {
$assets->{$k} = {type => 'other', name => $settings->{$k}};
my $type = asset_type_from_setting($k);
if ($type) {
$assets->{$k} = {type => $type, name => $settings->{$k}};
}
}

@@ -274,5 +285,66 @@ sub bugref_to_href {
return $text;
}

sub check_download_url {
# Passed a URL and the download_domains whitelist from openqa.ini.
# Checks if the host of the URL is in the whitelist. Returns an
# array: (1, host) if there is a whitelist and the host is not in
# it, (2, host) if there is no whitelist, and () if we pass. This
# is used by check_download_whitelist below (and so indirectly by
# the Iso controller) and directly by the download_asset() Gru
# task subroutine.
my ($url, $whitelist) = @_;
my @okdomains;
if (defined $whitelist) {
@okdomains = split(/ /, $whitelist);
}
my $host = Mojo::URL->new($url)->host;
unless (@okdomains) {
return (2, $host);
}
my $ok = 0;
for my $okdomain (@okdomains) {
my $quoted = qr/$okdomain/;
$ok = 1 if ($host =~ /${quoted}$/);
}
if ($ok) {
return ();
}
else {
return (1, $host);
}
}

sub check_download_whitelist {
# Passed the params hash ref for a job and the download_domains
# whitelist read from openqa.ini. Checks that all params ending
# in _URL (i.e. requesting asset download) specify URLs that are
# whitelisted. It's provided here so that we can run the check
# twice, once to return immediately and conveniently from the Iso
# controller, once again directly in the Gru asset download sub
# just in case someone somehow manages to bypass the API and
# create a gru task directly. On failure, returns an array of 4
# items: the first is 1 if there was a whitelist at all or 2 if
# there was not, the second is the name of the param for which the
# check failed, the third is the URL, and the fourth is the host.
# On success, returns an empty array.

my ($params, $whitelist) = @_;
my @okdomains;
if (defined $whitelist) {
@okdomains = split(/ /, $whitelist);
}
for my $param (keys %$params) {
next unless ($param =~ /_URL$/);
my $url = $$params{$param};
my @check = check_download_url($url, $whitelist);
next unless (@check);
# if we get here, we got a failure
return ($check[0], $param, $url, $check[1]);
}
# empty list signals caller that check passed
return ();
}

1;
# vim: set sw=4 et:
29 changes: 15 additions & 14 deletions lib/OpenQA/WebAPI.pm
@@ -37,14 +37,15 @@ sub _read_config {

my %defaults = (
global => {
appname => 'openQA',
base_url => undef,
branding => 'openSUSE',
allowed_hosts => undef,
suse_mirror => undef,
scm => undef,
hsts => 365,
audit_enabled => 1,
appname => 'openQA',
base_url => undef,
branding => 'openSUSE',
allowed_hosts => undef,
download_domains => undef,
suse_mirror => undef,
scm => undef,
hsts => 365,
audit_enabled => 1,
},
auth => {
method => 'OpenID',
@@ -518,12 +519,12 @@ sub startup {
## JSON API ends here
#

$self->gru->add_task(optipng => \&OpenQA::Schema::Result::Jobs::optipng);
$self->gru->add_task(reduce_result => \&OpenQA::Schema::Result::Jobs::reduce_result);
$self->gru->add_task(limit_assets => \&OpenQA::Schema::Result::Assets::limit_assets);
$self->gru->add_task(download_iso => \&OpenQA::Schema::Result::Assets::download_iso);
$self->gru->add_task(scan_old_jobs => \&OpenQA::Schema::Result::Needles::scan_old_jobs);
$self->gru->add_task(scan_needles => \&OpenQA::Schema::Result::Needles::scan_needles);
$self->gru->add_task(optipng => \&OpenQA::Schema::Result::Jobs::optipng);
$self->gru->add_task(reduce_result => \&OpenQA::Schema::Result::Jobs::reduce_result);
$self->gru->add_task(limit_assets => \&OpenQA::Schema::Result::Assets::limit_assets);
$self->gru->add_task(download_asset => \&OpenQA::Schema::Result::Assets::download_asset);
$self->gru->add_task(scan_old_jobs => \&OpenQA::Schema::Result::Needles::scan_old_jobs);
$self->gru->add_task(scan_needles => \&OpenQA::Schema::Result::Needles::scan_needles);

# start workers checker
$self->_workers_checker;
