Merge branch 'master' into flask_prune_reloads
karlcz committed Sep 5, 2023
2 parents cab56ca + 13b8881 commit 1bfa23c
Showing 13 changed files with 435 additions and 79 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/main.yml
@@ -46,11 +46,13 @@ jobs:
sudo pip3 install pycryptodome
- name: Install webauthn
run: |
sudo pip3 install -U pyopenssl cryptography
sudo pip3 install pycryptodome
sudo useradd -m -r webauthn
sudo su -c '/usr/bin/python3 -c "import sys;import pprint;pprint.pprint(sys.path)"' - webauthn
git clone https://github.com/informatics-isi-edu/webauthn.git
cd webauthn
git checkout origin/flask
git checkout origin/master
sudo make testvars
sudo make install
sudo make deploy
272 changes: 272 additions & 0 deletions docs/CONFIG.md
@@ -0,0 +1,272 @@

# Hatrac Configuration Manual

The service is configured with a `hatrac_config.json` file in the daemon account home directory, e.g. `/home/hatrac/hatrac_config.json` on typical deployments.

Some basic examples are included in the [Hatrac Installation document](INSTALL.md). This document is meant as a more complete specification of the configurable parameters.

## Core Service Config

The top-level JSON document has options which affect the service as a
whole:

```
{
"service_prefix": <URL path prefix string>,
"database_dsn": <database connection DSN string>,
"max_request_payload_size": <integer (default 134217728)>,
"firewall_acls": { <aclname>: <acl>, ... },
"read_only": <boolean>,
"storage_backend": <backend name string>,
"error_templates": { <error response template map...> },
...
}
```

### `service_prefix`

The service's URL path prefix, e.g. `"/hatrac"`. By default it is auto-configured from the WSGI environment.

This is a *developer* option which should not be used in practice.

### `database_dsn`

The connection string used when opening the service database via the `psycopg2` database API.

A typical value for a single-host deployment would be `"dbname=hatrac"`. In a more complex deployment, this might include remote database server addresses or other connection options.

### `max_request_payload_size`

An integer byte count. The default is 134217728, i.e. 128 MiB.

This policy setting limits the size of the object payload that a client may send to the service in one request. Requests exceeding this size are rejected with an HTTP `413` error code. To create larger objects, a client must use the chunked upload job feature to send the object content as a sequence of smaller chunk requests.
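
As an illustration only, the size policy can be sketched as below. The function name `check_payload_size` and the idea of returning a bare status code are assumptions made for this example; they are not taken from the Hatrac source.

```python
# Hypothetical sketch of the payload-size policy; not Hatrac's actual code.
DEFAULT_MAX_PAYLOAD = 128 * 1024 * 1024  # 134217728 bytes, i.e. 128 MiB


def check_payload_size(content_length, config):
    """Return 413 if the declared payload exceeds the limit, else 200."""
    limit = config.get("max_request_payload_size", DEFAULT_MAX_PAYLOAD)
    # Only payloads *exceeding* the limit are rejected; an exact match passes.
    return 413 if content_length > limit else 200
```

A client that receives `413` would switch to the chunked upload job feature rather than retry the same request.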

### `firewall_acls`

A mapping of predefined ACL names to access control lists. Default configuration:

```
{
"firewall_acls": {
"create": ["*"],
"delete": ["*"],
"manage_acls": ["*"],
"manage_metadata": ["*"]
}
}
```

These predefined ACL names affect the following kinds of request:
- `create`: PUT of namespaces, PUT of objects or new object versions, POST of chunked upload jobs
- `delete`: DELETE of namespaces, objects, and object-versions
- `manage_acls`: PUT or DELETE of ACL sub-resources
- `manage_metadata`: PUT or DELETE of metadata sub-resources

The firewall ACLs are an additional, service-wide authorization step that requests must pass in addition to the fine-grained ACLs configured within the hierarchical namespace. This gives the service operator an option to withdraw some of the self-service privileges that would otherwise be granted to clients who upload content. So, even though a client might be an "owner" of an object or namespace sub-tree, the firewall ACLs might require that they also belong to a special curator group in order to further modify state.
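
The two-step authorization described above might be sketched as follows. The function `request_allowed` and its parameters are hypothetical names for this example; the only detail borrowed from the service code is that the wildcard `"*"` is added to the client's match attributes.

```python
# Hypothetical sketch: a request must pass BOTH the service-wide firewall ACL
# and the fine-grained ACL stored on the namespace or object.
def request_allowed(aclname, firewall_acls, resource_acl, client_attributes):
    # Every client implicitly matches the wildcard "*".
    attrs = set(client_attributes) | {"*"}
    passes_firewall = not attrs.isdisjoint(firewall_acls.get(aclname, ["*"]))
    passes_resource = not attrs.isdisjoint(resource_acl)
    return passes_firewall and passes_resource
```

With `{"delete": ["curators"]}` as the firewall, an owner named in the resource ACL is still refused unless they also carry the `curators` attribute.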

The default is used incrementally to supply any missing firewall ACL in the case that the service configuration sparsely populates the set of ACL names.

### `read_only`

When `true`, changes the default `firewall_acls` ACL content from `["*"]` to `[]`.

This backwards-compatibility feature translates the legacy `read_only` configuration field as a short-hand to supply all firewall ACLs with empty lists, approximating the old feature which blocked all mutation requests with one boolean setting. However, this translation only affects the default ACL value supplied for unconfigured firewall ACL names. In a mixed configuration, the `read_only` option will have no effect on firewall ACLs that are populated in the configuration file.
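
A minimal sketch of how the defaults interact with `read_only` in a mixed configuration. The helper name `effective_firewall_acls` is an assumption for this example, and the ACL names are the four documented above.

```python
# Hypothetical sketch of default-filling for sparsely populated firewall ACLs.
ACL_NAMES = ["create", "delete", "manage_acls", "manage_metadata"]


def effective_firewall_acls(config):
    """Return every firewall ACL as a set, filling unconfigured names."""
    # read_only only changes the default; it never overrides a configured ACL.
    default_acl = [] if config.get("read_only", False) else ["*"]
    acls = dict(config.get("firewall_acls", {}))
    for name in ACL_NAMES:
        acls.setdefault(name, default_acl)
    return {name: set(acl) for name, acl in acls.items()}
```

For example, `{"read_only": true, "firewall_acls": {"create": ["curators"]}}` leaves `create` open to `curators` while the three unconfigured ACLs become empty.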

### `storage_backend`

The name of one of the built-in storage backends for the service. Currently must be one of:

- `"filesystem"`
- `"amazons3"`
- `"overlay"`

Each backend introduces additional backend-specific configuration syntax as well.

### `storage_path` (filesystem backend)

The mounted path where the `"filesystem"` backend reads and writes bulk objects. The default storage path is `"/var/www/hatrac"`.

### `s3_config` (amazons3 backend)

A nested document containing many configuration fields for the amazons3 backend.

```
{
...
"storage_backend": "amazons3",
"s3_config": {
"default_session": { ... },
"buckets": { ... },
"legacy_mapping": <boolean>,
}
...
}
```

#### `s3_config`.`default_session`

A sub-document passed through as a keyword arguments dictionary for the Python boto3 session constructor, i.e. `boto3.session.Session(**default_session)`. The default when unconfigured is `{}` which uses the built-in default behavior of the API.

This default session will be used to access the S3 API unless a more specific session config is configured for a specific bucket in the `buckets` config described later.

For backwards compatibility, either `default_session` or `session` is recognized as the configuration field name for this concept.

#### `s3_config`.`buckets`

A sub-document mapping one or more sets of bucket-specific configuration to different path prefixes in the Hatrac namespace hierarchy.

```
{
...
"storage_backend": "amazons3",
"s3_config": {
"buckets": {
<prefix>: {
"bucket_name": <s3 bucket name string>,
"bucket_path_prefix": <s3 bucket path prefix string>,
"hatrac_s3_method": <hatrac s3 method name string>,
"unquote_object_keys": <boolean>,
"presigned_url_threshold": <integer byte count>,
"presigned_url_expiration_secs": <integer number of seconds>,
"session_config": { ... },
"client_config": { ... }
}
...
},
}
...
}
```

Each _prefix_ should be a path string such as `"namespace1"` or `"namespace1/namespace2"`. When routing request paths for object access, the *most specific* matching prefix will be found to choose the corresponding _bucket config_ that controls object storage.

Leading and trailing `/` characters will be stripped, so a single configuration for the `""` path prefix is sufficient to configure one bucket for the entire Hatrac namespace.
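
The routing rule can be illustrated with a small longest-prefix search. This is a sketch under assumptions: the function name `find_bucket_config` is hypothetical, and matching whole path components (so that `namespace1` does not match `namespace10`) is this example's choice, not a documented guarantee.

```python
# Hypothetical sketch of most-specific-prefix routing to a bucket config.
def find_bucket_config(buckets, hatrac_path):
    """Return the bucket config with the longest matching prefix, or None."""
    path_parts = [p for p in hatrac_path.strip("/").split("/") if p]
    best, best_len = None, -1
    for prefix, bucket_config in buckets.items():
        # Leading and trailing "/" are stripped; "" becomes an empty list.
        prefix_parts = [p for p in prefix.strip("/").split("/") if p]
        n = len(prefix_parts)
        if path_parts[:n] == prefix_parts and n > best_len:
            best, best_len = bucket_config, n
    return best
```

An empty prefix matches every path with length zero, so it acts as the catch-all whenever no deeper prefix applies.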

For backwards compatibility, either `buckets` or `bucket_mappings` is recognized as the configuration field name for this concept.

#### `s3_config`.`legacy_mapping`

When `true`, simplifies the interpretation of the `s3_config`.`buckets` config to ignore deeper path prefixes. The default is `false`.

This is a backwards-compatibility option to emulate the way the legacy codebase would ignore path suffixes and treat `"/prefix/suffix"` the same as `"/prefix"`.

#### `s3_config`.`buckets`.`bucket_name`

The name of the configured bucket in the S3 API.

#### `s3_config`.`buckets`.`bucket_path_prefix`

A path prefix to add to Hatrac storage names when producing S3 object keys. Default `"hatrac"`.

The default is a backwards-compatibility mechanism. It is recommended that new deployments use `""` so that the hierarchical Hatrac namespace maps directly to object prefixes in a dedicated storage bucket.

#### `s3_config`.`buckets`.`hatrac_s3_method`

The name of the desired naming scheme for mapping Hatrac object names to S3 object keys. Default `"pref/**/hname"`.

These names reference specific built-in methods:

- `pref/**/hname`: Object keys combine _bucket path prefix_ and _hierarchical hatrac name_.
- `pref/**/hname:hver`: Object keys combine _bucket path prefix_, _hierarchical hatrac name_, and hatrac object-version ID.

The set of available methods may be extended in later releases.
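
One plausible reading of the two method names is sketched below. The function `s3_object_key`, the `/` join between prefix and name, and the `:` separator before the version ID are assumptions inferred from the method names themselves; the actual key layout should be confirmed against the backend code.

```python
# Hypothetical sketch of the two naming schemes; separators are assumed.
def s3_object_key(method, bucket_path_prefix, hname, hver=None):
    """Build an S3 object key from a Hatrac name and optional version ID."""
    hname = hname.strip("/")
    prefix = bucket_path_prefix.strip("/")
    # An empty prefix maps the Hatrac name directly into the bucket.
    key = f"{prefix}/{hname}" if prefix else hname
    if method == "pref/**/hname":
        return key
    if method == "pref/**/hname:hver":
        return f"{key}:{hver}"
    raise ValueError(f"unknown hatrac_s3_method: {method!r}")
```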

#### `s3_config`.`buckets`.`unquote_object_keys`

When `true`, unquote URL-quoting in Hatrac hierarchical object names embedded in S3 storage keys. Default `false`.

Set this `true` for backwards compatibility with buckets written by earlier versions of Hatrac, or if more human-readable object keys are desired. However, it may have undesirable artifacts since an individual namespace or object name fragment in Hatrac might unquote to contain punctuation or Unicode characters.

The new default, `false`, makes object keys less human-readable but avoids these potentially confusing scenarios.
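
The kind of artifact in question can be seen with the standard library's `urllib.parse.unquote`. The helper `storage_name` is a hypothetical name for this example and only illustrates the effect of the flag.

```python
# Hypothetical sketch showing why unquoting can be confusing.
from urllib.parse import unquote


def storage_name(hname, unquote_object_keys=False):
    """Return the name as embedded in the S3 key under each setting."""
    return unquote(hname) if unquote_object_keys else hname
```

A single name fragment such as `a%2Fb` unquotes to `a/b`, which is indistinguishable from two levels of hierarchy once it is embedded in an object key.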

#### `s3_config`.`buckets`.`presigned_url_threshold`

The smallest object size in bytes that will be served with signed URL redirection. Default `null` disables the feature entirely.

When an integer size greater than `0` is configured, objects smaller than the threshold will be returned immediately by proxying content, while larger objects will be returned indirectly via redirection with signed URLs that allow the client to directly retrieve object content from the S3-compatible object store.
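
The decision can be sketched as a single predicate. The name `use_presigned_url` is hypothetical, and treating a non-positive threshold the same as `null` is an assumption of this example.

```python
# Hypothetical sketch of the proxy-versus-redirect decision.
def use_presigned_url(object_size, threshold):
    """Return True when the object should be served via a signed URL redirect."""
    if threshold is None or threshold <= 0:
        return False  # feature disabled: always proxy content
    # The threshold is the smallest size that is redirected.
    return object_size >= threshold
```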

#### `s3_config`.`buckets`.`presigned_url_expiration_secs`

The integer number of seconds that a presigned URL will delegate access privileges to the client. Default `300` (5 minutes).

After the URL expires, the client will need to repeat the Hatrac request to obtain a new signed URL.

#### `s3_config`.`buckets`.`session_config`

A sub-document passed through as a keyword arguments dictionary for the Python boto3 session constructor, i.e. `boto3.session.Session(**session_config)`. The default when unconfigured is to reuse the session from the `s3_config`.`default_session` configuration.

This session will be used to access the S3 API when Hatrac names are routed to this bucket configuration.

#### `s3_config`.`buckets`.`client_config`

A sub-document passed through as a keyword arguments dictionary for the Python boto3 client constructor, i.e. `session.Client(**client_config)`. The default `{}` uses built-in default behavior of the API.

### `error_templates`

A nested JSON document allows customization of HTTP error response content.

The first layer maps specific HTTP error code strings, e.g. `"404"`, to a nested document. A special key of `"default"` can designate a generic configuration used for any error code not individually configured.
```
"error_templates": {
<error code>: {
...
},
"default": {
...
}
}
```

Each nested document, in turn, maps _lowercase_ HTTP content-types to error templates.
```
"error_templates": {
<error code>: {
<content type>: <template>,
...
}
}
```

During error handling, the type of the error response is _negotiated_ by considering the set of content-types configured with error templates and the set of content-types accepted by the client. If there is a failure to negotiate, the service will choose a default content-type.
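
A simplified sketch of this negotiation follows. It is not the service's algorithm: the function name `choose_error_template` is hypothetical, `text/plain` as the fallback type is an assumption, and the sketch ignores `q` weights and wildcard media types in the `Accept` header.

```python
# Hypothetical, simplified sketch of error content-type negotiation.
def choose_error_template(error_templates, code, accept_header,
                          fallback_type="text/plain"):
    """Return (content_type, template) for an error response."""
    # Prefer the per-code templates, else the "default" entry.
    templates = error_templates.get(str(code), error_templates.get("default", {}))
    accepted = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    ]
    for content_type in accepted:
        if content_type in templates:
            return content_type, templates[content_type]
    # Negotiation failed: fall back to an assumed default type.
    return fallback_type, templates.get(fallback_type, "%(message)s")
```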

### Default error configuration

The built-in default configuration is effectively:
```
"error_templates": {
"default": {
"text/plain": "%(message)s",
"text/html": "<html><body><h1>%(title)s</h1><p>%(message)s</p></body></html>"
}
}
```

### Error template syntax

The error templates are strings which should use the Python dictionary-interpolation syntax. They are evaluated as `template % dict(...)`. The templates are interpolated with a prepared dictionary of error-specific information:
- `code`: The decimal numeric code of the HTTP error.
- `title`: A short textual label corresponding to the HTTP error code.
- `description`: A longer text description of the error.
- `message`: An alias for the `description` key.
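
For illustration, the interpolation step looks roughly like this. The helper `render_error` is a hypothetical name; only the `template % dict(...)` evaluation and the four keys come from the description above.

```python
# Hypothetical sketch of rendering an error template.
def render_error(template, code, title, description):
    """Interpolate an error template with the documented keys."""
    error_info = dict(
        code=code,
        title=title,
        description=description,
        message=description,  # alias for the description key
    )
    return template % error_info
```

Because templates go through Python `%` formatting, a literal percent sign in a template has to be written as `%%`.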


#### Support for legacy error template configurations

The service includes limited backwards-compatibility logic to support an earlier configuration syntax for error templates. A top-level configuration of the form:

```
"<code>_<short type>": <template>
```

will be detected and translated to act as the config:

```
"error_templates": {
<code>: {
<type>: <template>
}
}
```

but *only* for the shortened types `html` and `plain` which are understood as `text/html` and `text/plain`, respectively.
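
The translation might be sketched as below. The function name `translate_legacy_error_templates` is hypothetical, and two behaviors are assumptions of this example rather than documented facts: legacy keys are matched only for three-digit codes, and an explicit `error_templates` entry takes precedence over a legacy key for the same code and type.

```python
# Hypothetical sketch of translating legacy "<code>_<short type>" keys.
import re

SHORT_TYPES = {"html": "text/html", "plain": "text/plain"}


def translate_legacy_error_templates(config):
    """Return an error_templates mapping that includes legacy entries."""
    templates = {
        code: dict(types)
        for code, types in config.get("error_templates", {}).items()
    }
    for key, value in config.items():
        match = re.fullmatch(r"(\d{3})_(html|plain)", key)
        if match:
            code, short = match.groups()
            # setdefault keeps any explicitly configured new-style template.
            templates.setdefault(code, {}).setdefault(SHORT_TYPES[short], value)
    return templates
```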

20 changes: 20 additions & 0 deletions docs/INSTALL.md
@@ -99,6 +99,26 @@ This configuration works for an Amazon AWS S3 deployment:
"database_max_retries": 5
}

This configuration block enables an optional static firewall to require clients
to have certain attributes in order to perform certain operations:

{
...
"firewall_acls": {
"create": ["object-or-ns-uploader-group"],
"delete": ["object-or-ns-deletion-group"],
"manage_acls": ["acl-admin-group"],
"manage_metadata": ["metadata-curator-group"]
}
}

For backwards compatibility, `firewall_acls` uses a default ACL of
`["*"]` when an ACL is not configured. This permissive mode then
allows the normal fine-grained authorization checks to proceed for
each request. A more restrictive firewall ACL can block a request
that would normally be allowed due to the fine-grained ACL state in
the hatrac namespace hierarchy.

## REST API testing

You can perform system testing of the whole web service stack, if
28 changes: 26 additions & 2 deletions hatrac/core.py
@@ -1,6 +1,6 @@

#
# Copyright 2015-2022 University of Southern California
# Copyright 2015-2023 University of Southern California
# Distributed under the Apache License, Version 2.0. See LICENSE for more info.
#

@@ -18,11 +18,35 @@
from webauthn2.util import merge_config, jsonWriter

config = merge_config(
jsonFileName='hatrac_config.json'
jsonFileName='hatrac_config.json',
built_ins={},
)
# emulate legacy config for backwards compat
default_firewall_acl = [] if config.get("read_only", False) else ["*"]
# add defaults incrementally in case local config is sparsely populated
config.setdefault("firewall_acls", {})
for aclname in ["create", "delete", "manage_acls", "manage_metadata"]:
config["firewall_acls"].setdefault(aclname, default_firewall_acl)
# digest firewall acls into sets once for reuse across requests...
config["firewall_acls"] = { k: set(v) for k, v in config['firewall_acls'].items() }

max_request_payload_size_default = 1024 * 1024 * 128 # 128 MiB (134217728 bytes)

def set_acl_match_attributes(client_context):
"""Idempotently set client_context.acl_match_attributes"""
if hasattr(client_context, 'acl_match_attributes'):
return

match_attributes = set([
attr['id'] if isinstance(attr, dict) else attr
for attr in client_context.attributes
])
match_attributes.add('*')
if client_context.client:
client = client_context.client
match_attributes.add(client['id'] if isinstance(client, dict) else client)
client_context.acl_match_attributes = match_attributes

def hatrac_debug(*args):
"""Shim for non-logger diagnostics