Add logical object renaming feature #71

Open
wants to merge 5 commits into
base: master
21 changes: 16 additions & 5 deletions bin/hatrac-migrate
Expand Up @@ -100,7 +100,13 @@ class HatracMigrateCLI (BaseCLI):
if name.is_object():
versions = self.directory.object_enumerate_versions(name)
for version in versions:
if linked:
if version.aux.get("rename_to"):
# the renamed object does not need to be migrated...
# instead, content will be migrated under the new/renamed name
# the renamed record is a stub to forward to the new one
# and does not use its own backing storage after migration
pass
elif linked:
url = version.aux.get("url")
if url:
objects[url] = version
Expand Down Expand Up @@ -147,7 +153,7 @@ class HatracMigrateCLI (BaseCLI):
else:
if logger.isEnabledFor(logging.DEBUG):
logger.debug(msg)
self.directory.version_aux_url_update(obj, base_url)
self.directory.version_aux_update(obj, {"url": base_url + obj.asurl()})
succeeded += 1
except Exception as e:
logger.warning(
Expand Down Expand Up @@ -355,10 +361,15 @@ class HatracMigrateCLI (BaseCLI):

# 3.3 Post-transfer database updates
# We only care about storing the returned aux version in the non-filesystem backend case
version = version if self.backend != "filesystem" else None
self.directory.version_aux_version_update(obj, version)
if self.backend != "filesystem":
self.directory.version_aux_update(obj, {"version": version})
else:
self.directory.version_aux_pop(obj, "version", None)
# On a successful transfer, we will automatically delete the aux redirect url
self.directory.version_aux_url_delete(obj)
# We will also forget about any legacy storage name inherited from the original system
self.directory.version_aux_pop(obj, "hname", None)
self.directory.version_aux_pop(obj, "hversion", None)
self.directory.version_aux_pop(obj, "url", None)
success = True
except Exception as e:
logger.error("Error during transfer of [%s] to backend storage: %s" %
Expand Down
71 changes: 71 additions & 0 deletions docs/INTERNALS.md
@@ -0,0 +1,71 @@

This document summarizes some internal database state for developers or
expert service administrators.

## Version Aux Column

The `aux` column of the `hatrac.version` table stores a JSON-formatted
value that can override some service behaviors. It is typically empty
(`null`) in a basic deployment scenario.

If populated, it should be a JSON object with a sparse set of
key-value pairs. When present, these keys introduce special
behavior. They are detected and handled in the following priority
order, so the first detected field may change behavior before other
fields can be processed:

1. `rename_to`: preferred name and version key to service content.
2. `url`: a URL to the version content at a remote hatrac service.
3. `hname` and `hversion`: name and version to override URL parsed values.
4. `version`: version to override backend storage version keying.

The `rename_to` field stores a pair `[` _hname_ `,` _hversion_ `]` which
is used to look up a preferred object version that obsoletes the
annotated object version. The service resolves this reference (similar
to a symbolic link in a filesystem) and performs the actual content
retrieval via the record found with that _hname_ and _hversion_. Access
control is processed using the preferred version and the HTTP
`Location` response header is also set to identify the preferred name.

The `url` field triggers an HTTP redirect to a remote Hatrac object
version that should have the same content. This is primarily used
during an online migration from an old to new server with the
`hatrac-migrate` utility script.

The `hname` and `hversion` fields override the default behavior when
retrieving content from the storage backend. The default behavior is
to use the actual `name` and `version` columns of the respective
Hatrac database records as input to the addressing function of the
storage backend. The `h` prefix means the "Hatrac" value as parsed
from URLs.

The `version` field overrides the backend storage version ID,
currently only meaningful in the S3 backend. This is relevant when
accessing a versioned bucket, where the addressing function maps the
Hatrac name and version values (e.g. from the URL) to an object key
but there might be a different version ID to access the correct
version of the backend object.
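
The priority order above can be sketched as a small dispatch function. This is only an illustrative reading of the description, not the service's actual code: the function name `plan_retrieval` and the tuple-shaped results are hypothetical, and treating `hname`/`hversion` and `version` as combinable (rather than mutually exclusive) is an assumption.

```python
def plan_retrieval(name, version, aux):
    """Decide how a version record would be served (hypothetical helper).

    name, version: the Hatrac name and version columns of the record.
    aux: the decoded aux JSON object, or None when the column is null.
    """
    aux = aux or {}

    # 1. rename_to wins: follow the reference to the preferred record.
    if aux.get("rename_to"):
        hname, hversion = aux["rename_to"]
        return ("follow", hname, hversion)

    # 2. url: redirect to the remote Hatrac object version.
    if aux.get("url"):
        return ("redirect", aux["url"])

    # 3. hname/hversion override the values fed to backend addressing.
    storage_name = aux.get("hname", name)
    storage_version = aux.get("hversion", version)

    # 4. version overrides the backend storage version ID (assumed to
    #    combine with step 3 rather than replace it).
    backend_version_id = aux.get("version")

    return ("backend", storage_name, storage_version, backend_version_id)
```

With a `null` aux value the record's own name and version are used unchanged, which matches the basic deployment scenario described at the top of this section.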


## Object Renaming

The object renaming feature (achieved with POST requests passing the
`{"command": "rename_from", ...}` batch command description) is
implemented by making coordinated changes to the `aux` column fields
described above:

1. A new version record is created under the new/preferred name with
its `hname`, `hversion`, and `version` aux fields set to refer to the
existing backend storage content addressed by the old/legacy name in
use when it was actually stored.

2. The old version record has its `rename_to` aux field set to point
to the new/preferred version record.
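
The two coordinated changes can be illustrated on an in-memory stand-in for the version table. This is a sketch under assumptions: `rename_version` and the `records` mapping are hypothetical, and whether the real service preserves other aux keys on the old record is not stated here.

```python
def rename_version(records, old_key, new_key):
    """Apply the two rename steps to `records` (hypothetical helper).

    records maps (name, version) tuples to aux dicts (or None).
    """
    old_name, old_version = old_key
    old_aux = dict(records.get(old_key) or {})

    # Step 1: the new record points at the storage address that was in
    # use when the content was actually stored.
    new_aux = {
        "hname": old_aux.get("hname", old_name),
        "hversion": old_aux.get("hversion", old_version),
    }
    if "version" in old_aux:
        new_aux["version"] = old_aux["version"]
    records[new_key] = new_aux

    # Step 2: the old record becomes a forwarding stub.
    old_aux["rename_to"] = list(new_key)
    records[old_key] = old_aux
```

Because only aux metadata changes, no content is copied in the storage backend.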

During migration, existing object renaming is slightly normalized:

1. The content is transferred and stored under the new/preferred name,
rather than recreating the content under the old/legacy storage address.

2. The old/legacy records are kept with `rename_to` so that they
continue to allow HTTP access via legacy URLs.
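
The migration normalization can be sketched as two small functions that mirror the aux updates in the `bin/hatrac-migrate` changes of this pull request. The function names are hypothetical and operate on plain dicts rather than database records.

```python
def needs_content_transfer(aux):
    """Renamed stubs (with rename_to) are skipped during migration."""
    return not (aux or {}).get("rename_to")


def aux_after_transfer(aux, backend, backend_version):
    """Return the aux dict expected after a successful transfer."""
    result = dict(aux or {})
    # Only non-filesystem backends keep a backend version ID.
    if backend != "filesystem":
        result["version"] = backend_version
    else:
        result.pop("version", None)
    # Legacy storage names and the redirect URL are forgotten, so the
    # content is now addressed by the record's own name and version.
    for key in ("hname", "hversion", "url"):
        result.pop(key, None)
    return result
```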
100 changes: 74 additions & 26 deletions docs/REST-API.md
Expand Up @@ -31,6 +31,7 @@ The REST API supports the following operations.
- [Delete nested namespace](#nested-namespace-deletion)
2. Object operations
- [Create or update object](#object-creation-and-update)
- [Rename object](#object-renaming)
- [Get object content](#object-retrieval)
- [Delete object](#object-deletion)
3. Object version operations
Expand Down Expand Up @@ -363,10 +364,10 @@ an existing object. Literal object content is provided as input:
Content-Type: text/plain
Content-Length: 14
Content-MD5: ZXS/CYPMeEBJpBYNGYhyjA==
Content-SHA256: 5+aEMqzlEZxe9xPaDUZ0GyBvTUaZf4s0yMpPgV/0yt0=
Content-Disposition: filename*=UTF-8''test.txt
If-Match: etag_value
If-None-Match: *
Content-SHA256: 5+aEMqzlEZxe9xPaDUZ0GyBvTUaZf4s0yMpPgV/0yt0=
Content-Disposition: filename*=UTF-8''test.txt
If-Match: etag_value
If-None-Match: *

...content...

Expand Down Expand Up @@ -464,6 +465,53 @@ solution is to first create an empty object (e.g. with `Content-Type:
text/plain`) and then immediately update its content with the desired
content.

### Object Renaming

The POST operation with a special batch command input is used to
logically rename versions of an existing *old* object under the *new*
object name specified in the request URL.

This logical renaming operation combines two complementary effects:

1. The old versions are mirrored (copied) under the new object name.
2. The old versions are adjusted to forward to the new object name.

Subsequent data retrieval under either the old or new name will
include a `Location` response header providing the new object
name. Data access will enforce access control based on the ACLs
configured on the new name.

A JSON command description is provided as input:

POST /new_namespace_path/new_object_name
Host: authority_name
Content-Type: application/json
If-Match: etag_value
If-None-Match: *

{
"command": "rename_from",
"source_name": "/oldnamespace/old_object_name",
"source_versions": [ "oldversion1", "oldversion2", ... ],
"copy_acls": true
}

The JSON command description is an object with several fields:

- `"command"`: The fixed string `"rename_from"`.
- `"source_name"`: The existing old object path within the service instance.
- `"source_versions"`: Which old object versions to mirror under the new object name (optional, defaults to all versions).
- `"copy_acls"`: Whether to copy ACLs from old to new versions (optional, defaults to `false` to set minimal new version ownership).

In order to be authorized to perform renaming, the client must have
ownership privileges on the old object and must either be allowed to
create the new object name or have ownership of the existing new
object name.

This is done via service-internal metadata reconfiguration. It
consumes no additional bulk storage and incurs no bulk data movement
in the storage backend.
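
A client request matching the command description above could be assembled as follows. This is a minimal sketch using only the Python standard library: `build_rename_request` is a hypothetical helper, the base URL is a placeholder, and authentication and the optional `If-Match`/`If-None-Match` headers are omitted.

```python
import json
import urllib.request


def build_rename_request(base_url, new_path, old_path,
                         versions=None, copy_acls=False):
    """Build (but do not send) a rename_from POST request."""
    command = {
        "command": "rename_from",
        "source_name": old_path,
        "copy_acls": copy_acls,
    }
    # Omitting source_versions asks the service to mirror all versions.
    if versions is not None:
        command["source_versions"] = list(versions)

    return urllib.request.Request(
        base_url + new_path,
        data=json.dumps(command).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_rename_request(
    "https://hatrac.example.org",  # placeholder authority
    "/new_namespace_path/new_object_name",
    "/oldnamespace/old_object_name",
    copy_acls=True,
)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here because it requires a live service and credentials.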

### Object Retrieval

The GET operation is used to retrieve the current version of an object:
Expand Down Expand Up @@ -525,9 +573,9 @@ for which a successful response is:
Content-Type: content_type
Content-Length: N
Content-MD5: hash_value
Content-SHA256: hash_value
Content-Disposition: filename*=UTF-8''filename
Content-Location: /namespace_path/object_name:version
Content-SHA256: hash_value
Content-Disposition: filename*=UTF-8''filename
Content-Location: /namespace_path/object_name:version

The HEAD operation is essentially equivalent to the GET operation but
with the actual object content elided.
Expand Down Expand Up @@ -648,8 +696,8 @@ for which the successful response is:
Accept-Ranges: bytes
Content-Type: content_type
Content-MD5: hash_value
Content-SHA256: hash_value
Content-Disposition: filename*=UTF-8''filename
Content-SHA256: hash_value
Content-Disposition: filename*=UTF-8''filename
Content-Length: N

with the same interpretation as documented for Object Metadata
Expand Down Expand Up @@ -723,21 +771,21 @@ The GET operation is used to retrieve all metadata sub-resources en masse
as a document:

GET /resource_name;metadata
Host: authority_name
Accept: application/json
If-None-Match: etag_value
Host: authority_name
Accept: application/json
If-None-Match: etag_value

for which the successful response is:

200 OK
Content-Type: application/json
Content-Length: N
ETag: etag_value
Content-Type: application/json
Content-Length: N
ETag: etag_value

{"content-type": content_type,
"content-md5": hash_value,
"content-sha256": hash_value,
"content-disposition": disposition}
{"content-type": content_type,
"content-md5": hash_value,
"content-sha256": hash_value,
"content-disposition": disposition}

The standard
[object version metadata retrieval](#object-version-metadata-retrieval),
Expand All @@ -750,18 +798,18 @@ The GET operation is used to retrieve one metadata sub-resource as a
text value:

GET /resource_name;metadata/fieldname
Host: authority_name
Accept: text/plain
If-None-Match: etag_value
Host: authority_name
Accept: text/plain
If-None-Match: etag_value

for which the successful response is:

200 OK
Content-Type: text/plain
Content-Length: N
ETag: etag_value
Content-Type: text/plain
Content-Length: N
ETag: etag_value

value
value

The textual _value_ is identical to what would be present in the HTTP
response header value when retrieving the main resource content.
Expand Down
14 changes: 11 additions & 3 deletions hatrac/core.py
Expand Up @@ -264,6 +264,11 @@ def get_content(self, client_context, get_data=True):
def encode(self):
return self


class _Sentinel (object): pass
_sentinel = _Sentinel()


class Metadata (dict):

_all_keys = {
Expand Down Expand Up @@ -400,10 +405,13 @@ def update(self, updates):
k = k.lower()
if k not in self or self[k] != v:
self[k.lower()] = v
def pop(self, k):

def pop(self, k, default=_sentinel):
k = k.lower()
return dict.pop(self, k)
if not isinstance(default, _Sentinel):
return dict.pop(self, k, default)
else:
return dict.pop(self, k)


class Redirect(object):
Expand Down
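
The sentinel default added to `Metadata.pop` in this diff lets callers pass `None` as a real default while still raising `KeyError` when no default is given. A standalone sketch of the same pattern follows; `LowerKeyDict` is a hypothetical stand-in for `Metadata`, and it uses an identity check (`is`) instead of `isinstance`, which behaves the same when there is a single sentinel instance.

```python
class _Sentinel(object):
    """Marker type used to detect an omitted `default` argument."""
    pass


_sentinel = _Sentinel()


class LowerKeyDict(dict):
    """Dict whose pop() lowercases the key before lookup."""

    def pop(self, k, default=_sentinel):
        k = k.lower()
        if default is _sentinel:
            # No default supplied: a missing key raises KeyError.
            return dict.pop(self, k)
        return dict.pop(self, k, default)


d = LowerKeyDict({"content-type": "text/plain"})
first = d.pop("Content-Type")         # key is found after lowercasing
second = d.pop("Content-Type", None)  # key is gone, default is returned
```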