ETag #88

Closed
25 tasks done
Syndesi opened this issue Sep 3, 2023 · 3 comments
Labels
Feature Introducing new capabilities.

@Syndesi
Member

Syndesi commented Sep 3, 2023

Implement the Etag header (based on timestamp) for most data-returning endpoints (elements, token, file).

Tasks:

  • Add Etag header to the following endpoints:
    • GET /
    • GET /<uuid>
    • GET /<uuid>/parents
    • GET /<uuid>/children
    • GET /<uuid>/related
  • Update documentation.
  • Add tests:
    • Unit tests, especially for events and event listeners.
    • Feature tests:
      • general.etag.create-node: Check that creating nodes resets Etags for related elements.
      • general.etag.create-normal-relation: Check that creating normal relations resets Etags for the start and end nodes, related endpoints only.
      • general.etag.create-owns-relation: Check that creating owns relations resets Etags for the start and end nodes, related and children/parents endpoints only.
      • general.etag.delete-node: Check that deleting nodes resets Etags for related elements.
      • general.etag.delete-normal-relation: Check that deleting normal relations resets Etags for start and end nodes, related endpoints only.
      • general.etag.delete-owns-relation: Check that deleting owns relations resets Etags for start and end nodes, related, children and parents endpoints only.
      • general.etag.update-node: Check that updating a node resets Etags of all related elements.
      • general.etag.update-normal-relation: Check that updating a normal relation resets Etags of the start and end nodes, limited to related endpoints.
      • general.etag.update-owns-relation: Check that updating an owns relation resets Etags of the start and end nodes, limited to related, children and parents endpoints.
      • general.etag.maximum: Check that no Etags are calculated for collections with more than 100 nodes.
    • Examples within documentation.
  • Add commands:
    • etag:expire-cache to expire all Etags currently saved in Redis.
      • Add documentation.
      • Add tests.
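
The relation rules from the feature tests above can be written down as a small lookup. This is only a sketch: the relation type string `"OWNS"`, the function names and the `<uuid>/<endpoint>` key format are assumptions made for illustration, not names taken from the API.

```python
def endpoints_to_reset(relation_type: str) -> frozenset[str]:
    """Collection endpoints whose Etags reset when a relation changes.

    "OWNS" is an assumed identifier for owns relations; every other
    type is treated as a normal relation.
    """
    if relation_type == "OWNS":
        return frozenset({"related", "children", "parents"})
    return frozenset({"related"})


def etag_keys_to_reset(relation_type: str, start_uuid: str, end_uuid: str) -> list[str]:
    """Hypothetical cache keys to reset for the start and end node of a relation."""
    return sorted(
        f"{node}/{endpoint}"
        for node in (start_uuid, end_uuid)
        for endpoint in endpoints_to_reset(relation_type)
    )
```

For a normal relation this yields only the two `related` keys; for an owns relation it yields six keys, three per node. The same rule applies to create, update and delete events according to the test descriptions.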
@Syndesi Syndesi added this to the 0.2.0 milestone Sep 3, 2023
@Syndesi
Member Author

Syndesi commented Dec 23, 2023

Etags are defined as:

The ETag (or entity tag) HTTP response header is an identifier for a specific version of a resource. [...]
If the resource at a given URL changes, a new Etag value must be generated. A comparison of them can determine whether two representations of a resource are the same.

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag

Although the content of Etags is not standardized and can be anything, implementations in the wild use the last-modified timestamp, the content hash, inode data, content size and similar identifiers. Many implementations combine several of these and hash the result. Hashes may be returned as base-encoded strings.

Etag collisions between multiple versions of the same resource on a server should not happen, but they remain possible with a very small probability.
Even if such a collision occurs, the real-world implications are minor and generally risk-free: in the worst case, clients with cached Etags are not informed about the new version, but they can still request the data manually, e.g. by setting no-cache headers, adding extra parameters to the resource URL, or force-reloading the website without the browser's cache.
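
For context, the comparison mentioned in the definition is what a server does with the `If-None-Match` request header: if the client's cached Etag matches the current one, a GET can be answered with `304 Not Modified` instead of the full body. A minimal sketch of that check, assuming weak comparison and Etags without commas (true for base58 strings); the function name is hypothetical:

```python
def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Weak comparison of an If-None-Match header value against the current Etag."""

    def opaque(tag: str) -> str:
        # Weak comparison ignores the optional W/ prefix.
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    if if_none_match.strip() == "*":
        # "*" matches any existing representation.
        return True
    current = opaque(current_etag)
    # Simplification: split on commas, assuming no Etag contains one.
    return any(opaque(candidate) == current for candidate in if_none_match.split(","))
```

In the collision case described above, this check would return `True` for an outdated cached Etag, which is why the client would not notice the new version until it bypasses its cache.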

Implementation requirements

There are multiple questions and trade-offs to consider:

  • How are Etags calculated for different resources on the server - nodes, relations, collections, files, static and system endpoints?
  • How fast can Etags be calculated? Both the amount of data that must be loaded and the hashing algorithm strongly affect this. Using a non-cryptographic hashing algorithm seems beneficial when weighing risk against performance.
  • Are there situations where Etags cannot be provided? This might be the case for huge collection endpoints, where the API must load, sort and hash thousands of elements.
  • Should Etags be cached? If so, then likely in Redis. Cached Etags must also be expired automatically. A command for clearing such cached Etags might also be beneficial.
  • How small can the final Etag be while remaining usable and not bloating the HTTP headers?
  • Etags and related headers need to be compatible with CORS.
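
The caching question can be illustrated with a small in-memory stand-in. This is a sketch, not a Redis client: the class name, the method names and the fixed time-to-live are assumptions, and a real Redis setup would rely on key expiry in Redis itself.

```python
import time
from typing import Callable, Optional


class EtagCache:
    """In-memory stand-in for a Redis-backed Etag cache with automatic expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps a cache key to (etag, expiry time).
        self._entries: dict[str, tuple[str, float]] = {}

    def set(self, key: str, etag: str) -> None:
        self._entries[key] = (etag, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        etag, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._entries[key]
            return None
        return etag

    def expire_all(self) -> None:
        """Rough equivalent of a command that clears all cached Etags."""
        self._entries.clear()
```

The injectable `clock` exists only so that expiry can be tested without waiting.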

Implementation details

These are the current ideas for the actual implementation:

  • All Etags, regardless of the actual input, will use the xxh3 hashing algorithm and a base58encode function as their final step.
    The pseudocode might look like this: Etag = base58encode(xxh3(content))
  • The content of nodes and relations will likely be just their UUID in binary concatenated with their updated-timestamp as an integer.
  • The content of files will likely include their UUID in binary concatenated with their updated-timestamp as an integer and their own Etag, as provided by the S3 API. This will be implemented once files are implemented.
  • The content of collection endpoints (parents, children, related, index) is likely to be a list containing one tuple per element of the collection, each holding the element's UUID (binary) and timestamp (int). The list must be sorted; using the UUID as the sort key is better than using the timestamp, because UUIDs are unique.
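
The ideas above could be sketched roughly as follows. Several things here are assumptions for illustration: `hash64` uses BLAKE2b truncated to 8 bytes as a standard-library stand-in for xxh3, the timestamp is packed as a big-endian signed 64-bit integer, base58 uses the Bitcoin alphabet, and all function names are hypothetical. The limit of 100 elements comes from the general.etag.maximum test.

```python
import hashlib
import struct
import uuid
from typing import Optional

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58encode(data: bytes) -> str:
    """Encode bytes as base58 (Bitcoin alphabet, assumed here)."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is encoded as the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return BASE58_ALPHABET[0] * leading_zeros + encoded


def hash64(content: bytes) -> bytes:
    """Stand-in for xxh3: a 64-bit BLAKE2b digest, so no extra package is needed."""
    return hashlib.blake2b(content, digest_size=8).digest()


def element_content(element_id: uuid.UUID, updated: int) -> bytes:
    """UUID in binary followed by the updated-timestamp as an integer."""
    return element_id.bytes + struct.pack(">q", updated)


def element_etag(element_id: uuid.UUID, updated: int) -> str:
    return base58encode(hash64(element_content(element_id, updated)))


def collection_etag(
    elements: list[tuple[uuid.UUID, int]], maximum: int = 100
) -> Optional[str]:
    """Etag for a collection, or None if it exceeds the maximum size."""
    if len(elements) > maximum:
        return None
    # Sort by UUID so the Etag does not depend on the order of the input.
    ordered = sorted(elements, key=lambda element: element[0].bytes)
    content = b"".join(element_content(i, t) for i, t in ordered)
    return base58encode(hash64(content))
```

With a 64-bit digest the base58 string is at most 11 characters long, which keeps the header small.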

Implementation details, especially the exact byte representation, might change. Furthermore, the exact process does not need to be forward compatible, as clients should not try to extract any data from the Etag.

It might be beneficial to use some sort of salt or seed for the hashing algorithm, which is static for one specific API instance. This would support a feature where changing the salt/seed results in different Etags when they are newly computed. This might be used in combination with the Etag expire command.
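
A minimal sketch of the seed idea, again using BLAKE2b as a standard-library stand-in for xxh3 (its `key` parameter plays the role of the seed here). The function name and the seed values are hypothetical, and the final base58 step is left out:

```python
import hashlib


def seeded_hash64(content: bytes, seed: bytes) -> bytes:
    """64-bit digest that depends on both the content and an instance-wide seed."""
    # BLAKE2b accepts keys of up to 64 bytes.
    return hashlib.blake2b(content, digest_size=8, key=seed).digest()


content = b"example-element-content"
old = seeded_hash64(content, b"instance-seed-1")
new = seeded_hash64(content, b"instance-seed-2")
# Same content, different seed: the resulting digests differ.
print(old.hex(), new.hex())
```

Changing the seed would only affect Etags computed afterwards, so previously cached Etags would still need to be expired for the change to take effect everywhere.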

@Syndesi
Member Author

Syndesi commented Jan 27, 2024

Some assertions within the test general.etag.maximum are blocked by #238.

Projects
Status: Done