Merged
6 changes: 6 additions & 0 deletions .travis.yml
@@ -2,6 +2,12 @@ os: linux

language: python

addons:
  apt:
    packages:
      # for docs
      - graphviz

install: pip install tox

jobs:
139 changes: 139 additions & 0 deletions docs/arch.gv
@@ -0,0 +1,139 @@
digraph {
Review comment from the contributor (PR author):

A rendered form of this graph can be seen here:

https://rohanpm.github.io/cdn-lambda/arch.html

ranksep="1.4";

# These are arranged and labelled to communicate the
# sequence of events when a request is processed.
# Try to keep them in this order.
client:sw -> controller [
xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>1</td></tr></table> >
]

controller:sw -> origin_request [
xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>2</td></tr></table> >
]

origin_request -> db [
xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>3</td></tr></table> >,
dir=both
]

origin_request -> controller:s [
xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>4</td></tr></table> >
]

controller -> S3 [
xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>5</td></tr></table> >,
dir=both
]

controller:se -> origin_response [
xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>6</td></tr></table> >,
dir=both
]

controller -> client:se [
xlabel=< <table bgcolor="white" border="0" cellborder="0" cellpadding="0" style="rounded"><tr><td>7</td></tr></table> >
]

# publishing tools are mentioned, but do not participate
# in the request processing.
# Connection order here is reversed to force the publishing tools to the bottom
# of the graph, which makes them stand out a bit more.
S3 -> publish_tools [dir="back"]
db -> publish_tools [dir="back"]

client [label="💻 client"]
publish_tools [label="publishing tools", style="rounded", rank="max", shape="box"]

db [
shape=plaintext
fontsize=9
label=<

<table border='1' cellborder='1' cellspacing='0'>
<tr><td colspan='3'><font point-size="14"><b>☁ DynamoDB</b></font></td></tr>
<tr>
<td><b>web_uri (partition key)</b></td>
<td><b>from_date (sort key)</b></td>
<td><b>object_key</b></td>
</tr>
<tr>
<td>/content/dist/rhel/server/7/7Server/x86_64/os/Packages/t/tar-1.26-34.el7.x86_64.rpm</td>
<td>2020-03-26T01:07:39+00:00</td>
<td>8e7750e50734f...</td>
</tr>
<tr>
<td>/content/dist/rhel/server/7/7Server/x86_64/os/Packages/z/zlib-1.2.7-18.el7.x86_64.rpm</td>
<td>2020-03-26T01:07:39+00:00</td>
<td>db8dd5164d117...</td>
</tr>
<tr>
<td>/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml</td>
<td>2020-03-26T01:07:39+00:00</td>
<td>aec070645fe53...</td>
</tr>
<tr>
<td>/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml</td>
<td>2020-01-22T02:07:20+00:00</td>
<td>5d70f436aa013...</td>
</tr>
<tr><td colspan='3'>...</td></tr>
</table>
>
];

S3 [
shape=plaintext
fontsize=9
label=<

<table border='1' cellborder='1' cellspacing='0'>
<tr><td colspan='3'><font point-size="14"><b>☁ S3</b></font></td></tr>
<tr>
<td><b>key</b></td>
<td><b>object</b></td>
<td><b>metadata</b></td>
</tr>
<tr>
<td>8e7750e50734f...</td>
<td><i>[blob tar-1.26-34.el7.x86_64.rpm]</i></td>
<td>-</td>
</tr>
<tr>
<td>db8dd5164d117...</td>
<td><i>[blob zlib-1.2.7-18.el7.x86_64.rpm]</i></td>
<td>-</td>
</tr>
<tr>
<td>aec070645fe5...</td>
<td><i>[blob some repomd.xml]</i></td>
<td>{ContentType: application/xml}</td>
</tr>
<tr>
<td>5d70f436aa01...</td>
<td><i>[blob other repomd.xml]</i></td>
<td>{ContentType: application/xml}</td>
</tr>
<tr>
<td>49ae93732fcf...</td>
<td><i>[blob some primary.sqlite.bz2]</i></td>
<td>{ContentType: application/x-bzip2}</td>
</tr>
<tr><td colspan='3'>...</td></tr>
</table>
>
];

subgraph cluster_0 {
label=< <b>🖧 CloudFront CDN</b> >
style="rounded";
controller;
subgraph cluster_1 {
label=<<b>cdn-lambda</b>>;
style="dashed";
rank=same
origin_request;
origin_response;
}
}
}
111 changes: 111 additions & 0 deletions docs/arch.rst
@@ -0,0 +1,111 @@
Architecture
============


Overview
--------

This diagram shows the relationships between all major components used
in the delivery of content via the CDN.

.. graphviz:: arch.gv

- Numbered connections represent the sequence of events when the CDN processes a request.
- For clarity, SHA256 checksums have been truncated (as in ``8e7750e50734f...``). In reality,
the system stores complete checksums.
- The CloudFront CDN shown in the above diagram may itself be hosted behind another CDN,
so client requests may pass through additional layers not expressed here.


Components
----------

client
A client requesting data from the CDN.

This could be ``dnf``, ``yum``, Satellite, ``curl``, a web browser, etc.

CloudFront CDN
The `Amazon CloudFront`_ content delivery network.

controller
An abstract component representing the built-in behaviors of CloudFront,
such as:

- basic HTTP request handling
- serving responses from cache
- invoking Lambda functions
- delegating requests to S3

...and so on.

DynamoDB
The `Amazon DynamoDB`_ NoSQL database service.

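The CDN uses a single DynamoDB table, which primarily contains mappings
between URIs and S3 object keys. Each mapping also carries a ``from_date``, so one URI
may have several rows (as with ``repomd.xml`` in the diagram above).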

For more information about the data contained here, see :ref:`schema_ref`.

S3
`Amazon S3`_ (Simple Storage Service).

The CDN uses S3 to store the binary objects retrievable by clients.
A single bucket is used, configured as the origin of the CloudFront CDN.

One object corresponds to one file which can be downloaded from the CDN;
this includes files considered to be content (such as RPMs) and files considered
to be metadata (such as yum repo metadata files).

Each object's key is the SHA256 checksum of its content, so content accessible
via many paths on the CDN needs to be stored only once.

S3 metadata is used in some cases to customize the response behavior of each object;
for example, metadata is used to adjust ``Content-Type`` headers in responses.
Publishing tools are responsible for setting this metadata accurately.

For more information about the data contained here, see :ref:`schema_ref`.
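The following minimal sketch illustrates the content-addressing idea described above; it is not the project's code. It derives an object key from a blob's SHA256 digest and shows two URIs sharing one stored object. The in-memory ``bucket`` and ``uri_to_key`` dicts, the ``store`` helper, and the example paths are hypothetical stand-ins for S3, DynamoDB, and real content. The lowercase hex encoding of the key is an assumption based on the example keys in the diagram.

```python
import hashlib
from typing import Dict


def object_key(data: bytes) -> str:
    # Assumption: the key is the lowercase hex SHA256 digest of the blob.
    return hashlib.sha256(data).hexdigest()


# Hypothetical in-memory stand-ins for the S3 bucket and the DynamoDB table.
bucket: Dict[str, bytes] = {}
uri_to_key: Dict[str, str] = {}


def store(uri: str, data: bytes) -> str:
    """Hypothetical helper: store a blob once, however many URIs refer to it."""
    key = object_key(data)
    bucket.setdefault(key, data)  # identical bytes always yield the same key
    uri_to_key[uri] = key
    return key


# The same payload stored under two hypothetical paths occupies one bucket entry.
payload = b"example rpm payload"
key_a = store("/content/example/a/tar.rpm", payload)
key_b = store("/content/example/b/tar.rpm", payload)
```

Because the key depends only on the bytes, making identical content available under a new path adds a new mapping but no new object.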

cdn-lambda
A project including Python-based implementations of `Lambda@Edge`_ functions for the CDN.

You are currently reading the documentation of this project.

origin_request
A `Lambda@Edge`_ function connected to "origin request" events in CloudFront.

This function is primarily responsible for translating the path given in the client's
request into an S3 object key via a DynamoDB query. Assuming the client has requested
existing content, this Lambda function will rewrite the request's URI into a valid S3
object key before returning the request to the controller. The function itself does
not request data from S3, nor does it generate a response directly.

For more information about this function's behavior, see :ref:`function_ref`.
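The sketch below shows roughly how the origin_request behavior described above could work. It assumes the lookup selects the row for the requested ``web_uri`` with the latest ``from_date`` that is not later than the current time. The in-memory ``TABLE``, the placeholder keys, and the ``find_object_key`` helper are hypothetical. Passing an unresolved request through unchanged is also an assumption. The event shape (``event["Records"][0]["cf"]["request"]`` with a ``uri`` field) follows the Lambda@Edge event structure.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

REPOMD = "/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml"

# Hypothetical stand-in for the DynamoDB table:
# web_uri (partition key) -> [(from_date (sort key), object_key), ...]
TABLE: Dict[str, List[Tuple[datetime, str]]] = {
    REPOMD: [
        (datetime(2020, 1, 22, 2, 7, 20, tzinfo=timezone.utc), "placeholder-old-key"),
        (datetime(2020, 3, 26, 1, 7, 39, tzinfo=timezone.utc), "placeholder-new-key"),
    ],
}


def find_object_key(uri: str, now: datetime) -> Optional[str]:
    """Return the object key of the newest row whose from_date <= now."""
    rows = [row for row in TABLE.get(uri, []) if row[0] <= now]
    if not rows:
        return None
    # Tuples compare by from_date first, so max() picks the newest row.
    return max(rows)[1]


def origin_request_handler(event, context, now: Optional[datetime] = None):
    """Sketch of an origin-request handler: map the URI to an S3 object key."""
    request = event["Records"][0]["cf"]["request"]
    key = find_object_key(request["uri"], now or datetime.now(timezone.utc))
    if key is not None:
        # Returning the request lets CloudFront forward it to the S3 origin.
        request["uri"] = "/" + key
    # Assumption: unresolved paths are passed through unchanged.
    return request
```

Choosing the newest row whose ``from_date`` has passed is one way the two ``repomd.xml`` rows in the diagram could coexist. Refer to :ref:`function_ref` for the function's actual behavior.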

origin_response
A `Lambda@Edge`_ function connected to "origin response" events in CloudFront.

This function is primarily responsible for tweaking certain response headers
before allowing CloudFront to serve the response to clients. For example,
caching behavior is influenced by setting a ``Cache-Control`` header for certain
responses.

For more information about this function's behavior, see :ref:`function_ref`.
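Below is a minimal sketch of an origin-response handler that sets a ``Cache-Control`` header, as described above. It relies on the Lambda@Edge event layout, in which ``status`` is a string and headers are keyed by lowercase name, with each value a list of ``{"key": ..., "value": ...}`` dicts. Which responses receive the header, and the ``max-age=600`` value, are hypothetical; the real policy is not described here.

```python
# Hypothetical policy value; the real cache lifetimes are not described here.
DEFAULT_CACHE_CONTROL = "max-age=600"


def origin_response_handler(event, context):
    """Sketch of an origin-response handler that sets Cache-Control."""
    response = event["Records"][0]["cf"]["response"]
    headers = response.setdefault("headers", {})
    # Lambda@Edge keys headers by lowercase name; each value is a list of
    # {"key": ..., "value": ...} dicts.
    if response.get("status") == "200" and "cache-control" not in headers:
        headers["cache-control"] = [
            {"key": "Cache-Control", "value": DEFAULT_CACHE_CONTROL}
        ]
    # The returned response is what CloudFront caches and serves to clients.
    return response
```

Leaving any existing ``Cache-Control`` header untouched is a design choice in this sketch. It lets metadata set on the origin object take precedence over the default.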

publishing tools
Represents the tools used by Red Hat to publish content onto the CDN.

These tools insert data into the CDN's S3 and DynamoDB services in order to publish
content.

A further explanation of these tools is out of scope for this document; it suffices
to know that the tools are designed with an awareness of the CDN architecture
described here.

.. _Lambda@Edge: https://aws.amazon.com/lambda/edge/

.. _Amazon CloudFront: https://aws.amazon.com/cloudfront/

.. _Amazon DynamoDB: https://aws.amazon.com/dynamodb/

.. _Amazon S3: https://aws.amazon.com/s3/
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -45,6 +45,7 @@
"sphinx.ext.napoleon",
"sphinx.ext.githubpages",
"sphinx.ext.viewcode",
"sphinx.ext.graphviz",
]

# Add any paths that contain templates here, relative to this directory.
@@ -133,3 +134,4 @@
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
}
graphviz_output_format = "png"
2 changes: 2 additions & 0 deletions docs/function-reference.rst
@@ -1,3 +1,5 @@
.. _function_ref:

Function Reference
==================

1 change: 1 addition & 0 deletions docs/index.rst
@@ -7,5 +7,6 @@ AWS Lambda functions for Red Hat's Content Delivery Network
   :maxdepth: 2
   :caption: Contents:

   arch
   function-reference
   schema-reference
2 changes: 2 additions & 0 deletions docs/schema-reference.rst
@@ -1,3 +1,5 @@
.. _schema_ref:

Schema Reference
================
