Merge pull request #716 from heroku/s5cmd
Port all binary build tooling to s5cmd, add sync tests, simplify removals
dzuelke committed May 29, 2024
2 parents 6b58b7f + 62319d2 commit 2485a3e
Showing 28 changed files with 804 additions and 435 deletions.
11 changes: 9 additions & 2 deletions .github/workflows/ci.yml
@@ -21,6 +21,8 @@ env:
HATCHET_BUILDPACK_BRANCH: ${{ github.head_ref || github.ref_name }}
HEROKU_API_KEY: ${{ secrets.HEROKU_API_KEY }}
HEROKU_API_USER: ${{ secrets.HEROKU_API_USER }}
S5CMD_VERSION: 2.2.2
S5CMD_HASH: "392c385320cd5ffa435759a95af77c215553d967e4b1c0fffe52e4f14c29cf85 s5cmd_2.2.2_linux_amd64.deb"

jobs:
integration-test:
@@ -47,8 +49,13 @@ jobs:
with:
php-version: "8.2"
tools: "composer:2.7"
-- name: Install packages from requirements.txt (for some tests)
-run: pip install -r requirements.txt
+- name: Install packages from requirements.txt, plus s5cmd (for some tests)
+run: |
+pip install -r requirements.txt
+curl -sSLO https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/s5cmd_${S5CMD_VERSION}_linux_amd64.deb
+echo "$S5CMD_HASH" | shasum -c -
+dpkg -x "s5cmd_${S5CMD_VERSION}_linux_amd64.deb" .
+echo "$HOME/usr/bin" >> "$GITHUB_PATH"
- name: Hatchet setup
run: bundle exec hatchet ci:setup
- name: Export HEROKU_PHP_PLATFORM_REPOSITORIES to …-develop (since we are not building main or a tag)
2 changes: 1 addition & 1 deletion .github/workflows/platform-build.yml
@@ -136,5 +136,5 @@ jobs:
echo '## Package changes available for syncing to production bucket' >> "$GITHUB_STEP_SUMMARY"
echo '**This is output from a dry-run**, no changes have been synced to production:' >> "$GITHUB_STEP_SUMMARY"
echo '```' >> "$GITHUB_STEP_SUMMARY"
-sed -n '/The following packages will/,$p' sync.out >> "$GITHUB_STEP_SUMMARY"
+sed -n '/^The following packages will/,/POTENTIALLY DESTRUCTIVE ACTION/{/POTENTIALLY DESTRUCTIVE ACTION/!p}' sync.out >> "$GITHUB_STEP_SUMMARY"
echo '```' >> "$GITHUB_STEP_SUMMARY"
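The tightened `sed` expression stops at the confirmation prompt instead of printing everything to end-of-file, so the "POTENTIALLY DESTRUCTIVE ACTION" prompt no longer leaks into the step summary. Its behavior can be sketched in Python like this (the sample log lines are invented for illustration; unlike sed, this sketch stops at the first end marker rather than scanning for further ranges):

```python
import re

def extract_summary(log: str) -> str:
    """Mimic: sed -n '/^start/,/end/{/end/!p}' -- print lines from the
    start marker through to, but excluding, the end marker."""
    out, printing = [], False
    for line in log.splitlines():
        if not printing and re.match(r"^The following packages will", line):
            printing = True
        if printing:
            if "POTENTIALLY DESTRUCTIVE ACTION" in line:
                break
            out.append(line)
    return "\n".join(out)

sample = "\n".join([
    "fetching manifests...",  # before the range: dropped
    "The following packages will be synced:",
    "php-8.3.7",
    "ext-redis-6.0.2",
    "POTENTIALLY DESTRUCTIVE ACTION, please confirm",  # end marker: excluded
    "aborted.",
])
print(extract_summary(sample))
```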
16 changes: 12 additions & 4 deletions .github/workflows/platform-remove.yml
@@ -71,10 +71,18 @@ jobs:
set -f
set -o pipefail
(yes 2>/dev/null || true) | docker run --rm -i --env-file=support/build/_docker/env.default heroku-php-build-${{inputs.stack}}:${{github.sha}} remove.sh ${{inputs.manifests}} 2>&1 | tee remove.out
-- name: Output job summary
+- name: Output dry-run summary
+if: ${{ inputs.dry-run == true }}
run: |
+echo '## Packages which would be removed from production bucket' >> "$GITHUB_STEP_SUMMARY"
+echo '**This is output from a dry-run**, no packages have been removed:' >> "$GITHUB_STEP_SUMMARY"
+echo '```' >> "$GITHUB_STEP_SUMMARY"
+sed -n '/^The following packages will/,/POTENTIALLY DESTRUCTIVE ACTION/{/POTENTIALLY DESTRUCTIVE ACTION/!p}' remove.out >> "$GITHUB_STEP_SUMMARY"
+echo '```' >> "$GITHUB_STEP_SUMMARY"
+- name: Output removal summary
+if: ${{ inputs.dry-run == false }}
+run: |
-echo '## Packages${{ inputs.dry-run == true && ' which would be' || '' }} removed from production bucket' >> "$GITHUB_STEP_SUMMARY"
-echo "${{ inputs.dry-run == true && '**This is output from a dry-run**, no packages have been removed:' || '-n' }}" >> "$GITHUB_STEP_SUMMARY"
+echo '## Packages removed from production bucket' >> "$GITHUB_STEP_SUMMARY"
echo '```' >> "$GITHUB_STEP_SUMMARY"
-sed -n '/The following packages will/,$p' remove.out >> "$GITHUB_STEP_SUMMARY"
+cat remove.out >> "$GITHUB_STEP_SUMMARY"
echo '```' >> "$GITHUB_STEP_SUMMARY"
16 changes: 12 additions & 4 deletions .github/workflows/platform-sync.yml
@@ -104,12 +104,20 @@ jobs:
with:
name: synclog-${{matrix.stack}}
path: sync-${{matrix.stack}}.log
-- name: Output job summary
+- name: Output dry-run summary
+if: ${{ inputs.dry-run == true }}
run: |
-echo '## Package changes ${{ inputs.dry-run == true && 'available for syncing' || 'synced' }} to ${{matrix.stack}} production bucket' >> "$GITHUB_STEP_SUMMARY"
-echo "${{ inputs.dry-run == true && '**This is output from a dry-run**, no changes have been synced to production:' || '-n' }}" >> "$GITHUB_STEP_SUMMARY"
+echo '## Package changes available for syncing to ${{matrix.stack}} production bucket' >> "$GITHUB_STEP_SUMMARY"
+echo '**This is output from a dry-run**, no changes have been synced to production:' >> "$GITHUB_STEP_SUMMARY"
echo '```' >> "$GITHUB_STEP_SUMMARY"
-sed -n '/The following packages will/,$p' sync-${{matrix.stack}}.log >> "$GITHUB_STEP_SUMMARY"
+sed -n '/^The following packages will/,/POTENTIALLY DESTRUCTIVE ACTION/{/POTENTIALLY DESTRUCTIVE ACTION/!p}' sync-${{matrix.stack}}.log >> "$GITHUB_STEP_SUMMARY"
echo '```' >> "$GITHUB_STEP_SUMMARY"
+- name: Output sync summary
+if: ${{ inputs.dry-run == false }}
+run: |
+echo '## Package changes synced to ${{matrix.stack}} production bucket' >> "$GITHUB_STEP_SUMMARY"
+echo '```' >> "$GITHUB_STEP_SUMMARY"
+cat sync-${{matrix.stack}}.log >> "$GITHUB_STEP_SUMMARY"
+echo '```' >> "$GITHUB_STEP_SUMMARY"
devcenter-generate:
needs: sync
1 change: 0 additions & 1 deletion requirements.txt
@@ -1,3 +1,2 @@
bob-builder>=0.0.20
-s3cmd>=1.6.0
natsort
2 changes: 1 addition & 1 deletion support/build/README.md
@@ -562,7 +562,7 @@ The normal flow is to run `deploy.sh` first to deploy one or more packages, and

~ $ mkrepo.sh --upload

-This will generate `packages.json` and upload it right away, or, if the `--upload` is not given, print upload instructions for `s3cmd`.
+This will generate `packages.json` and upload it right away, or, if the `--upload` is not given, print upload instructions for `s5cmd`.

Alternatively, `deploy.sh` can be called with `--publish` as the first argument, in which case `mkrepo.sh --upload` will be called after the package deploy and manifest upload was successful:

13 changes: 13 additions & 0 deletions support/build/_docker/heroku-20.Dockerfile
@@ -1,5 +1,7 @@
FROM heroku/heroku:20-build.v84

ARG TARGETARCH

WORKDIR /app
ENV WORKSPACE_DIR=/app/support/build
ENV S3_BUCKET=lang-php
@@ -17,6 +19,17 @@ ENV PATH="/app/support/build/_util:$VIRTUAL_ENV/bin:$PATH"

COPY requirements.txt /app/requirements.txt

RUN pip install wheel
RUN pip install -r /app/requirements.txt

ARG S5CMD_VERSION=2.2.2
RUN curl -sSLO https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/s5cmd_${S5CMD_VERSION}_linux_${TARGETARCH}.deb
# copy/paste relevant shasums from s5cmd_checksums.txt in the release, remember to preserve the "\\n\" at the end of each line
RUN printf "\
392c385320cd5ffa435759a95af77c215553d967e4b1c0fffe52e4f14c29cf85 s5cmd_${S5CMD_VERSION}_linux_amd64.deb\\n\
939bee3cf4b5604ddb00e67f8c157b91d7c7a5b553d1fbb6890fad32894b7b46 s5cmd_${S5CMD_VERSION}_linux_arm64.deb\\n\
" | shasum -c - --ignore-missing

RUN dpkg -i s5cmd_${S5CMD_VERSION}_linux_${TARGETARCH}.deb

COPY . /app
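The `shasum -c - --ignore-missing` call in these Dockerfiles verifies a checksum list that names artifacts for several architectures while only the `.deb` for the current `TARGETARCH` is actually present. A Python sketch of that skip-if-absent behavior (file names and digests below are generated on the fly for illustration, not the real s5cmd checksums):

```python
import hashlib, os, tempfile

def check_ignore_missing(checksums: str, directory: str) -> bool:
    """Mimic `shasum -c --ignore-missing`: verify every listed file that
    exists in `directory`, silently skipping entries that are absent."""
    ok = True
    for line in checksums.strip().splitlines():
        digest, name = line.split(maxsplit=1)
        path = os.path.join(directory, name)
        if not os.path.exists(path):  # --ignore-missing: skip absent files
            continue
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        ok = ok and (actual == digest)
    return ok

with tempfile.TemporaryDirectory() as d:
    # only the "amd64" artifact is present, as on a single-arch build
    data = b"example package contents"
    with open(os.path.join(d, "pkg_amd64.deb"), "wb") as f:
        f.write(data)
    checksums = "{} pkg_amd64.deb\n{} pkg_arm64.deb\n".format(
        hashlib.sha256(data).hexdigest(), "0" * 64)
    result = check_ignore_missing(checksums, d)
print(result)
```

Note that the real `shasum -c` would still fail the build if a present file's digest does not match, which is the point of pinning the checksums in the Dockerfile.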
13 changes: 13 additions & 0 deletions support/build/_docker/heroku-22.Dockerfile
@@ -1,5 +1,7 @@
FROM heroku/heroku:22-build.v84

ARG TARGETARCH

WORKDIR /app
ENV WORKSPACE_DIR=/app/support/build
ENV S3_BUCKET=lang-php
@@ -17,6 +19,17 @@ ENV PATH="/app/support/build/_util:$VIRTUAL_ENV/bin:$PATH"

COPY requirements.txt /app/requirements.txt

RUN pip install wheel
RUN pip install -r /app/requirements.txt

ARG S5CMD_VERSION=2.2.2
RUN curl -sSLO https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/s5cmd_${S5CMD_VERSION}_linux_${TARGETARCH}.deb
# copy/paste relevant shasums from s5cmd_checksums.txt in the release, remember to preserve the "\\n\" at the end of each line
RUN printf "\
392c385320cd5ffa435759a95af77c215553d967e4b1c0fffe52e4f14c29cf85 s5cmd_${S5CMD_VERSION}_linux_amd64.deb\\n\
939bee3cf4b5604ddb00e67f8c157b91d7c7a5b553d1fbb6890fad32894b7b46 s5cmd_${S5CMD_VERSION}_linux_arm64.deb\\n\
" | shasum -c - --ignore-missing

RUN dpkg -i s5cmd_${S5CMD_VERSION}_linux_${TARGETARCH}.deb

COPY . /app
2 changes: 1 addition & 1 deletion support/build/_util/deploy.sh
@@ -74,5 +74,5 @@ echo "Uploading manifest..."

if $publish; then
echo "Updating repository..."
"$(dirname "$BASH_SOURCE")/mkrepo.sh" --upload "$S3_BUCKET" "${S3_PREFIX}"
"$(dirname "$BASH_SOURCE")/mkrepo.sh" --upload
fi
3 changes: 2 additions & 1 deletion support/build/_util/include/manifest.sh
@@ -9,7 +9,8 @@ print_or_export_manifest_cmd() {
}

generate_manifest_cmd() {
echo "s3cmd --host=s3.${S3_REGION:-}${S3_REGION:+.}amazonaws.com --host-bucket='%(bucket)s.s3.${S3_REGION:-}${S3_REGION:+.}amazonaws.com' --ssl -m application/json put $(pwd)/${1} s3://${S3_BUCKET}/${S3_PREFIX}${1}"
cmd=(s5cmd ${S5CMD_NO_SIGN_REQUEST:+--no-sign-request} ${S5CMD_PROFILE:+--profile "$S5CMD_PROFILE"} cp ${S3_REGION:+--destination-region "$S3_REGION"} --content-type application/json "$(pwd)/${1}" "s3://${S3_BUCKET}/${S3_PREFIX}${1}")
echo "${cmd[*]@Q}"
}

soname_version() {
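The new `generate_manifest_cmd` builds the upload command as a bash array and prints it with the `${cmd[*]@Q}` parameter transformation, which shell-quotes every element so the emitted string survives a copy-paste even when arguments contain spaces. Python's closest analogue is `shlex.join` (3.8+); the argv below is illustrative, not taken from the buildpack:

```python
import shlex

# an argv with an element that would break naive string concatenation
cmd = [
    "s5cmd", "cp",
    "--content-type", "application/json",
    "/tmp/build dir/packages.json",  # embedded space
    "s3://lang-php/dist-stable/packages.json",
]

# shlex.join quotes each element as needed, like bash's ${cmd[*]@Q}
printable = shlex.join(cmd)
print(printable)
```

Splitting the printed string back with `shlex.split` recovers the original argv exactly, which is the round-trip property `@Q` is used for here.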
165 changes: 165 additions & 0 deletions support/build/_util/include/sync.py
@@ -0,0 +1,165 @@
import sys, json, os, glob, datetime, re, argparse
from contextlib import contextmanager
from enum import IntFlag
from pathlib import Path

class ManifestDifference(IntFlag):
CONTENTS = 1
SRC_NEWER = 2
DST_NEWER = 4

# for Python < 3.10, where glob.glob() has no root_dir kwarg
@contextmanager
def chdir(path):
cwd = os.getcwd()
os.chdir(path)
try:
yield
finally:
os.chdir(cwd)

def serialize_datetime(obj):
if isinstance(obj, (datetime.datetime, datetime.date)):
return obj.strftime("%Y-%m-%d %H:%M:%S")
raise TypeError ("Cannot serialize type %s as JSON" % type(obj))

def parse_manifest(path):
manifest = json.load(open(path))
try:
dt = datetime.datetime.strptime(manifest.pop("time"), "%Y-%m-%d %H:%M:%S").replace(tzinfo=datetime.timezone.utc)
except (KeyError, ValueError):
dt = datetime.datetime.fromtimestamp(os.path.getmtime(path), tz=datetime.timezone.utc)
print("WARNING: manifest {} has invalid time entry, using mtime: {}".format(os.path.basename(path), serialize_datetime(dt)), file=sys.stderr)
manifest["time"] = dt
return manifest

def manifests_difference(src_manifest, dst_manifest):
src_copy = src_manifest.copy()
dst_copy = dst_manifest.copy()

ret = 0

# a newer source time means we will copy
if src_copy["time"] > dst_copy["time"]:
ret |= ManifestDifference.SRC_NEWER
elif src_copy["time"] < dst_copy["time"]:
ret |= ManifestDifference.DST_NEWER

src_copy.pop("time")
dst_copy.pop("time")
src_copy.pop("dist")
dst_copy.pop("dist")

if src_copy != dst_copy:
ret |= ManifestDifference.CONTENTS

return ret
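The `IntFlag` return value lets a caller test several conditions on a single result. A condensed, self-contained usage sketch (the manifest dicts are made up, and unlike the real function this version only ignores the `time` key when comparing, not `dist`):

```python
import datetime
from enum import IntFlag

class ManifestDifference(IntFlag):
    CONTENTS = 1
    SRC_NEWER = 2
    DST_NEWER = 4

def diff(src, dst):
    ret = ManifestDifference(0)
    if src["time"] > dst["time"]:
        ret |= ManifestDifference.SRC_NEWER
    elif src["time"] < dst["time"]:
        ret |= ManifestDifference.DST_NEWER
    # compare everything except the timestamp
    if {k: v for k, v in src.items() if k != "time"} != \
       {k: v for k, v in dst.items() if k != "time"}:
        ret |= ManifestDifference.CONTENTS
    return ret

src = {"name": "php", "version": "8.3.7",
       "time": datetime.datetime(2024, 5, 29, tzinfo=datetime.timezone.utc)}
dst = {"name": "php", "version": "8.3.6",
       "time": datetime.datetime(2024, 5, 1, tzinfo=datetime.timezone.utc)}

d = diff(src, dst)
print(bool(d & ManifestDifference.SRC_NEWER), bool(d & ManifestDifference.CONTENTS))
```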

def rewrite_dist(manifest, src_region, src_bucket, src_prefix, dst_region, dst_bucket, dst_prefix):
# pattern for basically "https://lang-php.(s3.us-east-1|s3).amazonaws.com/dist-heroku-22-stable/"
# this ensures old packages are correctly handled even when they do not contain the region in the URL
s3_url_re=re.escape("https://{}.".format(src_bucket))
s3_url_re+=r"(?:s3.{}|s3)".format(re.escape(src_region))
s3_url_re+=re.escape(".amazonaws.com/{}".format(src_prefix))
s3_url_re+=r"([^?]+)(\?.*)?"
url=manifest.get("dist",{}).get("url","")
r = re.match(s3_url_re, url)
if r:
# rewrite dist URL in manifest to destination bucket
manifest["dist"]["url"] = r.expand("https://{}.s3.{}.amazonaws.com/{}\\1\\2".format(dst_bucket, dst_region, dst_prefix))
return {"kind": "dist", "source": "s3://{}/{}{}".format(src_bucket, src_prefix, r.group(1)), "source-region": src_region, "destination": "s3://{}/{}{}".format(dst_bucket, dst_prefix, r.group(1)), "destination-region": dst_region}
else:
return {"kind": "dist", "skip": True, "source": url, "reason": "file located outside of bucket"}

parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("src_region")
parser.add_argument("src_bucket")
parser.add_argument("src_prefix")
parser.add_argument("src_dir")
parser.add_argument("dst_region")
parser.add_argument("dst_bucket")
parser.add_argument("dst_prefix")
parser.add_argument("dst_dir")
args = parser.parse_args()

src_region = args.src_region
src_bucket = args.src_bucket
src_prefix = args.src_prefix
src_dir = Path(args.src_dir)

dst_region = args.dst_region
dst_bucket = args.dst_bucket
dst_prefix = args.dst_prefix
dst_dir = Path(args.dst_dir)

# we cannot use Path.glob, since we need the file names only, for comparison
# using our chdir context manager for compatibility with Python < 3.10, where glob.glob() has no root_dir kwarg
with chdir(src_dir):
src = set(glob.glob("*.composer.json"))
with chdir(dst_dir):
dst = set(glob.glob("*.composer.json"))

add = src - dst
rem = dst - src
upd = src & dst

ops = []

# anything not in dst is copied from src
for manifest_file in add:
package = re.sub(r"\.composer\.json$", "", manifest_file)
manifest = parse_manifest(src_dir/manifest_file)
# rewrite dist.url to point to dst bucket (if it's a URL in src bucket)
dist_op = rewrite_dist(manifest, src_region, src_bucket, src_prefix, dst_region, dst_bucket, dst_prefix)
dist_op["op"] = "add"
dist_op["package"] = package
# copy operation for the dist file (might also be an ignore if the URL isn't in the src bucket)
ops.append(dist_op)
# create dst manifest
args.dry_run or json.dump(manifest, open(dst_dir / manifest_file, "w"), sort_keys=True, default=serialize_datetime)
# copy operation from local dst manifest to dst bucket
ops.append({"kind": "manifest", "op": "add", "package": package, "source": dst_dir/manifest_file, "destination": "s3://{}/{}{}".format(dst_bucket, dst_prefix, manifest_file), "destination-region": dst_region})

for manifest_file in rem:
package = re.sub(r"\.composer\.json$", "", manifest_file)
manifest = parse_manifest(dst_dir / manifest_file)
# we're just checking if this file qualifies for copying - that'll tell us whether we have to remove it or not
dist_op = rewrite_dist(manifest, dst_region, dst_bucket, dst_prefix, src_region, src_bucket, src_prefix)
if dist_op.get("skip", False) == False:
# it would be copied, so it's in the bucket, and we can actually remove it
ops.append({"kind": dist_op["kind"], "op": "remove", "package": package, "destination": dist_op["source"], "destination-region": dist_op["source-region"]})
else:
# it's a URL somewhere else, so we just re-use the ignore operation
ops.append({"kind": dist_op["kind"], "op": "remove", "skip": dist_op["skip"], "package": package, "destination": dist_op["source"], "reason": dist_op["reason"]})
# drop the package from dst_dir (that's what we'll be syncing up, and what we'll be running mkrepo.sh off of)
args.dry_run or (dst_dir/manifest_file).unlink()
# removal operation from dst bucket
ops.append({"kind": "manifest", "op": "remove", "package": package, "destination": "s3://{}/{}{}".format(dst_bucket, dst_prefix, manifest_file), "destination-region": dst_region})

for manifest_file in upd:
package = re.sub(r"\.composer\.json$", "", manifest_file)
src_manifest = parse_manifest(src_dir/manifest_file)
dst_manifest = parse_manifest(dst_dir/manifest_file)
# compare the two manifests
diff = manifests_difference(src_manifest, dst_manifest)
if diff:
# the manifests differ somehow
if diff & ManifestDifference.SRC_NEWER:
# source is newer than destination, so we copy both dist and manifest
# take updated manifest from src
dist_op = rewrite_dist(src_manifest, src_region, src_bucket, src_prefix, dst_region, dst_bucket, dst_prefix)
dist_op["op"] = "update"
dist_op["package"] = package
ops.append(dist_op)
# write out updated manifest (remember we updated the newer src manifest with a dst dist url)
args.dry_run or json.dump(src_manifest, open(dst_dir / manifest_file, "w"), sort_keys=True, default=serialize_datetime)
# so we're passing in the dst_dir here
ops.append({"kind": "manifest", "op": "update", "package": package, "source": dst_dir/manifest_file, "destination": "s3://{}/{}{}".format(dst_bucket, dst_prefix, manifest_file), "destination-region": dst_region})
elif diff & ManifestDifference.DST_NEWER:
# destination is newer - do not overwrite
ops.append({"kind": "manifest", "op": "update", "skip": True, "package": package, "source": src_dir/manifest_file, "reason": "destination is newer"})
elif diff & ManifestDifference.CONTENTS:
ops.append({"kind": "manifest", "op": "update", "skip": True, "package": package, "source": src_dir/manifest_file, "reason": "contents differ, but times are identical"})

json.dump(ops, sys.stdout, default=str)
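The script's only output is this JSON array of operation records on stdout. A consumer (presumably the sync shell wrapper, which is not part of this excerpt) might split it into actionable and skipped entries roughly like so; the ops data here is fabricated for illustration:

```python
import json

ops_json = json.dumps([
    {"kind": "dist", "op": "add", "package": "php-8.3.7",
     "source": "s3://lang-php/dist-develop/php-8.3.7.tar.gz",
     "destination": "s3://lang-php/dist-stable/php-8.3.7.tar.gz"},
    {"kind": "manifest", "op": "update", "skip": True, "package": "ext-redis-6.0.2",
     "source": "develop/ext-redis-6.0.2.composer.json",
     "reason": "destination is newer"},
    {"kind": "manifest", "op": "remove", "package": "php-8.1.0",
     "destination": "s3://lang-php/dist-stable/php-8.1.0.composer.json"},
])

ops = json.loads(ops_json)
# records with "skip": true carry a human-readable reason instead of work to do
actionable = [op for op in ops if not op.get("skip", False)]
skipped = [op for op in ops if op.get("skip", False)]

for op in actionable:
    print("{op} {kind} for {package}".format(**op))
for op in skipped:
    print("skipping {package}: {reason}".format(**op))
```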