testing adding CI to run basic commands
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch committed Jan 17, 2023
1 parent 09871df commit fa834db
Showing 9 changed files with 166 additions and 39 deletions.
79 changes: 79 additions & 0 deletions .github/workflows/main.yaml
@@ -0,0 +1,79 @@
name: conda-mirror test

permissions:
  contents: read

on:
  pull_request: []

jobs:
  linting:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Check Spelling
        uses: crate-ci/typos@7ad296c72fa8265059cc03d1eda562fbdfcd6df2 # v1.9.0
        with:
          files: ./README.md
      - name: Setup Environment
        run: conda create --quiet --name mirror pre-commit
      - name: Lint Conda Oci Mirror
        run: |
          export PATH="/usr/share/miniconda/bin:$PATH"
          source activate mirror
          pip install -r .github/dev-requirements.txt
          pre-commit run --all-files
  mirror-pkgs:
    runs-on: ubuntu-latest
    services:
      registry:
        image: ghcr.io/oras-project/registry:latest
        ports:
          - 5000:5000
    strategy:
      max-parallel: 12
      matrix:
        package: [redo]
        subdir:
          - linux-64
          - osx-64
          - osx-arm64
          - win-64
          - linux-aarch64
          - linux-ppc64le
          - noarch
      fail-fast: true
    steps:
      - uses: actions/checkout@v3
      - name: Setup Environment
        run: conda create --quiet --name mirror pre-commit
      - name: Install Conda Oci Mirror
        run: |
          export PATH="/usr/share/miniconda/bin:$PATH"
          conda install conda-build
          source activate mirror
          pip install -e .
      - name: Test Conda Oci Mirror
        shell: bash -l {0}
        env:
          channel: conda-forge
          subdir: ${{ matrix.subdir }}
          package: ${{ matrix.package }}
          registry_host: localhost
          registry_port: ${{ job.services.registry.ports[5000] }}
        run: |
          export PYTHONUNBUFFERED=1
          export PATH="/usr/share/miniconda/bin:$PATH"
          source activate mirror
          # First run mirror
          conda-oci mirror --channel ${channel} --subdir ${subdir} --package ${package} --user dinosaur --host http://${registry_host}:${registry_port}
          # Then run pull-cache and push-cache
          conda-oci pull-cache --user dinosaur --subdir ${subdir} --package ${package} --host http://${registry_host}:${registry_port}
          conda-oci push-cache --user dinosaur --subdir ${subdir} --package ${package} --host http://${registry_host}:${registry_port}
      - name: View Cache
        run: sudo apt-get install -y tree && tree ./cache
22 changes: 11 additions & 11 deletions README.md
@@ -106,20 +106,20 @@ account to push to:
$ conda-oci mirror --channel conda-forge --package zlib --user researchapps
```

## TODO
You can also develop with a local registry (instead of ghcr.io):

- [conda-package-handling](https://github.com/conda/conda-package-handling) is not installable via setup.cfg
- ask wolf why deid (and others I maintain) not in conda-forge noarch listing?
- add better formatting for logger (colors?)
- add `--debug` mode to see what is happening at all steps
- Question: I added size and creationTime annotations to layers - is that OK?
- it would be nice to have a version regular expression for those we want to mirror (there are often a lot). It's not obvious the best way to do this - since the user can specify multiple packages either we would have a package be like `--package zlib@1.2.11` or we would need to scope the action to be just for one package.
```bash
$ docker run -it --rm -p 5000:5000 ghcr.io/oras-project/registry:latest
```

And then specify the host and namespace for that registry - oras
will fall back to insecure mode given that you've provided http://.

Note that we aren't specifying any kind of credential or even subdirectory - we are just asking
to do a dry run for deid on conda-forge.
```bash
$ conda-oci mirror --channel conda-forge --package testinfra --user dinosaur --host http://127.0.0.1:5000 --subdir noarch
```
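
The fallback to insecure mode can be pictured as a simple scheme check on `--host`. The sketch below follows the `oras.prefix` assignment that this commit adds to `mirror.py`; the helper names `registry_prefix` and `strip_scheme` are hypothetical and are not part of conda-oci-mirror.

```python
def registry_prefix(host: str) -> str:
    """Return "http" (insecure) only when the host explicitly starts with http://."""
    return "http" if host.startswith("http://") else "https"


def strip_scheme(host: str) -> str:
    """Drop a leading scheme so only host[:port] remains (hypothetical helper)."""
    for scheme in ("http://", "https://"):
        if host.startswith(scheme):
            return host[len(scheme):]
    return host


print(registry_prefix("http://localhost:5000"))  # http
print(registry_prefix("ghcr.io"))  # https
print(strip_scheme("http://localhost:5000"))  # localhost:5000
```

Under this rule a bare host such as `ghcr.io` stays on https, so insecure mode is only ever opt-in.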

Notice that `--dry-run` is set, and this is appropriate because we haven't logged into
(or specified) a registry to push the mirror to.
See [TODO.md](TODO.md) for some questions and items to do.

### Linting

30 changes: 30 additions & 0 deletions TODO.md
@@ -0,0 +1,30 @@
## TODO

### High Priority

#### Support the new .conda package format.

> We do not yet support the "new" .conda package format in the OCI mirroring tools. Luckily the new conda-package-handling (and conda-package-streaming) packages are pretty OK and we can use them to do what we need: https://conda.github.io/conda-package-handling/api.html instead of using `tarfile`. The new package format files appear under the `packages.conda` key in the repodata.json file.
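
As a rough illustration of the `packages.conda` key mentioned above, the sketch below walks both package sections of a repodata dictionary. The function name `iter_package_records` and the sample records are made up for this example; it does not touch conda-package-handling at all.

```python
def iter_package_records(repodata: dict):
    """Yield (filename, record) pairs from both package sections of repodata."""
    # "packages" holds .tar.bz2 entries, "packages.conda" holds .conda entries
    for key in ("packages", "packages.conda"):
        for filename, record in repodata.get(key, {}).items():
            yield filename, record


# Tiny hand-written repodata, for illustration only
repodata = {
    "packages": {
        "zlib-1.2.11-h0.tar.bz2": {"name": "zlib", "version": "1.2.11"},
    },
    "packages.conda": {
        "zlib-1.2.13-h1.conda": {"name": "zlib", "version": "1.2.13"},
    },
}

filenames = sorted(filename for filename, _ in iter_package_records(repodata))
print(filenames)
```

Using `repodata.get(key, {})` keeps the loop working for older repodata files that have no `packages.conda` section.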
#### Deep and Shallow Modes

> Currently we're checking against all tags on the ghcr registry, which is a bit slow. We could have a "deep" and a "shallow" mode, where the shallow mode would use the "latest uploaded repodata" as reference from here: https://github.com/orgs/channel-mirrors/packages/container/package/conda-forge%2Fnoarch%2Frepodata.json
#### Compressed repodata

> It would be good to upload `repodata.json.zst` as a file compressed with zstd. In "regular" servers we ask for the gzip encoded response to get a compressed file over the wire but we need to be explicit with OCI registries as they don't support the on-the-fly encoding. Support for zst encoded repodata is being added to mamba soon.
### General

- [conda-package-handling](https://github.com/conda/conda-package-handling) is not installable via setup.cfg
- add better formatting for logger (colors?)
- add `--debug` mode to see what is happening at all steps
- It would be nice to have a version regular expression for those we want to mirror (there are often a lot). It's not obvious the best way to do this - since the user can specify multiple packages either we would have a package be like `--package zlib@1.2.11` or we would need to scope the action to be just for one package.
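
One possible shape for the `--package zlib@1.2.11` idea is sketched below, assuming the part after `@` is treated as a regular expression matched against the full version string. `PackageSpec`, `parse_spec`, and `wants` are hypothetical names; nothing like this exists in the CLI yet.

```python
import re
from typing import NamedTuple, Optional


class PackageSpec(NamedTuple):
    name: str
    version_pattern: Optional[str]  # None means "mirror every version"


def parse_spec(spec: str) -> PackageSpec:
    """Split "name@pattern" into its parts; a bare "name" has no pattern."""
    name, _, pattern = spec.partition("@")
    return PackageSpec(name, pattern or None)


def wants(spec: PackageSpec, name: str, version: str) -> bool:
    """Decide whether a package version should be mirrored for this spec."""
    if name != spec.name:
        return False
    if spec.version_pattern is None:
        return True
    return re.fullmatch(spec.version_pattern, version) is not None


spec = parse_spec(r"zlib@1\.2\..*")
print(wants(spec, "zlib", "1.2.13"))  # True
print(wants(spec, "zlib", "1.3.0"))  # False
```

Because each spec carries its own pattern, several `--package` values could still be given at once. One caveat of this design is that an unescaped `.` in the pattern matches any character.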

### Questions

- Why is deid (and others I maintain) not in the conda-forge noarch listing?
- I added size and creationTime annotations to layers - is that OK?
- Why was repodata.json copied to original_repodata.json? Why do we need to save it (it doesn't seem to get used later)?
- Can we have the push/pull-cache also be done in parallel?
- What does it mean for a package to start with an underscore (and in the code to change to `name = f"zzz{name}"`)?
2 changes: 1 addition & 1 deletion conda_oci_mirror/cli.py
@@ -18,7 +18,7 @@ def main():
click.option("-s", "--subdir", default=defaults.DEFAULT_SUBDIRS, multiple=True),
click.option("-p", "--package", help="Select packages", default=[], multiple=True),
click.option("--user", default=None, help="Username for ghcr.io"),
click.option("--host", default="ghcr.io", help="Host to push packages to"),
click.option("--host", default="ghcr.io", help="Host for your registry"),
click.option("--dry-run/--no-dry-run", default=False, help="Dry run?"),
click.option("--cache-dir", default=default_cache, help="Path to cache directory"),
click.option("-c", "--channel", help="Select channel", default="conda-forge"),
37 changes: 27 additions & 10 deletions conda_oci_mirror/mirror.py
@@ -2,6 +2,7 @@
import logging
import os
import pathlib
import shutil
import subprocess

import requests
@@ -63,6 +64,10 @@ def __init__(
        self.quiet = quiet
        self.announce()

        # Ensure the oras registry is set to insecure or not based on host
        global oras
        oras.prefix = "http" if host.startswith("http://") else "https"

        # Set listing of (undistributable) packages to skip
        self.skip_packages = (
            get_forbidden_packages() if "conda-forge" in channels else None
@@ -97,8 +102,9 @@ def update(self, dry_run=False):

        # If they think they are pushing but no auth, they are not :)
        if not oras.has_auth and dry_run is False:
            logger.warning("ORAS is not authenticated, this will be a dry run.")
            dry_run = True
            logger.warning(
                "ORAS is not authenticated, if your registry requires auth this will not work"
            )

        for channel, subdir, cache_dir in self.iter_channels():
            repo = repository.PackageRepo(channel, subdir, cache_dir)
@@ -164,12 +170,9 @@ def pull_latest(self, dry_run=False):
                repodata = util.read_json(index_file)
                packages = set([p["name"] for _, p in repodata["packages"].items()])
                logger.info(f"Found {len(packages)} packages from {uri}")
                renamed = os.path.join(
                    os.path.dirname(index_file), "original_repodata.json"
                )
                os.rename(index_file, renamed)

            except Exception as e:
                packages = set()
                print(f"Issue retrieving uri: {uri}: {e}")

            for package in packages:
@@ -204,12 +207,21 @@ def push_new(self, dry_run=False):

            # The channel cache is one level up from our subdir cache
            channel_root = os.path.dirname(cache_dir)
            conda_index(channel_root)
            orig_repodata = os.path.join(cache_dir, "original_repodata.json")

            # Create new repodata or load existing
            # Backup the original repository data so we can index and replace it
            backup_repodata = os.path.join(cache_dir, "original_repodata.json")
            orig_repodata = os.path.join(cache_dir, "repodata.json")

            # If we already have repository data, make a copy
            if os.path.exists(orig_repodata):
                repodata = util.read_json(orig_repodata)
                shutil.copyfile(orig_repodata, backup_repodata)

            # This nukes the repodata.json
            conda_index(channel_root)

            # Create new repodata or load existing from backup (before nuke)
            if os.path.exists(backup_repodata):
                repodata = util.read_json(backup_repodata)
            else:
                repodata = {"packages": []}
            files = list(pathlib.Path(cache_dir).rglob("*.tar.bz2"))
@@ -230,3 +242,8 @@ def push_new(self, dry_run=False):
                    existing_file=str(package_name),
                )
                package.upload(dry_run, timestamp)

            # If we cleanup, remove repodata.json and replace back with original
            os.remove(orig_repodata)
            if os.path.exists(backup_repodata):
                shutil.move(backup_repodata, orig_repodata)
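
The backup, index, and restore sequence in `push_new` can be sketched in isolation as follows. This is only an approximation under stated assumptions: `fake_index` is a stand-in for `conda_index` that simply overwrites `repodata.json`, `index_preserving_repodata` is a hypothetical name, and the `try`/`finally` is an addition that the commit itself does not use.

```python
import json
import os
import shutil
import tempfile


def fake_index(cache_dir):
    """Stand-in for conda_index (assumption): overwrites repodata.json."""
    with open(os.path.join(cache_dir, "repodata.json"), "w") as fd:
        json.dump({"packages": {}}, fd)


def index_preserving_repodata(cache_dir, index=fake_index):
    """Run an indexer, then put the original repodata.json back if there was one."""
    current = os.path.join(cache_dir, "repodata.json")
    backup = os.path.join(cache_dir, "original_repodata.json")

    # Keep a copy before the indexer replaces the file
    if os.path.exists(current):
        shutil.copyfile(current, backup)
    try:
        index(cache_dir)
    finally:
        # Restore the original over whatever the indexer wrote
        if os.path.exists(backup):
            shutil.move(backup, current)


with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "repodata.json")
    with open(path, "w") as fd:
        json.dump({"packages": {"zlib-1.2.11-0.tar.bz2": {"name": "zlib"}}}, fd)
    index_preserving_repodata(cache_dir)
    with open(path) as fd:
        restored = json.load(fd)

print(sorted(restored["packages"]))
```

Restoring inside `finally` means a failing indexer would still leave the original file in place, which may or may not be the behavior wanted here.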
17 changes: 9 additions & 8 deletions conda_oci_mirror/oras.py
@@ -76,7 +76,7 @@ def push(self, uri):
        uri is the registry name with tag.
        """
        # Add some custom annotations!
        logger.info(f"⭐️ Pushing {uri}: {self.created_at}")
        print(f"⭐️ Pushing {uri}: {self.created_at}")

        # The context should be the file root
        with oraslib.utils.workdir(self.root):
@@ -117,13 +117,14 @@ def pull_by_media_type(self, container, dest, media_type=None):
            outfile = oraslib.utils.sanitize_path(dest, os.path.join(dest, artifact))

            # If it already exists with the same digest, don't do it :)
            expected_digest = f"sha256:{util.sha256sum(outfile)}"
            if os.path.exists(outfile) and layer["digest"] == expected_digest:
                print(
                    f"{outfile} already exists with expected hash, not re-downloading."
                )
                paths.append(outfile)
                continue
            if os.path.exists(outfile):
                expected_digest = f"sha256:{util.sha256sum(outfile)}"
                if layer["digest"] == expected_digest:
                    print(
                        f"{outfile} already exists with expected hash, not re-downloading."
                    )
                    paths.append(outfile)
                    continue

            # this function handles creating the output directory if does not exist
            print(f"Downloading {artifact} to {outfile}")
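
The reordering in this hunk matters because the earlier version hashed `outfile` before checking that it exists, which presumably fails on a first download. A minimal, standalone sketch of the corrected check is below; `sha256sum` here is a local stand-in for `util.sha256sum`, and `already_downloaded` is a hypothetical helper name.

```python
import hashlib
import os
import tempfile


def sha256sum(path, chunk_size=65536):
    """Hex SHA-256 of a file, read in chunks (stand-in for util.sha256sum)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fd:
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_downloaded(outfile, layer_digest):
    """True only if outfile exists and its digest matches the layer's digest."""
    if not os.path.exists(outfile):
        # Checking existence first avoids hashing a missing file
        return False
    return layer_digest == f"sha256:{sha256sum(outfile)}"


with tempfile.TemporaryDirectory() as dest:
    outfile = os.path.join(dest, "artifact.tar.bz2")
    missing = already_downloaded(outfile, "sha256:abc")
    with open(outfile, "wb") as fd:
        fd.write(b"hello")
    expected = "sha256:" + hashlib.sha256(b"hello").hexdigest()
    match = already_downloaded(outfile, expected)
    mismatch = already_downloaded(outfile, "sha256:abc")

print(missing, match, mismatch)  # False True False
```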
7 changes: 6 additions & 1 deletion conda_oci_mirror/package.py
@@ -173,6 +173,11 @@ def upload(self, dry_run=False, extra_tags=None, timestamp=None):
        Upload a conda package archive.
        """
        extra_tags = extra_tags or ["latest"]

        # If we are not given an iterable
        if not isinstance(extra_tags, (list, set, tuple)):
            extra_tags = set([extra_tags])

        with tempfile.TemporaryDirectory() as staging_dir:
            pusher = Pusher(staging_dir, timestamp=timestamp)
            upload_files_path = pathlib.Path(staging_dir)
@@ -236,6 +241,6 @@ def upload(self, dry_run=False, extra_tags=None, timestamp=None):

            # Push main tag and extras
            uri = f"{self.namespace}/{self.channel}/{self.subdir}/{name}"
            for tag in [version_and_build] + extra_tags:
            for tag in [version_and_build] + list(extra_tags):
                pusher.push(f"{uri}:{tag}")
        return index
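
The tag handling in these two hunks can be exercised on its own. Note that `push_new` calls `package.upload(dry_run, timestamp)` positionally, so the timestamp appears to arrive as `extra_tags`, which may be why a single non-list value needs wrapping. The sketch below uses hypothetical helper names (`normalize_tags`, `tags_to_push`) and wraps into a list rather than a set so the order stays predictable.

```python
def normalize_tags(extra_tags=None):
    """Return extra tags as a list, accepting None, one value, or a collection."""
    extra_tags = extra_tags or ["latest"]
    # A bare string (or any other single value) is wrapped rather than iterated
    if not isinstance(extra_tags, (list, set, tuple)):
        extra_tags = [extra_tags]
    return list(extra_tags)


def tags_to_push(version_and_build, extra_tags=None):
    """Main tag first, then any extras."""
    return [version_and_build] + normalize_tags(extra_tags)


print(tags_to_push("1.2.11-h0"))  # ['1.2.11-h0', 'latest']
print(tags_to_push("1.2.11-h0", "1673900000"))  # ['1.2.11-h0', '1673900000']
```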
9 changes: 2 additions & 7 deletions conda_oci_mirror/repo.py
@@ -30,7 +30,7 @@ def __init__(self, channel, subdir, cache_dir):

    @property
    def repodata(self):
        return os.path.join(self.cache_dir, self.name, "repodata.json")
        return os.path.join(self.cache_dir, "repodata.json")

    @property
    def name(self):
@@ -54,12 +54,7 @@ def get_repodata(self):
        """
        Get repository metadata
        """
        # Cut out early if we already have it
        if self.exists():
            self.ensure_timestamp()
            return self.repodata

        # If we get here, we need to download it freshly
        # TODO we should have a check here for timestamp, and re-retrieve if older than X
        util.mkdir_p(os.path.dirname(self.repodata))
        r = requests.get(
            f"https://conda.anaconda.org/{self.channel}/{self.subdir}/repodata.json",
2 changes: 1 addition & 1 deletion setup.cfg
@@ -31,7 +31,7 @@ install_requires =
    build
    requests
    oras >= 0.1.13
    conda_package_handling @ git+ssh://git@github.com/conda/conda-package-handling.git#egg=main
    conda_package_handling @ git+https://github.com/conda/conda-package-handling.git#egg=main
[options.entry_points]
console_scripts =
    conda-oci = conda_oci_mirror.cli:main
