Skip to content

Commit

Permalink
Support accessing remote registries via ssh (#589)
Browse files Browse the repository at this point in the history
* Support accessing remote registries via ssh
* Factored out a function that tells whether a registry path is local
* Make sure the URL is used, not self.source which could be a local path

The URL has to be given as "ssh://[user@]host.xz[:port]/path/to/repo.git"
rather than the shorter version "[user@]host.xz:path/to/repo.git"

* Make self.source include the subdir from the start. Allows implementing iter_modules in the base class
* The check already happens in _update_cache()
* Moved is_path_local to shpc.utils
* Added a safeguard to prevent cloning multiple times
* clone() is actually only supported by VersionControl
* No need to yield self.source in iter_modules since it's constant and accessible from outside (and not all callers want it !)
* It's more practical to yield the the registry object (provider) rather than using the "source" path (which is undefined for remote registries anyway)
* Optimised the "update all" mode by directly using Result objects from the registries. Otherwise, it wastes time finding modules that we know are there
* Clones only ever exist within a function
* Optimised iter_modules method for remote registries (using the cache)
* Moved back iter_modules to Filesystem since VersionControl has its own, optimised, version
* Stopped using self.source in VersionControl, to avoid confusion with Filesystem
* url, not source, is to be used for remote registries
* Cannot do these heuristics as we need to report unexisting local paths
* str.split can limit the number of splits
* The main Registry object, not the settings, should decide whether there is a local registry or not
* To avoid duplicating the code that assesses whether a path or local or not, check which sub-class of Provider is used
* The parent class shouldn't know that much about the subclasses
* Restored back the automatic addition of https://
* Restructured to avoid an unnecessary else
* shpc convention: no else when the then ends with a return
* Unnecessary due to operator precedence rule
* Added a cache in `library_url`
* Fixed the implementation of the cache in VersionControl.exists
* exists has its own implementation in VersionControl, so this implementation is in fact specific to Filesystem
* iter_registry is basically iter_modules with an extra filter
* Yield relative paths rather than full paths since *all* consumers need relative paths
* Proper method to cleanup a clone
* Removed a cleanup() call that was expected to do nothing
* Increased the symmetry to simplify the maintainability
* NotImplementedError is more useful than pass
* The tuplized version is not the preference here
* Easier to understand
* Made the clone return a Filesystem object independent from VersionControl
* Extra comment
* Back to a subclass of VersionControl for each forge
* Pre-parse the URL
* VersionControl should not be directly used
* Renamed the variable for clarity
* Removing yaml because it's the only file we have for a container
* Defensive programming: local could still be None
* bugfix: iter_modules needs to yield paths to container.yaml
* Moved the cleanup call up to _sync()
* bugfix: iter_modules now returns path to the container.yaml
* Need to check here too that the clone still exists
* Also need to reset self._clone if the directory is gone
* More checks on local and remote
* The temp directory may have been deleted in the meantime
* It makes more sense to cleanup tmplocal than self, and it works because self expects the cleanup may happen
* Moved this to the parent class
* Another implementation that doesn't make it too obvious the base-class knows about GitHub and GitLab
* Silly typo: self._clone is a Filesystem object, not a string
* No colon
* You shall use American spelling
* Added a test to showcase ssh

* Revert "bugfix: iter_modules needs to yield paths to container.yaml"

This reverts commit f069f48.

* Revert "bugfix: iter_modules now returns path to the container.yaml"

This reverts commit c5b4cb9.
  • Loading branch information
muffato committed Nov 8, 2022
1 parent c035442 commit d059bf9
Show file tree
Hide file tree
Showing 9 changed files with 225 additions and 200 deletions.
14 changes: 9 additions & 5 deletions shpc/main/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,12 +124,16 @@ def update(self, name=None, dryrun=False, filters=None):
"""
# No name provided == "update all"
if name:
modules = [name]
# find the module in the registries. _load_container
# calls `container.ContainerConfig(result)` like below
configs = [self._load_container(name)]
else:
modules = [x[1] for x in list(self.registry.iter_modules())]

for module_name in modules:
config = self._load_container(module_name)
# directly iterate over the content of the registry
configs = []
for result in self.registry.iter_registry():
configs.append(container.ContainerConfig(result))
# do the update
for config in configs:
config.update(dryrun=dryrun, filters=filters)

def test(
Expand Down
23 changes: 13 additions & 10 deletions shpc/main/modules/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -172,7 +172,12 @@ def add(self, image, module_name=None, **kwargs):
"""
Add a container to the registry to enable install.
"""
self.settings.ensure_filesystem_registry()
local_registry = self.registry.filesystem_registry

if not local_registry:
logger.exit(
"This command is only supported for a filesystem registry! Add one or use --registry."
)

# Docker module name is always the same namespace as the image
if image.startswith("docker"):
Expand All @@ -185,7 +190,7 @@ def add(self, image, module_name=None, **kwargs):

# Assume adding to default registry
dest = os.path.join(
self.settings.filesystem_registry,
local_registry.source,
module_name.split(":")[0],
"container.yaml",
)
Expand Down Expand Up @@ -235,10 +240,9 @@ def docgen(self, module_name, registry=None, out=None, branch="main"):
aliases = config.get_aliases()
template = self.template.load("docs.md")
registry = registry or defaults.github_url
github_url = "%s/blob/%s/%s/container.yaml" % (registry, branch, module_name)
raw_github_url = shpc.main.registry.get_module_config_url(
registry, module_name, branch
)
remote = self.registry.get_registry(registry, tag=branch)
github_url = remote.get_container_url(module_name)
raw_github_url = remote.get_raw_container_url(module_name)

# Currently one doc is rendered for all containers
result = template.render(
Expand Down Expand Up @@ -306,10 +310,9 @@ def _get_module_lookup(self, base, filename, pattern=None):
A shared function to get a lookup of installed modules or registry entries
"""
modules = {}
for fullpath in utils.recursive_find(base, pattern):
if fullpath.endswith(filename):
module_name, version = os.path.dirname(fullpath).rsplit(os.sep, 1)
module_name = module_name.replace(base, "").strip(os.sep)
for relpath in utils.recursive_find(base, pattern):
if relpath.endswith(filename):
module_name, version = os.path.dirname(relpath).rsplit(os.sep, 1)
if module_name not in modules:
modules[module_name] = set()
modules[module_name].add(version)
Expand Down
80 changes: 47 additions & 33 deletions shpc/main/registry/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
from shpc.main.settings import SettingsBase

from .filesystem import Filesystem, FilesystemResult
from .remote import GitHub, GitLab, get_module_config_url
from .remote import GitHub, GitLab


def update_container_module(module, from_path, existing_path):
Expand All @@ -23,13 +23,12 @@ def update_container_module(module, from_path, existing_path):
"""
if not os.path.exists(existing_path):
shpc.utils.mkdir_p(existing_path)
for filename in shpc.utils.recursive_find(from_path):
relative_path = filename.replace(from_path, "").strip("/")
for relative_path in shpc.utils.recursive_find(from_path):
to_path = os.path.join(existing_path, relative_path)
if os.path.exists(to_path):
shutil.rmtree(to_path)
shpc.utils.mkdir_p(os.path.dirname(to_path))
shutil.copy2(filename, to_path)
shutil.copy2(os.path.join(from_path, relative_path), to_path)


class Registry:
Expand All @@ -44,21 +43,29 @@ def __init__(self, settings=None):
# and they must exist.
self.registries = [self.get_registry(r) for r in self.settings.registry]

@property
def filesystem_registry(self):
"""
Return the first found filesystem registry.
"""
for registry in self.registries:
if isinstance(registry, Filesystem):
return registry

def exists(self, name):
"""
Determine if a module name *exists* in any local registry, return path
Determine if a module name *exists* in any registry, return the first one
"""
for reg in self.registries:
if reg.exists(name):
return os.path.join(reg.source, name)
return reg

def iter_registry(self, filter_string=None):
"""
Iterate over all known registries defined in settings.
"""
for reg in self.registries:
for entry in reg.iter_registry(filter_string=filter_string):
yield entry
yield from reg.iter_registry(filter_string=filter_string)

def find(self, name, path=None):
"""
Expand All @@ -80,19 +87,19 @@ def iter_modules(self):
"""
Iterate over modules found across the registry
"""
for reg in self.registries:
for registry, module in reg.iter_modules():
for registry in self.registries:
for module in registry.iter_modules():
yield registry, module

def get_registry(self, source):
def get_registry(self, source, **kwargs):
"""
A registry is a local or remote registry.
We can upgrade from, or otherwise list
"""
for Registry in PROVIDERS:
if Registry.matches(source):
return Registry(source)
return Registry(source, **kwargs)
raise ValueError("No matching registry provider for %s" % source)

def sync(
Expand Down Expand Up @@ -128,20 +135,10 @@ def _sync(
local=None,
sync_registry=None,
):
# Registry to sync from
sync_registry = sync_registry or self.settings.sync_registry

# Create a remote registry with settings preference
Remote = GitHub if "github.com" in sync_registry else GitLab
remote = Remote(sync_registry, tag=tag)
local = self.get_registry(local or self.settings.filesystem_registry)

# We sync to our first registry - if not filesystem, no go
if not local.is_filesystem_registry:
logger.exit(
"sync is only supported for a remote to a filesystem registry: %s"
% sync_registry.source
)
remote = self.get_registry(
sync_registry or self.settings.sync_registry, tag=tag
)

# Upgrade the current registry from the remote
self.sync_from_remote(
Expand All @@ -152,6 +149,8 @@ def _sync(
add_new=add_new,
local=local,
)

#  Cleanup the remote once we've done the sync
remote.cleanup()

def sync_from_remote(
Expand All @@ -163,26 +162,41 @@ def sync_from_remote(
If the registry module is not installed, we install to the first
filesystem registry found in the list.
"""
updates = False

## First get a valid local Registry
# A local (string) path provided
if local and isinstance(local, str) and os.path.exists(local):
if local and isinstance(local, str):
if not os.path.exists(local):
logger.exit("The path %s doesn't exist." % local)
local = Filesystem(local)

# No local registry provided, use default
if not local:
local = Filesystem(self.settings.filesystem_registry)
local = self.filesystem_registry
# We sync to our first registry - if not filesystem, no go
if not local:
logger.exit("No local registry to sync to. Check the shpc settings.")

if not isinstance(local, Filesystem):
logger.exit(
"Can only synchronize to a local file system, not to %s." % local
)

tmpdir = remote.source
if tmpdir.startswith("http") or not os.path.exists(tmpdir):
tmpdir = remote.clone()
## Then a valid remote Registry
if not remote:
logger.exit("No remote provided. Cannot sync.")

if not isinstance(remote, Filesystem):
# Instantiate a local registry, which will have to be cleaned up
remote = remote.clone()

# These are modules to update
for regpath, module in remote.iter_modules():
updates = False
for module in remote.iter_modules():
if name and module != name:
continue

from_path = os.path.join(regpath, module)
from_path = os.path.join(remote.source, module)
existing_path = local.exists(module)

# If we have an existing module and we want to replace all files
Expand Down
30 changes: 18 additions & 12 deletions shpc/main/registry/filesystem.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,20 +75,31 @@ def override_exists(self, tag):


class Filesystem(Provider):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.source = os.path.abspath(self.source)
def __init__(self, source):
if not self.matches(source):
raise ValueError(
"Filesystem registry source must exist on the filesystem. Got %s"
% source
)
self.source = os.path.abspath(source)

@classmethod
def matches(cls, source):
return os.path.exists(source) or source == "."

def exists(self, name):
return os.path.exists(os.path.join(self.source, name))

def iter_modules(self):
"""
yield module names
"""
# Find modules based on container.yaml
for filename in shpc.utils.recursive_find(self.source, "container.yaml"):
module = os.path.dirname(filename).replace(self.source, "").strip(os.sep)
module = os.path.dirname(filename)
if not module:
continue
yield self.source, module
yield module

def find(self, name):
"""
Expand All @@ -110,14 +121,9 @@ def iter_registry(self, filter_string=None):
"""
Iterate over content in filesystem registry.
"""
for filename in shpc.utils.recursive_find(self.source):
if not filename.endswith("container.yaml"):
continue
module_name = (
os.path.dirname(filename).replace(self.source, "").strip(os.sep)
)

for module_name in self.iter_modules():
# If the user has provided a filter, honor it
if filter_string and not re.search(filter_string, module_name):
continue
filename = os.path.join(self.source, module_name)
yield FilesystemResult(module_name, filename)
56 changes: 31 additions & 25 deletions shpc/main/registry/provider.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

import os

import shpc.utils


class Result:
@property
Expand Down Expand Up @@ -32,36 +34,40 @@ class Provider:
A general provider should retrieve and provide registry files.
"""

def __init__(self, source, *args, **kwargs):
if not (source.startswith("https://") or os.path.exists(source)):
raise ValueError(
"Registry source must exist on the filesystem or be given as https://."
)
self.source = source

def exists(self, name):
return os.path.exists(os.path.join(self.source, name))

@property
def is_filesystem_registry(self):
return not self.source.startswith("http") and os.path.exists(self.source)

@property
def name(self):
return self.__class__.__name__.lower()

@classmethod
def matches(cls, source_url: str):
pass
def matches(cls, source):
"""
Returns true if this class understands the source
"""
raise NotImplementedError

def find(self, name):
pass
"""
Returns a Result object if the module can be found in the registry
"""
raise NotImplementedError

def exists(self, name):
"""
Returns true if the module can be found in the registry
"""
raise NotImplementedError

def cleanup(self):
pass
"""
Cleanup the registry
"""
raise NotImplementedError

def iter_registry(self):
pass
def iter_registry(self, filter_string=None):
"""
Iterates over the modules of this registry (that match the filte, if
provided) as Result instances
"""
raise NotImplementedError

def iter_modules(self):
pass
"""
Iterates over the module names of this registry
"""
raise NotImplementedError

0 comments on commit d059bf9

Please sign in to comment.