Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite package install strategy. #1093

Merged
merged 1 commit into from Aug 6, 2016
Merged

Commits on Aug 4, 2016

  1. Rewrite package install strategy.

    This changeset contains several logical changes, all of which are
    coupled together as they were discovered in the process of refactoring
    the installation logic. The logical changes are broken out into sections
    below.
    
    Pull-Through Local Caching Package Installation
    -----------------------------------------------
    
    The implementation for package installation was re-written in order to
    aggressively use the artifact cache (by default `/hab/cache/artifacts`
    on disk). The previous implementation was built up over time with
    installation from local artifacts added at a later date. Consequently,
    many installation inconsistencies were present:
    
    * When performing an installation from a local package artifact, all
      dependencies were blindly installed from an upstream Depot--whether or
      not they were installed or even cached on disk.
    * When installing from a package identifier string, the fully dependency
      list was queried from an upstream Depot even though the fetched
      package artifact contained this metadata. Furthermore, the artifact's
      own metadata should be trusted over any other source, assuming
      the package is confirmed "verified".
    * The local package artifact installation read package dependencies from
      its metadata file correctly which uncovered a bug (fixed in #1013)
      which would have been found earlier had all installation strategies
      uses the same code path.
    * When performing an installation from a local package artifact, the
      artifact file was never copied into the artifact cache for future
      re-use. Neither were any locally available dependency package
      artifacts.
    
    This change introduces a new internal struct called an `InstallTask`
    that maintains knowledge of various directories as well as a common
    Depot client which is only constructed once. There are 2 primary
    installation strategies still present: install from a package identifier
    string (ex: `core/redis`) or install from a local package artifact (ex:
    `*.hart`). Effort was made to ensure that both approaches use the same
    code path as much as possible to minimize future behavioral bugs.
    
    The updated installation high level strategies are summarized here:
    
    From Package Identifier:
    
    1. Is the given package identifier fully qualified? If not, query the
       upstream Depot for the latest package identifier that satisfies the
       given "fuzzy" package identifier. In this way a given `core/redis`
       may return a fully qualified `core/redis/3.0.7/20160614001713`
       identifier.
    2. Is the specific release already installed for the determined fully
       qualified package identifier? If so, return early as there is no
       further work to do.
    3. Is there a package artifact for the specific release in the artifact
       cache? If there is, we use this one. Otherwise, fetch the specific
       release from the upstream Depot and write the package artifact into
       the artifact cache.
    4. Verify the package artifact to ensure that its fully qualified package
       identifier precisely matches the intended target package identifier.
       Next, verify that the package artifact is correctly signed and is
       self-consistent.
    5. Determine the fully list of runtime dependencies by extracting the
       `TDEPS` metadata file from the local package artifact. For each
       dependency, perform steps 2 through 4 above so that all dependencies
       are either confirmed pre-installed or each cached package artifact
       has been fully verified.
    6. Extract/unpack each package artifact dependency that was not found to
       be pre-installed, followed by the target package artifact last. The
       order is important to ensure that target package is not installed
       until all of its dependencies are correctly installed.
    
    From Local Package Artifact:
    
    1. Extract the fully qualified package identifier from the `IDENT`
       metadata file in the package artifact.
    2. Is the specific release already installed for the determined fully
       qualified package identifier? If so, return early as there is no
       further work to do.
    3. Copy the artifact into the artifact cache and rewrite the file name
       based on the computed artifact name. This means that we can install
       from a Habitat package with any arbitrary name like `file.1` and it
       would be rewritten to
       `core-redis-3.0.7-20160614001713-x86_64-linux.hart` in the artifact
       cache.
    4. Verify the package artifact to ensure that its fully qualified package
       identifier precisely matches the intended target package identifier.
       Next, verify that the package artifact is correctly signed and is
       self-consistent.
    5. Determine the fully list of runtime dependencies by extracting the
       `TDEPS` metadata file from the local package artifact. For each
       dependency, perform steps 2 through 4 in the "From Package
       Identifier" section so that all dependencies are either confirmed
       pre-installed or each cached package artifact has been fully
       verified. There is one difference however: for each dependency, first
       check to see if a package artifact exists in the same directory that
       contained the original target package artifact. There is a good
       chance that if the directory contains many other package artifacts
       that we can use these rather than re-fetching the exact same ones
       from an upstream Depot. Unlike the "From Package Identifier"
       scenario, we have an explicit directory we can check, so there is
       no harm in trying.
    6. Extract/unpack each package artifact dependency that was not found to
       be pre-installed, followed by the target package artifact last. The
       order is important to ensure that target package is not installed
       until all of its dependencies are correctly installed.
    
    New: Supervisor Start From Package Artifact
    -------------------------------------------
    
    In the course of the installation refactoring above, an easy new
    improvement (or feature) presented itself: the ability to start
    supervising a package that has not yet been installed, given a local
    package artifact. In other words, this change now supports the
    following:
    
        hab start ./results/core-redis-3.0.7-20160614001713-x86_64-linux.hart
    
    or:
    
        hab sup start /tmp/redis.hart
    
    assuming that `/tmp/redis.hart` is a legitimate Habitat package artifact
    and was simply renamed.
    
    As with the installation implementation rewrite above, if the package is
    already installed, then there is no work to do, otherwise the "From
    Local Package Artifact" strategy above is used.
    
    The intended primary use case for this pattern in for Plan authors
    working on their Plans in a Studio instance. This will allow them to
    build a package artifact and them immediately start it without having to
    install it beforehand. Secondarily, this pattern may be useful in
    environments with slow or limited internet access and eventually fully
    "air gapped" environments.
    
    Fixed: at-once Update Strategy
    ------------------------------
    
    Also in the course of the above refactoring 2 bugs were found in the
    "at-once" update strategy in the Supervisor:
    
    1. Once a new version of a package is found in an upstream Depot, it is
       installed but the new version is not loaded when `start_package()`
       is called (source:
    https://github.com/habitat-sh/habitat/blob/9e25233fafe0433864ab47d0d8ede6344d30be8a/components/sup/src/command/start.rs#L120)
       with the consequence that the original release is re-loaded and
       new version is never used until the Supervisor completely
       restarts.
    2. The former custom install logic in the "at-once" strategy only
       fetched and extracted the new target package meaning that if the
       new release contained updated dependency versions, these would
       not be installed and could result in the supervised process
       entering a crash loop. The fix for this was simply to invoke the
       same installation logic as used everywhere else in the system and is
       explained in much greater detail above.
    
    Improved: Removed Optional Depot URL in Supervisor
    --------------------------------------------------
    
    While refactoring the Supervisor component, it became clear that earlier
    versions of the codebase accepted an *optional* upstream Depot URL.
    Currently, fall-back default URL values are used and so we can guarantee
    that a value is always present for the Supervisor. The config value of
    `url` was updated to simply be a `String` (formerly an `Option<String>`)
    and all extra checking was removed where a value may not have been
    present. Less branching logic is almost always a good thing!
    
    Bonus: Remove Unused Imports
    ----------------------------
    
    In the course of regression testing, several "unused import" warnings
    were found and removed. A few were due to code refactoring, but others
    were most likely due to upgrading the Rust compiler toolchain in the
    last few weeks. Just keeping the shop floor swept.
    
    Signed-off-by: Fletcher Nichol <fnichol@nichol.ca>
    fnichol committed Aug 4, 2016
    Copy the full SHA
    2e19a32 View commit details
    Browse the repository at this point in the history