Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read multiple blocks if available while loading Metafile contents #1013

Merged
merged 1 commit into from Jul 6, 2016

Conversation

bookshelfdave
Copy link
Contributor

@bookshelfdave bookshelfdave commented Jun 29, 2016

Fixes #1011

This adds a components/core/tests/fixtures/unhappyhumans-possums-8.1.4-20160427165340-x86_64-linux.hart file with 1024 lines in TDEPS to test with.

cc @habitat-sh/habitat-core-maintainers

@thesentinels
Copy link
Contributor

By analyzing the blame information on this pull request, we identified @fnichol and @reset to be potential reviewers

Fixes #1011

Signed-off-by: Dave Parfitt <dparfitt@chef.io>
@miketheman
Copy link
Contributor

I concur - the proposed fix solves the problem.

@bookshelfdave
Copy link
Contributor Author

@habitat-sh/habitat-core-maintainers this should be reviewed/merged asap!

@smith
Copy link
Contributor

smith commented Jul 6, 2016

Looks good!

@thesentinels r+

gif-keyboard-5936216153901027413

@thesentinels
Copy link
Contributor

📌 Commit 8e5681b has been approved by smith

@thesentinels
Copy link
Contributor

⌛ Testing commit 8e5681b with merge 13c4b53...

thesentinels added a commit that referenced this pull request Jul 6, 2016
Read multiple blocks if available while loading Metafile contents

Fixes #1011

This adds a `components/core/tests/fixtures/unhappyhumans-possums-8.1.4-20160427165340-x86_64-linux.hart` file with 1024 lines in TDEPS to test with.

cc @habitat-sh/habitat-core-maintainers
@thesentinels
Copy link
Contributor

☀️ Test successful - travis

@thesentinels thesentinels merged commit 8e5681b into master Jul 6, 2016
@fnichol fnichol deleted the dp_trace_install branch July 7, 2016 20:33
@fnichol
Copy link
Collaborator

fnichol commented Jul 7, 2016

Sweet, I was looking for this PR. Nice to see that it's merged.

@bookshelfdave
Copy link
Contributor Author

oops, just noticed the branch name is off. my bad :-(

On Jul 7, 2016, at 4:33 PM, Fletcher Nichol notifications@github.com wrote:

Sweet, I was looking for this PR. Nice to see that it's merged.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.

fnichol added a commit that referenced this pull request Jul 21, 2016
This changeset contains several logical changes, all of which are
coupled together as they were discovered in the process of refactoring
the installation logic. The logical changes are broken out into sections
below.

Pull-Through Local Caching Package Installation
-----------------------------------------------

The implementation for package installation was re-written in order to
agressively use the artifact cache (by default `/hab/cache/artifacts` on
disk). The previous implementation was built up over time with
installation from local artifacts added at a later date. Consequently,
many installation inconsistencies were present:

* When performing an installation from a local package artifact, all
  dependencies were blindly installed from an upstream Depot--whether or
  not they were installed or even cached on disk.
* When installing from a package identifier string, the fully dependency
  list was queried from an upstream Depot even though the fetched
  package artifact contained this metadata. Furthermore, the artifact's
  own metadata should be trusted over any other source, assuming
  the package is confirmed "verified".
* The local package artifact installation read package dependencies from
  its metadata file correctly which uncovered a bug (fixed in #1013)
  which would have been found earlier had all installation strategies
  uses the same code path.
* When performing an installation from a local package artifact, the
  artifact file was never copied into the artifact cache for future
  re-use. Neither were any locally available dependency package
  artifacts.

This change introduces a new internal struct called an `InstallTask`
that maintains knowledge of various directories as well as a common
Depot client which is only constructed once. There are 2 primary
installation strategies still present: install from a package identifier
string (ex: `core/redis`) or install from a local package artifact (ex:
`*.hart`). Effort was made to ensure that both approaches use the same
code path as much as possible to minimize future behavioral bugs.

The updated installation high level strategies are summarized here:

From Package Identifier:

1. Is the given package identifer fully qualified? If not, query the
   upstream Depot for the latest package identifier that satisfies the
   given "fuzzy" package identifer. In this way a given `core/redis` may
   return a fully qualified `core/redis/3.0.7/20160614001713` identifier.
2. Is the specific release already installed for the determined fully
   qualified package identifier? If so, return early as there is no
   further work to do.
3. Is there a package artifact for the specific release in the artifact
   cache? If there is, we use this one. Otherwise, fetch the specific
   release from the upstream Depot and write the package artifact into
   the artifact cache.
4. Verify the package artifact to ensure that its fully qualified package
   identifier precisely matches the intended target package identifier.
   Next, verify that the package artifact is correctly signed and is
   self-consistent.
5. Determine the fully list of runtime dependencies by extracting the
   `TDEPS` metadata file from the local package artifact. For each
   dependency, perform steps 2 through 4 above so that all dependencies
   are either confirmed pre-installed or each cached package artifact
   has been fully verified.
6. Extract/unpack each package artifact dependency that was not found to
   be pre-installed, followed by the target package artifact last. The
   order is important to ensure that target package is not installed
   until all of its dependencies are correctly installed.

From Local Package Artifact:

1. Extract the fully qualified package identifier from the `IDENT`
   metadata file in the package artifact.
2. Is the specific release already installed for the determined fully
   qualified package identifier? If so, return early as there is no
   further work to do.
3. Copy the artifact into the artifact cache and rewrite the file name
   based on the computed artifact name. This means that we can install
   from a Habitat package with any arbitrary name like `file.1` and it
   would be rewritten to
   `core-redis-3.0.7-20160614001713-x86_64-linux.hart` in the artifact
   cache.
4. Verify the package artifact to ensure that its fully qualified package
   identifier precisely matches the intended target package identifier.
   Next, verify that the package artifact is correctly signed and is
   self-consistent.
5. Determine the fully list of runtime dependencies by extracting the
   `TDEPS` metadata file from the local package artifact. For each
   dependency, perform steps 2 through 4 in the "From Package
   Identifier" section so that all dependencies are either confirmed
   pre-installed or each cached package artifact has been fully
   verified. There is one difference however: for each dependency, first
   check to see if a package artifact exists in the same directory that
   contained the original target package artifact. There is a good
   chance that if the directory contains many other package artifacts
   that we can use these rather than re-fetching the exact same ones
   from an upstream Depot. Unlike the "From Package Identifier"
   scenario, we have an explicit directory we can check, so there is
   no harm in trying.
6. Extract/unpack each package artifact dependency that was not found to
   be pre-installed, followed by the target package artifact last. The
   order is important to ensure that target package is not installed
   until all of its dependencies are correctly installed.

New: Supervisor Start From Package Artifact
-------------------------------------------

In the course of the installation refactoring above, an easy new
improvement (or feature) presented itself: the ability to start
supervising a package that has not yet been installed, given a local
package artifact. In other words, this change now supports the
following:

    hab start ./results/core-redis-3.0.7-20160614001713-x86_64-linux.hart

or:

    hab sup start /tmp/redis.hart

assuming that `/tmp/redis.hart` is a legitimate Habitat package artifact
and was simply renamed.

As with the installation implementation rewrite above, if the package is
already installed, then there is no work to do, otherwise the "From
Local Package Artifact" strategy above is used.

The intended primary use case for this pattern in for Plan authors
working on their Plans in a Studio instance. This will allow them to
build a package artifact and them immediately start it without having to
install it beforehand. Secondarily, this pattern may be useful in
environments with slow or limited internet access and eventually fully
"air gapped" environments.

Fixed: at-once Update Strategy
------------------------------

Also in the course of the above refactoring 2 bugs were found in the
"at-once" update strategy in the Supervisor:

1. Once a new version of a package is found in an upstream Depot, it is
   installed but the new version is not loaded when `start_package()`
   is called (source:
https://github.com/habitat-sh/habitat/blob/9e25233fafe0433864ab47d0d8ede6344d30be8a/components/sup/src/command/start.rs#L120)
   with the consequence that the original release is re-loaded and
   new version is never used until the Supervisor completely
   restarts.
2. The former custom install logic in the "at-once" strategy only
   fetched and extracted the new target package meaning that if the
   new release contained updated dependency versions, these would
   not be installed and could result in the supervised process
   entering a crash loop. The fix for this was simply to invoke the
   same installation logic as used everywhere else in the system and is
   explained in much greater detail above.

Improved: Removed Optional Depot URL in Supervisor
--------------------------------------------------

While refactoring the Supervisor component, it became clear that earlier
versions of the codebase accepted an *optional* upstream Depot URL.
Currently, fall-back default URL values are used and so we can guarentee
that a value is always present for the Supervisor. The config value of
`url` was updated to simply be a `String` (formerly an `Option<String>`)
and all extra checking was removed where a value may not have been
present. Less branching logic is almost always a good thing!

Bonus: Remove Unused Imports
----------------------------

In the course of regression testing, several "unused import" warnings
were found and removed. A few were due to code refactoring, but others
were most likely due to upgrading the Rust compiler toolchain in the
last few weeks. Just keeping the shop floor swept.

Signed-off-by: Fletcher Nichol <fnichol@nichol.ca>
fnichol added a commit that referenced this pull request Jul 21, 2016
This changeset contains several logical changes, all of which are
coupled together as they were discovered in the process of refactoring
the installation logic. The logical changes are broken out into sections
below.

Pull-Through Local Caching Package Installation
-----------------------------------------------

The implementation for package installation was re-written in order to
aggressively use the artifact cache (by default `/hab/cache/artifacts`
on disk). The previous implementation was built up over time with
installation from local artifacts added at a later date. Consequently,
many installation inconsistencies were present:

* When performing an installation from a local package artifact, all
  dependencies were blindly installed from an upstream Depot--whether or
  not they were installed or even cached on disk.
* When installing from a package identifier string, the fully dependency
  list was queried from an upstream Depot even though the fetched
  package artifact contained this metadata. Furthermore, the artifact's
  own metadata should be trusted over any other source, assuming
  the package is confirmed "verified".
* The local package artifact installation read package dependencies from
  its metadata file correctly which uncovered a bug (fixed in #1013)
  which would have been found earlier had all installation strategies
  uses the same code path.
* When performing an installation from a local package artifact, the
  artifact file was never copied into the artifact cache for future
  re-use. Neither were any locally available dependency package
  artifacts.

This change introduces a new internal struct called an `InstallTask`
that maintains knowledge of various directories as well as a common
Depot client which is only constructed once. There are 2 primary
installation strategies still present: install from a package identifier
string (ex: `core/redis`) or install from a local package artifact (ex:
`*.hart`). Effort was made to ensure that both approaches use the same
code path as much as possible to minimize future behavioral bugs.

The updated installation high level strategies are summarized here:

From Package Identifier:

1. Is the given package identifier fully qualified? If not, query the
   upstream Depot for the latest package identifier that satisfies the
   given "fuzzy" package identifier. In this way a given `core/redis`
   may return a fully qualified `core/redis/3.0.7/20160614001713`
   identifier.
2. Is the specific release already installed for the determined fully
   qualified package identifier? If so, return early as there is no
   further work to do.
3. Is there a package artifact for the specific release in the artifact
   cache? If there is, we use this one. Otherwise, fetch the specific
   release from the upstream Depot and write the package artifact into
   the artifact cache.
4. Verify the package artifact to ensure that its fully qualified package
   identifier precisely matches the intended target package identifier.
   Next, verify that the package artifact is correctly signed and is
   self-consistent.
5. Determine the fully list of runtime dependencies by extracting the
   `TDEPS` metadata file from the local package artifact. For each
   dependency, perform steps 2 through 4 above so that all dependencies
   are either confirmed pre-installed or each cached package artifact
   has been fully verified.
6. Extract/unpack each package artifact dependency that was not found to
   be pre-installed, followed by the target package artifact last. The
   order is important to ensure that target package is not installed
   until all of its dependencies are correctly installed.

From Local Package Artifact:

1. Extract the fully qualified package identifier from the `IDENT`
   metadata file in the package artifact.
2. Is the specific release already installed for the determined fully
   qualified package identifier? If so, return early as there is no
   further work to do.
3. Copy the artifact into the artifact cache and rewrite the file name
   based on the computed artifact name. This means that we can install
   from a Habitat package with any arbitrary name like `file.1` and it
   would be rewritten to
   `core-redis-3.0.7-20160614001713-x86_64-linux.hart` in the artifact
   cache.
4. Verify the package artifact to ensure that its fully qualified package
   identifier precisely matches the intended target package identifier.
   Next, verify that the package artifact is correctly signed and is
   self-consistent.
5. Determine the fully list of runtime dependencies by extracting the
   `TDEPS` metadata file from the local package artifact. For each
   dependency, perform steps 2 through 4 in the "From Package
   Identifier" section so that all dependencies are either confirmed
   pre-installed or each cached package artifact has been fully
   verified. There is one difference however: for each dependency, first
   check to see if a package artifact exists in the same directory that
   contained the original target package artifact. There is a good
   chance that if the directory contains many other package artifacts
   that we can use these rather than re-fetching the exact same ones
   from an upstream Depot. Unlike the "From Package Identifier"
   scenario, we have an explicit directory we can check, so there is
   no harm in trying.
6. Extract/unpack each package artifact dependency that was not found to
   be pre-installed, followed by the target package artifact last. The
   order is important to ensure that target package is not installed
   until all of its dependencies are correctly installed.

New: Supervisor Start From Package Artifact
-------------------------------------------

In the course of the installation refactoring above, an easy new
improvement (or feature) presented itself: the ability to start
supervising a package that has not yet been installed, given a local
package artifact. In other words, this change now supports the
following:

    hab start ./results/core-redis-3.0.7-20160614001713-x86_64-linux.hart

or:

    hab sup start /tmp/redis.hart

assuming that `/tmp/redis.hart` is a legitimate Habitat package artifact
and was simply renamed.

As with the installation implementation rewrite above, if the package is
already installed, then there is no work to do, otherwise the "From
Local Package Artifact" strategy above is used.

The intended primary use case for this pattern in for Plan authors
working on their Plans in a Studio instance. This will allow them to
build a package artifact and them immediately start it without having to
install it beforehand. Secondarily, this pattern may be useful in
environments with slow or limited internet access and eventually fully
"air gapped" environments.

Fixed: at-once Update Strategy
------------------------------

Also in the course of the above refactoring 2 bugs were found in the
"at-once" update strategy in the Supervisor:

1. Once a new version of a package is found in an upstream Depot, it is
   installed but the new version is not loaded when `start_package()`
   is called (source:
https://github.com/habitat-sh/habitat/blob/9e25233fafe0433864ab47d0d8ede6344d30be8a/components/sup/src/command/start.rs#L120)
   with the consequence that the original release is re-loaded and
   new version is never used until the Supervisor completely
   restarts.
2. The former custom install logic in the "at-once" strategy only
   fetched and extracted the new target package meaning that if the
   new release contained updated dependency versions, these would
   not be installed and could result in the supervised process
   entering a crash loop. The fix for this was simply to invoke the
   same installation logic as used everywhere else in the system and is
   explained in much greater detail above.

Improved: Removed Optional Depot URL in Supervisor
--------------------------------------------------

While refactoring the Supervisor component, it became clear that earlier
versions of the codebase accepted an *optional* upstream Depot URL.
Currently, fall-back default URL values are used and so we can guarantee
that a value is always present for the Supervisor. The config value of
`url` was updated to simply be a `String` (formerly an `Option<String>`)
and all extra checking was removed where a value may not have been
present. Less branching logic is almost always a good thing!

Bonus: Remove Unused Imports
----------------------------

In the course of regression testing, several "unused import" warnings
were found and removed. A few were due to code refactoring, but others
were most likely due to upgrading the Rust compiler toolchain in the
last few weeks. Just keeping the shop floor swept.

Signed-off-by: Fletcher Nichol <fnichol@nichol.ca>
fnichol added a commit that referenced this pull request Aug 4, 2016
This changeset contains several logical changes, all of which are
coupled together as they were discovered in the process of refactoring
the installation logic. The logical changes are broken out into sections
below.

Pull-Through Local Caching Package Installation
-----------------------------------------------

The implementation for package installation was re-written in order to
aggressively use the artifact cache (by default `/hab/cache/artifacts`
on disk). The previous implementation was built up over time with
installation from local artifacts added at a later date. Consequently,
many installation inconsistencies were present:

* When performing an installation from a local package artifact, all
  dependencies were blindly installed from an upstream Depot--whether or
  not they were installed or even cached on disk.
* When installing from a package identifier string, the fully dependency
  list was queried from an upstream Depot even though the fetched
  package artifact contained this metadata. Furthermore, the artifact's
  own metadata should be trusted over any other source, assuming
  the package is confirmed "verified".
* The local package artifact installation read package dependencies from
  its metadata file correctly which uncovered a bug (fixed in #1013)
  which would have been found earlier had all installation strategies
  uses the same code path.
* When performing an installation from a local package artifact, the
  artifact file was never copied into the artifact cache for future
  re-use. Neither were any locally available dependency package
  artifacts.

This change introduces a new internal struct called an `InstallTask`
that maintains knowledge of various directories as well as a common
Depot client which is only constructed once. There are 2 primary
installation strategies still present: install from a package identifier
string (ex: `core/redis`) or install from a local package artifact (ex:
`*.hart`). Effort was made to ensure that both approaches use the same
code path as much as possible to minimize future behavioral bugs.

The updated installation high level strategies are summarized here:

From Package Identifier:

1. Is the given package identifier fully qualified? If not, query the
   upstream Depot for the latest package identifier that satisfies the
   given "fuzzy" package identifier. In this way a given `core/redis`
   may return a fully qualified `core/redis/3.0.7/20160614001713`
   identifier.
2. Is the specific release already installed for the determined fully
   qualified package identifier? If so, return early as there is no
   further work to do.
3. Is there a package artifact for the specific release in the artifact
   cache? If there is, we use this one. Otherwise, fetch the specific
   release from the upstream Depot and write the package artifact into
   the artifact cache.
4. Verify the package artifact to ensure that its fully qualified package
   identifier precisely matches the intended target package identifier.
   Next, verify that the package artifact is correctly signed and is
   self-consistent.
5. Determine the fully list of runtime dependencies by extracting the
   `TDEPS` metadata file from the local package artifact. For each
   dependency, perform steps 2 through 4 above so that all dependencies
   are either confirmed pre-installed or each cached package artifact
   has been fully verified.
6. Extract/unpack each package artifact dependency that was not found to
   be pre-installed, followed by the target package artifact last. The
   order is important to ensure that target package is not installed
   until all of its dependencies are correctly installed.

From Local Package Artifact:

1. Extract the fully qualified package identifier from the `IDENT`
   metadata file in the package artifact.
2. Is the specific release already installed for the determined fully
   qualified package identifier? If so, return early as there is no
   further work to do.
3. Copy the artifact into the artifact cache and rewrite the file name
   based on the computed artifact name. This means that we can install
   from a Habitat package with any arbitrary name like `file.1` and it
   would be rewritten to
   `core-redis-3.0.7-20160614001713-x86_64-linux.hart` in the artifact
   cache.
4. Verify the package artifact to ensure that its fully qualified package
   identifier precisely matches the intended target package identifier.
   Next, verify that the package artifact is correctly signed and is
   self-consistent.
5. Determine the fully list of runtime dependencies by extracting the
   `TDEPS` metadata file from the local package artifact. For each
   dependency, perform steps 2 through 4 in the "From Package
   Identifier" section so that all dependencies are either confirmed
   pre-installed or each cached package artifact has been fully
   verified. There is one difference however: for each dependency, first
   check to see if a package artifact exists in the same directory that
   contained the original target package artifact. There is a good
   chance that if the directory contains many other package artifacts
   that we can use these rather than re-fetching the exact same ones
   from an upstream Depot. Unlike the "From Package Identifier"
   scenario, we have an explicit directory we can check, so there is
   no harm in trying.
6. Extract/unpack each package artifact dependency that was not found to
   be pre-installed, followed by the target package artifact last. The
   order is important to ensure that target package is not installed
   until all of its dependencies are correctly installed.

New: Supervisor Start From Package Artifact
-------------------------------------------

In the course of the installation refactoring above, an easy new
improvement (or feature) presented itself: the ability to start
supervising a package that has not yet been installed, given a local
package artifact. In other words, this change now supports the
following:

    hab start ./results/core-redis-3.0.7-20160614001713-x86_64-linux.hart

or:

    hab sup start /tmp/redis.hart

assuming that `/tmp/redis.hart` is a legitimate Habitat package artifact
and was simply renamed.

As with the installation implementation rewrite above, if the package is
already installed, then there is no work to do, otherwise the "From
Local Package Artifact" strategy above is used.

The intended primary use case for this pattern in for Plan authors
working on their Plans in a Studio instance. This will allow them to
build a package artifact and them immediately start it without having to
install it beforehand. Secondarily, this pattern may be useful in
environments with slow or limited internet access and eventually fully
"air gapped" environments.

Fixed: at-once Update Strategy
------------------------------

Also in the course of the above refactoring 2 bugs were found in the
"at-once" update strategy in the Supervisor:

1. Once a new version of a package is found in an upstream Depot, it is
   installed but the new version is not loaded when `start_package()`
   is called (source:
https://github.com/habitat-sh/habitat/blob/9e25233fafe0433864ab47d0d8ede6344d30be8a/components/sup/src/command/start.rs#L120)
   with the consequence that the original release is re-loaded and
   new version is never used until the Supervisor completely
   restarts.
2. The former custom install logic in the "at-once" strategy only
   fetched and extracted the new target package meaning that if the
   new release contained updated dependency versions, these would
   not be installed and could result in the supervised process
   entering a crash loop. The fix for this was simply to invoke the
   same installation logic as used everywhere else in the system and is
   explained in much greater detail above.

Improved: Removed Optional Depot URL in Supervisor
--------------------------------------------------

While refactoring the Supervisor component, it became clear that earlier
versions of the codebase accepted an *optional* upstream Depot URL.
Currently, fall-back default URL values are used and so we can guarantee
that a value is always present for the Supervisor. The config value of
`url` was updated to simply be a `String` (formerly an `Option<String>`)
and all extra checking was removed where a value may not have been
present. Less branching logic is almost always a good thing!

Bonus: Remove Unused Imports
----------------------------

In the course of regression testing, several "unused import" warnings
were found and removed. A few were due to code refactoring, but others
were most likely due to upgrading the Rust compiler toolchain in the
last few weeks. Just keeping the shop floor swept.

Signed-off-by: Fletcher Nichol <fnichol@nichol.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot install a .hart file into another studio
6 participants