Multiple directories and path glob pattern support for pipeline caching #2834

Merged
merged 33 commits into from Oct 19, 2020

ethanis (Member) commented Mar 3, 2020

This extends the pipeline caching tasks so that multiple paths can be cached with a single task. The following are supported scenarios:

  • Single path segment with absolute path outside of Pipeline.Workspace.
  • Single path segment with path relative to System.DefaultWorkingDirectory.
  • Multiple path segments, all paths are within Pipeline.Workspace.
  • Glob patterns with all matches within Pipeline.Workspace.

The following are unsupported scenarios:

  • Glob patterns that match anything outside of Pipeline.Workspace.
  • Multiple path segments with absolute path outside of Pipeline.Workspace.

Multiple paths must all be rooted in Pipeline.Workspace because the tarball needs to be created somewhere, and Pipeline.Workspace seemed like the most build-agent-agnostic location to anchor on.
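
As a rough illustration of the rules listed above, here is a minimal Python sketch. It is not the agent's C# implementation: the names `is_within` and `validate_segments`, the splitting on `|`, and the example directories are assumptions made for this sketch, based only on the scenarios and examples in this description. Glob expansion and `..` normalization are left out.

```python
from pathlib import PurePosixPath
from typing import List


def is_within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """Return True when `path` equals `root` or lies somewhere below it."""
    return path == root or root in path.parents


def validate_segments(path_input: str, workspace: str,
                      default_working_dir: str) -> List[PurePosixPath]:
    """Resolve the `|`-separated segments and enforce the workspace rule.

    Hypothetical helper: a single segment may point anywhere, while
    multiple segments must all resolve inside the workspace.
    """
    root = PurePosixPath(workspace)
    base = PurePosixPath(default_working_dir)
    segments = [s.strip() for s in path_input.split("|") if s.strip()]
    if not segments:
        raise ValueError("no path segments given")
    # Joining with an absolute segment discards `base`, so absolute
    # paths are kept as-is and relative ones are rooted at `base`.
    resolved = [base / s for s in segments]
    if len(resolved) > 1:
        outside = [str(p) for p in resolved if not is_within(p, root)]
        if outside:
            raise ValueError(
                "multiple segments must all be inside the workspace; "
                "outside: " + ", ".join(outside)
            )
    return resolved
```

With a workspace of `/agent/work/1` and a working directory of `/agent/work/1/s` (both made up for illustration), `'out | dist'` and a lone `/opt/tools/node` pass, while `'/opt/tools/node | out'` raises a `ValueError`.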

There will need to be a PR to the azure-pipelines-tasks repository to update the docs for the task inputs when/if these changes make it in.

Finally, here are some examples:

# cache 'out' and 'dist' directories (iff 'out' and 'dist' exist at root of repository)
- task: Cache@2
  displayName: Cache
  inputs:
    key: 'foo.key'
    path: 'out | dist'
    cacheHitVar: 'OUTPUT_RESTORED'
# cache all node_modules
- task: Cache@2
  displayName: Cache
  inputs:
    key: '$(Agent.OS) | npm | package-lock.json'
    path: '**/node_modules,!**/node_modules/**/node_modules'
    cacheHitVar: 'PACKAGES_RESTORED'
# cache binaries directory and 'out' directory (iff 'out' exists at root of repository)
- task: Cache@2
  displayName: Cache
  inputs:
    key: 'foo.key'
    path: '$(Pipeline.Workspace)/b | out'
    cacheHitVar: 'OUTPUT_RESTORED'
# cache node from tools directory
- task: Cache@2
  displayName: Cache
  inputs:
    key: 'foo.key'
    path: '$(Agent.ToolsDirectory)/node'
    cacheHitVar: 'NODE_RESTORED'

The following will fail because one path segment is outside Pipeline.Workspace and one is within:

# cache node from tools directory
- task: Cache@2
  displayName: Cache
  inputs:
    key: 'foo.key'
    path: '$(Agent.ToolsDirectory)/node | out'

details = matchedDirectories.Values.ToArray();
resolvedSegments.AddRange(matchedDirectories.Values);

// TODO: Is it the right behavior to throw an exception if a path segment isn't resolveable?

Member Author commented:

I'd love the team's input on what this behavior should be. For any given path segment, if there are no matches, should the tasks fail or simply not include it?

Collaborator commented:

hmm, We should fail imo
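
To make the "fail" option concrete, here is a minimal Python sketch of a segment resolver that raises when nothing matches. It is only an illustration: `resolve_segment`, the comma-separated include/`!`exclude handling, and the candidate directory list are assumptions drawn from the examples in the PR description, not the agent's code. It uses Python's `fnmatch`, whose `*` also matches `/`, so it only approximates real glob semantics.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def resolve_segment(segment: str, candidate_dirs: Iterable[str]) -> List[str]:
    """Return the directories matched by one path segment, or raise.

    A segment is a comma-separated list of patterns; patterns that
    start with `!` are treated as exclusions (hypothetical format).
    """
    patterns = [p.strip() for p in segment.split(",") if p.strip()]
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]

    matches = [
        d for d in candidate_dirs
        if any(fnmatchcase(d, p) for p in includes)
        and not any(fnmatchcase(d, p) for p in excludes)
    ]
    if not matches:
        # The "fail" behavior: an unresolvable segment is an error
        # rather than being silently dropped from the cache.
        raise FileNotFoundError(
            f"path segment {segment!r} matched no directories"
        )
    return matches
```

Failing loudly surfaces typos in the `path` input at the first run; silently skipping would instead produce a cache that is missing a directory the pipeline author expected.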

}
return uploadPath;
// TODO: what is the right way to handle !ContentFormat.SingleTar

Member Author commented:

I'm not quite clear on when the ContentFormat will be something other than SingleTar so I could use some guidance on how to handle that case here.

Collaborator commented:

ContentFormat would be 'Files' for older Cache entries.

Member Author commented:

Is a 'Files' input a single item? How do I test backwards compatibility with ContentFormat.Files entries?

Collaborator commented:

Initially we didn't have tarring, so cache entries created before tarring have the input as 'Files'. Your approach LGTM.

ethanis (Member Author) commented Mar 25, 2020

@fadnavistanmay should the path input be included in the cache version now? If someone adds a new item to cache but doesn't change the key, the cache entry won't be updated to include the new path. Including the path would also allow for easy salting of cache entries, making backwards compatibility easier.

For reference, GitHub Actions includes the path in a cache's version via this PR
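
A minimal Python sketch of the idea, assuming a hypothetical `cache_version` helper: the version is a hash over the key segments plus the raw `path` input, so changing the paths changes the version even when the key is untouched. This is not how Azure Pipelines or GitHub Actions actually compute fingerprints; the hashing scheme and separators here are made up for illustration.

```python
import hashlib
from typing import Optional, Sequence


def cache_version(key_segments: Sequence[str],
                  path_input: Optional[str] = None) -> str:
    """Derive a cache version from the key and, optionally, the path input."""
    digest = hashlib.sha256()
    for segment in key_segments:
        digest.update(segment.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    if path_input is not None:
        # Folding the path in means adding a new cached directory
        # produces a new version and therefore a cache miss.
        digest.update(b"path\0")
        digest.update(path_input.encode("utf-8"))
    return digest.hexdigest()
```

Under this scheme, `cache_version(key, 'out')` and `cache_version(key, 'out | dist')` differ, so a pipeline that adds `dist` without touching the key would repopulate the cache instead of restoring an entry that lacks the new directory.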

fadnavistanmay (Collaborator) commented:

This PR addresses microsoft/azure-pipelines-tasks#11219

}
else
{
DownloadDedupManifestArtifactOptions options = DownloadDedupManifestArtifactOptions.CreateWithManifestId(
manifestId,
targetDirectory,
pathSegments[0], // TODO: is this the right format here

Collaborator commented:

Looks right.

fadnavistanmay (Collaborator) commented:

Thanks for the contribution, Ethan. LGTM. @b-barthel, would you mind being the second pair of eyes on this? Thanks.

ethanis (Member Author) commented Apr 6, 2020

@fadnavistanmay / @b-barthel what are the next steps for this PR? My main concern is still backwards compatibility with existing caches.

GoMino commented May 2, 2020

Edit: just realized that this is not released yet 😒
Do you have any idea when this feature will be available?

===============

@ethanis the following doesn't seem to work on a Mac agent:

# cache all node_modules
- task: Cache@2
  displayName: Cache
  inputs:
    key: '$(Agent.OS) | npm | package-lock.json'
    path: '**/node_modules,!**/node_modules/**/node_modules'
    cacheHitVar: 'PACKAGES_RESTORED'

I am trying the following task in my monorepo (using Yarn workspaces):

pool:
  vmImage: "macOS 10.14"

variables:
  - name: YARN_CACHED_PATH
    value: '**/node_modules,!**/node_modules/**/node_modules'

steps:
  - task: Cache@2
    displayName: Cache Yarn packages
    inputs:
      key: 'yarn | "$(Agent.OS)" | yarn.lock'
      restoreKeys: |
        yarn | "$(Agent.OS)"
        yarn
      path: $(YARN_CACHED_PATH)

This results in the following error:

Starting: Cache Yarn packages
==============================================================================
Task         : Cache
Description  : Cache files between runs
Version      : 2.0.0
Author       : Microsoft Corporation
Help         : https://aka.ms/pipeline-caching-docs
==============================================================================
Resolving key:
 - yarn      [string]
 - "Darwin"  [string]
 - yarn.lock [file] --> 8F7DDEB929E5F8FE04BAC62DC69D57AC641BCDCB2E6518905F114FC86D237861
Resolved to: yarn|"Darwin"|qocY/6XZs7snurcY99+jFkYGbc+KutavQiN7/4Owleo=
Information, ApplicationInsightsTelemetrySender will correlate events with X-TFS-Session a0973c5a-71c8-4c16-9ca0-a7bdfeb1a2ac
Information, Getting a pipeline cache artifact with one of the following fingerprints:
Information, Fingerprint: `yarn|"Darwin"|qocY/6XZs7snurcY99+jFkYGbc+KutavQiN7/4Owleo=`
Information, There is a cache miss.
tar: could not chdir to '/Users/runner/runners/2.166.4/work/1/s/**/node_modules,!**/node_modules/**/node_modules'

Information, ApplicationInsightsTelemetrySender correlated 1 events with X-TFS-Session a0973c5a-71c8-4c16-9ca0-a7bdfeb1a2ac
##[error]Process returned non-zero exit code: 1
Finishing: Cache Yarn packages

I have also tried setting the variable AZP_CACHING_CONTENT_FORMAT=Files, like this:

variables:
  - name: YARN_CACHED_PATH
    value: '**/node_modules,!**/node_modules/**/node_modules'
  - name: AZP_CACHING_CONTENT_FORMAT
    value: Files

but in this case it results in the following error:

Starting: Cache Yarn packages
==============================================================================
Task         : Cache
Description  : Cache files between runs
Version      : 2.0.0
Author       : Microsoft Corporation
Help         : https://aka.ms/pipeline-caching-docs
==============================================================================
Resolving key:
 - yarn      [string]
 - "Darwin"  [string]
 - yarn.lock [file] --> 8F7DDEB929E5F8FE04BAC62DC69D57AC641BCDCB2E6518905F114FC86D237861
Resolved to: yarn|"Darwin"|qocY/6XZs7snurcY99+jFkYGbc+KutavQiN7/4Owleo=
Information, ApplicationInsightsTelemetrySender will correlate events with X-TFS-Session 66febed0-4ed9-47b2-942d-b7a8ffe02d37
Information, Getting a pipeline cache artifact with one of the following fingerprints:
Information, Fingerprint: `yarn|"Darwin"|qocY/6XZs7snurcY99+jFkYGbc+KutavQiN7/4Owleo=`
Information, There is a cache miss.
Information, DedupManifestArtifactClient will correlate http requests with X-TFS-Session 66febed0-4ed9-47b2-942d-b7a8ffe02d37
Information, ApplicationInsightsTelemetrySender correlated 3 events with X-TFS-Session 66febed0-4ed9-47b2-942d-b7a8ffe02d37
##[error]The path provided is invalid.
Finishing: Cache Yarn packages

Do you have any idea how to make it work?

ethanis (Member Author) commented May 2, 2020

@fadnavistanmay: do you have any rough estimates on timeline for getting this PR completed for @GoMino?

covertbert commented:

Any further updates on this PR?

aminya commented Aug 22, 2020

This seems to be an interesting PR. Does this have any performance benefits compared to using separate tasks?

DeeDeeG commented Aug 23, 2020

Hi folks.

I strongly support adding this feature. This is definitely a huge feature missing compared to the unofficial/V1 cache task.

Without a feature like the one this PR adds, I prefer the old cache task. Having to cache three different node_modules directories separately, based on all three of their package.json and package-lock.json files, either requires a huge key value repeated N times (in our case, three) or, if not done carefully, can lead to mismatched node_modules folders in different areas of the project, restored from different states of the repo.

Our key value could be much more compact with globbing. And restoring all three node_modules dirs based on the same key with no code repetition would be the icing on the cake. I stress that all of this was available in the unofficial/V1 cache task and, as far as I could tell, was its primary use pattern.

Would be great to have that again here on the official/V2 task.

Thanks to the author for posting, and to the maintainers for considering this PR.

fadnavistanmay (Collaborator) commented:

@ethanis / @mjroghelia - Let's get this PR in.

DeeDeeG commented Oct 12, 2020

Hi again, folks. Any update on this? Thanks.

mjroghelia (Contributor) commented:

@fadnavistanmay sorry, I missed your earlier approval of this.

@ethanis If you rebase this on master and remove the TODO comments with questions that were addressed in @fadnavistanmay's earlier review I will merge it.

ethanis (Member Author) commented Oct 16, 2020

@mjroghelia @fadnavistanmay so excited to get this shipped!

ethanis (Member Author) commented Oct 16, 2020

Not sure what's going on with this failing Functional Test. It's coming from a project not touched by this PR.

8 participants