Unzip Task: Enable filtering #5169

Closed
IvanLieckens opened this issue Mar 10, 2020 · 3 comments · Fixed by #6018
Labels
help wanted: Issues that the core team doesn't plan to work on, but would accept a PR for. Comment to claim.
needs-design: Requires discussion with the dev team before attempting a fix.
triaged

Comments

@IvanLieckens
Contributor

Desired functionality

Project file

<Project>
  <Target Name="Build">
    <Unzip SourceFiles="MyZipFile.zip" DestinationFolder="path\to\unzip\to" IncludeEntries="regex/for/path/inside/zip/.*$" ExcludeEntries="regex/for/path/inside/zip/exclusion/.*$"/>
  </Target>
</Project>

Directory contents:

/
- MyZipFile.zip

MyZipFile.zip contents:

/
- root.txt
/regex/for/path/inside/zip
- included.txt
/regex/for/path/inside/zip/exclusion
- excluded.txt

Expected behavior

MyZipFile.zip is unzipped to the destination folder, extracting only the entries that match the inclusion pattern (if one is given) and do not match the exclusion pattern.

In the example, root.txt is not unzipped because it is not included, and excluded.txt is not unzipped because it is excluded.

Resulting Directory contents:

/
- MyZipFile.zip
/path/to/unzip/to/regex/for/path/inside/zip
- included.txt
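
The expected behavior above can be sketched as a small predicate over entry names. This is only an illustration of the requested semantics: `EntryFilterSketch` and `Select` are hypothetical names, not part of MSBuild, and the sketch assumes both patterns are regular expressions matched against the entry's path inside the archive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntryFilterSketch
{
    // Hypothetical helper (not an MSBuild API): keeps the entry names that
    // match `include` when it is given and do not match `exclude` when it is given.
    public static List<string> Select(IEnumerable<string> entryNames, string include, string exclude)
    {
        return entryNames
            // An empty include pattern means "include everything".
            .Where(name => string.IsNullOrWhiteSpace(include) || Regex.IsMatch(name, include))
            // An empty exclude pattern means "exclude nothing".
            .Where(name => string.IsNullOrWhiteSpace(exclude) || !Regex.IsMatch(name, exclude))
            .ToList();
    }
}
```

Called with the three entries from the example archive and the two patterns from the project file, `Select` would keep only `regex/for/path/inside/zip/included.txt`.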

Actual behavior

The Unzip task currently offers no way to filter which entries are extracted.

Environment data

msbuild /version output:
16.4.0.56107
OS info:
Windows 10 Enterprise
If applicable, version of the tool that invokes MSBuild (Visual Studio, dotnet CLI, etc):
/

@rainersigwald added the needs-design and help wanted labels Mar 16, 2020
@rainersigwald
Member

Team triage: this is an interesting idea. We would potentially accept a PR that did this, but we'd like to first see a rough design about the filter mechanism, including whether it's easy to implement with the zip APIs we use, or if there's an easier one to implement.
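
As a rough illustration of the question about the zip APIs: `System.IO.Compression.ZipArchive` exposes each entry individually through `Entries`, so a per-entry predicate can be applied before an entry is opened. The sketch below is not the MSBuild implementation; `ZipEntryEnumerationSketch`, `BuildZip`, and `ReadMatching` are hypothetical names, and it works on an in-memory archive to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class ZipEntryEnumerationSketch
{
    // Builds a small zip archive in memory, for illustration only.
    public static byte[] BuildZip(params string[] entryNames)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the archive is disposed.
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (string name in entryNames)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(name);
                    using (var writer = new StreamWriter(entry.Open()))
                    {
                        writer.Write("content of " + name);
                    }
                }
            }

            return buffer.ToArray();
        }
    }

    // Reads only the entries whose FullName satisfies the predicate.
    // Entries that fail the predicate are never opened.
    public static Dictionary<string, string> ReadMatching(byte[] zipBytes, Func<string, bool> shouldExtract)
    {
        var result = new Dictionary<string, string>();
        using (var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (!shouldExtract(entry.FullName))
                {
                    continue;
                }

                using (var reader = new StreamReader(entry.Open()))
                {
                    result[entry.FullName] = reader.ReadToEnd();
                }
            }
        }

        return result;
    }
}
```

The predicate could wrap a regular expression or a glob matcher; which of the two is the better filter mechanism is the design question raised here, and the sketch does not settle it.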

@IvanLieckens
Contributor Author

IvanLieckens commented Mar 23, 2020

@rainersigwald I'm sorry this isn't a PR. For now, to get this working quickly in my own build, I wrote the following custom task. It makes only a few small modifications to the existing Unzip task, plus some changes that were needed because I couldn't access the internal classes it uses in some places:

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Resources;
using System.Text.RegularExpressions;
using System.Threading;

using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

using Tasks.Properties;

namespace Tasks
{
    public class FilteredUnzip : Task, ICancelableTask
    {
        // We pick a value that is the largest multiple of 4096 that is still smaller than the large object heap threshold (85K).
        // The CopyTo/CopyToAsync buffer is short-lived and is likely to be collected at Gen0, and it offers a significant
        // improvement in Copy performance.
        private const int _DefaultCopyBufferSize = 81920;

        /// <summary>
        /// Stores a <see cref="CancellationTokenSource"/> used for cancellation.
        /// </summary>
        private readonly CancellationTokenSource _cancellationToken = new CancellationTokenSource();

        public FilteredUnzip()
        {
            Log.TaskResources = Resources.ResourceManager;
        }

        /// <summary>
        /// Gets or sets a <see cref="ITaskItem"/> with a destination folder path to unzip the files to.
        /// </summary>
        [Required]
        public ITaskItem DestinationFolder { get; set; }

        /// <summary>
        /// Gets or sets a value indicating whether read-only files should be overwritten.
        /// </summary>
        public bool OverwriteReadOnlyFiles { get; set; }

        /// <summary>
        /// Gets or sets a value indicating whether files should be skipped if the destination is unchanged.
        /// </summary>
        public bool SkipUnchangedFiles { get; set; } = true;

        /// <summary>
        /// Gets or sets an array of <see cref="ITaskItem"/> objects containing the paths to .zip archive files to unzip.
        /// </summary>
        [Required]
        public ITaskItem[] SourceFiles { get; set; }

        /// <summary>
        /// Gets or sets a regular expression that will be used to include files to be unzipped.
        /// </summary>
        public string Include { get; set; }

        /// <summary>
        /// Gets or sets a regular expression that will be used to exclude files to be unzipped.
        /// </summary>
        public string Exclude { get; set; }

        /// <inheritdoc cref="ICancelableTask.Cancel"/>
        public void Cancel()
        {
            _cancellationToken.Cancel();
        }

        /// <inheritdoc cref="Task.Execute"/>
        public override bool Execute()
        {
            DirectoryInfo destinationDirectory;
            try
            {
                destinationDirectory = Directory.CreateDirectory(DestinationFolder.ItemSpec);
            }
            catch (Exception e)
            {
                Log.LogErrorWithCodeFromResources("Unzip.ErrorCouldNotCreateDestinationDirectory", DestinationFolder.ItemSpec, e.Message);

                return false;
            }

            BuildEngine3.Yield();

            try
            {
                foreach (ITaskItem sourceFile in SourceFiles.TakeWhile(i => !_cancellationToken.IsCancellationRequested))
                {
                    if (!File.Exists(sourceFile.ItemSpec))
                    {
                        Log.LogErrorWithCodeFromResources("Unzip.ErrorFileDoesNotExist", sourceFile.ItemSpec);
                        continue;
                    }

                    try
                    {
                        using (FileStream stream = new FileStream(sourceFile.ItemSpec, FileMode.Open, FileAccess.Read, FileShare.Read, 0x1000, false))
                        {
                            using (ZipArchive zipArchive = new ZipArchive(stream, ZipArchiveMode.Read, false))
                            {
                                try
                                {
                                    Extract(zipArchive, destinationDirectory);
                                }
                                catch (Exception e)
                                {
                                    // Unhandled exception in Extract() is a bug!
                                    Log.LogErrorFromException(e, true);
                                    return false;
                                }
                            }
                        }
                    }
                    catch (OperationCanceledException)
                    {
                        break;
                    }
                    catch (Exception e)
                    {
                        // Should only be thrown if the archive could not be opened (Access denied, corrupt file, etc)
                        Log.LogErrorWithCodeFromResources("Unzip.ErrorCouldNotOpenFile", sourceFile.ItemSpec, e.Message);
                    }
                }
            }
            finally
            {
                BuildEngine3.Reacquire();
            }

            return !_cancellationToken.IsCancellationRequested && !Log.HasLoggedErrors;
        }

        /// <summary>
        /// Extracts the files that are not skipped by <see cref="ShouldSkipEntry"/> to the specified directory.
        /// </summary>
        /// <param name="sourceArchive">The <see cref="ZipArchive"/> containing the files to extract.</param>
        /// <param name="destinationDirectory">The <see cref="DirectoryInfo"/> to extract files to.</param>
        private void Extract(ZipArchive sourceArchive, DirectoryInfo destinationDirectory)
        {
            foreach (ZipArchiveEntry zipArchiveEntry in sourceArchive.Entries.TakeWhile(i => !_cancellationToken.IsCancellationRequested))
            {
                FileInfo destinationPath = new FileInfo(Path.Combine(destinationDirectory.FullName, zipArchiveEntry.FullName));

                // Zip archives can have directory entries listed explicitly.
                // If this entry is a directory we should create it and move to the next entry.
                if (Path.GetFileName(destinationPath.FullName).Length == 0)
                {
                    // The entry is a directory
                    Directory.CreateDirectory(destinationPath.FullName);
                    continue;
                }

                if (!destinationPath.FullName.StartsWith(destinationDirectory.FullName, StringComparison.OrdinalIgnoreCase))
                {
                    // ExtractToDirectory() throws an IOException for this but since we're extracting one file at a time
                    // for logging and cancellation, we need to check for it ourselves.
                    Log.LogErrorFromResources("Unzip.ErrorExtractingResultsInFilesOutsideDestination", destinationPath.FullName, destinationDirectory.FullName);
                    continue;
                }

                if (ShouldSkipEntry(zipArchiveEntry, destinationPath))
                {
                    Log.LogMessageFromResources(MessageImportance.Low, "Unzip.DidNotUnzipBecauseOfFileMatch", zipArchiveEntry.FullName, destinationPath.FullName, nameof(SkipUnchangedFiles), "true");
                    continue;
                }

                try
                {
                    destinationPath.Directory?.Create();
                }
                catch (Exception e)
                {
                    Log.LogErrorWithCodeFromResources("Unzip.ErrorCouldNotCreateDestinationDirectory", destinationPath.DirectoryName, e.Message);
                    continue;
                }

                if (OverwriteReadOnlyFiles && destinationPath.Exists && destinationPath.IsReadOnly)
                {
                    try
                    {
                        destinationPath.IsReadOnly = false;
                    }
                    catch (Exception e)
                    {
                        Log.LogErrorWithCodeFromResources("Unzip.ErrorCouldNotMakeFileWriteable", zipArchiveEntry.FullName, destinationPath.FullName, e.Message);
                        continue;
                    }
                }

                try
                {
                    Log.LogMessageFromResources(MessageImportance.Normal, "Unzip.FileComment", zipArchiveEntry.FullName, destinationPath.FullName);

                    using (Stream destination = File.Open(destinationPath.FullName, FileMode.Create, FileAccess.Write, FileShare.None))
                    using (Stream stream = zipArchiveEntry.Open())
                    {
                        stream.CopyToAsync(destination, _DefaultCopyBufferSize, _cancellationToken.Token)
                            .ConfigureAwait(false)
                            .GetAwaiter()
                            .GetResult();
                    }

                    destinationPath.LastWriteTimeUtc = zipArchiveEntry.LastWriteTime.UtcDateTime;
                }
                catch (IOException e)
                {
                    Log.LogErrorWithCodeFromResources("Unzip.ErrorCouldNotExtractFile", zipArchiveEntry.FullName, destinationPath.FullName, e.Message);
                }
            }
        }

        /// <summary>
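        /// Determines whether a file should be skipped when unzipping, either because the destination is unchanged
        /// or because the entry is filtered out by <see cref="Include"/> or <see cref="Exclude"/>.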
        /// </summary>
        /// <param name="zipArchiveEntry">The <see cref="ZipArchiveEntry"/> object containing information about the file in the zip archive.</param>
        /// <param name="fileInfo">A <see cref="FileInfo"/> object containing information about the destination file.</param>
        /// <returns><code>true</code> if the file should be skipped, otherwise <code>false</code>.</returns>
        private bool ShouldSkipEntry(ZipArchiveEntry zipArchiveEntry, FileInfo fileInfo)
        {
            bool result = SkipUnchangedFiles && fileInfo.Exists
                                             && zipArchiveEntry.LastWriteTime == fileInfo.LastWriteTimeUtc
                                             && zipArchiveEntry.Length == fileInfo.Length;

            if (!string.IsNullOrWhiteSpace(Include))
            {
                result |= !Regex.IsMatch(zipArchiveEntry.FullName, Include);
            }

            if (!string.IsNullOrWhiteSpace(Exclude))
            {
                result |= Regex.IsMatch(zipArchiveEntry.FullName, Exclude);
            }

            return result;
        }
    }
}

@IvanLieckens
Contributor Author

There are some differences between the PR code and the code I originally posted here. While using the custom task I found some flaws in the code above, and these have been resolved in the PR. First, the filter validation now runs before everything else, so the task no longer fails with "PathTooLong" when the archive entry that would cause it is excluded. Second, the filter logs its own message, which makes the logs clearer about why a certain file wasn't unzipped. Any and all feedback is very welcome.
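
The two changes described in this comment can be sketched roughly as follows. This is not the PR code: `FilterFirstSketch`, `SkipReason`, and `Classify` are hypothetical names, and the sketch assumes regular-expression patterns as in the custom task posted earlier. The idea is to classify an entry by its name alone, before any destination path is built, and to return a distinct reason that the caller can log.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical reasons a caller could turn into separate log messages.
public enum SkipReason
{
    None,
    NotIncluded,
    Excluded
}

public static class FilterFirstSketch
{
    // Looks only at the entry name, so it can run before any FileInfo or
    // destination path is constructed for the entry.
    public static SkipReason Classify(string entryFullName, string include, string exclude)
    {
        if (!string.IsNullOrWhiteSpace(include) && !Regex.IsMatch(entryFullName, include))
        {
            return SkipReason.NotIncluded;
        }

        if (!string.IsNullOrWhiteSpace(exclude) && Regex.IsMatch(entryFullName, exclude))
        {
            return SkipReason.Excluded;
        }

        return SkipReason.None;
    }
}
```

In an extraction loop, a caller would invoke `Classify` as the first step for each entry and `continue` on anything other than `SkipReason.None`, so a filtered-out entry never reaches the path handling that could fail.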

Forgind pushed a commit that referenced this issue Feb 6, 2021
Added Include/Exclude filtering capability to Unzip Task via globs
@AR-May added the triaged label Feb 21, 2024