Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR (Performance): Refactor and replace TempPath with a native cache path #49

Closed
anonhostpi opened this issue Jan 18, 2024 · 15 comments · Fixed by #52
Closed

FR (Performance): Refactor and replace TempPath with a native cache path #49

anonhostpi opened this issue Jan 18, 2024 · 15 comments · Fixed by #52
Assignees
Labels
enhancement New feature or request

Comments

@anonhostpi
Copy link
Contributor

anonhostpi commented Jan 18, 2024

As discussed in #46, deprecating the TempPath may be warranted.

There are 4 approaches considered to this problem:

  1. Don't deprecate or change it
  2. Support 3 caching paths (1 for CachePath, 1 for the Legacy TempPath, 1 for the new native cache path)
  3. Replace the old TempPath, but only comment out its code so that it can be maintained.
  4. Another option is to move the legacy code into this issue instead of commenting it out.
    • This would allow us to have a copy of the legacy code to revert to without a performance impact

Probably going to resort to the latter.

EDIT: Currently planned solution:
- #49 (comment)

@anonhostpi anonhostpi added the enhancement New feature or request label Jan 18, 2024
@anonhostpi anonhostpi self-assigned this Jan 18, 2024
@anonhostpi
Copy link
Contributor Author

Prior Code: https://github.com/pwsh-cs-tools/core/tree/v0.5.4-beta

TempPath generation:

Import-Package.psm1:

Module Initialization Garbage Collector (called once per PowerShell session):

& {
    # Clear the Cache
    Write-Verbose "[Import-Package:Init] Clearing Temp Files..."
    Resolve-Path (Join-Path $PSScriptRoot "Temp" "*") | ForEach-Object -Parallel {
        $id = $_ | Split-Path -Leaf
        [bool] $freed = $false
        $m = New-Object System.Threading.Mutex( $false, "Global\ImportPackage-$id", [ref] $freed )
        If( $freed ){
            Remove-Item $_ -Recurse -ErrorAction Stop
        }
        $m.Dispose()
    } -ThrottleLimit 12 -AsJob | Out-Null
}

Import-Package Function

Parameters block

param(
    # ...
    [string] $TempPath
    # ...
)

TempPath path-string and mutex-lock generation

$temp_path_generated = If( [string]::IsNullOrWhiteSpace( $TempPath ) ){
    $TempPath = & {
        $parent = & {
            Join-Path ($CachePath | Split-Path -Parent) "Temp"
        }
        [string] $uuid = [System.Guid]::NewGuid()

        # Cut dirname in half by compressing the UUID from base16 (hexadecimal) to base36 (alphanumeric)
        $id = & {
            $bigint = [uint128]::Parse( $uuid.ToString().Replace("-",""), 'AllowHexSpecifier')
            $compressed = ""
            
            # Make hex-string more compressed by encoding it in base36 (alphanumeric)
            $chars = "0123456789abcdefghijklmnopqrstuvwxyz"
            While( $bigint -gt 0 ){
                $remainder = $bigint % 36
                $compressed = $chars[$remainder] + $compressed
                $bigint = $bigint/36
            }
            Write-Verbose "[Import-Package:Preparation] UUID $uuid (base16) converted to $compressed (base$( $chars.Length ))"

            $compressed
        }

        $mutexes."$id" = New-Object System.Threading.Mutex($true, "Global\ImportPackage-$id") # Lock the directory from automatic removal

        Join-Path $parent $id

        # Resolve-Path "."
    }
    $true
} Else {
    $false
}

Runtime Garbage Collector (called on every Import-Package call)

If( $temp_path_generated -and (Test-Path $TempPath) ){
    If( Test-Path (Join-Path $TempPath "*") ){
        Write-Verbose "[Import-Package:Loading] Temp files: $TempPath"
    } Else {
        Remove-Item -Path $TempPath -ErrorAction Stop
    }
}

The function also makes calls to $Bootstrapper.LoadNative() which actually handles the resource copying. Use Ctrl+Shift+F in VS Code and See below

@pl4nty
Copy link
Contributor

pl4nty commented Jan 18, 2024

+1, sharing the temp path causes errors if two packages contain the same file eg large depencency trees like Azure.Monitor.OpenTelemetry.Exporter

@anonhostpi
Copy link
Contributor Author

That's something else I will have to look at.

The problem with native files, is that NuGet doesn't define how C# is supposed to load them, just that they are associated with the wrapping package.

@anonhostpi
Copy link
Contributor Author

anonhostpi commented Jan 18, 2024

I really liked how temppath gen worked in my script. I put a lot of work and thought into it. Using cross-process mutex-locks for garbage collection was definitely one of my best pieces of work, considering the size of the gc code.

I am sad to see it go, but it definitely needs to happen.

I wrote a .psm1 gist that separates my temppath creation and gc in case I want to use it in a different module later:

@anonhostpi
Copy link
Contributor Author

anonhostpi commented Jan 18, 2024

Azure.Monitor.OpenTelemetry.Exporter

What error are you getting? I'm getting PackageManagement/Install-Package errors.

Install-Package: Package 'Microsoft.Extensions.DependencyInjection.Abstractions' failed to be installed because: The file 'C:\Users\xxx\AppData\Local\Temp\2stgcgys\packageIcon.png' already exists.

Opened a OneGet issue for my problem

@anonhostpi
Copy link
Contributor Author

anonhostpi commented Jan 18, 2024

Another problem I am going to have to tackle is caching of dependency native dlls.

How TempPath works right now

The purpose of the TempPath was to provide each of the NuGet libraries a central place to deposit their native files. A lot of native files require that their dependencies be in the same root directory (this is a widely adopted practice in native C-based langs)

So switching from a TempPath to a native cache may introduce a new and different file bloat problem.

The Problem with The Solution:

Each dependency in the dependency tree with native files would need to have its own implementation of the new TempPath. This would be a blessing and a problem.

  • CON: A lot of files would be duplicated
    • this would have a negative impact on Import-Package's total file footprint
  • PRO: The files would be pre-extracted, if cached
    • this would have a positive impact on Import-Package's load speed

Possible Solution:

So what I'm thinking I do is:

  • just remove TempPath's garbage collector
  • keep the majority of TempPath logic the same
    • change the naming convention to match that of CachePath
      • rename TempPath to NativePath
      • rename Temp Folder to Natives
      • change default name from a UUID to the same as CachePath
    • add a cache check to look for pre-extracted files when iterating over dependencies
    • only generate a new TempPath for the package at the base of the dependency tree (which is the current legacy behavior)

@anonhostpi
Copy link
Contributor Author

c118580:

Introduced the FullName field on the PackageData object generated by Build-PackageData

This uses the LeafBase of the package's Source property. This should cover these cases:

  • Packages downloaded from NuGet.org and loaded with -Path and NOT renamed
  • Cached SemVer2 packages
  • PackageManagement packages

This will not cover these 2 cases:

  • Package downloaded from NuGet or built locally, and not named using PackageManagement's current naming convention
    • This could potentially be fixed by parsing the nuspec
    • This would present a design challenge, as the Cached files are designed to be handled by Resolve-CachedPackage, but the nuspec is handled later in Build-PackageData
  • Changes to PackageManagement's naming convention will be covered, but will negatively affect cached packages coverage
    • If PackageManagement or NuGet open sources its naming convention and makes it available as a library, it may be possible to import said library and use that in Resolve-CachedPackage

@anonhostpi
Copy link
Contributor Author

anonhostpi commented Jan 19, 2024

Holy moly! The last change has been implemented and the load performance from caching the native files is insane.

Loading Avalonia.Desktop and Microsoft.ClearScript has been brought down from around 10-13 seconds to 2 and a half seconds according to the test script

@anonhostpi
Copy link
Contributor Author

+1, sharing the temp path causes errors if two packages contain the same file eg large depencency trees like Azure.Monitor.OpenTelemetry.Exporter

@pl4nty see #50, #51, and the OneGet Issue

@pl4nty
Copy link
Contributor

pl4nty commented Jan 19, 2024

Thanks, you're right it's in OneGet. For now I'm loading the conflicting packages separately. The load time improvements are very noticeable by the way, well done!

@anonhostpi
Copy link
Contributor Author

anonhostpi commented Jan 19, 2024

For now I'm loading the conflicting packages separately.

Also just introduced a -SkipDependencies switch with v0.6.1, so you can load each library (similarly to -Path), without having to separately download the library.

  • so you can use Import-Package -Name "yada.yada.yada" -Version "v0.0.0" -SkipDependencies
  • instead of Import-Package -Path "/path/to/yada.yada.yada.0.0.0.nupkg"

@pl4nty
Copy link
Contributor

pl4nty commented Jan 19, 2024

Thanks, much appreciated. For this bug I can just load the conflicting package first, but for others I've had to resort to nupkgs

@pl4nty
Copy link
Contributor

pl4nty commented Jan 19, 2024

btw - I used your package to port .NET OpenTelemetry to PowerShell! it was really useful for testing, and figuring out how to call C# extension methods

@anonhostpi
Copy link
Contributor Author

anonhostpi commented Jan 19, 2024

For this bug I can just load the conflicting package first, but for others I've had to resort to nupkgs

It looks like for that problem, you could likely use a newer version of the package. Import-Package's version handling for NuGet version ranges is limited. It currently can't handle unbounded version ranges very well (which is likely why Import-Package is trying to use such an old version of that package). I'm working on that problem with #50

@anonhostpi
Copy link
Contributor Author

anonhostpi commented Jan 21, 2024

btw I started the repo split for New-ThreadController and Import-Package, since issues like #50 require a complete refactor:

Once I can get a release published from the new repo, I will archive this one.

The refactor is mostly targeted at the $bootstrapper variable which makes up a lot of Import-Package's underlying code.

I've got most of the changes I want implemented done, I just have to port the rest of the code to the new $bootstrapper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants