Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Windows CI that got silently broken recently #7700

Merged
merged 6 commits into from
May 31, 2024

Conversation

deivid-rodriguez
Copy link
Member

@deivid-rodriguez deivid-rodriguez commented May 30, 2024

What was the end-user or developer problem that led to this PR?

After #7664, Gem::StubSpecification sorting is no longer stable, which causes spec failures under some platforms.

What is your fix for the problem, implemented in this PR?

Restore a stable criteria for sorting specifications, that keeps the current tests green.

Also, make sure Windows can actually run our tests properly and does not give back a green status without running anything.

Make sure the following tasks are checked

@deivid-rodriguez deivid-rodriguez marked this pull request as draft May 30, 2024 12:16
@MSP-Greg
Copy link
Contributor

As mentioned in #7664, the CI running for this PR is generating a false positive for all Windows workflow jobs.

Open the job log for any Windows run and look at the 'Install Dependencies' and 'Run Tests' step logs, and no code is running.

I'll open an issue in the PowerShell repo, as this should be throwing an error...

@deivid-rodriguez
Copy link
Member Author

Nice catch @MSP-Greg, let me cherry-pick your commit in #7701 here so I can see if CI is fixed.

And thanks for also reporting in powershell, it's not good that these are not reported as failures.

@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/sort-regression branch from 015b55d to 547acfd Compare May 30, 2024 14:32
@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/sort-regression branch from d6ec367 to 4c3db8c Compare May 30, 2024 14:51
@deivid-rodriguez deivid-rodriguez marked this pull request as ready for review May 30, 2024 14:51
@deivid-rodriguez deivid-rodriguez changed the title Make StubSpecification sorting stable Fix Windows CI that got silently broken recently May 30, 2024
@MSP-Greg
Copy link
Contributor

@deivid-rodriguez

Thanks for your work on this. Glad to see it passing.

The only thing I'm not sure of is whether activation_priority should set to calculate once, as it might be called often when a large number of specs exist, with a lot of created arrays. You probably know better than anyone, and I've never debugged Array#sort calls...

JFYI, Locally, I believe I've got the same Rubies as GHA. The 3.0 and 3.1 rake 'bin' files seem identical, so the issue may RubyGems 3.2.33. If you want me to check (installing rake in the Ruby 3.0.z folder), I can.

Normally, I do everything in 'user-install', as I like to keep my Ruby installs 'clean'. My normal Rubies on Ubuntu & Windows are head/master builds...

@deivid-rodriguez
Copy link
Member Author

I extracted the method to make this sort a bit more clear, but indeed it's a few more arrays. Not sure if I should care or not. I guess I'll make some tests to see if it's noticiable.

Regarding the bin files, I'm not sure what to check. I thought what GHA provides is a pristine Windows Ruby install. In any case, since it's something specific to an old Ruby, and our tests, I would not give it more attention.

@MSP-Greg
Copy link
Contributor

Not sure if I should care or not.

What, me worry? I have no idea how often or when the specs are sorted, so whatever.

In any case, since it's something specific to an old Ruby, and our tests, I would not give it more attention.

Not sure, it might have to do with overwriting the original rake install, but as you said, both the Ruby & RubyGems are old. We've got better things to work on...

Make it not also install the gem in the default GEM_HOME.
It's consistent with the previous test and makes print debugging easier.
@deivid-rodriguez
Copy link
Member Author

On my system with 105 gems, the difference is minimal:

On master

Total allocated: 3.09 MB (23064 objects)
Total retained: 289.20 kB (2834 objects)

With this PR

Total allocated: 3.09 MB (23072 objects)
Total retained: 289.20 kB (2834 objects)

We could back to the previous allocations with this alternative patch

diff --git a/lib/rubygems/specification.rb b/lib/rubygems/specification.rb
index a51a498158..da55a2e6d3 100644
--- a/lib/rubygems/specification.rb
+++ b/lib/rubygems/specification.rb
@@ -828,7 +828,11 @@ def self._resort!(specs) # :nodoc:
     specs.sort! do |a, b|
       names = a.name <=> b.name
       next names if names.nonzero?
-      b.activation_priority <=> a.activation_priority
+      versions = b.version <=> a.version
+      next versions if versions.nonzero?
+      platforms = Gem::Platform.sort_priority(b.platform) <=> Gem::Platform.sort_priority(a.platform)
+      next platforms if platforms.nonzero?
+      b.base_dir == Gem.path.first ? 1 : -1
     end
   end

I'll revert to that alternative, and I'll consider introducing the activation_priority method in the next PR.

@deivid-rodriguez deivid-rodriguez force-pushed the deivid-rodriguez/sort-regression branch from 4c3db8c to a99f09a Compare May 31, 2024 10:46
@deivid-rodriguez
Copy link
Member Author

Merging to get our Windows CI back. Thanks @MSP-Greg!

@simi
Copy link
Member

simi commented May 31, 2024

💪 @MSP-Greg

@deivid-rodriguez deivid-rodriguez merged commit 30e55de into master May 31, 2024
83 checks passed
@deivid-rodriguez deivid-rodriguez deleted the deivid-rodriguez/sort-regression branch May 31, 2024 11:57
@MSP-Greg
Copy link
Contributor

Re the PowerShell behavior that caused this error, it's messy.

GUI OS's often link file 'types' with applications. For instance, in PowerShell, if one types a valid Excel file path, it opens the file with Excel. I don't have a GUI Ubuntu, and I haven't tried similar things on macOS. I think PowerShell may make some changes. One idea is raising an error if the passed file does not have an extension, which was the case here.

Also, I'm suggesting that GHA add a step keyword similar to 'minimum-time', which would fail the step if it took less time than the keyword value. So, if you know your test suite step will take a least 60 seconds to complete, set the 'minimum-time' to 10. If used here, this would have caught the error.

@deivid-rodriguez
Copy link
Member Author

Linking to the PowerShell ticket for visibility: PowerShell/PowerShell#23874. Subscribed!

@MSP-Greg
Copy link
Contributor

The Actions 'minimum-time' feature:

https://github.com/orgs/community/discussions/126651

I may open an issue in https://github.com/actions/runner/ if the discussion leads nowhere

@deivid-rodriguez
Copy link
Member Author

I think the right place to fix this is in powershell, I'm not sure about mininum-time actions feature, particularly if it's just driven by having found a bug somewhere else.

@MSP-Greg
Copy link
Contributor

I'd agree. Conversely, CI false positives are dangerous. This one may have gone on for a bit longer, as it was noticed only because the commits are pushed to ruby/ruby. I noticed it when I saw a ruby-loco failure, and others noticed it in ruby/ruby.

minimum-time should be optional, just like timeout-minutes. If it existed, and was used in the workflows here, the problem would have been identified immediately.

'a bug somewhere else' is messy, in this case we're considering something to be a bug that's related to error handling of a unique situation. PowerShell is a well-used, mature product (like the repos we contribute to), and you can bury yourself handling 'unique situation' errors...

deivid-rodriguez added a commit that referenced this pull request Jun 12, 2024
Fix Windows CI that got silently broken recently

(cherry picked from commit 30e55de)
deivid-rodriguez added a commit that referenced this pull request Jun 12, 2024
Fix Windows CI that got silently broken recently

(cherry picked from commit 30e55de)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants