Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Livecheck: Replace OpenURI#open with Curl #10834

Merged

Conversation

samford
Copy link
Member

@samford samford commented Mar 11, 2021

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This is a recreation of #9535, as that PR somehow lost its association with the related branch in my fork and the PR stopped reflecting new pushes.

This migrates brew livecheck from open-uri to Utils::Curl, as Curl is used throughout brew for network requests and livecheck is the only place where open-uri is used (outside of a vendored gem).

Notable Changes

  • Replaces URI.parse(url).open in Livecheck::Strategy#page_content with #curl_with_workarounds. Strategies use #page_content to fetch a page, so this effectively replaces open-uri in livecheck with Utils::Curl.

@BrewTestBot
Copy link
Member

Review period will end on 2021-03-12 at 17:04:38 UTC.

@BrewTestBot BrewTestBot added the waiting for feedback Merging is blocked until sufficient time has passed for review label Mar 11, 2021
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to separate the curl refactoring from the download_strategy changes from the livecheck changes so this is 2-3 PRs instead of 1. Otherwise: looking good so far!

Library/Homebrew/download_strategy.rb Outdated Show resolved Hide resolved
Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
Library/Homebrew/utils/curl.rb Outdated Show resolved Hide resolved
Library/Homebrew/utils/curl.rb Outdated Show resolved Hide resolved
Library/Homebrew/utils/curl.rb Outdated Show resolved Hide resolved
Library/Homebrew/utils/curl.rb Outdated Show resolved Hide resolved
@Bo98
Copy link
Member

Bo98 commented Mar 12, 2021

I managed to reproduce a hang with this backtrace:

	from /usr/local/Homebrew/Library/Homebrew/brew.rb:122:in `<main>'
	from /usr/local/Homebrew/Library/Homebrew/dev-cmd/livecheck.rb:114:in `livecheck'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/livecheck.rb:122:in `run_checks'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/livecheck.rb:122:in `with_index'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/livecheck.rb:122:in `map'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/livecheck.rb:165:in `block in run_checks'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/livecheck.rb:451:in `latest_version'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/livecheck.rb:451:in `each_with_index'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/livecheck.rb:451:in `each'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/livecheck.rb:506:in `block in latest_version'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/strategy/gnu.rb:71:in `find_versions'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/strategy/page_match.rb:93:in `find_versions'
	from /usr/local/Homebrew/Library/Homebrew/livecheck/strategy.rb:141:in `page_content'
	from /usr/local/Homebrew/Library/Homebrew/utils/curl.rb:70:in `curl_with_workarounds'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:26:in `system_command'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:38:in `run'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:51:in `run!'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:182:in `each_output_line'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:200:in `each_line_from'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:200:in `loop'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:203:in `block in each_line_from'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:203:in `reject'
	from /usr/local/Homebrew/Library/Homebrew/system_command.rb:203:in `eof?'

@MikeMcQuaid
Copy link
Member

That eof? check should have a timeout (if it doesn't already).

@BrewTestBot BrewTestBot removed the waiting for feedback Merging is blocked until sufficient time has passed for review label Mar 12, 2021
@BrewTestBot
Copy link
Member

Review period ended.

BrewTestBot
BrewTestBot previously approved these changes Mar 12, 2021
@reitermarkus
Copy link
Member

I have that specific timeout for eof? in an old WIP branch here: reitermarkus@65ee0d9#diff-fca3f36bc892d787553c323ec2965a2f0dd25c8d8acf1eae53fcdd236c090117L77-L197

@samford
Copy link
Member Author

samford commented Mar 15, 2021

Would be good to separate the curl refactoring from the download_strategy changes from the livecheck changes so this is 2-3 PRs instead of 1.

Makes sense to me. It's possible to create a PR for the additional curl methods and then a follow-up PR for the changes that use these methods in DownloadStrategy and Curl. However, the curl refactoring and download_strategy changes are kind of intertwined, so it may make sense to handle that in one separate PR rather than two. I'll wait to split these until we've worked through any feedback here and decided on the shape the other PR(s) should take.

I have that specific timeout for eof? in an old WIP branch

@reitermarkus Do you have time to separate the related SystemCommand and Utils::Curl changes into a PR that we could merge before this? That would allow me to add the timeout parameter to the curl methods in Livecheck::Strategy and prevent the hang from becoming an issue when this is merged.

@MikeMcQuaid
Copy link
Member

However, the curl refactoring and download_strategy changes are kind of intertwined, so it may make sense to handle that in one separate PR rather than two.

I think it would be good to do any refactoring in a dedicated PR that should result in no changes being required to callers of the refactored code.

@MikeMcQuaid
Copy link
Member

@samford Anything blocking here?

@Bo98 Bo98 mentioned this pull request Apr 1, 2021
7 tasks
@Bo98
Copy link
Member

Bo98 commented Apr 1, 2021

It seems to me that the timeout solution proposed is targeted atIO.select. The hang I noticed was not IO.select. It is eof? itself - eof? is a blocking call.

We however don't need to call eof? as far as I can see. We can catch EOFError and act based on that. I've opened #10990 to propose doing this.

@samford
Copy link
Member Author

samford commented Apr 1, 2021

I'm working on wrapping up some refactoring of the Curl methods, making them a bit more general purpose so they can be used wherever we need to parse curl output (e.g., splitting head(s) and body, checking headers, etc.). That's pretty much done but I need to give it another pass before creating a PR. [I'll create a separate PR for the various changes that make use of these methods in Curl, DownloadStrategy, etc.]

This PR depends on the aforementioned Curl methods that work with the output, so that would need to be merged first. We also need to have definitively resolved any hangs before this can be merged. Markus's timeout work was merged but has been reverted and is being reintroduced in #10924.

The timeout solution seemed promising at first but I've been testing this more [with it based on #10924] and it has been hanging again but in a different manner. When this was hanging before, curl had finished/exited and the hang was happening in SystemCommand (i.e., the eof? issue). In recent testing, this will hang with the curl process still running in the background. This new hang also goes on indefinitely but livecheck will continue to the next formula/cask if you kill the curl process, which is notably different.

I'll rebase this on #10990 and test it as well, to see whether it also encounters this new hang. I'll try to get the Curl methods PR up sometime in the next few days and I'll rebase this PR on it, so everyone can work with the same code I have locally.

@Bo98
Copy link
Member

Bo98 commented Apr 3, 2021

The only other issue I'm aware of is one where the pipe outlives the process, though it would be odd if that affected curl.

This is seen with Homebrew/homebrew-cask#32364. I do know how to fix it: instead of waiting on the pipe to close we would wait on the process to close and stop processing after that. This does require pushing the pipe handling code into its own thread but I do have it working locally and can confirm it fixes the issue in Homebrew/homebrew-cask#32364.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale No recent activity label Apr 25, 2021
@samford samford force-pushed the livecheck-replace-open-uri-with-curl branch from b3a0466 to 4ea850f Compare April 26, 2021 16:49
BrewTestBot
BrewTestBot previously approved these changes Apr 26, 2021
@samford samford removed the stale No recent activity label Apr 26, 2021
@samford samford force-pushed the livecheck-replace-open-uri-with-curl branch from 4ea850f to 7fe8028 Compare May 4, 2021 19:28
BrewTestBot
BrewTestBot previously approved these changes May 4, 2021
@samford
Copy link
Member Author

samford commented May 4, 2021

To help this move forward to being merged (with respect to #11252 (comment)), I've replaced usage of the methods from #11252 with the previous code that duplicates the existing logic for separating the heads/body from curl output in Curl#curl_http_content_headers_and_checksum.

The duplicated code works for now and will be replaced in #11252, so I don't think we should fret over the while loop here and we can debate the shape of it in the aforementioned PR where the curl methods are implemented. Though I'm generally not a fan of while loops either, this one hasn't been a problem in months worth of regular testing and I haven't found a clear alternative that didn't introduce bugs.

I've continued to test this regularly and I haven't seen an unresolved hang after Markus and Bo's work was merged. Instead of hanging, we receive a Timeout::Error, as expected, and livecheck keeps going. The changes in this PR are comparatively minimal at this point and I believe it's ready for some final review before being merged.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Almost there.

Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale No recent activity label May 27, 2021
@samford samford force-pushed the livecheck-replace-open-uri-with-curl branch from 7fe8028 to c087d48 Compare May 27, 2021 17:40
BrewTestBot
BrewTestBot previously approved these changes May 27, 2021
@samford samford removed the stale No recent activity label May 27, 2021
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this. Almost there!

Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
max_iterations = 5
iterations = 0
output = output.lstrip
while output.match?(%r{\AHTTP/[\d.]+ \d+}) && output.include?("\r\n\r\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "\r\n\r\n"? Could that get a comment (perhaps with a variable/constant extraction, too).

Copy link
Member Author

@samford samford Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double CRLF (\r\n\r\n) is used to separate response head(s) and body, whereas a single CRLF (\r\n) is used to separate headers in a given head (i.e., the status line and each header line from one response). For example, if the output involves multiple redirections, you'll have multiple head responses with each separated by double CRLF. Regardless of the number of heads, the final head is also separated from the body by a double CRLF.

The body content can also contain \r\n\r\n, so this is why I've taken the approach of using lstrip and checking the start of the text to ensure it's an HTTP status line.

That said, it's technically possible for a server to use \n\n (instead of \r\n\r\n) as the separator. It's very rare but I've read that some folks have encountered this in practice (if Stack Overflow answers/comments are any indicator).

Properly separating the head(s) and body under a variety of conditions is a challenge and we only have to do this because we're working with the output from the curl application. Is there any particular reason why we do this instead of using a gem that wraps libcurl (e.g., Typhoeus/Ethon or Curb)?

libcurl can separate the head(s) from the body for you, so we wouldn't have to worry about parsing the CLI output and hoping a weird server doesn't break it someday. This would simplify usage and allow us to avoid the additional methods in #11252.

If that's something we're interested in, I would be more than happy to work on it (in a follow-up PR, of course).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properly separating the head(s) and body under a variety of conditions is a challenge and we only have to do this because we're working with the output from the curl application. Is there any particular reason why we do this instead of using a gem that wraps libcurl (e.g., Typhoeus/Ethon or Curb)?

  1. consistency with formulae downloads (which have always shelled out)
  2. avoiding the need to have gems that require compilation for end users

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm content to shelve the libcurl idea for now (so many other things to work on) but just for the sake of clarification: by "gems that require compilation for end users" do you mean building native extensions or something else? Typhoeus/Ethon don't have to build native extensions but Curb does, if that matters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's interesting. Yes, it might be interesting to explore those gems if they provided benefits to us, then.

Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
BrewTestBot
BrewTestBot previously approved these changes Jun 4, 2021
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here, super readable now. Thanks @samford!

@MikeMcQuaid MikeMcQuaid merged commit f1f7dc1 into Homebrew:master Jun 7, 2021
@samford samford deleted the livecheck-replace-open-uri-with-curl branch June 7, 2021 14:16
@samford
Copy link
Member Author

samford commented Jun 7, 2021

@MikeMcQuaid Thanks for all your feedback on this (and patience as this progressed over time).

Also big thanks to Markus and Bo for resolving the persistent hang that was holding this up for some time.

@MikeMcQuaid
Copy link
Member

@samford you're very welcome. team work, dream work, etc. 😁

@github-actions github-actions bot added the outdated PR was locked due to age label Jul 9, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants