use http1.1 #2177

Closed
wants to merge 1 commit into from

dradetsky

I added https://perell.com/feed/ to my feed list. Attempting to sync this feed produced

Error while retrieving https://perell.com/feed/: Stream error in the HTTP/2 framing layer

Since newsboat uses libcurl, I checked with curl (7.85.0) and sure enough:

1.292s ~/c/t/r/r/newsboat 11:51 $ curl https://perell.com/feed/
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

Some poking around curl's issue tracker suggests that this is most likely caused by the server not implementing http2 correctly. Either way, fixing curl is clearly outside the scope of newsboat. It's possible that a different libcurl option could fix this, but I don't know which one.

When we check curl's debug output, we see

* Connected to perell.com (199.16.172.213) port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1

In other words, curl offers h2 via ALPN and the server accepts it, so the server claims to implement http2 but apparently does not (according to curl). I can open the URL in a browser, and my browser reports that it's using h2, but that doesn't mean much; browsers are probably lenient about all kinds of sloppy protocol implementations.

Anyhow, since we don't currently specify an HTTP version for libcurl, it uses its default, which since libcurl 7.62.0 is to use http2 over TLS whenever the server accepts it. So it tries http2 and fails. AFAIK, curl doesn't have any kind of try-http2-and-if-that-fails-try-http1.1-etc option. newsboat could implement this kind of fallback logic itself, but that sounds like a lot of work, and I'm not going to do it.
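The fallback logic described above is not implemented in this PR. For illustration only, a minimal C++ sketch of it could look like the following: try libcurl's default first, and retry once with HTTP/1.1 forced only when the first attempt fails with an HTTP/2-specific error. `HttpMode`, `FetchOutcome`, and `fetch_with_http11_fallback` are hypothetical names, and the transfer itself is passed in as a callback so the logic can be shown without linking libcurl; in real code that callback would run `curl_easy_perform()` and, for the retry, set `CURLOPT_HTTP_VERSION` to `CURL_HTTP_VERSION_1_1`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical: which HTTP version policy a single fetch attempt uses.
enum class HttpMode { Default, ForceHttp11 };

// Numeric CURLcode values for HTTP/2-specific failures:
// 16 = CURLE_HTTP2, 92 = CURLE_HTTP2_STREAM (the "curl: (92)" seen above).
constexpr int kCurleHttp2 = 16;
constexpr int kCurleHttp2Stream = 92;

bool is_http2_failure(int code) {
    return code == kCurleHttp2 || code == kCurleHttp2Stream;
}

struct FetchOutcome {
    int code;       // CURLcode-like result of the last attempt
    HttpMode mode;  // mode used for that last attempt
    int attempts;   // 1 if no retry happened, 2 otherwise
};

// Try the default (HTTP/2 over TLS when negotiated) first; if that fails
// with an HTTP/2-specific error, retry exactly once with HTTP/1.1 forced.
// Any other result, success or not, is returned without a retry.
FetchOutcome fetch_with_http11_fallback(const std::function<int(HttpMode)>& fetch) {
    assert(fetch && "a transfer callback is required");
    int code = fetch(HttpMode::Default);
    if (!is_http2_failure(code)) {
        return {code, HttpMode::Default, 1};
    }
    code = fetch(HttpMode::ForceHttp11);
    return {code, HttpMode::ForceHttp11, 2};
}
```

Restricting the retry to `CURLE_HTTP2` (16) and `CURLE_HTTP2_STREAM` (92) means ordinary failures such as DNS errors are not retried, at the cost of one extra request per affected feed on every reload.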

I could try to tell this Perell character to fix his web server, but I assume this issue will occur elsewhere too. Enough people seem to think RSS is obsolete that site owners may not be willing to devote much effort to fixing an issue they don't really understand or may not even control (in the case of hosted blogs).

So we need some way to force http1.1. This could be some kind of per-feed config, or just a blanket change for everything like I've done here. A per-feed config would probably be better, but it would also be more work and would require design decisions, and I have no idea what a good design would be.

To the best of my knowledge, http2 is basically a browser-specific optimization. Like, it's better when you want to fetch 20 different documents from the same server (e.g. 19 css/js files and the html doc), but doesn't really make a difference for the fetch-one-rss-feed case. But I could be wrong about this.

Just as I was finishing this up, I identified another 11 URLs in my feed list that were experiencing this issue, so it's definitely not just this one server. Most of them appear to be served from wordpress.com, although perell.com is hosted by Pressable. I believe in all cases the server identifies itself as nginx. In any case, the problem is widespread enough that just telling people to fix their blogs seems unlikely to work.
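One possible shape for the per-feed config mentioned earlier, sketched here in C++ under the assumption of a hypothetical setting that lists domains (no such newsboat option exists), is to match each feed URL's host against that list, so a single entry such as wordpress.com would cover every affected blog on that platform. `Http11Overrides`, `host_of`, and `should_force_http11` are illustrative names; when the check returns true, the fetch code would set `CURLOPT_HTTP_VERSION` to `CURL_HTTP_VERSION_1_1` for that feed only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical per-feed setting: lowercase domains for which HTTP/1.1
// should be forced. Not an existing newsboat option.
struct Http11Overrides {
    std::vector<std::string> domains;  // e.g. {"perell.com", "wordpress.com"}
};

// Extract the lowercase host from "scheme://host[:port][/path...]".
// Deliberately minimal: no userinfo or IPv6-literal handling.
std::string host_of(const std::string& url) {
    const std::size_t scheme_end = url.find("://");
    const std::size_t start = (scheme_end == std::string::npos) ? 0 : scheme_end + 3;
    const std::size_t end = url.find_first_of(":/?#", start);
    std::string host = url.substr(start, end == std::string::npos ? std::string::npos : end - start);
    std::transform(host.begin(), host.end(), host.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return host;
}

bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// True if the feed's host equals a listed domain or is a subdomain of it.
bool should_force_http11(const std::string& url, const Http11Overrides& overrides) {
    const std::string host = host_of(url);
    if (host.empty()) {
        return false;  // malformed URL: leave libcurl's default alone
    }
    for (const auto& domain : overrides.domains) {
        if (host == domain || ends_with(host, "." + domain)) {
            return true;
        }
    }
    return false;
}
```

Matching on "." + domain rather than a bare suffix keeps an entry like wordpress.com from also matching an unrelated host such as notwordpress.com.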

@dradetsky
Author

I'm not sure what the deal with that check failure is, except that it's for gcc5 which is...um...old. Do you even need to support gcc5? In any case, all I see is

g++-5: internal compiler error: Killed (program cc1plus)

which for all I know could just be the OOM killer or something like that. I'll see if I can get gcc5 on my machine & replicate.

FWIW, it seems kind of odd that you would run such a massive test suite in response to a PR from an internet rando (don't people find projects that do this & inject bitcoin miners or something?), but it's your project.

@dradetsky
Author

nm, I now see it's a docker thing that failed; reproing

@dradetsky
Author

As expected, I couldn't repro it (probably some kind of resource limit like the OOM killer, which makes sense given the size of the job). I only got as far as running

  docker run \
      --rm \
      --mount type=bind,source=$(pwd),target=/home/builder/src \
      --user $(id -u):$(id -g) \
      newsboat-build-tools \
      make

though

@expenses

expenses commented Sep 8, 2022

I've also encountered this with wordpress blogs (turanszkij/WickedEngine#557).

@Minoru
Member

Minoru commented Sep 11, 2022

Sorry for the late response! I'm down with COVID and don't have the energy to investigate this fully right now, but I ran a quick check with curl, and both https://perell.com/feed/ and https://wickedengine.net fetch fine. Could this perhaps depend on the curl version, or on one of the libraries it uses? I'm running Debian stable and my curl is:

$ curl --version
curl 7.74.0 (x86_64-pc-linux-gnu) libcurl/7.74.0 OpenSSL/1.1.1n zlib/1.2.11 brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.9.0 nghttp2/1.43.0 librtmp/2.3
Release-Date: 2020-12-09
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: alt-svc AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

If this does indeed depend on the curl version, then IMHO a better solution would be to downgrade to HTTP/1.1 only if Newsboat is linked against an old version. This can be done with an #if that checks LIBCURL_VERSION_NUM (or some related constant).
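As a hedged C++ sketch of the #if idea: libcurl encodes its version as 0xXXYYZZ (one byte each for major, minor, and patch), which is the form of `LIBCURL_VERSION_NUM`, so the comparison reduces to integer arithmetic. The helper names and the 7.86.0 threshold (the release mentioned later in this thread) are assumptions; actual code would compare `LIBCURL_VERSION_NUM` from `<curl/curlver.h>` directly in the preprocessor, as the comment at the end of the block shows.

```cpp
#include <cassert>

// Same 0xXXYYZZ layout that libcurl uses for LIBCURL_VERSION_NUM.
constexpr unsigned curl_version_num(unsigned major, unsigned minor, unsigned patch) {
    assert(minor < 256 && patch < 256);  // each component must fit in one byte
    return (major << 16) | (minor << 8) | patch;
}

// Assumed threshold: the first libcurl release expected to carry the fix.
constexpr unsigned kFirstFixedCurl = curl_version_num(7, 86, 0);

// Hypothetical helper: true if HTTP/1.1 should be forced for this build.
constexpr bool needs_http11_workaround(unsigned libcurl_version_num) {
    return libcurl_version_num < kFirstFixedCurl;
}

static_assert(kFirstFixedCurl == 0x075600, "7.86.0 encodes as 0x075600");

// In newsboat itself this would be a preprocessor test on the header macro:
//   #if LIBCURL_VERSION_NUM < 0x075600
//       curl_easy_setopt(handle, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
//   #endif
```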

@dradetsky
Author

 ~ 12:45 $ uname -a
Linux flap 5.19.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 05 Sep 2022 18:09:09 +0000 x86_64 GNU/Linux
~ 12:45 $ curl --version
curl 7.85.0 (x86_64-pc-linux-gnu) libcurl/7.85.0 OpenSSL/1.1.1q zlib/1.2.12 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.3 libpsl/0.21.1 (+libidn2/2.3.0) libssh2/1.10.0 nghttp2/1.49.0
Release-Date: 2022-08-31
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd
~ 12:46 $ curl https://perell.com/feed/
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

I think there's a decent chance this depends on the curl version, although it would be worth checking the output of curl -v on your machine to see whether your curl is actually trying to use h2.

@dradetsky
Author

Actually, if I run curl from a debian:bullseye container, it does attempt to use h2 and works fine

root@06142438f561:/# curl --version
curl 7.74.0 (x86_64-pc-linux-gnu) libcurl/7.74.0 OpenSSL/1.1.1n zlib/1.2.11 brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.9.0 nghttp2/1.43.0 librtmp/2.3
Release-Date: 2020-12-09
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

So it's definitely related to the curl version

@dradetsky
Author

Also it works fine on alpine:latest

/ # curl --version
curl 7.80.0 (x86_64-alpine-linux-musl) libcurl/7.80.0 OpenSSL/1.1.1l zlib/1.2.11 brotli/1.0.9 nghttp2/1.46.0
Release-Date: 2021-11-10
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL TLS-SRP UnixSockets

I'm thinking it must be related to the nghttp2 version, not the curl version, but idk

@dradetsky
Author

I also built curl 7.80 from git, and that build also failed. Since 7.80 worked fine on alpine, I assume the difference must be in one of the libraries.

@dradetsky
Author

I built curl locally from the tip of curl, with nghttp2 at v1.46.0, and it works:

0.378s (master) ~/g/t/r/t/curl 13:53 $ ./src/curl --version
curl 7.86.0-DEV (x86_64-pc-linux-gnu) libcurl/7.86.0-DEV GnuTLS/3.7.7 zlib/1.2.12 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.3 libpsl/0.21.1 (+libidn2/2.3.0) nghttp2/1.46.0 libgsasl/2.0.1
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli gsasl HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB PSL SSL threadsafe TLS-SRP UnixSockets zstd

@dradetsky
Author

It also works locally with the tip of both curl and nghttp2.

Basically, I don't think we can fix this with the preprocessor and the libcurl version, since libcurl is dynamically linked against nghttp2, and newsboat doesn't even know nghttp2 exists.

@dradetsky
Author

So it's almost certainly this same issue:

The best solution for me is to switch to using the tip of nghttp2 & curl. It's actually necessary to use the tip of curl, since curl 7.85.0 won't attempt to use the fix introduced in the tip of nghttp2.

@Minoru if you want to fix this by checking the libcurl version, one option might be to check for libcurl >= 7.86.0 (which hasn't been released yet), since that version will fix it. I don't think any other libcurl version pin will help, since the libcurl version can't tell you which nghttp2 version it's running with.
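Since the nghttp2 version is only known at run time, a version-based workaround would have to read it from `curl_version_info(CURLVERSION_NOW)`, whose `curl_version_info_data` result carries `version_num` and, in newer libcurl releases, `nghttp2_ver_num`. The C++ sketch below takes those two numbers as plain inputs; `RuntimeVersions` and `force_http11_at_runtime` are hypothetical names, and the thresholds are inferred only from the combinations reported in this thread (nghttp2 1.43.0 and 1.46.0 working, 1.49.0 failing with libcurl 7.85.0, and the fix requiring libcurl 7.86.0), not from the curl or nghttp2 changelogs.

```cpp
#include <cassert>

// Same 0xXXYYZZ layout as the version_num and nghttp2_ver_num fields.
constexpr unsigned version_num(unsigned major, unsigned minor, unsigned patch) {
    assert(minor < 256 && patch < 256);  // each component must fit in one byte
    return (major << 16) | (minor << 8) | patch;
}

// Hypothetical holder for values that real code would copy out of
// curl_version_info(CURLVERSION_NOW).
struct RuntimeVersions {
    unsigned libcurl;  // ->version_num
    unsigned nghttp2;  // ->nghttp2_ver_num (assumed 0 if libcurl lacks nghttp2)
};

// Assumed thresholds, inferred from the data points in this thread.
constexpr unsigned kFirstStrictNghttp2 = version_num(1, 49, 0);
constexpr unsigned kFirstFixedCurl = version_num(7, 86, 0);

// Force HTTP/1.1 only for the combination observed to fail:
// a strict nghttp2 paired with a libcurl that predates the fix.
bool force_http11_at_runtime(const RuntimeVersions& v) {
    if (v.nghttp2 == 0) {
        return false;  // no nghttp2 linked in, so nothing to work around
    }
    return v.nghttp2 >= kFirstStrictNghttp2 && v.libcurl < kFirstFixedCurl;
}
```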

@dradetsky
Author

@Minoru following up with curl & nghttp2 projects, I have determined that this issue is likely to be resolved in released versions in about 43 days, after which it is unlikely that this issue will occur for any but users of the shittiest package managers. Since I was able to mitigate this myself, and many other users will never hit it at all due to being behind the upstream release versions, I'm pretty sure we don't want to make the change in this PR just to address this specific issue.

However, it might be worth asking whether we want to do it anyway to avoid similar issues in the future. http2 is still under active development and may well get more revisions, which server developers will then implement incorrectly, leading to more issues like this one. RFC 9113, the root of this issue, was published in June. No doubt the same server developers are hard at work misimplementing whatever the IETF has published since then. Do we actually gain anything by using http2? Are there any actual servers publishing RSS docs that are http2-only? This seems unlikely; afaik http2-only servers are extremely rare to begin with, and I'm almost positive the overwhelmingly most common way to use http2 at all is via ALPN, negotiated during the TLS handshake with http1.1 offered as the fallback. I'm not 100% sure using http2 without ALPN is possible.

Actually it might be better to change this from the equivalent of passing --http1.1 to the equivalent of passing --no-http2 (i.e. don't attempt to upgrade).

@Minoru
Member

Minoru commented Sep 19, 2022

Thanks for digging up all the details @dradetsky! Looks like the only thing left to me here is to decide what to do with all this stuff :)

I agree that HTTP/2 doesn't bring much to the RSS/Atom table. The only thing I can think of is connection coalescing, which probably helps with sites that are hosted on github.io, wordpress.com, blogger.com and similar platforms; reusing a single TLS connection there should bring a noticeable improvement to reload times, especially for the common case of 304 Not Modified.

However, I feel uneasy dropping a protocol just because there could be more bugs in its implementations. I'm inclined to keep it and see whether it really becomes a source of problems. We already have an escape hatch in the form of exec:curl … URL (or exec:wget -O- … URL, or whatever one wants to use), so if push comes to shove, users won't be left completely out in the cold while we're turning HTTP/2 off.

Minoru closed this Sep 19, 2022
3 participants