SLOW OR FAILED (500 ERROR) NODE.JS DOWNLOADS #4495
For some reason, iojs.org redirected to nodejs.org, causing errors with
I wonder if the site/Cloudflare are just getting overwhelmed with traffic from people downloading the latest release? Node v16.14.1 was released about an hour ago... #4494 / https://github.com/nodejs/node/releases/tag/v16.14.1 It's the latest LTS release, so basically everybody in the world looking for the default version of Node is downloading it uncached right now (other than whatever is apparently cached by Cloudflare). It should settle down in a few hours, or by tomorrow?? Note: I have nothing to do with Node's web servers or site, I'm just a regular person speculating at this point.
Closing as the issue isn't happening right now, but also pinging @nodejs/build in case there's anything to add here or something to be aware of.
People are still reporting this over in the OpenJS Slack, so I'm going to re-open this until that gets sorted.
I have CloudFlare LB error notices spanning from ~2:40 am UTC to ~3:40 am UTC, which I suppose correlates with the latest 16.x release: overloaded primary server, switching to the backup. Users may have got the 500 while hitting the primary before it switched. I don't have a good explanation beyond that; I'm not sure what caused the overload. I don't believe that server actually has trouble serving, but we have witnessed some weird I/O issues connected with rsync that maybe all happened at the same time. Perhaps during the next release someone should be on the server watching for weirdness.
This seems to be happening again; https://iojs.org/dist/index.json is 500ing, and v12.22.11 just went out. (after about 5-10 minutes, the 500s seemed to stop, but it's still worth looking into)
FWIW I am still seeing either "500 internal server error" or very slow file downloads (less than 50 kilobytes a second, often less than 10 kilobytes a second, especially at the beginning of a download). More details: The "slow download" symptom is less obvious for small files, because they complete quickly anyway. I have seen a tarball download start, take a long time (over a minute), and ultimately fail mid-way through. I hope that's useful for diagnosing the problem. (Results may vary across the globe, since the path through the CDN is probably not identical everywhere?) Edit to add: My experience is basically identical (as described in this comment) for nodejs.org/dist/ and iojs.org/dist, and identical today compared to 2 days ago.
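To put numbers on reports like the one above, a minimal measurement sketch; the release URL is only an example, and the `-w` write-out variables are standard curl ones:

```bash
# Report the HTTP status, average download speed, and total time for one fetch
# of a release tarball. Swap in whichever file is failing for you.
curl -o /dev/null -sS \
  -w 'status=%{http_code}  avg_speed=%{speed_download} bytes/s  total=%{time_total}s\n' \
  https://nodejs.org/dist/v16.14.1/node-v16.14.1-linux-x64.tar.gz
```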
After promoting 12.22.11 I went to rerun the same release script to promote 14.19.1 and it just hung during the bit when it fetches the promotable builds. Same behaviour from another machine on a different network. Weirdly I was able to ssh into the machine in an interactive session and manually run the command the promotion script was trying to run 😕.
I logged into the machine, ran
The first of those, running as root, is from the backup server -- I've left that alone. I killed the second one, which is the coverage data sync, and suddenly my other terminal running the promotion script got unstuck. That would suggest the coverage data syncing is partly, or wholly, responsible. In the past either @mhdawson or I would do an annoying manual clean-up of old coverage data to reduce the volume of stuff being synced, but I'm now going to recommend we turn off running coverage on Jenkins (and the associated data shuffling that populates coverage.nodejs.org) and switch exclusively to codecov.io.

Actually, as I typed the above paragraph my promotion run broke 😢 so there's still something up with the machine:
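For reference, a rough sketch of that triage using only standard ps/kill; which rsync is which has to be read off the listing, as described above:

```bash
# List the rsync processes with owner and elapsed time; the bracketed pattern
# keeps the grep itself out of the results.
ps -eo pid,user,etime,args | grep '[r]sync'

# Leave the backup-server sync (running as root) alone and kill only the
# coverage-data sync, using the PID read from the listing above.
# kill <coverage-rsync-pid>
```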
Currently there are no rsync processes -- but running the promotion script is "hung" again for me 😞.
I think I'm going to bounce nginx.
Same happening here. I know this type of comment isn't useful in most cases, but in this case it at least serves to show that there are users still being affected 😬 Edit: it worked for me now
I did
This is being reported again, probably due to an increase in traffic from the Node.js 12 release that went out earlier. Grafana for the main server (DO):
No idea if this is related - but wanted to report it here just in case. I've found the download server has been incredibly slow for the past 30 mins or so, with some requests hanging and never completing. I spotted this during some
Also seen in https://ci.nodejs.org/job/citgm-smoker-nobuild/1190/nodes=rhel8-s390x/console. I also tried locally and was getting 20-30 KB/s 🐢 . After ~30 mins or so new requests started completing as normal. I did have to abort the jobs in Jenkins to stop them hanging.
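One way to keep CI jobs from hanging forever on a stalled transfer, as happened above, is to make curl give up on its own; a sketch, assuming curl is available on the agent (the URL is only an example):

```bash
# Abort if the transfer runs slower than 10 KB/s for 30 seconds, or takes more
# than 10 minutes overall, instead of hanging until the job is killed.
NODE_TARBALL_URL=https://nodejs.org/dist/v16.14.2/node-v16.14.2-linux-x64.tar.gz  # example URL
curl -fLO --speed-limit 10240 --speed-time 30 --max-time 600 "$NODE_TARBALL_URL"
```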
I have a flurry of notifications from CF around 6 hours ago about timeouts. The interesting thing this time is that both servers are getting timeout notifications. I'm not sure what that means. We should probably do a test to push all traffic to the backup server and invalidate cache to see what sort of load it gets and whether it can handle it. It'd be surprising to me if it can't. This might be deeper, or could be a CF issue.
And having just written that, it's flapping again right now. I'm on the main server and it's pretty sluggish, load <3 though. What I can see happening in parallel is an

I've reniced the

Another thing I'm noticing as I tail the logs is that the Cloudflare-Traffic-Manager is getting really chatty. Each of their edge locations pings both servers to check their health, which means that as Cloudflare expands their network, and we maintain a fairly frequent health check, our servers are serving quite a bit of traffic even at idle. Nowhere near their capacity, but it does mean background traffic is quite high, and then cache invalidation shoots it right up.

So, I'm going to try something. Currently our health check is: every 30 seconds, timeout after 4 seconds, retry 3 times. I'm bumping it to: every 60 seconds, timeout after 8 seconds, retry 2 times. Anyone got a better suggestion?
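One way to sanity-check how chatty the health checks really are is to count them in the access log; a rough sketch, where the log path and the combined log format are assumptions, and the "Cloudflare-Traffic-Manager" user-agent substring is taken from the comment above:

```bash
LOG=/var/log/nginx/access.log   # assumed path

# Share of requests coming from the Cloudflare health checker.
total=$(wc -l < "$LOG")
checks=$(grep -c 'Cloudflare-Traffic-Manager' "$LOG")
echo "health checks: $checks of $total requests"

# Health-check requests per minute, bucketed on the [day/month/year:HH:MM]
# part of the timestamp.
grep 'Cloudflare-Traffic-Manager' "$LOG" \
  | awk -F'[][]' '{ print substr($2, 1, 17) }' \
  | sort | uniq -c | tail
```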
I seem to have run into this a few minutes ago. It's working again now.
Same; it was happening a bunch for me within the last hour.
unable to get to any of the documentation now
Seems to be resolved now, if having a comment like this with a timestamp helps anyone look through relevant logs or whatever.
@Trott a few links are still 500ing for me
It's interesting we did not get reports after 18.x was launched. I'd expect if it was driven by downloads we might have seen something yesterday.
It definitely seems more driven by the build/release process itself than by downloads later.
Problem still exists, downloading is too slow:
Hi. I'm getting 500s on https://nodejs.org/dist/v16.13.2/node-v16.13.2-win-x64.7z
We're seeing 500s on https://nodejs.org/dist/v18.17.1/node-v18.17.1-darwin-arm64.tar.gz
Seen on an Azure DevOps agent:
gyp ERR! stack Error: 500 status code downloading checksum
Please reopen as it still happens multiple times a day!
Same, still happening
We were facing this as well for quite some time. The only way we found to mitigate it is to use the cached Node.js that ships with the container :-/ Not the best thing in the world, but it is what it is. microsoft/fluentui#29552
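For CI that can't fall back to a bundled Node.js, a hedged workaround sketch: retry the download instead of failing on the first 500, then verify the checksum. It assumes curl 7.71+ (for --retry-all-errors), and the version and platform in the URLs are only examples:

```bash
VERSION=v16.16.0                      # example version
FILE=node-$VERSION-darwin-x64.tar.gz  # example platform

# Retry transient failures (including HTTP 500s) a few times before giving up.
curl -fLO --retry 5 --retry-delay 10 --retry-all-errors \
  "https://nodejs.org/dist/$VERSION/$FILE"
curl -fLO --retry 5 --retry-delay 10 --retry-all-errors \
  "https://nodejs.org/dist/$VERSION/SHASUMS256.txt"

# Verify the tarball against the published checksums (use sha256sum -c on Linux).
grep " $FILE\$" SHASUMS256.txt | shasum -a 256 -c -
```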
Have been facing this intermittently too: Downloading: https://nodejs.org/dist/v16.16.0/node-v16.16.0-win-x64.7z
and happening again :( Downloading: https://nodejs.org/dist/v16.16.0/node-v16.16.0-darwin-x64.tar.gz
This was happening continually today at https://nodejs.org/dist/v20.8.1/node-v20.8.1-darwin-arm64.tar.gz Any update on why this is closed?
This is closed because it's a known issue, and we're working on it.
I've been getting this on GitHub Actions (
I think it would better communicate your intent if you closed this issue after you have solved it.
Edited by the Node.js Website Team
Learn more about this incident at https://nodejs.org/en/blog/announcements/node-js-march-17-incident
tl;dr: The Node.js website team is aware of ongoing issues with intermittent download instability. More details: nodejs/build#1993 (comment)
Original Issue Below
When trying to get files off of nodejs.org/dist/... or nodejs.org/download/..., I get a server error (error page served by nginx).
Full error message page (HTML snippet)
Browsing around the dirs, like https://nodejs.org/dist/latest-v16.x/, seems to work. Also, downloading really small files such as https://nodejs.org/dist/latest-v16.x/SHASUMS256.txt seems to work sporadically, whereas downloading tarballs doesn't seem to work.
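The sporadic pattern described above (small files mostly fine, tarballs failing) can be observed with a quick loop; a sketch, where the tarball URL is just an example:

```bash
small=https://nodejs.org/dist/latest-v16.x/SHASUMS256.txt
big=https://nodejs.org/dist/v16.13.2/node-v16.13.2-linux-x64.tar.gz  # example tarball

# Hit each URL a few times and log the status code; a mix of 200s and 500s for
# the same URL points at something intermittent behind the CDN.
for url in "$small" "$big"; do
  for attempt in 1 2 3 4 5; do
    code=$(curl -o /dev/null -sS -w '%{http_code}' "$url")
    echo "$(date -u +%H:%M:%S)  $code  $url"
  done
done
```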
Given that the outage seems sporadic: Maybe it's a resource exhaustion issue over at the server? Running out of RAM or something?? I don't know.
Edit to add: The error message page seems to be served by Cloudflare (according to the `server: cloudflare` response header, when looking in browser dev tools). So I guess this is a Cloudflare issue? Actually that's probably not what that means.