circleci: Change SDK download host to cdn.openwrt.org #10560
Conversation
This also switches from rsync to curl to download the SDK archive. Fixes openwrt#10358. Signed-off-by: Jeffery To <jeffery.to@gmail.com>
NAK on changing it to the CDN, as it adds unnecessary complexity. If we want to offload the main site, just switch to a mirror instead, such as https://ftp.snt.utwente.nl/pub/software/lede/ (there are plenty to choose from).
ACK. Since we have direct control over the CDN (w.r.t. cache purges etc.), using it is preferable to 3rd-party mirrors. Also, if it ever goes out of service we can simply move the cdn CNAME back to downloads.openwrt.org.
...except that the CDN seems to be much slower, which kinda defeats the point of using it in the first place? Test run on 2 different ISPs (both peered differently, so they're not using the same peering networks)...
The point of using it in the first place is to ensure good average global reachability. Trading a single European PoP for another is not going to achieve that. Test run from a Spanish EDIS VPS:
Test run from a DigitalOcean VM in San Francisco:
That doesn't seem to be the case? I poked the CDN from over the pond (US and Asia) and it points back to a rather small network(?) called proinity GmbH (AS44239) in Germany, which also seems to operate in Zürich, and M247 (AS9009) in Sweden. Anyhow, none are very good options in that regard, as they're all based in Europe while Travis CI seems to run off GCE (Google Compute Engine) in the US (https://docs.travis-ci.com/user/ip-addresses/).
Regardless of where it points to, I consistently get better throughput there from the machines I have access to.
This change is for CircleCI, not Travis (though CircleCI builds appear to happen mostly in the US). The goal of this change isn't faster download/CI speeds; it's to reduce bandwidth usage on downloads.openwrt.org.
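A minimal sketch of what the switch looks like, assuming a POSIX shell CI step (this is not the actual `.circleci/config.yml` change; the host, path, and helper names are illustrative):

```shell
#!/bin/sh
# Sketch: fetch the SDK over HTTPS with curl instead of rsync.
# Host/path below are illustrative assumptions, not the real config.

# Pick the SDK archive name out of the published sha256sums file.
sdk_name_from_sums() {
    grep -o 'openwrt-sdk-[^ *]*' "$1" | head -n 1
}

# Download the checksum list, then the archive it names, and verify
# them against each other before unpacking (network access required).
fetch_sdk() {
    host="$1" path="$2"
    curl -fsSL "https://$host/$path/sha256sums" -o sha256sums
    sdk="$(sdk_name_from_sums sha256sums)"
    curl -fsSL "https://$host/$path/$sdk" -o "$sdk"
    grep " \*$sdk\$" sha256sums | sha256sum -c -
}

# Example: fetch_sdk cdn.openwrt.org snapshots/targets/x86/64
```

The checksum verification step is what later surfaces the cache-consistency problem discussed below in this thread.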
@jefferyto Worth mentioning is that a previous suggestion to move it off the primary download site didn't get accepted back in 2017.
ACK
Merging. This is for master. If something breaks we can always revert. |
CI checks have been failing intermittently
There are also successful runs, for example: @aparcar Any idea what's going on? (Unrelated? There is also a build that has been running for over 60 hours: https://circleci.com/gh/openwrt/packages/6146)
@jefferyto - I think we need to switch to path versioning to tackle this issue. The CDN right now operates in caching reverse-proxy mode, which can lead to situations where e.g. the sha256sums, signature, and SDK archive files are cached at different points in time, leading to the observed inconsistencies. I suppose moving the snapshot builds from something like
@jow- I think I understand the caching situation, but the files are updated only once a day, so wouldn't there be only one time window each day where the files can be out of sync with each other? Also, the files are requested quickly one after another, so wouldn't it be more likely that all of the files are yesterday's version (but still in sync with each other), rather than some being yesterday's version and others today's? (Or does this have to do with different caching endpoints having different versions?) I don't have a strong opinion on path versioning. (With an HTTP redirect, the CI build wouldn't need to figure out the latest revision.)
@jefferyto - my assumption is that due to the anycast nature of the IPs, different requests end up on different cache servers with different versions. But maybe the problem is at the source... it would be interesting to know if downloads.openwrt.org had the same problem at the same time.
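The path-versioning plus HTTP-redirect idea floated above can be sketched as follows. Note this URL layout does not exist on downloads.openwrt.org; it is a purely hypothetical illustration of how a CI job could pin one consistent revision:

```shell
#!/bin/sh
# Hypothetical: if snapshots were published under revision-stamped
# directories and .../latest/ redirected to the newest one, a CI job
# could resolve the redirect once and fetch sha256sums, signature,
# and SDK archive from the same revision directory.

# Extract the target of an HTTP redirect from response headers on stdin.
location_from_headers() {
    awk 'tolower($1) == "location:" { print $2 }' | tr -d '\r'
}

# Example (against the hypothetical layout, network access required):
#   rev_url="$(curl -fsSI "https://cdn.openwrt.org/snapshots/targets/x86/64/latest/" \
#              | location_from_headers)"
#   curl -fsSL "$rev_url/sha256sums"   # everything then comes from one revision
```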
Something weird is happening... CI builds are failing multiple times for the same package:
Hours have passed between each run. Yet in the middle of this, https://circleci.com/gh/openwrt/packages/6172 succeeded.
ja-pa's branch is a little behind master and does not have this change. This should probably be reverted until there's a fix.
@neheb I think ja-pa's branch does have this: https://github.com/ja-pa/packages/commits/bind-9.14.8/.circleci/config.yml I don't want to block CI checks, but it may be hard to figure out what the issue is with this reverted.
I suppose we can change it back to downloads.openwrt.org to see if the issue continues to exist. |
@champtar Anyhow, just back this out as it's breaking things.
@diizzyy "big" might be a bit strong, but mirrors should only sync every 12 to 24h. Looking at the headers of downloads.openwrt.org, there is everything the CDN needs to be able to do periodic cache revalidation.
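For anyone wanting to reproduce this check, here is one way to inspect the validators on both hosts (a sketch; it just filters standard HTTP caching headers out of a response):

```shell
#!/bin/sh
# Keep only the caching-relevant headers from an HTTP response on stdin.
cache_headers() {
    grep -iE '^(last-modified|etag|cache-control|expires|age):'
}

# Example (network access required) - compare origin vs. CDN:
#   curl -fsSI "https://downloads.openwrt.org/snapshots/targets/x86/64/sha256sums" | cache_headers
#   curl -fsSI "https://cdn.openwrt.org/snapshots/targets/x86/64/sha256sums" | cache_headers
```

If `Last-Modified` or `ETag` are present on the origin, the CDN can revalidate with conditional requests instead of serving a possibly stale copy.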
Am I missing something, or is the "Last-Modified" header very wrong? I remember setting the CDN to not cache headers and to ask the main server each time; however, if that header is wrong it's not really the CDN's fault...
From the download page (https://downloads.openwrt.org/snapshots/targets/ar71xx/generic/) it does look like the last time it was built was 17 Jun 2019. |
@champtar
@diizzyy I'm not saying it's critical at all; my point is: "if we can have no lag and reduce usage of downloads.openwrt.org, why not?"
I finally managed to catch a build failure: https://circleci.com/gh/openwrt/packages/6199 The files do appear to be out-of-sync with each other:
From the download page right now:
I wonder if the CDN is invalidating its cache properly?
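The out-of-sync failure mode caught above can be detected explicitly in the CI step. A sketch of such a consistency check, assuming the `sha256sums` file and the archive were fetched in the same run (function and file names are illustrative):

```shell
#!/bin/sh
# Return 0 if the downloaded archive matches its entry in the sha256sums
# file fetched in the same run; non-zero if the two drifted apart
# (e.g. served from different cache generations).
archive_matches_sums() {
    file="$1" sums="$2"
    want="$(awk -v f="*$file" '$2 == f { print $1 }' "$sums")"
    have="$(sha256sum "$file" | awk '{ print $1 }')"
    [ -n "$want" ] && [ "$want" = "$have" ]
}

# Example: archive_matches_sums "$SDK_FILE" sha256sums || exit 1
```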
This reverts commit 27fdddf due to it causing random failures. Change agreed on here: openwrt#10560 Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
@jefferyto I see. What config did you use to make sure it always revalidates? The best solution is @jow-'s revision in the URL, but that needs more work.
@champtar I didn't set up the CDN - I believe @aparcar knows more about this. Curious how all three requests went to I'm not against adding versioning to the path - I just hope to find the true cause of this issue (maybe it is different cache servers with different versions) before jumping to a solution.
@jefferyto I can see two different IPs, .2 and .3, in that log, so probably different machines/clusters? Anyway, thanks a lot for tracking this down! Putting this issue aside, I'm wondering if there is any particular reason for downloading the SDK from scratch every time.
I think the idea is to catch breakage as soon as possible and avoid manual intervention (respinning) as much as possible. I don't think you'd save much by setting up some kind of bsdiff-ish solution, or did you have something else in mind?
Here are the current settings of KeyCDN; if anyone has an idea where the problem is, please let me know! I guess the CDN is not really suitable for snapshots due to their short-lived nature; maybe my idea was bad from the beginning. If we were testing the release SDK it would seem reasonable. @jefferyto my favorite solution would be to use the Docker containers, as they are generated anyway. Docker's CDN would handle the bandwidth, so the same goal is achieved after all.
You can have any variable you want in the cached dirname, so it works fine with multiple branches and other build keys/variables. One could simply expire the cache after a fixed timeout (12-24h for master; it could be much longer for releases, where you don't introduce breaking changes that often :)) and still be fine in 99.9% of use cases. Or if one really needs a fresh SDK, then just download the sha256sums file every time, check whether the checksum changed, invalidate the cache on change, and download a fresh SDK.
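The checksum-triggered invalidation described above could be sketched like this (a hedged illustration; the staleness rule and file paths are assumptions, not an existing CI config):

```shell
#!/bin/sh
# Reuse a cached SDK until the published sha256sums file changes,
# then refetch. Stale when no cached copy exists yet or the freshly
# fetched checksum list differs from the cached one.
sdk_cache_stale() {
    cached="$1" fresh="$2"
    [ ! -f "$cached" ] || ! cmp -s "$cached" "$fresh"
}

# Example (network access required):
#   curl -fsSL "$SDK_URL/sha256sums" -o fresh.sums
#   if sdk_cache_stale cache/sha256sums fresh.sums; then
#       # ...download the SDK, then promote the new checksum list:
#       mv fresh.sums cache/sha256sums
#   fi
```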
It makes sense to me. With that image there is no need for the SDK download step anymore. As a bonus, OpenWrt Docker images would then get more visibility and testing (dogfooding is always good), and thus likely more fixes and improvements.
I have mixed feelings about that; maybe it's because of the registry proxy used by the CI platforms? I find it much faster to use a local CI registry (I don't know if CircleCI has such a thing) than the Docker registry. Anyway, this is the cloud, so unpredictable behavior every day by definition...
I updated the PR #9434 and added a random PR to test it. Seems to work, anyone dare a merge?
@aparcar maybe set "Expire (in minutes)" to "-1" and "Ignore cache control" to "disable". I just launched a simple script to see how bad the issue is.
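The script itself wasn't posted; a hedged guess at its shape: periodically fetch `sha256sums` from the origin and from the CDN and log whenever they disagree (hosts, path, and interval below are assumptions):

```shell
#!/bin/sh
# Compare two sha256sum output strings, ignoring everything after
# the digest itself (e.g. the trailing " -" filename marker).
same_digest() {
    [ "${1%% *}" = "${2%% *}" ]
}

# One probe: hash the checksum list served by each host and compare.
probe_once() {
    p="$1"
    a="$(curl -fsSL "https://downloads.openwrt.org/$p/sha256sums" | sha256sum)"
    b="$(curl -fsSL "https://cdn.openwrt.org/$p/sha256sums" | sha256sum)"
    same_digest "$a" "$b" || echo "$(date -u) mismatch on $p"
}

# Example: while true; do probe_once snapshots/targets/x86/64; sleep 300; done
```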
Maintainer: @champtar @thess
Compile tested: N/A
Run tested: N/A
Description:
This also switches from rsync to curl to download the SDK archive.
Fixes #10358.
Signed-off-by: Jeffery To <jeffery.to@gmail.com>