circleci: Change SDK download host to cdn.openwrt.org #10560

Merged
merged 1 commit into from Nov 18, 2019


jefferyto
Member

Maintainer: @champtar @thess
Compile tested: N/A
Run tested: N/A

Description:
This also switches from rsync to curl to download the SDK archive.

Fixes #10358.

Signed-off-by: Jeffery To <jeffery.to@gmail.com>

@diizzyy
Contributor

diizzyy commented Nov 16, 2019

NAK on changing it to the CDN, as it adds unnecessary complexity. If we want to offload the main site, just switch to a mirror instead, such as https://ftp.snt.utwente.nl/pub/software/lede/ (there are plenty to choose from).

@jow-
Contributor

jow- commented Nov 16, 2019

ACK. Since we have direct control over the CDN (wrt. cache purges etc.), using it is preferable over 3rd party mirrors. Also in case it ever goes out of service we can simply move the cdn CNAME back to downloads.openwrt.org.

@diizzyy
Contributor

diizzyy commented Nov 16, 2019

...except that the CDN seems to be much slower which kinda defeats the point of using it in the first place?

Test run on 2 different ISPs (the two peer differently, so they're not using the same peering networks)...

curl -O https://ftp.snt.utwente.nl/pub/software/lede/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  8645k      0  0:00:12  0:00:12 --:--:-- 10.8M
curl -O https://cdn.openwrt.org/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  3825k      0  0:00:27  0:00:27 --:--:-- 3853k


curl -O https://ftp.snt.utwente.nl/pub/software/lede/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  10.6M      0  0:00:09  0:00:09 --:--:-- 12.9M
curl -O https://cdn.openwrt.org/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  3713k      0  0:00:28  0:00:28 --:--:-- 3793k
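To put a number on "much slower", the average-speed column of the runs above can be compared directly. The following is only a rough sketch: it assumes curl's progress meter uses 1024-based `k`/`M`/`G` suffixes, and the helper names are made up for illustration.

```python
# Multipliers for curl progress-meter suffixes (assumed to be 1024-based).
UNITS = {"k": 1024, "M": 1024**2, "G": 1024**3}


def curl_speed_to_bytes(text):
    """Convert a speed such as '8645k' or '10.6M' to bytes per second."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def speedup(fast, slow):
    """Return how many times faster `fast` is than `slow`."""
    return curl_speed_to_bytes(fast) / curl_speed_to_bytes(slow)


# Average download speeds (mirror, CDN) from the two ISP runs above.
for mirror, cdn in [("8645k", "3825k"), ("10.6M", "3713k")]:
    print(f"mirror is {speedup(mirror, cdn):.1f}x faster than the CDN")
```

By this reading the mirror was between two and three times faster than the CDN from both ISPs, which only describes these two vantage points and nothing more.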

@jow-
Contributor

jow- commented Nov 16, 2019

The point of using it in the first place is to ensure good average global reachability. Trading a single European PoP for another is not going to achieve that.

Test run from a Spanish EDIS VPS:

jow@srv:/tmp$ curl -O https://cdn.openwrt.org/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  11.1M      0  0:00:09  0:00:09 --:--:-- 13.6M
jow@srv:/tmp$ curl -O https://downloads.openwrt.org/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  4099k      0  0:00:25  0:00:25 --:--:-- 2625k
jow@srv:/tmp$ curl -O https://ftp.snt.utwente.nl/pub/software/lede/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  1163k      0  0:01:31  0:01:31 --:--:-- 2091k
jow@srv:/tmp$ 

Test run from a DigitalOcean VM in San Francisco:

jow@vpn-us-01:/tmp$ curl -O https://ftp.snt.utwente.nl/pub/software/lede/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  5975k      0  0:00:17  0:00:17 --:--:-- 7634k
jow@vpn-us-01:/tmp$ curl -O https://downloads.openwrt.org/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0  13.1M      0  0:00:07  0:00:07 --:--:-- 17.5M
jow@vpn-us-01:/tmp$ curl -O https://cdn.openwrt.org/snapshots/targets/x86/64/openwrt-sdk-x86-64_gcc-8.3.0_musl.Linux-x86_64.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103M  100  103M    0     0   251M      0 --:--:-- --:--:-- --:--:--  251M
jow@vpn-us-01:/tmp$ 

@diizzyy
Contributor

diizzyy commented Nov 16, 2019

That doesn't seem to be the case? I poked the CDN from over the pond (US and Asia) and it points back to some rather small network(?) called proinity GmbH (AS44239) in Germany which also seems to operate in Zürich and M247 (AS9009) in Sweden. Anyhow, none are very good options in that regard as they're all based in Europe while Travis CI seems to run off GCE (Google Compute Engine) in US (https://docs.travis-ci.com/user/ip-addresses/).

@jow-
Contributor

jow- commented Nov 16, 2019

Regardless of where it points to, I consistently get better throughput there from the machines I do have access to.

@jefferyto
Member Author

This change is for CircleCI, not Travis (though CircleCI builds appear to happen mostly in the US).

The goal of this change isn't for faster download/CI speeds, it's to reduce bandwidth usage on downloads.openwrt.org.

@diizzyy
Contributor

diizzyy commented Nov 16, 2019

@jefferyto
True my bad, CircleCI seems to also use GCE so it's the same in the end more or less.

Worth mentioning that a previous suggestion to move it off the primary download site didn't get accepted back in 2017.

@champtar
Member

ACK
@diizzyy a mirror can have a big lag; I expect the CDN to have almost none (but I have no idea).
Also, I don't see how using a proper CDN adds more complexity than using a mirror.
In case of issues it's not hard to revert.

@neheb
Contributor

neheb commented Nov 18, 2019

Merging. This is for master. If something breaks we can always revert.

@neheb neheb merged commit a8f863f into openwrt:master Nov 18, 2019
@jefferyto jefferyto deleted the circleci-cdn branch November 18, 2019 12:39
@jefferyto
Member Author

CI checks have been failing intermittently ☹️ :

There are also successful runs, for example:
https://circleci.com/gh/openwrt/packages/6169
https://circleci.com/gh/openwrt/packages/6167
https://circleci.com/gh/openwrt/packages/6164

@aparcar Any idea what's going on?

(Unrelated? - there is also a build that has been running for over 60 hours: https://circleci.com/gh/openwrt/packages/6146)

@jow-
Contributor

jow- commented Nov 21, 2019

@jefferyto - I think we need to switch to path versioning to tackle this issue. The CDN right now operates in caching reverse proxy mode which can lead to situations where e.g. the sha256sum, signature and sdk archive files are cached at different points in time, leading to the observed inconsistencies.

I suppose moving the snapshot builds from something like snapshots/targets/$target/$subtarget to snapshots/targets/$revision/$target/$subtarget (with an appropriate HTTP redirect in place) would solve that problem.
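A minimal sketch of what the proposed layout could look like from the CI side. The function names and the revision string used in the example are hypothetical; only the two path shapes come from the comment above.

```python
def snapshot_dir(target, subtarget, revision=None):
    """Build the snapshot directory path, optionally pinned to a revision."""
    parts = ["snapshots", "targets"]
    if revision is not None:
        # Versioned layout: snapshots/targets/$revision/$target/$subtarget
        parts.append(revision)
    parts += [target, subtarget]
    return "/".join(parts)


def snapshot_url(host, filename, target, subtarget, revision=None):
    """Full URL of one file inside a snapshot directory."""
    return f"https://{host}/{snapshot_dir(target, subtarget, revision)}/{filename}"


# Current layout versus a versioned one ("r0-example" is a made-up revision).
print(snapshot_url("cdn.openwrt.org", "sha256sums", "ath79", "generic"))
print(snapshot_url("cdn.openwrt.org", "sha256sums", "ath79", "generic", "r0-example"))
```

With the revision in the path, a given URL only ever names the files of one build, so a cache cannot serve a mix of old and new files under the same address.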

@jefferyto
Member Author

@jow- I think I understand the caching situation, but the files are updated only once a day, so wouldn't there be only one time window each day where the files can be out of sync with each other? Also, the files are requested quickly one after another, so wouldn't it be more likely that all of the files are yesterday's version (but still in sync with each other), rather than some being yesterday's version and others today's? (Or does this have to do with different caching endpoints having different versions?)

I don't have a strong opinion on path versioning. (With an HTTP redirect, the CI build doesn't need to figure out what the latest revision is.)

@jow-
Contributor

jow- commented Nov 21, 2019

@jefferyto - my assumption is that, due to the anycast nature of the IPs, different requests end up on different cache servers with different versions. But maybe the problem is at the source... it would be interesting to know if downloads.openwrt.org had the same problem at the same time.
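The assumption can be illustrated with a toy model. This is a sketch only: the node class, the TTL, and the timeline are invented for illustration and say nothing about how the real CDN is built.

```python
from dataclasses import dataclass, field


@dataclass
class CacheNode:
    """A caching reverse proxy that keeps each path for `ttl` seconds."""
    ttl: int
    store: dict = field(default_factory=dict)  # path -> (fetched_at, build)

    def get(self, path, now, origin_build):
        entry = self.store.get(path)
        if entry is None or now - entry[0] >= self.ttl:
            # Miss or expired: refetch whatever the origin has right now.
            entry = (now, origin_build)
            self.store[path] = entry
        return entry[1]


def origin_build(now):
    """Hypothetical origin: build 1 until t=1000 s, build 2 afterwards."""
    return 1 if now < 1000 else 2


def simulate():
    node_a, node_b = CacheNode(ttl=3600), CacheNode(ttl=3600)
    # An earlier job warms node A with sha256sums from build 1.
    node_a.get("sha256sums", 900, origin_build(900))
    # After the rebuild, one job's two requests land on different nodes.
    sums = node_a.get("sha256sums", 1100, origin_build(1100))
    sig = node_b.get("sha256sums.sig", 1100, origin_build(1100))
    return sums, sig


print(simulate())
```

In this model the job receives the checksum list from build 1 and the signature from build 2, even though both requests were made at the same moment. The same mismatch appears on a single node whenever the two files were cached at different times.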

@neheb
Contributor

neheb commented Nov 21, 2019

ja-pa's branch is a little behind master and does not have this change.

This should probably be reverted until there's a fix.

@jefferyto
Member Author

@neheb I think ja-pa's branch does have this: https://github.com/ja-pa/packages/commits/bind-9.14.8/.circleci/config.yml

I don't want to block CI checks but it may be hard to figure out what the issue is with this reverted.

@jefferyto
Member Author

I suppose we can change it back to downloads.openwrt.org to see if the issue continues to exist.

@diizzyy
Contributor

diizzyy commented Nov 21, 2019

@champtar
Which mirrors have a big lag?

Anyhow, just back this out as it's breaking things

@champtar
Member

@diizzyy "big" might be a bit strong, but mirrors should only sync every 12 to 24h

Looking at headers of downloads.openwrt.org

$ curl https://downloads.openwrt.org/snapshots/targets/ar71xx/generic/sha256sums -vso /dev/null 2>&1 | grep '^<'
< HTTP/1.1 200 OK
< Server: nginx/1.10.3
< Date: Thu, 21 Nov 2019 19:31:06 GMT
< Content-Type: text/plain
< Content-Length: 149801
< Last-Modified: Mon, 17 Jun 2019 12:17:31 GMT
< Connection: keep-alive
< ETag: "5d0784db-24929"
< Access-Control-Allow-Methods: GET, POST, OPTIONS
< Access-Control-Allow-Headers: DNT, X-CustomHeader, Keep-Alive, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Content-Range, Range
< Accept-Ranges: bytes

The origin sends everything the CDN needs (ETag, Last-Modified) to do periodic cache revalidation.
Is it possible to configure that?
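For reference, revalidation in HTTP is a conditional request built from the validators of the cached response. The sketch below shows the general mechanism only; the helper names are made up, and it is not a description of what keycdn does internally.

```python
def revalidation_headers(cached_headers):
    """Build conditional-request headers from a cached response's validators."""
    conditional = {}
    if "ETag" in cached_headers:
        conditional["If-None-Match"] = cached_headers["ETag"]
    if "Last-Modified" in cached_headers:
        conditional["If-Modified-Since"] = cached_headers["Last-Modified"]
    return conditional


def apply_revalidation(status, cached_body, response_body):
    """Pick the body to serve after the origin answered a conditional request."""
    if status == 304:
        # Not Modified: the cached copy is still current.
        return cached_body
    if status == 200:
        # The origin sent a replacement.
        return response_body
    raise ValueError(f"unexpected status {status}")


# Validators taken from the downloads.openwrt.org response above.
cached = {
    "ETag": '"5d0784db-24929"',
    "Last-Modified": "Mon, 17 Jun 2019 12:17:31 GMT",
}
print(revalidation_headers(cached))
```

A 304 answer carries no body, so revalidating on every request costs the origin a round trip but very little bandwidth.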

@aparcar
Member

aparcar commented Nov 21, 2019

Am I missing something or is the "last modified" header very wrong? I remember setting the CDN to not cache headers and to ask the main server each time; however, if that header is wrong, it's not really the CDN's fault...

@jefferyto
Member Author

From the download page (https://downloads.openwrt.org/snapshots/targets/ar71xx/generic/) it does look like the last time it was built was 17 Jun 2019.

@diizzyy
Contributor

diizzyy commented Nov 21, 2019

@champtar
I honestly don't see why that would be critical but oh well

@champtar
Member

champtar commented Nov 21, 2019

@diizzyy I'm not saying it's critical at all; my point is "if we can have no lag and reduce usage of downloads.openwrt.org, why not?"
Now if we can't make this CDN work properly, mirrors can be a good option.

@jefferyto
Member Author

I finally managed to catch a build failure: https://circleci.com/gh/openwrt/packages/6199

The files do appear to be out-of-sync with each other:

> GET /snapshots/targets/ath79/generic/sha256sums HTTP/1.1
> Host: cdn.openwrt.org
> User-Agent: curl/7.52.1
> Accept: */*
> 
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
< HTTP/2 200 
< server: keycdn-engine
< date: Fri, 22 Nov 2019 21:18:35 GMT
< content-type: text/plain
< content-length: 166518
< vary: Accept-Encoding
< last-modified: Thu, 21 Nov 2019 06:38:06 GMT
< etag: "5dd630ce-28a76"
< access-control-allow-methods: GET, POST, OPTIONS
< access-control-allow-headers: DNT, X-CustomHeader, Keep-Alive, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Content-Range, Range
< expires: Fri, 22 Nov 2019 22:18:35 GMT
< cache-control: max-age=3600
< link: <https://downloads.openwrt.org/snapshots/targets/ath79/generic/sha256sums>; rel="canonical"
< x-cache: HIT
< x-edge-location: usny
< access-control-allow-origin: *
< accept-ranges: bytes
< 
...
> GET /snapshots/targets/ath79/generic/sha256sums.asc HTTP/1.1
> Host: cdn.openwrt.org
> User-Agent: curl/7.52.1
> Accept: */*
> 
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
< HTTP/2 200 
< server: keycdn-engine
< date: Fri, 22 Nov 2019 21:18:36 GMT
< content-type: text/plain
< content-length: 877
< last-modified: Fri, 22 Nov 2019 12:54:46 GMT
< etag: "5dd7da96-36d"
< access-control-allow-methods: GET, POST, OPTIONS
< access-control-allow-headers: DNT, X-CustomHeader, Keep-Alive, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Content-Range, Range
< expires: Fri, 22 Nov 2019 22:18:36 GMT
< cache-control: max-age=3600
< link: <https://downloads.openwrt.org/snapshots/targets/ath79/generic/sha256sums.asc>; rel="canonical"
< x-cache: HIT
< x-edge-location: usny
< access-control-allow-origin: *
< accept-ranges: bytes
< 
...
> GET /snapshots/targets/ath79/generic/sha256sums.sig HTTP/1.1
> Host: cdn.openwrt.org
> User-Agent: curl/7.52.1
> Accept: */*
> 
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
< HTTP/2 200 
< server: keycdn-engine
< date: Fri, 22 Nov 2019 21:18:36 GMT
< content-type: application/octet-stream
< content-length: 170
< last-modified: Thu, 21 Nov 2019 06:39:13 GMT
< etag: "5dd63111-aa"
< access-control-allow-methods: GET, POST, OPTIONS
< access-control-allow-headers: DNT, X-CustomHeader, Keep-Alive, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Content-Range, Range
< expires: Fri, 22 Nov 2019 22:18:36 GMT
< cache-control: max-age=3600
< link: <https://downloads.openwrt.org/snapshots/targets/ath79/generic/sha256sums.sig>; rel="canonical"
< x-cache: STALE
< x-edge-location: usny
< access-control-allow-origin: *
< accept-ranges: bytes
< 

From the download page right now:

Filename sha256sum File Size Date
sha256sums - 163.6 KB Fri Nov 22 13:53:41 2019
sha256sums.asc - 0.9 KB Fri Nov 22 13:54:46 2019
sha256sums.sig - 0.2 KB Fri Nov 22 13:54:46 2019

I wonder if the CDN is invalidating its cache properly?
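One way to quantify the mismatch is to compare the `last-modified` values of the three responses. The sketch below uses the header values from the log above; the one-hour tolerance is an arbitrary assumption, and the function names are made up.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def last_modified_spread(headers_by_file):
    """Time between the oldest and the newest Last-Modified value."""
    times = [parsedate_to_datetime(value) for value in headers_by_file.values()]
    return max(times) - min(times)


def looks_consistent(headers_by_file, tolerance=timedelta(hours=1)):
    """Heuristic: files of one build should be modified close together."""
    return last_modified_spread(headers_by_file) <= tolerance


# last-modified headers returned by cdn.openwrt.org in the failed build.
observed = {
    "sha256sums": "Thu, 21 Nov 2019 06:38:06 GMT",
    "sha256sums.asc": "Fri, 22 Nov 2019 12:54:46 GMT",
    "sha256sums.sig": "Thu, 21 Nov 2019 06:39:13 GMT",
}
spread = last_modified_spread(observed)
print(spread, looks_consistent(observed))
```

The three cached files are about 30 hours apart, while the same files on the download page differ by roughly a minute, so the checksum list and one of its signatures came from different builds.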

diizzyy added a commit to diizzyy/packages that referenced this pull request Nov 22, 2019
This reverts commit 27fdddf due to it causing random failures.
Change agreed on here: openwrt#10560

Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
@champtar
Member

@jefferyto I see cache-control: max-age=3600, so I think keycdn caches for 1h by default if there are no Cache-Control headers.

What config did you use to make sure it always revalidates?
Maybe we need something like Cache-Control: no-cache to really force revalidation every time:
https://www.keycdn.com/blog/http-cache-headers

The best solution is @jow-'s revision in the URL, but that needs more work.
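To make the difference between the two directives concrete, here is a simplified sketch of the decision a cache takes. It deliberately ignores `s-maxage`, `must-revalidate`, heuristic freshness, and the field-name form of `no-cache`, and the function names are made up.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_serve_without_revalidation(cache_control, age_seconds):
    """True if a stored response may be reused without asking the origin."""
    directives = parse_cache_control(cache_control)
    if "no-cache" in directives or "no-store" in directives:
        return False
    max_age = directives.get("max-age")
    return max_age is not None and age_seconds < int(max_age)


# A 30-minute-old entry under the observed header versus under no-cache.
print(may_serve_without_revalidation("max-age=3600", 1800))
print(may_serve_without_revalidation("no-cache", 1800))
```

Under `max-age=3600` each file can be served for up to an hour without contacting the origin, and each file's hour starts independently. That is enough for three related files to drift apart.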

@jefferyto
Member Author

@champtar I didn't set up the CDN - I believe @aparcar knows more about this.

Curious how all three requests went to x-edge-location: usny and yet the results can be so different.

I'm not against adding versioning to the path - I just hope to find the true cause of this issue (maybe it is because of different cache servers with different versions) before jumping to a solution.

@BKPepe
Member

BKPepe commented Nov 24, 2019

As @aparcar was notified, I think we should notify @ynezz to let him know about this.

@ynezz
Member

ynezz commented Nov 24, 2019

> Curious how all three requests went to x-edge-location: usny and yet the results can be so different.

@jefferyto I can see two different IPs .2 and .3 in that log, so probably different machines/cluster? Anyway, thanks a lot for tracking this down!

Putting this issue aside, I'm wondering if there is any particular reason for downloading the SDK from scratch every time.

@diizzyy
Contributor

diizzyy commented Nov 24, 2019

I think the idea is to catch breakage as soon as possible and avoid manual intervention (respinning) as much as possible. I don't think you'd save much setting up some kind of bsdiff-ish solution or did you have something else in mind?

@jefferyto
Member Author

@ynezz Presumably because the snapshot SDK is updated every day. (Also allows for different branches to download different SDKs.)

@aparcar has a PR to replace this with an SDK Docker image that is rebuilt every day (#9434) but there hasn't been much progress lately.

@aparcar
Member

aparcar commented Nov 24, 2019

Here are the current settings of keycdn; if anyone has an idea where the problem is, please let me know! I guess the CDN is not really suitable for snapshots due to their short-lived nature; maybe my idea was bad from the beginning. If we were testing the release SDK, it would seem reasonable.
(image: screenshot of the keycdn settings)

@jefferyto my favorite solution would be to use the Docker containers, as they are generated anyway. Docker's CDN would handle the bandwidth, so we'd achieve the same thing after all.
I'll rework the PR

@ynezz
Member

ynezz commented Nov 24, 2019

> Presumably because the snapshot SDK is updated every day. (Also allows for different branches to download different SDKs.)

You can have any variable you want in the cached dirname, so it works fine with multiple branches and other build keys/variables. One could simply expire the cache after a fixed timeout (12-24h for master; it could be much higher for releases, where you don't introduce breaking changes that often :)) and still be fine in 99.9% of use cases. Or, if one really needs a fresh SDK, just download sha256sums every time, check for a checksum change, kill the cache on change and download a fresh SDK.
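The checksum-driven variant could be sketched as below. It assumes the sha256sums file uses the usual `sha256sum` line format (`<hash> *<filename>`), and the helper names are made up; fetching the file and handling the CI cache are left out.

```python
def checksum_for(sha256sums_text, filename):
    """Return the checksum listed for `filename`, or None if it is absent."""
    for line in sha256sums_text.splitlines():
        fields = line.split(maxsplit=1)
        # A leading '*' marks binary mode in sha256sum output.
        if len(fields) == 2 and fields[1].strip().lstrip("*") == filename:
            return fields[0]
    return None


def sdk_cache_is_fresh(sha256sums_text, filename, cached_checksum):
    """True if the cached SDK still matches the published checksum."""
    current = checksum_for(sha256sums_text, filename)
    return current is not None and current == cached_checksum


# Made-up listing; real checksums are 64 hex characters.
listing = "aaa111 *example-sdk.tar.xz\nbbb222 *example-image.bin\n"
print(sdk_cache_is_fresh(listing, "example-sdk.tar.xz", "aaa111"))
print(sdk_cache_is_fresh(listing, "example-sdk.tar.xz", "stale0"))
```

When the function returns False, the job would drop the cached SDK and download a fresh archive; otherwise only the small sha256sums file is transferred.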

> to replace this with an SDK Docker image

It makes sense to me. With that image there is no need for the SDK download step anymore. As a bonus, the OpenWrt Docker images would then get more visibility and testing (dogfooding is always good), and thus likely get more fixes and improvements.

> Dockers CDN would handle the bandwidth, therefore same achievement after all.

I have mixed feelings about that; maybe it's because of the registry proxy used by the CI platforms? I find it much faster to use a local CI registry (I don't know if CircleCI has such a thing) than the Docker registry. Anyway, this is the cloud, so unpredictable behavior every day by definition...

@aparcar
Member

aparcar commented Nov 24, 2019

I updated PR #9434 and added a random PR to test it. It seems to work; does anyone dare to merge?

@champtar
Member

@aparcar maybe set "Expire (in minutes)" to "-1" and "Ignore cache control" to "disable"
Do you have SSH access to downloads.openwrt.org? A quick tcpdump would make it much easier to see whether we always revalidate.

I just launched a simple script to see how bad the issue is

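#!/usr/bin/env python3
# Poll the CDN and the origin once a minute and compare their ETags.

import time

import requests

path = 'https://%s.openwrt.org/snapshots/targets/ath79/generic/sha256sums'

def strong_etag(response):
    # Drop a weak-validator prefix so that W/"x" and "x" compare equal.
    etag = response.headers['ETag']
    return etag[2:] if etag.startswith('W/') else etag

while True:
    cdn = requests.head(path % 'cdn', timeout=30)
    down = requests.head(path % 'downloads', timeout=30)
    fresh = strong_etag(cdn) == strong_etag(down)
    print('## CDN content is', 'up to date' if fresh else 'stale')
    print(cdn.headers)
    print(down.headers, flush=True)
    time.sleep(60)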
