build.koreader.rocks / ota.koreader.rocks down (Azure bandwidth issues) #10615

Hzj-jie opened this issue Jun 26, 2023 · 116 comments

Comments

@Hzj-jie
Contributor

Hzj-jie commented Jun 26, 2023

The server currently runs on a 1st gen VM on Azure; that VM type has been deprecated and will be removed around 2024.
Meanwhile, it runs Ubuntu 16, which should be upgraded.

@Hzj-jie
Contributor Author

Hzj-jie commented Jun 29, 2023

I have some time recently to work on it.
But before starting, do we still have a list of the binaries on the machine?

@Frenzie
Member

Frenzie commented Jun 29, 2023

How do you mean precisely? Like the list of apt installed packages and the Docker images?

Wrt the OS we shouldn't need much of anything other than Bash and Docker; all the important things are done in Docker.

It might be quicker to talk on Gitter or something btw, if you're on there?

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 1, 2023

Oh, that's great. Previously I thought we copied the binaries.
When would be a good time to do the upgrade? I haven't done it before, so I am not sure whether I can preserve the original IP.
I will do some homework first.

@Frenzie
Member

Frenzie commented Jul 1, 2023

There are a couple of things on there like ncdu that can be useful but I don't think a specific effort is needed there. We can just install it if or when we need it. What's important is ops and the user configs.

Pinging @houqp for the Cloudflare stuff.

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 1, 2023

I know progress sync and ota are on the machine, and progress sync has persistent data. Anything else?

@Frenzie
Member

Frenzie commented Jul 1, 2023

The configs and scripts, the signing key, I meant the ops thing quite literally (perhaps including the actual archives for ota, though that'd regenerate in time). Just everything that's in there.

What's in the cronfile is potentially also important, though it's probably just to run the cleanup script once or twice a day.

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 3, 2023

OK, let me first upgrade the machine type. It's currently running an A1_Basic, which will be deprecated in August 2024.
https://learn.microsoft.com/en-us/azure/virtual-machines/av1-series-retirement

VM size | Type | vCPUs | RAM (GiB) | Data disks | Max IOPS | Temp storage (GiB) | Premium disk | Cost/month
A1_Basic | General purpose | 1 | 1.75 | 2 | 2x500 | 40 | Not supported | $22.63
I.e. 1 vCPU 1.75G ram, 1000 IOPS

Likely the B1ms VM type is the best approach.
B1ms | General purpose | 1 | 2 | 2 | 640 | 4 | Supported | $18.10
I.e. 1 vCPU 2G ram and 640 IOPS

The only thing that matters is the temporary storage, 40GB vs 4GB; not sure if it's sufficient.

Or, if 1G of RAM is enough, we can try B1s and save more budget for the bandwidth. Azure charges for bandwidth separately, which is why my subscription ran out of budget.
B1s | General purpose | 1 | 1 | 2 | 320 | 4 | Supported | $9.05
I.e. 1G ram and 320 IOPS vs 2G ram and 640 IOPS.

It seems I lost access to the VM. Previously it used a user name and password, but now it accepts only SSH key pairs. Azure handles keys poorly, and I believe replacing the key would break access for you guys.
If possible, would you mind adding my public key so that I can log in and take a look?

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDfnbW0jcCTFJknPks6Lir9ZZfiX8By62414r0bvN4ciIQWleU147Ma4ZBrR5E7GV8IyX4zbLmldI00uKbFb9q2IpHN7ebNmKfIOnDnTFOuLMPjsAUHCl13yIr0yLlEWILu1Tni7w3oeNXGy7WK2oDJ5DgwASvoCrQ5GKg3SNnmBJBk/1EMWVQHZ+SralMx5Udz80ij1YutWV0S8kJH3YgHXE1G4SVmTq9oC7riMI5l1QWJgaKynrY2D171VRhIbafqLkR7SmcN1Vw23fnAEbIga94SBcXyl9tVG2r5rYUSGyHkvO+rjc+XHf701AudeG/+LQB2Uf5t90y8e1oV5IREeVz2BIYomQBNQjTyS+EoOB+ai/ponXwaeoVUxTdYRWQTzVtZ0ewPoLTheCK8qEDwm04al4xjWgfCAXzd+5ZKpS2NcgLi+RDL3bN5uExMPevOtkVhhr1BrdJFBz7LeB1X7cx34I1GxhuLfGMoh0oVYhmv8dC2RIoNCngRzFVTKq0= hzj-jie@hzj-jie-x1

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 3, 2023

By the way, changing the machine type ("size" in Azure terms) would preserve the data but restart the machine. Should we set up the services as systemd units first?

@Frenzie
Member

Frenzie commented Jul 3, 2023

I'm thinking by Docker container name, something like this?

[Unit]
Description=Docker nginx
Requires=docker.service
After=docker.service

[Service]
Restart=always
ExecStart=/usr/bin/docker start -a nginx
ExecStop=/usr/bin/docker stop -t 2 nginx

[Install]
WantedBy=multi-user.target
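
As a rough sketch of how the same unit could be stamped out for several containers, the snippet below renders the template above for a list of names. The container names other than nginx are assumptions (they mirror the service names enabled later in this thread), and `render_unit` is a hypothetical helper, not something that exists on the server.

```python
# Template mirroring the unit shown above; only the container name varies.
UNIT_TEMPLATE = """\
[Unit]
Description=Docker {name}
Requires=docker.service
After=docker.service

[Service]
Restart=always
ExecStart=/usr/bin/docker start -a {name}
ExecStop=/usr/bin/docker stop -t 2 {name}

[Install]
WantedBy=multi-user.target
"""


def render_unit(container_name: str) -> str:
    """Return the text of a systemd unit wrapping an existing container."""
    return UNIT_TEMPLATE.format(name=container_name)


# ASSUMPTION: these container names are illustrative only.
for name in ("nginx", "kosync", "nightswatcher"):
    print(f"# /etc/systemd/system/{name}.service")
    print(render_unit(name))
```

Note that `docker start -a` only works for a container that already exists, so a container created with `--rm` would be gone after the first stop.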

@Frenzie
Member

Frenzie commented Jul 3, 2023

> The only thing that matters is the temporary storage, 40GB vs 4GB; not sure if it's sufficient.

What does temporary storage mean exactly? Over in /mnt? That's not really used atm; actually I'm not sure I even knew that existed. I figure we can easily make do with 4 GB for some temp extraction/file manipulation provided the actually important permanent stuff is at least some 30 GB as it is now.

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 3, 2023

I don't know, but I can take a look once I can login to the machine.

@poire-z
Contributor

poire-z commented Jul 3, 2023

I've just added your key into your own authorized_keys (hoping you remember the spelling of your username :)
(Been years since I last logged into this server :)

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 6, 2023

I cleaned up some of my bad user names. If I accidentally broke your authentication, let me know.

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 6, 2023

So the machine has two disks. The root is sda1, persistent AFAICT, and it's ~30G; sdb is mounted at /mnt, temporary, and it's ~40G. So I think that's what "Temporary storage" refers to, but we never use it.

/dev/sdb1       41151808    49036  38989340   1% /mnt

Hzj_jie@hzj-jie-ubuntu:/$ ll /mnt/
total 28
drwxr-xr-x  3 root root  4096 Jun 26 02:24 ./
drwxr-xr-x 25 root root  4096 Jun 26 02:23 ../
-r--r--r--  1 root root   639 Jun 26 02:24 DATALOSS_WARNING_README.txt
drwx------  2 root root 16384 Jun 26 02:23 lost+found/

The sda1 in contrast is 98% full,

/dev/sda1       29711408 29068328    626696  98% /

Hzj_jie@hzj-jie-ubuntu:/$ du -h -d 1 2>/dev/null
44K     ./tmp
680K    ./run
4.0K    ./root
17M     ./bin
14M     ./sbin
35M     ./opt
0       ./dev
159M    ./boot
8.0K    ./snap
4.0K    ./lib64
952M    ./lib
7.7M    ./etc
0       ./proc
249M    ./home
17G     ./ops
4.0K    ./docker
16K     ./lost+found
1.3G    ./usr
0       ./sys
24K     ./mnt
4.0K    ./srv
4.0K    ./media
438M    ./var
20G     .

Pretty much everything is in /ops, or more specifically, /ops/prod/build/download/stable. Are we serving only the latest build from here via ota? If so we can delete the old stable versions then.
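
For anyone repeating this check, here is a small sketch that ranks the directories from `du -h -d 1` output like the listing above. The sample is a subset of that listing; `parse_size` and `largest` are hypothetical helper names, and the parser assumes the K/M/G/T suffixes that `du -h` prints (powers of 1024).

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a `du -h` size such as '17G' or '4.0K' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)  # plain number, e.g. '0'


def largest(du_output: str, top: int = 3):
    """Return the `top` biggest (bytes, path) rows, skipping the grand total."""
    rows = []
    for line in du_output.strip().splitlines():
        size, path = line.split(None, 1)
        if path != ".":
            rows.append((parse_size(size), path))
    return sorted(rows, reverse=True)[:top]


SAMPLE = """\
952M    ./lib
249M    ./home
17G     ./ops
1.3G    ./usr
438M    ./var
20G     .
"""

for size, path in largest(SAMPLE):
    print(f"{size / 1024**3:6.2f} GiB  {path}")
```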

For the server upgrade, luckily do-release-upgrade exists on the VM, so we can use it.

@Frenzie
Member

Frenzie commented Jul 6, 2023

> If so we can delete the old stable versions then.

Not as a matter of course, people depend on that and we purposefully keep a limited number of old ones around. But in order to free up 10+ GB for a release upgrade, go right ahead. :-)

@Frenzie
Member

Frenzie commented Jul 6, 2023

Same story for the older nightlies.

@Frenzie
Member

Frenzie commented Jul 6, 2023

PS I'm sure you're aware, but don't forget to run the upgrade in tmux in case of connectivity issues. :-)

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 7, 2023

I will try :)

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 9, 2023

Added and enabled services:

Hzj_jie@hzj-jie-ubuntu:/ops/prod/build/download$ sudo systemctl enable kosync.service
Created symlink from /etc/systemd/system/multi-user.target.wants/kosync.service to /etc/systemd/system/kosync.service.
Hzj_jie@hzj-jie-ubuntu:/ops/prod/build/download$ sudo systemctl enable kosync.service
Hzj_jie@hzj-jie-ubuntu:/ops/prod/build/download$ sudo systemctl enable nginx
Created symlink from /etc/systemd/system/multi-user.target.wants/nginx.service to /etc/systemd/system/nginx.service.
Hzj_jie@hzj-jie-ubuntu:/ops/prod/build/download$ sudo systemctl enable nightswatcher
Created symlink from /etc/systemd/system/multi-user.target.wants/nightswatcher.service to /etc/systemd/system/nightswatcher.service.

Anyone want to confirm if I have done it right?

If everything goes right, we may want to add these files to the repo.

@Frenzie
Member

Frenzie commented Jul 9, 2023

Luckily you don't seem to have interfered with my release proceedings. 😅

I'll check in a few minutes.

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 9, 2023

Oh, I will definitely let you guys know first before proceeding with updates or restarts.

@Frenzie
Member

Frenzie commented Jul 9, 2023

I adjusted the nightswatcher startup script which had --rm in it (so stop would remove it immediately) but I think it's good to go now. I also temporarily removed all the nightly/stable files that are merely nice to have and not crucial, so I figure there should be enough space for a release upgrade.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            821M     0  821M   0% /dev
tmpfs           169M   18M  151M  11% /run
/dev/sda1        29G   10G   19G  36% /
tmpfs           842M  368K  842M   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           842M     0  842M   0% /sys/fs/cgroup
none             64K     0   64K   0% /etc/network/interfaces.dynamic.d
/dev/sdb1        40G   48M   38G   1% /mnt
tmpfs           169M     0  169M   0% /run/user/1004
tmpfs           169M     0  169M   0% /run/user/1002

@Frenzie
Member

Frenzie commented Jul 9, 2023

Incidentally, I just noticed a Prometheus and a Grafana that haven't run in years. So there's no need to add them to systemd at this time, but I'm just mentioning it.

image

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 12, 2023

I created koreader/koreader-misc#45 to add the definition files into koreader-misc.

Meanwhile, when would be a good time for me to try a restart and check that the services come back up afterwards?

@Frenzie
Member

Frenzie commented Jul 12, 2023

It doesn't really matter if the nightlies are missing for a day, but the ideal time would be sometime after 7 UTC so the artifacts for the day are there already. Unless you mean wrt the current Android tc update (e.g., #10679) in which case it may be better to wait until that's sorted out.

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 13, 2023

Oh, maybe put another way: would you please share the commands you are using to start the services? I do see five Docker containers now. My concern is that if the systemd services do not work as expected (though I don't expect that), I want to be able to start the containers manually to avoid breaking the services.

@Hzj-jie
Contributor Author

Hzj-jie commented Jul 13, 2023

By the way, memory-wise, 1G would be a bit tight if I read /proc/meminfo correctly.

root@hzj-jie-ubuntu:~# cat /proc/meminfo
MemTotal:        1724096 kB
MemFree:          157712 kB
MemAvailable:     991028 kB
Buffers:           49364 kB
Cached:           964028 kB
SwapCached:            0 kB
Active:           952812 kB
Inactive:         424736 kB
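
To make the "a bit tight" reading concrete, here is a rough sketch that treats MemTotal minus MemAvailable as the memory that cannot easily be reclaimed and compares it with a nominal 1 GiB. The values are copied from the output above; `parse_meminfo` is a hypothetical helper, and the 1 GiB figure is an approximation since the kernel reserves part of the RAM.

```python
def parse_meminfo(text: str) -> dict:
    """Parse `/proc/meminfo`-style lines into a {field: kB} mapping."""
    fields = {}
    for line in text.strip().splitlines():
        name, value = line.split(":", 1)
        fields[name.strip()] = int(value.split()[0])  # drop the 'kB' suffix
    return fields


SAMPLE = """\
MemTotal:        1724096 kB
MemFree:          157712 kB
MemAvailable:     991028 kB
Buffers:           49364 kB
Cached:           964028 kB
"""

info = parse_meminfo(SAMPLE)
# Memory that is in use and not readily reclaimable (page cache excluded).
in_use_kb = info["MemTotal"] - info["MemAvailable"]
headroom_kb = 1024 * 1024 - in_use_kb  # against a nominal 1 GiB machine

print(f"in use:   {in_use_kb / 1024:.0f} MiB")
print(f"headroom: {headroom_kb / 1024:.0f} MiB on a nominal 1 GiB VM")
```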

@Frenzie
Member

Frenzie commented Jul 13, 2023

> Oh, maybe put another way: would you please share the commands you are using to start the services?

They're scripts in ops (under build, nginx and sync).

@Hzj-jie
Contributor Author

Hzj-jie commented Sep 26, 2023

I resized the machine to 1 vCPU + 1G RAM, and it seems to be working. It will likely buy us 3-5 more days per month.

The public address is publicly accessible, so I am not sure whether anyone crawls the site directly through the public IP.

@Hzj-jie
Contributor Author

Hzj-jie commented Sep 26, 2023

I do see requests like

172.71.17.150 - - [15/Sep/2023:18:59:20 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36 OPR/101.0.0.0"

and

162.158.175.177 - - [15/Sep/2023:20:53:55 +0000] "GET /download/nightly/v2023.03-45-ga8ab5e84_2023-04-18/ HTTP/1.1" 404 146 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"

Both should be disallowed.

@Frenzie
Member

Frenzie commented Sep 26, 2023

I can see why you'd say that about some bot (though it should either get it from the CF cache or be the first hence putting it in the cache) but the first is simply Opera so I'm not sure what you're getting at?

For reference, mine looks like this:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 OPR/102.0.0.0

@Hzj-jie
Contributor Author

Hzj-jie commented Sep 30, 2023

Oh, I got them from the nginx log.
I think the requests should be cached and only accessed by cf like

108.162.246.82 - - [04/Jun/2017:03:23:57 +0000] "HEAD / HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible; CloudFlare-AlwaysOnline/1.0; +http://www.cloudflare.com/always-online)"

rather than the regular browsers.

@Frenzie
Member

Frenzie commented Sep 30, 2023

A request like that will happen all the time for every caching edge node. There's no such thing as regular browsers just accessing, CF is always in between.

@Hzj-jie
Contributor Author

Hzj-jie commented Sep 30, 2023

Oh, do you mean requests like

172.71.17.150 - - [15/Sep/2023:18:59:20 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36 OPR/101.0.0.0"

are indeed passing through cf? I made a mistake here: the hostnames do not resolve to the public IP address of the VM.

But as long as it returned 200, the request was served through the azure network and got charged. For example, the following apk was accessed 8 times, and some requests were hours apart, and cf should cache them.

172.70.91.197 - - [13/Sep/2023:10:05:47 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Linux; Android 11; RMX1901 Build/RKQ1.201217.002) AppleWebKit/537.36 (KHTML, like Gecko)  Chrome/116.0.0.0 Mobile Safari/537.36"
172.71.242.147 - - [13/Sep/2023:10:05:50 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Linux; Android 11; RMX1901 Build/RKQ1.201217.002) AppleWebKit/537.36 (KHTML, like Gecko)  Chrome/116.0.0.0 Mobile Safari/537.36"
172.70.86.27 - - [13/Sep/2023:10:06:01 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Linux; Android 11; RMX1901 Build/RKQ1.201217.002) AppleWebKit/537.36 (KHTML, like Gecko)  Chrome/116.0.0.0 Mobile Safari/537.36"
172.69.59.188 - - [14/Sep/2023:00:51:25 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Android 13; Mobile; rv:109.0) Gecko/117.0 Firefox/117.0"
172.71.186.230 - - [14/Sep/2023:01:40:37 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "http://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk" "Mozilla/5.0 (Android 13; Mobile; rv:68.0) Gecko/68.0 Firefox/68.0"
172.71.17.151 - - [14/Sep/2023:03:47:53 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
172.68.10.185 - - [14/Sep/2023:03:49:10 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Mobile Safari/537.36"
172.71.17.150 - - [15/Sep/2023:18:59:20 +0000] "GET /download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/koreader-android-arm64-v2023.08-23-g3f677a7fd_2023-09-13.apk HTTP/1.1" 200 33851392 "https://build.koreader.rocks/download/nightly/v2023.08-23-g3f677a7fd_2023-09-13/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36 OPR/101.0.0.0"

Or the tar.gz files in the dict/

172.69.33.129 - - [26/Sep/2023:02:31:50 +0000] "GET /download/dict/gcide.tar.gz HTTP/1.1" 200 14958781 "-" "KOReader/2023.08 (https://koreader.rocks/) LuaSocket/3.0.0"
162.158.91.27 - - [26/Sep/2023:02:31:51 +0000] "GET /download/dict/gcide.tar.gz HTTP/1.1" 200 14958781 "-" "KOReader/2023.08 (https://koreader.rocks/) LuaSocket/3.0.0"
172.71.130.85 - - [26/Sep/2023:03:00:08 +0000] "GET /download/dict/gcide.tar.gz HTTP/1.1" 200 14958781 "-" "KOReader/2023.08-8028 (https://koreader.rocks/) LuaSocket/3.0.0"
172.71.122.161 - - [26/Sep/2023:03:00:10 +0000] "GET /download/dict/gcide.tar.gz HTTP/1.1" 200 14958781 "-" "KOReader/2023.08-8028 (https://koreader.rocks/) LuaSocket/3.0.0"
162.158.158.199 - - [26/Sep/2023:03:08:32 +0000] "GET /download/dict/gcide.tar.gz HTTP/1.1" 200 14958781 "-" "KOReader/2023.05.1 (https://koreader.rocks/) LuaSocket/3.0.0"
162.158.155.194 - - [26/Sep/2023:03:08:35 +0000] "GET /download/dict/gcide.tar.gz HTTP/1.1" 200 14958781 "-" "KOReader/2023.05.1 (https://koreader.rocks/) LuaSocket/3.0.0"
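
A minimal sketch of how such log lines could be totalled per file, assuming the nginx "combined" log format seen above. `bytes_by_path` is a hypothetical helper and the sample holds three lines copied from this thread; the byte count is the one nginx logs for the response body, so response headers are not included.

```python
import re
from collections import defaultdict

# Combined format: ip - user [time] "METHOD path proto" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def bytes_by_path(lines):
    """Sum logged response bytes per path for 200 responses only."""
    totals = defaultdict(int)
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("status") == "200":
            totals[m.group("path")] += int(m.group("bytes"))
    return dict(totals)


SAMPLE = [
    '172.69.33.129 - - [26/Sep/2023:02:31:50 +0000] "GET /download/dict/gcide.tar.gz HTTP/1.1" 200 14958781 "-" "KOReader/2023.08 (https://koreader.rocks/) LuaSocket/3.0.0"',
    '162.158.91.27 - - [26/Sep/2023:02:31:51 +0000] "GET /download/dict/gcide.tar.gz HTTP/1.1" 200 14958781 "-" "KOReader/2023.08 (https://koreader.rocks/) LuaSocket/3.0.0"',
    '162.158.175.177 - - [15/Sep/2023:20:53:55 +0000] "GET /download/nightly/v2023.03-45-ga8ab5e84_2023-04-18/ HTTP/1.1" 404 146 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
]

for path, total in bytes_by_path(SAMPLE).items():
    print(f"{total / 1e6:8.1f} MB  {path}")
```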

I haven't seen cache related headers in the nginx configuration,

        location /download {
                alias /data/release_download;
                autoindex on;
                fancyindex on;
                fancyindex_exact_size off;
                fancyindex_name_length 95;
                fancyindex_natural_sort on;
                fancyindex_header /fancyindex/header.html;
                fancyindex_ignore fancyindex;
        }

Are you sure cf would cache the files in the case?

@Frenzie
Member

Frenzie commented Sep 30, 2023

> are indeed passing through cf?

Yes. It's impossible for it to be any other way.

> For example, the following apk was accessed 8 times, and some requests were hours apart, and cf should cache them.

"Hours apart" combined with different edge nodes means there's no particular reason it would be cached. They might drop it sooner than whatever you set or might expect if it hasn't been accessed in a while. Something like CF helps the most when there are a ton of requests coming in a short span using the same edge node. But that's why we have a 50-60% cache hit rate rather than over 90% (on average, this morning at 7 it was actually 96.54%).

> Are you sure cf would cache the files in the case?

Yes, you can easily verify for yourself with curl -I or -v.

Most important, you can see CF is aware of Last-Modified: Wed, 09 Jan 2019 15:35:31 GMT. In principle this is all it should check when it needs to revalidate.

$ curl -I http://build.koreader.rocks/download/dict/gcide.tar.gz
HTTP/1.1 200 OK
Date: Sat, 30 Sep 2023 07:44:14 GMT
Content-Type: text/plain
Content-Length: 14958781
Connection: keep-alive
Last-Modified: Wed, 09 Jan 2019 15:35:31 GMT
ETag: "5c3614c3-e440bd"
Cache-Control: max-age=14400
CF-Cache-Status: MISS
[…]
$ curl -I http://build.koreader.rocks/download/dict/gcide.tar.gz
HTTP/1.1 200 OK
Date: Sat, 30 Sep 2023 07:49:41 GMT
Content-Type: text/plain
Content-Length: 14958781
Connection: keep-alive
Last-Modified: Wed, 09 Jan 2019 15:35:31 GMT
ETag: "5c3614c3-e440bd"
Cache-Control: max-age=14400
CF-Cache-Status: REVALIDATED
[…]
$ curl -I http://build.koreader.rocks/download/dict/gcide.tar.gz
HTTP/1.1 200 OK
Date: Sat, 30 Sep 2023 07:50:01 GMT
Content-Type: text/plain
Content-Length: 14958781
Connection: keep-alive
Last-Modified: Wed, 09 Jan 2019 15:35:31 GMT
ETag: "5c3614c3-e440bd"
Cache-Control: max-age=14400
CF-Cache-Status: HIT
[…]
$ curl -v http://build.koreader.rocks/download/dict/gcide.tar.gz
* processing: http://build.koreader.rocks/download/dict/gcide.tar.gz
*   Trying 188.114.97.3:80...
* Connected to build.koreader.rocks (188.114.97.3) port 80
> GET /download/dict/gcide.tar.gz HTTP/1.1
> Host: build.koreader.rocks
> User-Agent: curl/8.2.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Sat, 30 Sep 2023 07:53:48 GMT
< Content-Type: text/plain
< Content-Length: 14958781
< Connection: keep-alive
< Last-Modified: Wed, 09 Jan 2019 15:35:31 GMT
< ETag: "5c3614c3-e440bd"
< Cache-Control: max-age=14400
< CF-Cache-Status: HIT
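
For repeated checks, the relevant headers can be pulled out of saved `curl -I` output. This is only a sketch: `parse_headers` and `max_age_seconds` are hypothetical helpers, and the sample is an abridged copy of the second response above.

```python
def parse_headers(raw: str) -> dict:
    """Parse `curl -I` output into a lower-cased header mapping."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers


def max_age_seconds(cache_control: str):
    """Return the max-age directive as an int, or None if absent."""
    for directive in cache_control.split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


RAW = """\
HTTP/1.1 200 OK
Date: Sat, 30 Sep 2023 07:49:41 GMT
Content-Length: 14958781
Last-Modified: Wed, 09 Jan 2019 15:35:31 GMT
ETag: "5c3614c3-e440bd"
Cache-Control: max-age=14400
CF-Cache-Status: REVALIDATED
"""

h = parse_headers(RAW)
age = max_age_seconds(h["cache-control"])
print(h["cf-cache-status"], f"max-age={age}s ({age / 3600:g} h)")
```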

> I haven't seen cache related headers in the nginx configuration,

Indeed, it might help a few percentage points to explicitly add some for at least a week (except for some files that should only be cached a few hours or maybe a day tops).

But ultimately the bandwidth looks like this. Those are quite comfortable numbers. If I find the time later I'll set up my VPS that I'm not really using because Azure is just… weirdly unattractive, but first it's probably better to identify where the data leak is coming from. But in any case it can't be from build.koreader.rocks or ota.koreader.rocks since as stated those pass completely through CF and are tracked by CF. :-/

image

@Hzj-jie
Contributor Author

Hzj-jie commented Oct 2, 2023

Ok, my previous company used an edge networking service a while ago which transferred the data across its edge servers, so as I recall static requests would never reach the origin server again after some 5 minutes.

But anyway, glad to know CF does not need any headers to function. (I remember we discussed this before :)
Though it's still suspicious that build.koreader.rocks is contributing most of the traffic.

@Frenzie
Member

Frenzie commented Oct 2, 2023

The long and short of it is that it's all proxied and I can't imagine them somehow missing tens of gigabytes of data traffic. 47 (total) - 26 (cached) = 21 GB traffic from build. and ota over the past week.
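
Spelled out as a quick calculation (the two inputs are the weekly figures quoted above; the variable names are just for illustration):

```python
total_gb = 47    # total traffic reported by CF for the week
cached_gb = 26   # portion CF served from its cache

uncached_gb = total_gb - cached_gb   # what had to come from the origin
hit_ratio = cached_gb / total_gb     # cache hit ratio measured in bytes

print(f"uncached (origin) traffic: {uncached_gb} GB")
print(f"cache hit ratio by bytes:  {hit_ratio:.1%}")
```

The resulting ratio falls inside the 50-60% range mentioned earlier in the thread.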

image

image

Which it seems to me really only leaves the thing I blanked out here, or something internal to Azure.

Although technically this is possible:

curl -I build.koreader.rocks/download --resolve build.koreader.rocks:80:[[IP]]

With some file:

HTTP/1.1 200 OK
Server: nginx
Date: Mon, 02 Oct 2023 07:32:54 GMT
Content-Type: application/zip
Content-Length: 42455211
Last-Modified: Sun, 10 Sep 2023 06:10:46 GMT
Connection: keep-alive
ETag: "64fd5de6-287d0ab"
Accept-Ranges: bytes

Of course that's not quite as it should be so we'll have to double check the traffic numbers from the nginx logs just in case.

https://developers.cloudflare.com/fundamentals/setup/allow-cloudflare-ip-addresses/
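
When double checking the nginx logs, a rough way to flag requests that did not come from Cloudflare is to test each client IP against Cloudflare's ranges. The sketch below rests on an assumption: the four networks are a partial, hand-copied subset of Cloudflare's published IPv4 list and may be out of date, so the real list should be taken from Cloudflare itself. `is_cloudflare` is a hypothetical helper and 203.0.113.7 is a documentation-only address used as a negative example.

```python
import ipaddress

# ASSUMPTION: partial snapshot of Cloudflare's published IPv4 ranges.
# Refresh from Cloudflare's own list before relying on the result.
CLOUDFLARE_V4 = [ipaddress.ip_network(n) for n in (
    "108.162.192.0/18",
    "162.158.0.0/15",
    "172.64.0.0/13",
    "188.114.96.0/20",
)]


def is_cloudflare(ip: str) -> bool:
    """Return True if `ip` falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_V4)


# The first three addresses appear in log lines quoted in this thread.
for ip in ("172.71.17.150", "162.158.175.177", "108.162.246.82", "203.0.113.7"):
    print(f"{ip:16} {'cloudflare' if is_cloudflare(ip) else 'other'}")
```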

@Frenzie
Member

Frenzie commented Oct 3, 2023

I've added these caching headers:

      location ~* \.(apk|AppImage|targz|tar.gz|zip)$ {
        add_header Cache-Control "public, max-age=86400, immutable";
      }

      location ~* \.zsync$ {
        add_header Cache-Control "public, max-age=3600";
      }
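
To see which download paths these two locations would cover, the same patterns can be tried out in Python. This is a sketch only: Python's `re` stands in for nginx's PCRE here (the two behave the same for patterns this simple), `cache_seconds` is a hypothetical helper, and the `.zsync` file name in the usage loop is made up.

```python
import re

# Same patterns as the two nginx locations; `~*` means case-insensitive.
ARCHIVE_RE = re.compile(r"\.(apk|AppImage|targz|tar.gz|zip)$", re.IGNORECASE)
ZSYNC_RE = re.compile(r"\.zsync$", re.IGNORECASE)


def cache_seconds(path: str):
    """Return the max-age that would be attached to `path`, or None."""
    if ARCHIVE_RE.search(path):
        return 86400   # one day
    if ZSYNC_RE.search(path):
        return 3600    # one hour
    return None        # no Cache-Control header added by these locations


for path in (
    "/download/dict/gcide.tar.gz",
    "/download/stable/example.zsync",   # hypothetical file name
    "/download/nightly/",
):
    print(path, cache_seconds(path))
```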

Of course as stated this will only potentially affect the 22 GB of uncached traffic (mainly depending on whether CF merely checks last-modified or simply redownloads the entire thing because why not), not whatever traffic is actually being problematic.

image

@Frenzie
Member

Frenzie commented Oct 3, 2023

build.koreader.rocks says 10.12 GB over the past 7 days (as in goaccess with --keep-last=7)

The ota.koreader.rocks logs say 107.58 GB. That's most peculiar to say the least because I only see CF IPs in the logs. Almost all of it from 172.71.126.something, and some fairly negligible amounts from other CF IPs. Then why doesn't CF show this? It was clearly served through CF. This doesn't make a lick of sense.

@Hzj-jie
Contributor Author

Hzj-jie commented Oct 6, 2023

That's really a lot. Can cf handle 206 correctly?

By the way, the name asdfsdfa is cool 👍

@Frenzie
Member

Frenzie commented Oct 6, 2023

I think I see what you're saying.

  1. A partial content request comes in
  2. CF grabs the entire file (which may not necessarily be completely idiotic, provided it then caches it for future requests)
  3. CF only serves that which is actually requested (hopefully)

2, 3 ⇒ CF has downloaded some 50–70 GB more from the server than it actually served to end users, explaining the missing data, and using more data than it ever saved in the process.

This is probably fairly testable, though I don't have the time to right this moment.
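
As a back-of-the-envelope model of that hypothesis (not a description of how CF actually behaves), the sketch below counts origin bytes versus bytes served when every cache miss pulls the whole file. The file size is the Content-Length from the direct response earlier in the thread; the ten 1 MiB range requests and the `simulate` helper are made up for illustration.

```python
def simulate(file_size: int, range_requests: list, cached_after_first: bool):
    """Return (bytes pulled from origin, bytes served to clients).

    Models the hypothesis only: every cache miss pulls the whole file
    from the origin, while clients receive just the bytes they asked for.
    """
    origin = served = 0
    cached = False
    for start, end in range_requests:      # inclusive byte ranges
        if not cached:
            origin += file_size            # full-file fetch on a miss
            cached = cached_after_first
        served += end - start + 1
    return origin, served


FILE_SIZE = 42_455_211                     # Content-Length seen earlier
REQUESTS = [(0, 1_048_575)] * 10           # ten hypothetical 1 MiB ranges

for label, keeps in (("never cached", False), ("cached after first", True)):
    origin, served = simulate(FILE_SIZE, REQUESTS, keeps)
    print(f"{label:20} origin={origin / 1e6:7.1f} MB  served={served / 1e6:5.1f} MB")
```

If the file never stays cached, origin traffic can exceed served traffic many times over, which is the shape of the discrepancy described above.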

@Frenzie changed the title from "[Process] update the vm running ota.koreader.rocks" to "build.koreader.rocks / ota.koreader.rocks down (Azure bandwidth issues)" on Nov 23, 2023
@ilyats

ilyats commented Jan 25, 2024

To whom it may concern, the OTA server is down again. Latest APK on a mirror site is from 01/19/24

@Hzj-jie
Contributor Author

Hzj-jie commented Jan 26, 2024

It was resumed yesterday.

@moshin34

My sync randomly stopped working a few days ago across all devices. Is this a global thing?

@Miladiir

ota.koreader.rocks has been down since at least yesterday ~08:00 UTC. I can contribute financially or with my time. I have Azure and Linux server experience if you guys need help.

@pazos
Member

pazos commented Feb 24, 2024

> ota.koreader.rocks has been down since at least yesterday ~08:00 UTC. I can contribute financially or with my time. I have Azure and Linux server experience if you guys need help.

Thanks for the offer :)

The server will be back online on the 26th. AFAIK it gets killed when the network bandwidth reaches some limit. I think @Frenzie and @Hzj-jie are the people involved with the server management. Not sure what can be done without increasing quotas (I'm happily unaware :p)

@Frenzie
Member

Frenzie commented Feb 24, 2024

> Not sure what can be done without increasing quotas (I'm happily unaware :p)

I find the VPS very bad compared to all others I have access to. It's slow and SSH connections are unreliable. From e.g. AWS EC2 I know that's definitely not an issue with being in North America. These issues might be a worthy trade-off for a lot of bandwidth, yet bandwidth is apparently also more than an order of magnitude less than anything else. I am nonetheless grateful for its existence.

But as you know these things largely work on annoyance thresholds and it never quite managed to make enough impact, although it's come close. I definitely wouldn't want to lower the annoyance threshold by rewarding Azure for their terrible product.

@kyxap

kyxap commented Jun 22, 2024

btw, does the GitHub pipeline not match the build requirements? Just curious.

@pazos
Member

pazos commented Jun 22, 2024

> btw, does the GitHub pipeline not match the build requirements? Just curious.

No idea, but let's focus on solving the bandwidth issue here.

Feel free to open another ticket if you want to build artifacts from GitHub. I think GitLab is better. Not all the eggs in the same basket :)


10 participants