[investigating] networking becomes "clogged" and max bandwidth is reduced to 5% #566
Comments
Closing, as I couldn't replicate it in the past 5 days. |
@dhaavi there must definitely be something wrong with Portmaster. I can't help myself, I have to blame something 😆 This is a graph of my laptop's network throughput. At 22:10 (first blue dashed line) Portmaster's SPN module crashed on a nil pointer dereference (reported in safing/spn#74). I mentioned similar behaviour in another issue (safing/spn#67 (comment)), and as you can see, a laptop reboot fixed it again (second dashed blue line at the very right of the graph). I'm sitting here watching how slowly Nextcloud syncs my files (over the Tailscale network (yellow graph), but connected over LAN, no relay). I wish I had restarted my laptop earlier 😞 My laptop hasn't been connected to SPN for at least the last hour. I think I disabled it a few minutes after reporting the nil pointer crash, but I'm not 100% sure. It's worth mentioning that this happened even before I deployed Tailscale; I only switched to Tailscale yesterday. In the graph, positive is download and negative is upload. |
That's some very interesting data. Do you have CPU/MEM data for the same time period? It would be interesting to see if there is a correlation. Also, are you collecting metrics from the Portmaster (http://127.0.0.1:817/metrics)? These two metrics could tell us how many packets the Portmaster was handling during that period.
I remember that when every packet went through the Portmaster, the bandwidth limit was around 20 Mbit/s. This suspiciously matches the reported speed here. |
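For anyone following along, here is a minimal sketch of reading counters out of a Prometheus-style text exposition, which is the format a `/metrics` endpoint like the one above typically serves. The metric name `example_packets_total` and its labels are made up for illustration and are not Portmaster's real metric names. Timestamps and escaped label values are not handled.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified: assumes 'series value' per line, no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is whatever follows the last space on the line.
        series, _, value = line.rpartition(" ")
        try:
            samples[series.strip()] = float(value)
        except ValueError:
            continue  # not a numeric sample, ignore it
    return samples


# Hypothetical sample payload, NOT real Portmaster output.
SAMPLE = """\
# HELP example_packets_total Hypothetical packet counter.
# TYPE example_packets_total counter
example_packets_total{verdict="accept"} 1200
example_packets_total{verdict="drop"} 35
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

In practice the text would come from the endpoint itself (for example via `urllib.request.urlopen`), or Prometheus would scrape it directly, which is the more usual setup.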
I do, for 30 minutes already 😅 Do you have a Grafana dashboard for those stats? I discovered the
Graphs
I've also added the 2 annotations to the CPU/RAM dashboard, and I'm posting all 3 so you have them with the same time range. Some more graphs about the network. There are some drops on UDP (WireGuard managed by Tailscale). I didn't notice those when I originally opened this issue, maybe because I wasn't using WireGuard (Tailscale) locally back then. Anyway, I don't think it means anything, because the spike in UDP errors comes after the laptop restart. I think it is a drawback of being connected over Wi-Fi. |
Btw, the swap is not real swap but zram. People often see it as a problem, since using swap is not common anywhere nowadays. |
Ha! I've got some interesting stuff!
|
I don't know what I should see here or how to use those metrics (rate() could probably be used?), but I can see that today the metric values increase noticeably faster than before. 2 squares were enough for the last 2 days, and for today 4 squares aren't enough. I wouldn't say that I did more network-heavy stuff than yesterday. |
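On the `rate()` question: for an ever-increasing counter, the raw value matters less than how fast it grows. Below is a rough sketch of the idea behind a per-second rate over a window of samples. It is a simplification, not Prometheus' exact algorithm (which also extrapolates to the window edges), and the sample values are invented.

```python
def simple_rate(samples):
    """Approximate per-second increase of a counter.

    samples: list of (unix_seconds, counter_value), sorted by time.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # The counter dropped, so it was reset (e.g. a restart):
            # count the new value as growth from zero.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Invented samples: 600 more packets over 60 seconds.
    print(simple_rate([(0, 100), (60, 700)]))
```

In Grafana the equivalent would be a PromQL expression of the form `rate(<metric>[5m])`, with `<metric>` replaced by the actual counter name.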
Wow, nice stats! It seems there is a correlation between CPU usage and the network. Interesting. This would point to a problem with the network integration. Are you using tailscale exit node stuff?
Does rsync also go over Tailscale, or not?
This is a histogram in the format VictoriaMetrics uses: https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350 So, something like this should do it (not tested):
What I hope to see is how many packets the Portmaster handled in a certain period. For this, you can just use |
No. I tried it, it broke everything, and I don't need it, so I disabled it. Since then I've been using SPN on my laptop.
I've posted 4 examples, and the first one is rsync without Tailscale. When I tried to download the same file from the server with Firefox over LAN without Tailscale, it did not reach "full speed" though. I'm just wondering why this "problem" doesn't apply to rsync. I can't understand why rsync without Tailscale copies the file at "full speed" while downloading the same file with Firefox is slow. There must be something that rsync bypasses, and it's not Tailscale...
Nope, normally it reaches about 95% of the LAN throughput I get without Tailscale. The overhead is really low.
I don't think Tailscale is the problem here. When it works, it works, and the "overhead" of Tailscale is only a speed drop of a few percent... This happened even before I discovered Tailscale. |
I am cleaning out old issues. If you feel this issue should not have been closed let me know. Please keep in mind, the free version of Portmaster only has limited support. |
❗ ❗ ❗ DISCLAIMER: THIS MAY NOT BE CAUSED BY PORTMASTER, STILL INVESTIGATING ❗ ❗ ❗
What happened:
In the past week I had to reboot my laptop to make networking fast again. We are talking about LAN throughput, so I'm 99% sure it wasn't related to my ISP throttling me or anything.
I watch video content streamed from my local server over DLNA, hosted by Plex Server. I use VLC most of the time, but sometimes I use the GNOME Videos app (Totem player) because it works better on a slower LAN.
There is one main difference between VLC and the Totem player:
However, in the past week it felt for some reason like the networking on my laptop was "clogged". I couldn't load self-hosted apps over LAN from my local server, nor stream DLNA content smoothly (VLC) or cache it at full speed (Totem). At most I could measure 500 kB/s download from my home server, and content with a bitrate of 8 Mbps was lagging. I blamed the home server at the beginning because there are some I/O-heavy tasks running from time to time (backups and so on). There are also network-heavy tasks being run.
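To put those two numbers in the same unit, here is a quick conversion sketch. It assumes kB means 1000 bytes and Mbps means 10^6 bits per second.

```python
def kbytes_per_s_to_mbit_per_s(kb_per_s):
    """Convert kB/s (1 kB = 1000 bytes) to Mbit/s (1 Mbit = 10**6 bits)."""
    return kb_per_s * 1000 * 8 / 1_000_000


measured_mbit = kbytes_per_s_to_mbit_per_s(500)  # measured download speed
stream_bitrate_mbit = 8.0                        # bitrate of the video

if __name__ == "__main__":
    print(f"measured: {measured_mbit} Mbit/s, needed: {stream_bitrate_mbit} Mbit/s")
    print("stream can keep up:", measured_mbit >= stream_bitrate_mbit)
```

So the measured 500 kB/s is roughly half of what an 8 Mbps stream needs, which is consistent with the lagging playback.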
Sadly, I wasn't able to find any issue on my home server. I checked the graphs scraped by Prometheus from node-exporter and there wasn't anything relevant. Disks were under normal load, no packet drops, bandwidth wasn't fully used, CPU was running at 40% at most... Nothing weird.
This is the reason why I'm opening this issue, and I will continue investigating. Rebooting my laptop always fixed the issue. I've also installed node-exporter on my laptop, so I'll have some graphs for you when it happens again.
What I know so far:
192.168.1.0/24 subnet).
What did you expect to happen?:
Networking is working at full speed all the time.
How did you reproduce it?:
I have no idea.
Debug Information:
Of course I forgot to copy it. I'll try not to forget the next time. I'm running version 0.8.5.