Raspberry Pi 3 NFS speed problems #758

Closed · rvagg opened this issue Jun 13, 2017 · 6 comments

rvagg commented Jun 13, 2017

This seems to have come up since the outage last week (#749). The summary: the power failure took out all of the infra, and the UPS it all sits on didn't last long at all (it's more overloaded than it should be; I'm reorganising things over the coming weeks to shift the load to something more reasonable), so it was a hard shutdown for everything.

On startup I took the opportunity to do a cleanup and update, which included clearing out the Jenkins workspaces on all of the Pis; those workspaces are NFS mounts served from a single host on the same network. Unfortunately that put a big load on the network once everything came back up, because all of the Pis end up doing git clones at the same time; for that initial run it would have been more efficient to let them clone onto their own SD cards. I'm going to have to be more judicious with this kind of "cleanup" in future because of that cost.

But now we should be roughly back to normal, with the workspaces holding active versions of the most frequently used repos/branches. The Raspberry Pi 2s seem to be getting through their jobs pretty quickly. The 1 B+s also seem to be running at their expected speed: not super fast, but as fast as they historically have been. The Pi 3s, however, seem to get stalled on git checkout or git clone activity. It may be because we're still playing catch-up, but the amount of time these operations take is unreasonable for this configuration, and catch-up alone doesn't explain why the 2s have recovered so quickly while the 3s haven't.

I'm going to be reorganising the network soon, and I have some new gear coming that should speed up the ARM cluster as well. It could be the particular switch the 3s are on (the 1s and 2s are on separate switches, fwiw), or the router that ties them all together could be dodgy (which wouldn't surprise me; it was cheap and is going to be replaced soon).

So, this issue can be used to track ongoing problems that anyone experiences (ping @nodejs/testing), and I'll also use it to document changes that I'm making on my end that may improve the situation. We'll close it when we think it's satisfactorily resolved.

FYI, I have concerns about using NFS in general: it's an ancient protocol and I've never seen it perform well in any situation; I just don't know of a better option here. I could take the Pis off NFS entirely and let them use their SD cards. NFS hasn't saved us a ton of time, but it does give us disk space that we don't get on the SD cards, particularly on the 1s, which only have 8G. I also have persistent problems with mounting NFS at startup; I generally have to do it manually after the machines come back up. Perhaps it's time to try sshfs or cifs or some other option, I'm open to suggestions here!
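For the boot-time mount problem specifically, one commonly used workaround is to let systemd handle the share as an automount via /etc/fstab, so a slow network at boot doesn't leave it unmounted. A minimal sketch, assuming a hypothetical server address, export path and mount point (not our actual hosts):

```
# /etc/fstab on a Pi worker -- server address, export and mount point are hypothetical.
# _netdev + x-systemd.automount defer the mount until the network is up and the path
# is first accessed; nofail keeps an unreachable server from blocking boot.
192.168.1.10:/export/jenkins-ws  /home/iojs/build  nfs  _netdev,noauto,x-systemd.automount,x-systemd.requires=network-online.target,nofail,vers=3  0  0
```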

@bnoordhuis

> sshfs

CPU is probably going to be a big bottleneck on those machines unless you use a cipher like blowfish, and maybe even then. ssh and sshfs don't have null ciphers.
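If sshfs were tried despite that, the cipher is at least worth pinning explicitly. A rough sketch (user, host and paths are made up), with chacha20-poly1305 chosen because it tends to be the cheapest option on ARM cores without hardware AES:

```
# Sketch only -- user, host and paths are hypothetical.
# Options given with -o are passed through to ssh; Compression=no avoids
# spending extra CPU on zlib on top of the cipher.
sshfs -o Ciphers=chacha20-poly1305@openssh.com -o Compression=no \
      iojs@192.168.1.10:/export/jenkins-ws /home/iojs/build
```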

If all you need from NFS is storage and not the rest of its abstractions (files, locks, uids, etc.), you could configure a Linux machine on your network as an iSCSI target and have the Pis use it as block storage. Cheap, fast, easy to set up and maintain.
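For reference, a rough sketch of that setup with targetcli on the server and open-iscsi on a Pi; the IQN, addresses, sizes and device names below are all made up, and ACL/authentication setup is omitted:

```
# On the server: export a backing file as an iSCSI LUN
# (the package is named targetcli or targetcli-fb depending on the distro).
sudo targetcli backstores/fileio create pi3-ws /srv/iscsi/pi3-ws.img 32G
sudo targetcli iscsi/ create iqn.2017-06.org.example:pi3-ws
sudo targetcli iscsi/iqn.2017-06.org.example:pi3-ws/tpg1/luns create /backstores/fileio/pi3-ws

# On the Pi: discover and log in, then treat the LUN as a local disk.
sudo apt-get install open-iscsi
sudo iscsiadm -m discovery -t sendtargets -p 192.168.1.10
sudo iscsiadm -m node -T iqn.2017-06.org.example:pi3-ws -p 192.168.1.10 --login
sudo mkfs.ext4 /dev/sda        # actual device name will differ
sudo mount /dev/sda /home/iojs/build
```

Note that a block device can only safely be mounted by one initiator at a time, so unlike a shared NFS export each Pi would need its own LUN.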

Trott commented Jun 13, 2017

Does it make sense to configure CI not to run on the Pi 3s until this is all sorted out? Or would that just mean we won't know when it's fixed?

rvagg commented Jun 13, 2017

@Trott if you can confirm that it continues to be a problem, we can take them out of the pool; I don't think we'd lose a lot in terms of coverage. It is useful to know whether the problem is persisting, but I could reconnect them whenever I'm testing changes on my end.

Trott commented Jun 14, 2017

> @Trott if you can confirm that it continues to be a problem, we can take them out of the pool

All day, nothin' but failing Pi 3s... I haven't seen a single Pi 3 pass...

rvagg commented Jun 14, 2017

Done: https://ci.nodejs.org/job/node-test-binary-arm/8670/

@joaocgreis can you sanity check for me? I've just unticked pi3 from the labels list and that seems to be enough to turn them off; tbh it feels too simple!

@joaocgreis

@rvagg Looks good! It might be the only thing in Jenkins that's this simple, but it really is that simple.

rvagg closed this as completed Nov 9, 2017