rochaporto/helm-foldingathome

Helm Folding@Home

A Helm Chart to deploy workloads for Folding@Home.

You can set a fixed number of replicas, or enable the horizontal pod autoscaler, which fills unused resources in your cluster with pods executing Folding@Home workloads.

These workloads have low priority and are pre-emptible, meaning your main workloads will have priority and can take over when you scale up.

Deployment

The default values will enable CPU workloads only:

helm install . --name folding-cpu --namespace folding \
    --set foldingathome.config.user=YOURUSER \
    --set foldingathome.config.team=YOURTEAMID

If you have and want to use GPUs, pass the additional values file:

helm install . --name folding-cpu --namespace folding \
    --values values-gpu.yaml \
    --set foldingathome.config.user=YOURUSER \
    --set foldingathome.config.team=YOURTEAMID

Configuration

The foldingathome.config section takes key/value pairs for any FAHClient parameter. Check the available options with:

kubectl exec -it <foldingathomepod> -- /FAHClient --help
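
Since every option goes under the same foldingathome.config prefix, a small helper can build the repeated --set flags. This is only a sketch: fah_set_args is a hypothetical name, not part of the chart, and the power=full pair assumes FAHClient accepts a power option (confirm with the --help output above).

```shell
# Hypothetical helper: turn key=value pairs into "--set foldingathome.config.<key>=<value>"
# arguments, one word per line so an unquoted $(...) expands them into separate arguments.
# Values containing spaces or commas are not handled by this sketch.
fah_set_args() {
  for kv in "$@"; do
    printf '%s\n%s\n' '--set' "foldingathome.config.$kv"
  done
}

# Usage (power=full is an assumed FAHClient option):
#   helm install . --name folding-cpu --namespace folding \
#       $(fah_set_args user=YOURUSER team=YOURTEAMID power=full)
```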

Monitoring

Check the status of the horizontal pod autoscaler:

$ kubectl -n foldingathome get hpa
NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS    REPLICAS   AGE
foldingathome   Deployment/foldingathome   93%/90%   1         10000000   3          25m

$ kubectl -n foldingathome describe hpa
Name:                                                  foldingathome
Namespace:                                             foldingathome
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  93% (938m) / 90%
Min replicas:                                          1
Max replicas:                                          10000000
Deployment pods:                                       3 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Normal   SuccessfulRescale             10m                horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
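
To spot autoscalers that are currently above their target without reading the table by eye, the TARGETS column can be parsed with awk. A minimal sketch, assuming the column layout shown above (name first, current/target percentages third); hpa_over_target is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get hpa` output on stdin and print the name
# of every autoscaler whose current utilization exceeds its target.
hpa_over_target() {
  awk 'NR > 1 {
    split($3, t, "/")              # "93%/90%" -> t[1]="93%", t[2]="90%"
    cur = t[1]; tgt = t[2]
    sub(/%/, "", cur); sub(/%/, "", tgt)
    if (cur + 0 > tgt + 0) print $1  # non-numeric values such as <unknown> count as 0
  }'
}

# Usage (use the namespace you installed the chart into):
#   kubectl -n foldingathome get hpa | hpa_over_target
```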

Check the logs of individual pods to verify the loaded configuration and follow job execution:

kubectl -n foldingathome logs foldingathome-6c945c5d89-xpsbc -f
...
22:07:57:WU00:FS00:Assigned to work server 128.252.203.4
22:07:57:WU00:FS00:Requesting new work unit for slot 00: READY cpu:4 from 128.252.203.4
22:07:57:WU00:FS00:Connecting to 128.252.203.4:8080
22:08:00:WU00:FS00:Downloading 4.36MiB
22:08:03:WU00:FS00:Download complete
22:08:03:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13840 run:0 clone:4301 gen:2 core:0xa7 unit:0x0000000380fccb045e6ee152e718e24f
22:08:04:WU00:FS00:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah
22:08:04:WU00:FS00:Connecting to cores.foldingathome.org:80
22:08:04:WU00:FS00:FahCore a7: Downloading 8.91MiB
22:08:06:WU00:FS00:FahCore a7: Download complete
22:08:06:WU00:FS00:Valid core signature
22:08:06:WU00:FS00:Unpacked 20.97MiB to cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7
22:08:06:WU00:FS00:Starting
22:08:06:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper //cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 1 -checkpoint 15 -np 4
22:08:06:WU00:FS00:Started FahCore on PID 15
...
22:08:06:WU00:FS00:0xa7:Project: 13840 (Run 0, Clone 4301, Gen 2)
22:08:06:WU00:FS00:0xa7:Unit: 0x0000000380fccb045e6ee152e718e24f
22:08:06:WU00:FS00:0xa7:Reading tar file core.xml
22:08:06:WU00:FS00:0xa7:Reading tar file frame2.tpr
22:08:06:WU00:FS00:0xa7:Digital signatures verified
22:08:06:WU00:FS00:0xa7:Calling: mdrun -s frame2.tpr -o frame2.trr -x frame2.xtc -e frame2.edr -cpt 15 -nt 4
22:08:06:WU00:FS00:0xa7:Steps: first=250000 total=125000
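
When only the work unit identity matters, the Project line can be extracted from the log stream. A minimal sketch, assuming the log format shown above; fah_work_units is a hypothetical helper name.

```shell
# Hypothetical helper: read FAHClient logs on stdin and print
# "project run clone gen" for each "Project:" line found.
fah_work_units() {
  sed -n 's/.*Project: \([0-9]*\) (Run \([0-9]*\), Clone \([0-9]*\), Gen \([0-9]*\)).*/\1 \2 \3 \4/p'
}

# Usage:
#   kubectl -n foldingathome logs foldingathome-6c945c5d89-xpsbc | fah_work_units
# For the log excerpt above this prints: 13840 0 4301 2
```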

Development

Feel free to open issues or submit merge requests.
