RFC: add (path) "prefix" option to GPU plugin for scalability testing #1104
Commits on Aug 19, 2022
-
Add "prefix" option to GPU plugin for scalability testing
Devices can be faked for scalability testing when non-standard paths are used. (The GPU plugin code assumes that container paths match host paths, and the container runtime prevents creating fake files under the real paths.)
Note: To run both the normal GPU plugin and the faked one in the same cluster, all nodes providing fake "i915" resources should be labeled differently from the nodes with the real GPU plugin and devices, so that real GPU workloads can be limited to the correct nodes with a suitable nodeSelector.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
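The path relocation described in this commit message could look roughly like the sketch below. It is not the plugin's actual code: the constant and function names are hypothetical, and it assumes the plugin scans the standard Linux DRM locations `/sys/class/drm` and `/dev/dri`.

```go
package main

import "path/filepath"

// Standard Linux DRM locations. The constant names are assumptions
// made for this sketch, not taken from the plugin source.
const (
	sysfsDrmDir = "/sys/class/drm"
	devfsDriDir = "/dev/dri"
)

// devicePaths returns the sysfs and devfs directories to scan,
// relocated under prefix. An empty prefix yields the standard paths.
func devicePaths(prefix string) (sysfsDir, devfsDir string) {
	return filepath.Join(prefix, sysfsDrmDir), filepath.Join(prefix, devfsDriDir)
}
```

With an empty prefix the plugin would behave as before, so the option only changes behavior when it is set explicitly for a fake-device deployment.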
Commit a02335f
-
More detailed log for number of found GPU devices / resource types
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Commit 460fce1
-
Add code for generating fake GPU sysfs + devfs files
Based on an input JSON file.
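As a rough illustration of generating fake sysfs content from JSON, the sketch below covers only the sysfs side. The `devCount` field, the `fakeConfig` type, and the function names are hypothetical, since the real generator's input schema is not shown here. It also assumes that a `device/vendor` file holding Intel's PCI vendor ID (0x8086) is enough for a card directory to look like an Intel GPU.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// fakeConfig is a hypothetical input schema for this sketch.
type fakeConfig struct {
	DevCount int `json:"devCount"`
}

// generateFakeSysfs creates <root>/sys/class/drm/cardN/device/vendor
// for each requested device and returns the number of cards created.
func generateFakeSysfs(root string, rawJSON []byte) (int, error) {
	var cfg fakeConfig
	if err := json.Unmarshal(rawJSON, &cfg); err != nil {
		return 0, fmt.Errorf("parsing config: %w", err)
	}
	if cfg.DevCount <= 0 {
		return 0, fmt.Errorf("devCount must be positive, got %d", cfg.DevCount)
	}
	for i := 0; i < cfg.DevCount; i++ {
		devDir := filepath.Join(root, "sys/class/drm", fmt.Sprintf("card%d", i), "device")
		if err := os.MkdirAll(devDir, 0o750); err != nil {
			return i, fmt.Errorf("creating %s: %w", devDir, err)
		}
		// 0x8086 is Intel's PCI vendor ID.
		vendor := filepath.Join(devDir, "vendor")
		if err := os.WriteFile(vendor, []byte("0x8086\n"), 0o640); err != nil {
			return i, fmt.Errorf("writing %s: %w", vendor, err)
		}
	}
	return cfg.DevCount, nil
}

// demoGenerate runs the generator in a temporary directory and
// returns how many card directories ended up on disk.
func demoGenerate(rawJSON string) (int, error) {
	root, err := os.MkdirTemp("", "fake-gpu-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)
	if _, err := generateFakeSysfs(root, []byte(rawJSON)); err != nil {
		return 0, err
	}
	entries, err := os.ReadDir(filepath.Join(root, "sys/class/drm"))
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}
```

Calling `demoGenerate` with `{"devCount": 8}` would produce eight card directories under a throwaway root, which mirrors the idea of an eight-device configuration.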
Commit f49ca25
-
Remove pre-existing fake sysfs & devfs content + more info
The fake devfs directory is mounted from the host so that the OCI runtime can also "mount" device files into workloads requesting fake devices. This means those files can persist beyond the fake GPU plugin's lifetime, so earlier files need to be removed, as they may not match.
Also, a DaemonSet restarts failing init containers, so errors about directories generated on a previous generator run would prevent getting logs of the real error from the first generator run.
Commit 7ecd8a3
-
Container runtime requires device files to be real devices
Represent fake GPU devices with null devices: https://www.kernel.org/doc/Documentation/admin-guide/devices.txt
The real devfs check also needed changing, and the removal warnings were simplified, as there is always just one entry.
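Creating a device node that shares the null device's numbers might look like the sketch below. It assumes Linux, where `/dev/null` is character device 1:3 per the linked devices.txt. The function and constant names are hypothetical, and the simplified `mkdev` packing is only valid for small major and minor numbers.

```go
package main

import (
	"fmt"
	"syscall"
)

// Device numbers of /dev/null according to the kernel's devices.txt.
const (
	devNullMajor = 1
	devNullMinor = 3
)

// mkdev packs major/minor numbers the way Linux does for small
// values (major < 4096, minor < 256). Larger values need the full
// encoding, which this sketch deliberately omits.
func mkdev(major, minor uint32) int {
	return int(major<<8 | minor)
}

// createFakeDevice creates a character device node at path with the
// same device numbers as /dev/null. This normally needs root
// privileges (CAP_MKNOD), so failure as a normal user is expected.
func createFakeDevice(path string) error {
	mode := uint32(syscall.S_IFCHR | 0o666)
	if err := syscall.Mknod(path, mode, mkdev(devNullMajor, devNullMinor)); err != nil {
		return fmt.Errorf("mknod %s (insufficient privileges?): %w", path, err)
	}
	return nil
}
```

Because the node is a real character device, a container runtime that validates device files should accept it, while reads and writes behave like `/dev/null`.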
Commit ca83c87
-
Apply golang-ci-lint suggestions to device generator
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Commit 1e0e04d
-
Use normal GPU plugin deployment pod spec as base
With latest devices release.
Commit e540306
-
Add 8x DG1 configMap for fake GPU device generator
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Commit d5ff613
-
Switch Intel plugin pod to use faked devices
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Commit 269788e
-
Apply golang-ci-lint suggestions to GPU plugin
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Commit b9038fa
Commits on Aug 22, 2022
-
Trivialize GPU plugin -prefix option handling
As suggested by Ukri. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Commit ee1ac15
-
Better error checks+logs for MkNod(), ReadDir() and RemoveAll()
Give more detailed logging for the most likely failure, as MkNod() device node creation can fail when run as a normal user. Additional error checking in the new directory removal helper function addresses Ukri's review comments. There is now an error if the to-be-removed fake sysfs has more content than expected (earlier, such a check existed only for fake devfs content).
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
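A removal helper with the kind of "more content than expected" check described above could be sketched as follows. The helper name, its signature, and the allow-list approach are assumptions for illustration, not the PR's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeFakeDir deletes dir only if every top-level entry is listed
// in expected. A missing dir is treated as already removed.
func removeFakeDir(dir string, expected ...string) error {
	entries, err := os.ReadDir(dir)
	if errors.Is(err, fs.ErrNotExist) {
		return nil
	}
	if err != nil {
		return fmt.Errorf("reading %s: %w", dir, err)
	}
	allowed := make(map[string]bool, len(expected))
	for _, name := range expected {
		allowed[name] = true
	}
	for _, e := range entries {
		if !allowed[e.Name()] {
			return fmt.Errorf("refusing to remove %s: unexpected entry %q", dir, e.Name())
		}
	}
	if err := os.RemoveAll(dir); err != nil {
		return fmt.Errorf("removing %s: %w", dir, err)
	}
	return nil
}

// demoRemove builds a scratch tree and reports whether the expected
// directory was removed and whether the unexpected one was refused.
func demoRemove() (bool, bool, error) {
	root, err := os.MkdirTemp("", "fake-sysfs-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	good := filepath.Join(root, "good")
	bad := filepath.Join(root, "bad")
	dirs := []string{
		filepath.Join(good, "class"),
		filepath.Join(bad, "class"),
		filepath.Join(bad, "stray"),
	}
	for _, d := range dirs {
		if err := os.MkdirAll(d, 0o750); err != nil {
			return false, false, err
		}
	}
	if err := removeFakeDir(good, "class"); err != nil {
		return false, false, err
	}
	_, statErr := os.Stat(good)
	removed := errors.Is(statErr, fs.ErrNotExist)
	refused := removeFakeDir(bad, "class") != nil
	return removed, refused, nil
}
```

Refusing to delete a directory with unexpected entries guards against a misconfigured path pointing at real host content, at the cost of requiring manual cleanup when the layout changes.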
Commit dafd079
-
Noticed by Tuomas. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Commit 968e294