Skip to content
This repository

Pilot updates #32

Merged
merged 6 commits into from almost 3 years ago

2 participants

Patrick Armstrong Pierre Riteau
Patrick Armstrong
Collaborator

This is a set of patches to add three things to Nimbus Pilot:

  • Add an option to disable pilot's memory bubbling. The Xen Best practices wiki page recommends you disable bubbling in a production environment, and we've had some reliability problems with it. So now we have an option to disable it.

  • Allow the number of cores requested for a VM to be forwarded to the PBS ppn request. This overrides the previous behaviour where an admin would specify the ppn in the pilot.conf file, if you wanted to ensure that only one VM ran per node. You can still get this behaviour with a non-zro ppn in the config file. If you set ppn to zero, you'll get this new behaviour where ppn is the number of cores per VM.

  • Configurable PBS accounting strings. Previously, this would always send the user's cert DN. Now you can send either the nimbus display name, the user's group name, or the user's DN. This is configured in pilot.conf

Feel free to ask me any questions or tell me to do something differently. I'm not very comfortable with the Spring stuff, so if there's a better way to implement what I have let me know.

I could also split this into three pull requests if that would be easier.

Pierre Riteau
s/reccommends/recommends/
Pierre Riteau
Owner

I don't know much about pilot, but I didn't see anything suspicious in the commit series.
Good to know about the bubbling issues by the way.

Patrick Armstrong oldpatricka merged commit 44aae3e into from May 11, 2011
Patrick Armstrong oldpatricka closed this May 11, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
This page is out of date. Refresh to see the latest.
6  authzdb/src/org/nimbus/authz/UserAlias.java
@@ -50,4 +50,10 @@ public int getAliasType() {
50 50
     public String getAliasTypeData() {
51 51
         return aliasTypeData;
52 52
     }
  53
+
  54
+    public String toString() {
  55
+
  56
+        return "userID: '" + userId + "' aliasName: '" + aliasName + "' friendlyName: '" + friendlyName
  57
+                + "' aliasType: '" + aliasType + "' aliasTypeData: '" + aliasTypeData + "'";
  58
+    }
53 59
 }
94  pilot/workspacepilot.py
@@ -35,27 +35,27 @@
35 35
 # result of "generate-index.py < workspacepilot.py"
36 36
 INDEX = """
37 37
       I. Globals                                (lines 10-69)
38  
-     II. Embedded, default configuration file   (lines 71-191)
39  
-    III. Imports                                (lines 193-220)
40  
-     IV. Exceptions                             (lines 222-348)
41  
-      V. Logging                                (lines 350-569)
42  
-     VI. Signal handlers                        (lines 571-673)
43  
-    VII. Timer                                  (lines 675-700)
44  
-   VIII. Path/system utilities                  (lines 702-1073)
45  
-     IX. Action                                 (lines 1075-1126)
46  
-      X. ReserveSlot(Action)                    (lines 1128-1732)
47  
-     XI. KillNine(ReserveSlot)                  (lines 1734-1812)
48  
-    XII. ListenerThread(Thread)                 (lines 1814-1919)
49  
-   XIII. StateChangeListener                    (lines 1921-2147)
50  
-    XIV. XenActions(StateChangeListener)        (lines 2149-2877)
51  
-     XV. FakeXenActions(XenActions)             (lines 2879-2993)
52  
-    XVI. XenKillNine(XenActions)                (lines 2995-3126)
53  
-   XVII. VWSNotifications(StateChangeListener)  (lines 3128-3743)
54  
-  XVIII. Configuration objects                  (lines 3745-3981)
55  
-    XIX. Convert configurations                 (lines 3983-4245)
56  
-     XX. External configuration                 (lines 4247-4317)
57  
-    XXI. Commandline arguments                  (lines 4319-4534)
58  
-   XXII. Standalone entry and exit              (lines 4536-4729)
  38
+     II. Embedded, default configuration file   (lines 71-205)
  39
+    III. Imports                                (lines 207-234)
  40
+     IV. Exceptions                             (lines 236-362)
  41
+      V. Logging                                (lines 364-583)
  42
+     VI. Signal handlers                        (lines 585-687)
  43
+    VII. Timer                                  (lines 689-714)
  44
+   VIII. Path/system utilities                  (lines 716-1087)
  45
+     IX. Action                                 (lines 1089-1140)
  46
+      X. ReserveSlot(Action)                    (lines 1142-1746)
  47
+     XI. KillNine(ReserveSlot)                  (lines 1748-1826)
  48
+    XII. ListenerThread(Thread)                 (lines 1828-1933)
  49
+   XIII. StateChangeListener                    (lines 1935-2161)
  50
+    XIV. XenActions(StateChangeListener)        (lines 2163-2898)
  51
+     XV. FakeXenActions(XenActions)             (lines 2900-3014)
  52
+    XVI. XenKillNine(XenActions)                (lines 3016-3153)
  53
+   XVII. VWSNotifications(StateChangeListener)  (lines 3155-3770)
  54
+  XVIII. Configuration objects                  (lines 3772-4011)
  55
+    XIX. Convert configurations                 (lines 4013-4285)
  56
+     XX. External configuration                 (lines 4287-4357)
  57
+    XXI. Commandline arguments                  (lines 4359-4574)
  58
+   XXII. Standalone entry and exit              (lines 4576-4769)
59 59
 """
60 60
 
61 61
 RESTART_XEND_SECONDS_DEFAULT = 2.0
@@ -146,6 +146,20 @@
146 146
 # If unconfigured, default is 2.0 seconds
147 147
 #restart_xend_secs: 0.3
148 148
 
  149
+
  150
+# This option determines whether pilot will attempt to bubble down memory for
  151
+# VMs. The Xen Best Practices wiki page at
  152
+# http://wiki.xensource.com/xenwiki/XenBestPractices recommends that you set a
  153
+# fixed amount of memory for dom0 because:
  154
+#
  155
+#   1. (dom0) Linux kernel calculates various network related parameters based
  156
+#      on the boot time amount of memory.
  157
+#   2. Linux needs memory to store the memory metadata (per page info structures), 
  158
+#      and this allocation is also based on the boot time amount of memory.
  159
+#
  160
+# Anything that is not 'yes' is taken as a no, and yes is the default
  161
+#bubble_mem: no
  162
+
149 163
 [systempaths]
150 164
 
151 165
 # This is only necessary if using SSH as a backup notification mechanism
@@ -2295,8 +2309,11 @@ def reserving(self, timeout=None):
2295 2309
         
2296 2310
         if not self.initialized:
2297 2311
             raise ProgrammingError("not initialized")
2298  
-            
2299  
-            
  2312
+
  2313
+        if not self.conf.bubble_mem:
  2314
+            log.debug("Memory bubbling disabled. No reservation neccessary.")
  2315
+            return
  2316
+
2300 2317
         memory = self.conf.memory
2301 2318
         if self.common.trace:
2302 2319
             log.debug("XenActions.reserving(), reserving %d MB" % memory)
@@ -2355,6 +2372,10 @@ def unreserving(self, timeout=None):
2355 2372
         if self.common.trace:
2356 2373
             log.debug("XenActions.unreserving(), unreserving %d MB" % memory)
2357 2374
 
  2375
+        if not self.conf.bubble_mem:
  2376
+            log.debug("Memory bubbling disabled. No unreservation neccessary.")
  2377
+            return
  2378
+
2358 2379
         # Be sure to unlock for every exit point.
2359 2380
         lockhandle = _get_lockhandle(self.conf.lockfile)
2360 2381
         _lock(lockhandle)
@@ -2450,7 +2471,7 @@ def unreserving(self, timeout=None):
2450 2471
                 raise UnexpectedError(errmsg)
2451 2472
 
2452 2473
         _unlock(lockhandle)
2453  
-        
  2474
+
2454 2475
         if raiseme:
2455 2476
             raise raiseme
2456 2477
 
@@ -3069,6 +3090,7 @@ def unreserving(self, timeout=None):
3069 3090
         else:
3070 3091
             log.info("XenKillNine unreserving, releasing %d MB" % memory)
3071 3092
 
  3093
+
3072 3094
         curmem = self.currentAllocation_MB()
3073 3095
         
3074 3096
         log.info("current memory MB = %d" % curmem)
@@ -3085,6 +3107,10 @@ def unreserving(self, timeout=None):
3085 3107
         killedVMs = self.killAll()
3086 3108
         if killedVMs:
3087 3109
             raiseme = KilledVMs(killedVMs)
  3110
+
  3111
+        if not self.conf.bubble_mem:
  3112
+            log.debug("Memory bubbling disabled. No return of memory neccessary.")
  3113
+            return
3088 3114
         
3089 3115
         if memory == XenActionsConf.BESTEFFORT:
3090 3116
             targetmem = freemem + curmem
@@ -3120,6 +3146,7 @@ def unreserving(self, timeout=None):
3120 3146
             else:
3121 3147
                 raise UnexpectedError(errmsg)
3122 3148
 
  3149
+
3123 3150
         if raiseme:
3124 3151
             raise raiseme
3125 3152
     
@@ -3839,7 +3866,7 @@ class XenActionsConf:
3839 3866
     
3840 3867
     BESTEFFORT = "BESTEFFORT"
3841 3868
 
3842  
-    def __init__(self, xmpath, xendpath, xmsudo, sudopath, memory, minmem, xend_secs, lockfile):
  3869
+    def __init__(self, xmpath, xendpath, xmsudo, sudopath, memory, minmem, xend_secs, lockfile, bubble_mem):
3843 3870
         """Set the configurations.
3844 3871
 
3845 3872
         Required parameters:
@@ -3861,6 +3888,8 @@ def __init__(self, xmpath, xendpath, xmsudo, sudopath, memory, minmem, xend_secs
3861 3888
         
3862 3889
         * xend_secs -- If xendpath is configured, amount of time to
3863 3890
         wait after a restart before checking if it booted.
  3891
+
  3892
+        * bubble_mem -- If set to False, pilot will not attempt memory bubbling
3864 3893
         
3865 3894
         Raise InvalidConfig if there is a problem with parameters.
3866 3895
 
@@ -3871,6 +3900,7 @@ def __init__(self, xmpath, xendpath, xmsudo, sudopath, memory, minmem, xend_secs
3871 3900
         self.sudopath = sudopath
3872 3901
         self.xendpath = xendpath
3873 3902
         self.lockfile = lockfile
  3903
+        self.bubble_mem = bubble_mem
3874 3904
         log.debug("Xenactions lockfile: %s" % lockfile)
3875 3905
 
3876 3906
         if memory == None:
@@ -4116,9 +4146,19 @@ def getXenActionsConf(opts, config):
4116 4146
         except:
4117 4147
             msg = "restart_xend_secs ('%s') is not a number" % xend_secs
4118 4148
             raise InvalidConfig(msg)
  4149
+
  4150
+    bubble_mem = True
  4151
+    try:
  4152
+        bubble_mem_val = config.get("xen", "bubble_mem")
  4153
+        if bubble_mem_val:
  4154
+            if bubble_mem_val.lower() == 'no':
  4155
+                bubble_mem = False
  4156
+    except Exception, e:
  4157
+        log.debug("No bubble_mem attribute set, assuming True ")
  4158
+    log.info("Bubbling set to false!")
4119 4159
             
4120 4160
     if not opts.killnine:
4121  
-        return XenActionsConf(xm, xend, xmsudo, sudo, opts.memory, minmem, xend_secs, lockfile)
  4161
+        return XenActionsConf(xm, xend, xmsudo, sudo, opts.memory, minmem, xend_secs, lockfile, bubble_mem)
4122 4162
     else:
4123 4163
         alt = "going to kill all guest VMs (if they exist) and give dom0 "
4124 4164
         alt += "their memory (which may or may not be the maximum available) "
@@ -4146,7 +4186,7 @@ def getXenActionsConf(opts, config):
4146 4186
                 log.info(msg + ", %s" % alt)
4147 4187
                 dom0mem = XenActionsConf.BESTEFFORT
4148 4188
                 
4149  
-        return XenActionsConf(xm, xend, xmsudo, sudo, dom0mem, minmem, xend_secs, lockfile)
  4189
+        return XenActionsConf(xm, xend, xmsudo, sudo, dom0mem, minmem, xend_secs, lockfile, bubble_mem)
4150 4190
 
4151 4191
 def getVWSNotificationsConf(opts, config):
4152 4192
     """Return populated VWSNotificationsConf object or raise InvalidConfig
23  service/service/java/source/etc/workspace-service/other/resource-locator-pilot.xml
@@ -6,7 +6,27 @@
6 6
                            http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">
7 7
 
8 8
     <import resource="main.conflocator.xml" />
  9
+    <import resource="authz-callout-ACTIVE.xml" />
9 10
     
  11
+    <bean id="other.AuthzDataSource"
  12
+        class="org.apache.commons.dbcp.BasicDataSource">
  13
+        <property name="driverClassName" value="org.sqlite.JDBC" />
  14
+        <property name="maxActive" value="10" />
  15
+        <property name="maxIdle" value="4" />
  16
+        <property name="maxWait" value="2000" />
  17
+        <property name="poolPreparedStatements" value="true" />
  18
+
  19
+        <property name="url"
  20
+            value="jdbc:sqlite://$CUMULUS{cumulus.authz.db}" />
  21
+        <property name="username" value="nimbus"/>
  22
+        <property name="password" value="nimbus"/>
  23
+    </bean>
  24
+
  25
+
  26
+    <bean id="other.authzDBAdapter" class="org.nimbus.authz.AuthzDBAdapter">
  27
+        <constructor-arg ref="other.AuthzDataSource"/>
  28
+    </bean>
  29
+
10 30
     <bean id="nimbus-rm.scheduler.SlotManagement"
11 31
           class="org.globus.workspace.scheduler.defaults.pilot.PilotSlotManagement"
12 32
           init-method="validate">
@@ -100,6 +120,7 @@
100 120
         <property name="extraProperties" value="$PILOT{pbs.extra.properties}" />
101 121
         <property name="destination" value="$PILOT{pbs.destination}" />
102 122
         <property name="grace" value="$PILOT{pbs.grace}" />
  123
+        <property name="accounting" value="$PILOT{pbs.accounting.type}" />
103 124
         
104 125
 
105 126
         <!-- Needed workspace service modules -->
@@ -107,6 +128,8 @@
107 128
         <constructor-arg ref="nimbus-rm.loglevels" />
108 129
         <constructor-arg ref="other.MainDataSource" />
109 130
         <constructor-arg ref="other.timerManager" />
  131
+        <constructor-arg ref="other.authzDBAdapter" />
  132
+        <constructor-arg ref="nimbus-rm.service.binding.AuthorizationCallout" />
110 133
 
111 134
         <!-- set after object creation time to avoid circular dep with home -->
112 135
         <property name="instHome" ref="nimbus-rm.home.instance" />
30  service/service/java/source/etc/workspace-service/pilot.conf
@@ -45,7 +45,13 @@ contact.socket=1.2.3.4:41999
45 45
 #
46 46
 ################################################################################
47 47
 
48  
-# The path to the pilot program on the VMM nodes:
  48
+# The path to the pilot program on the VMM nodes.
  49
+#
  50
+# If you would like to use a configuration file, rather than the embedded
  51
+# configuration, add the -p and path to your configuration file to your path.
  52
+# For example:
  53
+#
  54
+# pilot.path=/opt/workspacepilot.py -p /etc/workspacepilot.conf
49 55
 
50 56
 pilot.path=/opt/workspacepilot.py
51 57
 
@@ -79,12 +85,14 @@ pbs.submit.path=qsub
79 85
 pbs.delete.path=qdel
80 86
 
81 87
 
82  
-# Processors per node, right now this should be set to be the maximum processors
83  
-# on each cluster node.  If it set too high, pilot job submissions will fail.
84  
-# If it is set too low, the pilot may end up not being the only LRM job on the
85  
-# node at a time and that is unpredictable/unsupported right now.
86  
-
87  
-pbs.ppn=2
  88
+# Processors per node. If this is set to 0, your pilot job will request
  89
+# as many processors as are requested for a VM. For example, if a user requests
  90
+# a 2 core VM, ppn will be set to 2.
  91
+#
  92
+# On some installations, you may wish to hardcode this to a specific value
  93
+# to ensure that each pilot job reserves a whole node for a VM. In this case, 
  94
+# choose a non-zero value.
  95
+pbs.ppn=0
88 96
 
89 97
 
90 98
 # If the pilot job should be submitted to a special queue/server, configure
@@ -110,6 +118,14 @@ pbs.grace=8
110 118
 pbs.extra.properties=
111 119
 
112 120
 
  121
+# Optional, if you would like to append an accounting string to your qsub
  122
+# invokation, you can use either the user's certificate DN, the user's display
  123
+# name as shown by nimbus-list-users, or the user's authz DB accounting group. 
  124
+#
  125
+# You can select these with 'dn', 'displayname', or 'group'
  126
+
  127
+pbs.accounting.type=
  128
+
113 129
 # Optional, if configured this is prepended to the pilot exe invocation if
114 130
 # nodes needed are greater than one.  Torque uses pbsdsh for this.
115 131
 
2  service/service/java/source/src/org/globus/workspace/cmdutils/TorqueUtil.java
@@ -91,7 +91,7 @@ public ArrayList constructQsub(String destination,
91 91
             throw new WorkspaceException(err);
92 92
         }
93 93
 
94  
-        if (ppn < 1) {
  94
+        if (ppn < 0) {
95 95
             final String err = "invalid processors per node " +
96 96
                     "request: " + Integer.toString(ppn);
97 97
             throw new WorkspaceException(err);
3  service/service/java/source/src/org/globus/workspace/creation/defaults/CreationManagerImpl.java
@@ -850,6 +850,7 @@ protected Reservation scheduleImpl(VirtualMachine vm,
850 850
         }
851 851
 
852 852
         final int memory = dep.getIndividualPhysicalMemory();
  853
+        final int cores = dep.getIndividualCPUCount();
853 854
         final int duration = dep.getMinDuration();
854 855
 
855 856
         // list of associations should be in the DB, perpetuation of
@@ -860,7 +861,7 @@ protected Reservation scheduleImpl(VirtualMachine vm,
860 861
             assocs = assocStr.split(",");
861 862
         }
862 863
 
863  
-        return this.scheduler.schedule(memory, duration, assocs, numNodes,
  864
+        return this.scheduler.schedule(memory, cores, duration, assocs, numNodes,
864 865
                                        groupid, coschedid, vm.isPreemptable(), callerID);
865 866
     }
866 867
 
15  service/service/java/source/src/org/globus/workspace/groupauthz/GroupAuthz.java
@@ -370,6 +370,21 @@ public Integer isRootPartitionUnpropTargetPermitted(URI target,
370 370
         throw new AuthorizationException(NO_POLICIES_MESSAGE);
371 371
     }
372 372
 
  373
+    public String getGroupName(String caller) {
  374
+
  375
+
  376
+        for (int i = 0; i < this.groups.length; i++) {
  377
+
  378
+            final GroupRights rights = getRights(caller, this.groups[i]);
  379
+            // only first inclusion of DN is considered
  380
+            if (rights != null) {
  381
+                return this.groups[i].getName();
  382
+            }
  383
+        }
  384
+
  385
+        return null;
  386
+    }
  387
+
373 388
 
374 389
     // -------------------------------------------------------------------------
375 390
     // FOR CLOUD AUTOCONFIG
2  service/service/java/source/src/org/globus/workspace/scheduler/Scheduler.java
@@ -38,6 +38,7 @@
38 38
      * @see #proceedCoschedule for handling separate requests together 
39 39
      *
40 40
      * @param memory MB needed
  41
+     * @param CPU cores needed
41 42
      * @param duration seconds needed
42 43
      * @param neededAssociations networks needed
43 44
      * @param numNodes number needed
@@ -49,6 +50,7 @@
49 50
      * @throws SchedulingException internal problem
50 51
      */
51 52
     public Reservation schedule(int memory,
  53
+                                int cores,
52 54
                                 int duration,
53 55
                                 String[] neededAssociations,
54 56
                                 int numNodes,
3  service/service/java/source/src/org/globus/workspace/scheduler/defaults/DefaultSchedulerAdapter.java
@@ -224,6 +224,7 @@ public long getSweeperDelay() {
224 224
     }
225 225
 
226 226
     public Reservation schedule(int memory,
  227
+                                int cores,
227 228
                                 int duration,
228 229
                                 String[] neededAssociations,
229 230
                                 int numNodes,
@@ -263,7 +264,7 @@ public Reservation schedule(int memory,
263 264
         this.creationPending.pending(ids);
264 265
 
265 266
         final NodeRequest req =
266  
-                new NodeRequest(ids, memory, duration, assocs, groupid, creatorDN);
  267
+                new NodeRequest(ids, memory, cores, duration, assocs, groupid, creatorDN);
267 268
 
268 269
         try {
269 270
 
15  service/service/java/source/src/org/globus/workspace/scheduler/defaults/NodeRequest.java
@@ -19,6 +19,7 @@
19 19
 public class NodeRequest {
20 20
 
21 21
     private int memory; // MBs
  22
+    private int cores;
22 23
     private int duration; // seconds
23 24
 
24 25
     private int[] ids = null;
@@ -41,12 +42,14 @@ public NodeRequest(int memory,
41 42
 
42 43
     public NodeRequest(int[] ids,
43 44
                        int memory,
  45
+                       int cores,
44 46
                        int duration,
45 47
                        String[] neededAssociations,
46 48
                        String groupid,
47 49
                        String creatorDN) {
48 50
         this(memory, duration);
49 51
 
  52
+        this.cores = cores;
50 53
         this.ids = ids;
51 54
         this.neededAssociations = neededAssociations;
52 55
         this.groupid = groupid;
@@ -80,6 +83,18 @@ public int getNumNodes() {
80 83
         return this.ids.length;
81 84
     }
82 85
 
  86
+    public int getCores() {
  87
+        // Java sets ints to 0 if they're never initialized
  88
+        if (this.cores == 0) {
  89
+            return 1;
  90
+        }
  91
+        return this.cores;
  92
+    }
  93
+
  94
+    public void setCores(int cores) {
  95
+        this.cores = cores;
  96
+    }
  97
+
83 98
     public int getMemory() {
84 99
         return this.memory;
85 100
     }
125  service/service/java/source/src/org/globus/workspace/scheduler/defaults/pilot/PilotSlotManagement.java
@@ -20,12 +20,16 @@
20 20
 import edu.emory.mathcs.backport.java.util.concurrent.ExecutorService;
21 21
 import org.apache.commons.logging.Log;
22 22
 import org.apache.commons.logging.LogFactory;
  23
+import org.globus.workspace.groupauthz.GroupAuthz;
23 24
 import org.globus.workspace.scheduler.NodeExistsException;
24 25
 import org.globus.workspace.scheduler.NodeInUseException;
25 26
 import org.globus.workspace.scheduler.NodeManagement;
26 27
 import org.globus.workspace.scheduler.NodeManagementDisabled;
27 28
 import org.globus.workspace.scheduler.NodeNotFoundException;
28 29
 import org.globus.workspace.scheduler.defaults.ResourcepoolEntry;
  30
+import org.globus.workspace.service.binding.authorization.CreationAuthorizationCallout;
  31
+import org.nimbus.authz.AuthzDBAdapter;
  32
+import org.nimbus.authz.UserAlias;
29 33
 import org.nimbustools.api.services.rm.DoesNotExistException;
30 34
 import org.nimbustools.api.services.rm.ResourceRequestDeniedException;
31 35
 import org.nimbustools.api.services.rm.ManageException;
@@ -118,6 +122,8 @@
118 122
 
119 123
     private TorqueUtil torque;
120 124
 
  125
+    private AuthzDBAdapter authzDBAdapter;
  126
+    private CreationAuthorizationCallout authzCallout;
121 127
 
122 128
     // set from config
123 129
     private String contactPort;
@@ -138,6 +144,7 @@
138 144
     private String destination = null; // only one for now
139 145
     private String extraProperties = null;
140 146
     private String multiJobPrefix = null;
  147
+    private String accounting;
141 148
 
142 149
     // -------------------------------------------------------------------------
143 150
     // CONSTRUCTOR
@@ -146,7 +153,9 @@
146 153
     public PilotSlotManagement(WorkspaceHome home,
147 154
                                Lager lager,
148 155
                                DataSource dataSource,
149  
-                               TimerManager timerManager) {
  156
+                               TimerManager timerManager,
  157
+                               AuthzDBAdapter authz,
  158
+                               CreationAuthorizationCallout authzCall) {
150 159
 
151 160
         if (home == null) {
152 161
             throw new IllegalArgumentException("home may not be null");
@@ -168,6 +177,9 @@ public PilotSlotManagement(WorkspaceHome home,
168 177
             throw new IllegalArgumentException("lager may not be null");
169 178
         }
170 179
         this.lager = lager;
  180
+
  181
+        this.authzDBAdapter = authz;
  182
+        this.authzCallout = authzCall;
171 183
     }
172 184
 
173 185
 
@@ -268,6 +280,20 @@ public void setLogdirResource(Resource logdirResource) throws IOException {
268 280
         this.logdirPath = logdirResource.getFile().getAbsolutePath();
269 281
     }
270 282
 
  283
+    public AuthzDBAdapter getAuthzDBAdapter() {
  284
+        return authzDBAdapter;
  285
+    }
  286
+
  287
+    public void setAuthzDBAdapter(AuthzDBAdapter authzDBAdapter) {
  288
+        this.authzDBAdapter = authzDBAdapter;
  289
+    }
  290
+
  291
+    public void setAccounting(String accounting) {
  292
+        if (accounting != null && accounting.trim().length() != 0) {
  293
+            this.accounting = accounting;
  294
+        }
  295
+    }
  296
+
271 297
     // -------------------------------------------------------------------------
272 298
     // IoC INIT METHOD
273 299
     // -------------------------------------------------------------------------
@@ -369,8 +395,8 @@ public synchronized void validate() throws Exception {
369 395
                     "Is the configuration present?");
370 396
         }
371 397
 
372  
-        if (this.ppn < 1) {
373  
-            throw new Exception("processors per node (ppn) is less than one, " +
  398
+        if (this.ppn < 0) {
  399
+            throw new Exception("processors per node (ppn) is less than zero, " +
374 400
                     "invalid.  Is the configuration present?");
375 401
         }
376 402
 
@@ -492,6 +518,7 @@ public Reservation reserveSpace(NodeRequest request, boolean preemptable)
492 518
 
493 519
         this.reserveSpace(request.getIds(),
494 520
                           request.getMemory(),
  521
+                          request.getCores(),
495 522
                           request.getDuration(),
496 523
                           request.getGroupid(),
497 524
                           request.getCreatorDN());
@@ -520,6 +547,7 @@ public Reservation reserveCoscheduledSpace(NodeRequest[] requests,
520 547
         // capacity vs. mapping and we will get more sophisticated here later)
521 548
 
522 549
         int highestMemory = 0;
  550
+        int highestCores = 0;
523 551
         int highestDuration = 0;
524 552
 
525 553
         final ArrayList idInts = new ArrayList(64);
@@ -533,6 +561,12 @@ public Reservation reserveCoscheduledSpace(NodeRequest[] requests,
533 561
                 highestMemory = thisMemory;
534 562
             }
535 563
 
  564
+            final int thisCores = requests[i].getCores();
  565
+
  566
+            if (highestCores < thisCores) {
  567
+                highestCores = thisCores;
  568
+            }
  569
+
536 570
             final int thisDuration = requests[i].getDuration();
537 571
 
538 572
             if (highestDuration < thisDuration) {
@@ -563,7 +597,7 @@ public Reservation reserveCoscheduledSpace(NodeRequest[] requests,
563 597
         // Assume that the creator's DN is the same for each node
564 598
         final String creatorDN = requests[0].getCreatorDN();
565 599
 
566  
-        this.reserveSpace(all_ids, highestMemory, highestDuration, coschedid, creatorDN);
  600
+        this.reserveSpace(all_ids, highestMemory, highestCores, highestDuration, coschedid, creatorDN);
567 601
         return new Reservation(all_ids, null, all_durations);
568 602
     }
569 603
 
@@ -579,6 +613,7 @@ public Reservation reserveCoscheduledSpace(NodeRequest[] requests,
579 613
      *        than one VM is mapped to the same node, the returned node
580 614
      *        assignment array will include duplicates.
581 615
      * @param memory megabytes needed
  616
+     * @param requestedCores needed
582 617
      * @param duration seconds needed
583 618
      * @param uuid group ID, can not be null if vmids is length > 1
584 619
      * @param creatorDN the DN of the user who requested creation of the VM
@@ -587,6 +622,7 @@ public Reservation reserveCoscheduledSpace(NodeRequest[] requests,
587 622
      */
588 623
     private void reserveSpace(final int[] vmids,
589 624
                               final int memory,
  625
+                              final int requestedCores,
590 626
                               final int duration,
591 627
                               final String uuid,
592 628
                               final String creatorDN)
@@ -604,6 +640,16 @@ private void reserveSpace(final int[] vmids,
604 640
             throw new ResourceRequestDeniedException(msg);
605 641
         }
606 642
 
  643
+        // When there is no core request, the default is -1,
  644
+        // we would actually like one core.
  645
+        int cores;
  646
+        if (requestedCores <= 0) {
  647
+            cores = 1;
  648
+        }
  649
+        else {
  650
+            cores = requestedCores;
  651
+        }
  652
+
607 653
         if (vmids.length > 1 && uuid == null) {
608 654
             logger.error("cannot make group space request without group ID");
609 655
             throw new ResourceRequestDeniedException("internal " +
@@ -628,13 +674,14 @@ private void reserveSpace(final int[] vmids,
628 674
             }
629 675
         }
630 676
 
631  
-        this.reserveSpaceImpl(memory, duration, slotid, vmids, creatorDN);
  677
+        this.reserveSpaceImpl(memory, cores, duration, slotid, vmids, creatorDN);
632 678
 
633 679
         // pilot reports hostname when it starts running, not returning an
634 680
         // exception to signal successful best effort pending slot
635 681
     }
636 682
 
637 683
     private void reserveSpaceImpl(final int memory,
  684
+                                  final int cores,
638 685
                                   final int duration,
639 686
                                   final String uuid,
640 687
                                   final int[] vmids,
@@ -646,20 +693,34 @@ private void reserveSpaceImpl(final int memory,
646 693
         final int dur = duration + this.padding;
647 694
         final long wallTime = duration + this.padding;
648 695
 
  696
+
  697
+        // If the pbs.ppn option in pilot.conf is 0, we should send
  698
+        // the number of CPU cores used by the VM as the ppn string,
  699
+        // otherwise, use the defined ppn value
  700
+        int ppnRequested;
  701
+        if (this.ppn == 0) {
  702
+            ppnRequested = cores;
  703
+        }
  704
+        else {
  705
+            ppnRequested = this.ppn;
  706
+        }
  707
+
  708
+        String account = getAccountString(creatorDN, this.accounting);
  709
+
649 710
         // we know it's torque for now, no casing
650 711
         final ArrayList torquecmd;
651 712
         try {
652 713
             torquecmd = this.torque.constructQsub(this.destination,
653 714
                                                   memory,
654 715
                                                   vmids.length,
655  
-                                                  this.ppn,
  716
+                                                  ppnRequested,
656 717
                                                   wallTime,
657 718
                                                   this.extraProperties,
658 719
                                                   outputFile,
659 720
                                                   false,
660 721
                                                   false,
661  
-                                                  creatorDN);
662  
-            
  722
+                                                  account);
  723
+
663 724
         } catch (WorkspaceException e) {
664 725
             final String msg = "Problem with Torque argument construction";
665 726
             if (logger.isDebugEnabled()) {
@@ -1670,4 +1731,52 @@ public boolean removeNode(String hostname)
1670 1731
     public String getVMMReport() {
1671 1732
         return "No VMM report when pilot is configured.";
1672 1733
     }
  1734
+
  1735
+    public String getAccountString(String userDN, String accountingType) {
  1736
+
  1737
+        String accountString = null;
  1738
+        if (accountingType == null) {
  1739
+            accountString = null;
  1740
+        }
  1741
+        else if (accountingType.equalsIgnoreCase("dn")) {
  1742
+
  1743
+            accountString = userDN;
  1744
+        }
  1745
+        else if (accountingType.equalsIgnoreCase("displayname")) {
  1746
+
  1747
+            try {
  1748
+                String userID = authzDBAdapter.getCanonicalUserIdFromDn(userDN);
  1749
+                final List<UserAlias> aliasList = authzDBAdapter.getUserAliases(userID);
  1750
+                for (UserAlias alias : aliasList) {
  1751
+                    if (alias.getAliasType() == AuthzDBAdapter.ALIAS_TYPE_DN) {
  1752
+
  1753
+                        accountString = alias.getFriendlyName();
  1754
+                    }
  1755
+                }
  1756
+                logger.error("Can't find display name for '" + userDN + "'. "
  1757
+                             + "No accounting string will be sent to PBS.");
  1758
+            }
  1759
+            catch (Exception e) {
  1760
+                logger.error("Can't connect to authzdb db. No accounting string will be sent to PBS.");
  1761
+            }
  1762
+        }
  1763
+        else if (accountingType.equalsIgnoreCase("group")) {
  1764
+
  1765
+            try {
  1766
+                GroupAuthz groupAuthz = (GroupAuthz)this.authzCallout;
  1767
+                accountString = groupAuthz.getGroupName(userDN);
  1768
+            }
  1769
+            catch (Exception e) {
  1770
+                logger.error("Problem getting group string. Are you sure you're using Group or SQL authz?");
  1771
+                logger.debug("full error: " + e);
  1772
+            }
  1773
+        }
  1774
+        else {
  1775
+
  1776
+            logger.error("'" + accountingType + "' isn't a valid accounting string type. "
  1777
+                         + "No accounting string will be sent to PBS.");
  1778
+        }
  1779
+
  1780
+        return accountString;
  1781
+    }
1673 1782
 }
22  service/service/java/tests/suites/basic/home/services/etc/nimbus/workspace-service/pilot.conf
@@ -45,7 +45,13 @@ contact.socket=1.2.3.4:41999
45 45
 #
46 46
 ################################################################################
47 47
 
48  
-# The path to the pilot program on the VMM nodes:
  48
+# The path to the pilot program on the VMM nodes.
  49
+#
  50
+# If you would like to use a configuration file, rather than the embedded
  51
+# configuration, add the -p and path to your configuration file to your path.
  52
+# For example:
  53
+#
  54
+# pilot.path=/opt/workspacepilot.py -p /etc/workspacepilot.conf
49 55
 
50 56
 pilot.path=/opt/workspacepilot.py
51 57
 
@@ -79,12 +85,14 @@ pbs.submit.path=qsub
79 85
 pbs.delete.path=qdel
80 86
 
81 87
 
82  
-# Processors per node, right now this should be set to be the maximum processors
83  
-# on each cluster node.  If it set too high, pilot job submissions will fail.
84  
-# If it is set too low, the pilot may end up not being the only LRM job on the
85  
-# node at a time and that is unpredictable/unsupported right now.
86  
-
87  
-pbs.ppn=2
  88
+# Processors per node. If this is set to 0, your pilot job will request
  89
+# as many processors as are requested for a VM. For example, if a user requests
  90
+# a 2 core VM, ppn will be set to 2.
  91
+#
  92
+# On some installations, you may wish to hardcode this to a specific value
  93
+# to ensure that each pilot job reserves a whole node for a VM. In this case, 
  94
+# choose a non-zero value.
  95
+pbs.ppn=0
88 96
 
89 97
 
90 98
 # If the pilot job should be submitted to a special queue/server, configure
22  service/service/java/tests/suites/spotinstances/home/services/etc/nimbus/workspace-service/pilot.conf
@@ -45,7 +45,13 @@ contact.socket=1.2.3.4:41999
45 45
 #
46 46
 ################################################################################
47 47
 
48  
-# The path to the pilot program on the VMM nodes:
  48
+# The path to the pilot program on the VMM nodes.
  49
+#
  50
+# If you would like to use a configuration file, rather than the embedded
  51
+# configuration, add the -p and path to your configuration file to your path.
  52
+# For example:
  53
+#
  54
+# pilot.path=/opt/workspacepilot.py -p /etc/workspacepilot.conf
49 55
 
50 56
 pilot.path=/opt/workspacepilot.py
51 57
 
@@ -79,12 +85,14 @@ pbs.submit.path=qsub
79 85
 pbs.delete.path=qdel
80 86
 
81 87
 
82  
-# Processors per node, right now this should be set to be the maximum processors
83  
-# on each cluster node.  If it set too high, pilot job submissions will fail.
84  
-# If it is set too low, the pilot may end up not being the only LRM job on the
85  
-# node at a time and that is unpredictable/unsupported right now.
86  
-
87  
-pbs.ppn=2
  88
+# Processors per node. If this is set to 0, your pilot job will request
  89
+# as many processors as are requested for a VM. For example, if a user requests
  90
+# a 2 core VM, ppn will be set to 2.
  91
+#
  92
+# On some installations, you may wish to hardcode this to a specific value
  93
+# to ensure that each pilot job reserves a whole node for a VM. In this case, 
  94
+# choose a non-zero value.
  95
+pbs.ppn=0
88 96
 
89 97
 
90 98
 # If the pilot job should be submitted to a special queue/server, configure
Commit_comment_tip

Tip: You can add notes to lines in a file. Hover to the left of a line to make a note

Something went wrong with that request. Please try again.