
Collab_W2019_Remote

Max-Home-Tower edited this page Jul 1, 2020 · 1 revision

Setting up remote jobs

(WIP-ish as of 2019-12-31)

FAQ

What are some common issues with the KUMC Isilon (remote job cluster computer for CPL)?

  • As of 2019-12-31: your local machine must run Matlab version R2017a so that it can interact with the version installed on the cluster machines. If you have a local version that is not R2017a then you can't queue jobs to the Isilon from your machine.
  • Check that your computer is properly mapped to the remote. On a Windows machine, you should be mapped to P:/Processed_Data and R:/Recorded_Data. If you run a Unix-based OS, then I can't really help you, sorry!
  • Sometimes the license manager goes down (for example, when Windows Updates are installed and the machine restarts). To check, go to the Home tab in your Matlab Editor, then click Parallel (above Environment). Click Manage Cluster Profiles, select the server you want to look at, then click the green Validate check button. If you can't communicate with or generate a job to send to the server, then the license manager is probably down. To restart it:
    • Open the Windows Start menu and open Remote Desktop Connection (can just type it in and hit enter).
    • For Computer: enter mltest01 and click Connect
    • Enter your username and network credentials when prompted.
    • Open explorer.exe and navigate to C:\Program Files\MATLAB\R2017a\etc\win64
    • Double-click lmtools.exe (click Yes to allow it to make changes)
    • Go to the Start/Stop/Reread tab and highlight Matlab License Server in the listbox.
    • (Not sure if this is necessary, but I always first click the checkbox next to Force Server Shutdown, then click Stop Server; however, it should already be stopped given you are at this debugging step)
    • Wait a minute or two for it to stop if you did the previous step, then click Start Server. You can exit the Remote Desktop Connection now.
    • IMPORTANT: I've found that I usually have to wait ~30 minutes for the full license-manager server to get up and running. The times I got impatient and tried to queue up jobs right after clicking Start Server, it never worked, so I think sending it a job in the middle of its startup messes it up somehow (I don't know why this would be; this is just anecdotal and could be totally off-base).
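
The first checks in the list above (Matlab release, drive mappings, available cluster profiles) can be roughly scripted before queuing anything. This is a minimal sketch, not part of nigeLab: it assumes the Parallel Computing Toolbox is installed locally, uses the release and drive mappings named above, and includes an optional smoke test that is off by default. The smoke test simply picks the first listed profile, which is an assumption; replace it with the name of your own cluster profile.

```matlab
% Pre-flight checks before queuing remote jobs (sketch; adjust to your setup).
requiredRelease = '2017a';                                 % release named in the FAQ above
mappedDrives    = {'P:/Processed_Data', 'R:/Recorded_Data'}; % Windows mappings named above
status          = {'NOT OK', 'ok'};                        % labels for false/true

% 1) The local Matlab release must match the one on the cluster.
localRelease = version('-release');                        % e.g. '2017a'
releaseOK    = strcmp(localRelease, requiredRelease);
fprintf('Matlab release R%s: %s\n', localRelease, status{releaseOK + 1});

% 2) The mapped network drives must be visible from this machine.
driveOK = false(size(mappedDrives));
for k = 1:numel(mappedDrives)
    driveOK(k) = exist(mappedDrives{k}, 'dir') == 7;       % 7 means "is a folder"
    fprintf('%s: %s\n', mappedDrives{k}, status{driveOK(k) + 1});
end

% 3) List the cluster profiles this Matlab installation knows about.
profiles = parallel.clusterProfiles();                     % needs Parallel Computing Toolbox
fprintf('Cluster profiles: %s\n', strjoin(profiles, ', '));

% 4) Optional smoke test: submit one trivial job. Leave this off while the
%    license manager may still be starting up (see the note above).
runSmokeTest = false;
if runSmokeTest && ~isempty(profiles)
    profileName = profiles{1};                             % ASSUMPTION: use your own profile name
    c      = parcluster(profileName);
    job    = batch(c, @plus, 1, {1, 1});                   % one output, inputs 1 and 1
    wait(job);                                             % block until the job finishes
    result = fetchOutputs(job);                            % cell array of outputs
    fprintf('Smoke test returned %d\n', result{1});
    delete(job);                                           % clean up the finished job
end
```

If any of the first three checks fail, fixing those is probably worth doing before touching the license manager at all.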

What are the requirements for nigeLab to be able to run on a remote repository?

  • The primary requirement is that the remote must be able to "see" the data files "as is". nigeLab does not "package" the data in any way to send it to the remote. At our lab, there is a slight difference in naming convention between the remote and the mapping on the local machine, but otherwise both are Windows machines that have access to the data on network-attached storage. If your network is configured similarly, you may be able to run the remote with minor configuration changes:
    • First, clone nigeLab to a repository on the remote machine that runs your remote workers.
    • Next, in +nigeLab/+defaults/Queue.m, change pars.RemoteRepoPath to reflect the path to nigeLab that a worker would "see" on the remote machine. If you have multiple remotes with different naming conventions, then you can put each path as a different cell array element.
    • If you have configured your own Matlab Job Server (MJS) on a remote machine, then in +nigeLab/+defaults/Queue.m, you will need to update pars.ClusterList so that each cell array element is the name of one of your MJS instances. I am guessing nobody will ever do this part.
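
As a rough illustration of the two settings described above, the relevant lines in +nigeLab/+defaults/Queue.m might end up looking something like the fragment below. Only the field names pars.RemoteRepoPath and pars.ClusterList come from this page; the paths and server names are hypothetical placeholders, and the rest of the actual file is not shown here.

```matlab
% Fragment of +nigeLab/+defaults/Queue.m (sketch; all values are hypothetical).

% Path to the nigeLab clone as a worker on the remote machine would "see" it.
% With several remotes that use different naming conventions, give one cell
% array element per remote.
pars.RemoteRepoPath = { ...
    '\\hypothetical-nas\share\nigeLab', ...   % placeholder path for remote 1
    'D:\hypothetical\nigeLab'};               % placeholder path for remote 2

% Names of your own Matlab Job Server (MJS) instances, one per element.
pars.ClusterList = {'HypotheticalMJS1', 'HypotheticalMJS2'};
```

Both fields are written as cell arrays here because the text above describes them as having one element per remote or per MJS; with a single remote, a one-element cell array should presumably be enough.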