## Parallel Computing with cluster-helper

<img src="https://computing.llnl.gov/tutorials/parallel_comp/images/nodesNetwork.gif">

A node is like a computer within a much bigger computer!

**Install ipython-cluster-helper**:

* Clone repo from https://github.com/roryk/ipython-cluster-helper to your home directory
* Activate your CML Anaconda environment (e.g. <code>source activate environmentname</code>)
* Navigate to the ipython-cluster-helper directory and type <code>python setup.py install</code>

(Installation can take a while, so it may be advisable to have students do it at the end of the prior day's session.)

In [None]:
import cluster_helper.cluster

def squared(x):
    return x**2

#You can reuse this syntax essentially unchanged every time you run a parallel job:
with cluster_helper.cluster.cluster_view(scheduler="sge", queue="RAM.q", num_jobs=10, cores_per_job=1) as view:

    #'map' applies a function to each value within an iterable
    res = view.map(squared, range(0, 10))
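Before submitting anything to the cluster, it can help to run the same function serially on a small input. The sketch below uses Python's built-in `map` as a local stand-in for `view.map`; it assumes the two return the same values for a simple function like `squared`, and it needs no scheduler or cluster access.

```python
def squared(x):
    return x**2

# Serial stand-in for view.map(squared, range(0, 10)); runs locally, no scheduler involved
serial_res = list(map(squared, range(0, 10)))
print(serial_res)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

If the serial run fails, the parallel run will fail too, and the error is much easier to read locally.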

## cluster-helper tips

* cluster-helper will act as if a fresh python notebook were started for each job, so it will not inherit your workspace's variables or import statements. **Give each job everything it needs to complete!**
* Jobs are still subject to memory limitations, so you may need to **break up large processes into smaller chunks.** For example, each job could correspond to analyzing one session, instead of one subject. 
* The cluster-helper memory parameter does not work. Supposedly, this will be fixed eventually. If absolutely necessary, you may specify more cores per job to functionally increase the allotted memory. But mind your total core count!
* It is often useful to save the output of each job in a dedicated directory, and sometimes useful to save intermediate values to aid in debugging or later nonparallel analyses. The Python "os" library can be helpful here. 
* **Be respectful!** There are only so many cores available to the entire Kahana lab and our collaborators across the country. 
* **Limit typical jobs to 100 cores or fewer**. Heavy usage leaves fewer resources for other users and, because disk resources are shared, might actually slow down all jobs overall. Please ask for permission before using more.
* You can always use the '**qdel**' command in Terminal, followed by your job number, to kill any of your old jobs that may be wasting rhino's resources. 
* Use the '**qstat**' command in Terminal to see cluster usage information.
* Each rhino2 node has ~128 GB of memory and ~40 cores. 
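Two of the tips above (one job per session, and a dedicated output directory) can be sketched with the standard library alone. The subject codes, the `sessions_by_subject` table, and the helper names `make_jobs` and `job_output_path` are all hypothetical, chosen only for illustration; in practice the session list would come from your own data index.

```python
import os
import tempfile

# Hypothetical subject -> sessions table (illustrative values only)
sessions_by_subject = {'R1001P': [0, 1], 'R1002P': [0]}

def make_jobs(sessions_by_subject):
    """Return one (subject, session) tuple per job, so each job stays small."""
    return [(sub, sess)
            for sub in sorted(sessions_by_subject)
            for sess in sessions_by_subject[sub]]

def job_output_path(output_dir, sub, sess):
    """Create output_dir if needed and return a unique file path for this job."""
    # exist_ok=True keeps this safe when several jobs create the directory at once
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, '%s_sess%d.npy' % (sub, sess))

jobs = make_jobs(sessions_by_subject)
output_dir = os.path.join(tempfile.gettempdir(), 'cluster_helper_demo')
paths = [job_output_path(output_dir, sub, sess) for sub, sess in jobs]
print(jobs)  # [('R1001P', 0), ('R1001P', 1), ('R1002P', 0)]
```

The `jobs` list is what you would pass as the iterable to `view.map`. Giving every job its own file name means no job can overwrite another job's results.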


### Cluster Helper usage

See the separate ClusterHelper.ipynb file in this repository for example usage of cluster-helper.  As long as you follow the polite usage etiquette described above, you should feel free to customize this to your own needs.  It is helpful to follow the general principles shown there, such as saving computational results for each job directly to disk and logging exceptions.

In [1]:
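#Skeleton of a parallel function
import cluster_helper.cluster

#One entry per job, e.g. a [subject, experiment] pair
subject_vars = []

#Construct your function such that it takes one input variable which contains whatever info you need within it (e.g. a list or array)
#Tip: Default arguments can often be an easy way to change the behavior of a parallel function.
def tf_analysis(subject_vars, reref='bipolar'):

    #Each job starts from a fresh Python session, so import what you need inside the function
    import numpy as np

    #Sometimes useful to put entire function in a try...except block in case a subject/session breaks
    try:

        sub = subject_vars[0]
        exp = subject_vars[1]

        ###YOUR CODE GOES HERE### (it should define output_data)

        #Advisable to save outputs instead of relying on the outputs of view.map
        #Use a per-job filename so that jobs do not overwrite each other's output
        np.save('%s_%s_output.npy' % (sub, exp), output_data)

    except Exception:
        #A failed subject/session returns None instead of stopping the whole map
        return

    return

with cluster_helper.cluster.cluster_view(scheduler="sge", queue="RAM.q", num_jobs=10, cores_per_job=1) as view:
    view.map(tf_analysis, subject_vars)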
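The skeleton above returns silently when a job fails, which makes it hard to tell afterwards which subjects broke and why. One possible way to log exceptions to disk is sketched below. The function name `logged_job`, the `log_dir` argument, the `.err` file naming, and the example subject and experiment codes are all assumptions made for illustration; the `raise` stands in for a real analysis failure.

```python
import os
import tempfile

def logged_job(job_vars, log_dir='job_logs'):
    # Imports live inside the function because each job starts from a fresh session
    import os
    import traceback

    sub, exp = job_vars
    try:
        if sub is None:
            # Stand-in for an analysis failure
            raise ValueError('missing subject')
        ###YOUR CODE GOES HERE###
        return True
    except Exception:
        # Write the full traceback to a per-job file so failures can be inspected later
        os.makedirs(log_dir, exist_ok=True)
        with open(os.path.join(log_dir, '%s_%s.err' % (sub, exp)), 'w') as f:
            f.write(traceback.format_exc())
        return False

# Local dry run with one good and one deliberately broken job
log_dir = tempfile.mkdtemp()
results = [logged_job(jv, log_dir) for jv in [('R1001P', 'FR1'), (None, 'FR1')]]
print(results)  # [True, False]
```

Returning a success flag lets you count failures from the output of `view.map`, while the details of each failure stay in the log directory.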

**Exercise: Write a parallel function that returns the number of (bipolar) electrodes for every subject in the RAM dataset. Run with 5 jobs and 1 core per job.**

## Wrap Up/Q&A

#### What did we learn over the past 2 weeks?

* Loading experimental info & EEG
* Spectral decomposition & time-frequency analysis
* Statistics: T-tests, multiple comparisons, permutations
* Phase-based functional connectivity
* Machine learning: linear and logistic regression, feature selection

#### What did we NOT learn?

* Cognitive modeling
* Complex behavioral analyses
* Single-unit activity
* Spatial memory tasks
* Brain stimulation