
Added functionality for command-line inputs #1

Merged 3 commits into olivercliff:main on Mar 9, 2022

Conversation

anniegbryant (Contributor)

Suggesting the following changes based on preliminary cluster usage:

  • implementation of command-line inputs for distribute_jobs.py to allow the user to customise the data location, sample YAML file, walltime, overwriting behaviour, and PBS notifications
  • pyspi_distribute.py now automatically generates a PBS script for each job based on user inputs
  • flexibility for user-supplied config.yaml to define a subset of SPIs to examine
  • creation of a script (create_yaml_for_samples.R) that automatically generates sample YAML file if the user wishes
  • updated README to reflect these changes
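The command-line interface described above can be sketched with argparse. This is a hedged illustration only: the flag names, defaults, and help strings below are hypothetical and may not match the actual arguments in distribute_jobs.py.

```python
import argparse

def build_parser():
    """Hypothetical sketch of the CLI described in the PR; the real
    distribute_jobs.py may use different flag names and defaults."""
    parser = argparse.ArgumentParser(
        description="Distribute pyspi jobs across a PBS cluster")
    parser.add_argument("--data_dir", default="database/",
                        help="location of the MTS data (assumed flag)")
    parser.add_argument("--sample_yaml", default="sample.yaml",
                        help="sample YAML file describing the data")
    parser.add_argument("--walltime_hrs", type=int, default=6,
                        help="walltime requested per job, in hours")
    parser.add_argument("--overwrite_pkl", action="store_true",
                        help="overwrite existing results if present")
    parser.add_argument("--pbs_notify", default="n",
                        help="PBS mail notification setting (a/b/e/n)")
    return parser

# Parse an empty argument list to show the defaults in effect.
args = build_parser().parse_args([])
```

Flags for CPUs and memory per job were added in a later commit in this PR, so they are omitted from this sketch.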

- added trailing slash to data directory path in create_yaml_for_samples.R if not already included
- removed extra parenthesis from argparse argument in distribute_jobs.py
- added command-line option for number of CPUs and memory requested per job; transposed numpy array loaded from .npy file before initializing Data() object
anniegbryant (Contributor, Author) commented Mar 5, 2022

Hi Oliver, in my most recent commit I fixed an issue in distribute_jobs.py wherein the MTS data was transposed after initializing the Data() object. I worked around this by first loading the binary .npy file and manually transposing the matrix before initializing the Data() object (see lines 82-83), but I suspect that something within pyspi's Data() object is flipping the rows and columns of the input data matrix.
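The workaround described above amounts to loading the saved array and transposing it before it reaches Data(). A minimal sketch with plain numpy (an in-memory buffer stands in for the on-disk .npy file, and the shapes are illustrative):

```python
import io
import numpy as np

# Save a small MTS matrix the way a .npy file on disk would hold it:
# here, 3 observations (rows) x 4 processes (columns).
buf = io.BytesIO()
np.save(buf, np.arange(12).reshape(3, 4))
buf.seek(0)

# The workaround: load the binary .npy data and transpose it so that the
# matrix is processes x observations before constructing Data().
data_matrix = np.load(buf).T  # shape becomes (4, 3)

# data = Data(data_matrix, ...)  # Data() construction as in distribute_jobs.py
```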

I made a couple of other minor modifications, including allowing the user to specify the number of CPUs and GB of memory they would like to request per job.

@olivercliff olivercliff merged commit 5ba4904 into olivercliff:main Mar 9, 2022
olivercliff (Owner)

Thanks, Annie, these changes look great! I left most of your modifications in, but I slightly changed your approach to writing the PBS file by using Python's string.Template class and a template PBS file. (I thought it was a little cleaner than printing one line at a time.)
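The Template-based approach can be sketched as follows. The template text and placeholder names here are hypothetical; the repository's actual template PBS file will differ.

```python
from string import Template

# A stand-in for the template PBS file; in the repo this would be read
# from disk rather than defined inline.
pbs_template = Template("""#!/bin/bash
#PBS -l select=1:ncpus=$ncpus:mem=${mem}GB
#PBS -l walltime=$walltime:00:00
#PBS -m $notify
cd $data_dir
python pyspi_compute.py
""")

# Fill the placeholders from user-supplied values in one step,
# instead of printing the script one line at a time.
pbs_script = pbs_template.substitute(
    ncpus=2, mem=8, walltime=6, notify="a", data_dir="/project/sample1"
)
```

Each job then gets its own script by calling substitute() with that job's values.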

I've also removed the transposition of the data, for now. The Data object takes in the numpy array with each dimension specified by the dim_order argument (e.g., dim_order='sp' means the observations are the first dimension and the processes are the second dimension). After taking in the data, it always converts it to the format processes x observations, which is what you're seeing. You can verify that the data was read in correctly by checking the properties data.n_processes and data.n_observations.
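To illustrate the convention described above, here is a small numpy helper that mimics the normalisation in plain code. This is not pyspi's actual implementation, just a sketch of the behaviour: whatever the input dim_order, the result is stored as processes x observations.

```python
import numpy as np

def to_processes_by_observations(arr, dim_order):
    """Illustrative stand-in for the normalisation described above:
    'ps' means the array is already processes x observations;
    'sp' means observations x processes, so it gets transposed."""
    if dim_order == 'ps':
        return arr
    if dim_order == 'sp':
        return arr.T
    raise ValueError(f"unknown dim_order: {dim_order!r}")

# 100 observations ('s') of 5 processes ('p'), in 'sp' order:
mts = np.zeros((100, 5))
normalised = to_processes_by_observations(mts, 'sp')
# normalised is now 5 processes x 100 observations, matching
# what data.n_processes and data.n_observations would report.
```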
