
Added functionality for command-line inputs #1

Merged 3 commits into olivercliff:main on Mar 9, 2022

Conversation

anniegbryant (Contributor)

Suggesting the following changes based on preliminary cluster usage:

  • implementation of command-line inputs for distribute_jobs.py to allow the user to customise the data location, sample YAML file, walltime, overwriting behaviour, and PBS notifications
  • pyspi_distribute.py now automatically generates a PBS script for each job based on user inputs
  • flexibility for user-supplied config.yaml to define a subset of SPIs to examine
  • creation of a script (create_yaml_for_samples.R) that automatically generates sample YAML file if the user wishes
  • updated README to reflect these changes
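The command-line interface described above can be sketched with argparse. This is a hedged illustration only: the flag names, defaults, and help strings below are hypothetical and may not match the actual arguments in distribute_jobs.py.

```python
import argparse

def build_parser():
    """Hypothetical sketch of the CLI described in the PR; the real
    distribute_jobs.py may use different flag names and defaults."""
    parser = argparse.ArgumentParser(
        description="Distribute pyspi jobs across a PBS cluster")
    parser.add_argument("--data_dir", default="database/",
                        help="location of the MTS data (assumed flag)")
    parser.add_argument("--sample_yaml", default="sample.yaml",
                        help="sample YAML file describing the data")
    parser.add_argument("--walltime_hrs", type=int, default=6,
                        help="walltime requested per job, in hours")
    parser.add_argument("--overwrite_pkl", action="store_true",
                        help="overwrite existing results if present")
    parser.add_argument("--pbs_notify", default="n",
                        help="PBS mail notification setting (a/b/e/n)")
    return parser

# Parse an empty argument list to show the defaults in effect.
args = build_parser().parse_args([])
```

Flags for CPUs and memory per job were added in a later commit in this PR, so they are omitted from this sketch.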

- added trailing slash to data directory path in create_yaml_for_samples.R if not already included
- removed extra parenthesis from argparse argument in distribute_jobs.py
- added command-line option for number of CPUs and memory requested per job; transposed numpy array loaded from .npy file before initializing Data() object
anniegbryant (Contributor, Author) commented Mar 5, 2022

Hi Oliver, in my most recent commit I fixed an issue in distribute_jobs.py wherein the MTS data was transposed after initializing the Data() object. I worked around this by first loading the binary .npy file and manually transposing the matrix before initializing the Data() object (see lines 82-83), but I suspect that something within pyspi's Data() object is flipping the rows and columns of the input data matrix.
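The workaround described above amounts to loading the saved array and transposing it before it reaches Data(). A minimal sketch with plain numpy (an in-memory buffer stands in for the on-disk .npy file, and the shapes are illustrative):

```python
import io
import numpy as np

# Save a small MTS matrix the way a .npy file on disk would hold it:
# here, 3 observations (rows) x 4 processes (columns).
buf = io.BytesIO()
np.save(buf, np.arange(12).reshape(3, 4))
buf.seek(0)

# The workaround: load the binary .npy data and transpose it so that the
# matrix is processes x observations before constructing Data().
data_matrix = np.load(buf).T  # shape becomes (4, 3)

# data = Data(data_matrix, ...)  # Data() construction as in distribute_jobs.py
```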

I made a couple of other minor modifications, including allowing the user to specify the number of CPUs and GB of memory they would like to request per job.

@olivercliff olivercliff merged commit 5ba4904 into olivercliff:main Mar 9, 2022
olivercliff (Owner)

Thanks, Annie, these changes look great! I left most of your modifications in, but I slightly changed your approach to writing the PBS file by using Python's string.Template class and a template PBS file. (I thought it was a little cleaner than printing one line at a time.)
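The Template-based approach can be sketched as follows. The template text and placeholder names here are hypothetical; the repository's actual template PBS file will differ.

```python
from string import Template

# A stand-in for the template PBS file; in the repo this would be read
# from disk rather than defined inline.
pbs_template = Template("""#!/bin/bash
#PBS -l select=1:ncpus=$ncpus:mem=${mem}GB
#PBS -l walltime=$walltime:00:00
#PBS -m $notify
cd $data_dir
python pyspi_compute.py
""")

# Fill the placeholders from user-supplied values in one step,
# instead of printing the script one line at a time.
pbs_script = pbs_template.substitute(
    ncpus=2, mem=8, walltime=6, notify="a", data_dir="/project/sample1"
)
```

Each job then gets its own script by calling substitute() with that job's values.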

I've also removed the transposition of the data, for now. The Data object takes in the numpy array with each dimension specified by the dim_order argument (e.g., dim_order='sp' means the observations are the first dimension and the processes are the second dimension). After taking in the data, it always converts it to the format processes x observations, which is what you're seeing. You can verify that the data was read in correctly by checking the properties data.n_processes and data.n_observations.
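To illustrate the convention described above, here is a small numpy helper that mimics the normalisation in plain code. This is not pyspi's actual implementation, just a sketch of the behaviour: whatever the input dim_order, the result is stored as processes x observations.

```python
import numpy as np

def to_processes_by_observations(arr, dim_order):
    """Illustrative stand-in for the normalisation described above:
    'ps' means the array is already processes x observations;
    'sp' means observations x processes, so it gets transposed."""
    if dim_order == 'ps':
        return arr
    if dim_order == 'sp':
        return arr.T
    raise ValueError(f"unknown dim_order: {dim_order!r}")

# 100 observations ('s') of 5 processes ('p'), in 'sp' order:
mts = np.zeros((100, 5))
normalised = to_processes_by_observations(mts, 'sp')
# normalised is now 5 processes x 100 observations, matching
# what data.n_processes and data.n_observations would report.
```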
