Updated documentation for running the template generation pipeline in Alliance Canada #79

Merged: 2 commits, Oct 4, 2023

Contributor

@rohanbanerjee rohanbanerjee commented Sep 25, 2023

Fixes #76, fixes #77

This PR updates README.md with steps that were missing for setting up and running generate_template.py on Alliance Canada. Specific details, such as setting up nist_mni_pipelines inside the template folder and creating the virtual environment, were missing.

To test this PR:

  1. Copy the data from duke/temp/rohan/bids_data to the Alliance Canada cluster following step 2.a.
  2. Follow the remaining steps in step 2 to run the template generation pipeline.

Expected output

The pipeline creates a folder model_n_all inside the template folder. Each iteration of the template generation process creates its own folder inside model_n_all.
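To check the expected output while a job is still running, something like the following sketch could be used. It only counts the immediate subfolders of model_n_all, on the assumption that each one corresponds to one iteration. The `count_iterations` helper and the `MODEL_DIR` default path are hypothetical names introduced for illustration, not part of the pipeline.

```shell
#!/bin/bash
# Count the per-iteration folders created so far inside model_n_all.
# Assumption: every immediate subdirectory corresponds to one iteration.
count_iterations() {
    local model_dir="$1"
    find "$model_dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# MODEL_DIR is a hypothetical default; point it at your own template folder.
MODEL_DIR="${MODEL_DIR:-template/model_n_all}"
if [ -d "$MODEL_DIR" ]; then
    echo "Iteration folders so far: $(count_iterations "$MODEL_DIR")"
else
    echo "No $MODEL_DIR folder yet; the pipeline may not have started."
fi
```

Running it with `MODEL_DIR=/path/to/template/model_n_all` set in the environment reports how many iteration folders exist at that moment.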

```diff
- sbatch --time=24:00:00 --mem-per-cpu 4000 my_job.sh # will probably require batching several times, depending on number of subjects
+ sbatch --time=24:00:00 --mem-per-cpu 4000 template_pipeline.sh # will probably require batching several times, depending on number of subjects
```
Member

what does "batching several times" mean?

Contributor Author

This part was written by Nadia. My understanding is that the pipeline stops running in between and has to be re-run a few times. But it does add some confusion. Do you think I should remove it? It is implicitly understood.

Member

> the pipeline stops running in between

and why is that?

Contributor Author

It is because of an out-of-memory issue. When I discussed this with Benjamin in January, he also mentioned -- I think it is because the resources from the core become unavailable.
In my case, when I ran it on CC, it stopped the first time with the following error:
slurmstepd: error: Detected 7 oom-kill event(s) in StepId=41187759.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
When I re-ran it, it ran smoothly, and it has been running without any issues since yesterday evening. I just opened an issue on their repo to gain insights from the maintainers.
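As a rough illustration of the re-run described above, the sketch below checks a SLURM log for the oom-kill message quoted in this comment and, if found, prints the sbatch command from the README for resubmission. The `was_oom_killed` helper and the `LOG_FILE` variable are hypothetical names, and the default log file name is a placeholder; the script only prints the command and does not submit anything itself.

```shell
#!/bin/bash
# Succeed (exit status 0) if the given SLURM log contains the cgroup
# out-of-memory message reported by slurmstepd.
was_oom_killed() {
    local log_file="$1"
    grep -q "oom-kill event" "$log_file"
}

# LOG_FILE is a placeholder (assumption): replace it with the log of your job.
LOG_FILE="${LOG_FILE:-slurm-JOBID.out}"
if [ -f "$LOG_FILE" ] && was_oom_killed "$LOG_FILE"; then
    echo "OOM kill detected in $LOG_FILE; the job can be resubmitted with:"
    echo "sbatch --time=24:00:00 --mem-per-cpu 4000 template_pipeline.sh"
fi
```

Printing the command instead of calling sbatch directly keeps the decision to resubmit with the user, which seems safer while the cause of the out-of-memory events is still unclear.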

Member

ok, so add this info in the comment with a link to the issue for more clarity

Member

@jcohenadad jcohenadad left a comment

👍

@rohanbanerjee rohanbanerjee merged commit 6379cb1 into master Oct 4, 2023