What is your question?
Hello, I need to train OPT-125M on our own dataset. Training fails with FileNotFoundError: [Errno 2] No such file or directory: 'sbatch': 'sbatch'.
"Traceback (most recent call last):
  File "metaseq/launcher/opt_baselines.py", line 342, in <module>
    cli_main()
  File "metaseq/launcher/opt_baselines.py", line 337, in cli_main
    get_grid, postprocess_hyperparams, add_extra_options_func=add_extra_options_func
  File "/usr/local/lib/python3.7/dist-packages/metaseq-0.0.1-py3.7-linux-x86_64.egg/metaseq/launcher/sweep.py", line 378, in main
    backend_main(get_grid, postprocess_hyperparams, args)
  File "/usr/local/lib/python3.7/dist-packages/metaseq-0.0.1-py3.7-linux-x86_64.egg/metaseq/launcher/slurm.py", line 41, in main
    launch_train(args, grid, grid_product, dry_run, postprocess_hyperparams)
  File "/usr/local/lib/python3.7/dist-packages/metaseq-0.0.1-py3.7-linux-x86_64.egg/metaseq/launcher/slurm.py", line 465, in launch_train
    job_id, stdout = run_batch(env, sbatch_cmd_str, sbatch_cmd)
  File "/usr/local/lib/python3.7/dist-packages/metaseq-0.0.1-py3.7-linux-x86_64.egg/metaseq/launcher/slurm.py", line 350, in run_batch
    with subprocess.Popen(sbatch_cmd, stdout=subprocess.PIPE, env=env) as train_proc:
  File "/usr/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'sbatch': 'sbatch'"
What's your environment?
metaseq Version (e.g., 1.0 or master): Forked Repo
PyTorch Version (e.g., 1.0) : 1.11.0
OS (e.g., Linux): Linux
How you installed metaseq (pip, source): source
Build command you used (if compiling from source):
Python version: 3.7
CUDA/cuDNN version: cuda_11.1.TC455_06.29190527_0
GPU models and configuration: 1 Tesla P100
Any other relevant information:
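Our launcher assumes it is running on a Slurm cluster, so it tries to submit the job with sbatch; the error means the sbatch executable was not found on your machine. If you look at the string passed to the --wrap argument of the sbatch command that gets launched, that is the actual training command (after removing the srun prefix), and you can run it yourself.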
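To make the suggestion above concrete, here is a minimal sketch in plain Python, not part of metaseq. It checks whether sbatch is on PATH, pulls the string that follows --wrap out of an sbatch argument list, and drops everything before the training entry point. The helper names (slurm_available, wrapped_command, strip_srun_prefix) and the example argument list are hypothetical; the command metaseq generates will contain different flags, and the entry point token ("python" here) is an assumption you would need to adjust.

```python
import shlex
import shutil


def slurm_available() -> bool:
    """Return True if an `sbatch` executable can be found on PATH."""
    return shutil.which("sbatch") is not None


def wrapped_command(sbatch_cmd):
    """Return the string given to `--wrap` in an sbatch argument list, or None."""
    for i, arg in enumerate(sbatch_cmd):
        if arg == "--wrap" and i + 1 < len(sbatch_cmd):
            return sbatch_cmd[i + 1]
        if arg.startswith("--wrap="):
            return arg[len("--wrap="):]
    return None


def strip_srun_prefix(command, entry_point):
    """Split `command` and drop every token before the first `entry_point`.

    Raises ValueError if `entry_point` does not occur in the command.
    """
    tokens = shlex.split(command)
    start = tokens.index(entry_point)
    return tokens[start:]


# Hypothetical argument list, for illustration only; it is NOT the list
# that the metaseq launcher builds.
example = [
    "sbatch",
    "--job-name", "demo",
    "--wrap", "srun --unbuffered python train.py --lr 0.1",
]

if not slurm_available():
    print("sbatch not found on PATH; run the wrapped command directly")

print(strip_srun_prefix(wrapped_command(example), "python"))
# prints ['python', 'train.py', '--lr', '0.1']
```

The resulting token list can be passed to subprocess.run or joined with shlex.join (Python 3.8+) to get a shell command line.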
❓ Questions and Help
Where is the sbatch file located?
Code
! python metaseq/launcher/opt_baselines.py \
    -n 1 -g 2 \
    -p test_v0 \
    --model-size 125m \
    --azure \
    --data /content/test_data \
    --checkpoints-dir "/content/ck_point" \
What have you tried?
I tried to train OPT-125M on Google Colab, following the training.md provided in the repo (https://github.com/facebookresearch/metaseq/blob/main/docs/training.md).