Thank you, HuggingFace team, for all you do! This summer I have been working from this notebook and noticed a gap, described below. PS: this is my first GitHub issue, so any feedback is welcome.
Description (same as #52)
The hyperparameters sent by the client use underscores (e.g. output_data_dir), whereas the arguments defined in the argparser use hyphens (e.g. output-data-dir). As a result, those values are not propagated into train.py.
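A minimal sketch of why the values are lost, assuming that SageMaker passes each hyperparameter to the script as a `--<name> <value>` command-line pair and that train.py reads them with `parse_known_args()`. The default path is a hypothetical placeholder for `os.environ["SM_OUTPUT_DATA_DIR"]`, so the snippet runs outside SageMaker.

```python
import argparse

# Hypothetical placeholder standing in for os.environ["SM_OUTPUT_DATA_DIR"]
DEFAULT_OUTPUT_DATA_DIR = "/opt/ml/output/data"


def parse(argv):
    parser = argparse.ArgumentParser()
    # Hyphenated flag, as defined in the original train.py
    parser.add_argument("--output-data-dir", type=str, default=DEFAULT_OUTPUT_DATA_DIR)
    # parse_known_args() returns unrecognized flags instead of raising an error
    return parser.parse_known_args(argv)


# The client sends the hyperparameter name with an underscore
args, unknown = parse(["--output_data_dir", "/tmp/custom"])
print(args.output_data_dir)  # /opt/ml/output/data (the default, not /tmp/custom)
print(unknown)               # the unrecognized '--output_data_dir' flag and its value
```

Because `--output_data_dir` does not match `--output-data-dir`, the value is collected as an unknown argument and the default is used silently. With `parse_args()` instead, argparse would exit with an "unrecognized arguments" error.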
Why another issue?
Recent commits have resolved the issue linked above in most of the notebooks. However, the commit that fixed it for PyTorch missed the second half of the typos: it fixed train-batch-size and eval-batch-size, but output-data-dir and model-dir still need to be fixed.
Hi @philschmid! My colleague, who filed issue #52, told me that tagging you here is the best way to raise SageMaker issues. Please see my note above: a recent commit left part of the issue unfixed. Thanks!
Hello!
Reopening an issue connected to this thread here:
Files
I have tested the solution on these files:
notebooks/sagemaker/01_getting_started_pytorch/sagemaker-notebook.ipynb
notebooks/sagemaker/01_getting_started_pytorch/scripts/train.py
Solution
In the train.py file, replace these lines:
parser.add_argument("--output-data-dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
with these:
parser.add_argument("--output_data_dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
parser.add_argument("--model_dir", type=str, default=os.environ["SM_MODEL_DIR"])
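A quick way to check the fix outside SageMaker, as a sketch: the defaults are read with `os.environ.get` and fall back to hypothetical placeholder paths so the snippet runs without the `SM_*` environment variables, and the argument lists are made up for illustration. It also shows why no other line of train.py needs to change: argparse derives the attribute name from the flag by replacing hyphens with underscores, so both spellings produce `args.output_data_dir` and `args.model_dir`. Only the flag the client sends has to match.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # Underscore flags match the hyperparameter names the client sends.
    # The fallback paths are placeholders so this runs outside SageMaker.
    parser.add_argument("--output_data_dir", type=str,
                        default=os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data"))
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    return parser


args, unknown = build_parser().parse_known_args(
    ["--output_data_dir", "/tmp/out", "--model_dir", "/tmp/model"]
)
print(args.output_data_dir)  # /tmp/out
print(args.model_dir)        # /tmp/model
print(unknown)               # []

# The old hyphenated flag yields the same attribute name,
# because argparse turns "-" into "_" when building the dest.
old = argparse.ArgumentParser()
old.add_argument("--output-data-dir", type=str)
print(vars(old.parse_args(["--output-data-dir", "/tmp/out"])))  # {'output_data_dir': '/tmp/out'}
```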