GenSen on AML deep dive notebook (sentence similarity) #78
Conversation
Check out this pull request on ReviewNB: https://app.reviewnb.com/microsoft/nlp/pull/78 Visit www.reviewnb.com to learn how we simplify your Jupyter Notebook workflows.
This is really good. Several questions:
At a bare minimum, I would split the notebook into two, one for training and one for hyperparameter tuning. In reco we are doing this.
It would be good to explicitly call out why users should use this and how AzureML helps. Does HyperDrive improve the accuracy? We spin up the GPU compute for you, and it would be very difficult to run without it... etc.
@miguelgfierro The following are the replies:
Thanks!
Great work!
For multi-GPU support, see here: https://github.com/microsoft/nlp/blob/bleik/utils_nlp/pytorch/device_utils.py
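For readers who follow that link, the generic multi-GPU pattern in plain PyTorch looks roughly like the sketch below (a hedged illustration, not necessarily what device_utils.py implements; move_to_device is a hypothetical name):

```python
import torch
import torch.nn as nn

def move_to_device(model, num_gpus=None):
    """Move `model` to GPU(s) if available; wrap it in DataParallel when
    more than one GPU should be used."""
    if not torch.cuda.is_available():
        return model
    num_gpus = num_gpus or torch.cuda.device_count()
    model = model.cuda()
    if num_gpus > 1:
        model = nn.DataParallel(model, device_ids=list(range(num_gpus)))
    return model
```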
@saidbleik On the structure you mentioned: that's exactly what we are planning to do. @AbhiramE is working on gensen_deep_dive.ipynb, which trains on a VM without AML. He will raise a separate PR once this PR is merged. We are currently running experiments on the performance of AML vs. VM. Once the evaluations are done, we will put the results in the README file. I think it may be better to put all the sections (preprocessing, training, tuning, evaluation) in one gensen_deep_dive_aml.ipynb notebook, because our purpose is to show the whole end-to-end pipeline, and this also avoids duplicating the AML configuration code. For train.py, do you recommend putting it in the same folder as the notebook? @saidbleik
@saidbleik However, we do not have permission for multiple GPUs at the moment; we can only use Standard_NC6, which has a single GPU.
This is fine, but it can be less verbose. For example, you don't need to describe GenSen in the AML version. You can link to the base notebook instead (or perhaps move this common description to the README).
@heatherbshapiro Added section 2.3.5 to explain Horovod.
Several changes have been made:
- …l never stop; min_epoch_loss always equals val_epoch_loss
- …training: 1. add random seeds for iterators; 2. set learning rate = lr * hvd.size(); 3. sync the optimizer; 4. remove DataParallel (see the Horovod sketch below)
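For context, a minimal sketch of what those four changes typically look like with horovod.torch (the model, seed, and base learning rate below are illustrative placeholders, not the actual train.py values):

```python
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()
torch.manual_seed(1234 + hvd.rank())        # 1. distinct random seed per worker
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())

# 4. one process per GPU, so the model is moved to a single device and
# DataParallel is no longer needed
model = nn.Linear(512, 512)                 # placeholder for the GenSen model
if torch.cuda.is_available():
    model = model.cuda()

base_lr = 1e-4                              # illustrative base learning rate
optimizer = torch.optim.Adam(model.parameters(),
                             lr=base_lr * hvd.size())  # 2. scale lr by world size

# 3. keep workers in sync: average gradients across workers and broadcast the
# initial model/optimizer state from rank 0
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```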
This looks great. Thanks for making the updates!
Force-pushed from 8a4b0e6 to c5362cd.
The notebook includes data loading and preprocessing, training the GenSen model with distributed PyTorch and Horovod on AzureML, and tuning with HyperDrive (a submission sketch follows below). Evaluation and deployment will be added later. In addition, the comparison results for training and tuning on AML vs. VM will be added once this initial PR is merged into staging.
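As a rough illustration of the AzureML pieces involved, the submission in such a notebook might look like the sketch below (azureml-sdk v1-era API; the compute target name, script argument, metric name, and search space are assumptions for illustration, not the notebook's actual values):

```python
from azureml.core import Workspace, Experiment
from azureml.train.dnn import PyTorch
from azureml.train.hyperdrive import (HyperDriveConfig, PrimaryMetricGoal,
                                      RandomParameterSampling, uniform)

ws = Workspace.from_config()

# Distributed PyTorch estimator: Horovod runs over MPI across the nodes
estimator = PyTorch(source_directory=".",
                    entry_script="train.py",
                    compute_target="gpu-cluster",   # assumed AmlCompute name
                    node_count=2,
                    process_count_per_node=1,
                    distributed_backend="mpi",
                    use_gpu=True)

# HyperDrive sweep over the learning rate (illustrative search space/metric)
hd_config = HyperDriveConfig(
    estimator=estimator,
    hyperparameter_sampling=RandomParameterSampling({"--lr": uniform(1e-4, 1e-3)}),
    primary_metric_name="val_loss",
    primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
    max_total_runs=8)

run = Experiment(ws, "gensen-train").submit(hd_config)
run.wait_for_completion(show_output=True)
```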
We provide a distributed PyTorch implementation of the paper using Horovod, along with pre-trained models and code to evaluate these models on a variety of transfer learning benchmarks.
This code is based on the GitHub codebase from Maluuba, but we have refactored the code in the following aspects:
- Changed the training loop (train.py) from never stopping to stopping when the validation loss reaches a local minimum (a sketch of this early-stopping logic follows below)
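A hedged sketch of that stopping criterion (the patience value and the loss sequence are illustrative; the actual train.py logic may differ):

```python
def stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop, given per-epoch
    validation losses (a stand-in for the real train/validate loop)."""
    best_val_loss = float("inf")   # must be tracked as a separate value; the
    bad_epochs = 0                 # reported bug kept min_epoch_loss always
                                   # equal to val_epoch_loss, so training
                                   # never stopped
    for epoch, val_epoch_loss in enumerate(val_losses):
        if val_epoch_loss < best_val_loss:
            best_val_loss = val_epoch_loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            return epoch           # validation loss reached a local minimum
    return len(val_losses) - 1

# Stops at epoch 5: three consecutive epochs without improvement on 0.5
print(stop_epoch([0.9, 0.7, 0.5, 0.6, 0.6, 0.7]))
```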