train_shallow_layer.py doesn't train correctly #18

Closed
hanrelan opened this issue May 7, 2019 · 4 comments

hanrelan (Contributor) commented May 7, 2019

I'm trying to train the shallow-layer model, and after 4-5 epochs I'm still seeing acc_lx close to zero. Is that normal? If you have an example training run log and the associated losses, that would be great. I want to make sure nothing is broken before letting it train for a couple of days.

The loss barely changes between epochs, so I don't think training is happening, but I haven't modified the source code other than the paths.

Including my training configuration and output log below. I have an 11GB GPU so I had to change the batch size and gradient accumulation to prevent out of memory errors.

python train_shallow_layer.py --seed 1 --bS 8 --accumulate_gradients 4 --bert_type_abb uS --fine_tune --lr 0.001 --lr_bert 0.00001 --max_seq_leng 222
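(For context on the bS/accumulate_gradients workaround above: with micro-batches of 8 and 4 accumulation steps, each optimizer step sees an effective batch of 32, matching the logged Batch_size. A minimal pure-Python sketch of that accumulation logic; the function and names here are illustrative, not the repo's code.)

```python
# Illustrative sketch of gradient accumulation: scale each micro-batch
# gradient by 1/accumulate and take one optimizer step per group, so a
# group of 4 micro-batches of 8 updates like one full batch of 32.
def sgd_with_accumulation(grads, lr, accumulate=4):
    """Apply SGD to a scalar weight, stepping once per `accumulate` grads."""
    w = 0.0
    acc = 0.0
    updates = []
    for step, g in enumerate(grads, 1):
        acc += g / accumulate          # scale so the sum matches the full-batch mean
        if step % accumulate == 0:
            w -= lr * acc              # one optimizer step per accumulation group
            updates.append(w)
            acc = 0.0                  # reset for the next group
    return updates
```

Each update here equals `lr` times the mean gradient over the group, which is exactly what a single step on the full batch would do.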

BERT-type: uncased_L-12_H-768_A-12
Batch_size = 32
BERT parameters:
learning rate: 1e-05
Fine-tune BERT: True
vocab size: 30522
hidden_size: 768
num_hidden_layer: 12
num_attention_heads: 12
hidden_act: gelu
intermediate_size: 3072
hidden_dropout_prob: 0.1
attention_probs_dropout_prob: 0.1
max_position_embeddings: 512
type_vocab_size: 2
initializer_range: 0.02
Load pre-trained parameters.
Seq-to-SQL: the number of final BERT layers to be used: 1
Seq-to-SQL: the size of hidden dimension = 100
Seq-to-SQL: LSTM encoding layer size = 2
Seq-to-SQL: dropout rate = 0.3
Seq-to-SQL: learning rate = 0.001


train results ------------
 Epoch: 0, ave loss: 6.216941231456922, acc_sc: 0.163, acc_sa: 0.717, acc_wn: 0.590,         acc_wc: 0.092, acc_wo: 0.547, acc_wvi: 0.016, acc_wv: 0.016, acc_lx: 0.000, acc_x: 0.001
dev results ------------
 Epoch: 0, ave loss: 6.288717828444157, acc_sc: 0.174, acc_sa: 0.715, acc_wn: 0.683,         acc_wc: 0.143, acc_wo: 0.658, acc_wvi: 0.016, acc_wv: 0.027, acc_lx: 0.000, acc_x: 0.001
 Best Dev lx acc: 0.00023750148438427741 at epoch: 0
train results ------------
 Epoch: 1, ave loss: 6.191903309470515, acc_sc: 0.166, acc_sa: 0.720, acc_wn: 0.692,         acc_wc: 0.113, acc_wo: 0.668, acc_wvi: 0.028, acc_wv: 0.028, acc_lx: 0.000, acc_x: 0.001
dev results ------------
 Epoch: 1, ave loss: 6.2836473057494, acc_sc: 0.168, acc_sa: 0.715, acc_wn: 0.683,         acc_wc: 0.148, acc_wo: 0.658, acc_wvi: 0.007, acc_wv: 0.014, acc_lx: 0.000, acc_x: 0.000
 Best Dev lx acc: 0.00023750148438427741 at epoch: 0
train results ------------
 Epoch: 2, ave loss: 6.187300067725954, acc_sc: 0.167, acc_sa: 0.720, acc_wn: 0.693,         acc_wc: 0.113, acc_wo: 0.669, acc_wvi: 0.033, acc_wv: 0.033, acc_lx: 0.000, acc_x: 0.001
dev results ------------
 Epoch: 2, ave loss: 6.283452489599453, acc_sc: 0.169, acc_sa: 0.715, acc_wn: 0.683,         acc_wc: 0.152, acc_wo: 0.658, acc_wvi: 0.000, acc_wv: 0.000, acc_lx: 0.000, acc_x: 0.001
 Best Dev lx acc: 0.00023750148438427741 at epoch: 0
hanrelan changed the title from "How many epochs before accuracy starts to improve" to "train_shallow_layer.py doesn't train correctly" on May 7, 2019
hanrelan (Author) commented May 7, 2019

I've confirmed that there is an issue with train_shallow_layer.py. Specifically, I ran both train.py (nl2sql) and train_shallow_layer.py overnight with the same CLI parameters. The nl2sql model reached 80% accuracy after 12 epochs, while the shallow-layer model remained at 0% even after 21 epochs of training.

Any ideas on what might be causing train_shallow_layer.py to fail?

whwang299 (Contributor) commented

Hi @hanrelan

That may be caused by an improper learning rate. In the shallow-layer model, args.lr is used to train BERT (sorry for the confusion), and 1e-3 is too high. I have modified the code to avoid the confusion. Please try again with the same command (note that args.lr is no longer used in train_shallow_layer.py as of 0e1794f).

Thanks!

Wonseok
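(The fix above amounts to keeping two learning rates: a small one for fine-tuning the pre-trained BERT encoder and a larger one for the randomly initialized seq-to-SQL parameters. A minimal sketch of how that is typically done in PyTorch via optimizer parameter groups; the `bert`/`head` modules are stand-ins, not the repo's actual models.)

```python
import torch

# Hypothetical stand-ins for the BERT encoder and the seq-to-SQL head.
bert = torch.nn.Linear(4, 4)
head = torch.nn.Linear(4, 2)

# Two parameter groups with separate learning rates, mirroring the
# --lr_bert 0.00001 / --lr 0.001 split from the training command.
opt = torch.optim.Adam([
    {"params": bert.parameters(), "lr": 1e-5},  # gentle fine-tuning of BERT
    {"params": head.parameters(), "lr": 1e-3},  # faster training of the new head
])
```

If the 1e-3 rate is accidentally applied to the BERT group, the pre-trained weights are quickly destroyed, which matches the flat-loss, zero-accuracy behavior reported above.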

hanrelan (Author) commented May 9, 2019

Ah, that makes sense, thanks. I'll try it again tonight and close this issue if it works.

hanrelan (Author) commented

That fixed it, getting 78% lx accuracy now. Thanks!
