
Token indices sequence length is longer than the specified maximum sequence length for this model (708 > 512). Running this sequence through the model will result in indexing errors #3

Open
kemalaraz opened this issue Apr 13, 2020 · 4 comments


kemalaraz commented Apr 13, 2020

I am getting the warning "Token indices sequence length is longer than the specified maximum sequence length for this model (730 > 512). Running this sequence through the model will result in indexing errors". Will that cause a problem? I couldn't find a truncation operation or a max_length being used in BERTHiddenStateEncoder, and since the BERT model is limited to 512 tokens, will this hurt performance?
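For context, here is a minimal sketch of where this kind of warning comes from, assuming the encoder uses HuggingFace transformers' BertTokenizer internally (my assumption; the checkpoint name below is just illustrative):

```python
# Minimal sketch, assuming a HuggingFace BertTokenizer; not the repo's code.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

long_sentence = " ".join(["token"] * 600)  # clearly longer than 512 word pieces

# Encoding without a length cap triggers the "(708 > 512)"-style warning:
ids = tokenizer.encode(long_sentence)
print(len(ids))

# Explicit truncation keeps the input within BERT's 512-position limit:
ids = tokenizer.encode(long_sentence, max_length=512, truncation=True)
print(len(ids))  # 512
```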

Thanks

slczgwh (Owner) commented Apr 14, 2020

The problem is caused by NYT10's indexing method. Its indices are char-level, while we need word-level indices. We have written this translation here. What you need to do is uncomment this line, remove your .pkl file, and try again. Besides, we actually remove all sentences whose length is larger than 512; that code is in SentenceREDataset.
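To make the char-level vs. word-level point concrete, a rough sketch of the kind of translation involved (the function name and variables are mine for illustration, not the repo's actual code):

```python
# Hypothetical sketch: map a character offset (NYT10 style) to a word index.
def char_to_word_index(sentence: str, char_start: int) -> int:
    """Return the index of the whitespace-split word containing char_start."""
    words = sentence.split()
    pos = 0
    for word_idx, word in enumerate(words):
        pos = sentence.index(word, pos)          # locate this word's char offset
        if pos <= char_start < pos + len(word):
            return word_idx
        pos += len(word)
    raise ValueError(f"char offset {char_start} is not inside any word")

# NYT10 entity spans come as character offsets; the model expects word indices:
sent = "Barack Obama was born in Honolulu ."
print(char_to_word_index(sent, 25))  # 5 -> "Honolulu"
```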

kemalaraz (Author) commented Apr 14, 2020

I searched a bit, and I think the tokenizer can emit that warning even when over-length sentences are removed afterwards. Also, looking at the code, I don't see sentences being removed by max length, so I'm not sure why the warning appears. In any case, I uncommented that line and training has started; I'll keep you updated if a problem occurs, or after evaluation if I can't reproduce your results.
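This is the behaviour I mean: the warning is logged inside encode() itself, so it would show up even if over-length examples were dropped afterwards. A sketch with made-up data, again assuming the transformers BertTokenizer:

```python
# Sketch only: the length warning is emitted at encode() time,
# before any dataset-level filtering can happen.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentences = ["a short sentence .", " ".join(["token"] * 600)]

encoded = [tokenizer.encode(s) for s in sentences]   # warning appears here
kept = [ids for ids in encoded if len(ids) <= 512]   # filtering happens later
print(len(encoded), len(kept))                       # 2 1
```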

Thanks a lot for the quick responses :)

kemalaraz (Author) commented:

One epoch took around 4.5 hours. Validation started at 0.85 micro F1 and keeps decreasing; after the 7th epoch it was 0.80. Also, for NYT10 the max epoch is 100 in the code. Is that a typo? In the original paper it is 10. The evaluation after the 6th epoch is below:

{'micro_f1': 0.8095688346036508, 'micro_p': 0.9041182682152261, 'micro_r': 0.7329224447867261, 'acc': 0.878153846153666,
 'without_na_res': {'micro_f1': 0.8095688346036508, 'micro_p': 0.9041182682152261, 'micro_r': 0.7329224447867261, 'acc': 0.878153846153666},
 'na_res': {'micro_f1': 0.0, 'micro_p': 0.0, 'micro_r': 0.0, 'acc': 0.0},
 'without_na_micro_f1': 0.8095688346036508,
 'normal': {'micro_f1': 0.9303428149628392, 'micro_p': 0.9351173020524431, 'micro_r': 0.9256168359938586, 'acc': 0.9256168359938586},
 'over_lapping': {'micro_f1': 0.6763617128988916, 'micro_p': 0.8458994708989115, 'micro_r': 0.5634361233477694, 'acc': 0.7968847352019958},
 'multi_label': {'micro_f1': 0.6370757175524843, 'micro_p': 0.8758076094753224, 'micro_r': 0.5006155108738201, 'acc': 0.8293677770218699},
 'triple_res': {'0': {'micro_f1': 0.0, 'micro_p': 0.0, 'micro_r': 0.0, 'acc': 0.0},
                '1': {'micro_f1': 0.9300956580721509, 'micro_p': 0.9349112426032046, 'micro_r': 0.9253294289894124, 'acc': 0.9253294289894124},
                '2': {'micro_f1': 0.7416331989737418, 'micro_p': 0.8629283489087612, 'micro_r': 0.6502347417835288, 'acc': 0.8219584569724807},
                '3': {'micro_f1': 0.6360814069160526, 'micro_p': 0.8632958801490044, 'micro_r': 0.5035499726922428, 'acc': 0.8144876325081144}}}

kemalaraz (Author) commented:

Still no luck. As I keep training, the F1 keeps decreasing. Why might that occur?
