F score on NER task #3

Closed
TanyaZhao opened this Issue Aug 2, 2018 · 4 comments

Comments

3 participants
@TanyaZhao

TanyaZhao commented Aug 2, 2018

Hi! Could you report your best F score on the CoNLL 2003 NER task?
Thank you!

@kolloldas

Owner

kolloldas commented Aug 5, 2018

Sure! I got F1 of 0.867 on the test set (0.921 on validation) using the BiLSTM CRF. I haven't done much hyperparameter tuning, so probably there is room for improvement. Would love to know the numbers you got.

@dsindex


dsindex commented Oct 8, 2018

hi @kolloldas

I am implementing Transformer-based NER by referring to your code.

https://github.com/dsindex/etagger

Here I found that:

  1. If I do not use the CRF layer, the performance is around 70%, with:
  • 5 layers of Transformer blocks
  • a feed-forward net with conv1d (kernel size 1)
  2. But with the CRF layer, the performance goes up to 88%.
test precision, recall, f1 (token level): without CRF
[0.9940347495376279, 0.847970479704797, 0.7586206896551724, 0.7618694362017804, 0.5936254980079682, 0.8837209302325582, 0.5938914027149321, 0.32207207207207206, 0.22399150743099788, 0.6607466473359913]
[0.9897805675156923, 0.6917519566526189, 0.5954415954415955, 0.6351267779839208, 0.773356401384083, 0.6606714628297362, 0.6287425149700598, 0.6620370370370371, 0.8210116731517509, 0.6741863905325444]
[0.9919030970978516, 0.7619363395225465, 0.6671987230646449, 0.6927487352445193, 0.6716754320060105, 0.7560891938250429, 0.6108202443280977, 0.43333333333333335, 0.3519599666388657, 0.66739886509244]
-> the last column is the overall precision, recall, and f1

test precision, recall, f1 (chunk level): with CRF
[0.8724561403508772, 0.8804886685552408, 0.87645400070497]
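(As a sanity check, the reported chunk-level F1 is the harmonic mean of the precision and recall above, F1 = 2PR / (P + R):)

```python
precision = 0.8724561403508772
recall = 0.8804886685552408

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ~0.87645400070497, matching the reported value
```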

So I suspect that the Transformer encoder alone is weak at collecting context information around the current position (time=t).

In your code you are using kernel_size=3 for the feed-forward net.
Is that the key to the performance increase?

@dsindex


dsindex commented Oct 9, 2018

The above problem was fixed after applying kernel_size=3 :)

@kolloldas

Owner

kolloldas commented Oct 9, 2018

@dsindex that's right! By itself the encoder is weak if we limit the feed-forward connections to each individual time step. Setting the filter size to 3 essentially captures the local context information, as you rightly pointed out. In fact the folks at Google did the same thing. However, this won't be a problem if we pair the encoder with a decoder. I wrote an article on this issue; please check it out if you haven't read it yet!
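(For readers following along, here is a minimal PyTorch sketch of the idea being discussed. The class name and dimensions are illustrative, not the repository's actual code: with kernel_size=1 the feed-forward sublayer mixes channels at each time step independently, while kernel_size=3 with same-padding also mixes each position with its immediate neighbours.)

```python
import torch
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    """Feed-forward sublayer of a Transformer block, implemented with Conv1d.

    kernel_size=1 acts per time step only; kernel_size=3 (padding=1)
    lets each position also see its left and right neighbours,
    giving the encoder local context.
    """
    def __init__(self, d_model, d_ff, kernel_size=3):
        super().__init__()
        padding = (kernel_size - 1) // 2  # keep the sequence length unchanged
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size, padding=padding)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size, padding=padding)
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, seq_len, d_model); Conv1d expects (batch, channels, seq_len)
        y = self.relu(self.conv1(x.transpose(1, 2)))
        return self.conv2(y).transpose(1, 2)

ff = PositionwiseFeedForward(d_model=64, d_ff=256, kernel_size=3)
out = ff(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64]) -- shape is preserved
```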

@kolloldas kolloldas closed this Nov 30, 2018
