F score on NER task #3
I am implementing Transformer-based NER by referring to your code.
Here, I found that:
So, I suspect that the Transformer encoder alone is weak at collecting context information at the current position (time = t).
In your code, you are using kernel_size=3 for the feed-forward net.
@dsindex that's right! By itself, the encoder is weak if we limit the feed-forward connections to each time step. Setting the filter size to 3 essentially captures the context information, as you rightly pointed out. In fact, the folks at Google did the same thing. However, this won't be a problem if we pair the encoder with a decoder. I wrote an article on this issue; please check it out if you haven't read it yet!
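To make the filter-size point concrete, here is a minimal numpy sketch (not the actual code from this repo) of a feed-forward sublayer whose first layer is a kernel_size=3 convolution instead of the usual position-wise (kernel_size=1) linear map. With kernel size 3, each output position mixes information from positions t-1, t, and t+1, which is exactly the extra local context being discussed. All shapes and names here are illustrative assumptions.

```python
import numpy as np

def conv_feed_forward(x, w1, b1, w2, b2):
    """Feed-forward sublayer with a kernel_size=3 convolution as the
    first layer (illustrative sketch, not the repo's implementation).
    x: (T, d_model); w1: (3, d_model, d_ff); w2: (d_ff, d_model)."""
    T = x.shape[0]
    d_ff = w1.shape[2]
    # zero-pad one step on each side so the output keeps length T
    xp = np.pad(x, ((1, 1), (0, 0)))
    h = np.zeros((T, d_ff))
    for k in range(3):              # sum contributions of the 3 kernel taps
        h += xp[k:k + T] @ w1[k]
    h = np.maximum(h + b1, 0.0)     # ReLU
    return h @ w2 + b2              # second layer stays position-wise

rng = np.random.default_rng(0)
T, d_model, d_ff = 5, 8, 16
x = rng.standard_normal((T, d_model))
w1 = rng.standard_normal((3, d_model, d_ff)) * 0.1
b1 = np.zeros(d_ff)
w2 = rng.standard_normal((d_ff, d_model)) * 0.1
b2 = np.zeros(d_model)

y = conv_feed_forward(x, w1, b1, w2, b2)
# Perturbing x[0] changes the outputs at positions 0 and 1 (whose
# receptive fields include position 0) but leaves position 3 untouched.
x2 = x.copy()
x2[0] += 1.0
y2 = conv_feed_forward(x2, w1, b1, w2, b2)
print(np.allclose(y[3], y2[3]))
```

With kernel_size=1 the same perturbation would only change position 0, which is the "feed-forward connections limited to each time step" case described above.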