The result seems strange in my experiments #46
Comments
I am wondering this too. What are your hyperparameters? I have been trying to get the same results (F1: 90.94%) as reported for Lample et al.'s LSTM-CRF model for some time. This is roughly how my (hyper)parameters look:
I'm getting approx. 88.5% F1 for this setting. The only difference I see compared to Lample's original implementation is the addition of singletons (replaced with probability 0.5) to train the UNK token, but IMO this should not make a huge difference, right? Any help would be appreciated.
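The singleton trick mentioned above can be sketched as follows. This is an illustrative re-implementation of the idea from Lample et al. (words that occur exactly once in the training set are sometimes replaced by the UNK token), not the actual code from this repository; function and token names are made up.

```python
import random
from collections import Counter

def replace_singletons(sentences, p=0.5, unk="<UNK>", seed=0):
    """Lample-style UNK training: words that appear exactly once in the
    training data are replaced by the UNK token with probability p, so the
    UNK embedding gets trained on realistic rare-word contexts."""
    rng = random.Random(seed)
    counts = Counter(w for sent in sentences for w in sent)
    singletons = {w for w, c in counts.items() if c == 1}
    return [[unk if w in singletons and rng.random() < p else w
             for w in sent]
            for sent in sentences]

train = [["John", "lives", "in", "Paris"], ["Mary", "lives", "in", "London"]]
# "lives" and "in" occur twice; the four names are singletons.
augmented = replace_singletons(train, p=1.0)
```

With `p=1.0` every singleton is replaced, which makes the effect easy to inspect; in practice `p=0.5` (as discussed above) is applied fresh each epoch.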
Firstly, thanks for sharing the code and detailed instructions. I have been facing similar issues. I tried the same parameters as mentioned in the paper, but it only gives a test F1 score of around 87. I also tried tuning the hyper-parameters using different learning methods, learning rates, decays, and momentum values. The best result achieved with the code is 88.5 F1. It would be great if you could share the hyperparameters with which you were able to reproduce the results in the paper. My Environment Details: Thanks
I tried the following setting and the test F1 score is 90.02.
My Environment Details:
@cooelf Thanks for the reply.
@jayavardhanr I simply used glove.6B.300d word embeddings. It's quite small, actually. My partner tried the code with glove.840B.300d on a similar task, which showed a big improvement (+3.8%) over glove.6B.300d. From my previous experiments, Adam also seems to be better than SGD. Maybe you can try that embedding with these parameters. Hoping for your feedback!
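Since glove.840B.300d is far larger on disk than glove.6B.300d, a common way to switch between them is to trim the embedding file down to just the task vocabulary once, up front. The sketch below is illustrative (function names are made up, not from this repo); it relies only on the standard GloVe text format of one word followed by its space-separated float components per line.

```python
import numpy as np

def load_trimmed_glove(glove_path, vocab, dim=300):
    """Load only the GloVe vectors for words in `vocab`, keeping memory
    manageable even for glove.840B.300d. Words missing from the GloVe
    file keep a zero vector (an UNK-style fallback)."""
    word_to_id = {w: i for i, w in enumerate(vocab)}
    embeddings = np.zeros((len(vocab), dim), dtype=np.float32)
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            # The length check guards against rare malformed lines.
            if word in word_to_id and len(values) == dim:
                embeddings[word_to_id[word]] = np.asarray(values, dtype=np.float32)
    return embeddings
```

The trimmed matrix can then be saved with `np.savez_compressed` and reloaded instantly on later runs, so trying 840B vs. 6B only costs one pass over the file.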
@cooelf Thanks for the details. I tried the hyper-parameters you mentioned and achieved an F1 score of 90.10 on the test set. Thanks again.
@cooelf @jayavardhanr Hey guys, maybe someone can help me. I am trying to run the model myself, following these steps: 1. model/data_utils.py, 2. config.py, and then build_data.py. But the reference says to run build_data.py first and then config.py. Which order should I use? Also, when I run data_utils it does not iterate over the CoNLL dataset, but it isn't showing any error either, so I don't know what I am doing wrong. I would really appreciate your help.
Changed Code:
The author provides test.txt, which will be used if you don't change this part of the code.
I had to remove the ".iob" extension from the downloaded files... did you do it too?
@luto65 Yes. Forgot to mention that.
Using the defaults (without touching the installation) on macOS, I got the following on the CoNLL dataset. Impressive! Congrats!
Hi @luto65 and @jayavardhanr, thank you very much for your help. Does either of you have an idea why I am getting the error from issue 3 (I opened issue 3)? I have tried many other approaches but I always get the same error. Thanks again in advance.
Hi @jayavardhanr, I have a question about the 'build data' part. I found that in the 'build_data.py' file, the author builds the vocabulary using all of the 'train', 'dev' and 'test' data. But in my view, the vocabulary should be built on the train set only. Maybe I missed something; could you give me some advice? Thanks a lot!
Hi!
If you tried to do this procedure at runtime it might be hard to track, so instead all the words from train, test and dev are collected as the vocabulary at the beginning of training. The two procedures are equivalent.
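The vocabulary-building step being discussed can be sketched as below. This is an illustrative simplification (names are made up, not the repo's actual `build_data.py`): the key point from the thread is that, with frozen pretrained embeddings, including dev/test words only decides which rows of the lookup table are kept, so no label information leaks into training.

```python
def build_vocab(datasets):
    """Collect every word seen in any split (train, dev, test).
    With frozen pretrained embeddings this only selects which embedding
    rows to keep; the model is still trained on train-set tokens only."""
    vocab = set()
    for sentences in datasets:
        for sent in sentences:
            vocab.update(sent)
    return sorted(vocab)

train = [["John", "lives", "in", "Paris"]]
dev = [["Mary", "visited", "London"]]
vocab = build_vocab([train, dev])
```

Doing this once up front (rather than lazily at runtime, as the comment above notes) also keeps word-to-id mappings stable across runs.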
@sbmaruf Hi, thank you~ Can I always use the 'UNK' embedding for unknown words in the dev/test set during evaluation? I mean, I don't want to assign these unknown words their corresponding GloVe embeddings.
@ShengleiH Sorry for being late. I don't see any problem with doing this at evaluation time, since during training you only train the model on tokens from the train set. If you are using pretrained embeddings, this is also what the original author of the paper (@glample) does. There is no need to consider < UNK >: during evaluation you only look up the embeddings of the dev/test words and pass them to the model, and remember you haven't trained the model on them, so there is no problem. Conversely, using pretrained embeddings for those words doesn't contradict anything either, because at training time your model never sees < UNK >-tagged data. As long as you can distinguish, at dev/test time, the words treated as < UNK > from the words given their own embeddings, there is no problem. And if you are not using pretrained embeddings (initializing the embeddings from a random distribution), there should also be no problem, though the original author (@glample) does use the < UNK > tag in that case. I would also like to have some input from @guillaumegenthial in this regard.
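The evaluation-time behaviour being described amounts to a simple id lookup with an UNK fallback. A minimal sketch (illustrative names, not this repo's API):

```python
import numpy as np

def lookup_ids(words, word_to_id, unk_id):
    """Evaluation-time lookup: dev/test words absent from the vocabulary
    (and from the pretrained file) map to the UNK row. This is safe
    because embeddings for those words are never updated during training."""
    return np.array([word_to_id.get(w, unk_id) for w in words])

word_to_id = {"cats": 0, "purr": 1, "<UNK>": 2}
ids = lookup_ids(["cats", "meow"], word_to_id, unk_id=2)
```

The ids then index into the (frozen) embedding matrix before being fed to the LSTM.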
Because we're using pre-trained embeddings, we can keep the vectors of all the words present in the train, dev and test sets: those vectors stay frozen during training, so nothing from dev/test leaks into the model.
Also, if you use the IOBES and GloVe6B you should get results similar to the paper. I wrote a new version of the code, that achieves higher scores : https://github.com/guillaumegenthial/tf_ner/ |
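Converting the CoNLL IOB tags to the IOBES scheme mentioned above can be done with a short post-processing pass. This is an illustrative sketch, not the repo's actual conversion code; it assumes the input is already valid IOB2 (every entity starts with B-*).

```python
def iob_to_iobes(tags):
    """Convert IOB2 tags to IOBES: a single-token entity becomes S-*,
    and the last token of a multi-token entity becomes E-*."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_continues = nxt == "I-" + label
        if prefix == "B":
            out.append(("B-" if entity_continues else "S-") + label)
        else:  # prefix == "I"
            out.append(("I-" if entity_continues else "E-") + label)
    return out
```

For example, `["B-PER", "I-PER", "O", "B-LOC"]` becomes `["B-PER", "E-PER", "O", "S-LOC"]`. The richer scheme gives the CRF layer more transition structure, which is one reason it helps scores here.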
Thanks for your codes and instructions!
I'm using the code (with no revision) to run some experiments on the CoNLL-2003 dataset (English). The F1 scores on testa and testb are about 91% and 87%, which is not consistent with the reported 91% on the test set.
I have tried to optimize the hyper-parameters, but the F1 score can only reach 88.8% at most. I'm wondering if it could be due to the environment, like the Python (3.6.4), TensorFlow (tensorflow-gpu==1.3.0) or CUDA (8.0 with cuDNN 5.1) versions.
Could you share your environment for comparison, or give some insight into this result?
Thanks