
The result seems strange in my experiments #46

Closed
cooelf opened this issue Mar 12, 2018 · 18 comments


cooelf commented Mar 12, 2018

Thanks for your code and instructions!

I'm using the code (with no modifications) to run some experiments on the CoNLL-2003 dataset (English). The F1 scores on testa and testb are about 91% and 87%, which is not consistent with the reported 91% on the test set.

I have tried to optimize the hyper-parameters, but the F1 score only reaches 88.8% at most. I'm wondering if it could be due to the environment, like the Python version (3.6.4), TensorFlow version (tensorflow-gpu==1.3.0), or CUDA (8.0 with cudnn 5.1).

Could you share your environment for comparison, or give some insight into this result?

Thanks

@cooelf cooelf changed the title The results seem to be strange in my experiments The result seems strange in my experiments Mar 12, 2018

emrekgn commented Mar 26, 2018

I am wondering about this too. What are your hyperparameters?

I've been trying for some time to get the same results (F1: 90.94%) as reported for Lample et al.'s LSTM-CRF model. This is roughly what my (hyper)parameters look like:

dim_word = 100
dim_char = 25
nepochs = 100
dropout = 0.5
batch_size = 10
lr_method = "sgd"
lr= 0.01
lr_decay = 1.0 # original work does not use decay either!
clip = 5.0 # gradient clipping
hidden_size_char = 25
hidden_size_lstm = 100
# I also replace numeric with zero as stated in the original implementation of Lample.

I'm getting approximately 88.5% F1 for this setting.

The only difference I see compared to Lample's original implementation is the singleton replacement (with 0.5 probability) used to train the UNK token, but IMO this should not make a huge difference, right?
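
A minimal sketch of that singleton trick (assuming singletons is the set of words that occur exactly once in the training data and "$UNK$" is the unknown-word token; names are illustrative, not the exact implementation):

import random

def replace_singletons(words, singletons, p=0.5):
    # Words seen only once in training are swapped for the UNK token
    # with probability p, so the UNK embedding actually gets trained.
    return ["$UNK$" if w in singletons and random.random() < p else w
            for w in words]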

Any help would be appreciated.
Thanks.

@jayavardhanr

Firstly, thanks for sharing the code and detailed instructions.

I have been facing similar issues. I tried the same parameters as mentioned in the paper, but they only give a test F1 score of around 87.

I also tried tuning the hyper-parameters with different learning methods, learning rates, decays, and momentum values. The best result achieved with the code is 88.5 F1.

It would be great if you could share the hyperparameters with which you were able to reproduce the results in the paper.

My Environment Details:
Python 2.7
Tensorflow-gpu 1.2.0
CUDA 8.0.44

Thanks


cooelf commented Apr 9, 2018

I tried the following setting and the test F1 score is 90.02:

# embeddings
dim_word = 300
dim_char = 100    
# training
train_embeddings = False
nepochs          = 50
dropout          = 0.3
batch_size       = 50
lr_method        = "adam"
lr               = 0.005
lr_decay         = 0.9
clip             = 5 # if negative, no clipping
nepoch_no_imprv  = 7

# model hyperparameters
hidden_size_char = 100 # lstm on chars
hidden_size_lstm = 300 # lstm on word embeddings

My Environment Details:
Python 3.6
Tensorflow-gpu 1.3.0
CUDA 8.0.61 with cudnn 5.1

@jayavardhanr

@cooelf Thanks for the reply.
Did you use glove.840B.300d or word2vec 300d for word embeddings?


cooelf commented Apr 9, 2018

@jayavardhanr I simply used the glove.6B.300d word embeddings. It's quite small, actually. My partner tried the code with glove.840B.300d on a similar task, which showed a big improvement (+3.8%) over glove.6B.300d.

From my previous experiments, Adam also seems to work better than SGD. Maybe you can try that embedding together with the parameters above.

Looking forward to your feedback!

@jayavardhanr

@cooelf Thanks for the details. I tried the hyper-parameters you mentioned and achieved an F1 score of 90.10 on the test set.

Thanks again.

@Jonida88

@cooelf @jayavardhanr Hey guys, maybe someone can help me. I'm trying to run the model myself. I've been following these steps: 1. model/data_utils.py, 2. config.py, and then build_data.py, but the reference says to run build_data.py first and then config.py. Which order should I use? Also, when I run data_utils.py it doesn't iterate over the CoNLL dataset, but it doesn't show any error either, so I don't know what I'm doing wrong. I would really appreciate your help.


jayavardhanr commented Apr 12, 2018

  1. You need to download the CoNLL data and place it in the appropriate location. You can find the data here: https://github.com/synalp/NER/tree/master/corpus/CoNLL-2003

  2. Make this change in model/config.py:

'''
Initial code (lines 73 to 78):

# filename_dev = "data/coNLL/eng/eng.testa.iob"
# filename_test = "data/coNLL/eng/eng.testb.iob"
# filename_train = "data/coNLL/eng/eng.train.iob"

filename_dev = filename_test = filename_train = "data/test.txt" # test

Changed Code:

filename_dev = "data/coNLL/eng/eng.testa.iob"
filename_test = "data/coNLL/eng/eng.testb.iob"
filename_train = "data/coNLL/eng/eng.train.iob"

#filename_dev = filename_test = filename_train = "data/test.txt" # test

'''

The author provides test.txt, which will be used if you don't change this part of the code.
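
Regarding the order of steps: model/config.py and model/data_utils.py are just modules; you don't run them directly. If I understand the repo layout correctly, you edit model/config.py as above, then run build_data.py (it builds the vocab and the trimmed embeddings), and then train.py.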


luto65 commented Apr 12, 2018

I had to remove the ".iob" extension from the downloaded files... did you do that too?

@jayavardhanr

@luto65 Yes, forgot to mention that.


luto65 commented Apr 12, 2018

Using the defaults (without touching the installation) on macOS, I got the following on the CoNLL dataset:
acc 97.91 - f1 89.54

Impressive! Congrats!

@Jonida88

Hi @luto65 and @jayavardhanr, thank you very much for your help. Does anyone have an idea why I am getting the error described in issue 3 (the issue I opened)? I have tried many other approaches but I always get the same error. Thanks again in advance.

@ShengleiH

Hi @jayavardhanr, I have a question about the 'build data' part. I found that in 'build_data.py' the author builds the vocabulary using all of the 'train', 'dev' and 'test' data. But in my view, the vocabulary should be built on the train set only. Maybe I missed something; can you give me some advice? Thanks a lot!


sbmaruf commented Apr 16, 2018

Hi!
The vocab is fine with train, test, and dev. Here's the reason:

  1. You are not actually using the labels of the dev and test sets.

  2. Assume you were not using dev and test. Now you get an unknown word from dev. You look up the word's embedding in GloVe, word2vec, or fastText (or initialize it randomly), find it, add it to your vocabulary, and look it up from there. It's as if you had encountered an unknown word at runtime and processed it then, since the pretrained embeddings are always available to you. There's no harm in it.

Now, doing this procedure at runtime would be hard to track. Instead, you take all the words from train, test, and dev as the vocabulary at the beginning of training. The procedure is equivalent.
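
A minimal sketch of what this amounts to (illustrative names, not the repo's exact API):

def build_vocab(datasets, glove_vocab):
    # Collect every word seen in train/dev/test; labels are ignored,
    # so no label information leaks from dev or test.
    words = set()
    for dataset in datasets:              # e.g. [train, dev, test]
        for sentence, _tags in dataset:
            words.update(sentence)
    # Keep only words that also have a pretrained GloVe vector.
    return words & glove_vocab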

@ShengleiH

@sbmaruf Hi, thank you~ Can I always use the 'UNK' embedding for unknown words in the dev/test set during evaluation? I mean, I don't want to assign these unknown words their corresponding GloVe embeddings.


sbmaruf commented Apr 19, 2018

@ShengleiH Sorry for being late.

I don't see any problem with doing this at evaluation time, since during training you only train the model on tokens from the train set. If you are using pretrained embeddings, this is also what the original author (@glample) of the paper did.

There is no need to fall back to < UNK >: during evaluation you simply look up the embeddings of the dev and test words and pass them to your model. Remember that you haven't trained the model on them (dev or test), so there is no problem; using their pretrained embeddings doesn't mean you are training your model on them.

Remember that at training time your model never sees data tagged < UNK >. If you can give two tokens that would both be treated as < UNK > at dev or test time different embeddings, there is no problem. That said, if you are not using pretrained embeddings (i.e., initializing the embeddings from a random distribution), there should still not be any problem, although the original author (@glample) of the paper did use the < UNK > tag in that case.

I would also like to have some input from @guillaumegenthial in this regard.

@guillaumegenthial (Owner)

Because we're using pre-trained embeddings, we can keep the vectors of all the words present in the train, test, and dev sets. (Ideally we would keep all GloVe vectors, but that's unnecessary for our experiment.) Also, at training time, your model does see the UNK word (not all words in the training set are in the GloVe vocab!).
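
Concretely, the word-to-id lookup falls back to UNK for anything outside the vocab, so UNK is hit during training too. A rough sketch (illustrative; the actual helper in model/data_utils.py handles more cases):

def word_to_id(word, vocab):
    # Normalize: lowercase, and collapse digits to a single token.
    word = word.lower()
    if word.isdigit():
        word = "$NUM$"
    # Train-set words missing from the GloVe vocab fall back to UNK,
    # so the UNK embedding is trained, not just used at test time.
    return vocab.get(word, vocab["$UNK$"])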

@guillaumegenthial (Owner)

Also, if you use the IOBES tagging scheme and GloVe 6B, you should get results similar to the paper. I wrote a new version of the code that achieves higher scores: https://github.com/guillaumegenthial/tf_ner/
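
For anyone who wants to try IOBES, a minimal sketch of converting IOB2 tags (assuming well-formed input; this helper is illustrative, not part of the repo):

def iob_to_iobes(tags):
    # Single-token entities become S-, and the last token of a
    # multi-token entity becomes E-; everything else is unchanged.
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + tag[2:]
        if tag.startswith("B-"):
            iobes.append(tag if continues else "S-" + tag[2:])
        else:  # "I-"
            iobes.append(tag if continues else "E-" + tag[2:])
    return iobes

# e.g. ["B-PER", "I-PER", "O", "B-LOC"] -> ["B-PER", "E-PER", "O", "S-LOC"]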
