cannot reproduce the RichpediaMEL results? #2

zhiweihu1103 · 2023-09-07T00:15:40Z

Hi, Pengfei. Nice work. I cannot reproduce the results on the RichpediaMEL dataset, even though I use the same yaml file you provided. Can you help me? The training logs are attached.
richpediamel.txt

pengfei-luo · 2023-09-07T01:50:37Z

Hi, Zhiwei. I retrained the model with RichpediaMEL dataset, and everything seems fine. Based on the training logs you provided, I notice that the loss appears to be much larger than usual. In my training, after the first epoch, the Train/loss_epoch is around 3.19.

```text
Train/loss_step  step  Train/loss_epoch
2.700            29
2.965            59
2.596            89
                 97    3.187
```
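To compare your own run with these numbers, here is a minimal sketch that pulls the epoch-level loss out of the metrics.csv written by PyTorch Lightning's CSVLogger. It assumes the column names shown above and that step-level rows leave Train/loss_epoch blank. The helper name epoch_losses is hypothetical.

```python
import csv
import io
from typing import List, TextIO


def epoch_losses(metrics_file: TextIO) -> List[float]:
    """Return the non-empty Train/loss_epoch values of a CSVLogger metrics.csv, in row order."""
    losses: List[float] = []
    for row in csv.DictReader(metrics_file):
        value = row.get("Train/loss_epoch") or ""
        if value.strip():  # step-level rows leave this cell empty
            losses.append(float(value))
    return losses


if __name__ == "__main__":
    # The reference values quoted above, written as CSV.
    reference = io.StringIO(
        "Train/loss_step,step,Train/loss_epoch\n"
        "2.700,29,\n"
        "2.965,59,\n"
        "2.596,89,\n"
        ",97,3.187\n"
    )
    print(epoch_losses(reference))  # [3.187]
    # For your own run, pass open(<path to metrics.csv>) instead. CSVLogger usually
    # writes it under ./runs/<run_name>/version_<n>/.
```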

Is the issue of not being able to reproduce the results limited to RichpediaMEL, or does it apply to all datasets?

zhiweihu1103 · 2023-09-07T01:55:07Z

Hi, Pengfei. Only RichpediaMEL is affected. On the other two datasets I get results close to those reported in the paper.

zhiweihu1103 · 2023-09-07T01:59:51Z

In addition, I see that many attr fields in the dataset are empty. Is this field not used in the end?

pengfei-luo · 2023-09-07T02:21:09Z

> Hi, Pengfei. Only RichpediaMEL is affected. On the other two datasets I get results close to those reported in the paper.

That's strange. I've checked the MD5 of the files, and they appear to match the ones on my training server. Can you please check the learning rate during training? It seems that after the second epoch, the loss no longer exhibits significant changes.

> In addition, I see that many attr fields in the dataset are empty. Is this field not used in the end?

For some entities, I couldn't retrieve suitable attributes from Wikidata (possibly due to a network issue), so I left them blank. In the implementation, the attributes are concatenated with the entity's name.

MIMIC/codes/utils/dataset.py, lines 55 to 56 in 59ef385:

```python
entity, attr = unquote(sample_dict.pop('entity_name')), sample_dict.pop('attr')
input_text = entity + ' [SEP] ' + attr  # concat entity and sentence
```

zhiweihu1103 · 2023-09-07T02:29:41Z

Should I print the learning rate after each epoch? I also noticed that the loss did not change much after the second epoch.

zhiweihu1103 · 2023-09-07T02:30:10Z

Okay, that means attr is not used in the current dataset, right?

pengfei-luo · 2023-09-07T02:50:35Z

> Should I print the learning rate after each epoch? I also noticed that the loss did not change much after the second epoch.

You can log the learning rate easily with PyTorch Lightning's LearningRateMonitor callback. Just add it to the trainer's callbacks, as in this training script:

```python
import os
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping, LearningRateMonitor
from codes.utils.functions import setup_parser
from codes.model.lightning_mimic import LightningForMIMIC
from codes.utils.dataset import DataModuleForMIMIC

if __name__ == '__main__':
    args = setup_parser()
    pl.seed_everything(args.seed, workers=True)
    torch.set_num_threads(1)

    data_module = DataModuleForMIMIC(args)
    lightning_model = LightningForMIMIC(args)

    logger = pl.loggers.CSVLogger("./runs", name=args.run_name, flush_logs_every_n_steps=30)

    ckpt_callbacks = ModelCheckpoint(monitor='Val/mrr', save_weights_only=True, mode='max')
    early_stop_callback = EarlyStopping(monitor="Val/mrr", min_delta=0.00, patience=3, verbose=True, mode="max")
    lr_callback = LearningRateMonitor(logging_interval='step')

    trainer = pl.Trainer(**args.trainer,
                         deterministic=True, logger=logger, default_root_dir="./runs",
                         callbacks=[ckpt_callbacks, early_stop_callback, lr_callback])

    trainer.fit(lightning_model, datamodule=data_module)
    trainer.test(lightning_model, datamodule=data_module, ckpt_path='best')
```
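With the callback in place, you can read the logged learning rates back out of metrics.csv using a helper like the hypothetical one below. It assumes LearningRateMonitor's default column naming: "lr-" followed by the optimizer class name, for example lr-AdamW. If nothing is found, check the header of your own file.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def lr_history(metrics_file: TextIO) -> Dict[str, List[Tuple[int, float]]]:
    """Collect (step, lr) pairs for every metrics.csv column whose name starts with 'lr-'."""
    reader = csv.DictReader(metrics_file)
    lr_columns = [name for name in (reader.fieldnames or []) if name.startswith("lr-")]
    history: Dict[str, List[Tuple[int, float]]] = {name: [] for name in lr_columns}
    for row in reader:
        for name in lr_columns:
            value = row.get(name) or ""
            if value.strip():  # rows written for other metrics leave this cell empty
                history[name].append((int(row["step"]), float(value)))
    return history


def lr_is_constant(points: List[Tuple[int, float]]) -> bool:
    """True if every logged learning rate is identical (e.g. no scheduler changed it)."""
    return len({lr for _, lr in points}) <= 1


if __name__ == "__main__":
    sample = io.StringIO("lr-AdamW,step,Train/loss_step\n1e-05,29,\n,29,2.7\n1e-05,59,\n")
    history = lr_history(sample)
    print(history)
    print(lr_is_constant(history["lr-AdamW"]))
```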

> Okay, that means attr is not used in the current dataset, right?

I'm not sure what you mean by "not used." Our intention is to utilize the attributes to enhance the representation of entities. Therefore, we concatenate the flattened key-value attributes with the entity's name as textual input.

zhiweihu1103 · 2023-09-07T02:54:45Z

I will give feedback this afternoon or evening.

zhiweihu1103 · 2023-09-07T02:55:57Z

What I mean is that I saw the attr field is empty, which suggests that attr is not used. In the code, I did see the part where attr is concatenated.

pengfei-luo · 2023-09-07T03:03:03Z

> What I mean is that I saw the attr field is empty, which suggests that attr is not used. In the code, I did see the part where attr is concatenated.

No, I did use the attributes. However, due to network issues or the absence of suitable attributes, some entities have an empty or missing attr field.
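When the attribute string is empty, the concatenation in dataset.py reduces an entity's text input to its name plus the separator, so those entities still get a textual input. The sketch below illustrates this and counts such entities. It assumes kb_entity.json is a JSON list of objects with an attr key, which this thread has not confirmed, and both helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def build_input_text(entity_name: str, attr: str) -> str:
    """Mirror the concatenation in dataset.py: entity name, ' [SEP] ', flattened attributes."""
    return entity_name + " [SEP] " + attr


def count_empty_attrs(entities: List[Dict[str, Any]]) -> int:
    """Count entities whose 'attr' field is missing, None, or whitespace only."""
    return sum(1 for entity in entities if not str(entity.get("attr") or "").strip())


if __name__ == "__main__":
    # Toy records. The real kb_entity.json layout is an assumption here.
    sample = json.loads('[{"entity_name": "A", "attr": "k: v"}, {"entity_name": "B", "attr": ""}]')
    print(count_empty_attrs(sample))  # 1
    print(repr(build_input_text("B", "")))  # 'B [SEP] '
```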

zhiweihu1103 · 2023-09-07T03:03:44Z

Ok, I understand.

zhiweihu1103 · 2023-09-07T03:25:43Z

In addition, would you mind providing the Figure 4 datasets (10% and 20% for RichpediaMEL and WikiDiverse) and the numerical results? I need to draw my own histogram, but I don't know the exact values in yours.

zhiweihu1103 · 2023-09-07T08:01:08Z

Hi Pengfei. I have attached the training logs, now including the learning rate. Please also take a look at my question above about the dataset and numerical results of Figure 4. Looking forward to the discussion.
richpediamel_lr.txt
metrics.csv

zhiweihu1103 · 2023-09-08T07:09:01Z

Hi, Pengfei. Any updates?

pengfei-luo · 2023-09-10T15:13:23Z

> Hi, Pengfei. Any updates?

Hi, sorry for the late response. I have reviewed your log file, and the learning rate appears to be fine. I attempted to retrain the model using the code and original data we uploaded, and the loss and evaluation results match our reported findings. Could you please check the configuration file config/richpediamel.yaml to see if there is anything wrong? Could you also provide details about the environment you used to train the model?

If you want to reproduce the reported results right now, I have uploaded a model checkpoint here (password: KDD2023richpedia).

In the low-resource setting, we only utilized the first 10% and 20% of the training data for each dataset, following the order in the training data file. This means that if you want to access the low-resource training data, you only need to control the amount of training data used.

Please add a new line after line 44 of MIMIC/codes/utils/dataset.py (commit 59ef385):

```python
train_data = _load_json_file(self.args.data.train_file)  # existing line 44
train_data = train_data[:int(len(train_data) * 0.1)]  # new line: keep the first 10% (or 0.2 for 20%)
```

Then you can obtain either 10% or 20% of the training data we used.
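Here is the same idea as a standalone function. The subset is a prefix taken in file order rather than a random sample, so every run sees the same 10% or 20% regardless of the seed. The name low_resource_subset is hypothetical. The one-line slice above is the only change the repository needs.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def low_resource_subset(train_data: Sequence[T], fraction: float) -> List[T]:
    """Return the first `fraction` of the samples, keeping the order of the training file."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # int() truncates, matching the slice suggested above.
    return list(train_data[: int(len(train_data) * fraction)])


if __name__ == "__main__":
    samples = [f"sample_{i}" for i in range(50)]
    print(len(low_resource_subset(samples, 0.1)), len(low_resource_subset(samples, 0.2)))  # 5 10
```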

Regarding the numerical results you've requested, I will update them in the readme file in the next few days. Please stay tuned.

zhiweihu1103 · 2023-09-10T15:24:26Z

Hi Pengfei. First, here is the yaml file I used. I did not modify anything except the paths. Second, for the running environment, I created a separate conda environment, and the package versions are exactly the same as in your requirements.txt.

```yaml
run_name: RichpediaMEL
seed: 43
pretrained_model: '/checkpoint/clip-vit-base-patch32'
lr: 1e-5

data:
  num_entity: 160933
  kb_img_folder: /data/RichpediaMEL/kb_image
  mention_img_folder: /data/RichpediaMEL/mention_image
  qid2id: /data/RichpediaMEL/qid2id.json
  entity: /data/RichpediaMEL/kb_entity.json
  train_file: /data/RichpediaMEL/RichpediaMEL_train.json
  dev_file: /data/RichpediaMEL/RichpediaMEL_dev.json
  test_file: /data/RichpediaMEL/RichpediaMEL_test.json

  batch_size: 128
  num_workers: 8
  text_max_length: 40

  eval_chunk_size: 6000
  eval_batch_size: 20
  embed_update_batch_size: 512

model:
  input_hidden_dim: 512
  input_image_hidden_dim: 768
  hidden_dim: 96
  dv: 96
  dt: 512
  TGLU_hidden_dim: 96
  IDLU_hidden_dim: 96
  CMFU_hidden_dim: 96

trainer:
  accelerator: 'gpu'
  devices: 1
  max_epochs: 20
  num_sanity_val_steps: 0
  check_val_every_n_epoch: 2
  log_every_n_steps: 30
```

The complete list of installed packages is:

```text
absl-py                 1.4.0
aiohttp                 3.8.5
aiosignal               1.3.1
antlr4-python3-runtime  4.9.3
async-timeout           4.0.3
attrs                   23.1.0
cachetools              5.3.1
certifi                 2023.7.22
charset-normalizer      3.2.0
click                   8.1.7
filelock                3.12.3
frozenlist              1.4.0
fsspec                  2023.9.0
google-auth             2.22.0
google-auth-oauthlib    1.0.0
grpcio                  1.57.0
huggingface-hub         0.16.4
idna                    3.4
importlib-metadata      6.8.0
joblib                  1.3.2
Markdown                3.4.4
MarkupSafe              2.1.3
multidict               6.0.4
numpy                   1.24.4
oauthlib                3.2.2
omegaconf               2.2.3
packaging               23.1
Pillow                  9.3.0
pip                     23.2.1
protobuf                4.24.2
pyasn1                  0.5.0
pyasn1-modules          0.3.0
pyDeprecate             0.3.2
pytorch-lightning       1.7.7
PyYAML                  6.0.1
regex                   2023.8.8
requests                2.31.0
requests-oauthlib       1.3.1
rsa                     4.9
sacremoses              0.0.53
setuptools              68.0.0
six                     1.16.0
tensorboard             2.14.0
tensorboard-data-server 0.7.1
tokenizers              0.12.1
torch                   1.11.0
torchmetrics            0.11.0
tqdm                    4.66.1
transformers            4.18.0
typing_extensions       4.7.1
urllib3                 1.26.16
Werkzeug                2.3.7
wheel                   0.38.4
yarl                    1.9.2
zipp                    3.16.2
```

zhiweihu1103 · 2023-09-10T15:33:10Z

Thanks for the information on running the low-resource experiments. I am very much looking forward to your numerical results; thank you for your efforts. Regarding reproducing RichpediaMEL, I wonder whether the code you ran differs from the code you uploaded. I ran it twice on this dataset, and both runs gave exactly the same results as the ones I uploaded above.

pengfei-luo · 2023-09-10T15:44:25Z

I can reproduce the results with the code we shared and the data we uploaded to OneDrive. Could there be any difference in the pretrained model? I see you changed the path. I use the one from Hugging Face:

SHA256: a63082132ba4f97a80bea76823f544493bffa8082296d62d71581a4feff1576f
MD5: 47767ea81d24718fcc0c8923607792a7

zhiweihu1103 · 2023-09-10T15:59:00Z

I downloaded the pretrained CLIP model from https://huggingface.co/openai/clip-vit-base-patch32/tree/main. I will replace pytorch_model.bin with the file from the link you provided and upload the results tomorrow morning.

zhiweihu1103 · 2023-09-10T16:14:20Z

But I found that the CLIP weights I downloaded are in fact the same file as the pytorch_model.bin you linked.

zhiweihu1103 · 2023-09-11T03:08:59Z

Hi Pengfei. I may need further help from you, because I still cannot reproduce the RichpediaMEL results. I used the CLIP weights from the URL you gave me (which are actually the same pretrained model I used before). I have uploaded my training logs for all three datasets below.
wikidiverse_another.txt
wikimel_another.txt
richpediamel_another.txt

pengfei-luo · 2023-09-11T03:42:45Z

This is very strange. The other two datasets work fine, only RichpediaMEL has an issue. Maybe you could double-check the RichpediaMEL.tar file you downloaded? I will share an online Wandb report later to show that everything is normal on my end.

RichpediaMEL.tar
MD5: 0f499eddde7582428947e45ebb94388f
SHA256: 36ac5703e4a9890238daedf039a7b2923a7c4b66c66a6b9cf788db40eabe0447

zhiweihu1103 · 2023-09-11T03:46:28Z

I will take a screenshot to share the information after decompressing the RichpediaMEL dataset.

The kb_image folder has 96073 files, and mention_images has 15852 files.
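To get these counts without a screenshot, here is a small sketch that counts the files in each extracted folder. The default dataset root is only a placeholder, and the folder names match the counts reported here. Hidden files such as .DS_Store are counted too, which can make totals differ slightly between systems.

```python
import os
import sys


def count_files(root: str) -> int:
    """Count file entries under `root`, including those in subfolders (0 if root is absent)."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(root))


if __name__ == "__main__":
    # Hypothetical default location. Pass your own dataset root as the first argument.
    dataset_root = sys.argv[1] if len(sys.argv) > 1 else "/data/RichpediaMEL"
    for folder in ("kb_image", "mention_images"):
        path = os.path.join(dataset_root, folder)
        print(folder, count_files(path) if os.path.isdir(path) else "missing")
```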

zhiweihu1103 · 2023-09-11T04:56:34Z

I downloaded the RichpediaMEL dataset from the link you provided: https://mailustceducn-my.sharepoint.com/:u:/g/personal/pfluo_mail_ustc_edu_cn/ERikbOQuoWFHrA_AizcuCbgB8PBOiRqCV4U0lZfxUN-6kg?e=speIdh

pengfei-luo · 2023-09-11T06:51:55Z

Could you please try upgrading transformers to version 4.27.1? I notice that the version of transformers might have an impact on the results, although I'm not sure what's causing the differences in results.

```shell
pip install transformers==4.27.1 --upgrade
```

zhiweihu1103 · 2023-09-11T08:24:49Z

Let me check.

pengfei-luo · 2023-09-11T08:39:04Z

The Wandb report is here.

zhiweihu1103 · 2023-09-11T08:39:56Z

You used transformers==4.27.1, right?

pengfei-luo · 2023-09-11T08:45:48Z

Yes, in the Wandb report run, I used torch==1.11.0 and transformers==4.27.1. Other packages are the same as the requirements. I attempted to downgrade transformers to 4.18.0 and noticed that it did lead to a performance drop. I have no idea why this occurred.
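Because the transformers version turned out to matter, it helps to compare the installed versions with those from the reference run before comparing results. Here is a minimal sketch using the standard-library importlib.metadata (Python 3.8+). The helper names are hypothetical, and the reference versions are the ones stated in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Mapping, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed distribution versions. None marks a package that is not installed."""
    found: Dict[str, Optional[str]] = {}
    for package in packages:
        try:
            found[package] = version(package)
        except PackageNotFoundError:
            found[package] = None
    return found


def version_mismatches(
    expected: Mapping[str, str], installed: Mapping[str, Optional[str]]
) -> Dict[str, Optional[str]]:
    """Return {package: installed version} for every package that differs from `expected`."""
    return {
        package: installed.get(package)
        for package, wanted in expected.items()
        if installed.get(package) != wanted
    }


if __name__ == "__main__":
    # Versions used in the Wandb run, according to this thread.
    reference = {"torch": "1.11.0", "transformers": "4.27.1"}
    print(version_mismatches(reference, installed_versions(reference)))
```

The comparison is an exact string match, so a local build tag such as +cu113 on torch is also reported. Treat such a report as a prompt to look closer, not as a failure.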

zhiweihu1103 · 2023-09-11T08:49:45Z

If the performance degradation is due to transformers, then this should not be within the scope of our discussion. As long as the results can be reproduced, everything is good. I'll re-run and give my reproduction results.

zhiweihu1103 · 2023-09-12T11:05:26Z

Hi, Pengfei.
First, I carefully compared the open-source GitHub code with the code you used on Wandb. The only difference is in how the CLIP from_pretrained method is called.
The open-source code on GitHub is:

```python
self.tokenizer = CLIPProcessor.from_pretrained(self.args.pretrained_model).tokenizer
```

The code used in the Wandb run is:

```python
self.tokenizer = CLIPProcessor.from_pretrained(self.args.pretrained_model, local_files_only=True).tokenizer
```

But I don't think this is the main problem: after I added the local_files_only=True parameter, the results were the same.

Then I created an environment that exactly matches the requirements.txt from the Wandb run, and the results are exactly the same as my earlier ones. So the difference is not caused by the environment.

So I need to confirm: is the RichpediaMEL dataset you are using the same version you uploaded? Since the code and environment are now completely consistent, the performance difference is hard to explain.

zhiweihu1103 · 2023-09-12T11:06:49Z

> I will take a screenshot to share the information after decompressing the RichpediaMEL dataset. The kb_image folder has 96073 files, and mention_images has 15852 files.

Here are the statistics for the RichpediaMEL dataset I used.

pengfei-luo · 2023-09-12T11:14:41Z

Maybe you can check if the MD5 values of all the files match mine?

```text
ba086b054bf52d549f2a79503c76704a  kb_entity.json
8059b7aa89a9314d5dc38607a8685eeb  qid2id.json
831cdd92d70a93ea8a442798ec2fcde1  RichpediaMEL_dev.json
9e07e5e970e01079d256311e5ac10bd8  RichpediaMEL_test.json
e1d0b2adb2a1114cefa63860ffa23982  RichpediaMEL_train.json
961efc263bc8e2e7b257a28e8e703633  kb_image.zip
474c594ce8a95aa5dc9222365db0044e  mention_images.zip
```
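Instead of comparing hashes by eye, you can use a short sketch that recomputes them with Python's hashlib and checks them against the values above. It assumes all seven files sit in one directory passed on the command line. The function names are hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict

# MD5 values posted above for the RichpediaMEL files.
EXPECTED_MD5: Dict[str, str] = {
    "kb_entity.json": "ba086b054bf52d549f2a79503c76704a",
    "qid2id.json": "8059b7aa89a9314d5dc38607a8685eeb",
    "RichpediaMEL_dev.json": "831cdd92d70a93ea8a442798ec2fcde1",
    "RichpediaMEL_test.json": "9e07e5e970e01079d256311e5ac10bd8",
    "RichpediaMEL_train.json": "e1d0b2adb2a1114cefa63860ffa23982",
    "kb_image.zip": "961efc263bc8e2e7b257a28e8e703633",
    "mention_images.zip": "474c594ce8a95aa5dc9222365db0044e",
}


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives are not loaded into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5(folder: Path, expected: Dict[str, str]) -> Dict[str, bool]:
    """Map each expected filename to True only if it exists in `folder` and its MD5 matches."""
    return {
        name: (folder / name).is_file() and file_md5(folder / name) == md5
        for name, md5 in expected.items()
    }


if __name__ == "__main__":
    # Directory holding the downloaded files (placeholder default: current directory).
    target = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for name, ok in check_md5(target, EXPECTED_MD5).items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
```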

pengfei-luo · 2023-09-12T11:16:53Z

The parameter local_files_only=True ensures that local files are used, and we have already confirmed that the model weights are consistent. I think this won't have any impact.

zhiweihu1103 · 2023-09-12T11:19:48Z

zhiweihu1103 · 2023-09-12T11:20:46Z

You can ignore the .pkl files. I found differences in kb_image and mention_images.

pengfei-luo · 2023-09-12T11:24:53Z

Can you provide the MD5 values for kb_image.zip and mention_images.zip? I directly extracted these two ZIP files.

zhiweihu1103 · 2023-09-12T11:26:19Z

Please wait a few minutes. I deleted the original files after decompressing them, so I need to download them again.

zhiweihu1103 · 2023-09-12T11:42:54Z

zhiweihu1103 · 2023-09-12T11:49:13Z

I can't think of any other reason why the results are hard to reproduce. Could it be that the .zip files are the same size but the decompressed sizes differ?

zhiweihu1103 · 2023-09-12T11:51:25Z

I checked your training log on Wandb, and your loss is clearly much lower than in my reproduction.

pengfei-luo · 2023-09-12T11:55:57Z

It seems all the files are normal. The difference in folder sizes may be due to differences in how the operating system organizes files.

pengfei-luo · 2023-09-12T12:07:02Z

Perhaps you can try changing some hyperparameters, such as the random seed, learning rate, and batch size, to see if they have an impact on the loss. If you have access to other servers, maybe you can try configuring the environment and running it on other servers. I don't know what's causing the inability to reproduce the results. All the results on my end are normal.

zhiweihu1103 · 2023-09-12T12:59:59Z

I can try it on other machines, but judging from my experience running your code, as long as the random seed is fixed, the results will be exactly the same every time.

zhiweihu1103 · 2023-09-12T14:08:11Z

I have some new information. I originally ran the code on a V100 32GB GPU. I have now tried it on an A6000, and the final results are almost the same as on the V100. Have you made any other modifications? The hyperparameters I use are completely consistent with the yaml you provided.

pengfei-luo · 2023-09-12T14:26:05Z

As you can see on Wandb, I confirm that there are no changes made to the code about model and data processing. We have verified that the pre-trained model, environment configuration, code, and data are all consistent. I don't think the CUDA version and NVIDIA driver version should have such a significant impact.

zhiweihu1103 · 2023-09-12T14:32:08Z

This is a very vexing question because I can't think of any other reason that could cause this problem.

pengfei-luo · 2023-09-12T14:37:29Z

My thought is, perhaps you can make some adjustments to the parameters and then observe if there's a decrease in loss within a few iterations (compared to your previous loss). Maybe consider replacing the optimizer with stochastic gradient descent? I'm not sure.

zhiweihu1103 · 2023-09-12T14:39:29Z

I'll try adjusting the parameters tomorrow and keep in touch at any time.

pengfei-luo · 2023-09-16T06:48:09Z

I discovered that the folder for the mention images should be named mention_images instead of mention_image. This would result in the absence of image data during the training process. This could also be the reason why you were unable to reproduce the original results.

pengfei-luo · 2023-09-16T06:53:04Z

You need to manually change mention_image to mention_images in the following line of the RichpediaMEL config file, MIMIC/config/richpediamel.yaml (line 10 in 59ef385):

```yaml
mention_img_folder: /YOUR_PATH/RichpediaMEL/mention_image
```
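The wrong folder name did not raise an error; it just left training without image data. A fail-fast check before training could spare others the same search. Below is a minimal sketch that is not part of the repository. The helper names are hypothetical, and it assumes the data section of the config is available as a mapping. An OmegaConf DictConfig such as args.data should also work, since it supports .get, but that is untested here.

```python
import os
from typing import List, Mapping, Optional

# Keys in the data section of config/richpediamel.yaml that point at image folders.
IMAGE_FOLDER_KEYS = ("kb_img_folder", "mention_img_folder")


def _has_entries(folder: str) -> bool:
    """True if the directory contains at least one entry."""
    with os.scandir(folder) as entries:
        return any(True for _ in entries)


def bad_image_folders(data_cfg: Mapping[str, Optional[str]]) -> List[str]:
    """Return the config keys whose folder is unset, missing, or empty."""
    problems: List[str] = []
    for key in IMAGE_FOLDER_KEYS:
        folder = data_cfg.get(key)
        if not folder or not os.path.isdir(folder) or not _has_entries(folder):
            problems.append(key)
    return problems


def require_image_folders(data_cfg: Mapping[str, Optional[str]]) -> None:
    """Raise instead of silently training without images."""
    problems = bad_image_folders(data_cfg)
    if problems:
        raise FileNotFoundError(f"image folder missing or empty for: {', '.join(problems)}")
```

Calling require_image_folders(args.data) just before trainer.fit would most likely have flagged mention_img_folder while it still pointed to .../mention_image.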

zhiweihu1103 · 2023-09-16T07:11:21Z

That is indeed the cause, and a very subtle one. Thank you very much. I will re-run the code and report the final results.

zhiweihu1103 · 2023-09-16T07:21:59Z

As a reminder, would you mind uploading the numerical results of Figure 4?

pengfei-luo · 2023-09-16T07:48:57Z

I have just updated the detailed results in the README file.

zhiweihu1103 · 2023-09-16T07:55:08Z

Great, Thx.

zhiweihu1103 · 2023-09-16T12:18:11Z

Hi, Pengfei. I have reproduced the results, thanks for your solution. Good luck. I will close the issue.

zhangzef · 2023-11-30T02:20:35Z

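I can't seem to reproduce the results on WikiMEL. I'm using the hyperparameter file the author provided on GitHub. Could you tell me whether there is anything I should pay attention to when reproducing the results?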


zhiweihu1103 closed this as completed Sep 16, 2023
