cannot reproduce the RichpediaMEL results? #2

Closed
zhiweihu1103 opened this issue Sep 7, 2023 · 62 comments

@zhiweihu1103

Hi, Pengfei. Nice work. I cannot reproduce the reported results on the RichpediaMEL dataset, even though I use the same yaml file you provided. Can you help me? The training logs are attached.
richpediamel.txt

@pengfei-luo
Owner

Hi, Zhiwei. I retrained the model with RichpediaMEL dataset, and everything seems fine. Based on the training logs you provided, I notice that the loss appears to be much larger than usual. In my training, after the first epoch, the Train/loss_epoch is around 3.19.

| Train/loss_step | epoch | step | Train/loss_epoch |
|---|---|---|---|
| 2.700 | 0 | 29 | |
| 2.965 | 0 | 59 | |
| 2.596 | 0 | 89 | |
| | 0 | 97 | 3.187 |

Is the issue of not being able to reproduce the results limited to RichpediaMEL, or does it apply to all datasets?

@zhiweihu1103
Author

Hi, Pengfei. The problem is limited to RichpediaMEL; on the other two datasets I get results close to those reported in the paper.

@zhiweihu1103
Author

In addition, I see that many attr fields in the dataset are empty. Is this field not used in the end?

@pengfei-luo
Owner

> Hi, Pengfei. The problem is limited to RichpediaMEL; on the other two datasets I get results close to those reported in the paper.

That's strange. I've checked the MD5 of the files, and they appear to match the ones on my training server. Can you please check the learning rate during training? It seems that after the second epoch, the loss no longer exhibits significant changes.

> In addition, I see that many attr fields in the dataset are empty. Is this field not used in the end?

For some entities, I couldn't retrieve suitable attributes from Wikidata (possibly due to a network issue), so I left them blank. In the implementation, the attributes are concatenated with the entity's name.

entity, attr = unquote(sample_dict.pop('entity_name')), sample_dict.pop('attr')
input_text = entity + ' [SEP] ' + attr # concatenate entity name and attributes
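
To make the effect of an empty attr concrete, here is a minimal sketch of the concatenation above. The helper name `build_entity_text` and the sample values are hypothetical, and it assumes `unquote` is `urllib.parse.unquote`; only the `entity_name` and `attr` keys and the ` [SEP] ` separator come from the snippet. Treating a missing `attr` key like an empty string is also an assumption of this sketch.

```python
from urllib.parse import unquote


def build_entity_text(sample_dict):
    """Join the URL-decoded entity name and its flattened attributes."""
    sample = dict(sample_dict)  # copy so the caller's dict is not mutated
    entity = unquote(sample.pop('entity_name'))
    attr = sample.pop('attr', '')  # assumption: a missing attr behaves like an empty one
    return entity + ' [SEP] ' + attr


# Hypothetical samples: one entity with attributes, one without.
with_attr = build_entity_text(
    {'entity_name': 'Ada%20Lovelace', 'attr': 'occupation: mathematician'})
without_attr = build_entity_text({'entity_name': 'Ada%20Lovelace', 'attr': ''})
```

An entity with an empty attr still produces a valid text input; it simply carries nothing after the separator.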

@zhiweihu1103
Author

I need to print the learning rate after each epoch, right? I also found that the loss did not change much after the second epoch.

@zhiweihu1103
Author

zhiweihu1103 commented Sep 7, 2023

Okay, that means attr is not used in the current dataset, right?

@pengfei-luo
Owner

> I need to print the learning rate after each epoch, right? I also found that the loss did not change much after the second epoch.

You could log the learning rate without hassle by using PyTorch Lightning callbacks. You simply need to add it to the trainer's callbacks.

import os
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping, LearningRateMonitor
from codes.utils.functions import setup_parser
from codes.model.lightning_mimic import LightningForMIMIC
from codes.utils.dataset import DataModuleForMIMIC

if __name__ == '__main__':
    args = setup_parser()
    pl.seed_everything(args.seed, workers=True)
    torch.set_num_threads(1)

    data_module = DataModuleForMIMIC(args)
    lightning_model = LightningForMIMIC(args)

    logger = pl.loggers.CSVLogger("./runs", name=args.run_name, flush_logs_every_n_steps=30)

    ckpt_callbacks = ModelCheckpoint(monitor='Val/mrr', save_weights_only=True, mode='max')
    early_stop_callback = EarlyStopping(monitor="Val/mrr", min_delta=0.00, patience=3, verbose=True, mode="max")
    lr_callback = LearningRateMonitor(logging_interval='step')

    trainer = pl.Trainer(**args.trainer,
                         deterministic=True, logger=logger, default_root_dir="./runs",
                         callbacks=[ckpt_callbacks, early_stop_callback, lr_callback])

    trainer.fit(lightning_model, datamodule=data_module)
    trainer.test(lightning_model, datamodule=data_module, ckpt_path='best')
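
Once the run has finished, the logged learning rate can be pulled out of the CSV written by the CSVLogger. The following is only a sketch: the helper name `learning_rates` and the sample rows are made up for illustration, and it assumes that LearningRateMonitor writes its values to a column whose name starts with `lr-` (for example `lr-AdamW`) next to a `step` column.

```python
import csv
import io


def learning_rates(csv_text):
    """Return (step, lr) pairs from every column whose name starts with 'lr-'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lr_cols = [c for c in (reader.fieldnames or []) if c.startswith('lr-')]
    rates = []
    for row in reader:
        for col in lr_cols:
            if row.get(col):  # cells are empty for metrics not logged at that step
                rates.append((int(row['step']), float(row[col])))
    return rates


# Illustrative rows only, not taken from a real run.
sample = "lr-AdamW,step,Train/loss_step\n1e-05,0,\n,29,2.700\n1e-05,30,\n"
rates = learning_rates(sample)
```

For a real run, pass the text of the metrics.csv file produced by the logger instead of `sample`.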

> Okay, that means attr is not used in the current dataset, right?

I'm not sure what you mean by "not used." Our intention is to utilize the attributes to enhance the representation of entities. Therefore, we concatenate the flattened key-value attributes with the entity's name as textual input.

@zhiweihu1103
Author

I will give feedback this afternoon or evening.

@zhiweihu1103
Author

What I mean is that I saw the attr field is empty, which suggested to me that attr is not used. In the code, I did see that there is a part where attr is concatenated.

@pengfei-luo
Owner

> What I mean is that I saw the attr field is empty, which suggested to me that attr is not used. In the code, I did see that there is a part where attr is concatenated.

No, I did use attributes. However, due to network issues or the absence of suitable attributes, some entities have an empty or missing attr field.

@zhiweihu1103
Author

Ok, I understand.

@zhiweihu1103
Author

zhiweihu1103 commented Sep 7, 2023

In addition, would you mind providing the datasets used for Figure 4 (10% and 20% for RichpediaMEL and WikiDiverse) and the numerical results? I need to draw my own histogram, but I don't know the specific values in yours.

@zhiweihu1103
Author

zhiweihu1103 commented Sep 7, 2023

Hi Pengfei. I have provided the training logs with learning rate logs. Please also pay attention to the question I mentioned above about the dataset and numerical results of Figure 4, looking forward to the discussion.
richpediamel_lr.txt
metrics.csv

@zhiweihu1103
Author

Hi, Pengfei. Any update?

@pengfei-luo
Owner

> Hi, Pengfei. Any update?

Hi, sorry for the late response. I have reviewed your log file, and the learning rate appears to be fine. I attempted to retrain the model using the code and original data we uploaded, and the loss and evaluation results match our reported findings. Could you please check the configuration file config/richpediamel.yaml to see if there is anything wrong? Could you also provide details about the environment you used to train the model?

If you want to reproduce the reported results right now, I have uploaded a model checkpoint here (password: KDD2023richpedia).

In the low-resource setting, we only utilized the first 10% and 20% of the training data for each dataset, following the order in the training data file. This means that if you want to access the low-resource training data, you only need to control the amount of training data used.

Please add a new line after

train_data = _load_json_file(self.args.data.train_file)

that truncates the list:

train_data = train_data[:int(len(train_data) * 0.1)]  # or 0.2

Then you can obtain either 10% or 20% of the training data we used.
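
The truncation described above can be sketched as a small helper. The name `take_first_fraction` is hypothetical and the integer list is only a stand-in for the loaded training samples; the slicing itself mirrors the line quoted above and keeps the order of the training file.

```python
def take_first_fraction(samples, fraction):
    """Keep the leading `fraction` of `samples`, preserving their order."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return samples[:int(len(samples) * fraction)]


train_data = list(range(50))  # stand-in for the samples loaded from the train file
subset_10 = take_first_fraction(train_data, 0.1)
subset_20 = take_first_fraction(train_data, 0.2)
```

Because the slice always starts at the beginning of the list, the 10% subset is a prefix of the 20% subset.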

Regarding the numerical results you've requested, I will update them in the readme file in the next few days. Please stay tuned.

@zhiweihu1103
Author

zhiweihu1103 commented Sep 10, 2023

Hi Pengfei. First, here is the yaml file I used; I did not make any modifications except to the paths. Second, for the running environment, I created a dedicated conda environment, and its packages are exactly the same as in your requirements.txt.

run_name: RichpediaMEL
seed: 43
pretrained_model: '/checkpoint/clip-vit-base-patch32'
lr: 1e-5

data:
  num_entity: 160933
  kb_img_folder: /data/RichpediaMEL/kb_image
  mention_img_folder: /data/RichpediaMEL/mention_image
  qid2id: /data/RichpediaMEL/qid2id.json
  entity: /data/RichpediaMEL/kb_entity.json
  train_file: /data/RichpediaMEL/RichpediaMEL_train.json
  dev_file: /data/RichpediaMEL/RichpediaMEL_dev.json
  test_file: /data/RichpediaMEL/RichpediaMEL_test.json

  batch_size: 128
  num_workers: 8
  text_max_length: 40

  eval_chunk_size: 6000
  eval_batch_size: 20
  embed_update_batch_size: 512

model:
  input_hidden_dim: 512
  input_image_hidden_dim: 768
  hidden_dim: 96
  dv: 96
  dt: 512
  TGLU_hidden_dim: 96
  IDLU_hidden_dim: 96
  CMFU_hidden_dim: 96

trainer:
  accelerator: 'gpu'
  devices: 1
  max_epochs: 20
  num_sanity_val_steps: 0
  check_val_every_n_epoch: 2
  log_every_n_steps: 30

All environmental information is:

absl-py                 1.4.0
aiohttp                 3.8.5
aiosignal               1.3.1
antlr4-python3-runtime  4.9.3
async-timeout           4.0.3
attrs                   23.1.0
cachetools              5.3.1
certifi                 2023.7.22
charset-normalizer      3.2.0
click                   8.1.7
filelock                3.12.3
frozenlist              1.4.0
fsspec                  2023.9.0
google-auth             2.22.0
google-auth-oauthlib    1.0.0
grpcio                  1.57.0
huggingface-hub         0.16.4
idna                    3.4
importlib-metadata      6.8.0
joblib                  1.3.2
Markdown                3.4.4
MarkupSafe              2.1.3
multidict               6.0.4
numpy                   1.24.4
oauthlib                3.2.2
omegaconf               2.2.3
packaging               23.1
Pillow                  9.3.0
pip                     23.2.1
protobuf                4.24.2
pyasn1                  0.5.0
pyasn1-modules          0.3.0
pyDeprecate             0.3.2
pytorch-lightning       1.7.7
PyYAML                  6.0.1
regex                   2023.8.8
requests                2.31.0
requests-oauthlib       1.3.1
rsa                     4.9
sacremoses              0.0.53
setuptools              68.0.0
six                     1.16.0
tensorboard             2.14.0
tensorboard-data-server 0.7.1
tokenizers              0.12.1
torch                   1.11.0
torchmetrics            0.11.0
tqdm                    4.66.1
transformers            4.18.0
typing_extensions       4.7.1
urllib3                 1.26.16
Werkzeug                2.3.7
wheel                   0.38.4
yarl                    1.9.2
zipp                    3.16.2

@zhiweihu1103
Author

Thanks for the information on how to run the low-resource experiments. I am very much looking forward to the numerical results; thank you for your efforts. Regarding the reproduction on RichpediaMEL, I wonder whether there are some differences between the code you ran and the code you uploaded, because I ran it twice on this dataset and both times the results were exactly the same as those I uploaded above.

@pengfei-luo
Owner

I can reproduce the results with the code we shared and the data we uploaded to OneDrive. Is there any difference in the pretrained model? I saw that you changed the path. I use the one from Hugging Face.

SHA256: a63082132ba4f97a80bea76823f544493bffa8082296d62d71581a4feff1576f
MD5: 47767ea81d24718fcc0c8923607792a7

@zhiweihu1103
Author

zhiweihu1103 commented Sep 10, 2023

I downloaded the pretrained CLIP model from https://huggingface.co/openai/clip-vit-base-patch32/tree/main. I will replace pytorch_model.bin with the one from the link you provided and upload the results tomorrow morning.

@zhiweihu1103
Author

But I found that the download link I used for the CLIP weights turns out to be exactly the same as the one you provided when I click pytorch_model.bin.

@zhiweihu1103
Author

zhiweihu1103 commented Sep 11, 2023

Hi Pengfei. I may need further help from you, because I still have difficulty reproducing the results on RichpediaMEL, even though I used the CLIP pretrained weights from the URL you gave me (actually the same pretrained model I used previously). My running logs on the three datasets are uploaded below.
wikidiverse_another.txt
wikimel_another.txt
richpediamel_another.txt

@pengfei-luo
Owner

This is very strange. The other two datasets work fine, only RichpediaMEL has an issue. Maybe you could double-check the RichpediaMEL.tar file you downloaded? I will share an online Wandb report later to show that everything is normal on my end.

RichpediaMEL.tar
MD5: 0f499eddde7582428947e45ebb94388f
SHA256: 36ac5703e4a9890238daedf039a7b2923a7c4b66c66a6b9cf788db40eabe0447

@zhiweihu1103
Author

zhiweihu1103 commented Sep 11, 2023

I will take a screenshot to share the information after decompressing the RichpediaMEL dataset.
image
The kb_image folder has 96073 files, and the mention_images folder has 15852 files.

@zhiweihu1103
Author

@pengfei-luo
Owner

Could you please try upgrading transformers to version 4.27.1? I notice that the version of transformers might have an impact on the results, although I'm not sure what's causing the differences in results.

pip install transformers==4.27.1 --upgrade

@zhiweihu1103
Author

Let me check.

@pengfei-luo
Owner

The Wandb report is here.

@zhiweihu1103
Author

You used transformers==4.27.1, right?

@pengfei-luo
Owner

Yes, in the Wandb report run, I used torch==1.11.0 and transformers==4.27.1. Other packages are the same as the requirements. I attempted to downgrade transformers to 4.18.0 and noticed that it did lead to a performance drop. I have no idea why this occurred.
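
A quick way to compare an environment against these versions is sketched below. The function name `version_mismatches`, the `PINS` dictionary, and the fake environment in the demo are hypothetical; the two pinned versions are the ones mentioned above. The comparison is an exact string match, so a build that reports a local suffix on its version string may show up as a mismatch.

```python
from importlib import metadata


def version_mismatches(pins, lookup=metadata.version):
    """Map package name -> (pinned, installed) for every pin that is not met."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            report[name] = (pinned, installed)
    return report


# Versions reported for the Wandb run in the comment above.
PINS = {'torch': '1.11.0', 'transformers': '4.27.1'}

# Demo with a fake lookup standing in for an environment that still has 4.18.0.
fake_env = {'torch': '1.11.0', 'transformers': '4.18.0'}
demo = version_mismatches(PINS, lookup=fake_env.__getitem__)
```

Calling `version_mismatches(PINS)` without the `lookup` argument checks the packages installed in the current interpreter.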

@zhiweihu1103
Author

If the performance degradation is due to transformers, then this should not be within the scope of our discussion. As long as the results can be reproduced, everything is good. I'll re-run and give my reproduction results.

@zhiweihu1103
Author

Hi, Pengfei.
Firstly, I carefully compared the open source github code with the code you used on wandb. There are only some differences when the CLIP model executes the from_pretrained method.
The open source code on github is:

self.tokenizer = CLIPProcessor.from_pretrained(self.args.pretrained_model).tokenizer

The code used by wandb is:

self.tokenizer = CLIPProcessor.from_pretrained(self.args.pretrained_model, local_files_only=True).tokenizer

But I think this is not the main problem, because after I added the local_files_only=True parameter, I found the result was the same.

Then, I created a requirements.txt environment that is exactly the same as wandb provided, and the running results are exactly the same as mine before, indicating that the difference in results is not caused by environmental problems.

So, I need to confirm now, is the RichpediaMEL dataset you are using the version you uploaded? Because now all the code and environment information are completely consistent, the performance difference is difficult to accept.

@zhiweihu1103
Author

> I will take a screenshot to share the information after decompressing the RichpediaMEL dataset. The kb_image folder has 96073 files, and the mention_images folder has 15852 files.

Here are the statistics for the RichpediaMEL dataset I used.

@pengfei-luo
Owner

image

Maybe you can check if the MD5 values of all the files match mine?

ba086b054bf52d549f2a79503c76704a  kb_entity.json
8059b7aa89a9314d5dc38607a8685eeb  qid2id.json
831cdd92d70a93ea8a442798ec2fcde1  RichpediaMEL_dev.json
9e07e5e970e01079d256311e5ac10bd8  RichpediaMEL_test.json
e1d0b2adb2a1114cefa63860ffa23982  RichpediaMEL_train.json
961efc263bc8e2e7b257a28e8e703633  kb_image.zip
474c594ce8a95aa5dc9222365db0044e  mention_images.zip
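
The comparison can be automated with the standard library. This is a sketch under stated assumptions: the helper names `md5_of_file` and `find_mismatches` are hypothetical, and the demo runs on a throwaway file rather than on the dataset. For the real check, `expected` would map each file name listed above to its MD5 value.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(folder, expected):
    """Return names whose file is missing or whose MD5 differs from `expected`."""
    bad = []
    for name, want in expected.items():
        path = os.path.join(folder, name)
        if not os.path.isfile(path) or md5_of_file(path) != want:
            bad.append(name)
    return bad


# Demo on a temporary file; the names here are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'demo.json'), 'wb') as handle:
        handle.write(b'hello')
    good_sum = hashlib.md5(b'hello').hexdigest()
    demo_ok = find_mismatches(tmp, {'demo.json': good_sum})
    demo_bad = find_mismatches(tmp, {'demo.json': '0' * 32, 'absent.zip': '0' * 32})
```

An empty list means every listed file exists and matches; note that matching checksums say nothing about whether the config points at the right folders.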

@pengfei-luo
Owner

The parameter local_files_only=True ensures that local files are used, and we have already confirmed that the model weights are consistent. I think this won't have any impact.

@zhiweihu1103
Author

image

@zhiweihu1103
Author

You can ignore the .pkl files. I found a difference between kb_image and mention_images.

@pengfei-luo
Owner

Can you provide the MD5 values for kb_image.zip and mention_images.zip? I directly extracted these two ZIP files.

@zhiweihu1103
Author

Wait a few minutes, I deleted the original file after decompressing it, and I need to download it again.

@zhiweihu1103
Author

image

@zhiweihu1103
Author

zhiweihu1103 commented Sep 12, 2023

I can't think of any other reason why the results are difficult to reproduce. The .zip files are the same size, but the sizes after decompression are different?

@zhiweihu1103
Author

I checked your running log on Wandb, and your loss is obviously much lower than what I reproduced.

@pengfei-luo
Owner

It seems all the files are normal. The difference in folder sizes may be due to differences in how the operating system organizes files.

@pengfei-luo
Owner

Perhaps you can try changing some hyperparameters, such as the random seed, learning rate, and batch size, to see if they have an impact on the loss. If you have access to other servers, maybe you can try configuring the environment and running it on other servers. I don't know what's causing the inability to reproduce the results. All the results on my end are normal.

@zhiweihu1103
Author

I can try it on other machines, but judging from my experience running your code, as long as the random seed is fixed, the results will be exactly the same every time.

@zhiweihu1103
Author

I have some new information to add. I originally ran the code on a V100 32G graphics card. Now I have tried it on an A6000 and found that the final result of the model is almost the same as on the V100. Have you made any other modifications? The hyperparameters I use are completely consistent with the yaml you provided.

@pengfei-luo
Owner

pengfei-luo commented Sep 12, 2023

As you can see on Wandb, I confirm that no changes were made to the model or data-processing code. We have verified that the pre-trained model, environment configuration, code, and data are all consistent. I don't think the CUDA version and NVIDIA driver version should have such a significant impact.

@zhiweihu1103
Author

This is a very vexing question because I can't think of any other reason that could cause this problem.

@pengfei-luo
Owner

My thought is, perhaps you can make some adjustments to the parameters and then observe if there's a decrease in loss within a few iterations (compared to your previous loss). Maybe consider replacing the optimizer with stochastic gradient descent? I'm not sure.

@zhiweihu1103
Author

I'll try adjusting the parameters tomorrow and keep you posted.

@pengfei-luo
Owner

pengfei-luo commented Sep 16, 2023

I discovered that the folder for the mention images should be named mention_images instead of mention_image. This would result in the absence of image data during the training process. This could also be the reason why you were unable to reproduce the original results.

@pengfei-luo
Owner

You need to manually modify the following line in the config file for RichpediaMEL so that the path ends in mention_images instead of mention_image.

mention_img_folder: /YOUR_PATH/RichpediaMEL/mention_image
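
A path typo like this one fails silently, so a small check before training can catch it early. The sketch below is not part of the repository: the helper name `check_data_paths` is hypothetical, the config is represented as a plain dictionary, and the demo builds temporary folders instead of using the real dataset. Only the keys `kb_img_folder` and `mention_img_folder` come from the yaml in this thread.

```python
import os
import tempfile


def check_data_paths(data_cfg, keys=('kb_img_folder', 'mention_img_folder')):
    """Return a list of human-readable problems with the configured folders."""
    problems = []
    for key in keys:
        path = data_cfg.get(key)
        if not path or not os.path.isdir(path):
            problems.append(f"{key}: folder not found: {path}")
        elif not os.listdir(path):
            problems.append(f"{key}: folder is empty: {path}")
    return problems


# Demo: the folder on disk is mention_images, as in the extracted dataset.
with tempfile.TemporaryDirectory() as root:
    for folder in ('kb_image', 'mention_images'):
        os.makedirs(os.path.join(root, folder))
        open(os.path.join(root, folder, 'x.jpg'), 'wb').close()
    kb = os.path.join(root, 'kb_image')
    wrong = check_data_paths({'kb_img_folder': kb,
                              'mention_img_folder': os.path.join(root, 'mention_image')})
    right = check_data_paths({'kb_img_folder': kb,
                              'mention_img_folder': os.path.join(root, 'mention_images')})
```

With the misspelled folder name the check reports one problem; with the corrected name it reports none.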

@zhiweihu1103
Author

That is indeed the case. A very subtle problem; thank you very much. I will re-run the code and report the final result.

@zhiweihu1103
Author

As a reminder, would you mind uploading the numerical results of Figure 4?

@pengfei-luo
Owner

I have just updated the detailed results in the README file.

@zhiweihu1103
Author

Great, Thx.

@zhiweihu1103
Author

Hi, Pengfei. I have reproduced the results, thanks for your solution. Good luck. I will close the issue.

@zhangzef

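I can't seem to reproduce the WikiMEL results. I am using the hyperparameter file that the author provides on GitHub. Could you tell me whether there is anything I should watch out for when reproducing them?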
