
Cannot reproduce BraTS 2020 results. #21

Open · gary-wang55 opened this issue Jun 21, 2023 · 23 comments
@gary-wang55 commented Jun 21, 2023

Hi, thanks for sharing the code base. I tried to reproduce the results on the BraTS 2020 dataset, but the results I got are much worse than the paper's. Here are the details:

For model training:
wt is 0.8498, tc is 0.4873, et is 0.4150, mean_dice is 0.5840

The tensorboard files are:
brats20-wt
brats20-tc
brats20-et
brats20-mean-dice
brats20-train-loss

The final model files are:
brats20-model-file

My settings are the defaults:
env = "DDP"
max_epoch = 300
batch_size = 2
num_gpus = 4
GPU type: A100

Then I used the best model (best_model_0.5975.pt) to evaluate on the test set, and got:
brats20-test-dice
brats20-test-hd95

My python environment is:
Python 3.8.10
monai 1.1.0
numpy 1.22.2
SimpleITK 2.2.1
torch 1.13.0a0+936e930

The strangest thing is that the segmentation performance on TC and ET is quite bad. Do you have any idea why the performance is so weird, and could you give me some advice on model training? BTW, could you please share the conda env file and your model weights for the BraTS 2020 dataset? If you could create and share a Docker image, that would be perfect! Thanks.
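
A note on the symptom: WT, TC, and ET are nested composites of the raw BraTS labels, so a label-convention mismatch in the dataset collapses TC and ET while leaving WT mostly intact, which is exactly the pattern reported here. A minimal sketch of the standard convention (`brats_regions` is a hypothetical helper, not this repo's code):

```python
# Generic sketch (not this repo's code) of the standard BraTS label convention,
# useful for checking whether a dataset's labels match what the code expects.
# Raw labels: 1 = necrotic/non-enhancing core, 2 = peritumoral edema, 4 = enhancing tumor.
import numpy as np

def brats_regions(label: np.ndarray):
    """Derive the three evaluated regions from a raw BraTS label map."""
    wt = np.isin(label, [1, 2, 4])  # whole tumor: all tumor classes
    tc = np.isin(label, [1, 4])     # tumor core: WT minus edema
    et = (label == 4)               # enhancing tumor only
    return wt, tc, et

# If a re-packaged dataset encodes the enhancing tumor as 3 instead of 4,
# TC and ET collapse while WT stays high.
```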

@gary-wang55 (Author)

I also trained and tested the code on the BTCV dataset, and the results are quite similar to the paper's. Here are the details:

The tensorboard files are:
btcv-train-dice
btcv-train-loss

The final model files are:
btcv-model-file

The results on the test set (evaluated with the best model, best_model_0.7873.pt) are:
btcv-test-dice

@920232796 (Contributor)

Please check your BraTS2020 data; you can also plot some input data during the training phase (in the training_step function) to confirm that the training data is correct.
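
A minimal sketch of such a check, assuming the batch is a dict with "image" and "label" tensors of shape [B, C, D, H, W], as in typical MONAI pipelines (`dump_batch_slice` and the key names are assumptions, not this repo's API):

```python
# Hypothetical sanity check (not the repo's code): dump a middle axial slice of
# the first sample in the batch so you can eyeball image/label alignment.
import os
import matplotlib.pyplot as plt

def dump_batch_slice(batch, step, out_dir="debug_vis"):
    os.makedirs(out_dir, exist_ok=True)
    image = batch["image"][0].detach().cpu().numpy()  # [C, D, H, W]
    label = batch["label"][0].detach().cpu().numpy()
    z = image.shape[1] // 2                           # middle slice along depth
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    axes[0].imshow(image[0, z], cmap="gray")          # first modality
    axes[0].set_title("image (modality 0)")
    axes[1].imshow(label[0, z])                       # first label channel
    axes[1].set_title("label (channel 0)")
    fig.savefig(os.path.join(out_dir, f"step_{step}.png"))
    plt.close(fig)
```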

@Devil-Ideal

First of all, thank you to the authors for your excellent work! I managed to reproduce the results in the paper. My results are as follows, even better:
[screenshot of results]

@920232796 (Contributor)

Thank you for your experiments!

@gary-wang55 (Author)

First of all, thank you to the authors for your excellent work! I managed to reproduce the results in the paper. My results are as follows, even better: [screenshot of results]

Hi, thanks for sharing your results @Devil-Ideal. Could you please share your environment settings, for example your Python and torch versions? Also, did you train the model on 4x V100 GPUs and keep the hyper-parameters at their defaults? Here is my test result:
[screenshot of test results]

@Devil-Ideal

Of course, my environment is as follows. I only used 3 RTX 3090 GPUs and changed the validation frequency to validate every 30 epochs. Other hyper-parameters are at their defaults. I guess there is a problem with your dataset. I downloaded the dataset from Kaggle instead of the official website (here is the link: https://www.kaggle.com/datasets/awsaf49/brats20-dataset-training-validation), since my application hadn't been approved.
[screenshots of environment configuration]

@gary-wang55 (Author)

Hi @Devil-Ideal, many thanks for your quick reply. I will try the dataset you downloaded from Kaggle to see whether I can achieve the results.

@GVJHK commented Sep 28, 2023

Hi @Devil-Ideal, many thanks for your quick reply. I will try the dataset you downloaded from Kaggle to see whether I can achieve the results.

Hello, have you managed to reproduce the paper's results?

@JoeQvQ commented Nov 3, 2023

Switching to the Kaggle version didn't seem to help in my case...

@GVJHK commented Nov 3, 2023 via email

@iffthomas commented Nov 15, 2023

I am encountering the same problem with reproducing the benchmark. So far I haven't changed to the Kaggle data version. I too have results in the same range:
WT is 0.86446, TC is 0.49546, ET is 0.45163, mean_dice is 0.603 for the best-performing epoch during training.
Did something resolve the issue for you guys?

I'm not training on 4 V100s, but I doubt this is the problem. I will run the same thing on the Kaggle dataset.

@Devil-Ideal

I am encountering the same problem with reproducing the benchmark. So far I haven't changed to the Kaggle data version. I too have results in the same range: WT is 0.86446, TC is 0.49546, ET is 0.45163, mean_dice is 0.603 for the best-performing epoch during training. Did something resolve the issue for you guys?

I'm not training on 4 V100s, but I doubt this is the problem. I will run the same thing on the Kaggle dataset.

How many GPUs did you use and what was the batch size? In fact, in the code, batch_size refers to the batch size on each GPU. Therefore, it is best to match the original paper: the total batch size is 2 x 4 = 8.
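
To make the arithmetic explicit (an illustration only, not the repo's code):

```python
# In DDP, each process/GPU loads its own batch, so the effective (global)
# batch size is the per-GPU batch_size times the number of processes.
import torch.distributed as dist

per_gpu_batch_size = 2
world_size = dist.get_world_size() if dist.is_initialized() else 1
global_batch_size = per_gpu_batch_size * world_size  # paper setting: 2 * 4 = 8
```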

@iffthomas commented Nov 15, 2023

Thank you for the suggestion.
I have only trained on 1 GPU so far; I actually haven't changed the batch size. I will rerun it with batch_size set to 2, since I only have 1 GPU available. Would you consider this approach feasible?

Regarding the epochs, training happens over 300 epochs, right?

@Devil-Ideal

Thank you for the suggestion. I have only trained on 1 GPU so far; I actually haven't changed the batch size. I will rerun it with batch_size set to 2, since I only have 1 GPU available. Would you consider this approach feasible?

Regarding the epochs, training happens over 300 epochs, right?

Yep, 300 epochs are enough. And if you don't have enough GPUs, I think even with a total batch size of 4 or 6 the performance will be much better.

@iffthomas

Perfect! Thank you for taking the time. I will update you once I've run the suggested trials.

@iffthomas

I double-checked and saw that I had already trained with batch_size = 2 and one GPU. It seems a bit odd that I don't get the same results.

@Devil-Ideal

I double-checked and saw that I had already trained with batch_size = 2 and one GPU. It seems a bit odd that I don't get the same results.

If the total batch size is 2, that's normal, since the original setting equals 8, so just use a bigger batch size (e.g. 4 GPUs with batch_size = 2).
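
One workaround for single-GPU users, not mentioned in this thread: gradient accumulation can emulate the larger global batch from the optimizer's point of view. A generic sketch, where `model`, `optimizer`, `loss_fn`, and `loader` are stand-ins for the real training objects:

```python
# Generic gradient-accumulation sketch (not this repo's code): emulates a
# global batch size of 8 on one GPU that only fits batch_size = 2.
accum_steps = 4  # 2 (per-step batch) * 4 (accumulation) = 8 (effective batch)
optimizer.zero_grad()
for i, batch in enumerate(loader):
    loss = loss_fn(model(batch["image"]), batch["label"])
    (loss / accum_steps).backward()  # scale so accumulated gradients average
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note this only matches the optimizer's view of the batch; normalization-layer statistics still see the small per-step batch.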

@iffthomas

Increasing the batch size didn't improve the results... I was wondering if you have a requirements.txt file or something, so I can see what configuration of Python, PyTorch, etc. you are using.

@Devil-Ideal

Increasing the batch size didn't improve the results... I was wondering if you have a requirements.txt file or something, so I can see what configuration of Python, PyTorch, etc. you are using.

It's weird, since I have run it several times and the results are not bad. Here is my configuration:
absl-py==2.0.0
blobfile==2.1.0
certifi==2023.7.22
charset-normalizer==3.3.1
filelock==3.13.0
grpcio==1.59.0
idna==3.4
importlib-metadata==6.8.0
joblib==1.3.2
lxml==4.9.3
Markdown==3.5
MarkupSafe==2.1.3
MedPy==0.4.0
monai==1.3.0
mpi4py
numpy==1.23.4
Pillow==10.1.0
protobuf==3.20.0
pycryptodomex==3.19.0
PyYAML==6.0.1
requests==2.31.0
scikit-learn==1.3.2
scipy==1.11.3
SimpleITK==2.3.0
six==1.16.0
tensorboard==1.15.0
threadpoolctl==3.2.0
torch==1.11.0+cu113
torchaudio==0.11.0+cu113
torchvision==0.12.0+cu113
tqdm==4.66.1
typing_extensions==4.8.0
urllib3==2.0.7
Werkzeug==3.0.1
zipp==3.17.0

@YOLO6995

I used Kaggle's BraTS2020 dataset, since my official request was never answered, and I tried many different dependency-library versions, but my results are the same as the original poster's. For those of you who trained with better results, did you use the official dataset? Thanks.

@ge-xing (Owner) commented Feb 1, 2024

You can add me on WeChat (18340097191) or email me at zxing565@connect.hkust-gz.edu.cn to discuss these problems further. I will also open-source the v2 version of Diff-UNet soon. Welcome to try it.

@gs369369 commented Mar 7, 2024

I used Kaggle's BraTS2020 dataset, since my official request was never answered, and I tried many different dependency-library versions, but my results are the same as the original poster's. For those of you who trained with better results, did you use the official dataset? Thanks.

Have you solved this problem?

@ge-xing (Owner) commented Mar 7, 2024

Recently, I have also reproduced my code on the BraTS 2020 dataset; this is the training process:
[screenshot of training curves]
I used 4 RTX 4090 GPUs, with batch_size set to 2.

If you still have questions about it, please contact me by email: zxing565@connect.hkust-gz.edu.cn
