
Hardware configuration problem #3

Closed · lincgcg opened this issue Mar 25, 2022 · 6 comments
Labels: help wanted (Extra attention is needed)

lincgcg commented Mar 25, 2022

We encountered some problems while reproducing the model.

Could you tell us your hardware configuration, for example: how much memory, how many GPUs, and what type of GPUs? We found that we were running out of memory during the reproduction process.

linwhitehat (Owner) commented:


Thank you for following our work.
The details of our experimental environment are as follows:
Available memory: 502 GB
Available GPUs: Tesla V100S (32 GB) × 4
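
For anyone comparing their own setup against this one, here is a minimal sketch (assuming PyTorch is installed, and psutil for the RAM check) that prints the available memory and the GPUs visible to PyTorch:

```python
# Minimal environment check (not part of ET-BERT): print available RAM and
# the GPUs PyTorch can see, to compare against the setup described above.
import torch

try:
    import psutil  # assumption: psutil is installed (pip install psutil)
    print(f"Available RAM: {psutil.virtual_memory().available / 2**30:.1f} GiB")
except ImportError:
    print("psutil not installed; skipping RAM check")

print(f"CUDA available: {torch.cuda.is_available()}")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.1f} GiB")
```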

linwhitehat added the "help wanted" (Extra attention is needed) label on Mar 29, 2022

lincgcg commented Apr 1, 2022

Thanks for your reply. We have set up a configuration similar to yours:
Available memory: 503 GB
Available GPUs: Tesla V100S (32 GB) × 8
But we have some questions about the pre-process stage:

  1. Under your configuration, how long does it take to complete the second step of the pre-process? We have now spent at least 24 hours, but the program still has not finished. We would also like to know how long the other steps take to run.
  2. We found that while running the program (the second step of the pre-process), it used nearly 502 GB of memory in the first few hours, but after about ten hours the memory usage dropped to only 10-30 GB (a rough way to log this is sketched below). Did you encounter this situation?
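
For reference, a rough sketch of how this kind of memory behaviour can be logged over time (assumes psutil is installed; the PID of the pre-process job is supplied by the user):

```python
# Rough sketch: log the resident memory of a running process at a fixed
# interval, to track the kind of usage drop described in point 2 above.
import time
import psutil  # assumption: psutil is installed

def log_memory(pid, interval_s=600):
    proc = psutil.Process(pid)
    while proc.is_running():
        rss_gib = proc.memory_info().rss / 2**30
        print(f"{time.strftime('%H:%M:%S')}  RSS = {rss_gib:.1f} GiB")
        time.sleep(interval_s)

# Example: log_memory(12345)  # replace 12345 with the PID of the pre-process job
```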

linwhitehat (Owner) commented:

  1. The second step generates the pre-training dataset. The time cost of this step depends on the size of the corpora; in our experiments it took around 1-2 hours (a sketch for timing this step follows below). I have checked and updated the code, and suggest you replace uer/target/bert_target.py and uer/utils/data.py with the new versions.
  2. I have encountered a similar situation. If the problem still exists after updating the code files, let us know and we will share the generated pre-training dataset.
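
As a rough illustration of timing this dataset-generation step, here is a sketch that wraps a UER-style preprocess.py invocation. The script name, flags, and file paths below are assumptions based on README-style usage of this repository and should be adjusted to your checkout:

```python
# Sketch only: time the dataset-generation step (pre-process step 2).
# The command, flags, and paths below are assumed/illustrative; use the
# exact invocation from the ET-BERT README for your setup.
import subprocess
import time

cmd = [
    "python3", "preprocess.py",
    "--corpus_path", "corpora/encrypted_traffic_burst.txt",  # hypothetical corpus path
    "--vocab_path", "models/encryptd_vocab.txt",             # hypothetical vocab path
    "--dataset_path", "dataset.pt",
    "--processes_num", "8",
    "--target", "bert",
]
start = time.time()
subprocess.run(cmd, check=True)
print(f"Pre-process step 2 finished in {(time.time() - start) / 3600:.2f} h")
```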


lincgcg commented Apr 1, 2022


According to your suggestion, we successfully completed the second step of the pre-process and obtained a dataset.pt file of about 28 GB.
Unfortunately, we ran into some problems in the next steps, and we tried many methods without success:

  • In the third step of the pre-process, we modified the file paths in datasets/main.py and got the following results (see the sketch after this comment):

[Screenshots: Screen Shot 2022-04-01 at 21 01 32, Screen Shot 2022-04-01 at 20 39 32]

We think the cause may be that there is no packet/splitcap/ directory under ../ET-BERT-main/datasets/cstnet-tls1.3/; if possible, please check whether the code requires this directory.
  • In pre-training, we got the following results:

[Screenshots: Screen Shot 2022-04-01 at 21 09 07, Screen Shot 2022-04-01 at 21 09 49, Screen Shot 2022-04-01 at 21 10 04]

We have no solution to this problem. Did you encounter it during your implementation?

Looking forward to your reply!
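
If the missing packet/splitcap/ directory is indeed the cause, a minimal sketch (the path is taken from the comment above; adjust it to your layout) to make sure it exists before running datasets/main.py:

```python
# Minimal sketch: ensure the packet/splitcap/ directory mentioned above
# exists under the dataset root before running datasets/main.py.
import os

dataset_root = "../ET-BERT-main/datasets/cstnet-tls1.3/"  # path from the comment above
splitcap_dir = os.path.join(dataset_root, "packet", "splitcap")
os.makedirs(splitcap_dir, exist_ok=True)
print(f"Directory ready: {splitcap_dir}")
```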

linwhitehat (Owner) commented:


Thanks for your feedback. We have updated the code and the README to solve these problems.
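
For completeness, a sketch of launching the pre-training step across multiple GPUs in the UER style used by this repository. The flag names and values are assumptions based on typical UER-py usage; check the updated README for the exact command:

```python
# Sketch only: launch pre-training on 8 GPUs. Flags and paths are assumed
# from typical UER-py usage; consult the updated README for the exact command.
import subprocess

cmd = [
    "python3", "pretrain.py",
    "--dataset_path", "dataset.pt",
    "--vocab_path", "models/encryptd_vocab.txt",          # hypothetical vocab path
    "--output_model_path", "models/pre-trained_model.bin",
    "--world_size", "8",
    "--gpu_ranks", "0", "1", "2", "3", "4", "5", "6", "7",
    "--total_steps", "500000",
    "--save_checkpoint_steps", "10000",
    "--batch_size", "32",
    "--target", "bert",
]
subprocess.run(cmd, check=True)
```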


GuisengLiu commented Apr 6, 2022

Can you tell us your other specific software configuration, for example, which version of Python? CUDA 10.2 or 11.1?
We may have run into some problems during the reproduction process, and we only noticed pytorch=1.8.
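
A quick way to report the versions being asked about (Python, PyTorch, and the CUDA/cuDNN build PyTorch was compiled against):

```python
# Print the Python, PyTorch, CUDA, and cuDNN versions in the current environment.
import sys
import torch

print(f"Python:  {sys.version.split()[0]}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA (torch build): {torch.version.cuda}")
print(f"cuDNN:   {torch.backends.cudnn.version()}")
```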
