
Hardware configuration problem #3

Closed · lincgcg opened this issue Mar 25, 2022 · 6 comments
Labels: help wanted (Extra attention is needed)

lincgcg commented Mar 25, 2022

We encountered some problems while reproducing the model.

Could you tell us your hardware configuration, for example: how much memory, how many GPUs, and what type of GPUs? We found that we were running out of memory during the reproduction process.

linwhitehat (Owner) commented:


Thank you for following our work.
The details of our experimental environment are as follows:
Available memory: 502 GB
Available GPUs: Tesla V100S (32 GB) × 4
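
For anyone comparing their own setup against this one, here is a minimal sketch (assuming PyTorch is installed, and psutil for the RAM check) that prints the available memory and the GPUs visible to PyTorch:

```python
# Minimal environment check (not part of ET-BERT): print available RAM and
# the GPUs PyTorch can see, to compare against the setup described above.
import torch

try:
    import psutil  # assumption: psutil is installed (pip install psutil)
    print(f"Available RAM: {psutil.virtual_memory().available / 2**30:.1f} GiB")
except ImportError:
    print("psutil not installed; skipping RAM check")

print(f"CUDA available: {torch.cuda.is_available()}")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.1f} GiB")
```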

linwhitehat added the "help wanted" (Extra attention is needed) label on Mar 29, 2022

lincgcg commented Apr 1, 2022

Thanks for your reply. We have set up a configuration similar to yours:
Available memory: 503 GB
Available GPUs: Tesla V100S (32 GB) × 8
But we have some questions about the pre-process stage:

  1. Under your configuration, how long does it take to complete the second step of the pre-process? We have now spent at least 24 hours, but the program still has not finished. We would also like to know how long the other steps take to run.
  2. We found that while running the program (the second step of the pre-process), it used nearly 502 GB of memory in the first few hours, but after about ten hours the memory usage dropped to only 10-30 GB (a rough way to log this is sketched below). Did you encounter this situation?
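
For reference, a rough sketch of how this kind of memory behaviour can be logged over time (assumes psutil is installed; the PID of the pre-process job is supplied by the user):

```python
# Rough sketch: log the resident memory of a running process at a fixed
# interval, to track the kind of usage drop described in point 2 above.
import time
import psutil  # assumption: psutil is installed

def log_memory(pid, interval_s=600):
    proc = psutil.Process(pid)
    while proc.is_running():
        rss_gib = proc.memory_info().rss / 2**30
        print(f"{time.strftime('%H:%M:%S')}  RSS = {rss_gib:.1f} GiB")
        time.sleep(interval_s)

# Example: log_memory(12345)  # replace 12345 with the PID of the pre-process job
```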

linwhitehat (Owner) commented:

  1. The second step generates the pre-training dataset. The time cost of this step depends on the size of the corpora; in our experiments it took around 1-2 hours (a sketch for timing this step follows below). I have checked and updated the code, and suggest you replace uer/target/bert_target.py and uer/utils/data.py with the new versions.
  2. I have encountered a similar situation. If the problem still exists after updating the code files, let us know and we will share the generated pre-training dataset.
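
As a rough illustration of timing this dataset-generation step, here is a sketch that wraps a UER-style preprocess.py invocation. The script name, flags, and file paths below are assumptions based on README-style usage of this repository and should be adjusted to your checkout:

```python
# Sketch only: time the dataset-generation step (pre-process step 2).
# The command, flags, and paths below are assumed/illustrative; use the
# exact invocation from the ET-BERT README for your setup.
import subprocess
import time

cmd = [
    "python3", "preprocess.py",
    "--corpus_path", "corpora/encrypted_traffic_burst.txt",  # hypothetical corpus path
    "--vocab_path", "models/encryptd_vocab.txt",             # hypothetical vocab path
    "--dataset_path", "dataset.pt",
    "--processes_num", "8",
    "--target", "bert",
]
start = time.time()
subprocess.run(cmd, check=True)
print(f"Pre-process step 2 finished in {(time.time() - start) / 3600:.2f} h")
```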


lincgcg commented Apr 1, 2022


According to your suggestion, we successfully completed the second step of the pre-process and obtained a dataset.pt file of about 28 GB.
Unfortunately, we ran into some problems in the next steps, and we tried many methods without success:

  • In the third step of the pre-process, we modified the file paths in datasets/main.py and got the following results (see the sketch after this comment):

[Screenshots: Screen Shot 2022-04-01 at 21 01 32, Screen Shot 2022-04-01 at 20 39 32]

We think the cause may be that there is no packet/splitcap/ directory under ../ET-BERT-main/datasets/cstnet-tls1.3/; if possible, please check whether the code requires this directory.
  • In pre-training, we got the following results:

[Screenshots: Screen Shot 2022-04-01 at 21 09 07, Screen Shot 2022-04-01 at 21 09 49, Screen Shot 2022-04-01 at 21 10 04]

We have no solution to this problem. Did you encounter it during your implementation?

Looking forward to your reply!
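
If the missing packet/splitcap/ directory is indeed the cause, a minimal sketch (the path is taken from the comment above; adjust it to your layout) to make sure it exists before running datasets/main.py:

```python
# Minimal sketch: ensure the packet/splitcap/ directory mentioned above
# exists under the dataset root before running datasets/main.py.
import os

dataset_root = "../ET-BERT-main/datasets/cstnet-tls1.3/"  # path from the comment above
splitcap_dir = os.path.join(dataset_root, "packet", "splitcap")
os.makedirs(splitcap_dir, exist_ok=True)
print(f"Directory ready: {splitcap_dir}")
```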

linwhitehat (Owner) commented:


Thanks for your feedback. We have updated the code and the README to solve these problems.
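
For completeness, a sketch of launching the pre-training step across multiple GPUs in the UER style used by this repository. The flag names and values are assumptions based on typical UER-py usage; check the updated README for the exact command:

```python
# Sketch only: launch pre-training on 8 GPUs. Flags and paths are assumed
# from typical UER-py usage; consult the updated README for the exact command.
import subprocess

cmd = [
    "python3", "pretrain.py",
    "--dataset_path", "dataset.pt",
    "--vocab_path", "models/encryptd_vocab.txt",          # hypothetical vocab path
    "--output_model_path", "models/pre-trained_model.bin",
    "--world_size", "8",
    "--gpu_ranks", "0", "1", "2", "3", "4", "5", "6", "7",
    "--total_steps", "500000",
    "--save_checkpoint_steps", "10000",
    "--batch_size", "32",
    "--target", "bert",
]
subprocess.run(cmd, check=True)
```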


GuisengLiu commented Apr 6, 2022

Can you tell us your other specific software configuration, for example, which version of Python? CUDA 10.2 or 11.1?
We may have run into some problems during the reproduction process, and we only noticed pytorch=1.8.
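
A quick way to report the versions being asked about (Python, PyTorch, and the CUDA/cuDNN build PyTorch was compiled against):

```python
# Print the Python, PyTorch, CUDA, and cuDNN versions in the current environment.
import sys
import torch

print(f"Python:  {sys.version.split()[0]}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA (torch build): {torch.version.cuda}")
print(f"cuDNN:   {torch.backends.cudnn.version()}")
```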
