
error while running examples - segmentation fault #7

Closed

alexandremarcil opened this issue Jan 5, 2021 · 5 comments

@alexandremarcil

Hi,
I'm trying to run the example. I created the dnabert env and downloaded the packages and files. I get an error at step 3.3 while trying to run the fine-tune with the pre-trained model (DNABERT6). I get the following error message:

<class 'transformers.tokenization_dna.DNATokenizer'>
01/05/2021 17:08:16 - INFO - transformers.tokenization_utils - loading file https://raw.githubusercontent.com/jerryji1993/DNABERT/master/src/transformers/dnabert-config/bert-config-6/vocab.txt from cache at /home/mcb/users/zipcode/.cache/torch/transformers/ea1474aad40c1c8ed4e1cb7c11345ddda6df27a857fb29e1d4c901d9b900d32d.26f8bd5a32e49c2a8271a46950754a4a767726709b7741c68723bc1db840a87e
01/05/2021 17:08:16 - INFO - transformers.modeling_utils - loading weights file /home/mcb/users/zipcode/code/DNABERT/6-new-12w-0/pytorch_model.bin
Segmentation fault (core dumped)

I have tried re-downloading the pretrained model, but got the same error. Strangely, I do not get this error locally on my Mac, but without any GPU it would take too long to run there. I get this error on a Linux server.

Any ideas on how to fix this? Thanks!

@alexandremarcil (Author) commented Jan 5, 2021

I just saw that issue #4 had a similar problem and that the CUDA drivers were the cause. I have tried reinstalling them with
conda install -c anaconda cudatoolkit
conda install -c anaconda cudnn
(there was a long list of conflicts...)
but it did not help. I am still getting the segmentation fault.

I have CUDA v10.2:
(dnabert) zipcode@mcb-gpu1:~/code/DNABERT$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2

I've never had such CUDA issues on other PyTorch projects, so I don't really know how to troubleshoot this problem.

Update: I also tried with the --no_cuda arg, but I am still getting the same segmentation fault.
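A quick sanity check (standard PyTorch introspection, offered here as a suggestion rather than something tried in the thread) is to confirm that the PyTorch build inside the dnabert env matches the driver's CUDA:

# prints the torch version, the CUDA version torch was built against,
# and whether torch can see a usable GPU at all
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

A mismatch between torch.version.cuda and the driver's CUDA version is a common cause of crashes while loading model weights.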

@Zhihan1996 (Collaborator)

The problem happens while loading the model. Do you have any unavailable GPUs? Can you try specifying GPUs by adding CUDA_VISIBLE_DEVICES=0,1 before python run_... ?
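For example (CUDA_VISIBLE_DEVICES is the standard CUDA environment variable; the GPU indices and the <args> placeholder are illustrative):

# expose only GPUs 0 and 1 to the process before launching fine-tuning
CUDA_VISIBLE_DEVICES=0,1 python run_finetune.py <args as in the example script>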

@alexandremarcil (Author)

Yes, there are 10 GPUs on the server and some are in use. The fix you proposed did not change anything. I hard-coded a free GPU (8) in run_finetune.py to check if that was the problem and still got the same error. Could you please explain what the local_rank arg does? I am not sure I understand it correctly. Here's the error I get:

01/09/2021 18:29:41 - WARNING - main - Process rank: -1, device: cuda:8, n_gpu: 1, distributed training: False, 16-bits training: False
01/09/2021 18:29:41 - INFO - transformers.configuration_utils - loading configuration file /home/mcb/users/zipcode/code/DNABERT/6-new-12w-0/config.json
01/09/2021 18:29:41 - INFO - transformers.configuration_utils - Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "do_sample": false,
  "eos_token_ids": 0,
  "finetuning_task": "dnaprom",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_eps": 1e-12,
  "length_penalty": 1.0,
  "max_length": 20,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_beams": 1,
  "num_hidden_layers": 12,
  "num_labels": 2,
  "num_return_sequences": 1,
  "num_rnn_layer": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pad_token_id": 0,
  "pruned_heads": {},
  "repetition_penalty": 1.0,
  "rnn": "lstm",
  "rnn_dropout": 0.0,
  "rnn_hidden": 768,
  "split": 10,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 4101
}

============================================================
<class 'transformers.tokenization_dna.DNATokenizer'>
01/09/2021 18:29:41 - INFO - transformers.tokenization_utils - loading file https://raw.githubusercontent.com/jerryji1993/DNABERT/master/src/transformers/dnabert-config/bert-config-6/vocab.txt from cache at /home/mcb/users/zipcode/.cache/torch/transformers/ea1474aad40c1c8ed4e1cb7c11345ddda6df27a857fb29e1d4c901d9b900d32d.26f8bd5a32e49c2a8271a46950754a4a767726709b7741c68723bc1db840a87e
01/09/2021 18:29:41 - INFO - transformers.modeling_utils - loading weights file /home/mcb/users/zipcode/code/DNABERT/6-new-12w-0/pytorch_model.bin
./test.sh: line 32: 28758 Segmentation fault (core dumped) python run_finetune.py --model_type dna --tokenizer_name=dna$KMER --model_name_or_path $MODEL_PATH --task_name dnaprom --do_train --do_eval --data_dir $DATA_PATH --max_seq_length 75 --per_gpu_eval_batch_size=16 --per_gpu_train_batch_size=16 --learning_rate 2e-4 --num_train_epochs 3.0 --output_dir $OUTPUT_PATH --evaluate_during_training --logging_steps 100 --save_steps 4000 --warmup_percent 0.1 --hidden_dropout_prob 0.1 --overwrite_output --weight_decay 0.01 --n_process 8
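For reference on the local_rank question: in the transformers example scripts this one is based on, --local_rank is normally injected by the PyTorch distributed launcher, and its default of -1 means plain single-process training (which matches the "Process rank: -1, distributed training: False" line in the log above). A sketch of the two launch modes, with <args> standing in for the usual flags:

# single-process run: local_rank stays -1 and the script picks a device itself
python run_finetune.py <args>

# distributed run: the launcher starts one process per GPU and passes
# --local_rank=0..N-1 to each process
python -m torch.distributed.launch --nproc_per_node=2 run_finetune.py <args>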

@hkmztrk commented Jan 13, 2021

@alexandremarcil (Author)

Thanks @hkmztrk. I downgraded sentencepiece to 0.1.91 and I no longer get the segmentation fault, but I have other issues :(

Here's the fix for anyone else having this issue:
pip install sentencepiece==0.1.91
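To confirm the downgrade took effect in the active env (standard pip/Python checks):

pip show sentencepiece
python -c "import sentencepiece; print(sentencepiece.__version__)"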
