
Addressing Hugging Face-Related Errors During the Fine-Tuning of GLIP #10968

Closed
CDchenlin opened this issue Sep 25, 2023 · 11 comments

@CDchenlin

Hello,

While fine-tuning the GLIP model on my custom dataset, I encountered the following issue:

'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f1838e7c580>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: eb487da1-5639-443c-88d4-5b9382ff21f6)')' thrown while requesting HEAD https://huggingface.co/bert-base-uncased/resolve/main/config.json
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f182c9b5d90>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 4562ddc9-89e5-444d-9323-d5ac8fd27e7f)')' thrown while requesting HEAD https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/tokenizer.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f182a57d4c0>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 094d0780-2efd-4754-b896-defb03d7709d)')' thrown while requesting HEAD https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f182c9b5970>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: f259081b-8a16-4c7b-8136-56d0bfe5af63)')' thrown while requesting HEAD https://huggingface.co/bert-base-uncased/resolve/main/config.json

I understand that this is a connection error related to Hugging Face. However, since I don't have administrator privileges on my server, I'm wondering whether there are any alternative solutions. For instance, would it be possible to download the weights on my PC and then transfer them to the server?

Although I was unable to download the pretrained model, the train.py script continued to execute and eventually failed with the following error:

loading annotations into memory...
Done (t=0.22s)
creating index...
index created!
Traceback (most recent call last):
  File "tools/train.py", line 133, in <module>
    main()
  File "tools/train.py", line 129, in main
    runner.train()
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1703, in train
    self._train_loop = self.build_train_loop(
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1495, in build_train_loop
    loop = LOOPS.build(
  File "~anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 44, in __init__
    super().__init__(runner, dataloader)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1353, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "~/Code/mmdetection-dev-3.x/mmdet/datasets/base_det_dataset.py", line 44, in __init__
    super().__init__(*args, **kwargs)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/dataset/base_dataset.py", line 245, in __init__
    self.full_init()
  File "~/Code/mmdetection-dev-3.x/mmdet/datasets/base_det_dataset.py", line 82, in full_init
    self.data_bytes, self.data_address = self._serialize_data()
  File "~/envs/openmmlab/lib/python3.8/site-packages/mmengine/dataset/base_dataset.py", line 765, in _serialize_data
    data_bytes = np.concatenate(data_list)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate

I’m curious as to why this occurred. Could it be related to the unsuccessful download of the pretrained model, or could it be an issue with my custom dataset?

Thank you for your assistance.

@hhaAndroid (Collaborator)

@CDchenlin Hello, there is a very simple solution. You just need to download the corresponding weights to your local computer, then upload them to the server, and finally set the lang_model_name parameter to your local path.

from transformers import AutoTokenizer, BertConfig, BertModel

# On a machine with internet access, download the config, weights, and
# tokenizer for bert-base-uncased.
config = BertConfig.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", add_pooling_layer=False, config=config)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save all three to one local directory, then upload that directory to the server.
config.save_pretrained("your path/bert-base-uncased")
model.save_pretrained("your path/bert-base-uncased")
tokenizer.save_pretrained("your path/bert-base-uncased")

Then, in your config:

# lang_model_name = 'bert-base-uncased'
lang_model_name = 'your path/bert-base-uncased'
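
As an extra safeguard (an assumption on my part, not something this thread confirms is necessary), transformers can also be told to skip network lookups entirely on the offline server, so it never attempts the HEAD requests that time out above:

import os

# Hypothetical safeguard: force transformers/huggingface_hub to resolve
# models from local paths or the local cache only, never the network.
# These must be set before from_pretrained() is called.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"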

@hhaAndroid (Collaborator)

@CDchenlin If the pretrained weights cannot be loaded, you can simply download them to your local machine, then upload them to the server, and finally modify the load_from address.
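
For example, a minimal sketch of that change (the checkpoint path is hypothetical):

# Hypothetical override in the training config: point load_from at the
# checkpoint downloaded on a PC and uploaded to the server, instead of a URL.
load_from = 'checkpoints/glip_pretrained.pth'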

@CDchenlin (Author) commented Sep 25, 2023

@hhaAndroid Hello, thank you so much for your assistance. Could you possibly provide more specific advice? I attempted the following modification in the code here, but it was unsuccessful.

def __init__(self,
                 name: str = 'bert-base-uncased',
                 max_tokens: int = 256,
                 pad_to_max: bool = True,
                 use_sub_sentence_represent: bool = False,
                 special_tokens_list: list = None,
                 add_pooling_layer: bool = False,
                 num_layers_of_embedded: int = 1,
                 use_checkpoint: bool = False,
                 **kwargs) -> None:

was modified into

def __init__(self,
                 name: str = 'my path/bert-base-uncased',
                 max_tokens: int = 256,
                 pad_to_max: bool = True,
                 use_sub_sentence_represent: bool = False,
                 special_tokens_list: list = None,
                 add_pooling_layer: bool = False,
                 num_layers_of_embedded: int = 1,
                 use_checkpoint: bool = False,
                 **kwargs) -> None:

However, the problem remains.

Thank you so much for your advice!

@hhaAndroid (Collaborator)

@CDchenlin In fact, my server is also unable to connect to the internet, so I use the method mentioned above and modify the configuration here to a local path.
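
Concretely, a minimal sketch of the config-level override (the directory path is illustrative, and the keys assume the pattern used by the GLIP configs in mmdetection 3.x):

# Point the language model at the locally saved bert-base-uncased directory
# from the config, instead of editing the class default in the source code.
lang_model_name = '/data/models/bert-base-uncased'

model = dict(
    language_model=dict(
        type='BertModel',
        name=lang_model_name))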

@CDchenlin (Author)

Thank you so much, that problem is solved. However, the second problem remains, with the same traceback as quoted above:

ValueError: need at least one array to concatenate

@hhaAndroid (Collaborator)

@CDchenlin This error is usually caused by a failure to read the data. In most cases the metainfo is written incorrectly: the annotation JSON is read, but the dataset ends up empty because the category fields are misconfigured. Are you using custom data?
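
For reference, a minimal sketch of a correctly paired metainfo in an mmdetection 3.x dataset config (paths are illustrative, and the classes tuple must match the "name" fields in the annotation JSON exactly):

# Hypothetical dataset config: if metainfo classes don't match the category
# names in the COCO-style JSON, every sample gets filtered out, the dataset
# is empty, and _serialize_data() raises the ValueError quoted above.
metainfo = dict(classes=('example category', ))

train_dataloader = dict(
    dataset=dict(
        type='CocoDataset',
        metainfo=metainfo,
        ann_file='annotations/train.json',
        data_prefix=dict(img='train/')))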

@CDchenlin (Author) commented Sep 25, 2023

Yes, I used custom data. To be honest, I am new to mmdetection. I have read and followed the linked documentation to prepare my dataset and config file. I would greatly appreciate any additional information regarding this issue, as I have found resources for mmdet >= 3.0 to be scarce.

@CDchenlin (Author)

I believe the category information has already been defined in the JSON file:

"categories": [
        {
            "supercategory": "example supercategory",
            "id": 1,
            "name": "example category"
        }
    ]

So I omitted metainfo. Do you mean I should specify each category in the metainfo?

@CDchenlin (Author)

@hhaAndroid Thank you for your invaluable advice. I have successfully organized my dataset and initiated training with the rtmdet model. However, upon transitioning to the GLIP model using the configuration file configs/glip/glip_atss_swin-t_fpn_dyhead_pretrain_obj365-goldg-cc3m-sub.py, I encountered the following error:

Traceback (most recent call last):
  File "~/Code/mmdetection-dev-3.x/tools/train.py", line 138, in <module>
    main()
  File "~/Code/mmdetection-dev-3.x/tools/train.py", line 134, in main
    runner.train()
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1745, in train
    model = self.train_loop.run()  # type: ignore
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
    losses = self._run_forward(data, mode='loss')  # type: ignore
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
    results = self(**data, mode=mode)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/Code/mmdetection-dev-3.x/mmdet/models/detectors/base.py", line 92, in forward
    return self.loss(inputs, data_samples)
  File "~/Code/mmdetection-dev-3.x/mmdet/models/detectors/glip.py", line 270, in loss
    text_prompts = [
  File "~/Code/mmdetection-dev-3.x/mmdet/models/detectors/glip.py", line 271, in <listcomp>
    data_samples.text for data_samples in batch_data_samples
AttributeError: 'DetDataSample' object has no attribute 'text'

I have attempted multiple potential solutions, including transitioning to the official cat dataset referenced in this notebook. Despite these efforts, the error persists.

I would be immensely grateful for any advice or suggestions you might have regarding this issue.

Thank you very much for your assistance. I apologize if my repeated inquiries have caused any inconvenience.

@hhaAndroid (Collaborator)

@CDchenlin Your pipeline is missing the text key:

    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'flip', 'flip_direction', 'text',
                   'custom_entities'))
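
For context, a sketch of where this sits in a full training pipeline (the surrounding transforms are illustrative, not copied from the GLIP config):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', prob=0.5),
    # 'text' and 'custom_entities' must survive packing, because GLIP's
    # loss reads data_samples.text for every sample in the batch.
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'flip', 'flip_direction', 'text',
                   'custom_entities'))
]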
