
Addressing Hugging Face-Related Errors During the Fine-Tuning of GLIP #10968

Closed
CDchenlin opened this issue Sep 25, 2023 · 11 comments

@CDchenlin

Hello,

While fine-tuning the GLIP model on my custom dataset, I encountered the following issue:

'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f1838e7c580>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: eb487da1-5639-443c-88d4-5b9382ff21f6)')' thrown while requesting HEAD https://huggingface.co/bert-base-uncased/resolve/main/config.json
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f182c9b5d90>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 4562ddc9-89e5-444d-9323-d5ac8fd27e7f)')' thrown while requesting HEAD https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/tokenizer.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f182a57d4c0>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 094d0780-2efd-4754-b896-defb03d7709d)')' thrown while requesting HEAD https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /bert-base-uncased/resolve/main/config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f182c9b5970>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: f259081b-8a16-4c7b-8136-56d0bfe5af63)')' thrown while requesting HEAD https://huggingface.co/bert-base-uncased/resolve/main/config.json

I understand that this is a connection error related to Hugging Face. However, since I don't have administrator privileges on my server, I'm wondering whether there are any alternative solutions. For instance, would it be possible to download the weights on my PC and then transfer them to the server?

Although I was unable to download the pretrained model, the train.py script continued to execute and eventually failed with the following error:

loading annotations into memory...
Done (t=0.22s)
creating index...
index created!
Traceback (most recent call last):
  File "tools/train.py", line 133, in <module>
    main()
  File "tools/train.py", line 129, in main
    runner.train()
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1703, in train
    self._train_loop = self.build_train_loop(
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1495, in build_train_loop
    loop = LOOPS.build(
  File "~anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 44, in __init__
    super().__init__(runner, dataloader)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1353, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "~/Code/mmdetection-dev-3.x/mmdet/datasets/base_det_dataset.py", line 44, in __init__
    super().__init__(*args, **kwargs)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/dataset/base_dataset.py", line 245, in __init__
    self.full_init()
  File "~/Code/mmdetection-dev-3.x/mmdet/datasets/base_det_dataset.py", line 82, in full_init
    self.data_bytes, self.data_address = self._serialize_data()
  File "~/envs/openmmlab/lib/python3.8/site-packages/mmengine/dataset/base_dataset.py", line 765, in _serialize_data
    data_bytes = np.concatenate(data_list)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate

I’m curious as to why this occurred. Could it be related to the unsuccessful download of the pretrained model, or could it be an issue with my custom dataset?

Thank you for your assistance.

@hhaAndroid (Collaborator)

@CDchenlin Hello, there is a very simple solution. You just need to download the corresponding weights to your local computer, then upload them to the server, and finally set the lang_model_name parameter to your local path.

from transformers import AutoTokenizer, BertConfig, BertModel

# On a machine with internet access, download the config, weights, and
# tokenizer for bert-base-uncased.
config = BertConfig.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", add_pooling_layer=False, config=config)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save all three to one local directory, then upload that directory to the server.
config.save_pretrained("your path/bert-base-uncased")
model.save_pretrained("your path/bert-base-uncased")
tokenizer.save_pretrained("your path/bert-base-uncased")

Then, in your config:

# lang_model_name = 'bert-base-uncased'
lang_model_name = 'your path/bert-base-uncased'
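
As an extra safeguard (an assumption on my part, not something this thread confirms is necessary), transformers can also be told to skip network lookups entirely on the offline server, so it never attempts the HEAD requests that time out above:

import os

# Hypothetical safeguard: force transformers/huggingface_hub to resolve
# models from local paths or the local cache only, never the network.
# These must be set before from_pretrained() is called.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"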

@hhaAndroid (Collaborator)

@CDchenlin If the pretrained weights cannot be loaded, you can simply download them to your local machine, then upload them to the server, and finally modify the load_from address.
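
For example, a minimal sketch of that change (the checkpoint path is hypothetical):

# Hypothetical override in the training config: point load_from at the
# checkpoint downloaded on a PC and uploaded to the server, instead of a URL.
load_from = 'checkpoints/glip_pretrained.pth'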

@CDchenlin (Author) commented Sep 25, 2023

@hhaAndroid Hello, thank you so much for your assistance. Could you possibly provide more specific advice? I attempted the following modification in the code here, but it was unsuccessful.

def __init__(self,
                 name: str = 'bert-base-uncased',
                 max_tokens: int = 256,
                 pad_to_max: bool = True,
                 use_sub_sentence_represent: bool = False,
                 special_tokens_list: list = None,
                 add_pooling_layer: bool = False,
                 num_layers_of_embedded: int = 1,
                 use_checkpoint: bool = False,
                 **kwargs) -> None:

was modified into

def __init__(self,
                 name: str = 'my path/bert-base-uncased',
                 max_tokens: int = 256,
                 pad_to_max: bool = True,
                 use_sub_sentence_represent: bool = False,
                 special_tokens_list: list = None,
                 add_pooling_layer: bool = False,
                 num_layers_of_embedded: int = 1,
                 use_checkpoint: bool = False,
                 **kwargs) -> None:

However, the problem remains.

Thank you so much for your advice!

@hhaAndroid (Collaborator)

@CDchenlin In fact, my server is also unable to connect to the internet, so I use the method mentioned above and modify the configuration here to a local path.
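
Concretely, a minimal sketch of the config-level override (the directory path is illustrative, and the keys assume the pattern used by the GLIP configs in mmdetection 3.x):

# Point the language model at the locally saved bert-base-uncased directory
# from the config, instead of editing the class default in the source code.
lang_model_name = '/data/models/bert-base-uncased'

model = dict(
    language_model=dict(
        type='BertModel',
        name=lang_model_name))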

@CDchenlin (Author)

Thank you so much, that problem is solved. However, the second problem remains, with the same traceback as quoted above:

ValueError: need at least one array to concatenate

@hhaAndroid (Collaborator)

@CDchenlin This error is usually caused by a failure to read the data. In most cases the metainfo is written incorrectly: the annotation JSON is read, but the dataset ends up empty because the category fields are misconfigured. Are you using custom data?
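
For reference, a minimal sketch of a correctly paired metainfo in an mmdetection 3.x dataset config (paths are illustrative, and the classes tuple must match the "name" fields in the annotation JSON exactly):

# Hypothetical dataset config: if metainfo classes don't match the category
# names in the COCO-style JSON, every sample gets filtered out, the dataset
# is empty, and _serialize_data() raises the ValueError quoted above.
metainfo = dict(classes=('example category', ))

train_dataloader = dict(
    dataset=dict(
        type='CocoDataset',
        metainfo=metainfo,
        ann_file='annotations/train.json',
        data_prefix=dict(img='train/')))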

@CDchenlin (Author) commented Sep 25, 2023

Yes, I used custom data. To be honest, I am new to mmdetection. I have read and followed the linked documentation to prepare my dataset and config file. I would greatly appreciate any additional information regarding this issue, as I have found resources for mmdet >= 3.0 to be scarce.

@CDchenlin (Author)

I believe the category information has already been defined in the JSON file:

"categories": [
        {
            "supercategory": "example supercategory",
            "id": 1,
            "name": "example category"
        }
    ]

So I omitted metainfo. Do you mean I should specify each category in the metainfo?

@CDchenlin (Author)

@hhaAndroid Thank you for your invaluable advice. I have successfully organized my dataset and initiated training with the rtmdet model. However, upon transitioning to the GLIP model using the configuration file configs/glip/glip_atss_swin-t_fpn_dyhead_pretrain_obj365-goldg-cc3m-sub.py, I encountered the following error:

Traceback (most recent call last):
  File "~/Code/mmdetection-dev-3.x/tools/train.py", line 138, in <module>
    main()
  File "~/Code/mmdetection-dev-3.x/tools/train.py", line 134, in main
    runner.train()
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1745, in train
    model = self.train_loop.run()  # type: ignore
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
    losses = self._run_forward(data, mode='loss')  # type: ignore
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
    results = self(**data, mode=mode)
  File "~/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/Code/mmdetection-dev-3.x/mmdet/models/detectors/base.py", line 92, in forward
    return self.loss(inputs, data_samples)
  File "~/Code/mmdetection-dev-3.x/mmdet/models/detectors/glip.py", line 270, in loss
    text_prompts = [
  File "~/Code/mmdetection-dev-3.x/mmdet/models/detectors/glip.py", line 271, in <listcomp>
    data_samples.text for data_samples in batch_data_samples
AttributeError: 'DetDataSample' object has no attribute 'text'

I have attempted multiple potential solutions, including transitioning to the official cat dataset referenced in this notebook. Despite these efforts, the error persists.

I would be immensely grateful for any advice or suggestions you might have regarding this issue.

Thank you very much for your assistance. I apologize if my repeated inquiries have caused any inconvenience.

@hhaAndroid (Collaborator)

@CDchenlin Your pipeline is missing the text key:

    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'flip', 'flip_direction', 'text',
                   'custom_entities'))
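
For context, a sketch of where this sits in a full training pipeline (the surrounding transforms are illustrative, not copied from the GLIP config):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', prob=0.5),
    # 'text' and 'custom_entities' must survive packing, because GLIP's
    # loss reads data_samples.text for every sample in the batch.
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'flip', 'flip_direction', 'text',
                   'custom_entities'))
]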
