
OSError: Consistency check failed #1498

Open
Nirbhay2727 opened this issue Jun 6, 2023 · 30 comments
Labels
bug Something isn't working

Comments

@Nirbhay2727

Describe the bug

OSError: Consistency check failed: file should be of size 9860464979 but has size 4277965032 ((…)l-00006-of-00007.bin).
We are sorry for the inconvenience. Please retry download and pass force_download=True, resume_download=False as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
Downloading (…)l-00006-of-00007.bin: 43%|███████████████▌ | 4.28G/9.86G [03:25<04:27, 20.8MB/s]
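For context, the check behind this error boils down to comparing the size announced by the Hub with the size of the downloaded file on disk. An illustrative sketch (our own code, not huggingface_hub's actual implementation):

```python
# Illustrative sketch: the consistency check compares the expected size
# reported by the Hub with the actual size of the file on disk, and raises
# an OSError when they differ (e.g. after an interrupted download).
import os

def check_consistency(path: str, expected_size: int) -> None:
    actual = os.path.getsize(path)
    if actual != expected_size:
        raise OSError(
            f"Consistency check failed: file should be of size "
            f"{expected_size} but has size {actual} ({path})."
        )
```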

Reproduction

No response

Logs

model = AutoModelForCausalLM.from_pretrained(checkpoint, force_download=True, resume_download=False).to(device)
# device = "cuda"

System info

- huggingface_hub version: 0.15.1
- Platform: Linux-5.15.0-1033-oracle-x86_64-with-glibc2.35
- Python version: 3.10.11
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/ubuntu/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: Nirbhay2727
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 1.13.1
- Jinja2: N/A
- Graphviz: N/A
- Pydot: N/A
- Pillow: 9.4.0
- hf_transfer: N/A
- gradio: N/A
- numpy: 1.24.3
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /home/ubuntu/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /home/ubuntu/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/ubuntu/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
@Nirbhay2727 Nirbhay2727 added the bug Something isn't working label Jun 6, 2023
@Wauplin
Contributor

Wauplin commented Jun 6, 2023

Hi @Nirbhay2727, I'm sorry for the inconvenience. Can you tell me which model and file you are trying to instantiate? Thanks in advance!

@mateoriosglb

Hello, I'm having the same issue with models:

  • stabilityai/stable-diffusion-2-1
  • runwayml/stable-diffusion-v1-5
  • SG161222/Realistic_Vision_V1.4
  • prompthero/openjourney

@ParseDark

Same issue. I want to use Whisper.

@matthew-hippocraticai

Same issue with falcon-40b-instruct

@Wauplin
Contributor

Wauplin commented Jul 5, 2023

Sorry for not taking care of this earlier. @matthew-hippocraticai could you post here:

  • the output of huggingface-cli env (run on your machine)
  • the script you used to download the weights
  • the exact stacktrace you get (the error message + trace)

If you can't reproduce the error, please let me know as well. And just in case, could you update to the latest huggingface_hub version and retry?

Thanks a lot in advance! I'll investigate this thoroughly in the coming days so more information will help.

@matthew-hippocraticai

I believe the issue was actually just running out of storage. Has been fixed now, thanks!

@Wauplin
Contributor

Wauplin commented Jul 6, 2023

I believe the issue was actually just running out of storage.

Ah! That's a really good insight for us. How did you find out in the end?
@mateoriosglb @ParseDark could it be the same for you?

If confirmed, I'll add a message to the exception suggesting that the user check disk usage and retry.

@matthew-hippocraticai

It was simply OSError: [Errno 28] No space left on device, but I didn't see it at first and just assumed this error was the only one.

@julien-c
Member

Note that we could maybe explore checking the space left on device before starting the download (but – and it's a big but – I'm not sure there's any reliable way to do this).

@Wauplin
Contributor

Wauplin commented Jul 11, 2023

@julien-c Given #1498 (comment), I don't think space left on device was the root cause of the problem for everyone in this issue (ping @mateoriosglb @ParseDark @Nirbhay2727 if you can confirm?). But anyway, checking for free space is still an interesting nice-to-have IMO, so I opened an issue for it: #1551. Python has a cross-platform built-in, shutil.disk_usage, to check this. I would only trigger a warning instead of raising an exception.
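A minimal sketch of such a pre-download check built on the cross-platform shutil.disk_usage; the helper name, threshold logic, and warning wording are our own (emitting a warning rather than raising, as suggested above):

```python
# Hedged sketch of a pre-download free-space check. shutil.disk_usage is a
# cross-platform stdlib built-in; everything else here is our own illustration.
import shutil
import warnings

def warn_if_low_disk_space(path: str, expected_size: int) -> bool:
    """Return True if path's filesystem has at least expected_size bytes free;
    otherwise emit a warning and return False."""
    free = shutil.disk_usage(path).free
    if free < expected_size:
        warnings.warn(
            f"Not enough free disk space at {path}: need {expected_size} "
            f"bytes, only {free} available."
        )
        return False
    return True
```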

@Wauplin
Contributor

Wauplin commented Jul 11, 2023

Also related: I've got a user repeatedly hitting this issue (#1549), but not when downloading with snapshot_download. I'm trying to investigate it more.

@mateoriosglb

I still don't know the source of the problem, but I agree with @Wauplin. I have more than enough space left, so I restarted the kernel, reloaded everything, and then the downloads started working fine. However, I would like to find the source, because it may happen again.

@978551022

I've been trying to re-download it for 5 days, and it always times out. I changed everything I could and still haven't solved it. [screenshot]

@978551022

[screenshot]

@Nirbhay2727
Author

Nirbhay2727 commented Jul 31, 2023

The issue was solved for me by downgrading bitsandbytes to 0.37.0. I believe that the cause of the issue was that a deep copy of the model was created while saving the checkpoints.

@Wauplin
Contributor

Wauplin commented Jul 31, 2023

@978551022 I'm sorry you are having trouble here, but if I understand correctly, this is not a consistency issue, right? If you are downloading files with a slow connection, I would highly recommend using resume_download=True. If some timeout errors happen, you "just" have to relaunch it. This can even be done in a single script:

import requests
from huggingface_hub import snapshot_download

while True:
    try:
        snapshot_download(..., resume_download=True)
        break
    except requests.ReadTimeout as e:
        print(f"Read timeout: {e}")

Each time snapshot_download is restarted, it takes a bit of time to list the files to download, but this init phase should be negligible compared to the download time.

I'm sorry that there's no built-in solution right now, but error handling is not always easy to integrate into a library, as user expectations might differ. Hope this snippet of code will make your life easier :)

@978551022

978551022 commented Aug 1, 2023

@978551022 I'm sorry you are having trouble here, but if I understand correctly, this is not a consistency issue, right? If you are downloading files with a slow connection, I would highly recommend using resume_download=True. If some timeout errors happen, you "just" have to relaunch it. [...]

I have already set resume_download=True (resumable download), but it still times out. I even changed the socket, requests, and urllib3 source code, still to no avail. In theory, resume_download=True should solve this problem. But thanks for your reply. I'm now using:
dataset = load_dataset("mlfoundations/datacomp_pools",num_proc=64)

It is currently downloading. The command I used before doesn't work:
nohup python download_upstream.py --scale xlarge --data_dir /home/nfs-s2/datacomp_pools/xlarge/ --skip_shards --download_npz --processes_count 12768 --thread_count 100000 > /home/wx1271473/code/download.log 2>&1 &

@Wauplin
Contributor

Wauplin commented Aug 1, 2023

Then maybe the best option is to put the retry loop around load_dataset. At least you wouldn't have to manually restart the script each time. Apart from that, I don't really have a solution to offer:

while True:
    try:
        dataset = load_dataset("mlfoundations/datacomp_pools",num_proc=64)
        break
    except Exception as e:
        print(f"Exception: {e}")

@978551022

Then maybe the best option is to put the retry loop around load_dataset. At least you wouldn't have to manually restart the script each time. [...]

Okay, this is a brute-force download, but so far I've been able to run it stably:

load_dataset("mlfoundations/datacomp_pools", num_proc=64)

@kanav1504

Hi, I have the same error trying to download mistralai/Mixtral-8x7B-v0.1.

OSError: Consistency check failed: file should be of size 4983004072 but has size 4124798953 (model-00011-of-00019.safetensors).
We are sorry for the inconvenience. Please retry download and pass force_download=True, resume_download=False as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

@brunopistone

brunopistone commented Feb 4, 2024

Hello, I want to report a similar issue with mistralai/Mixtral-8x7B-v0.1. I've also opened a discussion here.

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    trust_remote_code=True,
    force_download=True,
    resume_download=False,
    quantization_config=bnb_config,
    device_map="auto",
)

Error: AlgorithmError: OSError('Consistency check failed: file should be of size 4221679088 but has size 3663094841 (model-00019-of-00019.safetensors).\nWe are sorry for the inconvenience. Please retry download and pass force_download=True, resume_download=False as argument.\nIf the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.'), exit code: 1

I've noticed that when the shards are loaded, the following warning appears: UserWarning: Not enough free disk space to download the file. The expected file size is: 4221.68 MB. The target location /root/.cache/huggingface/hub only has 3747.39 MB free disk space. I have more than 400 GB of free space, so it shouldn't be possible that space is missing on the device.

Python modules:
transformers==4.37.2
peft==0.7.1
accelerate==0.26.1
bitsandbytes==0.42.0

@Wauplin
Contributor

Wauplin commented Feb 6, 2024

@brunopistone I'm sorry you're facing this issue. I would be surprised if the warning UserWarning: Not enough free disk space to download the file. The expected file size is: 4221.68 MB. The target location /root/.cache/huggingface/hub only has 3747.39 MB free disk space. were raised while you actually had enough free space on that drive. Are you sure the 400 GB are on /root/.cache? What is the output of df -h /root/.cache for you?

If the cache directory is on a small partition, you can always point the cache to a different volume with more disk space available by setting the HF_HOME environment variable (see documentation).
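A minimal sketch of that relocation; the target path is hypothetical, and HF_HOME should be set before huggingface_hub is imported, since the cache location is resolved at import time:

```python
# Hedged sketch: point HF_HOME at a larger volume before importing
# huggingface_hub so the cache (and all downloads) land there. The path
# below is a hypothetical example; use a real mount with enough free space.
import os

os.environ["HF_HOME"] = "/mnt/bigdisk/huggingface"  # hypothetical path

# Import only after setting the variable:
# from huggingface_hub import snapshot_download
```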

@xijiu9

xijiu9 commented Feb 24, 2024

If you use ln -s /.cache /root/.cache because the space under /root/ is limited, you also need to export HF_HOME=. This solved my problem.

@JohnHerry

Hi, I have the same error trying to download mistralai/Mixtral-8x7B-v0.1.

OSError: Consistency check failed: file should be of size 4983004072 but has size 4124798953 (model-00011-of-00019.safetensors). We are sorry for the inconvenience. Please retry download and pass force_download=True, resume_download=False as argument. If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

The same error occurs when downloading a very large dataset. Can we make the consistency check error a warning, so that the other parts of the dataset can continue to be downloaded?

@Qingrenn

Hi, I have the same error trying to download mistralai/Mixtral-8x7B-v0.1.
OSError: Consistency check failed: file should be of size 4983004072 but has size 4124798953 (model-00011-of-00019.safetensors). We are sorry for the inconvenience. Please retry download and pass force_download=True, resume_download=False as argument. If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

The same error occurs when downloading a very large dataset. Can we make the consistency check error a warning, so that the other parts of the dataset can continue to be downloaded?

Yes, I also hit the same error when downloading the large dataset Salesforce/lotsa_data. Can we redownload the corrupted files after the other parts of the dataset have been downloaded?

Exception: Consistency check failed: file should be of size 488482328 but has size 55485239 ((…)vm_traces_2017/data-00004-of-00022.arrow).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

@fblgit

fblgit commented Jun 2, 2024

Same problem here, weird stuff. Is there any dos2unix kind of issue here? First time I see this issue.

@Wauplin
Contributor

Wauplin commented Jun 3, 2024

Can we redownload the corrupted files after the other parts of the dataset have been downloaded?

Downloading files using datasets is slightly different at the moment (not the same cache). Ping @lhoestq: is it possible to force-trigger the redownload of a single arrow file?

@Wauplin
Contributor

Wauplin commented Jun 3, 2024

First time I see this issue.

@fblgit I'm sorry for the inconvenience. It can be related to network issues, but that's hard to investigate. The best strategy is still to redownload the corrupted file. Not sure about dos2unix TBH. It could be possible for small text files, but not for huge binaries (which is where this usually happens).

@lhoestq
Member

lhoestq commented Jun 3, 2024

You can force-redownload the file manually, e.g. in your case:

from datasets import DownloadConfig, DownloadManager

url = "hf://datasets/Salesforce/lotsa_data/azure_vm_traces_2017/data-00004-of-00022.arrow"
config = DownloadConfig(force_download=True)
DownloadManager(download_config=config).download(url)

@nguyenkhangme

I got the same problem, and in my case it turned out the downloading process had been terminated, so when it started again it just appended to the same file, which caused the "Consistency check failed" error. Solved this by:

pip install huggingface_hub\[cli\]
huggingface-cli delete-cache

then downloading again.
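For a scripted alternative to the interactive huggingface-cli delete-cache, one can also remove a single cached repo folder directly. A hedged sketch, assuming the default "models--{org}--{name}" cache layout; the helper name is ours:

```python
# Hedged sketch: remove one cached repo so the next download starts from
# scratch. The "models--{org}--{name}" folder naming matches huggingface_hub's
# default cache layout; delete_cached_repo is our own helper, not a library API.
import shutil
from pathlib import Path

def delete_cached_repo(cache_dir: Path, repo_id: str) -> bool:
    """Delete the cache folder for repo_id (e.g. "org/name").
    Returns True if a folder was removed."""
    folder = cache_dir / ("models--" + repo_id.replace("/", "--"))
    if folder.is_dir():
        shutil.rmtree(folder)
        return True
    return False
```

The next from_pretrained or snapshot_download call will then re-download the repo in full.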
