OSError: Consistency check failed #1498
Comments
Hi @Nirbhay2727, I'm sorry for the inconvenience. Can you tell me which model and file you are trying to instantiate ? Thanks in advance |
Hello, I'm having the same issue with models:
|
same issue. I want to use the whisper |
Same issue with falcon-40b-instruct |
Sorry for not taking care of it earlier. @matthew-hippocraticai could you post here:
If you can't reproduce the error please let me know as well. And just in case, could you update to latest Thanks a lot in advance! I'll investigate this thoroughly in the coming days so more information will help. |
I believe the issue was actually just running out of storage. Has been fixed now, thanks! |
Ah! That's a really good insight for us. How did you find out in the end? If confirmed, I'll add a message to the exception suggesting that the user check disk usage and retry. |
It was simply |
note that we could maybe explore checking space left on device before starting downloading (but – and it's a big but – i'm not sure if there's any reliable way to do this) |
@julien-c Given #1498 (comment), I don't think space left on device was the root cause of the problem for everyone in this issue (ping @mateoriosglb @ParseDark @Nirbhay2727 if you can confirm?). But anyway, checking for free space is still a nice-to-have IMO, so I opened an issue for it: #1551. Python has a cross-platform built-in, shutil.disk_usage, to check disk usage. I would only trigger a warning instead of raising an exception. |
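A minimal sketch of the kind of pre-download check discussed here, assuming the expected download size is known in advance. `warn_if_low_disk_space` and its parameters are hypothetical names for illustration and are not part of huggingface_hub; `shutil.disk_usage` is the standard-library function for reading free space. In line with the suggestion above, it warns rather than raises.

```python
import shutil
import warnings
from pathlib import Path


def warn_if_low_disk_space(target_dir, expected_bytes):
    """Warn (do not raise) if target_dir's volume may not fit expected_bytes.

    Hypothetical helper for illustration; not a huggingface_hub API.
    Returns True when there appears to be enough free space.
    """
    # shutil.disk_usage needs an existing path, so walk up to the first
    # ancestor that exists (the cache directory may not be created yet).
    path = Path(target_dir).resolve()
    while not path.exists() and path != path.parent:
        path = path.parent

    free = shutil.disk_usage(path).free
    if free < expected_bytes:
        warnings.warn(
            f"Not enough free disk space in {path}: "
            f"need {expected_bytes} bytes, only {free} available."
        )
        return False
    return True
```

Free space can change while a download is running, so a check like this can only be advisory.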
Also related, I've got a user repeatedly getting this issue (#1549) but not when downloading with |
I still don't know the source of the problem but I agree with @Wauplin. I have more than enough space left so I rebooted the kernel, reloaded everything and then they started working fine. However I would like to find the source because it is possible that it will happen again. |
The issue was solved for me by downgrading bitsandbytes to 0.37.0. I believe that the cause of the issue was that a deep copy of the model was created while saving the checkpoints. |
@978551022 I'm sorry you are having trouble here, but if I understand correctly, this is not a consistency-check problem, right? If you are downloading files over a slow connection, I would highly recommend you to use
import requests
from huggingface_hub import snapshot_download

# Retry until the download completes; resume_download=True continues partial files.
while True:
    try:
        snapshot_download(..., resume_download=True)
        break
    except requests.ReadTimeout as e:
        print(f"Read timeout: {e}")
Each time snapshot_download is restarted, it takes a bit of time to list the files to download, but this init phase should be negligible compared to the download time. I'm sorry that there's no built-in solution right now, but error handling is not always easy to integrate into a library, as user expectations might differ. Hope this snippet of code will make your life easier :) |
Resumable download: I have already set this option to True. It is currently downloading. |
Then maybe the best is to do the retry loop around
from datasets import load_dataset

# Retry on any exception until the dataset loads.
while True:
    try:
        dataset = load_dataset("mlfoundations/datacomp_pools", num_proc=64)
        break
    except Exception as e:
        print(f"Exception: {e}") |
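The retry loops above run forever if the error is permanent. As a hedged sketch, a bounded variant with a pause between attempts could look like this; `retry`, `max_attempts`, `delay`, and `exceptions` are hypothetical names, and the function passed in stands for whatever download call is being retried.

```python
import time


def retry(func, max_attempts=5, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration. Re-raises the last exception
    if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions as e:
            print(f"Attempt {attempt}/{max_attempts} failed: {e}")
            if attempt == max_attempts:
                raise
            # Pause before the next attempt to avoid hammering the server.
            time.sleep(delay)
```

It would be called with the download wrapped in a function, for example `retry(lambda: snapshot_download(..., resume_download=True), exceptions=(requests.ReadTimeout,))`.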
Okay, this is a violent download |
Hi, I have the same error trying to download mistralai/Mixtral-8x7B-v0.1. OSError: Consistency check failed: file should be of size 4983004072 but has size 4124798953 (model-00011-of-00019.safetensors). |
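For readers unfamiliar with the error, the check amounts to comparing the size of the file on disk with the size that was expected. The sketch below illustrates the idea only; `check_size` is a hypothetical function, not the huggingface_hub implementation, and the message format is copied from the error reported in this thread.

```python
import os


def check_size(path, expected_size):
    """Raise OSError if the file at path does not have expected_size bytes.

    Hypothetical illustration of a size-based consistency check.
    """
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise OSError(
            f"Consistency check failed: file should be of size {expected_size} "
            f"but has size {actual_size} ({os.path.basename(path)})."
        )
```

A smaller size on disk, as in the report above, means the file was only partly written.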
Hello, I want to report a similar issue with
Error: I've noticed that when the shards are loaded, it appears the following warning Python modules: |
@brunopistone I'm sorry you're facing this issue. I would be surprised if the If the cache directory is on a small partition, you can always set the cache to a different volume with more disk space available by setting the |
If you use ln -s /.cache /root/.cache because space on /root/ is limited, you also need to export HF_HOME=. This solved my problem. |
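As a sketch of the same idea from Python, the cache location can be redirected by setting the `HF_HOME` environment variable before `huggingface_hub` (or a library built on it) is imported. The path `/mnt/bigdisk/hf` is a made-up example; substitute a directory on a volume with enough free space.

```python
import os

# Hypothetical path on a larger volume; replace with your own.
# Set this before importing huggingface_hub, transformers, or datasets,
# because the cache location is read when the library is imported.
os.environ["HF_HOME"] = "/mnt/bigdisk/hf"
```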
The same error when downloading a very large dataset. Can we make the |
Yes, I also hit the same error when downloading the large dataset Salesforce/lotsa_data. Can we redownload the corrupted files after the other parts of the dataset have been downloaded? Exception: Consistency check failed: file should be of size 488482328 but has size 55485239 ((…)vm_traces_2017/data-00004-of-00022.arrow).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub. |
same problem here, weird stuff.. is there any dos2unix kind of issue here? first time i see this issue.. |
Downloading files using |
@fblgit I'm sorry for the inconvenience. It can be related to network issues, but that is hard to investigate. The best strategy is still to redownload the corrupted file. Not sure about dos2unix TBH. It could be possible for small text files but not for huge binaries (which is where it usually happens). |
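you can force-redownload the file manually, e.g. in your case:
from datasets import DownloadConfig, DownloadManager

url = "hf://datasets/Salesforce/lotsa_data/azure_vm_traces_2017/data-00004-of-00022.arrow"
# force_download=True ignores the cached copy and fetches the file again
config = DownloadConfig(force_download=True)
# pass the config by keyword so it is not mistaken for another positional parameter
DownloadManager(download_config=config).download(url) |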
I got the same problem. In my case, it turned out the download process had been terminated, so when it started again it just appended to the same file, which caused "Consistency check failed". Solved this by:
pip install "huggingface_hub[cli]"
huggingface-cli delete-cache
then delete the cache and download again |
Describe the bug
OSError: Consistency check failed: file should be of size 9860464979 but has size 4277965032 ((…)l-00006-of-00007.bin).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
Downloading (…)l-00006-of-00007.bin: 43%|███████████████▌ | 4.28G/9.86G [03:25<04:27, 20.8MB/s]
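The sizes in the error message line up with the progress bar: the file on disk is roughly 43% of the expected size, which suggests the download was cut short. A small sketch that extracts the two numbers; `parse_consistency_error` is a hypothetical helper, and the regular expression assumes the message format shown above.

```python
import re


def parse_consistency_error(message):
    """Extract (expected, actual) byte sizes from the error message text."""
    match = re.search(r"should be of size (\d+) but has size (\d+)", message)
    if match is None:
        raise ValueError("not a consistency-check message")
    return int(match.group(1)), int(match.group(2))


msg = (
    "OSError: Consistency check failed: file should be of size 9860464979 "
    "but has size 4277965032 ((…)l-00006-of-00007.bin)."
)
expected, actual = parse_consistency_error(msg)
# Fraction of the expected bytes that actually reached the disk.
print(f"{actual / expected:.0%} of the file was written")
```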
Reproduction
No response
Logs
model = AutoModelForCausalLM.from_pretrained(checkpoint,force_download=True, resume_download=False).to(device) #device='cuda'
System info