PytorchStreamReader failed reading zip archive: failed finding central directory (no backtrace available) #31620

Closed
kuloud opened this issue Dec 26, 2019 · 18 comments
Labels: oncall: jit

Comments

@kuloud

kuloud commented Dec 26, 2019

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Module.load(modelFile.absolutePath)
  2. PytorchStreamReader failed reading zip archive

Expected behavior

Load the model file correctly.

Environment

  • PyTorch Android Version : 1.3.1 ( org.pytorch:pytorch_android:1.3.1 )
  • Device Info: Mi 8 / MIUI 11.0.4.0 / Qualcomm Snapdragon 845

Additional context

onet_epoch.pt.zip

    com.facebook.jni.CppException: [enforce fail at inline_container.cc:137] . PytorchStreamReader failed reading zip archive: failed finding central directory
    (no backtrace available)
        at org.pytorch.Module$NativePeer.initHybrid(Native Method)
        at org.pytorch.Module$NativePeer.<init>(Module.java:70)
        at org.pytorch.Module.<init>(Module.java:25)
        at org.pytorch.Module.load(Module.java:21)

cc @suo

@ngimel ngimel added the oncall: jit label Dec 26, 2019
@ngimel
Collaborator

ngimel commented Dec 26, 2019

Can you please provide a minimum script demonstrating the error?

@zdevito
Contributor

zdevito commented Dec 27, 2019

The attached file does not seem to be a valid TorchScript file (i.e. one generated with .save on a ScriptModule object). Can you provide more detail about how it was generated?

@kuloud
Author

kuloud commented Dec 27, 2019

The attached file does not seem to be a valid TorchScript file (i.e. one generated with .save on a ScriptModule object). Can you provide more detail about how it was generated?

This model file comes from
https://github.com/Sierkinhane/mtcnn-pytorch

The script that generates the file:
https://github.com/Sierkinhane/mtcnn-pytorch/blob/master/mtcnn/train_net/train.py

Thanks.

@kuloud
Author

kuloud commented Dec 27, 2019

Can you please provide a minimum script demonstrating the error?

Line 65 in this file:
https://github.com/didi/AoE/blob/master/Android/third_party/pytorch/src/main/kotlin/com/didi/aoe/runtime/pytorch/PyTorchInterpreter.kt

I am trying to load the attached model file in this way.

@driazati
Contributor

The onet_epoch.pt.zip file you provided looks like it's produced by calling torch.save (maybe here?). TorchScript on mobile can only load a compiled model (one saved with torch.jit.save).

You will need to make your model compatible with TorchScript (either via tracing or scripting, details here), then save that via torch.jit.save.
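
For reference, a minimal sketch of the difference, using a toy module and illustrative file names (nothing here is taken from this issue):

    import torch
    import torch.nn as nn

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(4, 2)

        def forward(self, x):
            return self.fc(x)

    model = TinyNet().eval()

    # What mobile cannot load: a pickled eager-mode checkpoint.
    torch.save(model.state_dict(), "tiny_eager.pt")

    # What mobile can load: a compiled TorchScript module.
    scripted = torch.jit.script(model)   # or torch.jit.trace(model, example_input)
    scripted.save("tiny_script.pt")      # same as torch.jit.save(scripted, "tiny_script.pt")

    # Sanity check that the TorchScript runtime can read the artifact back.
    reloaded = torch.jit.load("tiny_script.pt")
    print(reloaded(torch.rand(1, 4)))

The file written by scripted.save(...) is the kind of artifact that Module.load(...) on Android expects.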

@kuloud
Author

kuloud commented Dec 31, 2019

The onet_epoch.pt.zip file you provided looks like it's produced by calling torch.save (maybe here?). TorchScript on mobile can only load a compiled model (one saved with torch.jit.save).

You will need to make your model compatible with TorchScript (either via tracing or scripting, details here), then save that via torch.jit.save.

OK, it works for me. Thank you.

@kuloud kuloud closed this as completed Dec 31, 2019
facebook-github-bot pushed a commit that referenced this issue Jan 8, 2020
Summary:
This adds a check to catch the case where someone `torch.save`s something then `torch::jit::load`s it in C++.

Relevant for #31620
Pull Request resolved: #31709

Pulled By: driazati

Differential Revision: D19252172

fbshipit-source-id: f2a9b4442647285418b2778306629b4ff77c15e5
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this issue Jan 30, 2020
@saurabhmalviya25

@kuloud Can you please share the code for scripting?
I am also facing the same issue but am unable to resolve it.

@Coderx7
Contributor

Coderx7 commented Mar 18, 2020

I faced this issue too. My problem was that the pretrained model I used had been saved as a whole object (something like torch.save({'model': model})). As you may know, this is bad practice, since in order to load that model you need the exact same module/file/directory hierarchy, or it will crash.
I converted this model to TorchScript and the conversion itself went fine, but on using the result I hit this error.

To solve it, I simply saved the model's state_dict() into a new checkpoint file first, loaded the model from that new checkpoint, and then converted it; a sketch of this follows below.
Hope it helps you as well.
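
A minimal sketch of that workaround; the checkpoint layout, import path, and file names are assumptions, not taken from this issue:

    import torch
    from mtcnn.models import ONet  # hypothetical import; point it at wherever ONet is defined in your repo

    # Load the original "whole object" checkpoint. This only works in an environment
    # where the module hierarchy that produced it is importable.
    checkpoint = torch.load("onet_epoch.pt", map_location="cpu")
    model = checkpoint["model"] if isinstance(checkpoint, dict) else checkpoint

    # Re-save only the weights so the file no longer depends on that hierarchy.
    torch.save(model.state_dict(), "onet_state_dict.pt")

    # Rebuild the model from its class and load the clean state_dict.
    fresh = ONet()
    fresh.load_state_dict(torch.load("onet_state_dict.pt", map_location="cpu"))
    fresh.eval()

    # Convert to TorchScript via tracing and save it in the format mobile can load.
    example = torch.rand(1, 3, 48, 48)  # O-Net in MTCNN typically takes 48x48 crops
    traced = torch.jit.trace(fresh, example)
    traced.save("onet_script.pt")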

@HeyangQin

In my case, this error was caused by a corrupted saved file, so I switched to an older checkpoint and the problem went away.

@tejan-rgb

model_mdetr, postprocessor = torch.hub.load('ashkamath/mdetr:main', 'mdetr_efficientnetB5', pretrained=True, return_postprocessor=True)

I am running this and it is showing me this error:

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
Help!

@bekirbakar

In my case, this error was caused by a corrupted saved file, so I switched to an older checkpoint and the problem went away.

I had the same problem today. In my case, it was caused by corrupted download files: somehow the download stops partway and the code moves on to the next step, which is unzipping, and that is where the failure occurs.

There should be a hash check or something similar on downloads, and I think an exception should be raised.

@naveenjafer

@bekirbakar Did you manage to fix this? I am running into the same issue.

@bekirbakar

@naveenjafer

The problem was my server's internet connection (maybe not the speed but the stability). I downloaded the files over a good connection on another device and then transferred them to the server on external storage.

I suggest you check your ethernet or wifi. Or, of course, you can do what I did.

Basically, the code ends up trying to extract/unzip corrupted files.
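
One way to catch a truncated download before it reaches the unzip step is to check that the file is actually a complete zip archive (torch.jit.save artifacts, and torch.save files in the default format since PyTorch 1.6, are zip files, and the central directory sits at the end, so a partial download usually fails this check). A small sketch with an example path:

    import zipfile

    path = "checkpoint.pt"  # example path to the downloaded file

    if not zipfile.is_zipfile(path):
        raise RuntimeError(f"{path} is not a valid zip archive; the download is probably incomplete")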

@adeljalalyousif

adeljalalyousif commented Jun 29, 2022

Same error here:

import torch
from torch import nn
import torchvision
resnet = torchvision.models.resnet101(pretrained=True)

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Can anyone help me?

@bekirbakar

Download the model manually and load it from a local file. See this post.
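
For the torchvision example above, the corrupted file usually sits in the torch hub cache (by default something like ~/.cache/torch/hub/checkpoints/); deleting it and re-downloading is often enough. Alternatively, a sketch of loading manually downloaded weights from a local path (the filename is just an example):

    import torch
    import torchvision

    weights_path = "resnet101.pth"  # weights file you downloaded yourself

    model = torchvision.models.resnet101()                     # build the architecture only
    state_dict = torch.load(weights_path, map_location="cpu")  # load the local weights
    model.load_state_dict(state_dict)
    model.eval()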

@RezaYazdaniAminabadi

I run into this issue when different processes on different GPU ranks try to load the same file using map_location='cpu'. I have also checked that I can load the same checkpoint in a Python shell. Does anyone know how to resolve this?
Thanks,
Reza

@RezaYazdaniAminabadi

By the way, it would be nice if the error message showed the name of the file and the rank that could not load the checkpoint!

@QinHsiu

QinHsiu commented Jul 2, 2023

I have faced the same problem.
