
OSError: [Errno 5] Input/output error #510

Closed
salmannauman6 opened this issue Apr 15, 2019 · 11 comments

@salmannauman6 commented Apr 15, 2019

Bug report for Colab: http://colab.research.google.com/.

  • Basically, I am getting OSError: [Errno 5] Input/output error when trying to read a large (6 GB) CSV file stored on my Google Drive. This was working fine earlier: I was able to read the data, but then, all of a sudden, the same code stopped working. The failures appear random and the root cause is unclear. I am using the Google Chrome browser.
@colaboratory-team (comment minimized)

@salmannauman6 (Author) commented Apr 16, 2019

No, it does not. I have just one folder in my root folder which contains this one CSV file I am reading.

@colaboratory-team (Contributor) commented Apr 16, 2019

Thanks for confirming.
Can you share a minimal self-contained repro notebook, either publicly or just with colaboratory-team@google.com ?
(it would be helpful to see precisely how you're reading the data)

Does the problem go away if you first
!cp path/to/data.csv local.csv
and then read from the local path?
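The local-copy workaround above can be sketched in Python as well. This is a minimal illustration, not the reporter's actual code: the Drive path here is hypothetical, and a temporary directory stands in for the mounted Drive so the snippet runs anywhere.

```python
# Sketch of the local-copy workaround: copy the file off the Drive mount
# first, then read the local copy. A temp dir simulates the Drive mount.
import csv
import shutil
import tempfile
from pathlib import Path

def copy_then_read(drive_csv, local_csv):
    """Copy off the (slow/flaky) mount, then read locally."""
    shutil.copy(drive_csv, local_csv)  # equivalent of: !cp path/to/data.csv local.csv
    with open(local_csv, newline="") as f:
        return list(csv.DictReader(f))

# Simulate a CSV sitting on the Drive mount.
tmp = Path(tempfile.mkdtemp())
drive_file = tmp / "data.csv"
drive_file.write_text("id,value\n1,a\n2,b\n")

rows = copy_then_read(str(drive_file), str(tmp / "local.csv"))
print(rows[0]["value"])  # prints "a"
```

Reading from local disk avoids streaming every read through the Drive FUSE mount, which is where the Errno 5 failures seem to originate.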

@ShHsLin commented Apr 22, 2019

Similar issue here. I get

gzip: stdin: Input/output error
tar: Child returned status 1
tar: Error is not recoverable: exiting now

when running

!tar -zxvf /content/gdrive/My\ Drive/data.tgz -C ./ > /dev/null

with a large data.tgz file (~10 GB).
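The same copy-first idea applies to archives: copy the .tgz off the mount in one sequential pass, then extract locally. A hedged sketch follows; the paths are hypothetical and a temp dir simulates the Drive mount so the snippet is self-contained.

```python
# Sketch: copy the archive off the Drive mount, then extract with tarfile,
# so tar never streams through the FUSE mount. Temp dir simulates Drive.
import shutil
import tarfile
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())

# Simulate a data.tgz sitting on the Drive mount.
src_dir = tmp / "payload"
src_dir.mkdir()
(src_dir / "sample.txt").write_text("hello")
drive_tgz = tmp / "data.tgz"
with tarfile.open(drive_tgz, "w:gz") as tar:
    tar.add(src_dir / "sample.txt", arcname="sample.txt")

# Copy to local disk first (one sequential read), then extract locally.
local_tgz = tmp / "local.tgz"
shutil.copy(drive_tgz, local_tgz)
out_dir = tmp / "extracted"
with tarfile.open(local_tgz, "r:gz") as tar:
    tar.extractall(out_dir)

print((out_dir / "sample.txt").read_text())  # prints "hello"
```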

@Syzygy2048 commented Apr 25, 2019

I've no issue accessing 20 GB files.

What causes this issue for me is when there are many files in the folder (or parent folders) I'm accessing.
Instead of having path/to/data/data_x_of_1000files_in_folder.csv, I transformed the file structure to path/to/data/20folders/data_x_of_50files_in_folder.csv

Try making sure that there are no more than 50 files in the folder the file is in, or in any of the parent folders.

When I was only accessing a single file, or accessing files sequentially, simply retrying the load also worked, presumably because the content had already been cached. This didn't work for random access.

Works for me, hope this helps you too.
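The re-sharding trick described above (no more than ~50 files per folder) can be automated. A minimal sketch, assuming a flat folder of files to be moved into numbered subfolders; the paths and the 50-file limit come from the comment above, and a temp dir stands in for the Drive folder so the snippet runs anywhere.

```python
# Sketch: move a flat folder of many files into numbered subfolders of at
# most 50 files each, per the workaround described in the thread.
import tempfile
from pathlib import Path

MAX_PER_FOLDER = 50

def reshard(flat_dir, out_dir, per_folder=MAX_PER_FOLDER):
    """Move every file in flat_dir into out_dir/shard_NNNN/ subfolders."""
    files = sorted(Path(flat_dir).iterdir())
    out_dir = Path(out_dir)
    for i, f in enumerate(files):
        shard = out_dir / f"shard_{i // per_folder:04d}"
        shard.mkdir(parents=True, exist_ok=True)
        f.rename(shard / f.name)
    return sorted(p.name for p in out_dir.iterdir())

# Demo: 120 files end up in three shards (50 + 50 + 20).
tmp = Path(tempfile.mkdtemp())
flat = tmp / "data"
flat.mkdir()
for i in range(120):
    (flat / f"file_{i:03d}.csv").write_text("x")

shards = reshard(flat, tmp / "sharded")
print(shards)  # prints ['shard_0000', 'shard_0001', 'shard_0002']
```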

@sgabor1 commented May 1, 2019

Similarly, things were working without a problem until today; now the untar no longer finishes with a large file:

tar: /content/gdrive/My Drive/bigfile.tar: Cannot read: Operation not permitted
tar: /content/gdrive/My Drive/bigfile.tar: Cannot read: Input/output error
tar: Too many errors, quitting
tar: Error is not recoverable: exiting now

It could successfully untar all the files (a 31 GB tar with 10000 files) multiple times even yesterday. The command I'm using (with the path quoted, since it contains a space):

!tar -C features -xf "/content/gdrive/My Drive/bigfile.tar"

Trying to copy the whole tar into the runtime first also times out:

cp: error reading '/content/gdrive/My Drive/bigfile.tar': Input/output error

@furkanyildiz commented May 20, 2019

I have the same problem. I cannot read my files on Drive. It sometimes works, but mostly gives an OSError:

OSError: Can't read data (file read failed: time = Mon May 20 00:34:07 2019
, filename = '/content/drive/My Drive/train/trainX_file1', file descriptor = 83, errno = 5, error message = 'Input/output error', buf = 0xc71d3864, total read size = 42145, bytes this sub-read = 42145, bytes actually read = 18446744073709551615, offset = 119840768)

Creating a file also raises an OSError:

OSError: Unable to create file (unable to open file: name = '/content/drive/My Drive/train/model.hdf5', errno = 5, error message = 'Input/output error', flags = 13, o_flags = 242)

https://research.google.com/colaboratory/faq.html#drive-timeout did not help me.
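Several commenters note the errno-5 failures are transient and sometimes succeed on a later attempt. One generic way to exploit that is a retry-with-backoff wrapper; this is a sketch of that pattern, not anything from the thread, and the flaky reader below is simulated so the snippet runs anywhere.

```python
# Sketch: retry a read on OSError with exponential backoff, since the
# Drive-mount errno-5 failures reported above are often transient.
import time

def read_with_retry(read_fn, attempts=5, base_delay=0.01):
    """Call read_fn, retrying on OSError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return read_fn()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Simulated flaky reader: raises errno 5 twice, then succeeds.
calls = {"n": 0}
def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError(5, "Input/output error")
    return b"payload"

data = read_with_retry(flaky_read)
print(data, calls["n"])  # prints b'payload' 3
```

This only helps with sporadic failures; it will not fix the sustained timeouts described for very large archives.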

@kallianisawesome commented May 20, 2019

I have the same problem too. I can't load my data, which is not very large. I can load it with num_workers = 1 (using the PyTorch DataLoader), but I can't access my files directly. I have about 40000 files. I have tried io.imread and cv2.imread; they both work fine on my own computer, and I am sure the files are in the right place. I couldn't figure it out for days, so I suspect it's not a problem on my end. I will try to build the image matrices on my own computer, upload them in CSV format, and report back if that works.

The link below offers a method; my files are already in subfolders, but maybe it can help you:
https://stackoverflow.com/questions/54973331/input-output-error-while-using-google-colab-with-google-drive
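A common workaround for the many-small-files case described above is to pack the dataset into a single archive on Drive and read individual members from it, so the Drive mount sees one large file instead of ~40000 small ones. A hedged sketch of that idea, with hypothetical paths and a temp dir standing in for Drive:

```python
# Sketch: pack many small files into one zip, then read members on demand
# without extracting everything. Temp dir simulates the Drive folder.
import tempfile
import zipfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
archive = tmp / "dataset.zip"

# Pack: many small "images" into a single archive.
with zipfile.ZipFile(archive, "w") as zf:
    for i in range(100):
        zf.writestr(f"img_{i:05d}.bin", bytes([i % 256]) * 16)

# Read: open individual members directly from the archive.
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
    first = zf.read(names[0])

print(len(names), len(first))  # prints 100 16
```

For training loops, a Dataset can hold the open ZipFile and call zf.read(name) per item, trading many tiny Drive reads for seeks inside one cached file.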

@colaboratory-team (Contributor) commented May 21, 2019

Duplicate of #559

@colaboratory-team colaboratory-team marked this as a duplicate of #559 May 21, 2019

@yuuSiVo commented Jun 11, 2019

I have the same issue too. I built a voice conversion program in Google Colaboratory. It worked yesterday, but it has not been working since this morning (Japan time).

@abiantorres commented Jun 22, 2019

I have the same issue. I can't access a 42 GB HDF5 file. At some point in my processing pipeline an OSError occurs, as @furkanyildiz commented. I read each element sequentially and immediately store it in another .tfrecords file.
