v1 image_data_from_csv handling test data #894

Closed
carbocation opened this issue Oct 12, 2018 · 3 comments
@carbocation

Describe the bug
I am loading data with image_data_from_csv(). I have a csv with labels for all of my data. When I leave all of my images in the folder, the program runs normally. If I move some of my images into test, the program crashes when I try to train it:

FileNotFoundError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 137, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 137, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/app/fastai/fastai/vision/data.py", line 189, in __getitem__
    x,y = self.ds[idx]
  File "/app/fastai/fastai/vision/data.py", line 70, in __getitem__
    def __getitem__(self,i): return open_image(self.x[i]),self.y[i]
  File "/app/fastai/fastai/vision/image.py", line 267, in open_image
    x = PIL.Image.open(fn).convert('RGB')
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2548, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/root/Wdzws5Dx1CYs5bx/file1.jpg'

Looking at the code, I think this is because my data that I moved into test is still in my labels.csv.
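One way to check this diagnosis is to list the CSV entries whose image files are no longer in folder. The sketch below is only an illustration: `find_missing_files` is a hypothetical helper, not part of fastai, and it assumes a comma-separated CSV with a header row and the file name in the first column.

```python
import csv
from pathlib import Path

def find_missing_files(csv_path, folder, suffix=""):
    """Return the file names listed in the CSV that do not exist in `folder`.

    Hypothetical helper (not a fastai function). Assumes a header row and
    that the first column holds the file name, optionally without `suffix`.
    """
    folder = Path(folder)
    missing = []
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (assumed present)
        for row in reader:
            if not row:
                continue  # ignore blank lines
            name = row[0] + suffix
            if not (folder / name).is_file():
                missing.append(name)
    return missing
```

If the returned list contains the files that were moved into test, the labels CSV is the source of the FileNotFoundError.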

Expected behavior
The program should iterate only over the files that it found in folder, instead of trying to iterate over labels.csv and then crashing when it finds a file|class that does not exist in folder because it has been moved to test.

Alternatively, it is possible that this is an intentional design decision, but it feels clunky to allow us to use the CSV format to determine file|class but still have to create new folders and remove files from the label csv. If iterating over labels.csv instead of the folder is mandatory, it would be nice if the csv supported file|class|test (for example), so we could specify directly in the CSV which files were test files.
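To make the file|class|test idea concrete, here is a rough sketch of how such a CSV could be split outside the library. It is not something fastai supports: the column names `file`, `class` and `test`, and the helper `split_by_test_column`, are assumptions made up for this example.

```python
import csv

def split_by_test_column(csv_path):
    """Split a CSV into (train, test) lists of (file, label) pairs.

    Hypothetical helper. Assumes a header row with the columns
    `file`, `class` and `test`, where `test` is 1/true/yes for test rows.
    """
    train, test = [], []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            pair = (row["file"], row["class"])
            is_test = row["test"].strip().lower() in ("1", "true", "yes")
            (test if is_test else train).append(pair)
    return train, test
```

The train list could then be written back out as a labels CSV, while the test list identifies the files to move into test.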

@sgugger
Contributor

sgugger commented Oct 13, 2018

I'm not sure I follow you. With this method, you can't expect the library to iterate over the files found in folder: the csv file is supposed to contain a list of filenames that are all in folder.
Note that `image_data_from_csv` doesn't support test datasets yet.
Going forward, please use the forum for this, as it'll reach a larger audience, thanks!

sgugger closed this as completed Oct 13, 2018

@johnyquest7

johnyquest7 commented Dec 13, 2018

@carbocation One way to use a test set is:

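from fastai.vision import *  # fastai v1: provides ImageItemList and imagenet_stats

# Assumes `path`, `path2` (presumably the test folder) and `tfms` are already defined.
# Build the labelled item list from the CSV and hold out 20% for validation.
src = (ImageItemList.from_csv(path, 'train_labels.csv', folder='', suffix='.tif', test=path2)
       .random_split_by_pct(0.2)
       .label_from_df())
# Apply the transforms, attach the test folder, and build a normalized DataBunch.
data = (src.transform(tfms, size=128)
        .add_test_folder()
        .databunch().normalize(imagenet_stats))
print('Train size:', len(data.train_ds))
print('Valid size:', len(data.valid_ds))
print('Test size:', len(data.test_ds))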

Note that with this method the CSV file must contain only the names of files in the train folder.
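If the original CSV still lists files that were moved to the test folder, a train-only CSV can be generated before loading. This is a minimal sketch under stated assumptions: `write_train_only_csv` is a hypothetical helper (not a fastai function), and the CSV is assumed to have a header row with the file name in the first column.

```python
import csv
from pathlib import Path

def write_train_only_csv(csv_in, csv_out, test_folder, suffix=""):
    """Copy `csv_in` to `csv_out`, dropping rows whose file is in `test_folder`.

    Hypothetical helper. Assumes a header row and the file name in the
    first column (without `suffix`, if one is given). Returns the number
    of data rows kept.
    """
    test_names = {p.name for p in Path(test_folder).iterdir() if p.is_file()}
    kept = 0
    with open(csv_in, newline="") as src, open(csv_out, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)  # keep the header row unchanged
        for row in reader:
            if row and row[0] + suffix not in test_names:
                writer.writerow(row)
                kept += 1
    return kept
```

The resulting file could then be passed as the CSV name in place of 'train_labels.csv'.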

@johnyquest7

Another approach to adding a test folder:
https://docs.fast.ai/data_block.html#Add-a-test-set
