This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

parlai.core.build_data.DownloadableFile.download_file() is unable to deal with files zipped with __MACOSX folders #4567

Closed
chiehminwei opened this issue Jun 2, 2022 · 2 comments

@chiehminwei
Contributor

chiehminwei commented Jun 2, 2022

The problem is at line 368 of parlai/core/build_data.py, in the _unzip function:

def _unzip(path, fname, delete=True):

with zf.open(member, 'r') as inf, PathManager.open(outpath, 'wb') as outf:

This raises the following error on my Linux machine when I try to download a file that was zipped on macOS.
We should add a simple sanity check to make sure we don't try to open any __MACOSX paths.

Traceback (most recent call last):
  File "build.py", line 42, in <module>
    build(opt)
  File "build.py", line 35, in build
    downloadable_file.download_file(dpath)
  File "/private/home/jimmywei/ParlAI/parlai/core/build_data.py", line 105, in download_file
    untar(dpath, self.file_name)
  File "/private/home/jimmywei/ParlAI/parlai/core/build_data.py", line 269, in untar
    return _unzip(path, fname, delete=delete)
  File "/private/home/jimmywei/ParlAI/parlai/core/build_data.py", line 383, in _unzip
    with zf.open(member, 'r') as inf, PathManager.open(outpath, 'wb') as outf:
  File "/private/home/jimmywei/.conda/envs/conda_parlai/lib/python3.8/site-packages/iopath/common/file_io.py", line 1012, in open
    bret = handler._open(path, mode, buffering=buffering, **kwargs)  # type: ignore
  File "/private/home/jimmywei/.conda/envs/conda_parlai/lib/python3.8/site-packages/iopath/common/file_io.py", line 604, in _open
    return open(  # type: ignore
FileNotFoundError: [Errno 2] No such file or directory: 'datadata/Friends/__MACOSX/._friends-corpus'

This happened when I was trying to download the Friends dataset from ConvoKit:

from parlai.core.build_data import DownloadableFile
import parlai.core.build_data as build_data
import os
from convokit import download

RESOURCES = [
    DownloadableFile(
        'http://zissou.infosci.cornell.edu/convokit/datasets/friends-corpus/friends-corpus.zip',
        'friends-corpus.zip',
        '51ae80ce345212839d256b59b4982e9b40229ff6049115bd54d885a285d2b921',
        zipped=True,
    )
]


def build(opt):
    dpath = os.path.join(opt['datapath'], 'Friends')
    version = '1.00'

    if not build_data.built(dpath, version_string=version):
        print('[building data: ' + dpath + ']')
        if build_data.built(dpath):
            # An older version exists, so remove these outdated files.
            build_data.remove_dir(dpath)
        build_data.make_dir(dpath)

        # Download the data.
        for downloadable_file in RESOURCES:
            downloadable_file.download_file(dpath)
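
The sanity check proposed above could look roughly like the sketch below. It is not ParlAI's actual code: it uses only the standard-library zipfile module instead of PathManager, and the helper names is_macos_metadata and extract_skipping_macos_metadata are hypothetical. Treating .DS_Store as metadata is also an assumption that goes beyond the __MACOSX case reported here.

```python
import os
import zipfile


def is_macos_metadata(member_name: str) -> bool:
    """Return True for archive entries that macOS adds as metadata.

    Hypothetical helper: matches anything under a __MACOSX folder and,
    as an extra assumption, .DS_Store files.
    """
    parts = member_name.replace("\\", "/").split("/")
    return "__MACOSX" in parts or parts[-1] == ".DS_Store"


def extract_skipping_macos_metadata(zip_source, out_dir: str) -> list:
    """Extract a zip archive into out_dir, skipping macOS metadata entries.

    zip_source may be a path or a binary file-like object.
    Returns the names of the members that were actually extracted.
    """
    extracted = []
    with zipfile.ZipFile(zip_source, "r") as zf:
        for info in zf.infolist():
            # Skip directory entries and macOS metadata before touching disk.
            if info.is_dir() or is_macos_metadata(info.filename):
                continue
            # ZipFile.extract creates any missing parent directories.
            zf.extract(info, out_dir)
            extracted.append(info.filename)
    return extracted
```

In _unzip, the equivalent change would be a single guard at the top of the loop over archive members that skips any member for which a check like is_macos_metadata returns True.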

@chiehminwei changed the title from "parlai.core.build_data.DownloadableFile is unable to deal with files zipped with __MACOSX folders" to "parlai.core.build_data.DownloadableFile.download_file() is unable to deal with files zipped with __MACOSX folders" on Jun 2, 2022

@github-actions

github-actions bot commented Jul 3, 2022

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

@github-actions github-actions bot added the stale label Jul 3, 2022
@stephenroller
Contributor

Lol wtf
