TarFileSystem: using wildcards together with // in tar #1947

@observingClouds

What happened

I would like to open several members of a tar archive at once with fsspec, like this:

import fsspec
of = fsspec.open_files("tar://**/*.txt::test.tar")

This works as expected for "normal" tar archives. Unfortunately, the member names in the archives I need to access contain // instead of just /.
I can open those members when I give their full path explicitly, but wildcard patterns return no matches.

Minimal example

import tarfile

import fsspec

# Create an empty file to add to both archives.
file_path = "test.txt"
open(file_path, "w").close()

# Archive whose member name contains a double slash.
tar_file_path1 = "test_w_extraslash.tar"
with tarfile.open(tar_file_path1, "w") as tar:
    tar.add(file_path, arcname="path/with/extra/slash//test.txt")

of = fsspec.open_files("tar://**/*::test_w_extraslash.tar")
# of
# <List of 0 OpenFile instances>

# Archive whose member name uses single slashes only.
tar_file_path2 = "test_wo_extraslash.tar"
with tarfile.open(tar_file_path2, "w") as tar:
    tar.add(file_path, arcname="path/without/extra/slash/test.txt")

of2 = fsspec.open_files("tar://**/*::test_wo_extraslash.tar")
# of2
# <List of 1 OpenFile instances>

Any ideas on how to handle this case?
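
The only workaround I can think of so far is a rough sketch, not a fix. Since explicit paths work, I could list the member names with the standard-library `tarfile` module, filter them myself, and build explicit `tar://<member>::<archive>` URLs. The helper name `explicit_tar_urls` is made up for this example and is not part of fsspec. Note that `fnmatch` does not behave exactly like fsspec's glob (its `*` also matches `/`), so pattern semantics may differ.

```python
import fnmatch
import re
import tarfile


def explicit_tar_urls(tar_path, pattern):
    """Return tar:// URLs for regular-file members whose name matches pattern.

    Hypothetical helper, not part of fsspec. Repeated slashes are collapsed
    only for matching; each URL keeps the member name as stored in the archive.
    """
    urls = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Skip directories, links, and other non-file entries.
            if not member.isfile():
                continue
            # Collapse runs of "/" so "a//b.txt" is matched like "a/b.txt".
            normalized = re.sub(r"/+", "/", member.name)
            # fnmatchcase: "*" matches any characters, including "/".
            if fnmatch.fnmatchcase(normalized, pattern):
                urls.append(f"tar://{member.name}::{tar_path}")
    return urls
```

Each returned URL could then be opened explicitly, for example with `fsspec.open(url)`, assuming explicit member paths keep working as described above. I have not checked this against different fsspec versions.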
