Cannot read parquet from Azure when archive file exists #406

Open
mavestergaard opened this issue Mar 22, 2023 · 0 comments

This is the setup in the blob storage:

container-name/YEAR2023/MONTH03/DAY15/file_name.parquet
container-name/YEAR2023/MONTH03/DAY15/file_name.parquet.archive_202303162004
container-name/YEAR2023/MONTH03/DAY14/file_name.parquet

Reading this path:
file_name = "az://container-name/YEAR2023/MONTH03/DAY15/file_name.parquet"
fails with the following error:
azure.core.exceptions.HttpResponseError: The specifed resource name contains invalid characters.
The error only occurs when the following file also exists:
"az://container-name/YEAR2023/MONTH03/DAY15/file_name.parquet.archive_202303162004"

By contrast, this path, which has no archive file next to it, is read fine:
file_name = "az://container-name/YEAR2023/MONTH03/DAY14/file_name.parquet"
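
One possible explanation, which is only a guess and not confirmed anywhere in this report, is that some lookup treats the requested blob name as a prefix. Azure Blob Storage listings are prefix-based, so a prefix query for the DAY15 file would also return the archive file, while the DAY14 query would return a single blob. The sketch below only illustrates that prefix behaviour in plain Python; `names_with_prefix` is a hypothetical helper, not an adlfs or Azure SDK function.

```python
# Blob names from the setup above, relative to the container.
blob_names = [
    "YEAR2023/MONTH03/DAY15/file_name.parquet",
    "YEAR2023/MONTH03/DAY15/file_name.parquet.archive_202303162004",
    "YEAR2023/MONTH03/DAY14/file_name.parquet",
]


def names_with_prefix(names, prefix):
    """Hypothetical helper: return every name that starts with the prefix."""
    return [name for name in names if name.startswith(prefix)]


# The DAY15 path is a prefix of its own archive file, so two names match.
day15 = names_with_prefix(blob_names, "YEAR2023/MONTH03/DAY15/file_name.parquet")
# The DAY14 path has no sibling sharing its prefix, so only one name matches.
day14 = names_with_prefix(blob_names, "YEAR2023/MONTH03/DAY14/file_name.parquet")

print(len(day15), len(day14))
```

If this is what happens, it would match the symptom that only the directory containing the archive file fails.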

This appears to be a regression introduced after adlfs 2022.10.0. The following call:

import pandas as pd
# con_str holds the Azure storage connection string
df = pd.read_parquet('az://container-name/YEAR2023/MONTH03/DAY15/file_name.parquet', storage_options={'connection_string': con_str})

works with:

adlfs == 2022.10.0
fsspec == 2023.3.0

but not with newer versions of adlfs.
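
Since the behaviour depends on the installed versions, it can help to record them from the environment that runs the code. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` and the default package list are assumptions made for illustration.

```python
from importlib import metadata


def installed_versions(packages=("adlfs", "fsspec", "pandas")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


print(installed_versions())
```

Running this in both the working and the failing environment gives a directly comparable record of the adlfs and fsspec versions.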

With the newest version of adlfs, the following sequence does work:

import fsspec
import pandas as pd
fs = fsspec.filesystem('az', connection_string=con_str)
fs.ls("/container-name/YEAR2023/MONTH03/DAY15/")
df = pd.read_parquet(file_name, storage_options={'connection_string': con_str})

However, it does not work if I run this command first:
df = pd.read_parquet('az://container-name/YEAR2023/MONTH03/DAY15/file_name.parquet', storage_options={'connection_string': con_str})
