Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fs] Fix globbing in the scala blob storage FS #13173

Merged
merged 2 commits into from Jun 13, 2023

Conversation

daniel-goldstein
Copy link
Contributor

@daniel-goldstein daniel-goldstein commented Jun 12, 2023

Using URL parsing for object storage URIs has proven problematic as they don't really have the same syntax. In particular, Java URI truncates the path at ? and assigns the rest to a query string field (as it should), but ? is a valid character in a blob storage object and is used as a glob pattern. We were forgetting to look at the URI.query field to assemble the full object name and therefore broke globbing. I added a few tests specifically that used single-character match pattern (?) and got rid of URI in favor of a regex to get ahead of any other potential footguns.

}
parseHttpsUrl(filename)
.orElse(parseHailAzUrl(filename))
.getOrElse(throw new IllegalArgumentException("ABS URI must be of the form https://<ACCOUNT>.blob.core.windows.net/<CONTAINER>/<PATH>"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error reporting has gotten a little worse. Before we gave specialized errors depending on the scheme. Maybe it's worthwhile to still match the scheme first, then match the rest with the appropriate regex?

@danking danking merged commit 18ad85f into hail-is:main Jun 13, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants