Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Suggestion: Optional content type when for writing file #443

Closed
benrutter opened this issue Dec 20, 2023 · 2 comments
Closed

Feature Suggestion: Optional content type when for writing file #443

benrutter opened this issue Dec 20, 2023 · 2 comments

Comments

@benrutter
Copy link
Contributor

When writing to azure blob storage, it's not currently possible to specify content type. For instance this:

html_contents = """
<!DOCTYPE html>
<html lang="en">
<head><title>Hello</title></head>
<body><h1>Hello world!</h1></body>
</html>
"""

with fsspec.open("abfs://container/file.html", mode="w", **storage_options) as file:
    file.write(html_contents)

will be uploaded to azure blob storage as the default "application/octet-steam" rather than "text/html". Azure blob storage can be used to host static sites, for instance, but requires it knows the content types (text/html, text/javascript, etc) of files.

It would be really handy if this was possible:

with fsspec.open("abfs://container/file.html", mode="w", content_type="text/html", **storage_options) as file:
    file.write(html_contents)

azure sdk blob client offerns a content_settings keyword argument that works like this:

blob_client.upload_blob(
    contents,
    content_settings=ContentSettings(content_type="text/html"),
)

I think passing this in from an adlfs perspective would just be something like this for spec.py (lines 1574):

                        await bc.upload_blob(
                            f1,
                            overwrite=overwrite,
                            metadata={"is_directory": "false"},
                            raw_response_hook=make_callback(
                                "upload_stream_current", callback
                            ),
                            max_concurrency=max_concurrency or self.max_concurrency,
                            content_settings=ContentSettings(content_type=content_type),
                            **self._timeout_kwargs,
                        )

It would be simple enough to default to not passing in an argument unless specified, so shouldn't (in theory) cause any changes to existing code.

But I'm not sure I understand enough about fsspec and adlfs's interaction. Does fsspec just pass any additional keyword arguments down the chain so that the earlier example would work as expected?

@TomAugspurger
Copy link
Contributor

That was fixed in #392. You should be able to pass it to file.write

@benrutter
Copy link
Contributor Author

benrutter commented Dec 22, 2023

That's immense, thanks a whole bunch!

Just in case anyones looking over this in future it won't work for my example, this:

with fsspec.open("abfs://container/file.html", mode="w", **storage_options) as file:
    file.write(
        html_contents,
        content_settings=ContentSettings(content_type=content_type),
    )

chucks up a TypeError: PickleableTextIOWrapper.write() takes no keyword argument error.
But definiting the filesystem first works great!

abfs = fsspec.filesystem(
    "abfs",
    **storage_options,
)
abfs.write_bytes(
    path="container/file.html",
    value=html_contents.encode("utf-8"),
    content_settings=ContentSettings(content_type=content_type),
)

I noticed it doesn't work for write_text only write_bytes, is that known behaviour?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants