Content Type set by HTTP Gateway #152

Open
lidel opened this issue Sep 4, 2019 · 5 comments

Comments

@lidel
Member

lidel commented Sep 4, 2019

The HTTP Gateway does content-type sniffing based on golang.org/src/net/http/sniff.go and the file extension. js-ipfs uses a similar setup.
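To make the two steps concrete, here is a minimal sketch in Go. It assumes the extension is checked first and sniffing is the fallback, which is the order `http.ServeContent` documents; `guessContentType` is a hypothetical helper for illustration, not the gateway's actual code.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path"
)

// guessContentType is a hypothetical helper: it tries the file extension
// first and falls back to sniffing the leading bytes of the content.
func guessContentType(name string, data []byte) string {
	if ct := mime.TypeByExtension(path.Ext(name)); ct != "" {
		return ct
	}
	// http.DetectContentType implements the algorithm from net/http/sniff.go
	// and looks at no more than the first 512 bytes.
	return http.DetectContentType(data)
}

func main() {
	svg := []byte(`<svg xmlns="http://www.w3.org/2000/svg"></svg>`)
	// Same bytes, different request names.
	for _, name := range []string{"ipfs-logo.svg", "ipfs-logo.xml", ""} {
		fmt.Printf("%q -> %s\n", name, guessContentType(name, svg))
	}
}
```

The results for named files depend on Go's built-in extension table and on the MIME database of the host system, so they can differ between machines. With no usable extension, only the sniffed type remains.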

Problem: there is no mechanism for a website creator to override the returned content-type, and setting a custom file extension works only for some file types.

Example

The same data is returned with a different content-type depending on the request path.


SVG image

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.svg
→ returned as image/svg+xml

XML document

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.xml
→ returned as text/xml

Unknown extension

https://ipfs.io/ipfs/QmVdFJJBiQkVKFcvXu4WzySbZ7KnCW6uGWLJqZz5FnRWjk/ipfs-logo.foo
→ returned as text/plain

Raw CID

https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU
→ returned as text/plain

Raw CID + explicit filename

https://ipfs.io/ipfs/QmTqZhR6f7jzdhLgPArDPnsbZpvvgxzCZycXK7ywkLxSyU?filename=/ipfs-logo.svg
→ returned as image/svg+xml

Motivation

We want IPFS to become a viable solution for hosting websites.
At the HTTP level, as a bare minimum, website owners expect to be able to override:

  • content-type of specific files / file types
  • error pages (4xx, 5xx)

Ideas to explore

Embedding content-type in unixfs metadata

One way to address this is to support embedding Content-Type in DAG metadata.
This is tracked in ipld/legacy-unixfs-v2#11, but is not a silver bullet.

Main cons:

  • requires low-level tooling
  • changes the DAG, changes the CID

Drop-in config to override content-type per directory

@warpfork noted that DAG metadata may not be the best place for storing content-type:

ipfs/specs#217 (comment)
+1 towards the idea that if [Content] type is getting well-known support, it should be something we move towards the gateway knowing of it, rather than making it a feature of the filesystem.

This would be a much closer set of relationships to how the rest of the world works already (e.g. doing sysadmin today with nginx or something, I would generally configure [Content] types at the webserver area, and not in filesystem metadata) -- and thus seems much less likely to go awry.

Carefully avoiding baking in the idea of a single "mimetype string" field into our filesystem metadata also leaves much more room for issues to evolve around the things Ian mentioned:

  1. a file can have multiple mime types depending on the context
  2. some mime types can't be deduced until the entire file has been read

My take on this is:

  • embedding content-type in the DAG (unixfs metadata) can be a useful option, but there should (also) be a config-based way to specify or override the content-type returned by the HTTP Gateway
  • the config should travel with data, enabling unified behavior on all gateways
    • prior art: .htaccess, .gitattributes
    • a website creator would add something like .ipfs/content-types and .ipfs/404.html to the directory, and the Gateway would do the right thing when a resource from that directory or one of its subdirectories is requested
    • presence of the config file would disable content sniffing on both server and client (X-Content-Type-Options: nosniff)

References

cc @olizilla @autonome

@hsanjuan
Contributor

hsanjuan commented Sep 5, 2019

* prior art: `.htaccess`, `.gitattributes`

Wouldn't this mean that every request to the gateway becomes two requests (one for the actual content, the other to figure out if an .htaccess-clone exists)? This may be expensive.

And if using different extensions on the filename effectively sets the content type guessed for that file, isn't this precisely a way to hint/override the content type of certain content?

@lidel
Member Author

lidel commented Sep 5, 2019

Wouldn't this mean that every request to the gateway becomes two requests (one for the actual content, the other to figure out if an .htaccess-clone exists)? This may be expensive.

It looks that way; however (iiuc), if the gateway wants to resolve /ipfs/{cid}/foo/bar/cat.xyz to a CID, it needs to fetch and cache the DAG roots of /ipfs/{cid}/, /ipfs/{cid}/foo/ and /ipfs/{cid}/foo/bar/.

This means checking whether .ipfs exists in any of them does not trigger an additional fetch: the DAG node with the directory listing is already cached in the local repo, so the check should be cheap for the gateway.
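The argument can be sketched with a toy model in Go. The `dir` type below is a stand-in for an already-fetched directory node, not the real UnixFS or go-ipfs API; the point is only that the `.ipfs` check reuses the same directory visits that path resolution performs anyway.

```go
package main

import (
	"fmt"
	"strings"
)

// dir is a toy stand-in for a cached directory node:
// entry name -> child directory (nil for files).
type dir map[string]dir

// resolve walks root along p and returns the directories on the way that
// contain a ".ipfs" entry. Each directory is visited exactly once.
func resolve(root dir, p string) (withConfig []string, found bool) {
	cur, curPath := root, "/"
	segs := strings.Split(strings.Trim(p, "/"), "/")
	for i, seg := range segs {
		// The listing for cur is already in hand, so this is a map lookup.
		if _, ok := cur[".ipfs"]; ok {
			withConfig = append(withConfig, curPath)
		}
		child, ok := cur[seg]
		if !ok {
			return withConfig, false
		}
		if i == len(segs)-1 {
			return withConfig, true
		}
		if child == nil { // a file in the middle of the path
			return withConfig, false
		}
		cur, curPath = child, curPath+seg+"/"
	}
	return withConfig, false
}

func main() {
	root := dir{
		".ipfs": dir{"content-types": nil},
		"foo": dir{
			"bar": dir{".ipfs": dir{}, "cat.xyz": nil},
		},
	}
	fmt.Println(resolve(root, "foo/bar/cat.xyz"))
}
```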

If using different extensions on the filename effectively sets the content type guessed for that file, isn't this precisely a way to hint/override the content type of certain content?

Unfortunately, extension-based sniffing relies on an arbitrary mapping hardcoded in go-ipfs and works only for popular file types, such as SVG. Publishing a file with the .sxg extension did not set the correct content-type (example below).

Real life example: .sxg

Signed HTTP Exchanges (#121) are bundled as .sxg files. Chrome won't load them unless .sxg is returned with specific content-type (at the moment it is application/signed-exchange;v=b3). Right now ipfs.io has a special Nginx rule that overrides content-type for .sxg, but this obviously does not scale well, and will break old snapshots when we globally update to a new version. On top of that, future specs add more content types.

It is a good illustration of a use case where a person publishing a file would want to override the content-type of that specific file locally and ensure every gateway returns a valid one.

@AuHau
Member

AuHau commented Nov 26, 2019

Just FYI, there is an accepted proposal, ipfs/kubo#6214, for support of .ipfs-gateway.(json|yaml). Let's see how the implementation moves along.

@holloshaw

holloshaw commented Jan 17, 2022

Has there been much progress on supporting a 404 page for IPFS-hosted websites?

@lidel
Member Author

lidel commented Mar 30, 2022

I believe _redirects is work-in-progress, and _headers will be next – see recent status update in ipfs/specs#257 (comment)
When we have that, we may allow customizing Content-Type header via _headers file (tbd, needs security analysis).
