Low level MIME sniffing #49843

Closed
jimmywarting opened this issue Sep 24, 2023 · 5 comments
Labels
feature request, stale

Comments

@jimmywarting

jimmywarting commented Sep 24, 2023

What is the problem this feature will solve?

This issue came up during a discussion related to fs.openAsBlob, where blob.type currently returns an empty string. I must agree that it would be nice if Node.js could automatically determine the MIME type of a given entity, whether through extension lookup or by inspecting "magic numbers".

Of course, there might be some cases where certain file extensions collide, but I believe it would be best to adhere to the IANA standards. After all, all browsers are capable of deducing certain information. Here's a straightforward example:

var root = await navigator.storage.getDirectory()
var fileHandle = await root.getFileHandle('image.png', { create: true })
var file = await fileHandle.getFile()
console.log(file.type) // Returns 'image/png'

I'm not sure about the exact mechanism behind it, but it simply works somehow.

I've come across some useful resources that might help address this issue:

These resources could provide valuable insights and potential solutions for improving the MIME type detection in Node.js.

What is the feature you are proposing to solve the problem?

I'm not sure what the API should look like, but it would be worth exploring several ways of detecting the type. I'm also unsure whether it should return an array of candidate matches, each with a confidence score.

For instance:

x.identifyType('txt') // Returns 'text/plain'
x.identifyExtension('text/plain') // Returns 'txt'
x.sniffTypeFrom(blob || arrayBuffer || arrayBufferView) // Infers 'text/plain'
x.sniffExtensionFrom(blob || arrayBuffer || arrayBufferView) // Infers 'txt'
x.standardize('text/javascript1.2') // Converts to 'text/javascript' following the specification at https://mimesniff.spec.whatwg.org/#ref-for-javascript-mime-type
x.detectFormat(anything) // Returns { extension: 'txt', type: 'text/plain' }
x.lookupInSystemRegistry(anything) // Searches, for instance, the Windows system registry for installed applications
x.IANA...
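
A minimal userland sketch of the two lookup helpers, to make the idea concrete. `identifyType` and `identifyExtension` are the hypothetical names from the list above, and the table is a small hand-written subset for illustration, not the IANA registry. Unknown inputs return an empty string here, mirroring what `blob.type` does today; that choice is an assumption.

```javascript
// Hypothetical userland sketch; the table is a tiny illustrative subset.
const EXTENSION_TO_TYPE = new Map([
  ['txt', 'text/plain'],
  ['html', 'text/html'],
  ['json', 'application/json'],
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['gif', 'image/gif'],
]);

// Reverse table: the first extension listed for a type wins.
const TYPE_TO_EXTENSION = new Map();
for (const [ext, type] of EXTENSION_TO_TYPE) {
  if (!TYPE_TO_EXTENSION.has(type)) TYPE_TO_EXTENSION.set(type, ext);
}

function identifyType(extension) {
  // Accept both "png" and ".PNG".
  const ext = extension.replace(/^\./, '').toLowerCase();
  return EXTENSION_TO_TYPE.get(ext) ?? '';
}

function identifyExtension(type) {
  // Drop parameters such as "; charset=utf-8" before the lookup.
  const essence = type.split(';')[0].trim().toLowerCase();
  return TYPE_TO_EXTENSION.get(essence) ?? '';
}
```

Several extensions can map to one type (`jpg` and `jpeg`), so the reverse lookup has to pick one; this sketch simply picks the first one listed.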

Perhaps when searching for a path, it could consider both the file name extension and perform MIME sniffing.

Lastly:

const blob = await openAsBlob('./image.png')
blob.type // Returns 'image/png'

Please note that the exact method names and functionality would depend on the implementation details of the API.
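
In the meantime a caller can supply the type explicitly, since `fs.openAsBlob` accepts an optional `type` option (Node.js 19.8 or later). The sketch below derives it from the file extension only. The helper name `openAsTypedBlob` and the lookup table are assumptions made for illustration.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Illustrative subset only; a real table would be much larger.
const TYPES_BY_EXTENSION = {
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.txt': 'text/plain',
};

// Hypothetical helper: guesses the type from the extension alone
// and falls back to '' (the current default) when it is unknown.
async function openAsTypedBlob(file) {
  const type = TYPES_BY_EXTENSION[path.extname(file).toLowerCase()] ?? '';
  return fs.openAsBlob(file, { type });
}
```

Usage would look like `const blob = await openAsTypedBlob('./image.png')`, after which `blob.type` carries whatever the table says for `.png`. Nothing here inspects the file contents.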

I think having this built in would serve the community, as it's something that is very commonly needed.

@jimmywarting jimmywarting added the feature request label Sep 24, 2023
@bnoordhuis
Member

> whether through extension lookup or by inspecting "magic numbers"

Both have problems. Just looking at the extension is prone to false positives (e.g. GIF file with a .jpg extension), while sniffing the first few bytes is problematic with non-files like fifos (can't seek, can't rewind.)
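
To illustrate the GIF-with-a-.jpg-extension case, here is a small sketch that checks the leading bytes against a few well-known signatures. The helper name `sniffImageType` and the signature table are assumptions for illustration; it only covers three image formats and needs the first bytes in memory, which is exactly what a fifo makes awkward.

```javascript
// Byte signatures for a few image formats (illustrative subset).
const SIGNATURES = [
  { type: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { type: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
];

// Hypothetical helper: returns the first matching type, or '' if none match.
function sniffImageType(buffer) {
  for (const { type, bytes } of SIGNATURES) {
    if (buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b)) {
      return type;
    }
  }
  return '';
}

// A GIF that was saved as "photo.jpg": the extension says JPEG, the bytes say GIF.
const header = Buffer.from('GIF89a', 'latin1');
console.log(sniffImageType(header)); // image/gif
```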

Even determining whether a file is text/plain is challenging if you don't know the encoding. A file consisting of the octets 00 00 00 20 is text/plain only when you know it's UTF-32BE; otherwise it's application/octet-stream.
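
The 00 00 00 20 case can be made concrete with a short sketch. A heuristic that treats control bytes as a sign of binary data (similar in spirit to the "binary data byte" check in the WHATWG MIME Sniffing Standard) rejects these four octets, even though they decode to a single space when read as one big-endian 32-bit code unit. The function names are hypothetical.

```javascript
// Control bytes that rarely appear in text (the "binary data byte" ranges
// from the WHATWG MIME Sniffing Standard).
function isBinaryDataByte(b) {
  return b <= 0x08 || b === 0x0b || (b >= 0x0e && b <= 0x1a) || (b >= 0x1c && b <= 0x1f);
}

// Hypothetical heuristic: "text" means no binary data bytes at all.
function looksLikeText(bytes) {
  return !bytes.some(isBinaryDataByte);
}

const octets = Buffer.from([0x00, 0x00, 0x00, 0x20]);

// Without knowing the encoding, the NUL bytes make this look binary.
console.log(looksLikeText(octets)); // false

// Read as a single UTF-32BE code unit, it is U+0020 (a space).
console.log(JSON.stringify(String.fromCodePoint(octets.readUInt32BE(0)))); // " "
```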

https://mimesniff.spec.whatwg.org/ seems to be aimed at HTTP, not file systems, so I don't think it's of much use. I feel node shouldn't be in the business of making educated guesses, not when everyone else is making different educated guesses.

@anonrig
Member

anonrig commented Sep 25, 2023

For future reference, we have a fast mimesniff parser at Ada - https://github.com/ada-url/mimesniff

@Uzlopak
Contributor

Uzlopak commented Oct 16, 2023

A million years ago I programmed this:

https://github.com/parallax/jsPDF/blob/5d09af9135a2fe049c7d3c8b95df280d22e4a6db/src/modules/addimage.js#L51

Is this something we should put in a transform stream?
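
A rough sketch of what the transform stream idea could look like. It buffers only the first few bytes, reports a guess, and then passes everything through unchanged, so the source never needs to be seeked or rewound. The class name `SniffStream`, the `'mime'` event, and the `sniff` callback are all assumptions for illustration, not an existing Node.js API.

```javascript
const { Transform } = require('node:stream');

// Hypothetical sketch: buffers the first `sniffLength` bytes, hands them to
// `sniff`, emits the result as a custom 'mime' event, then forwards all data.
class SniffStream extends Transform {
  constructor(sniff, sniffLength = 16) {
    super();
    this.sniff = sniff;
    this.sniffLength = sniffLength;
    this.headChunks = [];
    this.headLength = 0;
    this.sniffDone = false;
  }

  _transform(chunk, encoding, callback) {
    // After sniffing, chunks pass straight through.
    if (this.sniffDone) return callback(null, chunk);
    this.headChunks.push(chunk);
    this.headLength += chunk.length;
    if (this.headLength >= this.sniffLength) this._release();
    callback();
  }

  _flush(callback) {
    // Short inputs end before the buffer fills; sniff what we have.
    if (!this.sniffDone) this._release();
    callback();
  }

  _release() {
    const head = Buffer.concat(this.headChunks);
    this.sniffDone = true;
    this.headChunks = [];
    this.emit('mime', this.sniff(head));
    this.push(head); // Forward the buffered bytes unchanged.
  }
}

// Usage: detect a GIF from its first four bytes while the data flows on.
const isGif = (head) => (head.subarray(0, 4).toString('latin1') === 'GIF8' ? 'image/gif' : '');
const stream = new SniffStream(isGif, 4);
stream.on('mime', (type) => console.log('sniffed:', type));
stream.resume();
stream.end(Buffer.from('GIF89a', 'latin1'));
```

The guess is only available once enough bytes have arrived, so consumers that need the type before reading any data would have to wait for the event.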

Contributor

There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the never-stale label or close this issue if it should be closed. If not, the issue will be automatically closed 6 months after the last non-automated comment.
For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Apr 14, 2024
Contributor

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot closed this as not planned May 15, 2024