Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extending file-type #340

Closed
mooyoul opened this issue Mar 20, 2020 · 2 comments · Fixed by #603
Closed

Extending file-type #340

mooyoul opened this issue Mar 20, 2020 · 2 comments · Fixed by #603

Comments

@mooyoul
Copy link
Contributor

mooyoul commented Mar 20, 2020

Dear maintainers.
First, I really appreciate that file-type has been keep maintaining. Thank you for your efforts & contributions.

Background

Recently, file-type was refactored to utilize @Borewit's awesome strtok3 packages.
With this changes, we can navigate binary stream easily & effectively.
Now, file-type uses only one Tokenizer interface regardless of source. All public methods creates Tokenizer internally and pass created tokenizer to file-type matcher.

Beside these refactors, developers cannot easily extend file-type.
If someone needs to add new file type support, There are only two choices:

  1. Fork file-type package and customize
    Pros: It does not occur additional i/o if chunk was cached internally
    Cons: Less maintainability

  2. Run custom file-type matchers if file-type returns undefined.
    Pros: More maintainability
    Cons: It occurs additional I/O

As you can see, There are no effective way to extend file-type.

Idea

Expose Matcher interface

So, It would be nice if we support additional matchers.

Let's suppose there is a matcher like this:

interface MatchedType {
  ext: string;
  mime: string;
}

// write custom matcher
async function webpMatcher(tokenizer: ITokenizer): Promise<MatchedType | void>  {
  await tokenizer.ignore(8);
  const buf = Buffer.alloc(4);
  await tokenizer.peekBuffer(buf);
  if (buffer.compare(Buffer.from([0x57, 0x45, 0x42, 0x50])) === 0) {
    return { ext: "webp", mime: "image/webp" };
  }
}

Then, Provide custom matchers like these ways:

// quick & dirty - provide custom matchers as options
fileType.fromFile(path, { matchers: [webpMatcher] });

// or provide factroy like axios does
const fileType = FileType.create({ matchers: [webpMatcher] });

// or provide instance methods to add matchers to specific position (head or tail)
const fileType = FileType.create();
fileType.appendMatcher(webpMatcher);
fileType.prependMatcher(webpMatcher);

I'm not sure which way is cleanest. Please leave your feedback!

Future

With this idea, Developers can easily extend file-type.

From file-type's perspective, file-type can have multiple 3rd-party type matchers which can be released to npm independently, or managed via Github organization/Scoped packages (e.g. @file-type/core for
built-it matchers, @file-type/image for image file type matchers)

Also it can be used solving unsupported features like #264 (comment).
For example, Developers can write custom TSV/CSV type matchers and publish to npm even if file-type does not provide built-in TSV/CSV file type matchers.

@Borewit
Copy link
Collaborator

Borewit commented Mar 21, 2020

Interesting idea @mooyoul. The way I read it, you propose 2 things:

  1. Expand file-type with custom "matchers" (code recognize file signature)
  2. Split file-type engine from "matchers" (which requires 1)

Although modularity is a beautiful thing, it requires additional maintenance. Not necessary for 3rd party implementation (you could argue that those would save work), but at least splitting the engine with the core.

Although the intend of your example is clear to me, note that you should not read but peek to allow the next matcher to read from position 0.

(I will be very busy coming weeks, please forgive me not being responsive these days.)

@sindresorhus
Copy link
Owner

We would definitely like to make it possible to provide your own matchers. There are some complications though. For example, sometimes detections are order-dependent. How should that be handled?

I don't think modularizing is a good idea for file-type. I would prefer to keep as much in this package as possible.


I like this version the most:

// or provide factroy like axios does
const fileType = FileType.create({ matchers: [webpMatcher] });

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment