-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further modularize the pdf.js npm package #12900
Comments
I think the problem so far has been that most contributors work on the library itself and are not too familiar with the myriad of bundling tools. In the beginning we supported multiple bundling tools that contributors helped to implement, but this became complex quite soon, so we simplified this in order for it to be maintainable for us. I think we're all open to suggestions for improved bundling, but we rely on the community here to provide us with feedback and patches since this isn't something we work with often. We're happy to review any patches to improve the current situation, but we do want to focus on maintainability as well. |
Well, please note that this is still the main reason for this library to exist :-) As mentioned in #12900 (comment), over the last few years we've purposely removed a lot of overall complexity related to bundling since that's allowed quite a bit of clean-up/simplification of old code.
Note that having one unified API-surface significantly simplifies both maintenance and support, not to mention that by not providing direct access to "lower"-level code we can much more easily make changes and implement new features.
Note that this part in only relevant for Node.js environments, where Web Workers aren't available...
... for every other use-case, i.e. in browsers, note that you're not supposed to load the
Huh; but those are evaluated when the library is built, see While you can obviously always create your own build of the PDF.js library, where you remove e.g. the network-code that you don't need, I'm a bit worried that trying to officially support such a thing in our builds could quickly lead to a high maintenance/support-burden for the few regular PDF.js contributors. |
I'm going to close this issue since there are currently no actionable points in here. Hopefully it does shed some light on the current situation and in which areas we will review contributions. Thanks. |
Given that pdf.js was once built as a PDF rendering library for Firefox and bundled with the application, I understand that bundle size was never a top priority. On the other hand, pdf.js is still (and maybe more than ever) the first choice for developers who work on web applications that need to process or render PDF documents in some way. There is a myriad of questions on StackOverflow around how to bundle the huge pdfjs CommonJS modules with applications.
@robrez suggested to deploy at least tree-shakeable ES modules along with the CJS modules in 2017 (#10317), but the ticket has received little attention despite 33 thumb-ups. There is a module called @bundled-es-modules/pdfjs-dist on npm that claims to "bundle and expose" pdfjs as an ES module, but I think they're really only wrapping the pre-built pdf.js library (which is still only default-exporting and relying on that built-in
require()
), i.e. no tree-shaking advantage over the original library.I spent many hours browsing the source code of pdf.js, and my greatest concerns with regard to reducing the bundle size are:
PDFJSDev.*
configuration variablesAs an application developer, I probably don't need or want a PDF library to handle input abstraction (including different buffer formats, downloading from network streams, node.js integration, etc.) for me. I would probably prefer direct access to the underlying processor and provide my PDF data directly in its single, preferred buffer format. That, together with working tree-shaking, would probably already remove a big part from the resulting bundle.
The worker module resolution is a similar case. The published version of the library relies on a hardcoded (yet dynamic)
require()
which bundlers cannot replace with their own module resolution due to the sheer number of evaluated options. Again, I think this should better be handled outside the library, by the caller, who knows better about the (single, actual) environment.And lastly, the whole
PDFJSDev
variable system could contribute to smaller bundle sizes if only it would qualify for build-time evaluation and dead code removal. Maybe I am missing something, but I have a feeling that the configuration (which is currently anobject
and runtime-processed with helpers like.test()
and.eval()
on top) could just as well be replaced with something more common likeprocess.env
variables, which bundlers like webpack or rollup would then evaluate earlier, at build time, so that unused code could be identified and removed from the bundle.I think this whole topic deserves more attention than it is currently receiving. I would be willing to contribute a fair share, but it would need more volunteers, and contact to somebody who is more familiar with the code would surely speed things up.
The text was updated successfully, but these errors were encountered: