Web assembly #60
Comments
That does sound interesting! Will do some tests soon. |
@gedw99 do you perhaps have an idea how to call the pdfium WASM from the go-pdfium WASM binary? |
I tried to build the Go example as WASM, but it has some weird behavior (it doesn't compile) |
Sorry for the lack of response. Came down with some flu. Will try this when back on my feet. Wazero should in theory be able to host the wasm. Wazero can be embedded in any Go program. They have many examples and there are many on GitHub. How the wasm was compiled is important, however: was it compiled to run in a browser or outside a browser? Also, I know Wazero has, or is working on, the ability to run both. You can also ask the Wazero team, they are really proactive. |
Line 12 looks like emscripten. https://github.com/bblanchon/pdfium-binaries/blob/master/steps/06-build.sh |
I understand, but when pdfium is compiled for WASM, and go-pdfium is compiled in WASM, that doesn't mean that the go-pdfium WASM can interface with the pdfium WASM, so that's the problem I'm trying to figure out right now. |
So it sounds like you're trying to run both in a browser? The answer to this I don’t know. I would ask the Wazero team. You raise a very good point and I have not delved into this. For server, desktop and mobile, though, here is an example that calls into code compiled to wasm with emscripten: https://github.com/tetratelabs/wazero/blob/866fac2e969c1d45ce2459355de88a6395202aae/emscripten/emscripten_example_test.go |
I'm not trying to run anything yet, I'm just trying to figure out how this would work in theory. pdfium compiled into WASM isn't how Go normally integrates with libraries, because that happens through cgo. In this case Go would need to know that it has to find the cgo implementations in the separate pdfium WASM binary. I'm just seeing how that would work. |
Everything you say I agree with. Basically, Wazero runs the wasm-compiled pdfium. That’s why I added the link to the example. I would start there. It’s not complicated. In fact, less complicated than cgo imho. Give it a try!! The Wazero team will help if they can. Just make a reproduction repo (or branch) for them. I am still sick in bed mate, so can’t give it a try myself. |
It does NOT need the CGO stuff, to answer your question. |
You're able to call functions in the wasm from Go by using Wazero. |
I understand what you're saying, but that means that I would have to rewrite every pdfium C++ implementation that I have right now to work with Wazero/pdfium WASM, it's not going to happen. If we can compile go-pdfium into WASM, and it could automatically call into the pdfium WASM, as if the pdfium WASM is the C++ library that go-pdfium would normally call into, then that would be perfect. |
True, you will have to rewrite the calls. There is no cgo involved. It all comes down to whether it’s worth it for you. Speed and throughput might be lower or higher. I would try one or two functions first and do a benchmark comparison using Go tests. |
I don't really see any advantage of that right now, but if you're willing to give it a try, that's fine with me. If we can have a solution that supports both cgo and WASM, that would be nice. The only advantage of a go-pdfium WASM version for me would be being able to run it in a place where you can't run the pdfium C++ library directly, the browser for example. |
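One way to picture "a solution that supports both cgo and WASM" is a single Go interface with interchangeable backends. The sketch below is purely illustrative: `Renderer`, `cgoBackend`, `wasmBackend` and `newRenderer` are hypothetical names, not go-pdfium's API, and the backends return placeholder strings instead of calling pdfium.

```go
package main

import "fmt"

// Renderer is a hypothetical interface that callers program against,
// regardless of how pdfium is reached underneath.
type Renderer interface {
	RenderPage(doc []byte, page int) (string, error)
}

// cgoBackend stands in for an implementation that would call pdfium via cgo.
type cgoBackend struct{}

func (cgoBackend) RenderPage(doc []byte, page int) (string, error) {
	return fmt.Sprintf("cgo: page %d of %d-byte document", page, len(doc)), nil
}

// wasmBackend stands in for an implementation that would call a
// wasm-compiled pdfium through a WebAssembly runtime.
type wasmBackend struct{}

func (wasmBackend) RenderPage(doc []byte, page int) (string, error) {
	return fmt.Sprintf("wasm: page %d of %d-byte document", page, len(doc)), nil
}

// newRenderer picks a backend by name, so calling code does not change
// when the backend does.
func newRenderer(kind string) (Renderer, error) {
	switch kind {
	case "cgo":
		return cgoBackend{}, nil
	case "wasm":
		return wasmBackend{}, nil
	default:
		return nil, fmt.Errorf("unknown backend %q", kind)
	}
}

func main() {
	for _, kind := range []string{"cgo", "wasm"} {
		r, err := newRenderer(kind)
		if err != nil {
			fmt.Println(err)
			continue
		}
		out, _ := r.RenderPage([]byte("%PDF-1.7"), 0)
		fmt.Println(out)
	}
}
```

The point of the interface is that existing callers keep the same method set while the backend is swapped out.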
I don’t have time. But I expect someone else will. Maybe tag this issue with something appropriate. |
All wasm runtimes are using the wit format as a DSL. A simplified way to think about it is that it replaces CGO etc. https://github.com/theduke/wasi-sql/blob/main/schema/sql_v1_alpha1.wit See: tetratelabs/wazero#662, it will code gen the DSL for you. |
Yeah, but I don't see the added value of that right now, why is that better than CGO? |
It should be much faster. There is no cgo, and wasm via em++ produces lean wasm. Deployment is easier because the desktop and mobile runtime linking mess is 100% bypassed: you just load the wasm that you embedded using normal Go embedding. It is easier to manage and implement because one version runs everywhere. Clearly way easier to maintain imho. |
Your threading use case should also be easier, and you gain security sandboxing in the cloud. Docker is not secure in this sense. |
Sounds good! Looking forward to your benchmarks! |
I don’t have time to work on this. |
Me neither 😆 |
Interesting thread. FYI, we've opened a gophers slack wazero channel for chatter as you need it. Also, we notice a lot of people struggle with wasm in general (including ourselves 😊) so started adding notes pages which may help a bit. https://wazero.io/languages/ |
ps on no CGO: there's also a cool devops win, in that you don't need to care about the OS or install shared libraries etc. https://gist.github.com/codefromthecrypt/edb33284354d592dc6056b9b7263872f |
@codefromthecrypt would it be possible to generate Go code that calls into the WASM automatically? So that you have an actual Go interface like in CGO, and not the |
@jerbob92 I think that's what @knqyf263 is trying to do with https://github.com/knqyf263/go-plugin |
@codefromthecrypt I think that's rather for Go programs compiled into WASM. In this case we're trying to call into the prebuilt PDFium WASM. |
Right, I guess the most common would be TinyGo. It can import functions from other wasm, as well as export its own: https://wazero.io/languages/tinygo/ |
I actually just found this: emscripten-core/emscripten#14459 |
Hey @jerbob92, maybe a different approach could also be taken. We could run pdfium (wasm) with Wazero and wrap Go calls into it via host functions or similar. tetratelabs/wazero#601 seems to indicate that Wazero does support emscripten-based wasm. You could probably also use this technique for the browser by running pdfium (wasm) inside a Web Worker, then using TinyGo to compile the Go wrapper functions and running them outside the Web Worker in the normal browser window. So the architectural topology for browser and server (with Wazero) is similar. Just an idea. I stumbled across this solution when playing around with another lib. It was not for the Emscripten aspect, but to enable quasi threading by using Workers (single thread) with a Controller that sends work to each Worker. A Bud that works for the Browser target and Wazero target is another thing I am planning to work on so that this architecture is easier to use. Curious what you and @codefromthecrypt think of this approach. It sidesteps the problem of many languages needing to compile together by isolating them using process barriers (not a great word to describe it, I know). |
@gedw99 pdfium compiled to wasm does not work currently, whatever compiler you use. Either the compilers need to be extended to support the things that pdfium needs, or those things need to be patched out of pdfium. So the problem right now is not specifically Wazero or Emscripten. |
Got it .. egg on face :) |
@ncruces wondering if you had to do some heavy lifting or not to get RethinkRaw working nicely in wasm form. If you happen to have a hobby or other interest in PDFium: it feels like this thread is getting stuck, and perhaps you have some advice based on your experiences. I'd hate to see folks end up in a cul-de-sac regardless of why. |
No, no significant patching necessary . But that's because For instance, I don't even bother exporting functions, I just call This is also because None of these considerations apply to |
It's quite clear what needs to happen to pdfium to make it work correctly: replace the allocator by one that does not use ASLR and probably also not virtual memory pages. Besides not knowing a lot of CPP, I also don't really have the time to put into it. But honestly, the biggest issue right now is that I don't have any idea if it is even going to perform so I'm quite hesitant to put in that time, even if I would have it. |
While it doesn't always work, I was surprised last year. How about adding #hacktoberfest topic to this repo and the same to this issue (possibly re-doing the title and description about the allocator change)? If I find someone with CPP background and some excuse I'll also divert them here. |
Exciting news! I just discovered that newer pdfium versions have a new build option I'm now able to:
I'm now stuck at loading the page with |
Hey @jerbob92, that's amazing stuff!! Don't know for sure, but HackPadFS might help with the file system aspects. It's designed for WASM Go and TinyGo. |
@gedw99 Pdfium isn't written in Go, it's compiled to WASM using Emscripten, so that package is not usable. |
An Emscripten maintainer told me that there are no current plans to create a WASI backend, and that it's probably best to make one ourselves based on one of the existing backends: https://github.com/emscripten-core/emscripten/tree/main/system/lib/wasmfs/backends Sadly my C++ isn't that good, so not sure if it's going to work out, but I'll try. |
Once you've done All it requires is that you define an That should be much easier than supporting the entire range of WASI syscalls. That's only for reading PDFs. Not sure if there's anything similar for writing them. |
I'm planning to implement all methods, just like in the cgo version. I already have FPDF_LoadCustomDocument implemented in the cgo version, but it might be problematic in the WASM version. Since Wazero already supports WASI, there isn't much for me to implement, just the translation layer for WasmFS in Emscripten. And we will need that anyway for the fonts. |
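The custom-document path mentioned above (`FPDF_LoadCustomDocument`) works by asking for a given number of bytes at a given position instead of receiving the whole file. A minimal sketch of how a Go `io.ReadSeeker` could serve such requests is shown below; `blockSource`, `GetBlock` and `newDemoSource` are hypothetical names for illustration, and the actual pdfium binding is left out entirely.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// blockSource is a hypothetical adapter that answers "give me len(buf)
// bytes at position pos" requests from a seekable Go reader.
type blockSource struct {
	r    io.ReadSeeker
	size int64 // total length, which the callback-based loader needs up front
}

// newBlockSource determines the total size by seeking to the end once.
func newBlockSource(r io.ReadSeeker) (*blockSource, error) {
	size, err := r.Seek(0, io.SeekEnd)
	if err != nil {
		return nil, err
	}
	return &blockSource{r: r, size: size}, nil
}

// GetBlock fills buf with the bytes starting at pos. It reports false when
// the requested range is outside the data or the read fails.
func (s *blockSource) GetBlock(pos int64, buf []byte) bool {
	if pos < 0 || pos+int64(len(buf)) > s.size {
		return false
	}
	if _, err := s.r.Seek(pos, io.SeekStart); err != nil {
		return false
	}
	_, err := io.ReadFull(s.r, buf)
	return err == nil
}

// newDemoSource wraps a small in-memory buffer, standing in for a real file.
func newDemoSource() *blockSource {
	s, err := newBlockSource(bytes.NewReader([]byte("%PDF-1.7 demo bytes")))
	if err != nil {
		panic(err)
	}
	return s
}

func main() {
	src := newDemoSource()
	header := make([]byte, 8)
	ok := src.GetBlock(0, header)
	fmt.Println(ok, string(header))
}
```

Only the requested ranges are read, which is what makes this cheaper than loading the complete file for large documents.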
So after a lot of changes to Emscripten and Wazero to make pdfium usable on non-web environments and with a lot of help from @codefromthecrypt, a fully working example could be made! I can successfully render pages now! Initial tests, rendering a fairly simple PDF into a 2000x2000 image: CGO: This was measured without the engine initialization and without the |
Wow, interesting and amazing work @jerbob92! I wonder if the wasm is faster on multiple runs? Might be a warm-up aspect. |
@gedw99 Tried that for you, secondary renders of the same document are indeed faster (I did close and re-open the file/page/bitmap in the loop), probably it's faster because it doesn't have to load the same fonts again. For the Webassembly version it takes off about 150ms, so then it becomes 650ms-700ms. I would have expected it to be somewhat slower, but not this much, kinda disappointed. |
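A comparison like this depends on timing each iteration separately, so the first (cold) render is not averaged together with the later (warm) ones. A minimal sketch of such a harness follows; `timeRuns` and `fakeRender` are hypothetical, and `fakeRender` only sleeps to imitate a one-time warm-up cost rather than rendering anything.

```go
package main

import (
	"fmt"
	"time"
)

// timeRuns calls render n times and records every duration separately,
// so the first run can be compared against the following ones.
func timeRuns(n int, render func() error) ([]time.Duration, error) {
	durations := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		if err := render(); err != nil {
			return durations, err
		}
		durations = append(durations, time.Since(start))
	}
	return durations, nil
}

func main() {
	warmed := false
	// fakeRender is a stand-in: the first call pays an extra one-time cost,
	// similar in spirit to loading fonts on the first render.
	fakeRender := func() error {
		if !warmed {
			time.Sleep(20 * time.Millisecond)
			warmed = true
		}
		time.Sleep(5 * time.Millisecond)
		return nil
	}

	d, err := timeRuns(3, fakeRender)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println("cold:", d[0], "warm:", d[1:])
}
```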
Ok... I feel a bit stupid, I had a thought this morning in the shower and it was true... I was using the Interpreter of Wazero (and not the Compiler), probably because the tracing only works with the Interpreter and I never switched it back. The Compiler has way way way better speed (from binary data, for some reason secondary renders on the same path are broken in the compiler version): Initial render: 35ms-50ms So, still not as good as the CGO version, but already a lot better. We might be able to make some extra improvements to get the speed up, but maybe @codefromthecrypt has some ideas on that. Probably not having to call the host for the invokes will already improve things. |
Shower thinking always helps :) I will have a play with it. Feels like it’s still too slow compared to CGO. |
Sorry to ask, but is there a makefile for this? Maybe a wired-up example too? I am also curious about multi-threading it in Wazero and the browser. For the browser it needs to be a web worker, with the main DOM window loading up 4 web workers (the typical number used). |
@gedw99 I have just updated https://github.com/jerbob92/go-pdfium-wasm to include a complete example. It also has some patches that should be applied to Wazero/Emscripten to make it work but you shouldn't have to worry about that, a compiled pdfium and patched Wazero is included. Multi-threading should be quite easy by calling This will allow you to do multiple operations concurrently. Be aware that pdfium itself is not multi-threaded though, so you can't do multiple operations on the same instance at the same time. |
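Because a single pdfium instance cannot run two operations at once, concurrency comes from having several instances and making sure each one is used by only one goroutine at a time. A minimal sketch of that idea with a channel-based pool is shown below; `instance`, `render` and `renderAll` are hypothetical stand-ins, not the API from the repository above.

```go
package main

import (
	"fmt"
	"sync"
)

// instance is a hypothetical stand-in for one isolated pdfium instance.
type instance struct{ id int }

// render pretends to render a page; a real instance would call into pdfium.
func (in *instance) render(page int) string {
	return fmt.Sprintf("page %d rendered by instance %d", page, in.id)
}

// renderAll renders pages concurrently while guaranteeing that an instance
// is never used by two goroutines at the same time. instances must be > 0.
func renderAll(instances int, pages []int) []string {
	// The buffered channel acts as the pool: receiving borrows an instance,
	// sending returns it.
	pool := make(chan *instance, instances)
	for i := 0; i < instances; i++ {
		pool <- &instance{id: i}
	}

	results := make([]string, len(pages))
	var wg sync.WaitGroup
	for i, page := range pages {
		wg.Add(1)
		go func(i, page int) {
			defer wg.Done()
			in := <-pool                  // blocks until an instance is free
			defer func() { pool <- in }() // hand the instance back
			results[i] = in.render(page)
		}(i, page)
	}
	wg.Wait()
	return results
}

func main() {
	for _, line := range renderAll(2, []int{0, 1, 2, 3}) {
		fmt.Println(line)
	}
}
```

Each goroutine writes to its own slot in `results`, so no extra locking is needed around the output slice.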
Thank you @jerbob92 will see how I go with it and let you know |
Hey @jerbob92 you might like this Been using it to make dev more streamlined |
I have started on the webassembly implementation here: #64 |
The functionality in #64 has been completed: all the methods that work in the CGO implementation now also work in the WebAssembly implementation. The release will be on Friday, after Wazero has released their v1.0.0. The latest benchmarks indicate that it's about 2x as slow as native, but here's the thing: it's also 2x as fast as the CGO multi-threaded go-plugin implementation, depending on what kind of operations you are doing. We are doing a lot of image rendering, which means a lot of data going back and forth that has to be encoded and decoded over gRPC, which takes a lot of time. The file data itself also goes over gRPC, whereas with the WebAssembly version you can load from a file path or Go reader directly and have it seek over the file, which is much more efficient than loading the complete file. So, all in all, a pretty good competitor for the CGO implementation. For the single-threaded direct CGO version, that is because of the sandboxing and because it won't segfault your program in case of CGO errors. For the multi-threaded CGO implementation it's a super-good competitor because it can do all the things that go-plugin couldn't (methods that require callbacks, like form filling, reading from a seekable reader, writing to a Go writer), and it has sandboxing while still being about twice as fast. |
That’s an amazing effort and I really appreciate the professional summary. will definitely be using this on the Open Science project … |
May I suggest a PR to add go-pdfium to: |
@ncruces Yes, was going to do that after release :) |
This has been released in v1.4.0 🥳 |
https://github.com/bblanchon/pdfium-binaries has a WebAssembly version.
Go is very capable of running WebAssembly. For example, Wazero can run wasm with no cgo.
Why?
One pdfium for all targets (web, desktop, server, etc.)
No cgo.
Easy to debug using Chrome. https://blog.noops.land/debugging-webAssembly-from-go-sources-in-chrome-devtools
Anyone interested in exploring this architecture?