Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WebAssembly interface option #11

Merged
merged 4 commits into from
Jun 14, 2022

Conversation

thomascwells
Copy link
Contributor

This PR adds an additional interface option that uses fortran code compiled into WebAssembly (.wasm) with a JavaScript (.js) interface. The wasm and js files are loaded using the V8 package, based on Chrome's V8 engine.

The wasm and js files were generated by @tmm1 in the repo tmm1/taxsim.js. These are also available on the NBER website demo, however the files available on github may represent more current versions.

The wasm interface is roughly 3x faster than ssh on my laptop (tested with input data of 5k and 50k rows). The html interface fails for input data of more than ~1000 rows.

R/calculate_taxes.R Outdated Show resolved Hide resolved
As suggested by @tmm1 to improve compatibility with js
@shanejorr
Copy link
Owner

Thanks for this @thomascwells and @tmm1, being able to do calculations locally is a huge plus!

I tested everything on my laptops and it seems great. I even rebuilt the pkgdown site using the 'wasm' option and all output seemed to work fine. I also sent it to win-builder using devtools::check_win_devel() and devtools::check_win_release(). No issues were spotted with it. That said, I feel comfortable it will pass CRAN.

I have a couple of question prior to merging and re-sending to CRAN:

  1. Should 'wasm' be the default method? It has the benefit that everything is done locally. This is faster and removes the whole connecting to the server issue. Any downsides that would prevent it from being the default?
  2. Now that we have the 'wasm' option, should we remove 'http'? I say this because http throws an error for large datasets. I know the NBER site says 1000, but I was getting errors somewhere between 2000 and 3000 (it worked at 2K, failed at 3K). If we did keep http, it seems to make sense to throw a custom error on any input data set over 2000 rows saying that you cannot use the http option with data sets larger than 2000 rows. This may have been worth it when we only had ssh, as at least users would have one other option if ssh failed. But, now that they have the wasm option, I'm fine taking my chances that either ssh or wasm will work. If the http issue gets fixed on the NBER end then it may be worth adding it back. Until then, I'm not sure we need it. But, I wanted to check and make sure I'm not missing something.

Also, once I merge I will update the documentation on the pkgdown site then resend to CRAN unless you think there will be more changes / additions in the short-term. But, I would like to get the wasm functionality on CRAN sooner rather than later.

@tmm1
Copy link
Contributor

tmm1 commented Jun 14, 2022

I think making wasm default is sensible. There are no caveats that I am aware of.

A possible advantage of ssh/http is that Dan can update those on-demand, whereas changes to wasm will require updating the vendored files and making a new release.

As far as http, I think it's worth keeping around. It's unclear why it's limited at the moment, but I plan to investigate further and hopefully it can be resolved quickly. There may be systems/platforms where v8 doesn't work and ssh is unavailable.

@thomascwells
Copy link
Contributor Author

... unless you think there will be more changes / additions in the short-term.

Maybe hold off for a day or two. Dan said he was going to work on possibly generating a version of the Fortran code that could be called directly from R (e.g. using the .Fortran() function). If that's the case, it could be ever better than wasm. I'll try to get in touch with him tomorrow.

There may be systems/platforms where v8 doesn't work and ssh is unavailable.

I also think that's possible. It's unlikely, but good enough reason to keep the http option around for now. Can always deprecate it later if needed.

A possible advantage of ssh/http is that Dan can update those on-demand, whereas changes to wasm will require updating the vendored files and making a new release.

I also see that as a possible advantage of wasm - it's more reproducible. Rewind to a previous version of the package and you will get whatever results were being produced at the time. Easier to revert if bugs are ever introduced, etc. I do like the idea of having "latest and greatest" always available on the web, but I also really like reproducibility.

@shanejorr
Copy link
Owner

I was not planning on resubmitting to CRAN until this weekend, so a few days should not be an issue. I will merge and make wasm the default. I will also also add an error or at a minimum warning about using http with large data sets. Of course, I will double double check the issue right before resubmitting to see if it has been resolved. Finally, I will get all the documentation updated and in good shape. Thanks again!

@shanejorr shanejorr merged commit 0ffcf93 into shanejorr:main Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants