Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(nodejs-polars): bindings for nodejs #1703

Merged
merged 90 commits into from Dec 20, 2021

Conversation

universalmind303
Copy link
Collaborator

This addresses #83

I wanted to see the level of difficulty with adding JS bindings to the project.

the neon-bindings allow zero copy interactions with the rust implementation. With some limitations of course. Anything being written to JS needs to be done via a copy, but that shouldnt be a big deal as most of the JS interface is just passing instructions to rust.

The neon-bindings package is much less developed than the python bindings so it will be some work to get the conversions to work. I tried to follow the patterns in the py-polars as much as possible.

Happy to start some discussions around this, or even some feedback on it. I end up using a lot of JS in my daily work, and would love to have something like this in the JS ecosystem.

@ritchie46
Copy link
Member

Hi @universalmind303 Thanks a lot for your PR. Doing so via FFI makes also a lot of sense. Could you make this PR alongside the already existing WASM example? I want to keep that as a POC.

Furthermore, could we as much as possible use the python polars API as reference SPEC? This way we can keep the names/ methods of the API similar between different languages.

@Brooooooklyn
Copy link
Contributor

@universalmind303 do you want to have a try with https://github.com/napi-rs/napi-rs ? It provide out of box solution for cross-compile and publish npm packages. Like SWC and many other packages

@universalmind303
Copy link
Collaborator Author

@universalmind303 do you want to have a try with https://github.com/napi-rs/napi-rs ? It provide out of box solution for cross-compile and publish npm packages. Like SWC and many other packages

good to know about this project. I do like that there is support for BigInt and &str . I'll give it a try and see how it goes!

@universalmind303
Copy link
Collaborator Author

@Brooooooklyn I was testing out the napi-rs package today, and things look good so far. I tried it out with a few methods that were difficult, or impossible to do using the neon apis. Specifically, methods that had to do in place mutations.

I was curious about one thing. The napi docs available mention using the wrap method to operate on classes and rust native objects. The neon package use external references, likely from the napi_create_external api.

both of these methods below behave similarly and perform the same tasks. Is there a reason the wrap is preferred in all of the docs? Is it because it gives you more direct bindings between rust and JS without having to create a client side wrapper for the implementation? Are there any major performance implications of using one over the other?. The create_external apis would be much simpler to refactor over from my existing implementation. I would be apt to using the wrap apis if there were compelling reasons to do so.

struct NativeClass {
  value: i32,
}


#[js_function(1)]
fn using_wrap(ctx: CallContext) -> JsResult<JsUndefined> {
  let count: i32 = ctx.get::<JsNumber>(0)?.try_into()?;
  let mut this: JsObject = ctx.this_unchecked();
  ctx
    .env
    .wrap(&mut this, NativeClass { value: count + 100 })?;
  this.set_named_property("count", ctx.env.create_int32(count)?)?;
  ctx.env.get_undefined()
}

#[js_function(1)]
fn add_wrapped(ctx: CallContext) -> JsResult<JsNumber> {
  let add: i32 = ctx.get::<JsNumber>(0)?.try_into()?;
  let this: JsObject = ctx.this_unchecked();
  let native_class: &mut NativeClass = ctx.env.unwrap(&this)?;
  native_class.value += add;
  ctx.env.create_int32(native_class.value)
}

#[js_function(1)]
fn using_external(ctx: CallContext) -> JsResult<JsExternal> {
  let count: i32 = ctx.get::<JsNumber>(0)?.try_into()?;
  let c = NativeClass { value: count + 100};
  ctx.env.create_external(c, None)
}

#[js_function(2)]
fn add_external(ctx: CallContext) -> JsResult<JsNumber> {
  let add: i32 = ctx.get::<JsNumber>(0)?.try_into()?;
  let ext = ctx.get::<JsExternal>(1)?;
  let native_class: &mut NativeClass = ctx.env.get_value_external(&ext)?;
  ctx.env.create_int32(native_class.value)
}


#[module_exports]
pub fn init(mut exports: JsObject, env: Env) -> Result<()> {
  let test_class = env
    .define_class("WrappedClass", using_wrap, &[
      Property::new(&env, "addWrapped")?.with_method(add_wrapped),
    ])?;
  exports.set_named_property("WrappedClass", test_class)?;

  exports.set_named_property("external", using_external)?;
  exports.set_named_property("addExternal", add_external)?;

  Ok(())
}
const rust_lib = require('./index.node')

const wrapped = new rust_lib.WrappedClass(100)
const external = rust_lib.external(100)

wrapped.addWrapped(100)
rust_lib.addExternal(100, external)

@Brooooooklyn
Copy link
Contributor

@universalmind303 There is no difference in performance between external and wrap. I prefer to use wrap in examples is just because it will reduce the codes in js side, and avoid create a new class which hold the external value.

@universalmind303
Copy link
Collaborator Author

universalmind303 commented Nov 19, 2021

@Brooooooklyn does napi-rs support setting symbols yet? There are a few symbols that I need to set on the object. I was looking through the code, and did not see any available methods for setting symbols.

Ideally it would be nice to do this in rust

const inspect = Symbol.for('nodejs.util.inspect.custom');
pl.Series.prototype[inspect] = function() {
  return this.get_fmt()
}

Something like this would be ideal

#[js_function()]
pub(crate) fn get_fmt(cx: CallContext) -> JsResult<JsString> {
  let this: JsObject = cx.this_unchecked();
  let series: &mut JsSeries = cx.env.unwrap(&this)?;
  let s = format!("{}", &series.series);
  cx.env.create_string(&s)
}

#[js_function(1)]
pub fn new_series(cx: CallContext) -> JsResult<JsUndefined> {
  let params = get_params(&cx)?;
  let name = params.get_as::<&str>("name")?;
  let items = params.get_as::<Vec<bool>>("values")?;
  let series: JsSeries = Series::new(name, items).into();
  let mut this: JsObject = cx.this_unchecked();
  let inspect = cx.env.create_symbol(Some("nodejs.util.inspect.custom"))?;
  this.set_property(inspect, get_fmt); // this wont work because set_property only takes in a JsString


  cx.env.wrap(&mut this, series)?;
  cx.env.get_undefined()
}

@Brooooooklyn
Copy link
Contributor

Symbol.for('nodejs.util.inspect.custom')

I'm afrait Node-API doesn't expose API for Symbol.for, and create_symbol(Some("nodejs.util.inspect.custom")) is not equivalent to Symbol.for

this.set_property(inspect, get_fmt); // this wont work because set_property only takes in a JsString

This is a bug, I've fixed it in 1.7.10.

If you want create Symbol.for in the rust side, you can call it through Global:

let global = ctx.env.get_global()?;
let symbol = global.get_named_property::<JsObject>("Symbol")?;
let symbol_for_fn = symbol.get_named_property::<JsFunction>("for")?;
let symbol_desc = ctx.env.create_string("nodejs.util.inspect.custom")?;
let inspect_symbol = symbol_for_fn.call(Some(symbol), &[symbol_desc ])?;

this.set_property(inspect_symbol , get_fmt); // this should work in the fixed version

@universalmind303 universalmind303 marked this pull request as ready for review December 16, 2021 22:50
@universalmind303 universalmind303 changed the title feat(js-polars): POC for js-polars bindings feat(nodejs-polars): bindings for nodejs Dec 16, 2021
@universalmind303
Copy link
Collaborator Author

universalmind303 commented Dec 17, 2021

here is a link to the working GH actions.

https://github.com/universalmind303/polars/actions/runs/1583566747

and a link to npm
https://www.npmjs.com/package/nodejs-polars

There will still need to be a bit of work on setting up releases & tagging. Right now all the npm stuff is pointed to nodejs-polars. We will probably want to change that to be polars.

This PR serves as a usable starting point, there are still a few items that need to be done before this is on par with the python and rust libraries.

  • Docs page
  • CI tests (tests are done, just need to do ci testing for multi platforms)
  • add more cross compile targets.
  • CI integration with main fork. (npm, release tagging, changelog, etc)
  • benchmarks
  • clean up the series rust interface.
  • better IO support
    • parquet
    • ipc
    • better reader support (read streams, node buffers)
  • toJS for Series and Dataframe are kinda slow.
    • The serde_json implementation for Series is performant, but it doesnt output the data into a dataframe like object {[name]: DataType[]} and instead output as Array<{name:string, type: string, values: any[]}> it also doesnt handle BigInt data types, so some precision could be lost on i64 & u64
  • downsample/pivot
  • apply/map
  • DataFrame proxy for row/column access ex: df[0,0]

Note:
It looks like napi-rs 2.x is almost out of beta, so it might make the most sense for some of these features to just wait until napi 2.x is stable.

@universalmind303
Copy link
Collaborator Author

some early benchmarking on comparing it against the native methods. As expected, it underperforms some native operations on smaller datasets. As the datasets get larger, the performance gains become more apparant.

// did this for 1k, 10k, 100k, and 1m
const ints1k = Array.from({length: 1000}, () => chance.integer());
const strings1k = Array.from({length: 1000}, () => chance.sentence());
const countries1k = Array.from({length: 1000}, () => chance.country());
const ints1kSeries= pl.Series("ints", ints1k);
const strings1kSeries = pl.Series("strings", strings1k);
const countries1kSeries = pl.Series("countries", countries1k);

// array
timeit("intsort", () => ints1k.sort());
timeit("intsum", () => ints1k.reduce((a, b)=> a + b));
timeit("stringsort", () => strings1k.sort());
timeit("valueCounts",  () => getValueCounts(ints1k));
timeit("distinct", () => new Set(strings1kDS));

// series
timeit("intsort", () => ints1kSeries.sort().toJS().values);
timeit("intsum", () => ints1kSeries.sum());
timeit("stringsort", () => strings1kSeries.sort().toJS().values);
timeit("valueCounts", () => countries1kSeries.valueCounts().toJS({orient:"col"}));
timeit("distinct", () => strings1kSeries.unique().toJS().values);

1k

operation array series
intsort 1.893ms 1.384ms
intsum 0.14ms 0.256ms
stringsort 0.566ms 1.286ms
valueCounts 0.463ms 24.512ms
distinct 7.302ms 2.054ms

10k

operation array series
intsort 19.927ms 6.12ms
intsum 1.157ms 0.682ms
stringsort 8.17ms 9.094ms
valueCounts 2.697ms 15.616ms
distinct 74.603m 15.151ms

100k

operation array series
intsort 342.164ms 46.001ms
intsum 27.854ms 0.218ms
stringsort 148.274ms 78.135ms
valueCounts 9.612ms 11.67ms
distinct 425.984ms 73.708ms

1m

operation array series
intsort 5.321s 325.294ms
intsum 36.627ms 0.905ms
stringsort 2.682s 1.041s
valueCounts 94.758ms 50.467ms
distinct 1.714s 1.414s

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man this looks awesome. It seems you have done a lot of work! Thanks a lot! 💯

}
]
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you run a formatter that adds a newline char to all the files?

@ritchie46
Copy link
Member

Looks good @universalmind303. The CI is failing, but after that it should be good to go.

@universalmind303
Copy link
Collaborator Author

Looks good @universalmind303. The CI is failing, but after that it should be good to go.

sounds good, i got the formatting fixed, and all tests are passing now.

@ritchie46 ritchie46 merged commit fb4dd2b into pola-rs:master Dec 20, 2021
@ritchie46
Copy link
Member

Thanks a lot @universalmind303! Truly a great deal of work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants