Reserve JavaScript objects and object arrays for R data.frame objects #401

Merged: georgestagg merged 8 commits into main from strict-data-frames on Apr 4, 2024


@georgestagg georgestagg commented Apr 3, 2024

Fixes #398.

Reserve JavaScript objects and object arrays for R data.frame objects when constructing R objects with the generic RObject constructor. Also adds the RDataFrame helper class constructor for constructing an R data.frame explicitly.

@lionel- How does this look? With these changes, using the generic constructor looks like:

// Creates `data.frame`
await new webR.RObject({ a: [1, 2, 3], b: ['x', 'y', 'z'] }); 

// Throws an error, inconsistent column length
await new webR.RObject({ a: [1, 2, 3], b: ['x', 'y'] });
// Uncaught Error: Can't construct `data.frame`. Source object is not eligible.

// Creates a list, syntax supports duplicate names
await new webR.RObject({
  type: 'list',
  names: ['a', 'b'],
  values: [[1, 2, 3], ['x', 'y']]
});

// Also creates other types of R objects as before, using heuristics
await new webR.RObject([1, 2, 3]); 

And we have specific RList and RDataFrame constructors:

// Creates a list, no `data.frame` class
await new webR.RList({ a: [1, 2, 3], b: ['x', 'y', 'z'] });

// Creates a list without error, but this JS form does not support duplicate names
await new webR.RList({ a: [1, 2, 3], b: ['x', 'y'] });

// Creates `data.frame`, as with `RObject`
await new webR.RDataFrame({ a: [1, 2, 3], b: ['x', 'y', 'z'] });
await new webR.RDataFrame([{a: 1, b: 'x'}, {a: 2, b: 'y'}, {a: 3, b: 'z'} ]);

// Throws an error, inconsistent columns
await new webR.RDataFrame([{a: 1, b: 'x'}, {a: 2, b: 'y'}, {a: 3, b: 'z', c: true} ]);
// Uncaught Error: Can't construct `data.frame`. Source object is not eligible.

// Unlike with the generic `RObject` constructor, this does not fall back to heuristics
await new webR.RDataFrame([1, 2, 3]); 
// Uncaught Error: Can't construct `data.frame`. Source object is not eligible.
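
The eligibility rules implied by the examples above can be sketched as a standalone check. This is only an illustration under stated assumptions, not webR's actual implementation: the function names (`isColumnwise`, `isRowwise`, `isDataFrameEligible`) are hypothetical, empty inputs are assumed to be ineligible, and cells are assumed to be limited to numbers, strings, booleans, and null.

```typescript
// Hypothetical sketch, not webR code. All names here are illustrative.
type Atom = number | string | boolean | null;

function isAtom(x: unknown): x is Atom {
  return (
    x === null ||
    typeof x === 'number' ||
    typeof x === 'string' ||
    typeof x === 'boolean'
  );
}

function isPlainObject(x: unknown): x is Record<string, unknown> {
  return typeof x === 'object' && x !== null && !Array.isArray(x);
}

// Column-wise form: { a: [1, 2, 3], b: ['x', 'y', 'z'] }
// Every property must be an array of atoms, and all arrays must share one length.
function isColumnwise(x: unknown): boolean {
  if (!isPlainObject(x)) return false;
  const keys = Object.keys(x);
  if (keys.length === 0) return false; // assumption: empty input is ineligible
  const first = x[keys[0]];
  if (!Array.isArray(first)) return false;
  const n = first.length;
  return keys.every((k) => {
    const col = x[k];
    return Array.isArray(col) && col.length === n && col.every(isAtom);
  });
}

// Row-wise form: [{ a: 1, b: 'x' }, { a: 2, b: 'y' }]
// Every row must be a plain object with the same set of keys as the first row.
function isRowwise(x: unknown): boolean {
  if (!Array.isArray(x) || x.length === 0) return false;
  if (!x.every(isPlainObject)) return false;
  const rows: Record<string, unknown>[] = x;
  const keys = Object.keys(rows[0]);
  return rows.every((row) => {
    return (
      Object.keys(row).length === keys.length &&
      keys.every(
        (k) => Object.prototype.hasOwnProperty.call(row, k) && isAtom(row[k])
      )
    );
  });
}

function isDataFrameEligible(x: unknown): boolean {
  return isColumnwise(x) || isRowwise(x);
}
```

Under this sketch, a strict constructor like `RDataFrame` would throw when the check fails, whereas a generic constructor could reserve plain objects for the `data.frame` path and keep its heuristics for other inputs such as `[1, 2, 3]`.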

This commit generalises the type union `RType | 'object'` used when
constructing new R objects. We create a new `RClass` type for `object`,
(corresponding to the generic `RObject` helper constructor).

This will become useful in a moment when we add a new `data.frame` helper
constructor. Rather than an ever growing type union of R class names,
we'll have a neater `RClass` type.
@georgestagg georgestagg force-pushed the strict-data-frames branch 2 times, most recently from 6282b07 to f3bb7d0 Compare April 4, 2024 07:52
export type RTypeNumber = typeof RTypeMap[RType];

/** @internal */
export type RCtor = 'object' | 'dataframe';

This comment is to explain the introduction of the RCtor type. This is an internal/private type, used only as an implementation detail for certain kinds of message sent over the communication channel.

Previously, there was just one "helper" constructor provided by webR: the RObject generic constructor. This case was signalled over the communication channel using the string 'object', and this was typed using a union of RType and the literal 'object'.

With this PR, there is a new "helper" constructor RDataFrame that similarly does not correspond directly to a fundamental R type. Rather than just extending the type union with multiple literals in multiple places, the type RCtor has been added as a DRY generalisation of this mechanism.

The string 'dataframe' indicates over the communication channel that we want to create an object using the RDataFrame constructor, 'object' indicates the RObject constructor, and we can expand this for future "helper" classes simply by adding more literals to this type.

(PS: I use 'dataframe' here rather than 'data.frame' so that the key does not need to be quoted when used as an object property elsewhere.)

The RType type is unchanged, and as before indicates that we want to construct some fundamental R type, e.g. 'list' indicates the RList constructor.
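
To make the mechanism concrete, here is a minimal sketch of how a receiver could distinguish helper constructors from fundamental types. Everything except the `RCtor` definition is an assumption made for illustration: the `RType` union is a trimmed stand-in, and `NewObjectMessage`, `helperCtors`, `isRCtor`, and `route` are hypothetical names rather than webR internals.

```typescript
// Trimmed stand-in for the real RType union, which has more members.
type RType = 'logical' | 'double' | 'character' | 'list';

// As introduced in this PR.
type RCtor = 'object' | 'dataframe';

// Hypothetical message payload; the field names are assumptions.
interface NewObjectMessage {
  objType: RType | RCtor;
  data: unknown;
}

// Because the literal is 'dataframe' rather than 'data.frame',
// it can be used as an unquoted property key.
const helperCtors: Record<RCtor, (data: unknown) => string> = {
  object: (data) => `RObject(${JSON.stringify(data)})`,
  dataframe: (data) => `RDataFrame(${JSON.stringify(data)})`,
};

function isRCtor(t: RType | RCtor): t is RCtor {
  return t === 'object' || t === 'dataframe';
}

// Returns a description of which constructor would handle the message.
function route(msg: NewObjectMessage): string {
  const t = msg.objType;
  return isRCtor(t) ? helperCtors[t](msg.data) : `fundamental R type: ${t}`;
}
```

Typing the lookup table as `Record<RCtor, ...>` means that adding a new literal to `RCtor` produces a compile error until a matching entry is provided, which is one way the single type keeps the call sites in sync.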

@georgestagg georgestagg requested a review from lionel- April 4, 2024 07:57

@lionel- lionel- left a comment

Looks great!

I just wonder if the RList constructor should have better support for unnamed lists and lists with duplicate names. Maybe

// Unnamed list
await new webR.RList([1, 2, 3]);

// Named list
await new webR.RList([[1, 2, 3], ["a", "b", "c"]]);

Or alternatively:

await new webR.RList({ values: [1, 2, 3], names: ["a", "b", "c"]});

I guess the latter is less ergonomic, though it's clearer and consistent with the typed RObject syntax.

georgestagg commented Apr 4, 2024

// Unnamed list
await new webR.RList([1, 2, 3]);

IIRC this one works already! Though, I should add a test for it.


Another possibility for named lists is the syntax returned by JavaScript's Object.entries(). For example, consider:

> Object.entries({ a: [1,2,3], b: [3,4,5] })
< [
    ["a", [1,2,3]],
    ["b", [3,4,5]],
  ]

So for your example, this would look like:

await new webR.RList([ ["a", 1], ["b", 2], ["c", 3] ]);

I think the problem with these in-band approaches is that they can be ambiguous. This could also mean "an unnamed list of atomic vectors", where the vector values are coerced into strings like with c().


Thinking about it, why not just use an extra argument with a default of null, meaning unnamed? When non-null, the first argument should be an array of the same length as the names array (I think TypeScript should be able to enforce this).

Then, for a named list of three elements we'd have:

await new webR.RList([1, 2, 3], ["a", "b", "c"]);

which trivially supports declaring duplicate names.

EDIT: I like the latter a lot, so I've added it. I might revisit the other options in a future PR.
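
As a rough sketch of how the length constraint might be expressed in TypeScript, a mapped type over the tuple of values can require a names tuple of matching length. This is an assumption about one possible typing, not the signature that was merged; `NamesFor` and `makeList` are hypothetical names and nothing here calls webR.

```typescript
// Hypothetical sketch, not the webR RList signature.
// Maps a tuple of values to a tuple of strings with the same length.
type NamesFor<T extends readonly unknown[]> = { [K in keyof T]: string };

// The `| []` in the constraint hints the compiler to infer a tuple type
// (fixed length) for array literals instead of a plain array type.
function makeList<T extends readonly unknown[] | []>(
  values: T,
  names: NamesFor<T> | null = null
): { values: T; names: string[] | null } {
  // Widen the mapped type so the body can treat it as an ordinary array.
  const plainNames = names as unknown as readonly string[] | null;
  // Runtime guard for callers that bypass the type checker,
  // such as plain JavaScript or non-literal arrays.
  if (plainNames !== null && plainNames.length !== values.length) {
    throw new Error('`names` must have the same length as `values`.');
  }
  return { values, names: plainNames === null ? null : plainNames.slice() };
}

// Unnamed list: names defaults to null.
const unnamed = makeList([1, 2, 3]);

// Named list, duplicate names allowed.
const named = makeList([1, 2, 3], ['a', 'b', 'a']);

// With array literals, a mismatch such as
//   makeList([1, 2, 3], ['a', 'b'])
// is expected to be rejected at compile time under this typing.
```

The compile-time check only applies when the lengths are statically known, so the runtime guard is still needed for arrays built dynamically.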

@georgestagg georgestagg merged commit 442e5ac into main Apr 4, 2024
3 checks passed
@georgestagg georgestagg deleted the strict-data-frames branch April 4, 2024 10:55
lionel- commented Apr 4, 2024

That sounds good, thanks!

Successfully merging this pull request may close these issues.

Tighten up constructing R data.frames from JS objects when using RObject constructor