Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

column filtering and generating columns not properly supported #42

Open
lukaswagner opened this issue Feb 21, 2022 · 2 comments
Open

column filtering and generating columns not properly supported #42

lukaswagner opened this issue Feb 21, 2022 · 2 comments

Comments

@lukaswagner
Copy link
Owner

The current implementation does not allow filtering columns. On top of this, generated columns aren't working either, as the passed function can't be serialized for passing to a worker. To solve both issues, the following interface should be implemented:

type OriginalColumn = {
    /**
     * References a detected column.
     */
    index: number,
    /**
     * May override name.
     */
    name?: string,
    /**
     * May override data type.
     */
    type?: DataType,
    /**
     * Map values to same type. This may be used as a simpler alternative to
     * generated columns, e.g. for scaling or offsetting of values.
     */
    map?: (v: any) => any
}

type GeneratedColumn = {
    name: string,
    type: DataType,
    /**
     * In order to be passed to workers, this has to be a string.
     * The function must be of type (line: string[]) => any.
     */
    func: string,
    /**
     * Allow sharing a state between func invocations, as well as passing data
     * to func. Note that this has to be fully serializable.
     */
    state: Object // func.bind(state), can be used to pass state
}

type Column = OriginalColumn | GeneratedColumn;
@Floriszenz
Copy link
Collaborator

Does this proposed interface consider "merged columns", i.e. vector columns?
Currently, vector columns are not supported. But if we want to support them, it would be great to merge columns into a vector column using the open({ columns: /* ... */ }) method.
On the other hand, if the parser would support some auto-merging, e.g. merging columns x, y, and z into a vec3, it would be nice to optionally split the vector column into single columns.
I think this should be considered in the proposal to be more future proof.

@lukaswagner
Copy link
Owner Author

lukaswagner commented Feb 22, 2022

This should be solvable by extending index to number | number[]. You could then pass an array of numbers, signalling the parser to merge these columns. This would also require addition of DataType.tuple, as well as a way of specifying the type per merged column.

open({ columns: [{
    index: [0, 1, 2],
    name: 'position',
    type: DataType.Tuple,
    types: [DataType.Number, DataType.Number, DataType.Number],
}]});

Similarly, the parser may pass an array of numbers to when proposing an auto-merge. In order to still access the original column names when rejecting the merge, the parser could pass the names of the original columns:

// returned by open()
const merged = {
    index: [0, 1, 2],
    name: 'position',
    type: DataType.Tuple,
    names: ['x', 'y', 'z'],
    types: [DataType.Number, DataType.Number, DataType.Number],
}});
// split back up
const columns: merged.names.map((n, i) => {
    index: merged.index[i],
    name: n,
    type: merged.types[i],
});

These changes should intergrate nicely into the proposed interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants