Dataset

paulwilcox edited this page Jul 11, 2021 · 5 revisions


A dataset is a structure of values in tabular format, ultimately represented by an array of JavaScript objects, for which SQL-like querying operations are available.

Setup

Below, the fluent-data library is loaded into the variable $$ and three datasets are constructed with array arguments. These datasets are used by the examples on this page.

let $$ = require('./dist/fluent-data.server.js');

let students = $$([
    { id: 'a', name: 'Andrea',  topic: 'Abelard',   bias: 'analytic'    },
    { id: 'b', name: 'Brielle', topic: 'Bentham',   bias: 'buddhist'    }
]);

let teachers = $$([
    { id: 'b', name: 'Brielle', topic: 'bijection', school: 'Berkley'   },
    { id: 'c', name: 'Chloe',   topic: 'change',    school: 'Cambridge' }
]);

let purchases = $$([
    { customerId: 'b', books: 4, time: 16.68, price: 560, rating: 73 },
    { customerId: 'a', books: 1, time: 11.50, price:  80, rating: 95 },
    { customerId: 'a', books: 1, time: 12.03, price: 150, rating: 92 },
    { customerId: 'b', books: 2, time: 14.88, price: 220, rating: 88 },
    { customerId: 'a', books: 3, time: 13.75, price: 340, rating: 90 },
    { customerId: 'b', books: 4, time: 18.11, price: 330, rating: 66 },
    { customerId: 'a', books: 5, time: 21.09, price: 401, rating: 54 },
    { customerId: 'b', books: 5, time: 23.77, price: 589, rating: 31 }
]);

Construction

using fluent-data

A dataset will be produced by executing fluent-data as a function.

Parameters:

  • data: An iterable object (such as an array) whose elements are complex objects.

Example:

The example below creates a dataset by executing the fluent-data function. It utilizes the log method defined on datasets.

$$(purchases).log(null, '$$(purchases):');
$$(purchases):
┌────────────┬───────┬───────┬───────┬────────┐
│ customerId │ books │ time  │ price │ rating │
├────────────┼───────┼───────┼───────┼────────┤
│ b          │ 4     │ 16.68 │ 560   │ 73     │
│ a          │ 1     │ 11.5  │ 80    │ 95     │
│ a          │ 1     │ 12.03 │ 150   │ 92     │
│ b          │ 2     │ 14.88 │ 220   │ 88     │
│ a          │ 3     │ 13.75 │ 340   │ 90     │
│ b          │ 4     │ 18.11 │ 330   │ 66     │
│ a          │ 5     │ 21.09 │ 401   │ 54     │
│ b          │ 5     │ 23.77 │ 589   │ 31     │
└────────────┴───────┴───────┴───────┴────────┘
using fluent-data.dataset

A dataset will be produced by referencing fluent-data.dataset.

Parameters:

  • data: An iterable object (such as an array) whose elements are complex objects.

Example:

The example below creates a dataset by referencing fluent-data.dataset. It utilizes the log method defined on datasets.

new $$.dataset(purchases).log(null, '$$.dataset(purchases):');
$$.dataset(purchases):
┌────────────┬───────┬───────┬───────┬────────┐
│ customerId │ books │ time  │ price │ rating │
├────────────┼───────┼───────┼───────┼────────┤
│ b          │ 4     │ 16.68 │ 560   │ 73     │
│ a          │ 1     │ 11.5  │ 80    │ 95     │
│ a          │ 1     │ 12.03 │ 150   │ 92     │
│ b          │ 2     │ 14.88 │ 220   │ 88     │
│ a          │ 3     │ 13.75 │ 340   │ 90     │
│ b          │ 4     │ 18.11 │ 330   │ 66     │
│ a          │ 5     │ 21.09 │ 401   │ 54     │
│ b          │ 5     │ 23.77 │ 589   │ 31     │
└────────────┴───────┴───────┴───────┴────────┘

General Methods

distinct

Eliminates duplicate rows in a dataset.

The following parameters are available:

  • func: Optional. A function that takes a dataset row as input and produces a value used to determine equality between rows. If omitted, the full row is considered.
  • sorter: Optional. When distinct rows are equal under the func equality comparer, this parameter defines an order among them so that the first one can be chosen.

Below is an example of distinct used without a parameter. The dataset has fewer records because duplicates are removed.

purchases
    .map(p => ({
        customerId: p.customerId,
        books: p.books
    }))
    .distinct()
    .log();
┌────────────┬───────┐
│ customerId │ books │
├────────────┼───────┤
│ b          │ 4     │
│ a          │ 1     │
│ b          │ 2     │
│ a          │ 3     │
│ a          │ 5     │
│ b          │ 5     │
└────────────┴───────┘

And this is an example of distinct used with both the optional func and sorter parameters. Just as before, the number of rows is reduced because duplicates are removed. But this time the definition of what constitutes a duplicate does not involve the whole row. So for the row representing each distinct group, all columns (even ones not involved in distinct equality) survive.

The sorter helps control which row is selected to represent each distinct group.

purchases
    .distinct(
        p => p.customerId, 
        p => [p.customerId, -p.rating]
    )
    .log();
┌────────────┬───────┬───────┬───────┬────────┐
│ customerId │ books │ time  │ price │ rating │
├────────────┼───────┼───────┼───────┼────────┤
│ b          │ 2     │ 14.88 │ 220   │ 88     │
│ a          │ 1     │ 11.5  │ 80    │ 95     │
└────────────┴───────┴───────┴───────┴────────┘
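
The way func and sorter interact can be sketched in plain JavaScript, without the library. This is only an illustration of the semantics described above, not fluent-data's implementation: distinctBy and compareKeys are hypothetical helpers, equality is approximated by JSON serialization, and the order of the surviving rows may differ from what the library outputs.

```javascript
// Compare two sort keys; arrays are compared element by element (tier by tier).
function compareKeys(a, b) {
    if (Array.isArray(a) && Array.isArray(b)) {
        for (let i = 0; i < Math.min(a.length, b.length); i++) {
            const c = compareKeys(a[i], b[i]);
            if (c !== 0) return c;
        }
        return a.length - b.length;
    }
    return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper, not part of fluent-data.
function distinctBy(rows, func = row => row, sorter = null) {
    // Order a copy of the rows first so the preferred row of each group is met first.
    const ordered = sorter
        ? [...rows].sort((a, b) => compareKeys(sorter(a), sorter(b)))
        : rows;
    const seen = new Set();
    const result = [];
    for (const row of ordered) {
        const key = JSON.stringify(func(row)); // assumption: structural equality via serialization
        if (seen.has(key)) continue;           // a row with this key was already kept
        seen.add(key);
        result.push(row);
    }
    return result;
}

const sample = [
    { customerId: 'b', rating: 73 },
    { customerId: 'a', rating: 95 },
    { customerId: 'a', rating: 92 },
    { customerId: 'b', rating: 88 }
];

// Keep one row per customer, preferring the highest rating.
const best = distinctBy(sample, p => p.customerId, p => [p.customerId, -p.rating]);
console.log(best);
```

Negating the rating in the sorter puts the highest-rated row first within each customer, which is why that row is the one that survives.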
filter

Returns elements of a dataset that pass a boolean test.

The parameter func should be a function with a dataset row as input and a boolean value as output. A true value keeps the row in the final result set; a false value excludes it.

The example below filters the purchases dataset to only include purchases from customer 'a'.

purchases
    .filter(p => p.customerId == 'a')
    .log();
┌────────────┬───────┬───────┬───────┬────────┐
│ customerId │ books │ time  │ price │ rating │
├────────────┼───────┼───────┼───────┼────────┤
│ a          │ 1     │ 11.5  │ 80    │ 95     │
│ a          │ 1     │ 12.03 │ 150   │ 92     │
│ a          │ 3     │ 13.75 │ 340   │ 90     │
│ a          │ 5     │ 21.09 │ 401   │ 54     │
└────────────┴───────┴───────┴───────┴────────┘
map

Creates a new dataset produced by calling a function on every element of the input dataset.

The sole parameter expects a one-parameter function with a dataset row as input and a reshaped row as output.

The example below shows a complex use of the map function that demonstrates many of its flexible features.

purchases
    .map(p => ({
        ...p,                  // return all properties of 'p'
        speed: p.time,         // but also add a 'speed' property that copies 'time'
        time: undefined,       // then delete 'time' [.get() will omit undefined props] 
        rating: undefined,     // and delete 'rating',
        perBook: $$.round(     // and create a new property
            p.price / p.books, 
            1e-2
        )   
    }))
    .log(); 

The use of the spread operator, the fact that repeated properties will return the value listed last, and the fact that get() will not output undefined properties all combine to result in a mapping that outputs all properties, but with 'time' renamed to 'speed', with 'rating' deleted, and with 'perBook' added.

┌────────────┬───────┬───────┬───────┬─────────┐
│ customerId │ books │ price │ speed │ perBook │
├────────────┼───────┼───────┼───────┼─────────┤
│ b          │ 4     │ 560   │ 16.68 │ 140     │
│ a          │ 1     │ 80    │ 11.5  │ 80      │
│ a          │ 1     │ 150   │ 12.03 │ 150     │
│ b          │ 2     │ 220   │ 14.88 │ 110     │
│ a          │ 3     │ 340   │ 13.75 │ 113.33  │
│ b          │ 4     │ 330   │ 18.11 │ 82.5    │
│ a          │ 5     │ 401   │ 21.09 │ 80.2    │
│ b          │ 5     │ 589   │ 23.77 │ 117.8   │
└────────────┴───────┴───────┴───────┴─────────┘

These tactics aren't always warranted for small numbers of properties, but for more complex rows, they can result in a syntax more pleasing than SQL.
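
The language features behind this tactic can be seen on a single row in plain JavaScript. In the sketch below, dropUndefined is a hypothetical helper that stands in for the output step omitting undefined properties, and Math.round is used in place of $$.round; neither is claimed to be how the library does it.

```javascript
// Hypothetical helper: drop properties whose value is undefined.
const dropUndefined = row =>
    Object.fromEntries(Object.entries(row).filter(([, value]) => value !== undefined));

const purchase = { customerId: 'a', books: 3, time: 13.75, price: 340, rating: 90 };

const reshaped = dropUndefined({
    ...purchase,              // start from every property of the row
    speed: purchase.time,     // copy 'time' under a new name
    time: undefined,          // a later key overrides the spread value
    rating: undefined,        // same trick to delete 'rating'
    perBook: Math.round(purchase.price / purchase.books * 100) / 100
});

console.log(reshaped);
// { customerId: 'a', books: 3, price: 340, speed: 13.75, perBook: 113.33 }
```

Because an overridden key keeps its original position in the object, the surviving properties stay in their original order and the new ones are appended at the end.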

matrix

Converts a dataset to a matrix.

Parameters:

  • selector: Comma-separated string of property names, or a function with a dataset row as input and an array of numbers as output.
  • rowNames: A string naming a property of the dataset rows, or a function with a dataset row as input that returns the name of the given row as a string.

Example:

let fromCsv = 
    purchases
    .matrix('books, price, time', 'customerId')
    .log();

let fromFuncs = 
    purchases
    .matrix(p => [p.books, p.price, p.time], p => p.customerId)
    .log();
┌───┬───────┬───────┬───────┐
│   │ books │ price │ time  │
├───┼───────┼───────┼───────┤
│ b │ 4     │ 560   │ 16.68 │
│ a │ 1     │ 80    │ 11.5  │
│ a │ 1     │ 150   │ 12.03 │
│ b │ 2     │ 220   │ 14.88 │
│ a │ 3     │ 340   │ 13.75 │
│ b │ 4     │ 330   │ 18.11 │
│ a │ 5     │ 401   │ 21.09 │
│ b │ 5     │ 589   │ 23.77 │
└───┴───────┴───────┴───────┘
┌───┬────┬─────┬───────┐
│   │ c0 │ c1  │ c2    │
├───┼────┼─────┼───────┤
│ b │ 4  │ 560 │ 16.68 │
│ a │ 1  │ 80  │ 11.5  │
│ a │ 1  │ 150 │ 12.03 │
│ b │ 2  │ 220 │ 14.88 │
│ a │ 3  │ 340 │ 13.75 │
│ b │ 4  │ 330 │ 18.11 │
│ a │ 5  │ 401 │ 21.09 │
│ b │ 5  │ 589 │ 23.77 │
└───┴────┴─────┴───────┘
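
The difference between the two selector styles can be sketched in plain JavaScript. toMatrix below is a hypothetical helper that returns a plain object rather than the library's matrix type; it only illustrates why the string form keeps column names while the function form falls back to c0, c1, c2 as in the output above.

```javascript
// Hypothetical helper, not part of fluent-data.
function toMatrix(rows, selector, rowNames) {
    const fromString = typeof selector === 'string';
    const cols = fromString ? selector.split(',').map(s => s.trim()) : null;
    const pick = fromString ? row => cols.map(c => row[c]) : selector;
    const nameOf = typeof rowNames === 'string' ? row => row[rowNames] : rowNames;
    const data = rows.map(row => pick(row));
    return {
        // A function selector carries no names, so generate positional ones.
        colNames: cols || (data[0] || []).map((_, i) => 'c' + i),
        rowNames: rows.map(row => nameOf(row)),
        data
    };
}

const sampleRows = [
    { customerId: 'b', books: 4, time: 16.68, price: 560 },
    { customerId: 'a', books: 1, time: 11.50, price: 80 }
];

const csvStyle = toMatrix(sampleRows, 'books, price, time', 'customerId');
const funcStyle = toMatrix(sampleRows, p => [p.books, p.price, p.time], p => p.customerId);

console.log(csvStyle.colNames, funcStyle.colNames);
```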
sort

Orders the rows of a dataset.

The sorter function establishes the criteria by which to sort the rows. If sorter is a one-parameter function, then its result directly serves as the ordering criteria. If it is a two-parameter function, then it should return an integer. Assume the two parameters, in order, are a and b. If sorter returns a negative number, then effectively 'a comes before b'. If it returns 0, then 'a equals b', at least in terms of ordering. If it returns a positive number, then 'b comes before a'.

The example below sorts purchases the 'hard way'. In other words, it uses the two-parameter syntax that returns an integer.

purchases.sort((p,p2) => 
    p.customerId > p2.customerId ? 1
    : p.customerId < p2.customerId ? -1  
    : p.rating > p2.rating ? -1 
    : p.rating < p2.rating ? 1
    : 0
)
.log();
┌────────────┬───────┬───────┬───────┬────────┐
│ customerId │ books │ time  │ price │ rating │
├────────────┼───────┼───────┼───────┼────────┤
│ a          │ 1     │ 11.5  │ 80    │ 95     │
│ a          │ 1     │ 12.03 │ 150   │ 92     │
│ a          │ 3     │ 13.75 │ 340   │ 90     │
│ a          │ 5     │ 21.09 │ 401   │ 54     │
│ b          │ 2     │ 14.88 │ 220   │ 88     │
│ b          │ 4     │ 16.68 │ 560   │ 73     │
│ b          │ 4     │ 18.11 │ 330   │ 66     │
│ b          │ 5     │ 23.77 │ 589   │ 31     │
└────────────┴───────┴───────┴───────┴────────┘

But because sort also works with arrays, sorting in tiers based on the element positions, the same thing can be done more simply. Below, the elements are first sorted by customer in ascending order, and secondarily by rating in descending order. It is equivalent to the snippet above.

purchases
    .sort(p => [p.customerId, -p.rating])
    .log();
┌────────────┬───────┬───────┬───────┬────────┐
│ customerId │ books │ time  │ price │ rating │
├────────────┼───────┼───────┼───────┼────────┤
│ a          │ 1     │ 11.5  │ 80    │ 95     │
│ a          │ 1     │ 12.03 │ 150   │ 92     │
│ a          │ 3     │ 13.75 │ 340   │ 90     │
│ a          │ 5     │ 21.09 │ 401   │ 54     │
│ b          │ 2     │ 14.88 │ 220   │ 88     │
│ b          │ 4     │ 16.68 │ 560   │ 73     │
│ b          │ 4     │ 18.11 │ 330   │ 66     │
│ b          │ 5     │ 23.77 │ 589   │ 31     │
└────────────┴───────┴───────┴───────┴────────┘
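
One way to see why the two forms are equivalent is to build the two-parameter comparator from the one-parameter key function. The sketch below does this in plain JavaScript; comparatorFrom is a hypothetical helper and is not claimed to be how fluent-data converts sorters internally.

```javascript
// Hypothetical helper: turn a one-parameter key function into a
// two-parameter comparator that returns a negative, zero, or positive number.
function comparatorFrom(keyFunc) {
    const compare = (a, b) => {
        if (Array.isArray(a) && Array.isArray(b)) {
            for (let i = 0; i < Math.min(a.length, b.length); i++) {
                const c = compare(a[i], b[i]);
                if (c !== 0) return c; // the first differing tier decides
            }
            return a.length - b.length;
        }
        return a < b ? -1 : a > b ? 1 : 0;
    };
    return (rowA, rowB) => compare(keyFunc(rowA), keyFunc(rowB));
}

const unsorted = [
    { customerId: 'b', rating: 73 },
    { customerId: 'a', rating: 54 },
    { customerId: 'a', rating: 95 },
    { customerId: 'b', rating: 88 }
];

// Customer ascending, then rating descending (hence the negation).
const sorted = [...unsorted].sort(comparatorFrom(p => [p.customerId, -p.rating]));
console.log(sorted.map(p => p.customerId + p.rating).join(','));
// a95,a54,b88,b73
```

Negation only works for numeric tiers; a descending string tier would still need the two-parameter form.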

Group-Related Methods

A dataset can have a nested structure. Although most methods are built around this feature of a dataset, the methods in this section particularly highlight the nested structure.

apply

Applies a function to every base-level grouping of a dataset.

The sole parameter, tableLevelFunc, is a function that expects a dataset-like object as input (i.e. an iterable that produces a dataset row on each iteration) and returns a dataset-like object as output. Alternatively, it can be made async and yield objects as rows.

The apply() method is not recommended for direct use. However, most methods defined on dataset use it under the hood. If it is ever used directly, it is likely in order to extend dataset and write your own method that can operate on grouped data.

This example extends dataset and uses apply in order to create a custom method that converts every row's value to its type description.

class myDataset extends $$.dataset {

    typeOfs () {
        function* tableLevelFunc (data) {
            for(let row of data) {
                for(let key of Object.keys(row))
                    row[key] = typeof row[key];
                yield row;
            } 
        };
        this.apply(tableLevelFunc);
        return this;
    }

}

new myDataset(purchases)
    .group(p => p.customerId)
    .typeOfs()
    .log();
┌────────────────────────────────────────────────────┐
│ key: "b"                                           │
│ ┌────────────┬────────┬────────┬────────┬────────┐ │
│ │ customerId │ books  │ time   │ price  │ rating │ │
│ ├────────────┼────────┼────────┼────────┼────────┤ │
│ │ string     │ number │ number │ number │ number │ │
│ │ string     │ number │ number │ number │ number │ │
│ │ string     │ number │ number │ number │ number │ │
│ │ string     │ number │ number │ number │ number │ │
│ └────────────┴────────┴────────┴────────┴────────┘ │
│ key: "a"                                           │
│ ┌────────────┬────────┬────────┬────────┬────────┐ │
│ │ customerId │ books  │ time   │ price  │ rating │ │
│ ├────────────┼────────┼────────┼────────┼────────┤ │
│ │ string     │ number │ number │ number │ number │ │
│ │ string     │ number │ number │ number │ number │ │
│ │ string     │ number │ number │ number │ number │ │
│ │ string     │ number │ number │ number │ number │ │
│ └────────────┴────────┴────────┴────────┴────────┘ │
└────────────────────────────────────────────────────┘
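
The idea of applying a table-level function to every base-level grouping can be sketched without the library by letting nested arrays stand in for grouped data. This is an assumption made for illustration only: applyToTables is a hypothetical helper and the library's internal representation of groups may differ.

```javascript
// Hypothetical helper: a level counts as a "table" when none of its
// elements is an array; otherwise it is a list of groups to recurse into.
function applyToTables(data, tableLevelFunc) {
    const isTable = !data.some(Array.isArray);
    return isTable
        ? [...tableLevelFunc(data)] // materialize whatever iterable comes back
        : data.map(group => applyToTables(group, tableLevelFunc));
}

// Table-level generator: replaces each value with its type description.
// Unlike the method above, it yields new rows instead of mutating the input.
function* describeTypes(table) {
    for (const row of table) {
        const described = {};
        for (const key of Object.keys(row)) described[key] = typeof row[key];
        yield described;
    }
}

const groupedRows = [
    [{ customerId: 'b', books: 4 }, { customerId: 'b', books: 2 }],
    [{ customerId: 'a', books: 1 }]
];

const types = applyToTables(groupedRows, describeTypes);
console.log(JSON.stringify(types));
```

The grouping shape is preserved: the function only ever sees one innermost table at a time.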
group

Gathers the rows of a dataset into separate nestings based on a criterion.

The sole parameter expects a function that has a dataset row as input and returns a value to be used as the criteria on which to group the rows. The value can be complex, such as an array or object.

An important feature of a dataset is that most methods applied to grouped datasets operate inside of each grouping.

The example below creates a special 'flag' property out of certain thresholds stemming from 'rating'. The mapped rows are then grouped by customerId and by the flag. Finally, the rows in each group are filtered to output only rows with a rating greater than 50.

purchases
    .map(p => ({
        ...p,
        flag: p.rating < 60 ? 'bad' : p.rating < 90 ? 'okay' : 'good'
    }))
    .group(p => [p.customerId, p.flag])
    .filter(p => p.rating > 50)
    .log();
┌────────────────────────────────────────────────────────┐
│ key: ["b","okay"]                                      │
│ ┌────────────┬───────┬───────┬───────┬────────┬──────┐ │
│ │ customerId │ books │ time  │ price │ rating │ flag │ │
│ ├────────────┼───────┼───────┼───────┼────────┼──────┤ │
│ │ b          │ 4     │ 16.68 │ 560   │ 73     │ okay │ │
│ │ b          │ 2     │ 14.88 │ 220   │ 88     │ okay │ │
│ │ b          │ 4     │ 18.11 │ 330   │ 66     │ okay │ │
│ └────────────┴───────┴───────┴───────┴────────┴──────┘ │
│ key: ["a","good"]                                      │
│ ┌────────────┬───────┬───────┬───────┬────────┬──────┐ │
│ │ customerId │ books │ time  │ price │ rating │ flag │ │
│ ├────────────┼───────┼───────┼───────┼────────┼──────┤ │
│ │ a          │ 1     │ 11.5  │ 80    │ 95     │ good │ │
│ │ a          │ 1     │ 12.03 │ 150   │ 92     │ good │ │
│ │ a          │ 3     │ 13.75 │ 340   │ 90     │ good │ │
│ └────────────┴───────┴───────┴───────┴────────┴──────┘ │
│ key: ["a","bad"]                                       │
│ ┌────────────┬───────┬───────┬───────┬────────┬──────┐ │
│ │ customerId │ books │ time  │ price │ rating │ flag │ │
│ ├────────────┼───────┼───────┼───────┼────────┼──────┤ │
│ │ a          │ 5     │ 21.09 │ 401   │ 54     │ bad  │ │
│ └────────────┴───────┴───────┴───────┴────────┴──────┘ │
│ key: ["b","bad"]                                       │
│ ┌──┐                                                   │
│ │  │                                                   │
│ └──┘                                                   │
└────────────────────────────────────────────────────────┘
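
Grouping by a complex key, and then filtering inside each group, can be sketched in plain JavaScript. groupBy below is a hypothetical helper that serializes the key with JSON.stringify so that arrays can act as keys; this is an assumption for the sketch, not a statement about how fluent-data compares keys.

```javascript
// Hypothetical helper: gather rows into a Map keyed by the serialized criterion.
function groupBy(rows, keyFunc) {
    const groups = new Map();
    for (const row of rows) {
        const key = JSON.stringify(keyFunc(row)); // lets arrays and objects act as keys
        if (!groups.has(key)) groups.set(key, []);
        groups.get(key).push(row);
    }
    return groups;
}

const flagged = [
    { customerId: 'b', rating: 73 },
    { customerId: 'a', rating: 95 },
    { customerId: 'a', rating: 54 },
    { customerId: 'b', rating: 31 }
].map(p => ({ ...p, flag: p.rating < 60 ? 'bad' : p.rating < 90 ? 'okay' : 'good' }));

const byCustomerAndFlag = groupBy(flagged, p => [p.customerId, p.flag]);

// Filtering inside each group keeps the group itself, even if it ends up empty.
const filteredGroups = new Map(
    [...byCustomerAndFlag].map(([key, rows]) => [key, rows.filter(p => p.rating > 50)])
);

for (const [key, rows] of filteredGroups) console.log(key, rows.length);
```

As in the output above, the ["b","bad"] group survives the filter as an empty group rather than disappearing.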
ungroup

Rolls back the lowest level of grouping and flattens rows in sibling groups into one set of data.

The optional mapper parameter allows a user to pass a mapping function to be applied to each row simultaneous with the ungrouping process.

If grouping is already at the top level, it is possible to apply ungroup and output a naked object, provided that the original dataset only had one row.

In the code below, purchases are grouped, then filtered, then ungrouped.

purchases
    .group(p => p.customerId)
    .filter(p => p.rating >= 55)
    .ungroup()
    .log();
┌────────────┬───────┬───────┬───────┬────────┐
│ customerId │ books │ time  │ price │ rating │
├────────────┼───────┼───────┼───────┼────────┤
│ a          │ 1     │ 11.5  │ 80    │ 95     │
│ a          │ 1     │ 12.03 │ 150   │ 92     │
│ a          │ 3     │ 13.75 │ 340   │ 90     │
│ b          │ 4     │ 16.68 │ 560   │ 73     │
│ b          │ 2     │ 14.88 │ 220   │ 88     │
│ b          │ 4     │ 18.11 │ 330   │ 66     │
└────────────┴───────┴───────┴───────┴────────┘
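
With nested arrays standing in for groups (an assumption for illustration), removing one level of grouping amounts to a flat-map, and the optional mapper can be applied in the same pass. ungroupOnce is a hypothetical helper, not the library's method.

```javascript
// Hypothetical helper: flatten sibling groups into one list of rows,
// applying the optional mapper to each row along the way.
function ungroupOnce(groups, mapper = row => row) {
    return groups.flatMap(group => group.map(row => mapper(row)));
}

const groupedPurchases = [
    [{ customerId: 'a', rating: 95 }, { customerId: 'a', rating: 54 }],
    [{ customerId: 'b', rating: 73 }, { customerId: 'b', rating: 31 }]
];

// Filter inside each group, then flatten.
const kept = groupedPurchases.map(group => group.filter(p => p.rating >= 55));
const flat = ungroupOnce(kept);
const labels = ungroupOnce(kept, p => `${p.customerId}:${p.rating}`);

console.log(flat, labels);
```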

Merge-Based Methods

Methods in this section ultimately wrap the merge() method. It is where the various 'join' and 'exists' methods reside.

exists

A row in a 'left' dataset is output only if it has a 'match' in the 'right' dataset.

Parameters:

  • rightData: Dataset, required. The dataset to combine with the 'leftData' (the dataset from which the method is called).
  • matcher: Function, required. The logic on which to determine equality, indicating whether two records are a 'match' or not. Can also be the string '=', which
  • mergeOptions: Optional, object. Allows other configurations of the merge process. For a list of recognized properties that can be passed, see the description of the 'options' parameter under the merge method.

Example:

students
    .exists(teachers, (s,t) => s.id == t.id)
    .log();
┌────┬─────────┬─────────┬──────────┐
│ id │ name    │ topic   │ bias     │
├────┼─────────┼─────────┼──────────┤
│ b  │ Brielle │ Bentham │ buddhist │
└────┴─────────┴─────────┴──────────┘
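
In SQL terms this behaves like a semi-join: the right dataset only decides which left rows survive, and contributes no properties. A plain-JavaScript sketch of that behavior follows; existsIn is a hypothetical helper using a simple nested scan, not the library's merge machinery.

```javascript
// Hypothetical helper: keep each left row that has at least one match on the right.
function existsIn(leftRows, rightRows, matcher) {
    return leftRows.filter(l => rightRows.some(r => matcher(l, r)));
}

const studentRows = [
    { id: 'a', name: 'Andrea',  topic: 'Abelard' },
    { id: 'b', name: 'Brielle', topic: 'Bentham' }
];
const teacherRows = [
    { id: 'b', name: 'Brielle', topic: 'bijection' },
    { id: 'c', name: 'Chloe',   topic: 'change' }
];

const matched = existsIn(studentRows, teacherRows, (s, t) => s.id == t.id);
console.log(matched);
```

Note that the surviving row keeps the student's own topic, as in the output above.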
join

Merges two datasets such that when rows between them 'match', a new row is output that combines both their properties (on name collision the right property wins out).
Unmatched rows are omitted from the results.

Parameters:

  • rightData: Dataset, required. The dataset to combine with the 'leftData' (the dataset from which the method is called).
  • matcher: Function, required. The logic on which to determine equality, indicating whether two records are a 'match' or not. Can also be the string '=', which
  • mergeOptions: Optional, object. Allows other configurations of the merge process. For a list of recognized properties that can be passed, see the description of the 'options' parameter under the merge method.

Example:

students
    .join(teachers, (s,t) => s.id == t.id)
    .log();
┌────┬─────────┬───────────┬──────────┬─────────┐
│ id │ name    │ topic     │ bias     │ school  │
├────┼─────────┼───────────┼──────────┼─────────┤
│ b  │ Brielle │ bijection │ buddhist │ Berkley │
└────┴─────────┴───────────┴──────────┴─────────┘
joinLeft

Merges two datasets such that when rows between them 'match', a new row is output that combines both their properties (on name collision the right property wins out).
Unmatched rows from the left dataset are output as-is, and unmatched rows from the right dataset are omitted from the results.

Parameters:

  • rightData: Dataset, required. The dataset to combine with the 'leftData' (the dataset from which the method is called).
  • matcher: Function, required. The logic on which to determine equality, indicating whether two records are a 'match' or not. Can also be the string '=', which
  • mergeOptions: Optional, object. Allows other configurations of the merge process. For a list of recognized properties that can be passed, see the description of the 'options' parameter under the merge method.

Example:

students
    .joinLeft(teachers, (s,t) => s.id == t.id)
    .log();
┌────┬─────────┬───────────┬──────────┬─────────┐
│ id │ name    │ topic     │ bias     │ school  │
├────┼─────────┼───────────┼──────────┼─────────┤
│ a  │ Andrea  │ Abelard   │ analytic │         │
│ b  │ Brielle │ bijection │ buddhist │ Berkley │
└────┴─────────┴───────────┴──────────┴─────────┘
joinRight

Merges two datasets such that when rows between them 'match', a new row is output that combines both their properties (on name collision the right property wins out).
Unmatched rows from the left dataset are omitted from the results, and unmatched rows from the right dataset are output as-is.

Parameters:

  • rightData: Dataset, required. The dataset to combine with the 'leftData' (the dataset from which the method is called).
  • matcher: Function, required. The logic on which to determine equality, indicating whether two records are a 'match' or not. Can also be the string '=', which
  • mergeOptions: Optional, object. Allows other configurations of the merge process. For a list of recognized properties that can be passed, see the description of the 'options' parameter under the merge method.

Example:

students
    .joinRight(teachers, (s,t) => s.id == t.id)
    .log();
┌────┬─────────┬───────────┬──────────┬───────────┐
│ id │ name    │ topic     │ bias     │ school    │
├────┼─────────┼───────────┼──────────┼───────────┤
│ b  │ Brielle │ bijection │ buddhist │ Berkley   │
│ c  │ Chloe   │ change    │          │ Cambridge │
└────┴─────────┴───────────┴──────────┴───────────┘
joinFull

Merges two datasets such that when rows between them 'match', a new row is output that combines both their properties (on name collision the right property wins out).
Unmatched rows from either dataset are output as-is.

Parameters:

  • rightData: Dataset, required. The dataset to combine with the 'leftData' (the dataset from which the method is called).
  • matcher: Function, required. The logic on which to determine equality, indicating whether two records are a 'match' or not. Can also be the string '=', which
  • mergeOptions: Optional, object. Allows other configurations of the merge process. For a list of recognized properties that can be passed, see the description of the 'options' parameter under the merge method.

Example:

students
    .joinFull(teachers, (s,t) => s.id == t.id)
    .log();
┌────┬─────────┬───────────┬──────────┬───────────┐
│ id │ name    │ topic     │ bias     │ school    │
├────┼─────────┼───────────┼──────────┼───────────┤
│ a  │ Andrea  │ Abelard   │ analytic │           │
│ b  │ Brielle │ bijection │ buddhist │ Berkley   │
│ c  │ Chloe   │ change    │          │ Cambridge │
└────┴─────────┴───────────┴──────────┴───────────┘
merge

Combines the results of two datasets into a single dataset.

This method expects the following parameters:

  • rightData: Dataset, required. The dataset to combine with the 'leftData' (the dataset from which the method is called).
  • matcher: Function, required. The logic on which to determine equality, indicating whether two records are a 'match' or not. Can also be the string '=', which
  • mapper: Function, required. The logic of what to output when there is a match vs not. Expects two parameters, the 'left row' and the 'right row'. If there is a match, then both are defined. If a match is not found, then the left or the right parameter will be populated, depending on which dataset is sending the row.
  • options: Optional, object. Allows other configurations of the merge process.

If the 'options' parameter is set, then the following properties are recognized:

  • singular: Boolean. Sets leftSingular and rightSingular to this value, assuming they're not set already.
  • leftSingular: Boolean. If true, only distinct rows from the left dataset will be considered. Equality to determine distinction is what is passed to matcher. So it will not be full object equality (unless that is what's passed to matcher).
  • rightSingular: Boolean. The right-dataset counterpart to leftSingular.
  • hasher: Function. If set, leftHasher and rightHasher are also set to this value, assuming they're not set already.
  • leftHasher: Function. If set, the output of this function is compared with the outputs of rightHasher to determine equality between left and right dataset rows using a hashing algorithm. It can work without matcher to be the final word on equality, or it can work together with a matcher to first narrow down nearly equal objects into buckets, and then to make the final determination with the matcher.
  • rightHasher: Function. The right-dataset counterpart to leftHasher.
  • algo: String. The algorithm to use. Can be 'hash' or 'loop'.

All methods in this section wrap this method. In other words, this is the core implementation of all merges. But it can be fairly complex to use directly, hence the existence of the other methods.
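To make the roles of the hashers and the mapper more concrete, the following is a minimal plain-JavaScript sketch of a hash-style merge. It is not fluent-data's implementation: the name hashMerge, the order of the output rows, and the rule that an undefined mapper result drops the row are all assumptions made for illustration.

```javascript
// Hypothetical helper, not part of fluent-data: a simplified hash merge.
function hashMerge(left, right, leftHasher, rightHasher, mapper) {
    // Bucket the right rows by hash so that each left row only meets
    // its candidates instead of scanning the whole right side.
    const buckets = new Map();
    for (const r of right) {
        const key = rightHasher(r);
        if (!buckets.has(key)) buckets.set(key, []);
        buckets.get(key).push(r);
    }

    const output = [];
    const matchedKeys = new Set();

    for (const l of left) {
        const key = leftHasher(l);
        const candidates = buckets.get(key);
        if (candidates) {
            matchedKeys.add(key);
            for (const r of candidates) output.push(mapper(l, r));
        } else {
            output.push(mapper(l, undefined)); // unmatched left row
        }
    }

    // Unmatched right rows reach the mapper with the left side undefined.
    for (const [key, rows] of buckets) {
        if (matchedKeys.has(key)) continue;
        for (const r of rows) output.push(mapper(undefined, r));
    }

    // Assumption: a mapper drops a row by returning undefined.
    return output.filter(row => row !== undefined);
}

const people = [{ id: 'a', name: 'Andrea' }, { id: 'c', name: 'Chloe' }];
const orders = [
    { customerId: 'a', price: 80 },
    { customerId: 'a', price: 150 },
    { customerId: 'b', price: 560 }
];

// A 'left join': keep every person, drop orders that match nobody.
const joined = hashMerge(
    people, orders,
    p => p.id, o => o.customerId,
    (p, o) => (p && o) ? { ...p, ...o } : p
);
console.log(joined);
```

Changing only the mapper changes the kind of join: returning p || o in the unmatched case would keep the unmatched order as well.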

Below, two merges are performed. The purpose of the first is to effectively 'stack' the students and teachers to create a dataset of all people. If a person is both a student and a teacher, the mapper logic chooses the student record to survive. If a person is only one of the two, the mapper outputs that record regardless. The purpose of the second merge is to effectively 'left join' people to purchases. The final output is mapped to select a subset of columns, simply for cleaner visualisation.

students  
    .merge(
        teachers,               
        (s,t) => s.id == t.id, // seek to match records based on the 'id' property
        (s,t) => 
            (s&&t)      // check if s and t both exist (if a match was found for the rows)
            ? s         // if so, ignore t, just output s
            : (s||t)    // if not, output whichever exists
    )
    .merge(
        purchases,
        null,
        (st,p) =>
            (st&&p)             // check if st and p both exist (a match was found)
            ? { ...st, ...p }   // if so, output an object combining the properties of both
            : st,               // if not, only output st, ignore unmatched p's.  
        {
            leftHasher: st => st.id,            // seek to match based on the 'id' property ...
            rightHasher: p => p.customerId      // .. against the customerId property
        }
    )
    .map(stp => ({ 
        name: stp.name, 
        topic: stp.topic, 
        school: stp.school, 
        books: stp.books, 
        price: stp.price 
    }))
    .log();
┌─────────┬─────────┬───────┬───────┬───────────┐
│ name    │ topic   │ books │ price │ school    │
├─────────┼─────────┼───────┼───────┼───────────┤
│ Andrea  │ Abelard │ 1     │ 80    │           │
│ Andrea  │ Abelard │ 1     │ 150   │           │
│ Andrea  │ Abelard │ 3     │ 340   │           │
│ Andrea  │ Abelard │ 5     │ 401   │           │
│ Brielle │ Bentham │ 4     │ 560   │           │
│ Brielle │ Bentham │ 2     │ 220   │           │
│ Brielle │ Bentham │ 4     │ 330   │           │
│ Brielle │ Bentham │ 5     │ 589   │           │
│ Chloe   │ change  │       │       │ Cambridge │
└─────────┴─────────┴───────┴───────┴───────────┘
notExists

A row in a 'left' dataset is output only if it has no 'match' in the 'right' dataset.

Parameters:

  • rightData: Dataset, required. The dataset to combine with the 'leftData' (the dataset from which the method is called).
  • matcher: Function, required. The logic on which to determine equality, indicating whether two records are a 'match' or not. Can also be the string '=', which
  • mergeOptions: Optional, object. Allows other configurations of the merge process. For a list of recognized properties that can be passed, see the description of the 'options' parameter under the merge method.

Example:

students
    .notExists(teachers, (s,t) => s.id == t.id)
    .log();
┌────┬────────┬─────────┬──────────┐
│ id │ name   │ topic   │ bias     │
├────┼────────┼─────────┼──────────┤
│ a  │ Andrea │ Abelard │ analytic │
└────┴────────┴─────────┴──────────┘
notExistsFull

Returns all 'unmatched' rows between two datasets, regardless of which dataset the row comes from.

Parameters:

  • rightData: Dataset, required. The dataset to combine with the 'leftData' (the dataset from which the method is called).
  • matcher: Function, required. The logic on which to determine equality, indicating whether two records are a 'match' or not. Can also be the string '=', which
  • mergeOptions: Optional, object. Allows other configurations of the merge process. For a list of recognized properties that can be passed, see the description of the 'options' parameter under the merge method.

Example:

students
    .notExistsFull(teachers, (s,t) => s.id == t.id)
    .log();
┌────┬────────┬─────────┬──────────┬───────────┐
│ id │ name   │ topic   │ bias     │ school    │
├────┼────────┼─────────┼──────────┼───────────┤
│ a  │ Andrea │ Abelard │ analytic │           │
│ c  │ Chloe  │ change  │          │ Cambridge │
└────┴────────┴─────────┴──────────┴───────────┘
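
The semantics of notExists and notExistsFull can be approximated with plain arrays. The sketch below is only an illustration under that assumption: antiJoin and antiJoinFull are hypothetical names, and the nested scan they use says nothing about how fluent-data performs the merge internally.

```javascript
// Hypothetical helpers, not fluent-data's implementation.

// Rows of `left` that have no match in `right`.
function antiJoin(left, right, matcher) {
    return left.filter(l => !right.some(r => matcher(l, r)));
}

// Unmatched rows from both sides, left rows first.
function antiJoinFull(left, right, matcher) {
    return [
        ...antiJoin(left, right, matcher),
        // Swap the roles, but keep calling the matcher as (left, right).
        ...antiJoin(right, left, (r, l) => matcher(l, r))
    ];
}

const studentRows = [{ id: 'a', name: 'Andrea' }, { id: 'b', name: 'Brielle' }];
const teacherRows = [{ id: 'b', name: 'Brielle' }, { id: 'c', name: 'Chloe' }];
const sameId = (s, t) => s.id == t.id;

const onlyStudents = antiJoin(studentRows, teacherRows, sameId);
const unmatched = antiJoinFull(studentRows, teacherRows, sameId);

console.log(onlyStudents); // Andrea only
console.log(unmatched);    // Andrea and Chloe
```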

Reduce Based Methods

reduce

reduce accumulates dataset rows to return an aggregated result. It expects an object as a parameter, whose properties have reducers as values.

A reducer is an aggregating function. If a reducer has only one parameter, that parameter is treated as the array of row objects to aggregate (so the function can iterate it, for instance). If a reducer has two parameters, it works similarly to Array.reduce: the first parameter is the accumulator and the second is a row. If the two-parameter function has a 'seed' property defined, its value is used as the seed (otherwise the seed is 0). Conveniently, you can also set the seed by defining a string property of the same name with '.seed' suffixed.
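
The difference between the two reducer shapes can be sketched in plain JavaScript. This is an illustration of the rules described above, not fluent-data's code: applyReducer is a hypothetical name, and dispatching on the function's declared parameter count is an assumption about how such a rule could be implemented.

```javascript
// Hypothetical sketch: apply a reducer according to its parameter count.
function applyReducer(reducer, rows) {
    if (reducer.length === 1) {
        // One parameter: hand over the whole array of rows.
        return reducer(rows);
    }
    // Two parameters: fold row by row, starting from the seed (default 0).
    const seed = reducer.seed === undefined ? 0 : reducer.seed;
    return rows.reduce((acc, row) => reducer(acc, row), seed);
}

const sampleRows = [{ time: 11.5 }, { time: 12.03 }, { time: 13.75 }];

const timeMin = data => Math.min(...data.map(row => row.time));
const timeSum = (acc, row) => acc + row.time;
const timeSumSeeded = (acc, row) => acc + row.time;
timeSumSeeded.seed = -10;

const minResult = applyReducer(timeMin, sampleRows);
const sumResult = applyReducer(timeSum, sampleRows);
const seededResult = applyReducer(timeSumSeeded, sampleRows);

console.log(minResult, sumResult, seededResult);
```

The seeded sum starts at -10 instead of 0, so it ends up exactly 10 below the unseeded sum (up to floating-point rounding).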

Examples of built-in reducers are first(), avg(), and cor(). See the examples below for their usage.

Reducing grouped data:

The example below groups purchases by customer and then applies the first, avg, and cor reducers. It also demonstrates the use of a seeded two-parameter reducer with the logic in-line, as well as a one-parameter in-line reducer.

purchases
    .group(p => p.customerId) 
    .reduce(({
        customer: $$.first(p => p.customerId), 
        time: $$.avg(p => p.time),
        rating: $$.avg(p => p.rating),
        correlation: $$.cor(p => [p.time, p.rating]),
        timeSum: (acc,next) => acc + next.time,
        ['timeSum.seed']: -10, // eliminate some common time 
        timeMin: (data) => Math.min(...data.map(row => row.time))
    }))
    .log(null,null,1e-8);

This results in an ungrouped array with one row of aggregated results per customer.

┌──────────┬─────────┬────────┬─────────────┬─────────┬─────────┐
│ customer │ time    │ rating │ correlation │ timeSum │ timeMin │
├──────────┼─────────┼────────┼─────────────┼─────────┼─────────┤
│ a        │ 14.5925 │ 82.75  │ -0.99187759 │ 48.37   │ 11.5    │
│ b        │ 18.36   │ 64.5   │ -0.99795664 │ 63.44   │ 14.88   │
└──────────┴─────────┴────────┴─────────────┴─────────┴─────────┘

Reducing ungrouped data:

The following example aggregates purchases. This time, it does not first group by customer.

purchases
    .reduce({
        time: $$.avg(p => p.time),
        rating: $$.avg(p => p.rating),
        correlation: $$.cor(p => [p.time, p.rating])
    })
    .log(); 

The result becomes an object, as opposed to an array:

{
  time: 16.47625,
  rating: 73.625,
  correlation: -0.9821574166144001
}

Keeping the Group Level the Same:

In general, one level of grouping is lost on each use of reduce. If this is not desired, reduce has a second parameter: ungroup. This defaults to true, but can be set to false.

The example below is the same as the last, except that it prevents ungrouping:

purchases
    .reduce({
        time: $$.avg(p => p.time),
        rating: $$.avg(p => p.rating),
        correlation: $$.cor(p => [p.time, p.rating])
    }, false)
    .log(); 

This time, the result is still a one-item array:

┌──────────┬────────┬─────────────────────┐
│ time     │ rating │ correlation         │
├──────────┼────────┼─────────────────────┤
│ 16.47625 │ 73.625 │ -0.9821574166144001 │
└──────────┴────────┴─────────────────────┘

Reducing to Primitive Values:

The return value of a reduction need not be a dataset of objects having properties.
It can instead return a dataset of primitive values, or a single primitive value.
To do this, pass a single reducer that is not wrapped in an object.

The example below averages the time property for all purchases.

purchases
    .reduce($$.avg(p => p.time))
    .log();

The result is a number:

16.47625

Reduce Syntax Limitation:

reduce will fail if any property of the object passed to it has a value that is not a reducer. In the example below, either of the properties would cause a failure, because each applies arithmetic to reducers and the result is no longer a reducer.

purchases
    .reduce({
        time: $$.avg(p => p.time) + 10,
        rating: $$.sum(p => p.rating) / $$.count(p => p.rating)
    })
    .get(); 
// This would fail
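
One possible way around this limitation, assuming the one-parameter reducer form described earlier accepts any function of the row array, is to do the arithmetic inside a single function. The sketch below only exercises such functions directly on a plain array; the names mean, timePlusTen and ratingMean are hypothetical, and passing the object to reduce has not been verified here.

```javascript
// Plain one-parameter functions: each receives the full array of rows,
// so any arithmetic can happen inside the function body.
const mean = values => values.reduce((a, b) => a + b, 0) / values.length;

const reducers = {
    timePlusTen: data => mean(data.map(p => p.time)) + 10,
    ratingMean: data => data.reduce((sum, p) => sum + p.rating, 0) / data.length
};

// Stand-in rows, used only to call the functions directly.
const standInRows = [
    { time: 11.5, rating: 95 },
    { time: 12.5, rating: 92 },
    { time: 15.0, rating: 89 }
];

console.log(reducers.timePlusTen(standInRows)); // 23
console.log(reducers.ratingMean(standInRows));  // 92
```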
window

For each row of a dataset, identifies a subdataset of related rows, and appends columns representing the subdataset aggregations.

The parameter is an object with one or more of the following properties:

  • reduce: An object with reducers as properties (see 'reduce'). This is the only required property.
  • group: A function expecting a dataset row as a parameter. The output determines how rows are to be grouped to identify subdataset boundaries.
  • sort: A function expecting a dataset row as a parameter. The output determines how subdataset rows are sorted.
  • filter: A function expecting a dataset row as its first parameter, and returning a boolean value.
  • scroll: A function expecting two integer parameters, and returning a boolean value. The first input parameter (currentIx) is the index of the current row processed in a loop. The second input parameter (compareIx) is the index of a comparison row processed in a nested loop of the subdataset. Using this parameter will impact performance due to the nested looping.

Note that use of window does not preserve the ordering of the resultset.

The example below appends several columns to the dataset: a row count (n), a time summation (timeSum), a leading rating (rating0), a running row count (nRun), and a running time summation (tRun). The totals are grouped by customerId, meaning each pertains to that customer and no other. They are based on a sort by time, and rows with a price below 100 are excluded from the analysis. The scroll property in the second call ensures that, for each row, only the subdataset of rows up to and including the current row is considered for aggregation. This is what produces the 'run' in the 'running' totals.

purchases
    .window({ // the 'standard' windowed totals
        group: p => p.customerId,
        sort: p => p.time,
        filter: p => p.price >= 100,
        reduce: {   
            n: $$.count(p => p),
            timeSum: (accum,p) => accum + p.time,
            rating0: $$.first(p => p.rating)
        }
    })
    .window({ // the 'running' totals
        group: p => p.customerId,
        sort: p => p.time,
        filter: p => p.price >= 100,
        scroll: (currentIx,compareIx) => currentIx >= compareIx,
        reduce: {   
            nRun: $$.count(p => p),
            tRun: (accum,p) => accum + p.time
        }
    })
    .log(null, null, 1e-8);
┌────────────┬───────┬───────┬───────┬────────┬───┬─────────┬─────────┬──────┬───────┐
│ customerId │ books │ time  │ price │ rating │ n │ timeSum │ rating0 │ nRun │ tRun  │
├────────────┼───────┼───────┼───────┼────────┼───┼─────────┼─────────┼──────┼───────┤
│ b          │ 2     │ 14.88 │ 220   │ 88     │ 4 │ 73.44   │ 88      │ 1    │ 14.88 │
│ b          │ 4     │ 16.68 │ 560   │ 73     │ 4 │ 73.44   │ 88      │ 2    │ 31.56 │
│ b          │ 4     │ 18.11 │ 330   │ 66     │ 4 │ 73.44   │ 88      │ 3    │ 49.67 │
│ b          │ 5     │ 23.77 │ 589   │ 31     │ 4 │ 73.44   │ 88      │ 4    │ 73.44 │
│ a          │ 1     │ 12.03 │ 150   │ 92     │ 3 │ 46.87   │ 92      │ 1    │ 12.03 │
│ a          │ 3     │ 13.75 │ 340   │ 90     │ 3 │ 46.87   │ 92      │ 2    │ 25.78 │
│ a          │ 5     │ 21.09 │ 401   │ 54     │ 3 │ 46.87   │ 92      │ 3    │ 46.87 │
│ a          │ 1     │ 11.5  │ 80    │ 95     │   │         │         │      │       │
└────────────┴───────┴───────┴───────┴────────┴───┴─────────┴─────────┴──────┴───────┘
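
The effect of the scroll predicate can be sketched without the library. The code below is a simplified illustration, not fluent-data's implementation: runningWindow is a hypothetical name, the rows are assumed to be already grouped, filtered and sorted, and the reducers are plain one-parameter functions.

```javascript
// Hypothetical sketch of a 'scroll' predicate.
// For each row, aggregate only the rows the predicate admits.
function runningWindow(sortedRows, scroll, reducers) {
    return sortedRows.map((row, currentIx) => {
        // Nested pass over the subdataset: this is where the extra cost comes from.
        const visible = sortedRows.filter(
            (_, compareIx) => scroll(currentIx, compareIx)
        );
        const extras = {};
        for (const [name, reducer] of Object.entries(reducers)) {
            extras[name] = reducer(visible);
        }
        return { ...row, ...extras };
    });
}

// Customer b's purchases from the table above, already sorted by time.
const bRows = [
    { time: 14.88 }, { time: 16.68 }, { time: 18.11 }, { time: 23.77 }
];

const withRunning = runningWindow(
    bRows,
    (currentIx, compareIx) => currentIx >= compareIx, // rows up to the current one
    {
        nRun: rows => rows.length,
        tRun: rows => rows.reduce((sum, r) => sum + r.time, 0)
    }
);
console.log(withRunning);
```

Up to floating-point rounding, the nRun and tRun values reproduce the figures shown for customer b. Replacing the predicate with () => true would give every row the full-group totals instead.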

Statistical Methods

dimReduce

Takes a number of numeric fields and reduces them to a smaller number of dimensions.

At a future date, this method may toggle between various implementations of principal components or factor analysis. The term 'dimReduce' was chosen as an abstract term that can encompass either of these areas. Also, please note that the testing related to this method differs from the tests of other methods: it tests broader properties of the result, as opposed to exact figures. This is because the algorithms involved are sensitive, and their results can easily change with changes to the implementation. This is true not just for this method in particular but for this area of mathematics in general. That's not to say there isn't room for improvement here, of course. That will come with time.

Parameters:

  • csvSelector: A string of the field names that the user desires to reduce in dimension, separated by commas. Note that the goal for the future is that the user can alternatively pass in a function returning an object.
  • options: Optional. An object with properties that aid in the configuration of the analysis.

'Options' has the following properties available:

  • eigenArgs: Arguments passed to the eigen method when producing the eigenvalues. Default is { valueThreshold: 1e-12, vectorThreshold: 1e-4, testThreshold: 1e-3 }.
  • maxDims: The maximum number of dimensions to extract. Default is null.
  • minEigenVal: The minimum value an eigenvalue must have to be extracted. Default is 1.
  • rotationMaxIterations: The maximum number of times to iterate when rotating. Default is 1000.
  • rotationAngleThreshold: The change in angle below which rotation is considered complete. Default is 1e-8.
  • attachData: A boolean indicating whether to return data and append factor scores to that data. Default is false. If true, a new property, 'data', is included in the output.

Return object:

  • correlations: The correlation matrix of the original dimensions.
  • eigenValues: An array of the eigenvalues produced from the correlation matrix.
  • unrotated: An object with properties of the unrotated dimensions.
  • rotated: An object with properties of the rotated dimensions.
  • data: If attachData = true, then the original data, with the new dimension scores appended. Dimension scores are named 'dim#' where # is the dimension number of the score. Existing columns of the same name will be overwritten. If attachData = false, then this is undefined.
  • log: A method that outputs the analysis details in friendly format. There are three parameters that work the same way as dataset.log() and matrix.log()

'Unrotated' and 'Rotated' outputs have the following structure:

  • loadings: A matrix of the extracted dimension loadings on each original dimension
  • communalities: A matrix of the extracted dimension communalities for each original dimension
  • sums: A matrix of any relevant sums. Presently has the sum of the communalities.
  • sumSqs: A matrix of the sum of squared loadings for each extracted dimension
  • props: A matrix of the proportion of explained variance for each extracted dimension
  • log: A method that outputs the loading details in friendly format

The example below demonstrates a dimension reduction of the numeric properties of the purchases dataset. The 'minEigenVal' was set to a low value simply to produce two dimensions in the output. Likely you will not choose such a low value in real life.

purchases.dimReduce(
    'books, time, price, rating', 
    { minEigenVal: 0.25 } // just for the sake of presenting more than one
)
.log(null, null, 1e-4);
For guidance on how to query dimResults, call "dimResults.help" on the fluent-data object, or see the github wiki for this project

rotated:
╔══════════╤═════════╤═════════╦═════════════╤═════════════╗
║          ┊ dim0    ┊ dim1    ║ communality ┊ specificVar ║
╠══════════╪═════════╪═════════╬═════════════╪═════════════╣
║ books    ┊ 0.9716  ┊ 0.0243  ║ 0.9445      ┊ 0.0555      ║
║ time     ┊ 0.9584  ┊ -0.2758 ║ 0.9946      ┊ 0.0054      ║
║ price    ┊ 0.9359  ┊ 0.3281  ║ 0.9835      ┊ 0.0165      ║
║ rating   ┊ -0.9396 ┊ 0.3146  ║ 0.9818      ┊ 0.0182      ║
╠══════════╪═════════╪═════════╬═════════════╪═════════════╣
║ sums     ┊         ┊         ║ 3.9044      ┊ 0.0956      ║
║ sumSqs   ┊ 3.6212  ┊ 0.2832  ║             ┊             ║
║ propVars ┊ 0.9275  ┊ 0.0725  ║             ┊             ║
╚══════════╧═════════╧═════════╩═════════════╧═════════════╝

unrotated:
╔══════════╤═════════╤═════════╦═════════════╤═════════════╗
║          ┊ dim0    ┊ dim1    ║ communality ┊ specificVar ║
╠══════════╪═════════╪═════════╬═════════════╪═════════════╣
║ books    ┊ 0.9677  ┊ 0.0905  ║ 0.9445      ┊ 0.0555      ║
║ time     ┊ 0.975   ┊ -0.2098 ║ 0.9946      ┊ 0.0054      ║
║ price    ┊ 0.9113  ┊ 0.3911  ║ 0.9835      ┊ 0.0165      ║
║ rating   ┊ -0.9588 ┊ 0.2498  ║ 0.9818      ┊ 0.0182      ║
╠══════════╪═════════╪═════════╬═════════════╪═════════════╣
║ sums     ┊         ┊         ║ 3.9044      ┊ 0.0956      ║
║ sumSqs   ┊ 3.6369  ┊ 0.2676  ║             ┊             ║
║ propVars ┊ 0.9315  ┊ 0.0685  ║             ┊             ║
╚══════════╧═════════╧═════════╩═════════════╧═════════════╝

eigenValues: [ 3.63687607982, 0.26756319121, 0.08629955021, 0.00926117876 ]

correlations:
┌────────┬────────┬─────────┬─────────┬─────────┐
│        │ books  │ time    │ price   │ rating  │
├────────┼────────┼─────────┼─────────┼─────────┤
│ books  │ 1      │ 0.9245  │ 0.887   │ -0.878  │
│ time   │ 0.9245 │ 1       │ 0.8061  │ -0.9822 │
│ price  │ 0.887  │ 0.8061  │ 1       │ -0.7913 │
│ rating │ -0.878 │ -0.9822 │ -0.7913 │ 1       │
└────────┴────────┴─────────┴─────────┴─────────┘
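
The effect of minEigenVal on the number of extracted dimensions can be checked by hand from the eigenvalues printed above. The sketch below is plain JavaScript and makes two assumptions: that an eigenvalue is kept when it is at or above the threshold, and that the proportion of variance is each kept eigenvalue divided by the sum of the kept eigenvalues. The helper name selectDims is hypothetical.

```javascript
// Eigenvalues as printed in the output above.
const eigenValues = [3.63687607982, 0.26756319121, 0.08629955021, 0.00926117876];

// Hypothetical helper: keep eigenvalues at or above the threshold,
// optionally capped at maxDims.
function selectDims(values, minEigenVal = 1, maxDims = null) {
    const kept = values.filter(v => v >= minEigenVal);
    return maxDims === null ? kept : kept.slice(0, maxDims);
}

const keptDefault = selectDims(eigenValues);     // default threshold of 1
const kept = selectDims(eigenValues, 0.25);      // threshold used in the example
const total = kept.reduce((a, b) => a + b, 0);
const propVars = kept.map(v => (v / total).toFixed(4));

console.log(keptDefault.length); // 1
console.log(kept.length);        // 2
console.log(propVars);           // [ '0.9315', '0.0685' ]
```

Under these assumptions the proportions agree with the propVars row of the unrotated table, and the default threshold of 1 would have extracted a single dimension.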
regress

Returns an analysis of multivariate regression between variables.

Parameters:

  • ivSelector: Determines which columns are the independent variables. Use a comma-separated string of property names.
  • dvSelector: Determines which column is the dependent variable. Can be the name of the column.
  • options: An object with properties that configure the analysis and output.

The options parameter recognizes the following properties:

  • attachData: boolean, default = false. If true, the 'data' property returned in the output will include actual, estimated, and residual values when applying the regression model to each row.
  • ci: number, default = undefined. This parameter, if set, should be the quantile desired for the confidence interval around each regression coefficient. It will return a two-item array representing the lower and upper bounds, respectively. If not set, then the ci property for each coefficient will instead be a function that expects a quantile as input and will output such an array.

Return Object:

regress returns an object with the following properties:

  • coefficients: A dataset containing properties of the regression coefficients. There is one row for each coefficient. Each row is an object with the following properties: name, value, stdErr, t, df, pVal, ci.
  • model: an object with the following properties: rSquared, rSquaredAdj, F, pVal. If attachData = true, then also breuchPagan and breuchPaganPval.
  • data: if attachData = true, then this is the original data with estimates appended. The estimate properties are 'estimate', 'actual', and 'residual'. Fields in the original data with these names will be overwritten. If attachData = false, this is undefined.
  • log: a method to display the output described above in friendly form. There are three parameters that work the same way as dataset.log() and matrix.log()

Example:

This example runs a regression on the 'purchases' dataset.

let regression = 
    purchases.regress(
        'books, time', 
        'rating', 
        {ci: 0.95, maxDigits: 4, attachData: true }
    );

regression.log(null, 'Regression Objects:', 1e-6);
regression.data.log(null, '\r\nRegression data:', 1e-6);
-----------------------------------
Regression Objects:


For guidance on how to query regress, call "regress.help" on the fluent-data object, or see the github wiki for this project

coefficients:
┌───────────┬────────────┬──────────┬───────────┬────┬──────────┬───────────────────────┐
│ name      │ value      │ stdErr   │ t         │ df │ pVal     │ ci                    │
├───────────┼────────────┼──────────┼───────────┼────┼──────────┼───────────────────────┤
│ intercept │ 164.851565 │ 9.869433 │ 16.703246 │ 5  │ 0.000997 │ 139.481381,190.221749 │
│ books     │ 2.821533   │ 2.740061 │ 1.029733  │ 5  │ 0.483385 │ -4.222019,9.865085    │
│ time      │ -6.072004  │ 1.037281 │ -5.85377  │ 5  │ 0.020076 │ -8.73842,-3.405588    │
└───────────┴────────────┴──────────┴───────────┴────┴──────────┴───────────────────────┘

model: {
  rSquared: 0.970821,
  rSquaredAdj: 0.95915,
  F: 83.17851,
  pVal: 0.000145,
  breuchPagan: 2.12636,
  breuchPaganPval: 0.345356
}


Note: Data has been output with estimates attached.  Query "data" on the return object to get it.  
-----------------------------------

Regression data:
┌────────────┬───────┬───────┬───────┬────────┬───────────┬────────┬───────────┐
│ customerId │ books │ time  │ price │ rating │ estimate  │ actual │ residual  │
├────────────┼───────┼───────┼───────┼────────┼───────────┼────────┼───────────┤
│ b          │ 4     │ 16.68 │ 560   │ 73     │ 74.85667  │ 73     │ -1.85667  │
│ a          │ 1     │ 11.5  │ 80    │ 95     │ 97.845052 │ 95     │ -2.845052 │
│ a          │ 1     │ 12.03 │ 150   │ 92     │ 94.62689  │ 92     │ -2.62689  │
│ b          │ 2     │ 14.88 │ 220   │ 88     │ 80.143212 │ 88     │ 7.856788  │
│ a          │ 3     │ 13.75 │ 340   │ 90     │ 89.826109 │ 90     │ 0.173891  │
│ b          │ 4     │ 18.11 │ 330   │ 66     │ 66.173705 │ 66     │ -0.173705 │
│ a          │ 5     │ 21.09 │ 401   │ 54     │ 50.900666 │ 54     │ 3.099334  │
│ b          │ 5     │ 23.77 │ 589   │ 31     │ 34.627695 │ 31     │ -3.627695 │
└────────────┴───────┴───────┴───────┴────────┴───────────┴────────┴───────────┘

The breuchPagan and breuchPaganPval properties report a Breusch-Pagan test of uniformity of variance (homoscedasticity) in the residuals.
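
The 'estimate' and 'residual' columns can be reproduced by hand from the coefficient table. The plain-JavaScript sketch below does so for the first data row, and also recomputes rSquaredAdj with the standard adjusted R-squared formula. Both calculations are illustrations based on the printed figures; that the library computes them this way is an assumption, and the names estimate, n and p are hypothetical.

```javascript
// Coefficients as printed in the output above.
const coefficients = { intercept: 164.851565, books: 2.821533, time: -6.072004 };

// estimate = intercept + sum of (coefficient * predictor value)
function estimate(row) {
    return coefficients.intercept
        + coefficients.books * row.books
        + coefficients.time * row.time;
}

const firstRow = { books: 4, time: 16.68, rating: 73 };
const est = estimate(firstRow);
const residual = firstRow.rating - est; // actual minus estimate

// Adjusted R-squared for n rows and p predictors.
const n = 8, p = 2;
const rSquared = 0.970821;
const rSquaredAdj = 1 - (1 - rSquared) * (n - 1) / (n - p - 1);

console.log(est.toFixed(4), residual.toFixed(4)); // 74.8567 -1.8567
console.log(rSquaredAdj.toFixed(4));              // 0.9591
```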

standardize

Converts values into standard (z) scores.

Parameters:

  • obj: An object of functions that accept a dataset row as a parameter and output a number. This parameter is required.
  • isSample: A boolean indicating whether the sample or population standard deviation should be used to calculate the z-scores. This parameter is optional; the default is 'false'.

The example below takes the purchases dataset, removes some fields, and calculates standard scores for the time and rating fields.

purchases
    .map(p => ({ ...p, books: undefined, price: undefined}))
    .standardize({
        zTime: p => p.time,
        zRating: p => p.rating
    })
    .log();
┌────────────┬───────┬────────┬──────────────────────┬───────────────────────┐
│ customerId │ time  │ rating │ zTime                │ zRating               │
├────────────┼───────┼────────┼──────────────────────┼───────────────────────┤
│ b          │ 16.68 │ 73     │ 0.050215204428030694 │ -0.029753999243460543 │
│ a          │ 11.5  │ 95     │ -1.2264216492514772  │ 1.0175867741263505    │
│ a          │ 12.03 │ 92     │ -1.0958005039908327  │ 0.87476757775774      │
│ b          │ 14.88 │ 88     │ -0.39340377947604516 │ 0.6843419825995924    │
│ a          │ 13.75 │ 90     │ -0.671897919371382   │ 0.7795547801786662    │
│ b          │ 18.11 │ 66     │ 0.4026458416407133   │ -0.3629987907702186   │
│ a          │ 21.09 │ 54     │ 1.1370817149930172   │ -0.9342755762446611   │
│ b          │ 23.77 │ 31     │ 1.7975810910279748   │ -2.029222748404009    │
└────────────┴───────┴────────┴──────────────────────┴───────────────────────┘
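
For reference, a z-score is the distance of a value from the mean, measured in standard deviations. The plain-JavaScript sketch below recomputes the zTime column from the raw times and shows how the isSample choice changes the divisor. The function name zScores is hypothetical and this is not the library's code.

```javascript
// Hypothetical helper: standard scores for an array of numbers.
function zScores(values, isSample = false) {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    const sumSq = values.reduce((acc, v) => acc + (v - mean) ** 2, 0);
    // Sample deviation divides by n - 1; population deviation divides by n.
    const sd = Math.sqrt(sumSq / (isSample ? n - 1 : n));
    return values.map(v => (v - mean) / sd);
}

const times = [16.68, 11.5, 12.03, 14.88, 13.75, 18.11, 21.09, 23.77];

const populationZ = zScores(times);
const sampleZ = zScores(times, true);

console.log(populationZ[0]); // about 0.0502, as in the zTime column
console.log(sampleZ[0]);     // slightly smaller in magnitude
```

The population version agrees with the table, which is consistent with isSample defaulting to false.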

Outputting Methods

A dataset itself is more like a processor than a final set of data. But ultimately you will want to print out the data or read its real state into a variable. The methods in this section describe how to do that.

get

get processes the stored commands of a dataset and returns an array of objects. The get() function will omit any records or properties that are undefined.

Accepts an optional one-parameter function that invokes a mapping before producing the array. If omitted, the full results of the dataset will be returned.

The example below converts numeric rating values to descriptive labels, then outputs the state of data as an array.

let result = 
    purchases
    .get(p => ({
        customerId: p.customerId,
        rating: p.rating,
        flag: p.rating < 60 ? 'bad' : p.rating < 90 ? 'okay' : 'good'
    }));

console.log(result);
[
  { customerId: 'b', rating: 73, flag: 'okay' },
  { customerId: 'a', rating: 95, flag: 'good' },
  { customerId: 'a', rating: 92, flag: 'good' },
  { customerId: 'b', rating: 88, flag: 'okay' },
  { customerId: 'a', rating: 90, flag: 'good' },
  { customerId: 'b', rating: 66, flag: 'okay' },
  { customerId: 'a', rating: 54, flag: 'bad' },
  { customerId: 'b', rating: 31, flag: 'bad' },
  key: 'null'
]
log

Outputs data to the console in a friendly table form that converts row property names to headers. After logging, the method returns the dataset that called it so that the fluent syntax chain can continue.

Parameters:

  • element: An HTML element to print to. Use the same selector syntax as you would use in document.querySelector(). Be sure the element is one to which it makes sense to append a div. If null (default), prints to the console.
  • caption: A string representing a title for the output. If null (default), no caption is printed.
  • mapper: A function giving a final mapping of the rows before output. If null (default), x => x is ultimately passed. Alternatively, a number representing the multiple to which numbers in the output should be rounded.
  • limit: An integer (default = 50). The maximum number of rows to be printed.

Example:

purchases
    .log(null, 'pre-mapped')
    .map(p => ({
        customerId: p.customerId,
        time: p.time,
        rating: p.rating,
        flag: p.rating < 60 ? 'bad' : p.rating < 90 ? 'okay' : 'good'
    }))
    .log(null, 'post-mapped', 1); // '1' indicates rounding to multiple of 1 (integer)
pre-mapped
┌────────────┬───────┬───────┬───────┬────────┐
│ customerId │ books │ time  │ price │ rating │
├────────────┼───────┼───────┼───────┼────────┤
│ b          │ 4     │ 16.68 │ 560   │ 73     │
│ a          │ 1     │ 11.5  │ 80    │ 95     │
│ a          │ 1     │ 12.03 │ 150   │ 92     │
│ b          │ 2     │ 14.88 │ 220   │ 88     │
│ a          │ 3     │ 13.75 │ 340   │ 90     │
│ b          │ 4     │ 18.11 │ 330   │ 66     │
│ a          │ 5     │ 21.09 │ 401   │ 54     │
│ b          │ 5     │ 23.77 │ 589   │ 31     │
└────────────┴───────┴───────┴───────┴────────┘
post-mapped
┌────────────┬──────┬────────┬──────┐
│ customerId │ time │ rating │ flag │
├────────────┼──────┼────────┼──────┤
│ b          │ 17   │ 73     │ okay │
│ a          │ 12   │ 95     │ good │
│ a          │ 12   │ 92     │ good │
│ b          │ 15   │ 88     │ okay │
│ a          │ 14   │ 90     │ good │
│ b          │ 18   │ 66     │ okay │
│ a          │ 21   │ 54     │ bad  │
│ b          │ 24   │ 31     │ bad  │
└────────────┴──────┴────────┴──────┘
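
When a number is passed in place of the mapper function, numeric values are rounded to a multiple of it, which is why 16.68 prints as 17 in the post-mapped output. The sketch below approximates that behavior in plain JavaScript. The names roundToMultiple and roundRow are hypothetical helpers written for illustration; they are not part of fluent-data, and the library's own rounding may differ in details such as how ties are handled.

```javascript
// Hypothetical helpers for illustration; not part of fluent-data.

// Round a number to the nearest multiple (multiple = 1 gives integers).
function roundToMultiple(value, multiple) {
    return Math.round(value / multiple) * multiple;
}

// Round every numeric property of a row, leaving other properties untouched.
function roundRow(row, multiple) {
    let rounded = {};
    for (let [key, value] of Object.entries(row))
        rounded[key] = typeof value === 'number'
            ? roundToMultiple(value, multiple)
            : value;
    return rounded;
}

let row = { customerId: 'b', time: 16.68, rating: 73 };
console.log(roundRow(row, 1)); // { customerId: 'b', time: 17, rating: 73 }
```

Non-numeric properties such as customerId pass through unchanged, which matches the table above, where only the time column is visibly affected.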

Communication Between Layers

toJsonString

Converts a dataset to a JSON string.

The method allows two optional parameters, replacer and space, which are passed to the underlying JSON.stringify function and so work the same way.
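
Because these parameters are forwarded to JSON.stringify, their effect can be previewed with JSON.stringify itself. The sketch below uses a plain array standing in for dataset rows and does not call fluent-data. How a replacer interacts with the dataset's own serialized structure is not shown here and should be checked against the library.

```javascript
// Plain objects standing in for dataset rows.
let rows = [
    { id: 'a', name: 'Andrea',  topic: 'Abelard' },
    { id: 'b', name: 'Brielle', topic: 'Bentham' }
];

// replacer as an array: only the listed property names are kept.
let idsAndNames = JSON.stringify(rows, ['id', 'name']);
console.log(idsAndNames);
// [{"id":"a","name":"Andrea"},{"id":"b","name":"Brielle"}]

// replacer as a function: called for every key/value pair.
let upperCased = JSON.stringify(rows, (key, value) =>
    typeof value === 'string' ? value.toUpperCase() : value
);
console.log(upperCased);
// [{"id":"A","name":"ANDREA","topic":"ABELARD"},{"id":"B","name":"BRIELLE","topic":"BENTHAM"}]

// space: number of spaces used to indent each nested level.
let pretty = JSON.stringify(rows, null, 4);
console.log(pretty);
```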

This method is designed to be inverted by fromJson(). If you serialize a dataset by some other means and its internal structure changes in a future version, such as by gaining another property, your code can break. Pairing toJsonString() with fromJson() keeps your code more stable.
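
The value of a matched serializer and deserializer can be sketched with a small stand-in class. MiniDataset below is hypothetical and is not fluent-data's implementation; in particular, the { rows: ... } envelope is an assumption made only for illustration.

```javascript
// Hypothetical stand-in for a dataset; not fluent-data's actual class.
class MiniDataset {

    constructor(rows) {
        this.rows = rows;
    }

    // Serialize. The { rows: ... } envelope is an assumption for illustration.
    toJsonString(replacer = null, space = null) {
        return JSON.stringify({ rows: this.rows }, replacer, space);
    }

    // Inverse of toJsonString: accepts a JSON string or an already-parsed object.
    static fromJson(json) {
        let parsed = typeof json === 'string' ? JSON.parse(json) : json;
        return new MiniDataset(parsed.rows);
    }

}

let original = new MiniDataset([{ id: 'a', name: 'Andrea' }]);
let restored = MiniDataset.fromJson(original.toJsonString());

console.log(restored.rows); // [ { id: 'a', name: 'Andrea' } ]
```

Calling code touches only the two methods, so the envelope could change shape in a later version without affecting it, provided both methods change together.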

The example below manipulates students and then converts the result to a JSON string, with some prettification.

let json = 
    $$(students)
    .map(c => ({...c, initial: c.name.substring(0,1)}))
    .toJsonString(null, 4);

console.log(json);
[
    {
        "id": "a",
        "name": "Andrea",
        "topic": "Abelard",
        "bias": "analytic",
        "initial": "A"
    },
    {
        "id": "b",
        "name": "Brielle",
        "topic": "Bentham",
        "bias": "buddhist",
        "initial": "B"
    }
]
fromJson

Creates a dataset from a JSON object or from a JSON string.

This method is designed to be used in tandem with toJsonString.

This example takes a JSON string, instantiates a dataset with it, then manipulates the data a bit.

let json = `[
    {
        "id": "a",
        "name": "Andrea",
        "topic": "Abelard",
        "bias": "analytic",
        "initial": "A"
    },
    {
        "id": "b",
        "name": "Brielle",
        "topic": "Bentham",
        "bias": "buddhist",
        "initial": "B"
    }
]`;

$$.dataset.fromJson(json)
    .map(s => ({ ...s, initial: s.initial.toLowerCase() }))
    .log();
┌────┬─────────┬─────────┬──────────┬─────────┐
│ id │ name    │ topic   │ bias     │ initial │
├────┼─────────┼─────────┼──────────┼─────────┤
│ a  │ Andrea  │ Abelard │ analytic │ a       │
│ b  │ Brielle │ Bentham │ buddhist │ b       │
└────┴─────────┴─────────┴──────────┴─────────┘

However, keep in mind that you will rarely want to instantiate from a hand-written JSON string. Rather, you'll want to work with a string produced by toJsonString(), typically because the data had to be transferred through a layer.

Guide: Server and Client Communication

To get dataset instances from the client to the server and back, use toJsonString() and fromJson().

Imagine a JSON sender:

// _jsonSender.r.js
async function serve(req, res) {
    let data = await sample(); // gets some sample datasets.
    let json = $$(data.customers).toJsonString();
    res.writeHead(200, 'ok');
    res.end(json);
}

It can be fetched and rebuilt on the client quite easily:

let ds = await 
    fetch('./_jsonSender.r.js')
    .then(resp => resp.text()) // read the body as a JSON string
    .then(json => $$.dataset.fromJson(json));

Imagine a JSON receiver:

// _jsonReciever.r.js
async function serve(req, res) {

    let json = '';
    req.on('data', chunk => json += chunk);

    req.on('end', () => {
        let ds = $$.dataset.fromJson(json);
        res.writeHead(200, 'ok');
        res.end(
            (ds.get().length > 0) 
            ? 'got it' 
            : 'nope'
        );
    });

}

It can be posted to quite easily:

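// Assumes `json` holds a string from toJsonString() and that `prefix`
// is defined by the surrounding code.
return await 
    fetch('./_jsonReciever.r.js', { body: json, method: 'post' })
    .then(resp => resp.text()) 
    .then(result => {
        if (result !== 'got it')
            throw `${prefix}: test did not pass on server.`;
    });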
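
The receiver above builds the request body by concatenating 'data' chunks until 'end' fires. That pattern can be isolated into a small helper, sketched below. The name collectBody is hypothetical, and the fake request object exists only so the sketch runs without a server. It stands in for a real Node request, which offers the same on method.

```javascript
// Hypothetical helper: accumulate request chunks, then hand the full body to a callback.
function collectBody(req, onDone) {
    let body = '';
    req.on('data', chunk => body += chunk);
    req.on('end', () => onDone(body));
}

// Minimal fake request so the sketch runs without an HTTP server.
let handlers = {};
let fakeReq = { on: (event, handler) => { handlers[event] = handler; } };

collectBody(fakeReq, body => {
    let rows = JSON.parse(body);
    console.log(rows.length > 0 ? 'got it' : 'nope'); // prints 'got it'
});

// Simulate a body arriving in two chunks.
handlers.data('[{"id":"a","name":');
handlers.data('"Andrea"}]');
handlers.end();
```

Parsing only in the 'end' callback matters because a single chunk is not guaranteed to be valid JSON on its own, as the split in the simulated body shows.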