Built In Reducers

paulwilcox edited this page Jul 11, 2021 · 3 revisions


Every reducer takes a 'rowShaper' function as its first parameter. For some reducers this is the only parameter; others accept additional parameters.
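
To make the rowShaper idea concrete, here is a minimal plain-JavaScript sketch of a sum-like reducer. The names makeSum and sampleRows are hypothetical, and this is not fluent-data's implementation; it only illustrates how a row-shaping function selects the value that each row contributes.

```javascript
// Hypothetical illustration only; not fluent-data's actual implementation.
// A reducer factory takes a rowShaper and returns a function over rows.
const makeSum = rowShaper => rows =>
    rows.reduce((total, row) => total + rowShaper(row), 0);

const sampleRows = [
    { customerId: 'a', books: 1 },
    { customerId: 'a', books: 3 },
    { customerId: 'b', books: 4 }
];

// The rowShaper p => p.books picks the value to aggregate from each row.
const totalBooks = makeSum(p => p.books)(sampleRows);
console.log(totalBooks); // 8
```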

Setup

Below, the fluent-data library is loaded into the variable $$, and a dataset is constructed from an array argument. This dataset is used by the examples on this page.

let $$ = require('./dist/fluent-data.server.js');

let purchases = $$([
    { customerId: 'b', books: 4, time: 16.68, price: 560, rating: 73 },
    { customerId: 'a', books: 1, time: 11.50, price:  80, rating: 95 },
    { customerId: 'a', books: 1, time: 12.03, price: 150, rating: 92 },
    { customerId: 'b', books: 2, time: 14.88, price: 220, rating: 88 },
    { customerId: 'a', books: 3, time: 13.75, price: 340, rating: 90 },
    { customerId: 'b', books: 4, time: 18.11, price: 330, rating: 66 },
    { customerId: 'a', books: 5, time: 21.09, price: 401, rating: 54 },
    { customerId: 'b', books: 5, time: 23.77, price: 589, rating: 31 }
]);

Simple Reducers

avg

Gets an average of values.

purchases
    .group(p => p.customerId)
    .reduce({
        customerId: $$.first(p => p.customerId),
        avg: $$.avg(p => p.price)
    })
    .log();
┌────────────┬────────┐
│ customerId │ avg    │
├────────────┼────────┤
│ a          │ 242.75 │
│ b          │ 424.75 │
└────────────┴────────┘
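
As a cross-check on the table above, the same per-customer averages can be reproduced without the library. This is only a sketch of the arithmetic in plain JavaScript; the names purchaseRows, totals and averages are illustrative and are not part of fluent-data.

```javascript
// Plain-JavaScript cross-check of the grouped averages; no library calls.
const purchaseRows = [
    { customerId: 'b', price: 560 },
    { customerId: 'a', price:  80 },
    { customerId: 'a', price: 150 },
    { customerId: 'b', price: 220 },
    { customerId: 'a', price: 340 },
    { customerId: 'b', price: 330 },
    { customerId: 'a', price: 401 },
    { customerId: 'b', price: 589 }
];

// Accumulate a running sum and a row count per customerId.
const totals = {};
for (const p of purchaseRows) {
    if (!totals[p.customerId]) totals[p.customerId] = { sum: 0, n: 0 };
    totals[p.customerId].sum += p.price;
    totals[p.customerId].n += 1;
}

// Divide each sum by its count to get the group average.
const averages = Object.fromEntries(
    Object.entries(totals).map(([id, t]) => [id, t.sum / t.n])
);
console.log(averages); // { b: 424.75, a: 242.75 }
```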

count

Gets the number of non-null instances of a set of values.

purchases
    .group(p => p.customerId)
    .reduce({
        customerId: $$.first(p => p.customerId),
        purchases: $$.count(p => p)
    })
    .log();
┌────────────┬───────────┐
│ customerId │ purchases │
├────────────┼───────────┤
│ a          │ 4         │
│ b          │ 4         │
└────────────┴───────────┘

first

Gets the first occurring non-null value.

purchases
    .sort(p => p.rating)
    .reduce({ 
        first: $$.first(p => p.customerId) 
    })
    .log();
{
  first: "b"
}

last

Gets the last occurring non-null value.

purchases
    .sort(p => p.rating)
    .reduce({ 
        last: $$.last(p => p.customerId) 
    })
    .log();
{
  last: "a"
}

mad

Gets the Mean Absolute Deviation of a set of values.

purchases
    .reduce({
        mad: $$.mad(p => p.time)
    })
    .log();
{
  mad: 3.43625
}
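
The value above is consistent with the mean absolute deviation taken around the arithmetic mean. The following plain-JavaScript sketch reproduces it from the time column; it assumes deviations are measured from the mean (not the median) and is not the library's own code.

```javascript
// Plain-JavaScript cross-check; assumes deviations are taken from the mean.
const times = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 21.09, 23.77];

// Arithmetic mean of the values.
const mean = times.reduce((a, b) => a + b, 0) / times.length;

// Average of the absolute distances from the mean.
const mad = times.reduce((a, t) => a + Math.abs(t - mean), 0) / times.length;

console.log(mad.toFixed(5)); // 3.43625
```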

sum

Gets a total of values.

purchases
    .group(p => p.customerId)
    .reduce({
        customerId: $$.first(p => p.customerId),
        sum: $$.sum(p => p.books)
    })
    .log();
┌────────────┬─────┐
│ customerId │ sum │
├────────────┼─────┤
│ a          │ 10  │
│ b          │ 15  │
└────────────┴─────┘

std

Gets the Standard Deviation of a set of values.

This reducer takes a second, optional boolean parameter, isSample. If it is true, the sample standard deviation is returned. If it is false (the default), the population standard deviation is returned.

purchases
    .reduce({
        popSTD: $$.std(p => p.time),
        sampleSTD: $$.std(p => p.time, true)
    })
    .log();
{
  popSTD: 4.057536005693604,
  sampleSTD: 4.337688447944201
}
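
The difference between the two results comes down to the divisor: n for the population form and n - 1 for the sample form. The sketch below is a plain-JavaScript illustration of that arithmetic; the function stdOf is hypothetical and is not fluent-data's implementation. Its results should agree with the values logged above to within floating-point rounding.

```javascript
// Hypothetical helper for illustration; not the library's code.
const timeValues = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 21.09, 23.77];

function stdOf(values, isSample = false) {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    // Sum of squared deviations from the mean.
    const sumSq = values.reduce((a, v) => a + (v - mean) ** 2, 0);
    // Population form divides by n; sample form divides by n - 1.
    return Math.sqrt(sumSq / (isSample ? n - 1 : n));
}

const popSTD = stdOf(timeValues);
const sampleSTD = stdOf(timeValues, true);
console.log(popSTD, sampleSTD);
```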

Statistical Reducers

cor

Returns the correlation between two sets of values.

The second, optional parameter is an object whose properties supply options. At present, only one option is available: 'tails', which can be set to 1 or 2 (for a one-tailed or two-tailed test).

If no options are passed, cor() returns a scalar value representing the correlation. If the tails option is set, cor() returns an object containing the correlation together with related statistics: the p-value (pVal), the number of observations (n), the degrees of freedom (df), and the t statistic (t).

purchases
    .reduce({
        cor: $$.cor(p => [p.books, p.time]),
        corDetails: $$.cor(p => [p.books, p.time], { tails: 2 })
    })
    .log();    
{
  cor: 0.9244665165964683,
  corDetails: {
    cor: 0.9244665165964683,
    pVal: 0.0010172440538994687,
    n: 8,
    df: 6,
    t: 5.939390427820604
  }
}
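
The cor, df and t values above are consistent with the Pearson correlation and the usual t statistic, t = r * sqrt(df / (1 - r^2)) with df = n - 2. The sketch below recomputes them in plain JavaScript. The helper names are illustrative, the formulas are the textbook ones and not necessarily how the library computes them, and pVal is omitted because it needs a t-distribution function.

```javascript
// Plain-JavaScript cross-check; helper names are illustrative only.
const bookCounts = [4, 1, 1, 2, 3, 4, 5, 5];
const timeSpent = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 21.09, 23.77];

const meanOf = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// Pearson correlation from sums of cross-products and squares.
function pearson(xs, ys) {
    const mx = meanOf(xs), my = meanOf(ys);
    let sxy = 0, sxx = 0, syy = 0;
    for (let i = 0; i < xs.length; i++) {
        sxy += (xs[i] - mx) * (ys[i] - my);
        sxx += (xs[i] - mx) ** 2;
        syy += (ys[i] - my) ** 2;
    }
    return sxy / Math.sqrt(sxx * syy);
}

const r = pearson(bookCounts, timeSpent);
const df = bookCounts.length - 2;             // 8 rows give 6 degrees of freedom
const t = r * Math.sqrt(df / (1 - r * r));    // t statistic for the correlation

console.log(r, df, t);
```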

corMatrix

Creates a correlation matrix between variables. Pass either a string of comma-separated property names or a function that takes a dataset row and returns an array. When a function is passed, the rows and columns are labeled c0, c1, and so on.

Unlike covMatrix(), there is no isSample parameter because for a correlation the population and sample calculations are equivalent.

let results = 
    purchases
    .reduce({
        corMatrix: $$.corMatrix('books, price, time'),
        corMatrix2: $$.corMatrix(p => [p.books, p.price, p.time])
    })
    .get();
    
results.corMatrix.log(null, 'corMatrix', 1e-8);
results.corMatrix2.log(null, 'corMatrix2', 1e-8);
corMatrix
┌───────┬────────────┬────────────┬────────────┐
│       │ books      │ price      │ time       │
├───────┼────────────┼────────────┼────────────┤
│ books │ 1          │ 0.88703366 │ 0.92446652 │
│ price │ 0.88703366 │ 1          │ 0.80613464 │
│ time  │ 0.92446652 │ 0.80613464 │ 1          │
└───────┴────────────┴────────────┴────────────┘
corMatrix2
┌────┬────────────┬────────────┬────────────┐
│    │ c0         │ c1         │ c2         │
├────┼────────────┼────────────┼────────────┤
│ c0 │ 1          │ 0.88703366 │ 0.92446652 │
│ c1 │ 0.88703366 │ 1          │ 0.80613464 │
│ c2 │ 0.92446652 │ 0.80613464 │ 1          │
└────┴────────────┴────────────┴────────────┘
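
The reason corMatrix needs no isSample parameter can be checked numerically: the divisor (n or n - 1) appears in both the covariance and the two variances, so it cancels out. The following plain-JavaScript sketch uses hypothetical helpers covOf and corOf to show this for books and price; it is not the library's code.

```javascript
// Hypothetical helpers for illustration; not fluent-data's implementation.
const bookCol = [4, 1, 1, 2, 3, 4, 5, 5];
const priceCol = [560, 80, 150, 220, 340, 330, 401, 589];

const avgOf = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// Covariance with a configurable divisor: n (population) or n - 1 (sample).
function covOf(xs, ys, isSample) {
    const mx = avgOf(xs), my = avgOf(ys);
    const sum = xs.reduce((acc, x, i) => acc + (x - mx) * (ys[i] - my), 0);
    return sum / (xs.length - (isSample ? 1 : 0));
}

// Correlation = covariance divided by the product of standard deviations.
const corOf = (xs, ys, isSample) =>
    covOf(xs, ys, isSample) /
    Math.sqrt(covOf(xs, xs, isSample) * covOf(ys, ys, isSample));

const corPop = corOf(bookCol, priceCol, false);
const corSam = corOf(bookCol, priceCol, true);
console.log(Math.abs(corPop - corSam) < 1e-12); // true
```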

covMatrix

Creates a covariance matrix between variables. Pass either a string of comma-separated property names or a function that takes a dataset row and returns an array.

The second, optional boolean parameter, isSample, selects the calculation: true produces a sample covariance matrix, and false (the default) produces a population covariance matrix.

let results = 
    purchases
    .reduce({
        covMatrixPop: $$.covMatrix('books, price, time'),
        covMatrixSam: $$.covMatrix(p => [p.books, p.price, p.time], true)
    })
    .get();
    
results.covMatrixPop.log(null, 'covMatrixPop', 1e-8);
results.covMatrixSam.log(null, 'covMatrixSam', 1e-8);
covMatrixPop
┌───────┬────────────┬─────────────┬─────────────┐
│       │ books      │ price       │ time        │
├───────┼────────────┼─────────────┼─────────────┤
│ books │ 2.359375   │ 232.03125   │ 5.76171875  │
│ price │ 232.03125  │ 29001.1875  │ 557.0290625 │
│ time  │ 5.76171875 │ 557.0290625 │ 16.46359844 │
└───────┴────────────┴─────────────┴─────────────┘
covMatrixSam
┌────┬──────────────┬────────────────┬──────────────┐
│    │ c0           │ c1             │ c2           │
├────┼──────────────┼────────────────┼──────────────┤
│ c0 │ 2.69642857   │ 265.17857143   │ 6.58482143   │
│ c1 │ 265.17857143 │ 33144.21428571 │ 636.60464286 │
│ c2 │ 6.58482143   │ 636.60464286   │ 18.81554107  │
└────┴──────────────┴────────────────┴──────────────┘
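
The two tables above differ only by the divisor used, so each sample entry equals the population entry scaled by n / (n - 1), here 8 / 7. The sketch below builds both matrices in plain JavaScript from [books, price, time] rows. The function covMatrixOf is a hypothetical stand-in, not the library's covMatrix, and its entries should agree with the tables to within rounding.

```javascript
// Hypothetical stand-in for illustration; not fluent-data's covMatrix.
// Each row is [books, price, time], in the same order as the dataset.
const matrixRows = [
    [4, 560, 16.68], [1,  80, 11.50], [1, 150, 12.03], [2, 220, 14.88],
    [3, 340, 13.75], [4, 330, 18.11], [5, 401, 21.09], [5, 589, 23.77]
];

function covMatrixOf(rows, isSample = false) {
    const n = rows.length;
    const k = rows[0].length;
    // Column means.
    const means = Array.from({ length: k }, (_, j) =>
        rows.reduce((a, r) => a + r[j], 0) / n);
    const denom = isSample ? n - 1 : n;
    // Entry (i, j) is the averaged product of deviations of columns i and j.
    return Array.from({ length: k }, (_, i) =>
        Array.from({ length: k }, (_, j) =>
            rows.reduce((a, r) => a + (r[i] - means[i]) * (r[j] - means[j]), 0) / denom));
}

const covPop = covMatrixOf(matrixRows);
const covSam = covMatrixOf(matrixRows, true);
console.log(covPop[0][2], covSam[0][2]); // books-by-time entries
```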

dimReduce

This method uses the same logic as dataset.dimReduce, except that it operates as a reducer inside of the reduce method. This allows dimReduce to operate simultaneously alongside other reducers.

This is the example provided in the wiki for dataset.dimReduce:

purchases.dimReduce(
    'books, time, price, rating', 
    { minEigenVal: 0.25 } // just for the sake of presenting more than one
)
.log(null, null, 1e-4);

This is how to get the same result from fluentData.dimReduce:

purchases.reduce({
    dimReduced: $$.dimReduce(
        'books, time, price, rating', 
        { minEigenVal: 0.25 } // just for the sake of presenting more than one
    ),
    otherLogic: $$.first(p => p.price)
})
.get()
.dimReduced
.log(null, null, 1e-4);

regress

This method uses the same logic as dataset.regress, except that it operates as a reducer inside of the reduce method. This allows regress to operate simultaneously alongside other reducers.

This is the example provided in the wiki for dataset.regress (except that the output is logged right away, instead of passed to a variable):

purchases.regress(
    'books, time', 
    'rating', 
    {ci: 0.95, maxDigits: 4, attachData: true }
)
.log(null, null, 1e-6);

This is how to get the same result from fluentData.regress:

purchases.reduce({
    regressed: $$.regress(
        'books, time', 
        'rating', 
        {ci: 0.95, maxDigits: 4, attachData: true }
    ),
    otherLogic: $$.first(p => p.time)
})
.get()
.regressed
.log(null, null, 1e-6);

In the output, the breuchPagan properties report Breusch-Pagan tests of homoscedasticity, that is, of whether the variance of the residuals is uniform.