- See PropertyUtils.jl for the functions/macros this package is based on.
using DataFrames, StatsBase, Telperion
df = DataFrame(y=rand(100), a=1:100, b=randn(100), c=randn(100), d=rand(1:5, 100))
x, y = @xy df log.(y) ~ 1 + a + zscore(b) + abs.(sin.(c)) + dummy(d)
julia> x
OrderedDict{String,Any} with 8 entries:
"1" => [1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
"a" => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
"zscore(b)" => [1.13036, -0.280105, 2.29973, -0.267989, -0.240071, -0.797709, -0.315514, -0.322103, 0.0217353, -1.67589 … 1.45323, -0.363556, -0.650576, -1.543…
"abs.(sin.(c))" => [0.753822, 0.992965, 0.41306, 0.733578, 0.21487, 0.958583, 0.163681, 0.238074, 0.166078, 0.920199 … 0.407876, 0.277916, 0.0207317, 0.572013, 0.2…
"dummy(d) [2]" => Bool[0, 0, 0, 0, 0, 0, 1, 0, 0, 0 … 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
"dummy(d) [3]" => Bool[0, 0, 0, 0, 0, 1, 0, 1, 0, 1 … 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
"dummy(d) [4]" => Bool[0, 1, 0, 1, 0, 0, 0, 0, 0, 0 … 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
"dummy(d) [5]" => Bool[1, 0, 1, 0, 1, 0, 0, 0, 0, 0 … 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
Then create the data matrix with reduce(hcat, values(x)).
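As a rough illustration of that last step, the sketch below builds a small matrix from named columns using only Base and the Statistics standard library. The `cols` vector of pairs is a hypothetical stand-in for the `OrderedDict` that `@xy` returns, and the z-score is computed by hand rather than with StatsBase, so this shows the idea rather than the package's own code path.

```julia
using Statistics  # standard library: mean, std

n = 5
b = [2.0, 4.0, 6.0, 8.0, 10.0]

# Hypothetical stand-in for `x`: term names paired with columns, in formula order.
cols = [
    "1"         => fill(1.0, n),
    "a"         => collect(1.0:n),
    "zscore(b)" => (b .- mean(b)) ./ std(b),
]

# Glue the columns together side by side: one matrix column per term.
X = reduce(hcat, last.(cols))
term_names = first.(cols)

println(size(X))      # rows × terms
println(term_names)
```

With the real `x`, `collect(keys(x))` should give the matching column labels, assuming the dictionary preserves insertion order as an `OrderedDict` does.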
I wanted to try my own take on StatsModels where each term is generated by valid Julia code rather than a DSL. The formula syntax is the same (e.g. y ~ 1 + term2 + term3).
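Because a formula is ordinary Julia syntax, quoting one yields a plain `Expr` that can be taken apart with Base alone. The sketch below shows how the left-hand side and the individual terms could be recovered. It is only an illustration of the idea and makes no claim about how `@xy` is actually implemented.

```julia
# Quote a formula: the parser treats `~` as an ordinary infix call.
formula = :(log.(y) ~ 1 + a + zscore(b))

lhs = formula.args[2]    # the response expression
rhs = formula.args[3]    # a single n-ary call to `+`

# Everything after the `+` symbol is one term: a number, a symbol, or a call.
terms = rhs.args[2:end]

# The source text of each term can serve as its column label.
labels = string.(terms)

println(lhs)
println(labels)
```

Each element of `terms` is itself valid Julia code, which is what allows a term to be evaluated directly instead of being interpreted by a separate formula language.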
- Numbers are the only thing given special treatment: they are turned into vectors, e.g. 1 --> fill(1, size(df, 1))
- Simplicity (this README has roughly the same number of lines as the package's source code).
- Terms can be any Julia code that creates:
  - An AbstractVector or iterable of the correct length.
  - An OrderedDict of AbstractVector/iterables (for terms that create multiple columns).
- Works out of the box with many data structures.
using IndexedTables
t = table((x=rand(10), y=rand(10)))
x, y = @xy rows(t) y ~ 1 + x
I would not have been able to write this package without the existence of StatsModels.jl, DataFramesMeta.jl, or StatsPlots.jl, which are all fantastic.
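The term rules listed earlier (numbers become constant vectors, other terms are either a single column or a collection of named columns) can be sketched in plain Julia. Both `dummy_columns` and `expand_term` below are hypothetical helpers written for this illustration, not functions from the package, and a vector of pairs stands in for the `OrderedDict` so that the example needs nothing beyond Base.

```julia
# Hypothetical helpers illustrating the term rules; not the package's actual code.

# One indicator column per level of `v`, dropping the first (reference) level.
function dummy_columns(v::AbstractVector)
    levels = sort(unique(v))
    return ["[$l]" => (v .== l) for l in levels[2:end]]
end

# Normalise the value of one evaluated term into a list of name => column pairs.
function expand_term(name::AbstractString, value, n::Integer)
    if value isa Number
        # Numbers are the special case: repeat the value n times.
        return [name => fill(value, n)]
    elseif value isa AbstractVector{<:Pair}
        # Multi-column term: prefix each sub-column label with the term name.
        return ["$name $(first(p))" => last(p) for p in value]
    else
        # Any other iterable must already have the right length.
        col = collect(value)
        length(col) == n ||
            throw(DimensionMismatch("term $name has length $(length(col)), expected $n"))
        return [name => col]
    end
end

d = [1, 3, 2, 3, 1]
n = length(d)

cols = vcat(
    expand_term("1", 1, n),
    expand_term("a", 1:n, n),
    expand_term("dummy(d)", dummy_columns(d), n),
)

foreach(println, cols)
```

Dropping the first level in `dummy_columns` mirrors the output shown near the top of this README, where a variable with levels 1 to 5 produces columns labelled [2] through [5].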