library for working with tabular data in Julia

This branch is 647 commits behind JuliaStats:master


DataFrames.jl

Package for working with tabular data in Julia using DataFrames.

Installation

DataFrames.jl is now an installable package.

To install DataFrames.jl, use the following:

Pkg.add("DataFrames")

DataFrames.jl has one main module named DataFrames. You can load it as:

using DataFrames

Features

  • DataFrame for efficient tabular storage of two-dimensional data
  • Minimized data copying
  • Default columns can handle missing values (NAs) of any type
  • PooledDataArray for efficient storage of factor-like arrays for characters, integers, and other types
  • Flexible indexing
  • SubDataFrame for efficient subset referencing without copies
  • Grouping operations inspired by plyr, pandas, and data.table
  • Basic merge functionality
  • stack and unstack for long/wide conversions
  • Pipelining support (|) for many operations
  • Several typical R-style functions, including head, tail, describe, unique, duplicated, with, within, and more
  • Formula and design matrix implementation
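
The pooled storage for factor-like arrays mentioned in the list above can be pictured as a pool of distinct values plus one integer reference per element. The following is a minimal sketch of that idea in plain Julia, written for a recent Julia release; it is not the package's implementation, and `pool_encode` and `pool_decode` are hypothetical names used only for illustration.

```julia
# Sketch of factor-like (pooled) storage in plain Julia.
# `pool_encode` and `pool_decode` are hypothetical helpers, not DataFrames.jl API.
function pool_encode(values::AbstractVector{T}) where {T}
    pool = T[]               # distinct values, in order of first appearance
    index = Dict{T,Int}()    # value => position in `pool`
    refs = Int[]             # one integer reference per original element
    for v in values
        # Look up the value's position, adding it to the pool if it is new.
        pos = get!(index, v) do
            push!(pool, v)
            length(pool)
        end
        push!(refs, pos)
    end
    return pool, refs
end

# Rebuild the original vector by indexing the pool with the references.
pool_decode(pool, refs) = pool[refs]

colors = ["red", "blue", "red", "red", "blue"]   # made-up example data
pool, refs = pool_encode(colors)
println(pool)
println(refs)
```

Each distinct string is stored once and the repeated elements cost one integer each, which is where the storage saving for factor-like data would come from.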

Demos

Here's a minimal demo showing some grouping operations:

julia> using DataFrames

julia> d = DataFrame(quote     # expressions are one way to create a DataFrame
           x = randn(10)
           y = randn(10)
           i = rand(1:3,10)
           j = rand(1:3,10)
       end);

julia> dump(d)    # dump() is like R's str()
DataFrame  10 observations of 4 variables
  x: DataArray{Float64,1}(10) [-0.22496343871037897,-0.4033933555989207,0.6027847717547058,0.06671669747901597]
  y: DataArray{Float64,1}(10) [0.21904975091285417,-1.3275512477731726,2.266353546459277,-0.19840910239041679]
  i: DataArray{Int64,1}(10) [2,1,3,1]
  j: DataArray{Int64,1}(10) [3,2,1,2]

julia> head(d)
6x4 DataFrame:
                x         y i j
[1,]    -0.224963   0.21905 2 3
[2,]    -0.403393  -1.32755 1 2
[3,]     0.602785   2.26635 3 1
[4,]    0.0667167 -0.198409 1 2
[5,]      1.68303  -1.11183 1 3
[6,]     0.346034   1.68227 2 1

julia> d[1:3, ["x","y"]]     # indexing is similar to R's
3x2 DataFrame
                x        y
[1,]    -0.224963  0.21905
[2,]    -0.403393 -1.32755
[3,]     0.602785  2.26635

julia> # Group on column i, and pipe (|) that result to an expression
julia> # that creates the column x_sum.
julia> groupby(d, "i") | :(x_sum = sum(x))
3x2 DataFrame
        i    x_sum
[1,]    1  2.06822
[2,]    2 -1.80867
[3,]    3 0.319517

julia> groupby(d, "i") | :sum   # Another way to operate on a grouping
3x4 DataFrame
        i    x_sum    y_sum j_sum
[1,]    1  2.06822 -2.73985     8
[2,]    2 -1.80867  1.83489     7
[3,]    3 0.319517  1.03072     2
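
To make the result of `groupby(d, "i") | :(x_sum = sum(x))` concrete, here is a rough sketch of the same computation in plain Julia for a recent Julia release. It does not use DataFrames.jl and is not how the package implements grouping; `group_sum` is a hypothetical helper and the input values are made up.

```julia
# Plain-Julia sketch of a grouped sum; it does not use DataFrames.jl.
# `group_sum` is a hypothetical helper named for this illustration only.
function group_sum(groups::AbstractVector, values::AbstractVector)
    totals = Dict{eltype(groups),eltype(values)}()
    for (g, v) in zip(groups, values)
        # Start each group at zero, then accumulate.
        totals[g] = get(totals, g, zero(v)) + v
    end
    return totals
end

i = [2, 1, 3, 1]             # grouping column (made-up values)
x = [0.5, 1.0, 2.0, 0.25]    # values to sum (made-up values)

x_sum = group_sum(i, x)
for g in sort(collect(keys(x_sum)))
    println(g, " => ", x_sum[g])
end
```

The result has one entry per distinct value of `i`, which mirrors the one-row-per-group shape of the DataFrame output shown above.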

See demo/workflow_demo.jl for a basic demo of the parts of a Julian data workflow.

See demo/design_demo.jl for a more in-depth demo of DataFrame, its related types, and the rest of the library.

Documentation

Development work

The issue tracker lists a number of open issues and ideas for enhancements. Here are some particular enhancements under way or under discussion:

Possible changes to Julia

DataFrames fit well with Julia's syntax, but some features would improve the user experience, including keyword function arguments (Julia issue 485), "~" for easier expression syntax, and overloading "." for easier column access (df.colA). See here for a bit more information.
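
As an illustration of what `df.colA`-style column access could look like, the sketch below uses `Base.getproperty`, which Julia 1.x lets a type overload. It assumes a recent Julia release and does not apply to the version this README was written against. `ColumnTable` is a hypothetical stand-in type, not the DataFrames.jl `DataFrame`.

```julia
# Hypothetical column container; not the DataFrames.jl DataFrame type.
struct ColumnTable
    names::Vector{Symbol}
    columns::Vector{Vector{Float64}}
end

# Route `t.colA` to the column named :colA.
# getfield is used internally so the real fields stay reachable.
function Base.getproperty(t::ColumnTable, name::Symbol)
    pos = findfirst(isequal(name), getfield(t, :names))
    pos === nothing && error("no column named $name")
    return getfield(t, :columns)[pos]
end

t = ColumnTable([:colA, :colB], [[1.0, 2.0], [3.0, 4.0]])
println(t.colA)
```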

Current status

Please consider this a development preview. Many things work, but expect some rough edges. We hope that this can become a standard Julia package.
