# Introduction to DataFrames
**[Bogumił Kamiński](http://bogumilkaminski.pl/about/), Dec 5, 2017**

A brief introduction to the basic usage of `DataFrames`, tested against `DataFrames` master on 2017-12-05.
I will try to keep it up to date as the package evolves.

In [1]:
using DataFrames # load package

## Load and save DataFrames

In [2]:
using CSV # reading and writing CSV files
using JLD # Julia native binary format

In [3]:
x = DataFrame(A=[true, false, true], B=[1,2,missing],
              C=[missing, "b", "c"], D=['a', missing, 'c']) # create a simple DataFrame for testing purposes


| Row | A | B | C | D |
|-----|---|---|---|---|
| 1 | true | 1 | missing | 'a' |
| 2 | false | 2 | b | missing |
| 3 | true | missing | c | 'c' |

In [4]:
CSV.write("x.csv", x)

CSV.Sink{DateFormat{Symbol("yyyy-mm-dd"),Tuple{Base.Dates.DatePart{'y'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'m'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'d'}}},DataType}(    CSV.Options:
        delim: ','
        quotechar: '"'
        escapechar: '\\'
        null: ""
        dateformat: dateformat"yyyy-mm-dd"
        decimal: '.'
        truestring: 'true'
        falsestring: 'false', IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1), "x.csv", 8, true, String["A", "B", "C", "D"], 4, false, Val{false})

In [5]:
y = CSV.read("x.csv")

| Row | A | B | C | D |
|-----|---|---|---|---|
| 1 | true | 1 | missing | a |
| 2 | false | 2 | b | missing |
| 3 | true | missing | c | c |

In [6]:
eltypes(y) # note that string columns are read as WeakRefString by default, for efficiency

4-element Array{Type,1}:
 Bool                                         
 Union{Int64, Missings.Missing}               
 Union{Missings.Missing, WeakRefString{UInt8}}
 Union{Missings.Missing, WeakRefString{UInt8}}
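
The write/read round trip above uses the CSV.jl API as it stood in late 2017. As a rough sketch of the same steps under more recent releases, assuming a CSV.jl version in which `CSV.read` takes the target type as a second argument and a DataFrames.jl version in which `eltype.(eachcol(df))` replaces `eltypes(df)` (the temporary path is only illustrative):

```julia
using CSV, DataFrames

# Same test DataFrame as in the cell that creates `x`.
x = DataFrame(A=[true, false, true], B=[1, 2, missing],
              C=[missing, "b", "c"], D=['a', missing, 'c'])

# Write to a throwaway directory so the sketch leaves no files behind.
path = joinpath(mktempdir(), "x.csv")
CSV.write(path, x)

# Assumption: recent CSV.jl, where the sink type is passed explicitly.
y = CSV.read(path, DataFrame)

# Assumption: recent DataFrames.jl, where `eltypes` is no longer available.
coltypes = eltype.(eachcol(y))
println(coltypes)
```

The point made by the original output still holds in this sketch: a CSV file stores text only, so the `Char` column `D` comes back as a string column. The exact string type reported depends on the CSV.jl version.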

In [7]:
save("x.jld", "x", x)

In [8]:
y = load("x.jld", "x") # this is identical to x

| Row | A | B | C | D |
|-----|---|---|---|---|
| 1 | true | 1 | missing | 'a' |
| 2 | false | 2 | b | missing |
| 3 | true | missing | c | 'c' |

In [9]:
eltypes(y)

4-element Array{Type,1}:
 Bool                           
 Union{Int64, Missings.Missing} 
 Union{Missings.Missing, String}
 Union{Char, Missings.Missing}  
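
The `Union{..., Missing}` element types reported above come directly from how Julia types a vector literal that contains `missing`. A minimal sketch using only Base, assuming Julia 0.7 or later (where `missing` is defined in Base rather than in the `Missings` package); the variable names `b`, `c` and `d` are illustrative:

```julia
# Column vectors with the same contents as columns B, C and D of `x`.
b = [1, 2, missing]
c = [missing, "b", "c"]
d = ['a', missing, 'c']

# Each literal is typed as a union of Missing and the type of the other values.
println(eltype(b))  # union of Missing and the platform Int type
println(eltype(c))  # union of Missing and String
println(eltype(d))  # union of Missing and Char

# A column without missing values keeps its plain element type.
a = [true, false, true]
println(eltype(a))  # Bool

# `ismissing` locates the missing entries.
println(ismissing.(b))
```

This is why column `A` is reported as plain `Bool` while the other three columns carry a `Missing` union.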

In [10]:
bigdf = DataFrame(Bool, 10^3, 10^2) # 10^3 rows, 10^2 columns
@time CSV.write("bigdf.csv", bigdf)
@time save("bigdf.jld", "bigdf", bigdf)
getfield.(stat.(["bigdf.csv", "bigdf.jld"]), :size) # file sizes in bytes; you can expect JLD to be faster, and compress=true reduces file size

  0.687541 seconds (687.00 k allocations: 30.795 MiB, 2.37% gc time)
  0.023093 seconds (203.74 k allocations: 3.345 MiB)


2-element Array{Int64,1}:
 594055
 154487
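
To see why a text format tends to be larger than a binary one for the same data, here is a small sketch that uses only Base. It does not use CSV.jl or JLD: a hand-written comma-separated dump stands in for the CSV file and a raw `write` of the array stands in for a binary format. The file names are hypothetical, and `filesize(path)` is used as a shorthand for `stat(path).size`.

```julia
# Random Bool matrix with the same shape as `bigdf`: 10^3 rows, 10^2 columns.
data = rand(Bool, 10^3, 10^2)

dir = mktempdir()
txt_path = joinpath(dir, "bigdf.txt")   # hypothetical file names
bin_path = joinpath(dir, "bigdf.bin")

# Text dump: one comma-separated line per row ("true" / "false" plus delimiters).
open(txt_path, "w") do io
    for i in 1:size(data, 1)
        println(io, join(data[i, :], ','))
    end
end

# Raw binary dump: one byte per Bool, no delimiters and no metadata.
# NOTE: this is a stand-in for a binary format, not the JLD layout.
open(bin_path, "w") do io
    write(io, data)
end

sizes = filesize.([txt_path, bin_path])
println(sizes)
```

Each cell costs five or six bytes as text (`true,` or `false,`) against a single byte in the raw dump, which is consistent with the roughly 594 kB CSV file above. The JLD file is larger than a raw dump would be because it also stores type information and metadata.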