# Introduction to DataFrames
**[Bogumił Kamiński](http://bogumilkaminski.pl/about/), Dec 5, 2017**

A brief introduction to basic usage of `DataFrames`. Tested under `DataFrames` master on 2017-12-05.
I will try to keep it up to date as the package evolves.

In [1]:
using DataFrames # load package

## Getting basic information about a data frame

In [2]:
x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"], D = [1, "a"])

| Row | A | B | C | D |
|-----|---|---|---|---|
| 1 | 1 | 1.0 | a | 1 |
| 2 | 2 | missing | b | a |


In [3]:
size(x), size(x, 1), size(x, 2) # dimensions, number of rows, number of columns

((2, 4), 2, 4)

In [4]:
nrow(x), ncol(x), length(x) # here length of a data frame is its number of columns

(2, 4, 4)

In [5]:
describe(x) # summary statistics for each column

A
Summary Stats:
Mean:           1.500000
Minimum:        1.000000
1st Quartile:   1.250000
Median:         1.500000
3rd Quartile:   1.750000
Maximum:        2.000000
Length:         2
Type:           Int64

B
Summary Stats:
Mean:           1.000000
Minimum:        1.000000
1st Quartile:   1.000000
Median:         1.000000
3rd Quartile:   1.000000
Maximum:        1.000000
Length:         2
Type:           Union{Float64, Missings.Missing}
Number Missing: 1
% Missing:      50.000000

C
Summary Stats:
Length:         2
Type:           String
Number Unique:  2

D
Summary Stats:
Length:         2
Type:           Any
Number Unique:  2
Number Missing: 0
% Missing:      0.000000



In [6]:
showcols(x) # per-column overview: element type, missing count, first and last values

2×4 DataFrames.DataFrame
│ Col # │ Name │ Eltype                           │ Missing │ Values          │
├───────┼──────┼──────────────────────────────────┼─────────┼─────────────────┤
│ 1     │ A    │ Int64                            │ 0       │ 1  …  2         │
│ 2     │ B    │ Union{Float64, Missings.Missing} │ 1       │ 1.0  …  missing │
│ 3     │ C    │ String                           │ 0       │ a  …  b         │
│ 4     │ D    │ Any                              │ 0       │ 1  …  a         │

In [7]:
names(x) # column names as symbols

4-element Array{Symbol,1}:
 :A
 :B
 :C
 :D

In [8]:
eltypes(x) # element type of each column

4-element Array{Type,1}:
 Int64                           
 Union{Float64, Missings.Missing}
 String                          
 Any                             

In [9]:
y = DataFrame(rand(1:10, 20, 10)) # build a data frame from a 20×10 matrix of random integers; columns are auto-named x1 to x10

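| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|-----|----|----|----|----|----|----|----|----|----|-----|
| 1 | 4 | 10 | 4 | 3 | 6 | 3 | 2 | 4 | 8 | 4 |
| 2 | 10 | 2 | 4 | 10 | 8 | 7 | 8 | 9 | 2 | 6 |
| 3 | 5 | 9 | 4 | 8 | 6 | 1 | 2 | 7 | 3 | 8 |
| 4 | 9 | 10 | 10 | 5 | 4 | 8 | 8 | 10 | 4 | 9 |
| 5 | 5 | 10 | 3 | 6 | 6 | 3 | 7 | 4 | 1 | 9 |
| 6 | 4 | 6 | 8 | 7 | 7 | 10 | 10 | 4 | 3 | 1 |
| 7 | 2 | 10 | 7 | 2 | 8 | 8 | 3 | 5 | 1 | 5 |
| 8 | 3 | 5 | 6 | 1 | 1 | 2 | 9 | 1 | 8 | 6 |
| 9 | 4 | 10 | 8 | 4 | 2 | 1 | 3 | 9 | 3 | 10 |
| 10 | 6 | 4 | 4 | 1 | 3 | 2 | 7 | 6 | 4 | 1 |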


In [10]:
head(y) # first six rows by default

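| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|-----|----|----|----|----|----|----|----|----|----|-----|
| 1 | 4 | 10 | 4 | 3 | 6 | 3 | 2 | 4 | 8 | 4 |
| 2 | 10 | 2 | 4 | 10 | 8 | 7 | 8 | 9 | 2 | 6 |
| 3 | 5 | 9 | 4 | 8 | 6 | 1 | 2 | 7 | 3 | 8 |
| 4 | 9 | 10 | 10 | 5 | 4 | 8 | 8 | 10 | 4 | 9 |
| 5 | 5 | 10 | 3 | 6 | 6 | 3 | 7 | 4 | 1 | 9 |
| 6 | 4 | 6 | 8 | 7 | 7 | 10 | 10 | 4 | 3 | 1 |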


In [11]:
tail(y, 3) # last three rows

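| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|-----|----|----|----|----|----|----|----|----|----|-----|
| 1 | 6 | 2 | 8 | 8 | 7 | 10 | 7 | 10 | 2 | 7 |
| 2 | 5 | 2 | 3 | 8 | 4 | 7 | 5 | 3 | 2 | 9 |
| 3 | 7 | 7 | 1 | 1 | 10 | 1 | 3 | 7 | 10 | 7 |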


## Most elementary get and set operations

In [12]:
x[1], x[:A], x[:, 1] # get one column

([1, 2], [1, 2], [1, 2])

In [13]:
x[1, :] # get one row

| Row | A | B | C | D |
|-----|---|---|---|---|
| 1 | 1 | 1.0 | a | 1 |


In [14]:
x[1, 1] # get one cell

1

In [15]:
x[1:2, 1:2] = 1 # a scalar can be assigned to a whole block of cells
x

| Row | A | B | C | D |
|-----|---|---|---|---|
| 1 | 1 | 1.0 | a | 1 |
| 2 | 1 | 1.0 | b | a |


In [16]:
x[1:2, 1:2] = [1, 2] # a vector whose length equals the number of selected rows fills each selected column
x

| Row | A | B | C | D |
|-----|---|---|---|---|
| 1 | 1 | 1.0 | a | 1 |
| 2 | 2 | 2.0 | b | a |


In [17]:
x[1:2, 1:2] = DataFrame([5 6; 7 8]) # another data frame of matching size can be assigned to a block
x

| Row | A | B | C | D |
|-----|---|---|---|---|
| 1 | 5 | 6.0 | a | 1 |
| 2 | 7 | 8.0 | b | a |
