# Manipulating DataFrames

```julia
using DataFrames
```

## Getting Started

```julia
using RDatasets
using GZip
```

If RDatasets is not installed, this cell fails with `ArgumentError: Module RDatasets not found in current path`; run `Pkg.add("RDatasets")` to install it.

```julia
using CSV  # provides CSV.read

# Import the Boston housing data (MASS package) shipped with RDatasets
df = CSV.read(
    GZip.gzopen(joinpath(Pkg.dir("RDatasets"), "data", "MASS", "Boston.csv.gz")),
    DataFrame,
);
```

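With a current RDatasets release, the same table can likely be loaded without building the file path by hand: RDatasets exports `dataset(package, name)`, which returns a bundled R dataset as a DataFrame. A minimal sketch, assuming RDatasets is installed; the variable name `boston` is only illustrative.

```julia
using RDatasets  # assumes the RDatasets package is installed

# dataset(package, name) finds the bundled file and parses it into a DataFrame,
# so no GZip/CSV plumbing or manual path construction is needed
boston = dataset("MASS", "Boston")

size(boston)  # (rows, columns)
```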
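```julia
# Show the first six rows
head(df)
```

| Row | Crim | Zn | Indus | Chas | NOx | Rm | Age | Dis | Rad | Tax | PTRatio | Black | LStat | MedV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 3 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 4 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 5 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |
| 6 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |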

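`head` comes from an older DataFrames.jl release. In DataFrames.jl 1.x the equivalent calls are Base's `first(df, n)` and `last(df, n)`. A sketch on a small made-up table (`toy` is hypothetical), assuming DataFrames.jl 1.x:

```julia
using DataFrames

# Hypothetical stand-in table so the example runs on its own
toy = DataFrame(x = 1:10, y = (1:10) .^ 2)

top = first(toy, 6)    # first six rows, like head(df) in older releases
bottom = last(toy, 3)  # last three rows
```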

```julia
# Column names
names(df)
```

```
14-element Array{Symbol,1}:
 :Crim
 :Zn
 :Indus
 :Chas
 :NOx
 :Rm
 :Age
 :Dis
 :Rad
 :Tax
 :PTRatio
 :Black
 :LStat
 :MedV
```

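The `Array{Symbol,1}` output above comes from an older DataFrames.jl. In DataFrames.jl 1.x, `names` returns a `Vector{String}`, and `propertynames` gives the `Symbol` form. A minimal sketch assuming 1.x, reusing the first two `Crim` and `MedV` values from the Boston table:

```julia
using DataFrames

toy = DataFrame(Crim = [0.00632, 0.02731], MedV = [24.0, 21.6])

names(toy)          # ["Crim", "MedV"], a Vector{String}
propertynames(toy)  # [:Crim, :MedV], a Vector{Symbol}
```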
## Creating DataFrames

```julia
# Empty: no columns and no rows
# (this rebinds `df`, replacing the Boston data loaded above)
df = DataFrame()
```

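An empty DataFrame can be filled column by column. A minimal sketch, assuming DataFrames.jl 1.x, where assigning to a new property adds a column (`df0` is a hypothetical name chosen to avoid clobbering `df`):

```julia
using DataFrames

df0 = DataFrame()          # no columns, no rows: size(df0) == (0, 0)

# Assigning to a new property adds a column; the first column sets the row count
df0.A = [1, 2, 3]
df0.B = ["x", "y", "z"]    # later columns must have the same length
```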
```julia
# Empty, but with a named, typed column
df1 = DataFrame(A = Vector{Int}())
```

| Row | A |
|---|---|

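A typed but empty DataFrame like `df1` is usually grown row by row. A minimal sketch, assuming DataFrames.jl 1.x, where `push!` accepts a NamedTuple whose field names match the columns:

```julia
using DataFrames

df1 = DataFrame(A = Vector{Int}())  # zero rows, one Int column

# push! appends one row; a NamedTuple's fields are matched to columns by name
push!(df1, (A = 3,))
push!(df1, (A = 7,))

# The column keeps the element type it was declared with
eltype(df1.A)  # Int
```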
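```julia
# With some elements: 10 random integers from 1:10 in each column
df2 = DataFrame(A = rand(1:10, 10))
df3 = DataFrame(B = rand(1:10, 10), C = rand(1:10, 10))
```

Only the last expression, `df3`, is displayed:

| Row | B | C |
|---|---|---|
| 1 | 7 | 3 |
| 2 | 7 | 8 |
| 3 | 3 | 10 |
| 4 | 8 | 7 |
| 5 | 5 | 5 |
| 6 | 9 | 6 |
| 7 | 1 | 10 |
| 8 | 9 | 1 |
| 9 | 5 | 1 |
| 10 | 8 | 5 |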

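Keyword arguments are not the only way to build a DataFrame. A sketch of two other constructors, assuming DataFrames.jl 1.x: one from `name => vector` pairs and one from a matrix plus explicit column names. The values are the first three rows of `df3` above; `pairs_df`, `matrix_df` and `m` are hypothetical names.

```julia
using DataFrames

# From `name => vector` pairs
pairs_df = DataFrame(:B => [7, 7, 3], :C => [3, 8, 10])

# From a matrix, one column per matrix column, with explicit names
m = [7 3; 7 8; 3 10]
matrix_df = DataFrame(m, [:B, :C])

nrow(matrix_df), ncol(matrix_df)  # (3, 2)
```

Both calls produce equal tables; the matrix form is handy when the data already sits in a numeric array.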

## Appending

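```julia
# vcat stacks rows; `[df1; df2]` is shorthand for `vcat(df1, df2)`.
# df1 has no rows, so the result holds exactly df2's rows.
df1_2 = [df1; df2]
df1_2_vcat = vcat(df1, df2)
```

| Row | A |
|---|---|
| 1 | 9 |
| 2 | 9 |
| 3 | 1 |
| 4 | 9 |
| 5 | 6 |
| 6 | 8 |
| 7 | 6 |
| 8 | 3 |
| 9 | 4 |
| 10 | 4 |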

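`vcat` works here because `df1` and `df2` share the column `A`. A sketch of what happens when the column sets differ, assuming DataFrames.jl 1.x, where `vcat` takes a `cols` keyword (default `:setequal`); `df_a`, `df_b` and `stacked` are hypothetical names:

```julia
using DataFrames

df_a = DataFrame(A = [1, 2])
df_b = DataFrame(A = [3], B = ["x"])

# The default (cols = :setequal) demands identical column names, so this throws
mismatch_rejected = try
    vcat(df_a, df_b)
    false
catch err
    err isa ArgumentError
end

# cols = :union keeps every column and fills absent cells with `missing`
stacked = vcat(df_a, df_b; cols = :union)
```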

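```julia
# hcat places columns side by side; `[df2 df3]` is shorthand for `hcat(df2, df3)`.
# Both inputs must have the same number of rows.
df2_3 = [df2 df3]
df2_3_hcat = hcat(df2, df3)
```

| Row | A | B | C |
|---|---|---|---|
| 1 | 9 | 7 | 3 |
| 2 | 9 | 7 | 8 |
| 3 | 1 | 3 | 10 |
| 4 | 9 | 8 | 7 |
| 5 | 6 | 5 | 5 |
| 6 | 8 | 9 | 6 |
| 7 | 6 | 1 | 10 |
| 8 | 3 | 9 | 1 |
| 9 | 4 | 5 | 1 |
| 10 | 4 | 8 | 5 |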

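`hcat` succeeds above because `df2` and `df3` have no column names in common. A sketch of the clash case, assuming DataFrames.jl 1.x, where `hcat` accepts a `makeunique` keyword; `df_a`, `df_b` and `wide` are hypothetical names:

```julia
using DataFrames

df_a = DataFrame(A = [1, 2])
df_b = DataFrame(A = [3, 4])

# hcat(df_a, df_b) alone would throw, because both tables have a column :A.
# makeunique = true keeps both, renaming the second one to :A_1.
wide = hcat(df_a, df_b; makeunique = true)
```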


```julia
rand(1:10, 10)
```

```
10-element Array{Int64,1}:
 10
  5
  2
  4
  4
  5
  5
 10
  1
  2
```

```julia
?rand
```

```
search: rand randn rand! randn! randexp randperm randjump randexp! randcycle
```

```
rand([rng=GLOBAL_RNG], [S], [dims...])
```

Pick a random element or array of random elements from the set of values specified by `S`; `S` can be

  * an indexable collection (for example `1:n` or `['x','y','z']`), or
  * a type: the set of values to pick from is then equivalent to `typemin(S):typemax(S)` for integers (this is not applicable to `BigInt`), and to $[0, 1)$ for floating point numbers;

`S` defaults to `Float64`.

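The docstring's `rng` and `S` arguments can be seen together in a short sketch. It assumes Julia 1.x, where `MersenneTwister` comes from the `Random` standard library. Passing an explicitly seeded generator, as in `DataFrame(A = rand(rng, 1:10, 10))`, would also make random tables such as `df2` and `df3` reproducible.

```julia
using Random

rng = MersenneTwister(42)  # seeded generator: the same seed gives the same draws

picks  = rand(rng, 1:10, 10)           # S is a range: 10 draws from 1:10, with replacement
chars  = rand(rng, ['x', 'y', 'z'], 3) # S is any indexable collection
bytes  = rand(rng, UInt8, 4)           # S is a type: any value in typemin(UInt8):typemax(UInt8)
floats = rand(rng, 2, 2)               # no S given: Float64 values in [0, 1)

# A fresh generator with the same seed reproduces the first draw
same = rand(MersenneTwister(42), 1:10, 10)
```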