## File I/O: Reading from / writing to a data file

In [5]:
using Pkg
Pkg.add("DataFrames")
using DataFrames

[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `/opt/julia/environments/v1.0/Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `/opt/julia/environments/v1.0/Manifest.toml`
[90m [no changes][39m


In [7]:
?readdlm

search: [0m[1mr[22m[0m[1me[22m[0m[1ma[22m[0m[1md[22m[0m[1md[22m[0m[1ml[22m[0m[1mm[22m [0m[1mr[22m[0m[1me[22m[0m[1ma[22m[0m[1md[22m[0m[1md[22mir



```
readdlm(source, T::Type; options...)
```

The columns are assumed to be separated by one or more whitespaces. The end of line delimiter is taken as `\n`.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = [5; 6; 7; 8];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y])
       end;

julia> readdlm("delim_file.txt", Int64)
4×2 Array{Int64,2}:
 1  5
 2  6
 3  7
 4  8

julia> readdlm("delim_file.txt", Float64)
4×2 Array{Float64,2}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

julia> rm("delim_file.txt")
```

---

```
readdlm(source, delim::AbstractChar, T::Type; options...)
```

The end of line delimiter is taken as `\n`.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = [1.1; 2.2; 3.3; 4.4];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y], ',')
       end;

julia> readdlm("delim_file.txt", ',', Float64)
4×2 Array{Float64,2}:
 1.0  1.1
 2.0  2.2
 3.0  3.3
 4.0  4.4

julia> rm("delim_file.txt")
```

---

```
readdlm(source; options...)
```

The columns are assumed to be separated by one or more whitespaces. The end of line delimiter is taken as `\n`. If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a heterogeneous array of numbers and strings is returned.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = ["a"; "b"; "c"; "d"];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y])
       end;

julia> readdlm("delim_file.txt")
4×2 Array{Any,2}:
 1  "a"
 2  "b"
 3  "c"
 4  "d"

julia> rm("delim_file.txt")
```

---

```
readdlm(source, delim::AbstractChar; options...)
```

The end of line delimiter is taken as `\n`. If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a heterogeneous array of numbers and strings is returned.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = [1.1; 2.2; 3.3; 4.4];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y], ',')
       end;

julia> readdlm("delim_file.txt", ',')
4×2 Array{Float64,2}:
 1.0  1.1
 2.0  2.2
 3.0  3.3
 4.0  4.4

julia> rm("delim_file.txt")

julia> z = ["a"; "b"; "c"; "d"];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x z], ',')
       end;

julia> readdlm("delim_file.txt", ',')
4×2 Array{Any,2}:
 1  "a"
 2  "b"
 3  "c"
 4  "d"

julia> rm("delim_file.txt")
```

---

```
readdlm(source, delim::AbstractChar, eol::AbstractChar; options...)
```

If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a heterogeneous array of numbers and strings is returned.

---

```
readdlm(source, delim::AbstractChar, T::Type, eol::AbstractChar; header=false, skipstart=0, skipblanks=true, use_mmap, quotes=true, dims, comments=false, comment_char='#')
```

Read a matrix from the source where each line (separated by `eol`) gives one row, with elements separated by the given delimiter. The source can be a text file, stream or byte array. Memory mapped files can be used by passing the byte array representation of the mapped segment as source.

If `T` is a numeric type, the result is an array of that type, with any non-numeric elements as `NaN` for floating-point types, or zero. Other useful values of `T` include `String`, `AbstractString`, and `Any`.

If `header` is `true`, the first row of data will be read as header and the tuple `(data_cells, header_cells)` is returned instead of only `data_cells`.

Specifying `skipstart` will ignore the corresponding number of initial lines from the input.

If `skipblanks` is `true`, blank lines in the input will be ignored.

If `use_mmap` is `true`, the file specified by `source` is memory mapped for potential speedups. Default is `true` except on Windows. On Windows, you may want to specify `true` if the file is large, and is only read once and not written to.

If `quotes` is `true`, columns enclosed within double-quote (") characters are allowed to contain new lines and column delimiters. Double-quote characters within a quoted field must be escaped with another double-quote.  Specifying `dims` as a tuple of the expected rows and columns (including header, if any) may speed up reading of large files.  If `comments` is `true`, lines beginning with `comment_char` and text following `comment_char` in any line are ignored.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = [5; 6; 7; 8];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y])
       end

julia> readdlm("delim_file.txt", '\t', Int, '\n')
4×2 Array{Int64,2}:
 1  5
 2  6
 3  7
 4  8
```
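
The `skipstart` and `comments` options described in the help text can be combined. A minimal sketch, assuming a hypothetical comma-delimited file with one free-text line at the top followed by a `#` comment line (file name and contents are made up for illustration):

```julia
using DelimitedFiles

# Hypothetical input: one preamble line, one comment line, two data rows.
path = joinpath(mktempdir(), "with_comments.csv")
write(path, "exported by some tool\n# values in cm\n1,2\n3,4\n")

# skipstart=1 drops the preamble line; comments=true drops lines starting with '#'.
m = readdlm(path, ',', Int; skipstart=1, comments=true)
```

Here `m` should come back as a 2×2 integer matrix holding only the two data rows.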


In [9]:
using DelimitedFiles
myData = readdlm("exampleData.csv", ';', Any, '\r', header=true)

(Any["A" 14.51 … "CH" 2017; "B" 24.96 … "D" 2016; … ; "D" 34.65 … "I" 2017; "E" 15.49 … "USA" 2018], AbstractString["line" "trait1" … "location" "year"])

In [11]:
myData[1]

5×6 Array{Any,2}:
 "A"  14.51  164.26  54.92  "CH"   2017
 "B"  24.96  554.82  75.18  "D"    2016
 "C"  24.39   94.43  94.8   "F"    2015
 "D"  34.65  915.16  45.62  "I"    2017
 "E"  15.49  725.89  26.24  "USA"  2018

In [13]:
myData[2]

1×6 Array{AbstractString,2}:
 "line"  "trait1"  "trait2"  "trait3"  "location"  "year"
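
Because `header=true` makes `readdlm` return a tuple, the two parts can be unpacked directly instead of indexing with `[1]` and `[2]`. A small sketch, assuming a made-up three-column stand-in for `exampleData.csv` that uses `\n` line endings (the notebook's file evidently uses `\r`):

```julia
using DelimitedFiles

# Hypothetical stand-in for exampleData.csv: ';'-delimited with a header row.
path = joinpath(mktempdir(), "mini.csv")
write(path, "line;trait1;year\nA;14.51;2017\nB;24.96;2016\n")

# With header=true, readdlm returns (data_cells, header_cells).
data, header = readdlm(path, ';'; header=true)

# Look up a column by its header name instead of a hard-coded position.
col = findfirst(==("trait1"), vec(header))
trait1 = data[:, col]
```

Looking the column up by name keeps the code working if the column order in the file changes.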

In [15]:
output = open("exampleData.reformatted", "w")

SystemError: SystemError: opening file exampleData.reformatted: Operation not permitted

In [17]:
writedlm(output, [myData[2]; myData[1]])
close(output)

UndefVarError: UndefVarError: output not defined

In [19]:
output = open("exampleData.reformatted", "a") 
writedlm(output, myData[1])
close(output)

SystemError: SystemError: opening file exampleData.reformatted: Operation not permitted
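
The `SystemError` suggests that the notebook's working directory is not writable, and the `UndefVarError` in between follows because `output` was never assigned. A minimal sketch of the same write-then-append sequence, assuming a writable temporary directory from `mktempdir()` and small made-up stand-ins for `myData[2]` and `myData[1]`. The `do`-block form of `open` closes the file even if writing throws:

```julia
using DelimitedFiles

# Made-up stand-ins for myData[2] (header) and myData[1] (data).
header = ["line" "trait1"]
data = Any["A" 14.51; "B" 24.96]

# Assumption: a temporary directory is writable where the working directory is not.
outfile = joinpath(mktempdir(), "exampleData.reformatted")

# "w" creates or truncates the file; header and data are stacked into one matrix.
open(outfile, "w") do io
    writedlm(io, [header; data])
end

# "a" appends to the existing file.
open(outfile, "a") do io
    writedlm(io, data)
end

lines = readlines(outfile)
```

`writedlm` uses a tab as its default delimiter, so each entry of `lines` is a tab-separated row.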

In [21]:
Pkg.add("CSV")
using CSV

[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `/opt/julia/environments/v1.0/Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `/opt/julia/environments/v1.0/Manifest.toml`
[90m [no changes][39m


In [23]:
myCSVdata = CSV.read("exampleData.csv", delim=';',header=true)

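│ Row │ line    │ trait1   │ trait2   │ trait3   │ location │ year   │
│     │ String⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ String⍰  │ Int64⍰ │
├─────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────┤
│ 1   │ A       │ 14.51    │ 164.26   │ 54.92    │ CH       │ 2017   │
│ 2   │ B       │ 24.96    │ 554.82   │ 75.18    │ D        │ 2016   │
│ 3   │ C       │ 24.39    │ 94.43    │ 94.8     │ F        │ 2015   │
│ 4   │ D       │ 34.65    │ 915.16   │ 45.62    │ I        │ 2017   │
│ 5   │ E       │ 15.49    │ 725.89   │ 26.24    │ USA      │ 2018   │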


In [25]:
typeof(myCSVdata)

DataFrame

In [27]:
CSV.write("exampleData.txtReformatted",myCSVdata, delim='\t')

SystemError: SystemError: opening file exampleData.txtReformatted: Operation not permitted

In [29]:
myDF = readtable("exampleData.csv", separator=';', header=true)

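│ Row │ line    │ trait1   │ trait2   │ trait3   │ location │ year   │
│     │ String⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ String⍰  │ Int64⍰ │
├─────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────┤
│ 1   │ A       │ 14.51    │ 164.26   │ 54.92    │ CH       │ 2017   │
│ 2   │ B       │ 24.96    │ 554.82   │ 75.18    │ D        │ 2016   │
│ 3   │ C       │ 24.39    │ 94.43    │ 94.8     │ F        │ 2015   │
│ 4   │ D       │ 34.65    │ 915.16   │ 45.62    │ I        │ 2017   │
│ 5   │ E       │ 15.49    │ 725.89   │ 26.24    │ USA      │ 2018   │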
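
`readtable` has since been removed from DataFrames.jl, so this cell only runs on old installations like the one recorded here. A sketch of an equivalent round trip, assuming recent CSV.jl and DataFrames.jl releases are installed (there `CSV.read` takes the sink type as its second argument) and using a made-up miniature of `exampleData.csv` in a temporary directory:

```julia
using CSV, DataFrames

# Made-up miniature of exampleData.csv, written to a writable temp directory.
path = joinpath(mktempdir(), "mini.csv")
write(path, "line;trait1;year\nA;14.51;2017\nB;24.96;2016\nC;24.39;2015\n")

# Recent CSV.jl: pass DataFrame as the sink to get a DataFrame back.
df = CSV.read(path, DataFrame; delim=';')

# Recent DataFrames.jl: df.year (or df[!, :year]) replaces the older df[:year].
recent = df[df.year .> 2015, :]

# Write the filtered rows back out as a tab-delimited file.
out = joinpath(dirname(path), "mini.tsv")
CSV.write(out, recent; delim='\t')
```

Which of these calls are available depends on the installed package versions, so check the documentation matching your environment.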


In [31]:
?CSV.write

```
CSV.write(file::Union{String, IO}, file; kwargs...) => file
table |> CSV.write(file::Union{String, IO}; kwargs...) => file
```

Write a [Tables.jl interface input](https://github.com/JuliaData/Tables.jl) to a csv file, given as an `IO` argument or String representing the file name to write to.

Keyword arguments include:

  * `delim::Union{Char, String}=','`: a character or string to print out as the file's delimiter
  * `quotechar::Char='"'`: character to use for quoting text fields that may contain delimiters or newlines
  * `openquotechar::Char`: instead of `quotechar`, use `openquotechar` and `closequotechar` to support different starting and ending quote characters
  * `escapechar::Char='\'`: character used to escape quote characters in a text field
  * `missingstring::String=""`: string to print
  * `dateformat=Dates.default_format(T)`: the date format string to use for printing out Date & DateTime columns
  * `append=false`: whether to append writing to an existing file/IO, if `true`, it will not write column names by default
  * `writeheader=!append`: whether to write an initial row of delimited column names, not written by default if appending
  * `header`: pass a list of column names (Symbols or Strings) to use instead of the column names of the input table


### Summary

- `readdlm(source, delim::AbstractChar, T::Type, eol::AbstractChar; header=false, skipstart=0, skipblanks=true, use_mmap, quotes=true, dims, comments=false, comment_char='#')`
- `writedlm(f, A, delim='\t'; opts)`
- `CSV.read(fullpath::Union{AbstractString,IO}, sink::Type{T}=DataFrame, args...; kwargs...)`
- `CSV.write(file_or_io::Union{AbstractString,IO}, source::Type{T}, args...; kwargs...)`
- `readtable(filename, [keyword options])`

https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/index.html

http://juliadata.github.io/CSV.jl/v0.1.1/



## DataFrames

In [33]:
names(myDF)

6-element Array{Symbol,1}:
 :line    
 :trait1  
 :trait2  
 :trait3  
 :location
 :year    

In [35]:
typeof(:line)

Symbol
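
Column names in this DataFrames version are `Symbol`s rather than strings. A short sketch of converting between the two, using only Base functions (the variable names are illustrative):

```julia
# A Symbol literal and one constructed from a string are the same object.
name = :trait1
from_string = Symbol("trait1")
same = (name === from_string)

# Convert back to a String, e.g. for printing or string matching.
back = String(name)

# Broadcast the constructor to turn a list of strings into column names.
cols = Symbol.(["line", "year"])
```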

In [37]:
head(myDF)

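│ Row │ line    │ trait1   │ trait2   │ trait3   │ location │ year   │
│     │ String⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ String⍰  │ Int64⍰ │
├─────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────┤
│ 1   │ A       │ 14.51    │ 164.26   │ 54.92    │ CH       │ 2017   │
│ 2   │ B       │ 24.96    │ 554.82   │ 75.18    │ D        │ 2016   │
│ 3   │ C       │ 24.39    │ 94.43    │ 94.8     │ F        │ 2015   │
│ 4   │ D       │ 34.65    │ 915.16   │ 45.62    │ I        │ 2017   │
│ 5   │ E       │ 15.49    │ 725.89   │ 26.24    │ USA      │ 2018   │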


In [39]:
size(myDF)

(5, 6)

In [41]:
describe(myDF)

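│ Row │ variable │ mean    │ min   │ median │ max    │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union…  │ Any   │ Union… │ Any    │ Union…  │ Int64    │ DataType │
├─────┼──────────┼─────────┼───────┼────────┼────────┼─────────┼──────────┼──────────┤
│ 1   │ line     │         │ A     │        │ E      │ 5.0     │ 0        │ String   │
│ 2   │ trait1   │ 22.8    │ 14.51 │ 24.39  │ 34.65  │         │ 0        │ Float64  │
│ 3   │ trait2   │ 490.912 │ 94.43 │ 554.82 │ 915.16 │         │ 0        │ Float64  │
│ 4   │ trait3   │ 59.352  │ 26.24 │ 54.92  │ 94.8   │         │ 0        │ Float64  │
│ 5   │ location │         │ CH    │        │ USA    │ 5.0     │ 0        │ String   │
│ 6   │ year     │ 2016.6  │ 2015  │ 2017.0 │ 2018   │         │ 0        │ Int64    │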


In [43]:
myDF[1:3,:trait1]

3-element Array{Union{Missing, Float64},1}:
 14.51
 24.96
 24.39

In [45]:
myDF[[1, 2, 4],[:line,:trait1]]

│ Row │ line    │ trait1   │
│     │ String⍰ │ Float64⍰ │
├─────┼─────────┼──────────┤
│ 1   │ A       │ 14.51    │
│ 2   │ B       │ 24.96    │
│ 3   │ D       │ 34.65    │


In [47]:
myDF[:year]

5-element Array{Union{Missing, Int64},1}:
 2017
 2016
 2015
 2017
 2018

In [49]:
myDF[:year] .> 2015

5-element BitArray{1}:
  true
  true
 false
  true
  true

In [51]:
myDF[myDF[:year] .> 2015,:]

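│ Row │ line    │ trait1   │ trait2   │ trait3   │ location │ year   │
│     │ String⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ String⍰  │ Int64⍰ │
├─────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────┤
│ 1   │ A       │ 14.51    │ 164.26   │ 54.92    │ CH       │ 2017   │
│ 2   │ B       │ 24.96    │ 554.82   │ 75.18    │ D        │ 2016   │
│ 3   │ D       │ 34.65    │ 915.16   │ 45.62    │ I        │ 2017   │
│ 4   │ E       │ 15.49    │ 725.89   │ 26.24    │ USA      │ 2018   │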
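
The row filter works in two steps: the broadcast comparison builds a Boolean mask, and indexing with that mask keeps the positions where it is `true`. The same mechanism applies to plain arrays, sketched here with the `year` and `line` values copied from the table (the variable names are illustrative):

```julia
# Values copied from the year and line columns shown in this notebook.
year = [2017, 2016, 2015, 2017, 2018]
line = ["A", "B", "C", "D", "E"]

# Step 1: element-wise comparison yields a Boolean mask.
mask = year .> 2015

# Step 2: logical indexing keeps the entries where the mask is true.
selected = line[mask]

# count() gives the number of rows that pass the filter.
n_selected = count(mask)
```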


In [53]:
show(myDF)

5×6 DataFrame
│ Row │ line    │ trait1   │ trait2   │ trait3   │ location │ year   │
│     │ String⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ String⍰  │ Int64⍰ │
├─────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────┤
│ 1   │ A       │ 14.51    │ 164.26   │ 54.92    │ CH       │ 2017   │
│ 2   │ B       │ 24.96    │ 554.82   │ 75.18    │ D        │ 2016   │
│ 3   │ C       │ 24.39    │ 94.43    │ 94.8     │ F        │ 2015   │
│ 4   │ D       │ 34.65    │ 915.16   │ 45.62    │ I        │ 2017   │
│ 5   │ E       │ 15.49    │ 725.89   │ 26.24    │ USA      │ 2018   │

In [55]:
colwise(typeof,myDF)

6-element Array{DataType,1}:
 Array{Union{Missing, String},1} 
 Array{Union{Missing, Float64},1}
 Array{Union{Missing, Float64},1}
 Array{Union{Missing, Float64},1}
 Array{Union{Missing, String},1} 
 Array{Union{Missing, Int64},1}  

In [57]:
size(myDF)

(5, 6)

In [58]:
typeof(myDF)

DataFrame

In [59]:
myDF2 = DataFrame([myDF[:line] myDF[:trait1]*2 myDF[:trait2]*2.5 myDF[:trait3]*2.6 myDF[:location] myDF[:year]], names(myDF))

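│ Row │ line │ trait1 │ trait2  │ trait3  │ location │ year │
│     │ Any  │ Any    │ Any     │ Any     │ Any      │ Any  │
├─────┼──────┼────────┼─────────┼─────────┼──────────┼──────┤
│ 1   │ A    │ 29.02  │ 410.65  │ 142.792 │ CH       │ 2017 │
│ 2   │ B    │ 49.92  │ 1387.05 │ 195.468 │ D        │ 2016 │
│ 3   │ C    │ 48.78  │ 236.075 │ 246.48  │ F        │ 2015 │
│ 4   │ D    │ 69.3   │ 2287.9  │ 118.612 │ I        │ 2017 │
│ 5   │ E    │ 30.98  │ 1814.72 │ 68.224  │ USA      │ 2018 │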
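
Every column of `myDF2` is typed `Any`, most likely because the columns were first glued into one matrix: horizontally concatenating string and numeric vectors has no common concrete element type, so Julia falls back to `Any`. A sketch with plain vectors, using two values copied from the table (variable names are illustrative):

```julia
# Two short columns with different element types.
line = ["A", "B"]
trait1 = [14.51, 24.96]

# Same operation as the literal [line trait1*2]: one matrix for both columns.
m = hcat(line, trait1 .* 2)
matrix_eltype = eltype(m)

# Kept separate, the scaled column retains its concrete type.
scaled = trait1 .* 2
vector_eltype = eltype(scaled)
```

Here `matrix_eltype` is `Any` while `vector_eltype` is `Float64`, so building a DataFrame from separate column vectors rather than one mixed matrix would preserve the column types.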


In [60]:
vcat(myDF, myDF2)

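│ Row │ line │ trait1 │ trait2  │ trait3  │ location │ year │
│     │ Any  │ Any    │ Any     │ Any     │ Any      │ Any  │
├─────┼──────┼────────┼─────────┼─────────┼──────────┼──────┤
│ 1   │ A    │ 14.51  │ 164.26  │ 54.92   │ CH       │ 2017 │
│ 2   │ B    │ 24.96  │ 554.82  │ 75.18   │ D        │ 2016 │
│ 3   │ C    │ 24.39  │ 94.43   │ 94.8    │ F        │ 2015 │
│ 4   │ D    │ 34.65  │ 915.16  │ 45.62   │ I        │ 2017 │
│ 5   │ E    │ 15.49  │ 725.89  │ 26.24   │ USA      │ 2018 │
│ 6   │ A    │ 29.02  │ 410.65  │ 142.792 │ CH       │ 2017 │
│ 7   │ B    │ 49.92  │ 1387.05 │ 195.468 │ D        │ 2016 │
│ 8   │ C    │ 48.78  │ 236.075 │ 246.48  │ F        │ 2015 │
│ 9   │ D    │ 69.3   │ 2287.9  │ 118.612 │ I        │ 2017 │
│ 10  │ E    │ 30.98  │ 1814.72 │ 68.224  │ USA      │ 2018 │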


In [61]:
[myDF
myDF2]

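│ Row │ line │ trait1 │ trait2  │ trait3  │ location │ year │
│     │ Any  │ Any    │ Any     │ Any     │ Any      │ Any  │
├─────┼──────┼────────┼─────────┼─────────┼──────────┼──────┤
│ 1   │ A    │ 14.51  │ 164.26  │ 54.92   │ CH       │ 2017 │
│ 2   │ B    │ 24.96  │ 554.82  │ 75.18   │ D        │ 2016 │
│ 3   │ C    │ 24.39  │ 94.43   │ 94.8    │ F        │ 2015 │
│ 4   │ D    │ 34.65  │ 915.16  │ 45.62   │ I        │ 2017 │
│ 5   │ E    │ 15.49  │ 725.89  │ 26.24   │ USA      │ 2018 │
│ 6   │ A    │ 29.02  │ 410.65  │ 142.792 │ CH       │ 2017 │
│ 7   │ B    │ 49.92  │ 1387.05 │ 195.468 │ D        │ 2016 │
│ 8   │ C    │ 48.78  │ 236.075 │ 246.48  │ F        │ 2015 │
│ 9   │ D    │ 69.3   │ 2287.9  │ 118.612 │ I        │ 2017 │
│ 10  │ E    │ 30.98  │ 1814.72 │ 68.224  │ USA      │ 2018 │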
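
The two cells above give the same result because a line break (or `;`) inside square brackets is Julia's syntax for vertical concatenation. A minimal sketch with plain matrices standing in for the two data frames (values are made up):

```julia
# Two small matrices with the same number of columns.
a = [1 2; 3 4]
b = [5 6]

# Explicit function call ...
stacked = vcat(a, b)

# ... and the bracket form, where ';' plays the role of the line break.
same = (stacked == [a; b])
```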


### Summary
- accessing the header / column names of a DataFrame
- concatenating data
- selecting elements, rows, and columns
- http://juliadata.github.io/DataFrames.jl/v0.9.1/
 