Import the main packages used for handling data.

In [5]:
using BenchmarkTools 
using DataFrames 
using DelimitedFiles 
using CSV
using XLSX

┌ Info: Precompiling DataFrames [a93c6f00-e57d-5684-b7b6-d8193f3e46c0]
└ @ Base loading.jl:1260
┌ Info: Precompiling CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b]
└ @ Base loading.jl:1260
┌ Info: Precompiling XLSX [fdbf4ff8-1666-58a4-91e7-1b58723a45e0]
└ @ Base loading.jl:1260


### Download files

Download a CSV file. Julia provides the *download* function, which relies on one of the external commands *wget*, *curl* or *fetch*.

In [12]:
P = download("https://raw.githubusercontent.com/nassarhuda/easy_data/master/programming_languages.csv",
    "programminglanguages.csv")

"programminglanguages.csv"

As we can see, the downloaded file is a simple CSV file with two columns.

In [14]:
;head "programminglanguages.csv"

year,language
1951,Regional Assembly Language
1952,Autocode
1954,IPL
1955,FLOW-MATIC
1957,FORTRAN
1957,COMTRAN
1958,LISP
1958,ALGOL 58
1959,FACT


### Read files

#### DelimitedFiles package

There are two main ways of reading delimited files in Julia. The first option is the *DelimitedFiles* package, which provides the function **readdlm**. However, this package is only worth using when the file to be read is really complicated, which is not the case here.

In [19]:
B, H = readdlm(P, ','; header=true)

(Any[1951 "Regional Assembly Language"; 1952 "Autocode"; … ; 2012 "Julia"; 2014 "Swift"], AbstractString["year" "language"])

The file's path (P) and the delimiter (',') have been provided. The *header* option tells the function to return the data's body and header separately.

In [30]:
H

1×2 Array{AbstractString,2}:
 "year"  "language"

In [29]:
B[1:5, :]

5×2 Array{Any,2}:
 1951  "Regional Assembly Language"
 1952  "Autocode"
 1954  "IPL"
 1955  "FLOW-MATIC"
 1957  "FORTRAN"
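
To make the return value concrete, here is a minimal sketch that runs *readdlm* on a small in-memory buffer instead of the downloaded file. The sample text and the names `sample`, `body` and `header` are illustrative assumptions, not part of the original notebook.

```julia
using DelimitedFiles

# Illustrative sample with the same layout as the downloaded file.
sample = "year,language\n1957,FORTRAN\n1958,LISP\n"

# header=true makes readdlm return a (data, header) tuple.
body, header = readdlm(IOBuffer(sample), ','; header=true)

size(body)     # (2, 2): two data rows, two columns
size(header)   # (1, 2): the header is a one-row matrix
```

Because the two columns hold different kinds of values, the body comes back as a matrix with element type Any, as in the output above.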

The *DelimitedFiles* package also provides a function, *writedlm*, to write a text file:

In [34]:
writedlm("programminglanguages_dlm.txt", B, '-') # write the data matrix B (P is only the file path)
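
As a hedged illustration of what *writedlm* produces, the sketch below writes a small matrix to a temporary file and reads the raw text back. The matrix `rows` and the temporary path are assumptions made for the example.

```julia
using DelimitedFiles

# Illustrative data with the same two-column layout as the notebook's matrix.
rows = Any[1957 "FORTRAN"; 1958 "LISP"]

path = tempname()            # temporary file, so the working directory is untouched
writedlm(path, rows, '-')    # one line per row, fields joined by '-'

text = read(path, String)
print(text)
```

A value such as FLOW-MATIC already contains a hyphen, so a file written with '-' as the delimiter may not split back into the same two columns when it is read again. A character that never occurs in the data, such as a tab, is a safer choice.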

#### CSV package

Using the *CSV* package has the following advantages over the *DelimitedFiles* package: 

- It returns the data as a DataFrame, with named and typed columns.
- It is faster.

In [36]:
C = CSV.read(P);

In [37]:
typeof(C) # CSV.read returns a DataFrame

DataFrame
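
The call above relies on the CSV.jl version used in this notebook. In more recent CSV.jl releases, *CSV.read* expects the target type as a second argument. A minimal sketch, assuming a recent CSV.jl and using an in-memory sample rather than the downloaded file (the names `sample`, `df` and `df2` are illustrative):

```julia
using CSV, DataFrames

sample = "year,language\n1957,FORTRAN\n1958,LISP\n"   # illustrative data

# The second argument (the "sink") says what to materialize the file into.
df = CSV.read(IOBuffer(sample), DataFrame)

# Equivalent two-step form: parse first, then convert.
df2 = DataFrame(CSV.File(IOBuffer(sample)))
```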

In [39]:
C[1:5, :]

| Row | year (Int64) | language (String) |
|-----|--------------|-------------------|
| 1 | 1951 | Regional Assembly Language |
| 2 | 1952 | Autocode |
| 3 | 1954 | IPL |
| 4 | 1955 | FLOW-MATIC |
| 5 | 1957 | FORTRAN |


A DataFrame can be indexed in several ways. For example, a single column can be selected by name or accessed as a property.

In [46]:
C[1:5, :year]

5-element Array{Int64,1}:
 1951
 1952
 1954
 1955
 1957

In [48]:
C.year[1:5]

5-element Array{Int64,1}:
 1951
 1952
 1954
 1955
 1957
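
The different access forms are not all interchangeable: in DataFrames.jl, `df.col` and `df[!, :col]` return the stored column itself, while `df[:, :col]` returns a copy. A small sketch on a hand-built DataFrame (the data and the name `df` are illustrative assumptions):

```julia
using DataFrames

df = DataFrame(year = [1957, 1958, 1959], language = ["FORTRAN", "LISP", "FACT"])

col_ref  = df.year          # the stored column, no copy
col_same = df[!, :year]     # same object as df.year
col_copy = df[:, :year]     # a fresh copy of the column

col_copy[1] = 0             # modifies only the copy
df.year[1]                  # the DataFrame itself is unchanged
```

The copying form is the safer default when the result will be modified; the non-copying forms avoid an allocation when the column is only read.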

To see which columns the DataFrame contains, we can use the *names* function.

In [49]:
names(C)

2-element Array{String,1}:
 "year"
 "language"
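
Beyond the column names, it is often useful to check the shape and the element types as well. A minimal sketch on an illustrative DataFrame (not the notebook's C):

```julia
using DataFrames

df = DataFrame(year = [1957, 1958], language = ["FORTRAN", "LISP"])

names(df)             # column names as strings
propertynames(df)     # column names as symbols
size(df)              # (rows, columns)
eltype.(eachcol(df))  # element type of each column
```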

With the *@btime* macro from *BenchmarkTools* we can check that, for this file, the *CSV* package is faster than the *DelimitedFiles* package.

In [58]:
@btime B, H  = readdlm(P, ','; header=true);
@btime C = CSV.read(P);

  77.572 μs (326 allocations: 51.09 KiB)
  49.440 μs (173 allocations: 18.88 KiB)
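
When a benchmark uses global variables, the BenchmarkTools documentation recommends interpolating them with `$` so that the cost of accessing an untyped global is not part of the measurement. Below is a hedged sketch of the same comparison on a small temporary file. It assumes a recent CSV.jl (where *CSV.read* takes `DataFrame` as its second argument), and the sample data and variable names are illustrative.

```julia
using BenchmarkTools, CSV, DataFrames, DelimitedFiles

# Write a tiny illustrative CSV file to a temporary location.
path = tempname()
write(path, "year,language\n1957,FORTRAN\n1958,LISP\n")

# @belapsed returns the minimum elapsed time in seconds;
# $path interpolates the global into the benchmarked expression.
t_dlm = @belapsed readdlm($path, ','; header=true)
t_csv = @belapsed CSV.read($path, DataFrame)

println("readdlm:  ", t_dlm, " s")
println("CSV.read: ", t_csv, " s")
```

The timings depend on the machine, the package versions and the file size, so the ordering measured on one small file may not carry over to other inputs.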


To write a DataFrame to a .csv file, the CSV package provides the *CSV.write* function:

In [60]:
CSV.write("programminglanguages_csv.csv", DataFrame(B))

"programminglanguages_csv.csv"
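
Note that `DataFrame(B)` builds the table from a bare matrix, so the original header is not carried over. Here is a sketch of one way to keep the column names, assuming a recent DataFrames.jl that accepts a matrix plus a vector of names. The data, the names `body` and `cols`, and the output path are illustrative.

```julia
using CSV, DataFrames

body = Any[1957 "FORTRAN"; 1958 "LISP"]    # stands in for B
cols = ["year", "language"]                # stands in for vec(H)

df = DataFrame(body, cols)                 # name the columns explicitly

out = tempname()                           # temporary output file
CSV.write(out, df)                         # the header row comes from the column names
print(read(out, String))
```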

#### XLSX package