# Loading and Saving Data

This notebook shows how to save data to, and load data from, csv, hdf5, jld2, mat and xlsx files.

# Load Packages

The packages are loaded in the respective sections below. This allows you to run parts of this notebook without having to install all packages.

The data files created by this notebook are written to and loaded from the subfolder "Results".

In [1]:
using Dates

include("printmat.jl")   #a function for prettier matrix printing

if !isdir("Results")
    error("create the subfolder Results before running this program")
end
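
As an alternative to stopping with an error, the subfolder could be created when it is missing. The following is only a sketch: `ResultsDir` is a hypothetical name, and a temporary directory is used here so that running the snippet does not touch your working directory.

```julia
# Sketch: create the output folder if it does not exist yet.
# ResultsDir is a hypothetical name; the temporary parent folder is an assumption.
ResultsDir = joinpath(mktempdir(),"Results")

if !isdir(ResultsDir)
    mkpath(ResultsDir)         #creates the folder (and any missing parent folders)
end

println("Results folder exists: ",isdir(ResultsDir))
```

In the notebook itself, `mkpath("Results")` would have the same effect for the subfolder used below.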

# Saving and Loading a csv File

The csv ("comma-separated values") format provides a simple and robust method for moving data, and it can be read by most software.

The basic commands of the package DelimitedFiles are
```
writedlm(FileName,matrix)
x = readdlm(FileName)
```

Extra arguments control the delimiter (for instance, `','`) in the file, the type of data (Float64, Int, etc.), and whether the file starts with header lines.

For instance, if you want to specify that the delimiter is a comma (,) and also disregard the first 3 lines (perhaps because they contain variable names etc.), then use

```
x = readdlm(FileName,',',skipstart=3)
```
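
To illustrate the header handling, here is a small sketch that writes a csv file with one header line and reads it back in two ways: skipping the header with `skipstart=1`, or returning it separately with `header=true`. The file name, the variable names and the numbers are made up for this illustration, and the file is written to a temporary directory.

```julia
using DelimitedFiles

FileName = joinpath(mktempdir(),"WithHeader.csv")   #hypothetical file, temporary folder
x        = [1.5 2.5; 3.5 4.5]                       #made-up data

open(FileName,"w") do io
    writedlm(io,["x1" "x2"],',')         #header line with variable names
    writedlm(io,x,',')                   #the data
end

xA = readdlm(FileName,',',Float64,skipstart=1)      #skip the header line
(xB,hdr) = readdlm(FileName,',',header=true)        #data and header separately

println("header: ",hdr)
println("same data both ways: ",xA == xB)
```

With `header=true`, `readdlm` returns a tuple: the data and a one-row matrix of header strings.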

If you need more powerful write/read routines, try https://github.com/JuliaData/CSV.jl.

In [2]:
using DelimitedFiles     

A = copy(reshape(1:20,5,4))     #to be on the safe side: only try to save
                                #"independent" arrays, not reshapes or views 

writedlm("Results/NewCsvFile.csv",A,',')  #write csv file
println("NewCsvFile.csv has been created in the subfolder Results. Check it out.")

NewCsvFile.csv has been created in the subfolder Results. Check it out.


In [3]:
A2 = readdlm("Results/NewCsvFile.csv",',',Int)  #read csv file, try Float64 instead of Int

println("\nA (in memory):")
printmat(A)
println("\nA2 (read from csv file):")
printmat(A2)


A (in memory):
         1         6        11        16
         2         7        12        17
         3         8        13        18
         4         9        14        19
         5        10        15        20


A2 (read from csv file):
         1         6        11        16
         2         7        12        17
         3         8        13        18
         4         9        14        19
         5        10        15        20



# (extra) Loading csv and Fixing Missing Values

The next cell shows how to load a csv file with dates (15/01/1979) and some missing values.

In [4]:
x2 = readdlm("Data/DataWithDates.csv",',',skipstart=1)
dN = Date.(x2[:,1],"d/m/y")                   #convert 1st column to Date
x  = x2[:,2:end]                              #the data, but Any[] since missing data

println("dates and data (first 4 obs)")
printmat(dN[1:4])
printmat(x[1:4,:])

vv     = .!isa.(x,Number)               #locate cells that are not numbers,
x[vv] .= NaN                            #then set them to NaN (or perhaps missing), and 
x      = convert(Array{Float64},x)      #convert the matrix from Any to Float64

println("data after fix (first 4 obs)")
printmat(x[1:4,:])

dates and data (first 4 obs)
1979-01-02
1979-01-03
1979-01-04
1979-01-05

    96.730          
               9.310
    98.580     9.310
    99.130     9.340

data after fix (first 4 obs)
    96.730       NaN
       NaN     9.310
    98.580     9.310
    99.130     9.340
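
The comment in the cell above mentions `missing` as an alternative to `NaN`. A minimal sketch of that variant follows. It does not use the data file, but a short made-up text in what is assumed to be the same layout (a header line, dates in d/m/y format, empty cells for missing values).

```julia
using DelimitedFiles, Dates

#made-up data, assumed to mimic the layout of the csv file used above
txt = "date,x1,x2\n02/01/1979,96.73,\n03/01/1979,,9.31\n04/01/1979,98.58,9.31\n"

x2 = readdlm(IOBuffer(txt),',',skipstart=1)
dN = Date.(x2[:,1],"d/m/y")                   #convert 1st column to Date
x  = x2[:,2:end]                              #Any[], since some cells are empty

xm = [v isa Number ? Float64(v) : missing for v in x]   #non-numbers become missing

println("element type: ",eltype(xm))
println("sum of column 1, skipping missing values: ",sum(skipmissing(xm[:,1])))
```

Functions like `sum` return `missing` if any element is missing, so `skipmissing` is used to drop those elements first.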



# Saving and Loading an hdf5 File

hdf5 files are used in many computer languages. They can store different types of data: integers, floats, strings (but not Julia dates). 

The basic syntax of the package HDF5 is 
```
fh = h5open(FileName,"w")
    write(fh,"x",x,"y",y)
close(fh)

fh = h5open(FileName,"r")   #open for reading
    (x,y) = read(fh,"x","y")
close(fh)
```
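
Since Julia dates cannot be stored directly, one possible workaround (a sketch, not the only approach) is to save the dates as strings and convert them back after loading. The example also uses the `do` block form of `h5open`, which closes the file automatically. The file name and the data are made up, and the file is written to a temporary directory.

```julia
using Dates, HDF5

FileName = joinpath(mktempdir(),"DatesExample.h5")   #hypothetical file, temporary folder
dN = [Date(1979,1,2),Date(1979,1,3)]                 #made-up dates and data
x  = [96.73,98.58]

h5open(FileName,"w") do fh
    write(fh,"dN",string.(dN))         #dates saved as strings like "1979-01-02"
    write(fh,"x",x)
end                                    #the file is closed here

(dNstr,x2) = h5open(FileName,"r") do fh
    (read(fh,"dN"),read(fh,"x"))
end

dN2 = Date.(dNstr)                     #convert the strings back to Date
println("dates recovered: ",dN2 == dN)
```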


The package HDF5 is at https://github.com/JuliaIO/HDF5.jl. 

See https://support.hdfgroup.org/products/java/hdfview/
for a program that lets you look at the contents of an hdf5 file. (It is not needed here.)

In [5]:
using HDF5   

A = copy(reshape(1:20,5,4))
B = 1
C = "Nice cat"

fh = h5open("Results/NewH5File.h5","w")    #open file for writing   
    write(fh,"A",A,"B",B,"C",C)
close(fh)                                  #close file

println("NewH5File.h5 has been created in the subfolder Results")

NewH5File.h5 has been created in the subfolder Results


In [6]:
fh = h5open("Results/NewH5File.h5","r")    #open for reading
    println("\nVariables in h5 file: ",names(fh))
    (A2,B2) = read(fh,"A","B")             #load some of the data
close(fh)

println("\nA from h5 file is")
printmat(A2)


Variables in h5 file: ["A", "B", "C"]

A from h5 file is
         1         6        11        16
         2         7        12        17
         3         8        13        18
         4         9        14        19
         5        10        15        20



# Saving and Loading a jld2 File

jld2 files can store very different types of data: integers, floats, strings, dictionaries, etc. The format is a dialect of hdf5, designed to save different Julia objects (including Dates).

The basic syntax of the package JLD2 is 
```
save(FileName,"MatrixName1",matrix1,"MatrixName2",matrix2)
(x1,x2) = load(FileName,"MatrixName1","MatrixName2")  
```
(It is also possible to use the same syntax as for HDF5, except that we use ```jldopen``` instead of ```h5open```.)

The package JLD2 is at https://github.com/simonster/JLD2.jl.
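
As an illustration of the `jldopen` syntax and of saving Julia dates, here is a short sketch. The file name and the data are made up, and the file is written to a temporary directory.

```julia
using Dates, JLD2

FileName = joinpath(mktempdir(),"DatesExample.jld2")   #hypothetical file, temporary folder
dN = [Date(1979,1,2),Date(1979,1,3)]                   #made-up dates and data
x  = [96.73,98.58]

jldopen(FileName,"w") do fh
    fh["dN"] = dN                      #Date vector saved as it is
    fh["x"]  = x
end                                    #the file is closed here

(dN2,x2) = jldopen(FileName,"r") do fh
    (fh["dN"],fh["x"])
end

println("type of loaded dates: ",typeof(dN2))
println("dates recovered: ",dN2 == dN)
```

In contrast to the hdf5 case, no conversion of the dates to strings is needed.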

In [7]:
using FileIO, JLD2           #the FileIO package is also needed        
                                        
A = copy(reshape(1:20,5,4))
B = 1
C = "Nice cat"
(A2,B2,C2) = (nothing,nothing,nothing)               #erase earlier results

save("Results/NewJldFile.jld2","A",A,"B",B,"C",C)       #write jld2 file

println("NewJldFile.jld2 has been created in the subfolder Results")

NewJldFile.jld2 has been created in the subfolder Results


In [8]:
x = load("Results/NewJldFile.jld2")                   #load entire file
println("The variables are: ",keys(x))               #list contents of the file 

(A2,B2) = load("Results/NewJldFile.jld2","A","B")     #read some of the data
println("\nA from jld2 file is")
printmat(A2)

The variables are: ["B", "A", "C"]

A from jld2 file is
         1         6        11        16
         2         7        12        17
         3         8        13        18
         4         9        14        19
         5        10        15        20



# (extra) Saving and Loading a Matlab mat File

The MAT package allows you to load/save (Matlab) mat files. The newer mat format (v7.3) is another dialect of HDF5.

See https://github.com/JuliaIO/MAT.jl.
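
The MAT package also has the convenience functions `matwrite` and `matread`, which save and load a whole dictionary of variables in one call. A minimal sketch, where the file name is made up and the file is written to a temporary directory:

```julia
using MAT

FileName = joinpath(mktempdir(),"DictExample.mat")   #hypothetical file, temporary folder
A = copy(reshape(1:20,5,4))

matwrite(FileName,Dict("A"=>A,"C"=>"Nice cat"))      #write all variables at once

vars = matread(FileName)                             #Dict: variable name => value
println("Variables in mat file: ",sort(collect(keys(vars))))
A2 = vars["A"]
println("A recovered: ",A2 == A)
```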

In [9]:
using MAT     

A = copy(reshape(1:20,5,4))
B = 1
C = "Nice cat"
(A2,B2,C2) = (nothing,nothing,nothing)               #erase earlier results

fh = matopen("Results/NewMatFile.mat","w")   
    write(fh,"A",A)             #write one variable at a time                       
    write(fh,"B",B)
    write(fh,"C",C)
close(fh)

println("\nNewMatFile.mat has been created in the subfolder Results")


NewMatFile.mat has been created in the subfolder Results


In [10]:
fh = matopen("Results/NewMatFile.mat")           
    println("\nVariables in mat file: ",names(fh))
    (A2,B2) = read(fh,"A","B")                                
close(fh) 

println("\nA from mat file is ")
printmat(A2)


Variables in mat file: ["A", "B", "C"]

A from mat file is 
         1         6        11        16
         2         7        12        17
         3         8        13        18
         4         9        14        19
         5        10        15        20



# (extra) Loading an xlsx File

The XLSX package (https://github.com/felipenoris/XLSX.jl) allows you to read and write xlsx files (but not the older xls format).

As an alternative, you can use ExcelReaders (https://github.com/davidanthoff/ExcelReaders.jl), which requires python and python's xlrd library. For instance, this would work
```
using ExcelReaders
data1 = readxl("Data/readXlsTsT_Data.xlsx","Data!B2:C11")
x1    = convert(Array{Float64},data1)            
printmat(x1)
```

In [11]:
import XLSX

data1 = XLSX.readxlsx("Data/readXlsTsT_Data.xlsx")   #reading the entire file
x1    = data1["Data!B2:C11"]                         #extracting a part of the sheet "Data"
x1    = convert(Array{Float64},x1)                   #converting from Any to Float64

println("part of the xlsx file:")
printmat(x1)

part of the xlsx file:
    16.660  -999.990
    16.850  -999.990
    16.930  -999.990
    16.980  -999.990
    17.080  -999.990
    17.030     7.000
    17.090     8.000
    16.760  -999.990
    16.670  -999.990
    16.720  -999.990
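
The value -999.99 in the second column looks like a code for missing data, although that is only an assumption here. If so, it could be replaced by NaN after loading. The sketch below uses a small made-up matrix with the same pattern instead of the xlsx file.

```julia
#made-up matrix, mimicking the pattern in the output above
x1 = [16.66 -999.99;
      17.03    7.0;
      17.09    8.0]

x1fix = replace(x1,-999.99=>NaN)       #new matrix, -999.99 replaced by NaN

println("number of NaNs: ",count(isnan,x1fix))
println("sum of column 2, skipping NaNs: ",sum(filter(!isnan,x1fix[:,2])))
```

Use `replace!` instead if you want to change the matrix in place.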

