# Importing data from other sources

Julia offers a number of packages for importing data from other languages and formats.

## Importing from Matlab

The `MAT` package reads `.mat` files into a Julia dictionary whose keys are the Matlab variable names. For this example, we will use the file located at <https://github.com/jmxpearson/duke-julia-ssri-2016/carbig.mat>

In [None]:
using MAT

In [2]:
vars = matread("carbig.mat")

Dict{ASCIIString,Any} with 13 entries:
  "Weight"       => 406x1 Array{Float64,2}:…
  "Acceleration" => 406x1 Array{Float64,2}:…
  "Mfg"          => Union{ASCIIString,UTF8String}["chevrolet","buick","plymouth…
  "cyl4"         => Union{ASCIIString,UTF8String}["Other","Other","Other","Othe…
  "Origin"       => Union{ASCIIString,UTF8String}["USA","USA","USA","USA","USA"…
  "when"         => Union{ASCIIString,UTF8String}["Early","Early","Early","Earl…
  "Displacement" => 406x1 Array{Float64,2}:…
  "MPG"          => 406x1 Array{Float64,2}:…
  "Model"        => Union{ASCIIString,UTF8String}["chevrolet chevelle malibu","…
  "Cylinders"    => 406x1 Array{Float64,2}:…
  "org"          => Union{ASCIIString,UTF8String}["USA","USA","USA","USA","USA"…
  "Model_Year"   => 406x1 Array{Float64,2}:…
  "Horsepower"   => 406x1 Array{Float64,2}:…

In [3]:
Weight       = vars["Weight"];
Acceleration = vars["Acceleration"];
Mfg          = vars["Mfg"];
cyl4         = vars["cyl4"];
Origin       = vars["Origin"];
when         = vars["when"];
Displacement = vars["Displacement"];
MPG          = vars["MPG"];
Model        = vars["Model"];
Cylinders    = vars["Cylinders"];
org          = vars["org"];
Model_Year   = vars["Model_Year"];
Horsepower   = vars["Horsepower"];
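The numeric variables come back as `406x1` matrices rather than vectors, because Matlab stores column vectors as two-dimensional arrays. A minimal sketch of flattening them, assuming Julia 1.x syntax and a small stand-in dictionary with illustrative values (not the real `carbig.mat` contents):

```julia
# Stand-in for the dictionary returned by matread; the values are made up
# for illustration and are NOT taken from carbig.mat.
vars = Dict{String,Any}(
    "Weight" => reshape([1.0, 2.0, 3.0], 3, 1),   # 3x1 matrix, like Matlab column vectors
    "MPG"    => reshape([4.0, 5.0, 6.0], 3, 1),
    "Origin" => ["USA", "USA", "USA"],            # string data is already a vector
)

# vec drops the singleton dimension: 3x1 matrix -> 3-element vector
weight = vec(vars["Weight"])

# Flatten every matrix-valued entry in one pass, skipping the string columns
numeric = Dict(k => vec(v) for (k, v) in vars if v isa AbstractMatrix)
```

`vec` shares memory with the original array, so no data is copied.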

In [4]:
whos()

                  Acceleration   3248 bytes  406x1 Array{Float64,2} : [12.0…
                          Base  25205 KB     Module : Base
                         Blosc     37 KB     Module : Blosc
                        Compat     59 KB     Module : Compat
                          Core   3346 KB     Module : Core
                     Cylinders   3248 bytes  406x1 Array{Float64,2} : [8.0…
                  Displacement   3248 bytes  406x1 Array{Float64,2} : [307.0…
                          HDF5   2075 KB     Module : HDF5
                    Horsepower   3248 bytes  406x1 Array{Float64,2} : [130.0…
                        IJulia    318 KB     Module : IJulia
                IPythonDisplay     27 KB     Module : IPythonDisplay
                          JSON    193 KB     Module : JSON
                           MAT    243 KB     Module : MAT
                           MPG   3248 bytes  406x1 Array{Float64,2} : [18.0…
                          Main  29750 KB     Module : Main
          

## Exporting to Matlab

Similarly, we can write to Matlab format by going in the other direction: we build a dictionary that maps variable names to values and pass it to `matwrite()`:

In [5]:
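X = rand(50000, 5);
# Each key becomes a variable name in the .mat file.
# The old `{...}` literal is deprecated; construct the dictionary with Dict(...).
matwrite("tester.mat", Dict(
    "X" => X,
    "Acceleration" => Acceleration,
    "Horsepower" => Horsepower
));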

## Importing delimited text files as Julia arrays

As with other scientific software, it is straightforward to import any delimited text file into Julia. Let's consider the file located at <https://github.com/jmxpearson/duke-julia-ssri-2016/auto.csv>

In [7]:
?readdlm

```
readdlm(source, delim::Char, T::Type, eol::Char; header=false, skipstart=0, skipblanks=true, use_mmap, ignore_invalid_chars=false, quotes=true, dims, comments=true, comment_char='#')
```

Read a matrix from the source where each line (separated by `eol`) gives one row, with elements separated by the given delimiter. The source can be a text file, stream or byte array. Memory mapped files can be used by passing the byte array representation of the mapped segment as source.

If `T` is a numeric type, the result is an array of that type, with any non-numeric elements as `NaN` for floating-point types, or zero. Other useful values of `T` include `ASCIIString`, `AbstractString`, and `Any`.

If `header` is `true`, the first row of data will be read as header and the tuple `(data_cells, header_cells)` is returned instead of only `data_cells`.

Specifying `skipstart` will ignore the corresponding number of initial lines from the input.

If `skipblanks` is `true`, blank lines in the input will be ignored.

If `use_mmap` is `true`, the file specified by `source` is memory mapped for potential speedups. Default is `true` except on Windows. On Windows, you may want to specify `true` if the file is large, and is only read once and not written to.

If `ignore_invalid_chars` is `true`, bytes in `source` with invalid character encoding will be ignored. Otherwise an error is thrown indicating the offending character position.

If `quotes` is `true`, columns enclosed within double-quote (") characters are allowed to contain new lines and column delimiters. Double-quote characters within a quoted field must be escaped with another double-quote.

Specifying `dims` as a tuple of the expected rows and columns (including header, if any) may speed up reading of large files.

If `comments` is `true`, lines beginning with `comment_char` and text following `comment_char` in any line are ignored.

```
readdlm(source, T::Type; options...)
```

The columns are assumed to be separated by one or more whitespaces. The end of line delimiter is taken as `\n`.

```
readdlm(source, delim::Char, T::Type; options...)
```

The end of line delimiter is taken as `\n`.

```
readdlm(source, delim::Char, eol::Char; options...)
```

If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a cell array of numbers and strings is returned.

```
readdlm(source, delim::Char; options...)
```

The end of line delimiter is taken as `\n`. If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a cell array of numbers and strings is returned.

```
readdlm(source; options...)
```

The columns are assumed to be separated by one or more whitespaces. The end of line delimiter is taken as `\n`. If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a cell array of numbers and strings is returned.


In [10]:
auto = readdlm("auto.csv", ',', header=true);

In [12]:
auto

(
74x12 Array{Any,2}:
 "AMC Concord"      4099  22  3    2.5  11  2930  186  40  121  3.58  0
 "AMC Pacer"        4749  17  3    3.0  11  3350  173  40  258  2.53  0
 "AMC Spirit"       3799  22   ""  3.0  12  2640  168  35  121  3.08  0
 "Buick Century"    4816  20  3    4.5  16  3250  196  40  196  2.93  0
 "Buick Electra"    7827  15  4    4.0  20  4080  222  43  350  2.41  0
 "Buick LeSabre"    5788  18  3    4.0  21  3670  218  43  231  2.73  0
 "Buick Opel"       4453  26   ""  3.0  10  2230  170  34  304  2.87  0
 "Buick Regal"      5189  20  3    2.0  16  3280  200  42  196  2.93  0
 "Buick Riviera"   10372  16  3    3.5  17  3880  207  43  231  2.93  0
 "Buick Skylark"    4082  19  3    3.5  13  3400  200  42  231  3.08  0
 "Cad. Deville"    11385  14  3    4.0  20  4330  221  44  425  2.28  0
 "Cad. Eldorado"   14500  14  2    3.5  16  3900  204  43  350  2.19  0
 "Cad. Seville"    15906  21  3    3.0  13  4290  204  45  350  2.24  0
 ⋮                                       ⋮

In [13]:
typeof(auto)

Tuple{Array{Any,2},Array{AbstractString,2}}
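Because `header=true` makes `readdlm` return a `(data_cells, header_cells)` tuple, the two parts can be unpacked in one step and the header used to look up a column by name. The sketch below assumes Julia 1.x, where `readdlm` lives in the `DelimitedFiles` standard library (on the Julia version used above it is available without an import), and reads from a small in-memory sample rather than `auto.csv`. The names `colnames` and `price` are illustrative.

```julia
using DelimitedFiles

# A three-column excerpt in the same layout as auto.csv
csv = """
make,price,mpg
AMC Concord,4099,22
AMC Pacer,4749,17
"""

# Unpack the (data_cells, header_cells) tuple directly
data, header = readdlm(IOBuffer(csv), ',', header=true)

# header is a 1xN matrix; flatten it so it can be searched
colnames = vec(header)

# Look a column up by name instead of by position
price = data[:, findfirst(==("price"), colnames)]

# The column has element type Any; convert it once the contents are known to be integers
price_int = Int.(price)
```

Converting to a concrete element type such as `Int` avoids carrying an `Array{Any}` through later numeric code.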

In [15]:
csvdata=auto[1]

74x12 Array{Any,2}:
 "AMC Concord"      4099  22  3    2.5  11  2930  186  40  121  3.58  0
 "AMC Pacer"        4749  17  3    3.0  11  3350  173  40  258  2.53  0
 "AMC Spirit"       3799  22   ""  3.0  12  2640  168  35  121  3.08  0
 "Buick Century"    4816  20  3    4.5  16  3250  196  40  196  2.93  0
 "Buick Electra"    7827  15  4    4.0  20  4080  222  43  350  2.41  0
 "Buick LeSabre"    5788  18  3    4.0  21  3670  218  43  231  2.73  0
 "Buick Opel"       4453  26   ""  3.0  10  2230  170  34  304  2.87  0
 "Buick Regal"      5189  20  3    2.0  16  3280  200  42  196  2.93  0
 "Buick Riviera"   10372  16  3    3.5  17  3880  207  43  231  2.93  0
 "Buick Skylark"    4082  19  3    3.5  13  3400  200  42  231  3.08  0
 "Cad. Deville"    11385  14  3    4.0  20  4330  221  44  425  2.28  0
 "Cad. Eldorado"   14500  14  2    3.5  16  3900  204  43  350  2.19  0
 "Cad. Seville"    15906  21  3    3.0  13  4290  204  45  350  2.24  0
 ⋮                                       ⋮  

In [16]:
csvdata[:,2]

74-element Array{Any,1}:
  4099
  4749
  3799
  4816
  7827
  5788
  4453
  5189
 10372
  4082
 11385
 14500
 15906
     ⋮
  3995
 12990
  3895
  3798
  5899
  3748
  5719
  7140
  5397
  4697
  6850
 11995

Notice that we can specify any character as the delimiter; we chose a comma here because it is one of the most common.
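As a minimal sketch of other delimiters, the example below reads tab-separated and pipe-separated text from in-memory strings. It assumes Julia 1.x with the `DelimitedFiles` standard library; the sample data is invented for illustration.

```julia
using DelimitedFiles

# Tab-separated integers; passing Int as the third argument fixes the element type
tsv = "1\t2\t3\n4\t5\t6\n"
a = readdlm(IOBuffer(tsv), '\t', Int)

# Pipe-separated values; with no type given, all-numeric input yields a numeric array
psv = "1|2.5\n3|4.5\n"
b = readdlm(IOBuffer(psv), '|')
```

Passing the element type explicitly, as in the first call, is useful when the file is known to be homogeneous.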

## Exporting Julia arrays as delimited files

To export, the process is simply reversed and we instead use the `writedlm()` function:

In [17]:
?writedlm

```
writedlm(f, A, delim='\t')
```

Write `A` (a vector, matrix or an iterable collection of iterable rows) as text to `f` (either a filename string or an `IO` stream) using the given delimiter `delim` (which defaults to tab, but can be any printable Julia object, typically a `Char` or `AbstractString`).

For example, two vectors `x` and `y` of the same length can be written as two columns of tab-delimited text to `f` by either `writedlm(f, [x y])` or by `writedlm(f, zip(x, y))`.


In [20]:
writedlm("autoout.csv", csvdata, ',');
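A quick way to check an export is to read the file back and compare it with the original. The sketch below assumes Julia 1.x with the `DelimitedFiles` standard library and uses two made-up vectors and a temporary file; it also exercises the `zip(x, y)` form mentioned in the help text.

```julia
using DelimitedFiles

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# Write the two vectors as two comma-separated columns to a temporary file
path = tempname()
writedlm(path, [x y], ',')

# Read the file back; all-numeric input returns a Float64 matrix
back = readdlm(path, ',')

# The zip form writes one row per (x[i], y[i]) pair; here it goes to an in-memory buffer
io = IOBuffer()
writedlm(io, zip(x, y), ',')
zipped = readdlm(IOBuffer(String(take!(io))), ',')

rm(path)  # clean up the temporary file
```

Both forms should produce the same two-column layout, so `back` and `zipped` can each be compared against `[x y]`.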

## Importing delimited files as Julia data frames

As you may have noticed, `readdlm()` is clunky when the file has mixed data types (e.g. strings and numbers): every column ends up with element type `Any`, and missing entries appear as empty strings. We can resolve this by importing directly to a data frame using the `readtable()` function in the `DataFrames` package:

In [None]:
using DataFrames

In [24]:
autoDF = readtable("auto.csv")

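|  | make | price | mpg | rep78 | headroom | trunk | weight | length | turn | displacement | gear_ratio | foreign |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | AMC Concord | 4099 | 22 | 3 | 2.5 | 11 | 2930 | 186 | 40 | 121 | 3.58 | 0 |
| 2 | AMC Pacer | 4749 | 17 | 3 | 3.0 | 11 | 3350 | 173 | 40 | 258 | 2.53 | 0 |
| 3 | AMC Spirit | 3799 | 22 | NA | 3.0 | 12 | 2640 | 168 | 35 | 121 | 3.08 | 0 |
| 4 | Buick Century | 4816 | 20 | 3 | 4.5 | 16 | 3250 | 196 | 40 | 196 | 2.93 | 0 |
| 5 | Buick Electra | 7827 | 15 | 4 | 4.0 | 20 | 4080 | 222 | 43 | 350 | 2.41 | 0 |
| 6 | Buick LeSabre | 5788 | 18 | 3 | 4.0 | 21 | 3670 | 218 | 43 | 231 | 2.73 | 0 |
| 7 | Buick Opel | 4453 | 26 | NA | 3.0 | 10 | 2230 | 170 | 34 | 304 | 2.87 | 0 |
| 8 | Buick Regal | 5189 | 20 | 3 | 2.0 | 16 | 3280 | 200 | 42 | 196 | 2.93 | 0 |
| 9 | Buick Riviera | 10372 | 16 | 3 | 3.5 | 17 | 3880 | 207 | 43 | 231 | 2.93 | 0 |
| 10 | Buick Skylark | 4082 | 19 | 3 | 3.5 | 13 | 3400 | 200 | 42 | 231 | 3.08 | 0 |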


Notice that the result can store both strings and numbers, and that missing values are represented by `NA`.

### Other features of `readtable()`

`readtable()` also lets you:
- Specify which strings convert to `NA` [default is `""`]
- Specify how the text file is delimited [default is `','`]
- Specify whether or not a header row of column names exists
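To make the first option concrete, here is a rough sketch of the idea behind mapping chosen strings to a missing-value marker. It is not how `readtable()` is implemented: it assumes Julia 1.x, reads with `readdlm` from the `DelimitedFiles` standard library, and uses `missing`, the Julia 1.x value for absent data, in place of `NA`. The helper `to_missing` and its `nastrings` keyword are hypothetical names chosen for this illustration.

```julia
using DelimitedFiles

# Two rows in the auto.csv layout; the second has an empty rep78 field
csv = """
make,rep78,price
AMC Concord,3,4099
AMC Spirit,,3799
"""

data, header = readdlm(IOBuffer(csv), ',', header=true)

# Hypothetical helper: replace any string listed in `nastrings` with `missing`
function to_missing(cells; nastrings=[""])
    map(x -> (x isa AbstractString && x in nastrings) ? missing : x, cells)
end

clean = to_missing(data)

# Functions such as skipmissing can then ignore the gaps
rep78 = collect(skipmissing(clean[:, 2]))
```

Extending `nastrings`, for example to `["", "NA", "."]`, would treat those markers as missing as well.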

## Exporting Julia data frames as delimited text files

The `writetable()` function operates as the inverse of `readtable()`:

In [25]:
writetable("auto1.csv", autoDF, separator = ',', header = false);

## Interfacing with other data types

Packages exist for reading data directly from SAS, Stata, R, SPSS, etc. (for example, the `DataRead` package). However, these packages are not well supported.