What you can find here:
 - Arrow
 - Txt Files
 - Download from a URL
 - DelimitedFiles
 - CSV
 - XLSX
 - JLD/NPZ/RDA/MAT/GeoJSON

## Arrow

In general, Arrow.jl is the preferred way to store data frames. Note, however, that to maximize speed it uses its own `AbstractVector` types for the columns it reads.
To materialize these columns into standard `Vector`s, just `copy` the data frame: `df_mat = copy(df_arrow)`.
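A minimal round trip illustrating the point above, assuming the Arrow.jl and DataFrames.jl packages are installed; the small data frame and the temporary path are made up for illustration. Passing `copycols = false` keeps the columns exactly as Arrow.jl provides them, and `copy` then turns each one into a standard `Vector`.

```julia
using Arrow, DataFrames

# Small illustrative data frame (hypothetical data)
df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

# Write it to a temporary Arrow file
path = joinpath(mktempdir(), "example.arrow")
Arrow.write(path, df)

# Read it back without copying: the columns keep Arrow.jl's own vector types
df_arrow = DataFrame(Arrow.Table(path); copycols = false)
println(typeof(df_arrow.a))

# copy materializes every column into a standard Vector
df_mat = copy(df_arrow)
println(typeof(df_mat.a))
```

The copied data frame holds the same values as the original, but its columns are ordinary mutable `Vector`s.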

In [None]:
using Arrow
Arrow.write("auto2.arrow", df2)

## Txt Files

In [None]:
readlines("data/auto.txt"); # read all lines

In [None]:
raw_str = read("data/auto.txt", String); # read the whole file into one String
str_no_tab = replace(raw_str, '\t'=>' ') # replace tabs with spaces so only one delimiter remains
io = IOBuffer(str_no_tab) # wrap the String in an IOBuffer so CSV.File can read it like a file
df1 = CSV.File(io, delim=' ', # parse the text as space-delimited data
               ignorerepeated=true, # treat a run of spaces as a single delimiter
               header=[:mpg, :cylinders, :displacement, :horsepower,
                       :weight, :acceleration, :year, :origin, :name],
               missingstring="NA") |> # "NA" entries become `missing`
      DataFrame # '|> DataFrame' converts the CSV.File into a DataFrame
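To try these parsing options without the `data/auto.txt` file, here is a self-contained sketch that feeds two illustrative lines through the same pipeline, assuming CSV.jl and DataFrames.jl are installed. It also assumes the file separates numbers with runs of spaces, puts a tab before a quoted car name, and writes `NA` for missing values; adjust the options if the real file differs.

```julia
using CSV, DataFrames

# Two illustrative lines in the assumed auto.txt layout:
# runs of spaces between numbers, a tab before the quoted name, NA = missing
raw_str = "18.0   8   307.0   130.0   3504.   12.0   70   1\t\"chevrolet chevelle malibu\"\n" *
          "25.0   4   98.0    NA      2046.   19.0   71   1\t\"ford pinto\"\n"

str_no_tab = replace(raw_str, '\t' => ' ')   # leave a single delimiter: the space
df = CSV.File(IOBuffer(str_no_tab);
              delim = ' ',
              ignorerepeated = true,         # a run of spaces counts as one delimiter
              header = [:mpg, :cylinders, :displacement, :horsepower,
                        :weight, :acceleration, :year, :origin, :name],
              missingstring = "NA") |> DataFrame
```

Because `header` is given as a vector of names, no header row is read from the text, and the quoted name keeps its internal spaces.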

## From a URL

You can download a file directly from a URL.
Note: since Julia 1.6, `download` and `Downloads.download` use libcurl through the `Downloads` standard library, so external tools such as curl, wget, or fetch are no longer required.
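Network requests can fail (for example, with a TLS handshake or connection-reset error), so a notebook that downloads data may want to handle that case explicitly. This sketch wraps `Downloads.download` in a hypothetical helper, `try_download`, that returns the saved path on success and `nothing` when a `Downloads.RequestError` is raised; the helper's name and behavior are illustrative, not part of the library.

```julia
using Downloads

"""
    try_download(url, dest)

Hypothetical helper: download `url` to `dest` and return the path,
or return `nothing` (with a warning) if the request fails.
"""
function try_download(url::AbstractString, dest::AbstractString)
    try
        return Downloads.download(url, dest)
    catch err
        # Only swallow request failures (DNS, TLS, HTTP errors); rethrow anything else
        err isa Downloads.RequestError || rethrow()
        @warn "Download failed" url exception = err
        return nothing
    end
end

# Usage (needs network access):
# path = try_download(
#     "https://raw.githubusercontent.com/nassarhuda/easy_data/master/programming_languages.csv",
#     "data/programming_languagesV2.csv")
```

Returning `nothing` lets later cells check `path === nothing` and skip work instead of stopping the whole notebook.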

In [None]:
using Downloads
path = Downloads.download("https://raw.githubusercontent.com/nassarhuda/easy_data/master/programming_languages.csv",
    "data/programming_languagesV2.csv") # returns the path of the saved file

## DelimitedFiles

The key task here is loading data from files such as `csv` files, `xlsx` files, or plain text files. We will go over some Julia packages that make reading such files easy. The first, `DelimitedFiles`, belongs to the standard library.

In [None]:
#=
readdlm(source,
    delim::AbstractChar,
    T::Type,
    eol::AbstractChar;
    header=false,
    skipstart=0,
    skipblanks=true,
    use_mmap,
    quotes=true,
    dims,
    comments=false,
    comment_char='#')
=#
using DelimitedFiles
P, H = readdlm("data/programming_languages.csv", ','; header=true); # P: data matrix, H: 1×n header row
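With `header = true`, `readdlm` returns a tuple: the data matrix and a 1×n matrix holding the header row. A self-contained sketch using a small in-memory CSV; the rows are illustrative stand-ins for `programming_languages.csv`, not read from it.

```julia
using DelimitedFiles

# Small in-memory CSV standing in for data/programming_languages.csv
csv_text = """
year,language
1951,Regional Assembly Language
1952,Autocode
"""

data, header = readdlm(IOBuffer(csv_text), ','; header = true)

println(size(data))   # rows × columns of the data, header excluded
println(vec(header))  # the header row, flattened to a vector
```

Because the columns mix numbers and text, `data` is a heterogeneous `Matrix{Any}`; `vec(header)` is a convenient form for column names.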

In [None]:
# To write to a text file, you can:
writedlm("data/programminglanguages_dlm.txt", P, '-')

In [None]:
# Transform to a DataFrame, using the header row H as column names
DataFrame(P, Symbol.(vec(H)))

## CSV

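CSV.jl is usually the better choice: `CSV.read(file, DataFrame)` returns the data directly as a `DataFrame` object, with column names taken from the header row and a type inferred for each column.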
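A self-contained sketch of the read/write cycle with CSV.jl, assuming CSV.jl and DataFrames.jl are installed. An in-memory `IOBuffer` replaces the input file and a temporary path replaces the output file, so the example runs anywhere; the rows are illustrative.

```julia
using CSV, DataFrames

csv_text = """
year,language
1951,Regional Assembly Language
1952,Autocode
"""

# Read straight into a DataFrame; the first row gives the column names
df = CSV.read(IOBuffer(csv_text), DataFrame)
println(names(df))
println(eltype(df.year))   # type inferred from the values

# Write it back out and read the file as plain text to inspect it
path = joinpath(mktempdir(), "languages.csv")
CSV.write(path, df)
written = read(path, String)
```

Unlike the `readdlm` result, each column gets its own concrete element type instead of a single `Any` matrix.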

In [None]:
C = CSV.read("data/programming_languages.csv", DataFrame);

In [None]:
df1 = CSV.File("data/auto2.csv") |> DataFrame

In [None]:
# To write to a *.csv file using the CSV package
CSV.write("data/programminglanguages_CSV.csv", C)

"data/programminglanguages_CSV.csv"

## XLSX

In [None]:
using XLSX
T = XLSX.readdata("data/zillow_data_download_april2020.xlsx", # file name
    "Sale_counts_city", # sheet name
    "A1:F9" # cell range: give one when possible, since reading a whole large sheet can be slow
    )

G = XLSX.readtable("data/zillow_data_download_april2020.xlsx","Sale_counts_city");

9×6 Matrix{Any}:
      "RegionID"  "RegionName"    …      "2008-03"      "2008-04"
  6181            "New York"             missing        missing
 12447            "Los Angeles"      1446           1705
 39051            "Houston"          2926           3121
 17426            "Chicago"          2910           3022
  6915            "San Antonio"   …  1479           1529
 13271            "Philadelphia"     1609           1795
 40326            "Phoenix"          1310           1519
 18959            "Las Vegas"        1618           1856

In [None]:
# Transform into a DataFrame; :auto names the columns x1, x2, etc., so the header ends up in row 1
DataFrame(T, :auto)

9×6 DataFrame
 Row │ x1        x2            x3            x4        x5       x6
     │ Any       Any           Any           Any       Any      Any
─────┼───────────────────────────────────────────────────────────────────
   1 │ RegionID  RegionName    StateName     SizeRank  2008-03  2008-04
   2 │ 6181      New York      New York      1         missing  missing
   3 │ 12447     Los Angeles   California    2         1446     1705
   4 │ 39051     Houston       Texas         3         2926     3121
   5 │ 17426     Chicago       Illinois      4         2910     3022
   6 │ 6915      San Antonio   Texas         5         1479     1529
   7 │ 13271     Philadelphia  Pennsylvania  6         1609     1795
   8 │ 40326     Phoenix       Arizona       7         1310     1519
   9 │ 18959     Las Vegas     Nevada        8         1618     1856


In [None]:
# In older XLSX.jl versions, readtable returns a (data, column_labels) tuple:
# G[1] is the data, a vector of column vectors (G[1][1] is the first column),
# and G[2] holds the column names
G[1]

148-element Vector{Any}:
 Any[6181, 12447, 39051, 17426, 6915, 13271, 40326, 18959, 54296, 38128  …  396952, 397236, 398030, 398104, 398357, 398712, 398716, 399081, 737789, 760882]
 Any["New York", "Los Angeles", "Houston", "Chicago", "San Antonio", "Philadelphia", "Phoenix", "Las Vegas", "San Diego", "Dallas"  …  "Barnard Plantation", "Windsor Place", "Stockbridge", "Mattamiscontis", "Chase Stream", "Bowdoin College Grant West", "Summerset", "Long Pond", "Hideout", "Ebeemee"]
 Any["New York", "California", "Texas", "Illinois", "Texas", "Pennsylvania", "Arizona", "Nevada", "California", "Texas"  …  "Maine", "Missouri", "Wisconsin", "Maine", "Maine", "Maine", "South Dakota", "Maine", "Utah", "Maine"]
 Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  28750, 28751, 28752, 28753, 28754, 28755, 28756, 28757, 28758, 28759]
 Any[missing, 1446, 2926, 2910, 1479, 1609, 1310, 1618, 772, 1158  …  0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
 Any[missing, 1705, 3121, 3022, 1529, 1795, 1519, 1856, 1057, 1232  …  0, 0, 0, 0,

In [None]:
# Transform into a DataFrame: the splat operator `...` unpacks the tuple G into two arguments
D = DataFrame(G...) # equivalent to DataFrame(G[1], G[2])

 Row │ RegionID  RegionName    StateName     SizeRank  2008-03  2008-04  2008-05
     │ Any       Any           Any           Any       Any      Any      Any
─────┼────────────────────────────────────────────────────────────────────────────
   1 │ 6181      New York      New York      1         missing  missing  missing
   2 │ 12447     Los Angeles   California    2         1446     1705     1795
   3 │ 39051     Houston       Texas         3         2926     3121     3220
   4 │ 17426     Chicago       Illinois      4         2910     3022     2937
   5 │ 6915      San Antonio   Texas         5         1479     1529     1582
   6 │ 13271     Philadelphia  Pennsylvania  6         1609     1795     1709
   7 │ 40326     Phoenix       Arizona       7         1310     1519     1654
   8 │ 18959     Las Vegas     Nevada        8         1618     1856     1961
   9 │ 54296     San Diego     California    9         772      1057     1195
  10 │ 38128     Dallas        Texas         10        1158     1232     1240


In [None]:
# if you already have a dataframe:
# XLSX.writetable("filename.xlsx", collect(DataFrames.eachcol(df)), DataFrames.names(df))
XLSX.writetable("data/writefile_using_XLSX.xlsx",G[1],G[2])

## JLD, NPZ, RDA, MAT, and GeoJSON

In [None]:
using MAT
Matlab_data = matread("data/mytempdata.mat")
MAT.matwrite("mywrite.mat",Matlab_data)
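The cell above needs an existing `data/mytempdata.mat`. Here is a self-contained round trip, assuming MAT.jl is installed; the variable names and values are made up. A MAT file maps variable names to values, which MAT.jl exposes as a `Dict{String,Any}`.

```julia
using MAT

# Hypothetical variables: MAT files map variable names to values
vars = Dict{String,Any}("A" => [1.0 2.0; 3.0 4.0], "B" => [1.0 2.0 3.0])

path = joinpath(mktempdir(), "roundtrip.mat")
matwrite(path, vars)     # write every Dict entry as a MATLAB variable
back = matread(path)     # read all variables back into a Dict

println(sort(collect(keys(back))))
```

The example sticks to matrices because MATLAB has no one-dimensional arrays, so Julia vectors may not come back with the same shape.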

In [None]:
using JLD
jld_data = JLD.load("data/mytempdata.jld")
JLD.save("mywrite.jld", "A", jld_data)

In [None]:
using NPZ
npz_data = NPZ.npzread("data/mytempdata.npz")
NPZ.npzwrite("mywrite.npz", npz_data)
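Similarly, a self-contained NPZ round trip, assuming NPZ.jl is installed and using made-up arrays: `npzwrite` with a `Dict` produces a `.npz` archive holding one array per key, and `npzread` on that archive returns a `Dict` of arrays.

```julia
using NPZ

# Hypothetical arrays to bundle into one .npz archive
arrays = Dict("x" => collect(1:5), "y" => [1.0 2.0; 3.0 4.0])

path = joinpath(mktempdir(), "roundtrip.npz")
npzwrite(path, arrays)   # one array per key
back = npzread(path)     # Dict mapping each key to its array
```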

In [None]:
using RData
R_data = RData.load("data/mytempdata.rda")
# RData.jl can only read R data files; saving requires RCall. See https://github.com/JuliaData/RData.jl/issues/56
using RCall
@rput R_data
R"save(R_data, file=\"mywrite.rda\")"

RObject{NilSxp}
NULL


In [None]:
using GeoJSON, DataFrames
geoJSON_data = GeoJSON.read("data/mytempdata.geojson")
df = DataFrame(geoJSON_data) # use as a data frame
GeoJSON.write("mywrite.geojson", geoJSON_data)
