# Comma police in Julia

Julia can read and write csv files as well.

The straightforward way is to use `load()` and `save()`. Julia tries to guess the format of a file from its extension (e.g., if the file is called "blablab.csv", it assumes it is a csv file). `load()` can also load entire directories of files in one go: just provide the path to the directory. It will try to produce
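
To see the format detection at work, FileIO's `query()` function reports which format it would use for a given file. A minimal sketch, assuming FileIO is installed; the file `Data/example.csv` is a hypothetical example created on the spot. Since csv files have no "magic bytes" signature at the start, the extension is what decides:

```julia
using FileIO

# Write a tiny csv with plain Base Julia, so that there is a file to inspect
mkpath("Data")
write("Data/example.csv", "a,b\n1,2\n")

# query() returns a File object tagged with the format FileIO detected
f = query("Data/example.csv")
println(typeof(f))
```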

Reading and writing files is handled in general by the package `FileIO`. You already have it if you installed Queryverse in the previous labs. You can load it either directly (in "isolation"):

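```julia
using Pkg  # Pkg is used further below to install packages with Pkg.add()
```

```julia
# using FileIO  # uncomment this line to load FileIO on its own
```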

or by loading the full Queryverse. The Queryverse has very good documentation (https://www.queryverse.org/): get familiar with it.

```julia
using Queryverse
```

We also load the rest of the packages we need:

```julia
using VegaDatasets, VegaLite
```

Writing a csv is quite easy: pipe a dataframe (or any other tabular data) into `save()`, specifying the name of the file you want to create:

```julia
dataset("cars") |>
  save("Data/cars.csv")
```

Uh, what happened there? Let's read the error message: apparently, the folder "Data" does not exist. And indeed, it is not there, yet. If we start a cell with `;`, we can pass any shell command, and we use this to create the folder we need.

```julia
; mkdir Data
```
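
If you prefer to stay in Julia, Base's `mkpath()` creates the same folder without going through the shell. Unlike the shell `mkdir`, it does not fail when the folder already exists, so the cell can safely be re-run:

```julia
# mkpath creates the folder (and any missing parent folders);
# it does nothing if the folder is already there
mkpath("Data")
isdir("Data")
```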

And then we try again.

```julia
dataset("cars") |>
  save("Data/cars.csv")
```
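
The piped form above is only a shorthand: `save()` also accepts the file name and the table as two separate arguments. A minimal sketch with a small hand-made `DataFrame`, assuming FileIO, CSVFiles and DataFrames are installed (the file `Data/small.csv` and its columns are just for illustration):

```julia
using FileIO, CSVFiles, DataFrames

# A small hand-made table, just for illustration
df = DataFrame(name = ["a", "b", "c"], value = [1, 2, 3])

mkpath("Data")
# Two-argument form: file name first, then the table.
# Equivalent to: df |> save("Data/small.csv")
save("Data/small.csv", df)
```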

To read a csv, we just need to know where it is:

```julia
cars = load("Data/cars.csv")
```

The output of `load()` is NOT a dataframe, but a `CSVFile` object from the CSVFiles package. This is because `load()` does not read the whole file into memory: the data stay on disk and are read only when they are actually needed.

```julia
cars |>
  typeof
```
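
To see this lazy behaviour directly, the sketch below writes a tiny csv, loads it, and only then turns it into a `DataFrame`. It assumes FileIO, CSVFiles and DataFrames are installed; `Data/tiny.csv` is a hypothetical file created on the spot:

```julia
using FileIO, CSVFiles, DataFrames

mkpath("Data")
save("Data/tiny.csv", DataFrame(x = [1, 2, 3], y = [1.5, 2.5, 3.5]))

lazy = load("Data/tiny.csv")   # only records which file to read; nothing is parsed yet
println(lazy isa DataFrame)    # false

df = DataFrame(lazy)           # the file is parsed here, when the data are needed
println(size(df))              # (3, 2)
```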

Yet the output is still tabular data (an "iterable table", in the Queryverse jargon), so we can filter and operate on it as we do with dataframes:

```julia
cars |>
  @filter(_.Origin == "Europe") |>
  @vlplot(:point, x=:Horsepower, y=:Acceleration, color="Cylinders:n")
```

And we can always transform it into a dataframe:

```julia
cars |>
  DataFrame |>
  describe
```
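
Filtering and converting can also be chained, so that the result is collected into a `DataFrame` instead of being plotted. A minimal sketch with Query.jl's standalone `@filter`, assuming Query and DataFrames are installed; the small table `cars_small` is a hand-made stand-in for the cars data, with invented values:

```julia
using Query, DataFrames

# Hand-made stand-in for the cars table (values invented for illustration)
cars_small = DataFrame(Name = ["car A", "car B", "car C"],
                       Origin = ["Europe", "USA", "Europe"],
                       Horsepower = [90, 150, 70])

# Same @filter as in the plot above, but collected into a DataFrame
europe = cars_small |>
    @filter(_.Origin == "Europe") |>
    DataFrame
```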

## Beyond csv

`load()` is able to read many different formats: csv and Excel files, but also SPSS, Stata and SAS files, and more.

For the time being, `save()` is more limited: it can save to csv and to two special formats (Feather and bedGraph).

For more details see here: http://www.david-anthoff.com/jl4ds/stable/fileio/#The-load-and-save-function-1

## Complicated csvs

If you run into a nasty, complicated csv that `load()` is not able to handle, the way out is the dedicated package [CSV](http://juliadata.github.io/CSV.jl/stable/index.html#High-level-interface-1).

```julia
# Pkg.add("CSV")
```

```julia
using CSV
```

The main functions here are `CSV.write()` to write a csv, `CSV.read()` to read a csv, and `CSV.validate()` to find out why reading a csv file fails.
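
A minimal round trip with CSV.jl, assuming a recent version of the package (where `CSV.read()` needs the destination type, here `DataFrame`, as its second argument) and DataFrames installed. The semicolon-separated file `Data/cities.csv` is a hypothetical example of a non-standard csv:

```julia
using CSV, DataFrames

# A small table, written with ';' instead of ',' as separator
df = DataFrame(city = ["Roma", "Milano"], temp = [21.5, 18.0])

mkpath("Data")
CSV.write("Data/cities.csv", df; delim = ';')

# Read it back into a DataFrame, telling CSV.jl which separator to expect
back = CSV.read("Data/cities.csv", DataFrame; delim = ';')

# CSV.File gives a lightweight, row-iterable view of the same file
for row in CSV.File("Data/cities.csv"; delim = ';')
    println(row.city, " => ", row.temp)
end
```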

## Excelles

Julia handles Excel files much like the `readxl` package does in R. The package in this case is `ExcelFiles` (see https://github.com/queryverse/ExcelFiles.jl).

```julia
# Pkg.add("ExcelFiles")
```

```julia
using ExcelFiles
```

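ExcelFiles plugs into the same `load()` and `save()` functions used above for csv files; when loading, you also pass the name of the sheet to read, e.g. `load("file.xlsx", "Sheet1")`.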

# Your turn

Load a dataset from VegaDatasets (e.g., `dataset("iris")`), do some wrangling, and write it to disk as a csv. Then read it back.

```julia
# your code here.
```