We will learn how to load files in R with the `readr` package, which is part of the core tidyverse.

# Functions for data import

Most of `readr`’s functions are concerned with turning flat files into data frames:

* `read_csv()` reads comma-delimited files, `read_csv2()` reads semicolon-separated files (common in countries where `,` is used as the decimal separator), `read_tsv()` reads tab-delimited files, and `read_delim()` reads files with any delimiter.

* `read_fwf()` reads fixed width files. You can specify fields either by their widths with `fwf_widths()` or their position with `fwf_positions()`. `read_table()` reads a common variation of fixed width files where columns are separated by white space.

* `read_log()` reads Apache style log files. (But also check out webreadr which is built on top of `read_log()` and provides many more helpful tools.)
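As a rough illustration of how alike these readers are, the sketch below parses the same kind of small table from semicolon-separated and tab-separated text. The inline strings and the names `semi` and `tabs` are made up for this example.

```r
library(readr)

# Semicolon-separated fields; "," is treated as the decimal mark.
semi <- read_csv2("x;y\n1,5;2\n3,25;4")

# Tab-separated fields.
tabs <- read_tsv("x\ty\n1\t2\n3\t4")

semi
tabs
```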

These functions all have similar syntax: once you have mastered one, you can use the others with ease. We will focus on `read_csv()`, because CSV files are one of the most common forms of data storage.

The first argument to `read_csv()` is the most important: it is the path to the file to read.

```r
library(readr)
heights <- read_csv("data/heights.csv")
```

You can also supply an inline CSV file. This is useful for experimenting with readr and for creating reproducible examples to share with others:

```r
read_csv("a,b,c
1,2,3
4,5,6")
```

`read_csv()` uses the first line of the data for the column names, which is a very common convention. There are two cases where you might want to tweak this behaviour:

1. Sometimes there are a few lines of metadata at the top of the file. You can use `skip = n` to skip the first `n` lines; or use `comment = "#"` to drop all lines that start with `#`.
```r
read_csv("The first line of metadata
  The second line of metadata
  x,y,z
  1,2,3", skip = 2)
```


2. The data might not have column names. You can use `col_names = FALSE` to tell `read_csv()` not to treat the first row as headings, and instead label the columns sequentially from `X1` to `Xn`:
```r
read_csv("1,2,3\n4,5,6", col_names = FALSE)
```

Alternatively, you can pass `col_names` a character vector, which will be used as the column names:
```r
read_csv("1,2,3\n4,5,6", col_names = c("x", "y", "z"))
```
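The `comment` argument mentioned in the first case can be sketched in the same way. The inline data and the name `with_comment` are invented for illustration.

```r
library(readr)

# Lines starting with "#" are dropped before the header is read.
with_comment <- read_csv("# A comment to skip\nx,y,z\n1,2,3", comment = "#")

with_comment
```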


Another option that commonly needs tweaking is `na`: this specifies the value (or values) that are used to represent missing values in your file:

```r
read_csv("a,b,c\n1,2,.", na = ".")
```
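Because `na` accepts several values, a file that marks missing data in more than one way can be handled in a single call. This is a small sketch with made-up data, assuming that both `.` and `-99` stand for missing readings.

```r
library(readr)

# Both "." and "-99" are treated as missing values.
readings <- read_csv("site,temp\nA,21.5\nB,.\nC,-99", na = c(".", "-99"))

readings
```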

## Your turn
* What function would you use to read a file where fields were separated with
“|”, e.g. `"1|2|3\n4|5|6"`? (check `read_delim()`)


* Identify what is wrong with each of the following inline CSV files. What happens when you run the code? 

```r
read_csv("a,b\n1,2,3\n4,5,6")
read_csv("a,b,c\n1,2\n1,2,3,4")
read_csv("a,b\n1,2\na,b")
read_csv("a;b\n1;3")
```


## Parsing a vector
The `parse_*()` functions take a character vector and return a more specialized vector like a logical, integer, or date.
Like all functions in the tidyverse, the `parse_*()` functions are uniform: the first argument is a character vector to parse, and the `na` argument specifies which strings should be treated as missing:

```r
parse_integer(c("1", "231", ".", "456"), na = ".")
```

If parsing fails, you’ll get a warning:
```r
x <- parse_integer(c("123", "345", "abc", "123.45"))
```
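The failed values are returned as `NA`, and the details can be inspected afterwards. The sketch below assumes readr's `problems()` helper, which is not otherwise covered in this section.

```r
library(readr)

# Two of the four strings are not valid integers, so a warning is raised.
x <- parse_integer(c("123", "345", "abc", "123.45"))

x            # the values that failed to parse are NA
problems(x)  # a tibble with one row per parsing failure
```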


Using parsers is mostly a matter of understanding what’s available and how they deal with different types of input. The most important parsers are:

1. `parse_logical()` and `parse_integer()` parse logicals and integers respectively. 

2. `parse_double()` is a strict numeric parser, and `parse_number()` is a flexible numeric parser. These are more complicated than you might expect because different parts of the world write numbers in different ways.

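3. `parse_character()` parses character vectors. There is one complication: character encodings.

4. `parse_factor()` creates factors, the data structure that R uses to represent categorical variables with fixed and known values.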

5. `parse_datetime()`, `parse_date()`, and `parse_time()` allow you to parse various date and time specifications. These are the most complicated because there are so many different ways of writing dates.
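A few of these parsers in action, as a minimal sketch. The input strings and variable names are invented for illustration; `locale()` and the `format` argument are readr features that this section does not describe in detail.

```r
library(readr)

# Strict parser: the locale tells it that "," is the decimal mark here.
dbl <- parse_double("1,23", locale = locale(decimal_mark = ","))

# Flexible parser: ignores non-numeric text around the number.
num <- parse_number(c("$100", "20%", "$123,456,789"))

# Dates: ISO 8601 by default, or supply a format string.
iso_date <- parse_date("2010-10-01")
us_date <- parse_date("01/02/15", format = "%m/%d/%y")

# Factors: the allowed values are given up front as levels.
fruit <- parse_factor(c("apple", "banana"),
                      levels = c("apple", "banana", "cherry"))
```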

## Parsing a file
`readr` guesses the type of each column: it reads the first 1000 rows and applies some heuristics to what it finds. You can emulate this process on a character vector with `guess_parser()`, which returns readr’s best guess, and `parse_guess()`, which uses that guess to parse the column:
```r
guess_parser("2010-10-01")
guess_parser("15:01")
guess_parser(c("TRUE", "FALSE"))
```
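`parse_guess()` can be tried on the same kind of input. In this sketch the names `flags` and `day` are arbitrary.

```r
library(readr)

# parse_guess() applies whichever parser guess_parser() would pick.
flags <- parse_guess(c("TRUE", "FALSE"))
day <- parse_guess("2010-10-01")

class(flags)
class(day)
```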

These defaults may not work for larger files. There are two basic problems:

* The first thousand rows might be a special case, and readr guesses a type that is not sufficiently general. For example, you might have a column of doubles that only contains integers in the first 1000 rows.

* The column might contain a lot of missing values. If the first 1000 rows contain only NAs, readr will guess that it’s a logical vector, whereas you probably want to parse it as something more specific.
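One way to sidestep both problems is to state the column types yourself instead of relying on the guess. The sketch below assumes readr's `col_types` argument together with `cols()`, and the `guess_max` argument for widening the sample. The data and column names are invented.

```r
library(readr)

csv <- "id,score\n1,10\n2,11\n3,12.5"

# Declare the types explicitly so that no guessing is involved.
scores <- read_csv(
  csv,
  col_types = cols(id = col_integer(), score = col_double())
)

# Alternatively, let readr look at more rows before it guesses.
scores_guessed <- read_csv(csv, guess_max = 10000)
```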


# Writing to a file
`readr` also comes with two useful functions for writing data back to disk: `write_csv()` and `write_tsv()`. Both functions increase the chances of the output file being read back in correctly by:

* Always encoding strings in UTF-8.

* Saving dates and date-times in ISO8601 format so they are easily parsed elsewhere.

<!-- If you want to export a csv file to Excel, use `write_excel_csv()` — this writes a special character (a “byte order mark”) at the start of the file which tells Excel that you’re using the UTF-8 encoding. -->

The most important arguments are `x` (the data to save) and `file` (the location to save it; this argument was named `path` before readr 1.4.0). You can also specify how missing values are written with `na`:
```r
write_csv(name_of_data, "path_to_save")
```
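To see what actually lands on disk, the sketch below writes a small, made-up data frame to a temporary file and reads the raw lines back. The choice of `"-"` as the missing-value marker is arbitrary.

```r
library(readr)

df <- data.frame(x = c(1, NA), y = c("a", "b"))
csv_path <- tempfile(fileext = ".csv")

# Write missing values as "-" instead of the default "NA".
write_csv(df, csv_path, na = "-")

# Inspect the file contents as plain text.
lines <- readLines(csv_path)
lines
```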

Note that the type information is lost when you save to CSV. To keep the type information, you can use `write_rds()` and `read_rds()`. These store data in R’s custom binary format called RDS.
```r
write_rds(name_of_data, "path_to_save.rds")
read_rds("path_to_read.rds")
```
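The difference shows up when a factor column makes a round trip through each format. This is a minimal sketch with an invented data frame and temporary files.

```r
library(readr)

df <- data.frame(grp = factor(c("low", "high"), levels = c("low", "high")))
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write_csv(df, csv_path)
write_rds(df, rds_path)

from_csv <- read_csv(csv_path)
from_rds <- read_rds(rds_path)

class(from_csv$grp)  # the factor comes back as plain text
class(from_rds$grp)  # the factor and its levels are preserved
```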

RDS saves only one object. If you want to save multiple objects, you can use RData format:
```r
save(a, b, c, file = "path_to_save.RData")
load(file = "path_to_load.RData")
```
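Unlike `read_rds()`, `load()` restores the objects under their original names rather than returning them. A small sketch with made-up objects and a temporary file:

```r
a <- 1:3
b <- "some text"
rdata_path <- tempfile(fileext = ".RData")

save(a, b, file = rdata_path)
rm(a, b)  # remove the originals to show that load() brings them back

# load() recreates `a` and `b` and returns their names.
restored <- load(rdata_path)
restored
```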

# Save figures

`ggsave()` is a convenient function for saving a plot. It defaults to saving the last plot that you displayed, using the size of the current graphics device. It also guesses the type of graphics device from the extension.

```r
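library(ggplot2)

ggplot(mtcars, aes(mpg, wt)) + geom_point()

# Save the last displayed plot; the device is guessed from the extension
ggsave("mtcars.pdf")
ggsave("mtcars.png")

# Set the size explicitly (inches by default, or choose other units)
ggsave("mtcars.pdf", width = 4, height = 4)
ggsave("mtcars.pdf", width = 20, height = 20, units = "cm")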
```
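Relying on "the last plot displayed" can be fragile in scripts, so it may be safer to store the plot in a variable and pass it explicitly. The sketch below assumes the `plot` and `dpi` arguments of `ggsave()`; the file name and sizes are arbitrary.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
png_path <- file.path(tempdir(), "mtcars-scatter.png")

# Save a specific plot object instead of the last one displayed.
ggsave(png_path, plot = p, width = 10, height = 8, units = "cm", dpi = 150)
```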