## Reading Tabular Data

Fundamental functions:

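- read.table, read.csv: read tabular text files and return a data frame
- readLines: read the lines of a text file as a character vector
- source: read and evaluate R code files
- dget: read R objects that were deparsed to text with dput
- load/unserialize: read binary objects (load for saved workspaces, unserialize for single objects)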

## Writing Data

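Analogous functions:

- write.table: write a data frame to a text file
- writeLines: write a character vector to a file, one element per line
- dump: write R objects as R code
- save/serialize: write binary objects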
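
The pairing between the reading and writing functions can be sketched as a round trip. This is a minimal illustration, assuming a small made-up data frame (`df`) and temporary files; none of these names come from the notes above.

```R
# Hypothetical example data; names are illustrative only
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Text round trip: write.table <-> read.table
txt_path <- tempfile(fileext = ".txt")
write.table(df, file = txt_path, sep = "\t", row.names = FALSE)
df_txt <- read.table(txt_path, header = TRUE, sep = "\t")

# Binary round trip: save <-> load (load restores the object under its saved name)
bin_path <- tempfile(fileext = ".RData")
save(df, file = bin_path)
rm(df)
load(bin_path)

# serialize <-> unserialize work on a single object held in a raw vector
raw_bytes <- serialize(df, connection = NULL)
df_raw <- unserialize(raw_bytes)
```

Note that load puts `df` back into the workspace by name, while unserialize returns the object so it can be assigned to any name.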

## Reading data files with **read.table**

Arguments:

- file: name of the file, or a connection
- header: is the first line a header?
- sep: separator between columns
- colClasses: character vector giving the class of each column
- nrows: number of rows to read
- comment.char: character that marks the start of a comment
- skip: number of lines to skip from the beginning
- stringsAsFactors: should character columns be converted to factors?

```R
data <- read.table("foo.txt")
```

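With only the file name given, read.table will:

- automatically skip lines that begin with #
- work out the number of rows and allocate memory accordingly
- figure out the type of each column
- read.csv is identical, except that the default separator is ',' and header defaults to TRUE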
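
As a sketch of how these arguments fit together, the example below writes a small semicolon-separated file to a temporary path and reads it back. The file contents and the name `people` are made up for illustration.

```R
# Hypothetical file contents, written to a temporary file for the demo
path <- tempfile(fileext = ".txt")
writeLines(c(
  "# exported for illustration",
  "name;age",
  "alice;31",
  "bob;27"
), path)

people <- read.table(
  path,
  header = TRUE,                          # first non-comment line holds column names
  sep = ";",                              # fields are separated by semicolons
  colClasses = c("character", "integer"), # one class per column
  comment.char = "#",                     # lines starting with # are ignored
  stringsAsFactors = FALSE                # keep text columns as character
)
str(people)
```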

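## Reading Larger Tables:

- estimate the memory requirement first, since R loads the entire dataset into RAM
- set comment.char = "" if the file contains no comments
- specify colClasses so that R does not have to guess the type of each column
- set nrows; this reduces memory use, and a mild overestimate is fine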
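
The steps above might look like the following sketch. The row and column counts in the memory estimate are arbitrary, the estimate assumes every column is numeric (8 bytes per value), and the demo file is generated on the fly; all names here are illustrative.

```R
# Rough memory estimate for an all-numeric table (8 bytes per numeric value)
n_rows <- 1500000
n_cols <- 120
est_gb <- n_rows * n_cols * 8 / 2^30   # roughly 1.34 GB before any overhead

# Hypothetical demo file with 1000 rows
path <- tempfile(fileext = ".txt")
demo <- data.frame(id = 1:1000, value = seq(0.5, by = 0.5, length.out = 1000))
write.table(demo, path, row.names = FALSE)

# Read a small sample and let R infer the classes once
initial <- read.table(path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)

# Reuse the inferred classes for the full read
full <- read.table(path, header = TRUE, colClasses = classes,
                   nrows = 1000, comment.char = "")
```

Inferring the classes from the first 100 rows is a shortcut; it can misjudge a column whose later rows hold a different type, so checking the result with str is worthwhile.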

## Textual Formats

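Functions that write R objects as text:

- dump
- dput

Advantage: the text output includes metadata, such as the class of each object or column.

Disadvantage: *not space efficient*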
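
To see what "includes metadata" means in practice, the sketch below compares a dput/dget round trip with a plain-text round trip through writeLines/readLines. The factor `sizes` and the temporary paths are hypothetical.

```R
# A factor carries metadata: its class and the order of its levels
sizes <- factor(c("lo", "hi", "lo"), levels = c("lo", "hi"))

# Round trip through dput/dget keeps that metadata
dput_path <- tempfile(fileext = ".R")
dput(sizes, file = dput_path)
from_dput <- dget(dput_path)

# Round trip through plain lines of text keeps only the values
lines_path <- tempfile(fileext = ".txt")
writeLines(as.character(sizes), lines_path)
from_lines <- readLines(lines_path)

class(from_dput)   # still a factor
class(from_lines)  # a character vector; the factor metadata is gone
```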

## dput-ting R Objects:

In [2]:
y <- data.frame(a = 1, b = "a")
dput(y, file = "y.R")

In [3]:
new.y <- dget("y.R")
new.y

a,b
1,a


## Dumping R Objects:

Unlike *dput*, *dump* can write multiple R objects to a single file; *source* reads them back.

In [4]:
x <- "foo"
y <- data.frame(a = 1, b = "a")
dump(c("x", "y"), file = "data.R")
rm(x, y)
source("data.R")

In [5]:
y

a,b
1,a


In [6]:
x

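## Interfaces to the Outside World:

Data are read in through connection interfaces:

- file: opens a connection to a file
- gzfile: opens a connection to a file compressed with gzip
- bzfile: opens a connection to a file compressed with bzip2
- url: opens a connection to a webpage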
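
As a small sketch of a compressed connection, the example below writes three lines through gzfile and reads the first two back. The temporary path and the text are made up for the demo.

```R
# Write a few lines through a gzip connection
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines(c("alpha", "beta", "gamma"), con)
close(con)

# Read the first two lines back; decompression happens inside the connection
con <- gzfile(gz_path, "r")
first_two <- readLines(con, n = 2)
close(con)
first_two
```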

## File Connections:

In [7]:
str(file)

function (description = "", open = "", blocking = TRUE, encoding = getOption("encoding"), 
    raw = FALSE, method = getOption("url.method", "default"))  


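- description: the name of the file
- open: the mode in which to open the file
    - r (read)
    - w (write; creates a new file or overwrites an existing one)
    - a (append)
    - rb, wb, ab (read, write, append in binary mode)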
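
The difference between the modes can be illustrated with a short sketch that writes, appends, and then reads a temporary file. The path and the text are hypothetical.

```R
log_path <- tempfile(fileext = ".txt")

# "w" creates the file (or overwrites an existing one)
con <- file(log_path, "w")
writeLines("first line", con)
close(con)

# "a" adds to the end instead of overwriting
con <- file(log_path, "a")
writeLines("second line", con)
close(con)

# "r" opens the file for reading only
con <- file(log_path, "r")
contents <- readLines(con)
close(con)
contents
```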

## Connections:

In [None]:
con <- file("foo.txt", "r")
data <- read.csv(con)
close(con)

In [None]:
# same as: 

data <- read.csv("foo.txt")

Useful when reading parts of a file:

In [None]:
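con <- gzfile("words.gz")
x <- readLines(con, 10)
close(con)
x

writeLines works similarly: it takes a character vector and writes each element as one line to a file or connection.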
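
One reason an explicitly opened connection helps with partial reads is that it remembers its position between calls. The sketch below assumes a made-up five-line temporary file.

```R
# Hypothetical five-line file
path <- tempfile(fileext = ".txt")
writeLines(paste("line", 1:5), path)

# An opened connection keeps its position between readLines calls
con <- file(path, "r")
first  <- readLines(con, n = 2)   # lines 1 and 2
second <- readLines(con, n = 2)   # continues with lines 3 and 4
close(con)
```

Passing the file name directly to readLines would start from the top of the file on every call.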

## Useful for reading lines from a webpage:

In [8]:
con <- url("http://www.jhsph.edu", "r")
x <- readLines(con)
close(con)
head(x)