# Reading and Writing Data in R
### by [Jason DeBacker](http://jasondebacker.com), October 2019

This notebook illustrates how to read and write data in R.

## Reading Data

### CSV Files

In [1]:
# see what directory we are in
getwd()
# Read the Kauffman data we looked at in the Python tutorial into a
# data frame named kisa. With header = TRUE, column names are taken
# from the first line of the file.
kisa <- read.table("../Python/kisa_2015.csv", 
                 header = TRUE,
                 sep = ",", fill = TRUE)
kisa[1:5,] # show the first 5 rows
str(kisa)

month,grdatn,marstat,age,class,region,state,hours,mlr,natvty,⋯,homeown,hoursu1b,hoursu1b_t1,se15u,se15u_t1,ent015u,ent015ua,vet,wgtat,wgtat1
<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<lgl>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<dbl>,<dbl>
12,42,5,57,4,1,14,40,1,57,⋯,,40,40,0,0,0,0,0,269.1724,270.4338
12,39,7,26,4,1,14,40,1,57,⋯,,40,40,0,0,0,0,0,403.0235,404.9121
12,41,1,43,4,2,41,46,1,110,⋯,,46,40,0,0,0,0,0,402.7901,404.6776
12,39,1,38,4,2,41,40,1,57,⋯,,40,30,0,0,0,0,0,342.9345,344.5415
12,42,1,51,-1,3,58,-1,6,57,⋯,,-1,-1,0,0,0,0,0,560.2244,562.8497


'data.frame':	95 obs. of  37 variables:
 $ month      : int  12 12 12 12 12 12 12 12 12 12 ...
 $ grdatn     : int  42 39 41 39 42 42 39 40 43 39 ...
 $ marstat    : int  5 7 1 1 1 1 7 4 1 1 ...
 $ age        : int  57 26 43 38 51 50 33 52 41 43 ...
 $ class      : int  4 4 4 4 -1 4 4 5 4 6 ...
 $ region     : int  1 1 2 2 3 3 1 1 4 4 ...
 $ state      : int  14 14 41 41 58 58 16 16 81 81 ...
 $ hours      : int  40 40 46 40 -1 50 -1 42 30 40 ...
 $ mlr        : int  1 1 1 1 6 1 4 1 1 1 ...
 $ natvty     : int  57 57 110 57 57 57 57 57 57 57 ...
 $ msafp      : int  49340 49340 0 0 12060 12060 49340 49340 0 0 ...
 $ msastat    : int  2 2 4 4 2 2 2 2 3 3 ...
 $ faminc     : int  15 15 12 12 13 13 7 7 14 14 ...
 $ spneth     : int  -1 2 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ race       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ year       : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ class_t1   : int  4 4 4 4 -1 4 4 5 4 6 ...
 $ mlr_t1     : int  1 1 1 1 6 1 4 1 1 1 ...
 $ wgta       : 

In [2]:
# read.table() above can also read tab-delimited text (sep = "\t").
# read.csv() is a wrapper around read.table() whose defaults suit CSV
# files: header = TRUE, sep = ",", and fill = TRUE
kisa2 <- read.csv("../Python/kisa_2015.csv", 
                 header = TRUE)
kisa2[1:5,] # show the first 5 rows

month,grdatn,marstat,age,class,region,state,hours,mlr,natvty,⋯,homeown,hoursu1b,hoursu1b_t1,se15u,se15u_t1,ent015u,ent015ua,vet,wgtat,wgtat1
<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<lgl>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<dbl>,<dbl>
12,42,5,57,4,1,14,40,1,57,⋯,,40,40,0,0,0,0,0,269.1724,270.4338
12,39,7,26,4,1,14,40,1,57,⋯,,40,40,0,0,0,0,0,403.0235,404.9121
12,41,1,43,4,2,41,46,1,110,⋯,,46,40,0,0,0,0,0,402.7901,404.6776
12,39,1,38,4,2,41,40,1,57,⋯,,40,30,0,0,0,0,0,342.9345,344.5415
12,42,1,51,-1,3,58,-1,6,57,⋯,,-1,-1,0,0,0,0,0,560.2244,562.8497
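
The comment in the cell above says that `read.csv()` is essentially `read.table()` with CSV-friendly defaults. The sketch below checks that claim on a small made-up file; the temporary file and its columns (`id`, `score`, `group`) are assumptions for illustration and are not part of the Kauffman data.

```r
# Write a tiny CSV to a temporary file so the sketch does not
# depend on kisa_2015.csv being available
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,3.5,a",
             "2,4.0,b",
             "3,,a"), tmp)

# read.table() with the CSV-style settings spelled out
via_table <- read.table(tmp, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

# read.csv() relies on its defaults for the same settings
via_csv <- read.csv(tmp)

# compare the two data frames
identical(via_table, via_csv)

unlink(tmp)  # remove the temporary file
```

The empty `score` field in the last row is read as `NA` by both calls.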


### Reading data from Excel

In [4]:
# readxl must be installed before it can be loaded
install.packages("readxl") # only needed if the package is not already installed
library(readxl)

# read in data used for PS #4
df <- read_excel("../Matching/radio_merger_data.xlsx",
                            sheet = 1)
df[1:5,]


year,buyer_id,target_id,buyer_lat,buyer_long,target_lat,target_long,price,hhi_target,num_stations_buyer,population_target,corp_owner_buyer
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
2007,1,1,46.59251,-92.54956,44.37507,-92.03954,157763.9,80,3,21676,0
2007,2,2,32.57818,-85.349,33.02537,-86.0597,1472463.2,376,1,11539,0
2007,3,3,30.63987,-88.25445,31.1225,-87.76641,3786333.9,129,1,182265,0
2007,4,4,38.95681,-94.68324,36.19695,-94.00682,473291.7,188,20,203065,0
2007,5,5,41.05408,-73.53622,40.9099,-73.45702,1840579.0,284,0,1493350,0


### Reading Stata data files

In [5]:
# Load the haven package (install it first if needed)
install.packages("haven") # if not already installed
library(haven)

# Read Stata data file used for PS #3
ps3_data <- read_dta("../Optimization/PS3_data.dta") 
ps3_data[1:5,]

id68,year,intid,relhh,hannhrs,wannhrs,hlabinc,wlabinc,nochild,wrace,⋯,redpregovinc,hsex,wsex,age,wage,hpersno,wpersno,hyrsed,wyrsed,pce
<dbl>,<dbl>,<dbl>,<dbl+lbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,1967,1,1,1200,2000,,,0,,⋯,5614,1,2,52,46,1,2,8.0,8,0
2,1967,2,1,0,0,,,0,,⋯,0,1,2,56,57,1,2,3.0,3,0
3,1967,3,1,0,0,,,0,,⋯,0,1,2,77,64,1,2,,3,0
4,1967,4,1,1560,0,,,6,1.0,⋯,3280,1,2,45,44,1,2,8.0,5,0
5,1967,5,1,2500,2000,,,3,1.0,⋯,7900,1,2,24,22,1,2,10.0,9,0


Be sure to read more about these functions in their documentation.  They are quite flexible in terms of skipping lines and reformatting data as it is read into your R data frame.  With some modification, they can also read data from URLs (as we did in Python).  Additionally, there are functions to read in data in many other formats.  DataCamp provides a good overview of these [here](https://www.datacamp.com/community/tutorials/r-data-import-tutorial#stata).
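
As a minimal sketch of that flexibility, the example below skips two note lines at the top of a file, recodes a sentinel value as missing, and fixes the column types. The file contents, the column names, and the choice of `-1` as a missing-value code are assumptions made for this toy example.

```r
# Build a small file with two note lines above the header
tmp <- tempfile(fileext = ".csv")
writeLines(c("# exported by a hypothetical survey tool",
             "# units: hours per week",
             "id,hours,state",
             "1,40,14",
             "2,-1,41",
             "3,46,58"), tmp)

survey <- read.csv(tmp,
                   skip = 2,           # ignore the two note lines above the header
                   na.strings = "-1",  # treat -1 as missing (an assumption for this toy data)
                   colClasses = c("integer", "integer", "character"))
str(survey)

unlink(tmp)  # remove the temporary file

# read.csv() also accepts a URL in place of a file path, e.g.
# remote <- read.csv("https://example.com/data.csv")  # placeholder URL, not a real data set
```

Setting `colClasses` keeps `state` as text, which can matter when codes have leading zeros.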

## Writing data

As with reading, R can write data to a number of formats.

### To CSV

In [6]:
# write the radio mergers data to csv (df is the
# data frame read from Excel above)
write.csv(df, file = "test_write.csv")

### To Excel

In [7]:
install.packages("openxlsx") # if not already installed
library(openxlsx)

# for writing a data.frame or list of data.frames to an xlsx file
write.xlsx(df, "test_write.xlsx")


### To Stata

In [8]:
# Uses the haven package loaded above
write_dta(df, 'test_write.dta', version = 14) # write in Stata 14 format; see ?write_dta for the versions your haven release supports

Again, be sure to read the documentation about these functions to see the available keyword arguments that will allow you to adjust how the data are saved.
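
For instance, two `write.csv()` arguments that often matter are `row.names` and `na`. The sketch below uses a small made-up data frame (`toy` is an assumption, not the radio mergers data) and reads the file back to check the round trip.

```r
# A small data frame with one missing value
toy <- data.frame(year = c(2007L, 2007L, 2008L),
                  price = c(157763.9, 1472463.2, NA),
                  stringsAsFactors = FALSE)

out <- tempfile(fileext = ".csv")
write.csv(toy, file = out,
          row.names = FALSE,  # omit the row-number column that is written by default
          na = "")            # write missing values as empty cells rather than "NA"

readLines(out)        # inspect the raw text that was written
back <- read.csv(out) # read the file back in
all.equal(toy, back)  # check that the round trip preserved the data

unlink(out)  # remove the temporary file
```

Without `row.names = FALSE`, the file gains an extra unnamed first column that shows up as `X` when the file is read back.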

## Saving R-Objects to Disk

Sometimes you want to save R objects in a way that preserves all of their properties when they are read back in, for example a plot or an object that holds the output of an econometric model.  To do this, use the `save()` function to write your objects to .RData or .rda files.

Note that files saved in this way can only be read in R.

In [9]:
# Save one object in RData format
save(df, file = "df_out.RData")
# Save multiple objects
save(ps3_data, df, file = "2df_out.RData")
# To load the data again
load("2df_out.RData")
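
The same approach works for model output. As a minimal sketch, the example below saves a fitted regression and loads it into a separate environment; the model on the built-in `mtcars` data and the names `fit`, `restored`, and `rdata_file` are assumptions for illustration.

```r
# Fit a small model on a data set that ships with R
fit <- lm(mpg ~ wt, data = mtcars)

rdata_file <- tempfile(fileext = ".RData")
save(fit, file = rdata_file)

# load() restores objects under their original names; loading into a
# separate environment avoids silently overwriting objects in the workspace
restored <- new.env()
loaded_names <- load(rdata_file, envir = restored)

loaded_names                              # names of the objects that were restored
class(restored$fit)                       # class of the restored object
all.equal(coef(fit), coef(restored$fit))  # compare coefficients before and after

unlink(rdata_file)  # remove the temporary file
```

Because `load()` reuses the saved names, calling it without `envir` replaces any object in the workspace that has the same name.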

In [10]:
ps3_data[1:5,]

id68,year,intid,relhh,hannhrs,wannhrs,hlabinc,wlabinc,nochild,wrace,⋯,redpregovinc,hsex,wsex,age,wage,hpersno,wpersno,hyrsed,wyrsed,pce
<dbl>,<dbl>,<dbl>,<dbl+lbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,1967,1,1,1200,2000,,,0,,⋯,5614,1,2,52,46,1,2,8.0,8,0
2,1967,2,1,0,0,,,0,,⋯,0,1,2,56,57,1,2,3.0,3,0
3,1967,3,1,0,0,,,0,,⋯,0,1,2,77,64,1,2,,3,0
4,1967,4,1,1560,0,,,6,1.0,⋯,3280,1,2,45,44,1,2,8.0,5,0
5,1967,5,1,2500,2000,,,3,1.0,⋯,7900,1,2,24,22,1,2,10.0,9,0


In [11]:
df[1:5, ]

year,buyer_id,target_id,buyer_lat,buyer_long,target_lat,target_long,price,hhi_target,num_stations_buyer,population_target,corp_owner_buyer
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
2007,1,1,46.59251,-92.54956,44.37507,-92.03954,157763.9,80,3,21676,0
2007,2,2,32.57818,-85.349,33.02537,-86.0597,1472463.2,376,1,11539,0
2007,3,3,30.63987,-88.25445,31.1225,-87.76641,3786333.9,129,1,182265,0
2007,4,4,38.95681,-94.68324,36.19695,-94.00682,473291.7,188,20,203065,0
2007,5,5,41.05408,-73.53622,40.9099,-73.45702,1840579.0,284,0,1493350,0


In [12]:
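x <- -3   # assignment: x is now -3

In [13]:
x <- 3    # assignment: x is now 3
x

In [14]:
(x<-3)    # without spaces this is still assignment: x is set to 3 and printed

In [None]:
(x < -3)  # with the space this is a comparison: is x less than -3?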