### <center> Data Input and Output with R </center> 

#### CSV Input and Output 

* CSV stands for Comma-Separated Values and is one of the most common ways we will be working with data throughout this course. 
* The basic format of a CSV file is a first line giving the column names, followed by one row per data point, with the values separated by commas. 
* One of the most basic ways to read CSV files in R is the **`read.csv()`** function, which is built into R. 
* Later on, we'll also learn about **`fread()`** from the **`data.table`** package, which is a bit faster and more convenient. 
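* When using **`read.csv()`**, you'll need to either pass the full path of the file, or have the file in R's current working directory (check it with **`getwd()`**), which is usually the directory of your R script or notebook. 
* Make sure to account for **spaces and backslashes in the file path**: a quoted path can contain spaces as is, but **on Windows each backslash must be doubled (`\\`) or replaced with a forward slash (`/`).**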
* This is often a point of confusion for people new to programming, so make sure you understand the above before continuing!
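
The path rules above can be tried out with a short sketch. The folder name `my data` is made up for illustration, and everything is written under a temporary directory so no real files are touched.

```r
# Create a folder whose name contains a space (hypothetical name, for illustration only).
data_dir <- file.path(tempdir(), "my data")
dir.create(data_dir, showWarnings = FALSE)

# file.path() joins path components with "/", which R accepts on every platform.
csv_path <- file.path(data_dir, "example.csv")

# Write a small CSV by hand: header line first, then one row per record.
writeLines(c("Name,Orders,Date",
             "John,12,12/05/2016",
             "Charlie,11,12/06/2016"),
           csv_path)

# A quoted path may contain spaces; no escaping is needed.
df_path_demo <- read.csv(file = csv_path)
print(df_path_demo)

# Relative paths such as "example.csv" are resolved against this directory.
print(getwd())
```

If a file cannot be found, comparing the output of `getwd()` with the location of the file is usually the quickest check.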

**CSV Input**

In [15]:
install.packages("data.table")
library("data.table")

package 'data.table' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\foufo\AppData\Local\Temp\RtmpiApISr\downloaded_packages


"package 'data.table' was built under R version 3.6.3"

In [4]:
df <- read.csv(file="example.csv")
head(df)

Name,Orders,Date
John,12,12/05/2016
Charlie,11,12/06/2016
Matilda,10,12/07/2016


In [2]:
str(df)

'data.frame':	3 obs. of  3 variables:
 $ Name  : Factor w/ 3 levels "Charlie","John",..: 2 1 3
 $ Orders: int  12 11 10
 $ Date  : Factor w/ 3 levels "12/05/2016","12/06/2016",..: 1 2 3
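
The `str()` output above shows the text columns read as factors, which was the `read.csv()` default before R 4.0.0. The following is a minimal sketch of keeping text as character and parsing the date column. It uses inline text in place of `example.csv`, and it assumes the dates are in month/day/year order, which the data itself does not confirm.

```r
# Same content as example.csv, supplied inline so the sketch is self-contained.
csv_text <- "Name,Orders,Date
John,12,12/05/2016
Charlie,11,12/06/2016
Matilda,10,12/07/2016"

# text= makes read.csv() parse a string instead of a file;
# stringsAsFactors = FALSE keeps text columns as character vectors.
orders <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# ASSUMPTION: the dates are month/day/year.
orders$Date <- as.Date(orders$Date, format = "%m/%d/%Y")

str(orders)
```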


In [3]:
colnames(df)

* The **`read.table()`** function is the general form of **`read.csv()`**. 
* In fact, **`read.csv()`** is just a thin wrapper around **`read.table()`** that sets defaults suited to CSV files, such as `sep=","` and `header=TRUE`. 

In [10]:
df2 <- read.table("example.csv") # Incorrect!
df2

V1
"Name,Orders,Date"
"John,12,12/05/2016"
"Charlie,11,12/06/2016"
"Matilda,10,12/07/2016"


* Note that R did not raise an error here, but the result is wrong: 
    * The three columns were not separated
    * The column names were read as a data row instead of being used as headers 

In [11]:
df2 <- read.table(file="example.csv", sep=",", header=TRUE)
df2

Name,Orders,Date
John,12,12/05/2016
Charlie,11,12/06/2016
Matilda,10,12/07/2016
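
The relationship between the two functions can be checked directly. This is a small sketch that writes a temporary copy of the example data (the temporary file stands in for `example.csv`) and compares the results.

```r
# Temporary stand-in for example.csv.
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("Name,Orders,Date",
             "John,12,12/05/2016",
             "Charlie,11,12/06/2016",
             "Matilda,10,12/07/2016"),
           tmp_csv)

via_csv   <- read.csv(tmp_csv)
via_table <- read.table(tmp_csv, sep = ",", header = TRUE)
no_args   <- read.table(tmp_csv)  # whitespace separator, no header

# With sep and header set, read.table() gives the same data frame as read.csv().
print(isTRUE(all.equal(via_csv, via_table)))

# Without them, every line (including the header) lands in a single column.
print(dim(no_args))
```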


* **`fread()`** from the **`data.table`** package is similar but faster and more convenient: it detects the separator and the header row automatically. 

In [17]:
df <- fread("example.csv")
df

Name,Orders,Date
John,12,12/05/2016
Charlie,11,12/06/2016
Matilda,10,12/07/2016


**CSV Output**

In [20]:
write.csv(x=df, file="foo.csv")

In [21]:
fread("foo.csv")

V1,Name,Orders,Date
1,John,12,12/05/2016
2,Charlie,11,12/06/2016
3,Matilda,10,12/07/2016


In [22]:
write.csv(x=df, file="foo.csv", row.names=FALSE) # write without row names

In [23]:
fread("foo.csv")

Name,Orders,Date
John,12,12/05/2016
Charlie,11,12/06/2016
Matilda,10,12/07/2016
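
The effect of `row.names` can also be seen with base R alone. The sketch below uses a small made-up data frame and temporary files. Note that `read.csv()` names the unnamed row-name column `X`, whereas `fread()` named it `V1` above.

```r
# Small illustrative data frame (not the course data set).
orders_out <- data.frame(Name = c("John", "Charlie", "Matilda"),
                         Orders = c(12L, 11L, 10L),
                         stringsAsFactors = FALSE)

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(orders_out, file = with_rn)                        # default: row names written
write.csv(orders_out, file = without_rn, row.names = FALSE)  # row names suppressed

# The first file has an extra, unnamed leading column holding the row names.
print(readLines(with_rn, n = 2))
print(names(read.csv(with_rn)))
print(names(read.csv(without_rn)))
```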


#### Excel Input and Output

**Excel Input** 

* R can read and write Excel files, which makes it convenient to work on the same datasets as business analysts or colleagues who only know Excel. 
* To read Excel files you need the **`readxl`** package for R.

In [51]:
install.packages("readxl") # Install the readxl package


In [28]:
library(readxl) # Load package

In [29]:
excel_sheets("Sample-Sales-Data.xlsx") # List workbook worksheets

In [30]:
df <- read_excel(path="Sample-Sales-Data.xlsx", sheet="Sheet1")

In [31]:
head(df)

Postcode,Sales_Rep_ID,Sales_Rep_Name,Year,Value
2121,456,Jane,2011,84219.5
2092,789,Ashish,2012,28322.19
2128,456,Jane,2013,81879.0
2073,123,John,2011,44491.14
2134,789,Ashish,2012,71837.72
2162,123,John,2013,64531.55


* If you have multiple sheets that you want to import into a **`list`**, you can do this with the **`lapply()`** function, which applies a given function to every element of a list or vector and returns a **`list`** of the results.

In [52]:
entire_workbook <- lapply(X=excel_sheets("Sample-Sales-Data.xlsx"), 
                          FUN=read_excel,
                          path="Sample-Sales-Data.xlsx")
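
The list returned by `lapply()` here has no names, so sheets can only be accessed by position. One possible variation, sketched below, names each element after its sheet. The helper `read_all_sheets()` is hypothetical and not part of `readxl`.

```r
# Hypothetical helper: read every sheet of a workbook into a named list.
read_all_sheets <- function(path) {
  sheet_names <- readxl::excel_sheets(path)
  sheets <- lapply(X = sheet_names,
                   FUN = function(s) readxl::read_excel(path = path, sheet = s))
  # Name each list element after the sheet it came from.
  stats::setNames(sheets, sheet_names)
}

# Guarded usage: runs only where the workbook from this section is present.
if (file.exists("Sample-Sales-Data.xlsx")) {
  entire_workbook <- read_all_sheets("Sample-Sales-Data.xlsx")
  print(names(entire_workbook))
}
```

With names in place, a sheet can be retrieved as `entire_workbook[["Sheet1"]]` rather than by index.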

**Excel Output**

In [41]:
install.packages("writexl")

package 'writexl' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\foufo\AppData\Local\Temp\RtmpiApISr\downloaded_packages


In [42]:
library(writexl)

"package 'writexl' was built under R version 3.6.3"

In [50]:
df <- data.frame(matrix(1:10))
head(df)

matrix.1.10.
1
2
3
4
5
6


In [47]:
write_xlsx(x=df, path="output.xlsx", col_names=TRUE)

In [48]:
read_excel("output.xlsx")

matrix.1.10.
1
2
3
4
5
6
7
8
9
10
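
The column name `matrix.1.10.` above is generated automatically by `data.frame()`. A sketch of a tidier round trip follows, with an explicit column name and a named list passed to `write_xlsx()`, which writes one sheet per list element. The names `value`, `numbers` and `first_three` are made up for illustration, and the block is guarded so it only runs where both packages are installed.

```r
# Data frame with an explicit column name (illustrative).
numbers_df <- data.frame(value = 1:10)

if (requireNamespace("writexl", quietly = TRUE) &&
    requireNamespace("readxl", quietly = TRUE)) {
  out_path <- tempfile(fileext = ".xlsx")

  # A named list of data frames becomes one worksheet per element.
  writexl::write_xlsx(x = list(numbers = numbers_df,
                               first_three = head(numbers_df, 3)),
                      path = out_path)

  print(readxl::excel_sheets(out_path))
  print(readxl::read_excel(out_path, sheet = "first_three"))
}
```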
