## Data Wrangling

### Matthew Ariel Wangsit  
### s3859392

Learning objectives:

- Learn R
- Import data into R
- Export data to tabular/spreadsheet formats
- Save R objects

## Reading/Importing Data

Before importing data into R, we may need to install and load additional packages, because some packages (such as readr) provide import functions that are faster than the base R functions.
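One way to avoid reinstalling a package on every run is to check whether it is already available before calling install.packages(). The sketch below is a minimal base R illustration, not a required step: the helper name `missing_packages` is hypothetical (it is not part of any package), and the install call is left commented out, as it is in the cells that follow.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE when the package namespace can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Packages used later in these notes
to_install <- missing_packages(c("readr", "openxlsx", "readxl", "foreign"))
print(to_install)

# Install only what is missing (uncomment to run; needs internet access)
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first keeps the notebook re-runnable without downloading packages each time.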


In [None]:
#install.packages("readr")
library(readr)

# Base R functions for reading text files:
#   read.table(), read.csv(), read.delim()

# assume iris.csv is in the working directory

iris1 <- read.csv("iris.csv")


# View(iris1) opens the whole data set in a viewer,
# but its output is not printed in a report,
# so head() is usually the better choice

View(iris1)

# head() displays the top part of the data (the first six rows by default)
head(iris1)

# display only the first 15 rows
head(iris1, n = 15)

# tail() is similar to head(), but it displays the bottom part
tail(iris1)


# set the working directory to "/home/user"
setwd("/home/user")

# print out the current working directory
getwd()



In [None]:
# reading csv
iris <- read.csv("iris.csv")

# display the structure of an R object using str()
str(iris)

## 'data.frame':    150 obs. of  6 variables:
##  $ X           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...


# stringsAsFactors = TRUE reads character columns (here Species) as factors
iris4 <- read.csv("iris.csv", stringsAsFactors = TRUE)

str(iris4)


## 'data.frame':    150 obs. of  6 variables:
##  $ X           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...


# read.table() with sep = "," and header = TRUE gives the same result as read.csv() above

iris5 <- read.table("iris.csv", sep=",", header = TRUE)
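The equivalence noted above can be checked without an external file. The following sketch is an illustration under stated assumptions: it writes R's built-in `datasets::iris` to a temporary CSV (so, unlike the `iris.csv` used in these notes, it has no `X` index column) and reads it back both ways. All variable names are illustrative.

```r
# Write the built-in iris data to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(datasets::iris, csv_path, row.names = FALSE)

# read.csv() is read.table() with different defaults
# (header = TRUE, sep = ",", quote = "\"", fill = TRUE, comment.char = "")
via_csv   <- read.csv(csv_path, stringsAsFactors = TRUE)
via_table <- read.table(csv_path, sep = ",", header = TRUE,
                        stringsAsFactors = TRUE)

# Compare the two data frames
print(all.equal(via_csv, via_table))

# Species is a factor in both because stringsAsFactors = TRUE
str(via_table$Species)
```

For files that contain apostrophes or `#` characters in their fields, the differing `quote` and `comment.char` defaults can make the two calls disagree, so read.csv() is the safer choice for CSV input.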



In [None]:
#install.packages("readr")
library(readr)


# readr package functions
# same functionality as read.csv()
iris9 <- read_csv("iris.csv")


##  When we use read_csv function, 
##  “Parsed with column specification” information will be reported.

## -- Column specification --------------------------------------------------------
## cols(
##   X1 = col_double(),
##   Sepal.Length = col_double(),
##   Sepal.Width = col_double(),
##   Petal.Length = col_double(),
##   Petal.Width = col_double(),
##   Species = col_character()
## )




# reading data from Excel files
# packages that can read Excel files: xlsx, openxlsx, readxl


# If the dataset is stored in the .xls or .xlsx format,
# the xlsx package provides read.xlsx()

# Warning: the xlsx package depends on Java, and
# Java and macOS (version ≥ 10.12)
# do not play well together due to some legacy issues.




#install.packages("openxlsx")
library(openxlsx)


# read in an xlsx worksheet via the openxlsx package, using the sheet name

iris10 <- read.xlsx("iris.xlsx", sheet = "iris")


# read in an xlsx worksheet starting from the third row

iris11 <- read.xlsx("iris.xlsx", sheet = "iris", startRow = 3)



# Unlike the xlsx package, the readxl package has no external dependencies
# (like Java or Perl), so you can use it to read Excel data
# on any platform. Moreover, readxl can load dates
# and times, automatically drops blank columns,
# reads character variables as characters,
# and returns its output as a tibble, which is more convenient
# for viewing large data sets.
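
As a rough illustration of readxl, the sketch below reads the example workbook that ships with the package, so it does not depend on the `iris.xlsx` file assumed elsewhere in these notes. It assumes readxl is installed and does nothing otherwise; the variable names are illustrative.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook bundled with readxl
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # List the worksheet names in the workbook
  print(readxl::excel_sheets(xlsx_path))

  # Read one worksheet by name
  iris_xl <- readxl::read_excel(xlsx_path, sheet = "iris")

  # read_excel() returns a tibble, which prints only the first rows
  print(class(iris_xl))
  print(head(iris_xl))
}
```

To skip leading rows, as `startRow` does in openxlsx, read_excel() takes a `skip` argument giving the number of rows to skip before reading.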




In [None]:
# The foreign package provides functions that help you 
# read data files from other statistical software such as 
# SPSS, SAS, Stata, and others into R.


#install.packages("foreign")
library(foreign)

# read in spss data file and store it as data frame 

iris_spss <- read.spss("iris.sav", to.data.frame = TRUE)

# Note that we set to.data.frame = TRUE in order to
# get a data frame; with the default
# (to.data.frame = FALSE), read.spss()
# reads in the data as a list.



# import csv from web

url <- "https://data.gov.au/dataset/29128ebd-dbaa-4ff5-8b86-d9f30de56452/resource/cf663ed1-0c5e-497f-aea9-e74bfda9cf44/download/otptimeseriesweb.csv"

# use read.csv to import

ontime_data <- read.csv(url)

# display the first six rows and the first four variables in the data

ontime_data[1:6,1:4]


##                 Route Departing_Port Arriving_Port      Airline
## 1   Adelaide-Brisbane       Adelaide      Brisbane All Airlines
## 2   Adelaide-Canberra       Adelaide      Canberra All Airlines
## 3 Adelaide-Gold Coast       Adelaide    Gold Coast All Airlines
## 4  Adelaide-Melbourne       Adelaide     Melbourne All Airlines
## 5      Adelaide-Perth       Adelaide         Perth All Airlines
## 6     Adelaide-Sydney       Adelaide        Sydney All Airlines






## Exporting Data to Text Files

As in the earlier section on importing text files, in this section I will introduce the base R and readr package functions for exporting data to text files.

### Base R functions

write.table() is the multi-purpose function in base R for exporting data. The function write.csv() is a special case of write.table() in which the defaults are preset for comma-separated output (sep = "," and dec = "."). To illustrate, let’s create a data frame and export it to a CSV file in our working directory.
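
A minimal sketch of such an export might look like the following. The data frame values are made up for illustration, and the file names `df.csv` and `df.txt` are assumptions. The files go to a temporary directory so that the example leaves nothing behind; swapping `tempdir()` for `getwd()` would write to the working directory instead.

```r
# Create a small data frame to export (values are made up for illustration)
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(90.5, 82, 77.25),
  stringsAsFactors = FALSE
)

# Temporary output location; use getwd() to write to the working directory
out_dir  <- tempdir()
csv_path <- file.path(out_dir, "df.csv")
txt_path <- file.path(out_dir, "df.txt")

# write.csv() is write.table() with comma-separated defaults;
# row.names = FALSE avoids writing an extra index column
write.csv(df, file = csv_path, row.names = FALSE)

# write.table() lets us choose the separator, here a tab
write.table(df, file = txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the CSV back to confirm the round trip
df_back <- read.csv(csv_path, stringsAsFactors = FALSE)
print(all.equal(df, df_back))
```

Leaving `row.names` at its default of TRUE writes the row names as an unnamed first column, which appears as `X` when the file is read back with read.csv(), as in the `str()` output shown earlier.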