<h2>Table_of_Contents</h2>

- [R Input and Output](#R_Input_and_Output)
    - [Load the data](#Load_the_data)
        - [Load the txt file](#Load_the_txt_file)
        - [Load the csv file](#Load_the_csv_file)
        - [Load the excel file](#Load_the_excel_file)
        - [View the data structure](#View_the_data_structure)
    - [Export the data](#Export_the_data)
        - [Export the txt file](#Export_the_txt_file)
        - [Export the csv file](#Export_the_csv_file)
        - [Export the excel file](#Export_the_excel_file)
        - [Export the data as R](#Export_the_data_as_R)
- [References](#References)

<h2>R_Input_and_Output</h2>

<h3>Load_the_data</h3>

<li>read.table: load the txt file (0201.grade.txt), separated by tabs</li>
<li>read.csv: load the csv file (0202.grade.csv), separated by ","</li>
<li>read_excel: load the excel file (0203.grade.xlsx); requires the readxl library</li>
<li>In addition, other file types such as SPSS and JSON can be read</li>

In [None]:
r_data_input_output_dataset = "
    example/0201.grade.txt,
    example/0202.grade.csv,
    example/0203.grade.xlsx
"

<h4>Load_the_txt_file</h4>

In [None]:
# Load the data (read.table)
# header = FALSE (when the file has no variable names)
# sep (separator): "," or " " or ":" or "\t" (tab)
# stringsAsFactors = TRUE reads text columns as factors
# na.strings = "", ".", "NA" etc. (values to treat as missing)
# str(): view the structure of the result

gradetxt <- read.table("example/0201.grade.txt",
                       header=FALSE,
                       sep = "\t",
                       stringsAsFactors = FALSE,
                       na.strings = ""
                      )
str(gradetxt)

In [None]:
print(gradetxt)
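
To see what each argument does without the tutorial's file, here is a minimal sketch on made-up data. The temporary file and its three rows are assumptions for illustration only, not the contents of 0201.grade.txt.

```r
# Hypothetical sample data written to a temporary file (not the tutorial's file)
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\tKim\t85",
             "2\tLee\t",      # the score on this row is left empty
             "3\tPark\t92"), tmp)

demo_txt <- read.table(tmp,
                       header = FALSE,           # the file has no variable names
                       sep = "\t",               # fields are separated by tabs
                       stringsAsFactors = FALSE, # keep text columns as character
                       na.strings = "")          # empty fields become NA

str(demo_txt)           # columns are auto-named V1, V2, V3
is.na(demo_txt$V3[2])   # checks that the empty score was read as NA
```

Because header = FALSE, R assigns the names V1, V2, V3 itself, which is why those names show up later when the data is exported and reloaded.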

[Back to the top](#Table_of_Contents)

<h4>Load_the_csv_file</h4>
read.table vs fread vs read_csv: which is faster? <br>
In the past I used the read.table function to read in data, but there are now faster options that work much better if you’re reading in huge data: fread (data.table package) and read_csv (readr package).

In [None]:
#install.packages("data.table") # Only run this if you've never installed it
#install.packages("readr")
library(data.table)
library(readr)

In [None]:
# This is simply the demo from the R help for fread, but it shows how much faster fread is!
# Make up a data set (I haven't covered data.table, but it is like data.frame)
n=1e6
DT = data.table( a=sample(1:1000,n,replace=TRUE),
                 b=sample(1:1000,n,replace=TRUE),
                 c=rnorm(n),
                 d=sample(c("foo","bar","baz","qux","quux"),n,replace=TRUE),
                 e=rnorm(n),
                 f=sample(1:1000,n,replace=TRUE) )
# Save it and get the size info
write.table(DT,"test.csv",sep=",",row.names=FALSE,quote=FALSE)
cat("File size (MB):", round(file.info("test.csv")$size/1024^2),"\n")

#Here is the read.table timing.  The bit in the system.time() is what 
# you'd typically use to read it in.
# header=TRUE lets it know there's a header (column names)
# sep = "," is the delimiter
# quote="" disables quoting
# stringsAsFactors=FALSE is really handy, since you often
#     don't want the strings to automatically be made into factors

# note, the "=" notation won't work since the assignment is within another function
# so <- is used instead.
system.time(DF2 <- read.table("test.csv",header=TRUE,sep=",",stringsAsFactors=FALSE))
class(DF2)

# read_csv from readr
system.time(DF3 <- read_csv("test.csv"))

In [None]:
class(DF3)

# fread from data.table
system.time(DT <- fread("test.csv"))
class(DT)

It used to be the case that fread returned only a data.table, which made plotting in ggplot2 difficult, since that requires a data.frame. Guess they’ve changed it: the object fread returns also has class data.frame. I would use either fread or read_csv, although read_csv only works for comma-delimited files (readr's read_delim handles other delimiters).

One thing you must be careful about is that data.tables behave differently from regular data.frames. Specifically, data.table uses pass-by-reference, which improves performance but can behave oddly. For example, if I assign DT to DT2, DT2 is still linked to DT, and if I change a column heading in DT2 it will change in DT as well!

In [None]:
DT2 = DT
names(DT2)

# change a to a2
names(DT2)[1] = "a2"
names(DT2)

# Now look at DT
names(DT)
#!!

# If I add to DT2, what happens?
DT2$g = DT2$f^2
names(DT2)
names(DT)
# Nope, only in the new one

# What if I manipulate the values in a column of DT2?
# see what we're starting with
DT2$b[1:5]
DT2$b = DT2$b^2
# See the change
DT2$b[1:5]
# Did it alter the original?
DT$b[1:5]

#  What if I change something in DT, is it automatically changed in DT2?
# See what we start with
DT$c[1:10]
DT$c = DT$c^2
# See the change in DT
DT$c[1:10]
# How's DT2?
DT2$c[1:10]
# unchanged
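
If an independent table is what you want, data.table provides copy(). The sketch below contrasts plain assignment with copy() on a small made-up table; the names DT_demo, alias and indep are hypothetical, and setnames() and := are used because they are data.table's documented by-reference operations.

```r
library(data.table)

DT_demo <- data.table(a = 1:3, f = 4:6)   # hypothetical small table

alias <- DT_demo          # plain assignment: both names point at the same table
indep <- copy(DT_demo)    # copy(): an independent copy of the table

setnames(alias, "a", "a2")   # rename by reference through the alias
names(DT_demo)               # the original shows the rename too
names(indep)                 # the copy keeps the old name

indep[, g := f^2]            # add a column by reference to the copy only
"g" %in% names(DT_demo)      # checks whether the original gained the column
```

Taking the copy up front is the safe pattern whenever you plan to modify a table and still need the original afterwards.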

In [None]:
# system() allows us to run a command in the Linux environment
# system("rm test.csv")
# file.remove("test.csv") does the same from within R on any operating system

In [None]:
# Load the data (read.csv)
# No sep argument is needed because read.csv defaults to sep = ","
# stringsAsFactors can be left unspecified (it defaults to FALSE since R 4.0.0)

gradecsv <- read.csv("example/0202.grade.csv",
                     header = TRUE, # T
                     na.strings = "."
                    )
str(gradecsv)

In [None]:
print(gradecsv)
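
The na.strings argument matters more than it may look. As a rough illustration on made-up data (the temporary file and its columns are assumptions, not the tutorial's 0202.grade.csv), a single "." in a numeric column changes the column type unless it is declared as a missing value:

```r
# Hypothetical csv where a missing score is coded as "."
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score",
             "1,85",
             "2,.",
             "3,92"), tmp)

# Without na.strings, "." is ordinary text, so the whole column is read as text
raw <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)

# With na.strings = ".", the "." becomes NA and the column stays numeric
clean <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE, na.strings = ".")

class(raw$score)
class(clean$score)
mean(clean$score, na.rm = TRUE)   # numeric summaries work on the clean column
```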

[Back to the top](#Table_of_Contents)

<h4>Load_the_excel_file</h4>

In [None]:
# Load the data (read_excel)
# install.packages('readxl') # only run this if you've never installed it
library(readxl) # load the library
gradexls <- read_excel("example/0203.grade.xlsx",
                       sheet = "grade",
                       col_names = TRUE,
                       na = "NA"
                      )
str(gradexls)

In [None]:
gradexls

[Back to the top](#Table_of_Contents)

<h4>View_the_data_structure</h4>

In [None]:
# View the data structure
str(gradexls) # summary of the data structure
dim(gradexls) # number of rows and number of columns
summary(gradexls) # summary of data
summary(gradexls$msex) # specific variable summary

[Back to the top](#Table_of_Contents)

<h3>Export_the_data</h3>

<h4>Export_the_txt_file</h4>

In [None]:
# Export the data (write.table)
str(gradetxt)

write.table(gradetxt,
            file = "example/output/gradetxt.txt",
            row.names = FALSE,
            na = "",
            col.names = FALSE,
            sep = ","
           )

In [None]:
# Reload the data (read.table)
gradetxt1 <- read.table("example/output/gradetxt.txt",
                        header = FALSE,
                        sep = ",",
                        stringsAsFactors = FALSE,
                        na.strings = ""
                       )
# Compare the types before and after: V3 is converted from number to character when saved and reloaded
str(gradetxt)
str(gradetxt1)
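
A text file stores no type information, so column types are inferred again on every reload and may differ from the original. A minimal sketch of checking this on a made-up data frame (the names df and back and the values are hypothetical):

```r
# Hypothetical data frame standing in for gradetxt
df <- data.frame(V1 = 1:3,
                 V2 = c("Kim", "Lee", "Park"),
                 V3 = c(85, NA, 92),          # stored as double
                 stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")
write.table(df, file = out,
            row.names = FALSE, col.names = FALSE,
            na = "",             # write NA as an empty field
            sep = ",")

back <- read.table(out, header = FALSE, sep = ",",
                   stringsAsFactors = FALSE,
                   na.strings = "")   # read empty fields back as NA

sapply(df, class)     # types before export
sapply(back, class)   # types after reload, inferred from the text
```

Comparing the two sapply() results column by column is a quick way to spot a type that changed during the round trip.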

[Back to the top](#Table_of_Contents)

<h4>Export_the_csv_file</h4>

In [None]:
# Export the data (write.csv)
write.csv(gradecsv,
          file = "example/output/gradecsv.csv",
          row.names=FALSE,
          na=""
         )
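
The row.names = FALSE argument is worth a quick look: without it, write.csv adds the row names as an unnamed first column, which comes back as an extra column on reload. A small sketch on made-up data (the data frame and file names are hypothetical):

```r
df <- data.frame(id = 1:2, score = c(85, NA))   # hypothetical data

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(df, file = with_rn, na = "")                        # row names written
write.csv(df, file = without_rn, row.names = FALSE, na = "")  # row names dropped

names(read.csv(with_rn))      # includes an extra leading column for the row names
names(read.csv(without_rn))   # only the original columns
```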

[Back to the top](#Table_of_Contents)

<h4>Export_the_excel_file</h4>

In [None]:
# Export the data (write_xlsx)
str(gradexls)

In [None]:
# install.packages("writexl") # only run this if you've never installed it
library(writexl)
write_xlsx(gradexls,
           path = "example/output/gradexls.xlsx"
          )

[Back to the top](#Table_of_Contents)

<h4>Export_the_data_as_R</h4>

In [None]:
# Export the data as R data
save(gradetxt, file="example/output/grade.RData")

In [None]:
load(file="example/output/grade.RData")
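
Unlike a text file, an .RData file keeps the object's name and types, and load() restores the object under its original name. The sketch below shows this on made-up data, and also shows saveRDS()/readRDS(), a base R alternative that is not used in the original notebook and lets you choose the name on reload. All object and file names here are hypothetical.

```r
scores <- data.frame(id = 1:3, score = c(85, NA, 92))   # hypothetical data
rdata_path <- tempfile(fileext = ".RData")
rds_path   <- tempfile(fileext = ".rds")

# save()/load(): the object comes back under the name it was saved with
save(scores, file = rdata_path)
original <- scores
rm(scores)                              # remove it to prove load() restores it
restored_names <- load(rdata_path)      # load() returns the restored object names
restored_names
identical(scores, original)             # checks the round trip kept values and types

# saveRDS()/readRDS(): a single object, assigned to any name you like
saveRDS(original, rds_path)
scores2 <- readRDS(rds_path)
identical(scores2, original)
```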

[Back to the top](#Table_of_Contents)

<h2>References</h2>

For an introduction to data input and output in R, please refer to the links below: <br>
[Ch02_05. R data processing (importing txt data) 05](https://youtu.be/NXxfNWYUxyg) <br>
[Ch02_06. R data processing (importing csv data) 06](https://youtu.be/RH82mghumAg) <br>
[Ch02_07. R data processing (importing excel data) 07](https://youtu.be/WAsnbhQOiQ4) <br>
[Ch03_01. R descriptive statistics (categorical) (setting up the working environment) 01](https://youtu.be/fXjy0w-QEgA) <br>
[Ch03_02. R descriptive statistics (categorical) (exporting data 1/2) 02](https://youtu.be/8a-0cw07Iao) <br>
[Ch03_03. R descriptive statistics (categorical) (exporting data 2/2) 03](https://youtu.be/sZt1rfiyHS8) <br>
[Reading in data](https://jeanettemumford.org/R-tutorial/09-reading-in-data/) <br>
[Extra bits we probably won't have time to cover](https://jeanettemumford.org/R-tutorial/10-extras-dplyr-tidyr-fmri/)