# Introduction

These notes present examples of data processing in R.

# Read and Write Data Files

## CSV Files

### Read CSV Files
#### The Syntax
```
read.csv(file, header = TRUE, sep = ",", quote = "\"",
              dec = ".", fill = TRUE, comment.char = "", ...)
```

#### Read a CSV File with Default Options
* `read.csv` returns a data.frame

The comma-separated values (CSV) file used here for demonstration has the following content:  
```
"","BPchange","Dose","Run","Treatment","Animal"
"1",0.5,6.25,"C1","Control","R1"
"2",4.5,12.5,"C1","Control","R1"  
"3",10,25,"C1","Control","R1"
"4",26,50,"C1","Control","R1"
"5",37,100,"C1","Control","R1"
"6",32,200,"C1","Control","R1"
```

In [1]:
# Getting data from file "rabbit.csv"
rabbit_sample <- read.csv("datasets/rabbit.csv")

# Print the class of the variable rabbit_sample
print(class(rabbit_sample))

# Printing first few lines of the dataframe
head(rabbit_sample)

[1] "data.frame"


| | X | BPchange | Dose | Run | Treatment | Animal |
|---|---|---|---|---|---|---|
| | `<int>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | 1 | 0.5 | 6.25 | C1 | Control | R1 |
| 2 | 2 | 4.5 | 12.5 | C1 | Control | R1 |
| 3 | 3 | 10.0 | 25.0 | C1 | Control | R1 |
| 4 | 4 | 26.0 | 50.0 | C1 | Control | R1 |
| 5 | 5 | 37.0 | 100.0 | C1 | Control | R1 |
| 6 | 6 | 32.0 | 200.0 | C1 | Control | R1 |
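
The column types above are inferred by `read.csv`. The following is a minimal sketch of overriding the inferred type for one column with a named `colClasses` vector. It assumes the six sample rows are passed inline through the `text` argument so that it runs without `datasets/rabbit.csv`; the variable names `csv_text` and `rabbit_typed`, and the choice of `factor` for `Treatment`, are only illustrations.

```r
# Inline copy of the sample rows shown above (stand-in for datasets/rabbit.csv)
csv_text <- '"","BPchange","Dose","Run","Treatment","Animal"
"1",0.5,6.25,"C1","Control","R1"
"2",4.5,12.5,"C1","Control","R1"
"3",10,25,"C1","Control","R1"
"4",26,50,"C1","Control","R1"
"5",37,100,"C1","Control","R1"
"6",32,200,"C1","Control","R1"'

# A named colClasses vector sets the class of the named column only;
# the types of the remaining columns are still inferred
rabbit_typed <- read.csv(text = csv_text,
                         colClasses = c(Treatment = "factor"))

# Show the resulting class of every column
str(rabbit_typed)
```
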


#### Not Treating the First Row of the CSV File as Labels
* With `header = FALSE`, the first row is read as data and the columns are labelled "V1", "V2", and so on.

In [2]:
# Getting data from file "rabbit.csv"
rabbit_sample <- read.csv("datasets/rabbit.csv", header = FALSE)

# Printing first few lines of the dataframe
head(rabbit_sample)

| | V1 | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | NA | BPchange | Dose | Run | Treatment | Animal |
| 2 | 1 | 0.5 | 6.25 | C1 | Control | R1 |
| 3 | 2 | 4.5 | 12.5 | C1 | Control | R1 |
| 4 | 3 | 10 | 25 | C1 | Control | R1 |
| 5 | 4 | 26 | 50 | C1 | Control | R1 |
| 6 | 5 | 37 | 100 | C1 | Control | R1 |
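
Because the label row is read as data, the numeric columns come back as character in the output above. A small sketch of this effect, and of one possible way to keep the default "V1", "V2", ... labels while still getting numeric columns by skipping the first line with `skip = 1`. The shortened inline data and the names `csv_text`, `no_header` and `skipped` are assumptions made for illustration.

```r
# Shortened inline copy of the sample file (illustrative stand-in)
csv_text <- '"","BPchange","Dose"
"1",0.5,6.25
"2",4.5,12.5'

# The label row becomes the first data row, so V2 and V3 are read as character
no_header <- read.csv(text = csv_text, header = FALSE)
print(sapply(no_header, class))

# Skipping the label row keeps the V1, V2, ... names and numeric columns
skipped <- read.csv(text = csv_text, header = FALSE, skip = 1)
print(sapply(skipped, class))
```
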


#### Using Custom Column Names
* `col.names` replaces the labels read from the header row. Row labels can be set in the same way with `row.names`.

In [3]:
# Getting data from file "rabbit.csv"
rabbit_sample <- read.csv("datasets/rabbit.csv", col.names = c("A", "B", "C", "D", "E", "F"))

# Printing first few lines of the dataframe
head(rabbit_sample)

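| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| | `<int>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | 1 | 0.5 | 6.25 | C1 | Control | R1 |
| 2 | 2 | 4.5 | 12.5 | C1 | Control | R1 |
| 3 | 3 | 10.0 | 25.0 | C1 | Control | R1 |
| 4 | 4 | 26.0 | 50.0 | C1 | Control | R1 |
| 5 | 5 | 37.0 | 100.0 | C1 | Control | R1 |
| 6 | 6 | 32.0 | 200.0 | C1 | Control | R1 |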
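
Row labels can be handled through the `row.names` argument. In the sample file the first column only holds the row index, so one option is to use it as row names instead of reading it as a data column. The sketch below assumes a shortened inline copy of the file; `csv_text` and `rabbit_rn` are illustrative names.

```r
# Shortened inline copy of the sample file (illustrative stand-in)
csv_text <- '"","BPchange","Dose"
"1",0.5,6.25
"2",4.5,12.5'

# row.names = 1 uses the first column as row names and drops it from the data
rabbit_rn <- read.csv(text = csv_text, row.names = 1)

print(names(rabbit_rn))     # remaining data columns
print(rownames(rabbit_rn))  # labels taken from the first column
```
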


### Write CSV Files
#### The Syntax
```
write.csv(x, file = "", quote = TRUE, eol = "\n", na = "NA",
          row.names = TRUE, fileEncoding = "")
```
* A more general implementation is `write.table`. Check `?write.table` for more detail.

#### Simple Use of `write.csv`

In [4]:
# Write the data.frame to "testing.csv"
write.csv(rabbit_sample, "datasets/testing.csv")

The file "testing.csv" contains:  
```
"","A","B","C","D","E","F"
"1",1,0.5,6.25,"C1","Control","R1"
"2",2,4.5,12.5,"C1","Control","R1"
"3",3,10,25,"C1","Control","R1"
"4",4,26,50,"C1","Control","R1"
"5",5,37,100,"C1","Control","R1"
"6",6,32,200,"C1","Control","R1"
```
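
The first column of the file above comes from the default `row.names = TRUE`. A minimal sketch of writing without that column and reading the file back follows. The small data frame `demo_df` and the temporary file are assumptions made so that the example does not touch the `datasets` folder.

```r
# Small illustrative data frame (hypothetical, not the rabbit data itself)
demo_df <- data.frame(
    BPchange = c(0.5, 4.5),
    Dose = c(6.25, 12.5),
    Run = c("C1", "C1")
)

# Write to a temporary file without the row-name column
out_file <- tempfile(fileext = ".csv")
write.csv(demo_df, out_file, row.names = FALSE)

# Inspect the raw lines of the file, then read it back
csv_lines <- readLines(out_file)
print(csv_lines)
round_trip <- read.csv(out_file)
print(names(round_trip))
```

With `row.names = FALSE`, reading the file back does not produce the extra index column, so the column names match those of the original data frame.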

## XLSX Files

### Read XLSX Files

#### The Syntax

```
read.xlsx(
       file,
       sheetIndex,
       sheetName = NULL,
       rowIndex = NULL,
       startRow = NULL,
       endRow = NULL,
       colIndex = NULL,
       as.data.frame = TRUE,
       header = TRUE,
       colClasses = NA,
       keepFormulas = FALSE,
       encoding = "unknown",
       password = NULL,
       ...
)
```
Here `...` stands for further arguments passed to `data.frame`, for example `stringsAsFactors`.

#### Read an XLSX File with Default Options

* `xlsx::read.xlsx` returns a data.frame
* The xlsx file used for demonstration contains the following data:

![title](img/Fig_02_01.png)

In [5]:
# Loading the xlsx library
library(xlsx)

# Get the iris dataset from iris.xlsx, the second argument is the index of the worksheet in the xlsx file.
iris_table <- xlsx::read.xlsx("datasets/iris.xlsx", 1)

# Print first few lines of the table
head(iris_table)

| | NA. | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| 1 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |


### Write XLSX Files

#### The Syntax

```
write.xlsx(
       x,
       file,
       sheetName = "Sheet1",
       col.names = TRUE,
       row.names = TRUE,
       append = FALSE,
       showNA = TRUE,
       password = NULL
     )
```

#### Write an XLSX File with Default Options

In [6]:
# Staff table to export
staff_table <- data.frame(
    ID = c(1L, 2L, 3L, 4L),
    Name = c("Tom", "Ann", "Peter", "Kelly"), 
    Phone = c(73490245L, 77990904L, 47876737L, 35146136L)
)

# Write the data frame to the file staff_table.xlsx
xlsx::write.xlsx(staff_table, "datasets/staff_table.xlsx", append = FALSE)

* The output xlsx file:

![title](img/Fig_02_02.png)
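
The `sheetName`, `row.names` and `append` arguments listed in the syntax can be combined to place several sheets in one workbook. The following is a sketch only: it assumes the `xlsx` package (and the Java runtime it depends on) is available, writes to a temporary file rather than the `datasets` folder, and uses the hypothetical sheet names "Staff" and "Names".

```r
# Run only when the xlsx package can be loaded (it requires Java)
if (requireNamespace("xlsx", quietly = TRUE)) {
    staff_table <- data.frame(
        ID = c(1L, 2L, 3L, 4L),
        Name = c("Tom", "Ann", "Peter", "Kelly"),
        Phone = c(73490245L, 77990904L, 47876737L, 35146136L)
    )
    xlsx_file <- tempfile(fileext = ".xlsx")

    # First sheet: the full table, without the row-name column
    xlsx::write.xlsx(staff_table, xlsx_file,
                     sheetName = "Staff", row.names = FALSE)

    # Second sheet: append = TRUE adds a sheet to the existing workbook
    xlsx::write.xlsx(staff_table[, c("ID", "Name")], xlsx_file,
                     sheetName = "Names", row.names = FALSE, append = TRUE)

    # Read the second sheet back by name instead of by index
    names_sheet <- xlsx::read.xlsx(xlsx_file, sheetName = "Names")
    print(names_sheet)
}
```
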

# Database

## MySQL

### Connect to MySQL Server

To connect to a MySQL server, we need to load two libraries:

In [7]:
# Include libraries for MySQL connection
library(DBI)
library(RMySQL)

Then we connect to the database named "classicmodels" on the MySQL server at 127.0.0.1 using the function `DBI::dbConnect`:

In [8]:
con <- DBI::dbConnect(RMySQL::MySQL(), 
                      dbname="classicmodels", 
                      host="127.0.0.1", 
                      user="alan", 
                      password="password")

The connection object is now stored in `con`. To list the tables in the database, we can use `DBI::dbListTables`.

In [9]:
DBI::dbListTables(conn = con)

### Reading Tables by SQL Commands

#### Using `DBI::dbGetQuery`

In [10]:
select_result <- DBI::dbGetQuery(conn = con, statement = "
    select customerNumber,customerName,phone from customers;
")

cat("\nThe type of the output object:", class(select_result) ,". \n")

head(select_result) 


The type of the output object: data.frame . 


| | customerNumber | customerName | phone |
|---|---|---|---|
| | `<int>` | `<chr>` | `<chr>` |
| 1 | 103 | Atelier graphique | 40.32.2555 |
| 2 | 112 | Signal Gift Stores | 7025551838 |
| 3 | 114 | Australian Collectors, Co. | 03 9520 4555 |
| 4 | 119 | La Rochelle Gifts | 40.67.8555 |
| 5 | 121 | Baane Mini Imports | 07-98 9555 |
| 6 | 124 | Mini Gifts Distributors Ltd. | 4155551450 |


#### Using `DBI::dbSendQuery` and `DBI::dbFetch`

In [11]:
select_result_raw <- DBI::dbSendQuery(conn = con, statement = "
    select customerNumber,customerName,state from customers;
")

select_result <- DBI::dbFetch(select_result_raw)

# Release the result set once all rows have been fetched
DBI::dbClearResult(select_result_raw)

cat("\nThe type of the output object:", class(select_result) ,". \n")

head(select_result) 


The type of the output object: data.frame . 


| | customerNumber | customerName | state |
|---|---|---|---|
| | `<int>` | `<chr>` | `<chr>` |
| 1 | 103 | Atelier graphique | NA |
| 2 | 112 | Signal Gift Stores | NV |
| 3 | 114 | Australian Collectors, Co. | Victoria |
| 4 | 119 | La Rochelle Gifts | NA |
| 5 | 121 | Baane Mini Imports | NA |
| 6 | 124 | Mini Gifts Distributors Ltd. | CA |
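
One reason to prefer `DBI::dbSendQuery` with `DBI::dbFetch` over `DBI::dbGetQuery` is that the result can be fetched in chunks. The sketch below illustrates that pattern together with the clean-up calls `DBI::dbClearResult` and `DBI::dbDisconnect`. So that it runs without a MySQL server, it assumes an in-memory SQLite database from the `RSQLite` package as a stand-in; the connection `demo_con`, the five-row `customers` table and the chunk size of 2 are all hypothetical.

```r
# Run only when both packages are installed
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {

    # Stand-in for the MySQL connection: an in-memory SQLite database
    demo_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    DBI::dbWriteTable(demo_con, "customers", data.frame(
        customerNumber = 1:5,
        customerName = paste("Customer", 1:5)
    ))

    # Send the query without fetching any rows yet
    res <- DBI::dbSendQuery(demo_con, "
        select customerNumber, customerName from customers;
    ")

    # Fetch at most two rows at a time until the result is exhausted
    rows_seen <- 0L
    while (!DBI::dbHasCompleted(res)) {
        chunk <- DBI::dbFetch(res, n = 2)
        rows_seen <- rows_seen + nrow(chunk)
    }

    # Release the result set, then close the connection
    DBI::dbClearResult(res)
    DBI::dbDisconnect(demo_con)

    cat("Rows fetched:", rows_seen, "\n")
}
```

The same sequence of DBI calls should apply to the `con` object created with `RMySQL::MySQL()` above, with the chunk size chosen to suit the table being read.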


## SQLite

# Packages for Data Handling