# Introduction

These notes present examples of data processing in R.

# Read and Write Data Files

## CSV Files

### Read CSV Files
#### The Syntax
```
read.csv(file, header = TRUE, sep = ",", quote = "\"",
              dec = ".", fill = TRUE, comment.char = "", ...)
```

#### Read a CSV File with Default Options
* `read.csv` returns a data.frame

The comma-separated values (CSV) file used here for demonstration has the following content:  
```
"","BPchange","Dose","Run","Treatment","Animal"
"1",0.5,6.25,"C1","Control","R1"
"2",4.5,12.5,"C1","Control","R1"  
"3",10,25,"C1","Control","R1"
"4",26,50,"C1","Control","R1"
"5",37,100,"C1","Control","R1"
"6",32,200,"C1","Control","R1"
```

In [1]:
# Getting data from file "rabbit.csv"
rabbit_sample <- read.csv("datasets/rabbit.csv")

# Print the class of the variable rabbit_sample
print(class(rabbit_sample))

# Printing first few lines of the dataframe
head(rabbit_sample)

[1] "data.frame"


|   | X | BPchange | Dose | Run | Treatment | Animal |
|---|---|---|---|---|---|---|
|   | `<int>` | `<dbl>` | `<dbl>` | `<fct>` | `<fct>` | `<fct>` |
| 1 | 1 | 0.5 | 6.25 | C1 | Control | R1 |
| 2 | 2 | 4.5 | 12.5 | C1 | Control | R1 |
| 3 | 3 | 10.0 | 25.0 | C1 | Control | R1 |
| 4 | 4 | 26.0 | 50.0 | C1 | Control | R1 |
| 5 | 5 | 37.0 | 100.0 | C1 | Control | R1 |
| 6 | 6 | 32.0 | 200.0 | C1 | Control | R1 |
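
The `<fct>` columns in the output above reflect the `stringsAsFactors` setting. Since R 4.0.0 the default is `FALSE`, so newer R versions return these columns as character vectors unless factors are requested. The sketch below shows both settings; the inline `csv_text` is only an illustrative copy of the first rows of the demonstration file, not the file itself.

```r
# Inline copy of the first rows of the demonstration file (illustrative only)
csv_text <- '"","BPchange","Dose","Run","Treatment","Animal"
"1",0.5,6.25,"C1","Control","R1"
"2",4.5,12.5,"C1","Control","R1"'

# Request factors explicitly (the default before R 4.0.0)
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Request plain character vectors (the default since R 4.0.0)
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Compare the class of the same column under both settings
print(class(as_factor$Treatment))
print(class(as_char$Treatment))
```

Setting `stringsAsFactors` explicitly keeps the result the same across R versions.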


#### Not Assuming the First Row in the CSV File is Labels
* With `header = FALSE`, the columns are labelled "V1", "V2", and so on, and the header line of the file is read as the first data row.

In [2]:
# Getting data from file "rabbit.csv"
rabbit_sample <- read.csv("datasets/rabbit.csv", header = FALSE)

# Printing first few lines of the dataframe
head(rabbit_sample)

|   | V1 | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|---|
|   | `<int>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` |
| 1 |   | BPchange | Dose | Run | Treatment | Animal |
| 2 | 1 | 0.5 | 6.25 | C1 | Control | R1 |
| 3 | 2 | 4.5 | 12.5 | C1 | Control | R1 |
| 4 | 3 | 10 | 25 | C1 | Control | R1 |
| 5 | 4 | 26 | 50 | C1 | Control | R1 |
| 6 | 5 | 37 | 100 | C1 | Control | R1 |


#### Using Custom Column Names
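* `col.names` takes one name per column. With the default `header = TRUE`, the header line of the file is still skipped and its labels are replaced by the supplied names.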

In [3]:
# Getting data from file "rabbit.csv"
rabbit_sample <- read.csv("datasets/rabbit.csv", col.names = c("A", "B", "C", "D", "E", "F"))

# Printing first few lines of the dataframe
head(rabbit_sample)

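|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
|   | `<int>` | `<dbl>` | `<dbl>` | `<fct>` | `<fct>` | `<fct>` |
| 1 | 1 | 0.5 | 6.25 | C1 | Control | R1 |
| 2 | 2 | 4.5 | 12.5 | C1 | Control | R1 |
| 3 | 3 | 10.0 | 25.0 | C1 | Control | R1 |
| 4 | 4 | 26.0 | 50.0 | C1 | Control | R1 |
| 5 | 5 | 37.0 | 100.0 | C1 | Control | R1 |
| 6 | 6 | 32.0 | 200.0 | C1 | Control | R1 |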


### Write CSV Files
#### The Syntax
```
write.csv(x, file = "", quote = TRUE, eol = "\n", 
          na = "NA", row.names = TRUE, fileEncoding = "")

```
* A more general implementation is `write.table`. Check `?write.table` for more detail.
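
As a minimal sketch of that relationship, the example below writes the same data.frame once with `write.csv` and once with `write.table`, using the arguments that `write.csv` fixes internally. The data.frame `df` and the temporary files are made up for illustration.

```r
# Small illustrative data.frame (not part of the rabbit dataset)
df <- data.frame(Dose = c(6.25, 12.5), Run = c("C1", "C1"))

# Temporary output files so the example leaves no files behind
csv_file <- tempfile(fileext = ".csv")
table_file <- tempfile(fileext = ".csv")

# write.csv with its defaults
write.csv(df, csv_file)

# write.table with the settings that write.csv applies
write.table(df, table_file, sep = ",", dec = ".",
            qmethod = "double", col.names = NA)

# Compare the two files line by line
print(identical(readLines(csv_file), readLines(table_file)))
```

`col.names = NA` is what produces the empty `""` label above the row-name column.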

#### Simple Use of `write.csv`

In [4]:
# Write the data.frame to "testing.csv"
write.csv(rabbit_sample, "datasets/testing.csv")

The file "testing.csv" contains:  
```
"","A","B","C","D","E","F"
"1",1,0.5,6.25,"C1","Control","R1"
"2",2,4.5,12.5,"C1","Control","R1"
"3",3,10,25,"C1","Control","R1"
"4",4,26,50,"C1","Control","R1"
"5",5,37,100,"C1","Control","R1"
"6",6,32,200,"C1","Control","R1"
```
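
The leading `""` column in the file holds the row names. When such a file is read back with `read.csv`, it becomes an extra column named `X`. The sketch below, which uses a made-up data.frame and temporary files, shows how `row.names = FALSE` avoids this.

```r
# Small illustrative data.frame
df <- data.frame(A = 1:3, B = c(0.5, 4.5, 10))

# Temporary files for the two variants
with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

# Default: row names are written as the first, unnamed column
write.csv(df, with_rn)

# Suppress the row-name column
write.csv(df, without_rn, row.names = FALSE)

# Column names after reading each file back
names_with_rn <- names(read.csv(with_rn))
names_without_rn <- names(read.csv(without_rn))
print(names_with_rn)
print(names_without_rn)
```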

## XLSX Files

### Read XLSX Files

#### The Syntax

```
read.xlsx(
       file,
       sheetIndex,
       sheetName = NULL,
       rowIndex = NULL,
       startRow = NULL,
       endRow = NULL,
       colIndex = NULL,
       as.data.frame = TRUE,
       header = TRUE,
       colClasses = NA,
       keepFormulas = FALSE,
       encoding = "unknown",
       password = NULL,
       ...
)
```
Here `...` stands for further arguments passed to `data.frame`, for example `stringsAsFactors`.

#### Read an XLSX File with Default Options

* `xlsx::read.xlsx` returns a data.frame
* The xlsx file used for demonstration contains the following data:

![title](img/Fig_02_01.png)

In [5]:
# Loading the xlsx library
library(xlsx)

# Get the iris dataset from iris.xlsx, the second argument is the index of the worksheet in the xlsx file.
iris_table <- xlsx::read.xlsx("datasets/iris.xlsx", 1)

# Print first few lines of the table
head(iris_table)

|   | NA. | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|---|
|   | `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` |
| 1 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |


### Write XLSX Files

#### The Syntax

```
write.xlsx(
       x,
       file,
       sheetName = "Sheet1",
       col.names = TRUE,
       row.names = TRUE,
       append = FALSE,
       showNA = TRUE,
       password = NULL
     )
```

#### Write an XLSX File with Default Options

In [6]:
# Staff table to export
staff_table = data.frame(
    ID = c(1L, 2L, 3L, 4L),
    Name = c("Tom", "Ann", "Peter", "Kelly"), 
    Phone = c(73490245L, 77990904L, 47876737L, 35146136L)
)

# Write the data.frame to the file named staff_table.xlsx
xlsx::write.xlsx(staff_table, "datasets/staff_table.xlsx", append = FALSE)

* The output xlsx file:

![title](img/Fig_02_02.png)

# Database

## MySQL

### Connect to MySQL Server

To connect to a MySQL server, we need to load two libraries:

In [7]:
# Include libraries for MySQL connection
library(DBI)
library(RMySQL)

Then we connect to the database named "classicmodels" on the MySQL server at 127.0.0.1 using the function `DBI::dbConnect`:

In [8]:
# Create a connection object and store it in "con"
con <- DBI::dbConnect(RMySQL::MySQL(),          # The driver to communicate with the server
                      dbname="classicmodels",   # The name of the database to access on the server
                      host="127.0.0.1",         # The ip / URL / hostname of the server
                      user="alan",              # user name to login
                      password="password")      # password for the user ID

The connection is now stored in the object `con`. To list the tables in the database, we can use `DBI::dbListTables`.

In [9]:
# Get the list of tables in the database
DBI::dbListTables(conn = con)

### Reading Tables Through the DBI Interface

#### Using `DBI::dbGetQuery`

* Syntax:
```
dbGetQuery(conn, statement, ...)
```
* `DBI::dbGetQuery` returns a data.frame.

In [10]:
# Get the data.frame from the database based on the SQL statement
select_result <- DBI::dbGetQuery(conn = con, statement = "
    select customerNumber,customerName,phone from customers;
")

# Print out the class of the object select_result
cat("\nThe type of the output object:", class(select_result) ,". \n")

# Print first few lines of the object select_result
head(select_result) 


The type of the output object: data.frame . 


|   | customerNumber | customerName | phone |
|---|---|---|---|
|   | `<int>` | `<chr>` | `<chr>` |
| 1 | 103 | Atelier graphique | 40.32.2555 |
| 2 | 112 | Signal Gift Stores | 7025551838 |
| 3 | 114 | Australian Collectors, Co. | 03 9520 4555 |
| 4 | 119 | La Rochelle Gifts | 40.67.8555 |
| 5 | 121 | Baane Mini Imports | 07-98 9555 |
| 6 | 124 | Mini Gifts Distributors Ltd. | 4155551450 |


#### Using `DBI::dbSendQuery` and `DBI::dbFetch`

* Syntax:
```
dbSendQuery(conn, statement, ...)
```
* `DBI::dbSendQuery` returns an S4 result object. The result object can be converted to a data.frame by `DBI::dbFetch`.
* The syntax of `DBI::dbFetch`:
  ```
  dbFetch(res, n = -1, ...)
  ```
    - Here `n` is the maximum number of records to retrieve. The default `n = -1` retrieves all pending records.
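
A minimal sketch of fetching in chunks with `n` follows. It assumes the RSQLite package is installed and uses a made-up in-memory table `numbers` instead of the MySQL server, so that it runs on its own. `DBI::dbHasCompleted` reports whether all rows have been fetched, and `DBI::dbClearResult` releases the result object afterwards.

```r
library(DBI)

# In-memory SQLite database, used only for illustration
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "numbers", data.frame(id = 1:5))

# Send the query without fetching any rows yet
res <- DBI::dbSendQuery(con, "SELECT id FROM numbers ORDER BY id;")

# Fetch at most two rows per call until the result is exhausted
chunk_sizes <- integer(0)
while (!DBI::dbHasCompleted(res)) {
    chunk <- DBI::dbFetch(res, n = 2)
    chunk_sizes <- c(chunk_sizes, nrow(chunk))
}

# Release the result object and close the connection
DBI::dbClearResult(res)
DBI::dbDisconnect(con)

# Number of rows returned by each call
print(chunk_sizes)
```

Fetching in chunks is mainly useful when the full result would not fit comfortably in memory.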

In [11]:
# Get the S4 object from the database based on the SQL statement
select_result_raw <- DBI::dbSendQuery(conn = con, statement = "
    select customerNumber,customerName,state from customers;
")

# Translate the S4 object into a data.frame
select_result <- DBI::dbFetch(select_result_raw)

# Release the result object once the rows are fetched
DBI::dbClearResult(select_result_raw)

# Print the class of the object select_result
cat("\nThe type of the output object:", class(select_result) ,". \n")

# Print first few lines of the object select_result
head(select_result) 


The type of the output object: data.frame . 


|   | customerNumber | customerName | state |
|---|---|---|---|
|   | `<int>` | `<chr>` | `<chr>` |
| 1 | 103 | Atelier graphique |   |
| 2 | 112 | Signal Gift Stores | NV |
| 3 | 114 | Australian Collectors, Co. | Victoria |
| 4 | 119 | La Rochelle Gifts |   |
| 5 | 121 | Baane Mini Imports |   |
| 6 | 124 | Mini Gifts Distributors Ltd. | CA |


#### Getting the Whole Table

* If we want to retrieve the whole table, `DBI::dbReadTable` will be a shorter command.
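
As a rough illustration, `DBI::dbReadTable` gives the same rows as a `select *` query on that table. The sketch below assumes the RSQLite package is installed and uses a made-up in-memory table named `offices` rather than the MySQL database.

```r
library(DBI)

# In-memory SQLite database with a small illustrative table
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "offices", data.frame(
    officeCode = c("1", "2"),
    city = c("San Francisco", "Boston")
))

# Read the whole table in one call
via_read <- DBI::dbReadTable(con, "offices")

# Equivalent explicit query
via_query <- DBI::dbGetQuery(con, "SELECT * FROM offices;")

DBI::dbDisconnect(con)

# Check that both approaches return the same data
same_result <- isTRUE(all.equal(via_read, via_query))
print(same_result)
```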

In [12]:
# Storing the Whole Table
whole_table <- DBI::dbReadTable(con, "offices")

# Show the first few lines of the data.frame whole_table
print(whole_table[1:3,])

  officeCode          city           phone         addressLine1 addressLine2
1          1 San Francisco +1 650 219 4782    100 Market Street    Suite 300
2          2        Boston +1 215 837 0825     1550 Court Place    Suite 102
3          3           NYC +1 212 555 3000 523 East 53rd Street      apt. 5A
  state country postalCode territory
1    CA     USA      94080        NA
2    MA     USA      02107        NA
3    NY     USA      10022        NA


### Adding an Entry to a Table

* There are two routines for adding entries to a table, but only `DBI::dbWriteTable` worked in the current scenario.
* `DBI::dbWriteTable` has a syntax:
  ```
  dbWriteTable(conn, name, value, ...)
  ```
  - `...` includes:
    1. `row.names` (default: `FALSE`)
    2. `overwrite` (default: `FALSE`)
    3. `append` (default: `FALSE`)
    4. `field.types` (default: `NULL`)
    5. `temporary` (default: `FALSE`)
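
The effect of `append` and `overwrite` can be sketched as follows. The example assumes the RSQLite package is installed and uses a made-up in-memory table `staff`; the behaviour shown is that of the RSQLite driver, and other drivers may differ in details.

```r
library(DBI)

# In-memory SQLite database, used only for illustration
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
staff <- data.frame(ID = 1:2, Name = c("Tom", "Ann"))

# First call creates the table
DBI::dbWriteTable(con, "staff", staff)

# append = TRUE adds rows to the existing table
DBI::dbWriteTable(con, "staff", data.frame(ID = 3L, Name = "Peter"),
                  append = TRUE)
rows_after_append <- nrow(DBI::dbReadTable(con, "staff"))

# overwrite = TRUE replaces the table content
DBI::dbWriteTable(con, "staff", staff, overwrite = TRUE)
rows_after_overwrite <- nrow(DBI::dbReadTable(con, "staff"))

# With both options left FALSE, writing to an existing table is an error
attempt <- try(DBI::dbWriteTable(con, "staff", staff), silent = TRUE)
write_failed <- inherits(attempt, "try-error")

DBI::dbDisconnect(con)
print(c(rows_after_append, rows_after_overwrite))
print(write_failed)
```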

In [13]:
# Create a data.frame for new entries
new_entry = data.frame(
    customerNumber = c(1001L,1002L),
    customerName = c("Tom", "Mary"),
    state = c("NA", "NY"),
    phone = c(173173173, 246246246)
)

# Show the content of the new entries
print(new_entry)

# Append the new rows to the table named customers
DBI::dbWriteTable(conn = con, "customers", new_entry, append=TRUE, row.names=FALSE)

# Print out the new entries to show their existence
select_result <- DBI::dbGetQuery(conn = con, statement = "
    select customerNumber,customerName,state,phone from customers where customerNumber > 1000;
")

# Print first few lines of the object select_result
head(select_result) 

  customerNumber customerName state     phone
1           1001          Tom    NA 173173173
2           1002         Mary    NY 246246246


|   | customerNumber | customerName | state | phone |
|---|---|---|---|---|
|   | `<int>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | 1001 | Tom |   | 173173173 |
| 2 | 1002 | Mary | NY | 246246246 |


### Deleting an Entry from a Table
* DBI does not seem to include a dedicated routine for deleting entries from a table. However, an SQL statement is still a working option.
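
One way to run such a statement is `DBI::dbExecute`, which executes it and returns the number of affected rows. The sketch below assumes the RSQLite package is installed and uses a made-up in-memory copy of a few `customers` rows. The `?` placeholder with `params` works with RSQLite; placeholder syntax and support vary between drivers, so treat this as an assumption for other backends.

```r
library(DBI)

# In-memory SQLite database with a few illustrative rows
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "customers", data.frame(
    customerNumber = c(103L, 1001L, 1002L),
    customerName = c("Atelier graphique", "Tom", "Mary")
))

# Delete rows above a threshold passed as a query parameter
deleted <- DBI::dbExecute(
    con,
    "DELETE FROM customers WHERE customerNumber > ?;",
    params = list(1000L)
)

# Check what is left in the table
remaining <- DBI::dbGetQuery(con, "SELECT customerNumber FROM customers;")

DBI::dbDisconnect(con)
print(deleted)
print(remaining)
```

Because `DBI::dbExecute` returns only a row count, there is no result object to clear afterwards.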

In [14]:
# Delete the entries with an SQL statement
result <- DBI::dbSendStatement(con, "delete from customers where customerNumber > 1000;")

# Release the result object returned by dbSendStatement
DBI::dbClearResult(result)

# Try to select the new records to confirm the deletion
select_result <- DBI::dbGetQuery(conn = con, statement = "
    select customerNumber,customerName,state,phone from customers where customerNumber > 1000;
")

# Print first few lines of the object select_result
head(select_result) 

| customerNumber | customerName | state | phone |
|---|---|---|---|
| `<int>` | `<chr>` | `<chr>` | `<chr>` |


### Disconnect from the Database

In [15]:
# The connection stored in con will be disconnected
DBI::dbDisconnect(con)

## SQLite

### Open a SQLite File

* As with MySQL, this operation requires the DBI library, while RSQLite is the driver package that enables the connection.

In [16]:
# Loading the required libraries (library() loads one package per call)
library(DBI)
library(RSQLite)

* SQLite is serverless: the database is stored in a database file. Once the file is opened, we can use it as if it were an SQL server.
* As in the MySQL example, we use `DBI::dbConnect` to open the SQLite file and store the connection object in `con`.
* An important option is `flags`, which controls the mode in which the database file is opened.
  - `flags = RSQLite::SQLITE_RWC` makes the database file readable, writable, and creatable (if it does not exist).
  - `flags = RSQLite::SQLITE_RO` opens the database file read-only for the following operations.
  - `flags = RSQLite::SQLITE_RWC` is the default.
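
The difference between the two modes can be sketched as follows, assuming the RSQLite package is installed. The temporary database file and the table `t` are made up for illustration and are unrelated to `patient_record.sqlite`.

```r
library(DBI)

# Temporary database file, created by the default read/write/create mode
db_file <- tempfile(fileext = ".sqlite")
rw <- DBI::dbConnect(RSQLite::SQLite(), db_file)
DBI::dbWriteTable(rw, "t", data.frame(x = 1:3))
DBI::dbDisconnect(rw)

# Reopen the same file in read-only mode
ro <- DBI::dbConnect(RSQLite::SQLite(), db_file,
                     flags = RSQLite::SQLITE_RO)

# Reading still works
rows_read <- nrow(DBI::dbReadTable(ro, "t"))

# Any attempt to modify the database raises an error
attempt <- try(DBI::dbExecute(ro, "DELETE FROM t;"), silent = TRUE)
write_failed <- inherits(attempt, "try-error")

DBI::dbDisconnect(ro)
print(rows_read)
print(write_failed)
```

Read-only mode is a simple safeguard when a script should only inspect an existing database.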

In [17]:
# Open the sqlite file and store the connection object in con
con <- DBI::dbConnect(RSQLite::SQLite(), "datasets/patient_record.sqlite", flags=RSQLite::SQLITE_RWC)

* Here we may query the list of tables inside the database.

In [18]:
# Getting the list of tables
DBI::dbListTables(con)

### Reading Tables Through DBI Functions
* As in the MySQL section, the DBI functions also work with SQLite.

#### Using `DBI::dbGetQuery`

In [19]:
# Get the PatientRecord from the database
patient_record <- DBI::dbGetQuery(con, "select * from PatientRecord;")

# Print first few lines of the object patient_record
head(patient_record)

|   | Name | StartDate | EndDate | Hospital | Ward |
|---|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | Chantelle | 2000-04-16 | 2000-04-21 | RH | 8C |
| 2 | Silva | 2000-05-07 | 2000-05-16 | TSKH | 5A |
| 3 | Maybelle | 2000-06-10 | 2000-06-13 | WCHH | 5A |
| 4 | Wilhemina | 2000-06-12 | 2000-06-18 | WCHH | 5A |
| 5 | Alleen | 2000-07-07 | 2000-07-17 | SJH | 6A |
| 6 | Natalia | 2000-07-25 | 2000-08-02 | PYNEH | 8B |


#### Using `DBI::dbSendQuery` and `DBI::dbFetch`

In [20]:
# Get the PatientRecord from the database
patient_record_raw <- DBI::dbSendQuery(con, "select * from PatientRecord;")

# Fetch the raw data into a data.frame
patient_record <- DBI::dbFetch(patient_record_raw)

# Release the result object once the rows are fetched
DBI::dbClearResult(patient_record_raw)

# Print first few lines of the object patient_record
head(patient_record)

|   | Name | StartDate | EndDate | Hospital | Ward |
|---|---|---|---|---|---|
|   | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | Chantelle | 2000-04-16 | 2000-04-21 | RH | 8C |
| 2 | Silva | 2000-05-07 | 2000-05-16 | TSKH | 5A |
| 3 | Maybelle | 2000-06-10 | 2000-06-13 | WCHH | 5A |
| 4 | Wilhemina | 2000-06-12 | 2000-06-18 | WCHH | 5A |
| 5 | Alleen | 2000-07-07 | 2000-07-17 | SJH | 6A |
| 6 | Natalia | 2000-07-25 | 2000-08-02 | PYNEH | 8B |


# Data Manipulation

## Filtering

## Sorting

## Column Shifting

## Table Joining