# PART 04 - Handling CSV and XLSX and dplyr

In this part, you will learn:
* Handling CSV and XLSX files
* Managing data frames with dplyr


## CSV files

read.table(file, ...)

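Parameters:
- header = TRUE / FALSE (whether the first line holds the column names)
- sep = "," (the field separator)
- row.names = "..."
- stringsAsFactors = FALSE (keep strings as character vectors instead of converting them to factors)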

There are also wrappers dedicated to CSV files, e.g., read.csv and write.csv; however, they fix some of the parameters (e.g., write.csv always uses a comma as the separator, and read.csv uses it as the default).
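
As a minimal sketch of how the CSV wrappers relate to the general functions, the example below writes a small data frame to a temporary file and reads it back in two ways. The data frame `toy` and its values are invented for illustration and are not part of example.csv.

```r
# Hypothetical toy data; the values are invented for illustration.
toy <- data.frame(Package = c("core", "ui"),
                  LOC = c(120, 45),
                  stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")

# write.csv always writes comma-separated values.
write.csv(toy, path, row.names = FALSE)

# read.csv behaves like read.table with header = TRUE and sep = "," preset.
from_csv <- read.csv(path, stringsAsFactors = FALSE)
from_table <- read.table(path, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Both calls should give the same data frame for this simple file.
identical(from_csv, from_table)
```

For files that use a separator other than a comma, such as the semicolon-separated example.csv used in this part, read.table and write.table with an explicit sep are the more flexible choice.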

In [None]:
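# Read a semicolon-separated file whose first line holds the column names
code_data <- read.table("example.csv", header=TRUE, sep=";", stringsAsFactors=FALSE)
code_data
# Convert CodeQuality to an ordered factor so that low < medium < high
code_data$CodeQuality <- factor(code_data$CodeQuality, 
                                levels=c("low", "medium", "high"),
                                ordered=TRUE)
str(code_data)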
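
The reason for making CodeQuality an ordered factor can be illustrated with a small sketch. The vector `quality` below is a made-up stand-in for the CodeQuality column; the levels are the ones used above.

```r
# Hypothetical stand-in for code_data$CodeQuality.
quality <- factor(c("high", "low", "medium", "low"),
                  levels = c("low", "medium", "high"),
                  ordered = TRUE)

# Ordered factors support comparisons against a level ...
at_least_medium <- quality >= "medium"
at_least_medium

# ... as well as max() and sorting by level order (not alphabetically).
max(quality)
sort(quality)
```

With an unordered factor, the comparison would not be meaningful and R would return NA with a warning.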

In [None]:
write.table(code_data, "example.csv", sep=";", row.names = FALSE, quote=FALSE)

## XLSX files

In [None]:
library(readxl)  # library() stops with an error if the package is not installed

In [None]:
read_excel("example.xlsx", sheet=1)
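
A sheet can also be selected by name. The sketch below assumes the sample workbook datasets.xlsx that ships with readxl (located with readxl_example) instead of the example.xlsx file used above, so that it runs without any extra files.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook bundled with readxl.
path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding which one to read.
sheets <- excel_sheets(path)
sheets

# Read one sheet by name, limited to the first five data rows.
mtcars_head <- read_excel(path, sheet = "mtcars", n_max = 5)
mtcars_head
```

read_excel returns a tibble, which can be passed directly to the dplyr functions described in the next section.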

## dplyr

dplyr package – operations on data.frames

Install dplyr (needed only once, e.g., with install.packages("dplyr")) and load it

In [None]:
library(dplyr)

Chaining commands with the pipe operator %>% (very early versions of dplyr used %.%, which was later deprecated in favor of %>%)
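
A minimal sketch of what chaining does: the value on the left-hand side of the pipe becomes the first argument of the function on the right-hand side. The vector `loc` is invented for illustration.

```r
library(dplyr)

# Hypothetical lines-of-code values.
loc <- c(1200, 350, 80)

# Nested call: has to be read from the inside out.
nested <- round(mean(sqrt(loc)), 1)

# Piped call: reads left to right; each result feeds the next function.
piped <- loc %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```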

Filter – keeps the rows that satisfy a condition (combine conditions with & for AND and | for OR)

In [None]:
code_data %>% filter(CodeQuality=="medium" |  LOC > 10)
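
The difference between | and & can be sketched on a small data frame. `toy` is a hypothetical stand-in for code_data with the same column names; its values are invented.

```r
library(dplyr)

# Hypothetical stand-in for code_data; values are invented.
toy <- data.frame(Package = c("core", "core", "ui", "ui"),
                  Class = c("A", "B", "C", "D"),
                  LOC = c(5, 250, 3, 80),
                  CodeQuality = c("medium", "high", "low", "medium"),
                  stringsAsFactors = FALSE)

# OR: rows where at least one condition holds.
either <- toy %>% filter(CodeQuality == "medium" | LOC > 10)

# AND: rows where both conditions hold.
both <- toy %>% filter(CodeQuality == "medium" & LOC > 10)

# Separating conditions with a comma is equivalent to &.
both_commas <- toy %>% filter(CodeQuality == "medium", LOC > 10)

either
both
```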

Arrange – sorts rows (use desc to reverse the order)

In [None]:
code_data %>% arrange(CodeQuality)

In [None]:
code_data %>% arrange(desc(CodeQuality))
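
arrange also accepts several sort keys, with later keys breaking ties in earlier ones. The sketch below uses an invented data frame `toy`; because CodeQuality is an ordered factor, it is sorted by level order rather than alphabetically.

```r
library(dplyr)

# Hypothetical stand-in for code_data; values are invented.
toy <- data.frame(Class = c("A", "B", "C", "D"),
                  LOC = c(5, 250, 3, 80),
                  CodeQuality = factor(c("medium", "high", "low", "medium"),
                                       levels = c("low", "medium", "high"),
                                       ordered = TRUE),
                  stringsAsFactors = FALSE)

# Highest quality first; within the same quality, smallest LOC first.
sorted <- toy %>% arrange(desc(CodeQuality), LOC)
sorted
```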

Select – selects columns (the first cell below shows the base R equivalent)

In [None]:
code_data[,c("Package", "Class", "CodeQuality")]

In [None]:
code_data %>% select(Package, Class, CodeQuality)

Mutate – calculates new columns from existing ones (the first cell below shows the base R equivalent)

In [None]:
data.frame(code_data, KLOC=code_data$LOC / 1000)

In [None]:
code_data %>% mutate(KLOC=LOC/1000)

group_by – groups rows by the values of one or more columns (it only adds grouping metadata; the rows themselves are unchanged)

In [None]:
code_data %>% group_by(Package)
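
What "adds metadata" means can be inspected directly. A minimal sketch on an invented data frame `toy`:

```r
library(dplyr)

# Hypothetical stand-in for code_data; values are invented.
toy <- data.frame(Package = c("core", "core", "ui", "ui"),
                  LOC = c(5, 250, 3, 80),
                  stringsAsFactors = FALSE)

grouped <- toy %>% group_by(Package)

# The rows are the same; only the grouping information is new.
is_grouped_df(grouped)   # is this a grouped data frame?
group_vars(grouped)      # names of the grouping columns
n_groups(grouped)        # number of distinct groups

# ungroup() removes the grouping metadata again.
plain <- ungroup(grouped)
is_grouped_df(plain)
```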

Summarise – aggregates each group into a single row (used after group_by)

In [None]:
code_data %>% group_by(Package) %>% summarise(meanLOC=mean(LOC), N=n())
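
The verbs from this part can be combined into a single pipeline. The following is a sketch on an invented data frame `toy` (same column names as code_data, made-up values), not a prescribed workflow.

```r
library(dplyr)

# Hypothetical stand-in for code_data; values are invented.
toy <- data.frame(Package = c("core", "core", "ui", "ui"),
                  Class = c("A", "B", "C", "D"),
                  LOC = c(5, 250, 3, 80),
                  stringsAsFactors = FALSE)

summary_tbl <- toy %>%
  filter(LOC > 4) %>%                          # drop very small classes
  mutate(KLOC = LOC / 1000) %>%                # add a derived column
  group_by(Package) %>%                        # one group per package
  summarise(meanKLOC = mean(KLOC), N = n()) %>%  # one row per group
  arrange(desc(meanKLOC))                      # largest mean first

summary_tbl
```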