---
title: "Compare CpG oe header and join outputs"
output:
  html_document:
    df_print: paged
    code_folding: hide
    theme: cerulean
    highlight: tango
    toc: true
    toc_depth: 4
    toc_float: true
  word_document: default
editor_options:
  chunk_output_type: inline
---
## Load libraries
```{r}
library(arsenal)
```
## Read in data
```{r}
new_output <- read.csv("/Volumes/web/metacarcinus/Cvirginica/FROGER/20190225_cpg_oe/ID_CpG_labelled_all", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
sam_output <- read.csv("/Volumes/web/metacarcinus/Cvirginica/FROGER/20190225_cpg_oe/ID_CpG_labelled_all.tab", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
```
## Preview data frames
```{r}
head(new_output)
head(sam_output)
```
## Show differences between data frames
```{r}
summary(comparedf(new_output, sam_output, by = "ID"))
```
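A minimal base-R cross-check of the same idea, on made-up data: match rows by `ID`, then list the rows whose values disagree. The `toy_*` objects and the `CpG_a` column are hypothetical and only illustrate the approach; this is not how `comparedf()` works internally.

```{r}
# Hypothetical toy tables: same IDs in a different row order, one value differs
toy_new <- data.frame(ID = c("g1", "g2", "g3"),
                      CpG_a = c(0.51, 1.10, 0.87),
                      stringsAsFactors = FALSE)
toy_old <- data.frame(ID = c("g3", "g1", "g2"),
                      CpG_a = c(0.87, 0.51, 1.15),
                      stringsAsFactors = FALSE)

# Pair rows by ID; shared column names get the suffixes .new and .old
toy_merged <- merge(toy_new, toy_old, by = "ID", suffixes = c(".new", ".old"))

# Keep only rows where the paired values disagree
toy_mismatch <- toy_merged[toy_merged$CpG_a.new != toy_merged$CpG_a.old, ]
toy_mismatch
```

Note that `!=` returns `NA` when either value is `NA`, so real data with missing values would need extra handling.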
## Confirm data frames are not identical
```{r}
identical(new_output, sam_output)
```
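`identical()` returns `FALSE` when two data frames hold the same values but in a different column order, which is one reason to reorder columns before comparing. A small sketch with hypothetical toy data (`toy_a`, `toy_b`):

```{r}
# Hypothetical toy tables with the same values but swapped column order
toy_a <- data.frame(ID = c("g1", "g2"), oe = c(0.5, 1.2),
                    stringsAsFactors = FALSE)
toy_b <- toy_a[, c("oe", "ID")]

# Column order alone makes the comparison fail
toy_before <- identical(toy_a, toy_b)

# Selecting columns in the same order makes the tables identical
toy_after <- identical(toy_a, toy_b[, names(toy_a)])

c(before = toy_before, after = toy_after)
```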
## Reorder columns alphabetically
```{r}
colnames(new_output)
```
```{r}
# Sort all columns except the first (ID) alphabetically by name
new_order <- new_output[, -1, drop = FALSE]
new_order <- new_order[, order(names(new_order))]
# Put the ID column back in front
new_output <- cbind(ID = new_output[, "ID"], new_order)
```
```{r}
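# Sort all columns except the first (ID) alphabetically by name
sam_order <- sam_output[, -1, drop = FALSE]
sam_order <- sam_order[, order(names(sam_order))]
# Put the ID column back in front
sam_output <- cbind(ID = sam_output[, "ID"], sam_order)
# Replace dots in column names with underscores to match new_output
colnames(sam_output) <- gsub("\\.", "_", colnames(sam_output))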
```
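One possible source of dots in column names is the default `check.names = TRUE` in `read.csv()`, which rewrites characters that are not valid in R names. Whether that is what happened to the `.tab` file here is an assumption; the sketch below uses a hypothetical temporary file and header to show the behaviour.

```{r}
# Hypothetical tab-delimited file whose header contains a hyphen
toy_path <- tempfile(fileext = ".tab")
writeLines(c("ID\tCpG-sample1", "g1\t0.51"), toy_path)

# Default check.names = TRUE converts the hyphen to a dot
toy_checked <- names(read.csv(toy_path, sep = "\t"))

# check.names = FALSE keeps the header exactly as written
toy_raw <- names(read.csv(toy_path, sep = "\t", check.names = FALSE))

toy_checked
toy_raw
```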
## Check column names are the same
```{r}
colnames(new_output)
```
```{r}
colnames(sam_output)
```
## See if data frames are identical
```{r}
identical(new_output, sam_output)
```
## Show differences between data frames
```{r}
summary(comparedf(new_output, sam_output, by = "ID"))
```
## There are still a couple of differences in the column names
## Change column names to match
```{r}
# Inspect the mismatched names in columns 32 to 37, then copy them from new_output
colnames(sam_output)[32:37]
colnames(new_output)[32:37]
colnames(sam_output)[32:37] <- colnames(new_output)[32:37]
```
```{r}
# Do the same for column 47
colnames(sam_output)[47]
colnames(new_output)[47]
colnames(sam_output)[47] <- colnames(new_output)[47]
```
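Rather than reading the positions off the comparison summary, the mismatched positions can be found programmatically. A minimal sketch on hypothetical name vectors, assuming both tables have the same number of columns in corresponding order:

```{r}
# Hypothetical column names from two tables; position 3 differs
toy_new_names <- c("ID", "CpG_a", "CpG_b", "CpG_c")
toy_old_names <- c("ID", "CpG_a", "CpGb", "CpG_c")

# Positions where the names disagree
toy_mismatch_idx <- which(toy_new_names != toy_old_names)

# Side-by-side view of the mismatches for a visual check before renaming
data.frame(position = toy_mismatch_idx,
           new = toy_new_names[toy_mismatch_idx],
           old = toy_old_names[toy_mismatch_idx])
```

Reviewing the side-by-side table before overwriting names guards against renaming a column that actually holds different data.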
## See if data frames are identical
```{r}
identical(new_output, sam_output)
```
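`identical()` is also sensitive to row order and row names, whereas `comparedf(..., by = "ID")` matches rows by `ID`. If the two checks disagree, sorting both tables by `ID` and resetting row names is one thing to try. A sketch with hypothetical toy data and a hypothetical helper, `sort_by_id()`:

```{r}
# Hypothetical toy tables: same rows, different row order
toy_x <- data.frame(ID = c("g1", "g2", "g3"), oe = c(0.5, 1.2, 0.9),
                    stringsAsFactors = FALSE)
toy_y <- toy_x[c(3, 1, 2), ]

# Hypothetical helper: sort rows by ID and reset row names
sort_by_id <- function(df) {
  out <- df[order(df$ID), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Row order alone makes identical() fail; sorting both tables fixes it
toy_unsorted <- identical(toy_x, toy_y)
toy_sorted <- identical(sort_by_id(toy_x), sort_by_id(toy_y))

c(unsorted = toy_unsorted, sorted = toy_sorted)
```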
## Show differences between data frames
```{r}
summary(comparedf(new_output, sam_output, by = "ID"))
```
## After ordering columns the same way and making column names consistent, both tables contain exactly the same data, confirming that joining files with the new script did not change any values