---
title: "My data analysis"
output: pdf_document
---
```{r setup, include = FALSE}
options(knitr.kable.NA = "")
```
# Load packages and data
I load my dataset, `s_data`, which is included in the `smallsets` package.
```{r data}
library(smallsets)
library(knitr)
head(s_data) |> kable(booktabs = TRUE)
```
```{r timeline, eval = TRUE, echo = FALSE}
# Build the timeline from the raw data, before the preprocessing chunk modifies s_data.
SmallsetTimeline <- Smallset_Timeline(
  data = s_data,
  code = system.file("s_data_preprocess.Rmd", package = "smallsets")
)
```
# Preprocessing
I need to preprocess the dataset before I can build a model.
```{r preprocess}
# smallsets snap s_data caption[Remove rows where C2 is FALSE.]caption
s_data <- s_data[s_data$C2 == TRUE,]
# smallsets snap +2 s_data caption[Replace missing values in C6 and C8 with column
# means. Drop C7 because there are too many missing values.]caption
s_data$C6[is.na(s_data$C6)] <- mean(s_data$C6, na.rm = TRUE)
s_data$C8[is.na(s_data$C8)] <- mean(s_data$C8, na.rm = TRUE)
s_data$C7 <- NULL
# smallsets snap +1 s_data caption[Create a new column, C9, by summing C3 and
# C4.]caption
s_data$C9 <- s_data$C3 + s_data$C4
```
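To make the steps easier to check by hand, here is a minimal sketch that applies the same operations to a tiny data frame. The `toy` data frame and its values are hypothetical and are not taken from `s_data`; only the column names mirror the ones used in the preprocessing chunk.

```{r toy-sketch}
# Hypothetical toy data (not s_data); column names mirror the chunk above.
toy <- data.frame(
  C2 = c(TRUE, FALSE, TRUE, TRUE),
  C3 = c(1, 2, 3, 4),
  C4 = c(10, 20, 30, 40),
  C6 = c(2, NA, 4, NA),
  C7 = c(NA, NA, NA, 1)
)

# Keep only rows where C2 is TRUE (rows 1, 3 and 4 remain).
toy <- toy[toy$C2 == TRUE, ]

# Replace missing C6 values with the mean of the observed C6 values.
toy$C6[is.na(toy$C6)] <- mean(toy$C6, na.rm = TRUE)

# Drop C7, then derive C9 as the sum of C3 and C4.
toy$C7 <- NULL
toy$C9 <- toy$C3 + toy$C4

toy
```

Because the filter runs first, the mean used for imputation is computed over the remaining rows only. In this toy example, that is the mean of 2 and 4.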
The Smallset Timeline below visualises the preprocessing decisions I made above.
```{r print, echo = FALSE, fig.align = "center"}
SmallsetTimeline
```
# Modelling
I build a model.
```{r model}
# code to build a model...
```