-
Notifications
You must be signed in to change notification settings - Fork 51
/
get_data_extracts.R
186 lines (174 loc) · 6.29 KB
/
get_data_extracts.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
#------------------------------------------------------------------------------#
#
# _ _ _ _ _
# (_) | | | | | | | |
# _ __ ___ _ _ __ | |_ | |__ | | __ _ _ __ | | __
# | '_ \ / _ \ | || '_ \ | __|| '_ \ | | / _` || '_ \ | |/ /
# | |_) || (_) || || | | || |_ | |_) || || (_| || | | || <
# | .__/ \___/ |_||_| |_| \__||_.__/ |_| \__,_||_| |_||_|\_\
# | |
# |_|
#
# This file is part of the 'rstudio/pointblank' project.
#
# Copyright (c) 2017-2024 pointblank authors
#
# For full copyright and license information, please look at
# https://rstudio.github.io/pointblank/LICENSE.html
#
#------------------------------------------------------------------------------#
#' Collect data extracts from a validation step
#'
#' @description
#'
#' In an agent-based workflow (i.e., initiating with [create_agent()]), after
#' interrogation with [interrogate()], we can extract the row data that didn't
#' pass row-based validation steps with the `get_data_extracts()` function.
#' There is one discrete extract per row-based validation step and the amount of
#' data available in a particular extract depends on both the fraction of test
#' units that didn't pass the validation step and the level of sampling or
#' explicit collection from that set of units. These extracts can be collected
#' programmatically through `get_data_extracts()` but they may also be
#' downloaded as CSV files from the HTML report generated by the agent's print
#' method or through the use of [get_agent_report()].
#'
#' The availability of data extracts for each row-based validation step depends
#' on whether `extract_failed` is set to `TRUE` within the [interrogate()] call
#' (it is by default). The amount of *fail* rows extracted depends on the
#' collection parameters in [interrogate()], and the default behavior is to
#' collect up to the first 5000 *fail* rows.
#'
#' Row-based validation steps are based on those validation functions of the
#' form `col_vals_*()` and also include [conjointly()] and [rows_distinct()].
#' Only functions from that combined set of validation functions can yield data
#' extracts.
#'
#' @param agent *The pointblank agent object*
#'
#' `obj:<ptblank_agent>` // **required**
#'
#' A **pointblank** *agent* object that is commonly created through the use of
#' the [create_agent()] function. It should have had [interrogate()] called on
#' it, such that the validation steps were carried out and any sample rows
#' from non-passing validations could potentially be available in the object.
#'
#' @param i *A validation step number*
#'
#' `scalar<integer>` // *default:* `NULL` (`optional`)
#'
#' The validation step number, which is assigned to each validation step by
#' **pointblank** in the order of definition. If `NULL` (the default), all
#' data extract tables will be provided in a list object.
#'
#' @return A list of tables if `i` is not provided, or, a standalone table if
#' `i` is given.
#'
#' @section Examples:
#'
#' Create a series of two validation steps focused on testing row values for
#' part of the `small_table` object. Use [interrogate()] right after that.
#'
#' ```r
#' agent <-
#' create_agent(
#' tbl = small_table %>%
#' dplyr::select(a:f),
#' label = "`get_data_extracts()`"
#' ) %>%
#' col_vals_gt(d, value = 1000) %>%
#' col_vals_between(
#' columns = c,
#' left = vars(a), right = vars(d),
#' na_pass = TRUE
#' ) %>%
#' interrogate()
#' ```
#'
#' Using `get_data_extracts()` with its defaults returns of a list of tables,
#' where each table is named after the validation step that has an extract
#' available.
#'
#' ```r
#' agent %>% get_data_extracts()
#' ```
#'
#' \preformatted{## $`1`
#' ## # A tibble: 6 × 6
#' ## a b c d e f
#' ## <int> <chr> <dbl> <dbl> <lgl> <chr>
#' ## 1 8 3-ldm-038 7 284. TRUE low
#' ## 2 7 1-knw-093 3 843. TRUE high
#' ## 3 3 5-bce-642 9 838. FALSE high
#' ## 4 3 5-bce-642 9 838. FALSE high
#' ## 5 4 2-dmx-010 7 834. TRUE low
#' ## 6 2 7-dmx-010 8 108. FALSE low
#' ##
#' ## $`2`
#' ## # A tibble: 4 × 6
#' ## a b c d e f
#' ## <int> <chr> <dbl> <dbl> <lgl> <chr>
#' ## 1 6 8-kdg-938 3 2343. TRUE high
#' ## 2 8 3-ldm-038 7 284. TRUE low
#' ## 3 7 1-knw-093 3 843. TRUE high
#' ## 4 4 5-boe-639 2 1036. FALSE low}
#'
#'
#'
#' We can get an extract for a specific step by specifying it in the `i`
#' argument. Let's get the failing rows from the first validation step (the
#' [col_vals_gt()] one).
#'
#' ```r
#' agent %>% get_data_extracts(i = 1)
#' ```
#'
#' \preformatted{## # A tibble: 6 × 6
#' ## a b c d e f
#' ## <int> <chr> <dbl> <dbl> <lgl> <chr>
#' ## 1 8 3-ldm-038 7 284. TRUE low
#' ## 2 7 1-knw-093 3 843. TRUE high
#' ## 3 3 5-bce-642 9 838. FALSE high
#' ## 4 3 5-bce-642 9 838. FALSE high
#' ## 5 4 2-dmx-010 7 834. TRUE low
#' ## 6 2 7-dmx-010 8 108. FALSE low}
#'
#'
#'
#' @family Post-interrogation
#' @section Function ID:
#' 8-2
#'
#' @export
get_data_extracts <- function(
agent,
i = NULL
) {
# Stop function if the agent hasn't
# yet performed an interrogation
if (!inherits(agent, "has_intel")) {
stop(
"The `agent` has not yet performed an interrogation.",
call. = FALSE
)
}
# Get the number of validation steps
validation_steps <- unique(agent$validation_set$i)
if (is.null(i)) {
return(agent$extracts)
}
# Stop function if the `i`th step does not exist in `agent`
if (!(i %in% seq(validation_steps))) {
stop("The provided step number does not exist.", call. = FALSE)
}
# Get the names of the extracts
extract_names <- names(agent$extracts)
# Stop function if the `i`th step does not have an extract available
if (!(as.character(i) %in% extract_names)) {
stop(
"The provided step number does not have an associated extract.",
call. = FALSE
)
}
# Get the data extract
agent$extracts[[as.character(i)]]
}