---
title: "Running Table Exporter"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
```
> Note: This functionality is currently experimental. I'm actively working to get it running on UK Biobank RAP.
## UKB RAP RStudio only
The RStudio version on UKB RAP needs an updated dx-toolkit to use this functionality. You can run the code below to update it and install pandas.
```{bash eval=FALSE}
pip3 install dxpy==0.354.0
pip3 install pandas
```
```{r eval=FALSE}
install.packages(c("vctrs", "stringr", "remotes", "rlang"))
remotes::install_github("laderast/xvhelper")
reticulate::use_python("/usr/bin/python3")
```
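As an optional sanity check (a sketch, not required by `xvhelper`), we can confirm that reticulate can see the Python modules we just installed:
```{r eval=FALSE}
# Both should return TRUE after the pip installs above.
reticulate::py_module_available("dxpy")
reticulate::py_module_available("pandas")
```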
## Running Table Exporter
If we have a large number of fields (more than 15-20) to extract from the phenotype data, our call to `extract_data()` may fail. That is because this functionality depends on a shared resource called the Thrift Server, which enforces a hard limit of 2 minutes on query execution time.
If our query takes longer than that, we can launch Table Exporter, an app on the platform that will do the extraction for us. This vignette outlines how to launch Table Exporter from your R session, monitor it, and find the CSV file that it generates.
The first thing we need to do is find the dataset ID and assemble a vector of field names. Once we have these two items, we can use `launch_table_exporter()` to start the Table Exporter app.
```{r}
library(xvhelper)
ds_id <- find_dataset_id()
fields <- c("participant.eid", "participant.p31", "participant.p41202")
job_id <- launch_table_exporter(ds_id, fields)
job_id
```
When our Table Exporter job is running, we can check on its status using `check_job()`:
```{r}
check_job(job_id)
```
Note that while the job is running, `check_job()` returns `NULL`. When our job finishes successfully, it will return a `file-id` (see below).
The states our job can be in are:
- `idle`
- `runnable`
- `running`
- `failed`
- `done`
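Because `check_job()` returns `NULL` until the job is done, one way to wait for the result in a non-interactive script is a simple polling loop. This is only a sketch, not part of `xvhelper`; it assumes `check_job()` keeps returning `NULL` while the job is in the `idle`, `runnable`, or `running` states:
```{r eval=FALSE}
# Poll until check_job() returns something other than NULL.
# Assumption: NULL means the job is still idle, runnable, or running.
file_id <- NULL
while (is.null(file_id)) {
  Sys.sleep(60)                 # wait a minute between status checks
  file_id <- check_job(job_id)
}
file_id
```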
If we need to terminate our Table Exporter job, we can use `terminate_job()`:
```{r}
terminate_job(job_id)
```
## Successful Table Exporter Run
If our job finishes successfully or fails, we will receive an email notification. We can check on the current status of our job with `check_job()`. Here we're passing in a job ID for a successful run.
```{r eval=FALSE}
file_id <- check_job("job-GY4Zj180Yq3BJyFzg2ygGVX2")
file_id
```
We can download this to our JupyterLab/RStudio storage using:
```{r eval=FALSE}
system(glue::glue("dx download {file_id}"))
```
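Once the file is downloaded, we can read it into R. The file name below is hypothetical; substitute the name of the CSV that `dx download` actually fetched:
```{r eval=FALSE}
# Hypothetical file name - Table Exporter names the output based on the
# options the job was launched with.
pheno <- readr::read_csv("table_exporter_output.csv")
head(pheno)
```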
## Finding all jobs
We can see a list of all jobs and all their states by using `find_all_jobs()`:
```{r}
job_frame <- find_all_jobs()
job_frame
```
We can find the finished jobs by looking for `state == "done"`:
```{r}
job_frame |>
  dplyr::filter(state == "done")
```
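Similarly, we can look for jobs that did not finish by filtering for `state == "failed"`:
```{r eval=FALSE}
job_frame |>
  dplyr::filter(state == "failed")
```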