-
Notifications
You must be signed in to change notification settings - Fork 0
/
quickstart.Rmd
1048 lines (810 loc) · 35.1 KB
/
quickstart.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Introduction and Quickstart for Seven Bridges API R Client"
date: "`r Sys.Date()`"
output:
rmarkdown::html_document:
toc: true
toc_float: true
toc_depth: 4
number_sections: true
theme: "flatly"
highlight: "textmate"
css: "sevenbridges.css"
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Introduction and Quickstart for Seven Bridges API R Client}
%\VignetteEncoding{UTF-8}
---
<a name="top"></a>
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = FALSE
)
```
`Important note:` This is a work-in-progress project to update the
`sevenbridges2` package. Accordingly, this vignette will also change as new
features are implemented.
# Introduction
`sevenbridges2` is an R package that provides an interface for the Seven Bridges
public API. The supported platforms include the
[Seven Bridges Platform](https://igor.sbgenomics.com/),
[Cancer Genomics Cloud (CGC)](https://www.cancergenomicscloud.org),
[BioData Catalyst (BDC)](https://platform.sb.biodatacatalyst.nhlbi.nih.gov/) and [CAVATICA](https://cavatica.sbgenomics.com).
Learn more from our documentation on the
[Seven Bridges Platform](https://docs.sevenbridges.com/page/api),
[Cancer Genomics Cloud (CGC)](https://docs.cancergenomicscloud.org/v1.0/page/the-cgc-api),
[BioData Catalyst (BDC)](https://sb-biodatacatalyst.readme.io/reference/new-1)
and [CAVATICA](https://docs.cavatica.org/reference/the-api).
Unlike the current `sevenbridges` package that is built on top of
[Reference classes](http://adv-r.had.co.nz/R5.html),
the `sevenbridges2` package is based on more modern and lightweight
[R6](https://adv-r.hadley.nz/r6.html) classes.
However, the basic idea and way of constructing API requests is largely
preserved.
## R Client for the Seven Bridges API
In order to use the `sevenbridges2` package users must authenticate themselves
first by creating an Auth object and providing necessary credentials.
You can read more about the authentication types in our next chapters.
The `sevenbridges2` package only supports v2+ versions of the API, since
versions prior to v2 are not compatible with the Common Workflow Language (CWL).
This package provides a simple interface for accessing and trying out various
methods.
## Installation
The `sevenbridges2` package is available on CRAN and Seven Bridges Github
repository.
To install it from `CRAN`, use simply:
```{r}
# Install package from CRAN
install.packages("sevenbridges2")
```
To install the development version from the `develop` branch on our Github, use
the `remotes` package:
```{r}
# Install package from github
remotes::install_github(
"sbg/sevenbridges2",
build_vignettes = TRUE, dependencies = TRUE
)
```
If you have trouble with `pandoc` and do not want to install it, set
`build_vignettes = FALSE` to avoid the vignettes build.
## API General Information
There are two ways of constructing API calls. For instance, you can use
low-level API calls which use arguments like `path`, `query`, and `body`.
These are documented in the API reference libraries for the [Seven Bridges Platform](https://docs.sevenbridges.com/reference#list-all-api-paths) and the [CGC](https://docs.cancergenomicscloud.org/docs/new-1). An example of a low-level
request to "list all projects" is shown below.
In this request, you can also pass `query` and `body` as a list.
```{r}
# Load the package
library("sevenbridges2")
# Authenticate
a <- Auth$new(token = "<your_token>", platform = "aws-us")
# List all projects with raw api() function
a$api(path = "projects", method = "GET")
```
***(Advanced user option)*** The second way of constructing an API request is
to directly use the [httr2](https://httr2.r-lib.org/) package to make your
API calls.
The `sevenbridges2` package is organized by main resources from the
Seven Bridges API reference.
There we have groups of endpoints to work with projects, files, apps, tasks,
invoices, volumes, etc.
For each group of resources, there is a set of operations such as `query()`,
`get()` and `delete()` which are common, as well as other custom operations.
Before we start, keep in mind the following:
__`offset` and `limit`__
Almost every API call accepts two arguments named `offset` and `limit`.
- Offset defines where the retrieved items start.
- Limit defines the number of items you want to get.
By default, `offset` is set to `0` and `limit` is set to `50`. As such, your
API request returns the __first 50 items__ when you list items or search for
items by name. To search and list all items, use `complete = TRUE`
if you are using the core `api()` function in your API request, or the `all()`
operation within the `Collection` object you've received as the result.
__`Collection`__
Every API call that returns a list of items (usually the output from `query()` operations), operations), like fetching projects, files, apps etc, wraps the
results into a general `Collection` class object containing the `items` field
from which users may access the items returned.
Additional options that the `Collection` class offers are to navigate between
pages of results, for example, to load next or previous page of results by
calling `next_page()` and `prev_page()` methods.
Moreover, users can fetch all results using `Collection`'s `all()` method which
is a shortcut to send multiple API calls for each next page and collect all
results. Keep in mind the `limit` used, as well as the API rate limit.
```{r}
# Create a collection of files
public_files <- a$files$query(project = "admin/sbg-public-data")
# Load next 50 results
public_files$next_page()
# Load previous 50 results
public_files$prev_page()
# Load all results
public_files$all()
```
Lastly, printing `Collection` objects will print the first 10 items (if there
are more than 10 items in the results) by default, but this can be changed with
the `n` parameter in its `print()` function:
```{r}
# Create a collection of files
public_files <- a$files$query(project = "admin/sbg-public-data")
# Default print
public_files
# Print 20 items
public_files$print(n = 20)
```
__Search by ID__
When searching by ID (usually it's the resource's `get()` operation),
your request will return your exact resource as it is unique. Therefore,
you do not have to set `offset` and `limit` manually.
It is good practice to find your resources by their ID and pass this ID as an
input to your task. You can find a resource's ID in the final part of the URL
in the visual interface or via API requests to list resources or get a
resource's details.
__Search by name__
Search by name as criteria in the `query()` operations of Resources, returns
all exact or partial matches depending on the resource.
For example, to list all public files, use the `admin/sbg-public-data` project
query parameter, while if you want to find an exact file by name,
set its `name` parameter to the exact value (partial search by name is not
possible for files).
```{r}
# Search all public files
public_files <- a$files$query(project = "admin/sbg-public-data")
# Search files by name
file_1000G_omni <- a$files$query(
project = "admin/sbg-public-data",
name = "1000G_omni2.5.b37.vcf"
)
```
On the other hand, partial search by name works for Projects and Apps resources.
You can set the corresponding `name` or `query_terms` parameters for this use
case.
In order to query public apps, set the `visibility` parameter to 'public'.
```{r}
# Search all public apps containing the STAR term
public_star_apps <- a$apps$query(
visibility = "public",
query_terms = list("STAR")
)
# Search all projects that contain "demo" in the name
demo_projs <- a$projects$query(name = "demo")
```
<div align="right"><a href="#top">top</a></div>
# Quickstart
## Create `Auth` Object
Before you can access your account via the API, you have to provide your
credentials. You can obtain your credentials in the form of an ["authentication token"](https://docs.sevenbridges.com/v1.0/docs/get-your-authentication-token)
from the **Developer Tab** under **Account Settings** in the visual interface.
Once you've obtained this, create an `Auth` object, so it remembers your
authentication token and the path for the API.
All subsequent requests will use these two pieces of information.
Let's load the package first:
```{r}
# Load package
library("sevenbridges2")
```
You have three different ways to provide your token.
Choose from one of the methods below:
1. [Direct authentication.](#method1) Here you should provide your developer
token and a base URL for the platform of interest (alternatively, you can
provide the name of the platform - these are the available options `cgc`,
`aws-us`, `aws-eu`, `ali-cn`, `cavatica`, `f4c` - the default platform is
`aws-us`) as function call arguments to `Auth$new()`. This will create the
platform authentication object and temporarily set up your token and platform
base URL as environment variables `SB_AUTH_TOKEN` and `SB_API_ENDPOINT`.
This way, your token will not be directly stored in the Auth object, but you
will still be able to access it by calling the `get_token()` method.
Keep in mind that these environment variables are session-specific and are
deleted when the session ends.
2. [Authentication via system environment variables.](#method2) By default
this will read the credential information from two existing system environment
variables: `SB_API_ENDPOINT` and `SB_AUTH_TOKEN`. Of course, assuming that you
have previously set these environment variables. Alternatively, you can specify
the names of the system environment variables you want to be loaded using the
`sysenv_token` and `sysenv_url` arguments.
3. [Authentication via the user configuration file.](#method3) This file, by
default `$HOME/.sevenbridges/credentials`, provides an organized way to collect
and manage all your API authentication information for Seven Bridges platforms.
If you need to be logged into multiple accounts at the same time (which can
also be for different platforms), please use either the second or the third
method.
<a name="method1"/>**Method 1: Direct authentication**
This is the most common method to construct the `Auth` object. For example:
```{r}
# Authenticate with direct method
a <- Auth$new(platform = "aws-us", token = "<your-token>")
```
<a name="method2"/>**Method 2: Environment variables**
To set the two environment variables in your system, you could use
the function `sbg_set_env()`. For example:
```{r}
# Set environment variables
sevenbridges2:::sbg_set_env(
url = "https://api.sbgenomics.com/v2/",
token = "<your_token>"
)
```
Note that these environment variables are session-specific.
Create an `Auth` object:
```{r}
# Authenticate using environment variables
a <- Auth$new(from = "env")
```
<a name="method3"/>**Method 3: User configuration file**
Assume we have already created the configuration file named
`credentials` under the directory `$HOME/.sevenbridges/`:
```
[aws-us-<username>]
api_endpoint = https://api.sbgenomics.com/v2
auth_token = token_for_this_user
# another user on the same platform
[aws-us-rosalind-franklin]
api_endpoint = https://api.sbgenomics.com/v2
auth_token = token_for_this_user
[cgc]
api_endpoint = https://cgc-api.sbgenomics.com/v2
auth_token = token_for_this_user
[bdc]
api_endpoint = https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2/
auth_token = token_for_this_user
```
To load the user profile `aws-us-<username>` from this configuration
file, simply use:
```{r}
# Load aws-us-<username> profile for authentication
a <- Auth$new(
from = "file",
profile_name = "aws-us-<username>"
)
```
If `profile_name` is not specified, we will try to load the profile
named `[default]`:
```{r}
# Load default profile
a <- Auth$new(from = "file")
```
The option based on the use of a configuration file also enables simultaneous
authentication from multiple accounts.
Assuming that we have a configuration file like the one listed above,
and that we want to create authentication objects for two profiles
(`default` and `aws-us-<username>`), we can achieve this in
the following way:
```{r}
# Create Auth object with 'default' account
a <- Auth$new(from = "file", profile_name = "default")
# Create Auth object with 'aws-us-<username>' account
b <- Auth$new(from = "file", profile_name = "aws-us-<username>")
```
***Note:*** API paths (base URLs) differ for each Seven Bridges environment.
Be sure to provide the correct path for the environment you are using.
API paths for some of the environments are:
+-------------------------------------------+---------------------------------------------------+---------------+
| Platform Name | API Base URL | Short Name |
+===========================================+===================================================+===============+
| Seven Bridges Platform (US) | `https://api.sbgenomics.com/v2` | `"aws-us"` |
+-------------------------------------------+---------------------------------------------------+---------------+
| Seven Bridges Platform (EU) | `https://eu-api.sbgenomics.com/v2` | `"aws-eu"` |
+-------------------------------------------+---------------------------------------------------+---------------+
| Seven Bridges Platform (China) | `https://api.sevenbridges.cn/v2` | `"ali-cn"` |
+-------------------------------------------+---------------------------------------------------+---------------+
| Cancer Genomics Cloud (CGC) | `https://cgc-api.sbgenomics.com/v2` | `"cgc"` |
+-------------------------------------------+---------------------------------------------------+---------------+
| Cavatica | `https://cavatica-api.sbgenomics.com/v2` | `"cavatica"` |
+-------------------------------------------+---------------------------------------------------+---------------+
| BioData Catalyst Powered by Seven Bridges | `https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2` | `"f4c"` |
+-------------------------------------------+---------------------------------------------------+---------------+
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Authentication_and_Billing", package = "sevenbridges2")` for more technical details about all available authentication methods.
## Get User Information
<a name="youruser"/>**Get your own information**
This call returns information about your account.
```{r}
# Get currently authenticated user info
a$user()
```
```
── User ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• country: United States
• affiliation: SBG
• last_name: Test
• first_name: User
• email: <user>@sbgenomics.com
• username: <username>
• href: https://api.sbgenomics.com/v2/users/<user>
```
<div align="right"><a href="#top">top</a></div>
**Get information about a user**
This call returns information about the specified user. Note that currently
you can view only your own user information, so this call is equivalent to
the [call to get information about your account](#youruser).
```{r}
# Get user info
a$user(username = "<username>")
```
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Authentication_and_Billing", package = "sevenbridges2")` for more technical details about
getting user information.
## Rate Limit
This call returns information about your current rate limit. This is the
number of API calls you can make in five minutes. This call also returns
information about your current instance limit.
```{r}
# Get rate limit info
a$rate_limit()
```
```
── Rate Limit ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• rate
• limit: 1000
• remaining: 1000
• reset: 2022-12-26 11:31:01 CET
• instance
• limit: 25
• remaining: 25
```
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Authentication_and_Billing", package = "sevenbridges2")` for more technical details about
rate limit information.
## Show Billing Information
Each project must have a Billing Group associated with it. This Billing
Group pays for the storage and computation in the project.
For example, your first project(s) were created with the free funds from
the Pilot Funds Billing Group assigned to each user at sign-up.
To get information about your billing groups:
```{r}
# Check your billing info
a$billing_groups$query()
```
This call lists all your billing groups, including groups that are pending
or have been disabled.
To get information about your invoices:
```{r}
# Check your invoices
a$invoices$query()
```
The call returns information about all your available invoices, unless you use
the `billing_group_id` query parameter to specify the ID of a particular
billing group, in which case it will return the invoice incurred by that
billing group only.
To get detailed information for a specific billing group, please use the
billing_group method with the billing group ID. The information returned
includes the billing group owner, the total balance, and the status of the
billing group (pending, disabled,...).
```{r}
# Get a single billing group
a$billing_groups$get(id = "<billing_group_id>")
```
```
── Billing group info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• disabled: FALSE
• pending: FALSE
• type: regular
• name: My billing group
• owner: <bg_owner's_username>
• id: <billing_group_id>
• href: https://api.sbgenomics.com/v2/billing/groups/<billing_group_id>
• balance
• currency: USD
• amount: 221
```
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Authentication_and_Billing", package = "sevenbridges2")` for more technical details about
billing informations.
## List and query projects
Projects are the core building blocks of the platform. Each project corresponds
to a distinct scientific investigation, serving as a container for its data,
analysis tools, results, and collaborators.
In order to query and explore all projects, use the `projects`
resource path and the `query()` method. One can also filter the projects by
several criteria, like project's name and tags.
The search by name is partial and case-insensitive.
```{r}
# List first 5 projects
my_projects <- a$projects$query(limit = 5)
my_projects
# Load next page of results
my_projects$next_page()
# Return all projects that contain the term "demo"
demo_projects <- a$projects$query(name = "demo")
# Return all projects tagged with "demo"
tagged_projects <- a$projects$query(tags = list("demo"))
```
Note that the output is the `Collection` object and the results (list of
`Project` objects) can be found within the `items` field.
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about
projects.
## Create a new project
Create a new project called "API testing" with the billing group `id` obtained
above.
```{r}
# List all available billing groups for currently logged in user
a$billing_groups$query()
# Set the billing group for the new project
bid <- "<billing_group_id>"
# Create a new project
p <- a$projects$create(
name = "API testing", billing_group = bid,
description = "This project has been created using the sevenbridges2 R API
library."
)
```
```
── Project ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• category: PRIVATE
• root_folder: <root_folder_id>
• type: v2
• description: This project has been created using the sevenbridges2 R API library.
• billing_group: <billing_group_id>
• name: API testing
• id: <your_username_or_division>/api-testing
• href: https://api.sbgenomics.com/v2/projects/<your_username_or_division>/api-testing
• settings
• locked: FALSE
• controlled: FALSE
• location: aws:us-east-1
• use_interruptible_instances: TRUE
• use_memoization: FALSE
• intermediate_files: list(duration = 24, retention = "LIMITED")
• allow_network_access: TRUE
• use_elastic_disk: FALSE
```
<div align="right"><a href="#top">top</a></div>
The **new project** is created on the platform. Notice also that the variable
`p` is an R6 object with fields that contain information about the platform
project. The facility also has several methods that allow you to perform
basic platform operations on the project.
Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about
projects.
## Get details of a specified project
Use the `get()` method and provide the full ID of the project you would like to
fetch.
```{r}
# Get a single project by ID
a$projects$get(id = "<your_username_or_division>/api-testing")
```
```
── Project ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• category: PRIVATE
• root_folder: <root_folder_id>
• type: v2
• description: This project has been created using the sevenbridges2 R API library.
• billing_group: <billing_group_id>
• name: API testing
• id: <your_username_or_division>/api-testing
• href: https://api.sbgenomics.com/v2/projects/<your_username_or_division>/api-testing
• settings
• locked: FALSE
• controlled: FALSE
• location: aws:us-east-1
• use_interruptible_instances: TRUE
• use_memoization: FALSE
• intermediate_files: list(duration = 24, retention = "LIMITED")
• allow_network_access: TRUE
• use_elastic_disk: FALSE
```
Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about
apps.
## Copy app into the project
Seven Bridges maintains workflows and tools available to all of its users in
the Public Apps repository.
To find out more about public apps, you can do the following:
- Browse them online. Check out the
[tutorial](https://docs.sevenbridges.com/docs/search-for-an-app) in the
"Find apps" section.
- You can use the `sevenbridges2` package to find it, as shown below.
```{r}
# Search by name matching, with limit 10
public_apps <- a$apps$query(
visibility = "public",
limit = 10,
query_terms = list("STAR")
)
# Search by ID
star_app <- a$apps$get(
id = "admin/sbg-public-data/rna-seq-alignment-star/0"
)
```
Now, copy the App your `project` with a new `name`, following this logic.
```{r}
# Copy app into the project
a$apps$copy(
app = star_app,
project = "<username_or_division>/api-testing",
name = "New copy of STAR"
)
# Check if it is copied
p <- a$projects$get(id = "<username_or_division>/api-testing")
# List the apps you have in your project
p$list_apps()
```
The short name is changed to `newcopyofstar`.
```
== App ==
id : <username_or_division>/api-testing/newcopyofstar/0
name : RNA-seq Alignment - STAR
project : <username_or_division>/api-testing
revision : 0
```
Alternatively, you can copy it from the `App` object.
```{r}
# Get public app RNA Sequencing alignment - STAR
star_app <- a$apps$get(
id = "admin/sbg-public-data/rna-seq-alignment-star/0"
)
# Copy it into a project
star_app$copy(
project = "<username_or_division>/api-testing",
name = "Copy of STAR"
)
```
<div align="right"><a href="#top">top</a></div>
Next, we would like to run a task with this app. Let's see what is required.
Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about
tasks.
## Execute a new task
### Find your app inputs
Once you have copied the public app
`admin/sbg-public-data/rna-seq-alignment-star/0` into your project,
`<username>/api-testing`, the app `id` in your current project is
`<username>/api-testing/newcopyofstar`.
Alternatively, you can use another app you already have in your project for
this Quickstart.
To draft a new task, you need to specify the following:
- The name of the task
- An optional description
- The App object or `id` of the workflow you are executing
- The inputs for your workflow.
You can always check the App details on the visual interface for task input
requirements. However, there is also a function on the App objects to get
basic information about app's inputs and outputs.
To find the required inputs with R, you need to get an `App` object first.
Let's check which inputs this app requires by calling the `input_matrix()`
function and bring them into our project.
```{r}
# Fetch copied app
copied_star_app <- a$apps$get(
id = "<username_or_division>/api-testing/newcopyofstar/0"
)
# Preview its inputs
copied_star_app$input_matrix()
```
Locate the IDs of the required inputs. Note that task inputs need to match
the expected data type and name. In the above example, we see two required
fields:
- **fastq:** This input takes a file array in the following formats: FASTA,
FASTQ, FA, FQ etc.
- **genomeFastaFiles:** This is a single reference file in the FASTA, FA, FNA
or TAR format.
We also want to provide a gene feature file:
- **sjdbGTFfile:** A file array that can be in the GTF, GFF, GFF2, or GFF3
format.
You can find a list of possible input types below:
- **number, character or integer:** you can directly pass these to the input
parameter as they are.
- **enum type:** Pass this value to the input parameter.
- **file:** This input is a file. However, while some inputs accept only a
single file (`File`), other inputs take more than one file (`File` arrays,
`FilesList`, or '`File...`' ). This input requires you to pass a single `File`
object (for a single file input) or list of `File` objects (for inputs which
accept more than one file).
You can search for your file by `id` or by `name`, as shown in the example
below.
<div align="right"><a href="#top">top</a></div>
### Prepare your input files
```{r}
# Get reads (fastq) files and and copy them into a project
reads_1 <- a$files$get(id = "641c48c425ed1842bd0bf799") # file id
reads_1$copy_to(project = p)
reads_2 <- a$files$get(id = "641c48c425ed1842bd0bf835") # file id
reads_2$copy_to(project = p)
# Get a single file reference file and copy into a project
fasta_in <- a$files$get(id = "641c48c525ed1842bd0bf86a") # file id
fasta_in$copy_to(project = p)
# Get gtf file and copy into a project
gtf_in <- a$files$get(id = "641c48c425ed1842bd0bf825") # file id
gtf_in$copy_to(project = p)
# Get copied files
input_files <- p$list_files()$items
```
<div align="right"><a href="#top">top</a></div>
### Create a new draft task
```{r}
# Add new tasks
taskName <- paste0("STAR-alignment ", date())
tsk <- p$create_task(
name = taskName,
description = "STAR test",
app = copied_star_app,
inputs = list(
"fastq" = c(input_files[[1]], input_files[[2]]),
"genomeFastaFiles" = input_files[[3]],
"sjdbGTFfile" = list(input_files[[4]])
)
)
# Preview task
tsk$print()
```
### Preview your app's expected outputs
Similarly as with inputs, you can also preview the structure of the expected
outputs of the task or workflow. You can get details about the output's name,
description and type using `output_matrix()`.
This function can be called from the App object.
```{r}
# Get app's outputs details
copied_star_app$output_matrix()
```
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about
tasks.
## Run a Task
Now, we are ready to run our task.
```{r}
# Run your task
tsk$run()
```
Before you run your task, you can adjust your draft task if you have any final
modifications.
```{r}
# Update task
tsk$update(description = "New RNA SEQ Alignment - STAR task")
```
After you run a task, you can track its status by refreshing the object with
`reload()` function.
```{r}
# Reload task
tsk$reload()
tsk$status
```
You can also abort the task execution if needed:
```{r}
# Abort your task
tsk$abort()
```
If you want to rerun your task without any modifications, you can use
`rerun()` function which will clone the current task for you and start the
execution immediately.
```{r}
# Rerun your task
tsk$rerun()
```
On the other side, if you want to update your task first and then re-run it,
you should clone the current task, update it and then run it, as demonstrated
below:
```{r}
# First clone existing task
cloned_task <- tsk$clone_task()
# Then, update GTF input file in the cloned task
cloned_task$update(inputs = list(sjdbGTFfile = "<some new file>"))
cloned_task$run()
```
Alternatively, you can delete the draft task if you no longer wish to run it.
```{r}
# # not run
# tsk$delete()
```
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about
running tasks.
## Run tasks using spot instances
Running tasks with [spot instances](https://docs.sevenbridges.com/docs/about-spot-instances)
could potentially [reduce a considerable amount of computational cost](https://www.sevenbridges.com/spot-instances-cost-reduction/).
This option can be controlled on the project level or the task level on
Seven Bridges platforms. Our package follows the same [logic](https://docs.sevenbridges.com/docs/use-spot-instances)
as our platform's web interface (the current default setting for spot instances
is **on**).
For example, when we create a project using Projects resource's method `create()`,
we can set `use_interruptible = FALSE` to use on-demand instances
(non-interruptible but more expensive) instead of the spot instances
(interruptible but cheaper):
```{r}
# Create project with disabled spot instances
p <- a$projects$create(
name = "spot-disabled-project", bid, description = "spot disabled project",
use_interruptible = FALSE
)
```
Then all the new tasks created under this project will use on-demand instances
to run **by default**, unless an argument `use_interruptible_instances`
is specifically set to `TRUE` when drafting the new task using Tasks resource
method `create()`.
For example, if `p` is the above spot disabled project, to draft
a task that will use spot instances to run:
```{r}
# Create task and set usage of interruptible instances to TRUE
tsk <- p$create_task(
name = paste0("spot enabled task in a spot disabled project"),
description = "spot enabled task",
app = copied_star_app,
inputs = list(
"fastq" = c(input_files[[1]], input_files[[2]]),
"genomeFastaFiles" = input_files[[3]],
"sjdbGTFfile" = list(input_files[[4]])
),
use_interruptible_instances = TRUE
)
```
Conversely, you can have a spot instance enabled project,
but draft and run specific tasks using on-demand instances,
by setting `use_interruptible_instances = FALSE` in `create_task()` explicitly.
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about
running tasks using spot instances.
## Execution hints per task run
During workflow development and benchmarking, sometimes we need to view and
make adjustments to the computational resources needed for a task to run more
efficiently. Also, if a task fails due to resource deficiency, we often want to
define a larger instance for the task re-run without editing the app itself.
This is particularly important in cases where there is not enough disk space.
The Seven Bridges API allows setting specific task execution parameters by
using `execution_settings`. It includes the instance type (`instance_type`)
and the maximum number of parallel instances (`max_parallel_instances`):
```{r}
# Create task with setting instance type and number of parallel instances
tsk <- p$create_task(
...,
execution_settings = list(
instance_type = "c4.2xlarge;ebs-gp2;2000",
max_parallel_instances = 2
)
)
```
For details about `execution_settings`, please check [create a new draft task](https://docs.sevenbridges.com/reference/create-a-new-task).
<div align="right"><a href="#top">top</a></div>
Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about
execution hints.
## Draft a batch task
Now let's do a batch with 4 files in 2 groups, which is batched by metadata
`sample_id`. We will assume each file has this metadata field entered.
Since these files can be evenly grouped into 2, we will have a single parent
batch task with 2 child tasks.
```{r}
# Add two more fastq files that will be used in our task inputs
# and copy them into our API testing project
reads_3 <- a$files$get(id = "641c48c425ed1842bd0bf7b6") # file id
reads_3$copy_to(project = p)
reads_4 <- a$files$get(id = "641c48c425ed1842bd0bf7a5") # file id
reads_4$copy_to(project = p)
# Get all project files