-
Notifications
You must be signed in to change notification settings - Fork 872
/
slides.html
500 lines (372 loc) · 13.9 KB
/
slides.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
---
layout: tutorial_slides
logo: "GTN"
title: "Galaxy Interactive Environments"
draft: true
questions:
- "What are Galaxy Interactive Environments (GIEs)?"
- "How to enable GIEs in Galaxy?"
- "How to develop your own GIE?"
objectives:
- "Implement a Hello-World Galaxy Interactive Environment"
requirements:
-
title: "Docker basics"
type: "none"
time_estimation: "90m"
key_points:
- "Interactive Environments offer access to third-party applications within Galaxy"
- "Interactive Environments run in a docker images for sandboxing and easy dependency management"
contributors:
- shiltemann
- bgruening
- hexylena
subtopic: advanced
---
# Interactive Environments
---
### Why IEs?
- Embedded access to third-party application inside of Galaxy
- Interactively analyze data, access analysis products within Galaxy
![Ipython is shown in the center pane of galaxy](../../images/vis_IE_ipython0.png)
- Bring external analysis platform to the data instead of vice-versa
- no need to download/re-upload your data
---
### Who should use IEs?
- Everyone!
- Programming Environments for Bioinformaticians and Data Scientists
- Jupyter
- Rstudio
- Visualization IEs are great for life scientists
- IOBIO (bam/vcf visualizations)
- Phinch (metagenomics visualizations)
---
## Types of visualizations in Galaxy
GIE for visualization? Check that it is the right choice for your project
- **Trackster** - built-in genome browser
- **Display applications**
- UCSC Genome Browser
- IGV
- **Galaxy tools**
- JBrowse
- Krona
- **Visualization plugins**
- Charts
- Generic
- **Interactive Environments**
- Jupyter/Rstudio
- IOBIO (bam/vcf visualizations)
- Phinch (metagenomics visualizations)
---
## Which should I use?
![Flowchart. Only available on an external website? If yes use a display application. Does it need to be served (e.g. python), if yes use an interactive tool. Is it computationally intensive, then it needs to be a regular tool. Is it written in javascript? Then it shold be a generic plugin. If it passes all these tests it can be a charts plugin.](../../images/which_viz_flowchart.png)
---
### How to launch an IE?
- Can be bound to specific datatypes
- Available under the visualizations button on the dataset
.image-25[![visualisation button in galaxy is clicked on a dataset. Jupyter and Rstudio appear below charts.](../../images/vis_IE_button.png)]
- Or more general-purpose applications (Jupyter/Rstudio)
- IE launcher
.image-25[![A drop down menu from the masthead of galaxy shows interactive environments as well.](../../images/vis_IE_launcher_menu.png)]
---
### IE Launcher
- Choose between different available docker images
- Attach one or more datasets from history
.image-75[![An IE launcher is shown with select boxes for the GIE, image, and datasets. Launch is shown below.](../../images/vis_IE_launcher.png)]
---
### How does it work?
- Docker Containers are launched on-demand by users..
- ..and killed automatically when users stop using them
![schematic of request flow. A users requests an IE session with galaxy which connects to a docker host, launches the IE, and communicates this to a NodeJS proxy living next to galaxy. The user opens the connection through the proxy to access the container.](../../images/vis_IE_infra.png)
.footnote[ Admin Docs: https://docs.galaxyproject.org/en/master/admin/interactive_environments.html ]
---
### Jupyter
- General purpose/ multi-dataset
- Provides special functions to interact with the Galaxy history (get/put datasets)
- Ability to save and load notebooks
.image-75[![jupyter seen in the main panel of galaxy](../../images/vis_IE_ipython1.png)]
---
### Jupyter
![the above screenshot but they have scrolled to show embedded plots computed in jupyter in galaxy](../../images/vis_IE_ipython2.png)
---
### Jupyter
![Another screenshot of jupyter in galaxy, this time fetching datasets from galaxy and running samtools.](../../images/vis_IE_ipython3.png)
---
### Rstudio
- General purpose/ multi-dataset
- Provides special functions to interact with the Galaxy history
- Ability to save and load workbook and R history object
![default rstudio in galaxy, multiple panes are visible for computing, values, and files.](../../images/vis_IE_rstudio.png)
---
### IOBIO
- Visualizes single dataset
- Only available for datasets of specific formats
![IOBIO are dynamic javascript-y visualisations with lots of constantly updating graphs that are recalculated on the fly. They are shown in two scratchbooks in Galaxy](../../images/vis_IE_iobio.png)
---
### IOBIO
![a close up of iobio with several graphs and illegible text.](../../images/vis_IE_iobio2.png)
---
### Phinch
![Phinch is shown inside galaxy, another visually intensive graphing application for metagenomics data.](../../images/vis_IE_phinch.png)
---
### Admin
- Prerequisites: NodeJs (npm) and Docker; Galaxy user must be able to talk to the docker daemon
- Enable IEs in `galaxy.yml`
```bash
interactive_environment_plugins_directory = config/plugins/interactive_environments
```
- Install node proxy
```bash
$ cd $GALAXY/lib/galaxy/web/proxy/js/
$ npm install .
```
- Can configure GIEs to run on another host
.footnote[ Advanced configurations: https://docs.galaxyproject.org/en/master/admin/interactive_environments.html]
---
### Development
- Not hard to build!
- All the magic is in:
```bash
$GALAXY/config/plugins/interactive_environments/$ie_name/
```
| Component | File |
|------------------------------------|------------------------------|
| Visualization Plugin Configuration | ../config/${ie_name}.xml |
| IE specific Configuration | ../config/${ie_name}.ini |
| Mako Template | ../templates/${ie_name}.mako |
---
### Development
![schematic of GIE with a box on the left labelled ipython.mako being invoked by a launch ipython notebook call. This has boxes for running a docker container and then proxying authentication. These point to a docker container with a config.yaml containing the history id, api key and password, the ipython galaxy notebook, and the ipython webservice which loops while the connection is active. This proxies the connection and points to a cartoon of ipython in galaxy's center panel.](../../images/vis_IE_ipython_components.png)
---
### Hello World Example
- All files in this example available from
https://github.com/hexylena/hello-world-interactive-environment/
- Create a GIE that shows the directory listing of `import` folder (datasets loaded into GIE by user)
```bash
$ tree $GALAXY_ROOT/config/plugins/interactive_environments/helloworld/
config/plugins/interactive_environments/helloworld/
├── config
│ ├── helloworld.ini
│ ├── helloworld.ini.sample
│ └── helloworld.xml
├── static
│ └── js
│ └── helloworld.js
└── templates
└── helloworld.mako
```
---
Create GIE plugin XML file `config/helloworld.xml`
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE interactive_environment SYSTEM "../../interactive_environments.dtd">
<!-- This is the name which will show up in the User's Browser -->
<interactive_environment name="HelloWorld">
<data_sources>
<data_source>
<model_class>HistoryDatasetAssociation</model_class>
<!-- filter which types of datasets are appropriate for this GIE -->
<test type="isinstance" test_attr="datatype"
result_type="datatype">tabular.Tabular</test>
<test type="isinstance" test_attr="datatype"
result_type="datatype">data.Text</test>
<to_param param_attr="id">dataset_id</to_param>
</data_source>
</data_sources>
<params>
<param type="dataset" var_name_in_template="hda" required="true">dataset_id</param>
</params>
<!-- Be sure that your entrypoint name is correct! -->
<entry_point entry_point_type="mako">helloworld.mako</entry_point>
</interactive_environment>
```
---
Set up `.ini` file, which controls docker interaction `config/helloworld.ini.sample`
```ini
[main]
# Unused
[docker]
# Command to execute docker. For example `sudo docker` or `docker-lxc`.
#command = docker {docker_args}
# The docker image name that should be started.
image = hello-ie
# Additional arguments that are passed to the `docker run` command.
#command_inject = --sig-proxy=true -e DEBUG=false
# URL to access the Galaxy API with from the spawn Docker container, if empty
# this falls back to galaxy.yml's galaxy_infrastructure_url and finally to the
# Docker host of the spawned container if that is also not set.
#galaxy_url =
# The Docker hostname. It can be useful to run the Docker daemon on a different
# host than Galaxy.
#docker_hostname = localhost
[..]
```
---
- Create mako template `templates/helloworld.mako`
- Loads configuration from `ini` file
- launches docker container,
- builds a URL to the correct endpoint through Galaxy NodeJS proxy
- set environment variable `CUSTOM` to be passed to container
- attach dataset selected by user (`hda`)
```mako
<%namespace name="ie" file="ie.mako" />
<%
# Sets ID and sets up a lot of other variables
ie_request.load_deploy_config()
# Define a volume that will be mounted into the container.
# This is a useful way to provide access to large files in the container,
# if the user knows ahead of time that they will need it.
user_file = ie_request.volume(
hda.file_name, '/import/file.dat', how='ro')
# Launch the IE. This builds and runs the docker command in the background.
ie_request.launch(
volumes=[user_file],
env_override={
'custom': '42'
}
)
[..]
```
---
(continued)
```mako
[..]
# Only once the container is launched can we template our URLs. The ie_request
# doesn't have all of the information needed until the container is running.
url = ie_request.url_template('${PROXY_URL}/helloworld/')
%>
<html>
<head>
${ ie.load_default_js() }
</head>
<body>
<script type="text/javascript">
${ ie.default_javascript_variables() }
var url = '${ url }';
${ ie.plugin_require_config() }
requirejs(['interactive_environments', 'plugin/helloworld'], function(){
load_notebook(url);
});
</script>
<div id="main" width="100%" height="100%">
</div>
</body>
</html>
```
---
Lastly we must write the `load_notebook` function, `static/js/helloworld.js`
```javascript
function load_notebook(url){
$( document ).ready(function() {
test_ie_availability(url, function(){
append_notebook(url)
});
});
}
```
---
### Hello World Example
- The only thing missing now is the GIE (Docker) container itself
- Container typically consists of:
- Dockerfile
- Proxy configuration (e.g. nginx)
- Custom startup script/entrypoint
- Script to monitor traffic and kill unused containers
- The actual application for the users (here: simple python process which serves
directory contents of `/import` folder of container)
---
```dockerfile
FROM ubuntu:14.04
# These environment variables are passed from Galaxy to the container
# and help you enable connectivity to Galaxy from within the container.
# This means your user can import/export data from/to Galaxy.
ENV DEBIAN_FRONTEND=noninteractive \
API_KEY=none \
DEBUG=false \
PROXY_PREFIX=none \
GALAXY_URL=none \
GALAXY_WEB_PORT=10000 \
HISTORY_ID=none \
REMOTE_HOST=none
RUN apt-get -qq update && \
apt-get install --no-install-recommends -y \
wget procps nginx python python-pip net-tools nginx
# Our very important scripts. Make sure you've run `chmod +x startup.sh
# monitor_traffic.sh` outside of the container!
ADD ./startup.sh /startup.sh
ADD ./monitor_traffic.sh /monitor_traffic.sh
# /import will be the universal mount-point for IPython
# The Galaxy instance can copy in data that needs to be present to the
# container
RUN mkdir /import
[..]
```
---
(continued)
```dockerfile
[..]
# Nginx configuration
COPY ./proxy.conf /proxy.conf
VOLUME ["/import"]
WORKDIR /import/
# EXTREMELY IMPORTANT! You must expose a SINGLE port on your container.
EXPOSE 80
CMD /startup.sh
```
---
- Proxy configuration (nginx)
- reverse proxy our directory listing process running on port 8000
```nginx
server {
listen 80;
server_name localhost;
# Note the trailing slash used everywhere!
location PROXY_PREFIX/helloworld/ {
proxy_buffering off;
proxy_pass http://127.0.0.1:8000/;
proxy_redirect http://127.0.0.1:8000/ PROXY_PREFIX/helloworld/;
}
}
```
---
- Create the `startup.sh` file which starts our directory listing service
```bash
#!/bin/bash
# First, replace the PROXY_PREFIX value in /proxy.conf with the value from
# the environment variable.
sed -i "s|PROXY_PREFIX|${PROXY_PREFIX}|" /proxy.conf;
# Then copy into the default location for ubuntu+nginx
cp /proxy.conf /etc/nginx/sites-enabled/default;
# Here you would normally start whatever service you want to start. In our
# example we start a simple directory listing service on port 8000
cd /import/ && python -mSimpleHTTPServer &
# Launch traffic monitor which will automatically kill the container if
# traffic stops
/monitor_traffic.sh &
# And finally launch nginx in foreground mode. This will make debugging
# easier as logs will be available from `docker logs ...`
nginx -g 'daemon off;'
```
---
- Lastly, the script to monitor traffic and shut down if user is no longer connected, `monitor_traffic.sh`
```bash
#!/bin/bash
while true; do
sleep 60
if [ `netstat -t | grep -v CLOSE_WAIT | grep ':80' | wc -l` -lt 3 ]
then
pkill nginx
fi
done
```
---
### Hello World Example
- We are now ready to build our container, and try out our new GIE!
- If the container is hosted on a service like Dockerhub or quay.io, it will be automatically
fetched on first run.
```bash
$ cd hello-ie
$ docker build -t hello-ie .
```
![A directory listing is shown in the center panel.](../../images/vis_IE_helloworld.png)
.footnote[Try it yourself: https://github.com/hexylena/hello-world-interactive-environment]