-
Notifications
You must be signed in to change notification settings - Fork 0
EMPIAR
EMPIAR (the Electron Microscopy Public Image Archive) is an online database provided by EMBL-EBI. It contains several hundred (as of October 1, 2020) electron microscopy data sets, which can be downloaded for use in microscopy-based research.
EMPIAR entries are organized by accession code, which are of the format EMPIAR-#####
. For now, the #####
portion starts with a 1
followed by zeros padding the data set number, e.g., EMPIAR-10017
. Each EMPIAR entry also has a DOI number of the format https://dx.doi.org/10.6019/EMPIAR-#####
, which should be used when citing data.
This document describes how to transfer these data to a local machine or computing cluster for analysis. Since these data sets can be between hundreds of gigabytes and several terabytes in size, it is recommended to use a machine or cluster node with a dedicated high-bandwidth network connection for transfers.
This method performs a direct download operation using wget
. It can therefore fail if the recipient machine experiences power loss or network timeout, or if the wget
process or its containing shell session is interrupted. To protect transfers against these last two possibilities, we suggest using screen
(manual page here) or tmux
(manual page here) if appropriate for your computing setup. See the linked manual pages or the Protecting Transfers with tmux
section for more information.
To download a complete EMPIAR entry in the current working directory, run the following, replacing #####
with the appropriate EMPIAR accession code. Note that the final /
is important when using wget
—otherwise, it will try to interpret #####
as a file instead of a directory.
wget ftp://ftp.ebi.ac.uk/empiar/world_availability/#####/
Note: the port is 22
, user is anonymous
(or Guest
on macOS), and password is left blank (although wget
should identify these by default).
For organizations that have computing clusters with a Globus endpoint, it is also possible to use the Globus service to transfer EMPIAR data much more rapidly, and without risk of interruption. These transfers can be managed from the Globus web interface, as follows, and do not rely on a shell session.
- Towards the bottom of an EMPIAR entry page (e.g., https://www.ebi.ac.uk/pdbe/emdb/empiar/entry/10017/) click the
Browse Globus
link - Enter the name of your institution and follow login prompts
- The File Manager should now be displayed; the Collection bar should say
Shared EMBL-EBI public endpoint
and the Path bar should say something like/gridftp/empiar/world_availability/#####
- If the File Manager is not already in double-pane view, click
Transfer or Sync to...
in the menu at right (has an icon of two arrows pointing in opposite directions) - Click the search bar above the right-hand pane and enter the name of your organization's Globus endpoint
- The right-hand pane should now populate with user-specific contents of the endpoint you specified; navigate to a suitable download location
- Select files and/or directories to transfer in the left hand pane, then click
Start >
to download them to the destination specified in the right-hand Path bar
Transfer progress is displayed in the Activity tab in the menu at left. The transfer will be scheduled by Globus, so the webpage can be closed after clicking Start >
.
The tmux
utility can create persistent shell sessions, and is useful in protecting long file transfers to a remote machine against accidental disconnection of your SSH session.
To create a new tmux
session, use
tmux new -s my-session-name
To list active tmux
sessions, use
tmux ls
To attach to an existing tmux
session, use
tmux attach -t my-session-name-or-id
To detach from the current active tmux
session (while keeping its processes running on the remote machine), use the following keyboard shortcut. Note that the tmux
"prefix" combination (here, Ctrl-b
) may be different on your system.
Ctrl-b + d
To kill a tmux
session and all its processes (so that it does not continue consuming system resources after you're finished with it), use
tmux kill-session -t my-session-name-or-id