This repository has been archived by the owner on Jul 8, 2020. It is now read-only.

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

Icgc-get is no longer supported. Please use the score-client directly to download files from the ICGC Data Portal. Instructions to download and use the score-client with various repositories can be found at https://docs.icgc.org/pcawg/data/. If you experience issues while downloading with the score-client, please contact us with details at dcc-support@icgc.org.

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

icgc-get

This is the icgc-get utility, a universal download client for accessing ICGC data residing in various data repositories.

Motivation

The data for ICGC resides in many data repositories around the world. Each repository has its own environment (public cloud, private cloud, on-premise file systems, etc.), access controls (DACO, OAuth, asymmetric keys, IP filtering), download clients and configuration mechanisms. A user therefore has much to learn and set up before actually acquiring the data. This is compounded by the fact that the number of environments is increasing over time and their characteristics change frequently. A coordinated mechanism to bootstrap and streamline this process is highly desirable. This is the problem the icgc-get tool helps to solve.

Installation

To install icgc-get on your local machine, first download the icgc-get package, then unzip the executable:

unzip icgc-get_linux_v0.3.13_x64.zip

Once the installation is complete, icgc-get can be invoked with the path to the icgc-get executable. To make the executable callable from anywhere, either move it to a folder that is already on your PATH or add the folder containing the executable to your PATH. You can list the directories on your PATH with echo $PATH on Mac and Linux or path on Windows. You can add a folder to your PATH with export PATH=$PATH:/folder on Mac and Linux or set PATH=%PATH%;/folder on Windows.
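
As a minimal sketch of the Mac and Linux case, the snippet below adds a folder to PATH only when it is not already listed, so it can be sourced repeatedly without growing PATH. The helper name add_to_path and the folder $HOME/icgc-get are assumptions made for illustration; they are not part of icgc-get.

```shell
# Append a directory to PATH only if it is not already listed.
# (add_to_path is a hypothetical helper, not something icgc-get provides.)
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;        # append the new directory
  esac
  export PATH
}

# Assumed install location; replace with the folder you unzipped into.
add_to_path "$HOME/icgc-get"

# Report where the shell now finds the executable, if anywhere.
command -v icgc-get || echo "icgc-get is not on PATH yet"
```

The change only lasts for the current shell session. To keep it, the same lines can be placed in your shell's startup file.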

icgc-get is capable of interfacing with the ICGC storage client, GeneTorrent, the GDC data transfer tool, the EGA download client and the Amazon Web Services command line interface.

If you do not have any of the download clients installed locally, icgc-get can run them through the icgc-get Docker container. Running the clients through the Docker container also avoids conflicts between the software requirements of the different download clients. To enable this functionality, first install Docker. On a Linux machine, make sure to create a Docker group so that Docker can be run without root permissions.
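
The Docker group step can be checked with a short script. This is a sketch that assumes the group is named docker, which is the name used in Docker's Linux post-installation instructions. The helper in_group is hypothetical. The script only prints the commands to run and does not modify the system.

```shell
# Return success if user $1 belongs to group $2.
# (in_group is a hypothetical helper written for this example.)
in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

if in_group "$(id -un)" docker; then
  echo "$(id -un) is already a member of the docker group."
else
  # Commands from Docker's Linux post-installation steps; they need root.
  echo "Run: sudo groupadd docker"
  echo "Run: sudo usermod -aG docker $(id -un)"
  echo "Then log out and back in so the new group membership takes effect."
fi
```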

This tool requires either at least one download client or Docker to be installed in order to function.

Quick start

After installing icgc-get, you will need to configure some of the essential usage parameters, such as your access credentials. Enter ./icgc-get configure and follow the prompts. To keep the default value for a parameter, press enter.

For further information, please view our documentation here.

Packaging from source

Requirements

Build PyInstaller from source

We depend on PyInstaller for building our binaries. To ensure that icgc-get behaves correctly on termination, we recommend building a PyInstaller release from source, because the prebuilt distributions from PyPI and elsewhere have historically been inconsistent. This includes building PyInstaller's C libraries, so make sure you have the correct build tools for your platform.

wget https://github.com/pyinstaller/pyinstaller/releases/download/v3.2/PyInstaller-3.2.tar.gz
tar zxvf PyInstaller-3.2.tar.gz
cd PyInstaller-3.2/bootloader
./waf all
cd ..
python setup.py install

Packaging

First run sudo pip install -r ./requirements.txt to ensure that all necessary packages have been installed. Then run:

pyinstaller --clean icgc-get-data.spec

The executable icgc-get will be in a folder named dist in your current directory. Compress it into a zip file named according to the convention icgc-get_v$VERSION_$OS_x64.zip, and deploy it to Artifactory under dcc-binaries.
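
When scripting the naming convention above, note that a shell would read $VERSION_ as a variable named VERSION_, so the variable names need braces. The following is a minimal sketch. The helper archive_name is hypothetical, and the values 0.3.13 and linux are example inputs rather than a statement about any particular release.

```shell
# Build the archive name from a version and an OS label.
# (archive_name is a hypothetical helper written for this example.)
archive_name() {
  version="$1"
  os="$2"
  # Braces are required: without them the shell would look up a
  # variable named "version_" instead of "version".
  echo "icgc-get_v${version}_${os}_x64.zip"
}

# Example inputs (assumed values).
VERSION="0.3.13"
OS="linux"
archive_name "$VERSION" "$OS"   # prints icgc-get_v0.3.13_linux_x64.zip
```

The resulting name can then be passed to whichever compression tool you use, for example zip -j "$(archive_name "$VERSION" "$OS")" dist/icgc-get if Info-ZIP is installed.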

Packaging inside Docker

As an easy way to build a Linux version of icgc-get, you can package it inside the Docker container described in the icgc-get Dockerfile. First rebuild the container to make sure all of the latest updates to the code are copied inside it. This command must be run from the root directory of the icgc-get project.

docker build -t icgc/icgc-get:$VERSION .

Then run the container in interactive mode. You will need to mount a directory as a data volume to transfer the packaged icgc-get out of the Docker container.

docker run -it -v ~/mnt:/icgc/mnt icgc/icgc-get:$VERSION

Once inside, navigate to /icgc/mnt, and run the following version of the pyinstaller call:

python /icgc/pyinstaller/pyinstaller-pyinstaller-1804636/pyinstaller.py --clean --onefile -n icgc-get --additional-hooks-dir /icgc/icgcget/bin /icgc/icgcget/icgcget/cli.py

Then exit the Docker container. Your executable will be present in the mounted directory. The Docker container does not natively have the ability to zip files, so compress the executable on the host.