A Python 3.9+ library for the Cirro platform.
You can install Cirro using pip:
pip install cirro
or you can install the main branch of the repo by running:
pip install git+https://github.com/CirroBio/Cirro-client.git
To enable pipeline configuration you need to install extras using:
pip install cirro[nextflow] # just nextflow pipeline configuration support
pip install cirro[wdl] # just wdl pipeline configuration support
pip install cirro[nextflow,wdl] # both nextflow and wdl pipeline configuration support
NOTE: Configuring Nextflow pipelines also requires a local installation of Nextflow.
Upon first use, the Cirro client will ask which Cirro instance to use and whether you would like to save your login information. It will then give you a link to authenticate through the web browser.
You can change your Cirro instance by running `cirro configure` and selecting the desired instance.
If you need to change your credentials after this point, and you've opted to save your login, please see the clearing saved login section.
Usage: cirro download [OPTIONS]
Download dataset files
Options:
--project TEXT Name or ID of the project
--dataset TEXT ID of the dataset
--file TEXT Relative path of the file(s) to download (optional, can be used multiple times)
--data-directory TEXT Directory to store the files
-i, --interactive Gather arguments interactively
--help Show this message and exit.
$ cirro download --project "Test Project 1" --dataset "test" --data-directory "~/download"
Usage: cirro upload [OPTIONS]
Upload and create a dataset
Options:
--name TEXT Name of the dataset
--description TEXT Description of the dataset (optional)
--project TEXT Name or ID of the project
--data-type, --process TEXT Name or ID of the data type (--process is deprecated)
--data-directory TEXT Directory you wish to upload
--file TEXT Relative path of the file(s) to upload (optional, can be used multiple times)
-i, --interactive Gather arguments interactively
--include-hidden Include hidden files in the upload (e.g., files starting with .)
--help Show this message and exit.
$ cirro upload --project "Test Project 1" --name "test" --file "sample1.fastq.gz" --file "sample2.fastq.gz" --data-directory "~/data" --data-type "Paired DNAseq (FASTQ)"
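When the upload is driven from a script, the same command can be assembled as an argument list so that each `--file` value is passed without shell quoting issues. The following is a minimal sketch, not part of the Cirro library: `build_upload_command` is a hypothetical helper, and it uses only the flags listed in the help text above. Because no shell is involved, the sketch expands `~` itself.

```python
import os
import shlex


def build_upload_command(project, name, data_directory, data_type, files=()):
    """Assemble an argument list for `cirro upload` (hypothetical helper)."""
    command = [
        "cirro", "upload",
        "--project", project,
        "--name", name,
        # No shell will expand "~" for us, so expand it here.
        "--data-directory", os.path.expanduser(data_directory),
        "--data-type", data_type,
    ]
    # --file can be used multiple times, once per relative path.
    for relative_path in files:
        command += ["--file", relative_path]
    return command


command = build_upload_command(
    project="Test Project 1",
    name="test",
    data_directory="~/data",
    data_type="Paired DNAseq (FASTQ)",
    files=["sample1.fastq.gz", "sample2.fastq.gz"],
)
# Print a shell-safe rendering of the command for review.
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids having to escape the spaces and parentheses in values such as the data type name.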
Usage: cirro upload-reference [OPTIONS]
Upload a reference to a project
Options:
--name TEXT Name of the reference
--reference-type TEXT Type of the reference (e.g., Reference Genome (FASTA))
--project TEXT Name or ID of the project
--reference-file TEXT Location of reference file(s) to upload (can be used multiple times)
-i, --interactive Gather arguments interactively
--help Show this message and exit.
Usage: cirro list-datasets [OPTIONS]
List available datasets
Options:
--project TEXT ID of the project
-i, --interactive Gather arguments interactively
--help Show this message and exit.
Usage: cirro create-pipeline-config [OPTIONS]
Create pipeline configuration files
Options:
-p, --pipeline-dir DIRECTORY Directory containing the pipeline definition
files (e.g., WDL or Nextflow) [default: .]
-e, --entrypoint TEXT Entrypoint WDL file (optional, if not
specified, the first WDL file found will be
used). Ignored for Nextflow pipelines.
-o, --output-dir TEXT Directory to store the generated configuration
files [default: .cirro]
-i, --interactive Gather arguments interactively
--help Show this message and exit.
It is highly recommended that:
- Nextflow pipelines utilize a `nextflow_schema.json` file. (If your pipeline originates from NF-Core, this should already be the case.)
- WDL pipelines are defined in WDL v1.0 or higher and explicitly define an `input` section in the root-level workflow.
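These recommendations can be checked roughly before running `cirro create-pipeline-config`. The sketch below is an illustration only and is not part of the Cirro client: `check_pipeline_dir` is a hypothetical helper, it assumes the schema and workflow files sit at the top level of the pipeline directory, and it relies on crude text matching (a `version` statement and an `input {` block) rather than real WDL parsing.

```python
import re
from pathlib import Path


def check_pipeline_dir(pipeline_dir):
    """Return a list of warnings for a pipeline directory (hypothetical helper)."""
    root = Path(pipeline_dir)
    warnings = []

    # Nextflow: look for a schema file next to any top-level .nf file.
    if any(root.glob("*.nf")) and not (root / "nextflow_schema.json").is_file():
        warnings.append("Nextflow pipeline found without nextflow_schema.json")

    # WDL: files written for v1.0 or higher begin with a `version` statement.
    for wdl_file in sorted(root.glob("*.wdl")):
        text = wdl_file.read_text()
        if not re.search(r"^\s*version\s+\S+", text, flags=re.MULTILINE):
            warnings.append(f"{wdl_file.name}: no 'version' statement found")
        # Heuristic only: this does not confirm the block is in the root workflow.
        if not re.search(r"\binput\s*\{", text):
            warnings.append(f"{wdl_file.name}: no explicit input section found")

    return warnings
```

An empty list means none of the heuristics fired; it does not guarantee that configuration generation will succeed.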
When running a command, you can specify the `--interactive` flag to gather the command arguments interactively.
Example:
$ cirro upload --interactive
? What project is this dataset associated with? Test project
? Enter the full path of the data directory /shared/biodata/test
? Please confirm that you wish to upload 20 files (0.630 GB) Yes
? What type of files? Illumina Sequencing Run
? What is the name of this dataset? test
? Enter a description of the dataset (optional)
The following Jupyter notebooks contain examples on these topics:
Jupyter Notebook | Topic |
---|---|
Introduction | Installing and authenticating |
Uploading a dataset | Uploading data |
Downloading a dataset | Downloading data |
Interacting with a dataset | Calling data and reading into tables |
Analyzing a dataset | Running analysis pipelines |
Using references | Managing reference data |
Advanced usage | Advanced operations |
Jupyter Notebook | Topic |
---|---|
Downloading a dataset in R | Reading data with R |
View the API documentation for this library here.
Name | Description | Default |
---|---|---|
CIRRO_HOME | Local configuration directory | ~/.cirro |
CIRRO_BASE_URL | Base URL of the data portal | |
The `cirro configure` command creates a file in `CIRRO_HOME` called `config.ini`.
You can set the `base_url` property in the config file rather than using the environment variable.
When uploading files to Cirro, network issues or temporary outages can occasionally cause a transfer to fail.
The `transfer_max_retries` configuration property specifies the maximum number of times to attempt uploading a file to Cirro in the event of a transfer failure.
The client pauses for an increasing amount of time between retry attempts.
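The retry behavior described above can be pictured with a generic retry loop. This is a sketch of the general technique, not the Cirro client's implementation: the function name, the doubling delay schedule, the delay cap, and the choice of `OSError` as the retryable failure are all assumptions made for illustration.

```python
import time


def upload_with_retries(upload_once, max_attempts=15, base_delay=1.0,
                        max_delay=60.0, sleep=time.sleep):
    """Call upload_once() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_once()
        except OSError:
            # Out of attempts: let the last failure propagate to the caller.
            if attempt == max_attempts:
                raise
            # Assumed schedule: double the pause each time, up to a cap.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use the default `time.sleep` applies.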
The `enable_additional_checksums` property controls whether SHA-256 hashing is used as an additional data-integrity check.
When enabled, the client computes the SHA-256 hash of each file during upload and validates it against the server once the upload completes.
When retrieving files, it verifies that the hash received matches the server's stored hash.
The default hashing algorithm for files is CRC64. In many cases, CRC64 is sufficient to ensure data integrity upon upload.
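For reference, a SHA-256 hash of a local file can be computed with Python's standard library, which is useful for checking a file by hand. This sketch is independent of the Cirro client; the helper names and the chunk size are assumptions, and how the client exchanges hashes with the server is not shown here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```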
[General]
base_url = cirro.bio
transfer_max_retries = 15
enable_additional_checksums = true
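A file in this format can be read with Python's standard `configparser` module, for example to inspect the settings from a script. This is a minimal sketch rather than the client's own loading code: `cirro_home` and `read_general_settings` are hypothetical helpers, and the sketch assumes the section and property names shown in the example above.

```python
import configparser
import os
from pathlib import Path


def cirro_home():
    """Resolve the configuration directory, defaulting to ~/.cirro."""
    return Path(os.environ.get("CIRRO_HOME", "~/.cirro")).expanduser()


def read_general_settings(config_text):
    """Parse the [General] section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    general = parser["General"]
    return {
        "base_url": general.get("base_url"),
        "transfer_max_retries": general.getint("transfer_max_retries"),
        "enable_additional_checksums": general.getboolean(
            "enable_additional_checksums"
        ),
    }


example = """
[General]
base_url = cirro.bio
transfer_max_retries = 15
enable_additional_checksums = true
"""
print(read_general_settings(example))
```

To read the real file instead of the inline example, pass `(cirro_home() / "config.ini").read_text()` to `read_general_settings`.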
You can clear your saved login information by removing the `~/.cirro/token.dat` file from your system or by running `cirro configure` and selecting No when it asks if you'd like to save your login information.
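Removing the token file can also be scripted. The sketch below is a hypothetical helper, not a Cirro API; it assumes the token lives at `token.dat` inside the configuration directory, as in the path given above, and takes the directory as a parameter in case yours differs.

```python
from pathlib import Path


def clear_saved_login(cirro_home="~/.cirro"):
    """Delete token.dat if present; return True if a file was removed."""
    token_file = Path(cirro_home).expanduser() / "token.dat"
    existed = token_file.is_file()
    # missing_ok avoids an error when there is no saved login to clear.
    token_file.unlink(missing_ok=True)
    return existed
```

After the file is removed, the client should ask you to authenticate again the next time it is used.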