
Life cycle: experimental GoDoc

bget

Lightweight downloader for bioinformatics data, databases and files. The Golang http library, wget, curl, axel, git, and rsync are supported as download engines.

Possible sources of URLs:

  • Reference genomes
  • Source code of bioinformatics tools
  • Bioinformatics databases and files
  • Materials from papers
  • ......

Note: We are currently adding and testing support for more DOIs; see here.

doi demo

Installation

go get -u github.com/openbiox/bget

Usage

Demo Video: https://www.notion.so/sjtu/Demo-of-bget-doi-key-seq-url-78c2c334bf894668aa17fd128bd3255c

Lightweight downloader for bioinformatics data, databases and files (under development). It provides a simple and parallelized method to access various bioinformatics resources. More see here https://github.com/openbiox/bget.

Usage:
  bget [flags]
  bget [command]

Available Commands:
  doi         Can be used to access files via DOI.
  help        Help about any command
  key         Can be used to access URLs via a key string.
  seq         Can be used to access sequence data via unique id (dbGAP and EGA) or manifest files (TCGA).
  url         Can be used to access URLs via Golang http, wget, curl, axel and git, and rsync.

Flags:
      --clean                    Remove _download and _log in current dir.
  -g, --engine string            Point the download engine: go-http, wget, curl, axel, git, and rsync. (default "go-http")
  -e, --extra-cmd string         Extra flags and values pass to internal CMDs
  -h, --help                     help for bget
      --ignore                   Continue to download and skip the check of existing files.
  -l, --list-file string         A file contains dois for download.
      --log-dir string           Log dir. (default "/home/ljf/repositories/github/openbiox/bget/_log")
  -m, --mirror string            Set the mirror of resources.
  -o, --outdir string            Set the download dir. (default "/home/ljf/repositories/github/openbiox/bget")
  -f, --overwrite                Whether to overwrite existing files.
      --proxy string             HTTP proxy to download.
  -q, --quiet                    No output.
  -n, --remote-name              Use remote defined filename.
  -r, --retries int              Retry specifies the number of attempts to retrieve the data. (default 5)
      --retries-sleep-time int   Sleep time after one retry. (default 5)
      --save-log                 Save download log to local file. (default true)
  -s, --seperator string         Optional 'url1{seperator}url2' for multiple keys, urls, or seqs. (default ",")
      --task-id string           Task ID (random). (default "wpyk8931z450351")
  -t, --thread int               Concurrency download thread. (default 1)
      --thread-axel int          Set the thread of axel. (default 5)
      --timeout int              Set the timeout of per request. (default 35)
  -u, --uncompress               Uncompress download files for .zip, .tar.gz, and .gz suffix files.
      --version                  version for bget

Use "bget [command] --help" for more information about a command.
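
The `--retries` and `--retries-sleep-time` flags describe a retry-with-pause policy. The sketch below shows one way such a policy can be written in POSIX shell. It is an illustration only, not bget's implementation: the `retry` function and its argument order are hypothetical, and because the help text does not say whether the first attempt counts towards the retries, the sketch treats the number as the total number of attempts.

```shell
#!/bin/sh
# retry ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND up to ATTEMPTS times, pausing
# SLEEP_SECONDS between failed attempts. Returns 0 on the first success
# and 1 if every attempt fails.
retry() (
  attempts=$1
  pause=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$pause"
    fi
    n=$((n + 1))
  done
  exit 1
)

# Usage (mirrors the documented defaults of 5 and 5; the URL is a placeholder):
#   retry 5 5 curl -fsSLO https://example.org/file.gz
```

The function body runs in a subshell, so its variables do not leak into the calling shell.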

You can use a DOI to download an article and its supplementary data. The list of supported websites and journals will continue to grow.

Warning: If you do not follow the policies of the relevant websites (e.g. limits on continuous downloading or copyright restrictions), you will lose the authorization to use this tool.

Can be used to access files via DOI. More see here https://github.com/openbiox/bget.

Usage:
  bget doi [doi1 doi2 doi3...] [flags]

Examples:
  bget doi 10.5281/zenodo.3363060 10.5281/zenodo.3357455 10.5281/zenodo.3351812 -t 3
  bget doi 10.1016/j.devcel.2017.03.001 10.1016/j.stem.2019.07.009 10.1016/j.celrep.2018.03.072 -t 2

  bapi ncbi -q '((The PARK10 gene USP24 is a negative regulator of autophagy and ULK1 protein stability[Title]) OR Coordinate regulation of autophagy and the ubiquitin proteasome system by MTOR[Title])' -o titleSearch.XML
  dois=`bapi ncbi --xml2json pubmed titleSearch.XML |grep Doi| tr -d ' ,(Doi:)"'`
  echo ${dois}
  bget doi ${dois}
  bget doi 10.1080/15548627.2018.1505155 --proxy http://username:password@hostname:port

Flags:
  -g, --engine string      Point the download engine: go-http, wget, curl, axel, git, and rsync. (default "go-http")
      --full-text          Access full text. (default true)
  -h, --help               help for doi
  -l, --list-file string   A file contains dois for download.
  -m, --mirror string      Set the mirror of resources.
      --pmc                Try PMC database.
      --suppl              Access supplementary files.
      --thread-axel int    Set the thread of axel. (default 5)
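
The `-l, --list-file` flag reads DOIs from a file, but the help text does not spell out the file layout. The sketch below assumes one DOI per line and drops lines that do not look like DOIs before the file is handed to bget. The `filter_dois` function and the `dois.list` file name are hypothetical, and the pattern is a loose shape check rather than a full DOI validator.

```shell
#!/bin/sh
# filter_dois: read candidate DOIs on stdin, one per line, and print only
# those that start with "10.", a 4-9 digit registrant code, a slash, and
# at least one further non-space character. Blank lines and other text
# are dropped.
filter_dois() {
  grep -E '^10\.[0-9]{4,9}/[^[:space:]]+$'
}

# Usage (assumes bget is installed and accepts one DOI per line):
#   printf '%s\n' 10.5281/zenodo.3363060 10.1016/j.devcel.2017.03.001 \
#     | filter_dois > dois.list
#   bget doi -l dois.list -t 2
```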

bget seq can be used to access sequence data. Gene Expression Omnibus (GEO), Sequence Read Archive (SRA), and GDC Data Portal are supported.
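
Each of these resources uses a differently shaped identifier, which is what makes it possible to mix them in one command. The sketch below guesses the resource from the shape of an argument, using the identifier forms that appear in the examples of this section. It only illustrates the idea: the `classify_seq_input` function and its labels are hypothetical, and how bget itself detects input types is not documented here.

```shell
#!/bin/sh
# classify_seq_input ARG
# Print a guess at which resource ARG refers to, based only on its shape.
# The patterns are deliberately loose (e.g. anything starting with "SRR"
# plus a digit counts as an SRA run accession).
classify_seq_input() {
  case "$1" in
    *.krt) echo "dbgap-krt" ;;
    GSE[0-9]*) echo "geo" ;;
    [SED]RR[0-9]*) echo "sra" ;;
    *.txt) echo "manifest" ;;
    *)
      # GDC file ids are UUIDs: 8-4-4-4-12 hexadecimal digits.
      if printf '%s\n' "$1" |
        grep -Eq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
        echo "gdc-file-id"
      else
        echo "unknown"
      fi
      ;;
  esac
}
```

A check like this can be run over a list of inputs before a long download to catch mistyped identifiers early.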

Can be used to access sequence data via unique id or manifest files. More see here https://github.com/openbiox/bget.

Usage:
  bget seq [id1 id2 id3... | manifest1 manifest2 manifest3...] [flags]

Examples:
  bget seq ERR3324530 SRR544879 # download files from SRA database
  bget seq GSE23543 # download files from GEO database (auto download SRA acc list and run info)
  bget seq dbgap.krt # download files from dbGaP database using krt files

  # download TCGA files using file id
  bget seq b7670817-9d6b-494e-9e22-8494e2fd430d

  # download TCGA files using manifest files
  # split for parallel
  split -a 3 --additional-suffix=.txt -l 100 gdc_manifest.2019-08-23-TCGA.txt -d
  for i in x*.txt
  do
    head -n 1 x000.txt > ${i}.tmp
    cat ${i} >> ${i}.tmp
    mv ${i}.tmp ${i}
  done
  sed -i '1d' x000.txt
  bget seq *.txt -t 5

  # input types are detected automatically (if you do not have a *.krt file or a TCGA manifest, leave it out when testing)
  bget seq SRR544879 GSE23543 b7670817-9d6b-494e-9e22-8494e2fd430d dbgap.krt *.txt -t 5

Flags:
  -g, --engine string      Point the download engine: go-http, wget, curl, and axel. (default "go-http")
      --gdc-token string   Token to access TCGA portal files.
  -h, --help               help for seq
  -l, --list-file string   A file contains seq id (e.g. SRR) or manifest files for download.
  -u, --uncompress         Uncompress download files for .zip, .tar.gz, and .gz suffix files (now support GEO database).
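
The manifest-splitting example above can also be wrapped in a reusable function. The sketch below uses awk to write chunks with a fixed number of data rows, each starting with a copy of the header line, which avoids the temporary files and the final `sed` step. The function name `split_manifest`, the chunk file naming, and the assumption that the first line of the manifest is the header are all illustrative.

```shell
#!/bin/sh
# split_manifest MANIFEST ROWS_PER_CHUNK PREFIX
# Writes PREFIX000.txt, PREFIX001.txt, ... Each chunk repeats the header
# (line 1 of MANIFEST) followed by at most ROWS_PER_CHUNK data rows.
split_manifest() {
  awk -v rows="$2" -v prefix="$3" '
    NR == 1 { header = $0; next }          # remember the header line
    (NR - 2) % rows == 0 {                 # first data row of a new chunk
      if (out != "") close(out)
      out = sprintf("%s%03d.txt", prefix, chunk++)
      print header > out
    }
    { print > out }                        # copy the data row
  ' "$1"
}

# Usage (assumes bget is installed):
#   split_manifest gdc_manifest.2019-08-23-TCGA.txt 100 chunk_
#   bget seq chunk_*.txt -t 5
```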

bget url can be used to access files via input URLs. Golang http, wget, curl, axel, git, and rsync are supported as download engines.

Can be used to access URLs via Golang http, wget, curl, axel and git, and rsync. More see here https://github.com/openbiox/bget.

Usage:
  bget url [url1 url2 url3...] [flags]

Examples:
  urls="https://dldir1.qq.com/weixin/Windows/WeChatSetup.exe,http://download.oray.com/pgy/windows/PgyVPN_4.1.0.21693.exe,https://dldir1.qq.com/qqfile/qq/PCQQ9.1.6/25786/QQ9.1.6.25786.exe" && echo $urls | tr "," "\n"> /tmp/urls.list

  bget url ${urls}
  bget url https://dldir1.qq.com/weixin/Windows/WeChatSetup.exe https://dldir1.qq.com/qqfile/qq/PCQQ9.1.6/25786/QQ9.1.6.25786.exe
  bget url ${urls} -t 2 -o /tmp/download
  bget url ${urls} -t 3 -o /tmp/download -f -g wget
  bget url ${urls} -t 3 -o /tmp/download -g wget --ignore
  bget url -l /tmp/urls.list -o /tmp/download -f -t 3

Flags:
  -g, --engine string      Point the download engine: go-http, wget, curl, axel, git, and rsync. (default "go-http")
  -h, --help               help for url
  -l, --list-file string   A file contains URLs for download.
  -m, --mirror string      Set the mirror of resources.
      --thread-axel int    Set the thread of axel. (default 5)
  -u, --uncompress         Uncompress download files for .zip, .tar.gz, and .gz suffix files.
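
Every engine other than go-http is an external program, so it presumably has to be installed and on PATH before it can be selected with `-g` (an assumption; the help text does not say how bget locates engines). The sketch below lists the engines that look usable on the current machine. The `available_engines` function is hypothetical and not part of bget.

```shell
#!/bin/sh
# available_engines: print the engine names from the -g help text that
# appear usable here. go-http is printed unconditionally because it is the
# Golang http library rather than an external program; the others are
# printed only if a program of that name is found on PATH.
available_engines() {
  echo "go-http"
  for engine in wget curl axel git rsync; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
    fi
  done
}

# Usage: print one usable engine name per line.
#   available_engines
```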

Maintainer

License

Apache 2.0
