
jduc/geoDL


geoDL

Please note that geoDL is in beta, so expect bugs.


geoDL is a Python program to download FASTQ files from GEO-NCBI. It takes a GEO accession number as input and searches the EMBL-EBI ENA website to gather the metadata and download the FASTQ files. The metadata are used to rename the samples with the experiment's sample names rather than their SRR run numbers.
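The renaming step can be pictured as a mapping from each FASTQ URL to a new file name. The sketch below is a hypothetical illustration, not geoDL's code: it assumes the metadata are already loaded as a list of dicts, uses the column names that the meta mode documents ('Fastq files (ftp)' and 'Submitter's sample name'), and assumes that a run can list several ';'-separated URLs (for example the _1/_2 files of paired-end reads).

```python
def rename_map(rows, url_col="Fastq files (ftp)", name_col="Submitter's sample name"):
    """Map every FASTQ URL to a file name built from the sample name.

    Hypothetical helper: `rows` is a list of dicts, one per sequencing run.
    A run may list several ';'-separated URLs, e.g. paired-end _1/_2 files.
    """
    mapping = {}
    for row in rows:
        # Spaces make awkward file names, so replace them with underscores
        sample = row[name_col].strip().replace(" ", "_")
        for url in filter(None, row[url_col].split(";")):
            basename = url.rsplit("/", 1)[-1]                  # e.g. SRR1234567_1.fastq.gz
            run = basename.split(".", 1)[0].split("_", 1)[0]   # e.g. SRR1234567
            # Keep the read suffix and extension, replace only the run accession
            mapping[url] = sample + basename[len(run):]
    return mapping
```

If one sample has several runs, this naive scheme gives them the same file name, so a real implementation would need some way to disambiguate them.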

Dependencies

  • geoDL should work with both Python 3 and Python 2, but this still has to be tested
  • The beautifulsoup4, colorama and six Python packages are required
  • wget is used internally and is therefore also a dependency of geoDL
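Before installing, it can help to check that these dependencies are present. This is a small stand-alone sketch (Python 3 only, not part of geoDL); it relies on the fact that the beautifulsoup4 package is imported as bs4 and that wget must be findable on the PATH.

```python
import importlib.util
import shutil

# Import names of the required Python packages (beautifulsoup4 imports as bs4)
PY_DEPS = ["bs4", "colorama", "six"]


def missing_dependencies():
    """Return the names of the dependencies that cannot be found."""
    missing = [name for name in PY_DEPS if importlib.util.find_spec(name) is None]
    # wget is an external program, so look it up on the PATH instead
    if shutil.which("wget") is None:
        missing.append("wget")
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing: " + ", ".join(missing) if missing else "all dependencies found")
```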

Install

On Linux and macOS:

$ pip install --user geoDL

Note that the --pre flag may be needed to install the beta version.

Usage

  usage: geoDL.py [-h] [--dry] [--samples [SAMPLES [SAMPLES ...]]] [--colname COLNAME]
                  {geo,meta,ena} GSE|metadata|ENA

{geo,meta,ena}        Specify which type of input
GSE|metadata|ENA      geo:  GSE accession number, eg: GSE13373
                            Map the GSE accession to the ENA study accession and fetch the metadata

                      meta: Use a metadata file instead of fetching it from the ENA website (bypasses GEO)
                            The metadata must include at least the following columns:
                            ['Fastq files (ftp)', 'Submitter's sample name']

                      ena:  ENA study accession number, eg: PRJEB13373
                            Fetch the metadata directly from the ENA website

  optional arguments:
    -h, --help            show this help message and exit
    --dry                 Don't actually download anything, just print the wget
                          commands
    --samples [SAMPLES [SAMPLES ...]]
                          Space-separated list of GSM samples to download. In
                          ENA mode, the list is used to subset the metadata
    --colname COLNAME     Name of the column to use in the metadata file to name
                          the samples
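To make the interface above concrete, here is a hypothetical argparse parser that reproduces the documented usage line. It is a reconstruction from the help text, not geoDL's actual source; the destination names mode and accession are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical re-creation of the documented geoDL command line."""
    parser = argparse.ArgumentParser(prog="geoDL.py")
    parser.add_argument("mode", choices=["geo", "meta", "ena"],
                        help="Specify which type of input")
    parser.add_argument("accession", metavar="GSE|metadata|ENA",
                        help="GSE accession, metadata file or ENA study accession")
    parser.add_argument("--dry", action="store_true",
                        help="Don't actually download anything, just print the wget commands")
    parser.add_argument("--samples", nargs="*",
                        help="Space-separated list of GSM samples to download")
    parser.add_argument("--colname",
                        help="Name of the metadata column used to name the samples")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because --samples accepts any number of values, a parser built this way would likely swallow the positional arguments too if --samples came first; putting the mode and accession before the options avoids that.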

Example

Download the metadata and all the samples of the series GSE13373, and rename them with their sample names:

$ geoDL geo GSE13373
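The actual transfer is delegated to wget, and --dry only prints the commands. A minimal sketch of that pattern, assuming a hypothetical mapping from FASTQ URL to local file name and assuming that ENA FTP links without a scheme should get ftp:// prepended:

```python
import shlex
import subprocess


def download(url_to_name, dry=False):
    """Fetch each URL with wget into its mapped file name (hypothetical helper).

    With dry=True the commands are only printed, mirroring the --dry option.
    """
    commands = []
    for url, name in url_to_name.items():
        # ENA FTP links are often listed without a scheme; assume ftp:// then
        if "://" not in url:
            url = "ftp://" + url
        cmd = ["wget", "-O", name, url]
        commands.append(cmd)
        if dry:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failed download
    return commands
```

Passing the command as a list rather than a shell string avoids quoting problems with sample names that contain spaces or quotes.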

Download only some samples:

$ geoDL geo GSE13373 --samples GSM00001 GSM00003
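Selecting samples amounts to filtering the metadata rows. In this sketch the column holding the GSM identifier is an assumption: it defaults to a hypothetical sample_alias column, and unknown sample names raise an error instead of being silently skipped.

```python
def subset_samples(rows, wanted, column="sample_alias"):
    """Keep only the metadata rows whose `column` value is in `wanted`.

    Hypothetical helper: `rows` is a list of dicts and `column` is assumed
    to hold the GSM identifier of each sample.
    """
    wanted = set(wanted)
    kept = [row for row in rows if row.get(column) in wanted]
    # Report requested samples that are absent from the metadata
    missing = wanted - {row.get(column) for row in kept}
    if missing:
        raise ValueError("samples not found in metadata: " + ", ".join(sorted(missing)))
    return kept
```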

Use a pre-downloaded metadata file and name the samples after its run_alias column:

$ geoDL meta my_metadata.txt --colname run_alias
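A metadata file used in meta mode must provide the two documented columns, plus the --colname column when one is given. The sketch below checks this and pairs each sample name with its FASTQ field. The tab delimiter and the fallback to 'Submitter's sample name' when no column is given are assumptions, not confirmed geoDL behavior.

```python
import csv

# Columns that meta mode documents as mandatory
REQUIRED_COLUMNS = ["Fastq files (ftp)", "Submitter's sample name"]


def parse_metadata(lines, colname=None):
    """Validate tab-separated metadata and return (sample name, FASTQ field) pairs.

    `lines` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    columns = reader.fieldnames or []
    needed = REQUIRED_COLUMNS + ([colname] if colname else [])
    absent = [col for col in needed if col not in columns]
    if absent:
        raise ValueError("metadata is missing column(s): " + ", ".join(absent))
    # Name samples after --colname when given, else after the submitter's name
    name_col = colname or "Submitter's sample name"
    return [(row[name_col], row["Fastq files (ftp)"]) for row in reader]
```

For a file on disk, open it with newline="" and pass the handle, e.g. parse_metadata(handle, colname="run_alias").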

Use an ENA study accession instead of a GSE accession:

$ geoDL ena PRJEB13373
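geoDL gathers ENA metadata by parsing the ENA website with BeautifulSoup. As an alternative for anyone scripting around it, ENA also offers a Portal API file report that lists the runs of a study as TSV. The sketch below uses that endpoint; the chosen fields are an assumption about what is needed for renaming.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# ENA Portal API endpoint (an alternative to scraping the ENA website)
FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, fields=("run_accession", "sample_alias", "fastq_ftp")):
    """Build a file-report query listing the sequencing runs of an ENA study."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return FILEREPORT + "?" + query


def parse_report(text):
    """Turn the TSV report into one dict per run."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_report(accession):
    """Download and parse the report for a study (requires network access)."""
    with urlopen(filereport_url(accession)) as response:
        return parse_report(response.read().decode("utf-8"))
```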

Beta test

  • Test Python 2 support
  • Test handling of wget

Changelog

changelog

About

Download FASTQ files from GEO with ease
