geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA. When given one or more GEO/SRA accessions, geofetch will:
- Download either raw or processed data from either SRA or GEO
- Produce a standardized PEP sample table. Because geofetch handles data acquisition and metadata formatting and standardization for you, it is easy to run looper-compatible pipelines on public datasets.
- Prepare a project to run with sraconvert to convert SRA files into FASTQ files.
Key features of geofetch:
- Works with GEO and SRA metadata
- Combines samples from different projects
- Standardizes output metadata
- Filters processed files (from GEO) by type and size before downloading them
- Is easy to use
- Executes quickly
- Can search GEO to find relevant data
- Can be used either as a command-line tool or from within Python through an API
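To make the idea of filtering by file type and size concrete, here is a minimal sketch of that kind of selection in plain Python. It is an illustration only, not geofetch's own implementation: `filter_files` is a hypothetical helper, and the file names and sizes are made up for the example.

```python
import re


def filter_files(files, name_pattern=None, max_size_bytes=None):
    """Keep (name, size_bytes) entries that match the pattern and fit the size limit.

    Hypothetical helper for illustration; not part of geofetch.
    """
    regex = re.compile(name_pattern) if name_pattern else None
    selected = []
    for name, size in files:
        if regex and not regex.search(name):
            continue  # wrong file type
        if max_size_bytes is not None and size > max_size_bytes:
            continue  # too large to download
        selected.append((name, size))
    return selected


# Made-up candidate files: (file name, size in bytes)
candidates = [
    ("GSM0000001_peaks.bed.gz", 2_000_000),
    ("GSM0000001_signal.bigWig", 150_000_000),
    ("GSM0000002_peaks.bed.gz", 40_000_000),
]

# Keep only BED files of at most 25 MB
print(filter_files(candidates, name_pattern=r"\.bed(\.gz)?$", max_size_bytes=25_000_000))
# prints: [('GSM0000001_peaks.bed.gz', 2000000)]
```

Deciding what to keep before any download starts avoids transferring large files that would be discarded anyway.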
geofetch runs on the command line. This command will download the raw data and metadata for the given GSE number:
geofetch -i GSE95654
You can add --processed if you want to download processed files from the given experiment.
geofetch -i GSE95654 --processed
You can add --just-metadata if you want to download metadata without the raw SRA files or processed GEO files.
geofetch -i GSE95654 --just-metadata
geofetch -i GSE95654 --processed --just-metadata
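When geofetch is called from a script, the flag combinations above can be assembled programmatically. The following is a minimal sketch, assuming geofetch is installed and available on your PATH. `build_geofetch_command` and `run_geofetch` are hypothetical helpers, not part of geofetch, and they use only the `-i`, `--processed`, and `--just-metadata` flags shown above.

```python
import subprocess


def build_geofetch_command(accession, processed=False, just_metadata=False):
    """Return the argument list for a geofetch call (hypothetical helper)."""
    cmd = ["geofetch", "-i", accession]
    if processed:
        cmd.append("--processed")
    if just_metadata:
        cmd.append("--just-metadata")
    return cmd


def run_geofetch(accession, **options):
    """Run geofetch as a subprocess; raises CalledProcessError if it fails."""
    return subprocess.run(build_geofetch_command(accession, **options), check=True)


# Build (but do not run) the metadata-only command for processed data
print(" ".join(build_geofetch_command("GSE95654", processed=True, just_metadata=True)))
# prints: geofetch -i GSE95654 --processed --just-metadata
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if an accession is read from user input.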
Note: We ensure that geofetch is compatible with Unix, Linux, and macOS. However, due to dependencies, some features of geofetch may not be available on Windows.
- geofetch is also available as a Python API package. It can initialize peppy projects without downloading any SOFT files. Example:
from geofetch import Geofetcher
# initialize Geofetcher with all necessary arguments:
geof = Geofetcher(processed=True, acc_anno=True, discard_soft=True)
# get projects by providing a GSE accession, or a file with GSE accessions, as input
geof.get_projects("GSE160204")
- To find GSEs and save them to a file, you can use Finder, the GSE finder tool:
from geofetch import Finder
# initialize Finder (use filters if necessary)
find_gse = Finder(filters='bed')
# get all projects that were found:
gse_list = find_gse.get_gse_all()
Find more information here: GSE Finder
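Once you have a list of accessions, saving it to a file takes only a few lines of standard Python. The sketch below makes two assumptions: that the list holds plain accession strings such as "GSE160204", and that a text file with one accession per line is a suitable format for later use. `save_gse_list` is a hypothetical helper, not part of geofetch.

```python
import re
from pathlib import Path

# GSE accessions are the prefix "GSE" followed by digits
GSE_PATTERN = re.compile(r"^GSE\d+$")


def save_gse_list(gse_list, path):
    """Write unique, well-formed GSE accessions to `path`, one per line.

    Hypothetical helper for illustration. Returns the accessions written.
    """
    seen = set()
    kept = []
    for acc in gse_list:
        acc = acc.strip()
        if GSE_PATTERN.match(acc) and acc not in seen:
            seen.add(acc)
            kept.append(acc)
    Path(path).write_text("".join(f"{acc}\n" for acc in kept))
    return kept


if __name__ == "__main__":
    # Duplicates and malformed entries are dropped before writing
    written = save_gse_list(
        ["GSE160204", "GSE95654", "GSE160204", "not-an-accession"],
        "gse_list.txt",
    )
    print(written)
    # prints: ['GSE160204', 'GSE95654']
```

Dropping duplicates and malformed entries before writing keeps the file safe to reuse as input for later steps.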
For more details, check out the usage reference, installation instructions, or head on over to the tutorial for raw data and tutorial for processed data for a detailed walkthrough.