Tooling to build OmicIDX apps and data resources

OmicIDX Builder

OmicIDX Builder contains the supporting code used to process and build the OmicIDX applications and data resources. It is not meant for end users, and it requires a Google Cloud project, which incurs costs.

Related OmicIDX projects can be found in the OmicIDX GitHub organization.

Roadmap

  • [-] BigQuery tables
    • [X] SRA
    • [X] Biosample
    • [ ] GEO
  • [-] JSON dump files
    • [X] SRA
    • [X] Biosample
    • [ ] GEO
  • [-] REST API
    • [X] SRA
    • [X] Biosample
    • [ ] GEO
  • [-] GraphQL API
    • [ ] SRA
    • [ ] Biosample
    • [ ] GEO

Installation for local usage

Installation

pip install poetry
poetry install

Google setup

TODO

Pipeline

The data processing pipelines are run from the command line. The commands for each data source are listed below in the order they are run.

SRA

omicidx_builder sra --help
omicidx_builder sra download NCBI_SRA_Mirroring_20190801_Full
cd NCBI_SRA_Mirroring_20190801_Full
omicidx_builder sra parse-entity study
omicidx_builder sra parse-entity sample
omicidx_builder sra parse-entity experiment
omicidx_builder sra parse-entity run
cd ..
omicidx_builder sra upload NCBI_SRA_Mirroring_20190801_Full
omicidx_builder sra load-sra-data-to-bigquery
omicidx_builder sra sra-to-bigquery
omicidx_builder sra sra-bigquery-for-elasticsearch
omicidx_builder sra gcs-dump
omicidx_builder sra gcs-to-elasticsearch
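
The SRA steps above are order-dependent, so it can help to drive them from a small script that stops at the first failure. The following is a minimal sketch, not part of omicidx_builder: the function names `sra_steps` and `run_sra_pipeline` are hypothetical, and the subcommands are copied verbatim from the listing above. It assumes the parse steps must run inside the mirror directory, as the `cd` / `cd ..` pair suggests.

```python
import shlex
import subprocess

# Subcommands copied from the listing above; nothing here is verified
# against the omicidx_builder source.
PARSE_ENTITIES = ["study", "sample", "experiment", "run"]
POST_UPLOAD_STEPS = [
    "load-sra-data-to-bigquery",
    "sra-to-bigquery",
    "sra-bigquery-for-elasticsearch",
    "gcs-dump",
    "gcs-to-elasticsearch",
]


def sra_steps(mirror):
    """Return (argv, cwd) pairs for the SRA pipeline, in order.

    cwd is None for steps run from the starting directory and the mirror
    directory for the parse steps (standing in for the cd / cd .. pair).
    """
    base = ["omicidx_builder", "sra"]
    steps = [(base + ["download", mirror], None)]
    steps += [(base + ["parse-entity", entity], mirror) for entity in PARSE_ENTITIES]
    steps.append((base + ["upload", mirror], None))
    steps += [(base + [name], None) for name in POST_UPLOAD_STEPS]
    return steps


def run_sra_pipeline(mirror, dry_run=True):
    """Print each step and, unless dry_run is set, execute it."""
    for argv, cwd in sra_steps(mirror):
        print(f"[{cwd or '.'}] {shlex.join(argv)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later steps never run against incomplete output.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run_sra_pipeline("NCBI_SRA_Mirroring_20190801_Full")
```

Calling `run_sra_pipeline(mirror, dry_run=False)` would execute the commands for real, which requires the Google Cloud setup described above.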

Biosample

omicidx_builder biosample --help

The steps are listed below in the order they are run. They require about 20 GB of local storage.

omicidx_builder biosample download
omicidx_builder biosample parse biosample_set.xml.gz biosample.json
omicidx_builder biosample upload
omicidx_builder biosample load
omicidx_builder biosample etl-to-public
omicidx_builder biosample gcs-dump
omicidx_builder biosample gcs-to-elasticsearch
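
Because the Biosample steps need roughly 20 GB of local storage, a quick free-space check before starting can save a failed run. This is a minimal sketch using only the Python standard library; the helper name `has_free_space` is hypothetical, and the 20 GB threshold is the approximate figure quoted above, not a measured requirement.

```python
import shutil

REQUIRED_GB = 20  # approximate figure from the note above


def has_free_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    if has_free_space():
        print("Enough free space for the Biosample download and parse steps.")
    else:
        print(f"Less than {REQUIRED_GB} GB free; the Biosample steps may fail.")
```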

Elasticsearch

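import elasticsearch_dsl
import omicidx_builder.elasticsearch_utils as es

# Build a Search object bound to the project's Elasticsearch client.
searcher = elasticsearch_dsl.Search(using=es.get_client())

# Studies whose title matches "cancer" but whose description does not.
s = (searcher.index("sra_study")
    .query("match", title="cancer")
    .exclude("match", description="cancer"))

# The aggregation loop at the end needs a "per_tag" aggregation on the search.
# The field names "tags" and "lines" are placeholders; use fields from the index mapping.
s.aggs.bucket("per_tag", "terms", field="tags").metric("max_lines", "max", field="lines")

response = s.execute()

# Print the relevance score and title of each hit.
for hit in response:
    print(hit.meta.score, hit.title)

# Print each aggregation bucket with its metric value.
for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)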

Development

Running tests

poetry run pytest --cov=omicidx_builder tests

Running long-running tests:

LONG_TESTS=1 poetry run pytest --cov=omicidx_builder tests
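
How the test suite reads `LONG_TESTS` is not shown here. One common way to gate slow tests on an environment variable looks like the sketch below, which is an assumption about the pattern rather than a copy of this project's tests. It uses the standard library's `unittest` decorators, which pytest also collects; the helper `long_tests_enabled` and the test class are hypothetical.

```python
import os
import unittest


def long_tests_enabled(environ=None):
    """Return True when LONG_TESTS is set to a non-empty value other than "0"."""
    environ = os.environ if environ is None else environ
    return environ.get("LONG_TESTS", "") not in ("", "0")


class ExampleTests(unittest.TestCase):
    def test_quick(self):
        # Always runs.
        self.assertEqual(1 + 1, 2)

    @unittest.skipUnless(long_tests_enabled(), "set LONG_TESTS=1 to run")
    def test_slow(self):
        # Hypothetical stand-in for a test that downloads or parses real data.
        self.assertTrue(True)
```

With this pattern, a plain `pytest` run reports `test_slow` as skipped, and prefixing the command with `LONG_TESTS=1` includes it.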