Cohorts

Cohorts is a library for analyzing and plotting clinical data, mutations and neoepitopes in patient cohorts.

It calls out to external libraries like topiary and caches the results for easy manipulation.

Cohorts requires Python 3 (3.3+). We are no longer maintaining compatibility with Python 2. For context, see this Python 3 statement.

Installation

You can install Cohorts using pip:

pip install cohorts

Features

Data management: construct a Cohort consisting of Patients with Samples.
Use varcode and topiary to generate and cache variant effects and predicted neoantigens.
Provenance: track the state of the world (package and data versions) for a given analysis.
Aggregation functions: built-in functions such as missense_snv_count, neoantigen_count, expressed_neoantigen_count; or create your own functions.
Plotting: survival curves via lifelines, response/no response plots (with Mann-Whitney and Fisher's Exact results), ROC curves. Example: cohort.plot_survival(on=missense_snv_count, how="pfs").
Filtering: filter collections of variants/effects/neoantigens by, for example, variant statistics.
Pre-define data sets to work with. Example: cohort.as_dataframe(join_with=["tcr", "pdl1"]).
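
To illustrate the provenance idea from the feature list, here is a minimal standalone sketch. It is not how Cohorts implements provenance: the helper names collect_provenance and diff_provenance are hypothetical, and the sketch assumes Python 3.8+ so that importlib.metadata is available in the standard library. It records the installed version of each named package and reports any differences between a saved snapshot and the current environment.

```python
from importlib import metadata  # standard library in Python 3.8+


def collect_provenance(package_names):
    """Map each package name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of the Cohorts API.
    """
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_provenance(saved, current):
    """Return {name: (saved_version, current_version)} for every mismatch."""
    names = set(saved) | set(current)
    return {
        name: (saved.get(name), current.get(name))
        for name in sorted(names)
        if saved.get(name) != current.get(name)
    }
```

A snapshot taken with collect_provenance(["cohorts", "varcode", "topiary"]) could be stored next to cached results and compared later with diff_provenance to spot an analysis that was run against different package versions.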

In addition, several other libraries make use of cohorts:

Quick Start

One way to get started using Cohorts is to use it to analyze TCGA data.

As an example, we can create a cohort using query_tcga:

from query_tcga import cohort, config

# provide authentication token
config.load_config('config.ini')

# load patient data
blca_patients = cohort.prep_patients(project_name='TCGA-BLCA',
                                     project_data_dir='data')

# create cohort
blca_cohort = cohort.prep_cohort(patients=blca_patients,
                                 cache_dir='data-cache')

Then use plot_survival() to summarize a potential biomarker (e.g. snv_count) by survival:

from cohorts.functions import snv_count
blca_cohort.plot_survival(snv_count, how='os', threshold='median')

Which should produce a summary of results including this plot:

We could alternatively use plot_benefit() to summarize OS>12mo instead of survival:

blca_cohort.plot_benefit(snv_count)
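
To make the two summaries above concrete, here is a minimal standalone sketch of the comparisons they describe. It is not Cohorts' implementation. It assumes that threshold='median' splits patients into a group above the median of the biomarker and a group at or below it, and that benefit means overall survival longer than 12 months. The patient values, the month units, the tie handling and the helper names split_by_median and benefit_fraction are all hypothetical.

```python
from statistics import median

# Hypothetical toy data: patient id -> (snv_count, overall survival in months).
toy_patients = {
    "p1": (120, 8),
    "p2": (340, 20),
    "p3": (90, 11),
    "p4": (510, 30),
}


def split_by_median(values):
    """Return (cutoff, ids above the median, ids at or below the median)."""
    cutoff = median(values.values())
    high = sorted(pid for pid, v in values.items() if v > cutoff)
    low = sorted(pid for pid, v in values.items() if v <= cutoff)
    return cutoff, high, low


def benefit_fraction(patient_ids, os_months, min_months=12):
    """Fraction of the given patients whose survival exceeds min_months."""
    if not patient_ids:
        return None
    with_benefit = sum(os_months[pid] > min_months for pid in patient_ids)
    return with_benefit / len(patient_ids)


snv_counts = {pid: snv for pid, (snv, _) in toy_patients.items()}
os_months = {pid: months for pid, (_, months) in toy_patients.items()}

cutoff, high, low = split_by_median(snv_counts)
print(cutoff, high, low)
print(benefit_fraction(high, os_months), benefit_fraction(low, os_months))
```

The sketch ignores censoring: a patient who is still alive with less than 12 months of follow-up has unknown benefit, which is one reason a survival curve and a benefit summary can tell different stories.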

See the full example in the quick-start notebook.

Building from Scratch

from cohorts import Cohort, Patient, Sample

patient_1 = Patient(
    id="patient_1",
    os=70,
    pfs=24,
    deceased=True,
    progressed=True,
    benefit=False
)
    
patient_2 = Patient(
    id="patient_2",
    os=100,
    pfs=50,
    deceased=False,
    progressed=True,
    benefit=False
)

cohort = Cohort(
    patients=[patient_1, patient_2],
    cache_dir="/where/cohorts/results/get/saved"
)

cohort.plot_survival(on="os")
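
The os and deceased fields above are what a survival curve is built from. As a rough illustration, and assuming the plotted curve is a Kaplan-Meier estimate, the hand-rolled function below computes that estimate for the two patients. It is a sketch only: Cohorts delegates survival analysis to lifelines, and the function name kaplan_meier is hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an observed event.

    durations: follow-up time per patient (same unit as `os` above)
    events: True if death was observed, False if the patient is censored
    """
    observations = sorted(zip(durations, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        time = observations[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(observations) and observations[i][0] == time:
            deaths += observations[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving this time.
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        # Both deaths and censored patients leave the at-risk set.
        at_risk -= leaving
    return curve


# patient_1 died at os=70; patient_2 is censored (still alive) at os=100.
print(kaplan_meier([70, 100], [True, False]))  # [(70, 0.5)]
```

The censored patient never lowers the curve directly; they only shrink the at-risk set for later time points.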

sample_1_tumor = Sample(
    is_tumor=True,
    bam_path_dna="/path/to/dna/bam",
    bam_path_rna="/path/to/rna/bam"
)

patient_1 = Patient(
    id="patient_1",
    ...
    snv_vcf_paths=["/where/my/mutect/vcfs/live",
                   "/where/my/strelka/vcfs/live"],
    indel_vcf_paths=[...],
    tumor_sample=sample_1_tumor,
    ...
)

cohort = Cohort(
    ...
    patients=[patient_1]
)
