This repository contains the source for the PathCORE-T demo Flask application.
How to run the application locally
Requires Python 3.6.2.
Read about the web application database setup in PathCORE-T-analysis. Register for an mLab account and create a new database based on those instructions.
Set the following environment variables (more information about the MongoDB database below):
How to set an environment variable:
If you choose to run this application using heroku local, you can use this article to set up your local environment variables in a
After installing dependencies (pip install -r requirements.txt), launch the Flask application by running
(This works because we have the following lines of code in
if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0")
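Before launching, it can help to confirm that the environment variables mentioned earlier in this section are actually set. The following is a minimal sketch, not part of this repository: the variable names in REQUIRED_VARS are hypothetical placeholders and should be replaced with the names this application expects.

```python
import os

# Hypothetical variable names, used only for illustration. Replace them with
# the environment variables this application actually reads.
REQUIRED_VARS = ["EXAMPLE_DB_USER", "EXAMPLE_DB_PASSWORD"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dict as environ makes the check easy to try without touching the real environment.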
Deploy to Heroku
Follow this guide. Steps to read through at minimum: "Introduction" to "View logs," and then "Push local changes" to "Define config vars."
Heroku-specific files provided for you
runtime.txt (required to specify Python-3.6.2, per this article)
- templates: The HTML files for each of the pages needed for the routes.
- home.html: The project homepage.
- pathcore-vis.html: The PathCORE-T network pages (PAO1, TCGA).
- network.html: This is used in pathcore-vis.html. It is the formatting for the window that displays the D3.js network on the PathCORE-T network page.
- edge.html: The edge page (see example).
- no_edge.html: This contains the text for an edge that has no genes with odds ratio above 1.
- experiment.html: The experiment page (see example).
- quickview.html: Users can upload and view their own PathCORE-T-generated network.
- layout.html: Used in all the above templates. Specifies the same
- static: Static files (CSS, JS, fonts, data files). PathCORE-T-specific ones described here:
- css/pathcore.css: Styling specific to this application (a lot of it is for the network styling)
- data: The data files used in the PathCORE-T network pages
- js/pathcore-heatmap.js: JS functions to load the heatmaps and allow a user to interact with heatmaps (particularly to fetch the sample annotation information)
- js/pathcore-network.js: JS functions to load the D3.js PathCORE-T network visualization. Allows users to interact with the network as well (e.g. drag the pathways to areas that make the network as a whole easier to read)
The MongoDB database
Information about the P. aeruginosa data compendium and its genes is stored in several MongoDB collections using scripts in the PathCORE-T-analysis repository.
- Please see the instructions in PathCORE-T-analysis. Note that the files needed to populate the MongoDB database are already available, save for a .yml credentials file you need to create. Provided for the PAO1 demo server:
The collections that are accessed in this application's GET requests are as follows:
- pathcore_edge_data: This is the data needed to load the edge page in the PathCORE-T demo server. Notably, the gene_names are the rows seen on the heatmap, the least_expressed_samples are the columns, and the least_expressed_heatmap is the data for each of the heatmaps.
edge: list[str (pathway 0), str (pathway 1)], the two pathways in this co-occurrence relationship
weight_oddsratio: float, the weight of the edge divided by the expected odds ratio
gene_names: list[str (gene names)], the PA locus tag or the common name of each gene (up to 20). The ordering of this list is dependent on the genes' corresponding odds ratio values (they are sorted in descending order).
odds_ratios: list[float], the genes' odds ratio values.
pathway_owner: list[int], whether each of the genes is annotated to pathway 0 or pathway 1. (0 = pathway 0, 1 = pathway 1, 2 = both)
most_expressed_samples: list[str (sample CEL file)], the "most expressed" samples, where "most/least expressed" is based on a summary score that was computed as a function of the genes, their odds ratios, and their expression values in each sample of the compendium. (In descending order.)
least_expressed_samples: list[str (sample CEL file)], the "least expressed" samples, where "most/least expressed" is based on a summary score that was computed as a function of the genes, their odds ratios, and their expression values in each sample of the compendium. (In descending order.)
most_expressed_heatmap: list[dict("value": float, "col_index": int, "row_index": int)], a list of dicts/objects corresponding to the expression value of each cell in the most expressed heatmap (at position specified by the row and col indices). Rows correspond to genes and columns correspond to samples.
least_expressed_heatmap: list[dict("value": float, "col_index": int, "row_index": int)], a list of dicts/objects corresponding to the expression value of each cell in the least expressed heatmap (at position specified by the row and col indices). Rows correspond to genes and columns correspond to samples.
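To make the heatmap fields concrete, here is a minimal sketch of turning the list of cell dicts into a row-by-column matrix (genes as rows, samples as columns). It is an illustration rather than code from this application: the document fragment below is made up, and the sketch assumes that row_index and col_index are 0-based.

```python
def heatmap_to_matrix(cells, n_rows, n_cols, fill=None):
    """Place each cell's value at [row_index][col_index] in a 2D list.

    Assumes 0-based indices; positions with no cell keep the fill value.
    """
    matrix = [[fill] * n_cols for _ in range(n_rows)]
    for cell in cells:
        matrix[cell["row_index"]][cell["col_index"]] = cell["value"]
    return matrix


# Made-up fragment of a pathcore_edge_data document, for illustration only.
edge_doc = {
    "gene_names": ["geneA", "geneB"],
    "most_expressed_samples": ["s1.CEL", "s2.CEL", "s3.CEL"],
    "most_expressed_heatmap": [
        {"value": 0.9, "col_index": 0, "row_index": 0},
        {"value": 0.7, "col_index": 1, "row_index": 0},
        {"value": 0.5, "col_index": 2, "row_index": 0},
        {"value": 0.8, "col_index": 0, "row_index": 1},
        {"value": 0.6, "col_index": 1, "row_index": 1},
        {"value": 0.4, "col_index": 2, "row_index": 1},
    ],
}

matrix = heatmap_to_matrix(
    edge_doc["most_expressed_heatmap"],
    n_rows=len(edge_doc["gene_names"]),
    n_cols=len(edge_doc["most_expressed_samples"]),
)

# Each row of the matrix lines up with one entry of gene_names.
for gene, row in zip(edge_doc["gene_names"], matrix):
    print(gene, row)
```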
- sample_annotations: Contains the sample annotation information that shows up alongside a heatmap when you hover over a heatmap square. All columns in the sample annotations file are loaded into this collection.
- Additionally, there is a sample_id field that we add to each document in the collection. We do this because the genes collection, described in the next point, contains the expression value of each gene in every single sample in the compendium. The expression values are ordered according to the samples' ordering in the compendium, and this sample_id tracks the position of each sample (and so allows us to fetch the correct expression value for a gene-by-sample).
- genes: Information about the genes in the compendium.
gene: str, the PA gene locus tag
common_name: str, the common name when available
expression: list[float], a vector of expression values, corresponding to the gene row in the compendium.
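The relationship between sample_id and the expression vector can be sketched as follows. This is an illustration under stated assumptions, not the application's own lookup code: the documents are made up, the cel_file field name is hypothetical, and sample_id is assumed to be a 0-based integer position into expression.

```python
def expression_submatrix(gene_docs, sample_docs):
    """Return one row per gene and one column per sample.

    Each value is read from the gene's expression vector at the position
    given by the sample's sample_id (assumed to be a 0-based index).
    """
    return [
        [gene["expression"][sample["sample_id"]] for sample in sample_docs]
        for gene in gene_docs
    ]


# Made-up documents shaped like the genes collection.
gene_docs = [
    {"gene": "PA0001", "common_name": "dnaA", "expression": [1.0, 2.0, 3.0, 4.0]},
    {"gene": "PA0002", "common_name": "dnaN", "expression": [10.0, 20.0, 30.0, 40.0]},
]

# Made-up documents shaped like sample_annotations; "cel_file" is a
# hypothetical field name standing in for the annotation columns.
sample_docs = [
    {"cel_file": "sample_c.CEL", "sample_id": 2},
    {"cel_file": "sample_a.CEL", "sample_id": 0},
]

submatrix = expression_submatrix(gene_docs, sample_docs)
print(submatrix)
```

Storing one expression vector per gene and one position per sample keeps the lookup a plain list index, at the cost of requiring that both collections are built from the same compendium ordering.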