This repository contains the source for the PathCORE-T demo Flask application.
Requires Python 3.6.2.
Read about the web application database setup in PathCORE-T-analysis. Register for an MLab account and create a new database based on those instructions.
Set the following environment variables (more information about the MongoDB database below):
MDB_USER
MDB_PW
MDB_NAME
MLAB_URI
SESSION_SECRET
How to set an environment variable:
export MDB_USER=kathy
If you choose to run this application using heroku local, you can use this article to set up your local environment variables in a .env file.
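As a quick sanity check before launching, the five variables listed above can be verified from Python. This is only a minimal sketch: the helper name load_settings is hypothetical, and how app.py actually reads or combines these values is not shown in this README.

```python
import os

# Variable names are taken from the list above.
REQUIRED_VARS = ("MDB_USER", "MDB_PW", "MDB_NAME", "MLAB_URI", "SESSION_SECRET")


def load_settings(environ=os.environ):
    """Return the required settings as a dict, or raise if any are unset or empty.

    Hypothetical helper for illustration; it is not part of app.py.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling load_settings() with no arguments checks the current process environment, so a missing export shows up as a single error message rather than a failure partway through a request.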
After installing dependencies (pip install -r requirements.txt), launch the Flask application by running
python app.py
(This works because we have the following lines of code in app.py):
if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0")
Follow this guide. Steps to read through at minimum: "Introduction" to "View logs," and then "Push local changes" to "Define config vars."
- Procfile
- app.json
- runtime.txt (required to specify Python-3.6.2, per this article)
- top-level:
- templates: The HTML files for each of the pages needed for the routes.
  - home.html: The project homepage.
  - pathcore-vis.html: The PathCORE-T network pages (PAO1, TCGA).
  - network.html: This is used in pathcore-vis.html. It is the formatting for the window that displays the D3.js network on the PathCORE-T network page.
  - edge.html: The edge page (see example).
  - no_edge.html: This contains the text for an edge that has no genes with odds ratio above 1.
  - experiment.html: The experiment page (see example).
  - quickview.html: Users can upload and view their own PathCORE-T-generated network.
  - layout.html: Used in all the above templates. Specifies the same <head> data.
- static: Static files (CSS, JS, fonts, data files). PathCORE-T-specific ones described here:
  - css/pathcore.css: Styling specific to this application (a lot of it is for the network styling).
  - data: The data files used in the PathCORE-T network pages.
  - js/pathcore-heatmap.js: JS functions to load the heatmaps and allow a user to interact with heatmaps (particularly to fetch the sample annotation information).
  - js/pathcore-network.js: JS functions to load the D3.js PathCORE-T network visualization. Allows users to interact with the network as well (e.g. drag the pathways to areas that make the network as a whole easier to read).
Information about the P. aeruginosa data compendium and genes is stored in several MongoDB collections using scripts in the PathCORE-T-analysis repository.
- Please see the instructions in PathCORE-T-analysis. Note that the files needed to populate the MongoDB database are already available, save for a .yml credentials file you need to create. Provided for the PAO1 demo server:
  - The results from running the PathCORE-T software on the P. aeruginosa gene compendium and KEGG definitions.
  - The directory containing the compendium sample annotations file and additional gene information. (See the data/README file in PathCORE-T-analysis for citation information.)
The collections that are accessed in this application's GET requests are as follows:
- pathcore_edge_data: This is the data needed to load the edge page in the PathCORE-T demo server. Notably, the gene_names are the rows seen on the heatmap, the most_expressed_samples and least_expressed_samples the columns, and the most_expressed_heatmap and least_expressed_heatmap the data for each of the heatmaps.
  - edge: list[str (pathway 0), str (pathway 1)], the two pathways in this co-occurrence relationship.
  - weight_oddsratio: float, the weight of the edge divided by the expected odds ratio.
  - gene_names: list[str (gene names)], the PA locus tag or the common name of each gene (up to 20). The list is sorted by the genes' corresponding odds ratio values, in descending order.
  - odds_ratios: list[float], the genes' odds ratio values.
  - pathway_owner: list[int], whether each of the genes is annotated to pathway 0 or pathway 1. (0 = pathway 0, 1 = pathway 1, 2 = both)
  - most_expressed_samples: list[str (sample CEL file)], the "most expressed" samples, where "most/least expressed" is based on a summary score that was computed as a function of the genes, their odds ratios, and their expression values in each sample of the compendium. (In descending order.)
  - least_expressed_samples: list[str (sample CEL file)], the "least expressed" samples, where "most/least expressed" is based on a summary score that was computed as a function of the genes, their odds ratios, and their expression values in each sample of the compendium. (In descending order.)
  - most_expressed_heatmap: list[dict("value": float, "col_index": int, "row_index": int)], a list of dicts/objects corresponding to the expression value of each cell in the most expressed heatmap (at the position specified by the row and col indices). Rows correspond to genes and columns correspond to samples.
  - least_expressed_heatmap: list[dict("value": float, "col_index": int, "row_index": int)], a list of dicts/objects corresponding to the expression value of each cell in the least expressed heatmap (at the position specified by the row and col indices). Rows correspond to genes and columns correspond to samples.
- sample_annotations: Contains the sample annotation information that shows up alongside a heatmap when you hover over a heatmap square. All columns in the sample annotations file are loaded into this collection.
  - Additionally, there is a sample_id field that we add to each document in the collection. We do this because the genes collection, described in the next point, contains the expression value of each gene in every single sample in the compendium. The expression values are ordered according to the samples' ordering in the compendium, and this sample_id tracks the position of each sample (and so allows us to fetch the correct expression value for a gene-by-sample pair).
- genes: Information about the genes in the compendium.
  - gene: str, the PA gene locus tag.
  - common_name: str, the common name when available.
  - expression: list[float], a vector of expression values, corresponding to the gene row in the compendium.
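To make the document shapes above more concrete, here is a minimal sketch of how they might be consumed once fetched. It works on plain dicts rather than a live MongoDB connection. The function names are hypothetical, and it assumes that row_index, col_index, and sample_id are all zero-based positions, which this README does not state.

```python
def heatmap_matrix(cells, n_rows, n_cols):
    """Arrange heatmap cell dicts into a grid (rows = genes, columns = samples).

    Assumes zero-based row_index/col_index; cells that are absent stay None.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for cell in cells:
        grid[cell["row_index"]][cell["col_index"]] = cell["value"]
    return grid


def expression_for(gene_doc, sample_doc):
    """Look up one gene's expression value in one sample.

    Assumes sample_id is the zero-based position of the sample in the
    compendium, and therefore an index into the gene's expression vector.
    """
    return gene_doc["expression"][sample_doc["sample_id"]]


# Made-up documents for illustration only; real values come from the database.
edge_doc = {
    "gene_names": ["geneA", "geneB"],
    "most_expressed_samples": ["s1.CEL", "s2.CEL", "s3.CEL"],
    "most_expressed_heatmap": [
        {"value": 0.9, "col_index": 0, "row_index": 0},
        {"value": 0.8, "col_index": 1, "row_index": 0},
        {"value": 0.7, "col_index": 2, "row_index": 0},
        {"value": 0.6, "col_index": 0, "row_index": 1},
        {"value": 0.5, "col_index": 1, "row_index": 1},
        {"value": 0.4, "col_index": 2, "row_index": 1},
    ],
}
gene_doc = {"gene": "geneA", "common_name": "", "expression": [7.1, 6.4, 8.0]}
sample_doc = {"sample_id": 2}

grid = heatmap_matrix(
    edge_doc["most_expressed_heatmap"],
    n_rows=len(edge_doc["gene_names"]),
    n_cols=len(edge_doc["most_expressed_samples"]),
)
value = expression_for(gene_doc, sample_doc)
```

Here grid[i][j] holds the value for gene_names[i] in most_expressed_samples[j], and value is the third entry of the gene's expression vector. If the stored indices turn out to be one-based, each lookup would need an offset of one.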