
kathyxchen/PathCORE-T-demo


Description

This repository contains the source for the PathCORE-T demo Flask application.

How to run the application locally

Requires Python 3.6.2.

Read about the web application database setup in PathCORE-T-analysis. Register for an MLab account and create a new database based on those instructions.

Set the following environment variables (more information about the MongoDB database below):

  • MDB_USER
  • MDB_PW
  • MDB_NAME
  • MLAB_URI
  • SESSION_SECRET

To set an environment variable in a Unix shell, run, for example: export MDB_USER=kathy

If you choose to run this application using heroku local, you can use this article to set up your local environment variables in a .env file.
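The sketch below shows one way an app like this might read those variables at startup and fail fast when one is missing. The helper names load_config and build_mongo_uri are hypothetical, and the connection-string layout is an assumption (in particular, that MLAB_URI holds only the database host and port). Check app.py for how the variables are actually combined.

```python
import os
from urllib.parse import quote_plus

# The environment variables this README asks you to set.
REQUIRED_VARS = ("MDB_USER", "MDB_PW", "MDB_NAME", "MLAB_URI", "SESSION_SECRET")


def load_config(environ=None):
    """Read the required settings, raising a clear error if any are unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


def build_mongo_uri(config):
    """Assemble a MongoDB connection string from the settings.

    ASSUMPTION (hypothetical layout): MLAB_URI holds only the host[:port]
    of the database server. The real app.py may combine these differently.
    """
    # Usernames and passwords must be percent-escaped inside a MongoDB URI.
    user = quote_plus(config["MDB_USER"])
    password = quote_plus(config["MDB_PW"])
    return "mongodb://{}:{}@{}/{}".format(
        user, password, config["MLAB_URI"], config["MDB_NAME"]
    )
```

Escaping with quote_plus matters for passwords containing characters such as @ or :, which would otherwise be misread as URI delimiters (for example, a password p@ss becomes p%40ss in the URI).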

After installing dependencies (pip install -r requirements.txt), launch the Flask application by running

python app.py

(This works because app.py contains the following lines of code:)

if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0")

Deploy to Heroku

Follow this guide. At minimum, read the sections from "Introduction" through "View logs," and then from "Push local changes" through "Define config vars."

Heroku-specific files provided for you

  • Procfile
  • app.json
  • runtime.txt (required to specify the Python version, python-3.6.2, per this article)

Directory structure

  • top-level:
    • app.py: Initializes the Flask application using the environment variables you set (both locally and on Heroku). Also imports the routes (URLs) for the app.
    • routes.py: The routes available in the application.
    • utils.py: Utility functions for retrieving information needed in each route.
  • templates: The HTML files for each of the pages needed for the routes.

The MongoDB database

Information about the P. aeruginosa data compendium and its genes is stored in several MongoDB collections, which are populated using scripts in the PathCORE-T-analysis repository.

  • Please see the instructions in PathCORE-T-analysis. The files needed to populate the MongoDB database are already available, except for a .yml credentials file that you need to create yourself. The following are provided for the PAO1 demo server:
    • The results from running the PathCORE-T software on the P. aeruginosa gene compendium and KEGG definitions.
    • The directory containing the compendium samples annotations file and additional gene information. (See the data/README file in PathCORE-T-analysis for citation information.)

The collections that are accessed in this application's GET requests are as follows:

  • pathcore_edge_data: The data needed to load the edge page of the PathCORE-T demo server. The gene_names are the rows of each heatmap, the most_expressed_samples and least_expressed_samples are the columns, and the most_expressed_heatmap and least_expressed_heatmap hold the cell values for each of the two heatmaps.
    • edge: list[str (pathway 0), str (pathway 1)], the two pathways in this co-occurrence relationship
    • weight_oddsratio: float, the weight of the edge divided by the expected odds ratio
    • gene_names: list[str (gene names)], the PA locus tag or the common name of each gene (up to 20). The list is sorted by the genes' odds ratio values in descending order.
    • odds_ratios: list[float], the genes' odds ratio values.
    • pathway_owner: list[int], whether each of the genes is annotated to pathway 0 or pathway 1. (0 = pathway 0, 1 = pathway 1, 2 = both)
    • most_expressed_samples: list[str (sample CEL file)], the "most expressed" samples, where "most/least expressed" is based on a summary score computed as a function of the genes, their odds ratios, and their expression values in each sample of the compendium. (In descending order.)
    • least_expressed_samples: list[str (sample CEL file)], the "least expressed" samples, where "most/least expressed" is based on a summary score computed as a function of the genes, their odds ratios, and their expression values in each sample of the compendium. (In descending order.)
    • most_expressed_heatmap: list[dict("value": float, "col_index": int, "row_index": int)], a list of dicts/objects corresponding to the expression value of each cell in the most expressed heatmap (at position specified by the row and col indices). Rows correspond to genes and columns correspond to samples.
    • least_expressed_heatmap: list[dict("value": float, "col_index": int, "row_index": int)], a list of dicts/objects corresponding to the expression value of each cell in the least expressed heatmap (at position specified by the row and col indices). Rows correspond to genes and columns correspond to samples.
  • sample_annotations: Contains the sample annotation information that shows up alongside a heatmap when you hover over a heatmap square. All columns in the sample annotations file are loaded into this collection.
    • Additionally, we add a sample_id field to each document in the collection. We do this because each document in the genes collection, described in the next point, contains that gene's expression value in every sample in the compendium. The expression values are ordered according to the samples' ordering in the compendium, and sample_id records the position of each sample, which lets us fetch the correct expression value for a given gene and sample.
  • genes: Information about the genes in the compendium.
    • gene: str, the PA gene locus tag
    • common_name: str, the common name when available
    • expression: list[float], a vector of expression values corresponding to the gene's row in the compendium.
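The pathcore_edge_data fields described above fit together as follows: each heatmap is stored as a flat list of cells, and a front end has to place each cell at its (row_index, col_index) position, with rows labeled by gene_names and columns by the matching sample list. Below is a minimal sketch, assuming a document has already been fetched as a Python dict. The helper names and the toy document (including its placeholder pathway and sample names) are hypothetical.

```python
def heatmap_matrix(cells, n_rows, n_cols, fill=None):
    """Turn a list of {"value", "row_index", "col_index"} dicts into a 2-D list.

    Cells absent from the list keep the `fill` value.
    """
    matrix = [[fill] * n_cols for _ in range(n_rows)]
    for cell in cells:
        matrix[cell["row_index"]][cell["col_index"]] = cell["value"]
    return matrix


def edge_heatmap(edge_doc, which="most"):
    """Return (row labels, column labels, matrix) for the "most" or "least" heatmap."""
    if which not in ("most", "least"):
        raise ValueError("which must be 'most' or 'least'")
    genes = edge_doc["gene_names"]                    # rows
    samples = edge_doc[which + "_expressed_samples"]  # columns
    cells = edge_doc[which + "_expressed_heatmap"]
    return genes, samples, heatmap_matrix(cells, len(genes), len(samples))


def gene_owners(edge_doc):
    """Map each gene's pathway_owner code (0, 1, 2) to a readable label."""
    pathway0, pathway1 = edge_doc["edge"]
    labels = {0: pathway0, 1: pathway1, 2: "both"}
    return [labels[code] for code in edge_doc["pathway_owner"]]


# Toy document with made-up values, for illustration only.
example = {
    "edge": ["pathway A", "pathway B"],
    "gene_names": ["gene1", "gene2"],
    "pathway_owner": [0, 2],
    "most_expressed_samples": ["s1.CEL", "s2.CEL"],
    "most_expressed_heatmap": [
        {"value": 1.5, "row_index": 0, "col_index": 0},
        {"value": -0.2, "row_index": 0, "col_index": 1},
        {"value": 0.7, "row_index": 1, "col_index": 0},
        {"value": 0.3, "row_index": 1, "col_index": 1},
    ],
}
genes, samples, matrix = edge_heatmap(example, "most")
# matrix == [[1.5, -0.2], [0.7, 0.3]]
# gene_owners(example) == ["pathway A", "both"]
```

Storing cells with explicit indices, rather than a nested list, keeps each cell self-describing, which is convenient for plotting libraries that bind one object per heatmap square.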
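The sample_id field described above is what connects the sample_annotations and genes collections: a sample's position in the compendium indexes directly into a gene's expression vector. The sketch below illustrates that lookup. It assumes sample_id is a 0-based position and that genes_collection behaves like a pymongo Collection (find_one with a filter and a projection). The function names are hypothetical and are not taken from utils.py.

```python
def expression_value(expression, sample_id):
    """Return the value at a sample's position in a gene's expression vector.

    ASSUMPTION: sample_id is a 0-based position in the compendium's sample order.
    """
    # Reject out-of-range ids explicitly; a negative index would otherwise
    # silently wrap around to the end of the list.
    if not 0 <= sample_id < len(expression):
        raise IndexError(
            "sample_id %d out of range for %d samples" % (sample_id, len(expression))
        )
    return expression[sample_id]


def lookup_expression(genes_collection, locus_tag, sample_id):
    """Fetch one gene-by-sample expression value from the genes collection.

    Returns None if no document has the given PA locus tag.
    """
    # Project only the expression vector to avoid transferring other fields.
    gene_doc = genes_collection.find_one({"gene": locus_tag}, {"expression": 1})
    if gene_doc is None:
        return None
    return expression_value(gene_doc["expression"], sample_id)
```

For a given heatmap square, the caller would first read the sample's sample_id from its sample_annotations document and then pass it, together with the gene's locus tag, to lookup_expression.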