Screenit

Screenit is a web-based tool that enables multi-parameter data analysis for high-throughput screening. It was developed at the Visual Computing Group.

Installation

Install the following software packages:

  • Python 2.7
  • Pip

Bash:

sudo apt-get install python python-dev python-pip

Use Pip to install the following Python packages:

  • tangelo >= 0.9.0
  • numpy >= 1.10.1
  • pandas >= 0.17.0
  • scipy >= 0.13
  • scikit-learn >= 0.17
  • wrapt >= 1.10.5

Bash:

sudo pip install tangelo numpy pandas scipy scikit-learn

Download the code and place it at a location of your choice, but first check the Data section below for disk space requirements.

Install npm and then bower. Bash:

sudo apt-get install npm nodejs-legacy
sudo npm install -g bower

Use bower to install additional JavaScript libraries. Bash, in the root directory of the prototype:

bower install

Add a directory named session to the root directory.

Run tangelo to launch the server. Bash, in the root directory of the prototype:

sudo tangelo -c tangelo_config.yaml

Browse to the server's address to try out the prototype.

Data

Data sets are stored in the dataset directory, which you will have to create upon installation. Multiple data sets are supported via sub-directories. For example, dataset/DataSetName contains all files for a data set named DataSetName.

Image feature data is stored as one NumPy array dump per image feature in dataset/DataSetName/columns. Every object (in a well) has a value in such an array, and the array index of an object is consistent across all columns. Everything is therefore stored at the object level, including any well and plate information (sacrificing disk space for the sake of computation speed). Two special files, mds0 and mds1, can be included among the columns as well; these provide the coordinates for the landscape plot, which can for example be a 2D projection of the high-dimensional feature space.
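The column layout can be sketched as follows. This is a minimal sketch, not the shipped code: the feature names, the .npy extension produced by np.save, and the float32 dtype are assumptions, since the exact dump format is not specified above.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for dataset/DataSetName/columns.
columns_dir = os.path.join(tempfile.mkdtemp(), "columns")
os.makedirs(columns_dir)

n_objects = 1000  # every column holds exactly one value per object

# One array dump per image feature; index i refers to the same object in every file.
for feature in ("intensity_mean", "area"):
    np.save(os.path.join(columns_dir, feature),
            np.random.rand(n_objects).astype(np.float32))

# Optional landscape coordinates, e.g. a 2D projection of the feature space.
np.save(os.path.join(columns_dir, "mds0"), np.random.rand(n_objects).astype(np.float32))
np.save(os.path.join(columns_dir, "mds1"), np.random.rand(n_objects).astype(np.float32))

# Reading a column back: the array index identifies the object.
area = np.load(os.path.join(columns_dir, "area.npy"))
assert area.shape == (n_objects,)
```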

Well annotation data is stored as a tab-delimited file dataset/DataSetName/wells.tab. The file contains plate, column, and row columns to designate the well, plus additional columns for annotation categories. A single well can be given multiple annotations of a single category by listing them as annotation1|annotation2|annotation3.
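A minimal sketch of reading such a file with pandas; the treatment and phenotype category columns here are hypothetical examples, not columns the system requires.

```python
import io

import pandas as pd

# Hypothetical wells.tab content; the real file lives at dataset/DataSetName/wells.tab.
wells_tab = io.StringIO(
    "plate\tcolumn\trow\ttreatment\tphenotype\n"
    "0\t0\t0\tDMSO\tnormal\n"
    "0\t1\t0\tdrugA\tround|elongated\n"
)

wells = pd.read_csv(wells_tab, sep="\t")

# A well can carry several annotations of one category, separated by '|'.
wells["phenotype"] = wells["phenotype"].str.split("|")
```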

The example CellMorph data set is 1.5 GB and can, for now, be downloaded from Google Drive: https://drive.google.com/open?id=0B4zuo4p8QBcaSThHMm1jX2kwUkU. The dataset/CellMorph directory already contains the config.py file for CellMorph, which includes explanatory comments for each configuration option.

The code for converting the CellMorph (per plate) tab-delimited files to NumPy columns can be found in wrangle/numpyFill.py.
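The conversion can be sketched as follows. This is an illustrative sketch only; the shipped wrangle/numpyFill.py may work differently, and the file naming scheme and example feature column (area) are assumptions.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Assume one tab-delimited file per plate, with one row per detected object.
work = tempfile.mkdtemp()
for plate in range(2):
    pd.DataFrame({
        "column": [0, 1], "row": [0, 0],
        "x": [10.5, 20.0], "y": [5.0, 7.5],
        "area": [100.0, 120.0],  # hypothetical image feature
    }).to_csv(os.path.join(work, "plate%d.tab" % plate), sep="\t", index=False)

# Concatenate all plates, tagging each object with its plate index.
frames = []
for plate, path in enumerate(sorted(glob.glob(os.path.join(work, "plate*.tab")))):
    df = pd.read_csv(path, sep="\t")
    df["plate"] = plate
    frames.append(df)
objects = pd.concat(frames, ignore_index=True)

# Dump every column as a separate NumPy array, one file per column.
columns_dir = os.path.join(work, "columns")
os.makedirs(columns_dir)
for name in objects.columns:
    np.save(os.path.join(columns_dir, name), objects[name].values)
```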

The system expects the following columns to be present in dataset/DataSetName/columns:

  • plate, an integer ranging from 0 to N, that encodes the containing plate of the object, out of N plates
  • column, an integer ranging from 0 to C, that encodes the column coordinate of the containing well of the object, out of C columns on a plate
  • row, an integer ranging from 0 to R, that encodes the row coordinate of the containing well of the object, out of R rows on a plate
  • x, a float that specifies the x-coordinate of the object in its well, in pixel space of the well images
  • y, a float that specifies the y-coordinate of the object in its well, in pixel space of the well images

All other columns in the dataset/DataSetName/columns directory are assumed to contain image features as floating point values. The CellMorph data stores image features as 32-bit floats to reduce storage and increase performance, but more precise floats are supported as well.
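Putting the required and feature columns together, a minimal data set could be generated like this. This is a sketch under assumptions: the temporary directory stands in for dataset/DataSetName, the .npy extension comes from np.save, and the integer dtypes and plate/well dimensions are illustrative.

```python
import os
import tempfile

import numpy as np

columns_dir = os.path.join(tempfile.mkdtemp(), "columns")
os.makedirs(columns_dir)

n = 500  # number of objects across all plates

# Required positional columns; here 4 plates, 12 well columns, 8 well rows.
np.save(os.path.join(columns_dir, "plate"), np.random.randint(0, 4, n))
np.save(os.path.join(columns_dir, "column"), np.random.randint(0, 12, n))
np.save(os.path.join(columns_dir, "row"), np.random.randint(0, 8, n))
np.save(os.path.join(columns_dir, "x"), (np.random.rand(n) * 1024).astype(np.float32))
np.save(os.path.join(columns_dir, "y"), (np.random.rand(n) * 1024).astype(np.float32))

# Every other file is treated as an image feature column (float32, as in CellMorph).
np.save(os.path.join(columns_dir, "nucleus_area"), np.random.rand(n).astype(np.float32))

required = {"plate", "column", "row", "x", "y"}
present = {name[:-4] for name in os.listdir(columns_dir)}  # strip ".npy"
assert required <= present
```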

Code organization

server contains all server-side Python code. Currently, most files serve as API delegators for the Tangelo web server:

  • compute.py contains all interactive computation code
  • numpyData.py contains the data retrieval backend, which can be swapped out in the future

wrangle/numpyFill.py contains code that scrapes all image feature data from the CellMorph tab-delimited files (one per plate) and stores it as NumPy columns in the format described in the Data section.

src contains all client-side code, which is written primarily in TypeScript. typings contains type definition files that interface TypeScript with common JavaScript libraries, which are found in bower_components and configured in bower.json.