Screenit is a web-based tool that enables multi-parameter data analysis for high-throughput screening. It was developed at the Visual Computing Group.
Install the following software packages:
sudo apt-get install python python-dev python-pip
Use Pip to install the following Python packages:
sudo pip install tangelo numpy pandas scipy scikit-learn
Download the code and place it at a location of your choice, but first check the Database and Images sections for space requirements.
Install npm and then Bower. Bash:
sudo apt-get install npm nodejs-legacy
sudo npm install -g bower
Add a directory named session to the root.
Run tangelo to launch the server. Bash, in the root directory of the prototype:
sudo tangelo -c tangelo_config.yaml
Browse to the server's address to try out the prototype.
Data sets are stored in the dataset directory, which you will have to create upon installation. Multiple data sets are supported via sub-directories. For example, dataset/DataSetName contains all files for a data set named DataSetName.
Image feature data is stored as a NumPy array dump per image feature in dataset/DataSetName/columns. Every object (in a well) has a value in such an array, and the array index of an object is consistent across all columns. Everything is therefore stored at the object level, including any well and plate information (sacrificing disk space for the sake of computation speed). Two special files, mds0 and mds1, can be included in the columns as well; these provide the coordinates for the landscape plot, which can for example be a 2D projection of the high-dimensional feature space.
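Assuming the dumps are written with np.save (the exact file naming used by the prototype may differ; the directory and column name below are illustrative), a column can be written and read back like this:

```python
import os
import tempfile

import numpy as np

# Illustrative stand-in for dataset/DataSetName/columns.
columns_dir = tempfile.mkdtemp()

n_objects = 1000  # one value per object, same index order in every column
cell_area = np.random.rand(n_objects).astype(np.float32)
np.save(os.path.join(columns_dir, "cell_area.npy"), cell_area)

# Loading: array index i refers to the same object in every column.
loaded = np.load(os.path.join(columns_dir, "cell_area.npy"))
assert loaded[0] == cell_area[0]
```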
Well annotation data is stored as a tab-delimited file dataset/DataSetName/wells.tab. The file contains plate, column, and row columns to designate the well, and additional columns for annotation categories. A single well can be given multiple annotations of a single category by giving a list of annotations as annotation1|annotation2|annotation3.
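A minimal sketch of reading such a file with pandas; the phenotype category and its values are made up for illustration:

```python
import io

import pandas as pd

# Stand-in for dataset/DataSetName/wells.tab.
sample = io.StringIO(
    "plate\tcolumn\trow\tphenotype\n"
    "0\t3\t5\tround|elongated\n"
    "1\t7\t2\tround\n"
)
wells = pd.read_csv(sample, sep="\t")

# Multiple annotations of one category are separated by '|'.
wells["phenotype"] = wells["phenotype"].str.split("|")
print(wells.loc[0, "phenotype"])  # ['round', 'elongated']
```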
The example CellMorph data is 1.5 GB and can currently be downloaded from Google Drive: https://drive.google.com/open?id=0B4zuo4p8QBcaSThHMm1jX2kwUkU The dataset/CellMorph directory already contains the config.py file for CellMorph, which also includes explanatory comments per configuration option.
The code for converting the CellMorph (per plate) tab-delimited files to NumPy columns can be found in wrangle/numpyFill.py.
The system expects the following columns to be present in dataset/DataSetName/columns:
- plate, integer ranging from 0 to N-1, that encodes the containing plate of the object, out of N plates
- column, integer ranging from 0 to C-1, that encodes the column coordinate of the containing well of the object, out of C columns on a plate
- row, integer ranging from 0 to R-1, that encodes the row coordinate of the containing well of the object, out of R rows on a plate
- x, float that specifies the x-coordinate of the object in its well, in pixel space of the well images
- y, float that specifies the y-coordinate of the object in its well, in pixel space of the well images
All other columns in the dataset/DataSetName/columns directory are assumed to contain image features as floating point values. The CellMorph data stores image features as 32-bit floats to reduce storage and increase performance, but more precise floats are supported as well.
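A small validation sketch of this layout; check_columns and the sample arrays below are made up for illustration (a real check would scan the dataset/DataSetName/columns directory on disk):

```python
import numpy as np

REQUIRED = ["plate", "column", "row", "x", "y"]

def check_columns(columns):
    """Check a dict of name -> array against the expected layout."""
    lengths = {len(arr) for arr in columns.values()}
    assert len(lengths) == 1, "all columns must have one value per object"
    for name in REQUIRED:
        assert name in columns, "missing required column: " + name
    # Any other column is treated as an image feature with float values.
    return [name for name in columns if name not in REQUIRED]

cols = {
    "plate": np.zeros(4, dtype=np.int64),
    "column": np.array([0, 1, 2, 3]),
    "row": np.array([0, 0, 1, 1]),
    "x": np.random.rand(4).astype(np.float32),
    "y": np.random.rand(4).astype(np.float32),
    "cell_area": np.random.rand(4).astype(np.float32),
}
print(check_columns(cols))  # ['cell_area']
```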
server contains all server-side Python code. Currently, most files serve as API delegators for the Tangelo web server:
- compute.py contains all interactive computation code
- numpyData.py contains the data retrieval backend, which can be swapped out in the future.
wrangle/numpyFill.py contains code that can be used to scrape all image feature data from the CellMorph tab-delimited files (per plate) and store it as NumPy columns as described in the Data section.
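A minimal sketch of the same idea (the real script is wrangle/numpyFill.py; the file names, column names, and .npy naming here are assumptions for illustration):

```python
import os
import tempfile

import numpy as np
import pandas as pd

def fill_columns(plate_files, out_dir):
    # Concatenate the per-plate tables so that every object gets one
    # consistent index across all columns, then dump each column.
    frames = [pd.read_csv(f, sep="\t") for f in plate_files]
    table = pd.concat(frames, ignore_index=True)
    os.makedirs(out_dir, exist_ok=True)
    for name in table.columns:
        values = table[name].to_numpy()
        if values.dtype == np.float64:
            # CellMorph stores features as 32-bit floats to save space.
            values = values.astype(np.float32)
        np.save(os.path.join(out_dir, name + ".npy"), values)

# Demo with two tiny synthetic "plates".
work = tempfile.mkdtemp()
paths = []
for i in range(2):
    path = os.path.join(work, "plate%d.tab" % i)
    pd.DataFrame({"plate": [i, i], "cell_area": [0.5, 1.5]}).to_csv(
        path, sep="\t", index=False)
    paths.append(path)

out_dir = os.path.join(work, "columns")
fill_columns(paths, out_dir)
merged = np.load(os.path.join(out_dir, "cell_area.npy"))
print(merged)  # four values, one per object across both plates
```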