Visualisation of count compositional data and associated factors in R.

Description

VisCount is a versatile visualisation tool for exploring count data. This tool enables you to quickly investigate and visualise relationships within your dataset.

[Figure: VisCount running in the browser]

If you provide this program with a file containing relative count data (such as output from QIIME), it will generate an interactive plot in the browser that lets you:

  • Compare the abundance of two count data variables against one another, selecting interactively which variables to compare
  • Visualise and filter data by up to 3 different factors
  • Zoom in to regions of the plot, with the program calculating the best scale to use
  • Interactively modify the opacity and size of your data points
  • Interactively thicken the stroke (the line around each point) so that it becomes the main feature of the point
  • Choose whether to log-transform your data and what value to replace 0s with
  • Download the plot you have customised

Demonstration

An interactive demonstration of the tool using the lemur dataset will be available on a website shortly. This will allow you to try out VisCount in the browser without having to install a thing.

Using VisCount

Program Requirements

This program requires the R programming environment, which is freely available from the R Project website. The tool uses the ggvis and dplyr packages, which it installs automatically the first time you run the program.
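If you would like to check for these packages yourself before running the program, the following is a minimal sketch of one way to do it. The helper name missing_packages is hypothetical and is not part of VisCount, and the installation call is left commented out so that nothing is installed unless you choose to.

```r
# Hypothetical helper (not part of VisCount): report which packages are
# not yet installed, using only base R.
missing_packages <- function(packages) {
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  packages[!installed]
}

to.install <- missing_packages(c("ggvis", "dplyr"))
if (length(to.install) > 0) {
  message("Not installed: ", paste(to.install, collapse = ", "))
  # install.packages(to.install)  # uncomment to install from CRAN
}
```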

Using the Terminal

To run the R program, you'll need to navigate into the VisCount folder you downloaded using the terminal. To simplify things, you can copy the file containing your data into the data folder located within the VisCount folder.

Assuming you have copied the VisCount folder into your home directory on Mac or Linux, to navigate into this folder you should open up the terminal and type in:

cd ~/VisCount-master

Once this has worked, you can then run the command to visualise your data:

Rscript VisCount.R data/FILENAME.txt tab 0.000001 factor1 factor2 factor3

These are the parameters the program expects. To understand how to customise this command for your own data, please read the "Example" section below.

To increase the size of the plot (if it is too small or parts are not visible), click the little triangle icon at the bottom-right of the figure and drag down to expand the plot.

Using RStudio

If you open the "VisCount.R" file in RStudio, you can run the code directly from there by highlighting every line in the file and pressing Ctrl+Enter. This will run the example program. To use your own data, you will have to modify a few lines, which have been marked with a #-------------------> MODIFY comment to help them stand out.

setwd("~/Research/VisCount/")
data.file.name <- "data/lemurs.txt"
data.file.sep <- "\t"
csv.file.min.value <- 0.000001
csv.file.factors <- c('Life.stage', 'Species', 'Individual')

The first line tells R which directory VisCount and your data are located in. The second line is the path and name of your data file, and the third specifies whether this file is tab or comma delimited. The fourth line tells the program whether to log the data: give it a minimum value to replace the 0s in your dataset, or set this value to 0 if you don't want to log the data. Finally, the fifth line passes R a vector of the names of the factors in your dataset. If you have no factors, just create an empty vector like this: csv.file.factors <- c().

Data Requirements

Data must be in the form of a CSV (comma-separated values) or TSV (tab-separated values) file and must have a header for each column. Each factor should be in its own column, and you can have between 0 and 3 factors.

[Figure: layout of an example input table]

Components of a count or frequency table (CFT) used as input for VisCount: Row 1 contains all variables for the data set, including (A) up to three factors and (B) elements of numerical data. Rows 2-n contain sample entries, with (C) the factor level and (D) count or frequency data for each row-sample.

For an example of the correct data structure, please see the example dataset in the "data" directory.
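As a rough illustration of these requirements, the sketch below checks a table before it is passed to the program. The function check_count_table and the columns TaxonA and TaxonB are hypothetical and exist only for this example; VisCount itself may validate its input differently, or not at all.

```r
# Hypothetical check (not part of VisCount): confirm that the named factor
# columns exist, that there are at most three of them, and that every
# remaining column is numeric.
check_count_table <- function(data, factors = character(0)) {
  if (length(factors) > 3) stop("At most three factors are supported")
  absent <- setdiff(factors, names(data))
  if (length(absent) > 0) {
    stop("Factor column(s) not found: ", paste(absent, collapse = ", "))
  }
  count.columns <- setdiff(names(data), factors)
  is.num <- vapply(data[count.columns], is.numeric, logical(1))
  if (!all(is.num)) {
    stop("Non-numeric data column(s): ",
         paste(count.columns[!is.num], collapse = ", "))
  }
  invisible(count.columns)
}

# A tiny made-up tab-separated table with a header row and two factors.
example <- read.table(
  text = "Species\tIndividual\tTaxonA\tTaxonB\nX\tind1\t0.25\t0.75\nY\tind2\t0\t1",
  header = TRUE, sep = "\t"
)
check_count_table(example, c("Species", "Individual"))
```

The function returns the names of the columns it treats as count data, which can be a quick way to confirm that the factors were identified as intended.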

Example

Once you have installed R and navigated into the VisCount directory in the terminal, you can run the program using the example lemur dataset provided in the "data" directory:

Rscript VisCount.R data/lemurs.txt tab 0.000001 Life.stage Species Individual

This line has 8 words in it that do the following things:

| Parameter | Purpose |
| --- | --- |
| Rscript | Calls the R language |
| VisCount.R | The VisCount program |
| data/lemurs.txt | Data file, including its path relative to the program |
| tab | Separator in the data file: can be "tab" or "comma" |
| 0.000001 | Value to replace 0s with if you log the data |
| Life.stage | First factor in data |
| Species | Second factor in data |
| Individual | Third factor in data |

Please note that the order of the parameters in the command is important: while you can vary the number of factors (from 0 to 3), the factors you name must always come last when you run the program.
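To make the positional order concrete, here is a minimal sketch of how a script could read arguments laid out this way. The function parse_viscount_args is hypothetical and only illustrates the ordering described above; it is not taken from VisCount.R, which may handle its arguments differently.

```r
# Hypothetical parser (not taken from VisCount.R): the first three
# arguments are fixed and anything after them is treated as a factor name.
parse_viscount_args <- function(args) {
  if (length(args) < 3) stop("Expected at least: FILE SEPARATOR MIN_VALUE")
  if (length(args) > 6) stop("At most three factors are supported")
  sep <- switch(args[2],
                tab = "\t",
                comma = ",",
                stop("Separator must be 'tab' or 'comma'"))
  min.value <- as.numeric(args[3])
  if (is.na(min.value)) stop("MIN_VALUE must be a number")
  list(file = args[1], sep = sep, min.value = min.value,
       factors = args[-(1:3)])
}

# Inside a script run with Rscript, the arguments would come from
# commandArgs(trailingOnly = TRUE); a fixed vector is used here instead.
settings <- parse_viscount_args(
  c("data/lemurs.txt", "tab", "0.000001", "Life.stage", "Species", "Individual")
)
str(settings)
```

Because the factors are simply "everything after the third argument", leaving them off yields an empty vector, which corresponds to the zero-factor case.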

Factors

The VisCount program can support up to three factors. Depending on how many factors your dataset contains, the following will happen:

| Factors | Action |
| --- | --- |
| 0 | The points will be plotted without colour. |
| 1 | Points will be coloured and filtered based on this factor. |
| 2 | Points will be filtered based on both factors. The factor with the most levels will be used to colour the points, while the other factor will provide the coloured stroke around the points. |
| 3 | Points will be filtered based on all 3 factors. Colour and stroke are assigned as with 2 factors, and the factor with the fewest levels will be used to set the shape of the points. Please ensure that this factor has 6 or fewer levels, as there are only so many shapes we can use before we have to repeat them in the legend. |

Please note that if none of the factor levels are checked, all points will be automatically replotted to avoid an empty figure.
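The assignment rule described above can be sketched in a few lines of base R. This is only an illustration of the rule as stated in this README: the function assign_aesthetics is hypothetical, the lemurs data frame below is made up for the example, and the actual code in visualisation.R may be organised differently.

```r
# Hypothetical illustration (not the VisCount source): rank factors by
# their number of levels, then give the largest the fill colour, the next
# the stroke, and the smallest (if there are three) the point shape.
assign_aesthetics <- function(data, factors) {
  stopifnot(length(factors) <= 3, all(factors %in% names(data)))
  if (length(factors) == 0) return(character(0))
  n.levels <- vapply(factors, function(f) length(unique(data[[f]])), integer(1))
  ranked <- factors[order(n.levels, decreasing = TRUE)]
  roles <- c("fill", "stroke", "shape")[seq_along(ranked)]
  if (length(ranked) == 3 && min(n.levels) > 6) {
    warning("The shape factor has more than 6 levels; shapes would repeat")
  }
  setNames(ranked, roles)
}

# Made-up data using the factor names from the example command.
lemurs <- data.frame(
  Life.stage = c("infant", "adult", "infant", "adult", "infant", "adult"),
  Species    = c("A", "A", "B", "B", "C", "C"),
  Individual = c("i1", "i2", "i3", "i4", "i5", "i6")
)
assign_aesthetics(lemurs, c("Life.stage", "Species", "Individual"))
```

In this made-up table, Individual has the most levels and Life.stage the fewest, so they take the fill and shape roles respectively.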

Logging Data

If you do not want a log transform applied to your data, set the minimum value parameter to 0 when you run the program so that the 0s in your data are retained.
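The sketch below shows the general idea of replacing 0s with a small minimum value before taking logs, since the logarithm of 0 is not finite. The function log_with_floor is hypothetical, and the use of base 10 is an assumption made for this example; this README does not say which base VisCount uses.

```r
# Hypothetical illustration (not the VisCount source). Base 10 is an
# assumption made for this example.
log_with_floor <- function(x, min.value) {
  if (min.value == 0) return(x)           # 0 means "do not log the data"
  x[!is.na(x) & x == 0] <- min.value      # replace 0s so the log is finite
  log10(x)
}

log_with_floor(c(0, 0.01, 1), min.value = 0.000001)
log_with_floor(c(0, 0.01, 1), min.value = 0)
```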

Saving Figures

Figures can be downloaded by clicking the gear icon in the top left of the page and selecting the SVG or canvas option.

Tutorial

Erin has put together a data exploration tutorial using her data which is available in the Tutorial.md file in this repository.

Applications

VisCount has broad potential because count/frequency tables are increasingly popular across fields. Here we list several ideas for applications to demonstrate VisCount's interdisciplinary potential:

  • Next generation sequencing of DNA/RNA (see tutorial)
  • Ecology: monitor symbiotic relationships
  • Chemistry: identify cofactors
  • Public health: risk factors for disease
  • Public policy: factors associated with crime
  • Linguistics: translation, cryptography
  • Education: factors associated with student performance
  • Music: elements of composition associated with different genres

Advanced

The section of the code that generates this graphic is contained in the visualisation() function within the "visualisation.R" file. Feel free to tinker with the code for your own use, but please provide attribution.

Citing VisCount

In progress.

Authors

Jack Bruce Simpson (1), Erin A. McKenney (2), David Lovell (3)

(1) Australian National University, Canberra, ACT, Australia
(2) Duke University, Durham, North Carolina, USA
(3) Queensland University of Technology, Brisbane, Queensland, Australia

Licence

This software is shared under the MIT license, which means you're free to do whatever you like with the code so long as you provide attribution.