Skip to content

maflot/Diceplot

Repository files navigation

CRAN_Status_Badge CRAN Downloads

Note

This repository is in active development

DicePlot

The DicePlot package allows you to create visualizations (dice plots) for datasets with more than two categorical variables and additional continuous variables. This tool is particularly useful for exploring complex categorical data and their relationships with continuous variables.

Installation

To install the DicePlot package, follow these steps:

1. Install R

Ensure that you have R installed on your system. You can download it from The Comprehensive R Archive Network (CRAN). Or use conda:

conda create -n diceplot -c conda-forge r-base -y
conda activate diceplot

2. Install Required Packages

The DicePlot package depends on several other R packages. Install them by running:

install.packages(c(
    "devtools",
    "dplyr",
    "ggplot2",
    "tidyr",
    "data.table",
    "ggdendro"
))

3.0 install Diceplot directly from CRAN

install.packages("diceplot")

3.1 Install Diceplot from GitHub

You can install the DicePlot package directly from GitHub using the devtools package

# Install devtools if you haven't already
install.packages("devtools")
# Install DicePlot from GitHub
devtools::install_github("maflot/DicePlot/diceplot")

3.2 Install Diceplot from files

Download the repository and run following code to install the package

install.packages("$path on your local machine$/DicePlot/diceplot",repos = NULL, type="source")

4. Load the Package

After installation, load the DicePlot package into your R session:

library(diceplot)

Example Usage

Here is a simple example of how to use the DicePlot v0.1.2 package.
For more examples check the tests/ folder.

# Load necessary libraries
library(diceplot)
library(tidyr)
library(data.table)
library(ggplot2)
library(dplyr)
library(tibble)
library(grid)
library(cowplot)
library(RColorBrewer)

First, we define the cell types, pathways, pathway groups, pathology variables, and assign colors to pathology variables.

# Define common variables
cell_types <- c("Neuron", "Astrocyte", "Microglia", "Oligodendrocyte", "Endothelial")
pathways <- c(
  "Apoptosis", "Inflammation", "Metabolism", "Signal Transduction", "Synaptic Transmission",
  "Cell Cycle", "DNA Repair", "Protein Synthesis", "Lipid Metabolism", "Neurotransmitter Release",
  "Oxidative Stress", "Energy Production", "Calcium Signaling", "Synaptic Plasticity", "Immune Response"
)

# Assign groups to pathways
pathway_groups <- data.frame(
  Pathway = pathways,
  Group = c(
    "Linked", "UnLinked", "Other", "Linked", "UnLinked",
    "UnLinked", "Other", "Other", "Other", "Linked",
    "Other", "Other", "Linked", "UnLinked", "Other"
  ),
  stringsAsFactors = FALSE
)

pathology_variables <- c("AD", "Cancer", "Flu", "ADHD", "Age", "Weight")

# Assign colors to pathology variables
n_colors <- length(pathology_variables)
colors <- brewer.pal(n = n_colors, name = "Set1")
cat_c_colors <- setNames(colors, pathology_variables)

Explanation:

  • Cell Types: A list of different cell types involved in the study.
  • Pathways: Biological pathways relevant to the cell types.
  • Pathway Groups: Categorization of pathways into ‘Linked’, ‘UnLinked’, or ‘Other’.
  • Pathology Variables: Medical conditions or variables of interest.
  • Colors Assignment: Assigning a unique color to each pathology variable for visualization.

Function to Create and Plot Dice Plots Now we finalize the data and plot the diceplot

# Create dummy data
set.seed(123)
data <- expand.grid(CellType = cell_types, Pathway = pathways, stringsAsFactors = FALSE)

data <- data %>%
  rowwise() %>%
  mutate(
    PathologyVariable = list(sample(pathology_variables, size = sample(1:length(pathology_variables), 1)))
  ) %>%
  unnest(cols = c(PathologyVariable))

# Function to create and plot dice plots
# Merge the group assignments into the data
data <- data %>%
  left_join(pathway_groups, by = "Pathway")
  
# Use the dice_plot function
# min dot_size is giving the minimal size of a point size the dots can be
# with larger dataframe it might be necessary to set it to a smaller value
p = dice_plot(
  data = data, 
  cat_a = "CellType", 
  cat_b = "Pathway", 
  cat_c = "PathologyVariable", 
  group = "Group",
  group_alpha = 0.6,
  title = "Dice Plot with 6 Pathology Variables",
  cat_c_colors = cat_c_colors, 
  custom_theme = theme_minimal(),
  min_dot_size = 2,
  max_dot_size = 4
)

print(p)
# simply save the plot using the ggplot functions
# ggsave("./diceplot_example.png",p,width = 8, height = 9)

Explanation:

  • Data Creation: We create a data frame that contains all combinations of cell types and pathways.
  • Assign Pathology Variables: For each combination, we randomly assign one or more pathology variables.
  • Merge Groups: We add the group information to each pathway.
  • Plotting: We directly call dice_plot to generate and display the dice plot with the specified parameters.

This code example provides a clear definition of the data and demonstrates how to create a dice plot without using a nested function.

Output

Example output

Use the dice plots in python

for using dice plots in python please refer to pyDicePlot

Documentation

For full documentation and additional examples, please refer to the documentation

  • Visualize Complex Data: Easily create plots for datasets with multiple categorical variables.
  • Customization: Customize plots with titles, labels, and themes.
  • Integration with ggplot2: Leverages the power of ggplot2 for advanced plotting capabilities.

Contributing

We welcome contributions from the community! If you'd like to contribute:

  1. Fork the repository on GitHub.
  2. Create a new branch for your feature or bug fix.
  3. Submit a pull request with a detailed description of your changes.

Contact

If you have any questions, suggestions, or issues, please open an issue on GitHub.

Citation

If you use this code or the R and python packages for your own work, please cite diceplot as:

M. Flotho, P. Flotho, A. Keller, “Diceplot: A package for high dimensional categorical data visualization,” arxiv, 2024. doi:10.48550/arXiv.2410.23897

BibTeX entry

@article{flotea2024,
    author = {Flotho, M. and Flotho, P. and Keller, A.},
    title = {Diceplot: A package for high dimensional categorical data visualization},
    year = {2024},
    journal = {arXiv preprint},
    doi = {https://doi.org/10.48550/arXiv.2410.23897}
}

Change-log v0.1.4

  • handling factors in diceplot
  • cluster_by_row argument defaults to TRUE, false it will use the factor levels for ordering
  • cluster_by_column argument defaults to TRUE, false it will use the factor levels for ordering
  • show_legend defaults to TRUE, show or omit the legend plot
  • cat_b_order argument removed, will throw an error in a future version

About

R package for visualising high-dimensional categorical data

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages