Filtering of Poisson noise on a single-cell RNA-seq UMI count matrix

Sanity

Sampling Noise based Inference of Transcription ActivitY : Filtering of Poisson noise on a single-cell RNA-seq UMI count matrix

Sanity infers the log expression level x_gc of gene g in cell c by filtering out the Poisson noise from the UMI count n_gc of gene g in cell c.

See our preprint for more details.

Reproducibility

The raw and normalized datasets mentioned in the preprint are available on DOI. Files are named [dataset name]_UMI_counts.txt.gz and [dataset name]_[tool name]_normalization.txt.gz.

The scripts used to run the benchmarked normalization methods and to make the figures of the preprint are in the reproducibility folder.

Input

  • Count Matrix: (Ng x Nc) matrix with Ng the number of genes and Nc the number of cells. Format: tab-separated, comma-separated, or space-separated values. ('path/to/text_file')
GeneID Cell 1 Cell 2 Cell 3 ...
Gene 1 1.0 3.0 0.0
Gene 2 2.0 6.0 1.0
...
  • (Alternatively) Matrix Market File Format: Sparse matrix of UMI counts. Automatically recognized by .mtx extension of the input file. Named matrix.mtx by cellranger 2.1.0 and 3.1.0 (10x Genomics). ('path/to/text_file.mtx')
    • (optional) Gene ID file: Named genes.tsv by cellranger 2.1.0 and features.tsv by cellranger 3.1.0 (10x Genomics). ('path/to/text_file')
    • (optional) Cell ID file: Named barcodes.tsv by cellranger 2.1.0 and 3.1.0 (10x Genomics). ('path/to/text_file')
  • (optional) Destination folder ('path/to/output/folder')
  • (optional) Number of threads (integer)
  • (optional) Print extended output (Boolean, 'true', 'false', '1' or '0')
  • (optional) Minimal and maximal considered values of the variance in log transcription quotients (double)
  • (optional) Number of bins for the variance in log transcription quotients (integer)
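
As an illustration of the plain-text input layout described above, here is a minimal Python sketch that writes a small genes-by-cells table as tab-separated values. The helper name `write_count_matrix` and the identifiers `Gene1`, `Cell1`, etc. are hypothetical, and the IDs are written without spaces as a precaution, on the assumption that spaces inside IDs could be ambiguous for space-separated input.

```python
import csv
import io


def write_count_matrix(handle, gene_ids, cell_ids, counts):
    """Write a genes-by-cells UMI count table as tab-separated text.

    Hypothetical helper: the first row holds 'GeneID' followed by the cell IDs,
    and each following row holds one gene ID followed by its counts.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneID", *cell_ids])
    for gene, row in zip(gene_ids, counts):
        if len(row) != len(cell_ids):
            raise ValueError(
                f"{gene}: expected {len(cell_ids)} counts, got {len(row)}"
            )
        writer.writerow([gene, *row])


# Toy data matching the two-gene, three-cell example above.
buffer = io.StringIO()
write_count_matrix(
    buffer,
    gene_ids=["Gene1", "Gene2"],
    cell_ids=["Cell1", "Cell2", "Cell3"],
    counts=[[1.0, 3.0, 0.0], [2.0, 6.0, 1.0]],
)
table_text = buffer.getvalue()
print(table_text)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the in-memory buffer.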

Output

  • log_transcription_quotients.txt: (Ng x Nc) table of inferred log expression levels. The gene expression levels are normalized to 1, meaning that the summed expression of all genes in a cell is approximately 1. Note that we use the natural logarithm, so to change the normalization, multiply the exponential of the log expression level by the desired normalization factor (e.g. the mean or median number of captured genes per cell).

    GeneID Cell 1 Cell 2 Cell 3 ...
    Gene 1 0.25 -0.29 -0.54
    Gene 2 -0.045 -0.065 0.11
    ...
  • ltq_error_bars.txt : (Ng x Nc) table of error bars on inferred log expression levels

    GeneID Cell 1 Cell 2 Cell 3 ...
    Gene 1 0.015 0.029 0.042
    Gene 2 0.0004 0.0051 0.0031
    ...
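
The change of normalization described for log_transcription_quotients.txt amounts to exponentiating and multiplying by the chosen factor. A minimal sketch with NumPy, using toy values rather than real Sanity output; the function name `rescale_ltq` and the target of 10,000 are assumptions for illustration only.

```python
import numpy as np


def rescale_ltq(ltq, target_total):
    """Convert natural-log transcription quotients to a new normalization.

    Hypothetical helper: exponentiate the log values (fractions summing to
    about 1 per cell) and multiply by the desired per-cell total.
    """
    return np.exp(np.asarray(ltq, dtype=float)) * target_total


# Toy log fractions for 3 genes x 2 cells; each column sums to 1 after exp().
ltq = np.log(np.array([[0.5, 0.2],
                       [0.3, 0.3],
                       [0.2, 0.5]]))

scaled = rescale_ltq(ltq, target_total=10_000)
print(scaled.sum(axis=0))  # per-cell totals on the new scale
```

With real output, the table could be loaded with any tab-separated reader before applying the same two operations.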

Extended output (optional)

  • mu.txt : (Ng x 1) vector of inferred mean log expression levels

  • d_mu.txt : (Ng x 1) vector of inferred error bars on mean log expression levels

  • variance.txt : (Ng x 1) vector of inferred variance per gene in log expression levels

  • delta.txt : (Ng x Nc) matrix of inferred log expression levels centered at 0

  • d_delta.txt : (Ng x Nc) matrix of inferred error bars on the log expression levels centered at 0

  • likelihood.txt : (Ng+1 x Nb) matrix of normalized variance likelihood per gene, with Nb the number of bins on the variance.

    Variance 0.01 0.0107 0.0114 ...
    Gene 1 0.018 0.019 0.020
    Gene 2 0.0006 0.0051 0.0031
    ...
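
The header row of the likelihood.txt example (0.01, 0.0107, 0.0114, ...) is consistent with variance bins spaced evenly on a log scale between the default minimum (0.01) and maximum (20) with the default 116 bins. This spacing is an inference from the example values, not something the text above states. Under that assumption, the grid can be sketched as follows, together with a hypothetical helper that picks the most likely bin for one gene.

```python
import numpy as np

# Defaults taken from the options (-vmin, -vmax, -nbin).
v_min, v_max, n_bins = 0.01, 20.0, 116

# Assumption: bins are log-spaced, which reproduces the example header row.
variance_grid = np.geomspace(v_min, v_max, n_bins)
print(np.round(variance_grid[:3], 4))


def most_likely_variance(likelihood_row, grid):
    """Return the grid value with the highest likelihood (hypothetical helper)."""
    return float(grid[int(np.argmax(likelihood_row))])


# Toy likelihood row peaking at the third bin; not real Sanity output.
toy_row = np.zeros(n_bins)
toy_row[2] = 1.0
best = most_likely_variance(toy_row, variance_grid)
```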

Usage

  ./Sanity <option(s)> SOURCES
  Options:
	-h,--help		Show this help message
	-v,--version		Show the current version
	-f,--file		Specify the input transcript count text file (.mtx for Matrix Market File Format)
	-mtx_genes,--mtx_gene_name_file	Specify the gene name text file (only needed if .mtx input file)
	-mtx_cells,--mtx_cell_name_file	Specify the cell name text file (only needed if .mtx input file)
	-d,--destination	Specify the destination path (default: pwd)
	-n,--n_threads		Specify the number of threads to be used (default: 4)
	-e,--extended_output	Option to print extended output (default: false)
	-vmin,--variance_min	Minimal value of variance in log transcription quotient (default: 0.01)
	-vmax,--variance_max	Maximal value of variance in log transcription quotient (default: 20)
	-nbin,--number_of_bins	Number of bins for the variance in log transcription quotient  (default: 116)
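
For scripted runs, the options above can be assembled into an argument list from Python. This is a sketch under assumptions: the wrapper `build_sanity_command` is hypothetical, the paths are the placeholders used in the Input section, and passing `true` after `-e` is assumed from the Boolean values listed there rather than confirmed.

```python
import shlex


def build_sanity_command(input_file, destination=None, n_threads=None,
                         extended_output=False, binary="./Sanity"):
    """Assemble a Sanity command line from the documented short options.

    Hypothetical wrapper: only -f, -d, -n and -e are covered.
    """
    cmd = [binary, "-f", input_file]
    if destination is not None:
        cmd += ["-d", destination]
    if n_threads is not None:
        cmd += ["-n", str(n_threads)]
    if extended_output:
        # Assumption: the flag takes an explicit Boolean value.
        cmd += ["-e", "true"]
    return cmd


cmd = build_sanity_command(
    "path/to/text_file",
    destination="path/to/output/folder",
    n_threads=4,
    extended_output=True,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces.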

Installation

  • Clone the GitHub repository
git clone https://github.com/jmbreda/Sanity.git
  • Install OpenMP library

    • On Linux
      If libgomp is not already installed (check with ldconfig -p | grep libgomp; no output means it is not installed), run
     sudo apt-get update
     sudo apt-get install libgomp1
    
    • On macOS using MacPorts
      Install the gcc9 package
     port install gcc9
    
      Change the first line of src/Makefile from CC=g++ to CC=g++-mp-9

    • On macOS using Homebrew
      Install the gcc9 package
     brew install gcc9
    
      Change the first line of src/Makefile from CC=g++ to CC=g++-9

  • Move to the source code directory and compile. The compilation takes a few seconds.

cd Sanity/src
make
  • The binary file is located in
Sanity/bin/Sanity
  • Alternatively, the already compiled binary for macOS is located in
Sanity/bin/Sanity_macOS

Help

For any questions or assistance regarding Sanity, please post your question in the issues section or contact us at jeremie.breda@unibas.ch
