Skip to content

Comparative genome analysis using substring-free sample-specific strings (SFS)


Notifications You must be signed in to change notification settings


Repository files navigation

C/C++ CI

Sample-specific string detection from accurate long reads

Efficient computation of A-specific string w.r.t. a set {B,C,...,Z} of other long reads samples. A A-specific string is a string which occur only in sample A and not in the others.

Note: This repository is now depracated and maintained for historical reasons only. Please use SVDSS instead.

  • compute strings specific to child w.r.t. parents
  • compute strings specific to individual A from population PA w.r.t. individual B from population PB


C++11-compliant compiler (GCC 8.2 or newer), ropebwt2 and htslib. For convenience, ropebwt2 and htslib are included in the repository.

Download and Installation

git clone --recursive
cd PingPong 
cd ropebwt2 ; make ; cd ..
cd htslib ; make ; cd ..

You can now run PingPong by adding the clone directory to PATH. Because the package uses an internal clone of htslib, the shared objects will be in non-standard locations and have to be manually specified before running:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/clone/dir/htslib


Let's assume we have 3 samples A, B, and, C. To compute A-specific strings we have to:

  1. Index samples B and C:
./PingPong index --binary --fastq /path/to/sample/B --index B.index.bin
./PingPong index --append B.index.bin --fastq /path/to/sample/C --index BC.index.fmd
  1. Search for A-specific strings in the index
./PingPong search --index [B.index.bin] --fastq /path/to/sample/A --threads [nthreads]

The algorithm will output multiple files named solutions_batch_<i>.sfs with the list of A-specific strings. Each string is defined in terms of:

  • identifier of the read it comes from (a * means "same identifier as previous SFS")
  • sequence
  • starting position on the read
  • length
  • number of occurrences (we note that from this first pass, this number is always set to 1)
  1. Convert the n .sfs files to FASTQ (output to stdout):
./PingPong convert --batches n > /path/to/all-sfs.fq

PingPong Algorithm Usage

Usage: PingPong index [--binary] [--append /path/to/binary/index] --fastq /path/to/fastq --index /path/to/output/index

Optional arguments:
    -b, --binary          output index in binary format
    -a, --append          append to existing index (must be stored in binary)

Usage: PingPong search --index /path/to/index/file --fastq /path/to/fastq [--threads threads]

Optional arguments:
    --workdir             create output files in this directory (default:.)
    --overlap -1/0        run the exact algorithm (-1) or the relaxed one (0) (default:0)
    -t, --threads         number of threads (default:4)

Usage: PingPong convert --batches num_sfs_files

Optional arguments:
    --workdir             create output files in this directory (default:.)
  • To append (-a) to an existing index, the existing index must be stored in binary format (-b option)
  • An index built with --binary cannot be queried. Use --binary only for indices that are meant to be later appended to.
  • The output file iscreated in the current directory (if --workdir is not set)
  • Even when indexing a FASTA file, pass it with the --fastq option.


./PingPong index --binary --fastq example/father.fq --index example/father.fq.bin
./PingPong index --append example/father.fq.bin --fastq example/mother.fq --index example/index.fmd
./PingPong search --index example/index.fmd --fastq example/child.fq --overlap -1 --workdir example --threads 1

This will output strings that are specific to child.fq in example/solution_batch_0.sfs. To convert it to .fq, run:

./PingPong convert --workdir example --batches 1 > example/child-sfs.fq


For inquiries on this software please open an issue or contact either Parsoa Khorsand or Luca Denti.


PingPong is now published in Bioinformatics Advances.


Comparative genome analysis using substring-free sample-specific strings (SFS)