GitHub

LCSLCIS

An R library to compute the Longuest Common Subsequence between 2 lists of items.

Abstract

Genetic maps order genetic markers along chromosomes. They are, for instance, extensively used in marker-assisted selection to accelerate breeding program. Even for the same species, people often have to deal with several alternative maps obtained using différent ordering methods or différent dataset, e.g. resulting from différent segregating populations. Having efficient tools to identify the consistency and discrepancy of alternative maps is thus essential to facilitate genetic map comparisons.
We propose to encode genetic maps by bucket order, a kind of order, which takes into account the blur parts of the marker order while being an efficient data structure to achieve low complexity algorithms. We identify the Longest Common Subsequences of bucket orders to produce a strict consensus genetic map. The main result of this paper is an O(nlog(n)) procedure to identify the largest agreements between two bucket orders of n elements, hence providing an efficient solution to highlight discrepancies between two genetic maps.

Installation

The LCSLCIS library is not on CRAN. However you can install it from this github repository:

library(devtools) 
install_github("holtzy/LCSLCIS")
library(LCSLCIS)

Basic example

We represent marker order along a chromosome by two lists, the first list provides the marker names and the second their position along the chromosome. Let’s consider two marker orders o1 and o2

o1 <- data.frame( 
    V1=c("g","h","c","f","e","d","m","q","r","a","b","n","o","p"), 
    V2=c(1.1,1.1,3.4,3.4,3.4,3.4,3.5,3.5,6.6,6.6,7.2,7.2,8,8)
    )
o2 <- data.frame( 
    V1=c("k","a","b","l","c","e","d","f","i","j","h","g"), 
    V2=c(0.1,1.2,1.2,4.2,4.2,5,5,5,5.3,5.3,6.7,6.7)
    )

The LCSLCIS package provides a show_connection function that allows to visualize the relationship between these 2 lists:

library(tidyverse)
library(ggrepel)
show_connection(o1, o2)

We can find the Longest Common Subsequences (LCS) of bucket orders to produce a strict consensus list. This is done running the LCS function of the package:

res <- LCS(o1,o2)

The LCS function returns a list. Several outputs are of interest:

res$LLCS is the length of the returned LCS

res$LLCS

## [1] 4

res$LCS provides the items of the LCS:

res$LCS

## [1] "c" "d" "e" "f"

It is straigth forward to plot only the items of the LCS:

show_connection(o1, o2, tokeep=res$LCS)

You can also draw items that are part of the LCS in grey, and other in red:

show_connection(o1, o2, highlight =res$LCS)

Applying to genetic maps

The LCSLCIS library provides 2 genetic maps of wheat (format chromosome, marker name, position) as toy example. Load THEM using:

data(geneticMap)

You can have a look to this 2 genetic maps doing:

head(map1)
head(map2)

We need to reformat these maps to have a valid input for LCSLCIS: 2 columns called V1 (item name) and V2 (item position)

o1 <- map1 %>% filter(V0=="1A") %>% select(V1, V2) %>% arrange(V2)
o2 <- map2 %>% filter(V0=="1A") %>% select(V1, V2) %>% arrange(V2)

Let's have a look to the common items relationship:

show_connection(o1, o2, showName=FALSE)

Let's run LCIS to keep THE LARGEST subset of markers with no discrepencies:

res <- LCIS(o1,o2)
show_connection(o1, o2, tokeep=res$LCIS, showName=FALSE)

A note on speed

We compared the LCSLCSI packages with the LCS function of the qualV package that performs a similar job. Computation time was compared for several list lengths (see script). LCSLCSI is faster than qualV as described on the chart below:

Citing

If you find LCSLCSI useful, please cite: Efficient algorithms for Longest Common Subsequence of two bucket orders to speed up pairwise genetic map comparison. L. De Mattéo, Y. Holtz, V. Ranwez and S Bérard. PLoS ONE (2018) 13(12): e0208838. online version

Authors

Lisa De Mattéo: linkedin
Vincent Ranwez: homepage | vincent.ranwez@supagro.fr | corresponding author
Sèverine Bérard: homepage | Severine.Berard@umontpellier.fr

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
R		R
Readme_files/figure-markdown_github		Readme_files/figure-markdown_github
data		data
man		man
speed_test		speed_test
src		src
.Rhistory		.Rhistory
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
Readme.Rmd		Readme.Rmd
Readme.md		Readme.md
note_pkg_creation.txt		note_pkg_creation.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LCSLCIS

Abstract

Installation

Basic example

Applying to genetic maps

A note on speed

Citing

Authors

About

Releases

Packages

Contributors 2

Languages

holtzy/LCSLCIS

Folders and files

Latest commit

History

Repository files navigation

LCSLCIS

Abstract

Installation

Basic example

Applying to genetic maps

A note on speed

Citing

Authors

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages