> {-# LANGUAGE OverloadedStrings #-}

We use Criterion to run a number of micro-benchmarks that match
regular expressions against strings, either against the whole string
(`full`) or against its substrings (`partial`).

> import Text.RegExp
> import qualified Text.RegExp.Matching.Leftmost as Leftmost
> import qualified Text.RegExp.Matching.Longest  as Longest
> import qualified Text.RegExp.Matching.LeftLong as LeftLong
>
> import Criterion.Main
>
> main :: IO ()
> main = defaultMain
>   [ -- Match whole strings: does the expression accept the input,
>     -- and in how many different ways does it match?
>     bgroup "full"
>       [ bgroup mode
>           [ bench name $ call re str
>           | (name, re, str) <-
>               [ ("phone", phone're, phone'str)
>               , ("html" , html're , html'str)
>               ]
>           ]
>       | (mode, call) <-
>           [ ("accept", whnf . acceptFull)
>             -- the annotation fixes the result type of matchingCount to Int
>           , ("count" , whnf . (matchingCount :: RegExp Char -> String -> Int))
>           ]
>       ]
>     -- Match substrings: acceptance and three ways of selecting a match.
>   , bgroup "partial"
>       [ bgroup mode
>           [ bench name $ call re str
>           | (name, re, str) <-
>               [ ("rna", rna're, rna'str)
>               ]
>           ]
>       | (mode, call) <-
>           [ ("accept"  , whnf . acceptPartial)
>           , ("leftmost", whnf . Leftmost.matching)
>           , ("longest" , whnf . Longest.matching)
>           , ("leftlong", whnf . LeftLong.matching)
>           ]
>       ]
>   ]
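
The nested comprehensions in `main` build a grid: every mode of a group is paired with every input of that group. The following sketch mirrors that nesting with plain strings instead of Criterion values, assuming Criterion joins nested group names with `/` when it reports results. The name `benchmarkNames` is hypothetical and exists only for illustration.

```haskell
-- Mirror the nesting of `main`: an outer comprehension over modes and an
-- inner one over inputs, flattened to "group/mode/name" paths.
benchmarkNames :: [String]
benchmarkNames =
  [ "full/" ++ mode ++ "/" ++ name
  | mode <- ["accept", "count"]
  , name <- ["phone", "html"]
  ] ++
  [ "partial/" ++ mode ++ "/" ++ name
  | mode <- ["accept", "leftmost", "longest", "leftlong"]
  , name <- ["rna"]
  ]
```

This gives eight benchmarks, from `full/accept/phone` to `partial/leftlong/rna`; adding an input to a group adds one benchmark per mode of that group.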

The following regular expression for phone numbers matches strings
like the one given below unambiguously, that is, in exactly one way.

> phone're :: RegExp Char
> phone're = "[0-9]+(-[0-9]+)*"
>
> phone'str :: String
> phone'str = "0431-880-7267"
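
Why is the match unique? Digits and hyphens are disjoint, so the hyphens alone decide how the input splits into digit groups. The sketch below makes that explicit with a hand-written check that is meant to accept the same strings as `phone're`; `splitOn` and `isPhone` are hypothetical helpers, not part of weighted-regexp.

```haskell
import Data.Char (isDigit)

-- Split a string at every occurrence of a separator (hypothetical helper).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Accepts the strings matched by [0-9]+(-[0-9]+)*: the input must split
-- at hyphens into non-empty groups of ASCII digits. Because the split is
-- forced by the hyphens, there is only one way to match.
isPhone :: String -> Bool
isPhone = all isDigitGroup . splitOn '-'
  where
    isDigitGroup g = not (null g) && all isDigit g
```

For example, `splitOn '-' "0431-880-7267"` yields the groups `"0431"`, `"880"`, and `"7267"`, while `"0431--880"` is rejected because of the empty group between the two hyphens.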

As an example of an ambiguous match, we use the following regular
expression, which resembles a sequence of HTML elements.

> html're :: RegExp Char
> html're = "(<\\w*>.*</\\w*>)*"

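This expression matches the string below in two different ways: as two
elements, `<p>some</p>` and `<p>text</p>`, or as a single element whose
`.*` part covers `some</p><p>text`.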

> html'str :: String
> html'str = "<p>some</p><p>text</p>"
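
To see where the two matches come from, here is a small backtracking matcher that counts the ways an expression matches a whole string. It is a sketch over a hypothetical mini regex type `Re`, not weighted-regexp's representation or algorithm. It assumes `\w` means letters, digits, and underscore, and it only counts `Star` iterations that consume input, which suffices here because every repeated part of this expression consumes at least one character.

```haskell
import Data.Char (isAlphaNum)

-- A tiny regular-expression type (hypothetical, for illustration only).
data Re
  = Sym (Char -> Bool)  -- a single character satisfying a predicate
  | Seq Re Re           -- concatenation
  | Alt Re Re           -- alternation
  | Star Re             -- zero or more repetitions

-- Every way of matching a prefix of the input, each represented by the
-- remaining suffix. A Star iteration must consume input, so the search
-- terminates and empty iterations are not counted as extra matches.
suffixes :: Re -> String -> [String]
suffixes (Sym p) (c : cs) | p c = [cs]
suffixes (Sym _) _ = []
suffixes (Seq a b) s = concatMap (suffixes b) (suffixes a s)
suffixes (Alt a b) s = suffixes a s ++ suffixes b s
suffixes (Star r) s =
  s : [ u | t <- suffixes r s, length t < length s, u <- suffixes (Star r) t ]

-- Number of ways the expression matches the whole input.
countMatches :: Re -> String -> Int
countMatches r = length . filter null . suffixes r

-- (<\w*>.*</\w*>)*
htmlRe :: Re
htmlRe = Star (foldr1 Seq
  [ lit '<', Star word, lit '>'
  , Star anyChar
  , lit '<', lit '/', Star word, lit '>'
  ])
  where
    lit c   = Sym (== c)
    word    = Sym (\c -> isAlphaNum c || c == '_')
    anyChar = Sym (const True)
```

Applied to the string above, `countMatches htmlRe` returns 2, the same count the `count` benchmark obtains through `matchingCount`, while a single element such as `"<p>a</p>"` matches in only one way.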

To benchmark partial matching, we search an RNA sequence for a
protein-coding sequence. Such a sequence starts with the start codon
`AUG`, continues with codons (triplets) built from the bases adenine
(`A`), cytosine (`C`), guanine (`G`), and uracil (`U`), and ends with
one of the stop codons `UAG`, `UGA`, or `UAA`.
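
Because every codon has a fixed length of three bases, the shape just described can also be checked without a regular-expression engine. The following sketch, with the hypothetical name `isCodingSequence`, is intended to accept exactly the strings that `rna're` matches as a whole.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- Whole-string check for AUG([ACGU][ACGU][ACGU])*(UAG|UGA|UAA).
isCodingSequence :: String -> Bool
isCodingSequence s =
     length s >= 6                                 -- start codon + stop codon
  && length s `mod` 3 == 0                         -- whole codons only
  && all (`elem` "ACGU") s                         -- only the four RNA bases
  && "AUG" `isPrefixOf` s                          -- start codon
  && any (`isSuffixOf` s) ["UAG", "UGA", "UAA"]    -- one of the stop codons
```

Note that, like the regular expression, this check does not forbid stop codons in the middle of the sequence.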

> rna're :: RegExp Char
> rna're = "AUG([ACGU][ACGU][ACGU])*(UAG|UGA|UAA)"

For example, the following RNA sequence contains the protein-coding
sequence `AUGACACUUGAAUGA`, which starts at its seventh base.

> rna'str :: String
> rna'str = "UUACGGAUGACACUUGAAUGACUGA"
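
Partial matching looks for the expression inside the input. The sketch below enumerates every match of `rna're` as a pair of start index and length by scanning codon by codon from each `AUG`, and then selects a match in the three ways suggested by the benchmark names: leftmost start, greatest length, and the longest among the leftmost matches. All helper names are hypothetical, and this brute-force search only illustrates which match each strategy selects; it is not how weighted-regexp computes its results.

```haskell
import Data.List (isPrefixOf, tails)

-- All partial matches of AUG([ACGU][ACGU][ACGU])*(UAG|UGA|UAA),
-- as (start index, length) pairs.
partialMatches :: String -> [(Int, Int)]
partialMatches s =
  [ (i, 3 + n)
  | (i, t) <- zip [0 ..] (tails s)
  , "AUG" `isPrefixOf` t
  , n <- codonEnds (drop 3 t) 0
  ]
  where
    -- Walk codon by codon after the start codon; every stop codon can end
    -- a match, because the expression also allows stop codons in the middle.
    codonEnds rest n = case splitAt 3 rest of
      (c, rest')
        | length c == 3 && all (`elem` "ACGU") c ->
            [ n + 3 | c `elem` ["UAG", "UGA", "UAA"] ]
              ++ codonEnds rest' (n + 3)
      _ -> []

-- Smallest start index of any match.
leftmost :: [(Int, Int)] -> Maybe Int
leftmost [] = Nothing
leftmost ms = Just (minimum (map fst ms))

-- Greatest length of any match.
longest :: [(Int, Int)] -> Maybe Int
longest [] = Nothing
longest ms = Just (maximum (map snd ms))

-- Among the matches with the smallest start index, the longest one.
leftLong :: [(Int, Int)] -> Maybe (Int, Int)
leftLong [] = Nothing
leftLong ms = Just (i, maximum [l | (j, l) <- ms, j == i])
  where
    i = minimum (map fst ms)
```

On `rna'str` the only match is `(6, 15)`, so all three strategies agree. On an input such as `"AUGUAACAUGCCCUAA"` they differ: the matches are `(0, 6)` and `(7, 9)`, so the leftmost-longest match is `(0, 6)` while the longest match has length 9.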
