Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Examples of using ragel and ruby together
Ruby
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
lib
test
.gitignore
README.rdoc
Rakefile

README.rdoc

Ruby and Ragel examples

Say you have a file like

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer a
tristique lectus. Vestibulum ante ipsum primis in faucibus orci luctus et
ultrices posuere cubilia Curae; Cum sociis natoque penatibus et magnis dis
parturient montes, nascetur ridiculus mus. Cras ultrices lectus sed justo
scelerisque sit amet euismod orci bibendum. Nunc dui nibh, vulputate eget
aliquet laoreet, iacSTARTFOOThere are lots of great ideas here.ENDFOOulis
a lorem. Integer interdum, dolor aliquam accumsan eleifend, nisl tortor
mollis ipsum, et semper arcu mi nec felis. Nunc scelerisque cursus dolor
eu tristique. Mauris porta pulvinar dolor. Integer egestas lacinia leo, ut
mollis sapien fermentum non. Maecenas ultricies nibh at justo ornare eu
ullamcorper justo aliquet. Cras id augue eget nunc auctor mattis vitae
quis massa. STARTFOOYou just have to look closelyENDFOOMauris suscipit
justo in erat scelerisque imperdiet. Nullam placerat aliquet fringilla.
Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere
cubilia Curae; Ut blandit metus nec neque sollicitudin cursus. Praesent
pulvinar elementum lorem quis consectetur. Maecenas bibendum sagittis nibh
et vehicula. Praesent non eleifend turpis. Sed imperdiet diam eget turpis
vehicula lobortis. Integer tincidunt pulvinar sem, porttitor porttitor
metus convallis nec.

You want to pull out

There are lots of great ideas here.
You just have to look closely

You can use Ragel for that.

simple_scanner.rl vs. simple_tokenizer.rl

Two examples are provided. simple_scanner.rl uses Ragel's scanner functionality (|**|) while simple_tokenizer.rl does not.

Benchmarking code styles

Most of the tests are run against multiple code styles. You can see which one is the fastest (usually -F1 because the tokenizer is so simple.)

Code style -G2 doesn't seem to work for my examples.

Running the tests

Try the tests by running

rake

You can see benchmarks with different code styles with

BENCHMARK=true rake

Example output:

$ BENCHMARK=true rake
[...]
Started
...Benchmarking rl_filename=simple_tokenizer.rl chunk_size=1000000 code_style=F1: 0.1708228588104248s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=100 code_style=F1: 2.376127004623413s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=31 code_style=F1: 7.216774940490723s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=30 code_style=F1: 7.53369402885437s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=29 code_style=F1: 7.948101997375488s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=28 code_style=F1: 8.361572742462158s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=27 code_style=F1: 8.208178997039795s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=26 code_style=F1: 8.584808826446533s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=25 code_style=F1: 9.248464345932007s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=24 code_style=F1: 10.399993181228638s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=23 code_style=F1: 10.160030841827393s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=22 code_style=F1: 12.187470197677612s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=21 code_style=F1: 14.167765140533447s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=20 code_style=F1: 13.967772006988525s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=19 code_style=F1: 13.83250904083252s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=18 code_style=F1: 13.831825256347656s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=17 code_style=F1: 14.376168966293335s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=16 code_style=F1: 15.051374912261963s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=15 code_style=F1: 15.7113778591156s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=14 code_style=F1: 17.260392904281616s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=13 code_style=F1: 19.219007968902588s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=12 code_style=F1: 22.87133479118347s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=11 code_style=F1: 23.93314814567566s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=10 code_style=F1: 25.299525022506714s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=9 code_style=F1: 27.885208129882812s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=8 code_style=F1: 31.59194016456604s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=7 code_style=F1: 38.41472101211548s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=6 code_style=F1: 45.09998416900635s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=5 code_style=F1: 52.02148509025574s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=4 code_style=F1: 64.59367489814758s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=3 code_style=F1: 86.717942237854s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=2 code_style=F1: 127.80321931838989s
.Benchmarking rl_filename=simple_tokenizer.rl chunk_size=1 code_style=F1: 263.08475399017334s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=1000000 code_style=F1: 0.24325203895568848s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=100 code_style=F1: 0.2598850727081299s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=31 code_style=F1: 0.3361678123474121s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=30 code_style=F1: 0.38150882720947266s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=29 code_style=F1: 0.38400912284851074s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=28 code_style=F1: 0.37177205085754395s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=27 code_style=F1: 0.3484499454498291s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=26 code_style=F1: 0.3678452968597412s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=25 code_style=F1: 0.3777611255645752s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=24 code_style=F1: 0.37799668312072754s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=23 code_style=F1: 0.3881850242614746s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=22 code_style=F1: 0.3997948169708252s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=21 code_style=F1: 0.37964701652526855s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=20 code_style=F1: 0.41425299644470215s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=19 code_style=F1: 0.4084770679473877s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=18 code_style=F1: 0.411909818649292s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=17 code_style=F1: 0.4202427864074707s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=16 code_style=F1: 0.40560388565063477s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=15 code_style=F1: 0.44637084007263184s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=14 code_style=F1: 0.473844051361084s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=13 code_style=F1: 0.4782578945159912s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=12 code_style=F1: 0.5120618343353271s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=11 code_style=F1: 0.5282502174377441s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=10 code_style=F1: 0.528923749923706s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=9 code_style=F1: 0.5536479949951172s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=8 code_style=F1: 0.6265339851379395s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=7 code_style=F1: 0.6716248989105225s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=6 code_style=F1: 0.6908829212188721s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=5 code_style=F1: 0.818579912185669s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=4 code_style=F1: 0.9440000057220459s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=3 code_style=F1: 1.1561110019683838s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=2 code_style=F1: 1.5758142471313477s
.Benchmarking rl_filename=simple_scanner.rl chunk_size=1 code_style=F1: 2.6399059295654297s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=1000000 code_style=F1: 0.18734478950500488s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=100 code_style=F1: 2.7623708248138428s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=31 code_style=F1: 8.500878095626831s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=30 code_style=F1: 8.775735139846802s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=29 code_style=F1: 9.006378889083862s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=28 code_style=F1: 9.380748987197876s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=27 code_style=F1: 9.430941104888916s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=26 code_style=F1: 9.831702947616577s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=25 code_style=F1: 10.298200130462646s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=24 code_style=F1: 10.699872016906738s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=23 code_style=F1: 11.192219257354736s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=22 code_style=F1: 11.843315124511719s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=21 code_style=F1: 12.719645977020264s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=20 code_style=F1: 12.985512256622314s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=19 code_style=F1: 13.535040140151978s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=18 code_style=F1: 14.50395131111145s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=17 code_style=F1: 15.044336795806885s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=16 code_style=F1: 15.898829936981201s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=15 code_style=F1: 17.144059896469116s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=14 code_style=F1: 18.294903993606567s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=13 code_style=F1: 20.031293153762817s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=12 code_style=F1: 21.469314098358154s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=11 code_style=F1: 23.639351844787598s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=10 code_style=F1: 25.875664949417114s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=9 code_style=F1: 28.37715792655945s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=8 code_style=F1: 31.748258113861084s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=7 code_style=F1: 35.7734580039978s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=6 code_style=F1: 42.2883780002594s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=5 code_style=F1: 52.21942615509033s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=4 code_style=F1: 72.1146559715271s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=3 code_style=F1: 88.86984777450562s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=2 code_style=F1: 131.512836933136s
.Benchmarking rl_filename=xml_tokenizer.rl chunk_size=1 code_style=F1: 260.85131072998047s
.
Finished in 2130.170261 seconds.

102 tests, 108 assertions, 0 failures, 0 errors, 0 skips

Test run options: --seed 60251

Wishlist

The filenames are not very good, but in a nutshell:

  • simple_tokenizer.rl - DONE (demonstrates ()*)

  • simple_tokenizer2.rl - NOT DONE (demonstrates ()** “Longest Match Kleene Star”)

  • simple_scanner.rl - DONE (demonstrates |**|)

  • simple_scanner2.rl - NOT DONE (demonstrates |**| with an example that needs backtracking)

Copyright

Copyright 2011 Seamus Abshere

Something went wrong with that request. Please try again.