Create a GFM (Github Flavored Markdown) parser
As a first step, it supports "fenced code blocks" delimited by three or more
backticks.
plexus committed Jul 22, 2013
1 parent 6cb8349 commit 5a02866
Showing 6 changed files with 119 additions and 0 deletions.
12 changes: 12 additions & 0 deletions doc/parser/gfm.page
@@ -0,0 +1,12 @@
---
title: GFM Parser
---
# GFM Parser

## Introduction

Parse ["GitHub Flavored Markdown"](https://help.github.com/articles/github-flavored-markdown). This is a dialect of Markdown used on GitHub.com in most places that accept text input, such as issues and comments. Some of its extensions, notably backtick-fenced code blocks, are also supported on other sites, for example Stack Overflow.

## Conformance

At the moment this parser is identical to the Kramdown parser, except that it adds support for fenced code blocks using three or more backticks to delimit the block.
1 change: 1 addition & 0 deletions lib/kramdown/parser.rb
@@ -20,6 +20,7 @@ module Parser
autoload :Kramdown, 'kramdown/parser/kramdown'
autoload :Html, 'kramdown/parser/html'
autoload :Markdown, 'kramdown/parser/markdown'
autoload :GFM, 'kramdown/parser/gfm'

end

31 changes: 31 additions & 0 deletions lib/kramdown/parser/gfm.rb
@@ -0,0 +1,31 @@
module Kramdown
  module Parser
    class GFM < Kramdown::Parser::Kramdown

      def initialize(source, options)
        super
        @block_parsers.unshift(:gfm_codeblock_fenced)
      end

      GFM_FENCED_CODEBLOCK_START = /^`{3,}/
      #GFM_FENCED_CODEBLOCK_MATCH = /^(`{3,})\s*?(\w+)?\s*?\n(.*?)^\1`*\s*?\n/m
      GFM_FENCED_CODEBLOCK_MATCH = /^`{3,} *(\w+)?\s*?\n(.*?)^`{3,}\s*?\n/m

      # Parse the fenced codeblock at the current location.
      def parse_gfm_codeblock_fenced
        if @src.check(GFM_FENCED_CODEBLOCK_MATCH)
          @src.pos += @src.matched_size
          el = new_block_el(:codeblock, @src[2])
          lang = @src[1].to_s.strip
          el.attr['class'] = "language-#{lang}" unless lang.empty?
          @tree.children << el
          true
        else
          false
        end
      end
      define_parser(:gfm_codeblock_fenced, GFM_FENCED_CODEBLOCK_START)

    end
  end
end
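
The matching step in `parse_gfm_codeblock_fenced` can be tried outside kramdown. The following is a minimal sketch, assuming only Ruby's standard `strscan` library: the pattern is copied from `GFM_FENCED_CODEBLOCK_MATCH` above, while `FENCED_MATCH` and `scan_fenced_block` are hypothetical names used for illustration and are not part of kramdown.

```ruby
require 'strscan'

# Pattern copied from GFM_FENCED_CODEBLOCK_MATCH in the commit.
FENCED_MATCH = /^`{3,} *(\w+)?\s*?\n(.*?)^`{3,}\s*?\n/m

# Hypothetical helper (not part of kramdown): returns [language, code] when a
# fenced block starts at the beginning of +text+, or nil otherwise.
def scan_fenced_block(text)
  scanner = StringScanner.new(text)
  # check is anchored at the current scan position and does not advance it.
  return nil unless scanner.check(FENCED_MATCH)
  lang = scanner[1].to_s.strip          # capture 1: optional language name
  [lang.empty? ? nil : lang, scanner[2]] # capture 2: the code between the fences
end

lang, code = scan_fenced_block("```` ruby\nputs 'hi'\n````\n")
puts "class: language-#{lang}" if lang
puts code
```

As in the parser, the language capture is optional, so a block without a language name yields no `class` attribute.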
29 changes: 29 additions & 0 deletions test/test_files.rb
@@ -17,6 +17,7 @@
class TestFiles < Test::Unit::TestCase

EXCLUDE_KD_FILES = [('test/testcases/block/04_header/with_auto_ids.text' if RUBY_VERSION <= '1.8.6'), # bc of dep stringex not working
'test/testcases/block/06_codeblock/backticks-syntax.text'
].compact

# Generate test methods for kramdown-to-xxx conversion
@@ -48,6 +49,7 @@ class TestFiles < Test::Unit::TestCase
'test/testcases/span/03_codespan/highlighting.html', # bc of span elements inside code element
'test/testcases/block/04_header/with_auto_ids.html', # bc of auto_ids=true option
'test/testcases/block/04_header/header_type_offset.html', # bc of header_offset option
'test/testcases/block/06_codeblock/backticks-syntax.html', # only in GFM
]
Dir[File.dirname(__FILE__) + '/testcases/**/*.{html,html.19,htmlinput,htmlinput.19}'].each do |html_file|
next if EXCLUDE_HTML_FILES.any? {|f| html_file =~ /#{f}(\.19)?$/}
@@ -85,6 +87,7 @@ def tidy_output(out)
EXCLUDE_LATEX_FILES = ['test/testcases/span/01_link/image_in_a.text', # bc of image link
'test/testcases/span/01_link/imagelinks.text', # bc of image links
'test/testcases/span/04_footnote/markers.text', # bc of footnote in header
'test/testcases/block/06_codeblock/backticks-syntax.text' # only in GFM
]
Dir[File.dirname(__FILE__) + '/testcases/**/*.text'].each do |text_file|
next if EXCLUDE_LATEX_FILES.any? {|f| text_file =~ /#{f}$/}
@@ -121,6 +124,7 @@ def tidy_output(out)
'test/testcases/span/extension/comment.text', # bc of comment text modifications (can this be avoided?)
'test/testcases/block/04_header/header_type_offset.text', # bc of header_offset being applied twice
('test/testcases/block/04_header/with_auto_ids.text' if RUBY_VERSION <= '1.8.6'), # bc of dep stringex not working
'test/testcases/block/06_codeblock/backticks-syntax.text' # only in GFM
].compact
Dir[File.dirname(__FILE__) + '/testcases/**/*.text'].each do |text_file|
next if EXCLUDE_TEXT_FILES.any? {|f| text_file =~ /#{f}$/}
@@ -155,6 +159,7 @@ def tidy_output(out)
'test/testcases/block/04_header/header_type_offset.html', # bc of header_offset option
'test/testcases/block/16_toc/toc_exclude.html', # bc of different attribute ordering
'test/testcases/span/autolinks/url_links.html', # bc of quot entity being converted to char
'test/testcases/block/06_codeblock/backticks-syntax.html' # only in GFM
]
Dir[File.dirname(__FILE__) + '/testcases/**/*.{html,html.19}'].each do |html_file|
next if EXCLUDE_HTML_KD_FILES.any? {|f| html_file =~ /#{f}(\.19)?$/}
@@ -171,6 +176,30 @@ def tidy_output(out)
end
end

  EXCLUDE_GFM_FILES = []

  # Generate test methods for gfm-to-html conversion
  Dir[File.dirname(__FILE__) + '/testcases/**/*.text'].each do |text_file|
    next if EXCLUDE_GFM_FILES.any? {|f| text_file =~ /#{f}$/}
    html_file = text_file.sub(/\.text$/, '.html')
    html_file += '.19' if RUBY_VERSION >= '1.9' && File.exist?(html_file + '.19')
    basename = text_file.sub(/\.text$/, '')
    next if (RUBY_VERSION >= '1.9' && File.exist?(html_file + '.19')) ||
      (RUBY_VERSION < '1.9' && html_file =~ /\.19$/)
    define_method('test_gfm_' + text_file.tr('.', '_') + "_to_html") do
      opts_file = html_file.sub(/\.html(\.19)?$/, '.options')
      opts_file = File.join(File.dirname(html_file), 'options') if !File.exist?(opts_file)
      options = File.exist?(opts_file) ? YAML::load(File.read(opts_file)) : {:auto_ids => false, :footnote_nr => 1}
      doc = Kramdown::Document.new(File.read(text_file), options.merge(:input => 'GFM'))
      if File.read(html_file) != doc.to_html
        i = rand(1000)
        File.write("/tmp/#{i}_expected.html", File.read(html_file))
        File.write("/tmp/#{i}_actual.html", doc.to_html)
        puts "diff /tmp/#{i}_expected.html /tmp/#{i}_actual.html"
      end
      assert_equal(File.read(html_file), doc.to_html)
    end
  end


# Generate test methods for asserting that converters don't modify the document tree.
23 changes: 23 additions & 0 deletions test/testcases/block/06_codeblock/backticks-syntax.html
@@ -0,0 +1,23 @@
<pre><code>Three backticks
</code></pre>

<pre><code>Four backticks
</code></pre>

<pre><code>Unbalanced bottom heavy
</code></pre>

<pre><code>Unbalanced top heavy
</code></pre>

<div><div class="CodeRay">
<div class="code"><pre><span class="line-numbers"><a href="#n1" name="n1">1</a></span>language no space
</pre></div>
</div>
</div>

<div><div class="CodeRay">
<div class="code"><pre><span class="line-numbers"><a href="#n1" name="n1">1</a></span>language with space
</pre></div>
</div>
</div>
23 changes: 23 additions & 0 deletions test/testcases/block/06_codeblock/backticks-syntax.text
@@ -0,0 +1,23 @@
```
Three backticks
```

````
Four backticks
````

```
Unbalanced bottom heavy
``````

``````
Unbalanced top heavy
````

````ruby
language no space
````

```` ruby
language with space
````
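
The "unbalanced" cases above are accepted because the active pattern matches the opening and closing fences independently of each other. The sketch below compares it with the commented-out, backreference-based pattern from `gfm.rb` on those two inputs. It is an observation about these specific strings, not a statement of why the author disabled the other pattern. The names `ACTIVE`, `BACKREF`, and `fence_matches?` are hypothetical and used for illustration only.

```ruby
require 'strscan'

# Active pattern from the commit: opening and closing fences are independent.
ACTIVE  = /^`{3,} *(\w+)?\s*?\n(.*?)^`{3,}\s*?\n/m
# Commented-out pattern from the commit: the closing fence must start with
# the same run of backticks as the opening fence (\1).
BACKREF = /^(`{3,})\s*?(\w+)?\s*?\n(.*?)^\1`*\s*?\n/m

# Hypothetical helper: true if +pattern+ matches at the start of +text+.
def fence_matches?(pattern, text)
  !StringScanner.new(text).check(pattern).nil?
end

top_heavy    = "``````\nUnbalanced top heavy\n````\n"
bottom_heavy = "```\nUnbalanced bottom heavy\n``````\n"

[["top heavy", top_heavy], ["bottom heavy", bottom_heavy]].each do |name, text|
  puts "#{name}: active=#{fence_matches?(ACTIVE, text)} backref=#{fence_matches?(BACKREF, text)}"
end
```

On these inputs only the backreference variant rejects the top-heavy block, since its closing fence is shorter than the opening one.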
