Skip to content

Commit

Permalink
Markdown reader: revised definition list syntax (closes #1429).
Browse files Browse the repository at this point in the history
* This change brings pandoc's definition list syntax into alignment
  with that used in PHP markdown extra and multimarkdown (with the
  exception that pandoc is more flexible about the definition markers,
  allowing tildes as well as colons).

* Lazily wrapped definitions are now allowed; blank space is required
  between list items; and the space before definition is used to
  determine whether it is a paragraph or a "plain" element.

* For backwards compatibility, a new extension,
  `compact_definition_lists`, has been added that restores the behavior
  of pandoc 1.12.x, allowing tight definition lists with no blank space
  between items, and disallowing lazy wrapping.
  • Loading branch information
jgm committed Jul 20, 2014
1 parent cdc4ecb commit 4af8eed
Show file tree
Hide file tree
Showing 5 changed files with 148 additions and 36 deletions.
68 changes: 55 additions & 13 deletions README
Expand Up @@ -1437,8 +1437,8 @@ If default list markers are desired, use `#.`:

**Extension: `definition_lists`**

Pandoc supports definition lists, using a syntax inspired by
[PHP Markdown Extra] and [reStructuredText]:[^3]
Pandoc supports definition lists, using the syntax of
[PHP Markdown Extra] with some extensions.[^3]

Term 1

Expand All @@ -1455,25 +1455,41 @@ Pandoc supports definition lists, using a syntax inspired by
Each term must fit on one line, which may optionally be followed by
a blank line, and must be followed by one or more definitions.
A definition begins with a colon or tilde, which may be indented one
or two spaces. The body of the definition (including the first line,
aside from the colon or tilde) should be indented four spaces. A term may have
multiple definitions, and each definition may consist of one or more block
elements (paragraph, code block, list, etc.), each indented four spaces or one
tab stop.

If you leave space after the definition (as in the example above),
the blocks of the definitions will be considered paragraphs. In some
or two spaces.

A term may have multiple definitions, and each definition may consist of one or
more block elements (paragraph, code block, list, etc.), each indented four
spaces or one tab stop. The body of the definition (including the first line,
aside from the colon or tilde) should be indented four spaces. However,
as with other markdown lists, you can "lazily" omit indentation except
at the beginning of a paragraph or other block element:

Term 1

: Definition
with lazy continuation.

Second paragraph of the definition.

If you leave space before the definition (as in the example above),
the text of the definition will be treated as a paragraph. In some
output formats, this will mean greater spacing between term/definition
pairs. For a compact definition list, do not leave space between the
definition and the next term:
pairs. For a more compact definition list, omit the space before the
definition:

Term 1
~ Definition 1

Term 2
~ Definition 2a
~ Definition 2b

[^3]: I have also been influenced by the suggestions of [David Wheeler](http://www.justatheory.com/computers/markup/modest-markdown-proposal.html).
Note that space between items in a definition list is required.
(A variant that loosens this requirement, but disallows "lazy"
hard wrapping, can be activated with `compact_definition_lists`: see
[Non-pandoc extensions](#non-pandoc-extensions), below.)

[^3]: I have been influenced by the suggestions of [David Wheeler](http://www.justatheory.com/computers/markup/modest-markdown-proposal.html).

[PHP Markdown Extra]: http://www.michelf.com/projects/php-markdown/extra/

Expand Down Expand Up @@ -2629,6 +2645,32 @@ these, so they are presently just ignored.
Parses multimarkdown style header identifiers (in square brackets,
after the header but before any trailing `#`s in an ATX header).

**Extension: `compact_definition_lists`**\
Activates the definition list syntax of pandoc 1.12.x and earlier.
This syntax differs from the one described [above](#definition-lists)
in several respects:

- No blank line is required between consecutive items of the
definition list.
- To get a "tight" or "compact" list, omit space between consecutive
items; the space between a term and its definition does not affect
anything.
- Lazy wrapping of paragraphs is not allowed: the entire definition must
be indented four spaces.[^6]

[^6]: To see why laziness is incompatible with relaxing the requirement
of a blank line between items, consider the following example:

bar
: definition
foo
: definition

Is this a single list item with two definitions of "bar," the first of
which is lazily wrapped, or two list items? To remove the ambiguity
we must either disallow lazy wrapping or require a blank line between
list items.

Markdown variants
-----------------

Expand Down
2 changes: 2 additions & 0 deletions src/Text/Pandoc/Options.hs
Expand Up @@ -84,6 +84,8 @@ data Extension =
| Ext_lists_without_preceding_blankline -- ^ Allow lists without preceding blank
| Ext_startnum -- ^ Make start number of ordered list significant
| Ext_definition_lists -- ^ Definition lists as in pandoc, mmd, php
| Ext_compact_definition_lists -- ^ Definition lists without
-- space between items, and disallow laziness
| Ext_example_lists -- ^ Markdown-style numbered examples
| Ext_all_symbols_escapable -- ^ Make all non-alphanumerics escapable
| Ext_intraword_underscores -- ^ Treat underscore inside word as literal
Expand Down
61 changes: 38 additions & 23 deletions src/Text/Pandoc/Readers/Markdown.hs
Expand Up @@ -846,38 +846,53 @@ defListMarker = do
else mzero
return ()

definitionListItem :: MarkdownParser (F (Inlines, [Blocks]))
definitionListItem = try $ do
-- first, see if this has any chance of being a definition list:
lookAhead (anyLine >> optional blankline >> defListMarker)
term <- trimInlinesF . mconcat <$> manyTill inline newline
optional blankline
raw <- many1 defRawBlock
state <- getState
let oldContext = stateParserContext state
-- parse the extracted block, which may contain various block elements:
definitionListItem :: Bool -> MarkdownParser (F (Inlines, [Blocks]))
definitionListItem compact = try $ do
rawLine' <- anyLine
raw <- many1 $ defRawBlock compact
term <- parseFromString (trimInlinesF . mconcat <$> many inline) rawLine'
contents <- mapM (parseFromString parseBlocks) raw
updateState (\st -> st {stateParserContext = oldContext})
optional blanklines
return $ liftM2 (,) term (sequence contents)

defRawBlock :: MarkdownParser String
defRawBlock = try $ do
defRawBlock :: Bool -> MarkdownParser String
defRawBlock compact = try $ do
hasBlank <- option False $ blankline >> return True
defListMarker
firstline <- anyLine
rawlines <- many (notFollowedBy blankline >> indentSpaces >> anyLine)
trailing <- option "" blanklines
cont <- liftM concat $ many $ do
lns <- many1 $ notFollowedBy blankline >> indentSpaces >> anyLine
trl <- option "" blanklines
return $ unlines lns ++ trl
return $ firstline ++ "\n" ++ unlines rawlines ++ trailing ++ cont
let dline = try
( do notFollowedBy blankline
if compact -- laziness not compatible with compact
then () <$ indentSpaces
else (() <$ indentSpaces)
<|> notFollowedBy defListMarker
anyLine )
rawlines <- many dline
cont <- liftM concat $ many $ try $ do
trailing <- option "" blanklines
ln <- indentSpaces >> notFollowedBy blankline >> anyLine
lns <- many dline
return $ trailing ++ unlines (ln:lns)
return $ trimr (firstline ++ "\n" ++ unlines rawlines ++ cont) ++
if hasBlank || not (null cont) then "\n\n" else ""

definitionList :: MarkdownParser (F Blocks)
definitionList = do
guardEnabled Ext_definition_lists
items <- fmap sequence $ many1 definitionListItem
definitionList = try $ do
lookAhead (anyLine >> optional blankline >> defListMarker)
compactDefinitionList <|> normalDefinitionList

compactDefinitionList :: MarkdownParser (F Blocks)
compactDefinitionList = do
guardEnabled Ext_compact_definition_lists
items <- fmap sequence $ many1 $ definitionListItem True
return $ B.definitionList <$> fmap compactify'DL items

normalDefinitionList :: MarkdownParser (F Blocks)
normalDefinitionList = do
guardEnabled Ext_definition_lists
items <- fmap sequence $ many1 $ definitionListItem False
return $ B.definitionList <$> items

--
-- paragraph block
--
Expand Down
41 changes: 41 additions & 0 deletions tests/Tests/Readers/Markdown.hs
Expand Up @@ -16,6 +16,10 @@ markdown = readMarkdown def
markdownSmart :: String -> Pandoc
markdownSmart = readMarkdown def { readerSmart = True }

markdownCDL :: String -> Pandoc
markdownCDL = readMarkdown def { readerExtensions = Set.insert
Ext_compact_definition_lists $ readerExtensions def }

infix 4 =:
(=:) :: ToString c
=> String -> (String, c) -> Test
Expand Down Expand Up @@ -222,6 +226,43 @@ tests = [ testGroup "inline code"
-- , testGroup "round trip"
-- [ property "p_markdown_round_trip" p_markdown_round_trip
-- ]
, testGroup "definition lists"
[ "no blank space" =:
"foo1\n : bar\n\nfoo2\n : bar2\n : bar3\n" =?>
definitionList [ (text "foo1", [plain (text "bar")])
, (text "foo2", [plain (text "bar2"),
plain (text "bar3")])
]
, "blank space before first def" =:
"foo1\n\n : bar\n\nfoo2\n\n : bar2\n : bar3\n" =?>
definitionList [ (text "foo1", [para (text "bar")])
, (text "foo2", [para (text "bar2"),
plain (text "bar3")])
]
, "blank space before second def" =:
"foo1\n : bar\n\nfoo2\n : bar2\n\n : bar3\n" =?>
definitionList [ (text "foo1", [plain (text "bar")])
, (text "foo2", [plain (text "bar2"),
para (text "bar3")])
]
, "laziness" =:
"foo1\n : bar\nbaz\n : bar2\n" =?>
definitionList [ (text "foo1", [plain (text "bar baz"),
plain (text "bar2")])
]
, "no blank space before first of two paragraphs" =:
"foo1\n : bar\n\n baz\n" =?>
definitionList [ (text "foo1", [para (text "bar") <>
para (text "baz")])
]
]
, testGroup "+compact_definition_lists"
[ test markdownCDL "basic compact list" $
"foo1\n: bar\n baz\nfoo2\n: bar2\n" =?>
definitionList [ (text "foo1", [plain (text "bar baz")])
, (text "foo2", [plain (text "bar2")])
]
]
, testGroup "lists"
[ "issue #1154" =:
" - <div>\n first div breaks\n </div>\n\n <button>if this button exists</button>\n\n <div>\n with this div too.\n </div>\n"
Expand Down
12 changes: 12 additions & 0 deletions tests/testsuite.txt
Expand Up @@ -270,40 +270,49 @@ Tight using spaces:

apple
: red fruit

orange
: orange fruit

banana
: yellow fruit

Tight using tabs:

apple
: red fruit

orange
: orange fruit

banana
: yellow fruit

Loose:

apple

: red fruit

orange

: orange fruit

banana

: yellow fruit

Multiple blocks with italics:

*apple*

: red fruit

contains seeds,
crisp, pleasant to taste

*orange*

: orange fruit

{ orange code block }
Expand All @@ -315,18 +324,21 @@ Multiple definitions, tight:
apple
: red fruit
: computer

orange
: orange fruit
: bank

Multiple definitions, loose:

apple

: red fruit

: computer

orange

: orange fruit

: bank
Expand Down

0 comments on commit 4af8eed

Please sign in to comment.