Skip to content

Commit

Permalink
RST reader: remove support for nested inlines, closes jgm#4581
Browse files Browse the repository at this point in the history
  • Loading branch information
danse committed Apr 23, 2018
1 parent dab3330 commit 5c45bba
Show file tree
Hide file tree
Showing 2 changed files with 22 additions and 9 deletions.
23 changes: 14 additions & 9 deletions src/Text/Pandoc/Readers/RST.hs
Expand Up @@ -1308,19 +1308,24 @@ table = gridTable False <|> simpleTable False <|>

inline :: PandocMonad m => RSTParser m Inlines
inline = choice [ note -- can start with whitespace, so try before ws
, whitespace
, link
, str
, endline
, strong
, emph
, code
, subst
, interpretedRole
, smart
, hyphens
, escapedChar
, symbol ] <?> "inline"
, inlineContent ] <?> "inline"

-- strings, spaces and other characters that can appear either by
-- themselves or within inline markup
inlineContent :: PandocMonad m => RSTParser m Inlines
inlineContent = choice [ whitespace
, str
, smart
, hyphens
, escapedChar
, symbol ] <?> "inline content"

parseInlineFromString :: PandocMonad m => String -> RSTParser m Inlines
parseInlineFromString = parseFromString' (trimInlines . mconcat <$> many inline)
Expand Down Expand Up @@ -1363,11 +1368,11 @@ atStart p = do

emph :: PandocMonad m => RSTParser m Inlines
emph = B.emph . trimInlines . mconcat <$>
enclosed (atStart $ char '*') (char '*') inline
enclosed (atStart $ char '*') (char '*') inlineContent

strong :: PandocMonad m => RSTParser m Inlines
strong = B.strong . trimInlines . mconcat <$>
enclosed (atStart $ string "**") (try $ string "**") inline
enclosed (atStart $ string "**") (try $ string "**") inlineContent

-- Note, this doesn't precisely implement the complex rule in
-- http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#inline-markup-recognition-rules
Expand Down Expand Up @@ -1475,7 +1480,7 @@ explicitLink = try $ do
char '`'
notFollowedBy (char '`') -- `` marks start of inline code
label' <- trimInlines . mconcat <$>
manyTill (notFollowedBy (char '`') >> inline) (char '<')
manyTill (notFollowedBy (char '`') >> inlineContent) (char '<')
src <- trim <$> manyTill (noneOf ">\n") (char '>')
skipSpaces
string "`_"
Expand Down
8 changes: 8 additions & 0 deletions test/Tests/Readers/RST.hs
Expand Up @@ -188,4 +188,12 @@ tests = [ "line block with blank line" =:
] =?>
para ("foo" <> note (para "bar"))
]
, testGroup "inlines"
[ "links can contain an URI without being parsed twice" =:
"`http://loc <http://loc>`__" =?>
para (link "http://loc" "" "http://loc")
, "inline markup cannot be nested" =:
"**a*b*c**" =?>
para (strong "a*b*c")
]
]

4 comments on commit 5c45bba

@jgm
Copy link

@jgm jgm commented on 5c45bba Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that moving the str and whitespace parsers down in the list will have a significant performance impact: have you measured this?
If it is significant enough, it might be a good idea not to reuse inlineContent in inline, as you do above.

@danse
Copy link
Member Author

@danse danse commented on 5c45bba Apr 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot spot any performance loss due to this change. The following
mean times are for the RST writer on the current master branch
(a2816cc):

mean                 8.185 ms   (8.028 ms .. 8.455 ms)
mean                 8.121 ms   (8.016 ms .. 8.299 ms)
mean                 8.172 ms   (8.025 ms .. 8.408 ms)
mean                 8.710 ms   (8.532 ms .. 8.916 ms)

The following times are for branch italia/4581 which i just rebased
on master:

mean                 8.371 ms   (8.265 ms .. 8.569 ms)
mean                 8.443 ms   (8.322 ms .. 8.665 ms)
mean                 8.072 ms   (7.948 ms .. 8.182 ms)
mean                 8.130 ms   (8.039 ms .. 8.239 ms)

Anyway you make a good point, it would be better to run shorter parsers before the longer ones. Would it work if, in the expression for inline, we moved inlineContent up to the second position after note?

@jgm
Copy link

@jgm jgm commented on 5c45bba Apr 27, 2018 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danse
Copy link
Member Author

@danse danse commented on 5c45bba Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well maybe we lose some performance points but i find that the code is less error-prone and more readable this way. It makes sense to first try to parse potential wrappers, and then parse for plain content. So is there anything we still want to change in this pull request?

Please sign in to comment.