What is this?
-------------

I started working on pandoc before I knew much Haskell, and before
there were many Haskell libraries available. In retrospect, I regret
some of the early design decisions. This repository is a place to
explore some architectural improvements.

So far, there's a definition of the basic data structure, a
builder DSL, a markdown reader, and an HTML writer. The package
includes an executable, `pandoc2` -- `pandoc2 --help` will give
usage instructions. With the `--strict` flag, the program passes
all of the tests from the Markdown test suite.

The following pandoc markdown extensions have been implemented:

* smart typography (enable with `--smart`)
* delimited code blocks
* markdown inside HTML block-level tags
* TeX math
* footnotes
* fancy list markers
* automatic header identifiers
* superscripts
* subscripts
* strikeout
* definition lists

There are a few changes in how lists work. The most important is
that changes in style now trigger a new list. The following is one
list in pandoc, and two lists in pandoc2:

    + one
    + two

    - three
    - four
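
The list rule above can be pictured as grouping consecutive items by their marker. The following is only a minimal sketch, not pandoc2's actual parser: `Item`, `splitByMarker` and `example` are hypothetical names, and the items are assumed to be already split into a marker character and a text.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- | A hypothetical pre-parsed bullet item: its marker character and its text.
data Item = Item { marker :: Char, itemText :: String }
  deriving (Show, Eq)

-- | Start a new list whenever the marker character changes.
splitByMarker :: [Item] -> [[Item]]
splitByMarker = groupBy ((==) `on` marker)

-- | The example from the text: two '+' items followed by two '-' items.
example :: [Item]
example =
  [ Item '+' "one", Item '+' "two"
  , Item '-' "three", Item '-' "four"
  ]
```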

Some differences from pandoc 1
------------------------------

* We now use `Sequence`s of `Inline` and `Block` elements instead of lists.
  This makes sense for text, since appending to the end of a `Sequence`
  is computationally cheap. These sequences are wrapped in newtypes, `Inlines`
  and `Blocks`. Thus, the `Emph` constructor now has the type
  `Inlines -> Inline` rather than `[Inline] -> Inline`.
  `mappend` is defined for `Inlines` in a way that builds in normalization:
  so, for example, if you append an `Inlines` that begins with a space onto an
  `Inlines` that ends with a space, there will only be one space. Similarly,
  adjacent `Emph` `Inline`s will be merged, and so on.

* The individual inline and block parsers return an `Inlines` or `Blocks`
  instead of an `Inline` or `Block`; this allows them to return nothing, or
  multiple elements, where before we had to return a single element. (So,
  for example, `pReference` can return `mempty` instead of a `Null` block.)

* `Text` is used throughout instead of `String`.

* The input text is tokenized, and the tokens fed to the parser. This
  makes the parsers simpler in some cases (especially in handling
  line endings) and seems to boost performance. Tabs are converted in the
  tokenization phase.

* IO actions are now possible in the parsers. This should make it
  possible to handle things like LaTeX `\include`. But it is also
  possible for the user to run the parsers in a pure monad.
  (See the `PMonad` class.)

* It is also now easy to issue warnings and informational messages
  during parsing, to alert the user if information is being lost,
  for example.

* The old markdown parser made two passes--one to get a list of
  references, and then again to parse the document, using this
  list of references. The new parser makes just one pass,
  and fills in the references at the end.

* The old parser handled embedded blocks (block quotations,
  sublists) by first parsing out a "raw" chunk of text (omitting
  opening `>`'s and indentation, for example), then parsing this
  raw text using block parsers. The new parser avoids the need
  for multiple passes by storing an "endline" and "block separator"
  parser in state.

* The old parser required space after block elements, so that
  newlines would generally have to be added to the input. The
  new parser does not.

* blaze-html is now used (instead of the old xhtml package) for HTML
  generation.
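
To make the first two points concrete, here is a minimal sketch of a normalizing `mappend` over a sequence-backed `Inlines`. It is not the definition used in this repository: the `Inline` type is cut down to three constructors, `String` stands in for `Text` to keep the example dependency-free, and the helpers `str`, `space`, `emph` and `toInlineList` are assumed names chosen for illustration.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, ViewL (..), ViewR (..), viewl, viewr)
import qualified Data.Sequence as Seq

-- | A cut-down inline type (the real one has many more constructors).
data Inline
  = Str String
  | Space
  | Emph Inlines
  deriving (Show, Eq)

-- | A sequence of inlines wrapped in a newtype, so it can carry its own
-- Monoid instance.
newtype Inlines = Inlines (Seq Inline)
  deriving (Show, Eq)

-- | Appending looks only at the seam between the two sequences:
-- two spaces collapse into one, and two adjacent Emphs are merged.
instance Semigroup Inlines where
  Inlines xs <> Inlines ys =
    case (viewr xs, viewl ys) of
      (EmptyR, _) -> Inlines ys
      (_, EmptyL) -> Inlines xs
      (xs' :> x, y :< ys') ->
        case (x, y) of
          (Space, Space)   -> Inlines ((xs' Seq.|> Space) Seq.>< ys')
          (Emph a, Emph b) -> Inlines ((xs' Seq.|> Emph (a <> b)) Seq.>< ys')
          _                -> Inlines (xs Seq.>< ys)

-- | 'mempty' is what a parser can return when it produces no output.
instance Monoid Inlines where
  mempty = Inlines Seq.empty

-- Hypothetical builder helpers.
str :: String -> Inlines
str = Inlines . Seq.singleton . Str

space :: Inlines
space = Inlines (Seq.singleton Space)

emph :: Inlines -> Inlines
emph = Inlines . Seq.singleton . Emph

-- | Flatten to a list for inspection.
toInlineList :: Inlines -> [Inline]
toInlineList (Inlines s) = toList s
```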

Observations
------------

The code is cleaner and shorter.

pandoc2 is significantly faster than pandoc, even with the `--strict`
flag. `resolveRefs` was made much faster by hand-coding it instead of
using generics. A further improvement was gained by removing `resolveRefs`
entirely, and having the parsers return functions from references to
values, which are then run at the end of parsing.
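
The idea of returning functions from references to values can be sketched as follows. This is an illustration under stated assumptions, not the code in this repository: `Refs`, `Result`, `refLink`, `refDef`, `text` and `finish` are hypothetical names, the reference table maps a label to a URL only, and `String` is used in place of `Text`.

```haskell
import qualified Data.Map as M

-- | Hypothetical reference table: label -> URL.
type Refs = M.Map String String

data Inline
  = Str String
  | Link String String -- ^ link label and resolved URL
  deriving (Show, Eq)

-- | What a parser yields in this sketch: the references it defined,
-- plus a function that waits for the complete reference table.
data Result = Result Refs (Refs -> [Inline])

-- | Combining results merges the tables and concatenates the outputs.
instance Semigroup Result where
  Result r1 f1 <> Result r2 f2 =
    Result (M.union r1 r2) (\refs -> f1 refs ++ f2 refs)

instance Monoid Result where
  mempty = Result M.empty (const [])

-- | A reference link like [label]; it is resolved only when the final
-- table is supplied, so the definition may come later in the input.
refLink :: String -> Result
refLink label = Result M.empty $ \refs ->
  case M.lookup label refs of
    Just url -> [Link label url]
    Nothing  -> [Str ("[" ++ label ++ "]")]

-- | A reference definition adds to the table and produces no output.
refDef :: String -> String -> Result
refDef label url = Result (M.singleton label url) (const [])

-- | Plain text ignores the table.
text :: String -> Result
text s = Result M.empty (const [Str s])

-- | Run once at the end of parsing: feed the collected table back in.
finish :: Result -> [Inline]
finish (Result refs f) = f refs
```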

To run the Markdown test suite, do `make test`. To run the PHP Markdown test
suite, do `make phptests`. Several of the PHP tests will fail, but in
these cases I disagree about what behavior is normative.