New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated tests. #15

Merged
merged 1 commit into from Mar 28, 2014

Conversation

Projects
None yet
2 participants
@cirosantilli
Contributor

cirosantilli commented Mar 26, 2014

This goes a long way to solve #5: now we have a script that does all the tests, all we need is find the best DOM normalizer to compare DOM trees.

Features

Install with:

sudo pip install -r requirements.txt

All engines

./run-tests.py

Sample output:

gfm       |FF     F     F   FFFF  F    F                        FFFF   FF               FF   F       F           | 104.34s  102   20  19%
kramdown  |      FFF     FF       FF       FF FFFFFFFFFFFFF                       F                              |  31.69s  102   23  22%
pandoc    |FF           F F FFFF  FFFFF    FF FFFFFFFFFFFFF     FFFF     FF            FFF FFF                  F|   0.98s  102   41  40%
redcarpet |                                                                                                      |  22.01s  102    0   0%

Extensions:

gfm       |F |   1.17s    2    1  50%
kramdown  |  |   0.62s    2    0   0%
pandoc    |F |   0.01s    2    1  50%
redcarpet |  |   0.43s    2    0   0%

Detailed diagnosis for a single engine

 ./run-tests.py pandoc

Sample output for a single error:

# extensions/pandoc/fenced-code-block
======================================================================

```
a
```

----------------------------------- output:

<html>
<body>
<pre><code>a</code></pre>
</body>
</html>

----------------------------------- expected:

<html>
<body>
<pre><code>
a
</code></pre>
</body>
</html>

======================================================================

Extras

Configurations can be done under config_local.py.

Implementation

This is currently using Beautiful Soup 4 prettify to normalize DOM outputs.

As far as I can tell there are two problems with BS4:

  • this is not what prettify is intended for, but works for most things, except one: multiple spaces and newlines vs spaces. Even so, many tests pass, and all Redcarpet tests do.
  • Some attributes added by several engines: e.g. header IDs are added by both Kramdown and Pandoc. This is harder to solve since the DOMs really are different, and I don't think BS4 has a way to strip only the right attributes (might not even be computable since some attributes like href must stay).

Still I think this is already useful on its own.

If anyone finds a better normalizer, you just have to modify one method: run-tests.dom_normalize.

Implementation Notes

  • add utf8 tests like tests/extensions/utf8.md to make sure I'm not getting encodings wrong
  • the third line of tests/blockquote-multiline-1-space-end.md ended in a UTF-8 non breakable space. If we really need that, lets make it very explicit and put on a separate tests.
@karlcow

This comment has been minimized.

karlcow commented on run-tests.py in 735a029 Mar 26, 2014

did you try to use

soup = bs4.BeautifulSoup(html, "html5lib")

You will need to install html5lib first.

@karlcow

This comment has been minimized.

karlcow commented on run-tests.py in 735a029 Mar 26, 2014

could we use logging instead of print statements? That would be great.

If you need guidance tell me, or also check this good article

@karlcow

This comment has been minimized.

karlcow commented on 735a029 Mar 26, 2014

A few comments. in the code.

@cirosantilli

This comment has been minimized.

Contributor

cirosantilli commented Mar 27, 2014

@karlcow I tried html5lib but the results were the same. I'll try asking on the Beautifulsoup forum.

I don't know if in HTML DOM multiple spaces / newlines are all equivalent in the specs and implementations.

This SO question asks that.

. stripped_strings only strips trailing and heading whitespace. I would look more at an output formatter rather than a new parser. The function formatter might work well, but I don't know if it can differentiate between attributes and text (in attributes multiple / trailing spaces matter?)


logging

About the logging, I started doing it after you told be about it, but then I noticed that there are only two kinds of things being printed:

  • stuff that I want every user to see in a fixed order without any special format (INFO to stdout?)
  • fatal errors immediately followed by an exit(1) with an ERROR: prefix (ERROR to stderr?)

The only advantage I can see of logging in this case would be to standardize error prefix, but that could be done with a string variable. Is there something else? Or are you mostly looking ahead?

I'll definitely implement it, just trying to understand the rationale before so I do things right. =)

@cirosantilli

This comment has been minimized.

Contributor

cirosantilli commented Mar 27, 2014

@karlcow

This comment has been minimized.

Owner

karlcow commented Mar 28, 2014

I will move the discussion to issue #5 about the DOM/html5lib

@karlcow

This comment has been minimized.

Owner

karlcow commented Mar 28, 2014

@cirosantilli I will merge this one, but for next time.

Two recommendations:

  1. Atomic commits for things which go together
  2. Separated Pull Requests. For example a pull request for test framework and a different pull request for documentation, and a different pull request for tests fixing/adding. Thanks. :)

I guess it's time to add a CONTRIBUTING.md :)

@cirosantilli

This comment has been minimized.

Contributor

cirosantilli commented Mar 28, 2014

Sorry about that, I was lazy to open 4 MRs =) Will do next time.

karlcow added a commit that referenced this pull request Mar 28, 2014

@karlcow karlcow merged commit 20c113f into karlcow:master Mar 28, 2014

@cirosantilli cirosantilli deleted the cirosantilli:add-tests branch Apr 6, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment