Improve output normalization with custom parser. #31
I have come up with a normalization that seems to overcome all trivial errors.
The details are well explained in the doctests for
This uses Python's HTML parser, and formats the output in a way that is useful for our use case. No more BeautifulSoup dependencies.
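For readers unfamiliar with the approach, here is a minimal sketch of how a normalizer can be built on the standard library's `html.parser.HTMLParser`. It is not the code from this PR: the names `NormalizingParser` and `normalize` are hypothetical, and the specific rules (sorted attributes, collapsed whitespace, one token per line) are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class NormalizingParser(HTMLParser):
    """Hypothetical normalizer: re-emits HTML as a list of canonical tokens."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        # Sort attributes so that ordering differences do not matter.
        rendered = "".join(
            ' {}="{}"'.format(name, value if value is not None else "")
            for name, value in sorted(attrs, key=lambda pair: pair[0])
        )
        self.parts.append("<{}{}>".format(tag, rendered))

    def handle_endtag(self, tag):
        self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        # Collapse whitespace runs and drop whitespace-only text nodes.
        text = " ".join(data.split())
        if text:
            self.parts.append(text)


def normalize(html):
    """Return a normalized form of `html`, one token per line (assumed format)."""
    parser = NormalizingParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.parts)
```

With rules like these, two outputs that differ only in tag case, attribute order, or whitespace normalize to the same string, so a plain string comparison or line diff is enough to spot real differences.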
This is built on top of #30, so please only consider the last commit of this PR.
The number of errors falls from to:
@karlcow I'm sorry, but: