I discovered this while trying to fix some parsing errors for pages with charsets other than UTF-8.
Floki allows changing the underlying HTML parser, which you might want to do to get faster parsing, for example. However, selecting the html5ever parser breaks things:
# in config/config.exs
config :floki, :html_parser, Floki.HTMLParser.Html5ever
Now summarize is broken:
iex(1)> Readability.summarize("https://medium.com/@kenmazaika/why-im-betting-on-elixir-7c8f847b58")
** (FunctionClauseError) no function clause matching in Readability.Helper.remove_tag/2
The following arguments were given to Readability.Helper.remove_tag/2:
# 1
{:doctype, "html", "", ""}
# 2
#Function<0.45730907/1 in Readability.Helper.normalize/2>
Attempted function clauses (showing 4 out of 4):
def remove_tag(content, _) when is_binary(content)
def remove_tag([], _)
def remove_tag([h | t], fun)
def remove_tag({tag, attrs, inner_tree} = html_tree, fun)
(readability 0.12.1) lib/readability/helper.ex:62: Readability.Helper.remove_tag/2
(readability 0.12.1) lib/readability/helper.ex:66: Readability.Helper.remove_tag/2
(readability 0.12.1) lib/readability.ex:92: Readability.summarize/2
iex:1: (file)
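The trace shows that none of the four clauses accepts the 4-element `{:doctype, "html", "", ""}` tuple that the html5ever parser leaves at the top of the tree. Below is a minimal sketch of one possible fix, not the library's actual code: `RemoveTagSketch` is a hypothetical stand-in module, the bodies of the first four clauses are my guess based only on the clause heads in the trace, and the last clause is the proposed addition that passes any other tuple (doctype, comment) through untouched.

```elixir
defmodule RemoveTagSketch do
  # Hypothetical stand-in for Readability.Helper.remove_tag/2.
  # Clause heads mirror the stack trace; the bodies are assumptions.

  # Text nodes are kept as-is.
  def remove_tag(content, _fun) when is_binary(content), do: content

  def remove_tag([], _fun), do: []

  # Walk a list of nodes, dropping any node that was removed (nil).
  def remove_tag([h | t], fun) do
    case remove_tag(h, fun) do
      nil -> remove_tag(t, fun)
      node -> [node | remove_tag(t, fun)]
    end
  end

  # Regular element: drop it when fun returns true, otherwise recurse.
  def remove_tag({tag, attrs, inner_tree} = html_tree, fun) do
    if fun.(html_tree), do: nil, else: {tag, attrs, remove_tag(inner_tree, fun)}
  end

  # Proposed addition: pass through tuples that are not 3-element
  # elements, such as {:doctype, "html", "", ""} or {:comment, "..."}.
  def remove_tag(other, _fun) when is_tuple(other), do: other
end

tree = [
  {:doctype, "html", "", ""},
  {"html", [], [{"script", [], ["x"]}, {"p", [], ["hi"]}]}
]

RemoveTagSketch.remove_tag(tree, fn {tag, _, _} -> tag == "script" end)
# => [{:doctype, "html", "", ""}, {"html", [], [{"p", [], ["hi"]}]}]
```

Whether the doctype should be kept or dropped from the result is a design choice for the maintainers; the sketch keeps it so that the output stays as close as possible to what the parser produced.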